This example demonstrates how to fit a Spatial Bayesian
Hierachical Poisson Model using the INLA package
in R version 4.5.1.
We illustrate the workflow using a real dataset of school
infrastructures and associated covariates, and analyze the posterior
estimates.
Spatial, Bayesian,
INLA, Poisson# Clear the workspace
rm(list = ls())
# --- Libraries for Bayesian inference and statistical modeling ---
library(INLA) # Integrated Nested Laplace Approximations
# --- Libraries for spatial statistics ---
library(sf) # Spatial data
library(spdep) # Spatial dependence (e.g., poly2nb)
# --- Libraries for visualization ---
library(mapview) # Visualize map
library(RColorBrewer) # Color schemes
# --- Notes ---
# Ensure all packages are installed before running:
# install.packages(c("INLA", "sf", "spdep", "mapview", "RColorBrewer")The full dataset, including its source and cleaning process, is detailed in Alvin Bong (2025). For illustration purposes, we directly present the prepared data here.
sch_sf)The areal data is organized by mukim (sub-district). The variables include:
We visualise schools counts per mukim as exploratory data analysis:
## Simple feature collection with 6 features and 5 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 114.0759 ymin: 4.002644 xmax: 114.8581 ymax: 4.6508
## Geodetic CRS: WGS 84
## mukim schools geometry area
## 1 Mukim Melilas 1 POLYGON ((114.8557 4.279265... 61.014152
## 2 Mukim Sukang 1 POLYGON ((114.4861 4.164966... 81.870168
## 3 Mukim Kuala Balai 0 POLYGON ((114.421 4.558823,... 32.148976
## 4 Mukim Kuala Belait 7 POLYGON ((114.2525 4.596173... 8.828924
## 5 Mukim Bukit Sawat 2 POLYGON ((114.554 4.646316,... 29.907543
## 6 Mukim Seria 4 POLYGON ((114.4084 4.648287... 16.489154
## population house_price
## 1 0.0029 0.2410940
## 2 0.0081 0.2342315
## 3 0.0016 0.2964615
## 4 2.8793 0.3619440
## 5 0.0759 0.2407938
## 6 1.8313 0.4880000
We fit a Spatial Bayesian Hierachical Poisson Model
using inla(). Bayesian hierarchical models enable us to
obtain smoothed relative abundance (RA) of schools by including
covariates and random effects to borrow information from neighboring
areas.
The relative abundance (RA) \(\theta_i\), analogous to a relative risk in disease mapping, quantifies whether an area \(i\) has higher (\(\theta_i > 1\)) or lower (\(\theta_i < 1\)) school count than the national average. For example, \(\theta_i = 0.5\) means mukim \(i\) has half the number of schools expected under the national average, highlighting disparities in school infrastructure.
Let \(Y_i\) and \(E_i\) denote the observed and expected counts of schools respectively, in mukim \(i \in \{1, \dotsc, n\}\). We assume the following Poisson hierarchical model, incorporating spatial structure via the Besag-York-Mollié (BYM2):
\[ Y_i \mid \theta_i \sim \text{Poisson}(E_i \cdot \theta_i), \quad i = 1, \dotsc, n \]
\[ \log(\theta_i) = \beta_0 + \beta_1 \cdot \text{pop}_i + \beta_2 \cdot \text{area}_i + \beta_3 \cdot \text{hp}_i + u_i + v_i, \]
where:
\(\beta_0\) is the intercept,
\(\beta_1\), \(\beta_2\), and \(\beta_3\) are regression coefficients for the standardized covariates:
\(\text{pop}_i\): population (in units of 10,000),
\(\text{area}_i\): mukim size (in units of 10 km²),
\(\text{hp}_i\): median house price (in BND $1,000,000),
\(u_i\) is a structured spatial effect, modelled using an intrinsic conditional autoregressive (CAR) prior \(u_i \mid u_{-i} \sim \mathcal{N}(\bar{u}_{\delta_i}, \frac{1}{\tau_u n_{\delta_i}})\)
\(v_i\) is an unstructured random effect, \(v_i \sim \text{Normal}(0, \frac{1}{\tau_v})\)
The expected number of schools per mukim is computed using indirect standardization:
\[ E_i = \sum_{j=1}^{n} Y_j \cdot \frac{\text{pop}_i}{\sum_{j=1}^{n} \text{pop}_j} \]
Given that our spatial data are in areal units (mukims), we define spatial neighbors based on shared boundaries (queen contiguity) between mukims. Creating this neighborhood structure (as an adjacency matrix) is required for fitting spatial models such as BYM2 in INLA.
# 1. Create a neighborhood list based on polygon contiguity
# Note: Extra steps required to convert/read data if it is not in sf format
nb <- spdep::poly2nb(sch_sf)
# 2. Convert the neighbor list to INLA adjacency format
spdep::nb2INLA("map.graph", nb)
# 3. Set the location of the adjacency file for INLA
map.adj <- file.path(getwd(), "map.graph")
# 4. Load adjacency matrix as a graph for INLA spatial models
g <- inla.read.graph(filename = map.adj)
# # 5. Optional: Visualize the adjacency matrix
# image(inla.graph2matrix(g),
# xlab = "Region",
# ylab = "Region",
# main = "Spatial Adjacency Matrix")We now fit the BYM2 model including covariates and spatial random effects.
# index for spatial effects
sch_sf$re_u <- 1:nrow(sch_sf)
# specify model formula
formula <- schools ~ population + area + house_price + f(re_u, model = "bym2", graph = g)
# run model
res <- inla(formula, family = "poisson", data = sch_sf, E = expected_schools,
control.predictor = list(compute = TRUE),
control.compute = list(return.marginals.predictor = TRUE),
verbose = TRUE)We summarize the fixed effects, hyperparameters, and model fit metrics, and visualize the posterior estimates.
| Parameter | Mean | 95% Credible Interval | Interpretation |
|---|---|---|---|
| (Intercept) | 1.076 | 0.239 – 1.893 | Intercept for log-relative abundance at baseline, when all covariates are zero. |
| population | -0.436 | -0.579 – -0.295 | Effect of population on RA/school count: negative, higher population corresponds to lower relative abundance of schools. Credible interval does not include 0 → significant. |
| area | 0.015 | -0.002 – 0.032 | Effect of mukim size on RA: weak/no effect, 95% CI includes zero, not significant |
| house_price | -0.758 | -3.174 – 1.705 | Effect of house price on RA: weak/no effect, 95% CI includes zero, not significant |
We visualize the posterior mean of relative abundance \(\theta_i\):
We compute the non-exceedance probability, which represents the probability that the posterior i.e. relative abundance (RA) in a given mukim falls below a threshold value (set at 0.7 in our case). This allows us to identify mukims that are likely to have lower-than-expected school access.
# -------------------------------
# 1. Compute non-exceedance probabilities from posterior distributions
sch_sf$exc <- sapply(res$marginals.fitted.values,
FUN = function(marg){inla.pmarginal(q = 0.7, marginal = marg)})
# 2. Plot
at <- c(0,0.25,0.5,0.75,1)
mapview(sch_sf, zcol = "exc", col.region=pal, at=at,
layer.name="Non-Exceedance Probability RA < 0.7")