1 Description

This example demonstrates how to fit a Spatial Bayesian Hierachical Poisson Model using the INLA package in R version 4.5.1.
We illustrate the workflow using a real dataset of school infrastructures and associated covariates, and analyze the posterior estimates.

  • Modeling goal: Identify administrative regions in Brunei where school availability is significantly lower than the national baseline, after adjusting for population and socioeconomic factors.
  • Tutorial goal: Demonstrate the syntax and workflow of INLA for fitting spatial Poisson models, with potential applications to similar count-based areal data — e.g., healthcare access, crime incidence, or infrastructure distribution in other regions.
  • Model type: Spatial Bayesian Hierachical Poisson Model
  • Tags: Spatial, Bayesian, INLA, Poisson

1.1 Load necessary libraries for statistical modeling, spatial analysis, and visualization

# Clear the workspace
rm(list = ls())

# --- Libraries for Bayesian inference and statistical modeling ---
library(INLA)          # Integrated Nested Laplace Approximations

# --- Libraries for spatial statistics ---
library(sf)            # Spatial data
library(spdep)         # Spatial dependence (e.g., poly2nb)

# --- Libraries for visualization ---
library(mapview)       # Visualize map
library(RColorBrewer)  # Color schemes

# --- Notes ---
# Ensure all packages are installed before running:
# install.packages(c("INLA", "sf", "spdep", "mapview", "RColorBrewer")

2 Data Description

The full dataset, including its source and cleaning process, is detailed in Alvin Bong (2025). For illustration purposes, we directly present the prepared data here.

2.1 Areal data (sch_sf)

The areal data is organized by mukim (sub-district). The variables include:

  • mukim: Names of all 39 mukims in Brunei Darussalam
  • schools: Number of schools in each mukim
  • geometry: Spatial boundaries of each mukim (simple features format)
  • area: Size of each mukim (in square kilometers)
  • population: Population count in each mukim
  • house_price: Median house price in each mukim (in BND millions)

2.2 Overview

We visualise schools counts per mukim as exploratory data analysis:

# Load & view data
load("data.Rdata")
head(sch_sf )
## Simple feature collection with 6 features and 5 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 114.0759 ymin: 4.002644 xmax: 114.8581 ymax: 4.6508
## Geodetic CRS:  WGS 84
##                mukim schools                       geometry      area
## 1      Mukim Melilas       1 POLYGON ((114.8557 4.279265... 61.014152
## 2       Mukim Sukang       1 POLYGON ((114.4861 4.164966... 81.870168
## 3  Mukim Kuala Balai       0 POLYGON ((114.421 4.558823,... 32.148976
## 4 Mukim Kuala Belait       7 POLYGON ((114.2525 4.596173...  8.828924
## 5  Mukim Bukit Sawat       2 POLYGON ((114.554 4.646316,... 29.907543
## 6        Mukim Seria       4 POLYGON ((114.4084 4.648287... 16.489154
##   population house_price
## 1     0.0029   0.2410940
## 2     0.0081   0.2342315
## 3     0.0016   0.2964615
## 4     2.8793   0.3619440
## 5     0.0759   0.2407938
## 6     1.8313   0.4880000
# --- Plot with mapview ---
pal <- colorRampPalette(brewer.pal(9, "YlOrRd"))
mapview(sch_sf, zcol="schools", col.region=pal)

3 Model Specification and Fitting

We fit a Spatial Bayesian Hierachical Poisson Model using inla(). Bayesian hierarchical models enable us to obtain smoothed relative abundance (RA) of schools by including covariates and random effects to borrow information from neighboring areas.

The relative abundance (RA) \(\theta_i\), analogous to a relative risk in disease mapping, quantifies whether an area \(i\) has higher (\(\theta_i > 1\)) or lower (\(\theta_i < 1\)) school count than the national average. For example, \(\theta_i = 0.5\) means mukim \(i\) has half the number of schools expected under the national average, highlighting disparities in school infrastructure.

Let \(Y_i\) and \(E_i\) denote the observed and expected counts of schools respectively, in mukim \(i \in \{1, \dotsc, n\}\). We assume the following Poisson hierarchical model, incorporating spatial structure via the Besag-York-Mollié (BYM2):

\[ Y_i \mid \theta_i \sim \text{Poisson}(E_i \cdot \theta_i), \quad i = 1, \dotsc, n \]

\[ \log(\theta_i) = \beta_0 + \beta_1 \cdot \text{pop}_i + \beta_2 \cdot \text{area}_i + \beta_3 \cdot \text{hp}_i + u_i + v_i, \]

where:

  • \(\beta_0\) is the intercept,

  • \(\beta_1\), \(\beta_2\), and \(\beta_3\) are regression coefficients for the standardized covariates:

    • \(\text{pop}_i\): population (in units of 10,000),

    • \(\text{area}_i\): mukim size (in units of 10 km²),

    • \(\text{hp}_i\): median house price (in BND $1,000,000),

  • \(u_i\) is a structured spatial effect, modelled using an intrinsic conditional autoregressive (CAR) prior \(u_i \mid u_{-i} \sim \mathcal{N}(\bar{u}_{\delta_i}, \frac{1}{\tau_u n_{\delta_i}})\)

  • \(v_i\) is an unstructured random effect, \(v_i \sim \text{Normal}(0, \frac{1}{\tau_v})\)

3.1 Expected School Count

The expected number of schools per mukim is computed using indirect standardization:

\[ E_i = \sum_{j=1}^{n} Y_j \cdot \frac{\text{pop}_i}{\sum_{j=1}^{n} \text{pop}_j} \]

sch_sf$expected_schools <- sum(sch_sf$schools) * sch_sf$population/sum(sch_sf$population)

3.2 Spatial Neighborhood Structure

Given that our spatial data are in areal units (mukims), we define spatial neighbors based on shared boundaries (queen contiguity) between mukims. Creating this neighborhood structure (as an adjacency matrix) is required for fitting spatial models such as BYM2 in INLA.

# 1. Create a neighborhood list based on polygon contiguity
# Note: Extra steps required to convert/read data if it is not in sf format
nb <- spdep::poly2nb(sch_sf)

# 2. Convert the neighbor list to INLA adjacency format
spdep::nb2INLA("map.graph", nb)

# 3. Set the location of the adjacency file for INLA
map.adj <- file.path(getwd(), "map.graph")

# 4. Load adjacency matrix as a graph for INLA spatial models
g <- inla.read.graph(filename = map.adj)

# # 5. Optional: Visualize the adjacency matrix
# image(inla.graph2matrix(g), 
#       xlab = "Region", 
#       ylab = "Region", 
#       main = "Spatial Adjacency Matrix")

3.3 Inla call/ fit

We now fit the BYM2 model including covariates and spatial random effects.

# index for spatial effects
sch_sf$re_u <- 1:nrow(sch_sf)

# specify model formula
formula <- schools ~ population + area + house_price + f(re_u, model = "bym2", graph = g)

# run model
res <- inla(formula, family = "poisson", data = sch_sf, E = expected_schools,
            control.predictor = list(compute = TRUE),
            control.compute = list(return.marginals.predictor = TRUE),
            verbose = TRUE)

3.4 Posterior Results and Summary

We summarize the fixed effects, hyperparameters, and model fit metrics, and visualize the posterior estimates.

# fixed effects, hyperparameters, and model fit metrics
summary(res)

# posterior estimates: Relative Abundance (RA)
head(res$summary.fitted.values)

3.5 Interpretation of Results

Parameter Mean 95% Credible Interval Interpretation
(Intercept) 1.076 0.239 – 1.893 Intercept for log-relative abundance at baseline, when all covariates are zero.
population -0.436 -0.579 – -0.295 Effect of population on RA/school count: negative, higher population corresponds to lower relative abundance of schools. Credible interval does not include 0 → significant.
area 0.015 -0.002 – 0.032 Effect of mukim size on RA: weak/no effect, 95% CI includes zero, not significant
house_price -0.758 -3.174 – 1.705 Effect of house price on RA: weak/no effect, 95% CI includes zero, not significant

3.6 Spatial Mapping of Relative Abundance

We visualize the posterior mean of relative abundance \(\theta_i\):

# Store values to our sf data frame
sch_sf$RA <- res$summary.fitted.values[, "mean"]

# Plot
pal <- colorRampPalette(brewer.pal(9, "YlOrRd"))
at <- c(0,0.5,1,2,3,4,10)
mapview(sch_sf, zcol = "RA", col.region=pal, at=at, layer.name="RA")

3.7 Non-exceedance Probability

We compute the non-exceedance probability, which represents the probability that the posterior i.e. relative abundance (RA) in a given mukim falls below a threshold value (set at 0.7 in our case). This allows us to identify mukims that are likely to have lower-than-expected school access.

# -------------------------------
# 1. Compute non-exceedance probabilities from posterior distributions
sch_sf$exc <- sapply(res$marginals.fitted.values,
                      FUN = function(marg){inla.pmarginal(q = 0.7, marginal = marg)})

# 2. Plot
at <- c(0,0.25,0.5,0.75,1)
mapview(sch_sf, zcol = "exc", col.region=pal, at=at, 
        layer.name="Non-Exceedance Probability RA < 0.7")
# -------------------------------

4 References