Validation — Urbanetric

1 · What “validation” means here

A road model can be internally consistent — every link balances, every junction discharges — and still be wrong about the city it claims to represent. Validation is the test that the modelled flows, queues and journey times match the road as it actually performs, link by link, against independent count data nobody involved in building the model had a hand in collecting.

The bar is not “the model looks plausible.” The bar is “the model passes a published, third-party statistical test against measured counts on the high-flow links of the network.”

2 · Ground-truth count sources

The Cork baseline calibrates against three independent count sources, layered for coverage and cross-checking:

Transport Infrastructure Ireland (TII) traffic count points

TII maintains permanent automatic count stations on the national road network — hourly directional flows, published with AADT/AAWT/AAHT summaries. We use the count stations on and around the city's strategic network as the spine of the calibration: the N20, the N22, the N25, the N40 / South Ring corridor.

Cork City Council classified turning counts

For the junctions inside the city's central island where TII has no permanent count, we calibrate against periodic classified turning counts commissioned by the council for their own assessments (typically 12-hour neutral-day counts). These give per-movement flow at signalised junctions where the strategic counts can only see the link, not the turning split.

CSO Place of Work, School or College — Census of Anonymised Records (POWSCAR)

Origin–destination journey-to-work data from the Central Statistics Office is used to validate the demand, not the flow. If POWSCAR records N thousand trips per day from small area A to workplace area B, the synthetic population should produce a comparable number of trips between the same two areas.
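
A minimal sketch of that demand comparison, assuming both POWSCAR and the synthetic population have already been reduced to trip counts keyed by (origin, destination) area pair. The function name, the dictionary layout and the area codes are illustrative assumptions, not the Urbanetric implementation.

```python
def od_relative_errors(observed, modelled):
    """Relative error of modelled vs observed trips for each OD pair.

    Both arguments map (origin_area, destination_area) -> trips per day.
    Pairs with zero observed trips are skipped, because a relative
    error is undefined there.
    """
    errors = {}
    for pair, obs in observed.items():
        if obs <= 0:
            continue
        mod = modelled.get(pair, 0.0)  # a missing pair means no modelled trips
        errors[pair] = (mod - obs) / obs
    return errors


# Hypothetical area codes and counts, for illustration only.
powscar = {("A001", "B010"): 1200, ("A002", "B010"): 800}
synthetic = {("A001", "B010"): 1140, ("A002", "B010"): 880}

for pair, err in od_relative_errors(powscar, synthetic).items():
    print(pair, f"{err:+.1%}")
# ('A001', 'B010') -5.0%
# ('A002', 'B010') +10.0%
```

Reporting the error per pair, rather than one network total, keeps an over-count in one area from hiding an under-count in another.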

All three sources are open or available under standard data-share agreements. None of them require commissioning bespoke counts. That keeps the cost of recalibrating the model low and the inputs auditable by anyone.

3 · Statistical fit measures

Three measures are used together. No one of them tells the full story; together they cover bias, precision and per-link adequacy.

GEH (per link)

The GEH statistic is the transport-planning standard for modelled-vs-observed counts. For a modelled flow M and an observed flow O:

GEH = √( 2 · (M − O)² ÷ (M + O) )

GEH is designed for traffic flows specifically: a given absolute difference counts for more on a low-flow link (where 50 vs 200 is a real problem) and for less on a high-flow link (where 4 800 vs 5 000 is fine). The standard acceptance bands, long carried in DMRB TA 11/04 (the legacy UK guidance still recognised in Irish practice) and now codified for Irish schemes in TII’s Traffic and Transport Assessment Guidelines (PE-PDV-02045) and Project Appraisal Guidelines, are:

GEH < 5 — the flow is acceptable.
5 ≤ GEH < 10 — investigate; the link may be a calibration outlier or a real local anomaly worth fixing.
GEH ≥ 10 — the model is wrong about this link; calibrate or document.
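
The formula and bands above are small enough to check by hand. The sketch below computes GEH for the two example pairs quoted in this section; the function names are illustrative, and treating a link with zero flow on both sides as a perfect match is an assumption, not something the guidance specifies.

```python
import math


def geh(modelled, observed):
    """GEH statistic for one link.

    Both flows must be in the same units; GEH is conventionally
    applied to hourly flows.
    """
    total = modelled + observed
    if total == 0:
        return 0.0  # assumption: no flow on either side counts as a match
    return math.sqrt(2.0 * (modelled - observed) ** 2 / total)


def geh_band(value):
    """Map a GEH value onto the three acceptance bands."""
    if value < 5:
        return "acceptable"
    if value < 10:
        return "investigate"
    return "calibrate or document"


for m, o in [(50, 200), (4800, 5000)]:
    g = geh(m, o)
    print(f"M={m} O={o} GEH={g:.2f} -> {geh_band(g)}")
# M=50 O=200 GEH=13.42 -> calibrate or document
# M=4800 O=5000 GEH=2.86 -> acceptable
```

The absolute gap is 150 vehicles in the first pair and 200 in the second, yet only the low-flow link fails.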

R² on the count-vs-modelled scatter

The coefficient of determination across all calibrated links is published alongside the run. It captures how well the model lines up across the network as a whole, not just on the worst link.
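
A minimal sketch of the calculation, assuming R² is taken as the squared Pearson correlation between observed and modelled link flows (equivalent to the R² of a least-squares line through the scatter). The exact definition used in a given run is not stated here, so treat this choice and the function name as assumptions.

```python
def r_squared(observed, modelled):
    """Squared Pearson correlation between two equal-length flow lists."""
    n = len(observed)
    if n != len(modelled) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 links")
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    if var_o == 0 or var_m == 0:
        raise ValueError("R² is undefined when either series is constant")
    return cov * cov / (var_o * var_m)


# A model that is uniformly 10 vehicles high still scores 1.0.
print(r_squared([100, 200, 300], [110, 210, 310]))  # 1.0
```

Under this definition a constant offset goes unpunished, which is one reason R² is read alongside GEH rather than on its own.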

RMSE on hourly profiles

For the strategic count stations where we have hourly data, root-mean-square error on the 24-hour profile catches a model that gets the daily total right but the peak shape wrong — which would let you down on a peak-hour scheme assessment.
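
The sketch below shows why the daily total alone is not enough. It expresses RMSE as a percentage of mean observed flow, matching the form of the threshold in section 4; the function name and the four-hour toy profile are illustrative assumptions.

```python
import math


def profile_rmse_pct(observed_hourly, modelled_hourly):
    """RMSE of an hourly profile, as a percentage of mean observed flow."""
    n = len(observed_hourly)
    if n == 0 or n != len(modelled_hourly):
        raise ValueError("profiles must be non-empty and the same length")
    mse = sum((m - o) ** 2
              for o, m in zip(observed_hourly, modelled_hourly)) / n
    mean_obs = sum(observed_hourly) / n
    if mean_obs == 0:
        raise ValueError("mean observed flow is zero")
    return 100.0 * math.sqrt(mse) / mean_obs


# Toy profile: both series total 800 vehicles, but the model is flat
# where the observed flow has a peak.
observed = [100, 300, 300, 100]
modelled = [200, 200, 200, 200]
print(sum(observed) == sum(modelled))        # True
print(profile_rmse_pct(observed, modelled))  # 50.0
```

The totals agree exactly, yet the profile error is 50% of mean observed flow.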

4 · Acceptance thresholds

A calibrated Urbanetric baseline meets all three of these before it is released as a scenario base:

GEH coverage. ≥ 85% of calibrated links sit inside the GEH < 5 band, and no calibrated link sits at GEH ≥ 10 without an explicit documented reason (e.g. a known closure on the count day).
R² across the calibrated link set. ≥ 0.92 for the peak period being calibrated.
RMSE on the hourly profile. ≤ 12% of mean observed flow for each calibrated count station.

These thresholds are the standard acceptance bar across UK and Irish practice: not a bespoke Urbanetric threshold, but the same line a council reviewer will hold their own consultant’s model to. The scores against them are published with every run, so a reviewer can confirm the result rather than take it on trust.
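
The three thresholds can be checked mechanically. A minimal sketch, assuming per-link GEH values, the peak-period R² and per-station RMSE percentages have already been computed; the function name, argument layout and link identifiers are hypothetical.

```python
def meets_acceptance(link_geh, documented_links, r2, station_rmse_pct):
    """Check a calibration against the section 4 thresholds.

    link_geh:          dict of link id -> GEH
    documented_links:  set of link ids with a recorded reason for GEH >= 10
    r2:                R² across the calibrated link set
    station_rmse_pct:  dict of station id -> RMSE as % of mean observed flow
    Returns (overall_pass, per_check_results).
    """
    if not link_geh:
        raise ValueError("no calibrated links supplied")
    share_under_5 = sum(1 for g in link_geh.values() if g < 5) / len(link_geh)
    undocumented = [link for link, g in link_geh.items()
                    if g >= 10 and link not in documented_links]
    checks = {
        "geh_coverage": share_under_5 >= 0.85,
        "no_undocumented_geh_10": not undocumented,
        "r_squared": r2 >= 0.92,
        "hourly_rmse": all(v <= 12.0 for v in station_rmse_pct.values()),
    }
    return all(checks.values()), checks


# Hypothetical run: 18 of 20 links under GEH 5, one in the investigate
# band, one at GEH 11 with a documented closure on the count day.
link_geh = {f"L{i}": 3.0 for i in range(18)}
link_geh.update({"L18": 7.0, "L19": 11.0})
ok, checks = meets_acceptance(link_geh, {"L19"}, 0.94, {"S1": 9.5})
print(ok, checks)
```

Returning the individual checks alongside the overall result lets a report show which threshold failed, not just that one did.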

Current Cork baseline

The latest Cork calibration is reviewed against these thresholds before release. Headline numbers — GEH coverage on the calibrated link set, R² against the strategic count stations, RMSE on the hourly profile — ship in the calibration bundle attached to every published run (see section 6). The current baseline summary, suitable for a council technical reviewer, is available on request.

5 · The calibration loop

Calibration is iterative. Each iteration touches a small number of parameters in a documented order, so the model's behaviour is the result of a recorded sequence of adjustments, not a fitted black box.

1. Network audit: resolve any OSM ambiguities flagged by the ingester (orphan ways, conflicting tags) before tuning anything else.
2. Demand sanity: verify POWSCAR totals at the small-area level match the synthetic population's generated trips per area within 5%.
3. Free-flow times: check that off-peak runs reproduce observed journey times within 8% on the strategic corridors. If they don't, the issue is the network model, not the demand or signals; fix here first.
4. Peak loading: run the AM peak; compute GEH on the calibrated link set; identify the worst 5%.
5. Targeted adjustment: on those worst-5% links, adjust either the assignment weights (if the demand is mis-routed) or the discharge parameters (if the link's capacity is mis-set), one cause at a time.
6. Re-run and re-score; iterate from step 4 until the acceptance thresholds are met.
7. Hold out and verify: run an uncalibrated peak (PM) or a non-calibrated link set as a hold-out; report its scores alongside the calibrated ones, so over-fitting to the calibrated set is visible.
8. Lock and release: the calibrated parameters are checkpointed into the scenario base. Future edits are tested against this base.
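
The peak-loading, adjustment and re-scoring steps form the inner loop. The skeleton below is a sketch of that loop only, not the Urbanetric code: `run_peak`, `accepted` and `adjust` are hypothetical callables standing in for the model run, the acceptance test and the one-cause-at-a-time adjustment, and the iteration cap is an assumption.

```python
import math


def worst_links(link_geh, fraction=0.05):
    """Ids of the highest-GEH links (at least one), worst first."""
    count = max(1, math.ceil(len(link_geh) * fraction))
    ranked = sorted(link_geh, key=link_geh.get, reverse=True)
    return ranked[:count]


def calibrate(params, run_peak, accepted, adjust, max_iterations=20):
    """Iterate peak run -> score -> targeted adjustment until accepted.

    run_peak(params)        -> dict of link id -> GEH
    accepted(link_geh)      -> True when the acceptance thresholds are met
    adjust(params, targets) -> new params after adjusting the target links
    Returns (final_params, history), where history records the parameters
    used at each iteration so the sequence of adjustments is auditable.
    """
    history = []
    for iteration in range(1, max_iterations + 1):
        history.append((iteration, dict(params)))
        link_geh = run_peak(params)
        if accepted(link_geh):
            return params, history
        params = adjust(params, worst_links(link_geh))
    raise RuntimeError("acceptance thresholds not met within max_iterations")


# Toy stand-ins: each adjustment step takes 4 off the GEH of link "a".
final, history = calibrate(
    {"step": 0},
    run_peak=lambda p: {"a": 12 - 4 * p["step"], "b": 2},
    accepted=lambda g: all(v < 5 for v in g.values()),
    adjust=lambda p, targets: {"step": p["step"] + 1},
)
print(final, len(history))  # {'step': 2} 3
```

Keeping the history of parameter sets is what makes the result a recorded sequence of adjustments rather than a fitted black box.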

6 · Reproducibility

Every calibrated run is recorded with its inputs, parameters and seed. A reviewer running the same scenario against the same base gets the same numbers we did, to the byte. That matters because a forecast that can't be reproduced isn't evidence — it's assertion.
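
One way to make "same inputs, parameters and seed" checkable is to fingerprint the run record. This is a sketch under stated assumptions: the manifest layout, the key names and the function name are hypothetical, and it is not a description of how Urbanetric stores runs.

```python
import hashlib
import json


def run_fingerprint(input_digests, parameters, seed):
    """SHA-256 over a canonical JSON form of the run record.

    input_digests: dict of input name -> content hash of that input file
    parameters:    dict of calibrated parameter name -> value
    seed:          integer random seed used for the run
    Sorting keys and fixing separators makes the serialisation, and so
    the fingerprint, independent of dictionary ordering.
    """
    manifest = {"inputs": input_digests, "parameters": parameters, "seed": seed}
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical record; the digests and parameter names are placeholders.
a = run_fingerprint({"network": "abc123", "counts": "def456"},
                    {"discharge_scale": 1.05}, seed=42)
b = run_fingerprint({"counts": "def456", "network": "abc123"},
                    {"discharge_scale": 1.05}, seed=42)
c = run_fingerprint({"network": "abc123", "counts": "def456"},
                    {"discharge_scale": 1.05}, seed=43)
print(a == b, a == c)  # True False
```

Two reviewers who compute the same fingerprint know they are comparing the same run before they compare any outputs.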

The validation report attached to every run shows the GEH distribution, the R² scatter and the RMSE profile for the calibrated link set. The full per-link table is exportable as CSV so the underlying numbers can be checked outside the tool. A reviewer wanting the current Cork baseline bundle — the per-link GEH table, the scatter plot, the hourly RMSE profiles — can request it as a single methodology pack rather than reading it off a live run.
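
Checking the export outside the tool can be as simple as recomputing GEH from the two flow columns. The sketch below assumes the CSV has columns named `link_id`, `modelled` and `observed`; those names and the sample rows are hypothetical, so adjust them to the actual export.

```python
import csv
import io
import math


def geh(modelled, observed):
    """GEH statistic for one link; flows in the same (hourly) units."""
    total = modelled + observed
    if total == 0:
        return 0.0  # assumption: no flow on either side counts as a match
    return math.sqrt(2.0 * (modelled - observed) ** 2 / total)


def recompute_geh(csv_text):
    """Return (per-link GEH dict, share of links with GEH < 5)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("export contains no links")
    per_link = {r["link_id"]: geh(float(r["modelled"]), float(r["observed"]))
                for r in rows}
    share = sum(1 for g in per_link.values() if g < 5) / len(per_link)
    return per_link, share


# Hypothetical export, for illustration only.
sample = "\n".join([
    "link_id,modelled,observed",
    "LINK-1,4800,5000",
    "LINK-2,950,1000",
    "LINK-3,50,200",
])
per_link, share = recompute_geh(sample)
for link, g in per_link.items():
    print(f"{link}: GEH {g:.2f}")
print(f"share under 5: {share:.0%}")
```

If the recomputed values differ from the report, the discrepancy is in the tool or the export, not in the reviewer's arithmetic.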

7 · Open inputs, open audit

The model is calibrated on a network anyone can inspect (OpenStreetMap) using count data anyone can request (TII, council). The calibrated parameter set is not a secret ingredient: the network is public, the count data is open or available on request, and the GEH/R²/RMSE acceptance bar is the same one your reviewer will hold their own model to. That openness is deliberate: it makes the difference between “trust us” and “check us.”

8 · What calibration does not promise

A calibrated baseline is a strong claim about a representative weekday at the reference period. It is not:

A claim about non-typical days (matches, festivals, marathons, severe weather).
A claim about every link in the network: only the calibrated set is held to the GEH acceptance thresholds in section 4. Uncalibrated minor links are modelled but not certified.
A claim about future-year demand without a documented forecast assumption (population growth, mode-share targets etc.) bolted on.
A substitute for a junction-design package's saturation-flow analysis on a critical movement.

A calibrated model lets you compare a scheme against the do-nothing baseline with confidence about where the differences land. It does not promise to be right about everything. That distinction is the whole basis of evidence.