Fallback Routing for JS-Disabled Crawlers

An automated accessibility scan is only as complete as the DOM it can reach, and a client-rendered application hands a JavaScript-disabled crawler almost nothing: an empty <div id="root">, a linked bundle that never executes, and no route tree to traverse. The crawler dutifully reports HTTP 200, records zero violations, and moves on — a clean pass over a page that, to any real assistive technology running without script, is a blank shell. That silent false negative is the specific obstacle this page solves. Fallback routing establishes a deterministic, server-rendered navigation layer that any crawler can walk without executing a line of client code, so page discovery and structural analysis stay intact when scripting is stripped to bypass anti-bot protection, cut cloud compute, or satisfy a strict Content Security Policy. This is the implementation reference for that layer within the broader Enterprise WCAG Audit Architecture & Standards Mapping strategy.

The audience is accessibility specialists, frontend QA teams, and Python automation engineers who already run scans and now need coverage that does not collapse the moment JavaScript is unavailable. Where the dynamic content boundary detection layer fixes when a scan fires against a script-enabled page, fallback routing fixes what a crawler can reach at all when script never runs. The two are complementary halves of the same reachability problem. For the narrow, reproduction-first version of this failure — isolating a no-JS crawler and diagnosing the empty accessibility tree it produces — see designing fallback routes for JavaScript-disabled audit crawlers; this page is the architecture around it.

An incoming audit request is served through one of two paths: normal hydration, or a pre-rendered static fallback selected at the edge.

Prerequisites & Environment Context

Fallback routing spans a build step, an edge routing rule, and a validation harness, so the parity requirements cut across all three. A fallback set generated against one build and served through a mismatched proxy config is worse than none — it reports a clean structure that the live edge never actually serves.

Python 3.11+ for the generation and validation harness. The examples here use only the standard library (json, re, urllib.request, html.parser) so the generator runs on a minimal CI image with no third-party dependency to pin. Where you drive a real browser for the script-enabled path instead, the same version discipline described in the Playwright headless scanning workflows applies: commit the lockfile so browser and driver versions are identical everywhere.
Access to the framework’s build output. Route manifests live in build artifacts, not source: .next/routes-manifest.json for Next.js, the generated app.routes output for Angular, the .nuxt route table for Nuxt. The generator must run after npm run build and before deploy, in the same job that produced the artifact, so the fallback set and the real route table cannot drift.
Edge or proxy control. You need to own the rewrite layer — an Nginx map, a Cloudflare Worker, or a framework middleware — because the fallback is selected by User-Agent or an explicit audit_mode flag at request time. A property behind a CDN you cannot configure needs the fallback served from origin with a cache-bypass rule for the audit UA.
A stable audit User-Agent contract. Fix one UA string (or one query flag) and treat it as an interface between the crawler and the edge. If the crawler UA and the edge match rule disagree by a single token, every request falls through to the empty SPA shell and the whole layer silently no-ops.
A downstream data contract. The per-route coverage log — which routes were discovered, which returned a fallback, how many landmarks each exposed — is data, not console output. Retain it under your audit data storage and retention policies so a disputed coverage number can be traced to the exact fallback that produced it.
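
As a rough illustration of the data contract in the last item, the sketch below models one per-route coverage record and serializes it as a single JSON line. The class name RouteCoverageRecord and the build_id and checked_at fields are hypothetical additions for traceability, not part of any existing schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RouteCoverageRecord:
    """One row of the coverage log (hypothetical shape, adjust to your schema)."""

    route: str
    status: int
    served_fallback: bool
    landmark_count: int
    discovered_links: int
    build_id: str    # assumed: identifies the build whose manifest was used
    checked_at: str  # assumed: ISO 8601 timestamp supplied by the caller


def to_json_line(record: RouteCoverageRecord) -> str:
    """Serialize a record as one JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

One record per line keeps the log appendable and easy to diff between runs.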

How Fallback Routing Works

The mechanism has three moving parts, and getting their ordering right is what separates real coverage from a fallback set that looks complete but never gets served.

Route extraction is the source of truth. The manifest the framework emits at build time is the authoritative list of navigable paths; deriving the fallback set from anything else — a hand-maintained sitemap, a crawl of the live SPA — reintroduces exactly the drift the layer exists to remove. Dynamic segments (/products/[id]) collapse to a single representative path so the set stays finite.

Static generation turns each extracted route into a minimal, semantically structured HTML document: a skip link, ARIA landmarks, and the full set of in-app navigation anchors, so a crawler can walk the entire route graph without hydration. These documents are intentionally thin — they carry structure and links, not the page’s full content — because their job is reachability and landmark presence, not to mirror the rendered application.

Request-time selection decides, per request, whether the origin serves the hydrating SPA or the static fallback. The decision keys off the audit UA or an audit_mode flag, never off feature detection, because the whole point is to serve deterministic HTML to a client that will not run the detection script in the first place.

The ordering is strict: extract from the manifest so the set is authoritative, generate structure a no-JS client can parse, then select at the edge so the audit client actually receives it. Skip any one and the layer fails quietly rather than loudly — an unserved fallback and a missing fallback look identical in a coverage report unless you measure both.
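
The request-time selection rule can be modeled in a few lines of Python, which is useful for unit-testing the contract before it is encoded in an edge config. This is a sketch only: the UA pattern and the audit_mode flag are taken from this page, while the function name select_variant and its return values are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed contract values; in practice both should come from the same
# configuration the edge rule is generated from.
AUDIT_UA_RE = re.compile(r"AccessibilityAuditBot|WCAGCrawler", re.IGNORECASE)
AUDIT_FLAG = "audit_mode"


def select_variant(url: str, user_agent: str) -> str:
    """Return "fallback" for audit requests and "spa" for everything else."""
    query = parse_qs(urlsplit(url).query)
    # The explicit flag wins regardless of User-Agent.
    if "true" in query.get(AUDIT_FLAG, []):
        return "fallback"
    # Otherwise the decision keys off the audit UA, never feature detection.
    if AUDIT_UA_RE.search(user_agent or ""):
        return "fallback"
    return "spa"
```

Python's re and the regex engine at the edge are not identical, so treat this as a model of the rule rather than proof that the deployed rule matches.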

Step-by-Step Implementation

The workflow moves from manifest extraction, through static generation and edge selection, into standard-library validation. Each step is a focused, copy-pasteable module.

1. Extract Route Definitions from Framework Configuration

Parse the build-time manifest and normalize paths before any hydration occurs. Collapsing dynamic segments keeps the set finite; normalizing trailing slashes keeps it deduplicated.

import json
import re
from pathlib import Path

# Matches [id], [...slug], and [[...slug]] segments, including the leading slash.
DYNAMIC_SEGMENT = re.compile(r"/\[+[^\]/]*\]+")


def extract_routes_from_manifest(build_dir: str) -> list[str]:
    """Parse framework route manifests and normalize paths for crawler consumption."""
    manifest_path = Path(build_dir) / ".next" / "routes-manifest.json"

    try:
        with open(manifest_path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError as exc:
        raise RuntimeError(
            "Framework route manifest not found. Verify build output directory."
        ) from exc

    routes = set()
    # Next.js routes-manifest.json exposes staticRoutes and
    # dynamicRoutes as arrays of objects, each carrying a "page" key.
    for entry in data.get("staticRoutes", []) + data.get("dynamicRoutes", []):
        page = entry.get("page")
        if not page:
            continue
        # Strip dynamic segments and normalize trailing slashes so
        # /products/[id] collapses to one representative /products entry.
        clean_path = DYNAMIC_SEGMENT.sub("", page).rstrip("/") or "/"
        routes.add(clean_path)

    if not routes:
        # An empty set usually means the manifest shape changed; fail loudly.
        raise RuntimeError("Route manifest contained no routes. Verify manifest format.")

    return sorted(routes)

2. Generate Static Fallback HTML

For each extracted route, emit a minimal, semantically structured document. The skip link, nav landmark, and main landmark are the structural anchors a crawler checks for; the injected anchor list is what lets it traverse to every other route without script.

from html import escape


def render_fallback_document(route_path: str, all_routes: list[str]) -> str:
    """Build a thin, landmark-complete HTML shell a no-JS crawler can traverse."""
    anchors = "\n".join(
        f'      <li><a href="{escape(r)}">{escape(r)}</a></li>' for r in all_routes
    )
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Fallback: {escape(route_path)}</title>
  <!-- Fallbacks are audit-only surfaces; keep them out of search indexes. -->
  <meta name="robots" content="noindex, nofollow">
</head>
<body>
  <a href="#main-content" class="skip-link">Skip to main content</a>
  <nav aria-label="Fallback navigation">
    <ul>
{anchors}
    </ul>
  </nav>
  <main id="main-content">
    <h1>{escape(route_path)}</h1>
    <p>Server-rendered route surface for accessibility audit traversal.</p>
  </main>
</body>
</html>"""

Write each document to a path that mirrors the route (/products → public/fallback/products.html) so the edge rewrite in the next step can map a request URL to its fallback with a single deterministic rule.
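
A minimal sketch of that write step follows. The helper names fallback_file_for and write_fallback are hypothetical, and storing the root route as index.html is an assumption (the root has no segment to name a file after); whatever name you choose, the edge rule has to map / to the same file.

```python
from pathlib import Path


def fallback_file_for(route_path: str, out_dir: str = "public/fallback") -> Path:
    """Map a normalized route to the file its fallback is written to."""
    # Assumption: the root route is stored as index.html.
    relative = route_path.strip("/") or "index"
    return Path(out_dir) / f"{relative}.html"


def write_fallback(route_path: str, document: str, out_dir: str = "public/fallback") -> Path:
    """Write one fallback document, creating nested directories as needed."""
    target = fallback_file_for(route_path, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(document, encoding="utf-8")
    return target
```

Nested routes such as /account/settings land in matching subdirectories, so the URL-to-file mapping stays a single string substitution.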

3. Configure Request-Time Selection at the Edge

Serve the static HTML only when the User-Agent matches the audit crawler contract or when ?audit_mode=true is present. An Nginx map block keeps the matching logic out of the location context, where bare if is famously fragile; the only if that remains wraps a rewrite, which is one of the uses nginx handles safely.

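# Prefer a `map` block over bare `if` in location contexts ("if is evil" in nginx).
# The only `if` left below wraps rewrites, which is one of its safe uses.
map $http_user_agent $is_audit_ua {
    default 0;
    "~*AccessibilityAuditBot|WCAGCrawler" 1;
}

# ?audit_mode=true forces the fallback regardless of User-Agent.
map $arg_audit_mode $is_audit_request {
    default $is_audit_ua;
    "true"  1;
}

server {
    location / {
        # Audit requests get the pre-rendered fallback; everyone else hydrates.
        if ($is_audit_request) {
            # Assumption: the root route's fallback is stored as index.html.
            rewrite ^/$ /fallback/index.html last;
            rewrite ^/(.+?)/?$ /fallback/$1.html last;
        }
        # Assumption: the SPA shell is served from /index.html.
        try_files $uri $uri/ /index.html;
    }

    # Only fallback documents carry the marker header, so the SPA shell
    # can never be mistaken for one.
    location ^~ /fallback/ {
        add_header X-Audit-Fallback "true";
        add_header X-Robots-Tag "noindex, nofollow";
        expires 1h;
    }
}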

The X-Audit-Fallback header is deliberate: it is how the validator in the next step proves the request was actually served the fallback and did not silently fall through to the SPA shell.

4. Validate Route Resolution with the Standard Library

Confirm each fallback resolves and carries its structural anchors. The urllib.request module plus html.parser gives a dependency-free validator that runs on a bare CI image — no browser, no third-party package.

import urllib.error
import urllib.request
from html.parser import HTMLParser


class AuditLinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
        self.landmarks = 0
        self.has_skip_link = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and href.startswith("#"):
            # In-page anchors are not routes; the skip link is one of them.
            self.has_skip_link = self.has_skip_link or "skip-link" in classes
        elif tag == "a" and href:
            # Anchors without an href are skipped so the link count stays honest.
            self.links.append(href)
        # main/nav are implicit landmarks; count explicit ARIA roles too.
        if tag in ("main", "nav") or attr_map.get("role") in ("main", "navigation"):
            self.landmarks += 1


def validate_fallback_route(url: str) -> dict:
    req = urllib.request.Request(url, headers={"User-Agent": "AccessibilityAuditBot/1.0"})
    try:
        with urllib.request.urlopen(req, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
            status = response.status
            served_fallback = response.getheader("X-Audit-Fallback") == "true"
    except urllib.error.HTTPError as exc:
        # urlopen raises on 4xx/5xx; record the failure instead of aborting the run.
        html, status, served_fallback = "", exc.code, False

    parser = AuditLinkExtractor()
    parser.feed(html)

    return {
        "status": status,
        "served_fallback": served_fallback,   # proves the edge rule fired
        "discovered_links": len(parser.links),
        "has_skip_link": parser.has_skip_link,
        "landmark_count": parser.landmarks,
    }

The served_fallback assertion is the load-bearing check. A 200 with discovered_links > 0 can still be the wrong page; only the header confirms the audit path was actually taken. Feed the resulting per-route records into the coverage gate below, and hand the extracted violations to your standards layer for severity attribution against the A/AA/AAA compliance level mapping.

5. Gate the Pipeline on Coverage

Embed generation and validation in CI so a route added to the app without a corresponding fallback fails the build instead of shipping a coverage hole. The workflow below builds, generates, and validates in a matrix.

name: WCAG Fallback Route Validation
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  generate-fallbacks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies and build
        run: npm ci && npm run build
      - name: Extract and generate fallback routes
        run: python scripts/generate_fallback_routes.py
      # Artifacts, not the cache, are how build outputs reach a later job
      # in the same run.
      - name: Upload fallback artifacts
        uses: actions/upload-artifact@v4
        with:
          name: fallback-html
          path: ./public/fallback
          if-no-files-found: error

  audit-validation:
    needs: generate-fallbacks
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        route: ["/", "/products", "/checkout", "/account"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Download fallback artifacts
        uses: actions/download-artifact@v4
        with:
          name: fallback-html
          path: ./public/fallback
      - name: Validate fallback route
        run: python scripts/audit_fallback.py --route "${{ matrix.route }}" --min-coverage 0.95
      - name: Upload coverage report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: fallback-coverage-${{ strategy.job-index }}
          path: ./reports/*.json

Enforce a hard failure threshold: if fallback discovery drops below 95% of the extracted route set, or a required landmark is missing, block the deploy rather than warn. Route the raw discovery numbers through your error categorization and triage pipelines so a coverage regression is attributed to either a missing fallback (generation gap) or an unserved one (edge-rule gap) — the two demand different fixes.
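
The gate logic described above might look like the following sketch. The function evaluate_coverage and its result keys are hypothetical; the record fields are the ones returned by validate_fallback_route in step 4, and generated is assumed to be the set of routes for which a fallback file exists on disk.

```python
def evaluate_coverage(
    extracted: list[str],
    generated: set[str],
    records: dict[str, dict],
    min_coverage: float = 0.95,
    min_landmarks: int = 2,
) -> dict:
    """Classify every extracted route and decide whether the gate passes."""
    generation_gaps, edge_gaps, structure_gaps, covered = [], [], [], []
    for route in extracted:
        record = records.get(route)
        if route not in generated:
            generation_gaps.append(route)   # no fallback file was produced
        elif not record or not record.get("served_fallback"):
            edge_gaps.append(route)         # file exists but was never served
        elif (not record.get("has_skip_link")
              or record.get("landmark_count", 0) < min_landmarks):
            structure_gaps.append(route)    # served, but structurally incomplete
        else:
            covered.append(route)

    ratio = len(covered) / len(extracted) if extracted else 0.0
    return {
        "coverage": ratio,
        "passed": ratio >= min_coverage and not structure_gaps,
        "generation_gaps": generation_gaps,
        "edge_gaps": edge_gaps,
        "structure_gaps": structure_gaps,
    }
```

Keeping the three gap lists separate is what lets triage route a regression to the generator, the edge config, or the fallback template.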

Configuration Reference

Every value below should be environment-configurable so the same generator and validator run identically across local, CI, and production-mirror. The defaults are conservative starting points, not tuned production values.

Parameter	Type	Default	Description
`build_dir`	str	`./`	Root the manifest path is resolved against. Must point at the artifact directory produced by the same job’s build step, never a stale checkout.
`manifest_relpath`	str	`.next/routes-manifest.json`	Framework-specific manifest location. Swap for the Angular or Nuxt equivalent; a wrong path raises rather than silently emitting an empty set.
`audit_user_agent`	str	`AccessibilityAuditBot/1.0`	The single UA string shared between crawler and edge `map` rule. Treat as an interface — a one-token mismatch routes every request to the SPA shell.
`audit_mode_flag`	str	`audit_mode`	Query parameter that forces the fallback independent of UA, for manual verification and for crawlers that spoof a browser UA.
`min_coverage`	float	`0.95`	Minimum ratio of served-and-valid fallbacks to extracted routes before the CI gate fails. Lower masks real coverage holes.
`require_skip_link`	bool	`True`	Whether a missing `.skip-link` fails validation. Keep `True`; the skip link is the cheapest keyboard-operability signal to enforce.
`min_landmarks`	int	`2`	Minimum landmark count (`nav` + `main`) per fallback before the route is counted as covered.
`request_timeout_s`	int	`10`	Per-route HTTP timeout in the validator. Bounds a hung edge so one bad route reports failure instead of stalling the matrix job.

Because these are values an engineer copies directly, keep the table in a container that scrolls horizontally on narrow screens rather than wrapping cells.
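
One way to make the parameters above environment-configurable is a frozen dataclass with environment overrides, sketched here. The FallbackConfig and load_config names and the FALLBACK_ variable prefix are assumptions for illustration; the field names and defaults come from the table.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FallbackConfig:
    build_dir: str = "./"
    manifest_relpath: str = ".next/routes-manifest.json"
    audit_user_agent: str = "AccessibilityAuditBot/1.0"
    audit_mode_flag: str = "audit_mode"
    min_coverage: float = 0.95
    require_skip_link: bool = True
    min_landmarks: int = 2
    request_timeout_s: int = 10


def load_config(env=None, prefix: str = "FALLBACK_") -> FallbackConfig:
    """Build a config from defaults, overridden by PREFIX + FIELD_NAME variables."""
    env = os.environ if env is None else env
    overrides = {}
    for field in fields(FallbackConfig):
        raw = env.get(prefix + field.name.upper())
        if raw is None:
            continue
        if isinstance(field.default, bool):
            # bool must be checked first: bool("false") would be True.
            overrides[field.name] = raw.strip().lower() in ("1", "true", "yes")
        else:
            # Coerce to the type of the default (str, int, or float).
            overrides[field.name] = type(field.default)(raw)
    return FallbackConfig(**overrides)
```

Passing a plain dict as env keeps the loader testable without touching the real process environment.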

Verification & Testing

The fallback layer gates other people’s deployments, so it needs its own tests before it can be trusted to do that.

Served-not-shell guard. Point the validator at a route with the audit UA and assert served_fallback is True, not merely status == 200. A 200 that returns the empty SPA shell is the exact failure this layer exists to prevent, so a green build must distinguish the two. This is the highest-value regression test in the suite.
Manifest-to-fallback parity. Diff the set from extract_routes_from_manifest() against the files in public/fallback/. Any route in the manifest without a fallback file is a generation gap; assert the symmetric difference is empty so a newly added app route cannot ship uncovered.
Landmark and skip-link assertion. For each generated document, assert has_skip_link and landmark_count >= min_landmarks. Generate a deliberately broken fixture (a fallback with the <nav> removed) and confirm validation fails, proving the check has teeth rather than always passing.
Local vs CI parity. Run the same validation locally and in CI and diff both the discovered-link counts and the served_fallback flags. A route that serves the fallback locally but the shell in CI points at an edge-config difference — usually a UA header dropped by a CI proxy — before it manifests as an intermittent coverage failure.
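
The manifest-to-fallback parity check above can be sketched as a set difference. The helper names are hypothetical, and the mapping assumes fallback files mirror route paths with the root stored as index.html; adjust route_for_file if your generator names files differently.

```python
from pathlib import Path, PurePosixPath


def route_for_file(relative_html: str) -> str:
    """Invert the route-to-file mapping (assumes index.html is the root)."""
    stem = PurePosixPath(relative_html).with_suffix("").as_posix()
    return "/" if stem == "index" else f"/{stem}"


def list_fallback_files(fallback_dir: str = "public/fallback") -> list[str]:
    """Return fallback files as POSIX paths relative to fallback_dir."""
    root = Path(fallback_dir)
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.html"))


def parity_gaps(manifest_routes: list[str], fallback_files: list[str]) -> dict:
    """Report routes without a fallback and fallbacks without a route."""
    expected = set(manifest_routes)
    on_disk = {route_for_file(name) for name in fallback_files}
    return {
        "missing_fallback": sorted(expected - on_disk),
        "orphaned_fallback": sorted(on_disk - expected),
    }
```

A test would assert that both lists are empty; a non-empty missing_fallback is the generation gap, and orphaned_fallback usually points at a route that was removed from the app.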

Once coverage is reliable, the structure these fallbacks expose feeds the standards layer: map the landmark, heading-order, and link-name findings against the WCAG 2.2 vs 3.0 success criteria taxonomy. The WHATWG HTML sections and landmark specification defines the semantics the crawler asserts against, which is precisely what a thin, landmark-complete fallback lets you validate without hydration.

Failure Modes & Troubleshooting

Silent fall-through to the SPA shell. Symptom: coverage looks fine, status is 200 everywhere, yet reported violations are implausibly clean. Root cause: the crawler UA and the edge map regex disagree by a token, so no request ever hits the fallback branch — the crawler scans the empty root <div>. Fix: assert served_fallback on every route, never status alone, and treat the audit UA as a fixed contract tested on both sides. Add a canary route with a known injected defect so a clean pass on it fails the build.

Manifest drift after a build change. Symptom: a route that clearly exists in the app is missing from every report. Root cause: the generator ran against a stale artifact, or the framework changed its manifest shape between major versions and the parser silently returned an empty list. Fix: run generation in the same job that produced the build, and make extract_routes_from_manifest() raise on a missing or empty manifest rather than returning []. Gate on manifest-to-fallback parity so drift fails loudly.

Exploded fallback set from uncollapsed dynamic routes. Symptom: generation emits thousands of near-identical files and the cache step balloons. Root cause: dynamic segments were not collapsed, so /products/[id] expanded per-id from a data source. Fix: strip dynamic segments to one representative path (as in step 1); audit the template’s accessibility once, not every instance of it. Instance-level content checks belong in the script-enabled path, coordinated through the batch validation architecture, not the fallback set.

False positives from the thin fallback itself. Symptom: axe-core reports region or landmark-unique violations on the fallback that do not exist on the real rendered page. Root cause: the fallback is deliberately minimal and can trip rules that assume a fully populated page, or the audit engine is run with a ruleset tuned for the hydrated app. Fix: scope the fallback scan to structural rules (landmarks, skip link, link names, lang) and run content-dependent rules only against the hydrated path. Align the ruleset with your axe-core enterprise configuration so the fallback and hydrated passes use intentionally different rule sets rather than fighting each other.

Fallbacks leaking into search or exposing internal paths. Symptom: audit-only URLs appear in the sitemap or a staticRoutes entry exposes an admin path. Root cause: noindex was omitted, or the manifest included internal routes that should never be publicly reachable. Fix: emit noindex, nofollow and an X-Robots-Tag header on every fallback, filter internal-only prefixes out of the extracted set before generation, and reconcile the exposed surface against your security and privacy framework integration controls.
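
Filtering internal-only prefixes before generation could be done with a helper like the one below. The function name and the example prefixes /admin and /internal are hypothetical; substitute the prefixes your own security review identifies.

```python
def filter_public_routes(
    routes: list[str],
    blocked_prefixes: tuple[str, ...] = ("/admin", "/internal"),
) -> list[str]:
    """Drop routes at or beneath any blocked prefix, matching whole segments."""

    def is_blocked(route: str) -> bool:
        for prefix in blocked_prefixes:
            base = prefix.rstrip("/")
            # Match the prefix itself or anything nested under it, but not
            # siblings that merely share leading characters.
            if route == base or route.startswith(base + "/"):
                return True
        return False

    return [route for route in routes if not is_blocked(route)]
```

Matching on whole segments means /administrators survives a block on /admin, which a plain startswith check would wrongly remove.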

Frequently Asked Questions

How do I stop axe-core from flagging region and landmark rules on the thin fallback?

Run the fallback scan with a structural ruleset only — landmarks, skip link, link names, html-has-lang, heading order — and disable content-dependent rules that assume a fully populated page. The fallback exists to prove reachability and landmark presence, so scanning it with the ruleset tuned for the hydrated app produces noise, not signal. Keep the two rulesets deliberately separate and align both with your central axe-core enterprise configuration.

My CI job passes locally but reports zero coverage on the runner. What changed?

Almost always the audit User-Agent header is being stripped or rewritten by a proxy between the CI runner and the origin, so every request falls through to the SPA shell instead of the fallback branch. Assert served_fallback (the X-Audit-Fallback header) rather than status, and add the ?audit_mode=true query flag as a UA-independent fallback so a dropped header still selects the static path.

Should the CI gate ever pass when a fallback returns the SPA shell?

No. A shell served with a 200 is the precise false negative fallback routing removes; treating it as a pass defeats the layer. Gate on the served-not-shell guard so a shell response fails the build, and keep a canary route with a known injected defect that must be caught — if the canary passes clean, the crawler is scanning the wrong document.

Do I need a fallback for every dynamic route instance?

No — collapse dynamic segments to one representative path and audit the template once. Generating a fallback per /products/[id] instance explodes the set and re-audits identical structure thousands of times. Instance-level content checks belong in the script-enabled batch path, not the fallback set, which is concerned with route reachability and landmark structure.

Where should the fallback coverage log live, and for how long?

Treat the per-route coverage record as audit data, not build console output: persist it as structured JSON in immutable, access-controlled storage with lifecycle rules, under the same retention windows as the rest of your audit trail. That lets a disputed coverage number be traced to the exact fallback and edge decision that produced it, which console logs discarded at job end cannot.
