Batch Validation Architecture

Running an accessibility engine against one URL at a time is a solved problem; running it against forty thousand routes on every merge — without flooding CI runners, leaking browser sessions, or emitting reports that no two workers shape the same way — is not. The specific obstacle this page solves is throughput at deterministic cost: how to fan a large route set out across a pool of headless workers, keep the queue from starving or stampeding, and reassemble heterogeneous engine output into a single schema-valid dataset before anything downstream touches it. Get the boundaries wrong and you inherit the classic failure signature of naive parallel scanning — nondeterministic pass rates, out-of-memory kills on the crawl host, and violation counts that drift run to run because two shards evaluated the same component in different DOM states.

This guide is part of the broader Automated Scanning & Dynamic Content Ingestion strategy, and it assumes you have already settled the single-page mechanics documented there. Where that parent guide establishes how one page is driven to a stable DOM and evaluated, this page establishes how thousands of those evaluations are coordinated — the ingestion contract, the queue, the worker lifecycle, and the aggregation gate. It decouples URL ingestion from DOM evaluation so that traversal tuning, rule configuration, and result normalization can each scale and fail independently.

Prerequisites and Environment Context

Batch validation is infrastructure code, and version drift between the crawl host and CI runners is the most common source of irreproducible results. Pin the following before implementing anything below:

Python 3.11+ for the orchestration layer. The examples use asyncio, dataclasses, and the X | None union syntax; 3.11 also provides asyncio.TaskGroup for structured worker dispatch.
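A broker that supports leases with a visibility timeout, bounded retries, and dead-letter routing: Redis 7.x (used here via redis-py 5.x) or RabbitMQ 3.12+. RabbitMQ provides per-message TTL and dead-letter exchanges natively; with a Redis sorted set, the lease deadline, retry counter, and dead-letter list have to be implemented in the worker code.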
Playwright 1.44+ with pinned browser builds. Browser binaries must be identical across the crawl host and every CI runner; a Chromium minor bump can shift contrast rounding and target-size math enough to change violation counts. Cache ~/.cache/ms-playwright as a keyed CI artifact.
axe-core 4.9+ injected at runtime (never a preinstalled extension), configured exactly as your axe-core enterprise configuration prescribes so every worker evaluates the same rule set against the same tags.
A pinned JSON Schema (jsonschema 4.x) that defines the finding contract — this is the seam shared with JSON Schema validation for accessibility data and the reason aggregation can be strict.

Environment parity is not optional: the staging or preview target under scan must render the same bundle that production will serve, or the batch reports a compliance baseline for markup no user will ever receive. Provision workers with a fixed CPU/memory quota per container so that concurrency math (below) is predictable rather than best-effort.

Conceptual Model: Ingestion, Execution, and Aggregation as Independent Tiers

The architecture is three tiers with a queue between the first two and a schema gate before the third emits. Each tier is a pure function of its input so it can be tested, replayed, and horizontally scaled in isolation.

Ingestion turns a route source (sitemap, router manifest, analytics export) into normalized, prioritized queue messages. It never opens a browser. Its job is to strip query noise, apply exclusion rules, assign a priority tier, and enqueue an idempotent task with a TTL and retry budget.

Execution is a pool of stateless workers that pull tasks, drive a browser to a stable DOM using Playwright headless scanning workflows, inject the engine, and serialize raw violations. A worker owns nothing across tasks — every task gets a fresh browser context so cookies, storage, and service-worker state cannot leak between routes.

Aggregation collects serialized findings, deduplicates identical violations across route variants, validates every record against the pinned schema, and only then emits the dataset onward to error categorization triage pipelines. Malformed records are quarantined, not silently dropped.

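One route moves through the tiers in a fixed order: discovery, normalization, enqueue, lease, traversal to a stable DOM, engine evaluation, serialization, deduplication, schema validation, and finally the compliance gate. Traversal and evaluation are the expensive middle; ingestion is cheap and aggregation is strict.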

The reason the queue sits between ingestion and execution — rather than ingestion calling workers directly — is backpressure. A bounded queue with per-worker prefetch limits means a burst of forty thousand routes cannot spawn forty thousand concurrent browsers; workers pull only as fast as they can drain, and the broker holds the rest. The reason the schema gate sits before aggregation emits is that every downstream consumer — triage, dashboards, the compliance data lake — can then assume a fixed contract and skip defensive parsing.
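The backpressure argument can be illustrated with a small in-process model. This is a sketch, not the broker: `asyncio.Queue(maxsize=...)` stands in for the bounded queue, and `run_pool` and `fake_scan` are hypothetical names, with `fake_scan` replacing a real browser scan. The model records the peak number of scans in flight so the ceiling can be checked.

```python
import asyncio

async def run_pool(routes: list[str], workers: int, queue_size: int) -> dict:
    """Model a bounded queue feeding a fixed pool; report peak in-flight scans."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    stats = {"in_flight": 0, "peak": 0, "done": 0}

    async def fake_scan(route: str) -> None:
        # Hypothetical stand-in for driving a browser and running the engine.
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0)  # yield so other workers can interleave
        stats["in_flight"] -= 1
        stats["done"] += 1

    async def worker() -> None:
        while True:
            route = await queue.get()
            try:
                await fake_scan(route)
            finally:
                queue.task_done()

    pool = [asyncio.create_task(worker()) for _ in range(workers)]
    for route in routes:
        await queue.put(route)  # suspends while the queue is full: backpressure
    await queue.join()          # wait until every queued route has been processed
    for task in pool:
        task.cancel()
    await asyncio.gather(*pool, return_exceptions=True)
    return stats

stats = asyncio.run(run_pool([f"/r/{i}" for i in range(200)], workers=4, queue_size=8))
```

However many routes are offered, the number of concurrent scans is bounded by the worker count, and the producer is paused by the full queue rather than by memory exhaustion.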

Step-by-Step Implementation

1. Normalize route ingestion and priority routing

Ingestion produces a stable, idempotent task per route. The task ID is a hash of the normalized URL so re-enqueuing the same route is a no-op the broker can dedupe, and the priority tier drives queue ordering.

import hashlib
import re
from dataclasses import dataclass, asdict

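# The lookahead also matches an excluded segment at the end of the path or
# directly before a query/fragment (e.g. /admin, /admin?tab=1), not only /admin/.
EXCLUDE = re.compile(r"/(admin|staging|__preview)(?=[/?#]|$)|/logout\b")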

@dataclass(frozen=True)
class ScanTask:
    task_id: str      # deterministic: dedupes re-enqueues of the same route
    url: str          # normalized, canonical form
    priority: int     # 0 = highest (P0); larger = lower priority

def normalize(url: str) -> str:
    # Drop query/fragment noise and collapse trailing slashes so that
    # /page, /page/, and /page?utm=... map to one canonical task.
    url = url.split("#", 1)[0].split("?", 1)[0]
    return url.rstrip("/") or "/"

def to_task(url: str, violation_density: float, monthly_views: int) -> ScanTask | None:
    if EXCLUDE.search(url):
        return None  # never scan admin/staging/auth-teardown routes
    canonical = normalize(url)
    task_id = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    # Priority blends historical risk with reach; both are known before any scan.
    score = violation_density * 0.6 + min(monthly_views / 10_000, 1.0) * 0.4
    priority = 0 if score > 0.7 else 1 if score > 0.3 else 2
    return ScanTask(task_id, canonical, priority)
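To make the tiering arithmetic concrete, the sketch below isolates the scoring lines from to_task in a hypothetical helper, `priority_tier`, and runs it on invented inputs. The route labels and numbers are illustrative only.

```python
def priority_tier(violation_density: float, monthly_views: int) -> int:
    # Same blend as to_task: 60% historical risk, 40% reach (capped at 10k views).
    score = violation_density * 0.6 + min(monthly_views / 10_000, 1.0) * 0.4
    return 0 if score > 0.7 else 1 if score > 0.3 else 2

# Invented inputs, chosen to land clearly inside each tier.
examples = {
    "checkout": priority_tier(0.9, 50_000),      # 0.54 + 0.40 = 0.94
    "blog-post": priority_tier(0.5, 5_000),      # 0.30 + 0.20 = 0.50
    "legal-archive": priority_tier(0.2, 2_000),  # 0.12 + 0.08 = 0.20
}
```

One consequence of these weights: neither term can reach the P0 threshold alone (risk contributes at most 0.6, reach at most 0.4), so a route needs both a violation history and meaningful traffic to be scanned first.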

2. Enqueue with TTL, retry budget, and a dead-letter path

Push normalized tasks to the broker with an explicit visibility timeout and a bounded retry count. A task that exceeds its retries lands in a dead-letter queue for inspection instead of blocking the pipeline.

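import json
from dataclasses import asdict

import redis  # redis-py 5.x

r = redis.Redis(host="broker", decode_responses=True)

def enqueue(task: ScanTask, ttl_seconds: int = 900, max_retries: int = 3) -> None:
    # ScanTask comes from step 1. The visibility timeout travels with the task
    # so the worker can compute a lease deadline when it picks the task up.
    payload = json.dumps(
        {**asdict(task), "attempts": 0, "max_retries": max_retries,
         "ttl_seconds": ttl_seconds},
        sort_keys=True,  # stable serialization: same task -> same member
    )
    # ZADD by priority gives the worker a cheap "pull lowest score first".
    # An identical re-enqueue produces the same member, so it updates the
    # score instead of appending a second entry.
    r.zadd("scan:pending", {payload: task.priority})
    # Do not EXPIRE "scan:pending": the TTL would apply to the whole sorted
    # set and drop every queued task at once, not just this one.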
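The lease, visibility-timeout, and dead-letter behaviour described in this step can be sketched without a broker. The class below is an in-memory model under stated assumptions, not Redis or RabbitMQ code: `LeaseQueue` and its methods are hypothetical names, and time is passed in explicitly so the behaviour is deterministic.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LeaseQueue:
    """In-memory model of leasing, visibility timeouts, and dead-lettering."""
    visibility_timeout: float
    max_retries: int = 3
    pending: list = field(default_factory=list)   # heap of (priority, seq, task_id)
    leased: dict = field(default_factory=dict)    # task_id -> (deadline, priority)
    attempts: dict = field(default_factory=dict)  # task_id -> leases taken so far
    dead: list = field(default_factory=list)      # task_ids that exhausted retries
    _seq: int = 0

    def _push(self, task_id: str, priority: int) -> None:
        self._seq += 1  # tie-breaker keeps FIFO order within one priority
        heapq.heappush(self.pending, (priority, self._seq, task_id))

    def enqueue(self, task_id: str, priority: int) -> bool:
        if task_id in self.attempts:  # idempotent: a known task is ignored
            return False
        self.attempts[task_id] = 0
        self._push(task_id, priority)
        return True

    def reap(self, now: float) -> None:
        # Expired leases return to pending, or dead-letter once retries run out.
        for task_id, (deadline, priority) in list(self.leased.items()):
            if deadline > now:
                continue
            del self.leased[task_id]
            if self.attempts[task_id] >= self.max_retries:
                self.dead.append(task_id)
            else:
                self._push(task_id, priority)

    def lease(self, now: float) -> Optional[str]:
        self.reap(now)
        if not self.pending:
            return None
        priority, _, task_id = heapq.heappop(self.pending)
        self.attempts[task_id] += 1
        self.leased[task_id] = (now + self.visibility_timeout, priority)
        return task_id

    def ack(self, task_id: str) -> None:
        self.leased.pop(task_id, None)  # finished: the lease is simply released
```

A task that is leased but never acknowledged becomes visible again once its deadline passes, and moves to `dead` after `max_retries` leases, so a route that always times out cannot block healthy tasks.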

3. Run a stateless worker with an isolated context per task

Each worker leases a task, opens a fresh browser context, and disposes of it in a finally block. This is the single most important line in the whole system: a shared context is how session state, cookies, and cached auth leak across routes and produce phantom violations.

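import asyncio
from pathlib import Path

from playwright.async_api import async_playwright

# Read the engine source once at import time; read_text closes the file handle.
AXE_SRC = Path("node_modules/axe-core/axe.min.js").read_text(encoding="utf-8")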

async def scan_one(browser, task: dict) -> dict:
    context = await browser.new_context()  # clean state per route — no leakage
    try:
        page = await context.new_page()
        await page.goto(task["url"], wait_until="networkidle")
        # Wait for framework hydration, not a fixed sleep. The sentinel is the
        # app's own "ready" signal; see the parent traversal guide for details.
        await page.wait_for_selector("[data-a11y-ready]", timeout=10_000)
        await page.add_script_tag(content=AXE_SRC)
        result = await page.evaluate("async () => await axe.run(document)")
        return {"url": task["url"], "violations": result["violations"]}
    finally:
        await context.close()  # dispose even on failure — no orphaned contexts

async def worker(worker_id: int, concurrency: int = 4) -> None:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        sem = asyncio.Semaphore(concurrency)  # cap in-flight pages per worker
        async def bounded(task):
            async with sem:
                return await scan_one(browser, task)
        # ... lease tasks from the broker and dispatch through `bounded` ...
        await browser.close()
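The dispatch step left open in the worker could look roughly like the sketch below. It is an assumption-laden illustration: `run_batch` is a hypothetical helper, the status labels "ok", "flaky", and "error" are invented, and the scan function is injected so the example runs without a browser. In the worker it would be something like `lambda t: scan_one(browser, t)`.

```python
import asyncio
from typing import Awaitable, Callable

async def run_batch(
    tasks: list[dict],
    scan: Callable[[dict], Awaitable[dict]],
    concurrency: int = 4,
    timeout_s: float = 30.0,
) -> list[dict]:
    """Run `scan` over tasks under a concurrency cap; one bad route never sinks the batch."""
    sem = asyncio.Semaphore(concurrency)

    async def guarded(task: dict) -> dict:
        async with sem:
            try:
                result = await asyncio.wait_for(scan(task), timeout=timeout_s)
                return {"url": task["url"], "status": "ok", **result}
            except asyncio.TimeoutError:
                # Ran past the per-task ceiling: report it, do not raise.
                return {"url": task["url"], "status": "flaky", "violations": []}
            except Exception as exc:  # isolate failures to the task that caused them
                return {"url": task["url"], "status": "error", "error": repr(exc)}

    # gather preserves input order, so results line up with `tasks`.
    return await asyncio.gather(*(guarded(t) for t in tasks))

async def stub_scan(task: dict) -> dict:
    # Hypothetical stand-in for scan_one: one slow route, one failing route.
    if task["url"] == "/slow":
        await asyncio.sleep(1)
    if task["url"] == "/boom":
        raise RuntimeError("navigation failed")
    return {"violations": []}

results = asyncio.run(run_batch(
    [{"url": "/ok"}, {"url": "/slow"}, {"url": "/boom"}],
    stub_scan, concurrency=2, timeout_s=0.05,
))
```

Returning a status record instead of raising keeps the retry decision with the queue layer, where the attempt count lives.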

4. Aggregate, deduplicate, and schema-validate

Aggregation fingerprints each violation so the same failure on /product/1 and /product/2 collapses to one record with a list of affected URLs, then validates every record against the pinned contract before emitting.

import hashlib
import json
from jsonschema import Draft202012Validator

with open("schemas/finding.schema.json") as fh:
    VALIDATOR = Draft202012Validator(json.load(fh))

def fingerprint(rule_id: str, target: str) -> str:
    # Rule identity + DOM path. The URL is deliberately left out so identical
    # violations across route variants deduplicate into one finding.
    return hashlib.sha256(f"{rule_id}::{target}".encode()).hexdigest()[:16]

def aggregate(worker_outputs: list[dict]) -> list[dict]:
    findings: dict[str, dict] = {}
    for out in worker_outputs:
        for v in out["violations"]:
            for node in v["nodes"]:
                # First selector only; nodes inside iframes or shadow roots
                # carry additional entries that this fingerprint ignores.
                target = node["target"][0]
                fp = fingerprint(v["id"], target)
                rec = findings.setdefault(fp, {
                    "fingerprint": fp, "rule_id": v["id"],
                    "impact": v["impact"], "target": target, "urls": [],
                })
                # Record each URL once so a double-scanned route cannot
                # inflate the affected-URL list.
                if out["url"] not in rec["urls"]:
                    rec["urls"].append(out["url"])
    records = list(findings.values())
    errors = [e.message for r in records for e in VALIDATOR.iter_errors(r)]
    if errors:
        raise ValueError(
            f"{len(errors)} schema validation errors; first: {errors[0]}"
        )
    return records
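aggregate() fails the whole batch on the first invalid record. Where the quarantine behaviour described in the conceptual model is preferred, the split can be sketched as below. `partition` and `required_fields` are hypothetical names; `required_fields` is a stand-in for the pinned schema so the example runs without jsonschema. With the validator from this step, the check could instead be `lambda r: [e.message for e in VALIDATOR.iter_errors(r)]`.

```python
from typing import Callable, Iterable

def partition(
    records: Iterable[dict],
    errors_for: Callable[[dict], list[str]],
) -> tuple[list[dict], list[dict]]:
    """Split records into (valid, quarantined); quarantined entries keep their errors."""
    valid: list[dict] = []
    quarantined: list[dict] = []
    for record in records:
        errors = errors_for(record)
        if errors:
            # Keep the record and the reason together for later inspection.
            quarantined.append({"record": record, "errors": errors})
        else:
            valid.append(record)
    return valid, quarantined

def required_fields(record: dict) -> list[str]:
    # Hypothetical stand-in for schema validation: presence checks only.
    required = ("fingerprint", "rule_id", "impact", "target", "urls")
    return [f"missing field: {name}" for name in required if name not in record]

good = {"fingerprint": "ab12", "rule_id": "image-alt", "impact": "critical",
        "target": "img.hero", "urls": ["/product/1"]}
bad = {"fingerprint": "cd34", "rule_id": "label"}
valid, quarantined = partition([good, bad], required_fields)
```

Only `valid` moves downstream. The quarantined list can be persisted and alerted on, so malformed records are visible rather than silently dropped.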

Configuration Reference

The parameters that govern throughput and determinism. Tune concurrency against real worker memory headroom, not aspirationally — a Chromium context peaks near 150–250 MB under load.

Parameter	Type	Default	Description
`WORKER_CONCURRENCY`	int	`4`	In-flight pages per worker. Cap so `concurrency × ~250 MB` stays under container memory minus overhead.
`QUEUE_TTL_SECONDS`	int	`900`	Message visibility timeout. Must exceed the slowest expected `goto → axe.run` cycle or tasks re-lease mid-scan.
`MAX_RETRIES`	int	`3`	Attempts before a task is dead-lettered. Excludes malformed-URL failures, which dead-letter on first parse error.
`HYDRATION_TIMEOUT_MS`	int	`10000`	Ceiling for the `data-a11y-ready` sentinel wait before the task is marked flaky.
`PREFETCH_COUNT`	int	`1`	Tasks a worker leases at once. Keep at `1` for even load distribution; raise only for very short scans.
`SHARD_COUNT`	int	`8`	Parallel workers/CI matrix legs. Total browsers = `SHARD_COUNT × WORKER_CONCURRENCY`; keep under host capacity.
`IMPACT_GATE`	str	`"serious"`	Minimum axe impact (`minor`/`moderate`/`serious`/`critical`) that blocks the pipeline.
`DEDUPE_SCOPE`	str	`"rule+target"`	Fingerprint basis. `rule+target` collapses cross-route repeats; add `url` only when per-route counts matter.
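One way to keep these parameters in a single validated place is sketched below. It assumes the parameters are supplied as environment variables named exactly as in the table, which the guide does not prescribe; `BatchConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional

IMPACTS = ("minor", "moderate", "serious", "critical")

@dataclass(frozen=True)
class BatchConfig:
    worker_concurrency: int = 4
    queue_ttl_seconds: int = 900
    max_retries: int = 3
    hydration_timeout_ms: int = 10_000
    prefetch_count: int = 1
    shard_count: int = 8
    impact_gate: str = "serious"
    dedupe_scope: str = "rule+target"

    @property
    def total_browsers(self) -> int:
        # The real browser count to weigh against host capacity.
        return self.shard_count * self.worker_concurrency

def load_config(env: Optional[Mapping[str, str]] = None) -> BatchConfig:
    """Build a config from an env-style mapping, falling back to the table defaults."""
    env = os.environ if env is None else env
    overrides = {}
    for f in fields(BatchConfig):
        raw = env.get(f.name.upper())
        if raw is not None:
            # Integer-typed defaults are parsed as int; everything else stays a string.
            overrides[f.name] = int(raw) if isinstance(f.default, int) else raw
    cfg = BatchConfig(**overrides)
    if cfg.impact_gate not in IMPACTS:
        raise ValueError(f"IMPACT_GATE must be one of {IMPACTS}")
    if min(cfg.worker_concurrency, cfg.shard_count, cfg.prefetch_count) < 1:
        raise ValueError("concurrency, shard, and prefetch counts must be >= 1")
    return cfg

cfg = load_config({"SHARD_COUNT": "4", "WORKER_CONCURRENCY": "6"})
```

Failing at load time on an unknown impact level or a zero count is cheaper than discovering the typo as a gate that never blocks.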

CI/CD Integration and Threshold Gating

Position batch validation as a dedicated stage after asset compilation and before deployment: Lint & Build → Route Discovery → Batch Validation → Threshold Gate → Deploy / Block. Distribute the route set across a CI matrix (SHARD_COUNT legs), but cap SHARD_COUNT × WORKER_CONCURRENCY below the runner’s browser capacity to avoid host-level OOM kills. Cache the pinned browser binaries and axe-core across runs to eliminate cold-start latency; inject short-lived service tokens for authenticated routes via CI secrets and rotate them per execution cycle. The per-run mechanics of wiring this into a pipeline are covered in running Playwright accessibility checks in CI/CD, and the same batch dispatch tuned for very large route sets is detailed in configuring axe-core for enterprise-scale batch scanning.

Avoid a binary pass/fail gate — it rewards suppression over remediation. Use tiered logic instead: fail immediately on critical or serious violations on primary journeys; warn and attach moderate or minor findings to the PR as tracked debt; and run regression detection against a baseline snapshot so the gate blocks new violations on previously clean routes even when the absolute count is within budget. Align the blocking threshold with your conformance target using the A/AA/AAA compliance level mapping rather than a raw violation ceiling, so the gate encodes legal intent instead of an arbitrary number.
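The tiered logic can be sketched as a pure function over the aggregated findings. This is one possible reading of the rules above, not a reference implementation: `evaluate_gate` is a hypothetical name, `primary_routes` is the set of primary-journey URLs, and `clean_routes` is the set of routes that had no findings in the baseline snapshot.

```python
BLOCKING_IMPACTS = {"critical", "serious"}

def evaluate_gate(
    findings: list[dict],
    primary_routes: set[str],
    clean_routes: set[str],
) -> dict:
    """Classify findings into blocking, regressions, and tracked debt."""
    blocking, regressions, debt = [], [], []
    for finding in findings:
        urls = set(finding["urls"])
        if finding["impact"] in BLOCKING_IMPACTS and urls & primary_routes:
            blocking.append(finding["fingerprint"])     # fail immediately
        elif urls & clean_routes:
            regressions.append(finding["fingerprint"])  # new on a clean route
        else:
            debt.append(finding["fingerprint"])         # warn, attach to the PR
    status = "fail" if blocking or regressions else "warn" if debt else "pass"
    return {"status": status, "blocking": blocking,
            "regressions": regressions, "debt": debt}

findings = [
    {"fingerprint": "f1", "impact": "critical", "urls": ["/checkout"]},
    {"fingerprint": "f2", "impact": "moderate", "urls": ["/blog/1"]},
    {"fingerprint": "f3", "impact": "minor", "urls": ["/about"]},
]
report = evaluate_gate(findings, primary_routes={"/checkout"}, clean_routes={"/about"})
```

In this sketch a minor finding on a previously clean route still fails the gate, while a serious finding on a non-primary route with existing debt only warns. Whether that second case should block is a policy choice the conformance mapping should settle.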

Verification and Testing

Determinism check. Scan the same fixed route set twice against a frozen build and diff the aggregated findings. A non-empty diff means non-determinism — usually a hydration race or a leaked context, not an engine bug. Fix it before trusting any threshold.
Idempotent ingestion. Enqueue the same URL list twice and assert scan:pending cardinality is unchanged; the deterministic task_id should absorb the duplicate.
Schema conformance in CI. Run aggregate() against a golden set of worker outputs and assert it raises on a deliberately corrupted record. This is your early-warning that an engine upgrade changed the output shape.
Concurrency ceiling. Load-test one worker at WORKER_CONCURRENCY and watch RSS; if it approaches the container limit, lower concurrency or raise the quota before it manifests as intermittent CI failures.
Dead-letter drain. Inject a malformed URL and a deliberately slow route, then assert both land in the dead-letter queue within MAX_RETRIES × QUEUE_TTL_SECONDS without stalling healthy tasks.
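The determinism check in the list above reduces to a diff keyed by fingerprint. A minimal sketch, assuming each run is the list of records produced by aggregation with `fingerprint` and `urls` fields; `diff_runs` and `is_deterministic` are hypothetical names.

```python
def diff_runs(run_a: list[dict], run_b: list[dict]) -> dict:
    """Compare two aggregated runs of the same route set against the same build."""
    def index(run: list[dict]) -> dict:
        # Order of URLs within a finding is not significant, so compare sorted sets.
        return {rec["fingerprint"]: sorted(set(rec["urls"])) for rec in run}

    a, b = index(run_a), index(run_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "url_drift": sorted(fp for fp in a.keys() & b.keys() if a[fp] != b[fp]),
    }

def is_deterministic(diff: dict) -> bool:
    return not any(diff.values())

first = [{"fingerprint": "f1", "urls": ["/a", "/b"]},
         {"fingerprint": "f2", "urls": ["/c"]}]
second = [{"fingerprint": "f1", "urls": ["/b", "/a"]},
          {"fingerprint": "f3", "urls": ["/c"]}]
drift = diff_runs(first, second)
```

The three buckets point at different causes: fingerprints present in only one run suggest a hydration race or leaked state, while URL drift on a shared fingerprint suggests a route that was skipped or double-scanned.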

Failure Modes and Troubleshooting

Browser stampede / OOM kill on the crawl host. Ingestion enqueued the full route set and workers spawned a browser per task with no ceiling. Root cause: unbounded concurrency or PREFETCH_COUNT too high. Fix: enforce the per-worker Semaphore, keep prefetch at 1, and size SHARD_COUNT × WORKER_CONCURRENCY against measured per-context memory.

Phantom violations that vanish on re-run. The same route reports different findings between runs. Root cause is almost always a reused browser context leaking auth/storage state, or evaluation firing before hydration. Fix: one fresh new_context() per task disposed in finally, and gate axe.run on the app’s readiness sentinel rather than networkidle alone. This overlaps with dynamic content boundary detection, which defines what “settled” means for a given framework.

Tasks re-leasing mid-scan and duplicating work. A slow route exceeds QUEUE_TTL_SECONDS, the broker assumes the worker died, and a second worker picks up the same task. Fix: set the visibility timeout above the p99 of goto → axe.run, and make aggregation fingerprint-deduplicate so a rare double-scan cannot inflate counts.

Dead-letter queue silently filling. Malformed payloads or a route that always times out accumulate unnoticed and coverage quietly drops. Fix: alert on dead-letter depth, and reconcile scanned-route count against the ingestion manifest every run so missing coverage surfaces as a gate warning, not a blind spot.
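The reconciliation step can be as small as a few set operations. A sketch, assuming the ingestion manifest, the scanned routes, and the dead-lettered routes are each available as a set of canonical URLs; `reconcile` is a hypothetical name.

```python
def reconcile(manifest: set[str], scanned: set[str], dead_lettered: set[str]) -> dict:
    """Report coverage and explain every manifest route that produced no result."""
    missing = manifest - scanned
    return {
        "coverage": len(manifest & scanned) / len(manifest) if manifest else 1.0,
        "dead_lettered": sorted(missing & dead_lettered),  # known failures
        "unaccounted": sorted(missing - dead_lettered),    # lost without a trace
    }

summary = reconcile(
    manifest={"/a", "/b", "/c", "/d"},
    scanned={"/a", "/b"},
    dead_lettered={"/c"},
)
```

Routes in `unaccounted` are the more worrying group: they were neither scanned nor dead-lettered, which usually means a task was dropped somewhere between enqueue and lease.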

False positives from third-party widgets and dynamic SVGs. Marketing embeds and chart libraries trip contrast or role rules the team does not own. Fix: scope them out with exclude in your axe-core enterprise configuration, and route the residue through error categorization triage pipelines so approved suppressions never re-enter the blocking set.

Frequently Asked Questions

Why does my batch report a different violation count every run?

Non-determinism in batch scanning is nearly always one of two things: a browser context reused across tasks (leaking cookies, storage, or auth state) or evaluation firing before the framework has hydrated. Give every task a fresh new_context() disposed in finally, and gate axe.run on an explicit readiness sentinel. If counts still drift, pin the Chromium build across every runner — a minor bump can shift contrast and target-size math.

How many concurrent browsers can one worker actually run?

Budget 150–250 MB per Chromium context under load and leave headroom for the browser process and OS. On a 4 GB worker, WORKER_CONCURRENCY of 4–6 is realistic; above that you trade throughput for intermittent OOM kills that surface as flaky CI. Measure RSS at your target concurrency rather than guessing, and treat SHARD_COUNT × WORKER_CONCURRENCY as the real browser count against host capacity.
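The budgeting rule can be written down as a starting estimate to be replaced by measured RSS. In the sketch below, `max_concurrency` is a hypothetical helper, 250 MB is the upper end of the range quoted above, and the 1024 MB overhead and the 0.5 safety factor are assumptions chosen for illustration, not measured values.

```python
def max_concurrency(
    container_mb: int,
    overhead_mb: int = 1024,    # assumed: browser process + OS + Python runtime
    per_context_mb: int = 250,  # upper end of the per-context range
    safety: float = 0.5,        # assumed headroom factor for load spikes
) -> int:
    """Estimate a starting WORKER_CONCURRENCY from a container memory quota."""
    usable_mb = max(container_mb - overhead_mb, 0) * safety
    # Never return less than 1: a worker that cannot run one page is misprovisioned.
    return max(int(usable_mb // per_context_mb), 1)

estimate_4gb = max_concurrency(4096)  # (4096 - 1024) * 0.5 = 1536 MB -> 6 contexts
estimate_2gb = max_concurrency(2048)  # (2048 - 1024) * 0.5 = 512 MB  -> 2 contexts
```

With these assumptions a 4 GB worker lands at 6, the top of the range given above. Treat the result as the value to load-test, not the value to ship.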

Should the CI gate fail on every axe violation?

No. A binary gate pushes teams to suppress findings rather than fix them. Block only on critical and serious impact on primary journeys, warn on moderate and minor as tracked debt, and add regression detection against a baseline so new violations on previously clean routes block even when the total is within budget. Bind the blocking threshold to your conformance level, not a raw count.

Where do suppressed false positives belong — in the engine config or the pipeline?

Both, at different layers. Structural exclusions you never own (third-party iframes, analytics widgets) belong in the engine’s exclude scoping so they never generate a finding. Context-specific suppressions that require review belong in the triage layer, where they are auditable and reversible. Never hard-code suppressions inline in worker code, where they escape review and drift out of sync with the standard.

How do I keep batch runs inside a normal CI timeout?

Shard across a CI matrix, cache browser binaries and axe-core between runs, and reserve full-batch scans for scheduled off-peak jobs while CI triggers run delta scans against only changed routes. A queue-based dispatch with worker autoscaling keeps most enterprise batches inside a 15–30 minute window; if you routinely exceed it, the bottleneck is usually cold browser installs or an uncapped, memory-starved worker pool.
