Configuring Axe-Core for Enterprise-Scale Batch Scanning

To scan thousands of routes with axe-core without noisy output, memory blowups, or drifting results, pin one version-controlled options object, run conformance tags only, chunk the URL queue across isolated browser contexts, and gate on the severity counts in the serialized results rather than on what a per-tab browser extension reports. This page addresses one specific failure: a configuration that works for a single page collapses once it runs across a fleet of workers.

It is a focused extension of the parent axe-core enterprise configuration guide, which establishes the deterministic options surface, and it sits under the broader Automated Scanning & Dynamic Content Ingestion strategy that orchestrates ingestion, evaluation, and reporting end to end. Where the parent page defines what a correct configuration looks like, this page answers how that configuration has to change when the same engine runs at batch scale.

When This Configuration Applies

Reach for the settings below only when all of the following hold. Outside these conditions the single-page defaults from the parent guide are sufficient and this page’s trade-offs (enabling elementRef, chunked contexts, explicit garbage collection) add cost for no benefit.

You are evaluating hundreds to thousands of URLs per run, sourced from a queue rather than a fixed test list.
Scans run on shared CI runners or a distributed worker pool where heap pressure and version drift between machines are real risks.
The target properties are heavy single-page applications with large DOM trees (tens of thousands of nodes) that must hydrate before evaluation.
Findings are consumed inside the same browser context — for example, a triage step that dereferences a live node handle before results leave the page.

If results are instead serialized out of the page and shipped across the network, keep elementRef off exactly as the parent guide recommends; a DOM handle cannot survive JSON serialization anyway.

Minimal Reproducible Example: the config that dies at scale

The naive approach reuses one page, evaluates the whole document with every registered rule, and never releases memory between URLs. It passes on a laptop against ten URLs and then exhausts the heap — and floods the backlog with best-practice noise — against ten thousand.

# ANTI-PATTERN — do not ship this.
from playwright.async_api import async_playwright

async def scan_all(urls: list[str]) -> list[dict]:
    results = []
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()          # one page reused for every URL
        for url in urls:
            await page.goto(url)                 # no readiness wait -> empty scopes
            await page.add_script_tag(url="https://cdn.example/axe.min.js")
            # No runOnly: every rule, including best-practice, runs on the
            # entire document. Detached DOM from prior pages is never released.
            report = await page.evaluate("() => axe.run(document)")
            results.append(report)               # full passes/inapplicable kept
    return results

Three defects compound at scale: the reused page accumulates detached DOM across navigations until the heap collapses; running all rules on the whole document returns best-practice findings that carry no conformance obligation and train teams to ignore the gate; and retaining passes and inapplicable nodes inflates every payload. None of these surface at low volume, which is why the configuration looks correct until it is deployed.

Correct Implementation: a batch-tuned options object

The batch configuration narrows the rule population to WCAG conformance tags, trims the result payload, and — because this page’s triage runs inside the browser context — deliberately keeps elementRef on. Critically, there is no timeout field: axe.run() has no option that waits for a single-page app to render. Hydration waiting is the traversal layer’s job, performed before the engine is invoked; frameWaitTime bounds only cross-frame messaging.

{
  "runOnly": {
    "type": "tag",
    "values": ["wcag2a", "wcag2aa", "wcag22aa"]
  },
  "absolutePaths": true,
  "elementRef": true,
  "resultTypes": ["violations", "incomplete"],
  "frameWaitTime": 2000
}

runOnly tag values combine with OR, not AND: the run is the union of every listed tag. Keeping the list to conformance tags (wcag2a, wcag2aa, and wcag22aa for the criteria new in 2.2) is what produces a conformance-only scan; adding a category tag such as cat.forms or leaving best-practice in the list broadens coverage and reintroduces the noise the anti-pattern suffered from. axe-core tags WCAG 2.1 criteria separately (wcag21a, wcag21aa), so add those tags if 2.1 success criteria are in scope. Restricting resultTypes to violations and incomplete means only those two types are processed in full; passes and inapplicable still appear in the report but are capped at one node per rule, which shrinks output across the route set. Refine individual checks through a rules object rather than by muting whole tags, for example disabling color-contrast-enhanced (an AAA check) while keeping color-contrast for AA.
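As a rough illustration of refining through rules rather than tags, the sketch below layers per-rule overrides onto the batch options in Python before they are handed to axe.run(). The helper name with_rule_overrides and the BASE_OPTIONS constant are hypothetical; the rules shape (rule id mapped to an object with an enabled flag) follows axe-core's options format.

```python
import copy

# Assumed subset of the batch options shown above.
BASE_OPTIONS = {
    "runOnly": {"type": "tag", "values": ["wcag2a", "wcag2aa", "wcag22aa"]},
    "resultTypes": ["violations", "incomplete"],
}


def with_rule_overrides(options: dict, overrides: dict[str, bool]) -> dict:
    """Return a copy of options with per-rule enabled flags merged in.

    The input is deep-copied so the shared, version-controlled object
    is never mutated by a single worker.
    """
    merged = copy.deepcopy(options)
    rules = merged.setdefault("rules", {})
    for rule_id, enabled in overrides.items():
        rules[rule_id] = {"enabled": enabled}
    return merged


# Disable the AAA contrast check explicitly; the tag list stays untouched.
tuned = with_rule_overrides(BASE_OPTIONS, {"color-contrast-enhanced": False})
```

Copying before merging keeps the pinned options object identical across workers, so a per-property tweak cannot leak into other shards.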

Worker isolation and explicit reclamation

Chunk the queue into discrete batches of fifty to one hundred URLs and give each batch a fresh browserContext that is torn down before the next cycle. This is what lets Chromium release detached DOM references instead of accumulating them.
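A minimal sketch of the chunking step, assuming the queue has already been materialized as a list of URL strings. The name chunk_urls and the default batch size of fifty are illustrative choices, not part of any library.

```python
from collections.abc import Iterator


def chunk_urls(urls: list[str], batch_size: int = 50) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]
```

Each yielded list would then be handed to one batch scan with its own browser context, so the context lifetime matches the chunk boundary.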

The diagram below shows how the URL queue fans out to an isolated worker pool and reconverges at aggregation.

import gc
import json
from pathlib import Path
from playwright.async_api import async_playwright

# Load the version-controlled options object once per worker so every route in
# the shard evaluates against an identical, auditable rule set.
AXE_CONFIG = json.loads(Path("config/axe-batch-config.json").read_text())
AXE_ENGINE = "node_modules/axe-core/axe.min.js"  # pinned build, not a CDN or extension

async def scan_batch(urls: list[str]) -> list[dict]:
    results = []
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        # A dedicated context per batch prevents cross-session state leakage.
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            ignore_https_errors=True,
            user_agent="Enterprise-A11y-Scanner/2.0",
        )
        try:
            for url in urls:
                page = await context.new_page()
                try:
                    # networkidle is a coarse readiness signal; a settled-DOM
                    # check belongs here for heavy SPAs (see Gotchas).
                    await page.goto(url, wait_until="networkidle")
                    await page.add_script_tag(path=AXE_ENGINE)
                    report = await page.evaluate(
                        "async (cfg) => await axe.run(document, cfg)", AXE_CONFIG
                    )
                    # Keep both result types requested via resultTypes.
                    results.append({
                        "url": url,
                        "violations": report["violations"],
                        "incomplete": report["incomplete"],
                    })
                finally:
                    await page.close()          # release the page's DOM promptly
        finally:
            await context.close()
            await browser.close()
    gc.collect()  # explicit CPython reclamation between batches
    return results

Injecting the engine with page.add_script_tag(path=...) from the pinned node_modules build — never a browser extension, which is unavailable under a headless sandbox and auto-updates anyway — is what eliminates version drift between runners. Assert on report["testEngine"]["version"] in a pre-flight so a mismatched engine fails the build instead of silently reporting different counts. For legacy portals with deeply nested owned iframes, keep frameWaitTime high enough that slow frames return real findings rather than incomplete.
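The pre-flight assertion described above could look roughly like this. The pinned version string, the exception class, and the function name are all hypothetical; only the testEngine.version field comes from the axe results object.

```python
# Placeholder pin; in practice this would mirror the version in the lockfile.
EXPECTED_AXE_VERSION = "4.10.2"


class EngineVersionMismatch(RuntimeError):
    """Raised when a worker reports an unexpected axe-core build."""


def assert_engine_version(report: dict, expected: str = EXPECTED_AXE_VERSION) -> None:
    """Fail loudly if the report was produced by a different engine version."""
    engine = report.get("testEngine") or {}
    actual = engine.get("version")
    if actual != expected:
        raise EngineVersionMismatch(
            f"axe-core version {actual!r} does not match pinned {expected!r}"
        )
```

Running this against the first report of every batch is usually enough, since all pages in a batch inject the same file.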

Pipeline Integration

At batch scale the gate is not a single pass/fail on one report; it is a reduction over an aggregated payload. Parse each worker’s output against a schema, then apply severity thresholds before deciding the exit code. This is where batch configuration hands off to the surrounding automation:

Schema validation — validate every payload against a fixed shape, the same discipline formalized in JSON Schema validation for accessibility data, so a malformed result fails loudly instead of scoring zero violations.
Severity weighting — map impact (critical, serious, moderate, minor) to exit codes; block only when critical or serious counts exceed the agreed baseline.
Instance deduplication — group by id and node locator so one repeated component does not register as hundreds of failures; this is the entry point to the error categorization and triage pipelines.
Baseline allowlisting — keep a version-controlled baseline.json of accepted debt, requiring explicit approval for new entries.
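The deduplication, allowlisting, and severity steps above can be sketched as a small reduction. This is an assumption-laden outline: the function names are hypothetical, the baseline is modeled as a set of (rule id, locator) pairs, and the gate blocks on any non-allowlisted critical or serious finding rather than on a numeric threshold.

```python
BLOCKING_IMPACTS = {"critical", "serious"}


def dedupe_findings(pages: list[dict]) -> list[dict]:
    """Collapse repeated findings to one entry per (rule id, node locator)."""
    seen: dict[tuple[str, str], dict] = {}
    for page in pages:
        for violation in page.get("violations", []):
            for node in violation.get("nodes", []):
                # axe reports target as a list of selectors; join into one key.
                locator = " ".join(str(part) for part in node.get("target", []))
                key = (violation["id"], locator)
                entry = seen.setdefault(key, {
                    "id": violation["id"],
                    "locator": locator,
                    "impact": violation.get("impact"),
                    "urls": [],
                })
                entry["urls"].append(page["url"])
    return list(seen.values())


def exit_code(findings: list[dict], allowlist: set[tuple[str, str]]) -> int:
    """Return 1 if any non-allowlisted finding has a blocking impact, else 0."""
    for finding in findings:
        if (finding["id"], finding["locator"]) in allowlist:
            continue  # accepted debt recorded in the baseline
        if finding["impact"] in BLOCKING_IMPACTS:
            return 1
    return 0
```

Keying on the rule id plus locator means a shared component that fails on every route counts once, with the affected URLs kept alongside it for triage.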

# GitHub Actions gating step over the aggregated batch output.
- name: Evaluate Axe Results
  run: |
    python scripts/evaluate_a11y.py \
      --input results/scan_output.json \
      --schema schemas/axe-v4.schema.json \
      --block-on critical,serious \
      --allowlist config/baseline.json

Sharding the queue and aggregating results is the responsibility of the batch validation architecture; this page only supplies the per-worker configuration that architecture executes. Sequencing of the fan-out itself is governed by the Playwright headless scanning workflows that drive each runner.

Gotchas

networkidle is not DOM-settled. In auth-gated multi-tenant apps the network can idle while the tenant shell is still swapping views, so a scoped include matches nothing and the page scores a false clean. Pair wait_until="networkidle" with a settled-DOM check (no new nodes appended for ~500 ms) or a MutationObserver tied to a known landmark before injecting axe. Where the audit boundary itself is ambiguous across tenants, defer to dynamic content boundary detection.
Authenticated sessions leak across a reused context. Because a batch shares one browserContext, a cookie or token set on URL n can alter the render of URL n+1. For SSO-protected estates, either reset storage state between URLs or shard by tenant so one context never spans trust boundaries.
Infinite scroll and virtualized lists under-scan. A single evaluation only sees the nodes present in the DOM at that moment, so rows that load lazily never enter the population. Scroll to trigger each segment and re-run against the newly rendered region (the coverage pattern belongs to async crawling for infinite scroll pages), and clear the DOM cache between segments so coverage does not reintroduce the memory blowup.
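For the first gotcha, a settled-DOM gate might be sketched as follows. It assumes a Playwright page object (anything with an async evaluate(expression, arg) method works), resolves once no child-list mutations have been seen for quiet_ms, and is bounded on the Python side with asyncio.wait_for. The function name, the constant name, and the default timings are illustrative.

```python
import asyncio

# Resolves after quietMs without any node being added or removed.
SETTLED_DOM_JS = """
(quietMs) => new Promise((resolve) => {
  let timer;
  const finish = () => { observer.disconnect(); resolve(true); };
  const observer = new MutationObserver(() => {
    clearTimeout(timer);
    timer = setTimeout(finish, quietMs);
  });
  observer.observe(document.documentElement, { childList: true, subtree: true });
  timer = setTimeout(finish, quietMs);
})
""".strip()


async def wait_for_settled_dom(page, quiet_ms: int = 500, max_wait_s: float = 15.0) -> bool:
    """Return True once the DOM is quiet, or False if max_wait_s elapses first."""
    try:
        await asyncio.wait_for(
            page.evaluate(SETTLED_DOM_JS, quiet_ms), timeout=max_wait_s
        )
    except asyncio.TimeoutError:
        return False
    return True
```

A caller would await this after page.goto() and before injecting axe, and treat a False result as a reason to mark the URL unscanned rather than clean.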

Frequently Asked Questions

Why does axe.run() ignore my timeout value on slow SPAs?

Because axe.run() has no timeout option — it never waits for a page to render. Waiting for hydration is the driver’s job, performed before the engine is injected. Use a settled-DOM or MutationObserver gate in the traversal layer; frameWaitTime only bounds cross-frame messaging, not application readiness.

Should elementRef be true for batch scanning?

Only when triage runs inside the same browser context and dereferences live node handles before results leave the page. If results are serialized to JSON and shipped over the network, leave it false — a handle cannot survive serialization — and rely on absolutePaths/selectors to relocate the node.

My violation count jumped after I "narrowed" the scan. Why?

runOnly tag values combine with OR, so adding a category tag such as cat.forms or leaving best-practice in the list broadens the run to the union of all tags. Keep runOnly to wcag2a/wcag2aa/wcag22aa and trim individual checks through the rules object.
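The union behavior can be made concrete with a toy model. The rule-to-tag map below is a small, simplified subset chosen for illustration (axe-core's rule descriptions list the authoritative tags), and selected_rules is a hypothetical helper that mimics OR semantics; it does not call axe-core.

```python
# Illustrative subset of rule tags; not a complete or authoritative list.
RULE_TAGS = {
    "color-contrast": {"cat.color", "wcag2aa"},
    "label": {"cat.forms", "wcag2a"},
    "region": {"cat.keyboard", "best-practice"},
    "autocomplete-valid": {"cat.forms", "wcag21aa"},
}


def selected_rules(run_only_tags: list[str]) -> list[str]:
    """Return rules carrying at least one of the requested tags (OR semantics)."""
    wanted = set(run_only_tags)
    return sorted(rule for rule, tags in RULE_TAGS.items() if tags & wanted)


conformance_only = selected_rules(["wcag2a", "wcag2aa"])
with_category = selected_rules(["wcag2a", "wcag2aa", "cat.forms"])
```

In this model, adding cat.forms pulls in autocomplete-valid on top of the two conformance rules, which is why a list that looks narrower can return more findings.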
