Validating Accessibility Metadata with JSON Schema

This page resolves a specific, recurring failure: a schema pinned to the axe-core result shape rejects findings that look perfectly valid. The two symptoms are always the same — additionalProperties violations triggered by framework-specific debug keys the engine injected, and impact enum failures caused by severity casing or synonyms drifting through a serialization layer. The fix is not to loosen the contract; it is a deterministic pre-flight normalization step that rewrites the payload into the canonical shape before validation, so the gate stays strict and the false rejections disappear.

This is a focused how-to within the JSON Schema Validation for Accessibility Data guide, itself part of the broader Automated Scanning & Dynamic Content Ingestion strategy. Where the parent guide defines the canonical schema and the routing gate, this page zooms in on the single seam where valid-but-noisy engine output meets a closed contract — and how to reconcile the two without dropping records.

When This Applies

Normalization is only worth the extra step under specific conditions. If your workers run one pinned engine build and write axe results straight to the validator with no middleware in between, you will rarely see these rejections — the shapes already match. The failure pattern shows up when one or more of the following is true:

Instrumented capture. The payload passes through a wrapper — a React/Vue test harness, a custom reporter, or an APM shim — that appends debug keys such as framework_debug or _internal_trace to each violation. A schema with additionalProperties: false rejects the whole record on the first unmodelled key.
Serialization round-trips. Findings are serialized to a queue (Redis, SQS, Kafka) and rehydrated downstream. Intermediate transforms lower-case, title-case, or remap severity, so impact arrives as "High" or "medium" instead of the canonical axe strings.
Mixed engine versions. A fleet mid-upgrade runs two axe-core builds at once (see axe-core enterprise configuration); target selectors arrive as a bare string on the old build and a string array on the new one.
Dynamic rendering. Scans against lazy-loaded or virtualized UI — see async crawling for infinite-scroll pages — capture partially hydrated ARIA state, producing sparse nodes that trip type constraints.

If none of these describe your pipeline, the parent guide’s plain validation gate is enough. If any do, read on.
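
To check whether a pipeline is affected before adding the extra step, a small detector can flag the first three conditions on a sample payload. This is a minimal sketch rather than part of the parent guide: detect_drift and its symptom labels are hypothetical names, the debug keys are the two examples named above, and sparse nodes from dynamic rendering are not covered.

```python
# Canonical axe severities and the example debug keys named in the list above.
CANONICAL_IMPACT = {"minor", "moderate", "serious", "critical"}
DEBUG_KEYS = ("framework_debug", "_internal_trace")


def detect_drift(violation):
    """Return the set of transport-drift symptoms found on one violation dict.

    Hypothetical helper: the symptom labels are illustrative, not a standard.
    """
    symptoms = set()

    # Instrumented capture: a wrapper appended a known debug key.
    if any(key in violation for key in DEBUG_KEYS):
        symptoms.add("debug_keys")

    # Serialization round-trip: impact is missing, re-cased, or a synonym.
    impact = violation.get("impact")
    if not (isinstance(impact, str) and impact in CANONICAL_IMPACT):
        symptoms.add("noncanonical_impact")

    # Mixed engine versions: a node carries a bare string target.
    nodes = violation.get("nodes") or []
    if any(isinstance(n, dict) and isinstance(n.get("target"), str) for n in nodes):
        symptoms.add("bare_string_target")

    return symptoms


sample = {
    "id": "color-contrast",
    "impact": "High",
    "framework_debug": {"render_ms": 42},
    "nodes": [{"target": "div.hero > p"}],
}
print(sorted(detect_drift(sample)))
# ['bare_string_target', 'debug_keys', 'noncanonical_impact']
```

An empty set on a representative sample suggests the plain validation gate is sufficient.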

Minimal Reproducible Example

The smallest reproduction is a closed schema fragment and a single violation that fails it. Assume the contract closes the object and pins impact to the four canonical axe severities:

import jsonschema

FINDING_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "additionalProperties": False,  # any unmodelled key is a hard rejection
    "required": ["id", "impact", "nodes"],
    "properties": {
        "id": {"type": "string"},
        "impact": {"enum": ["minor", "moderate", "serious", "critical"]},
        "nodes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"target": {"type": "array", "items": {"type": "string"}}},
            },
        },
    },
}

# A finding that "looks fine" but was touched by a reporter shim and a queue transform.
raw_violation = {
    "id": "color-contrast",
    "impact": "High",                       # wrong case + synonym, not a canonical enum value
    "framework_debug": {"render_ms": 42},   # injected debug key
    "nodes": [{"target": "div.hero > p"}],  # bare string, schema wants an array
}

validator = jsonschema.Draft202012Validator(FINDING_SCHEMA)
for err in validator.iter_errors(raw_violation):
    print(f"{list(err.absolute_path)}: {err.message}")

Running this prints three distinct errors, each anchored to the exact node:

[]: Additional properties are not allowed ('framework_debug' was unexpected)
['impact']: 'High' is not one of ['minor', 'moderate', 'serious', 'critical']
['nodes', 0, 'target']: 'div.hero > p' is not of type 'array'

None of these are real accessibility problems. The finding is a legitimate contrast violation; it was simply mangled in transit. Loosening the schema — flipping additionalProperties to true or widening the enum — would hide the very drift the gate exists to catch. The correct move is to canonicalize the payload first.

Correct Implementation

Insert a deterministic, idempotent normalization function between capture and validation. It rewrites known-recoverable deviations into the canonical shape and leaves everything else untouched, so a genuinely malformed record still fails. Note the two design choices that keep the contract honest: severity is remapped to a canonical string (never a number), and any sortable weight lives in a separate field so impact remains a valid enum value. Because that weight is an extra key, the closed schema has to model it explicitly.

import jsonschema

# Canonical axe impact strings; the schema's `impact` enum expects these exact values.
CANONICAL_IMPACT = {"minor", "moderate", "serious", "critical"}
IMPACT_SYNONYMS = {"low": "minor", "medium": "moderate", "high": "serious"}
# Numeric weight kept in a SEPARATE field so `impact` stays a valid enum string.
IMPACT_WEIGHT = {"minor": 1, "moderate": 2, "serious": 3, "critical": 4}

# Debug keys instrumentation layers are known to inject; strip, never model.
DEBUG_KEYS = ("framework_debug", "_internal_trace")


def normalize_finding(violation):
    """Rewrite recoverable engine/transport drift into the canonical schema shape.

    Deterministic and idempotent: running it twice yields the same result.
    Mutates and returns the dict it is given, so pass a copy if the raw payload
    must be preserved. The one value it supplies is a fallback severity: a
    finding with no salvageable `impact` is coerced to "minor" so it still
    validates and reaches triage rather than being silently dropped.
    """
    # 1. Canonicalize impact: normalize case/synonyms but keep it a string enum value.
    #    A non-string impact (None, a number) falls back instead of raising.
    raw = violation.get("impact")
    raw_impact = raw.strip().lower() if isinstance(raw, str) else "minor"
    impact = IMPACT_SYNONYMS.get(raw_impact, raw_impact)
    if impact not in CANONICAL_IMPACT:
        impact = "minor"
    violation["impact"] = impact
    violation["impact_weight"] = IMPACT_WEIGHT[impact]  # sortable, lives on its own key

    # 2. Drop injected debug keys so additionalProperties:false does not reject the record.
    for key in DEBUG_KEYS:
        violation.pop(key, None)

    # 3. Coerce bare-string DOM targets into the string arrays the schema requires.
    for node in violation.get("nodes", []):
        if isinstance(node.get("target"), str):
            node["target"] = [node["target"]]

    return violation


def validate_findings(raw_axe_results, validator):
    """Normalize then validate; yield (finding, errors) so callers route, not crash."""
    for violation in raw_axe_results.get("violations", []):
        finding = normalize_finding(violation)
        errors = [
            f"{list(e.absolute_path)}: {e.message}"
            for e in validator.iter_errors(finding)
        ]
        yield finding, errors


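# The closed contract must model impact_weight, or additionalProperties: False rejects it.
CANONICAL_SCHEMA = {
    **FINDING_SCHEMA,
    "properties": {
        **FINDING_SCHEMA["properties"],
        "impact_weight": {"type": "integer", "minimum": 1, "maximum": 4},
    },
}

validator = jsonschema.Draft202012Validator(CANONICAL_SCHEMA)
for finding, errors in validate_findings({"violations": [raw_violation]}, validator):
    status = "PASS" if not errors else "QUARANTINE"
    print(status, finding["id"], errors)

With normalization in place and impact_weight modelled in the schema, the same raw_violation now prints PASS color-contrast []. The contract stayed closed: it gained one explicitly modelled field, and the payload was canonicalized into it. Without that property, the normalizer's own impact_weight key would trip additionalProperties: false. Cross-check the field names against the RuleResult and NodeResult structures in the official axe-core documentation whenever you pin a new engine version, and confirm the enum against the JSON Schema draft 2020-12 specification so the dialect in the $schema keyword matches the validator you construct.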
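
The idempotence claim is cheap to verify in a unit test. The helper below is a sketch under assumptions: check_idempotent is a hypothetical name, and lowercase_impact is a stand-in normalizer included only so the example runs on its own. In practice you would pass normalize_finding.

```python
import copy


def check_idempotent(normalize, sample):
    """Return True if normalizing twice gives the same result as normalizing once.

    Works on deep copies, so it is safe for normalizers that mutate in place.
    """
    once = normalize(copy.deepcopy(sample))
    twice = normalize(copy.deepcopy(once))
    return once == twice


def lowercase_impact(finding):
    """Stand-in normalizer (hypothetical): trims and lower-cases impact in place."""
    finding["impact"] = str(finding.get("impact", "")).strip().lower()
    return finding


sample = {"id": "color-contrast", "impact": " High "}
print(check_idempotent(lowercase_impact, sample))  # True
```

A normalizer that appends or increments on every call fails this check, which is the kind of regression worth catching before it reaches the gate.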

The core flow (capture, normalize, gate on structural validity, then route) is a single deterministic pass.

Pipeline Integration

Normalization belongs on the ingestion side of the boundary, running per-record immediately after capture and before the payload reaches the aggregation gate. It is deliberately the only place shape-fixing happens: the batch validation architecture that aggregates findings and the error categorization triage pipelines that route them both assume every record already conforms, so they can skip defensive parsing entirely. Keep the normalization map (IMPACT_SYNONYMS, DEBUG_KEYS) version-controlled alongside the schema and the engine pin — when an axe upgrade introduces a new synonym or debug key, the diff to add it is one line, reviewed like any other contract change. In CI this step is cheap enough to run inline on every pull request; only escalate a record to a hard failure when it still fails validation after normalization, since that is the signal of genuine drift rather than transport noise.
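
One way to express that escalation rule in CI is a gate that consumes the (finding, errors) pairs yielded by validate_findings and fails the job only when errors survive normalization. This is a minimal sketch: ci_gate is a hypothetical helper, not part of jsonschema or axe-core, and the sample pairs are illustrative.

```python
def ci_gate(results):
    """Split (finding, errors) pairs and compute a process exit code.

    Records with no errors pass; records that still carry errors after
    normalization are quarantined and make the exit code non-zero.
    """
    passed, quarantined = [], []
    for finding, errors in results:
        if errors:
            quarantined.append((finding, errors))
        else:
            passed.append(finding)
    exit_code = 1 if quarantined else 0
    return exit_code, passed, quarantined


# Illustrative input in the shape validate_findings yields.
results = [
    ({"id": "color-contrast", "impact": "serious"}, []),
    ({"id": "label", "impact": "minor"}, ["['nodes']: 'x' is not of type 'array'"]),
]
exit_code, passed, quarantined = ci_gate(results)
print(exit_code, len(passed), len(quarantined))  # 1 1 1
```

A CI script could pass exit_code to sys.exit and attach the quarantined list to the job output for review.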

Gotchas

Authenticated routes mutate severity context. A finding captured behind a login can carry tenant- or role-specific metadata the public route never emits. Normalize the shape of that metadata, but never let an auth-only key silently pass additionalProperties: false — model it explicitly or strip it, so a leaked debug field from a privileged session cannot corrupt the contract.
Multi-tenant routing collides IDs. When one worker scans several tenants, two findings can share a rule id but belong to different surfaces. Normalization must not deduplicate on id alone; preserve the url and any tenant discriminator so downstream aggregation attributes each violation correctly.
Viewport variance changes node counts, not shape. A responsive component evaluated at mobile and desktop widths yields different nodes arrays for the same rule. That is expected and valid — do not treat a differing node count as drift. Only a changed type (a bare string where an array is required) is a normalization concern; a changed count is real signal for triage.
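
The multi-tenant point can be made concrete with a composite grouping key. The sketch below assumes each finding carries a url and a tenant discriminator; tenant_id is a hypothetical field name, and both keys would need to be modelled in the closed schema or carried in an envelope outside the validated finding.

```python
def attribution_key(finding):
    """Key a finding by tenant, URL, and rule id rather than rule id alone.

    `tenant_id` is an assumed field name for the tenant discriminator.
    """
    return (finding.get("tenant_id"), finding.get("url"), finding["id"])


def group_findings(findings):
    """Group findings by attribution key without discarding any record."""
    groups = {}
    for finding in findings:
        groups.setdefault(attribution_key(finding), []).append(finding)
    return groups


findings = [
    {"id": "color-contrast", "tenant_id": "a", "url": "https://a.example/home"},
    {"id": "color-contrast", "tenant_id": "b", "url": "https://b.example/home"},
    {"id": "color-contrast", "tenant_id": "a", "url": "https://a.example/home"},
]
groups = group_findings(findings)
print(len(groups))  # 2: same rule id, two distinct tenant surfaces
```

Grouping keeps every record, so a differing node count between viewports stays visible to triage instead of being collapsed.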

FAQ

Why not just set `additionalProperties: true` to stop the rejections?

Because that trades a loud, reviewable failure for silent data corruption. The closed contract is what turns a new engine field into an explicit rejection you can inspect. Strip known debug keys in normalization instead, and keep the root object closed so genuinely unmodelled fields still surface.

Should `impact` ever hold a numeric severity?

No. Keep impact a canonical enum string so it validates and reads consistently across languages and dashboards. Put any sortable severity in a separate impact_weight field, as the implementation above does — mixing a number into the enum field breaks both the schema and every consumer that filters on severity by name.

Where do I put the normalization step relative to the validator?

Immediately before it, per record, on the ingestion side. Normalize, then call iter_errors. Running the validator first defeats the purpose, and running normalization after aggregation means malformed records have already polluted the batch.

The validator still rejects a record after normalization — what now?

That is the intended outcome: a post-normalization failure is real drift, not transport noise. Route the record to quarantine with its JSON Pointer path intact, and treat a spike in the quarantine rate — not any single record — as the trigger to update the schema or the engine pin.
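
jsonschema reports each error location as a sequence of keys and indices (absolute_path), so a quarantine record needs a small conversion to get a JSON Pointer. A minimal sketch following the RFC 6901 escaping rules, with a simple rate check; to_json_pointer, quarantine_rate, rate_spiked, and the 0.05 tolerance are all hypothetical choices to tune for your own pipeline.

```python
def to_json_pointer(path):
    """Render a sequence of keys/indices as an RFC 6901 JSON Pointer.

    '~' is escaped as '~0' first, then '/' as '~1'; an empty path yields '',
    which refers to the whole document.
    """
    return "".join(
        "/" + str(part).replace("~", "~0").replace("/", "~1") for part in path
    )


def quarantine_rate(total, quarantined):
    """Fraction of records quarantined in a batch; 0.0 for an empty batch."""
    return quarantined / total if total else 0.0


def rate_spiked(rate, baseline, tolerance=0.05):
    """True when the rate exceeds the baseline by more than the assumed tolerance."""
    return rate > baseline + tolerance


print(to_json_pointer(["nodes", 0, "target"]))  # /nodes/0/target
print(rate_spiked(quarantine_rate(100, 12), baseline=0.02))  # True
```

Storing the pointer alongside the error message lets a reviewer jump straight to the offending node in the quarantined payload.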
