Friday, 1 May 2026

Autonomous QA — Architecture Diagram

⚡ Stage 6: Autonomous QA — Architecture

5-Layer Quality Gate Pipeline · Bedrock-Powered · Fully Autonomous

In my last two blog posts, we built the autonomous AI-powered SDLC system and then the auto-healing stage. In this post, we will go through the design of the Autonomous QA architecture.

[Diagram: Stage 6, Autonomous QA Pipeline]
Trigger: PRCreatedEvent. QaOrchestrationService listens, then polls GitHub Pages.
Sequential execution through five layers, each scored: Layer 1 Structure Check (Java, Pass/Fail) → Layer 2 Security Audit (Hybrid, 0–10) → Layer 3 Functional E2E Tests (Bedrock, 0–10) → Layer 4 Accessibility Audit (Hybrid, 0–10) → Layer 5 Performance Audit (Bedrock, 0–10).
Outputs: QA Report (HTML + JSON) from QaReportBuilder to DB and /api/qa/{reqId}; PR description updated through a GitHub API PATCH with QA badge and findings; SSE event QA_COMPLETE for the real-time dashboard.

🔔 Trigger: PRCreatedEvent

The entire QA pipeline is event-driven. When CodeGenerationService creates a pull request on GitHub, Spring publishes a PRCreatedEvent. The QaOrchestrationService listens for this event via @EventListener and kicks off the QA pipeline asynchronously (@Async).

Activation Sequence

Step | Action | Detail
1 | Event received | PRCreatedEvent(reqId, prUrl, pagesUrl) captured by listener
2 | Poll GitHub Pages | HTTP GET to pagesUrl every 15s, up to 3 min timeout, waiting for HTTP 200
3 | Fetch all pages | HttpClient crawls all HTML/JS/CSS from the deployed Pages site
4 | Execute 5 layers | Sequential execution: each layer receives the fetched content + previous layer results
5 | Aggregate & report | QaReportBuilder compiles findings → DB save → PR update → SSE broadcast
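
Step 2 can be sketched as a small polling helper. This is a minimal sketch, not the code from QaOrchestrationService: the class name PagesPoller and both method names are hypothetical, and the status probe is injected so the loop can be exercised without a network.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.IntSupplier;

/** Hypothetical helper: polls a status probe until it reports HTTP 200 or the timeout elapses. */
final class PagesPoller {

    /**
     * @param statusProbe returns the latest HTTP status code
     * @param interval    pause between attempts (15 seconds in the table above)
     * @param timeout     overall budget (3 minutes in the table above)
     * @return true if a 200 was seen before the timeout, false otherwise
     */
    static boolean waitForOk(IntSupplier statusProbe, Duration interval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (statusProbe.getAsInt() == 200) {
                return true;
            }
            // Give up if another sleep would take us past the deadline.
            if (System.nanoTime() + interval.toNanos() > deadline) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Builds a probe that issues an HTTP GET and reports the status code (-1 on failure). */
    static IntSupplier httpProbe(HttpClient client, String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return () -> {
            try {
                return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
            } catch (IOException e) {
                return -1; // treat network errors as "not ready yet"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        };
    }
}
```

With the values from the table, a call would look like PagesPoller.waitForOk(PagesPoller.httpProbe(HttpClient.newHttpClient(), pagesUrl), Duration.ofSeconds(15), Duration.ofMinutes(3)).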

Why Event-Driven?

  • Decoupled — Code generation doesn't wait for QA; QA runs independently
  • Non-blocking — User sees PR created immediately; QA results stream in via SSE
  • Retry-safe — If Pages isn't ready, polling handles the delay gracefully

Layer 1: Structure Check (Pure Java)

The fastest, cheapest gate. Pure deterministic Java rules: no AI and no calls to Bedrock. Catches deployment-breaking issues quickly, typically in under 2 seconds.

What It Checks

Check | Rule | Severity on Fail
Entry point exists | index.html must exist at repo root | CRITICAL
Broken references | Every <script src>, <link href>, <img src> must resolve to existing file | CRITICAL
File structure | All HTML files reference-able from root; no orphaned pages | HIGH
HTML5 validity | <!DOCTYPE html>, <html lang>, <meta charset> present | MEDIUM
HTTP 200 | GitHub Pages URL returns 200 status | CRITICAL
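
The broken-reference rule can be illustrated with a regex pass over the HTML. This is a rough sketch under assumptions: ReferenceChecker and its methods are hypothetical names, relative paths are resolved against the site root only, and external URLs are skipped rather than fetched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the "broken references" check. */
final class ReferenceChecker {

    // Captures src on <script>/<img> tags (group 1) and href on <link> tags (group 2).
    private static final Pattern REF = Pattern.compile(
            "<(?:script|img)\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']"
          + "|<link\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns every referenced URL in document order. */
    static List<String> extractReferences(String html) {
        List<String> refs = new ArrayList<>();
        Matcher m = REF.matcher(html);
        while (m.find()) {
            refs.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return refs;
    }

    /** Returns local references that do not match any path in existingFiles. */
    static List<String> findBrokenReferences(String html, Set<String> existingFiles) {
        List<String> broken = new ArrayList<>();
        for (String ref : extractReferences(html)) {
            boolean external = ref.startsWith("http://") || ref.startsWith("https://")
                    || ref.startsWith("//") || ref.startsWith("data:");
            if (external) {
                continue; // assumption: only same-site files are verified here
            }
            String path = ref.replaceFirst("[?#].*$", ""); // drop query string and fragment
            if (path.startsWith("./")) {
                path = path.substring(2);
            }
            if (path.startsWith("/")) {
                path = path.substring(1);
            }
            if (!existingFiles.contains(path)) {
                broken.add(ref);
            }
        }
        return broken;
    }
}
```

Regex extraction is fast but approximate: it would, for instance, also pick up a data-src attribute. A real HTML parser would be stricter at the cost of a dependency.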

Engine Details

  • Service: StructureCheckService.java
  • Technique: Java HttpClient for live URL checks; regex-based HTML parsing for reference extraction
  • Scoring: Pass/Fail (binary) — any CRITICAL finding = layer fails, pipeline short-circuits with report
  • Performance: Completes in <2 seconds typically
  • Why first? If the site doesn't load or has broken refs, deeper analysis is pointless
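
The gate behaviour, combined with the sequential hand-off of results between layers, might look like the sketch below. The types QaLayer and LayerResult are hypothetical stand-ins; the real service classes are not shown in this post.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of sequential layer execution with a Layer 1 short-circuit. */
final class QaPipelineSketch {

    /** Hypothetical result type: the layer name and whether the layer passed. */
    static final class LayerResult {
        final String layer;
        final boolean passed;

        LayerResult(String layer, boolean passed) {
            this.layer = layer;
            this.passed = passed;
        }
    }

    /** Hypothetical layer contract: fetched content plus the results of earlier layers. */
    interface QaLayer {
        LayerResult run(Map<String, String> contentByPath, List<LayerResult> previous);
    }

    /** Runs layers in order and stops after the first layer (the structure gate) if it fails. */
    static List<LayerResult> run(List<QaLayer> layers, Map<String, String> contentByPath) {
        List<LayerResult> results = new ArrayList<>();
        for (int i = 0; i < layers.size(); i++) {
            LayerResult result = layers.get(i).run(contentByPath, Collections.unmodifiableList(results));
            results.add(result);
            if (i == 0 && !result.passed) {
                break; // short-circuit: deeper layers are skipped, the report still gets built
            }
        }
        return results;
    }
}
```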

Layer 2: Security Audit (Hybrid)

Two-pass security analysis modeled on OWASP Top 10. First pass: fast static regex rules catch known patterns. Second pass: Bedrock deep analysis for nuanced vulnerabilities that pattern matching misses.

Pass 1: Static Rules (Java)

Rule | Pattern | Maps to OWASP
Inline JavaScript detection | onclick=, javascript:, eval( | A03: Injection / XSS
Credential exposure | password in URL params, hardcoded tokens, localStorage for secrets | A07: Auth Failures
Form action validation | Forms with method="GET" containing password fields | A04: Insecure Design
Open redirect | Unvalidated window.location assignments from URL params | A01: Broken Access
Missing security headers | No CSP meta tag, no X-Frame-Options | A05: Security Misconfig
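
A few of these static rules can be sketched as a regex table. This covers only a subset (the inline JavaScript patterns and the missing CSP meta tag); the class name, rule labels, and exact patterns are illustrative assumptions, not the contents of SecurityAuditService.java.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of Pass 1: regex rules labelled with their OWASP category. */
final class StaticSecurityRules {

    // Rules that fire when the pattern IS present in the source.
    private static final Map<String, Pattern> PRESENCE_RULES = new LinkedHashMap<>();
    static {
        PRESENCE_RULES.put("A03: inline onclick handler",
                Pattern.compile("\\bonclick\\s*=", Pattern.CASE_INSENSITIVE));
        PRESENCE_RULES.put("A03: javascript: URL",
                Pattern.compile("javascript\\s*:", Pattern.CASE_INSENSITIVE));
        PRESENCE_RULES.put("A03: eval( call",
                Pattern.compile("\\beval\\s*\\("));
    }

    // Rule that fires when the pattern is ABSENT from the source.
    private static final Pattern CSP_META = Pattern.compile(
            "<meta\\b[^>]*http-equiv\\s*=\\s*[\"']Content-Security-Policy[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the labels of all rules triggered by the given HTML/JS source. */
    static List<String> scan(String source) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : PRESENCE_RULES.entrySet()) {
            if (rule.getValue().matcher(source).find()) {
                findings.add(rule.getKey());
            }
        }
        if (!CSP_META.matcher(source).find()) {
            findings.add("A05: no CSP meta tag");
        }
        return findings;
    }
}
```

Patterns like these are cheap but produce false positives (for example, the word "eval(" inside a comment), which is one reason a second, deeper pass is useful.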

Pass 2: Bedrock Deep Analysis

  • Prompt: qa-security-review.txt — sends full HTML+JS source to Bedrock
  • AI analyzes: Authentication flow logic, session management, data sanitization patterns, DOM manipulation safety, third-party script risks
  • Output: JSON array of findings with severity, owaspCategory, location, remediation

Scoring

  • Security Score: 0–10 scale (10 = no findings)
  • Each CRITICAL finding: −3 points. HIGH: −2. MEDIUM: −1. LOW: −0.5
  • Gate threshold: Advisory only (no blocking) — but CRITICAL findings highlighted in PR
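
The deduction arithmetic above is easy to express in code. A minimal sketch: the class and enum names are hypothetical, and clamping the result at 0 is an assumption, since the post does not say what happens once deductions exceed 10.

```java
import java.util.List;

/** Hypothetical sketch of the security score: start at 10, subtract per finding. */
final class SecurityScore {

    enum Severity {
        CRITICAL(3.0), HIGH(2.0), MEDIUM(1.0), LOW(0.5);

        final double deduction;

        Severity(double deduction) {
            this.deduction = deduction;
        }
    }

    static double score(List<Severity> findings) {
        double score = 10.0;
        for (Severity severity : findings) {
            score -= severity.deduction;
        }
        return Math.max(0.0, score); // assumption: never report a negative score
    }
}
```

For example, one CRITICAL, one HIGH, and one LOW finding give 10 − 3 − 2 − 0.5 = 4.5.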

Layer 3: Functional E2E Tests (Bedrock AI)

Since the generated apps are static GitHub Pages sites (HTML/CSS/JS only), traditional browser automation (Selenium/Playwright) is overkill. Instead, Bedrock AI reads the complete source code and mentally simulates user journeys — tracing event handlers, form submissions, navigation flows, and state management.

What Bedrock Simulates

Journey | What AI Traces | Expected Behavior
Login flow | Form submit handler → validation → redirect → session storage | Invalid creds show error; valid creds redirect to home
Navigation | Anchor hrefs, window.location, back/forward logic | All links navigate to existing pages; no dead ends
CRUD operations | DOM manipulation, localStorage read/write, event chains | Add/edit/delete reflect in UI; data persists across page loads
Auth guards | sessionStorage/localStorage checks on page load | Unauthenticated users redirected to login
Error handling | Try/catch blocks, error display elements, edge cases | Graceful degradation; user-visible messages

Why Bedrock-Simulated vs. Real Browser?

  • No infrastructure: No Selenium grid, no headless Chrome, no Docker containers
  • Deeper analysis: AI understands intent, not just DOM state — catches logic errors a click-test would miss
  • Cost-effective: One Bedrock invocation covers dozens of simulated journeys
  • Trade-off: Cannot catch rendering bugs or CSS layout issues (Layer 4 partially covers this)

Scoring

  • Score: 0–10 (10 = all journeys pass)
  • AI returns structured JSON: { journey, steps[], result: "pass"|"fail", issue?, remediation? }

Layer 4: Accessibility Audit (Hybrid)

Ensures WCAG 2.1 Level AA compliance through a combination of deterministic Java checks (machine-verifiable criteria) and Bedrock analysis (human-judgment criteria that require understanding context).

Pass 1: Java Rules (Deterministic)

Check | Implementation | WCAG Criterion
Image alt text | Regex: every <img> must have non-empty alt | 1.1.1 Non-text Content
Form labels | Every <input> has associated <label> or aria-label | 1.3.1 Info and Relationships
Color contrast | Parse CSS color/background-color; compute luminance ratio ≥ 4.5:1 | 1.4.3 Contrast (Minimum)
Heading hierarchy | Verify h1 → h2 → h3 sequence; no skips | 1.3.1 Info and Relationships
Language attribute | <html lang="..."> present | 3.1.1 Language of Page
Focus styles | CSS includes :focus rules; no outline: none without replacement | 2.4.7 Focus Visible
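
The color contrast check rests on the WCAG 2.1 relative luminance and contrast ratio formulas. The sketch below implements those formulas for colors given as 0xRRGGBB integers; the class and method names are hypothetical, and parsing the colors out of CSS is left out.

```java
/** Sketch of the WCAG 2.1 contrast ratio computation for 0xRRGGBB colors. */
final class ContrastRatio {

    /** Linearizes one sRGB channel (0-255) as defined for WCAG relative luminance. */
    private static double linearize(int channel) {
        double c = channel / 255.0;
        return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    /** Relative luminance: 0.0 for black, 1.0 for white. */
    static double relativeLuminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
    }

    /** Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21. */
    static double ratio(int foregroundRgb, int backgroundRgb) {
        double l1 = relativeLuminance(foregroundRgb);
        double l2 = relativeLuminance(backgroundRgb);
        double lighter = Math.max(l1, l2);
        double darker = Math.min(l1, l2);
        return (lighter + 0.05) / (darker + 0.05);
    }

    /** WCAG 2.1 AA threshold for normal-size text (criterion 1.4.3). */
    static boolean passesAaNormalText(int foregroundRgb, int backgroundRgb) {
        return ratio(foregroundRgb, backgroundRgb) >= 4.5;
    }
}
```

On a white background, #767676 text sits just above the 4.5:1 threshold while #777777 falls just below it, which shows how sharp the cut-off is.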

Pass 2: Bedrock Deep Review

  • Prompt: qa-accessibility-review.txt
  • AI evaluates: Semantic HTML usage, ARIA roles/states correctness, keyboard navigation completeness, screen reader experience, touch target sizing, cognitive load assessment
  • Key insight: Many WCAG criteria (e.g., "meaningful sequence", "consistent navigation") require human-level understanding that pure regex cannot provide

Scoring

  • Accessibility Score: 0–10 (weighted: Java checks 40%, Bedrock analysis 60%)
  • Maps each finding to specific WCAG Success Criterion with conformance level (A, AA, AAA)

Layer 5: Performance Audit (Bedrock AI)

Analyzes the asset graph and render path of the deployed site. Since these are static sites without server-side rendering, performance analysis focuses on client-side loading strategy, asset optimization, and perceived performance.

What Bedrock Analyzes

Category | Analysis | Common Findings
Asset size | Total page weight, individual file sizes, unminified detection | Unminified JS >50KB, oversized images
Render blocking | <script> without defer/async, CSS in <head> load order | Render-blocking scripts in <head>
Image optimization | Format analysis (PNG vs WebP), dimensions, lazy loading | Missing loading="lazy", no width/height
Caching | Asset fingerprinting, cache-control headers, CDN usage | No cache busting on CSS/JS filenames
Critical render path | First paint blocking resources, inline critical CSS presence | All CSS loaded before any content renders

Why Bedrock Instead of Lighthouse?

  • No headless Chrome needed: Lighthouse requires a browser runtime; Bedrock works from source alone
  • Context-aware: AI understands that a login page's performance profile differs from a dashboard
  • Actionable output: AI provides specific remediation steps, not just scores
  • Trade-off: Cannot measure actual FCP/LCP/CLS metrics — these require real rendering

Scoring

  • Performance Score: 0–10
  • Deductions: unminified assets (−2), render-blocking scripts (−1.5), no lazy loading (−1), missing cache strategy (−1)
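
As a worked example of these deductions, the sketch below applies each one at most once per issue category. That "once per category" reading is an assumption, as are the class and enum names and the floor at 0.

```java
import java.util.Set;

/** Hypothetical sketch of the performance score: 10 minus one deduction per issue category. */
final class PerformanceScore {

    enum Issue {
        UNMINIFIED_ASSETS(2.0),
        RENDER_BLOCKING_SCRIPTS(1.5),
        NO_LAZY_LOADING(1.0),
        MISSING_CACHE_STRATEGY(1.0);

        final double deduction;

        Issue(double deduction) {
            this.deduction = deduction;
        }
    }

    static double score(Set<Issue> issues) {
        double score = 10.0;
        for (Issue issue : issues) {
            score -= issue.deduction;
        }
        return Math.max(0.0, score); // assumption: floor at 0
    }
}
```

A site that triggers all four categories would score 10 − 2 − 1.5 − 1 − 1 = 4.5 under this reading.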

📊 Output: Report, PR Update & SSE Broadcast

After all 5 layers complete, QaReportBuilder aggregates findings into a unified report. Three outputs are generated simultaneously:

1. QA Report (Database + API)

  • DB entities: QaReport (one per run) + QaFinding (one per issue) stored via JPA
  • API endpoint: GET /api/qa/{reqId} returns JSON; GET /requirements/{reqId}/qa renders HTML view
  • Schema: Flyway V18__qa_tables.sql: qa_report (id, req_id, overall_score, security_score, accessibility_score, performance_score, functional_score, structure_pass, created_at) + qa_finding (id, report_id, layer, severity, category, description, location, remediation)

2. PR Description Patch

  • Mechanism: GitHub API PATCH /repos/{owner}/{repo}/pulls/{number}
  • Content: Appends QA badge (overall score with color), summary table of findings per layer, and critical findings with remediation steps
  • Advisory only: Does not block merge — provides visibility for human reviewer
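
The PATCH call can be sketched with Java's built-in HttpClient types. This is an illustration under assumptions: the class and method names are hypothetical, the token handling is simplified, and the JSON body is built by hand to keep the example dependency-free. Since the PATCH replaces the whole description, appending the QA section means reading the current PR body first and sending the combined text.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch: builds the GitHub "update a pull request" call for a new description. */
final class PrDescriptionPatch {

    /** Minimal JSON string escaping for the description text. */
    static String jsonEscape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            switch (ch) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (ch < 0x20) {
                        sb.append(String.format("\\u%04x", (int) ch));
                    } else {
                        sb.append(ch);
                    }
            }
        }
        return sb.toString();
    }

    /** The request payload: only the "body" field (the PR description) is changed. */
    static String buildBody(String newDescription) {
        return "{\"body\":\"" + jsonEscape(newDescription) + "\"}";
    }

    static HttpRequest buildRequest(String owner, String repo, int number,
                                    String token, String newDescription) {
        URI uri = URI.create("https://api.github.com/repos/" + owner + "/" + repo + "/pulls/" + number);
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/vnd.github+json")
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .method("PATCH", HttpRequest.BodyPublishers.ofString(buildBody(newDescription)))
                .build();
    }
}
```

The request would then be sent with HttpClient.send(request, HttpResponse.BodyHandlers.ofString()); in production code a JSON library is a safer choice than hand-built escaping.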

3. SSE Broadcast

  • Event: QA_COMPLETE sent via PipelineStreamService
  • Payload: Overall score, per-layer scores, critical finding count
  • Dashboard: Real-time update on requirement detail page — QA section appears with expandable layer results
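
On the wire, a server-sent event is a block of "event:" and "data:" lines ended by a blank line. The sketch below formats a QA_COMPLETE frame by hand to show that shape; the JSON field names are assumptions, and PipelineStreamService presumably delegates this to the framework rather than building strings itself.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: formats the QA_COMPLETE payload as a raw SSE frame. */
final class QaCompleteEvent {

    static String toSseFrame(double overallScore, Map<String, Double> layerScores, int criticalFindings) {
        // Build the per-layer score object; layer names are assumed to need no JSON escaping.
        StringBuilder layers = new StringBuilder();
        for (Map.Entry<String, Double> entry : layerScores.entrySet()) {
            if (layers.length() > 0) {
                layers.append(',');
            }
            layers.append(String.format(Locale.ROOT, "\"%s\":%.1f", entry.getKey(), entry.getValue()));
        }
        String json = String.format(Locale.ROOT,
                "{\"overallScore\":%.1f,\"layerScores\":{%s},\"criticalFindings\":%d}",
                overallScore, layers, criticalFindings);
        // An SSE frame: event name, data line, then a blank line as terminator.
        return "event: QA_COMPLETE\n" + "data: " + json + "\n\n";
    }
}
```

On the dashboard side, an EventSource listener registered for the QA_COMPLETE event name would receive the JSON string as the event's data.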

Composite Scoring

Component | Weight | Range
Structure | Gate (must pass) | Pass / Fail
Security | 30% | 0–10
Functional | 30% | 0–10
Accessibility | 25% | 0–10
Performance | 15% | 0–10
Overall | 100% | 0–10
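
The weights in this table translate into a one-line weighted sum. A minimal sketch with a hypothetical class name; returning an empty result when the structure gate fails is an assumption, since the post does not say what overall score is recorded in that case.

```java
import java.util.OptionalDouble;

/** Hypothetical sketch of the composite score using the weights from the table. */
final class CompositeScore {

    static OptionalDouble overall(boolean structurePass, double security, double functional,
                                  double accessibility, double performance) {
        if (!structurePass) {
            return OptionalDouble.empty(); // assumption: no composite score when the gate fails
        }
        double score = 0.30 * security
                + 0.30 * functional
                + 0.25 * accessibility
                + 0.15 * performance;
        return OptionalDouble.of(score);
    }
}
```

For example, scores of 8 (security), 9 (functional), 6 (accessibility), and 10 (performance) give 2.4 + 2.7 + 1.5 + 1.5 = 8.1.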