⚡ Stage 6: Autonomous QA — Architecture
5-Layer Quality Gate Pipeline · Bedrock-Powered · Fully Autonomous
In my last two blogs we built the autonomous, AI-powered SDLC system and then added auto-healing. In this blog we will walk through the design of the Autonomous QA architecture.
🔔 Trigger: PRCreatedEvent
The entire QA pipeline is event-driven. When CodeGenerationService creates a pull request on GitHub, Spring publishes a PRCreatedEvent. The QaOrchestrationService listens for this event via @EventListener and kicks off the QA pipeline asynchronously (@Async).
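A minimal sketch of that wiring is shown below. Only PRCreatedEvent (with reqId, prUrl, pagesUrl), QaOrchestrationService, @EventListener, and @Async come from the pipeline as described; the record shape, helper method, and class layout are illustrative.

```java
import org.springframework.context.event.EventListener;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

// Event payload as described in this post: requirement id, PR URL, GitHub Pages URL.
record PRCreatedEvent(String reqId, String prUrl, String pagesUrl) {}

@Service
class QaOrchestrationService {

    @Async          // QA runs on its own executor (assumes @EnableAsync on a config class)
    @EventListener  // fires when CodeGenerationService publishes the event
    public void onPrCreated(PRCreatedEvent event) {
        // 1. poll event.pagesUrl() until it returns HTTP 200 (sketched further below)
        // 2. fetch the deployed HTML/JS/CSS
        // 3. run the five layers sequentially and aggregate the report
        runQaPipeline(event.reqId(), event.pagesUrl());
    }

    private void runQaPipeline(String reqId, String pagesUrl) {
        // orchestration of the five layers lives here (omitted in this sketch)
    }
}
```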
Activation Sequence
| Step | Action | Detail |
|---|---|---|
| 1 | Event received | PRCreatedEvent(reqId, prUrl, pagesUrl) captured by listener |
| 2 | Poll GitHub Pages | HTTP GET to pagesUrl every 15s, up to 3 min timeout, waiting for HTTP 200 |
| 3 | Fetch all pages | HttpClient crawls all HTML/JS/CSS from the deployed Pages site |
| 4 | Execute 5 layers | Sequential execution — each layer receives the fetched content + previous layer results |
| 5 | Aggregate & report | QaReportBuilder compiles findings → DB save → PR update → SSE broadcast |
Why Event-Driven?
- Decoupled — Code generation doesn't wait for QA; QA runs independently
- Non-blocking — User sees PR created immediately; QA results stream in via SSE
- Retry-safe — If Pages isn't ready, polling handles the delay gracefully
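The polling in step 2 of the activation sequence can be sketched with the JDK HttpClient. The 15-second interval and 3-minute timeout come from the table above; the class and method names are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;

// Poll the GitHub Pages URL every 15 s, give up after 3 minutes.
class PagesPoller {

    private static final Duration TIMEOUT = Duration.ofMinutes(3);
    private static final Duration INTERVAL = Duration.ofSeconds(15);

    private final HttpClient http = HttpClient.newHttpClient();

    boolean waitForDeployment(String pagesUrl) throws InterruptedException {
        Instant deadline = Instant.now().plus(TIMEOUT);
        HttpRequest request = HttpRequest.newBuilder(URI.create(pagesUrl)).GET().build();

        while (Instant.now().isBefore(deadline)) {
            try {
                HttpResponse<Void> response =
                        http.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() == 200) {
                    return true;            // site is live, the QA layers can start
                }
            } catch (Exception e) {
                // connection errors while Pages is still building: keep polling
            }
            Thread.sleep(INTERVAL.toMillis());
        }
        return false;                       // timed out, report failure
    }
}
```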
Layer 1: Structure Check (Pure Java)
The fastest, cheapest gate. Pure deterministic Java rules — no AI, no network calls to Bedrock. Catches deployment-breaking issues in milliseconds.
What It Checks
| Check | Rule | Severity on Fail |
|---|---|---|
| Entry point exists | index.html must exist at repo root | CRITICAL |
| Broken references | Every <script src>, <link href>, <img src> must resolve to existing file | CRITICAL |
| File structure | All HTML files reachable from the root; no orphaned pages | HIGH |
| HTML5 validity | <!DOCTYPE html>, <html lang>, <meta charset> present | MEDIUM |
| HTTP 200 | GitHub Pages URL returns 200 status | CRITICAL |
Engine Details
- Service: StructureCheckService.java
- Technique: Java HttpClient for live URL checks; regex-based HTML parsing for reference extraction (see the sketch after this list)
- Scoring: Pass/Fail (binary) — any CRITICAL finding = layer fails, pipeline short-circuits with a report
- Performance: Completes in <2 seconds typically
- Why first? If the site doesn't load or has broken refs, deeper analysis is pointless
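Here is a rough illustration of the broken-reference rule, assuming the fetched site is held as a map of repo-relative path to file content. The real internals of StructureCheckService are not shown in this post, so everything here is an approximation of the described technique.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Finds <script src>, <link href>, <img src> (and <a href>) targets that
// don't resolve to a fetched file.
class BrokenReferenceCheck {

    private static final Pattern REF =
            Pattern.compile("(?:src|href)\\s*=\\s*[\"']([^\"'#?]+)[\"']", Pattern.CASE_INSENSITIVE);

    /** @param files map of repo-relative path -> file content for the fetched site */
    List<String> findBrokenRefs(Map<String, String> files) {
        List<String> broken = new ArrayList<>();
        for (Map.Entry<String, String> page : files.entrySet()) {
            if (!page.getKey().endsWith(".html")) continue;
            Matcher m = REF.matcher(page.getValue());
            while (m.find()) {
                String target = m.group(1);
                boolean external = target.startsWith("http") || target.startsWith("//");
                if (!external && !files.containsKey(normalize(target))) {
                    broken.add(page.getKey() + " -> " + target);   // CRITICAL finding
                }
            }
        }
        return broken;
    }

    private String normalize(String path) {
        return path.startsWith("./") ? path.substring(2) : path;
    }
}
```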
Layer 2: Security Audit (Hybrid)
Two-pass security analysis modeled on OWASP Top 10. First pass: fast static regex rules catch known patterns. Second pass: Bedrock deep analysis for nuanced vulnerabilities that pattern matching misses.
Pass 1: Static Rules (Java)
| Rule | Pattern | Maps to OWASP |
|---|---|---|
| Inline JavaScript detection | onclick=, javascript:, eval( | A03: Injection / XSS |
| Credential exposure | password in URL params, hardcoded tokens, localStorage for secrets | A07: Auth Failures |
| Form action validation | Forms with method="GET" containing password fields | A04: Insecure Design |
| Open redirect | Unvalidated window.location assignments from URL params | A01: Broken Access |
| Missing security headers | No CSP meta tag, no X-Frame-Options | A05: Security Misconfig |
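As an illustration, the first pass boils down to a small table of OWASP-mapped regexes plus a couple of absence checks. The exact rule set and the owning class are not published, so the names and patterns below are assumptions based on the table above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// A handful of deterministic, OWASP-mapped rules (presence-based),
// plus one absence-based rule for the missing CSP meta tag.
class StaticSecurityRules {

    private record Rule(String owasp, String description, Pattern pattern) {}

    private static final List<Rule> PRESENCE_RULES = List.of(
            new Rule("A03", "Inline JS / XSS sink",
                    Pattern.compile("onclick\\s*=|javascript:|eval\\s*\\(", Pattern.CASE_INSENSITIVE)),
            new Rule("A07", "Secrets written to localStorage",
                    Pattern.compile("localStorage\\.setItem\\(\\s*['\"](password|token|secret)",
                            Pattern.CASE_INSENSITIVE)),
            new Rule("A04", "GET form containing a password field",
                    Pattern.compile("<form[^>]*method\\s*=\\s*['\"]get['\"][\\s\\S]*?type\\s*=\\s*['\"]password['\"]",
                            Pattern.CASE_INSENSITIVE)));

    List<String> scan(String path, String content) {
        List<String> findings = new ArrayList<>();
        for (Rule rule : PRESENCE_RULES) {
            if (rule.pattern().matcher(content).find()) {
                findings.add(rule.owasp() + " " + rule.description() + " in " + path);
            }
        }
        // Absence rule (A05): an HTML page with no CSP meta tag at all
        if (path.endsWith(".html") && !content.contains("Content-Security-Policy")) {
            findings.add("A05 Missing Content-Security-Policy meta tag in " + path);
        }
        return findings;
    }
}
```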
Pass 2: Bedrock Deep Analysis
- Prompt: qa-security-review.txt — sends full HTML+JS source to Bedrock
- AI analyzes: authentication flow logic, session management, data sanitization patterns, DOM manipulation safety, third-party script risks
- Output: JSON array of findings with severity, owaspCategory, location, remediation
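A hedged sketch of that second pass, assuming the AWS SDK for Java v2 Converse API. The model id and prompt wiring are assumptions; only the qa-security-review.txt template and the finding fields come from the pipeline as described.

```java
import software.amazon.awssdk.services.bedrockruntime.BedrockRuntimeClient;
import software.amazon.awssdk.services.bedrockruntime.model.ContentBlock;
import software.amazon.awssdk.services.bedrockruntime.model.ConversationRole;
import software.amazon.awssdk.services.bedrockruntime.model.ConverseResponse;
import software.amazon.awssdk.services.bedrockruntime.model.Message;

class BedrockSecurityReview {

    private final BedrockRuntimeClient bedrock = BedrockRuntimeClient.create();
    private final String modelId = "anthropic.claude-3-sonnet-20240229-v1:0"; // assumed model id

    /** @param promptTemplate contents of qa-security-review.txt
     *  @param source         concatenated HTML + JS of the deployed site */
    String review(String promptTemplate, String source) {
        String prompt = promptTemplate + "\n\n--- SITE SOURCE ---\n" + source;

        ConverseResponse response = bedrock.converse(req -> req
                .modelId(modelId)
                .messages(Message.builder()
                        .role(ConversationRole.USER)
                        .content(ContentBlock.fromText(prompt))
                        .build()));

        // Expected: JSON array of { severity, owaspCategory, location, remediation }
        return response.output().message().content().get(0).text();
    }
}
```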
Scoring
- Security Score: 0–10 scale (10 = no findings)
- Each CRITICAL finding: −3 points. HIGH: −2. MEDIUM: −1. LOW: −0.5
- Gate threshold: Advisory only (no blocking) — but CRITICAL findings highlighted in PR
Layer 3: Functional E2E Tests (Bedrock AI)
Since the generated apps are static GitHub Pages sites (HTML/CSS/JS only), traditional browser automation (Selenium/Playwright) is overkill. Instead, Bedrock AI reads the complete source code and mentally simulates user journeys — tracing event handlers, form submissions, navigation flows, and state management.
What Bedrock Simulates
| Journey | What AI Traces | Expected Behavior |
|---|---|---|
| Login flow | Form submit handler → validation → redirect → session storage | Invalid creds show error; valid creds redirect to home |
| Navigation | Anchor hrefs, window.location, back/forward logic | All links navigate to existing pages; no dead ends |
| CRUD operations | DOM manipulation, localStorage read/write, event chains | Add/edit/delete reflect in UI; data persists across page loads |
| Auth guards | sessionStorage/localStorage checks on page load | Unauthenticated users redirected to login |
| Error handling | Try/catch blocks, error display elements, edge cases | Graceful degradation; user-visible messages |
Why Bedrock-Simulated vs. Real Browser?
- No infrastructure: No Selenium grid, no headless Chrome, no Docker containers
- Deeper analysis: AI understands intent, not just DOM state — catches logic errors a click-test would miss
- Cost-effective: One Bedrock invocation covers dozens of simulated journeys
- Trade-off: Cannot catch rendering bugs or CSS layout issues (Layer 4 partially covers this)
Scoring
- Score: 0–10 (10 = all journeys pass)
- AI returns structured JSON: { journey, steps[], result: "pass"|"fail", issue?, remediation? }
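That JSON shape maps naturally onto a small Java record. The Jackson wiring, the List<String> type for steps, and the proportional scoring are assumptions; the field names follow the schema above.

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;

class FunctionalResultParser {

    // result is "pass" or "fail"; issue/remediation are present only on failure
    record JourneyResult(String journey, List<String> steps, String result,
                         String issue, String remediation) {}

    private final ObjectMapper mapper = new ObjectMapper();

    List<JourneyResult> parse(String bedrockJson) throws Exception {
        return mapper.readValue(bedrockJson, new TypeReference<List<JourneyResult>>() {});
    }

    double score(List<JourneyResult> results) {
        long passed = results.stream().filter(r -> "pass".equals(r.result())).count();
        return results.isEmpty() ? 0.0 : 10.0 * passed / results.size();  // 0–10 scale
    }
}
```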
Layer 4: Accessibility Audit (Hybrid)
Ensures WCAG 2.1 Level AA compliance through a combination of deterministic Java checks (machine-verifiable criteria) and Bedrock analysis (human-judgment criteria that require understanding context).
Pass 1: Java Rules (Deterministic)
| Check | Implementation | WCAG Criterion |
|---|---|---|
| Image alt text | Regex: every <img> must have non-empty alt | 1.1.1 Non-text Content |
| Form labels | Every <input> has associated <label> or aria-label | 1.3.1 Info and Relationships |
| Color contrast | Parse CSS color/background-color; compute luminance ratio ≥ 4.5:1 | 1.4.3 Contrast (Minimum) |
| Heading hierarchy | Verify h1→h2→h3 sequence; no skips | 1.3.1 Info and Relationships |
| Language attribute | <html lang="..."> present | 3.1.1 Language of Page |
| Focus styles | CSS includes :focus rules; no outline: none without replacement | 2.4.7 Focus Visible |
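The contrast rule is the most involved of these deterministic checks. It relies on the standard WCAG relative-luminance formula; CSS parsing is omitted in this sketch, which assumes the foreground and background colours have already been resolved to sRGB values.

```java
// WCAG 1.4.3 contrast check using the standard relative-luminance formula.
class ContrastCheck {

    /** WCAG relative luminance of an sRGB colour (components 0–255). */
    static double luminance(int r, int g, int b) {
        return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
    }

    private static double channel(int value) {
        double c = value / 255.0;
        return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    /** Contrast ratio between two colours; WCAG AA requires >= 4.5 for normal text. */
    static double contrastRatio(int[] fg, int[] bg) {
        double l1 = luminance(fg[0], fg[1], fg[2]);
        double l2 = luminance(bg[0], bg[1], bg[2]);
        double lighter = Math.max(l1, l2), darker = Math.min(l1, l2);
        return (lighter + 0.05) / (darker + 0.05);
    }

    static boolean passesAA(int[] fg, int[] bg) {
        return contrastRatio(fg, bg) >= 4.5;
    }
}
```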
Pass 2: Bedrock Deep Review
- Prompt: qa-accessibility-review.txt
- AI evaluates: semantic HTML usage, ARIA roles/states correctness, keyboard navigation completeness, screen reader experience, touch target sizing, cognitive load assessment
- Key insight: Many WCAG criteria (e.g., "meaningful sequence", "consistent navigation") require human-level understanding that pure regex cannot provide
Scoring
- Accessibility Score: 0–10 (weighted: Java checks 40%, Bedrock analysis 60%)
- Maps each finding to specific WCAG Success Criterion with conformance level (A, AA, AAA)
Layer 5: Performance Audit (Bedrock AI)
Analyzes the asset graph and render path of the deployed site. Since these are static sites without server-side rendering, performance analysis focuses on client-side loading strategy, asset optimization, and perceived performance.
What Bedrock Analyzes
| Category | Analysis | Common Findings |
|---|---|---|
| Asset size | Total page weight, individual file sizes, unminified detection | Unminified JS >50KB, oversized images |
| Render blocking | <script> without defer/async, CSS in <head> load order | Render-blocking scripts in <head> |
| Image optimization | Format analysis (PNG vs WebP), dimensions, lazy loading | Missing loading="lazy", no width/height |
| Caching | Asset fingerprinting, cache-control headers, CDN usage | No cache busting on CSS/JS filenames |
| Critical render path | First paint blocking resources, inline critical CSS presence | All CSS loaded before any content renders |
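For intuition, two of these findings (render-blocking scripts and missing lazy loading) are easy to express as plain-Java checks. To be clear, this is purely illustrative: the pipeline itself sends the source to Bedrock for this layer rather than running regex rules like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerformanceChecks {

    private static final Pattern SCRIPT_TAG =
            Pattern.compile("<script[^>]*src=[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern IMG_TAG =
            Pattern.compile("<img[^>]*>", Pattern.CASE_INSENSITIVE);

    List<String> analyze(String html) {
        List<String> findings = new ArrayList<>();

        Matcher scripts = SCRIPT_TAG.matcher(html);
        while (scripts.find()) {
            String tag = scripts.group();
            if (!tag.contains("defer") && !tag.contains("async")) {
                findings.add("Render-blocking script: " + tag);
            }
        }

        Matcher images = IMG_TAG.matcher(html);
        while (images.find()) {
            String tag = images.group();
            if (!tag.contains("loading=")) {
                findings.add("Image without loading=\"lazy\": " + tag);
            }
        }
        return findings;
    }
}
```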
Why Bedrock Instead of Lighthouse?
- No headless Chrome needed: Lighthouse requires a browser runtime; Bedrock works from source alone
- Context-aware: AI understands that a login page's performance profile differs from a dashboard
- Actionable output: AI provides specific remediation steps, not just scores
- Trade-off: Cannot measure actual FCP/LCP/CLS metrics — these require real rendering
Scoring
- Performance Score: 0–10
- Deductions: unminified assets (−2), render-blocking scripts (−1.5), no lazy loading (−1), missing cache strategy (−1)
📊 Output: Report, PR Update & SSE Broadcast
After all 5 layers complete, QaReportBuilder aggregates findings into a unified report. Three outputs are generated simultaneously:
1. QA Report (Database + API)
- DB entities: QaReport (one per run) + QaFinding (one per issue), stored via JPA
- API endpoint: GET /api/qa/{reqId} returns JSON; GET /requirements/{reqId}/qa renders the HTML view
- Schema: Flyway V18__qa_tables.sql — qa_report(id, req_id, overall_score, security_score, accessibility_score, performance_score, functional_score, structure_pass, created_at) + qa_finding(id, report_id, layer, severity, category, description, location, remediation)
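The two entities implied by that schema might look roughly like this; the column mapping and Java types are assumptions inferred from V18__qa_tables.sql.

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import java.time.Instant;

@Entity
@Table(name = "qa_report")
class QaReport {
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;
    String reqId;
    double overallScore;
    double securityScore;
    double accessibilityScore;
    double performanceScore;
    double functionalScore;
    boolean structurePass;
    Instant createdAt;
}

@Entity
@Table(name = "qa_finding")
class QaFinding {
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;
    Long reportId;      // FK to qa_report.id
    String layer;       // STRUCTURE / SECURITY / FUNCTIONAL / ACCESSIBILITY / PERFORMANCE
    String severity;    // CRITICAL / HIGH / MEDIUM / LOW
    String category;
    String description;
    String location;
    String remediation;
}
```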
2. PR Description Patch
- Mechanism: GitHub API PATCH /repos/{owner}/{repo}/pulls/{number}
- Content: Appends a QA badge (overall score with color), a summary table of findings per layer, and critical findings with remediation steps
- Advisory only: Does not block merge — provides visibility for human reviewer
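A sketch of that PATCH call using the JDK HttpClient. The endpoint and the body field come from the GitHub REST API; the token handling, helper names, and minimal JSON escaping are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class PrDescriptionPatcher {

    private final HttpClient http = HttpClient.newHttpClient();

    void appendQaSummary(String owner, String repo, int prNumber,
                         String token, String existingBody, String qaMarkdown) throws Exception {
        // GitHub expects the full body, so the QA summary is appended to the current text
        String newBody = existingBody + "\n\n" + qaMarkdown;
        String json = "{\"body\": " + toJsonString(newBody) + "}";

        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://api.github.com/repos/%s/%s/pulls/%d"
                                .formatted(owner, repo, prNumber)))
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/vnd.github+json")
                .method("PATCH", HttpRequest.BodyPublishers.ofString(json))
                .build();

        http.send(request, HttpResponse.BodyHandlers.ofString());
    }

    private String toJsonString(String s) {
        // minimal escaping for the sketch; a real implementation should use a JSON library
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }
}
```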
3. SSE Broadcast
- Event: QA_COMPLETE sent via PipelineStreamService
- Payload: Overall score, per-layer scores, critical finding count
- Dashboard: Real-time update on requirement detail page — QA section appears with expandable layer results
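PipelineStreamService itself isn't shown in this post; a plausible shape, assuming Spring MVC SseEmitters held per connected dashboard, could look like this. The payload fields follow the list above; everything else is an assumption.

```java
import org.springframework.stereotype.Service;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

@Service
class PipelineStreamService {

    private final List<SseEmitter> emitters = new CopyOnWriteArrayList<>();

    SseEmitter subscribe() {
        SseEmitter emitter = new SseEmitter(0L);          // no server-side timeout
        emitters.add(emitter);
        emitter.onCompletion(() -> emitters.remove(emitter));
        return emitter;
    }

    void broadcastQaComplete(String reqId, double overallScore,
                             Map<String, Double> layerScores, long criticalCount) {
        Map<String, Object> payload = Map.of(
                "reqId", reqId,
                "overallScore", overallScore,
                "layerScores", layerScores,
                "criticalFindings", criticalCount);

        for (SseEmitter emitter : emitters) {
            try {
                emitter.send(SseEmitter.event().name("QA_COMPLETE").data(payload));
            } catch (Exception e) {
                emitters.remove(emitter);                 // drop dead connections
            }
        }
    }
}
```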
Composite Scoring
| Component | Weight | Range |
|---|---|---|
| Structure | Gate (must pass) | Pass / Fail |
| Security | 30% | 0–10 |
| Functional | 30% | 0–10 |
| Accessibility | 25% | 0–10 |
| Performance | 15% | 0–10 |
| Overall | 100% | 0–10 |
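Put together, the composite score reduces to a weighted sum behind the structure gate. Treating a failed gate as a zero overall score is an assumption here; the post only says the pipeline short-circuits with a report.

```java
class CompositeScore {

    static double overall(boolean structurePass, double security, double functional,
                          double accessibility, double performance) {
        if (!structurePass) {
            return 0.0;   // gate failed: pipeline already short-circuited with a report
        }
        return 0.30 * security
             + 0.30 * functional
             + 0.25 * accessibility
             + 0.15 * performance;   // each input is on the 0–10 scale
    }
}
```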