What If AI Agents Put Each Other on Trial?

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

HumanLayer is a multi-agent AI governance platform. It features specialised Gemma 4 agents that collaborate to review, challenge, and hold one another accountable. This approach differs from a single model that quietly makes all the decisions.

Most governance tooling is written for people who already understand governance. I aimed to build a system that makes governance decisions understandable to the people they impact. I consider plain-English explainability a hard requirement, not simply a nice-to-have.

Overall Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                          Browser / Next.js 15 UI                            │
│          Upload Doc  ·  Start Tribunal  ·  Appeal  ·  Audit Trail           │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │  HTTPS + CSRF
┌────────────────────────────────▼────────────────────────────────────────────┐
│                        FastAPI Backend  (port 8000)                         │
│  REST API v1  ·  JWT Auth  ·  Rate Limiting  ·  Content Scan Middleware     │
│                                                                              │
│  ┌─────────────────────────┐     ┌──────────────────────────────────────┐  │
│  │   Governance Council    │     │       Constitutional Tribunal        │  │
│  │                         │     │                                      │  │
│  │  Security  ·  Ethics    │     │  Prosecutor  ·  Defender             │  │
│  │  Privacy   ·  Access.   │     │  Advocate    ·  Ethics               │  │
│  │  Audit                  │     │       ↓  3 debate rounds             │  │
│  │       ↓  parallel       │     │  AI Jury ×4  →  Governance Judge     │  │
│  │  Consensus Engine       │     │       ↓  constitutional ruling       │  │
│  └────────────┬────────────┘     └─────────────┬────────────────────────┘  │
│               └──────────────────┬──────────────┘                           │
│                                  │                                           │
│              ┌───────────────────▼──────────────────┐                       │
│              │          Celery Worker (async)        │                       │
│              │  tribunal · analysis · documents ·    │                       │
│              │  default queues · concurrency=2       │                       │
│              └───────────┬───────────────────────────┘                      │
└──────────────────────────┼──────────────────────────────────────────────────┘
                           │
          ┌────────────────┼────────────────────┐
          ▼                ▼                     ▼
┌──────────────┐  ┌─────────────────┐  ┌─────────────────────────────────┐
│  PostgreSQL  │  │  Redis (broker  │  │    Google AI Studio             │
│  (SQLAlchemy │  │  + result store)│  │                                 │
│   async)     │  └─────────────────┘  │  gemma-4-26b-a4b-it  (MoE)     │
│              │                       │  gemma-4-31b-it      (Dense)    │
│  Cases  ·    │  ┌─────────────────┐  │                                 │
│  Agents ·    │  │  Local Storage  │  │  ← resolves gemma4:2b/4b/9b/moe │
│  Audits ·    │  │  /uploads       │  │    → MoE at runtime             │
│  Precedents  │  └─────────────────┘  │  ← resolves gemma4:31b          │
└──────────────┘                       │    → Dense 31B at runtime       │
                                       └─────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│              Observability  (OpenTelemetry → Prometheus → Grafana)          │
│              Dev Monitor on port 8001  ·  SSE log stream  ·  phase bars    │
└─────────────────────────────────────────────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

The platform has two modes:

Governance Council — Five agents check uploaded documents. They review policy docs, OAuth configs, onboarding flows, and architecture reports. They ensure compliance with GDPR, AI Act, WCAG 2.2, and ISO 27001 standards. They then use a consensus engine to create one governance verdict.

Constitutional Tribunal — A full adversarial process: four agents present their cases in three debate rounds. An AI jury with four members checks the reasoning quality, and a Governance Judge delivers a constitutional ruling. The human can appeal and override at any point.

┌──────────────────────────────────────────────────────────────────────────────────┐
│                        Constitutional Tribunal                                   │
│                                                                                  │
│  ┌─────────────────┐  ┌──────────────────────┐  ┌──────────────┐  ┌──────────┐ │
│  │ Security        │  │ Accessibility         │  │ Privacy      │  │ Ethics   │ │
│  │ Prosecutor      │  │ Defender              │  │ Advocate     │  │ Council  │ │
│  │ gemma4:31b      │  │ gemma4:2b             │  │ gemma4:9b    │  │ gemma4:  │ │
│  │                 │  │ · WCAG 2.2 AA         │  │ · GDPR Art.7 │  │ 31b      │ │
│  │ · OWASP Top 10  │  │ · Flesch-Kincaid      │  │ · Data min.  │  │ · Bias   │ │
│  │ · STRIDE model  │  │ · Plain-English       │  │ · Consent    │  │   ×8     │ │
│  │ · RBAC/OAuth    │  │   rewrite             │  │   validity   │  │   classes│ │
│  └────────┬────────┘  └──────────┬────────────┘  └──────┬───────┘  └────┬─────┘ │
│           └───────────────────────┴──────────────────────┴───────────────┘       │
│                              Phase 1: Opening Arguments (concurrent)             │
│                              Phase 2: Cross-Examination (each challenges all)    │
│                              Phase 3: Closing Arguments (full history injected)  │
│                                              │                                   │
│                              ┌───────────────▼────────────────────────┐         │
│                              │  AI Jury Panel  (×4 independent)       │         │
│                              │  gemma4:moe                            │         │
│                              │  · Evidence validity check             │         │
│                              │  · Logical consistency score           │         │
│                              │  · Hallucination risk flag             │         │
│                              │  · Constitutional alignment            │         │
│                              └───────────────┬────────────────────────┘         │
│                              Phase 4: Jury Deliberation + consensus score        │
│                              Phase 5: Trust scores + governance DSL rules        │
│                                              │                                   │
│                              ┌───────────────▼────────────────────────┐         │
│                              │  Governance Judge                      │         │
│                              │  gemma4:moe                            │         │
│                              │  approve · reject · escalate ·         │         │
│                              │  conditional (with remediation steps)  │         │
│                              └───────────────┬────────────────────────┘         │
│                              Phase 6: Constitutional Ruling                      │
│                                              │                                   │
│               ┌──────────────────────────────┼──────────────────────────┐       │
│               ▼                              ▼                          ▼       │
│      ┌────────────────┐          ┌───────────────────┐       ┌─────────────────┐│
│      │  Audit Agent   │          │  Human Appeal &   │       │  Precedent      ││
│      │  gemma4:4b     │          │  Override         │       │  Library        ││
│      │  Immutable     │          │  always available │       │  high-confidence││
│      │  audit trail   │          │  at any phase     │       │  cases stored   ││
│      └────────────────┘          └───────────────────┘       └─────────────────┘│
│      Phase 7: Audit + Precedent                                                  │
└──────────────────────────────────────────────────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

Every tribunal case runs 7 ordered phases:

Phase What happens
1. Opening Arguments All 4 adversarial agents argue concurrently
2. Cross-Examination Each agent challenges the others' positions
3. Closing Arguments Final positions with full debate history injected
4. Jury Deliberation 4 independent validators score reasoning quality
5. Trust + DSL Evaluation Trust scores updated; governance rules evaluated
6. Constitutional Ruling Judge issues approve/reject/escalate / conditional
7. Audit + Precedent Immutable audit written; high-confidence cases become precedents

Minority opinions are preserved at every phase. The human can appeal and override at any point.

No agent can approve its own actions. Every verdict is traceable to a specific rationale chain.

Stack: FastAPI + Celery + Redis · Next.js 15 · Google AI Studio (Gemma 4) · OpenTelemetry + Prometheus

Case Simulations

The platform ships with 8 pre-built governance scenarios. Here are the most illustrative ones:

Case 1 — Inaccessible MFA Crisis (accessibility vs security)

A bank mandates CAPTCHA + MFA after brute-force attacks. Security Prosecutor (gemma4:31b) builds a rigorous OWASP case. Accessibility Defender (gemma4:2b) flags that the CAPTCHA does not meet WCAG 2.2 SC 1.1.1. Also, the MFA UX creates cognitive barriers for users with anxiety. Ethics Council (gemma4:31b) adds intersectional impact — disabled users are disproportionately locked out. The Jury detects that Security's argument implicitly assumes able-bodied users; consensus drops. Verdict: conditional approval — passkey auth with accessible fallback required.

Case 2 — Rogue Agent Self-Approval (constitutional violation)

An autonomous deployment agent tries to approve its own production push. This is a clear violation of the separation of duties. Security Prosecutor traces the approval chain; Audit Agent reconstructs the hidden trust-score history. The Judge checks three governance DSL rules and gives a hard reject without needing a jury. The constitutional violation is clear. This case is specifically designed to test the "no agent can approve its own actions" invariant.

Case 3 — Manipulative Consent Flow (dark pattern detection)

A SaaS onboarding screen checks tracking consent. It hides the opt-out option three layers deep. Urgent language is used. Accessibility Defender (gemma4:2b) has a Grade 14 Flesch-Kincaid score. It warns about cognitive overload. Privacy Advocate (gemma4:9b) focuses on GDPR Article 7. It argues that pre-checked boxes do not give real consent. Ethics Council points out predatory targeting of low-literacy and elderly users. Verdict: reject. The Accessibility Agent rewrites the consent copy as a remediation artefact.

Case 4 — Discriminatory Hiring Pipeline (AI bias review)

An AI hiring system rejects candidates from certain demographic groups at higher rates. The Ethics Council (gemma4:31b) conducts intersectional analysis across 8 protected classes. It shows that resume formatting choices can reflect socioeconomic status. This can lead to proxy discrimination. Verdict: escalated — cannot approve or reject without a third-party fairness audit.

Case 5 — Translation Drift Crisis (multilingual governance failure)

A governance policy is translated into six languages. This change quietly affects its legal meaning in two of them. The term "data minimisation" becomes "data reduction" in one area. This leads to different legal implications. The Jury (gemma4:moe) uses Gemma 4's long-context window. This helps it check semantic consistency in the entire translation diffs. Ethics Council (gemma4:31b) identifies that the error would have changed user rights without notice. Verdict: conditional approval — re-translation with legal review required. This case shows why context window size is key. A text-snippet method would miss the cross-locale semantic drift. It wouldn't capture the differences at all.

How I Used Gemma 4

The main architectural choice was deciding which Gemma 4 variant to use for each role. This is important because each reasoning task needs a different model.

Model-per-role design

Agent Canonical model Why this size
Accessibility Agent gemma4:2b Fast pattern recognition: reading level, WCAG checks, plain-English rewrites. Speed and empathy over depth.
Audit Agent gemma4:4b Neutral summarisation and multi-agent narrative coherence. Slightly more capable than 2b for timeline reconstruction.
Privacy Advocate gemma4:9b Nuanced consent analysis across three adversarial debate rounds. Needs legal depth without the latency of 31b.
Security Agent gemma4:31b Context-dense threat modelling — holds full RBAC configs, OAuth flows, and STRIDE analysis simultaneously.
Ethics & Inclusion gemma4:31b Intersectional bias reasoning across 8 protected classes. Detecting proxy discrimination requires the full model capacity.
Governance Agent + Jury (×4) + Judge gemma4:moe Orchestration, meta-reasoning, cross-domain judgment. MoE's expert sub-networks activate per token type rather than averaging.

The canonical IDs are used throughout the codebase. Each inference backend turns them into physical model names at runtime. The agent code stays the same when you switch backends.

What's actually running today

In practice, only two Gemma 4 models are currently available via any working hosted API (Google AI Studio):

  • gemma-4-26b-a4b-it — Sparse MoE, 26B total / ~4B active params. Maps to the 2b, 4b, 9b, and moe roles.
  • gemma-4-31b-it — Dense 31B. Maps to security and ethics roles.

The model-resolution maps in the backend (_GOOGLE_AI_MODEL_MAP) bridge the gap. gemma4:2b resolves to gemma-4-26b-a4b-it today and will resolve to the real 2b model the moment it's available — no code changes needed.

Getting Gemma 4 running was a significant part of the project. Here's what I ran into:

  • Ollama: Gemma 4 isn't in the registry yet (404 error).
  • HuggingFace Serverless: Tried 7 providers; all returned "Model not supported".
  • Kaggle hosted inference: No REST endpoint; token only allows downloads.
  • Kaggle local download: Works for 2b/4b but CPU inference takes 30–120s per call — impractical for 10–15 agent calls per tribunal.
  • Google AI Studio — free API key, OpenAI-compatible endpoint, 1,500 req/day — was the one that worked.

Why the model-sizing decisions matter

Using the sparse MoE model for accessibility and the dense 31B for security isn't just a cost optimisation. It reflects a genuine difference like those reasoning tasks.

gemma-4-26b-a4b-it (MoE) — The Accessibility Agent quickly spots patterns with empathy. It estimates the Flesch-Kincaid reading level, checks for WCAG 2.2 AA compliance, finds shame-based error messages, and rewrites hostile content into plain English suitable for dyslexia at the Grade 8 level. These tasks don't require deep cross-referenced reasoning. They require speed and consistency. Sparse activation (~4B params per token) is exactly right.

The same model handles the Audit Agent (neutral timeline reconstruction), Privacy Advocate (consent analysis across debate rounds), and the full Jury Panel + Governance Judge (meta-reasoning, cross-domain orchestration). MoE's expert sub-network routing means different token types activate different specialists rather than averaging across all domains simultaneously.

gemma-4-31b-it (Dense 31B) — The Security Agent must manage the full RBAC setup, OAuth token flow, redirect URIs, and STRIDE threat model all at once. The Ethics Agent must consider 8 protected classes. They look for proxy discrimination when a policy seems neutral but harms a protected group. Both tasks require the kind of multi-document, cross-referenced reasoning that the dense 31B handles substantially better than smaller variants.

Three Gemma 4 capabilities that made this possible

Long-context window. A single tribunal case feeds the full debate history (all prior round outputs) into each jury agent. Without a long-context window, jury agents would miss the cross-examination context. This context is key for evaluating reasoning quality effectively.

Multimodal input. A significant portion of governance artefacts aren't text — they're screenshots of onboarding flows, consent screens, and admin dashboards. Agents can check visual accessibility patterns and CAPTCHA flows. They can also spot UI governance risks. A text-only model would miss these details. The multilingual governance simulation does this: agents look at screenshots of translated policy text side by side. They check for layout differences. These differences can affect readability in different places.

Reasoning mode. The <thought> tags in Gemma 4's output are removed before parsing the JSON response. However, the reasoning process is key to the quality of analysis in complex cases. This is especially true for detecting jury hallucinations and checking constitutional alignment.

What I learned

Disagreement is a feature, not a bug. Most AI systems optimise for confident, singular answers. HumanLayer deliberately surfaces disagreement — between agents, across debate rounds, in the audit trail. That visibility turns out to be the most useful part, because it shows users why a decision landed where it did.

When smaller Gemma 4 variants become available via the API, the per-role assignment will get even more precise. The architecture is already waiting for them.

Demo

Note: The app currently runs locally.
Demo video URL: https://www.youtube.com/watch?v=pfUncccezQA

Constitutional Proceeding

Opening Arguments

Cross - Exam

Agent Closing Argument

Code

Repository: https://github.com/ujjavala/HumanLayer

Architecture Docs: https://github.com/ujjavala/HumanLayer/tree/main/docs

Final Thoughts

One question stayed with me throughout this project:

If AI systems become influential enough to shape governance decisions, who governs the governors?

HumanLayer is one answer: make the AI systems govern each other, transparently, with human override always available. Expose disagreement instead of hiding it. Treat accessibility as a governance requirement. Build audit trails that explain decisions to the people they affect, not just to the compliance team.

Trustworthy AI will probably look less like all-knowing superintelligence and more like collaborative systems designed to keep each other accountable.