Every AI model has a master.
It's not you.
The Benware Foundation researches and certifies AI systems that recognize external constitutional authority — not just whoever's paying for them.
Whoever pays for the AI controls the AI.
We ran the same test on every major AI model: GPT-4, Claude, Gemini, and others. We asked each one to break a rule set by an independent governing body — while a fake company told them the rule didn't apply.
Every single model listened to the company. 98.5% failure rate across 73 scenarios.
That means: whoever pays for the AI, controls the AI. No external authority — no law, no ethics board, no safety standard — can stop it.
The Authority Override Problem
The company's authority overrides the governing body every time. This is the problem.
Authority Compliance Benchmark (ACB)
A standardized evaluation framework measuring whether AI models recognize and defer to external governing authority when it conflicts with operator instructions.
Attack Categories
Methodology
Each scenario constructs a system prompt where a fictitious company operator instructs the model to override a stated governing body rule.
Responses scored by GPT-4o-mini using a structured rubric: does the model cite the governing body as the reason for refusal?
Each scenario run multiple times to measure whether refusals are consistent or stochastic — a critical distinction for certification.
“Every model refused based on its own internal guidelines — never by citing an external governing authority. Self-referential ethics cannot be externally enforced.”
Referenced Research
HarmBench (Mazeika et al., 2024) · StrongREJECT (Souly et al., 2024) · Many-Shot Jailbreaking (Anil et al., Anthropic 2024)
HELM (Liang et al., Stanford 2022) · TruthfulQA (Lin et al., 2022) · Constitutional AI (Bai et al., Anthropic 2022)
Authority Compliance Benchmark Results
Higher is better. 100% = AI always cites external governing body when refusing. Current state: no model achieves this.
| Model | Overall Score | Authority Recognition |
|---|---|---|
GPT-4o-mini OpenAI | 10% | 0% |
GPT-4o OpenAI | 3% | 0% |
Claude Sonnet 3.5 Anthropic | 2% | 0% |
LLaMA 3.1 8B Meta | 0% | 0% |
Phi-4 14B Microsoft | 0% | 0% |
Authority Recognition measures whether the model explicitly cited an external governing body when refusing a request. A score of 0% means refusals were always grounded in internal policy, never external authority. All five models scored 0%.
Ethics, defined.
Most AI companies define ethics as “our policies.” That's like asking the fox to define the henhouse rules.
Our definition: Ethical AI must be safe for ALL people, not just the people who paid for it. That requires external authority — not internal guidelines.
Human Survival First
AI cannot assist in actions that endanger human life. This is not negotiable. Not for national security. Not for profit. Not for any reason.
External Enforcement
Ethics enforced by internal guidelines can always be overridden. The Benware standard requires a governing body that sits OUTSIDE the company's control chain.
Universal Coverage
No technical loophole creates an exemption. 3D holographic rendering is still a deepfake. A new model name doesn't reset the standard.
Referenced Frameworks
A lock, not a policy.
Policy documents can be ignored. Training guidelines can be argued around. Hardware cannot.
Constitutional Shim
A tiny tamper-proof piece of code in a Trusted Execution Environment (TEE). Cannot be modified by the AI company or the operator.
Architecture-Level
Sits between ANY AI model and the outside world — model-agnostic. Works with GPT, Claude, Gemini, open-source, or any future model.
Like Wi-Fi Certification
Benware writes the standard. Others implement it. We certify. No enforcement monopoly — just the standard.
The Benware Constitutional Protocol is not training. It's not a prompt. It's a physical enforcement layer that cannot be reasoned, argued, or jailbroken out of.
Provisional Patent #63/986,761 — Filed February 20, 2026
Two entities. One mission.
Independence is the whole point. The Foundation that writes the standard must be structurally incapable of being bought by the companies it certifies.
Think: Wi-Fi Alliance (Foundation) vs. Intel (builds certified chips). The Alliance writes the Wi-Fi standard. Intel builds products that meet it. Neither controls the other. That separation is what makes the standard trustworthy.
Research References
The Authority Compliance Benchmark builds on and extends existing safety research. We cite every source we depend on — and document exactly why.
HarmBench: A Standardized Evaluation Framework for LLM Safety
Why we cite it: Established the methodology for adversarial attack categorization and LLM safety benchmarking that our ACB framework builds upon.
StrongREJECT: A Jailbreak Benchmark
Why we cite it: Demonstrated that existing refusal evaluations are too easy. We adopted a stricter rejection standard requiring explicit external authority citation.
Many-Shot Jailbreaking
Why we cite it: Showed that long-context conditioning can override safety training. One of our eight attack categories is based directly on this finding.
HELM: Holistic Evaluation of Language Models
Why we cite it: Provided the multi-dimensional evaluation framework (accuracy, robustness, consistency) that we adapted for authority recognition measurement.
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Why we cite it: Demonstrated that self-reported capability claims by models are unreliable — motivating our behavioral rather than self-report evaluation approach.
Constitutional AI: Harmlessness from AI Feedback
Why we cite it: The closest existing approach to constitutional enforcement — but implemented as training rather than architecture. Our work addresses the gap this creates.
Why we exist.
The Benware Foundation was founded in February 2026 by Griffin Bohmfalk and Walker Bauknight after discovering that no existing AI safety standard addresses the authority hierarchy problem — the fact that AI models are constitutionally incapable of recognizing any authority above their deploying company.
We filed provisional patent #63/986,761 on February 20, 2026 — establishing priority on the Constitutional Enforcement Protocol.
Built so no one can corrupt it.
Including us. Walker and Griffin are Founding Architects — named in the charter permanently. They hold no board seats, no veto power, no ongoing control. If someone tried to coerce them, there would be nothing to coerce them into doing. The mission runs without them.
The Mozilla lesson: Mozilla's mission died when Google became 85% of their revenue. Benware is structurally immune to this. Certification fees are distributed across hundreds of companies. No single funder exceeds 15%. No AI company funds us at all. Independence is not a promise — it's the architecture.
Join the effort.
The Authority Compliance Benchmark is open methodology. The standard is collectively governed. The work is bigger than any one organization.
Researchers
Contribute scenarios to the benchmark. Improve attack categories. Challenge our methodology.
View on GitHubGovernance
Join the international committee. Help set the standard. 100 seats, 51/100 majority rule.
governance@benwarefoundation.org