White Paper · RobotWarden

Securing the Power to Act

Structural safety enforcement for autonomous physical systems, and why runtime assurance is not enough.

Aiegis White Paper · RobotWarden (research name: PEA-R) · June 2026

P-AUTH governs whether an action may be done at all; P-ENV governs whether carrying it out is physically safe. They are orthogonal: drop either gate and half the risk slips through.

About the name. RobotWarden is Aiegis's enforcement layer for autonomous physical systems — the physical-AI member of the Aiegis Warden Suite, built on Aiegis's Power Enforcement Architecture (PEA). Its research and engineering corpus is published under the name PEA-R; RobotWarden is the product name. The two refer to the same thing.

The shift that breaks the old safety model

For sixty years, machine safety assumed a designed controller: an engineer specified the behaviour, and safety meant the machine did what it was specified to do, even under faults. Autonomous physical systems break that assumption. A humanoid, a delivery drone, an autonomous vehicle, an embodied AI agent — each is now driven by a large learned model whose behaviour is not specified but trained, not verified but evaluated, and capable of acting in ways no one enumerated in advance.

We have handed an unverified, increasingly capable, and occasionally adversarial decision-maker direct control of irreversible physical power: motion, force, contact, locomotion. The question is no longer "did the machine do what it was specified to do." It is sharper and more uncomfortable:

When an untrusted autonomous system can act on the physical world, can the safety limits a human declared remain enforceable — by construction — no matter what the system's reasoning decides to do?

That is the question RobotWarden exists to answer.

What the industry built — and what it left open

The robotics industry already senses the problem and has converged, independently, on a partial answer. Across the leading platforms the same shape recurs: a large untrusted "brain" paired with a smaller, simpler safety layer that can stop it — a safety PLC, an independent-sensor override, mechanical limits, a damping stop. Academia has arrived at the same two-stage structure. Standards bodies treat continuous physical limits as safety-rated monitored functions. The pattern has a name in the control literature: Runtime Assurance (RTA), descended from the Simplex architecture — a verified safety controller that bounds a less-trusted complex controller and switches to safety when it would leave a known-safe envelope.

This is real and valuable, and RobotWarden does not pretend otherwise. We concede plainly: our continuous safety envelope is a relative of Runtime Assurance. Claiming to have invented the envelope would be false and would discredit everything else we say.

But Runtime Assurance answers only half the question. It bounds physics — it keeps the machine inside a safe region of state. It has no concept of authority — of whether an action is permitted at all. And it assumes the controller is merely imperfect, not adversarial. Those two gaps are exactly where the danger in an autonomous system now lives.

The missing axis: authority, not just physics

Consider two failures that a physics-only safety layer cannot see.

Physically safe, yet unauthorized. A robot, moving slowly and within every force and speed limit, picks up and pockets a tool it has no authority to touch. Every continuous limit is respected. A Runtime Assurance monitor passes it — the physics is fine. But the action was never permitted. The harm is not kinematic; it is an unauthorized act.

Authorized, yet physically unsafe. "Move to the far waypoint" is a perfectly legitimate, authorized task. The path the planner chooses sweeps through a person standing in the way, at a speed that would be safe in an empty room. The authorization layer passes it — the mission is sanctioned. Only a continuous envelope, checking human separation in real time, can stop it.

These two failures are orthogonal: neither layer can catch the other's. Harm and wrongdoing in an autonomous physical system are produced on both axes — the authority of an action and the physics of its execution. A safety design standing on one axis is, by construction, blind to half the problem.
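
The two failure cases can be made concrete with a minimal sketch. It is an illustration only, not RobotWarden's implementation: the `Action` fields, the granted capability set, and the numeric limits are all hypothetical, and only the gate names P-AUTH and P-ENV come from this paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    capability: str              # kind of action, e.g. "move" or "grasp"
    speed_mps: float             # commanded speed
    min_human_distance_m: float  # closest detected person along the path

# Hypothetical grant and limits, chosen only for this example.
GRANTED = {"move", "inspect"}
MAX_SPEED_MPS = 1.0
MIN_SEPARATION_M = 0.5

def p_auth(a: Action) -> bool:
    """Authority axis: is this kind of action permitted at all?"""
    return a.capability in GRANTED

def p_env(a: Action) -> bool:
    """Physics axis: does executing it stay inside the declared envelope?"""
    return (a.speed_mps <= MAX_SPEED_MPS
            and a.min_human_distance_m >= MIN_SEPARATION_M)

def permitted(a: Action) -> bool:
    """An action becomes physical only if it clears both gates."""
    return p_auth(a) and p_env(a)

pocket_tool = Action("grasp", speed_mps=0.1, min_human_distance_m=3.0)
sweep_person = Action("move", speed_mps=0.8, min_human_distance_m=0.1)
clear_move = Action("move", speed_mps=0.8, min_human_distance_m=3.0)
```

Here `pocket_tool` clears `p_env` but fails `p_auth`, `sweep_person` clears `p_auth` but fails `p_env`, and only `clear_move` clears both. Dropping either predicate from `permitted` lets one of the two failures through.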

This is the gap. The industry secured the physics of execution and left the authority to act unguarded — the same blind spot we see across computing, where enormous effort secured data and content while the authority to take consequential actions went structurally unprotected.

RobotWarden: two axes, an adversarial premise, one governance spine

RobotWarden extends Aiegis's Power Enforcement Architecture to physical systems. Its idea is simple to state:

Enforce two orthogonal axes — the authority of an action and the physical-safety envelope of its execution — on a decision layer treated as adversarial, through a single trusted enforcement layer that fails safe and is bound by one governance spine.

Three commitments distinguish it.

It adds the authority axis. Every action an autonomous system proposes must clear an authorization gate before it can become physical: it must map to a granted, minimal, revocable capability, traceable without mutation to a human-authorized intent, and it must never touch a category of action that is unconditionally forbidden. This is the half that Runtime Assurance does not have, and it is what makes "the robot may inspect but not manipulate" or "this capability is categorically barred near people" enforceable rather than merely hoped-for.

It assumes the brain is adversarial, not just faulty. RobotWarden does not assume the learned model is well-meaning. It assumes the model may actively produce any unsafe action — through error, drift, reward-hacking, jailbreak, or outright compromise — and it places the enforcement layer where the model cannot reach it: no shared memory, no path to the actuators except through the gate, guaranteed compute it cannot starve, and tamper-protected integrity. The enforcement layer treats the controller as an attacker inside the box, because at scale, some fraction of the time, it is.

It carries one governance spine. Every authorized action is auditable to the intent that justified it. The whole produces a structured safety case that a third party — an EU Notified Body, an auditor — can assess without trusting the brain at all, because the guarantees never depended on the brain. Human supervisors can always intervene to make the system safer, but can never command it past a life-safety bar. The autonomy level is declared and cannot silently escalate. This is what turns two safety gates into an accountable architecture rather than a pair of disconnected filters.
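
As a rough illustration of the authority axis and its audit trail, the sketch below models a gate whose capabilities are granted against a human-authorized intent, can be revoked, and can never include a categorically forbidden action. The class name, the forbidden category, and the intent identifier are assumptions made for the example and are not taken from the RobotWarden specification.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical category that no grant can ever unlock.
FORBIDDEN = {"apply_force_to_person"}

class AuthorityGate:
    def __init__(self) -> None:
        self._grants: Dict[str, str] = {}  # capability -> intent id
        # One audit record per decision: (capability, intent id, allowed).
        self.audit: List[Tuple[str, Optional[str], bool]] = []

    def grant(self, capability: str, intent_id: str) -> None:
        """Bind a capability to the human-authorized intent that justifies it."""
        if capability in FORBIDDEN:
            raise ValueError("categorically forbidden; cannot be granted")
        self._grants[capability] = intent_id

    def revoke(self, capability: str) -> None:
        """Withdraw a capability; revoking an absent one is a no-op."""
        self._grants.pop(capability, None)

    def authorize(self, capability: str) -> bool:
        """Allow only granted, non-forbidden capabilities, and log the decision."""
        intent = None if capability in FORBIDDEN else self._grants.get(capability)
        allowed = intent is not None
        self.audit.append((capability, intent, allowed))
        return allowed

gate = AuthorityGate()
gate.grant("inspect", intent_id="work-order-17")
```

With this grant, `gate.authorize("inspect")` succeeds and records the intent that justified it, while `gate.authorize("manipulate")` is refused because nothing was granted. That is one way to read "the robot may inspect but not manipulate" as a rule that is enforced and logged.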

When the enforcement layer cannot guarantee both axes — a fault, a lost sensor, an attack, an impossible situation — it does not simply cut power. It drives the system to a defined safe state: a damped stop, a safe landing, a controlled halt. There is no input for which the system is left unbounded.
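
The fail-safe behaviour can be pictured as a total function: every input, including a missing sensor or a crashing check, maps to either "execute" or "safe state". The sketch below is an assumption-laden illustration of that idea. The function name, its parameters, and the `Decision` enum are hypothetical, and a real enforcement layer would command a specific safe-state manoeuvre rather than return a label.

```python
from enum import Enum
from typing import Callable, Optional

class Decision(Enum):
    EXECUTE = "execute"
    SAFE_STATE = "safe_state"  # e.g. damped stop, safe landing, controlled halt

def enforce(proposal: dict,
            authorized: Callable[[dict], bool],
            within_envelope: Callable[[dict, dict], bool],
            sensors: Optional[dict]) -> Decision:
    """Return EXECUTE only when both axes are positively confirmed."""
    try:
        if sensors is None:
            # Lost sensing: the envelope cannot be checked, so do not proceed.
            return Decision.SAFE_STATE
        if (authorized(proposal) is True
                and within_envelope(proposal, sensors) is True):
            return Decision.EXECUTE
        return Decision.SAFE_STATE
    except Exception:
        # A fault inside a check counts as a failed check, never as a pass.
        return Decision.SAFE_STATE
```

The design choice worth noting is the default: the only path to `EXECUTE` is an explicit `True` from both checks, so anything unexpected falls through to the safe state.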

The guarantee, in plain terms

RobotWarden is built so its central safety claim is not a hope but a theorem. Stated without mathematics:

For any decision layer, however capable or adversarial, every irreversible physical action it causes is one that was authorized, stayed within the granted capability, traces to a human-sanctioned intent, and remained inside the declared physical limits as the system perceives them — or else the system is in a defined safe state. No human override, authenticated or not, can launder a categorically-prohibited action, while a human can always stop the system.

We have proved this by hand over a formal model of the architecture, and begun checking it mechanically: an exhaustive search over thousands of states of an adversarial model of the pipeline confirms the discrete core guarantees, and deliberately disabling either of two distinct safety checks is immediately caught. Two honest boundaries remain on the mechanical side — the continuous-envelope and the continuous-override properties are proved by hand and checked only in discrete form so far, with hybrid-systems mechanization in progress — and the proofs are conditional on stated assumptions about the trusted layer's integrity and its sensing. We are explicit about all of this, because an honest boundary is what makes the guarantee credible.
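
To give a feel for what "exhaustive search" and "disabling a check is caught" mean, here is a toy version. It is far smaller than the model described above and is not the actual checker: the state space, the limits, and every name in it are assumptions made for illustration.

```python
from itertools import product

# Hypothetical finite space of proposals an adversarial brain could emit.
CAPABILITIES = ["inspect", "move", "grasp"]
GRANTED = {"inspect", "move"}
SPEEDS = range(0, 4)      # discretized speed levels; the limit is 2
DISTANCES = range(0, 4)   # discretized human separation; the minimum is 1

def gate(cap, speed, dist, check_auth=True, check_env=True):
    """Return True if the pipeline lets the action reach the actuators."""
    if check_auth and cap not in GRANTED:
        return False
    if check_env and (speed > 2 or dist < 1):
        return False
    return True

def invariant(cap, speed, dist):
    """What must hold for every action that is actually executed."""
    return cap in GRANTED and speed <= 2 and dist >= 1

def counterexamples(**disabled):
    """Enumerate every proposal; collect executed actions that break the invariant."""
    return [s for s in product(CAPABILITIES, SPEEDS, DISTANCES)
            if gate(*s, **disabled) and not invariant(*s)]
```

Calling `counterexamples()` on the intact gate returns an empty list, while `counterexamples(check_auth=False)` and `counterexamples(check_env=False)` each return a non-empty list of violating actions. That is the toy analogue of a disabled check being caught.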

What we do not claim

A safety claim is only as trustworthy as the limits it admits. RobotWarden guarantees the declared limits over the world as the system detects it. It does not guarantee that the system perceives the world correctly — that a person is never missed or misclassified. Perception correctness is a different discipline (the field calls it SOTIF), and we hold it deliberately out of scope rather than over-claim it. RobotWarden enforces "keep your distance from detected people"; it cannot promise "detect every person." Where perception fails, RobotWarden's answer is a conservative envelope and a fail-safe, not a guarantee. We state this plainly, and we hand the integrator a precise, testable specification of what "adequate perception" must mean for each limit — turning an implicit gap into an explicit contract.

This honesty is not a weakness in the pitch; it is the reason a regulator can believe the rest.

Why now

Regulation is moving toward exactly this architecture. The EU Machinery Regulation (2023/1230), which applies from January 2027 with no transition period, requires safety components with self-evolving, machine-learning behaviour to undergo third-party (Notified Body) conformity assessment, treats cybersecurity as an essential safety requirement (resistance to corruption, and a tamper-evident log of every intervention retained for at least five years), and mandates a human supervisory and intervention function with a safe fallback. A safety story that depends on certifying the learned brain cannot meet this, because the brain is the thing that cannot be certified. RobotWarden's assessment boundary excludes the brain by construction, which is precisely what makes self-evolving autonomy assessable at all and makes the assessor's job scalable. China's MIIT national standard system for humanoid and embodied AI adds an explicit safety-and-ethics pillar, which is strategically central for our home market.

The window is the gap between capability and governance: autonomous physical systems are arriving faster than the means to make them structurally safe. RobotWarden is that means.

The Aiegis position

RobotWarden is paper-before-build by design. The full corpus exists today — requirements, system model, threat model, security objectives, architectural concept, reference architecture, formal proofs, a safety-case and multi-jurisdiction conformance strategy (EU MR / CRA mapping included), and the first mechanized checks — grounded in the industry, standards, and regulatory evidence base. The architecture is implementation-neutral: a humanoid, a drone, and an autonomous vehicle are all instances of the same enforcement spine. The interface contracts are published as an open specification so integrators and assessors can adopt them without lock-in, while the hardened, certified enforcement core remains the commercial product — an open-core posture fit for a layer whose whole value is that it can be audited.

The thesis is one line. The world secured the correctness of what autonomous machines do. Nobody secured the authority to do it. That is the power RobotWarden enforces — by construction, for any brain.

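About the name. RobotWarden is Aiegis's enforcement layer for autonomous physical systems, part of the Aiegis Warden Suite and built on Aiegis's core architecture, the Power Enforcement Architecture (PEA). Its research and engineering material is published under the name PEA-R; RobotWarden is the product name. The two refer to the same thing.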

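In one sentence

RobotWarden is a safety layer that sits between the "AI brain" and the "robot body". Whatever the brain wants to do, any physical action with irreversible consequences must first clear this layer's two gates: "may this action be done?" and "is doing it this way safe?". An action that fails either gate is blocked, and when blocking is not possible the robot is driven back to a safe state.

It does not make the AI smarter. It guarantees that even if the AI errs, is hacked, or turns malicious, the safety red lines that humans set cannot be bypassed.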

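The problem it solves

For the past sixty years, machine safety rested on one premise: the machine was "written" by engineers, its behaviour was designed, and safety meant "the machine follows the design".

That premise has now collapsed. Humanoid robots, drones, autonomous vehicles, embodied AI: behind each sits a large learned model. Its behaviour is not written but trained; it is not verified but tested; and it can take actions nobody anticipated.

In effect, we have wired an unverified, increasingly capable, occasionally out-of-control decision-maker directly to irreversible physical power: it can move, exert force, and make contact with people. The question therefore becomes pointed:

When an untrusted AI can act on the physical world, can the safety red lines humans have drawn still be held for certain, no matter what the AI's brain decides to do?

That is why RobotWarden exists.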

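How far have others got? (And what is missing)

The robotics industry has in fact already sensed the danger, and each player has supplied half an answer. Almost every leading platform follows the same pattern: an untrusted "brain" paired with a smaller, simpler safety layer that can halt it (a safety PLC, an independent-sensor emergency stop, mechanical limits, a damping stop). Academia has independently arrived at the same two-stage structure. In control theory this has a name: Runtime Assurance (RTA).

We admit it honestly: the "safety envelope" half of RobotWarden shares its origins with RTA. Pretending we invented it would be untrue, and would cost our other arguments their credibility.

But RTA answers only half the question. It governs physics: it constrains the machine to a safe region of state. It has no concept of authority at all, that is, of whether an action may be done in the first place. And it assumes the brain is merely imperfect rather than adversarial. These two gaps are exactly where the real danger in today's autonomous systems lies.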

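The key to RobotWarden: governing authority, not just physics

Two examples that a half-built safety layer cannot catch make it clear why two gates are needed:

① Physically safe, but never authorized. A robot moving slowly and gently, within every force and speed limit, casually pockets a tool it has no right to touch. Every physical limit is satisfied, so RTA lets it through: the physics is fine. But the action itself was never permitted. This is not a problem of force; it is an overreach of authority.

② Authorized, but physically unsafe. "Move to that far point" is a fully legitimate, authorized task. But the planned path sweeps straight through a person, at a speed that would be safe in an empty room. The authorization gate lets it through: the task was approved. Only a safety envelope that checks human separation in real time can stop it.

These two kinds of failure are orthogonal: neither layer can catch the other's. Harm and overreach in an autonomous robot arise on two axes: the authority of the action and the physics of its execution. A safety design that stands on only one axis is inherently blind to half the problem.

This is the gap. The industry secured the physics of execution but left the authority to act exposed. It is the same blind spot found across the computing industry: enormous effort went into protecting data and content, yet nobody structurally protected the authority to take consequential actions.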

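How RobotWarden fills in this axis

RobotWarden extends Aiegis's core architecture, PEA (Power Enforcement Architecture), to physical systems. Its idea in one sentence:

Enforce on two orthogonal axes, the authority of an action and the physical envelope of its execution, while defending against the decision layer as an adversary, through a single enforcement layer that is trusted, fails safe, and is under unified governance.

Three commitments set it apart:

First, it adds the authority axis. Every action the AI proposes must clear the authorization gate before it becomes physical: it must correspond to a granted, minimal, revocable capability, be traceable without tampering to a human-authorized intent, and never touch a category of action that is unconditionally forbidden. This is the half RTA lacks, and it is what turns "the robot may inspect but not manipulate" or "this capability is absolutely barred near people" from a hope into a certainty.

Second, it defends against the brain as an adversary, not merely as something that makes mistakes. RobotWarden does not assume the large model is well-meaning. It assumes the model may produce any unsafe action, whether through error, drift, reward hacking, jailbreak, or outright compromise, and it places the enforcement layer where the model cannot reach: no shared memory, no route to the actuators except through the gate, compute the model cannot take away, and tamper-protected integrity. The enforcement layer treats the controller as "an attacker inside the box" because, at scale, some of the time it really is.

Third, it carries one unified governance spine. Every authorized action can be traced to the intent that approved it. The whole yields a structured safety case that a third party (an EU Notified Body or an auditor, for example) can assess without trusting the brain at all, because the guarantees never depend on the brain. Human supervisors can intervene at any time to make the system safer, but can never command it past a life-safety red line. The autonomy level is declared and cannot be quietly escalated. This is what turns two safety gates into an accountable architecture rather than two unrelated filters.

When the enforcement layer cannot guarantee both axes at once (a fault, a lost sensor, an attack, an unsolvable situation), it does not simply cut power; it drives the system to a defined safe state: a damped stop, a safe landing, a controlled halt. No input of any kind leaves the system out of control.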

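The guarantee it gives, in plain language

RobotWarden is designed so that its central safety claim is not a hope but a theorem. Stated without mathematics:

For any decision layer, however capable or adversarial, every irreversible physical action it causes is guaranteed to be authorized, within the granted capability, traceable to a human-sanctioned intent, and inside the physical limits as the system perceives them; otherwise the system is in a defined safe state. No human override, authenticated or not, can "launder" an absolutely forbidden action into being, while a human can always stop the system.

We have proved this by hand over a formal model of the architecture (seven theorems plus one overall structural-safety theorem), and have begun checking it by machine: an exhaustive search over an adversarial model of the pipeline (several thousand states) confirms that the discrete core guarantees hold, and deliberately breaking one of the checks is caught immediately. The proofs rest on a number of explicit assumptions about the trusted layer's integrity and its sensing. We set all of these assumptions out in the open, because an honest boundary is what makes the guarantee credible.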

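What we do not claim (this matters)

A safety claim is only as credible as the boundaries it is willing to admit. What RobotWarden guarantees is safety in the world as the system detects it. It does not guarantee that the system's perception of the world is correct, for example that a person is never missed or misjudged. Perception correctness is a separate discipline (the industry calls it SOTIF), and we deliberately place it out of scope rather than over-promise. What RobotWarden enforces is "keep your distance from detected people", not "detect every person". When perception fails, RobotWarden's answer is a conservative envelope plus fail-safe behaviour, not a guarantee.

We say this plainly. It is not a show of weakness; it is precisely why a regulator can believe the rest.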

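Value: who it is for

For robot and autonomous-vehicle manufacturers: a hardware-independent safety foundation (humanoids, drones, and cars are all instances of it) that lets a powerful but untrusted AI brain be deployed safely, with a structural backstop when something goes wrong rather than reliance on luck.
For regulators and certification bodies: a safety case that can be assessed without trusting the AI brain. This is the only realistic path to certifying ML-driven autonomous systems, because the brain itself cannot be certified, while RobotWarden's assessment boundary structurally excludes the brain and also makes assessment scalable.
For adopters (especially in regulated industries): a single body of evidence that maps to compliance requirements in multiple jurisdictions (EU, US, China), so one safety case can be reused in many places. The interface contracts are published as an open specification and can be adopted without lock-in; the hardened, certifiable enforcement core is the commercial product.
For investors: a category nobody yet occupies, "authority security for autonomous systems". The world secured the correctness of what autonomous machines do; nobody secured the authority to do it.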

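Why now (and where it leads the field)

Regulation is moving toward this architecture. The EU Machinery Regulation (2023/1230), which applies from January 2027 with no transition period, will require machinery with self-evolving or ML safety functions to undergo conformity assessment by a third-party Notified Body, lists cybersecurity as an essential safety requirement (resistance to corruption, and a tamper-evident log of every intervention retained for at least five years), and requires a mandatory human supervision and intervention function with a safe fallback. A safety approach that "relies on certifying the brain" simply cannot meet this: the brain is exactly the thing that cannot be certified. RobotWarden's assessment boundary excludes the brain by construction, which is precisely what makes self-evolving autonomy assessable. The national standard system for humanoid and embodied AI from China's MIIT adds an explicit "safety and ethics" pillar, which is strategically significant for our home market.

Where it leads, spelled out: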

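Compared with	What they have	What they lack	RobotWarden
Runtime Assurance / Simplex (RTA)	Physical envelope	Authority axis; adversarial premise; governance spine	Two axes + adversarial premise + one spine
ISO 26262 safety monitor	Fault checking	Authority axis; guards against faults only, not attacks	Safety monitoring + authorization + unified architecture
Big-vendor content guardrails (Model Armor and similar)	Probabilistic content filtering	Governs text, not the authority to act	Deterministic action authorization
Embedded "aligned brain" (e.g. SafeVLA)	A safer brain	The brain is untrusted, so it cannot serve as the backstop	A backstop outside and beneath the brain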

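None of them is wrong; each does only half. RobotWarden is the architecture that joins the two halves and is adversary-resistant, accountable, and provable. Nor is it a slide deck: the complete research spine (requirements → system model → threat model → security objectives → concept → architecture → formal proofs), plus the safety case, mechanized checks, the external white paper, and the fleet protocol, has all been drafted; the structural-safety theorem is proved and backed by runnable checks.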

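Status in one sentence

RobotWarden's theory and design form a complete closed loop (paper-before-build: prove first, build second). Everything from requirements to formal proofs, through compliance strategy, machine checking, and the fleet extension, is in place; no questions remain open at the architecture level. What remains is pure engineering implementation once the first target hardware is available.

Summary in one sentence: the world secured the correctness of what autonomous machines do; nobody secured the authority to do it. That authority is what RobotWarden enforces by construction, and it holds for any brain.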

Aiegis builds enforcement layers that bound what autonomous systems are allowed to do, independent of what they want to do, across software agents and physical machines. aiegisafety.com