auditable¶

Audit any agent decision across its past, present, and future, on one typed graph.

In plain terms: your AI agent makes a decision based on some state (a budget, a price, an allow-list), but that state can go stale by the time the agent acts. auditable records the decision, checks it again against the state that is live now, and undoes the action when the decision no longer holds. The same check also reviews a plan before you ship it and ranks a finished run so you know which step to look at first.

Here is the shortest taste. An agent pays a vendor against a budget that read $10,000. The budget later drops to $3,000, so the recorded decision no longer holds, and the payment is reversed:

from auditable import Action, ActionGate, DependencySnapshot, ReferenceLedger, audit, replay

def policy(state, action):
    ok = action.cost <= state["budget"]
    return ok, "within budget" if ok else "over budget"

ledger = ReferenceLedger(balance=10_000)
gate = ActionGate(ledger)
payment = Action("payment", {"to": "acme"}, cost=4_200)

# The agent pays $4,200 against a budget snapshot that read $10,000.
with audit("payment", snapshot=DependencySnapshot(state={"budget": 10_000})) as decision:
    decision.act(payment)
receipt = gate.commit(payment)                       # paid; balance is now 5,800

# The live budget is now $3,000. Replay re-decides; the gate reverses the payment.
verdict = replay(decision.record, live_state={"budget": 3_000}, policy=policy)
gate.enforce_post_commit(verdict, receipt=receipt)
print(verdict.action.value, "->", ledger.balance)   # rollback -> 10000

That is the LIVE pillar, the sharpest of three. The same library also lints a plan before deploy (PRE) and ranks a finished run (POST). The Quickstart has the smallest snippet for each, and Lifecycle is the map across all three.

How It Works: One Graph Across the Lifecycle¶

The rest of this page is the conceptual model behind the snippet. Skip to the Quickstart if you just want to run code.

auditable audits AI agent decisions at three points in an agent's life. The same typed two-layer decision graph is scored and reported before deploy, while the agent runs, and after a run finishes. Detection and report generation run on one graph kernel, so the same construction serves every pillar.

One Graph Consolidates Everything¶

The differentiator is a single typed decision graph that three orthogonal views project onto. You do not assemble three disconnected tools; one structure carries the analysis.

Lifecycle (when a check fires): PRE before deploy, LIVE while running, POST after a run. One graph, three attach points.
Signal (what each decision binds): data, model, and harness, the three orthogonal spans bound per decision.
Coverage (which standard a finding maps to): OWASP and CWE threats expressed as graph-structural predicates rather than a flat checklist. See PRE Coverage.

One typed two-layer decision graph at the center, read by three lifecycle attach points: PRE lints a declared plan before deploy, LIVE replays and recovers while the agent runs, and POST ranks a finished run

Because every view lands on one graph, detection and report run on one structure instead of three. That is the honest answer to "is this enough": not more rules, one consolidated substrate. Threats with no structural signature, such as prompt injection or content poisoning, route to LIVE or stay out of scope; the Coverage page states that boundary.

The Lifecycle¶

Pillar	When it fires	Focus	Public entry
PRE	Design time, before any step runs	Lint a declared plan, name the control-flow chokepoint, withhold dependency-state risk	`analyze_plan` (from `auditable.graph.pre`)
LIVE	While the agent runs	Capture a decision, re-decide under live state, route a fix through a rail	`replay` plus `ActionGate`
POST	After a run completes	Rank a finished run by structural blast share, name the keystone	`analyze_run`

See Lifecycle for each pillar in detail, and examples/example_end_to_end.py for one payment carried through all three.

What Makes It New¶

A unified graph representation for agentic AI. Every agent run becomes one typed graph that PRE, LIVE, and POST all read, one representation from plan to live operation to review. The two-layer model is introduced in GRADE (arXiv:2606.22741).
Recover, do not just observe. auditable captures the dependency state a decision relied on, replays it under the state that is live now, and reverses the committed action through a compensating rail when it no longer holds. Logging tells you what broke; auditable undoes it.
One decision, three spans, judged together. Data, model, and harness are bound in a single signed, hash-chained record, so a decision is audited as one unit, not three disconnected logs.

Span	What the record binds	Signal in v0.1
Data	What the agent read and the dependency snapshot it relied on	Snapshot freshness
Model	Which model produced the output, and its stated basis	Decision-basis trust flag
Harness	The action executed and its cost	A static cost-cap rule, plus the replay verdict

Install¶

pip install auditable            # core: capture, replay, recovery
pip install "auditable[graph]"   # adds the graph analyses (PRE lints, POST analyze_run)

The graph extra pulls in NetworkX, which the PRE and POST graph entries need.

Where to Start¶

Quickstart: the smallest runnable snippet for each pillar, plus the Markdown report renderer.
Lifecycle: the map across PRE, LIVE, and POST.
PRE Coverage: how the lints map to OWASP and CWE.
Audit Report: the Markdown and PDF report a run produces.
API Reference: the full public surface.

The fastest way to see the whole lifecycle on one dataset is examples/example_end_to_end.py: one vendor payment walked through PRE, LIVE, and POST with a single python examples/example_end_to_end.py.