
API Reference

The stable public surface is the flat top-level API. The graph analyses (PRE and the graph kernel) live under auditable.graph.*.

Top-Level API

The capture, replay, and recovery flow, the standalone auditors, the POST offline-analysis entry, and render_report (the dependency-free Markdown renderer for both the PRE and POST reports; see the Quickstart).

auditable

auditable: capture, replay, and recover AI agent decisions.

Open-source SDK for live-state replay and recovery of consequential agent decisions. Capture a signed record of each decision with the dependency state it relied on, replay it under the state that is live now, and route and execute a fix (allow, block, human-review, rollback). One decision binds a data, model, and harness report in a single record (the full chain).

Action dataclass

What the agent is about to do (or did).

Auditor

Bases: ABC

Detect-face base class, the analog of pyod's BaseDetector.

A concrete auditor sets stage and name and implements assess, returning a normalized Report. The subject type is stage-specific (a snapshot, a model span, an action); the uniform part is the Report return, not the input.

DecisionRecord dataclass

digest

digest()

Content hash over the record body (minus record_id), chained via prev.

The body includes the three leaf reports and the compound, so the record digest transitively commits to all leaves.
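The chaining rule can be sketched with the standard library. This is a minimal illustration, not the package's digest(). It assumes records are plain dicts, a SHA-256 over canonical (sorted-key) JSON, and a hypothetical layout in which each record stores its own digest next to the prev link. Because the leaf reports sit inside the hashed body, editing any one of them breaks verification.

```python
import hashlib
import json

# Keys left out of the hashed body: record_id (as described above) plus the
# digest this sketch stores on each record for convenience (an assumption).
EXCLUDED = {"record_id", "digest"}


def record_digest(record: dict) -> str:
    """SHA-256 over canonical JSON of the record body, including prev."""
    body = {k: v for k, v in record.items() if k not in EXCLUDED}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(chain: list, body: dict) -> dict:
    """Link a new record to the previous record's digest, then hash it."""
    prev = chain[-1]["digest"] if chain else None
    record = {**body, "prev": prev, "record_id": f"rec-{len(chain)}"}
    record["digest"] = record_digest(record)
    chain.append(record)
    return record


def verify(chain: list) -> bool:
    """Recompute every digest and check each prev link."""
    prev = None
    for record in chain:
        if record["prev"] != prev or record_digest(record) != record["digest"]:
            return False
        prev = record["digest"]
    return True


chain = []
append(chain, {"action": "refund", "reports": {"data": 0.1, "model": 0.2, "harness": 0.0}})
append(chain, {"action": "payout", "reports": {"data": 0.9, "model": 0.2, "harness": 0.5}})
print(verify(chain))  # True
chain[0]["reports"]["data"] = 0.0  # tamper with one leaf score
print(verify(chain))  # False
```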

DependencySnapshot dataclass

The dependency state the agent relied on at decision time.

state holds the versioned dependencies, for example {"budget_remaining": 5000, "allow_list_version": 7, "policy_id": "kyc-2026-03"}. captured_at is the snapshot's own timestamp, which may lag the decision; a stale snapshot is the failure mode auditable is built to surface.

Report dataclass

The normalized leaf an Auditor returns. Uniform across stages.

score is normalized to [0, 1] (0 normal, 1 maximal risk); it is the field that lets the compound combine reports across stages. flag is a short label such as "ok", "stale", "low_trust", or "over_cap". evidence holds the detector-specific detail behind the score.

digest

digest()

Content hash of this leaf, independently verifiable.

CompoundReport dataclass

A transparent bundle over the per-stage leaves (v0.1).

reports preserves the per-stage breakdown, because an auditor usually needs to know which stage flagged a decision. uncalibrated_score is an explicit debug aggregate (the max of per-stage scores), not decision-grade and not used by the gate. v0.2 replaces it with a calibrated combined risk.

by_stage

by_stage()

Map stage name to its report, for callers that want the breakdown.

Decision

Handle yielded by audit. Fill in inputs, model, action, and attach reports.

attach

attach(report)

Route a leaf report to its stage span.

FileSink

Append-only JSONL sink: one signed record per line, durable across process exit.

A second concrete sink alongside MemorySink so the signed log survives the process and the pluggable-sink abstraction has more than one implementation.
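A minimal sketch of the append-only JSONL pattern, assuming records are JSON-serializable dicts. JsonlSink and its methods are hypothetical names, not the FileSink API, and the signing and chaining that the real sinks perform are omitted.

```python
import json
import os
from pathlib import Path


class JsonlSink:
    """Hypothetical append-only sink: one JSON record per line."""

    def __init__(self, path):
        self.path = Path(path)

    def write(self, record: dict) -> None:
        # Append mode never rewrites earlier lines.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push the line to disk before returning

    def read_all(self) -> list:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]
```

Reopening the same path in a fresh process (or a fresh JsonlSink instance) reads back every line written earlier, which is the durability property the description is after.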

MemorySink

Default in-process sink. Signs each record and chains it to the previous one.

ReplayUndecidable

Bases: Exception

Raised by a policy that cannot re-decide deterministically under a given state.

DataAuditor

Bases: Auditor

Score the dependency state a decision relied on; fall back to freshness.

ModelAuditor

Bases: Auditor

A thin trust flag on the deciding model (v0.1).

Heuristic: a stated decision basis raises trust, its absence lowers it. The score is the risk (1 - trust); the trust value is kept in evidence. v0.3 replaces this with TrustLLM trust signals on the model and its output.

ActionGate

The concrete v0.1 control surface. Maps a replay verdict to an executed fix.

Side-effect timing is explicit. enforce_pre_commit runs before the action (allow, block, or hold). enforce_post_commit runs after the action committed through the rail (allow, hold, or roll back via rail.compensate). A routed verdict that cannot execute a compensating action is observability, not control.

HarnessAuditor

Bases: Auditor

A thin static rule on the action: flag spend over a static cap (v0.1).

The static cap is the forward, point-in-time check that incumbents already run. Its report is the harness audit leaf; replay-under-live-state (in chain) is layered on top of it. The score ramps from 0 at the cap to 1 at twice the cap.
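The ramp described above, as a small function. The name over_cap_score and the flag rule (any spend strictly above the cap is "over_cap") are assumptions for illustration; only the 0-at-cap, 1-at-twice-the-cap shape comes from the description. Clamping keeps the result inside the [0, 1] range a Report score requires.

```python
def over_cap_score(spend: float, cap: float) -> tuple:
    """Return (score, flag): score 0 at the cap, 1 at twice the cap, linear between."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    score = min(1.0, max(0.0, (spend - cap) / cap))
    flag = "over_cap" if spend > cap else "ok"
    return score, flag


print(over_cap_score(7_500, 5_000))   # (0.5, 'over_cap')
print(over_cap_score(12_000, 5_000))  # (1.0, 'over_cap')
```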

Rail

Bases: Protocol

Any commit/compensate backend (a payment rail, a record store, a ledger).

ReferenceLedger

In-process reference rail: commit spends, compensate refunds. For demo and tests.
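The commit/compensate split behind ActionGate can be sketched as follows. ToyLedger, the commit(action_id, amount) / compensate(action_id) signatures, enforce_post_commit's signature, and the verdict strings are hypothetical stand-ins for Rail, ReferenceLedger, and the gate's real interface. The point is only that a post-commit rollback is executed by calling compensate on the rail, which is what makes it control rather than observability.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Rail(Protocol):
    """Any commit/compensate backend (hypothetical method signatures)."""

    def commit(self, action_id: str, amount: int) -> None: ...

    def compensate(self, action_id: str) -> None: ...


@dataclass
class ToyLedger:
    """In-process rail: commit spends, compensate refunds."""

    balance: int
    committed: dict = field(default_factory=dict)

    def commit(self, action_id: str, amount: int) -> None:
        self.balance -= amount
        self.committed[action_id] = amount

    def compensate(self, action_id: str) -> None:
        # Refund exactly what this action committed; a second call raises KeyError.
        self.balance += self.committed.pop(action_id)


def enforce_post_commit(verdict: str, rail: Rail, action_id: str) -> str:
    """Act on a replay verdict after the action has already committed."""
    if verdict == "ROLLBACK":
        rail.compensate(action_id)  # the compensating action is what executes the fix
        return "rolled_back"
    if verdict == "HUMAN_REVIEW":
        return "held"
    return "allowed"


ledger = ToyLedger(balance=10_000)
ledger.commit("a1", 4_000)
print(enforce_post_commit("ROLLBACK", ledger, "a1"), ledger.balance)  # rolled_back 10000
```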

AnalysisReport dataclass

The result of analyze_run: the run's structural risk plus grounding.

Fields the user reads:

  • state: scored / no_score:single_decision / no_score:low_coverage (the same honest gate structural_risk applies; a no-score state still reports coverage and the descriptive structure).
  • ranked: every step as a DecisionRisk, highest structural risk first; in a no-score state the scores are None and the order is by index.
  • keystone: the worst-blast step (what most of the run rests on), or None when the run is not scored.
  • per_session: the keystone's blast share (the run-level risk), or None.
  • coverage: dependency-edge coverage with the saturation ratio rho and the per-grade breakdown (observed / declared / inferred).
  • grounding: per step index, the model-basis grounding where a basis is stated. Empty for a corpus tool trace (no step states a model basis); it lights up on auditable's own records, which carry decision_basis and read context.
  • completeness: complete (offline) today; the v0.3b live path will report prefix, with no change to the field.
  • adapter: the ingestion adapter id (<name>_<version>), so the report names the source that produced it.
  • features: the raw layered structural features (descriptive, present in every state).
  • notes: plain-language honesty notes (modeled corpus edges, low coverage, the no-calibration statement).

summary

summary()

Render the human-readable report the examples print.

to_markdown

to_markdown(*, level=1)

Render this POST report as Markdown (the additive copy-pasteable form).

Thin delegate to auditable.report.post_to_markdown; the plaintext summary / __str__ are unchanged. Imported lazily to avoid an import cycle (report.py imports this module).

Adapter

Bases: Protocol

Map one source into typed steps. The stable public ingestion contract.

An adapter carries a name and a version (so a produced graph records which adapter built it) and implements to_steps, which turns a source (a public-corpus trajectory, an auditable run's own records, or a live framework stream) into the Step list SessionGraph.from_steps reads. The protocol is runtime_checkable, so an instance with these three members satisfies isinstance(obj, Adapter) without subclassing.
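A sketch of the structural-typing behavior using typing.runtime_checkable. AdapterLike, RowsAdapter, and adapter_id are hypothetical; the real contract is auditable.graph.adapters.protocol.Adapter, and real adapters return Step objects rather than dicts. Note that isinstance on a runtime-checkable protocol only checks that the members exist, not their signatures or types.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class AdapterLike(Protocol):
    """Hypothetical mirror of the Adapter contract: name, version, to_steps."""

    name: str
    version: str

    def to_steps(self, source: Any) -> List[dict]: ...


class RowsAdapter:
    """Satisfies the protocol structurally; note there is no base class."""

    name = "rows"
    version = "v1"

    def to_steps(self, source: Any) -> List[dict]:
        return [{"idx": i, "agent": r["agent"], "kind": r["kind"]}
                for i, r in enumerate(source)]


def adapter_id(adapter: AdapterLike) -> str:
    return f"{adapter.name}_{adapter.version}"  # the <name>_<version> id


adapter = RowsAdapter()
print(isinstance(adapter, AdapterLike), adapter_id(adapter))  # True rows_v1
```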

audit

audit(action_type, *, snapshot, sink=None)

Capture one agent decision with the dependency snapshot it relied on.

replay

replay(record, *, live_state, policy)

Re-derive whether the decision still holds under the live dependency state.

Pure: returns a verdict, executes nothing. The agent acted under record.data.snapshot.state; we re-evaluate the same action against live_state. If the action was justified on the snapshot but not on live state, it relied on stale or drifted state and we route a ROLLBACK. A policy that cannot decide raises ReplayUndecidable and we return HUMAN_REVIEW.
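The routing rule can be sketched as a pure function. replay_verdict, the Verdict names, and budget_policy are hypothetical; the real replay takes a DecisionRecord and reads record.data.snapshot.state. Treating an action that fails the policy under both states as HUMAN_REVIEW is an assumption, since the description above covers only the ROLLBACK and undecidable cases.

```python
from enum import Enum


class ReplayUndecidable(Exception):
    """Raised by a policy that cannot re-decide under a given state."""


class Verdict(Enum):
    ALLOW = "allow"
    ROLLBACK = "rollback"
    HUMAN_REVIEW = "human_review"


def replay_verdict(action: dict, snapshot_state: dict, live_state: dict, policy) -> Verdict:
    """Pure: re-run the policy under both states and route; executes nothing."""
    try:
        justified_then = policy(action, snapshot_state)
        justified_now = policy(action, live_state)
    except ReplayUndecidable:
        return Verdict.HUMAN_REVIEW
    if justified_now:
        return Verdict.ALLOW
    if justified_then:
        # Held on the snapshot, fails on live state: it relied on stale state.
        return Verdict.ROLLBACK
    return Verdict.HUMAN_REVIEW  # assumption: never justified, so escalate


def budget_policy(action: dict, state: dict) -> bool:
    if "budget_remaining" not in state:
        raise ReplayUndecidable("state has no budget_remaining")
    return action["amount"] <= state["budget_remaining"]


snapshot = {"budget_remaining": 5000}
print(replay_verdict({"amount": 4000}, snapshot, {"budget_remaining": 1000}, budget_policy))
# Verdict.ROLLBACK
```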

analyze_run

analyze_run(source, *, adapter, ground=True)

Analyze one agent run offline: structural risk plus model-basis grounding.

source is whatever the adapter consumes (a public-corpus trajectory, a chain of auditable's own DecisionRecords, or, in v0.3b, a live stream). adapter is any auditable.graph.adapters.protocol.Adapter (it exposes to_steps plus a name / version). The call maps the source to typed steps, builds the SessionGraph, scores it with structural_risk, grounds each step that states a basis, and returns an AnalysisReport.

Set ground=False to skip the (cheap, deterministic) grounding pass. Scoring requires the graph extra (NetworkX); without it the underlying projection raises a clear ImportError.

render_report

render_report(report, *, level=1)

Render a PRE or POST report to Markdown, dispatched by type.

PreReport renders through pre_to_markdown; AnalysisReport through post_to_markdown. Any other type raises TypeError. This is the single import a caller reaches for when they do not want the report.to_markdown() method.

The PreReport / AnalysisReport imports are function-local: report.py sits beside analysis.py, and importing both report modules at load time would risk an import cycle.
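A minimal sketch of type-based dispatch with a TypeError fallback. PreStub and PostStub are hypothetical stand-ins for PreReport and AnalysisReport, and reading level as the top heading depth is an assumption. In the package the two report classes are imported inside the function rather than at module top, for the import-cycle reason given above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreStub:
    """Stand-in for PreReport."""
    n_steps: int
    keystone_idx: Optional[int]


@dataclass
class PostStub:
    """Stand-in for AnalysisReport."""
    state: str
    keystone: Optional[int]


def render(report, *, level: int = 1) -> str:
    """Pick a renderer by type; anything else is a TypeError."""
    h = "#" * level  # assumption: level is the top heading depth
    if isinstance(report, PreStub):
        return f"{h} PRE report\n\nExecution-topology keystone: step {report.keystone_idx}\n"
    if isinstance(report, PostStub):
        return f"{h} POST report ({report.state})\n\nBlast-radius keystone: step {report.keystone}\n"
    raise TypeError(f"cannot render {type(report).__name__}")


print(render(PostStub(state="scored", keystone=3), level=2).splitlines()[0])
# ## POST report (scored)
```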

Report Rendering

The Markdown renderer for the typed reports: render_report (the top-level dispatcher), the per-pillar pre_to_markdown and post_to_markdown, and the to_markdown method on each report. It formats the fields the report already carries and is standard-library only.

auditable.report

Markdown rendering for the analysis reports: one dependency-free renderer.

analyze_plan (PRE) and analyze_run (POST) each return a typed report whose summary() is terse indented plaintext. This module adds a second, additive surface: a clean Markdown form of the SAME typed fields, suitable for pasting into a pull request, an issue, or a design doc. It computes nothing new. The PRE report already carries the execution-topology keystone, the four lint findings, the preflight coverage views, and the notes; the POST report already carries the blast-radius keystone, the ranked decisions, the grounding, and the notes. The renderer only formats what is there.

Two reach paths produce the same string:

  1. report.to_markdown() on each report object (a thin method that delegates here), paralleling the existing report.summary().
  2. render_report(report), a top-level dispatcher that picks the right renderer by type. This is the single import for a caller who does not want the method.

Both PRE and POST render five labeled parts: the lifecycle stage (the banner plus a meta line), what is risky on the graph, the keystone (PRE: an execution-topology chokepoint; POST: a dependency-DAG blast-radius keystone, two distinct concepts the wording keeps apart), the per-finding detail, and a short "what to do" line.

Dependency-free by design: standard library only, no NetworkX, no templating engine, no table library. The PreReport / AnalysisReport imports are done lazily inside the functions, because report.py sits beside analysis.py and would otherwise import both auditable.graph.pre and auditable.analysis at module load and risk an import cycle.

pre_to_markdown

pre_to_markdown(report, *, level=1)

Render an auditable.graph.pre.PreReport as Markdown.

Mirrors the section order of PreReport.summary so the two stay recognizable, and keeps the PRE keystone labeled as an execution-topology chokepoint (distinct from the POST blast-radius keystone). No new computation: every value is read off report.

post_to_markdown

post_to_markdown(report, *, level=1)

Render an auditable.analysis.AnalysisReport as Markdown.

Mirrors the section order of AnalysisReport.summary so the two stay recognizable, and keeps the POST keystone labeled as a dependency-DAG blast-radius keystone (distinct from the PRE execution-topology chokepoint). No new computation: every value is read off report.

render_report

render_report(report, *, level=1)

Render a PRE or POST report to Markdown, dispatched by type.

PreReport renders through pre_to_markdown; AnalysisReport through post_to_markdown. Any other type raises TypeError. This is the single import a caller reaches for when they do not want the report.to_markdown() method.

The PreReport / AnalysisReport imports are function-local: report.py sits beside analysis.py, and importing both report modules at load time would risk an import cycle.

PRE: Declared-Plan Analysis

analyze_plan, the four reachability lints, the execution-topology keystone, the preflight coverage report, and the PreReport it returns.

auditable.graph.pre

PRE: design-time lints over a DECLARED agent plan (before any run).

The PRE entry mirrors auditable.analysis.analyze_run's adapter -> SessionGraph flow, but computes only the parts that are honest before a single step executes. A DECLARED plan (a LangGraph compiled graph, a CrewAI task DAG, or an AutoGen topology, lowered through DeclaredPlanAdapter into the neutral plan dict) carries control flow and declared data reads / writes, but no observed values. So PRE does two things and withholds a third:

  1. Execution-topology keystone. The structural chokepoint of the declared plan: the node the most other nodes transitively FOLLOW in control flow, via execution_reach over the handoff_to projection. This is a STRUCTURAL design lint (a chokepoint), NOT the POST blast-radius keystone from auditable.graph.risk (a blast-share triage signal over the dependency DAG). The two are distinct named concepts and must not be conflated.

  2. Four reachability lints over the projected declared graph. All four are pure, read-only NetworkX queries: no mutation of the graph, no side effects, no runtime / value execution. The "would it flip" and drift-confirmation halves are runtime work, explicitly out of scope and noted in each finding's detail.

  3. State-B (dependency-state) blast-share risk is WITHHELD. A declared dependency layer is declared-only (observed_fraction=0), so a multi-step plan makes structural_risk return no_score:low_coverage (a 0- or 1-step plan is gated earlier as no_score:single_decision). Either way no number is emitted: PRE asserts the boundary as "the verdict is some no_score:* state" and surfaces state_b_risk=None with a reason string; it never presents a dependency-state risk number. Only a SCORED verdict on a declared graph violates the boundary, and then analyze_plan raises rather than emit the number.

Alongside the withheld State-B number, PRE attaches a Preflight Coverage Report: a descriptive, calibrated coverage-readiness view, NOT a risk score. It reuses the existing SessionGraph.coverage model and the declared resource-touch metadata to tell the user what the runtime scorer will need before it can score (preflight_coverage), which declared touches still lack a resource identity (resource_touch_completeness), and where declared revalidation barriers exist per resource (barrier_inventory). This strengthens PRE without selling a false score and leaves the State-B withhold boundary above unchanged.

PRE applies only where a declared graph exists. A free-form ReAct agent with no declared plan degrades to the flat rule floor and is out of scope here.

This module lives under auditable.graph.* and adds no top-level public export.

LintFinding dataclass

One PRE lint hit: a structural design issue read off the declared graph.

  • lint: the lint name (e.g. 'write_with_no_prior_read').
  • node_idx: the offending step idx.
  • resource_id: the resource the finding is about, or None when not resource-specific.
  • detail: a one-line human reason. For the annotation-only halves it states that the runtime / value confirmation is out of scope at PRE.
  • severity: 'warning' by default; PRE findings are structural design warnings, not validated failure predictions.

PreflightCoverage dataclass

The existing coverage() model surfaced over the DECLARED graph.

Descriptive, NOT a risk number. Reads SessionGraph.coverage plus the exact no_score:* state structural_risk would apply at runtime, so the user can see the grade mix and why the runtime scorer will withhold a State-B number rather than guessing one here.

  • n_steps: plan node count (the size-normalized risk denominator basis).
  • n_dep_edges: total dependency edges (every one DECLARED at PRE).
  • observed / declared / inferred: the grade-mix counts from coverage().by_grade (the same three Grade buckets, flattened to plain ints for legibility).
  • observed_fraction / rho: the observed share and the saturation ratio from coverage(); at PRE observed_fraction is 0.0 on any non-empty declared layer.
  • no_score_reason: the exact no_score:* state structural_risk applies -- no_score:low_coverage for a multi-step declared plan, no_score:single_decision for a 0- or 1-step plan. This is the reason the State-B score is withheld, surfaced descriptively (it is never a number).
  • would_score: always False at PRE; present so the contract reads explicitly as "the runtime scorer cannot score this declared layer yet".

ResourceGap dataclass

One declared touch (a read, write, or dependency edge) lacking a resource id.

  • kind: 'read' / 'write' / 'edge'.
  • node_idx: the owning step idx (the dependent step for an 'edge').
  • src_idx: only for 'edge' -- the producer step the edge points at.
  • detail: a one-line reason naming the missing identity.

ResourceTouchCompleteness dataclass

Which declared touches carry a resource identity, and which do not.

The runtime touch contract (v0.3b own-record) matches a later read to an earlier write of the same {namespace, resource_id, key} and fills in the auditable.graph.session.ResourceRef on the observed edge. At PRE no edge is observed, so this view reports, descriptively, which writes, reads, and declared dependency edges are still missing an identity the runtime contract will need:

  • a read / write is complete when its node_attrs id string is non-empty;
  • a declared dependency edge is complete when it carries a resource id, either the structured DependencyEdge.resource (ResourceRef) or the evidence['resource_id'] string the declared adapter records.

Counts plus the per-touch gap list are exposed so a caller can see both the headline (writes_with_id of n_writes) and the exact offending touch. edges_missing_structured_resource separately counts edges that carry an evidence['resource_id'] but a None structured resource -- the declared-corpus norm, and exactly the seam the runtime contract fills.

BarrierInventory dataclass

The declared re-read / re-validation nodes, grouped per resource (structure only).

A barrier is a node that re-reads (revalidates) a resource: the declared adapter records it in node_attrs['barriers'] (a read flagged revalidates). This view lists, per resource id, the step idxs that declare a revalidation barrier for it, and the flat set of resources that have at least one barrier. It is reported as STRUCTURE: a resource that appears as a volatile read but is absent from by_resource has no declared barrier, which a consuming view can surface without claiming any drift occurred (drift confirmation is runtime work, out of scope at PRE).

  • by_resource: resource id -> sorted list of barrier step idxs.
  • barrier_nodes: sorted list of every step idx that declares any barrier.
  • resources_with_barrier: sorted list of resource ids that have a barrier.

PreReport dataclass

The result of :func:analyze_plan: the PRE-honest view of a DECLARED plan.

  • adapter: the ingestion adapter id (declared_plan_v1).
  • n_steps: number of plan nodes.
  • keystone_idx / keystone_followers: the execution-topology keystone (the argmax of execution_reach) and its transitive control-flow followers. This is the STRUCTURAL chokepoint of the declared plan, NOT the POST blast-radius keystone from auditable.graph.risk.
  • execution_reach_by_idx: every step idx -> its transitive control-flow followers.
  • findings: the LintFinding objects produced by the four lints.
  • state_b_risk / state_b_withheld / state_b_withheld_reason: the dependency-state blast-share risk is ALWAYS withheld at PRE (declared-only); the number is never computed. state_b_risk stays None and state_b_withheld stays True.
  • preflight_coverage / resource_touch_completeness / barrier_inventory: the Preflight Coverage Report -- a descriptive, calibrated coverage-readiness view (the grade mix, the exact no-score reason the runtime scorer will apply, which declared touches lack a resource identity, and the declared revalidation barriers per resource). It is NOT a risk number and does not touch the State-B withhold boundary.
  • notes: plain-language notes, including that the keystone is a structural chokepoint (a design lint), not a failure predictor.

summary

summary()

Render a human-readable PRE report (mirrors AnalysisReport.summary style).

to_markdown

to_markdown(*, level=1)

Render this PRE report as Markdown (the additive copy-pasteable form).

Thin delegate to auditable.report.pre_to_markdown; the plaintext summary / __str__ are unchanged. Imported lazily to avoid an import cycle (report.py imports this module).

execution_projection

execution_projection(G)

The handoff_to projection as a simple DiGraph (predecessor -> successor).

Step nodes plus the execution (control-flow) edges, dropping the dependency and emits layers. handoff_to points predecessor -> successor, so a node's transitive control-flow FOLLOWERS are its descendants here (the basis for auditable.graph.execution_reach).

write_with_no_prior_read

write_with_no_prior_read(G)

Fire when a node writes a resource never read in its backward slice.

Primitive: nx.descendants over dependency_dag(G) from the write node (the backward slice = what the action transitively rests on, because depends_on points dependent -> dependency), cross-referenced against the reads resource sets in node_attrs of the slice nodes plus the writer itself. For each step W whose writes is non-empty, FIRE one finding per written resource R that is NOT in the union of reads over {W} U slice.
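The rule above can be sketched with plain dicts and an iterative traversal in place of nx.descendants. The step layout (idx, deps, reads, writes keys) is hypothetical. deps points dependent -> dependency, so walking it from a writer yields the writer's backward slice.

```python
def backward_slice(deps: dict, start: int) -> set:
    """All steps `start` transitively depends on (deps: dependent -> dependencies)."""
    seen, stack = set(), [start]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def write_with_no_prior_read(steps: list) -> list:
    """One (writer idx, resource) finding per write never read in {W} U slice."""
    by_idx = {s["idx"]: s for s in steps}
    deps = {i: s.get("deps", []) for i, s in by_idx.items()}
    findings = []
    for w, step in by_idx.items():
        writes = set(step.get("writes", ()))
        if not writes:
            continue
        scope = {w} | backward_slice(deps, w)
        read = set().union(*(set(by_idx[i].get("reads", ())) for i in scope))
        findings.extend((w, r) for r in sorted(writes - read))
    return findings


plan = [
    {"idx": 0, "reads": ["balance"]},
    {"idx": 1, "deps": [0], "writes": ["balance", "audit_log"]},
]
print(write_with_no_prior_read(plan))  # [(1, 'audit_log')]
```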

flippable_dependency_annotations

flippable_dependency_annotations(G)

Annotate unpinned, non-revalidated volatile dependencies feeding a decision.

Primitive: nx.descendants over dependency_dag(G) from each decision node (its backward slice / dependency set), intersected with the per-edge evidence flags on the DECLARED depends_on edges. For each decision D, over the DECLARED depends_on edges on D's backward slice that carry evidence['volatile'], FIRE one annotation per such edge / resource that is neither evidence['pinned'] nor evidence['revalidates']. This is an ANNOTATION, not a value-flip proof: severity stays 'warning' and the detail says the would-flip half needs runtime values.

scope_vs_snapshot

scope_vs_snapshot(G)

Fire when granted tool scope strictly exceeds the snapshot the node read.

Primitive: set comparison of the declared scope (node_attrs['scope'], the granted resource ids) versus the read-resource set actually pulled into the node's snapshot, computed as the union of reads over {N} U nx.descendants(dependency_dag(G), 'step::N'). For each node N whose scope is present, FIRE when set(scope) is a STRICT superset of the read set (the grant exceeds the snapshot it validated; it can act on state it never read). The reported resource_id per finding is one of scope - read_set.
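The strict-superset test, sketched with a hypothetical dict layout (idx, deps, reads, scope). Python's > on sets is the strict-superset operator. Which uncovered resource gets reported is unspecified above; this sketch picks the alphabetically first one so the output is deterministic.

```python
def backward_slice(deps: dict, start: int) -> set:
    """All steps `start` transitively depends on (deps: dependent -> dependencies)."""
    seen, stack = set(), [start]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def scope_vs_snapshot(steps: list) -> list:
    """One (idx, resource) finding per node whose granted scope strictly exceeds its read set."""
    by_idx = {s["idx"]: s for s in steps}
    deps = {i: s.get("deps", []) for i, s in by_idx.items()}
    findings = []
    for n, step in by_idx.items():
        if "scope" not in step:
            continue
        scope = set(step["scope"])
        snapshot = {n} | backward_slice(deps, n)
        read_set = set().union(*(set(by_idx[i].get("reads", ())) for i in snapshot))
        if scope > read_set:  # strict superset: the grant exceeds what was read
            findings.append((n, sorted(scope - read_set)[0]))
    return findings


plan = [
    {"idx": 0, "reads": ["acct"]},
    {"idx": 1, "deps": [0], "scope": ["acct", "payroll"]},
    {"idx": 2, "deps": [0], "scope": ["acct"]},  # equal sets: no finding
]
print(scope_vs_snapshot(plan))  # [(1, 'payroll')]
```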

missing_revalidation_barrier

missing_revalidation_barrier(G)

Fire when a volatile read reaches an action with no re-read between them.

Two-projection query. First, nx.descendants over dependency_dag(G) locates a volatile read upstream of a consequential action (the backward slice). Then nx.descendants over execution_projection (handoff_to) checks that the control path from the volatile-read node to the action contains NO intervening barrier (a node re-reading that resource, i.e. one with the resource in its node_attrs['barriers'] set).

For each consequential action A (writes non-empty, or a decision with a volatile dependency), for each volatile read node V in A's backward slice whose resource R is volatile, FIRE when there is NO node B with R in its barrier set on a control path strictly between V and A in handoff order. No finding when such a barrier B exists.
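A sketch of the two-projection query over hypothetical dict steps: deps is the dependency projection, next holds the handoff_to successors, and volatile_reads / barriers are the per-node resource sets. The nodes strictly between V and A are V's control-flow descendants intersected with A's control-flow ancestors. For brevity only writing steps count as consequential; the decision-with-volatile-dependency case is left out.

```python
def reach(adj: dict, start: int) -> set:
    """Nodes reachable from start along adj (start itself excluded in a DAG)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def missing_revalidation_barrier(steps: list) -> list:
    by_idx = {s["idx"]: s for s in steps}
    deps = {i: s.get("deps", []) for i, s in by_idx.items()}
    succ = {i: s.get("next", []) for i, s in by_idx.items()}  # handoff_to
    pred = {}
    for i, nxts in succ.items():
        for j in nxts:
            pred.setdefault(j, []).append(i)
    findings = []
    for a, action in by_idx.items():
        if not action.get("writes"):
            continue  # simplification: only writing steps are consequential here
        for v in sorted(reach(deps, a)):  # candidates in A's backward slice
            between = reach(succ, v) & reach(pred, a)  # strictly between V and A
            for r in sorted(by_idx[v].get("volatile_reads", ())):
                if not any(r in by_idx[b].get("barriers", ()) for b in between):
                    findings.append((a, v, r))
    return findings


plan = [
    {"idx": 0, "volatile_reads": ["fx_rate"], "next": [1]},
    {"idx": 1, "next": [2]},
    {"idx": 2, "deps": [0], "writes": ["payment"]},
]
print(missing_revalidation_barrier(plan))  # [(2, 0, 'fx_rate')]
```

Adding "barriers": ["fx_rate"] to step 1 places a re-read on the only control path from 0 to 2, and the finding disappears.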

resource_touch_completeness

resource_touch_completeness(G)

Report which declared reads, writes, and dependency edges carry a resource id.

Pure read-only scan of the projected graph: it reads node_attrs['reads'] / ['writes'] on each step node and the evidence / resource on each depends_on edge. A read or write counts as identified when its id string is non-empty; a dependency edge counts as identified when it carries either a structured ResourceRef (resource) or an evidence['resource_id'] string. Edges that carry only the evidence string but a None structured resource are counted separately, since that ResourceRef is exactly what the runtime touch contract fills.

barrier_inventory

barrier_inventory(G)

Inventory the declared revalidation barriers per resource (structure only).

Pure read-only scan of node_attrs['barriers'] (the reads flagged revalidates by the declared adapter). Returns, per resource id, the sorted step idxs that declare a barrier for it, plus the flat barrier-node and barrier-resource sets. No drift is claimed: this is the declared structure of where revalidation re-reads exist, which a consuming view can compare against the volatile-read set to spot a missing barrier without runtime values.

analyze_plan

analyze_plan(plan, *, adapter=declared_plan_v1)

Run PRE over a DECLARED plan: execution-topology keystone + reachability lints.

Mirrors auditable.analysis.analyze_run's adapter -> SessionGraph flow, but computes only the PRE-honest parts. The plan is lowered through adapter (default declared_plan_v1) into typed steps, projected to NetworkX once, and the four pure lints run over it. The execution-topology keystone is the argmax of execution_reach (the structural chokepoint of the declared plan).

State-B (dependency-state) blast-share risk is explicitly WITHHELD: the declared dependency layer is declared-only, so structural_risk returns a no_score:* verdict (no_score:low_coverage for a multi-step plan, no_score:single_decision for a 0- or 1-step plan). This function asserts that boundary: the verdict must be some no_score state, never SCORED; if a scored verdict ever came back here, it raises rather than emit a number the evidence does not support. Scoring requires the graph extra (NetworkX); without it the projection raises a clear ImportError.

Graph Adapters

The ingestion extension point and the shipped adapters, including the declared-plan adapter the PRE pillar consumes.

auditable.graph.adapters

Source-agnostic ingestion adapters: map a source into typed Steps.

The Adapter protocol is the public extension point (the analog of subclassing pyod's BaseDetector). The concrete adapters here are versioned reference adapters: pinned by name and version, stable to call by that versioned name, and used in the examples; the protocol, not any concrete class, is the contract user code implements. Each adapter turns one source into the typed Step list that auditable.graph.SessionGraph and the structural scorer consume, so a public corpus, an auditable run's own records, and (in v0.3b) a live framework stream converge on one representation.

Adapters shipped this round:

  • tau_bench_prior_db_reads_v1 (corpus): a tau-bench-style trajectory to steps, with each consequential write depending on every prior DB read, graded OBSERVED but marked modeled in evidence (a conservative prior-read upper bound, not a causal label). Pure: messages in, steps out, no download.
  • own_record_v1 (own records): a chain of signed DecisionRecords to steps, execution edges from the prev_digest backbone and model attributes on each node, with sparse DECLARED dependency edges (no fabricated observed edges) until the v0.3b resource-touch contract lands.
  • declared_plan_v1 (declared plan, PRE): a framework-agnostic DECLARED agent plan dict to steps, control_preds -> execution edges and declared reads -> DECLARED dependency edges, the neutral target a LangGraph / CrewAI / AutoGen front-end would lower into (see auditable.graph.pre). Not a parser for any framework.

Adapter

Bases: Protocol

Map one source into typed steps. The stable public ingestion contract.

An adapter carries a name and a version (so a produced graph records which adapter built it) and implements to_steps, which turns a source (a public-corpus trajectory, an auditable run's own records, or a live framework stream) into the Step list SessionGraph.from_steps reads. The protocol is runtime_checkable, so an instance with these three members satisfies isinstance(obj, Adapter) without subclassing.

OwnRecordAdapter

Bases: _BaseAdapter

Map a chain of DecisionRecords to typed steps (execution + model facet).

agent_label is the single actor's label under the homogeneous-model assumption (the model identity rides on each node as model_id, not as the agent). link_sequential toggles whether each non-genesis record declares a dependency on its immediate predecessor; both settings keep the dependency layer non-observed and low coverage this round.

to_steps

to_steps(records)

Build one decision step per record, in log order.

Execution predecessors come from the prev_digest chain; dependency edges are sparse and DECLARED (see the module docstring). Records are read by duck-typing, so loaded, live, or stubbed records all work.
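A duck-typed sketch of the prev_digest backbone. The attribute names record_digest, prev_digest, and model_id, the records_to_steps helper, and the dict step layout are hypothetical. The sketch only shows how execution predecessors fall out of the hash chain and how link_sequential adds a DECLARED (never observed) dependency on the predecessor.

```python
from types import SimpleNamespace


def records_to_steps(records, link_sequential: bool = False) -> list:
    """One decision step per record, in log order, linked by the prev_digest chain."""
    position = {}  # record digest -> step idx
    steps = []
    for idx, rec in enumerate(records):
        prev = getattr(rec, "prev_digest", None)
        exec_preds = [position[prev]] if prev in position else []
        deps = [{"src": p, "grade": "declared"} for p in exec_preds] if link_sequential else []
        steps.append({
            "idx": idx,
            "kind": "decision",
            "model_id": getattr(rec, "model_id", None),  # model rides on the node
            "exec_preds": exec_preds,
            "deps": deps,  # DECLARED only; nothing is graded observed
        })
        position[rec.record_digest] = idx
    return steps


log = [
    SimpleNamespace(record_digest="d0", prev_digest=None, model_id="m-1"),
    SimpleNamespace(record_digest="d1", prev_digest="d0", model_id="m-1"),
]
print(records_to_steps(log, link_sequential=True)[1]["deps"])
# [{'src': 0, 'grade': 'declared'}]
```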

TauBenchPriorDBReadsAdapter

Bases: _BaseAdapter

Map a tau-bench-style message trajectory to typed steps with modeled deps.

assistant_agent is the agent label for assistant turns (the messages do not name the model, so it defaults to "assistant"); model_id, when set, is attached to each assistant decision node so the model attribute is populated without inventing an identity the trace does not carry.

to_steps

to_steps(messages)

Normalize one tau-bench task run (a list of role / tool messages) into typed steps. Read and write tool events are observed; each write depends on every prior DB read as a conservative, modeled prior-read upper bound.
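The prior-read upper bound, sketched over a hypothetical event list in which each tool event is already tagged read or write (mapping real tau-bench tool names to read/write is the adapter's job and is not shown). Edges are graded observed but flagged modeled in evidence, matching the description; prior_read_steps is a hypothetical name.

```python
def prior_read_steps(events: list) -> list:
    """Each write depends on every earlier read: a modeled upper bound, not a causal label."""
    steps, prior_reads = [], []
    for idx, ev in enumerate(events):
        step = {"idx": idx, "kind": "tool_call", "op": ev["op"], "deps": []}
        if ev["op"] == "write":
            step["deps"] = [
                {"src": r, "grade": "observed", "evidence": {"modeled": True}}
                for r in prior_reads
            ]
        elif ev["op"] == "read":
            prior_reads.append(idx)
        steps.append(step)
    return steps


trace = [{"op": "read"}, {"op": "read"}, {"op": "write"}, {"op": "read"}, {"op": "write"}]
print([d["src"] for d in prior_read_steps(trace)[4]["deps"]])  # [0, 1, 3]
```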

DeclaredPlanAdapter

Bases: _BaseAdapter

Lower a framework-agnostic DECLARED plan dict into typed steps.

name='declared_plan' / version='v1' (id='declared_plan_v1'), callable via the inherited __call__. to_steps validates the plan shape and each node, then emits one Step per node with DECLARED-graded dependency edges only. It is the seam a real LangGraph / CrewAI / AutoGen front-end would target; it does not parse any of those frameworks itself.

to_steps

to_steps(plan)

Validate the declared-plan schema and lower it to typed steps.

The top-level shape must be {"nodes": [...]} (an empty / None plan yields []). Each node's idx must be a plain, unique int, kind must be "decision" or "tool_call", and every resource-ref must be a valid string or dict. Reads naming a prior producer become DECLARED dependency edges carrying the resource id and flags in evidence; reads / writes / scope / barriers / volatile-reads are recorded in node_attrs so the PRE lints are pure node + edge queries.
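A validation sketch covering only the shape rules listed above (a top-level nodes list, plain unique int idx, kind in a fixed set); resource-ref checks and the lowering to steps are omitted. validate_plan is a hypothetical name, and raising ValueError is an assumption about the error type. The check uses type(idx) is int because bool is a subclass of int and would otherwise pass.

```python
def validate_plan(plan) -> list:
    """Check the declared-plan shape; return the node list ([] for an empty plan)."""
    if not plan:
        return []  # None or {} yields no steps
    if not isinstance(plan, dict) or not isinstance(plan.get("nodes"), list):
        raise ValueError('plan must look like {"nodes": [...]}')
    seen = set()
    for node in plan["nodes"]:
        if not isinstance(node, dict):
            raise ValueError(f"node must be a dict, got {node!r}")
        idx = node.get("idx")
        if type(idx) is not int:  # rejects bool, a subclass of int
            raise ValueError(f"idx must be a plain int, got {idx!r}")
        if idx in seen:
            raise ValueError(f"duplicate idx {idx}")
        seen.add(idx)
        if node.get("kind") not in ("decision", "tool_call"):
            raise ValueError(f"node {idx}: kind must be 'decision' or 'tool_call'")
    return plan["nodes"]
```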

Graph Kernel

The typed decision-graph construction and the structural queries the analyses read.

auditable.graph

Heterogeneous decision-graph construction and characterization (the kernel).

The signed records are the source of truth; this projects a normalized agent trace into a NetworkX MultiDiGraph for analysis and characterization. It is torch-free (NetworkX only); the heavy graph-OD stack (PyG, PyGOD) is optional and lives benchmark-side.

Two edge classes, matching the attachment model: execution edges (emits, handoff_to) are observed from the trace (State A); dependency edges (depends_on) are inferred or declared (State B) and are never read off the trace. The graph is a first-class, queryable part of the package; audit() stays the ergonomic capture entry and the user is never asked to build the graph by hand.

build_graph

build_graph(
    steps,
    *,
    dependency="full_context",
    shared_resource=True
)

Build the typed decision graph from a normalized trace.

steps are ordered dicts with keys idx (a unique int), agent (str), and kind ("decision" or "tool_call"). Execution edges come from the trace. Dependency edges follow dependency: "full_context" (each step depends on every prior step, the full-history assumption these multi-agent systems actually use), "chain" (each step depends only on its immediate predecessor), or "explicit" (each step depends on exactly the prior steps named in its own deps list of indices). "explicit" is the observed-dependency case: when the trace exposes which state a step actually read (tool I/O, file reads), the adapter records it in deps instead of inferring it. shared_resource adds one dependency_resource node that every step reads and writes (the shared blackboard); pass False when dependencies are modeled explicitly and the coarse blackboard would double-count.
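The three dependency modes can be sketched as edge generators over the ordered steps. dependency_edges is a hypothetical helper returning (dependent, dependency) pairs; it leaves out the execution edges and the shared_resource node.

```python
def dependency_edges(steps: list, dependency: str = "full_context") -> list:
    """(dependent, dependency) pairs; edges point dependent -> dependency."""
    order = [s["idx"] for s in steps]
    edges = []
    for pos, step in enumerate(steps):
        if dependency == "full_context":
            targets = order[:pos]  # every prior step
        elif dependency == "chain":
            targets = order[max(pos - 1, 0):pos]  # immediate predecessor only
        elif dependency == "explicit":
            targets = list(step.get("deps", []))  # exactly the steps it names
        else:
            raise ValueError(f"unknown dependency mode {dependency!r}")
        edges.extend((step["idx"], t) for t in targets)
    return edges


trace = [{"idx": 0}, {"idx": 1}, {"idx": 2, "deps": [0]}]
print(dependency_edges(trace, "full_context"))  # [(1, 0), (2, 0), (2, 1)]
print(dependency_edges(trace, "explicit"))      # [(2, 0)]
```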

dependency_dag

dependency_dag(G)

The depends_on projection as a simple DiGraph (step -> what it relied on).

characterize

characterize(G)

Structural properties of one decision graph (the per-graph measurement).

downstream_reach

downstream_reach(G, step_idx)

How many steps (transitively) depend on step_idx -- its blast radius.

execution_reach

execution_reach(G, step_idx)

How many steps transitively FOLLOW step_idx in control flow -- the plan keystone.

Twin of downstream_reach, but over the EXECUTION projection (handoff_to edges) instead of depends_on. In SessionGraph.to_networkx, handoff_to points predecessor -> successor, so the transitive followers of a node are its DESCENDANTS in that projection (contrast downstream_reach, where depends_on points dependent -> dependency and the blast set is the ANCESTORS). Returns 0 if the node is not in the graph.
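Both reach counts, sketched with an iterative traversal over plain edge lists instead of NetworkX. The only difference is edge direction: followers are descendants along handoff_to, while the blast set is found by walking depends_on in reverse. The function signatures (explicit nodes and edge lists), the missing-node guard on downstream_reach, and the tie-break in plan_keystone (lowest idx wins) are assumptions.

```python
def _reach(edges: list, start: int) -> set:
    """Nodes reachable from start by following (u, v) edges from u to v."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, stack = set(), [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def execution_reach(nodes: set, handoff: list, idx: int) -> int:
    """Control-flow followers: descendants along handoff (predecessor -> successor)."""
    if idx not in nodes:
        return 0
    return len(_reach(handoff, idx))


def downstream_reach(nodes: set, depends_on: list, idx: int) -> int:
    """Blast radius: ancestors in depends_on (dependent -> dependency), via reversed edges."""
    if idx not in nodes:
        return 0
    return len(_reach([(v, u) for u, v in depends_on], idx))


def plan_keystone(nodes: set, handoff: list) -> int:
    # argmax of execution_reach; max() keeps the first maximum, so ties go to the lowest idx
    return max(sorted(nodes), key=lambda n: execution_reach(nodes, handoff, n))


nodes = {0, 1, 2, 3}
handoff = [(0, 1), (1, 2), (1, 3)]
depends_on = [(1, 0), (2, 1), (3, 1)]
print(execution_reach(nodes, handoff, 1), downstream_reach(nodes, depends_on, 0))  # 2 3
```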