API Reference¶
The stable public surface is the flat top-level API. The graph analyses (PRE and the
graph kernel) live under auditable.graph.*.
Top-Level API¶
This section covers the capture, replay, and recovery flow, the standalone
auditors, the POST offline-analysis entry, and render_report (the dependency-free
Markdown renderer for both the PRE and POST reports; see the Quickstart).
auditable ¶
auditable: capture, replay, and recover AI agent decisions.
Open-source SDK for live-state replay and recovery of consequential agent decisions. Capture a signed record of each decision with the dependency state it relied on, replay it under the state that is live now, and route and execute a fix (allow, block, human-review, rollback). One decision binds a data, model, and harness report in a single record (the full chain).
Action
dataclass
¶
What the agent is about to do (or did).
Auditor ¶
Bases: ABC
Detect-face base class, the analog of pyod's BaseDetector.
A concrete auditor sets stage and name and implements assess, returning a
normalized Report. The subject type is stage-specific (a snapshot, a model
span, an action); the uniform part is the Report return, not the input.
DecisionRecord
dataclass
¶
digest ¶
Content hash over the record body (minus record_id), chained via prev.
The body includes the three leaf reports and the compound, so the record digest transitively commits to all leaves.
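The chaining described above can be sketched roughly as follows. This is a minimal illustration, not auditable's implementation: the field layout, the JSON canonicalization, the choice of SHA-256, and the helper name chained_digest are all assumptions made for the example.

```python
import hashlib
import json
from typing import Optional


def chained_digest(body: dict, prev: Optional[str]) -> str:
    """Hash a record body (minus record_id) together with the previous digest.

    Hypothetical helper: the real record layout and hash are not shown in
    this reference, so SHA-256 over sorted-key JSON is an assumption.
    """
    payload = {k: v for k, v in body.items() if k != "record_id"}
    payload["prev"] = prev  # chaining: each digest commits to its predecessor
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two honest records, then the same chain with the first record altered.
first = chained_digest({"record_id": "r1", "compound": {"score": 0.1}}, prev=None)
second = chained_digest({"record_id": "r2", "compound": {"score": 0.7}}, prev=first)

tampered_first = chained_digest({"record_id": "r1", "compound": {"score": 0.9}}, prev=None)
tampered_second = chained_digest({"record_id": "r2", "compound": {"score": 0.7}}, prev=tampered_first)
```

Because the second digest is computed over the first, changing the first record's body changes every later digest, which is what lets a record digest commit transitively to the leaves it contains.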
DependencySnapshot
dataclass
¶
The dependency state the agent relied on at decision time.
state holds the versioned dependencies, for example
{"budget_remaining": 5000, "allow_list_version": 7, "policy_id": "kyc-2026-03"}.
captured_at is the snapshot's own timestamp, which may lag the decision; a stale
snapshot is the failure mode auditable is built to surface.
Report
dataclass
¶
The normalized leaf an Auditor returns. Uniform across stages.
score is normalized to [0, 1] (0 normal, 1 maximal risk); it is the field that
lets the compound combine reports across stages. flag is a short label such as
"ok", "stale", "low_trust", or "over_cap". evidence holds the
detector-specific detail behind the score.
CompoundReport
dataclass
¶
A transparent bundle over the per-stage leaves (v0.1).
reports preserves the per-stage breakdown, because an auditor usually needs to
know which stage flagged a decision. uncalibrated_score is an explicit debug
aggregate (the max of per-stage scores), not decision-grade and not used by the gate.
v0.2 replaces it with a calibrated combined risk.
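A minimal sketch of the max aggregate, using hypothetical stand-in classes (LeafReport and Bundle are not the library's types, and returning 0.0 for an empty bundle is an assumption):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LeafReport:
    """Hypothetical stand-in for a per-stage report."""
    stage: str
    score: float  # normalized to [0, 1]
    flag: str = "ok"


@dataclass
class Bundle:
    """Hypothetical stand-in for a bundle over per-stage leaves."""
    reports: List[LeafReport] = field(default_factory=list)

    @property
    def uncalibrated_score(self) -> float:
        # Debug aggregate only: the max of the per-stage scores.
        # The 0.0 default for an empty bundle is an assumption.
        return max((r.score for r in self.reports), default=0.0)


bundle = Bundle(
    reports=[
        LeafReport("data", 0.1),
        LeafReport("model", 0.8, flag="low_trust"),
        LeafReport("harness", 0.3),
    ]
)
worst_stage = max(bundle.reports, key=lambda r: r.score).stage
```

Keeping the per-stage list next to the aggregate is what lets a reader see which stage drove the number.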
Decision ¶
Handle yielded by audit. Fill in inputs, model, action, and attach reports.
FileSink ¶
Append-only JSONL sink: one signed record per line, durable across process exit.
A second concrete sink alongside MemorySink so the signed log survives the process
and the pluggable-sink abstraction has more than one implementation.
MemorySink ¶
Default in-process sink. Signs each record and chains it to the previous one.
ReplayUndecidable ¶
Bases: Exception
Raised by a policy that cannot re-decide deterministically under a given state.
DataAuditor ¶
ModelAuditor ¶
Bases: Auditor
A thin trust flag on the deciding model (v0.1).
Heuristic: a stated decision basis raises trust, its absence lowers it. The score is
the risk (1 - trust); the trust value is kept in evidence. v0.3 replaces this with
TrustLLM trust signals on the model and its output.
ActionGate ¶
The concrete v0.1 control surface. Maps a replay verdict to an executed fix.
Side-effect timing is explicit. enforce_pre_commit runs before the action (allow,
block, or hold). enforce_post_commit runs after the action committed through the
rail (allow, hold, or roll back via rail.compensate). A routed verdict that cannot
execute a compensating action is observability, not control.
HarnessAuditor ¶
Bases: Auditor
A thin static rule on the action: flag spend over a static cap (v0.1).
The static cap is the forward, point-in-time check that incumbents already run. Its
report is the harness audit leaf; replay-under-live-state (in chain) is layered on
top of it. The score ramps from 0 at the cap to 1 at twice the cap.
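One way to read the ramp is as a clamped linear function of the overspend. The sketch below assumes a linear shape between the cap and twice the cap, which the text does not state; the function name over_cap_score is hypothetical.

```python
def over_cap_score(spend: float, cap: float) -> float:
    """Risk score for spend against a static cap.

    Assumes a linear ramp: 0 at or below the cap, 1 at twice the cap or more.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    return min(1.0, max(0.0, (spend - cap) / cap))


scores = [over_cap_score(s, cap=100.0) for s in (50.0, 100.0, 150.0, 200.0, 300.0)]
```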
Rail ¶
Bases: Protocol
Any commit/compensate backend (a payment rail, a record store, a ledger).
ReferenceLedger ¶
In-process reference rail: commit spends, compensate refunds. For demo and tests.
AnalysisReport
dataclass
¶
The result of :func:analyze_run: the run's structural risk plus grounding.
Fields the user reads:
- state: scored / no_score:single_decision / no_score:low_coverage (the same honest gate structural_risk applies; a no-score state still reports coverage and the descriptive structure).
- ranked: every step as a :class:DecisionRisk, highest structural risk first; in a no-score state the scores are None and the order is by index.
- keystone: the worst-blast step (what most of the run rests on), or None when the run is not scored.
- per_session: the keystone's blast share (the run-level risk), or None.
- coverage: dependency-edge coverage with the saturation ratio rho and the per-grade breakdown (observed / declared / inferred).
- grounding: per step index, the model-basis grounding where a basis is stated. Empty for a corpus tool trace (no step states a model basis); it lights up on auditable's own records, which carry decision_basis and read context.
- completeness: complete (offline) now; prefix for the v0.3b live path, with no field change.
- adapter: the ingestion adapter id (<name>_<version>), so the report names the source that produced it.
- features: the raw layered structural features (descriptive, present in every state).
- notes: plain-language honesty notes (modeled corpus edges, low coverage, the no-calibration statement).
to_markdown ¶
Render this POST report as Markdown (the additive copy-pasteable form).
Thin delegate to :func:auditable.report.post_to_markdown; the plaintext
summary / __str__ are unchanged. Imported lazily to avoid an import
cycle (report.py imports this module).
Adapter ¶
Bases: Protocol
Map one source into typed steps. The stable public ingestion contract.
An adapter carries a name and a version (so a produced graph records
which adapter built it) and implements to_steps, which turns a source
(a public-corpus trajectory, an auditable run's own records, or a live
framework stream) into the Step list SessionGraph.from_steps reads.
The protocol is runtime_checkable, so an instance with these three
members satisfies isinstance(obj, Adapter) without subclassing.
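The structural check can be illustrated with a stand-alone protocol of the same shape. AdapterLike, ToyAdapter, and NotAnAdapter below are hypothetical names for the sketch; the step dict layout is also an assumption and is not the library's Step type.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class AdapterLike(Protocol):
    """Hypothetical protocol with the three members described above."""
    name: str
    version: str

    def to_steps(self, source: Any) -> List[dict]: ...


class ToyAdapter:
    """No subclassing: it simply has name, version, and to_steps."""
    name = "toy"
    version = "v1"

    def to_steps(self, source: Any) -> List[dict]:
        # Illustrative step layout only.
        return [{"idx": i, "kind": "decision", "payload": item} for i, item in enumerate(source)]


class NotAnAdapter:
    """Has a name but lacks version and to_steps."""
    name = "broken"


toy = ToyAdapter()
steps = toy.to_steps(["approve", "refund"])
```

A runtime_checkable protocol only verifies that the members exist; it does not check signatures or return types, so an adapter still needs its own tests.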
audit ¶
Capture one agent decision with the dependency snapshot it relied on.
replay ¶
Re-derive whether the decision still holds under the live dependency state.
Pure: returns a verdict, executes nothing. The agent acted under
record.data.snapshot.state; we re-evaluate the same action against live_state.
If the action was justified on the snapshot but not on live state, it relied on stale
or drifted state and we route a ROLLBACK. A policy that cannot decide raises
ReplayUndecidable and we return HUMAN_REVIEW.
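The routing above can be sketched as a pure function. Everything here is illustrative: Undecidable stands in for ReplayUndecidable, the verdicts are plain strings, budget_policy is a made-up policy, and the ALLOW / BLOCK fallthrough for the cases the text does not spell out is an assumption.

```python
class Undecidable(Exception):
    """Stand-in for ReplayUndecidable in this sketch."""


def budget_policy(action: dict, state: dict) -> bool:
    """Hypothetical policy: a spend is justified if it fits the remaining budget."""
    if "budget_remaining" not in state:
        raise Undecidable("no budget in state")
    return action["amount"] <= state["budget_remaining"]


def replay_sketch(action: dict, snapshot_state: dict, live_state: dict, policy) -> str:
    """Re-evaluate the same action under both states; execute nothing."""
    try:
        held_then = policy(action, snapshot_state)
        holds_now = policy(action, live_state)
    except Undecidable:
        return "HUMAN_REVIEW"
    if held_then and not holds_now:
        return "ROLLBACK"  # justified on the snapshot, not on live state
    # Assumption: the remaining cases map to ALLOW / BLOCK.
    return "ALLOW" if holds_now else "BLOCK"


spend = {"amount": 3000}
stale = replay_sketch(spend, {"budget_remaining": 5000}, {"budget_remaining": 1000}, budget_policy)
fresh = replay_sketch(spend, {"budget_remaining": 5000}, {"budget_remaining": 5000}, budget_policy)
unknown = replay_sketch(spend, {"budget_remaining": 5000}, {}, budget_policy)
```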
analyze_run ¶
Analyze one agent run offline: structural risk plus model-basis grounding.
source is whatever the adapter consumes (a public-corpus trajectory, a
chain of auditable's own DecisionRecords, or, in v0.3b, a live stream).
adapter is any :class:~auditable.graph.adapters.protocol.Adapter (it
exposes to_steps plus a name / version). The call maps the source to
typed steps, builds the :class:SessionGraph, scores it with
:func:structural_risk, grounds each step that states a basis, and returns an
:class:AnalysisReport.
Set ground=False to skip the (cheap, deterministic) grounding pass. Scoring
requires the graph extra (NetworkX); without it the underlying projection
raises a clear ImportError.
render_report ¶
Render a PRE or POST report to Markdown, dispatched by type.
PreReport renders through :func:pre_to_markdown; AnalysisReport
through :func:post_to_markdown. Any other type raises TypeError. This is
the single import a caller reaches for when they do not want the
report.to_markdown() method.
The PreReport / AnalysisReport imports are function-local: report.py
sits beside analysis.py, and importing both report modules at load time
would risk an import cycle.
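The dispatch pattern can be sketched with stand-in types. PreStub, PostStub, and the two renderer functions below are hypothetical; in the real module the report classes would be imported inside the function body, which this single-file sketch can only note in a comment.

```python
class PreStub:
    """Hypothetical stand-in for a PRE report."""


class PostStub:
    """Hypothetical stand-in for a POST report."""


def pre_md(report: PreStub) -> str:
    return "# PRE report"


def post_md(report: PostStub) -> str:
    return "# POST report"


def render(report) -> str:
    """Pick a renderer by type; reject anything else."""
    # In a multi-module layout, the report classes would be imported here,
    # inside the function, so module load does not create an import cycle.
    if isinstance(report, PreStub):
        return pre_md(report)
    if isinstance(report, PostStub):
        return post_md(report)
    raise TypeError(f"unsupported report type: {type(report).__name__}")


pre_out = render(PreStub())
post_out = render(PostStub())
try:
    render("not a report")
    rejected = False
except TypeError:
    rejected = True
```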
Report Rendering¶
The Markdown renderer for the typed reports: render_report (the top-level
dispatcher), the per-pillar pre_to_markdown and post_to_markdown, and the
to_markdown method on each report. It formats the fields the report already
carries and is standard-library only.
auditable.report ¶
Markdown rendering for the analysis reports: one dependency-free renderer.
analyze_plan (PRE) and analyze_run (POST) each return a typed report whose
summary() is terse indented plaintext. This module adds a second, additive
surface: a clean Markdown form of the SAME typed fields, suitable for pasting into
a pull request, an issue, or a design doc. It computes nothing new. The PRE report
already carries the execution-topology keystone, the four lint findings, the
preflight coverage views, and the notes; the POST report already carries the
blast-radius keystone, the ranked decisions, the grounding, and the notes. The
renderer only formats what is there.
Two reach paths produce the same string:
- report.to_markdown() on each report object (a thin method that delegates here), paralleling the existing report.summary().
- render_report(report), a top-level dispatcher that picks the right renderer by type. This is the single import for a caller who does not want the method.
Both PRE and POST render five labeled parts: the lifecycle stage (the banner plus a meta line), what is risky on the graph, the keystone (PRE: an execution-topology chokepoint; POST: a dependency-DAG blast-radius keystone, two distinct concepts the wording keeps apart), the per-finding detail, and a short "what to do" line.
Dependency-free by design: standard library only, no NetworkX, no templating
engine, no table library. The PreReport / AnalysisReport imports are done
lazily inside the functions, because report.py sits beside analysis.py and
would otherwise import both auditable.graph.pre and auditable.analysis at
module load and risk an import cycle.
pre_to_markdown ¶
Render a :class:~auditable.graph.pre.PreReport as Markdown.
Mirrors the section order of PreReport.summary so the two stay
recognizable, and keeps the PRE keystone labeled as an execution-topology
chokepoint (distinct from the POST blast-radius keystone). No new computation:
every value is read off report.
post_to_markdown ¶
Render an :class:~auditable.analysis.AnalysisReport as Markdown.
Mirrors the section order of AnalysisReport.summary so the two stay
recognizable, and keeps the POST keystone labeled as a dependency-DAG
blast-radius keystone (distinct from the PRE execution-topology chokepoint).
No new computation: every value is read off report.
render_report ¶
Render a PRE or POST report to Markdown, dispatched by type.
PreReport renders through :func:pre_to_markdown; AnalysisReport
through :func:post_to_markdown. Any other type raises TypeError. This is
the single import a caller reaches for when they do not want the
report.to_markdown() method.
The PreReport / AnalysisReport imports are function-local: report.py
sits beside analysis.py, and importing both report modules at load time
would risk an import cycle.
PRE: Declared-Plan Analysis¶
This section covers analyze_plan, the four reachability lints, the
execution-topology keystone, the preflight coverage report, and the PreReport it
returns.
auditable.graph.pre ¶
PRE: design-time lints over a DECLARED agent plan (before any run).
The PRE entry mirrors :func:auditable.analysis.analyze_run's adapter ->
SessionGraph flow, but computes only the parts that are honest before a single
step executes. A DECLARED plan (a LangGraph compiled graph, a CrewAI task DAG, or
an AutoGen topology, lowered through :class:DeclaredPlanAdapter into the neutral
plan dict) carries control flow and declared data reads / writes, but no observed
values. So PRE does two things and withholds a third:
- Execution-topology keystone. The structural chokepoint of the declared plan: the node the most other nodes transitively FOLLOW in control flow, via :func:execution_reach over the handoff_to projection. This is a STRUCTURAL design lint (a chokepoint), NOT the POST blast-radius keystone from :mod:auditable.graph.risk (a blast-share triage signal over the dependency DAG). The two are distinct named concepts and must not be conflated.
- Four reachability lints over the projected declared graph. All four are pure, read-only NetworkX queries: no mutation of the graph, no side effects, no runtime / value execution. The "would it flip" and drift-confirmation halves are runtime work, explicitly out of scope and noted in each finding's detail.
- State-B (dependency-state) blast-share risk is WITHHELD. A declared dependency layer is declared-only (observed_fraction=0), so a multi-step plan makes :func:structural_risk return no_score:low_coverage (a 0- or 1-step plan is gated earlier as no_score:single_decision). Either way no number is emitted: PRE asserts the boundary as "the verdict is some no_score:* state" and surfaces state_b_risk=None with a reason string; it never presents a dependency-state risk number. Only a SCORED verdict on a declared graph violates the boundary, and then :func:analyze_plan raises rather than emit the number.
Alongside the withheld State-B number, PRE attaches a Preflight Coverage
Report: a descriptive, calibrated coverage-readiness view, NOT a risk score. It
reuses the existing :meth:SessionGraph.coverage model and the declared
resource-touch metadata to tell the user what the runtime scorer will need before
it can score (preflight_coverage), which declared touches still lack a resource
identity (resource_touch_completeness), and where declared revalidation
barriers exist per resource (barrier_inventory). This strengthens PRE without
selling a false score and leaves the State-B withhold boundary above unchanged.
PRE applies only where a declared graph exists. A free-form ReAct agent with no declared plan degrades to the flat rule floor and is out of scope here.
This module lives under auditable.graph.* and adds no top-level public export.
LintFinding
dataclass
¶
One PRE lint hit: a structural design issue read off the declared graph.
- lint: the lint name (e.g. 'write_with_no_prior_read').
- node_idx: the offending step idx.
- resource_id: the resource the finding is about, or None when not resource-specific.
- detail: a one-line human reason. For the annotation-only halves it states that the runtime / value confirmation is out of scope at PRE.
- severity: 'warning' by default; PRE findings are structural design warnings, not validated failure predictions.
PreflightCoverage
dataclass
¶
The existing coverage() model surfaced over the DECLARED graph.
Descriptive, NOT a risk number. Reads :meth:SessionGraph.coverage plus the
exact no_score:* state :func:structural_risk would apply at runtime, so
the user can see the grade mix and why the runtime scorer will withhold a
State-B number rather than guessing one here.
- n_steps: plan node count (the size-normalized risk denominator basis).
- n_dep_edges: total dependency edges (every one DECLARED at PRE).
- observed / declared / inferred: the grade-mix counts from coverage().by_grade (the same three :class:Grade buckets, flattened to plain ints for legibility).
- observed_fraction / rho: the observed share and the saturation ratio from coverage(); at PRE observed_fraction is 0.0 on any non-empty declared layer.
- no_score_reason: the exact no_score:* state structural_risk applies -- no_score:low_coverage for a multi-step declared plan, no_score:single_decision for a 0- or 1-step plan. This is the reason the State-B score is withheld, surfaced descriptively (it is never a number).
- would_score: always False at PRE; present so the contract reads explicitly as "the runtime scorer cannot score this declared layer yet".
ResourceGap
dataclass
¶
One declared touch (a read, write, or dependency edge) lacking a resource id.
- kind: 'read' / 'write' / 'edge'.
- node_idx: the owning step idx (the dependent step for an 'edge').
- src_idx: only for 'edge' -- the producer step the edge points at.
- detail: a one-line reason naming the missing identity.
ResourceTouchCompleteness
dataclass
¶
Which declared touches carry a resource identity, and which do not.
The runtime touch contract (v0.3b own-record) matches a later read to an
earlier write of the same {namespace, resource_id, key} and fills
:class:~auditable.graph.session.ResourceRef on the observed edge. At PRE no
edge is observed, so this view reports, descriptively, which writes, reads, and
declared dependency edges are still missing an identity the runtime contract
will need:
- a read / write is complete when its node_attrs id string is non-empty;
- a declared dependency edge is complete when it carries a resource id, either the structured DependencyEdge.resource (ResourceRef) or the evidence['resource_id'] string the declared adapter records.
Counts plus the per-touch gap list are exposed so a caller can see both the
headline (writes_with_id of n_writes) and the exact offending touch.
edges_missing_structured_resource separately counts edges that carry an
evidence['resource_id'] but a None structured resource -- the
declared-corpus norm, and exactly the seam the runtime contract fills.
BarrierInventory
dataclass
¶
The declared re-read / re-validation nodes, grouped per resource (structure only).
A barrier is a node that re-reads (revalidates) a resource: the declared
adapter records it in node_attrs['barriers'] (a read flagged
revalidates). This view lists, per resource id, the step idxs that declare a
revalidation barrier for it, and the flat set of resources that have at least
one barrier. It is reported as STRUCTURE: a resource that appears as a volatile
read but is absent from by_resource has no declared barrier, which a
consuming view can surface without claiming any drift occurred (drift
confirmation is runtime work, out of scope at PRE).
- by_resource: resource id -> sorted list of barrier step idxs.
- barrier_nodes: sorted list of every step idx that declares any barrier.
- resources_with_barrier: sorted list of resource ids that have a barrier.
PreReport
dataclass
¶
The result of :func:analyze_plan: the PRE-honest view of a DECLARED plan.
- adapter: the ingestion adapter id (declared_plan_v1).
- n_steps: number of plan nodes.
- keystone_idx / keystone_followers: the execution-topology keystone (the argmax of :func:execution_reach) and its transitive control-flow followers. This is the STRUCTURAL chokepoint of the declared plan, NOT the POST blast-radius keystone from :mod:auditable.graph.risk.
- execution_reach_by_idx: every step idx -> its transitive control-flow followers.
- findings: the four lints' :class:LintFindings.
- state_b_risk / state_b_withheld / state_b_withheld_reason: the dependency-state blast-share risk is ALWAYS withheld at PRE (declared-only); the number is never computed. state_b_risk stays None and state_b_withheld stays True.
- preflight_coverage / resource_touch_completeness / barrier_inventory: the Preflight Coverage Report -- a descriptive, calibrated coverage-readiness view (the grade mix, the exact no-score reason the runtime scorer will apply, which declared touches lack a resource identity, and the declared revalidation barriers per resource). It is NOT a risk number and does not touch the State-B withhold boundary.
- notes: plain-language notes, including that the keystone is a structural chokepoint (a design lint), not a failure predictor.
to_markdown ¶
Render this PRE report as Markdown (the additive copy-pasteable form).
Thin delegate to :func:auditable.report.pre_to_markdown; the plaintext
summary / __str__ are unchanged. Imported lazily to avoid an import
cycle (report.py imports this module).
execution_projection ¶
The handoff_to projection as a simple DiGraph (predecessor -> successor).
Step nodes plus the execution (control-flow) edges, dropping the dependency and
emits layers. handoff_to points predecessor -> successor, so a node's
transitive control-flow FOLLOWERS are its descendants here (the basis for
:func:auditable.graph.execution_reach).
write_with_no_prior_read ¶
Fire when a node writes a resource never read in its backward slice.
Primitive: nx.descendants over dependency_dag(G) from the write node (the
backward slice = what the action transitively rests on, because depends_on
points dependent -> dependency), cross-referenced against the reads resource
sets in node_attrs of the slice nodes plus the writer itself. For each step W
whose writes is non-empty, FIRE one finding per written resource R that is
NOT in the union of reads over {W} U slice.
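The rule can be sketched without NetworkX by walking a plain adjacency dict. The descendants helper below is a hand-rolled stand-in for nx.descendants, and the dict-based inputs (depends_on, reads, writes) are assumptions for illustration rather than the library's graph representation.

```python
def descendants(adj: dict, start) -> set:
    """Every node reachable from start, excluding start itself."""
    seen, stack = set(), list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    seen.discard(start)
    return seen


def writes_without_prior_read(depends_on: dict, reads: dict, writes: dict) -> list:
    """One (writer, resource) pair per written resource never read in the slice."""
    findings = []
    for writer in sorted(writes):
        # Backward slice: depends_on points dependent -> dependency.
        slice_nodes = descendants(depends_on, writer) | {writer}
        read_union = set().union(*(reads.get(n, set()) for n in slice_nodes))
        for resource in sorted(writes[writer] - read_union):
            findings.append((writer, resource))
    return findings


depends_on = {1: [0]}             # step 1 depends on step 0
reads = {0: {"balance"}}          # step 0 read the balance
writes = {1: {"balance"}, 2: {"ledger"}}
findings = writes_without_prior_read(depends_on, reads, writes)
```

Step 1 writes a resource its slice read, so it is quiet; step 2 writes a resource nothing in its slice read, so it fires.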
flippable_dependency_annotations ¶
Annotate unpinned, non-revalidated volatile dependencies feeding a decision.
Primitive: nx.descendants over dependency_dag(G) from each decision node
(its backward slice / dependency set), intersected with the per-edge evidence
flags on the DECLARED depends_on edges. For each decision D, over the DECLARED
depends_on edges on D's backward slice that carry evidence['volatile'], FIRE
one annotation per such edge / resource that is neither evidence['pinned'] nor
evidence['revalidates']. This is an ANNOTATION, not a value-flip proof:
severity stays 'warning' and the detail says the would-flip half needs runtime
values.
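The edge filter at the heart of this lint can be sketched as below. The sketch assumes the edges have already been restricted to one decision's backward slice and are plain dicts with an evidence mapping; both the edge layout and the function name are hypothetical.

```python
def flippable_edges(slice_edges: list) -> list:
    """Keep volatile edges that are neither pinned nor revalidated."""
    flagged = []
    for edge in slice_edges:
        evidence = edge.get("evidence", {})
        if evidence.get("volatile") and not evidence.get("pinned") and not evidence.get("revalidates"):
            flagged.append(evidence.get("resource_id"))
    return flagged


# Edges on one decision's backward slice (illustrative data).
edges = [
    {"evidence": {"resource_id": "fx_rate", "volatile": True}},
    {"evidence": {"resource_id": "allow_list", "volatile": True, "pinned": True}},
    {"evidence": {"resource_id": "limit", "volatile": True, "revalidates": True}},
    {"evidence": {"resource_id": "policy_id"}},
]
flagged = flippable_edges(edges)
```

Only the unpinned, non-revalidated volatile dependency is annotated; whether it would actually flip the decision cannot be answered without runtime values.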
scope_vs_snapshot ¶
Fire when granted tool scope strictly exceeds the snapshot the node read.
Primitive: set comparison of the declared scope (node_attrs['scope'], the
granted resource ids) versus the read-resource set actually pulled into the
node's snapshot, computed as the union of reads over {N} U
nx.descendants(dependency_dag(G), 'step::N'). For each node N whose scope
is present, FIRE when set(scope) is a STRICT superset of the read set (the
grant exceeds the snapshot it validated; it can act on state it never read). The
reported resource_id per finding is one of scope - read_set.
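The strict-superset test maps directly onto Python's set comparison. The helper below is a hypothetical sketch that takes the read set as already computed.

```python
def scope_excess(scope, read_set: set) -> list:
    """Resources granted but never read, only when scope strictly exceeds reads."""
    scope_set = set(scope)
    if scope_set > read_set:  # strict superset: contains every read, plus more
        return sorted(scope_set - read_set)
    return []


over_granted = scope_excess(["accounts", "ledger", "limits"], {"accounts", "ledger"})
exact_match = scope_excess(["accounts", "ledger"], {"accounts", "ledger"})
not_superset = scope_excess(["accounts", "limits"], {"accounts", "ledger"})
```

Note that a scope which merely differs from the read set, without containing all of it, does not fire under the strict-superset rule.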
missing_revalidation_barrier ¶
Fire when a volatile read reaches an action with no re-read between them.
Two-projection query. First, nx.descendants over dependency_dag(G) locates
a volatile read upstream of a consequential action (the backward slice). Then
nx.descendants over :func:execution_projection (handoff_to) checks the
control path from the volatile-read node to the action contains NO intervening
barrier (a node re-reading that resource, i.e. with the resource in its
node_attrs['barriers'] set).
For each consequential action A (writes non-empty, or a decision with a
volatile dependency), for each volatile read node V in A's backward slice whose
resource R is volatile, FIRE when there is NO node B with R in its barrier set on
a control path strictly between V and A in handoff order. No finding when such a
barrier B exists.
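The "strictly between" condition can be sketched over a plain handoff adjacency dict. The descendants helper is a hand-rolled stand-in for nx.descendants, and the sketch assumes the volatile read V and the action A have already been located by the dependency-slice step; names and data are illustrative.

```python
def descendants(adj: dict, start) -> set:
    """Every node reachable from start, excluding start itself."""
    seen, stack = set(), list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    seen.discard(start)
    return seen


def has_barrier_between(handoff: dict, barriers: dict, v, a, resource: str) -> bool:
    """True if some node B after V and before A re-reads the resource."""
    for b in descendants(handoff, v):
        if b == a or resource not in barriers.get(b, set()):
            continue
        if a in descendants(handoff, b):  # A still follows B in handoff order
            return True
    return False


handoff = {0: [1], 1: [2], 2: [3]}  # predecessor -> successor
guarded = has_barrier_between(handoff, {2: {"fx_rate"}}, v=0, a=3, resource="fx_rate")
unguarded = has_barrier_between(handoff, {}, v=0, a=3, resource="fx_rate")
wrong_resource = has_barrier_between(handoff, {2: {"limit"}}, v=0, a=3, resource="fx_rate")
```

The lint fires exactly when this check returns False.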
resource_touch_completeness ¶
Report which declared reads, writes, and dependency edges carry a resource id.
Pure read-only scan of the projected graph: it reads node_attrs['reads'] /
['writes'] on each step node and the evidence / resource on each
depends_on edge. A read or write counts as identified when its id string is
non-empty; a dependency edge counts as identified when it carries either a
structured :class:ResourceRef (resource) or an evidence['resource_id']
string. Edges that carry only the evidence string but a None structured
resource are counted separately, since that ResourceRef is exactly what
the runtime touch contract fills.
barrier_inventory ¶
Inventory the declared revalidation barriers per resource (structure only).
Pure read-only scan of node_attrs['barriers'] (the reads flagged
revalidates by the declared adapter). Returns, per resource id, the sorted
step idxs that declare a barrier for it, plus the flat barrier-node and
barrier-resource sets. No drift is claimed: this is the declared structure of
where revalidation re-reads exist, which a consuming view can compare against
the volatile-read set to spot a missing barrier without runtime values.
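A minimal sketch of the inventory, assuming the per-node barrier sets are available as a plain dict from step idx to resource ids (the input shape and function name are assumptions):

```python
def barrier_inventory_sketch(node_barriers: dict) -> dict:
    """Group declared barrier step idxs per resource, with flat sorted views."""
    by_resource: dict = {}
    for idx, resources in node_barriers.items():
        for resource_id in resources:
            by_resource.setdefault(resource_id, []).append(idx)
    by_resource = {rid: sorted(idxs) for rid, idxs in sorted(by_resource.items())}
    return {
        "by_resource": by_resource,
        "barrier_nodes": sorted(idx for idx, res in node_barriers.items() if res),
        "resources_with_barrier": sorted(by_resource),
    }


inventory = barrier_inventory_sketch({1: {"fx_rate"}, 3: {"fx_rate", "limit"}, 4: set()})

# A consuming view can compare against the volatile reads without runtime values.
volatile_reads = {"fx_rate", "kyc_status"}
unbarriered = sorted(volatile_reads - set(inventory["resources_with_barrier"]))
```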
analyze_plan ¶
Run PRE over a DECLARED plan: execution-topology keystone + reachability lints.
Mirrors :func:auditable.analysis.analyze_run's adapter -> SessionGraph flow, but
computes only the PRE-honest parts. The plan is lowered through adapter (default
:data:declared_plan_v1) into typed steps, projected to NetworkX once, and the four
pure lints run over it. The execution-topology keystone is the argmax of
:func:execution_reach (the structural chokepoint of the declared plan).
State-B (dependency-state) blast-share risk is explicitly WITHHELD: the declared
dependency layer is declared-only, so :func:structural_risk returns a
no_score:* verdict (no_score:low_coverage for a multi-step plan,
no_score:single_decision for a 0- or 1-step plan). This function asserts that
boundary -- the verdict must be some no_score state, never SCORED; if a scored
verdict ever came back here, it raises rather than emit a number the evidence does
not support. Scoring requires the graph extra (NetworkX); without it the
projection raises a clear ImportError.
Graph Adapters¶
The ingestion extension point and the shipped adapters, including the declared-plan adapter the PRE pillar consumes.
auditable.graph.adapters ¶
Source-agnostic ingestion adapters: map a source into typed Steps.
The :class:Adapter protocol is the public extension point (the analog of
subclassing pyod's BaseDetector). The concrete adapters here are versioned
reference adapters: pinned by name and version, stable to call by that versioned
name, and used in the examples; the protocol, not any concrete class, is the
contract user code implements.
Each adapter turns one source into the typed Step list that
auditable.graph.SessionGraph and the structural scorer consume, so a public
corpus, an auditable run's own records, and (in v0.3b) a live framework stream
converge on one representation.
Adapters shipped this round:
- :data:tau_bench_prior_db_reads_v1 (corpus): a tau-bench-style trajectory to steps, with each consequential write depending on every prior DB read, graded OBSERVED but marked modeled in evidence (a conservative prior-read upper bound, not a causal label). Pure: messages in, steps out, no download.
- :data:own_record_v1 (own records): a chain of signed DecisionRecords to steps, execution edges from the prev_digest backbone and model attributes on each node, with sparse DECLARED dependency edges (no fabricated observed edges) until the v0.3b resource-touch contract lands.
- :data:declared_plan_v1 (declared plan, PRE): a framework-agnostic DECLARED agent plan dict to steps, control_preds -> execution edges and declared reads -> DECLARED dependency edges, the neutral target a LangGraph / CrewAI / AutoGen front-end would lower into (see :mod:auditable.graph.pre). Not a parser for any framework.
Adapter ¶
Bases: Protocol
Map one source into typed steps. The stable public ingestion contract.
An adapter carries a name and a version (so a produced graph records
which adapter built it) and implements to_steps, which turns a source
(a public-corpus trajectory, an auditable run's own records, or a live
framework stream) into the Step list SessionGraph.from_steps reads.
The protocol is runtime_checkable, so an instance with these three
members satisfies isinstance(obj, Adapter) without subclassing.
OwnRecordAdapter ¶
Bases: _BaseAdapter
Map a chain of DecisionRecords to typed steps (execution + model facet).
agent_label is the single actor's label under the homogeneous-model
assumption (the model identity rides on each node as model_id, not as the
agent). link_sequential toggles whether each non-genesis record declares a
dependency on its immediate predecessor; both settings keep the dependency
layer non-observed and low coverage this round.
to_steps ¶
Build one decision step per record, in log order.
Execution predecessors come from the prev_digest chain; dependency
edges are sparse and DECLARED (see the module docstring). Records are
read by duck-typing, so loaded, live, or stubbed records all work.
TauBenchPriorDBReadsAdapter ¶
Bases: _BaseAdapter
Map a tau-bench-style message trajectory to typed steps with modeled deps.
assistant_agent is the agent label for assistant turns (the messages do
not name the model, so it defaults to "assistant"); model_id, when
set, is attached to each assistant decision node so the model attribute is
populated without inventing an identity the trace does not carry.
to_steps ¶
Normalize one tau-bench task run (a list of role / tool messages) into typed steps. Read and write tool events are observed; each write depends on every prior DB read as a conservative, modeled prior-read upper bound.
DeclaredPlanAdapter ¶
Bases: _BaseAdapter
Lower a framework-agnostic DECLARED plan dict into typed steps.
name='declared_plan' / version='v1' (id='declared_plan_v1'), callable
via the inherited __call__. :meth:to_steps validates the plan shape and each
node, then emits one :class:Step per node with DECLARED-graded dependency edges
only. It is the seam a real LangGraph / CrewAI / AutoGen front-end would target;
it does not parse any of those frameworks itself.
to_steps ¶
Validate the declared-plan schema and lower it to typed steps.
The top-level shape must be {"nodes": [...]} (an empty / None plan
yields []). Each node's idx must be a plain, unique int, kind must
be "decision" or "tool_call", and every resource-ref must be a valid
string or dict. Reads naming a prior producer become DECLARED dependency
edges carrying the resource id and flags in evidence; reads / writes /
scope / barriers / volatile-reads are recorded in node_attrs so the PRE
lints are pure node + edge queries.
Graph Kernel¶
The typed decision-graph construction and the structural queries the analyses read.
auditable.graph ¶
Heterogeneous decision-graph construction and characterization (the kernel).
The signed records are the source of truth; this projects a normalized agent
trace into a NetworkX MultiDiGraph for analysis and characterization. It is
torch-free (NetworkX only); the heavy graph-OD stack (PyG, PyGOD) is optional and
lives benchmark-side.
Two edge classes, matching the attachment model: execution edges (emits,
handoff_to) are observed from the trace (State A); dependency edges
(depends_on) are inferred or declared (State B) and are never read off the
trace. The graph is a first-class, queryable part of the package; audit()
stays the ergonomic capture entry and the user is never asked to build the graph
by hand.
build_graph ¶
Build the typed decision graph from a normalized trace.
steps are ordered dicts with keys idx (a unique int), agent (str),
and kind ("decision" or "tool_call"). Execution edges come from the
trace. Dependency edges follow dependency: "full_context" (each step
depends on every prior step, the full-history assumption these multi-agent
systems actually use), "chain" (each step depends only on its immediate
predecessor), or "explicit" (each step depends on exactly the prior steps
named in its own deps list of indices). "explicit" is the observed-
dependency case: when the trace exposes which state a step actually read
(tool I/O, file reads), the adapter records it in deps instead of inferring
it. shared_resource adds one dependency_resource node that every step
reads and writes (the shared blackboard); pass False when dependencies are
modeled explicitly and the coarse blackboard would double-count.
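The three dependency modes differ only in which prior steps each step points at. The sketch below derives the (dependent, dependency) pairs from a step list; it is not build_graph itself, and dropping deps entries that do not name a prior step is an assumption.

```python
def dependency_pairs(steps: list, dependency: str = "full_context") -> list:
    """Return (dependent_idx, dependency_idx) pairs for one dependency mode."""
    order = [step["idx"] for step in steps]
    pairs = []
    for pos, step in enumerate(steps):
        prior = order[:pos]
        if dependency == "full_context":
            targets = prior                # every prior step
        elif dependency == "chain":
            targets = prior[-1:]           # immediate predecessor only
        elif dependency == "explicit":
            targets = [d for d in step.get("deps", []) if d in prior]
        else:
            raise ValueError(f"unknown dependency mode: {dependency!r}")
        pairs.extend((step["idx"], target) for target in targets)
    return pairs


trace = [
    {"idx": 0, "agent": "planner", "kind": "decision"},
    {"idx": 1, "agent": "tools", "kind": "tool_call"},
    {"idx": 2, "agent": "planner", "kind": "decision", "deps": [0]},
]
full = dependency_pairs(trace, "full_context")
chain = dependency_pairs(trace, "chain")
explicit = dependency_pairs(trace, "explicit")
```

The full-context mode grows quadratically with trace length, which is why the explicit mode is preferable whenever the trace exposes what each step actually read.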
dependency_dag ¶
The depends_on projection as a simple DiGraph (step -> what it relied on).
characterize ¶
Structural properties of one decision graph (the per-graph measurement).
downstream_reach ¶
How many steps (transitively) depend on step_idx -- its blast radius.
execution_reach ¶
How many steps transitively FOLLOW step_idx in control flow -- the plan keystone.
Twin of :func:downstream_reach, but over the EXECUTION projection (handoff_to
edges) instead of depends_on. In :meth:SessionGraph.to_networkx, handoff_to
points predecessor -> successor, so the transitive followers of a node are its
DESCENDANTS in that projection (contrast :func:downstream_reach, where
depends_on points dependent -> dependency and the blast set is the ANCESTORS).
Returns 0 if the node is not in the graph.
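The direction contrast can be made concrete with two small adjacency dicts. The helpers below are hand-rolled stand-ins for the NetworkX descendant and ancestor queries, and the function names with the _sketch suffix are hypothetical.

```python
def descendants(adj: dict, start) -> set:
    """Every node reachable from start, excluding start itself."""
    seen, stack = set(), list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    seen.discard(start)
    return seen


def nodes_of(adj: dict) -> set:
    return set(adj) | {t for targets in adj.values() for t in targets}


def execution_reach_sketch(handoff: dict, idx) -> int:
    """Followers in control flow: DESCENDANTS over predecessor -> successor."""
    return len(descendants(handoff, idx)) if idx in nodes_of(handoff) else 0


def downstream_reach_sketch(depends_on: dict, idx) -> int:
    """Blast set: ANCESTORS over dependent -> dependency (reverse, then walk)."""
    if idx not in nodes_of(depends_on):
        return 0
    reversed_adj: dict = {}
    for src, targets in depends_on.items():
        for target in targets:
            reversed_adj.setdefault(target, []).append(src)
    return len(descendants(reversed_adj, idx))


handoff = {0: [1], 1: [2, 3]}            # predecessor -> successor
depends_on = {1: [0], 2: [1], 3: [0]}    # dependent -> dependency
followers_of_1 = execution_reach_sketch(handoff, 1)
blast_of_0 = downstream_reach_sketch(depends_on, 0)
missing = execution_reach_sketch(handoff, 99)
```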