The Standard Causal Workflow Is Backwards
Who builds the causal model of a system controls the decisions the system supports. The standard workflow gives that power to the wrong people.
Every causal model of an organization is also a political document. It decides which variables appear in the decision analysis, which interventions are worth simulating, which levers are visible to the executives who act on it. In virtually every organization deploying causal AI today, that document is written by statisticians — and the domain expert is brought in at the end to say whether it looks right.
This is the wrong order. And the reason it’s wrong isn’t procedural. It’s mathematical.
The Statistician Who Doesn’t Know What They’re Doing
Human domain knowledge is not a convenience that speeds up causal discovery. It is mathematically required to orient the edges that data cannot touch.
Here is the structural fact behind that claim. Multiple distinct causal structures can be perfectly consistent with the same statistical data. A chain where A causes B causes C, its reversal, and a fork where B causes both A and C all imply exactly the same independence pattern — they are statistically indistinguishable from observational evidence alone. (The collider, where A and C both cause B, is the one orientation data can detect; everything else in its equivalence class stays open.) Constraint- and score-based discovery algorithms — PC, GES — return this ambiguity honestly. They output what is called a Completed Partially Directed Acyclic Graph: directed edges where the data can say something, undirected edges where it cannot. For a graph with ten variables, the number of structures consistent with any given observational dataset can run into the thousands.
What resolves the undirected edges is domain knowledge. The expert who says “I know from operating this system for fifteen years that Y causes Z, not the other way around” is providing something no dataset contains. Not because the dataset is too small. Not because the algorithm is insufficiently sophisticated. Because the data cannot answer the question being asked.
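The Markov-equivalence point is easy to demonstrate. The sketch below is illustrative, with arbitrary coefficients: it simulates standardized Gaussian data from a chain A → B → C and from a fork A ← B → C, and shows that both imply the same correlation matrix. No amount of observational data separates them; only the expert — or an intervention — can.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500_000, 0.8

def simulate(structure):
    """Standardized Gaussian data from one of two Markov-equivalent graphs."""
    if structure == "chain":          # A -> B -> C
        a = rng.normal(size=n)
        b = r * a + np.sqrt(1 - r**2) * rng.normal(size=n)
        c = r * b + np.sqrt(1 - r**2) * rng.normal(size=n)
    else:                             # fork: A <- B -> C
        b = rng.normal(size=n)
        a = r * b + np.sqrt(1 - r**2) * rng.normal(size=n)
        c = r * b + np.sqrt(1 - r**2) * rng.normal(size=n)
    return np.corrcoef(np.stack([a, b, c]))

chain, fork = simulate("chain"), simulate("fork")
# Both structures imply corr(A,B) = corr(B,C) = r and corr(A,C) = r**2,
# so the two matrices agree up to sampling noise.
print(np.round(chain, 2))
print(np.round(fork, 2))
```

The same holds for the full joint distribution, not just the correlations — which is exactly why the undirected edge between these candidates survives every observational test.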
The standard workflow takes this mathematical fact and systematically ignores it. A data science team builds a causal model. They identify variables, make structural assumptions, run the algorithms. Then they bring the model to a domain expert and ask them to validate it. Does this look right? Does this match your understanding of how the system works?
The expert does not build the model. They review it. This is not a small distinction.
A causal model is a political document because models are not neutral — a causal model of a business is a formal statement about what causes what in that business. It determines which interventions are even candidates for simulation and which levers ever reach the executives who act on the model’s outputs. The choice of causal structure is, in the most literal sense, the choice of what the organization can see and what it cannot.
The statistician who builds the model sets the boundaries of the organization’s causal imagination. Most of them don’t know that’s what they’re doing.
In the standard workflow, that choice — by default, unremarked, structurally embedded in the order of operations — belongs to the person who is furthest from the system being modeled. The domain expert who has spent twenty years inside a specific business knows which edges are real and which are correlation artifacts, the feedback loops the literature doesn’t capture, the confounders that never appear in any dataset because they were never measured, the structural features that shift when you intervene rather than observe. Asking them to validate a model the statistician built is asking them to check someone else’s map of their own territory.
A model that excludes a specific variable cannot tell you anything about that variable’s effect. A model that misspecifies an edge direction cannot correctly simulate the intervention that reverses it. And confidence in a wrong model — the kind Double Machine Learning produces when the causal structure it conditions on is misspecified — is more dangerous than no model at all. It delivers precise wrongness: a biased estimate wrapped in tight confidence intervals.
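A minimal illustration of that failure mode, using plain OLS rather than full Double Machine Learning — the structure and numbers are invented for the example. X has no causal effect on Y, but an unmeasured confounder U drives both. A regression that omits U reports an effect near 1.0 with a very tight interval:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True structure: U confounds both X and Y; X has NO causal effect on Y.
u = rng.normal(size=n)
x = u + rng.normal(size=n)
y = 2.0 * u + rng.normal(size=n)

# Misspecified model: regress Y on X alone, because U was never measured.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - 2)
se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * sigma2)

# True effect: 0. Estimated effect: ~1.0, with a standard error of ~0.004.
print(f"effect of X on Y: {beta[1]:.3f} +/- {1.96 * se[1]:.3f}")
```

The confidence interval is doing its job perfectly — for the wrong model. Nothing in the output warns you that the structural assumption, not the estimator, is where the error lives.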
The standard workflow is backwards. The question is what the correct workflow looks like, and why it has taken this long to build.
Here Is the Thing That Should Bother You: The Components Exist
The Knowledge Acquisition Tool is the name for the workflow that doesn’t yet exist as an integrated product — the conversational system designed to extract the causal knowledge an expert holds implicitly and convert it into a first-draft directed acyclic graph that a statistician can refine but does not need to originate.
The components required to build it are all present. What has not happened is the assembly.
CausalChat-class interfaces demonstrate that LLMs can maintain structured conversational context across a long interview without losing the thread. The Sheffield Elicitation Framework and the IDEA protocol document how expert beliefs can be made explicit, quantified, and calibrated without requiring the expert to learn probability theory. Analysis of Competing Hypotheses shows how confirmation bias can be intercepted before it propagates through a structure — not by correcting experts after the fact, but by changing the procedure so that disconfirmation is built into the interview sequence.
The FinCARE result closes the empirical case: combining LLMs, a knowledge graph derived from SEC filings, and NOTEARS raised graph-recovery F1 from 0.163 to 0.759. The human knowledge is not decorating the algorithm. It is doing the structural work the algorithm cannot do alone.
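For context on the metric: graph-recovery F1 is ordinarily computed over the set of directed edges, comparing the recovered graph against the ground truth. A minimal version, with a made-up three-node example:

```python
def edge_f1(true_edges, found_edges):
    """F1 over directed edges: the standard graph-recovery metric."""
    true_edges, found_edges = set(true_edges), set(found_edges)
    tp = len(true_edges & found_edges)   # correctly recovered edges
    if tp == 0:
        return 0.0
    precision = tp / len(found_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)

truth = {("A", "B"), ("B", "C"), ("A", "C")}
print(edge_f1(truth, {("A", "B")}))   # one of three edges found -> 0.5
print(edge_f1(truth, truth))          # perfect recovery -> 1.0
```

An F1 of 0.163 means the recovered graph shares almost no edges with the truth; 0.759 means most of the structure is right. That is the scale of the jump the expert knowledge provides.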
The components have not been assembled because assembling them requires solving a problem that spans two communities that rarely talk to each other. Knowledge Engineering has spent two decades developing elicitation protocols — structured questioning, calibrated linguistic probability mapping, spiral development, bias interception. Causal inference has developed the algorithmic machinery — NOTEARS, PC, FCI, Double Machine Learning. Neither community has built the pipeline because neither community is primarily interested in the strategy executive who needs to originate a causal model in forty-five minutes without learning what a CPDAG is.
That executive is the tool’s actual user. The gap is not in the theory. It is in the design of the workflow for the person who holds the knowledge but not the mathematical language to express it.
What the Tool Actually Does
The Knowledge Acquisition Tool runs in four phases. Understanding them matters not because the technical architecture is what’s interesting, but because each phase resolves a specific failure mode in the standard workflow — the failure mode of asking the wrong person to make the structural decisions.
Variable confirmation comes first. The system presents a candidate list derived from domain literature, and the expert curates it. Confirm, reject, rename. Ten minutes. The expert is not generating variables from scratch; they are editing a draft. This is by design — current LLMs are reliable at literature synthesis, surfacing plausible candidates the expert can evaluate without pressure to recall.
Edge elicitation follows. The system presents candidate relationships in temporal language rather than causal language, because experts orient edges more reliably when asked “which comes first?” than when asked “which causes which?” Cycles are flagged not by telling the expert their model is mathematically invalid, but by asking a temporal resolution question: “Over a single quarter, which tends to move first?” The expert’s intuition is preserved. The graph’s formal requirements are satisfied.
Interventional disambiguation targets the highest-stakes undirected edges — the ones the data cannot orient. The question that resolves Markov equivalence is interventional: “If you held B constant through external intervention — locked it — would changes in A still be associated with changes in C?” The expert does not need to understand why this question resolves the structural ambiguity. They only need to be able to answer it. That is the design principle throughout: the expert’s job is to contribute knowledge, not to learn mathematics.
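The logic of that interventional question can be sketched directly (coefficients arbitrary). Two structures — one with a direct A → C edge, one where the A–C association runs entirely through B — look alike observationally. Clamping B by intervention separates them:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

def ac_corr(direct, intervene_b):
    """Correlation of A and C under two structures, with or without do(B=0)."""
    a = rng.normal(size=n)
    # do(B=0) severs every arrow into B; otherwise B responds to A.
    b = np.zeros(n) if intervene_b else 0.8 * a + rng.normal(size=n)
    if direct:    # A -> C directly (A -> B also present)
        c = 0.8 * a + rng.normal(size=n)
    else:         # A -> B -> C, no direct edge
        c = 0.8 * b + rng.normal(size=n)
    return np.corrcoef(a, c)[0, 1]

# Observationally, both structures show a clear A-C association:
print(ac_corr(direct=True, intervene_b=False))    # ~0.62
print(ac_corr(direct=False, intervene_b=False))   # ~0.45
# Under do(B=0), only the direct edge keeps A associated with C:
print(ac_corr(direct=True, intervene_b=True))     # still ~0.62
print(ac_corr(direct=False, intervene_b=True))    # ~0.0
```

The expert answering “would A still move C if B were locked?” is reporting the bottom two lines from lived experience — without ever seeing the simulation.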
Confidence calibration closes the session. Reference-class questions establish how well-calibrated the expert is, a correction function is applied to their subsequent estimates, and rough probability distributions are attached to the oriented edges. The output is not a publication-ready causal model. It is a first-draft graph sufficient to run basic counterfactual scenarios and identify which additional data collection would most reduce uncertainty.
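One way the four phases might hang together as data structures — a hypothetical sketch, not the tool’s actual implementation; every name here is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    source: str
    target: str
    oriented_by: str    # "temporal", "interventional", or "data"
    confidence: float   # expert's calibrated probability the edge is real

@dataclass
class ElicitationSession:
    variables: list[str] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)

    def confirm_variables(self, candidates, keep):
        # Phase 1: the expert edits a candidate list, not a blank page.
        self.variables = [v for v in candidates if v in keep]

    def add_edge(self, source, target, oriented_by, confidence):
        # Phases 2-3: temporal ordering first, then interventional
        # questions for the edges observational data cannot orient.
        self.edges.append(Edge(source, target, oriented_by, confidence))

    def draft_graph(self):
        # Phase 4 output: a first-draft graph with per-edge confidence,
        # ready for a statistician to refine.
        return {(e.source, e.target): e.confidence for e in self.edges}

session = ElicitationSession()
session.confirm_variables(
    candidates=["marketing_spend", "pipeline", "revenue", "weather"],
    keep={"marketing_spend", "pipeline", "revenue"},
)
session.add_edge("marketing_spend", "pipeline", "temporal", 0.9)
session.add_edge("pipeline", "revenue", "interventional", 0.7)
print(session.draft_graph())
```

The shape of the output matters more than the code: every edge carries its provenance (what kind of question oriented it) and a calibrated confidence, which is exactly what the statistician needs to decide where to spend refinement effort.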
Forty-five minutes. A first breath for the living model. Built by the person who understands the system — not the person who can specify the algorithm.
Who Builds the Model Builds the World
A living model — causal, counterfactual, continuously updated, treatment-oriented — is an analytical system that can answer not just “what happened?” but “what would happen if we did this?” and “what would have happened if we had done that differently?” These are not refinements of the same question. They are categorically different operations, requiring machinery that the standard analytics stack does not contain.
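The counterfactual operation, at its smallest, looks like this: recover the unit’s unobserved noise from what actually happened (abduction), then replay the same unit under a different action. A toy one-equation structural causal model, with invented names and numbers:

```python
# Toy SCM: demand = base - elasticity * price + u, where u is this
# quarter's unobserved shock. All names and numbers are illustrative.
base, elasticity = 100.0, 2.0

def demand(price, u):
    return base - elasticity * price + u

# What happened: price was 10, demand came in at 83.
observed_price, observed_demand = 10.0, 83.0

# Abduction: recover this quarter's shock from the observation.
u_hat = observed_demand - (base - elasticity * observed_price)  # u = 3

# Counterfactual: what would demand have been, THIS quarter, at price 12?
counterfactual = demand(12.0, u_hat)
print(counterfactual)
```

An observational model can only report the historical price–demand correlation. The counterfactual query needs the structural equation and the recovered noise — machinery that exists only if someone specified the causal structure first.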
The causal structure the Knowledge Acquisition Tool produces is the prerequisite for all of it. Without a correctly specified causal graph, the entire downstream apparatus — the interventional simulations, the counterfactual analysis, the ranked treatment recommendations — operates on a foundation built by people who are furthest from the system being modeled.
This is not a technical argument about algorithm performance. It is an argument about organizational epistemology: what an enterprise is capable of knowing, and who decides. The boundary of the causal model is the boundary of the organization’s decision intelligence. Whoever draws that boundary controls what the organization can see — not through manipulation, but through the structural fact that unasked questions produce no answers.
The standard workflow gives that authorship to statisticians. It does so not through malice but through the default order of operations: build, then validate. The Knowledge Acquisition Tool inverts that order: extract, then formalize. The statistician’s role shifts from author to editor. The domain expert moves from reviewer to originator.
The bridge between expert knowledge and formal causal structure has needed building for two decades. The components are here. The assembly is the project.
This is part of the Living Models series on causal intelligence for organizational decision-making. If you’re building a causal modeling project — or evaluating one — and want to talk about where the workflow breaks down in practice, reply or leave a comment. The case studies are being built now. If this is the problem your organization is sitting on, this is the project to watch.
Tags: causal AI, organizational decision-making, knowledge elicitation, causal inference, Living Models


