Counterfactual Simulation: What Would Happen If...?
VynFi 3.0's counterfactual engine lets you define a causal DAG, inject interventions, and generate paired baseline/counterfactual datasets. This post walks through the structural causal model, do-calculus semantics, and a complete Python example.
Traditional stress testing generates shocked data in isolation: you pick a parameter, shift it, and observe the output. The problem is that financial systems are not collections of independent variables — they are causal graphs. When GDP contracts, unemployment rises, which increases loan defaults, which reduces bank capital ratios, which triggers credit tightening, which further contracts GDP. Shifting one variable without propagating through the causal structure produces scenarios that are internally inconsistent.
VynFi 3.0 introduces a counterfactual simulation engine built on structural causal models (SCMs). You define the causal DAG — the directed acyclic graph of variable dependencies — and the engine handles intervention propagation, noise term consistency, and paired dataset generation.
**DataSynth 3.1 update:** Sector-specific causal DAG presets have landed: `scenarios.causal_model.preset = "manufacturing" | "retail" | "financial_services" | "minimal" | "custom"`. Each preset wires sector-appropriate transfer functions (retail: demand → stockouts → DSO → revenue cutoff risk; financial_services: correspondent concentration → KYC → AML screening → NPL → ECL provisions). Custom DAGs are now loaded in full (nodes and edges). See `sector_dag_presets.py` for worked examples across all three sectors.
How Causal DAGs Work in VynFi
A causal DAG in VynFi is a set of nodes (observable financial variables) and directed edges (causal relationships). Each node has a structural equation that determines its value as a function of its parents plus an exogenous noise term. VynFi ships with pre-built DAGs for common financial domains — banking credit risk, insurance claims, supply chain disruption, and macroeconomic transmission — but you can also define custom DAGs via the SDK.
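To make the node-and-equation picture concrete, here is a minimal sketch in plain Python of evaluating a small DAG in topological order. It does not use the VynFi SDK and says nothing about how the engine is implemented. The `NODES` table and the `evaluate` helper are hypothetical names, the coefficients are borrowed from the credit-risk example used in this post, and treating the exogenous node as "mean plus noise" is an assumption.

```python
from graphlib import TopologicalSorter

# Hypothetical node table: each node lists its parents and a structural
# equation f(parent_values, noise). Declared out of order on purpose.
NODES = {
    "default_rate": {
        "parents": ["unemployment", "gdp_growth"],
        "equation": lambda p, u: 0.02 + 0.15 * p["unemployment"] - 0.5 * p["gdp_growth"] + u,
    },
    "gdp_growth": {
        "parents": [],
        "equation": lambda p, u: 0.02 + u,  # assumption: exogenous mean plus noise
    },
    "unemployment": {
        "parents": ["gdp_growth"],
        "equation": lambda p, u: 0.06 - 1.8 * p["gdp_growth"] + u,
    },
}


def evaluate(nodes, noise):
    """Evaluate every node once, parents before children."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {name: spec["parents"] for name, spec in nodes.items()}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        spec = nodes[name]
        parent_values = {parent: values[parent] for parent in spec["parents"]}
        values[name] = spec["equation"](parent_values, noise[name])
    return values


# With all noise terms at zero the result is fully determined by the equations.
zero_noise = {name: 0.0 for name in NODES}
values = evaluate(NODES, zero_noise)
print(values)
```

The topological sort is what guarantees that every parent has a value before its children are computed, whatever order the nodes were declared in.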
The key operation is the <em>do-operator</em>: <code>do(X = x)</code> fixes variable X at value x and severs all incoming edges to X in the DAG, so X no longer responds to its parents. This distinguishes intervention (we force GDP growth to -4%) from observation (we see GDP growth at -4% and want to infer its causes). The engine then forward-propagates through all downstream nodes using their structural equations, keeping the noise terms identical to the baseline run. This produces paired datasets in which the only difference is the causal effect of the intervention.
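The intervention semantics can be sketched with NumPy on a three-node linear model. This is an illustration under stated assumptions, not the engine's code: `EQUATIONS` and `simulate` are hypothetical names, the coefficients come from the credit-risk example in this post, and the noise scales for the two endogenous nodes are made up.

```python
import numpy as np

# Structural equations keyed by node; insertion order is a valid topological order.
EQUATIONS = {
    "gdp_growth": lambda v, u: 0.02 + u,
    "unemployment": lambda v, u: 0.06 - 1.8 * v["gdp_growth"] + u,
    "default_rate": lambda v, u: 0.02 + 0.15 * v["unemployment"] - 0.5 * v["gdp_growth"] + u,
}


def simulate(noise, do=None):
    """Forward-propagate the model; `do` maps a node name to a forced value."""
    do = do or {}
    n = len(next(iter(noise.values())))
    values = {}
    for name, equation in EQUATIONS.items():
        if name in do:
            # Intervened node: ignore parents and noise (incoming edges severed).
            values[name] = np.full(n, float(do[name]))
        else:
            values[name] = equation(values, noise[name])
    return values


rng = np.random.default_rng(42)
n = 5
noise = {
    "gdp_growth": rng.normal(0.0, 0.01, n),
    "unemployment": rng.normal(0.0, 0.005, n),  # assumed scale
    "default_rate": rng.normal(0.0, 0.002, n),  # assumed scale
}

# Same noise draws for both runs, so rows are directly comparable.
baseline = simulate(noise)
counterfactual = simulate(noise, do={"gdp_growth": -0.04})
effect = counterfactual["default_rate"] - baseline["default_rate"]
print(effect)
```

Because both runs share the same noise, each row's effect reduces to the propagated change in `gdp_growth`; the noise terms cancel in the difference. Intervening on `unemployment` instead would leave `gdp_growth` untouched, since the intervention only flows downstream.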
Defining a Custom DAG
```python
import vynfi

client = vynfi.VynFi()

# Define a causal DAG for credit risk
dag = client.simulation.create_dag(
    name="credit_risk_transmission",
    nodes=[
        {"id": "gdp_growth", "type": "exogenous", "distribution": "normal",
         "params": {"mean": 0.02, "std": 0.01}},
        {"id": "unemployment", "type": "endogenous", "parents": ["gdp_growth"],
         "equation": "0.06 - 1.8 * gdp_growth + noise"},
        {"id": "default_rate", "type": "endogenous",
         "parents": ["unemployment", "gdp_growth"],
         "equation": "0.02 + 0.15 * unemployment - 0.5 * gdp_growth + noise"},
        {"id": "loss_given_default", "type": "endogenous", "parents": ["default_rate"],
         "equation": "0.35 + 0.4 * default_rate + noise"},
        {"id": "capital_ratio", "type": "endogenous",
         "parents": ["loss_given_default", "default_rate"],
         "equation": "0.12 - 0.8 * loss_given_default - 0.3 * default_rate + noise"},
    ],
)
print(f"DAG created: {dag.id} with {len(dag.nodes)} nodes")
```

Generating Paired Datasets
Once the DAG is defined, you submit a simulation job that specifies a baseline configuration and one or more interventions. The engine generates two datasets per intervention: the baseline (no intervention) and the counterfactual (with intervention applied). Because noise terms are held constant across runs, you can directly compare row-by-row to isolate the causal effect.
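Before the SDK call, a toy comparison shows why holding noise constant matters. The sketch below is not VynFi code: it uses a single equation (the unemployment equation from this post), an assumed noise scale, and fixes baseline GDP growth at 0.02 for simplicity. It contrasts a counterfactual that reuses the baseline noise with one that draws fresh noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
noise_scale = 0.005  # assumed, for illustration only


def unemployment(gdp_growth, noise):
    """Single structural equation taken from the credit-risk example."""
    return 0.06 - 1.8 * gdp_growth + noise


shared = rng.normal(0.0, noise_scale, n)  # reused across both runs (paired)
fresh = rng.normal(0.0, noise_scale, n)   # independent redraw (unpaired)

baseline = unemployment(0.02, shared)
paired_cf = unemployment(-0.04, shared)
unpaired_cf = unemployment(-0.04, fresh)

paired_effect = paired_cf - baseline
unpaired_effect = unpaired_cf - baseline

print(f"paired:   mean={paired_effect.mean():.4f} std={paired_effect.std():.6f}")
print(f"unpaired: mean={unpaired_effect.mean():.4f} std={unpaired_effect.std():.6f}")
```

Both versions recover roughly the same average effect, but only the paired version gives a meaningful row-level difference: with fresh noise, each row's difference mixes the causal effect with two unrelated noise draws.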
```python
# Run a counterfactual simulation
job = client.jobs.create(
    mode="simulate",
    dag_id=dag.id,
    baseline={"sector": "banking", "rows": 25_000, "periods": 8},
    interventions=[
        {"name": "mild_recession", "do": {"gdp_growth": -0.01}},
        {"name": "severe_recession", "do": {"gdp_growth": -0.04}},
        {"name": "rate_shock", "do": {"gdp_growth": -0.02, "unemployment": 0.12}},
    ],
    paired=True,  # keep noise terms consistent for row-level comparison
)
result = client.jobs.wait(job.id)
archive = client.jobs.download_archive(result.id)

# Load paired datasets
import pandas as pd

baseline = pd.read_parquet(archive.file("baseline.parquet"))
severe = pd.read_parquet(archive.file("severe_recession.parquet"))

# Row-level causal effect
effect = severe["default_rate"] - baseline["default_rate"]
print(f"Mean causal effect on default rate: {effect.mean():.4f}")
print(f"Max causal effect: {effect.max():.4f}")
```

Pre-Built Scenario Packs
VynFi ships with scenario packs that bundle a DAG, calibrated parameters, and named interventions for common use cases. Available packs include <code>recession_2008_replay</code>, <code>covid_supply_chain</code>, <code>rate_hike_cycle</code>, <code>sovereign_debt_crisis</code>, and <code>cyber_incident_cascade</code>. Each pack has been calibrated against historical data to produce realistic transmission dynamics. You can use a pack as-is or fork it to customize the DAG structure and parameters.
```python
# Use a pre-built scenario pack
packs = client.simulation.list_scenario_packs()
for pack in packs:
    print(f"{pack.id}: {pack.description} ({len(pack.interventions)} scenarios)")

# Run the 2008 recession replay
job = client.jobs.create(
    mode="simulate",
    scenario_pack="recession_2008_replay",
    baseline={"sector": "banking", "rows": 50_000, "periods": 12},
    paired=True,
)
```