
Counterfactual Simulation: What Would Happen If...?

VynFi 3.0's counterfactual engine lets you define a causal DAG, inject interventions, and generate paired baseline/counterfactual datasets. This post walks through the structural causal model, do-calculus semantics, and a complete Python example.

VynFi Team · Engineering · April 16, 2026 · 9 min read

Traditional stress testing generates shocked data in isolation: you pick a parameter, shift it, and observe the output. The problem is that financial systems are not collections of independent variables — they are causal graphs. When GDP contracts, unemployment rises, which increases loan defaults, which reduces bank capital ratios, which triggers credit tightening, which further contracts GDP. Shifting one variable without propagating through the causal structure produces scenarios that are internally inconsistent.
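A minimal sketch in plain Python (not VynFi code) makes the inconsistency concrete. It assumes a single illustrative relationship, unemployment = 0.06 - 1.8 × GDP growth with the noise term omitted, taken from the credit-risk DAG example later in this post. Shifting GDP growth alone leaves unemployment at its pre-shock level, so the scenario violates its own equation; recomputing the downstream variable restores consistency.

```python
# Illustrative structural equation (noise omitted for clarity):
# unemployment = 0.06 - 1.8 * gdp_growth
def unemployment_from(gdp_growth: float) -> float:
    return 0.06 - 1.8 * gdp_growth

baseline_gdp = 0.02
baseline = {"gdp_growth": baseline_gdp, "unemployment": unemployment_from(baseline_gdp)}

# Naive stress test: shift one variable in isolation.
naive = dict(baseline, gdp_growth=-0.04)

# Causal stress test: shift GDP growth, then recompute its downstream child.
propagated = {"gdp_growth": -0.04, "unemployment": unemployment_from(-0.04)}

def is_consistent(scenario: dict, tol: float = 1e-12) -> bool:
    """True if the scenario satisfies the structural equation."""
    return abs(scenario["unemployment"] - unemployment_from(scenario["gdp_growth"])) < tol

print(is_consistent(naive))       # False: unemployment still reflects +2% growth
print(is_consistent(propagated))  # True
print(round(propagated["unemployment"], 3))  # 0.132
```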

VynFi 3.0 introduces a counterfactual simulation engine built on structural causal models (SCMs). You define the causal DAG — the directed acyclic graph of variable dependencies — and the engine handles intervention propagation, noise term consistency, and paired dataset generation.

**DataSynth 3.1 update:** Sector-specific causal DAG presets have landed: `scenarios.causal_model.preset = "manufacturing" | "retail" | "financial_services" | "minimal" | "custom"`. Each preset wires sector-appropriate transfer functions (retail: demand → stockouts → DSO → revenue cutoff risk; financial_services: correspondent concentration → KYC → AML screening → NPL → ECL provisions). Custom DAGs, with both nodes and edges, are now fully supported. See `sector_dag_presets.py` for worked examples across all three sectors.
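The sector chains above are simple causal paths. As an illustration only (this is not DataSynth's preset format), the sketch below encodes the retail and financial_services chains as edge lists and computes which variables an intervention on a given node can reach: everything downstream of it, and nothing upstream. The snake_case node identifiers are hypothetical renderings of the names in the update note.

```python
from collections import defaultdict

# Hypothetical node ids mirroring the sector chains described above.
PRESET_CHAINS = {
    "retail": ["demand", "stockouts", "dso", "revenue_cutoff_risk"],
    "financial_services": [
        "correspondent_concentration", "kyc", "aml_screening", "npl", "ecl_provisions",
    ],
}

def chain_to_edges(chain: list[str]) -> list[tuple[str, str]]:
    """Turn an ordered causal path into (parent, child) edges."""
    return list(zip(chain, chain[1:]))

def descendants(edges: list[tuple[str, str]], start: str) -> set[str]:
    """All nodes reachable from `start` by following edges forward."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    seen, stack = set(), [start]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

retail_edges = chain_to_edges(PRESET_CHAINS["retail"])
print(sorted(descendants(retail_edges, "stockouts")))  # ['dso', 'revenue_cutoff_risk']
```

Note that `demand` is not in the result: an intervention on stockouts cannot change its own causes.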

How Causal DAGs Work in VynFi

A causal DAG in VynFi is a set of nodes (observable financial variables) and directed edges (causal relationships). Each node has a structural equation that determines its value as a function of its parents plus an exogenous noise term. VynFi ships with pre-built DAGs for common financial domains — banking credit risk, insurance claims, supply chain disruption, and macroeconomic transmission — but you can also define custom DAGs via the SDK.
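The structural-equation idea can be sketched in a few lines of NumPy. This is a generic illustration, not VynFi's engine: each node is a Python function of its parents' values plus a noise term, and nodes are evaluated in topological order (supplied here by the standard-library `graphlib`) so every parent is computed before its children. The equations reuse the first three nodes of the credit-risk DAG defined below; the noise scales are assumptions.

```python
from graphlib import TopologicalSorter
import numpy as np

# Parents of each node (exogenous nodes have none).
PARENTS = {
    "gdp_growth": [],
    "unemployment": ["gdp_growth"],
    "default_rate": ["unemployment", "gdp_growth"],
}

# Structural equations: value = f(parent values) + noise.
EQUATIONS = {
    "gdp_growth": lambda v, u: 0.02 + u,
    "unemployment": lambda v, u: 0.06 - 1.8 * v["gdp_growth"] + u,
    "default_rate": lambda v, u: 0.02 + 0.15 * v["unemployment"] - 0.5 * v["gdp_growth"] + u,
}

# Assumed noise standard deviations for each node.
NOISE_STD = {"gdp_growth": 0.01, "unemployment": 0.005, "default_rate": 0.002}

def sample_scm(n: int, seed: int = 0) -> dict[str, np.ndarray]:
    rng = np.random.default_rng(seed)
    values: dict[str, np.ndarray] = {}
    # static_order() yields each node only after all of its parents.
    for node in TopologicalSorter(PARENTS).static_order():
        noise = rng.normal(0.0, NOISE_STD[node], size=n)
        values[node] = EQUATIONS[node](values, noise)
    return values

data = sample_scm(10_000)
print({k: round(float(v.mean()), 4) for k, v in data.items()})
```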

The key operation is the do-operator: `do(X = x)` sets variable X to value x by severing all incoming edges to X in the DAG. This distinguishes intervention (we force GDP growth to -4%) from observation (we see GDP growth at -4% and want to infer its causes). The engine then forward-propagates through all downstream nodes using their structural equations, keeping the noise terms identical to the baseline run. This produces paired datasets where the only difference is the causal effect of the intervention.
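Here is a minimal sketch of those semantics in plain NumPy rather than the VynFi SDK. An intervention replaces a node's structural equation with a constant (which is what severing its incoming edges amounts to), and both runs reuse the same pre-drawn noise, so each row's difference is purely the effect of the intervention. Intervening on `unemployment` also shows the asymmetry with observation: forcing it to 12% leaves `gdp_growth` untouched, whereas observing 12% unemployment would make low GDP growth more likely. The equations follow the credit-risk example below; the noise scales are assumptions.

```python
from typing import Optional
import numpy as np

ORDER = ["gdp_growth", "unemployment", "default_rate"]  # a topological order
EQUATIONS = {
    "gdp_growth": lambda v, u: 0.02 + u,
    "unemployment": lambda v, u: 0.06 - 1.8 * v["gdp_growth"] + u,
    "default_rate": lambda v, u: 0.02 + 0.15 * v["unemployment"] - 0.5 * v["gdp_growth"] + u,
}
NOISE_STD = {"gdp_growth": 0.01, "unemployment": 0.005, "default_rate": 0.002}

def draw_noise(n: int, seed: int) -> dict[str, np.ndarray]:
    rng = np.random.default_rng(seed)
    return {node: rng.normal(0.0, NOISE_STD[node], size=n) for node in ORDER}

def simulate(noise: dict[str, np.ndarray], do: Optional[dict[str, float]] = None) -> dict[str, np.ndarray]:
    """Forward-propagate; nodes listed in `do` are fixed and ignore their parents."""
    do = do or {}
    n = len(noise[ORDER[0]])
    values: dict[str, np.ndarray] = {}
    for node in ORDER:
        if node in do:
            values[node] = np.full(n, do[node])  # incoming edges severed
        else:
            values[node] = EQUATIONS[node](values, noise[node])
    return values

noise = draw_noise(25_000, seed=42)  # drawn once, shared by every run
baseline = simulate(noise)
severe = simulate(noise, do={"gdp_growth": -0.04})
effect = severe["default_rate"] - baseline["default_rate"]
print(f"mean effect on default rate: {effect.mean():.4f}")

# do(unemployment) does not reach back upstream to GDP growth.
forced = simulate(noise, do={"unemployment": 0.12})
print(np.allclose(forced["gdp_growth"], baseline["gdp_growth"]))  # True
```

Because the model is linear, each row's effect here works out to 0.77 × (0.06 + that row's GDP noise), with no contribution from the unemployment or default-rate noise.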

Defining a Custom DAG

Python

import vynfi
client = vynfi.VynFi()
# Define a causal DAG for credit risk
dag = client.simulation.create_dag(
    name="credit_risk_transmission",
    nodes=[
        {"id": "gdp_growth", "type": "exogenous", "distribution": "normal", "params": {"mean": 0.02, "std": 0.01}},
        {"id": "unemployment", "type": "endogenous", "parents": ["gdp_growth"],
         "equation": "0.06 - 1.8 * gdp_growth + noise"},
        {"id": "default_rate", "type": "endogenous", "parents": ["unemployment", "gdp_growth"],
         "equation": "0.02 + 0.15 * unemployment - 0.5 * gdp_growth + noise"},
        {"id": "loss_given_default", "type": "endogenous", "parents": ["default_rate"],
         "equation": "0.35 + 0.4 * default_rate + noise"},
        {"id": "capital_ratio", "type": "endogenous", "parents": ["loss_given_default", "default_rate"],
         "equation": "0.12 - 0.8 * loss_given_default - 0.3 * default_rate + noise"},
    ],
)
print(f"DAG created: {dag.id} with {len(dag.nodes)} nodes")
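Before submitting a DAG, it can be worth checking the node list locally. The helper below is a hypothetical pre-flight check, not part of the VynFi SDK: it confirms that every declared parent is itself a node, that exogenous nodes have no parents, and that the graph is acyclic, using the standard-library `graphlib`. It takes plain dictionaries in the same shape as the `nodes` argument above and ignores keys it does not need, such as `equation`.

```python
from graphlib import CycleError, TopologicalSorter

def validate_nodes(nodes: list[dict]) -> list[str]:
    """Return node ids in a valid evaluation order, or raise ValueError."""
    ids = {node["id"] for node in nodes}
    graph = {}
    for node in nodes:
        parents = node.get("parents", [])
        if node["type"] == "exogenous" and parents:
            raise ValueError(f"exogenous node {node['id']!r} must not have parents")
        missing = set(parents) - ids
        if missing:
            raise ValueError(f"{node['id']!r} references unknown parents {sorted(missing)}")
        graph[node["id"]] = parents
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the detected cycle as a list of nodes.
        raise ValueError(f"graph is not acyclic: {exc.args[1]}") from exc

nodes = [
    {"id": "gdp_growth", "type": "exogenous"},
    {"id": "unemployment", "type": "endogenous", "parents": ["gdp_growth"]},
    {"id": "default_rate", "type": "endogenous", "parents": ["unemployment", "gdp_growth"]},
]
print(validate_nodes(nodes))  # ['gdp_growth', 'unemployment', 'default_rate']
```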

Generating Paired Datasets

Once the DAG is defined, you submit a simulation job that specifies a baseline configuration and one or more interventions. The engine generates two datasets per intervention: the baseline (no intervention) and the counterfactual (with intervention applied). Because noise terms are held constant across runs, you can directly compare row-by-row to isolate the causal effect.
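Why holding the noise fixed matters can be shown with a small NumPy experiment. It is a generic illustration of common random numbers, not VynFi code, and it assumes a hypothetical linear equation y = 0.5·x + noise with x shifted by 1.0. The paired estimate recovers the effect row by row because the noise cancels, while comparing the means of two independently generated datasets leaves sampling error in the estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
x = rng.normal(0.0, 1.0, size=n)

def outcome(x: np.ndarray, noise: np.ndarray) -> np.ndarray:
    # Hypothetical structural equation: y = 0.5 * x + noise
    return 0.5 * x + noise

shared_noise = rng.normal(0.0, 1.0, size=n)
fresh_noise = rng.normal(0.0, 1.0, size=n)

baseline = outcome(x, shared_noise)
paired_cf = outcome(x + 1.0, shared_noise)   # same noise: paired run
unpaired_cf = outcome(x + 1.0, fresh_noise)  # new noise: unpaired run

paired_effect = paired_cf - baseline         # noise cancels in every row
unpaired_effect = unpaired_cf.mean() - baseline.mean()

print(f"paired:   mean={paired_effect.mean():.4f}, row std={paired_effect.std():.2e}")
print(f"unpaired: mean diff={unpaired_effect:.4f}")
```

Only the paired run supports a row-level reading; in the unpaired run, each row's difference mixes the effect with two unrelated noise draws.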

Python

import pandas as pd
# Run a counterfactual simulation
job = client.jobs.create(
    mode="simulate",
    dag_id=dag.id,
    baseline={"sector": "banking", "rows": 25_000, "periods": 8},
    interventions=[
        {"name": "mild_recession", "do": {"gdp_growth": -0.01}},
        {"name": "severe_recession", "do": {"gdp_growth": -0.04}},
        {"name": "rate_shock", "do": {"gdp_growth": -0.02, "unemployment": 0.12}},
    ],
    paired=True,  # keep noise terms consistent for row-level comparison
)
result = client.jobs.wait(job.id)  # wait for the job to complete
archive = client.jobs.download_archive(result.id)
# Load paired datasets: one baseline file plus one file per intervention
baseline = pd.read_parquet(archive.file("baseline.parquet"))
severe = pd.read_parquet(archive.file("severe_recession.parquet"))
# Row-level causal effect: each row shares its noise terms with the baseline row
effect = severe["default_rate"] - baseline["default_rate"]
print(f"Mean causal effect on default rate: {effect.mean():.4f}")
print(f"Max causal effect: {effect.max():.4f}")
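To compare all three interventions side by side, a small helper can compute the same row-level statistics for each scenario. This is a sketch on plain pandas DataFrames, not an SDK feature. It compares by position (`to_numpy()`), which assumes each counterfactual file has the same rows in the same order as the baseline, as paired generation is meant to ensure. The toy frames use made-up values purely to show the output shape; with the archive above, each frame could instead be loaded via `archive.file(...)`, assuming the other scenario files follow the `severe_recession.parquet` naming pattern.

```python
import pandas as pd

def summarize_effects(baseline: pd.DataFrame,
                      counterfactuals: dict[str, pd.DataFrame],
                      column: str) -> pd.DataFrame:
    """Row-level causal effect statistics on `column`, one row per scenario."""
    rows = {}
    for name, cf in counterfactuals.items():
        if len(cf) != len(baseline):
            raise ValueError(f"{name}: row count differs from baseline")
        # Positional difference: relies on identical row order across files.
        effect = cf[column].to_numpy() - baseline[column].to_numpy()
        rows[name] = {"mean": effect.mean(), "min": effect.min(), "max": effect.max()}
    return pd.DataFrame.from_dict(rows, orient="index")

# Toy frames with made-up values, standing in for the downloaded parquet files.
base = pd.DataFrame({"default_rate": [0.010, 0.020, 0.030]})
scenarios = {
    "mild_recession": pd.DataFrame({"default_rate": [0.015, 0.024, 0.036]}),
    "severe_recession": pd.DataFrame({"default_rate": [0.040, 0.050, 0.070]}),
}
print(summarize_effects(base, scenarios, "default_rate").round(4))
```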

Pre-Built Scenario Packs

VynFi ships with scenario packs that bundle a DAG, calibrated parameters, and named interventions for common use cases. Available packs include `recession_2008_replay`, `covid_supply_chain`, `rate_hike_cycle`, `sovereign_debt_crisis`, and `cyber_incident_cascade`. Each pack has been calibrated against historical data to produce realistic transmission dynamics. You can use a pack as-is or fork it to customize the DAG structure and parameters.
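The SDK call for forking a pack is not covered here, so the following is only a conceptual sketch built on a hypothetical local type: a pack bundles a DAG, parameters, and named interventions, and forking means deep-copying that bundle and overriding parts of it without mutating the original. The pack id and contents are placeholders, not the calibrated values of any shipped pack, and only the intervention set is overridden; node edits would follow the same copy-then-modify pattern.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class ScenarioPack:  # hypothetical local stand-in, not a VynFi SDK type
    id: str
    nodes: list[dict]
    interventions: dict[str, dict[str, float]] = field(default_factory=dict)

    def fork(self, new_id: str, **interventions: dict[str, float]) -> "ScenarioPack":
        """Deep-copy the pack, then add or replace named interventions."""
        forked = copy.deepcopy(self)
        forked.id = new_id
        forked.interventions.update(interventions)
        return forked

base_pack = ScenarioPack(
    id="recession_demo",
    nodes=[{"id": "gdp_growth", "type": "exogenous"}],
    interventions={"severe_recession": {"gdp_growth": -0.04}},
)
custom = base_pack.fork("recession_demo_custom", deeper_recession={"gdp_growth": -0.06})
print(sorted(custom.interventions))     # ['deeper_recession', 'severe_recession']
print(sorted(base_pack.interventions))  # ['severe_recession']
```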

Python

# Use a pre-built scenario pack
packs = client.simulation.list_scenario_packs()
for pack in packs:
    print(f"{pack.id}: {pack.description} ({len(pack.interventions)} scenarios)")
# Run the 2008 recession replay with the pack's named interventions
job = client.jobs.create(
    mode="simulate",
    scenario_pack="recession_2008_replay",
    baseline={"sector": "banking", "rows": 50_000, "periods": 12},
    paired=True,
)
result = client.jobs.wait(job.id)  # wait for the job to complete
