Introducing VynFi 3.0: From Generation to Simulation
VynFi 3.0 moves beyond data generation into full scenario simulation. Three new pillars — counterfactual simulation, adversarial ML augmentation, and neural diffusion — transform how teams stress-test models, audit controls, and train AI.
VynFi 1.x shipped a rule-based generation engine capable of producing 100K+ rows per second of statistically rigorous financial data. VynFi 2.x added domain modules — banking, manufacturing, ESG, process mining — and a streaming output pipeline that handles terabyte-scale jobs. Today we are releasing VynFi 3.0, and it represents the largest architectural shift since the project began.
**2026-04-19 update: DataSynth 3.1.1 shipped.** The 3.1 series closes the SDK-team feedback loop from the 3.0 launch: AML network density up 38× (0.0014 → 0.053), OCPM coverage 47% → 100%, pandas timestamp retention 5% → 100%, AML typology coverage 0.000 → 0.857, and fraud behavioural biases now fire on every `is_fraud` path (weekend ×32, round-dollar ×170, post-close ×3,106 lift, measured on DataSynth 3.1.1). The Python SDK 1.5.1 wraps the new `fraud_split` endpoint and ships four new examples: `document_level_fraud.py`, `behavioral_fraud_patterns.py`, `sector_dag_presets.py`, `audit_opinions_kam.py`. See the full SDK changelog.
The core insight behind 3.0 is simple: generating realistic data is necessary but not sufficient. Teams do not just need data — they need scenarios. An audit team validating SOX controls needs a dataset where a specific material weakness has been injected at a known point in time. A fraud detection team needs adversarial examples that probe the exact decision boundary of their production model. A risk team needs to replay the 2008 recession through their current portfolio structure and observe what breaks.
Three Pillars of VynFi 3.0
- <strong>Counterfactual Simulation Engine</strong> — Define a causal DAG over your financial domain, inject a macro shock or control failure, and generate paired datasets (baseline vs. counterfactual) that isolate the causal effect. Built on structural causal models with do-calculus intervention semantics.
- <strong>Adversarial ML Augmentation</strong> — Supply an ONNX model, and VynFi probes its decision boundary to generate targeted synthetic examples in the region where the model is least confident. Purpose-built for fraud detection, credit scoring, and AML classification hardening.
- <strong>Neural Diffusion Generation</strong> — A score-based diffusion model trained on tabular financial distributions. Where the rule-based engine excels at structural fidelity (balanced entries, document flows, Benford compliance), the diffusion model captures higher-order distributional patterns that rules alone cannot express. A hybrid mode combines both.
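To make the paired baseline-versus-counterfactual idea from the first pillar concrete, here is a minimal sketch of a do-style intervention on a toy structural causal model. It does not use the VynFi engine or its API: the three-node chain (`gdp_growth` → `unemployment` → `default_rate`), the coefficients, and the `simulate` helper are all invented for illustration. The point it shows is that both runs reuse the same random seed, so they share their exogenous noise and the row-by-row difference reflects only the intervention.

```python
import random


def simulate(n_rows, gdp_growth=None, seed=0):
    """Sample a toy causal chain: gdp_growth -> unemployment -> default_rate.

    Passing gdp_growth applies do(gdp_growth = value): that variable's own
    mechanism is replaced by a constant, while downstream mechanisms stay
    unchanged. All coefficients are illustrative, not calibrated.
    """
    rng = random.Random(seed)  # same seed => same exogenous noise per row
    rows = []
    for _ in range(n_rows):
        # Draw every noise term, even when the intervention overrides a
        # variable, so baseline and counterfactual rows stay aligned.
        u_gdp = rng.gauss(0.0, 0.01)
        u_unemp = rng.gauss(0.0, 0.005)
        u_default = rng.gauss(0.0, 0.002)

        gdp = (0.02 + u_gdp) if gdp_growth is None else gdp_growth
        unemployment = 0.05 - 0.8 * gdp + u_unemp
        default_rate = 0.01 + 0.5 * unemployment + u_default
        rows.append(
            {
                "gdp_growth": gdp,
                "unemployment": unemployment,
                "default_rate": default_rate,
            }
        )
    return rows


# Paired datasets: identical noise, differing only by the intervention.
baseline = simulate(1000, seed=42)
counterfactual = simulate(1000, gdp_growth=-0.04, seed=42)

# Row-wise differences isolate the effect of the shock on default_rate.
effects = [
    c["default_rate"] - b["default_rate"]
    for b, c in zip(baseline, counterfactual)
]
avg_effect = sum(effects) / len(effects)
print(f"average effect on default_rate: {avg_effect:+.4f}")
```

Because the noise terms cancel in the paired difference, the estimated effect is far less noisy than comparing two independently sampled datasets would be.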
API Surface
All three capabilities are exposed through the existing job submission API. The <code>mode</code> field in the generation config now accepts <code>"simulate"</code>, <code>"adversarial"</code>, and <code>"diffusion"</code> in addition to the existing <code>"generate"</code> default. Each mode adds its own configuration block, but the output format, streaming, and download mechanics remain identical.
```python
import vynfi

client = vynfi.VynFi()

# Counterfactual simulation
job = client.jobs.create(
    mode="simulate",
    scenario="recession_2008_replay",
    baseline={"sector": "banking", "rows": 50_000, "periods": 12},
    intervention={"macro.gdp_growth": -0.04, "macro.unemployment": 0.10},
)

# Adversarial augmentation
job = client.jobs.create(
    mode="adversarial",
    model_uri="s3://models/fraud_classifier_v7.onnx",
    target_class="fraud",
    n_samples=10_000,
    boundary_sigma=0.05,
)

# Neural diffusion
job = client.jobs.create(
    mode="diffusion",
    sector="financial_statements",
    rows=100_000,
    hybrid=True,  # combine with rule-based for structural constraints
    guidance_scale=2.5,
)
```
Backward Compatibility
Every existing generation config continues to work without modification. The default mode remains <code>"generate"</code>, and SDK code written against VynFi 2.x runs unchanged. The new modes are additive: they do not change the behavior of any existing endpoint or parameter. SDK versions 1.4+ include full type hints and autocompletion for the new configuration blocks.
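The additive default described above can be pictured with a small, purely illustrative sketch. The `resolve_mode` helper and the shape of the config dictionaries are assumptions made for this example, not part of the VynFi SDK. Only the four mode names and the `"generate"` default come from this post.

```python
# Hypothetical helper: NOT part of the VynFi SDK, shown only to illustrate
# how a missing "mode" key can fall back to the pre-3.0 behavior.
VALID_MODES = {"generate", "simulate", "adversarial", "diffusion"}


def resolve_mode(config):
    """Return the job mode for a config dict, defaulting to "generate"."""
    mode = config.get("mode", "generate")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return mode


# A config with no "mode" key (assumed 2.x style) still resolves cleanly.
legacy_config = {"sector": "banking", "rows": 50_000}
new_config = {"mode": "diffusion", "sector": "financial_statements", "rows": 100_000}

print(resolve_mode(legacy_config))
print(resolve_mode(new_config))
```

Under this reading, a config that never mentions `mode` takes the same path it did before 3.0, and only configs that opt in reach the new modes.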
What Comes Next
This post is the first in a series of deep dives. Over the next three days, we will publish detailed technical walkthroughs of the three pillars and their applications: counterfactual simulation with causal DAGs, adversarial augmentation for fraud models, neural diffusion for tabular data, stress testing with recession scenarios, SOX compliance simulation, GNN-generated vendor networks, and privacy-preserving synthesis. Each post includes working Python SDK examples you can run against your VynFi API key today.