announcement · v3.0 · v3.1 · v3.1.1 · simulation

Introducing VynFi 3.0: From Generation to Simulation

VynFi 3.0 moves beyond data generation into full scenario simulation. Three new pillars — counterfactual simulation, adversarial ML augmentation, and neural diffusion — transform how teams stress-test models, audit controls, and train AI.

VynFi Team · Engineering · April 16, 2026 · 8 min read

VynFi 1.x shipped a rule-based generation engine capable of producing 100K+ rows per second of statistically rigorous financial data. VynFi 2.x added domain modules — banking, manufacturing, ESG, process mining — and a streaming output pipeline that handles terabyte-scale jobs. Today we are releasing VynFi 3.0, and it represents the largest architectural shift since the project began.

**2026-04-19 update — DataSynth 3.1.1 shipped.** The 3.1 series closes the SDK-team feedback loop from the 3.0 launch: AML network density up 38× (0.0014 → 0.053), OCPM coverage 47% → 100%, pandas timestamp retention 5% → 100%, AML typology coverage 0.000 → 0.857, and fraud behavioural biases now fire on every is_fraud path (weekend ×32, round-dollar ×170, post-close ×3,106 lift measured on DS 3.1.1). The Python SDK 1.5.1 wraps the new `fraud_split` endpoint and ships four new examples: `document_level_fraud.py`, `behavioral_fraud_patterns.py`, `sector_dag_presets.py`, `audit_opinions_kam.py`. See the full SDK changelog.
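The lift figures in the update compare how often a behavioural flag shows up on fraud rows versus legitimate rows. The post does not state the exact definition used, so the following is only a minimal sketch under one common assumption: lift = P(flag | fraud) / P(flag | not fraud). The helper `flag_lift`, the `is_weekend` column, and the toy rows are all hypothetical; only the `is_fraud` name comes from the post.

```python
from typing import Iterable, Mapping


def flag_lift(rows: Iterable[Mapping[str, bool]], flag: str, label: str = "is_fraud") -> float:
    """Ratio of the flag's rate on labelled (fraud) rows to its rate on the other rows."""
    fraud_total = fraud_hits = legit_total = legit_hits = 0
    for row in rows:
        if row[label]:
            fraud_total += 1
            fraud_hits += bool(row[flag])
        else:
            legit_total += 1
            legit_hits += bool(row[flag])
    if fraud_total == 0 or legit_total == 0 or legit_hits == 0:
        raise ValueError("lift needs fraud rows, legitimate rows, and a nonzero base rate")
    return (fraud_hits / fraud_total) / (legit_hits / legit_total)


# Toy data invented for illustration only (not VynFi output).
toy_rows = (
    [{"is_fraud": True, "is_weekend": True}] * 8
    + [{"is_fraud": True, "is_weekend": False}] * 2
    + [{"is_fraud": False, "is_weekend": True}] * 5
    + [{"is_fraud": False, "is_weekend": False}] * 95
)
weekend_lift = flag_lift(toy_rows, "is_weekend")  # 80% of fraud rows vs 5% of legitimate rows
```

On the toy rows the flag is sixteen times more frequent on fraud rows than on legitimate ones. The same function could be pointed at any boolean flag column in a labelled export.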

The core insight behind 3.0 is simple: generating realistic data is necessary but not sufficient. Teams do not just need data — they need scenarios. An audit team validating SOX controls needs a dataset where a specific material weakness has been injected at a known point in time. A fraud detection team needs adversarial examples that probe the exact decision boundary of their production model. A risk team needs to replay the 2008 recession through their current portfolio structure and observe what breaks.

Three Pillars of VynFi 3.0

Counterfactual Simulation Engine — Define a causal DAG over your financial domain, inject a macro shock or control failure, and generate paired datasets (baseline vs. counterfactual) that isolate the causal effect. Built on structural causal models with do-calculus intervention semantics.
Adversarial ML Augmentation — Supply an ONNX model, and VynFi probes its decision boundary to generate targeted synthetic examples in the region where the model is least confident. Purpose-built for fraud detection, credit scoring, and AML classification hardening.
Neural Diffusion Generation — A score-based diffusion model trained on tabular financial distributions. Where the rule-based engine excels at structural fidelity (balanced entries, document flows, Benford compliance), the diffusion model captures higher-order distributional patterns that rules alone cannot express. A hybrid mode combines both.
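To make the paired-dataset idea behind the first pillar concrete, here is a toy structural causal model in plain Python. It is a sketch under stated assumptions, not VynFi's engine: the three-node DAG, the coefficients, and the helper names `evaluate` and `paired_rows` are invented for illustration. Each row's exogenous noise is sampled once and the structural equations are evaluated twice, once untouched and once with a do-style intervention.

```python
import random
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]


def evaluate(noise: Row, do: Optional[Row] = None) -> Row:
    """Evaluate a hypothetical DAG in topological order:
    gdp_growth -> unemployment -> default_rate, plus gdp_growth -> default_rate.

    A variable listed in `do` is pinned to that value instead of its equation,
    so its incoming edges are cut while downstream nodes still respond.
    """
    do = do or {}
    values: Row = {}
    values["gdp_growth"] = do.get("gdp_growth", 0.02 + noise["gdp_growth"])
    values["unemployment"] = do.get(
        "unemployment", 0.05 - 0.5 * values["gdp_growth"] + noise["unemployment"]
    )
    values["default_rate"] = do.get(
        "default_rate",
        0.01 + 0.3 * values["unemployment"] - 0.2 * values["gdp_growth"] + noise["default_rate"],
    )
    return values


def paired_rows(n: int, intervention: Row, seed: int = 0) -> List[Tuple[Row, Row]]:
    """Return (baseline, counterfactual) pairs that share the same exogenous noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        noise = {
            "gdp_growth": rng.gauss(0.0, 0.01),
            "unemployment": rng.gauss(0.0, 0.005),
            "default_rate": rng.gauss(0.0, 0.002),
        }
        pairs.append((evaluate(noise), evaluate(noise, do=intervention)))
    return pairs


pairs = paired_rows(1_000, {"gdp_growth": -0.04})
mean_effect = sum(cf["default_rate"] - base["default_rate"] for base, cf in pairs) / len(pairs)
```

Because both rows in a pair share the same noise, any difference between them is attributable to the intervention alone, which is what lets a paired dataset isolate the causal effect.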

API Surface

All three capabilities are exposed through the existing job submission API. The `mode` field in the generation config now accepts `"simulate"`, `"adversarial"`, and `"diffusion"` in addition to the existing `"generate"` default. Each mode adds its own configuration block, but the output format, streaming, and download mechanics remain identical.
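As an illustration of the mode field, a client-side helper could reject unknown modes before a job is ever submitted. This is a hedged sketch: `build_job_config` and `SUPPORTED_MODES` are hypothetical names that are not part of the SDK, and the plain-dict shape is an assumption. Only the four mode names and the `"generate"` default come from this post.

```python
from typing import Any, Dict

# Mode names and the "generate" default come from the post; the rest is hypothetical.
SUPPORTED_MODES = ("generate", "simulate", "adversarial", "diffusion")


def build_job_config(mode: str = "generate", **mode_options: Any) -> Dict[str, Any]:
    """Return a plain-dict config, failing fast on an unrecognised mode."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {SUPPORTED_MODES}")
    return {"mode": mode, **mode_options}


# A config that never mentions mode falls back to plain generation.
legacy = build_job_config(sector="banking", rows=50_000)
diffusion = build_job_config("diffusion", rows=100_000, hybrid=True)
```

Catching a typo such as `"simulation"` locally is cheaper than discovering it after a job has been queued.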

Python

import vynfi
client = vynfi.VynFi()
# Counterfactual simulation
job = client.jobs.create(
    mode="simulate",
    scenario="recession_2008_replay",
    baseline={"sector": "banking", "rows": 50_000, "periods": 12},
    intervention={"macro.gdp_growth": -0.04, "macro.unemployment": 0.10},
)
# Adversarial augmentation
job = client.jobs.create(
    mode="adversarial",
    model_uri="s3://models/fraud_classifier_v7.onnx",
    target_class="fraud",
    n_samples=10_000,
    boundary_sigma=0.05,
)
# Neural diffusion
job = client.jobs.create(
    mode="diffusion",
    sector="financial_statements",
    rows=100_000,
    hybrid=True,   # combine with rule-based for structural constraints
    guidance_scale=2.5,
)
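
For intuition about what probing a decision boundary means in the adversarial mode, the sketch below rejection-samples points that a classifier scores close to 0.5. It is a toy under stated assumptions, not VynFi's algorithm: `toy_fraud_score` stands in for a real ONNX model, its weights are invented, and the `margin` parameter is hypothetical and not claimed to match the semantics of `boundary_sigma` above.

```python
import math
import random
from typing import Callable, List, Sequence


def toy_fraud_score(x: Sequence[float]) -> float:
    """Stand-in classifier: logistic score over two features (weights are invented)."""
    z = 3.0 * x[0] - 2.0 * x[1] - 0.5
    return 1.0 / (1.0 + math.exp(-z))


def sample_near_boundary(
    score: Callable[[Sequence[float]], float],
    n_samples: int,
    margin: float = 0.05,
    seed: int = 0,
    max_draws: int = 1_000_000,
) -> List[List[float]]:
    """Keep uniformly drawn candidates whose score lies within `margin` of 0.5.

    Returns fewer than `n_samples` points if `max_draws` is exhausted first.
    """
    rng = random.Random(seed)
    kept: List[List[float]] = []
    for _ in range(max_draws):
        if len(kept) == n_samples:
            break
        candidate = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        if abs(score(candidate) - 0.5) <= margin:
            kept.append(candidate)
    return kept


uncertain = sample_near_boundary(toy_fraud_score, n_samples=200)
```

Rejection sampling is the simplest possible strategy and wastes most draws. It is used here only because it makes the "least confident region" idea easy to see.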

Backward Compatibility

Every existing generation config continues to work without modification. The default mode remains `"generate"`, so code written against VynFi 2.x runs unchanged on 3.0. The new modes are additive: they do not change the behavior of any existing endpoint or parameter. SDK versions 1.4 and later include full type hints and autocompletion for the new configuration blocks.

What Comes Next

This post is the first in a series of deep dives. Over the next three days, we will publish detailed technical walkthroughs of each pillar and of the workflows built on them: counterfactual simulation with causal DAGs, adversarial augmentation for fraud models, neural diffusion for tabular data, stress testing with recession scenarios, SOX compliance simulation, GNN-generated vendor networks, and privacy-preserving synthesis. Each post includes working Python SDK examples you can run with your VynFi API key today.
