Blog
News, tutorials, and deep dives on synthetic financial data
Synthetic data can pass every quality test and still break your fraud detector
A third-party benchmark (Sajja, 2026) shows that generators with near-perfect statistical fidelity destroy the temporal, velocity, and graph signals that fraud detection actually relies on. Why it happens, and why forward generation is the fix.
DataSynth 5.30 → 5.33.2: Eval-Driven Realism, 570× Memory Ceiling Lift, Framework-Aware Classification
Four minor releases (5.30, 5.31, 5.32, 5.33) plus two patches. The theme: stop adding realism features and start proving the ones we have. Highlights include Sajja exact-eval wiring, ConsolidationOutlierPass, per-process fraud rates, a streaming aggregate that cuts peak memory for a 2k-entity run from 218 GB to 0.382 GB (−570×), multi-period closing-to-opening carryover with a framework knob, a closed-loop calibration framework, and framework-aware TB classification that fixes a previously silent 32% gap between assets (A) and liabilities + equity + NCI (L+E+NCI) on German entities. The VynFi API now pins 5.33.2 as the final engine version.
ISO 21378: the audit-data classification standard you didn't know your data already speaks
VynFi datasets carry the ISO 21378 L1/L2/L3 audit-data classification on every GL account and journal entry. Here's why that matters for SAF-T, FEC, GoBD, and Big-4 audit-software imports.
IAS 36 CGU impairment testing: from indicator to disclosure
Cash-generating units, recoverable amounts, sensitivity tables. The full IAS 36 mechanics, mapped to a synthetic dataset.
From entity tree to group opinion: a walkthrough of VynFi's group-audit pipeline
Step-by-step: build a 12-entity multinational tree, run consolidation, review NCI/CTA/IC eliminations, and produce ISA 600 component reports — all from one YAML.
ISA 600 (Revised): what synthetic component-auditor data should actually look like
The 2022 revision of ISA 600 raised the bar for group auditor oversight of components. We unpack what 'meaningful involvement' looks like in synthetic test data — and where most generators fall short.
Testing your consolidation engine? Here's an IFRS 10 / IAS 21 reference dataset.
A purpose-built dataset for QA-ing your group reporting tooling: every flavor of NCI, an equity-method investee with goodwill, and a triangulated FX matrix that exposes IAS 21 corner cases.
Multi-Period Continuity: Generating Coherent Full-Year Audit Data
Multi-period chains in DataSynth 5.3/5.5 thread closing trial balances into the next period's opening balances, run FX revaluation at each period boundary, and roll retained earnings correctly. This walkthrough covers the schema, a worked 4-quarter manufacturing example, the on-disk output structure, how to verify carryover, and the pricing model (N × per-period, no premium).
Streaming TB-Scale Synthetic Datasets Without Disk Hell
Customers hit OOM kills and disk-full errors when generating terabyte-scale datasets. We rebuilt the output pipeline around Azure Blob, per-file SAS URLs, BYO storage, and rate-controlled NDJSON streaming, so a 1 TB job now ships end-to-end with zero buffering.
Build a Fraud Detector in 30 Minutes with Python
Generate fully labeled fraud data, engineer features, train a RandomForest classifier, and compare it against rule-based audit analytics — all in one notebook session.
Process Mining with Synthetic Manufacturing Data and OCEL 2.0
Before Six Sigma consultants spend months mapping your processes, let process mining show you where the bottlenecks are. Here is how to do it with VynFi's manufacturing event logs.
AML Compliance Testing with 697K Synthetic Banking Transactions
AML-labeled transaction data is the most expensive and legally restricted dataset in financial services. Here is how to build and test a complete compliance program without it.
Synthetic Audit Data for PCAOB and SOC 2 Testing
Auditors need realistic test data to validate tools and train teams, but real client data is off limits. Here is how synthetic data solves the compliance testing problem.
How to Generate SAP-Compatible Test Data with VynFi
SAP implementations need realistic test data but getting it is painful. VynFi generates journal entries, trial balances, and subledgers in SAP-importable formats.
Building Financial AI Models? Here's Your Training Data Pipeline
Synthetic financial data beats anonymized real data for ML training. Benford compliance, balanced entries, ground-truth labels, and unlimited scale via API.
Introducing VynFi: Synthetic Financial Data for Everyone
Today we are launching VynFi, a cloud-native API that generates realistic synthetic financial data at 100K+ rows per second. Here is why we built it and what you can do with it.
Why Synthetic Financial Data Matters for Audit Training
Audit teams train on flat, unrealistic data. Synthetic financial data changes that by providing configurable complexity, labeled anomalies, and unlimited scale.
Getting Started with VynFi in 5 Minutes
A quick walkthrough: sign up, create an API key, generate your first dataset, and inspect the results. All in under 5 minutes.
The Ground Truth Problem in Enterprise Audit Analytics
Why you cannot use production data to build audit knowledge systems. The inverse problem is computationally infeasible, systematic errors propagate undetected, and internal consistency does not imply correctness.
How VynFi Generates Statistically Rigorous Financial Data
Inside the three-layer knowledge model, Benford compliance, copula-based dependencies, and calibration against 155 real-world datasets that power VynFi's generation engine.
130+ Fraud Scenarios: Building Better Fraud Detection Models
How VynFi generates labeled fraud training data with 130+ anomaly subtypes, multi-stage fraud schemes, and ground-truth labels across all five knowledge dimensions.