FRAUD ML
Labeled Synthetic Fraud Data for Model Training
Journal entries and transactions with ground-truth fraud and anomaly labels — multi-class typology, controllable rate, fully synthetic and shareable.
No card required. Credits are the only meter — every feature is open on every account.
Labeled data is the bottleneck for most fraud-detection models. Real fraud is rare, sensitive, often unlabeled, and hard to share outside the organization where it occurred. VynFi generates synthetic journal entries and transactions with ground-truth fraud and anomaly labels, so you can train, benchmark, and stress-test models on data you actually own.
The labeled-data problem, solved
Real companies rarely release labeled fraud data, and the few public datasets are stale, narrow, and heavily class-imbalanced. Because VynFi injects fraud synthetically, the labels are ground truth by construction: you know exactly which entries are fraudulent and why.
- Per-row fraud and anomaly labels you can train and evaluate against
- Multi-class typology: management override, revenue recognition, fictitious expense, journal-entry manipulation, and more
- Control the fraud rate and class balance to match your modeling needs
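Because every row carries a ground-truth label, per-class evaluation is straightforward. The following is a minimal sketch, assuming the labels have already been read into a list of strings. The label values ("none", "fictitious_expense", "revenue_recognition") are hypothetical placeholders, not VynFi's actual label names.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall per label: share of rows of each true class predicted as that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical ground-truth labels and model predictions (illustrative values only).
y_true = ["none", "none", "fictitious_expense",
          "revenue_recognition", "fictitious_expense", "none"]
y_pred = ["none", "fictitious_expense", "fictitious_expense",
          "none", "fictitious_expense", "none"]

recall = per_class_recall(y_true, y_pred)
for label, value in sorted(recall.items()):
    print(f"{label}: {value:.2f}")
```

Reporting recall per fraud type rather than one aggregate score shows which typologies a model misses, which is often more informative on imbalanced data.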
Behaviorally faithful, so it transfers
Statistical similarity isn't enough: a model trained on superficially realistic data can fail on real data. VynFi's design targets behavioral fidelity (per Sajja et al., 2026, arXiv:2604.13125): the data reproduces process variants, control patterns, and anomaly signatures, not just column distributions. Every run emits Benford analysis and quality reports so you can verify rather than trust.
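For readers who want to run their own check alongside the emitted reports, Benford's law gives the expected frequency of leading digit d as log10(1 + 1/d). The sketch below compares observed first-digit frequencies with that expectation using mean absolute deviation (MAD). MAD is one common summary chosen here for illustration; it is an assumption, not necessarily the statistic VynFi's report uses, and the input amounts are made up.

```python
import math
from collections import Counter

def first_digit(x):
    """Leading nonzero digit of a nonzero number."""
    # Scientific notation puts the leading digit first, e.g. '1.2345e+03'.
    return int(f"{abs(x):.15e}"[0])

def benford_expected():
    """Expected first-digit frequencies under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_mad(amounts):
    """Mean absolute deviation between observed and expected digit frequencies."""
    digits = [first_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(digits)
    expected = benford_expected()
    n = len(digits)
    return sum(abs(counts[d] / n - expected[d]) for d in range(1, 10)) / 9

# Illustrative inputs only: powers of two track Benford closely,
# while amounts with evenly spread leading digits do not.
benford_like = [2 ** k for k in range(1, 200)]
uniform_digits = [d * 100 for d in range(1, 10)] * 20

print(f"benford-like MAD:   {benford_mad(benford_like):.4f}")
print(f"uniform-digit MAD:  {benford_mad(uniform_digits):.4f}")
```

A lower MAD means closer conformity. What counts as acceptable depends on the dataset size and the use case.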
Document-flow context, not isolated rows
Fraud lives in relationships — an invoice that flows to a payment that flows to a posting. VynFi emits the document-flow graph linking entries, so models can learn structural fraud signals that flat row-level data can't express.
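To illustrate the kind of structural signal a document-flow graph makes available, here is a minimal sketch that assumes the graph arrives as an edge list of (source document, target document) pairs plus a type per document. The IDs, type names, and edge-list shape are hypothetical and do not describe VynFi's actual export schema.

```python
from collections import defaultdict, deque

# Hypothetical documents and flow edges (illustrative only).
doc_types = {
    "INV-1": "invoice", "PAY-1": "payment", "JE-1": "posting",
    "INV-2": "invoice", "PAY-2": "payment",
    "PAY-3": "payment", "JE-3": "posting",
}
edges = [("INV-1", "PAY-1"), ("PAY-1", "JE-1"),
         ("INV-2", "PAY-2"), ("PAY-3", "JE-3")]

def build_graph(edges):
    """Adjacency list mapping each document to the documents it flows into."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    return graph

def downstream(graph, start):
    """All documents reachable from start, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

def payments_without_invoice(doc_types, edges):
    """Payments that no invoice flows into: one simple structural feature."""
    from_invoice = {dst for src, dst in edges if doc_types[src] == "invoice"}
    return sorted(d for d, t in doc_types.items()
                  if t == "payment" and d not in from_invoice)

graph = build_graph(edges)
print(downstream(graph, "INV-1"))
print(payments_without_invoice(doc_types, edges))
```

A feature like "payment with no upstream invoice" cannot be computed from a single row, which is the point of keeping the links between documents.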
Any format, any scale
Export CSV, JSON, or Parquet at whatever scale your training pipeline needs, from a quick evaluation set to millions of labeled rows. Built on the open-source DataSynth engine (Rust, 100k+ rows/sec).
Frequently asked questions
How are the fraud labels defined?
Fraud is injected synthetically against a multi-class typology (management override, revenue recognition, fictitious expense, journal-entry manipulation, and others as the taxonomy grows). Because it is injected, the per-row labels are ground truth — there is no labeling ambiguity.
Can I control the fraud rate and class balance?
Yes. You can set the overall fraud rate and shape the mix across fraud types, which is useful for handling the class-imbalance problem that plagues real fraud datasets.
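After generating a dataset, it can be worth confirming that the realized fraud rate and type mix match what was requested, and deriving class weights for training. The sketch below assumes labels are available as a list of strings with a hypothetical "none" value for non-fraud rows. The weighting formula n / (k * count) is the common inverse-frequency heuristic, used here as an illustration rather than as a VynFi feature.

```python
from collections import Counter

NORMAL = "none"  # hypothetical label for non-fraud rows

def fraud_rate(labels):
    """Share of rows carrying any fraud label."""
    return sum(1 for label in labels if label != NORMAL) / len(labels)

def fraud_mix(labels):
    """Share of each fraud type among fraudulent rows only."""
    fraud = Counter(label for label in labels if label != NORMAL)
    total = sum(fraud.values())
    return {label: count / total for label, count in fraud.items()}

def balanced_weights(labels):
    """Inverse-frequency class weights: n / (k * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}

# Illustrative labels: 90 normal rows and 10 fraudulent rows of two types.
labels = ["none"] * 90 + ["fictitious_expense"] * 6 + ["revenue_recognition"] * 4

print(fraud_rate(labels))
print(fraud_mix(labels))
print(balanced_weights(labels))
```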
Will a model trained on it transfer to real data?
The engine targets behavioral fidelity — reproducing the behaviors (process variants, control patterns, anomaly signatures) that determine transfer, not just column-level statistics. Every run emits Benford and quality reports so you can validate fidelity for your use case.
What formats are available?
CSV, JSON, and Parquet, plus the document-flow graph that links entries. Free tier is 5,000 non-expiring credits, no card; one-time packs from $19.
Related use cases
AUDIT ANALYTICS
Synthetic General Ledgers for Audit Analytics
Benford-conforming ledgers with seeded, labeled anomalies — validate audit routines against known ground truth.
ERP TEST DATA
Synthetic SAP Test Data — BKPF, BSEG & ACDOCA
BKPF/BSEG/ACDOCA + master data for S/4HANA and ECC testing — balanced, reconciling, GDPR-safe.
Try it in 30 seconds, no signup
Generate a sample in the playground, or create a free account for 5,000 credits. Built on the open-source DataSynth engine (Apache 2.0).