The Reference

Ground truth, by construction.

Audit analytics assembled from the data you’re auditing is a jigsaw puzzle solved without the picture. Recovering ground truth from observed enterprise data is combinatorially infeasible.

VynFi generates the reference forward, from a fully specified model where every node’s provenance is known.

I · Structural
P2P · 6 STAGES
REQUISITION 1,243 · PO 1,243 · GR 1,198 · INVOICE 1,240 · PAYMENT 1,240 · JE 2,480

II · Statistical
MAD · 0.006 · EXCELLENT
First-digit distribution · digits 1–9

III · Normative
5 / 5 CERTIFIED
GAAP
IFRS
ISA
PCAOB
COSO
155 datasets · 364M entries · 2.4B line items · calibrated against ISO 21378:2019-compliant data · arXiv
REFERENCES GENERATED TODAY · 12,481 | BENFORD MAD (rolling) · 0.0058 | 155 DATASETS CALIBRATION | 364M JOURNAL ENTRIES · 2.4B LINE ITEMS | P99 LATENCY · 82 ms | UPTIME · 99.98%
Try it

Watch it generate.

One curl command. Reference data in your terminal before you finish reading this sentence.

Specimen · POST /v1/generate/quick · Live
Generates sample data instantly
Response · application/json · 1,000 rows · ~840 ms
How it works

From signup to reference in three moves.

No credit card. No sales call. Your first reference knowledge graph generates in under three minutes.

Step 01

Sign up, collect key

Create a free account. Your API key is generated instantly — no credit card, no sales call.

Step 02

Generate a reference

Call the API with your sector, tables, and row count. Receive a fully provenanced reference knowledge graph.

Step 03

Build against ground truth

Use reference data for testing, ML training, and compliance workflows, with a known audit trail for every node.

SDK · five languages

Integrate in minutes.

First-class SDKs for Python, TypeScript, Rust, and .NET. Or just use curl from Bash.

Bash
# Request 1,000 journal entries from the retail_small preset, returned as JSON
curl https://api.vynfi.com/v1/generate/quick \
  -H "Authorization: Bearer vf_live_7mN4kP2x..." \
  -H "Content-Type: application/json" \
  -d '{
    "preset": "retail_small",
    "tables": ["journal_entries"],
    "rows": { "journal_entries": 1000 },
    "format": "json"
  }'
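For readers who prefer Python without an SDK, here is a minimal sketch of the same call using only the standard library. The endpoint, headers, and payload mirror the curl example; the helper names `build_quick_request` and `send` are illustrative, and the response shape is not assumed.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.vynfi.com/v1/generate/quick"


def build_quick_request(api_key: str, rows: int = 1000) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "preset": "retail_small",
        "tables": ["journal_entries"],
        "rows": {"journal_entries": rows},
        "format": "json",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON body, whatever its shape."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Typical use would be `send(build_quick_request(your_key))`, with the key read from an environment variable rather than hard-coded.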
Use cases

Built for every discipline.

Audit firms, fintech engineers, academic researchers, compliance teams, and ESG reporters — all running reference data against the same model.

Audit testing

Journal entries with known anomalies for audit analytics testing. Calibrated to real-world distributions.

Big 4 · audit firms

Fintech development

Build and test financial applications with production-quality synthetic data. Zero real customer exposure.

Engineering · QA

Academic research

Large-scale labelled datasets for fraud detection, process mining, and financial ML research.

Research · ML

Compliance validation

Test SOX, Basel III, and IFRS workflows with COSO control mappings and evaluation reports.

GRC · regulators

ESG & sustainability

CSRD/TCFD reporting pipelines with Scope 1/2 emissions, workforce diversity, and pay equity analysis.

Sustainability
Enterprise audit

Big 4 methodologies, codified.

Four integrated audit methodologies, group-audit simulation per ISA 600, and complete end-to-end audit data generation.

Big 4 methodologies

Four integrated blueprints with 728–757 steps each: KPMG Clara, PwC Aura, Deloitte Omnia, EY GAM.

4 · blueprints

Group audit · ISA 600

Component-auditor simulation with Significant, Non-Significant, Not-in-Scope classification and consolidated reporting.

ISA 600

14 audit data types

From journal entries to board minutes — IT reports, management packs, regulatory filings, and more.

14 · artefacts

Process mining exports

Disco, Celonis IBC, XES 2.0, and OCEL 2.0 format support for process-mining research and tooling.

OCEL 2.0 · XES 2.0
New in v2.3 · Banking & AML

Money-laundering data, done right.

14 fully implemented typologies, multi-party criminal networks, cross-layer fraud propagation from payments to bank transactions, and 10 evaluators to prove it.

14 AML typologies

Structuring, smurfing, mule chains, synthetic identity, trade-based money laundering, crypto integration, sanctions evasion, romance scams, casino and real-estate integration, and more, all with ground-truth labels.

Ground-truth labels

Multi-party networks

Barabási-Albert preferential-attachment topology — one coordinator + 5–25 smurfs, mule chains with recruiter / middleman / cash-out roles, shell-company pyramids.

Power-law topology
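To make the preferential-attachment idea concrete, here is a minimal standard-library sketch of Barabási-Albert growth. It illustrates the general technique only, not VynFi's generator: the star-shaped seed graph, the function name, and the parameters are assumptions, and role assignment (coordinator, smurf, mule) is left out.

```python
import random


def barabasi_albert_edges(n, m, seed=None):
    """Grow an n-node graph where each new node links to m distinct
    existing nodes, chosen with probability proportional to degree."""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Seed graph (an assumption): a star with node 0 linked to nodes 1..m.
    edges = [(0, j) for j in range(1, m + 1)]
    # Every node appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges
```

Because early nodes accumulate links fastest, a few hubs emerge, which is the power-law shape the card describes.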

Velocity & device features

Rolling-window counts (1h/24h/7d/30d), unique counterparties, amount z-scores, and realistic power-law device fingerprint distributions — pre-computed.

Feature-ready
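As an illustration of what such features involve, the sketch below computes rolling-window event counts and amount z-scores for one account. It assumes timestamps are epoch seconds and that a window covers the closed interval ending at the current event; the function names and window keys are illustrative, not the VynFi schema.

```python
from bisect import bisect_left
from statistics import mean, pstdev

# Window lengths in seconds, matching the 1h/24h/7d/30d windows above.
WINDOWS = {"1h": 3_600, "24h": 86_400, "7d": 604_800, "30d": 2_592_000}


def velocity_counts(timestamps, windows=WINDOWS):
    """For each event (in time order), count the events up to and including
    it whose timestamp lies within [t - window, t]."""
    ts = sorted(timestamps)
    rows = []
    for i, t in enumerate(ts):
        row = {}
        for name, span in windows.items():
            # First index whose timestamp is >= t - span, searched in ts[0..i].
            start = bisect_left(ts, t - span, 0, i + 1)
            row[name] = i + 1 - start
        rows.append(row)
    return rows


def amount_zscores(amounts):
    """Population z-score of each amount; all zeros if there is no spread."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return [0.0 for _ in amounts]
    return [(a - mu) / sigma for a in amounts]
```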

Cross-layer coherence

A fraudulent vendor payment surfaces in document flow, journal entries, AND on both sides of a mirrored bank-transaction pair — ≥95% fraud-label propagation.

≥95% propagation
Scale tier · TB-scale

TB-scale without the disk hell.

Data streams direct to Azure Blob with short-lived SAS downloads — or bring your own storage and keep zero bytes on VynFi. For live pipelines, NDJSON streaming at up to 10,000/sec.

Managed Azure Blob

Lifecycle retention (7d Free → 365d Scale). Per-file SAS URLs — direct blob access, no API proxy, no 2 GB cap, no OOM kills.

All tiers

BYO storage

Supply a container SAS URL and the worker uploads directly to your data lake. Zero bytes transit our storage. Pair with Private Link for air-gapped flows.

Team+ · Enterprise

NDJSON live streaming

GET /v1/jobs/{id}/stream/ndjson emits self-describing envelopes with token-bucket rate-limiting. Point Kafka, Spark, ClickHouse at it and ingest live.

Scale+
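Two pieces of that description can be sketched generically: reading NDJSON (one JSON document per line) and token-bucket rate limiting. The code below is an illustration of both techniques under stated assumptions. The envelope fields are not assumed, and the class and function names are hypothetical, not part of the VynFi API.

```python
import json
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self, n=1):
        """Take n tokens if available; return False instead of blocking."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


def iter_ndjson(lines):
    """Yield one parsed JSON value per non-empty line (str or bytes)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

A consumer could wrap any line iterator, such as an open HTTP response, in `iter_ndjson` and forward each envelope to its sink.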
Enterprise integrations · new in DS 4.4.3

SAP and SAF-T, out of the box.

Two new enterprise-grade output formats. Reference data that drops directly into S/4HANA IMPORT or OECD tax-software validators — no manual CSV pre-processing, no hand-written XML.

SAP · 27 tables

BKPF / BSEG / ACDOCA plus five master-data tables (LFA1 / KNA1 / MARA / CSKS / CEPC). HANA dialect for S/4HANA IMPORT, classic for legacy ECC. Configurable client / ledger / source-system tags. FK integrity across the full P2P cycle.

Scale+ · 1.25× credit multiplier

SAF-T · 5 OECD jurisdictions

Structurally valid SAF-T XML for Portugal (1.04_01), Poland (JPK_KR 1.0), Romania (D406 3.0), Norway (Fin 1.10), Luxembourg (FAIA 2.01). Audit-PoC ready for tax-software validation and compliance simulation.

Scale+ · 1.25× credit multiplier
Financial coherence engine

Every number connects.

From raw journal entries to audited financial statements, VynFi generates data that passes your reconciliation, audit, and regulatory tests — all derived from a single declarative model.

Full financial statements

Complete balance sheet, income statement, cash flow, and equity rollforward — generated from actual journal entry data, not templates.

BS · P&L · CF · Equity

Manufacturing cost flow

Multi-stage WIP → Finished Goods → COGS pipeline with standard cost variance accounting and IAS 37 warranty provisions.

IAS 37 · Std cost

Treasury & hedge accounting

Debt interest accrual, cash-flow and fair-value hedge mark-to-market, cash-pool sweeps, and covenant compliance evaluation.

ASC 815 · IFRS 9

Tax from real GL

Tax provision computed from actual pre-tax income. VAT posting from source documents. Deferred tax with temporary-difference tracking.

Deferred · VAT · DTA

XBRL 2.1 export

Instance documents mapped to US GAAP and IFRS taxonomies. Test your regulatory filing pipeline with reference data.

US GAAP · IFRS

32+ coherence validators

FG rollforward, WIP rollforward, trial-balance proof, cash-flow reconciliation, equity rollforward, segment-to-consolidated, IC elimination.

32+ · validators
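Two of the checks named above reduce to simple identities: a trial balance proves when total debits equal total credits, and a rollforward proves when opening plus additions minus reductions equals closing. The sketch below shows both with exact decimal arithmetic. The field names `debit` and `credit` and the function names are assumptions for illustration, not the VynFi output schema.

```python
from decimal import Decimal


def trial_balance_difference(lines):
    """Total debits minus total credits; zero means the ledger balances.
    `lines` is a list of dicts with string or numeric amounts."""
    debits = sum((Decimal(str(line["debit"])) for line in lines), Decimal("0"))
    credits = sum((Decimal(str(line["credit"])) for line in lines), Decimal("0"))
    return debits - credits


def rollforward_difference(opening, additions, reductions, closing):
    """Opening + additions - reductions - closing; zero means it ties out."""
    return (
        Decimal(str(opening))
        + Decimal(str(additions))
        - Decimal(str(reductions))
        - Decimal(str(closing))
    )
```

Decimal is used rather than float so that amounts such as 100.10 compare exactly.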
09½ · Compliance

Audit-grade by default.

GDPR-ready, EU AI Act-aligned, AES-256 encryption at rest and in transit, zero ingestion of real client data, and all four Big 4 methodologies covered.

GDPR-Ready
EU AI Act
AES-256
Zero Real Data
Big 4 Coverage
Quality by design

Statistical rigor, measurable.

Validation is not a feature — it is the output. Every reference carries provable bounds on its own distributional fidelity.

Benford MAD
< 0.006
1st digit

Mean absolute deviation (MAD) between observed first-digit frequencies and Benford's law. A first-digit MAD below 0.006 falls in the 'close conformity' band of Nigrini's criteria.
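For reference, the first-digit MAD is the average, over digits 1 to 9, of the absolute gap between the observed share of each leading digit and the Benford share log10(1 + 1/d). A minimal sketch of that calculation follows; the function names are illustrative and this is not VynFi's evaluator.

```python
import math
from collections import Counter
from decimal import Decimal

# Expected share of each leading digit under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Leading significant digit of a number, or None for zero/non-finite."""
    # Decimal stores the coefficient without leading zeros, so the first
    # stored digit is the first significant digit.
    digits = Decimal(str(value)).as_tuple().digits
    return digits[0] if digits[0] != 0 else None


def benford_mad(values):
    """Mean absolute deviation between observed and Benford first-digit shares."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to test")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10)) / 9
```

Powers of two are a handy sanity check, since their leading digits are known to follow Benford's law closely.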

F1 Delta
~3%
vs real

GNN fraud detectors trained on reference data score within 3% F1 of real-data baselines.

Copula families
5
implemented

Gaussian, Clayton, Gumbel, Frank, and Student-t copulas model complex inter-variable dependencies.
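As an example of what a copula contributes, the sketch below draws dependent uniform pairs from a Clayton copula by conditional inversion. This is the textbook sampling method, shown for illustration and not taken from the DataSynth engine; `sample_clayton` and its parameters are hypothetical names. For Clayton, Kendall's tau equals theta / (theta + 2), so theta = 2 gives tau = 0.5.

```python
import random


def sample_clayton(n, theta, seed=None):
    """Draw n pairs (u, v) in (0, 1] from a Clayton copula with theta > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # 1 - random() lies in (0, 1], which keeps the negative powers finite.
        u = 1.0 - rng.random()
        w = 1.0 - rng.random()
        # Invert the conditional CDF dC/du = w for v.
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs
```

Feeding `u` and `v` through the inverse CDFs of two marginals (say, invoice amount and payment delay) yields variables with the chosen marginals and lower-tail dependence.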

Anomaly types
33
subtypes

Across 5 categories — timing, amount, relationship, pattern, and structural — with ground-truth labels.

The paper

Built on real-world research.

The DataSynth engine was calibrated against 155 ISO 21378:2019–compliant general-ledger datasets, encompassing 364 million journal entries and 2.4 billion line items across industries and geographies.

Real-world datasets
155
GL corpora

Analyzed for distribution calibration and statistical benchmarking across 10 industry sectors.

Journal entries
364M
posted

In the calibration corpus used to derive realistic financial patterns and temporal dynamics.

Line items
2.4B
observed

Processed to build inter-table correlation models and cross-entity relationship graphs.

Localization

41 country packs, already wired.

Localized tax, banking, naming, holidays, and accounting standards — so regional realism is a config flag, not a backlog item.

Americas · 7 packs
EMEA · 15 packs
APAC · 14 packs
Middle East & Africa · 5 packs
Pricing

Simple, transparent.

Start free. Scale when you need it. Credits reset every billing cycle — no rollover games.

Free

$0 / mo
10K credits / month
  • 10,000 credits/month
  • 1 concurrent job
  • JSON & CSV output
  • Community support

Developer

$49 / mo
500K credits / month
  • 500,000 credits/month
  • 5 concurrent jobs
  • All output formats
  • Email support

Team

Most Popular
$199 / mo
5M credits / month
  • 5,000,000 credits/month
  • 20 concurrent jobs
  • All output formats
  • Priority support

Scale

$499 / mo
25M credits / month
  • 25,000,000 credits/month
  • Unlimited concurrent jobs
  • All output formats
  • Dedicated support
From the paper · §1
“Recovering ground truth from observed enterprise data is combinatorially infeasible. DataSynth circumvents this by generating data forward — producing reference datasets where the complete audit trail is known by construction.”

Ivertowski · 2026 · arXiv:cs.CE

33
Anomaly types
32+
Coherence validators
41
Country packs
100K+
Rows / second
155
Datasets analyzed

Powered by the DataSynth engine — a purpose-built Rust engine with 16 crates and counting.

Begin

Generate your first reference.

10,000 credits, every month, free. No credit card required. Your first reference knowledge graph generates in under three minutes.

You scrolled all the way down. We respect that.