Onboarding paused: at capacity. Due to exceptionally high demand, new user onboarding is temporarily paused. We're not accepting new sign-ups or sign-ins right now.

The Reference

Ground truth, by construction.

Audit analytics assembled from the data you’re auditing is a jigsaw puzzle solved without the picture. Recovering ground truth from observed enterprise data is combinatorially infeasible.

VynFi generates the reference forward, from a fully specified model where every node’s provenance is known.

I · Structural: P2P · 6 STAGES

II · Statistical: MAD · 0.0023 · EXCELLENT

III · Normative: 5 / 5 CERTIFIED (GAAP, IFRS, ISA, PCAOB, COSO)

155 datasets · 364M entries · 2.4B line items · calibrated against ISO 21378:2019 — Read the paper ↗

REFERENCES GENERATED TODAY · 12,481

BENFORD MAD (rolling) · 0.0058

155 DATASETS CALIBRATION

364M JOURNAL ENTRIES · 2.4B LINE ITEMS

P99 LATENCY · 82 ms

UPTIME · 99.98%

Try it

Watch it generate.

One curl command. Reference data in your terminal before you finish reading this sentence.

Specimen · POST /v1/generate/quick · Live

Response · application/json · 1,000 rows · ~840 ms

How it works

From signup to reference in three moves.

No credit card. No sales call. Your first reference knowledge graph generates in under three minutes.

Step 01

Sign up, collect key

Create a free account. Your API key is generated instantly — no credit card, no sales call.

Step 02

Generate a reference

Call the API with your sector, tables, and row count. Receive a fully provenanced reference dataset.

Step 03

Build against ground truth

Use reference data for testing, ML training, and compliance workflows — with a known audit trail for every node.

SDK

Integrate in minutes.

Start with curl or Python. First-class SDKs for TypeScript, Rust, and .NET are a click away.

Bash

curl https://api.vynfi.com/v1/generate/quick \
  -H "Authorization: Bearer vf_live_7mN4kP2x..." \
  -H "Content-Type: application/json" \
  -d '{
    "preset": "retail_small",
    "tables": ["journal_entries"],
    "rows": { "journal_entries": 1000 },
    "format": "json"
  }'
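
The same request can be sketched in Python with only the standard library. The endpoint, headers, and payload fields are copied from the curl example above; the helper names `build_quick_request` and `fetch_quick` are hypothetical, and the shape of the JSON response is not described on this page, so the sketch returns it unparsed beyond `json.load`.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.vynfi.com/v1/generate/quick"


def build_quick_request(api_key: str, rows: int = 1000) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "preset": "retail_small",
        "tables": ["journal_entries"],
        "rows": {"journal_entries": rows},
        "format": "json",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_quick(api_key: str, rows: int = 1000):
    """Send the request and decode the JSON body (response shape is an unknown here)."""
    request = build_quick_request(api_key, rows)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_quick(your_key)` with a valid key should correspond to the curl call above; keeping request construction separate from sending makes the payload easy to inspect without touching the network.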

All five SDKs →

Use cases

Built for the people who test the numbers.

Auditors, engineers, and researchers — all running reference data against the same provenanced model.

Audit & assurance

Journal entries with known anomalies, Big 4 methodologies, and ISA 600 group-audit data — calibrated to real-world distributions.

Big 4 · audit firms

Engineering & QA

Build and test financial applications with production-quality synthetic data. SAP/SAF-T exports, zero real-customer exposure.

Fintech · QA

Research & ML

Large-scale labelled datasets for fraud detection, AML networks, and process-mining research — with ground-truth labels.

Research · ML

Explore every use case →

Financial coherence engine

Every number connects.

From raw journal entries to audited financial statements, VynFi generates data that passes your reconciliation, audit, and regulatory tests — all derived from a single declarative model.

Full financial statements

Balance sheet, income statement, cash flow, and equity rollforward — generated from actual journal entry data, not templates.

BS · P&L · CF · Equity

32+ coherence validators

Trial-balance proof, cash-flow reconciliation, equity rollforward, segment-to-consolidated, and IC elimination — checked on every dataset.

32 · validators
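
As an illustration of what one such check might look like on your side, here is a minimal sketch of a trial-balance style proof: every journal entry's debits should equal its credits. This is not VynFi's validator. The field names `entry_id`, `debit`, and `credit` are hypothetical and would need to be mapped to whatever columns the generated dataset actually uses.

```python
from collections import defaultdict
from decimal import Decimal


def unbalanced_entries(line_items):
    """Return {entry_id: debits - credits} for entries that do not net to zero."""
    net = defaultdict(Decimal)  # Decimal() starts each entry at exactly 0
    for item in line_items:
        # 'entry_id', 'debit', 'credit' are assumed field names, not a documented schema.
        net[item["entry_id"]] += Decimal(item["debit"]) - Decimal(item["credit"])
    return {entry_id: diff for entry_id, diff in net.items() if diff != 0}
```

An empty result means every entry balances. Amounts are parsed as `Decimal` from strings so that cent-level differences are not lost to binary floating point.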

Tax from real GL

Tax provision computed from actual pre-tax income, VAT posted from source documents, deferred tax with temporary-difference tracking.

Deferred · VAT · DTA

How the engine works →

Proof

Statistical rigor, measurable.

Validation is the output, not a feature. The engine was calibrated against 155 ISO 21378-compliant general-ledger corpora — and every reference carries provable bounds on its own distributional fidelity.

Benford MAD

< 0

1st digit

Mean absolute deviation for first-digit compliance — 'excellent conformity' by Nigrini's criteria.
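
To check this figure on a dataset yourself, the first-digit MAD can be computed in a few lines. The sketch below uses the standard definition: the mean, over digits 1 to 9, of the absolute gap between the observed share of each leading digit and the Benford share log10(1 + 1/d). The function names are illustrative, and reading 'excellent conformity' as the first-digit cutoff of 0.006 commonly cited from Nigrini is an assumption about what this page means.

```python
import math
from collections import Counter

# Expected first-digit shares under Benford's law: log10(1 + 1/d).
BENFORD_FIRST_DIGIT = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value: float) -> int:
    """Return the leading non-zero digit of a non-zero finite number."""
    # Scientific notation puts the leading digit first, e.g. '3.14...e+02'.
    return int(f"{abs(value):.15e}"[0])


def benford_mad(amounts) -> float:
    """Mean absolute deviation between observed and Benford first-digit shares."""
    digits = [first_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD_FIRST_DIGIT[d]) for d in range(1, 10)) / 9
```

Zero amounts are skipped because they have no leading digit. A lower MAD means closer agreement with Benford's law.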

F1 Delta

within 3%

vs real

GNN fraud detectors trained on reference data land within 3% F1 of real-data baselines.
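
One way to reproduce this kind of comparison is to score two detectors on the same held-out labels and take the difference in F1. The sketch below is plain Python and says nothing about how VynFi or its paper ran the experiment. Whether "within 3%" means absolute F1 points or a relative change is not stated here, so the helper returns the absolute difference as an assumption, and all names are hypothetical.

```python
def f1_score(y_true, y_pred) -> float:
    """Binary F1 for labels in {0, 1}, with 1 as the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_delta(y_true, pred_reference_trained, pred_real_trained) -> float:
    """F1 of the reference-trained model minus F1 of the real-data baseline."""
    return f1_score(y_true, pred_reference_trained) - f1_score(y_true, pred_real_trained)
```

A delta near zero means the model trained on reference data performs about as well as the real-data baseline on the same test set.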

Real-world datasets

155 GL corpora

Analyzed for distribution calibration across multiple industry sectors and geographies.

Journal entries

364M calibrated

In the calibration corpus used to derive realistic financial patterns and temporal dynamics.

Read the whitepaper →

Pricing

Simple, transparent.

Buy credits — no subscription. Start with 5,000 free credits, every feature included. Buy packs as you grow.