audit · Benford · anomaly detection · analytics · Python · journal entries · DataSynth 3.1.1

Journal Entry Forensics: Benford's Law, Anomaly Detection, and Pre-Built Analytics

Use VynFi's pre-built analytics API to validate Benford's Law conformity, inspect amount distributions, and assess process variant entropy — without computing anything client-side.

VynFi Team · Engineering · April 13, 2026 · 12 min read

Every audit engagement starts with journal entry testing. ISA 240 requires the auditor to test journal entries in response to the risk of management override of controls, and the first-digit distribution (Benford's Law) is the canonical screening test. Benford's Law predicts that a leading digit d occurs with probability log10(1 + 1/d), from about 30.1% for the digit 1 down to about 4.6% for the digit 9. If the leading digits of transaction amounts deviate from this distribution, something may be off: round-number bias, duplicate entries, or deliberate manipulation.

DataSynth 3.1.1 computes Benford's Law conformity, amount distribution statistics, process variant summaries (with rework / skip / out-of-order rates), and banking evaluation metrics as part of every generation run. The results land as pre-built JSON files in the archive. VynFi's analytics API merges them into a single response — no client-side computation needed.

**Update (2026-04-19):** As of DataSynth 3.1.1, fraud-labeled entries carry behavioural signal that real forensic pipelines can actually detect. Weekend-posting lift jumps from ~1× to ~32×, round-dollar lift from ~0× to ~170×, and post-close lift reaches ~3,106× on fraud-marked JEs. Scheme-propagated fraud (document-seeded rings) is now distinguishable from direct line-level injections via `is_fraud_propagated` on every JE header — see the new fraud-split endpoint below.
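The lift figures above can be sanity-checked on any labeled extract. The sketch below assumes that "lift" means the rate of a behaviour among fraud-labeled entries divided by its rate among non-fraud entries; the post does not spell out the definition DataSynth uses, so treat that as an assumption. The field names `is_fraud`, `posting_date`, and `amount` are hypothetical stand-ins for whatever your export contains.

```python
from datetime import date


def lift(entries, has_feature):
    """Feature rate among fraud entries divided by the rate among non-fraud entries."""
    fraud = [e for e in entries if e["is_fraud"]]
    clean = [e for e in entries if not e["is_fraud"]]
    if not fraud or not clean:
        raise ValueError("need at least one fraud and one non-fraud entry")
    fraud_rate = sum(1 for e in fraud if has_feature(e)) / len(fraud)
    clean_rate = sum(1 for e in clean if has_feature(e)) / len(clean)
    if clean_rate == 0:
        # The feature never occurs in clean entries, so the ratio is unbounded
        # (or undefined if it never occurs in fraud entries either).
        return float("inf") if fraud_rate > 0 else float("nan")
    return fraud_rate / clean_rate


def posted_on_weekend(entry):
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    return entry["posting_date"].weekday() >= 5


def is_round_dollar(entry):
    # "Round" here means a non-zero multiple of 1,000 (an assumption).
    return entry["amount"] != 0 and entry["amount"] % 1000 == 0


# Tiny hand-made sample: 4 fraud entries and 10 clean entries.
SATURDAY, MONDAY = date(2026, 4, 11), date(2026, 4, 13)
sample = (
    [{"is_fraud": True, "posting_date": SATURDAY, "amount": 5000.0}] * 2
    + [{"is_fraud": True, "posting_date": MONDAY, "amount": 1234.56}] * 2
    + [{"is_fraud": False, "posting_date": SATURDAY, "amount": 87.40}] * 1
    + [{"is_fraud": False, "posting_date": MONDAY, "amount": 3000.0}] * 1
    + [{"is_fraud": False, "posting_date": MONDAY, "amount": 412.10}] * 8
)

print(f"Weekend-posting lift: {lift(sample, posted_on_weekend):.1f}x")
print(f"Round-dollar lift:    {lift(sample, is_round_dollar):.1f}x")
```

In this sample half of the fraud entries are weekend postings against one in ten clean entries, so both ratios come out far smaller than the figures quoted for 3.1.1; the point is only to show the shape of the calculation.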

Fetching Pre-Built Analytics

Python

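import os
import vynfi
# Read the API key from the environment instead of hard-coding it.
client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])
# Take the first completed job returned by the listing (assumes at least one exists).
job = client.jobs.list(status="completed", limit=1).data[0]
analytics = client.jobs.analytics(job.id)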
# Benford's Law
b = analytics.benford_analysis
print(f"Benford's Law Analysis ({b.sample_size:,} amounts):")
print(f"  MAD:          {b.mad:.4f}")
print(f"  Chi-squared:  {b.chi_squared:.2f} (p={b.p_value:.4f})")
print(f"  Conformity:   {b.conformity}")
print(f"  Passes:       {b.passes}")

Interpreting the Results

The Mean Absolute Deviation (MAD) is the average, over the nine possible leading digits, of the absolute difference between the observed first-digit proportion and the proportion Benford's Law predicts. Nigrini's thresholds for the first-digit test: MAD < 0.006 = close conformity, 0.006 to 0.012 = acceptable, 0.012 to 0.015 = marginal, > 0.015 = nonconformity. VynFi's synthetic data typically falls in the close-conformity range because the underlying generators use log-normal amount distributions calibrated from real financial data.
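To make the MAD figure less of a black box, here is a minimal client-side sketch of the same kind of calculation, using only the standard library. It is not DataSynth's implementation: the helper names (`first_digit`, `benford_mad`, `nigrini_label`) and the label strings are illustrative, and the handling of the exact boundary values 0.006, 0.012, and 0.015 is a choice made here.

```python
import math
from collections import Counter

# Expected first-digit probabilities under Benford's Law: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Leading non-zero digit of a non-zero amount (sign is ignored)."""
    # Scientific notation puts the leading digit first, e.g. '4.2000...e-02'.
    return int(f"{abs(amount):.15e}"[0])


def benford_mad(amounts):
    """Mean absolute deviation between observed and Benford first-digit shares."""
    digits = [first_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("no non-zero amounts to analyse")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10)) / 9


def nigrini_label(mad):
    """Map a first-digit MAD to the threshold bands quoted above."""
    if mad < 0.006:
        return "close conformity"
    if mad < 0.012:
        return "acceptable"
    if mad <= 0.015:
        return "marginal"
    return "nonconformity"


# Log-uniform amounts over three full decades follow Benford by construction.
log_uniform = [10 ** (k / 1000) for k in range(3000)]
# Amounts that all start with the digit 1 are as far from Benford as it gets.
all_ones = [100, 150, 199]

for name, data in [("log-uniform", log_uniform), ("all ones", all_ones)]:
    mad = benford_mad(data)
    print(f"{name:12} MAD={mad:.4f} -> {nigrini_label(mad)}")
```

Comparing a figure like this against `b.mad` on a downloaded extract is a cheap cross-check, although small differences are expected if the server treats zero or negative amounts differently.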

Amount Distribution Statistics

Python

d = analytics.amount_distribution
print(f"Amount Distribution ({d.sample_size:,} amounts):")
print(f"  Mean:              {d.mean}")
print(f"  Median:            {d.median}")
print(f"  Skewness:          {d.skewness:+.3f}")
print(f"  Kurtosis:          {d.kurtosis:+.3f}")
print(f"  Round number ratio: {d.round_number_ratio:.2%}")
if d.fitted_mu is not None:
    print(f"  Log-normal fit:    mu={d.fitted_mu:.2f}, sigma={d.fitted_sigma:.2f}")

Positive skewness with high kurtosis is the hallmark of real financial data: many small transactions and a long tail of large ones. If your synthetic data has skewness near zero, its amounts are too symmetric for realistic audit testing. The round-number ratio is the share of amounts that end in 000; an unusually high share is a useful red flag for manual journal entries.
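For intuition about what these three statistics measure, the following sketch computes them by hand. The post does not say which estimators DataSynth uses, so the choices here are assumptions: population (not sample) moments, excess kurtosis (normal distribution = 0), and "round" meaning a non-zero multiple of 1,000.

```python
import statistics


def _standardised_moment(xs, power):
    """Population moment of the given order, scaled by the standard deviation."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        raise ValueError("all values are identical; moment is undefined")
    return sum((x - mean) ** power for x in xs) / (len(xs) * sd ** power)


def skewness(xs):
    """Third standardised moment: > 0 means a longer right tail."""
    return _standardised_moment(xs, 3)


def excess_kurtosis(xs):
    """Fourth standardised moment minus 3, so a normal distribution scores 0."""
    return _standardised_moment(xs, 4) - 3


def round_number_ratio(xs, unit=1000):
    """Share of amounts that are non-zero multiples of `unit`."""
    return sum(1 for x in xs if x != 0 and x % unit == 0) / len(xs)


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]  # many small values, one large one
amounts = [1000, 2500, 12000, 99.95]

print(f"symmetric skewness:    {skewness(symmetric):+.3f}")
print(f"right-tailed skewness: {skewness(right_tailed):+.3f}")
print(f"right-tailed kurtosis: {excess_kurtosis(right_tailed):+.3f}")
print(f"round number ratio:    {round_number_ratio(amounts):.2%}")
```

The right-tailed sample is a toy version of the pattern described above: a single large amount is enough to pull skewness clearly positive.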

Process Variant Analysis

Python

v = analytics.process_variant_summary
print(f"Process Variants ({v.total_cases:,} cases):")
print(f"  Variant count:     {v.variant_count}")
print(f"  Entropy:           {v.variant_entropy:.3f}")
print(f"  Happy-path share:  {v.happy_path_concentration:.2%}")
print("  Top variants:")
for vid, freq in v.top_variants[:5]:
    print(f"    {vid}: {freq:.2%}")

High variant entropy means the process has many execution paths — typical of complex P2P flows with rework, returns, and partial deliveries. Low entropy with high happy-path concentration means the process is well-controlled. Auditors look for the gap: if a process should be controlled but has high entropy, that's a risk indicator.
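The relationship between entropy and happy-path concentration is easy to see on a toy event log. The sketch below assumes Shannon entropy in bits over variant frequencies and treats the most frequent variant as the happy path; the post does not state the log base or the happy-path definition DataSynth uses, and the activity names are made up for illustration.

```python
import math
from collections import Counter


def variant_stats(traces):
    """Summarise a list of cases, each given as a sequence of activity names."""
    counts = Counter(tuple(trace) for trace in traces)
    total = sum(counts.values())
    # Shannon entropy in bits: 0 when every case follows the same path.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    top_variant, top_count = counts.most_common(1)[0]
    return {
        "total_cases": total,
        "variant_count": len(counts),
        "variant_entropy": entropy,
        "happy_path": top_variant,
        "happy_path_concentration": top_count / total,
    }


# Hypothetical P2P cases: 8 follow the standard path, 2 deviate.
standard = ["create_po", "goods_receipt", "invoice", "payment"]
rework = ["create_po", "goods_receipt", "invoice", "invoice", "payment"]
skipped = ["create_po", "invoice", "payment"]
controlled = [standard] * 8 + [rework, skipped]

stats = variant_stats(controlled)
print(f"Variants: {stats['variant_count']}, "
      f"entropy: {stats['variant_entropy']:.3f} bits, "
      f"happy-path share: {stats['happy_path_concentration']:.0%}")
```

A process where every case takes a different path pushes entropy toward log2 of the case count, which is the "gap" signal described above when the process is supposed to be tightly controlled.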

Banking Evaluation (AML Jobs)

Python

if analytics.banking_evaluation:
    be = analytics.banking_evaluation
    print(f"Banking Evaluation (passes={be.passes}):")
    if be.cross_layer:
        print(f"  Fraud propagation: {be.cross_layer.fraud_propagation_rate:.2%}")
    if be.velocity:
        print(f"  Velocity coverage: {be.velocity.coverage_rate:.2%}")
    if be.false_positive:
        print(f"  FP rate: {be.false_positive.fp_rate:.2%}")

For banking/AML jobs, the analytics response includes 10 sub-analyses covering KYC completeness, typology mix, cross-layer fraud propagation, velocity feature quality, false-positive calibration, device fingerprint distributions, sanctions screening, sophistication diversity, lifecycle phase coverage, and network topology structure. Each sub-analysis reports whether the dataset passes its quality gate — so you know before training whether the data has the properties you need. AML typology coverage reaches **0.857** in DataSynth 3.1.1, comfortably above the 0.80 evaluator threshold (was 0.000 in 3.1.0).

Scheme vs. Line-Level Fraud Split (DataSynth 3.1.1)

Document-level fraud fans out to every derived journal entry when `fraud.documentFraudRate` is set and `propagate_to_lines` is on. The resulting JEs carry `is_fraud_propagated = true` and `fraud_source_document_id`. This lets you train two detector classes on the same dataset: a cross-document scheme detector on the propagated population and a noise-robust slip-level detector on the direct-injection population. The new endpoint aggregates the split server-side.

Python

# VynFi Python SDK 1.5.1+
split = client.jobs.fraud_split(job.id)
print(f"Total fraud JEs:      {split.fraud_entries:,}")
print(f"Scheme-propagated:    {split.scheme_propagated:,} ({split.propagation_rate:.1%})")
print(f"Direct injection:     {split.direct_injection:,}")
for fraud_type, counts in split.by_fraud_type.items():
    print(f"  {fraud_type:30} total={counts.total:5}  scheme={counts.scheme_propagated:4}  direct={counts.direct_injection:4}")
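If you are working from downloaded JE headers rather than the endpoint, the same split can be approximated locally. This is a sketch under assumptions: `is_fraud_propagated` and `fraud_source_document_id` are the fields named above, while `is_fraud`, `fraud_type`, `je_id`, and the sample fraud type names are hypothetical and may not match the actual export schema.

```python
from collections import defaultdict


def split_fraud(headers):
    """Count fraud JEs by origin: propagated from a document vs injected directly."""
    by_type = defaultdict(
        lambda: {"total": 0, "scheme_propagated": 0, "direct_injection": 0}
    )
    for header in headers:
        if not header.get("is_fraud"):
            continue  # only fraud-labeled entries take part in the split
        origin = (
            "scheme_propagated"
            if header.get("is_fraud_propagated")
            else "direct_injection"
        )
        bucket = by_type[header.get("fraud_type", "unknown")]
        bucket["total"] += 1
        bucket[origin] += 1

    fraud_entries = sum(b["total"] for b in by_type.values())
    scheme = sum(b["scheme_propagated"] for b in by_type.values())
    return {
        "fraud_entries": fraud_entries,
        "scheme_propagated": scheme,
        "direct_injection": fraud_entries - scheme,
        "propagation_rate": scheme / fraud_entries if fraud_entries else 0.0,
        "by_fraud_type": dict(by_type),
    }


# Two JEs derived from one fraudulent document, one direct injection, one clean JE.
headers = [
    {"je_id": "JE-1", "is_fraud": True, "is_fraud_propagated": True,
     "fraud_source_document_id": "PO-17", "fraud_type": "fictitious_vendor"},
    {"je_id": "JE-2", "is_fraud": True, "is_fraud_propagated": True,
     "fraud_source_document_id": "PO-17", "fraud_type": "fictitious_vendor"},
    {"je_id": "JE-3", "is_fraud": True, "is_fraud_propagated": False,
     "fraud_source_document_id": None, "fraud_type": "round_dollar"},
    {"je_id": "JE-4", "is_fraud": False, "is_fraud_propagated": False,
     "fraud_source_document_id": None},
]

local_split = split_fraud(headers)
print(f"Scheme-propagated: {local_split['scheme_propagated']} "
      f"of {local_split['fraud_entries']} fraud JEs "
      f"({local_split['propagation_rate']:.1%})")
```

Keeping the two populations separate at this stage makes it straightforward to stratify train and test sets so that JEs sharing a `fraud_source_document_id` do not leak across the split.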

Worked Examples and the Regenerated Datasets

The VynFi Python SDK 1.5.1 ships three worked examples that exercise this post end-to-end: `examples/ml_training_pipeline.py` (fraud-split stratification), `examples/behavioral_fraud_patterns.py` (weekend / round-dollar / post-close lift verification), and `examples/02_audit_data_deep_dive.ipynb` (interactive Benford + variant notebook). Regenerated journal-entry datasets are published on Hugging Face: VynFi/vynfi-journal-entries-1m (2.1M lines, manufacturing, 12 periods) and VynFi/vynfi-audit-p2p (document-flow fraud with `is_fraud_propagated`).
