AML · KYC · compliance · banking · Python · financial crime · DataSynth 3.1.1

AML Compliance Testing with 697K Synthetic Banking Transactions

AML-labeled transaction data is the most expensive and legally restricted dataset in financial services. Here is how to build and test a complete compliance program without it.

VynFi Team · Engineering · April 11, 2026 · 11 min read

AML-labeled transaction data is the most expensive and legally restricted dataset in financial services. A bank might process hundreds of millions of transactions to generate a few hundred confirmed SAR filings. Those labels are based on forensic investigation, law enforcement interaction, and often years of litigation. Even when labels exist, sharing them — even internally across lines of business — can trigger reporting obligations and legal review.

This is why AML detection systems are so difficult to build and validate. You cannot test a structuring detection algorithm without data that contains known structuring. You cannot tune a risk scorer without ground-truth risk labels. And you cannot train an analyst team without case examples that contain real suspicious patterns.

VynFi's financial_services sector generates comprehensive banking datasets with full AML ground-truth labels at the transaction, customer, account, and relationship levels. This tutorial walks through the complete AML compliance testing workflow from the VynFi AML notebook: customer KYC analysis, transaction monitoring, structuring detection, network analysis, SAR narrative review, composite risk scoring, and regulatory reporting metrics.

**Update (2026-04-19, DataSynth 3.1.1):** AML typology coverage now reaches **0.857** on the evaluator (was 0.000 in 3.1.0 — zero of the seven canonical typologies passed the coverage check). Relationship-network density is up 38× (0.0014 → 0.053) with mule_link and shell_link edges now populated from coordinated criminal structures. The regenerated VynFi/vynfi-aml-100k and VynFi/vynfi-sar-narratives datasets on Hugging Face include these improvements.

Generate the Banking Dataset

Python
import os
import vynfi

client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])

# Clean baseline: banking process model enabled, no fraud injection
config = {
    "sector": "financial_services",
    "country": "US",
    "accountingFramework": "us_gaap",
    "rows": 1000,
    "companies": 5,
    "periods": 3,
    "periodLength": "monthly",
    "processModels": ["o2c", "p2p", "banking"],
    "exportFormat": "json",
    "fraudPacks": [],
    "fraudRate": 0.0,
}
job = client.jobs.generate_config(config=config)
completed = client.jobs.wait(job.id, poll_interval=2.0, timeout=300.0)
archive = client.jobs.download_archive(completed.id)
Python
import pandas as pd

# Core banking data
customers_df = pd.DataFrame(archive.json("banking/banking_customers.json"))
accounts_df = pd.DataFrame(archive.json("banking/banking_accounts.json"))
transactions_df = pd.DataFrame(archive.json("banking/banking_transactions.json"))
# AML labels at every level
aml_txn_labels = pd.DataFrame(archive.json("banking/aml_transaction_labels.json"))
aml_cust_labels = pd.DataFrame(archive.json("banking/aml_customer_labels.json"))
aml_acct_labels = pd.DataFrame(archive.json("banking/aml_account_labels.json"))
aml_rel_labels = pd.DataFrame(archive.json("banking/aml_relationship_labels.json"))
# SAR narrative templates
aml_narratives = archive.json("banking/aml_narratives.json")

print(f"Customers:      {len(customers_df):>8,}")
print(f"Accounts:       {len(accounts_df):>8,}")
print(f"Transactions:   {len(transactions_df):>8,}")
print(f"AML txn labels: {len(aml_txn_labels):>8,}")
print(f"SAR narratives: {len(aml_narratives):>8,}")

Customer KYC Analysis

FinCEN's CDD Final Rule requires financial institutions to identify and verify customer identity, assess risk, and perform ongoing monitoring. The notebook analyzes the customer base across three dimensions: status (active, inactive, suspended), risk tier (low, medium, high), and PEP flag (Politically Exposed Person). It then calculates review aging — how long since each customer's last KYC review — and flags customers overdue based on their risk tier.

Python
# KYC review aging: days since each customer's last review
customers_df["last_kyc_review"] = pd.to_datetime(customers_df["last_kyc_review"], errors="coerce")
customers_df["days_since_review"] = (pd.Timestamp.now() - customers_df["last_kyc_review"]).dt.days
# Required review frequency by risk tier (days)
review_cycle = {"high": 365, "medium": 730, "low": 1095}
customers_df["review_cycle_days"] = customers_df["risk_tier"].map(review_cycle)
customers_df["review_overdue"] = customers_df["days_since_review"] > customers_df["review_cycle_days"]
overdue = customers_df[customers_df["review_overdue"]]
print(f"Customers with overdue KYC reviews: {len(overdue)} / {len(customers_df)} "
      f"({len(overdue)/len(customers_df):.1%})")
print("\nOverdue by risk tier:")
for risk in ["high", "medium", "low"]:
    count = overdue[overdue["risk_tier"] == risk].shape[0]
    total = (customers_df["risk_tier"] == risk).sum()
    print(f"  {risk:<8s} {count:>4} / {total:>4} ({count/max(total, 1):.0%})")

Transaction Monitoring

The BSA requires Currency Transaction Reports (CTRs) for cash transactions exceeding $10,000 and Suspicious Activity Reports (SARs) for transactions that exhibit red flags. The notebook sets up three monitoring checks: large transaction detection (CTR threshold), velocity anomaly detection (unusual bursts of activity), and structuring pattern detection.

Python
txn = transactions_df.copy()
txn["amount"] = pd.to_numeric(txn["amount"], errors="coerce")
txn["timestamp"] = pd.to_datetime(txn.get("timestamp_initiated", txn.get("timestamp")), errors="coerce")
CTR_THRESHOLD = 10_000
# Large transaction detection (CTR candidates)
large_txns = txn[txn["amount"] >= CTR_THRESHOLD]
print(f"CTR-eligible transactions (>= ${CTR_THRESHOLD:,}): {len(large_txns):,} "
      f"({len(large_txns)/len(txn):.1%})")
# Velocity analysis: flag customer-days with unusual activity
cust_col = "customer_id" if "customer_id" in txn.columns else "account_id"
daily_counts = (txn.groupby([cust_col, txn["timestamp"].dt.date])
                   .size().reset_index(name="daily_txn_count"))
# Per-customer baseline of daily transaction counts
customer_baselines = daily_counts.groupby(cust_col)["daily_txn_count"].agg(["mean", "std"])
customer_baselines.columns = ["mean_daily", "std_daily"]
customer_baselines["std_daily"] = customer_baselines["std_daily"].fillna(0)
daily_counts = daily_counts.merge(customer_baselines.reset_index(), on=cust_col)
# Alert when daily count exceeds mean + 3 sigma (floor of 10 to suppress noise)
daily_counts["threshold"] = (daily_counts["mean_daily"]
                             + 3 * daily_counts["std_daily"]).clip(lower=10)
daily_counts["velocity_alert"] = daily_counts["daily_txn_count"] > daily_counts["threshold"]
velocity_alerts = daily_counts[daily_counts["velocity_alert"]]
print(f"Velocity alerts: {len(velocity_alerts):,} customer-days flagged")

Structuring Detection

Structuring (smurfing) is the deliberate breaking up of transactions into smaller amounts to evade reporting thresholds. Under 31 USC 5324, structuring is a federal crime even if the underlying funds are legitimate. The notebook implements three pattern detectors: just-below-threshold transactions in the $8,000-$9,999 band, same-day aggregation where individual transactions are all below $10K but the daily total exceeds it, and rapid-succession sub-threshold transactions over a rolling three-day window (the first two appear in the code below; the third is sketched after it).

Python
# Pattern 1: Just-below-threshold transactions ($8K up to the $10K CTR line)
just_below = txn[(txn["amount"] >= 8_000) & (txn["amount"] < CTR_THRESHOLD)]
print(f"Just-below-threshold ($8K-$10K): {len(just_below):,} ({len(just_below)/len(txn):.2%})")
# For reference: compare to neighboring amount ranges
below_ref = txn[(txn["amount"] >= 6_000) & (txn["amount"] < 8_000)]
above_ref = txn[(txn["amount"] >= 10_000) & (txn["amount"] < 12_000)]
print(f"  $6K-$7,999:   {len(below_ref):,}")
print(f"  $8K-$9,999:   {len(just_below):,} <-- structuring band")
print(f"  $10K-$11,999: {len(above_ref):,}")
# Pattern 2: Same-day aggregation (splitting)
daily_agg = (txn.groupby([cust_col, txn["timestamp"].dt.date])
                .agg(total_amount=("amount", "sum"),
                     max_single=("amount", "max"),
                     txn_count=("amount", "count"))
                .reset_index())
structuring_candidates = daily_agg[
    (daily_agg["total_amount"] > CTR_THRESHOLD) &
    (daily_agg["max_single"] < CTR_THRESHOLD) &
    (daily_agg["txn_count"] >= 2)
]
print(f"\nSplitting candidates (same-day aggregate > $10K, no single txn > $10K): "
      f"{len(structuring_candidates):,} customer-days")

The structuring band density relative to neighboring ranges is the key signal. A natural distribution of transaction amounts would show roughly equal density in $6K-$8K and $8K-$10K. An elevated density just below $10K compared to equivalent ranges is statistically suspicious and would warrant investigation by a BSA officer.
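
To make that check mechanical, compare the band counts from the code above directly. The 1.5x alert cutoff here is an assumed tuning value, not a regulatory number.

Python
# Density ratio of the structuring band vs. the neighboring band below it.
# A ratio near 1.0 is expected for a natural amount distribution; the 1.5x
# cutoff is an assumed tuning value.
band_ratio = len(just_below) / max(len(below_ref), 1)
print(f"$8K-$10K vs $6K-$8K density ratio: {band_ratio:.2f}")
if band_ratio > 1.5:
    print("Elevated just-below-threshold density -- refer to BSA officer")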

Composite Risk Scoring

A composite risk score combines multiple signals into a single prioritization metric. The notebook builds a score from four components: customer risk (KYC risk tier, PEP flag, account status), behavioral risk (transaction volume percentile), AML label risk (whether the customer has a suspicious label from the monitoring system), and network risk (membership in a suspicious relationship cluster).

Python
risk_df = customers_df[["customer_id", "risk_tier", "status", "is_pep"]].copy()
# Component 1: Customer risk (tier 0-30, status 0-20, PEP 0-15 points)
tier_score = {"low": 5, "medium": 15, "high": 30}
status_score = {"active": 0, "inactive": 10, "suspended": 20}
risk_df["score_tier"] = risk_df["risk_tier"].map(tier_score).fillna(10)
risk_df["score_status"] = risk_df["status"].map(status_score).fillna(5)
risk_df["score_pep"] = risk_df["is_pep"].apply(lambda x: 15 if x else 0)
# Component 2: Behavioral risk (0-25 points, percentile-based)
cust_txn_stats = (txn.groupby(cust_col)["amount"]
                     .agg(["count", "sum", "mean"]).reset_index()
                     .rename(columns={"count": "txn_count", cust_col: "customer_id"}))
risk_df = risk_df.merge(cust_txn_stats, on="customer_id", how="left")
risk_df["txn_count"] = risk_df["txn_count"].fillna(0)
txn_pct = risk_df["txn_count"].rank(pct=True).fillna(0)
risk_df["score_volume"] = pd.cut(
    txn_pct, bins=[-0.01, 0.2, 0.4, 0.6, 0.8, 1.0],
    labels=[0, 3, 6, 12, 25]).astype(float).fillna(0)
# Composite score
score_cols = [c for c in risk_df.columns if c.startswith("score_")]
raw_max = 30 + 20 + 15 + 25  # max tier + status + PEP + volume points
risk_df["risk_score_raw"] = risk_df[score_cols].sum(axis=1)
risk_df["risk_score_normalized"] = (risk_df["risk_score_raw"] / raw_max * 100).clip(0, 100)
print("Top 10 highest risk customers:")
top_risk = risk_df.nlargest(10, "risk_score_normalized")[
    ["customer_id", "risk_tier", "is_pep", "txn_count", "risk_score_normalized"]]
print(top_risk.to_string(index=False))
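
The remaining two components join the label files in. A sketch, assuming aml_cust_labels carries customer_id and label columns (the dashboard below makes the same assumption) and source_id/target_id columns in the relationship labels; the 20- and 10-point weights are illustrative, not the notebook's calibrated values.

Python
# Components 3 and 4 (sketch): AML label risk and network risk. Weights and
# the source_id/target_id column names are assumptions.
suspicious_ids = set(aml_cust_labels.loc[
    aml_cust_labels["label"].str.lower() == "suspicious", "customer_id"])
risk_df["score_label"] = risk_df["customer_id"].isin(suspicious_ids) * 20
flagged = aml_rel_labels[aml_rel_labels["label"].str.lower() == "suspicious"]
network_ids = set(flagged.get("source_id", pd.Series(dtype=object))) \
            | set(flagged.get("target_id", pd.Series(dtype=object)))
risk_df["score_network"] = risk_df["customer_id"].isin(network_ids) * 10
# Recompute the composite with all four components
score_cols = [c for c in risk_df.columns if c.startswith("score_")]
raw_max = 30 + 20 + 15 + 25 + 20 + 10
risk_df["risk_score_raw"] = risk_df[score_cols].sum(axis=1)
risk_df["risk_score_normalized"] = (risk_df["risk_score_raw"] / raw_max * 100).clip(0, 100)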

SAR Narrative Templates

When a financial institution identifies suspicious activity, it must file a SAR with FinCEN within 30 days. The SAR narrative section — describing the who, what, when, where, and why of the suspicious activity — is the most critical and most difficult part to write. VynFi generates SAR narrative templates based on detected AML patterns, providing training data for compliance analysts and NLP models.

Under 31 USC 5318(g)(2), SAR filings are confidential. Disclosing the existence of a SAR to the subject of the report (tipping off) is a federal crime. Synthetic SAR narratives allow analyst training and NLP model development without this legal exposure.
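
Before building anything on specific narrative fields, dump one record to see the schema; nothing in this snippet assumes particular field names, only that the file loads as a list of records.

Python
import json
# Peek at one narrative record to discover the actual schema of
# aml_narratives.json before relying on any field names.
sample = aml_narratives[0]
print(json.dumps(sample, indent=2)[:800])
print(f"\nTotal narrative templates: {len(aml_narratives):,}")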

Regulatory Reporting Metrics

The compliance dashboard at the end of the notebook consolidates all analyses into a single-pane view of the institution's AML/KYC health. It covers KYC program health (overdue reviews, PEP count), transaction monitoring results (CTR count, structuring candidates, velocity alerts), AML alert summary (suspicious rate, pattern breakdown), and regulatory filing queue (CTRs and SAR candidates to be filed).

Python
# Compliance dashboard summary
print("=" * 70)
print(" AML/KYC COMPLIANCE DASHBOARD")
print("=" * 70)
total_cust = len(customers_df)
active = (customers_df["status"] == "active").sum()
overdue_ct = customers_df["review_overdue"].sum()
pep_ct = customers_df["is_pep"].sum()
suspicious_txn = (aml_txn_labels["label"].str.lower() == "suspicious").sum()
suspicious_rate = suspicious_txn / max(len(aml_txn_labels), 1)
ctr_individual = txn[txn["amount"] >= CTR_THRESHOLD]
total_ctrs = len(ctr_individual) + len(structuring_candidates)
sar_candidates = set()
if "customer_id" in aml_cust_labels.columns:
    suspicious_custs = aml_cust_labels[
        aml_cust_labels["label"].str.lower() == "suspicious"]["customer_id"].tolist()
    sar_candidates.update(suspicious_custs)
sar_candidates.update(structuring_candidates[cust_col].tolist())
print(f" Total customers:          {total_cust:>8,}")
print(f" Active customers:         {active:>8,}")
print(f" KYC reviews overdue:      {overdue_ct:>8,} ({overdue_ct/total_cust:.1%})")
print(f" PEP customers:            {pep_ct:>8,} ({pep_ct/total_cust:.1%})")
print()
print(f" Total transactions:       {len(transactions_df):>8,}")
print(f" Suspicious transaction %: {suspicious_rate:>8.2%}")
print(f" CTRs to file:             {total_ctrs:>8,}")
print(f" SAR candidates:           {len(sar_candidates):>8,}")

Network Analysis with AML Relationship Labels

FATF Recommendation 10 emphasizes relationship-based risk. Money launderers use networks of seemingly unrelated accounts and shell companies to obscure transaction flows. The aml_relationship_labels.json file contains labeled relationships between entities with suspicious flags. Building a NetworkX graph from this data enables cluster detection — identifying groups of entities where at least one suspicious relationship exists.
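
A sketch of that cluster detection, assuming source_id, target_id, and label columns in the relationship records; adjust the names to the actual export schema.

Python
import networkx as nx

# Build an undirected relationship graph. The source_id/target_id/label
# column names are assumptions -- adjust to the actual export schema.
G = nx.Graph()
for rec in aml_rel_labels.to_dict("records"):
    u, v = rec["source_id"], rec["target_id"]
    susp = str(rec.get("label", "")).lower() == "suspicious"
    # Keep the flag sticky if any record between the pair is suspicious
    prev = G.get_edge_data(u, v, default={}).get("suspicious", False)
    G.add_edge(u, v, suspicious=susp or prev)
# Flag clusters containing at least one suspicious relationship
suspicious_clusters = [
    comp for comp in nx.connected_components(G)
    if any(G.edges[e]["suspicious"] for e in G.subgraph(comp).edges)
]
largest = max((len(c) for c in suspicious_clusters), default=0)
print(f"Clusters with suspicious links: {len(suspicious_clusters)} "
      f"(largest: {largest} entities)")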

Shared attribute detection complements graph analysis. Customers sharing addresses, phone numbers, or tax IDs can indicate hidden connections even when no explicit relationship is documented. This is a standard entity-resolution technique, used by financial intelligence units to surface networks that would otherwise evade detection.
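
A minimal sketch of that check, assuming the customer export may expose phone and address columns; the guard simply skips any that are absent.

Python
# Group customers by shared identifying attributes. The phone/address column
# names are assumptions; substitute whatever your customer export exposes.
for attr in ["phone", "address"]:
    if attr not in customers_df.columns:
        continue
    shared = (customers_df.dropna(subset=[attr])
              .groupby(attr)["customer_id"].nunique())
    multi = shared[shared > 1]
    print(f"{attr}: {len(multi)} values shared by multiple customers")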

Next Steps

The notebook demonstrates the mechanics of each monitoring check. In production, these would run as continuous monitoring jobs against real transaction streams, with alert queues feeding into case management systems. The value of synthetic data here is not just testing the algorithms — it is testing the workflows. You can validate alert routing, case assignment, SAR drafting, and escalation procedures against a realistic but legally safe dataset before your compliance program goes live.

Use separate generation jobs for each compliance testing scenario: clean baseline for false-positive rate measurement, structuring-heavy dataset for structuring detection tuning, and high-fraud-rate dataset for overall system sensitivity testing. The VynFi API seed parameter makes each scenario reproducible across test runs.
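
A sketch of that scenario setup, reusing the config from the first section. The fraudPacks values and the placement of seed inside the config payload are assumptions here; check the API reference for the actual pack names and seed handling.

Python
# One generation job per compliance scenario, pinned to a seed for
# reproducibility. The fraudPacks names below are illustrative placeholders.
scenarios = {
    "clean_baseline":    {"fraudPacks": [], "fraudRate": 0.0, "seed": 101},
    "structuring_heavy": {"fraudPacks": ["structuring"], "fraudRate": 0.05, "seed": 102},
    "high_sensitivity":  {"fraudPacks": ["structuring"], "fraudRate": 0.15, "seed": 103},
}
jobs = {name: client.jobs.generate_config(config={**config, **overrides})
        for name, overrides in scenarios.items()}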

The full notebook is available at 07_aml_compliance_testing.ipynb in the VynFi Python SDK repository. It includes network analysis with networkx, SAR narrative templates, OFAC screening simulation, and a compliance dashboard.

For GNN-based link prediction on the new dense AML networks, see gnn_vendor_networks.ipynb. The new sector_dag_presets.py example shows the DataSynth 3.1 `financial_services` causal DAG (correspondent banking, regulatory pressure, KYC score, AML screening strength, liquidity coverage).

Ready to try VynFi?

Start generating synthetic financial data with 10,000 free credits. No credit card required.