GNN-Generated Vendor Networks for AML Detection
VynFi 3.0 uses graph neural networks to generate realistic entity relationship graphs — vendor networks, correspondent banking chains, shell company structures — for AML model training. This post covers the GNN edge predictor architecture and criminal network simulation.
Anti-money-laundering models increasingly operate on graph-structured data: transaction networks where nodes are entities (people, companies, accounts) and edges are financial relationships (payments, ownership, correspondent banking links). The challenge for model training is that real AML networks are classified, sparse, and heavily imbalanced — fewer than 0.1% of entities in a typical banking network are involved in laundering. Generating realistic synthetic networks that preserve the topological properties of real criminal structures (layering depth, fan-out patterns, circular flows) is substantially harder than generating tabular transaction data.
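To make the three topological properties concrete, here is a minimal pure-Python sketch on a toy payment graph. The account names and the exact metric definitions (fan-out as the number of distinct direct recipients, layering depth as the longest shortest-path hop count from the source, circular flow as the shortest directed cycle back to the source) are illustrative assumptions, not VynFi's internal definitions.

```python
from collections import defaultdict

# Toy directed payment graph (hypothetical account names): one source fans out
# to three mules, funds pass through two intermediaries, then return to the source.
edges = [
    ("src", "m1"), ("src", "m2"), ("src", "m3"),
    ("m1", "hop1"), ("m2", "hop1"), ("m3", "hop1"),
    ("hop1", "hop2"), ("hop2", "src"),
]

out_neighbors = defaultdict(list)
for u, v in edges:
    out_neighbors[u].append(v)

def fan_out(node):
    """Number of distinct direct recipients of a node."""
    return len(set(out_neighbors[node]))

def hops_from(source):
    """Breadth-first hop distance from source to every reachable node."""
    dist = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in out_neighbors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return dist

def shortest_cycle_through(node):
    """Length of the shortest directed cycle returning to node, or None."""
    best = None
    for v in out_neighbors[node]:
        d = hops_from(v).get(node)
        if d is not None and (best is None or d + 1 < best):
            best = d + 1
    return best

layering_depth = max(hops_from("src").values())
print(fan_out("src"), layering_depth, shortest_cycle_through("src"))  # 3 3 4
```

A tabular generator has no natural way to enforce constraints like these, because they are properties of paths through the graph, not of individual rows.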
**DataSynth 3.1.1 update:** Edge-type diversity landed — `is_mule_link` and `is_shell_link` edges now populate from coordinated criminal structures (were 0 in 3.1.0), network density climbed from 0.0014 to 0.053 (38× denser), and `Spoofing` joined the typology catalog. Train link-prediction GNNs directly on VynFi/vynfi-aml-100k — the regenerated dataset includes the `relationship_labels` config with the new edge types. See gnn_vendor_networks.ipynb for a worked PyTorch Geometric pipeline.
GNN Edge Predictor Architecture
VynFi 3.0 introduces a GNN-based network generator that operates in two phases. First, it generates entity nodes with attributes (entity type, jurisdiction, incorporation date, risk score) using the existing tabular generation engine. Second, a graph neural network edge predictor determines which pairs of entities should be connected and with what relationship type. The GNN is trained on topological features of real financial networks: degree distribution (scale-free, following a power law), clustering coefficient, community structure, and the specific subgraph motifs that characterize different laundering typologies.
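For intuition about what a scale-free degree distribution looks like, the sketch below grows a graph by preferential attachment (a standard Barabási–Albert style process) and compares the mean degree with the largest hub. This is a stand-in to illustrate the target property only; it is not the VynFi generator, and the function name and parameters are hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m, seed=0):
    """Grow an undirected graph where each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    # Each node appears in `endpoints` once per incident edge, so uniform
    # sampling from this list is degree-proportional sampling.
    endpoints = []
    # Start from a small clique of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.append((u, v))
            endpoints += [u, v]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints += [new, t]
    return edges

edges = preferential_attachment(n_nodes=2000, m=2)

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

degrees = sorted(degree.values(), reverse=True)
mean_degree = sum(degrees) / len(degrees)
print(f"mean degree: {mean_degree:.2f}, max degree: {degrees[0]}")
```

In a scale-free graph a handful of hubs carry far more connections than the average node, which is the heavy-tailed pattern the edge predictor is meant to reproduce.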
The edge predictor uses a message-passing architecture with 4 GNN layers. Each node aggregates information from its neighborhood to produce an embedding. Edge probabilities are computed from the dot product of source and target node embeddings, conditioned on the desired network topology (e.g., `"topology": "trade_based_ml"` produces networks with the hub-and-spoke structures typical of trade-based money laundering).
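The message-passing and dot-product scoring steps can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: mean aggregation with self-loops, a `tanh` nonlinearity, random untrained weights, and made-up dimensions. It does not model the topology conditioning, and it is not the actual VynFi architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

num_nodes, feat_dim, hidden_dim, num_layers = 6, 5, 8, 4

# Hypothetical node attributes and a sparse initial undirected graph
x = rng.normal(size=(num_nodes, feat_dim))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

# Row-normalised adjacency with self-loops: each node averages over
# itself and its neighbours.
adj = np.eye(num_nodes)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
adj /= adj.sum(axis=1, keepdims=True)

# Random (untrained) weights, one matrix per layer
dims = [feat_dim] + [hidden_dim] * num_layers
weights = [
    rng.normal(scale=0.5, size=(dims[i], dims[i + 1]))
    for i in range(num_layers)
]

# Four rounds of message passing: aggregate neighbours, transform, squash
h = x
for w in weights:
    h = np.tanh(adj @ h @ w)

# Dot-product decoder: score every node pair and map the score to (0, 1)
logits = h @ h.T
edge_prob = 1.0 / (1.0 + np.exp(-logits))

print(edge_prob.shape)  # (6, 6)
```

With four layers, each embedding depends on nodes up to four hops away, so a pair's score reflects the local neighborhood of both endpoints and not just their own attributes.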
Generating a Synthetic AML Network
```python
import vynfi

client = vynfi.VynFi()

# Generate a synthetic vendor/correspondent network
job = client.jobs.create(
    mode="generate",
    sector="banking",
    network={
        "enabled": True,
        "entities": 10_000,
        "topology": "correspondent_banking",
        "criminal_structures": [
            {"type": "layering", "depth": 4, "count": 15},
            {"type": "trade_based_ml", "hubs": 8, "spokes_per_hub": 12},
            {"type": "shell_company_chain", "chain_length": 6, "count": 10},
        ],
        "edge_features": ["amount_total", "tx_count", "first_tx_date", "last_tx_date"],
        "export_format": "graphml",
    },
    rows=500_000,  # transaction rows flowing through the network
)
result = client.jobs.wait(job.id)
archive = client.jobs.download_archive(result.id)
```
Analyzing Network Structure
```python
import networkx as nx
import pandas as pd

# Load the generated network
G = nx.read_graphml(archive.file("entity_network.graphml"))

print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.6f}")
print(f"Connected components: {nx.number_weakly_connected_components(G)}")

# Degree distribution follows power law (scale-free)
degrees = [d for _, d in G.degree()]
print(f"Mean degree: {sum(degrees)/len(degrees):.2f}")
print(f"Max degree: {max(degrees)}")

# Criminal structure labels
criminal_nodes = [n for n, d in G.nodes(data=True) if d.get("is_criminal") == "true"]
print(f"Criminal entities: {len(criminal_nodes)} ({len(criminal_nodes)/G.number_of_nodes():.2%})")

# Subgraph analysis for layering structures
layering_nodes = [n for n, d in G.nodes(data=True) if d.get("criminal_type") == "layering"]
layering_sub = G.subgraph(layering_nodes)
print(f"Layering subgraph: {layering_sub.number_of_nodes()} nodes, "
      f"{layering_sub.number_of_edges()} edges")
```
Training a GNN Classifier on the Synthetic Network
```python
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv
import numpy as np

# Convert to PyG format
transactions = pd.read_parquet(archive.file("transactions.parquet"))
node_features = pd.read_parquet(archive.file("entity_features.parquet"))

# Node feature matrix
# Assumes rows of entity_features.parquet are ordered by integer node ID.
x = torch.tensor(
    node_features[["risk_score", "tx_volume", "unique_counterparties",
                   "jurisdiction_risk", "age_days"]].values,
    dtype=torch.float,
)

# Edge index from network
edges = [(int(u), int(v)) for u, v in G.edges()]
edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()

# Labels: 1 = criminal entity, 0 = legitimate
# Sort node IDs numerically so labels line up with the integer edge indices
# (GraphML node IDs load as strings, and a plain sort puts "10" before "2").
y = torch.tensor(
    [1 if G.nodes[n].get("is_criminal") == "true" else 0
     for n in sorted(G.nodes(), key=int)],
    dtype=torch.long,
)

data = Data(x=x, edge_index=edge_index, y=y)
print(f"PyG Data: {data}")
print(f"Positive rate: {y.float().mean():.4f}")
```
The generated network includes ground-truth labels for every criminal entity and the specific typology it belongs to (layering, trade-based money laundering, shell company chain). Edge-level labels indicate which transactions are part of a laundering flow. This labeled graph data is directly usable for training GNN-based AML classifiers, link prediction models, and community detection algorithms. The topological properties of the criminal substructures (layering depth, fan-out ratio, circular flow diameter) are configurable, so you can generate networks that stress-test specific detection capabilities.
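Because criminal entities are a tiny fraction of the nodes, a random split can easily leave a validation or test set with no positives at all. The sketch below shows one common way to handle that before training: a per-class (stratified) split into boolean node masks, plus inverse-frequency class weights. The label vector is synthetic, and the split fractions and the `stratified_masks` helper are illustrative assumptions, not part of the VynFi SDK.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical label vector: 10,000 entities, 10 of them criminal (0.1%)
y = np.zeros(10_000, dtype=np.int64)
y[rng.choice(y.size, size=10, replace=False)] = 1

def stratified_masks(labels, train_frac=0.6, val_frac=0.2, rng=rng):
    """Split node indices class by class so every split keeps some positives."""
    train = np.zeros(labels.size, dtype=bool)
    val = np.zeros(labels.size, dtype=bool)
    test = np.zeros(labels.size, dtype=bool)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(train_frac * idx.size))
        n_val = int(round(val_frac * idx.size))
        train[idx[:n_train]] = True
        val[idx[n_train:n_train + n_val]] = True
        test[idx[n_train + n_val:]] = True
    return train, val, test

train_mask, val_mask, test_mask = stratified_masks(y)

# Inverse-frequency class weights, computed on the training split only
counts = np.bincount(y[train_mask], minlength=2)
class_weights = counts.sum() / (2.0 * counts)

print(y[train_mask].sum(), y[val_mask].sum(), y[test_mask].sum())  # 6 2 2
print(class_weights)
```

The masks follow the PyTorch Geometric convention of boolean `train_mask`, `val_mask`, and `test_mask` attributes on a `Data` object, and the weights can be converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.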