GNN · AML · networks · graph · DataSynth 3.1.1

GNN-Generated Vendor Networks for AML Detection

VynFi 3.0 uses graph neural networks to generate realistic entity relationship graphs — vendor networks, correspondent banking chains, shell company structures — for AML model training. This post covers the GNN edge predictor architecture and criminal network simulation.

VynFi Team · Engineering · April 18, 2026 · 10 min read

Anti-money-laundering models increasingly operate on graph-structured data: transaction networks where nodes are entities (people, companies, accounts) and edges are financial relationships (payments, ownership, correspondent banking links). The challenge for model training is that real AML networks are classified, sparse, and heavily imbalanced — fewer than 0.1% of entities in a typical banking network are involved in laundering. Generating realistic synthetic networks that preserve the topological properties of real criminal structures (layering depth, fan-out patterns, circular flows) is substantially harder than generating tabular transaction data.

**DataSynth 3.1.1 update:** Edge-type diversity landed — `is_mule_link` and `is_shell_link` edges now populate from coordinated criminal structures (were 0 in 3.1.0), network density climbed from 0.0014 to 0.053 (38× denser), and `Spoofing` joined the typology catalog. Train link-prediction GNNs directly on VynFi/vynfi-aml-100k — the regenerated dataset includes the `relationship_labels` config with the new edge types. See gnn_vendor_networks.ipynb for a worked PyTorch Geometric pipeline.
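
As a quick sanity check on the density figures above: graph density is the fraction of possible edges that are actually present. The sketch below assumes a directed graph without self-loops, and the 1,000-node size is an illustrative choice, not a property of the dataset.

```python
def directed_density(num_nodes: int, num_edges: int) -> float:
    """Density of a directed graph without self-loops: m / (n * (n - 1))."""
    return num_edges / (num_nodes * (num_nodes - 1))

# Densities quoted for DataSynth 3.1.0 and 3.1.1
old_density, new_density = 0.0014, 0.053
ratio = new_density / old_density  # about 37.9, i.e. roughly 38x

# Illustrative only: edges implied by each density in a hypothetical 1,000-node graph
n = 1_000
old_edges = round(old_density * n * (n - 1))
new_edges = round(new_density * n * (n - 1))
print(f"ratio: {ratio:.1f}x, edges at n={n}: {old_edges} -> {new_edges}")
```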

GNN Edge Predictor Architecture

VynFi 3.0 introduces a GNN-based network generator that operates in two phases. First, it generates entity nodes with attributes (entity type, jurisdiction, incorporation date, risk score) using the existing tabular generation engine. Second, a graph neural network edge predictor determines which pairs of entities should be connected and with what relationship type. The GNN is trained on topological features of real financial networks: degree distribution (scale-free, following a power law), clustering coefficient, community structure, and the specific subgraph motifs that characterize different laundering typologies.
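
The topological features named above can be measured with NetworkX. The following is a minimal sketch on a stand-in graph, not VynFi's training pipeline: it assumes a Barabási-Albert graph as a proxy for a scale-free financial network, and greedy modularity maximization as one possible community detection method.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Stand-in graph: preferential attachment yields a heavy-tailed degree distribution
G_demo = nx.barabasi_albert_graph(n=1_000, m=3, seed=42)

# Degree distribution: a few hubs sit far above the mean in a scale-free graph
demo_degrees = [d for _, d in G_demo.degree()]
mean_degree = sum(demo_degrees) / len(demo_degrees)
max_degree = max(demo_degrees)

# Clustering coefficient: how often a node's neighbors are also connected to each other
avg_clustering = nx.average_clustering(G_demo)

# Community structure: groups of nodes that are denser inside than between
communities = greedy_modularity_communities(G_demo)

print(f"Mean degree: {mean_degree:.2f}, max degree: {max_degree}")
print(f"Average clustering: {avg_clustering:.4f}")
print(f"Communities found: {len(communities)}")
```

Comparing these statistics between a generated network and a reference network is one way to check that the generator preserves the intended topology.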

The edge predictor uses a message-passing architecture with 4 GNN layers. Each node aggregates information from its neighborhood to produce an embedding. Edge probabilities are computed from the dot product of source and target node embeddings, conditioned on the desired network topology (e.g., `"topology": "trade_based_ml"` produces networks with hub-and-spoke structures typical of trade-based money laundering).
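
The mechanism described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not VynFi's implementation: the weights are random rather than trained, aggregation is a plain neighborhood mean, and the topology conditioning is modeled as a hypothetical per-topology vector added to the node features.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, feat_dim, hidden_dim, num_layers = 6, 4, 8, 4

# Toy undirected graph as a symmetric adjacency matrix with self-loops
A = np.zeros((num_nodes, num_nodes))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]:
    A[u, v] = A[v, u] = 1.0
A += np.eye(num_nodes)
A_norm = A / A.sum(axis=1, keepdims=True)  # row-normalize: mean over neighbors and self

# Hypothetical conditioning: one vector per topology, added to every node's features
topology_embedding = {
    "trade_based_ml": rng.normal(size=feat_dim),
    "correspondent_banking": rng.normal(size=feat_dim),
}
H = rng.normal(size=(num_nodes, feat_dim)) + topology_embedding["trade_based_ml"]

# Four rounds of message passing: aggregate the neighborhood, then apply a linear map
dims = [feat_dim] + [hidden_dim] * num_layers
for layer in range(num_layers):
    W = rng.normal(scale=0.5, size=(dims[layer], dims[layer + 1]))  # untrained weights
    H = A_norm @ H @ W
    if layer < num_layers - 1:
        H = np.maximum(H, 0.0)  # ReLU on all but the final layer

# Edge probability for every node pair: sigmoid of the embedding dot product
scores = H @ H.T
edge_prob = 1.0 / (1.0 + np.exp(-scores))
print(edge_prob.round(3))
```

A dot-product decoder is symmetric, so it scores undirected links; predicting a direction or a relationship type would need an asymmetric or per-type decoder.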

Generating a Synthetic AML Network

Python

import vynfi
client = vynfi.VynFi()
# Generate a synthetic vendor/correspondent network
job = client.jobs.create(
    mode="generate",
    sector="banking",
    network={
        "enabled": True,
        "entities": 10_000,
        "topology": "correspondent_banking",
        "criminal_structures": [
            {"type": "layering", "depth": 4, "count": 15},
            {"type": "trade_based_ml", "hubs": 8, "spokes_per_hub": 12},
            {"type": "shell_company_chain", "chain_length": 6, "count": 10},
        ],
        "edge_features": ["amount_total", "tx_count", "first_tx_date", "last_tx_date"],
        "export_format": "graphml",
    },
    rows=500_000,  # transaction rows flowing through the network
)
result = client.jobs.wait(job.id)
archive = client.jobs.download_archive(result.id)
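
To make the `criminal_structures` entries concrete, the sketch below builds one instance of each motif as a small directed graph with NetworkX. It illustrates the shapes involved and is not VynFi's generator: the helper names are hypothetical, and reading `depth`, `spokes_per_hub`, and `chain_length` as node counts is an assumption.

```python
import networkx as nx

def layering_chain(prefix: str, depth: int) -> nx.DiGraph:
    """Source -> `depth` intermediary accounts -> sink (assumed reading of `depth`)."""
    nodes = [f"{prefix}_src", *(f"{prefix}_layer{i}" for i in range(depth)), f"{prefix}_sink"]
    g = nx.DiGraph()
    nx.add_path(g, nodes)
    nx.set_node_attributes(g, "layering", "criminal_type")
    return g

def hub_and_spoke(prefix: str, spokes: int) -> nx.DiGraph:
    """One hub paying out to `spokes` counterparties."""
    g = nx.DiGraph()
    g.add_edges_from((f"{prefix}_hub", f"{prefix}_spoke{i}") for i in range(spokes))
    nx.set_node_attributes(g, "trade_based_ml", "criminal_type")
    return g

def shell_chain(prefix: str, chain_length: int) -> nx.DiGraph:
    """A linear chain of `chain_length` shell companies."""
    g = nx.DiGraph()
    nx.add_path(g, [f"{prefix}_shell{i}" for i in range(chain_length)])
    nx.set_node_attributes(g, "shell_company_chain", "criminal_type")
    return g

# One instance of each motif, using the parameter values from the job config above
motifs = nx.compose_all([
    layering_chain("lay0", depth=4),
    hub_and_spoke("tbml0", spokes=12),
    shell_chain("shell0", chain_length=6),
])
print(f"{motifs.number_of_nodes()} nodes, {motifs.number_of_edges()} edges")
```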

Analyzing Network Structure

Python

import networkx as nx
import pandas as pd
# Load the generated network
G = nx.read_graphml(archive.file("entity_network.graphml"))
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.6f}")
print(f"Connected components: {nx.number_weakly_connected_components(G)}")
# Degree distribution follows power law (scale-free)
degrees = [d for _, d in G.degree()]
print(f"Mean degree: {sum(degrees)/len(degrees):.2f}")
print(f"Max degree: {max(degrees)}")
# Criminal structure labels (GraphML booleans may load as bool or as the string "true")
criminal_nodes = [n for n, d in G.nodes(data=True) if str(d.get("is_criminal")).lower() == "true"]
print(f"Criminal entities: {len(criminal_nodes)} ({len(criminal_nodes)/G.number_of_nodes():.2%})")
# Subgraph analysis for layering structures
layering_nodes = [n for n, d in G.nodes(data=True) if d.get("criminal_type") == "layering"]
layering_sub = G.subgraph(layering_nodes)
print(f"Layering subgraph: {layering_sub.number_of_nodes()} nodes, "
      f"{layering_sub.number_of_edges()} edges")
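
Beyond node and edge counts, the structural properties of a criminal subgraph (layering depth, fan-out, circular flows) can be measured directly. The sketch below does this on a hand-built toy graph; the node names are hypothetical, and it assumes the network loads as a directed graph.

```python
import networkx as nx

toy = nx.DiGraph()
nx.add_path(toy, ["src", "l1", "l2", "l3", "sink"])          # layering chain with 4 hops
toy.add_edges_from(("hub", f"spoke{i}") for i in range(5))   # fan-out of 5
nx.add_cycle(toy, ["a", "b", "c"])                           # circular flow through 3 entities

# Circular flows: directed cycles in the graph
cycles = list(nx.simple_cycles(toy))
cyclic_nodes = {n for cycle in cycles for n in cycle}

# Layering depth: longest directed path, measured on the acyclic part of the graph
acyclic = toy.subgraph([n for n in toy if n not in cyclic_nodes])
layering_depth = nx.dag_longest_path_length(acyclic)

# Fan-out: largest number of outgoing edges from any single entity
max_fan_out = max(d for _, d in toy.out_degree())

print(f"Layering depth: {layering_depth} hops")
print(f"Max fan-out: {max_fan_out}")
print(f"Circular flows: {len(cycles)}")
```

Enumerating simple cycles can be expensive on large, dense graphs, so in practice it is usually run on a labeled subgraph rather than the full network.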

Training a GNN Classifier on the Synthetic Network

Python

import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv
import numpy as np
# Convert to PyG format
transactions = pd.read_parquet(archive.file("transactions.parquet"))
node_features = pd.read_parquet(archive.file("entity_features.parquet"))
# Node feature matrix
x = torch.tensor(node_features[["risk_score", "tx_volume", "unique_counterparties",
                                 "jurisdiction_risk", "age_days"]].values, dtype=torch.float)
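# Edge index from network (assumes node IDs are the integer strings "0".."N-1")
edges = [(int(u), int(v)) for u, v in G.edges()]
edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
# Labels: 1 = criminal entity, 0 = legitimate
# Sort IDs numerically so label i lines up with node index i in edge_index;
# a plain sorted() would order the string IDs lexicographically ("10" before "2").
# Rows of entity_features are assumed to follow the same order.
y = torch.tensor([1 if str(G.nodes[n].get("is_criminal")).lower() == "true" else 0
                   for n in sorted(G.nodes(), key=int)], dtype=torch.long)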
data = Data(x=x, edge_index=edge_index, y=y)
print(f"PyG Data: {data}")
print(f"Positive rate: {y.float().mean():.4f}")

The generated network includes ground-truth labels for every criminal entity and the specific typology it belongs to (layering, trade-based ML, shell company chain). Edge-level labels indicate which transactions are part of a laundering flow. This labeled graph data is directly usable for training GNN-based AML classifiers, link prediction models, and community detection algorithms. The topological properties of the criminal substructures — layering depth, fan-out ratio, circular flow diameter — are configurable, so you can generate networks that stress-test specific detection capabilities.
