Document Flow Traceability: P2P/O2C Three-Way Matching
Every payment should trace back through an invoice, a goods receipt, and a purchase order. Here is how to reconstruct, validate, and audit those document chains with VynFi and Python.
Three-way matching is the foundation of procure-to-pay (P2P) controls. Before a payment is released, the system verifies that: (1) a purchase order authorized the spend, (2) a goods receipt confirmed delivery, and (3) a vendor invoice matches both. When any link in this chain is missing or mismatched, you have a control gap and a potential fraud vector.
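To make the three checks concrete, here is a minimal sketch of a line-level match. The `Line` class, the `three_way_match` helper, and the 2% price tolerance are all illustrative assumptions, not part of VynFi or any ERP system; real matching rules vary by organization.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Line:
    """One document line; the field names here are hypothetical."""
    quantity: Decimal
    unit_price: Decimal


def three_way_match(po: Line, gr_quantity: Decimal, invoice: Line,
                    price_tolerance: Decimal = Decimal("0.02")) -> list[str]:
    """Return mismatch reasons; an empty list means the match passed."""
    issues = []
    # Goods receipt vs. PO: do not accept more than was ordered.
    if gr_quantity > po.quantity:
        issues.append("received quantity exceeds ordered quantity")
    # Invoice vs. goods receipt: bill only for what was delivered.
    if invoice.quantity > gr_quantity:
        issues.append("invoiced quantity exceeds received quantity")
    # Invoice vs. PO: unit price must sit within a relative tolerance.
    if abs(invoice.unit_price - po.unit_price) > po.unit_price * price_tolerance:
        issues.append("invoice unit price outside tolerance")
    return issues


po = Line(quantity=Decimal("100"), unit_price=Decimal("10.00"))
clean = three_way_match(po, Decimal("100"), Line(Decimal("100"), Decimal("10.10")))
bad = three_way_match(po, Decimal("80"), Line(Decimal("100"), Decimal("12.00")))
print(clean)
print(bad)
```

Returning a list of reasons rather than a boolean keeps each failed check visible, which is usually what an audit workpaper needs.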
VynFi generates complete P2P and order-to-cash (O2C) document chains with realistic reference linking. Each document carries a header with `document_id`, and a `document_references.json` file maps source-to-target relationships across the chain. This tutorial walks through loading those chains, validating three-way matching, and identifying gaps.
**DataSynth 3.1.1 update:** Every document-flow journal entry (JE) header now carries `DocumentRef` (GoodsReceipt / VendorInvoice / Payment / Delivery / CustomerInvoice / Receipt) on `source_document`, so `is_fraud_propagated` correctly populates when a fraudulent PO fans out through GR → invoice → payment. Previously this chain was broken at the reference level. The regenerated VynFi/vynfi-audit-p2p dataset ships the full P2P flow with correct propagation flags.
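Conceptually, propagation means that every document reachable from a fraudulent PO inherits the flag. The sketch below illustrates that idea with a breadth-first walk over made-up document IDs; it is not DataSynth's implementation, and the `propagate` helper and the edge list are assumptions for illustration only.

```python
from collections import defaultdict, deque

# Hypothetical reference edges (source_id, target_id); the IDs are made up.
edges = [
    ("PO-1", "GR-1"), ("GR-1", "VI-1"), ("VI-1", "PAY-1"),
    ("PO-2", "GR-2"), ("GR-2", "VI-2"),
]
fraud_roots = {"PO-1"}  # documents flagged as fraudulent at the source

# Adjacency map: source document -> documents that reference it downstream.
forward = defaultdict(set)
for src, tgt in edges:
    forward[src].add(tgt)


def propagate(roots, forward):
    """Breadth-first walk: collect every document reachable from a fraudulent root."""
    tainted = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for nxt in forward.get(current, ()):
            if nxt not in tainted:
                tainted.add(nxt)
                queue.append(nxt)
    return tainted


tainted = propagate(fraud_roots, forward)
print(sorted(tainted))
```

If a reference link is missing anywhere in the chain, the walk stops there, which is why a break at the reference level leaves downstream documents unflagged.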
Load the Document Flows
```python
import os

import vynfi

client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])

config = {
    "sector": "retail",
    "rows": 1000,
    "companies": 5,
    "processModels": ["p2p", "o2c"],
    "exportFormat": "json",
}

job = client.jobs.generate_config(config=config)
completed = client.jobs.wait(job.id)
archive = client.jobs.download_archive(completed.id)

# Load each document type
pos = archive.json("document_flows/purchase_orders.json")
grs = archive.json("document_flows/goods_receipts.json")
vis = archive.json("document_flows/vendor_invoices.json")
pays = archive.json("document_flows/payments.json")
refs = archive.json("document_flows/document_references.json")

print(f"POs: {len(pos)}, GRs: {len(grs)}, VIs: {len(vis)}, Payments: {len(pays)}")
print(f"References: {len(refs)}")
```

Build the Reference Graph
```python
from collections import defaultdict

# Build adjacency: source_id -> [target_ids]
forward = defaultdict(set)
backward = defaultdict(set)

for ref in refs:
    src = ref.get("source_document_id") or ref.get("from_id")
    tgt = ref.get("target_document_id") or ref.get("to_id")
    if src and tgt:
        forward[str(src)].add(str(tgt))
        backward[str(tgt)].add(str(src))

# Index documents by ID
doc_index = {}
for doc_list, dtype in [(pos, "PO"), (grs, "GR"), (vis, "VI"), (pays, "PAY")]:
    for doc in doc_list:
        did = str(doc.get("header", {}).get("document_id", doc.get("id")))
        doc_index[did] = {"type": dtype, "doc": doc}

print(f"Document index: {len(doc_index)} documents")
print(f"Forward links: {sum(len(v) for v in forward.values())}")
```

Validate Three-Way Matching
```python
matched = 0
unmatched_pos = []

for po in pos:
    po_id = str(po["header"]["document_id"])

    # Find GRs linked to this PO
    gr_ids = [
        t for t in forward.get(po_id, set())
        if doc_index.get(t, {}).get("type") == "GR"
    ]

    # Find VIs linked to this PO (directly or via GR)
    vi_ids = set()
    for gid in [po_id] + gr_ids:
        vi_ids.update(
            t for t in forward.get(gid, set())
            if doc_index.get(t, {}).get("type") == "VI"
        )

    if gr_ids and vi_ids:
        matched += 1
    else:
        unmatched_pos.append({
            "po_id": po_id,
            "has_gr": bool(gr_ids),
            "has_vi": bool(vi_ids),
        })

total = len(pos)
# Guard against an empty PO list to avoid dividing by zero
rate = matched / total if total else 0.0
print(f"Three-way match rate: {matched}/{total} ({rate:.1%})")
print(f"Gaps: {len(unmatched_pos)} POs missing GR and/or VI")

for gap in unmatched_pos[:5]:
    print(f"  PO {gap['po_id']}: GR={'yes' if gap['has_gr'] else 'NO'}, "
          f"VI={'yes' if gap['has_vi'] else 'NO'}")
```

Fraud Labels on Document Headers (v2.3.1)
With DataSynth 2.3.1, document headers carry `is_fraud` and `fraud_type` directly. You can filter for fraudulent POs, GRs, or payments without joining through `document_references.json`. This makes gap analysis actionable: if an unmatched PO is also flagged `is_fraud: true` with `fraud_type: FictitiousVendor`, that's your test case for the control-weakness finding.
```python
# Filter for fraudulent documents (v2.3.1+)
fraud_docs = [
    doc for doc in pos + grs + vis + pays
    if doc.get("header", {}).get("is_fraud", False)
]

print(f"\nFraudulent documents: {len(fraud_docs)}")
for doc in fraud_docs[:3]:
    h = doc["header"]
    print(f"  {h['document_id']}: {h.get('fraud_type', 'unknown')}")
```
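To connect the gap analysis to the fraud labels, one option is to join the `unmatched_pos` list from the matching step against the PO headers. The sketch below uses made-up sample data shaped like the structures in this tutorial so that it runs on its own; the `fraud_gaps` name is an assumption for illustration.

```python
# Made-up sample data shaped like the tutorial's structures.
pos = [
    {"header": {"document_id": "PO-1", "is_fraud": True, "fraud_type": "FictitiousVendor"}},
    {"header": {"document_id": "PO-2", "is_fraud": False}},
    {"header": {"document_id": "PO-3", "is_fraud": True, "fraud_type": "FictitiousVendor"}},
]
unmatched_pos = [
    {"po_id": "PO-1", "has_gr": False, "has_vi": True},
    {"po_id": "PO-2", "has_gr": True, "has_vi": False},
]

# Index PO headers by ID so each gap can be looked up directly.
headers = {str(po["header"]["document_id"]): po["header"] for po in pos}

# Keep only the gaps whose PO is also labeled fraudulent.
fraud_gaps = []
for gap in unmatched_pos:
    header = headers.get(gap["po_id"], {})
    if header.get("is_fraud", False):
        fraud_gaps.append({**gap, "fraud_type": header.get("fraud_type", "unknown")})

for gap in fraud_gaps:
    print(f"PO {gap['po_id']}: {gap['fraud_type']} "
          f"(GR={gap['has_gr']}, VI={gap['has_vi']})")
```

In this sample, PO-3 is fraudulent but fully matched, so it does not appear in `fraud_gaps`; only POs that are both unmatched and flagged become candidate test cases.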