pm4py + VynFi: Process Mining on Synthetic OCEL Event Logs
Generate OCEL 2.0 event logs with VynFi, load them into pm4py, discover process models, check conformance, and detect bottlenecks — no ERP data extraction needed.
pm4py is the go-to Python library for process mining. It supports process discovery (alpha miner, inductive miner, heuristics miner), conformance checking, social network analysis, and performance visualization. The hard part isn't the library — it's getting clean event logs to feed it.
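If the terminology is new: every discovery algorithm above starts from the directly-follows relation, which counts how often one activity immediately follows another within a case. A toy sketch in plain Python (illustrative only, not pm4py's implementation; activity names are made up):

```python
from collections import Counter

# Toy traces: each list is one case's ordered activity sequence.
traces = [
    ["create_po", "approve_po", "receive_goods", "pay_invoice"],
    ["create_po", "approve_po", "receive_goods", "pay_invoice"],
    ["create_po", "receive_goods", "approve_po", "pay_invoice"],  # out-of-order variant
]

# Count how often activity a is immediately followed by activity b.
dfg = Counter()
for trace in traces:
    for a, b in zip(trace, trace[1:]):
        dfg[(a, b)] += 1

print(dfg[("create_po", "approve_po")])  # 2 of 3 cases follow the happy path here
```

Miners differ in how they turn these counts into a model (Petri net, BPMN, heuristics net) and how they handle noise like that third trace.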
Extracting event logs from SAP (BKPF, EKKO, EKPO → case/activity/timestamp triples) takes weeks of data engineering. VynFi generates OCEL 2.0-compliant event logs directly — complete with case IDs, activity labels, timestamps, object types, and variant annotations. This tutorial shows the full pipeline: generate data, load into pm4py, discover a process model, and check conformance.
**Update (2026-04-19, DataSynth 3.1.1):** OCEL timestamps are now microsecond-precision (previously nanosecond), so `pandas.to_datetime(..., utc=True)` retains **100%** of events — the prior nanosecond format silently dropped 95% of rows. Process-variant imperfections (rework, skip-step, out-of-order) are now injected at realistic default rates (15% / 10% / 8%), producing Inductive-Miner fitness in the 0.70–0.92 band — much closer to real ERP data than the near-perfect 1.00 fitness the engine used to produce. The regenerated VynFi/vynfi-supply-chain-ocel and VynFi/vynfi-ocel-manufacturing datasets on Hugging Face include these improvements (162 variants, 55% happy-path concentration on a sample retail job).
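The timestamp fix is easy to verify on your end: parse with `utc=True` and confirm no rows become `NaT` and the microsecond component survives. A minimal sketch with hypothetical sample values:

```python
import pandas as pd

# Hypothetical OCEL event timestamps at microsecond precision.
raw = pd.DataFrame({
    "timestamp": [
        "2026-01-05T09:14:02.123456Z",
        "2026-01-05T09:14:02.123457Z",
    ]
})

# utc=True yields a timezone-aware column and keeps every row.
raw["timestamp"] = pd.to_datetime(raw["timestamp"], utc=True)

assert raw["timestamp"].notna().all()
print(raw["timestamp"].dt.microsecond.tolist())  # [123456, 123457]
```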
Generate the Event Log
```python
import os

import vynfi

client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])

config = {
    "sector": "manufacturing",
    "rows": 5000,
    "companies": 5,
    "periods": 6,
    "processModels": ["p2p", "o2c", "manufacturing"],
    "exportFormat": "json",
    "ocpm": {"enabled": True, "computeVariants": True},
}

job = client.jobs.generate_config(config=config)
completed = client.jobs.wait(job.id)
archive = client.jobs.download_archive(completed.id)
```

Load into pm4py
```python
import pandas as pd
import pm4py

# Load the OCEL event log
events = archive.json("ocel-event-log.json")
df = pd.json_normalize(events)

# utc=True keeps every event (see the DataSynth 3.1.1 update note above)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
df = df.sort_values("timestamp").reset_index(drop=True)

# Rename to pm4py's expected schema
df = df.rename(columns={
    "case_id": "case:concept:name",
    "activity": "concept:name",
    "timestamp": "time:timestamp",
})

print(f"Events: {len(df)}")
print(f"Cases: {df['case:concept:name'].nunique()}")
print(f"Activities: {df['concept:name'].nunique()}")

# Convert to pm4py EventLog
event_log = pm4py.convert_to_event_log(df)
```

Discover a Process Model
```python
# Inductive Miner: guaranteed to produce a sound model
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(event_log)

# Visualize
pm4py.view_petri_net(net, initial_marking, final_marking)

# Or use a BPMN model
bpmn = pm4py.discover_bpmn_inductive(event_log)
pm4py.view_bpmn(bpmn)
```

Conformance Checking
```python
# Token-based replay
replayed = pm4py.conformance_diagnostics_token_based_replay(
    event_log, net, initial_marking, final_marking
)

# Fitness score
fitness = pm4py.fitness_token_based_replay(event_log, net, initial_marking, final_marking)
print(f"Fitness: {fitness['average_trace_fitness']:.3f}")

# Alignment-based (exact, slower)
aligned = pm4py.conformance_diagnostics_alignments(
    event_log, net, initial_marking, final_marking
)
precision = pm4py.precision_alignments(event_log, net, initial_marking, final_marking)
print(f"Precision: {precision:.3f}")
```

Bottleneck Detection
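Before reaching for the performance DFG, the underlying idea can be prototyped with plain pandas: within each case, diff consecutive timestamps and take the median wait per activity transition. A sketch on toy data (column names follow pm4py's schema; the values are made up):

```python
import pandas as pd

# Toy event log in pm4py column naming (hypothetical data).
df = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2", "c2"],
    "concept:name": ["create", "approve", "pay", "create", "approve", "pay"],
    "time:timestamp": pd.to_datetime([
        "2026-01-01 09:00", "2026-01-01 10:00", "2026-01-03 10:00",
        "2026-01-02 09:00", "2026-01-02 09:30", "2026-01-05 09:30",
    ]),
})
df = df.sort_values(["case:concept:name", "time:timestamp"])

# Waiting time between each event and the next event in the same case.
grp = df.groupby("case:concept:name")
df["next_activity"] = grp["concept:name"].shift(-1)
df["wait"] = grp["time:timestamp"].shift(-1) - df["time:timestamp"]

# Median wait per transition: the largest values flag bottlenecks.
medians = (
    df.dropna(subset=["next_activity"])
      .groupby(["concept:name", "next_activity"])["wait"]
      .median()
)
print(medians)  # approve -> pay dominates (median 2.5 days)
```

pm4py's performance DFG does the same aggregation over the discovered directly-follows edges, with visualization on top.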
```python
# Performance analysis: median time between activities
from pm4py.algo.filtering.log.timestamp import timestamp_filter

# Filter to a specific time window
filtered = timestamp_filter.filter_traces_intersecting(
    event_log,
    "2024-01-01 00:00:00",
    "2024-06-30 23:59:59",
)

# Discover directly-follows graph with performance annotations
dfg, start, end = pm4py.discover_performance_dfg(filtered)
pm4py.view_performance_dfg(dfg, start, end)
```

Pre-Built Variant Analysis
VynFi's analytics API includes a process variant summary with variant count, entropy, and happy-path concentration — computed server-side. Use it to verify data quality before running expensive conformance checks.
```python
analytics = client.jobs.analytics(completed.id)

if analytics.process_variant_summary:
    v = analytics.process_variant_summary
    print(f"Variants: {v.variant_count}")
    print(f"Entropy: {v.variant_entropy:.3f}")
    print(f"Happy-path: {v.happy_path_concentration:.1%}")
```

Other Export Formats
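As a cross-check on the server-side numbers, the same metrics can be computed locally from your loaded DataFrame: a variant is a case's ordered activity tuple, entropy is Shannon entropy over variant frequencies, and happy-path concentration is the share of cases on the most common variant. A toy sketch (the data is made up; the exact definitions VynFi uses server-side may differ):

```python
import math
from collections import Counter

# Each case reduced to its ordered activity tuple (hypothetical data).
case_variants = [
    ("create", "approve", "pay"),
    ("create", "approve", "pay"),
    ("create", "pay", "approve"),  # out-of-order rework variant
    ("create", "approve", "pay"),
]

counts = Counter(case_variants)
total = sum(counts.values())

variant_count = len(counts)
# Shannon entropy over variant frequencies, in bits.
entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
# Happy path = share of cases following the most common variant.
happy_path = counts.most_common(1)[0][1] / total

print(variant_count, round(entropy, 3), happy_path)  # 2 0.811 0.75
```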
VynFi also generates XES 2.0 (ProM/pm4py native), Celonis IBC (with metadata sidecar), Disco CSV, and Parquet. Request these via `exportFormat` in your config, or find them in the archive alongside the JSON event log.
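For example, requesting XES output presumably means swapping the `exportFormat` value in the generation config shown earlier. The `"xes"` identifier here is an assumption inferred from the format list; check the VynFi config reference for the exact accepted values:

```python
# Assumption: exportFormat accepts the other listed formats under names
# like "xes"; the exact identifier may differ in the VynFi config schema.
config = {
    "sector": "manufacturing",
    "rows": 5000,
    "processModels": ["p2p", "o2c"],
    "exportFormat": "xes",  # hypothetical value for the XES 2.0 export
    "ocpm": {"enabled": True, "computeVariants": True},
}
```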
SDK examples + regenerated datasets
The `pm4py_integration.py` example runs this whole pipeline against the live API and prints conformance metrics. `05_process_mining_ocel.ipynb` is the interactive notebook companion. For sector-specific process DAGs (manufacturing supply-chain, retail O2C, financial-services correspondent banking), see `sector_dag_presets.py`. The regenerated HF datasets VynFi/vynfi-supply-chain-ocel and VynFi/vynfi-ocel-manufacturing include native OCEL events, objects, and anomaly labels in one parquet bundle. Load them directly with `datasets.load_dataset("VynFi/vynfi-supply-chain-ocel", "events")`.