Cleaner SDK Output: Native Decimals, Flat Layout, and DataSynth 2.3.1
DataSynth 2.3.1 fixes three output bugs and delivers two ergonomic wins: native JSON numbers (no more `pd.to_numeric` boilerplate) and flat document layout (no more manual header flattening). Here is what changed and how to use it.
Every SDK tutorial we've written includes the same boilerplate: `pd.to_numeric(df['debit_amount'], errors='coerce')` to convert string decimals to floats, and a 10-line flattening loop to merge nested `{header, lines}` journal entries into a flat DataFrame. DataSynth 2.3.1 eliminates both.
**DataSynth 3.1.1 update:** camelCase config aliases now work — `exportLayout`, `fraudRate`, `documentFraudRate`, `propagateToLines`, `propagateToDocument` are all accepted alongside snake_case. Prior to 3.1.1, camelCase versions silently fell back to defaults, which made `exportLayout: "flat"` look like a hang (generation completed but the output stayed nested). **Still upstream:** the flat JSON layout writer itself has a separate hang at 0% progress on some configs — use nested until the next patch. `numericMode: native` works cleanly in 3.1.1. For automatic coercion on the SDK side, `JobArchive.dataframes()` in SDK 1.5.0+ handles both native and legacy string decimals.
Native JSON Numbers
Set `output.numericMode: native` in your generation config. Decimal fields serialize as JSON numbers (`1729237.30`) instead of strings (`"1729237.30"`). Pandas reads them directly as float64 — no conversion step, no silent NaN from malformed strings.
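To see the difference in isolation, here is a minimal standard-library sketch. The two payloads are made-up samples rather than real DataSynth output, and `coerce_amount` is a hypothetical helper, not part of the SDK. It also shows `json.loads(..., parse_float=Decimal)`, which is worth considering if binary float rounding matters for your totals.

```python
import json
from decimal import Decimal, InvalidOperation

# Made-up sample payloads in the two serializations described above.
legacy_payload = '[{"debit_amount": "1729237.30"}, {"debit_amount": "n/a"}]'
native_payload = '[{"debit_amount": 1729237.30}, {"debit_amount": 0.10}]'

legacy = json.loads(legacy_payload)
native = json.loads(native_payload)
print(type(legacy[0]["debit_amount"]).__name__)  # str
print(type(native[0]["debit_amount"]).__name__)  # float

# Optional: parse JSON numbers as Decimal instead of float to keep exact values.
exact = json.loads(native_payload, parse_float=Decimal)


def coerce_amount(value):
    """Hypothetical helper: accept either serialization, return Decimal or None."""
    if value is None or isinstance(value, bool):
        return None
    try:
        return Decimal(str(value))
    except InvalidOperation:
        # Malformed strings become None, similar in spirit to errors='coerce'.
        return None
```

A helper like this is only needed if you have to read archives produced in both modes; with native mode alone, the values arrive as numbers already.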
```python
import os

import pandas as pd
import vynfi

client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])

config = {
    "sector": "retail",
    "rows": 500,
    "companies": 3,
    "exportFormat": "json",
    "output": {
        "numericMode": "native",  # JSON numbers, not strings
        "exportLayout": "flat",   # one row per line item
    },
}

job = client.jobs.generate_config(config=config)
completed = client.jobs.wait(job.id)
archive = client.jobs.download_archive(completed.id)

entries = archive.json("journal_entries.json")
df = pd.DataFrame(entries)

# Amounts are already numeric, so no conversion is needed
total = df["debit_amount"].sum()
print(f"Total debits: {total:,.2f}")
print(f"dtype: {df['debit_amount'].dtype}")  # float64
```

Flat Document Layout
Set `output.exportLayout: flat`. Journal entries and document flows serialize with header fields merged onto every line item. Instead of `{"header": {...}, "lines": [...]}`, you get `[{...header_fields, ...line_fields}, ...]`. No flattening loop needed.
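The mapping between the two layouts can be sketched as a small function that accepts either shape and always returns flat rows. This is an illustrative helper, not SDK code; apart from `debit_amount`, the field names in the sample (`document_id`, `company_code`, `line_number`, `credit_amount`) are assumptions made up for the example.

```python
def flatten_entries(entries):
    """Return one dict per line item, whether entries are nested or already flat."""
    rows = []
    for entry in entries:
        if isinstance(entry, dict) and "header" in entry and "lines" in entry:
            # Nested layout: copy the header fields onto every line item.
            # On a key collision the line field wins (an assumption of this sketch).
            for line in entry["lines"]:
                rows.append({**entry["header"], **line})
        else:
            # Already flat: pass the record through as a copy.
            rows.append(dict(entry))
    return rows


# Hypothetical nested record for illustration only.
nested = [
    {
        "header": {"document_id": "JE-1", "company_code": "C001"},
        "lines": [
            {"line_number": 1, "debit_amount": 100.0},
            {"line_number": 2, "credit_amount": 100.0},
        ],
    }
]
flat = flatten_entries(nested)
print(len(flat))  # 2
```

Because the function passes flat records through unchanged, it can sit in front of `pd.DataFrame(...)` regardless of which layout the server actually produced.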
```python
# v2.3.1 flat layout: header fields are on every row
first = entries[0]
print("Sample flat record:")
for k, v in list(first.items())[:10]:
    print(f"  {k}: {v!r} ({type(v).__name__})")

# Compare with the legacy nested layout, which needed a flattening step:
# entries_nested = [
#     {**entry["header"], **line}
#     for entry in raw_entries
#     for line in entry["lines"]
# ]
# v2.3.1: just pd.DataFrame(entries) and you are done.
```

What 2.3.1 Fixed
DataSynth 2.3.0 accepted both config options but silently ignored them due to three bugs:
- `numeric_mode: native`: the thread-local serialization flag was cleared before the JSON files were written (a scope bug in the CLI's orchestrator guard). Fixed: the flag is now set just before writing and cleared afterwards.
- `export_layout: flat`: only journal entries and core document flows honored it; the banking, subledger, manufacturing, HR, audit, tax, ESG, and treasury sinks used the nested writer unconditionally. Fixed: a new thread-local flag in `write_json_safe()` routes all sinks through the flat writer when active.
- Fraud propagation to document headers: JE reference keys didn't match document header identity keys, so `DocumentHeader::propagate_fraud()` missed every lookup. Fixed: both prefixed and bare forms are registered in the fraud map.
All three are transparent — your existing configs work; the output just does what it always should have. No SDK version bump needed; the change is entirely server-side.
The Before/After
```python
# BEFORE (v2.3.0 and earlier):
#   10+ lines of flatten + pd.to_numeric boilerplate
#   Silent string columns if you forget the conversion

# AFTER (v2.3.1):
entries = archive.json("journal_entries.json")
df = pd.DataFrame(entries)
# Done. Amounts are float64, layout is flat.
```
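Since 2.3.0 accepted both options without applying them, it can be worth checking that the server actually honored your config before dropping the old boilerplate. Here is a minimal sketch of such a check; `find_output_problems` is a hypothetical helper, and the default field list assumes `debit_amount` is the only amount column you care about.

```python
def find_output_problems(entries, amount_fields=("debit_amount",)):
    """Return human-readable problems; an empty list means flat and numeric."""
    problems = []
    for index, entry in enumerate(entries):
        if "header" in entry and "lines" in entry:
            # The flat layout option was not applied to this record.
            problems.append(f"record {index}: nested layout")
            continue
        for field in amount_fields:
            if isinstance(entry.get(field), str):
                # The native numeric option was not applied to this field.
                problems.append(f"record {index}: {field} is a string")
    return problems


# Made-up records covering the three cases.
ok = [{"debit_amount": 10.0}]
still_nested = [{"header": {}, "lines": []}]
still_strings = [{"debit_amount": "10.00"}]

print(find_output_problems(ok))  # []
```

Run it on `entries` right after `archive.json(...)`; if the list is not empty, fall back to the flatten-and-convert path.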