
Cleaner SDK Output: Native Decimals, Flat Layout, and DataSynth 2.3.1

DataSynth 2.3.1 fixes three output bugs and delivers two ergonomic wins: native JSON numbers (no more pd.to_numeric boilerplate) and flat document layout (no more manual header flattening). Here is what changed and how to use it.

VynFi Team · Engineering · April 13, 2026 · 7 min read

Every SDK tutorial we've written includes the same boilerplate: `pd.to_numeric(df['debit_amount'], errors='coerce')` to convert string decimals to floats, and a 10-line flattening loop to merge nested `{header, lines}` journal entries into a flat DataFrame. DataSynth 2.3.1 eliminates both.

**DataSynth 3.1.1 update:** camelCase config aliases now work — `exportLayout`, `fraudRate`, `documentFraudRate`, `propagateToLines`, `propagateToDocument` are all accepted alongside snake_case. Prior to 3.1.1, camelCase versions silently fell back to defaults, which made `exportLayout: "flat"` look like a hang (generation completed but the output stayed nested). **Still upstream:** the flat JSON layout writer itself has a separate hang at 0% progress on some configs — use nested until the next patch. `numericMode: native` works cleanly in 3.1.1. For automatic coercion on the SDK side, `JobArchive.dataframes()` in SDK 1.5.0+ handles both native and legacy string decimals.
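
If a config may be sent to a server older than 3.1.1, one possible workaround is to rewrite the camelCase aliases to snake_case on the client first. The sketch below is only an illustration: `normalize_aliases` is a hypothetical helper, not part of the SDK, and of the snake_case spellings only `export_layout` appears in this post (the other four are inferred from the alias names). The placement of `fraudRate` in the sample config is also an assumption.

```python
# Hypothetical client-side helper; not part of the VynFi SDK.
# Maps the camelCase aliases named in this post to snake_case.
# Only "export_layout" is spelled out in the post; the rest are inferred.
ALIASES = {
    "exportLayout": "export_layout",
    "fraudRate": "fraud_rate",
    "documentFraudRate": "document_fraud_rate",
    "propagateToLines": "propagate_to_lines",
    "propagateToDocument": "propagate_to_document",
}


def normalize_aliases(node):
    """Return a copy of a config with known camelCase keys renamed."""
    if isinstance(node, dict):
        return {ALIASES.get(key, key): normalize_aliases(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [normalize_aliases(item) for item in node]
    return node  # scalars pass through unchanged


# Illustrative config; key placement is assumed, not documented here.
config = {
    "sector": "retail",
    "fraudRate": 0.02,
    "output": {"numericMode": "native", "exportLayout": "nested"},
}
print(normalize_aliases(config))
```

Keys that are not in `ALIASES`, such as `numericMode`, are left as they are, and the input dictionary is not modified.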

Native JSON Numbers

Set `output.numericMode: native` in your generation config. Decimal fields serialize as JSON numbers (`1729237.30`) instead of strings (`"1729237.30"`). Pandas reads them directly as float64 — no conversion step, no silent NaN from malformed strings.

Python

import os

import pandas as pd
import vynfi
client = vynfi.VynFi(api_key=os.environ["VYNFI_API_KEY"])
config = {
    "sector": "retail",
    "rows": 500,
    "companies": 3,
    "exportFormat": "json",
    "output": {
        "numericMode": "native",   # JSON numbers, not strings
        "exportLayout": "flat",    # one row per line item
    },
}
job = client.jobs.generate_config(config=config)
completed = client.jobs.wait(job.id)
archive = client.jobs.download_archive(completed.id)
entries = archive.json("journal_entries.json")
df = pd.DataFrame(entries)
# Amounts are already numeric — no conversion needed
total = df["debit_amount"].sum()
print(f"Total debits: {total:,.2f}")
print(f"dtype: {df['debit_amount'].dtype}")  # float64
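
To see the difference between the two modes without calling the API, the sketch below parses two hand-written payloads with the standard library only. The payloads are made up for illustration, and `to_number` is a hypothetical helper that accepts both native numbers and legacy string decimals. It is not the SDK's own coercion logic.

```python
import json

# Hand-written sample payloads (illustrative, not real DataSynth output).
LEGACY_JSON = '[{"debit_amount": "1729237.30"}, {"debit_amount": "n/a"}]'
NATIVE_JSON = '[{"debit_amount": 1729237.30}, {"debit_amount": 250}]'


def to_number(value):
    """Return a float for JSON numbers and numeric strings, else None."""
    if isinstance(value, bool):  # bool is an int subclass, not an amount
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return None
    return None


for label, payload in (("legacy", LEGACY_JSON), ("native", NATIVE_JSON)):
    rows = json.loads(payload)
    raw_types = [type(row["debit_amount"]).__name__ for row in rows]
    amounts = [to_number(row["debit_amount"]) for row in rows]
    unparsed = amounts.count(None)  # surface failures instead of hiding them
    total = sum(a for a in amounts if a is not None)
    print(f"{label}: types={raw_types} total={total:,.2f} unparsed={unparsed}")
```

Counting the values that could not be parsed makes malformed strings visible, which is the failure that `errors='coerce'` hides as NaN.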

Flat Document Layout

Set `output.exportLayout: flat`. Journal entries and document flows serialize with header fields merged onto every line item. Instead of `{"header": {...}, "lines": [...]}`, you get `[{...header_fields, ...line_fields}, ...]`. No flattening loop needed.

Python

# v2.3.1 flat layout: header fields are on every row
first = entries[0]
print("Sample flat record:")
for k, v in list(first.items())[:10]:
    print(f"  {k}: {v!r} ({type(v).__name__})")
# Compare with legacy nested layout:
# entries_nested = [
#     {**entry["header"], **line}
#     for entry in raw_entries
#     for line in entry["lines"]
# ]
# v2.3.1: just pd.DataFrame(entries) — done.
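
For code that has to handle both layouts, for example while nested output is still in use, a small layout-aware helper keeps the rest of the pipeline unchanged. This is a sketch under assumptions: `flatten_entries` is a hypothetical helper, and every field name in the sample data other than `debit_amount` is invented for illustration.

```python
def flatten_entries(entries):
    """Return one dict per line item, whichever layout the export used."""
    rows = []
    for entry in entries:
        if "header" in entry and "lines" in entry:
            # Nested layout: copy the header fields onto each line item.
            # On a key collision the line value wins, as in the legacy loop.
            rows.extend({**entry["header"], **line} for line in entry["lines"])
        else:
            # Flat layout: the record is already a single line item.
            rows.append(dict(entry))
    return rows


# Illustrative nested sample; field names are assumptions.
nested = [
    {
        "header": {"document_id": "JE-1", "posting_date": "2026-01-31"},
        "lines": [
            {"line_number": 1, "debit_amount": 100.0, "credit_amount": 0.0},
            {"line_number": 2, "debit_amount": 0.0, "credit_amount": 100.0},
        ],
    }
]

flat = flatten_entries(nested)
for row in flat:
    print(row)
```

Passing already-flat records through the helper returns them unchanged, so it can stay in place after switching layouts.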

What 2.3.1 Fixed

DataSynth 2.3.0 accepted both config options without error but did not fully honor them, and fraud labels never reached document headers. Three bugs were responsible:

- `numeric_mode: native`. The thread-local serialization flag was cleared before the JSON files were written (a scope bug in the CLI's orchestrator guard). Fixed: the flag is now set just before writing and cleared afterward.
- `export_layout: flat`. Only journal entries and core document flows honored it; the banking, subledger, manufacturing, HR, audit, tax, ESG, and treasury sinks used the nested writer unconditionally. Fixed: a new thread-local flag in `write_json_safe()` routes all sinks through the flat writer when active.
- Fraud propagation to document headers. JE reference keys didn't match document header identity keys, so `DocumentHeader::propagate_fraud()` missed every lookup. Fixed: both the prefixed and bare forms are registered in the fraud map.
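
The first bug is a general pattern: a thread-local flag whose scope ends before the work it is meant to control. The sketch below is a Python analogy, not DataSynth's actual code. All names in it (`native_numbers`, `serialize`) are hypothetical, and the legacy string formatting is an assumption chosen to match the examples in this post.

```python
import json
import threading
from contextlib import contextmanager

_state = threading.local()  # each thread sees its own flag


def native_numbers_enabled():
    return getattr(_state, "native_numbers", False)


@contextmanager
def native_numbers():
    """Enable native numbers for the duration of the with-block only."""
    previous = native_numbers_enabled()
    _state.native_numbers = True
    try:
        yield
    finally:
        _state.native_numbers = previous  # always restored, even on error


def serialize(record):
    """Write floats as JSON numbers if the flag is set, else as strings."""
    if native_numbers_enabled():
        return json.dumps(record)
    return json.dumps(
        {k: f"{v:.2f}" if isinstance(v, float) else v for k, v in record.items()}
    )


record = {"debit_amount": 1729237.30}

# Buggy ordering: the guard has already exited when the write happens.
with native_numbers():
    pass  # setup work only
buggy = serialize(record)

# Fixed ordering: the write happens while the flag is still set.
with native_numbers():
    fixed = serialize(record)

print("buggy:", buggy)
print("fixed:", fixed)
```

Restoring the previous value in `finally` keeps the flag from leaking into later writes on the same thread.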

All three are transparent — your existing configs work; the output just does what it always should have. No SDK version bump needed; the change is entirely server-side.

The Before/After

Python

# BEFORE (v2.3.0 and earlier):
#   10+ lines of flatten + pd.to_numeric boilerplate
#   Silent string columns if you forget the conversion
# AFTER (v2.3.1):
entries = archive.json("journal_entries.json")
df = pd.DataFrame(entries)
# Done. Amounts are float64, layout is flat.
