CSV → Clean Report PDF

A tiny Python assembly line that ingests a CSV, cleans it, and produces a polished PDF report (plus a cleaned CSV). It works well behind a “Download report” CTA when you want to show that the pipeline can reason about data, not just render UI.

Summary

Rows processed, rows fixed, warnings, and any rows skipped.

Column stats

Missing percentage, unique counts, and quick-win suggestions for each column.
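
As a rough illustration of the column stats, here is a minimal pandas sketch. The function name `column_stats` and the sample frame are hypothetical and do not come from the tool itself.

```python
import pandas as pd


def column_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing percentage and unique-value count for each column."""
    return pd.DataFrame(
        {
            # isna().mean() gives the fraction of missing cells per column.
            "missing_pct": (df.isna().mean() * 100).round(1),
            # nunique() ignores missing values by default.
            "unique": df.nunique(),
        }
    )


# Hypothetical sample data, not taken from a real report.
sample = pd.DataFrame(
    {
        "amount": [10.0, None, 25.5, 10.0],
        "status": ["completed", "pending", "completed", None],
    }
)
stats = column_stats(sample)
print(stats)
```

The result has one row per input column, which maps naturally onto a per-column table in the report.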

Potential misalignment

Highlights suspect data like out-of-range amounts or mismatched enums.
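
One way such checks could look in pandas is sketched below. The amount bounds, the allowed status values, and the name `flag_suspect_rows` are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical business rules; real bounds and enums depend on the dataset.
AMOUNT_RANGE = (0, 10_000)
ALLOWED_STATUSES = {"pending", "completed", "refunded"}


def flag_suspect_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return one boolean column per check, aligned with the input rows."""
    flags = pd.DataFrame(index=df.index)
    # Missing amounts are left to the missing-value stats, so only flag
    # values that are present and fall outside the inclusive range.
    flags["amount_out_of_range"] = df["amount"].notna() & ~df["amount"].between(
        *AMOUNT_RANGE
    )
    flags["status_not_in_enum"] = ~df["status"].isin(ALLOWED_STATUSES)
    return flags


# Hypothetical sample data.
orders = pd.DataFrame(
    {
        "amount": [50.0, -5.0, 25_000.0, None],
        "status": ["completed", "done", "pending", "completed"],
    }
)
flags = flag_suspect_rows(orders)
print(flags)
```

Keeping each check in its own boolean column makes it easy to count findings per rule or to pull the offending rows for the report.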

Sample of fixed rows

Before/after rows so stakeholders can see what changed.

Python core

The whole thing stays in Python so deployment is simple: pandas for data handling, pydantic-style validators for row checks, and ReportLab for the PDF canvas.

  1. Parse CSV with pandas, normalize headers, and coerce types.
  2. Validate rows with a lightweight schema + custom business rules.
  3. Aggregate stats + “misalignment” findings.
  4. Render a polished PDF via ReportLab (same template every time).
  5. Optionally dump a cleaned CSV for downstream tools.
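
Step 1 can be sketched roughly as follows. The column names (`amount`, `created_at`), the function name `load_and_clean`, and the inline CSV are assumptions for the example, not the tool's actual schema.

```python
import io

import pandas as pd


def load_and_clean(csv_text: str) -> pd.DataFrame:
    """Parse CSV text, normalize headers, and coerce column types."""
    # Read everything as strings first so coercion is explicit.
    df = pd.read_csv(io.StringIO(csv_text), dtype=str)
    # Normalize headers: trim, lowercase, collapse whitespace to underscores.
    df.columns = (
        df.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )
    # Coerce types; values that cannot be parsed become NaN / NaT
    # instead of raising, so they can be reported later.
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
    return df


# Hypothetical input with messy headers and two bad cells.
raw = "Order ID, Amount ,Created At\n1043,TBD,2024-01-05\n1109,19.99,not a date\n"
cleaned = load_and_clean(raw)
print(cleaned)
```

Coercing with `errors="coerce"` keeps the pipeline moving on bad input; the resulting missing values can then feed the validation and stats steps.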

Sample report values

Summary

Rows processed: 1,000

Rows fixed: 78

Warnings: 6

Column stats (Amount)

Missing: 2.4%

Unique values: 356

Out-of-range detections: 4

Potential misalignment

  • 3 rows with “completed” status but zero amount
  • 12 rows without ISO 8601-formatted timestamps
  • 5 rows with uppercase country codes that don’t match ISO 3166-1 alpha-2
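
Findings like these could be counted with a few vectorized rules. The sketch below is illustrative only: the column names, the regular expressions, and the function `misalignment_findings` are assumptions, and the country check tests only the two-letter shape rather than looking codes up in the ISO registry.

```python
import pandas as pd

# Shape checks only (assumptions): two uppercase letters, and a
# "YYYY-MM-DDTHH:MM:SS" prefix for timestamps.
ALPHA2_PATTERN = r"^[A-Z]{2}$"
ISO_TS_PATTERN = r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"


def misalignment_findings(df: pd.DataFrame) -> dict:
    """Count rows that trip each rule."""
    completed_zero = (df["status"] == "completed") & (df["amount"] == 0)
    # Missing values are treated as non-matching.
    bad_timestamp = ~df["created_at"].fillna("").str.match(ISO_TS_PATTERN)
    bad_country = ~df["country"].fillna("").str.match(ALPHA2_PATTERN)
    return {
        "completed_with_zero_amount": int(completed_zero.sum()),
        "non_iso_timestamp": int(bad_timestamp.sum()),
        "country_not_alpha2": int(bad_country.sum()),
    }


# Hypothetical sample rows.
orders = pd.DataFrame(
    {
        "status": ["completed", "pending", "completed", "completed"],
        "amount": [0.0, 12.5, 30.0, 0.0],
        "created_at": ["2024-01-05T10:00:00Z", "01/05/2024", "2024-01-06T08:30:00Z", None],
        "country": ["US", "USA", "CA", "gb"],
    }
)
findings = misalignment_findings(orders)
print(findings)
```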

Fixed rows preview

Order ID  Before          After
1043      amount=“TBD”    amount=0 (flagged)
1109      country=USA     country=US
1188      status=done     status=completed
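
The kinds of fixes in the preview could be produced by a small row-level function like the one sketched here. The alias tables, the `flagged` field, and the name `fix_row` are hypothetical, as is the sample row passed in at the end.

```python
from typing import Any

# Hypothetical lookup tables; a real tool would carry fuller mappings.
COUNTRY_ALIASES = {"USA": "US"}
STATUS_ALIASES = {"done": "completed"}


def fix_row(row: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return a patched copy of the row plus one before/after note per change."""
    fixed = dict(row)
    changes: list[str] = []

    # Non-numeric amounts are replaced with 0 and flagged for review.
    try:
        fixed["amount"] = float(row.get("amount"))
    except (TypeError, ValueError):
        fixed["amount"] = 0
        fixed["flagged"] = True
        changes.append(f"amount={row.get('amount')} -> amount=0 (flagged)")

    # Map known aliases onto their canonical values.
    for field, aliases in (("country", COUNTRY_ALIASES), ("status", STATUS_ALIASES)):
        value = row.get(field)
        if value in aliases:
            fixed[field] = aliases[value]
            changes.append(f"{field}={value} -> {field}={aliases[value]}")

    return fixed, changes


# Hypothetical input row combining two of the fixes.
fixed, changes = fix_row(
    {"order_id": 1109, "amount": "42.50", "country": "USA", "status": "done"}
)
print(changes)
```

Returning the change notes alongside the patched row keeps the original untouched, so the report can show both sides of every fix.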

Website angle

Embed the tool behind a CTA (“Drop a CSV, get a PDF”). Using the sample data keeps onboarding lightweight, and the downloaded PDF doubles as a portfolio artifact that looks enterprise-ready.

Artifacts produced

  • PDF report (ReportLab)
  • Cleaned CSV with normalized headers + patched rows
  • JSON payload for dashboards (optional)