CSV → Clean Report PDF
A tiny Python assembly line that ingests a CSV, cleans it, and produces a polished PDF report (plus a cleaned CSV). It works well behind a “Download report” CTA when you want to show that the pipeline can reason about data, not just render UI.
Summary
Rows processed, rows fixed, warnings, and any rows skipped.
Column stats
Missing percentage, unique counts, and quick-win suggestions.
Potential misalignment
Highlights suspect data like out-of-range amounts or mismatched enums.
Sample of fixed rows
Before/after rows so stakeholders can see what changed.
Python core
The whole thing stays in Python so deployment is simple: pandas for data handling, pydantic-style validators for row checks, and ReportLab for PDF rendering.
- Parse CSV with pandas, normalize headers, and coerce types.
- Validate rows with a lightweight schema + custom business rules.
- Aggregate stats + “misalignment” findings.
- Render a polished PDF via ReportLab (same template every time).
- Optionally dump a cleaned CSV for downstream tools.
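The first and last steps above might look roughly like the sketch below. The column names (`amount`, `status`, `created_at`) and the helper names `normalize_header` and `load_and_clean` are assumptions for illustration, not part of the original tool.

```python
import io

import pandas as pd


def normalize_header(name: str) -> str:
    """Trim, lowercase, and snake_case a raw column header."""
    return "_".join(name.strip().lower().replace("-", " ").split())


def load_and_clean(csv_source) -> pd.DataFrame:
    # Read every cell as text first so bad values survive until coercion.
    df = pd.read_csv(csv_source, dtype=str, keep_default_na=False)
    df.columns = [normalize_header(c) for c in df.columns]

    # Coerce types; unparseable values become NaN/NaT instead of raising.
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
    df["status"] = df["status"].str.strip().str.lower()
    return df


# Made-up input with messy headers and two bad cells.
raw = io.StringIO(
    "Order ID, Amount ,Status,Created At\n"
    "1043,TBD,Completed,2024-01-05T10:00:00\n"
    "1109,19.99, done ,not-a-date\n"
)
cleaned = load_and_clean(raw)

# Optional cleaned CSV for downstream tools (returned as a string here).
cleaned_csv = cleaned.to_csv(index=False)
```

Reading everything as strings and coercing afterwards keeps the "before" values available, which is what makes a before/after preview possible later in the pipeline.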
Sample report values
Summary
Rows processed: 1,000
Rows fixed: 78
Warnings: 6
Column stats (Amount)
Missing: 2.4%
Unique values: 356
Out-of-range detections: 4
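Per-column figures like these could be computed with a small pandas helper. This is a sketch: the function name `column_stats` and the 0 to 10,000 range are assumptions, and the series below is made-up data rather than the 1,000-row sample.

```python
import pandas as pd


def column_stats(series: pd.Series, low: float, high: float) -> dict:
    """Summarize one numeric column for the report's stats section."""
    present = series.dropna()
    return {
        # Share of rows with no value, as a percentage.
        "missing_pct": float(round(series.isna().mean() * 100, 1)),
        # Distinct non-missing values.
        "unique": int(present.nunique()),
        # Values outside the allowed [low, high] range.
        "out_of_range": int(((present < low) | (present > high)).sum()),
    }


amounts = pd.Series([10.0, 25.5, None, 25.5, -4.0, 12_000.0, 40.0, 18.0])
stats = column_stats(amounts, low=0, high=10_000)
```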
Potential misalignment
- 3 rows with “completed” status but zero amount
- 12 rows missing ISO 8601-formatted timestamps
- 5 rows with uppercase country codes that don’t match ISO 3166-1 alpha-2 (e.g. USA instead of US)
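Findings like these can come from a short list of named business rules run over every row. The sketch below uses only the standard library; the rule labels, field names, and the tiny `KNOWN_COUNTRIES` allow-list are hypothetical stand-ins.

```python
from datetime import datetime

# Hypothetical allow-list; a real run would load the full ISO 3166-1 alpha-2 set.
KNOWN_COUNTRIES = {"US", "GB", "DE", "FR", "CA"}


def is_iso_timestamp(value) -> bool:
    """True if the value parses as an ISO 8601 date/time string."""
    try:
        datetime.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Each rule is (label, predicate); the predicate returns True when a row is suspect.
RULES = [
    ("completed status with zero amount",
     lambda r: r["status"] == "completed" and r["amount"] == 0),
    ("timestamp is not ISO 8601",
     lambda r: not is_iso_timestamp(r["created_at"])),
    ("country code is not ISO 3166-1 alpha-2",
     lambda r: r["country"] not in KNOWN_COUNTRIES),
]


def find_misalignments(rows):
    """Return {rule label: [order ids that tripped the rule]}."""
    findings = {label: [] for label, _ in RULES}
    for row in rows:
        for label, check in RULES:
            if check(row):
                findings[label].append(row["order_id"])
    return findings


rows = [
    {"order_id": 1, "status": "completed", "amount": 0,
     "created_at": "2024-01-05T10:00:00", "country": "US"},
    {"order_id": 2, "status": "pending", "amount": 20,
     "created_at": "05/01/2024", "country": "USA"},
    {"order_id": 3, "status": "completed", "amount": 15,
     "created_at": "2024-01-06T09:30:00", "country": "DE"},
]
findings = find_misalignments(rows)
```

Keeping rules as data means the report can print the label and a count for each one without extra wiring.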
Fixed rows preview
| Order ID | Before | After |
|---|---|---|
| 1043 | amount=“TBD” | amount=0 (flagged) |
| 1109 | country=USA | country=US |
| 1188 | status=done | status=completed |
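One way to build a preview like this is to have the fixer return the changes it made alongside the fixed row. The sketch below is illustrative: `fix_row` and the alias maps are hypothetical, and the field values not shown in the table are made up.

```python
# Hypothetical alias maps; extend them to match your own data.
STATUS_ALIASES = {"done": "completed"}
COUNTRY_ALIASES = {"USA": "US"}


def fix_row(row: dict):
    """Return (fixed_row, changes, flags) for one raw row.

    changes maps field -> (before, after); flags lists fields needing review.
    """
    fixed = dict(row)
    flags = []

    # A non-numeric amount falls back to 0 and is flagged for review.
    try:
        float(row["amount"])
    except (TypeError, ValueError):
        fixed["amount"] = 0
        flags.append("amount")

    fixed["status"] = STATUS_ALIASES.get(row["status"], row["status"])
    fixed["country"] = COUNTRY_ALIASES.get(row["country"], row["country"])

    changes = {k: (row[k], fixed[k]) for k in row if row[k] != fixed[k]}
    return fixed, changes, flags


raw_rows = [
    {"order_id": 1043, "amount": "TBD", "status": "completed", "country": "US"},
    {"order_id": 1109, "amount": "19.99", "status": "completed", "country": "USA"},
    {"order_id": 1188, "amount": "42.00", "status": "done", "country": "US"},
]

# Flatten the changes into (order id, before, after) rows for the report table.
preview = []
for row in raw_rows:
    _, changes, flags = fix_row(row)
    for field, (before, after) in changes.items():
        note = " (flagged)" if field in flags else ""
        preview.append((row["order_id"], f"{field}={before}", f"{field}={after}{note}"))
```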
Website angle
Embed the tool behind a CTA (“Drop a CSV, get a PDF”). Preloading sample data keeps onboarding lightweight, and the downloaded PDF doubles as a portfolio artifact that looks enterprise-ready. Each run can return:
- PDF report (ReportLab)
- Cleaned CSV with normalized headers + patched rows
- JSON payload for dashboards (optional)