Healthcare data backbone: a dbt medallion (bronze → silver → gold) star schema; a FastAPI service with 11 endpoints over 55,500 synthetic encounters; LLM-augmented enrichment via Vertex AI gemini-2.5-flash (497 rows · $0.0005/row · 100% JSON-schema success); a patient identity resolver (55,500 encounters → 40,235 patients); and a 7-check L1 quality gate that runs in CI and exits 1 on any critical failure.
Trusted L1 layer that catches the dumb-but-pipeline-killing failures BEFORE the GenAI layer hallucinates around bad input.
Tech Stack
Features
dbt medallion star schema
Bronze → silver → gold. One fact table (fact_patient_encounters) plus 7 dim_* dimension tables. Full schema.yml with not_null, unique, relationships (FK), and accepted_values tests for clinical enums.
7-check L1 quality gate
schema_drift · critical_nulls · duplicate_encounters · temporal_sanity · pii_in_narrative · patient_identity · audit_lineage. Runs in CI on every PR and exits 1 on any critical failure.
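A minimal sketch of how such a gate could be wired, shown for three of the seven checks. The column names (encounter_id, patient_id, admit_date, discharge_date), the CheckResult type, and run_gate are all hypothetical and are not taken from the project; the only behavior borrowed from the description above is "return a non-zero exit code when a critical check fails".

```python
from dataclasses import dataclass
from datetime import date

Row = dict  # one encounter record; the column names below are assumptions


@dataclass
class CheckResult:
    name: str
    passed: bool
    critical: bool
    detail: str


def check_critical_nulls(rows: list[Row]) -> CheckResult:
    # Count missing values in columns the pipeline cannot run without.
    required = ("encounter_id", "patient_id", "admit_date")
    missing = sum(1 for r in rows for c in required if r.get(c) in (None, ""))
    return CheckResult("critical_nulls", missing == 0, True, f"{missing} missing value(s)")


def check_duplicate_encounters(rows: list[Row]) -> CheckResult:
    # An encounter_id that appears more than once is treated as a duplicate.
    ids = [r.get("encounter_id") for r in rows]
    dupes = len(ids) - len(set(ids))
    return CheckResult("duplicate_encounters", dupes == 0, True, f"{dupes} duplicate(s)")


def check_temporal_sanity(rows: list[Row]) -> CheckResult:
    # A discharge that precedes its admission is flagged.
    bad = sum(
        1
        for r in rows
        if r.get("admit_date") and r.get("discharge_date")
        and r["discharge_date"] < r["admit_date"]
    )
    return CheckResult("temporal_sanity", bad == 0, True, f"{bad} inverted interval(s)")


CHECKS = (check_critical_nulls, check_duplicate_encounters, check_temporal_sanity)


def run_gate(rows: list[Row], checks=CHECKS) -> int:
    """Run every check, print a summary, and return a process exit code."""
    results = [check(rows) for check in checks]
    for res in results:
        status = "PASS" if res.passed else "FAIL"
        print(f"[{status}] {res.name}: {res.detail}")
    # Only failed checks marked critical block the build.
    return 1 if any(not r.passed and r.critical for r in results) else 0
```

In a CI job, the script's entry point would pass this value to sys.exit, so a failed critical check fails the PR.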
Vertex AI enrichment
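gemini-2.5-flash + response_schema → 100% JSON parse success on 497 rows. Ground truth for chief complaint (CC), history of present illness (HPI), vitals, labs, and Emergency Severity Index (ESI) generated for $0.25 total. At the same per-row rate, 1M rows would cost ≈ $500.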
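The cost figures follow from the $0.0005/row rate, and the parse-success metric can be measured with a few lines of standard-library Python. The sketch below is illustrative only: the field names in REQUIRED_KEYS are hypothetical stand-ins for the enrichment schema, and the function names are not from the project.

```python
import json

COST_PER_ROW_USD = 0.0005  # per-row rate quoted above

# Hypothetical field names for the enrichment payload (assumption).
REQUIRED_KEYS = {"chief_complaint", "hpi", "vitals", "labs", "esi"}


def projected_cost(n_rows: int, cost_per_row: float = COST_PER_ROW_USD) -> float:
    """Linear cost projection; assumes the per-row rate holds at scale."""
    return n_rows * cost_per_row


def parse_success_rate(raw_responses: list[str], required_keys=REQUIRED_KEYS) -> float:
    """Fraction of responses that are valid JSON objects containing every required key."""
    if not raw_responses:
        return 0.0
    ok = 0
    for raw in raw_responses:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not parseable as JSON: counts as a failure
        if isinstance(obj, dict) and required_keys.issubset(obj.keys()):
            ok += 1
    return ok / len(raw_responses)


print(f"497 rows: ${projected_cost(497):.2f}")
print(f"1M rows:  ${projected_cost(1_000_000):.0f}")
```

The projection is purely linear; it does not account for pricing changes, retries, or longer prompts.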
Patient identity bridge
55,500 encounters → 40,235 unique patients via a truncated SHA-256 hash. Catches the 'same patient, 12 encounters' pattern that breaks cross-patient leak guards in eval.
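One way a short-hash identity bridge like this could work is sketched below. The identifying fields (name, dob, gender), the normalization rules, and the 12-character truncation are all assumptions made for illustration; the description above does not say which fields are hashed or how long the short hash is.

```python
import hashlib
from collections import defaultdict


def patient_key(name: str, dob: str, gender: str, length: int = 12) -> str:
    """Derive a short, deterministic patient key from normalized identifying fields."""
    # Collapse whitespace and lowercase so trivial formatting differences
    # do not split one patient into several keys.
    parts = (" ".join(p.split()).lower() for p in (name, dob, gender))
    canonical = "|".join(parts)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]  # truncated hex digest, i.e. the "short hash"


def group_by_patient(encounters: list[dict]) -> dict[str, list[str]]:
    """Map each patient key to the encounter IDs that resolve to it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for e in encounters:
        key = patient_key(e["name"], e["dob"], e["gender"])
        groups[key].append(e["encounter_id"])
    return dict(groups)
```

With a mapping like this, a train/eval split can be made on the patient key rather than the encounter ID, so no patient's encounters land on both sides. A shorter truncation raises the chance that two different patients collide on the same key, so the length is a trade-off.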