GenAI · Production

Healthcare GenAI Engineer

Healthcare RAG service — BM25/dense hybrid + PII guardrails + custom-proxy eval + CI regression gate.

5 ms p95 latency
0.65 faithfulness (BM25)
20 golden queries
497 corpus rows

End-to-end healthcare RAG service: FastAPI ER-triage workflow with BM25 + dense + RRF hybrid retrieval over a 497-row enriched corpus, input PII guardrails (SSN, phone, email, CC, MRN, DOB), citation-validated grounded answers, custom-proxy Ragas-style eval over a 20-query golden set, and a CI regression gate that blocks merges on metric drop past tolerance.

One focused ER-triage RAG vertical where every claim cites a retrieved source_id, so answers to queries with retrieval hits contain no hallucinated citations.
Python · FastAPI · BM25 · Sentence Transformers · RRF · Anthropic · OpenAI · Docker · Pydantic · pytest · GitHub Actions

Hybrid retrieval

BM25 from scratch (Okapi, k1=1.5, b=0.75) + dense MiniLM + RRF fusion (k=60; Cormack, Clarke & Büttcher). Swap the retrieval method via a query parameter.
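As a rough illustration of the fusion step, Reciprocal Rank Fusion can be sketched in a few lines. The function name `rrf_fuse` and the sample document ids are hypothetical; only the constant k=60 comes from the description above.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Each doc receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; tie-break on doc id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ranking = ["d3", "d1", "d2"]   # hypothetical BM25 order
dense_ranking = ["d1", "d4", "d3"]  # hypothetical dense order
fused = rrf_fuse([bm25_ranking, dense_ranking])
```

Because RRF only looks at ranks, it needs no score normalization between the BM25 and dense retrievers, which is one common reason to choose it for hybrid search.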

PII + injection guardrails

Input: sanitize · injection regex · token cap. PII masker covers SSN, phone, email, CC, MRN, DOB. Output: citation validity · length · forbidden-action check.
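A minimal sketch of a regex-based PII masker for the six categories listed above. The patterns, the placeholder tokens, and the name `mask_pii` are assumptions for illustration (US-style formats, an "MRN" prefix followed by 6 to 10 digits); the service's actual patterns may differ.

```python
import re

# Hypothetical patterns; order matters so longer formats are masked first.
PII_PATTERNS = [
    ("CC", re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("MRN", re.compile(r"\bMRN[:#\s]*\d{6,10}\b")),
    ("DOB", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
]


def mask_pii(text):
    """Replace each matched PII span with a bracketed category tag."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


masked = mask_pii("SSN 123-45-6789, call 555-123-4567, mail a.b@example.com")
```

Masking before retrieval means neither the index lookup nor any downstream LLM call sees the raw identifiers.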

Citation-grounded answers

Every claim cites a retrieved source_id. The deterministic template baseline is the default; the LLM path (Anthropic or OpenAI) sits behind the USE_LLM flag and falls back to the template on provider error.
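The flag-plus-fallback behavior could look roughly like the sketch below. Only the `USE_LLM` flag name comes from the description; the function names, the `"1"` flag value, the passage dict shape, and the `[source_id]` citation format are assumptions, and `llm_call` stands in for whichever provider client is configured.

```python
import os


def template_answer(query, passages):
    """Deterministic baseline: restate each passage with its source_id."""
    if not passages:
        return "No supporting source was retrieved for this query."
    return " ".join(f"{p['text']} [{p['source_id']}]" for p in passages)


def generate_answer(query, passages, llm_call=None):
    """Use the LLM path only when USE_LLM=1; fall back to the template on error."""
    use_llm = os.environ.get("USE_LLM", "0") == "1"
    if use_llm and llm_call is not None:
        try:
            return llm_call(query, passages)
        except Exception:
            # Provider error (timeout, rate limit, ...): fall through to template.
            pass
    return template_answer(query, passages)
```

Keeping the template as the default makes evaluation runs reproducible, since the same query and passages always produce the same answer.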

Regression gate in CI

20-query golden set + custom-proxy faithfulness/relevance + baseline.json snapshot. `make gate` exits 1 on metric drop past tolerance — GitHub Actions blocks merges.
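The gate logic can be sketched as a comparison of current metrics against the stored snapshot. The tolerance value of 0.02, the function names, and the metric dictionaries are illustrative assumptions; in the real project the baseline would be loaded from baseline.json and the exit code passed to `sys.exit`.

```python
def find_regressions(current, baseline, tolerance=0.02):
    """Return names of metrics that fell more than `tolerance` below baseline."""
    failures = []
    for name, base_value in baseline.items():
        value = current.get(name)
        # A missing metric counts as a regression.
        if value is None or value < base_value - tolerance:
            failures.append(name)
    return failures


def gate_exit_code(current, baseline, tolerance=0.02):
    """Return 1 if any metric regressed past tolerance, else 0."""
    return 1 if find_regressions(current, baseline, tolerance) else 0


baseline = {"faithfulness": 0.65}      # snapshot value quoted above
passing_run = {"faithfulness": 0.64}   # hypothetical run, within tolerance
failing_run = {"faithfulness": 0.60}   # hypothetical run, past tolerance
```

A nonzero exit code is what lets a CI job fail, and a failed required check is how GitHub Actions can block a merge.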

01 Guard
POST /v1/ask → sanitize · injection scan · PII mask · token cap before retrieval sees the query.
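A rough sketch of the non-PII part of this guard step. The injection patterns, the 256-token cap, whitespace tokenization, and the choice to reject rather than truncate are all assumptions made for illustration.

```python
import re

# Hypothetical injection signatures; a real list would be broader.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (the |your )?system prompt", re.I),
]
MAX_TOKENS = 256  # assumed cap, counted in whitespace-separated tokens


def guard_query(raw):
    """Sanitize a query, then reject it on injection match or token overflow."""
    # Drop non-printable control characters, then collapse whitespace runs.
    cleaned = "".join(ch for ch in raw if ch.isprintable() or ch.isspace())
    cleaned = " ".join(cleaned.split())
    if any(p.search(cleaned) for p in INJECTION_PATTERNS):
        raise ValueError("possible prompt injection")
    if len(cleaned.split()) > MAX_TOKENS:
        raise ValueError("query exceeds token cap")
    return cleaned
```

In a FastAPI handler, the `ValueError` would typically be translated into a 4xx response before any retrieval work is done.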
02 Retrieve
BM25 (default) / dense / hybrid over 497-row enriched corpus → top-k candidates with similarity scores.
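For reference, a compact from-scratch Okapi BM25 scorer with the k1=1.5 and b=0.75 values quoted earlier. The whitespace tokenizer, the non-negative IDF variant, the class layout, and the three sample documents are assumptions; the project's own implementation may differ in these details.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.doc_lens) / len(docs)
        n = len(docs)
        df = Counter(term for tf in self.tfs for term in tf)
        # IDF with +1 inside the log so common terms never score negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, idx):
        tf = self.tfs[idx]
        # Length normalization: longer-than-average docs are penalized via b.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[idx] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query, k=3):
        scored = [(i, self.score(query, i)) for i in range(len(self.tfs))]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


docs = ["chest pain triage protocol", "fever in children", "chest xray ordering"]
hits = BM25(docs).top_k("chest pain", k=2)
```

The returned pairs are (row index, score), which is the shape a downstream fusion or citation step would need to map hits back to a source_id.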
03 Generate
Grounded template (zero-LLM, deterministic) or LLM path with enforced inline source_id citations.
04 Validate
Output validator: citation freshness · length · forbidden actions. Regression gate runs in CI on every PR.
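The three output checks might be sketched as follows. The `[source_id]` citation syntax, the 1200-character limit, the forbidden phrase list, and the function name are assumptions chosen for the example, not the service's actual rules.

```python
import re

CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")  # assumed inline format: [source_id]
FORBIDDEN_PHRASES = ("prescribe", "diagnose")    # hypothetical forbidden actions


def validate_output(answer, retrieved_ids, max_chars=1200):
    """Return a list of problems; an empty list means the answer passes."""
    problems = []
    cited = set(CITATION.findall(answer))
    if not cited:
        problems.append("no citations")
    # Every cited id must come from the current retrieval result.
    unknown = cited - set(retrieved_ids)
    if unknown:
        problems.append(f"unknown source_id: {sorted(unknown)}")
    if len(answer) > max_chars:
        problems.append("answer too long")
    lowered = answer.lower()
    for phrase in FORBIDDEN_PHRASES:
        if phrase in lowered:
            problems.append(f"forbidden action: {phrase}")
    return problems
```

Returning a list of problems rather than a boolean makes it easier to log why an answer was rejected and to assert on specific failures in pytest.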