
Data Quality Assurance Framework

The CLIF Consortium validates every dataset through a rigorous three-pillar framework — Conformance, Completeness, and Plausibility — catching data issues before they reach your analysis.

7 Conformance Checks
5 Completeness Checks
8 Plausibility Checks

Conformance

Schema & Structure

Conformance checks verify structure and schema — do the right tables exist, are the required columns present, are data types valid, and do categorical values belong to the allowed vocabulary? These are the first line of defense: if the skeleton is wrong, nothing downstream matters.

Required tables exist
PASS: patient.parquet (12,847 rows), hospitalization.parquet (15,203 rows), vitals.parquet (1,204,881 rows), labs.parquet (892,456 rows)
FAIL: patient.parquet (12,847 rows), hospitalization.parquet (15,203 rows), vitals.parquet ✗ missing, labs.parquet (892,456 rows)
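The table-presence check in the example above comes down to comparing the files on disk against a list of required tables. The following is a minimal sketch, not clifpy's implementation; it assumes one `<table>.parquet` file per table, as in the example, and the `REQUIRED_TABLES` tuple is a hypothetical stand-in for the required set defined by the CLIF schema.

```python
from pathlib import Path

# Hypothetical required-table list for this sketch; the authoritative
# set comes from the CLIF schema, not from this tuple.
REQUIRED_TABLES = ("patient", "hospitalization", "vitals", "labs")


def missing_tables(data_dir, required=REQUIRED_TABLES, ext=".parquet"):
    """Return required tables with no matching <table><ext> file in data_dir."""
    # Collect file names without the extension, e.g. "vitals.parquet" -> "vitals".
    present = {p.stem for p in Path(data_dir).glob(f"*{ext}")}
    # Preserve the order of the required list in the result.
    return [t for t in required if t not in present]
```

For the FAIL case shown above, a directory holding only the patient, hospitalization, and labs files would yield `["vitals"]`.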

Required columns present (hospitalization table)
PASS: hospitalization_id, patient_id, admission_dttm, discharge_dttm, hospital_id
FAIL: hospitalization_id, patient_id, admission_dttm, discharge_dttm ✗ missing, hospital_id
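The required-column check is a set difference between the columns a table actually has and the columns its schema requires. This is a minimal sketch under that assumption, not clifpy's code; the column list is copied from the hospitalization example above, whereas a real check would read it from the CLIF schema.

```python
# Required hospitalization columns, as listed in the example above.
REQUIRED_HOSPITALIZATION_COLUMNS = [
    "hospitalization_id",
    "patient_id",
    "admission_dttm",
    "discharge_dttm",
    "hospital_id",
]


def missing_columns(columns, required):
    """Return the required column names absent from `columns`, in schema order."""
    present = set(columns)
    return [c for c in required if c not in present]
```

Passing a DataFrame's column names as `columns` works the same way, because only membership is tested.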

Numeric columns have numeric types
PASS: weight float64, height float64, age int64
FAIL: weight object (contains "abc"), height float64, age int64
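A column that should be numeric usually ends up as `object` because at least one entry, such as `"abc"`, cannot be read as a number. This sketch finds those offending entries by attempting a float conversion on each value. It is an illustration of the idea, not clifpy's implementation, and it deliberately skips `None` because a missing value is a completeness question rather than a type error.

```python
def non_numeric_values(values):
    """Return the non-missing values that cannot be converted to float."""
    bad = []
    for v in values:
        if v is None:
            continue  # missing values are left to the completeness checks
        try:
            float(v)
        except (TypeError, ValueError):
            bad.append(v)
    return bad
```

Note that numeric strings such as `"72.5"` pass this test; a stricter variant could also reject any value that is not already an `int` or `float`.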

Datetime columns are stored as datetimes
PASS: admission_dttm is datetime64[ns] → 2024-01-15 08:32:00
FAIL: admission_dttm is object (string) → "2024-01-15"
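The datetime check fails when timestamps arrive as raw strings instead of parsed datetime values. A minimal stdlib sketch, assuming the column's values are available as a Python sequence, is to flag every non-missing value that is not a `datetime` instance:

```python
from datetime import datetime


def non_datetime_values(values):
    """Return non-missing values that are not datetime objects (e.g. raw strings)."""
    return [v for v in values if v is not None and not isinstance(v, datetime)]
```

Because `pandas.Timestamp` is a subclass of `datetime.datetime`, values taken from a parsed `datetime64[ns]` column also pass this test, while the string `"2024-01-15"` from the FAIL case is reported.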

Categorical values belong to the allowed vocabulary (vital_category)
PASS: "heart_rate", "sbp", "dbp", "spo2"
FAIL: "heart_rate", "sbp", "heart_rate_bpm" ✗ not in vocabulary, "spo2"

Each name maps to exactly one category
PASS: acetaminophen → analgesic_antipyretic, norepinephrine → vasopressor, fentanyl → opioid
FAIL: acetaminophen → analgesic_antipyretic, acetaminophen → pain_reliever ✗ dual mapping, fentanyl → opioid
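The vocabulary and mapping examples above are both checks on categorical values: one compares values against a permitted set, the other verifies that every source name maps to a single category. A minimal sketch of both, not clifpy's implementation, follows; `ALLOWED_VITAL_CATEGORIES` is an illustrative subset copied from the example, and the full permissible list comes from the CLIF schema.

```python
from collections import defaultdict

# Illustrative subset of permissible vital_category values (from the example).
ALLOWED_VITAL_CATEGORIES = {"heart_rate", "sbp", "dbp", "spo2"}


def out_of_vocabulary(values, allowed):
    """Return the distinct non-missing values that are not in `allowed`, sorted."""
    return sorted({v for v in values if v is not None and v not in allowed})


def conflicting_mappings(pairs):
    """Given (name, category) pairs, return {name: sorted categories}
    for every name that maps to more than one category."""
    categories = defaultdict(set)
    for name, category in pairs:
        categories[name].add(category)
    return {name: sorted(cats) for name, cats in categories.items() if len(cats) > 1}
```

Returning the distinct offending values, rather than every row, keeps the report short when one bad label such as `"heart_rate_bpm"` repeats across thousands of rows.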

Lab units match the expected unit
PASS: potassium mmol/L, creatinine mg/dL, hemoglobin g/dL
FAIL: potassium mEq/L ✗ expected mmol/L, creatinine mg/dL, hemoglobin g/dL
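The unit check compares each lab's reported unit with the single unit the schema expects for that lab. The sketch below assumes the data has been reduced to `(lab name, unit)` pairs and uses a hypothetical `EXPECTED_UNITS` dictionary copied from the example; it is an illustration, not clifpy's code.

```python
# Expected unit per lab, taken from the example above (illustrative only).
EXPECTED_UNITS = {"potassium": "mmol/L", "creatinine": "mg/dL", "hemoglobin": "g/dL"}


def unit_mismatches(rows, expected):
    """Return (lab, found_unit, expected_unit) for each row whose unit differs
    from the expected one; labs without an expected unit are skipped."""
    out = []
    for lab, unit in rows:
        want = expected.get(lab)
        if want is not None and unit != want:
            out.append((lab, unit, want))
    return out
```

The check is a string comparison, so it flags `mEq/L` even though, for a monovalent ion such as potassium, 1 mEq/L equals 1 mmol/L. Such a mismatch may need only relabeling, while others (for example, mg/dL versus µmol/L for creatinine) require a numeric conversion.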
clifpy v0.4.9 · Python 3.9+

Run It Yourself

The clifpy package runs every check on this page against your own CLIF data and emits structured results, a PDF report, and a text summary.

all tables · one call
# 1. install
pip install clifpy

# 2. orchestrate
from clifpy.orchestrator import ClifOrchestrator

co = ClifOrchestrator(config='clif_config.yaml')
co.initialize(['labs', 'vitals', 'medications'])
co.validate_all()

# 3. inspect
for err in co.labs.errors:
    print(err.column, err.message)
one table · one pillar
from clifpy.utils.validator import (
    run_conformance_checks,
    run_completeness_checks,
    run_plausibility_checks,
    run_full_dqa,
)

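# df: the labs table as a DataFrame; schema: its CLIF schema (both loaded beforehand)
result = run_full_dqa(df, schema, 'labs')
print(result.passed, result.metrics)

# Serialize or report out
# (generate_validation_pdf and generate_text_report are clifpy's report
#  generators and need their own import, which is not shown here)
result.to_dict()                 # JSON-safe dict
generate_validation_pdf(result)  # PDF report
generate_text_report(result)     # plain-text summary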

New in clifpy 0.4.x

Nullable / allow_missing support, data profiling and monthly trends, ICD normalization (0.4.8+), atomic-check counts in reports, dual Polars / DuckDB backends for datasets in the hundreds of GB, and PDF + text report generators.
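Monthly trends are a profiling view rather than a pass/fail check: counting records per calendar month makes extraction gaps or sudden volume spikes easy to spot. The sketch below is a hypothetical, stdlib-only illustration of that idea, not clifpy's profiling code; it assumes the timestamps (for example, `admission_dttm` values) are already parsed `datetime` objects.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(timestamps):
    """Count non-missing timestamps per calendar month, keyed 'YYYY-MM', in month order."""
    counts = Counter(ts.strftime("%Y-%m") for ts in timestamps if ts is not None)
    return dict(sorted(counts.items()))
```

A month that is absent from the result, such as February when only January and March admissions are present, is exactly the kind of gap a trend plot would expose.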