CLIF ETL Guide
General ETL guidance for transforming EHR data into CLIF format. Implementation details may vary by site depending on your source data structure.
Data Types & Formatting
Identifier Fields
All *_id variables must be stored as character strings, even when the source values look numeric.
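A minimal sketch of the string cast, using only the Python standard library. The helper name `normalize_id` and the sample rows are hypothetical; the point is that IDs read as numbers (for example `555.0`) are turned back into clean strings and that leading zeros in string IDs are preserved.

```python
def normalize_id(value):
    """Return an identifier as a stripped string, or None if it is missing."""
    if value is None:
        return None
    # Floats such as 555.0 often appear when an ID column was read as numeric.
    if isinstance(value, float):
        if value != value:  # NaN is the only value not equal to itself
            return None
        if value.is_integer():
            value = int(value)
    text = str(value).strip()
    return text or None


# Hypothetical source rows: IDs arrive as int, float, padded string, or missing.
rows = [
    {"patient_id": 1001, "hospitalization_id": 555.0},
    {"patient_id": " 0042 ", "hospitalization_id": None},
]

# Apply the cast to every field whose name ends in "_id".
cleaned = [
    {k: normalize_id(v) if k.endswith("_id") else v for k, v in row.items()}
    for row in rows
]
```

Casting at load time, before any joins, avoids mismatches where one table holds `"555"` and another holds `555.0`.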
DateTime Handling
Convert all timestamps to UTC.
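One way to do the conversion, assuming the source timestamps are naive local times and the site's IANA timezone is known. The function name `to_utc` and the example timezone `America/Chicago` are illustrative assumptions, not part of the CLIF specification.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc(local_naive: datetime, site_tz: str) -> datetime:
    """Attach the site's timezone to a naive timestamp, then convert to UTC."""
    localized = local_naive.replace(tzinfo=ZoneInfo(site_tz))
    return localized.astimezone(timezone.utc)


# Hypothetical admission times recorded in local (Central) time.
summer_utc = to_utc(datetime(2024, 7, 1, 8, 30), "America/Chicago")   # daylight time
winter_utc = to_utc(datetime(2024, 1, 15, 8, 30), "America/Chicago")  # standard time
```

Using a named zone rather than a fixed offset lets daylight saving time be handled per timestamp. Local times that occur twice during the autumn clock change are ambiguous; with `zoneinfo` the first occurrence is assumed unless `fold=1` is set, so sites may want to review those rows separately.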
Numeric Values
Clean and validate numeric data.
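A hedged sketch of numeric cleaning for free-text result values. The helper `parse_numeric`, the decision to strip comparators such as `<` and `>`, and the plausibility bounds are all assumptions made for illustration; each site should decide how to treat censored values and which ranges are acceptable.

```python
import re

# Optional comparator (<, >, <=, >=), then a plain decimal number.
_NUMBER = re.compile(r"^[<>]?=?\s*(-?\d+(?:\.\d+)?)$")


def parse_numeric(raw, low=None, high=None):
    """Parse a raw value to float; return None if unparseable or out of range."""
    if raw is None:
        return None
    text = str(raw).strip().replace(",", "")  # drop thousands separators
    match = _NUMBER.match(text)
    if not match:
        return None  # free text such as "hemolyzed" is not numeric
    value = float(match.group(1))
    if low is not None and value < low:
        return None
    if high is not None and value > high:
        return None
    return value


# Hypothetical raw values as they might appear in a source extract.
examples = ["7.4", "<0.5", "1,200", "hemolyzed", None]
parsed = [parse_numeric(v) for v in examples]
```

Returning `None` instead of raising keeps the original row available, so the raw string can still be stored alongside the cleaned numeric value.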
Data Quality & Validation
Essential Quality Checks
- Check for duplicates using composite keys
- Calculate missingness percentages by field
- Validate date ranges and distributions
- Remove rows with missing critical identifiers
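The checks above can be sketched with the standard library as follows. The function names, the sample vitals rows, the composite key, and the date bounds are hypothetical; field names other than `vital_category` are assumed for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

UTC = timezone.utc


def duplicate_keys(rows, key_fields):
    """Return composite keys that occur more than once, with their counts."""
    counts = Counter(tuple(row.get(f) for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def missingness_pct(rows, fields):
    """Percentage of rows where each field is None or an empty string."""
    total = len(rows)
    return {
        f: (100.0 * sum(1 for r in rows if r.get(f) in (None, "")) / total) if total else 0.0
        for f in fields
    }


def outside_date_range(rows, field, start, end):
    """Rows whose timestamp falls outside the expected window."""
    return [r for r in rows if r.get(field) is not None and not (start <= r[field] <= end)]


def drop_missing_ids(rows, id_fields):
    """Keep only rows where every critical identifier is present."""
    return [r for r in rows if all(r.get(f) not in (None, "") for f in id_fields)]


# Hypothetical sample: one exact duplicate, one missing ID, one implausible date.
vitals = [
    {"hospitalization_id": "H1", "recorded_dttm": datetime(2024, 1, 1, 0, 0, tzinfo=UTC),
     "vital_category": "heart_rate", "vital_value": 80},
    {"hospitalization_id": "H1", "recorded_dttm": datetime(2024, 1, 1, 0, 0, tzinfo=UTC),
     "vital_category": "heart_rate", "vital_value": 80},
    {"hospitalization_id": None, "recorded_dttm": datetime(2024, 1, 1, 1, 0, tzinfo=UTC),
     "vital_category": "spo2", "vital_value": None},
    {"hospitalization_id": "H2", "recorded_dttm": datetime(1900, 1, 1, tzinfo=UTC),
     "vital_category": "spo2", "vital_value": 97},
]

dupes = duplicate_keys(vitals, ("hospitalization_id", "recorded_dttm", "vital_category"))
missing = missingness_pct(vitals, ["hospitalization_id", "vital_value"])
suspect = outside_date_range(vitals, "recorded_dttm",
                             datetime(2000, 1, 1, tzinfo=UTC), datetime(2030, 1, 1, tzinfo=UTC))
usable = drop_missing_ids(vitals, ["hospitalization_id"])
```

Running the report functions before dropping rows makes it possible to record how much data was lost at each step.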
Validation Tools
- CLIF Assistant - AI-powered guidance
- CLIFpy package - Python package
Terminology & Mapping
mCIDE Guidelines
Follow mCIDE (minimum Common ICU Data Elements) mapping guidelines for consistent data transformation across all institutions.
Sample Mapping
Example of mapping site-specific discharge names to standardized discharge_category values for the hospitalization table.
| discharge_name (site) | discharge_category (CLIF) |
|---|---|
| Acute Rehab Facility | Acute Inpatient Rehab Facility |
| Rehab Facility - Inpatient | Acute Inpatient Rehab Facility |
| Short Term Hospital | Acute Care Hospital |
| Mental Health Jud Commit Anoka | Psychiatric Hospital |
| IRTS | Home |
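The sample mapping above could be applied with a plain lookup table, as in the sketch below. The names `DISCHARGE_MAP`, `map_discharge`, and `unmapped_names` are hypothetical, and returning `None` for unmapped names is an assumption; the intent is only to show that unmapped site values should be surfaced for review rather than silently guessed.

```python
# Site-specific discharge_name -> CLIF discharge_category (rows from the table above).
DISCHARGE_MAP = {
    "Acute Rehab Facility": "Acute Inpatient Rehab Facility",
    "Rehab Facility - Inpatient": "Acute Inpatient Rehab Facility",
    "Short Term Hospital": "Acute Care Hospital",
    "Mental Health Jud Commit Anoka": "Psychiatric Hospital",
    "IRTS": "Home",
}


def map_discharge(discharge_name):
    """Return the CLIF category for a site discharge name, or None if unmapped."""
    if discharge_name is None:
        return None
    return DISCHARGE_MAP.get(discharge_name.strip())


def unmapped_names(discharge_names):
    """List distinct site names that have no mapping yet, for manual review."""
    seen = {n.strip() for n in discharge_names if n}
    return sorted(n for n in seen if n not in DISCHARGE_MAP)


# "Hospice House" is a made-up site value with no mapping in the table.
site_values = ["IRTS", " Short Term Hospital ", "Hospice House"]
categories = [map_discharge(v) for v in site_values]
needs_review = unmapped_names(site_values)
```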
Table Specific Guidelines
Procedural areas & OR data — important scope note
- Procedural areas and operating rooms should be mapped to Procedural in the adt table's location_category.
- Pre/Intra/Post-procedural and OR EHR data (such as anesthesia flowsheet records from Vitals, Scores, Respiratory Support) are not currently represented in CLIF.
Microbiology — result_dttm row-keeping convention (CLIF 3.0)
For microbiology_culture and microbiology_nonculture, EHRs frequently emit multiple interim result_dttm rows during a single workup (e.g., "no growth at 24h / 48h / 72h", "no bacteria seen at 6h / 12h"). CLIF 3.0 collapses those interim rows on negative results:
- Positives: for positive cultures, positive gram stains, and positive non-culture results, keep every interval/processing-step row as the EHR records it.
- Negatives: keep only one row per workup: the final no-growth row for cultures (method_category = culture), the first no-bacteria-seen row for gram stains (method_category = gram stain), and the final negative-result row for non-culture results.
See Issue #180 for the decision thread.
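A rough sketch of the row-keeping convention, under several assumptions that are not part of the text above: each row carries a hypothetical `workup_id` that groups interim results, a boolean `is_positive` flag has already been derived, and a workup counts as positive if any of its rows is positive. Anything that is not a gram stain is treated with the "final row" rule.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def collapse_interim_rows(rows):
    """Keep all rows of positive workups; keep one row per negative workup."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["workup_id"], row["method_category"])].append(row)

    kept = []
    for (_, method), group in groups.items():
        group.sort(key=lambda r: r["result_dttm"])
        if any(r["is_positive"] for r in group):
            kept.extend(group)        # positives: keep every row as recorded
        elif method == "gram stain":
            kept.append(group[0])     # negatives: first no-bacteria-seen row
        else:
            kept.append(group[-1])    # negatives: final no-growth / negative row
    return kept


def _row(workup_id, method, hours, positive):
    """Build a hypothetical result row 'hours' after a fixed collection time."""
    collected = datetime(2024, 1, 1, tzinfo=timezone.utc)
    return {"workup_id": workup_id, "method_category": method,
            "result_dttm": collected + timedelta(hours=hours), "is_positive": positive}


rows = [
    _row("W1", "culture", 24, False), _row("W1", "culture", 48, False),
    _row("W1", "culture", 72, False),
    _row("W2", "gram stain", 6, False), _row("W2", "gram stain", 12, False),
    _row("W3", "culture", 24, True), _row("W3", "culture", 48, True),
]
kept = collapse_interim_rows(rows)
kept_summary = [(r["workup_id"], int((r["result_dttm"] - datetime(2024, 1, 1, tzinfo=timezone.utc))
                                     .total_seconds() // 3600)) for r in kept]
```

How a workup is identified in practice (order ID, specimen ID, or a combination) depends on the source EHR and should be confirmed locally.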
adt (ADT - Admissions, Discharges, Transfers)
- Location category & type mapping (mCIDE)
- Fix overlapping time intervals
- Deduplication & UTC conversion
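One possible way to fix overlapping ADT intervals is to sort each hospitalization's location rows by start time and truncate any row that runs past the start of the next one. This is an assumed approach, not a rule stated in this guide, and the field names `in_dttm` and `out_dttm` and the sample rows are used for illustration only.

```python
from datetime import datetime, timezone

UTC = timezone.utc


def fix_overlaps(stays):
    """Return copies of the rows with each out_dttm capped at the next in_dttm."""
    ordered = sorted(stays, key=lambda s: (s["hospitalization_id"], s["in_dttm"]))
    fixed = [dict(s) for s in ordered]  # copy so the input is not modified
    for current, nxt in zip(fixed, fixed[1:]):
        if current["hospitalization_id"] != nxt["hospitalization_id"]:
            continue
        # An open-ended or overlapping stay ends when the next location starts.
        if current["out_dttm"] is None or current["out_dttm"] > nxt["in_dttm"]:
            current["out_dttm"] = nxt["in_dttm"]
    return fixed


# Hypothetical rows: the ED stay overlaps the ICU stay by 30 minutes.
stays = [
    {"hospitalization_id": "H1", "location_category": "icu",
     "in_dttm": datetime(2024, 1, 1, 12, 0, tzinfo=UTC),
     "out_dttm": datetime(2024, 1, 3, 9, 0, tzinfo=UTC)},
    {"hospitalization_id": "H1", "location_category": "ed",
     "in_dttm": datetime(2024, 1, 1, 8, 0, tzinfo=UTC),
     "out_dttm": datetime(2024, 1, 1, 12, 30, tzinfo=UTC)},
]
fixed = fix_overlaps(stays)
```

Truncating the earlier row assumes the later transfer time is the more reliable one; sites with different source semantics may prefer to shift the later row's start instead.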
code_status (Code Status)
- Code status category mapping (mCIDE)
- Deduplication & UTC conversion
crrt_therapy (CRRT Therapy)
- CRRT mode category mapping (mCIDE)
- Unit conversions to CLIF standards
- Verify missingness by modality
hospital_diagnosis (Hospital Diagnosis)
- Primary vs secondary diagnosis assignment
- Present on admission (POA) filtering
hospitalization (Hospitalization)
- Admission & discharge category mapping (mCIDE)
- Age calculation & filtering
- Geographic variable cleaning (FIPS codes)
labs (Labs)
- Lab order & category mapping (mCIDE)
- Blood/plasma/serum specimens only
- Unit standardization to CLIF reference units
- Numeric value parsing
medication_admin_continuous (Medication Admin Continuous)
- MAR action & med category mapping (mCIDE)
- Filter continuous medications
- Combination drug handling
- Trial drugs & placebos
medication_admin_intermittent (Medication Admin Intermittent)
- MAR action & med category mapping (mCIDE)
- Filter intermittent medications
- Combination drug handling
- Trial drugs & placebos
patient (Patient)
- Demographics mapping (mCIDE)
- Handle missing category values
- One record per patient_id
patient_assessments (Patient Assessments)
- Assessment category mapping (mCIDE)
- Value type casting (numeric/categorical/text)
patient_procedures (Patient Procedures)
- CPT codes (HCPCS Level 1) only
- ICD-10-PCS procedure codes
- Filter to relevant procedure codes
position (Position)
- Free text position standardization
- Bed angle/position exclusion
- Prone position detection
- Supine-type consolidation
- Missing label handling
respiratory_support (Respiratory Support)
- Map device_name → device_category
- Map mode_name → mode_category
- Pivot flowsheets from long to wide
- Create tracheostomy variable
- Validate IMV vs Non-IMV settings
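The long-to-wide pivot of flowsheet rows can be sketched as below. The input layout (one row per measurement with hypothetical `measure` and `value` fields) and the example measure names are assumptions about a typical flowsheet extract, not a required source format.

```python
from collections import defaultdict


def pivot_long_to_wide(long_rows):
    """Turn one-row-per-measurement data into one row per encounter and timestamp."""
    wide = defaultdict(dict)
    for row in long_rows:
        key = (row["hospitalization_id"], row["recorded_dttm"])
        # If the same measure repeats at one timestamp, the last value wins.
        wide[key][row["measure"]] = row["value"]
    return [
        {"hospitalization_id": hid, "recorded_dttm": ts, **measures}
        for (hid, ts), measures in sorted(wide.items())
    ]


# Hypothetical flowsheet extract in long format (ISO timestamps sort correctly as text).
long_rows = [
    {"hospitalization_id": "H1", "recorded_dttm": "2024-01-01T10:00Z",
     "measure": "device_name", "value": "Ventilator"},
    {"hospitalization_id": "H1", "recorded_dttm": "2024-01-01T10:00Z",
     "measure": "fio2_set", "value": 0.4},
    {"hospitalization_id": "H1", "recorded_dttm": "2024-01-01T10:00Z",
     "measure": "peep_set", "value": 5},
    {"hospitalization_id": "H1", "recorded_dttm": "2024-01-01T11:00Z",
     "measure": "fio2_set", "value": 0.35},
]
wide_rows = pivot_long_to_wide(long_rows)
```

Measures that were not charted at a given timestamp are simply absent from that wide row, so downstream code should treat a missing key as a missing value.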
vitals (Vitals)
- Map vital_name → vital_category
- Handle BP split (sbp/dbp extraction)
- Standardize units (temp, spo2, weight, height)
- Specify invasive vs non-invasive in meas_site_name
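A small sketch of the blood pressure split, assuming the source stores a combined reading such as `"120/80"` in a single text field. The helper names, the input row layout, and the two-to-three-digit pattern are illustrative assumptions.

```python
import re

# Two or three digits, a slash, two or three digits, with optional spaces.
_BP = re.compile(r"^\s*(\d{2,3})\s*/\s*(\d{2,3})\s*$")


def split_bp(raw):
    """Return (sbp, dbp) as integers, or None if the text is not a BP reading."""
    if raw is None:
        return None
    match = _BP.match(str(raw))
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))


def bp_to_rows(row):
    """Expand one combined BP row into separate sbp and dbp rows."""
    parts = split_bp(row.get("vital_value"))
    if parts is None:
        return []
    sbp, dbp = parts
    base = {k: v for k, v in row.items() if k not in ("vital_category", "vital_value")}
    return [
        {**base, "vital_category": "sbp", "vital_value": sbp},
        {**base, "vital_category": "dbp", "vital_value": dbp},
    ]


# Hypothetical combined reading from a source flowsheet.
source_row = {"hospitalization_id": "H1", "vital_name": "BP", "vital_value": "120 / 80"}
split_rows = bp_to_rows(source_row)
```

Readings that do not match the pattern return an empty list here, so they can be counted and reviewed rather than loaded with a wrong value.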
