CLIF ETL Guide
General ETL guidance for transforming EHR data into CLIF format. Implementation details may vary by site depending on your source data structure.
Data Types & Formatting
Identifier Fields
All *_id variables must be character strings:
DateTime Handling
Convert all timestamps to UTC:
Numeric Values
Clean and validate numeric data:
Data Quality & Validation
Essential Quality Checks
- Check for duplicates using composite keys
- Calculate missingness percentages by field
- Validate date ranges and distributions
- Remove rows with missing critical identifiers
Validation Tools
- CLIF Assistant - AI-powered guidance
- CLIFpy package - Python package
Terminology & Mapping
mCIDE Guidelines
Follow mCIDE (minimum Common ICU Data Elements) mapping guidelines for consistent data transformation across all institutions.
View mCIDE Documentation →Sample Mapping
Example of mapping site-specific discharge names to standardized discharge_category values for the hospitalization table.
| discharge_name (site) | discharge_category (CLIF) |
|---|---|
| Acute Rehab Facility | Acute Inpatient Rehab Facility |
| Rehab Facility - Inpatient | Acute Inpatient Rehab Facility |
| Short Term Hospital | Acute Care Hospital |
| Mental Health Jud Commit Anoka | Psychiatric Hospital |
| IRTS | Home |
Table Specific Guidelines
adt Click to flip
- • Location category & type mapping (mCIDE)
- • Fix overlapping time intervals
- • Deduplication & UTC conversion
ADT - Admissions, Discharges, Transfers
Important Notes
- • ADT represents the patient's physical location, NOT the patient "status"
- • Procedural areas and operating rooms should be mapped to
Procedural - • Pre/Intra/Post-procedural and OR EHR data (e.g., anesthesia flowsheet records from Labs, Vitals, Scores, Respiratory Support) are not currently represented in CLIF
Step 1: Create ADT Mappings
Before transforming your ADT data, prepare two mapping files:
a. Location Category Mapping: Map all site-specific location_name values to standardized CLIF location_category values
b. ICU Location Type Mapping: For ICU locations, map location_name to the appropriate location_type
Step 2: Filter to Inpatient Hospitalizations
Filter your ADT data to include only inpatient hospitalizations. Exclude outpatient visits, observation stays, or other non-inpatient encounters.
Step 3: Apply Mappings
Join the mapping files created in Step 1 to your ADT table to add location_category and location_type columns.
Step 4: Fix Time Intervals
a. Remove zero-duration stays:
Filter out records where in_dttm == out_dttm
b. Fix overlapping intervals:
- 1. Generate unique time points per patient
- 2. Create non-overlapping intervals from consecutive time points
- 3. Map original records using
foverlaps() - 4. Resolve conflicts: select most recent when multiple locations exist
c. Merge consecutive same-location stays:
Use run-length encoding to group and merge consecutive identical locations
Step 5: Create hospital_type Column
Assign the appropriate hospital_type value from the CLIF permissible set based on your institution's characteristics.
Step 6: Final Cleanup
- • Deduplicate: Remove duplicates by composite key
hospitalization_id,in_dttm - • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Ensure
out_dttm > in_dttmfor all records - ✓ Validate
location_categorymappings against mCIDE - ✓ Flag cases where ED visits occur after ICU/Ward
- ✓ Check for remaining duplicates by composite key
code_status Click to flip
- • Code status category mapping (mCIDE)
- • Deduplication & UTC conversion
Code Status
Important Notes
- • This table contains only code status orders placed by clinicians
- • It is NOT equivalent to the code status display name in the EMR
Step 1: Create Code Status Mappings
Before transforming your code status data, prepare the mapping file:
Code Status Category Mapping: Map all site-specific code_status_name values to standardized CLIF code_status_category values
Step 2: Filter to Patients with Inpatient Hospitalizations
Filter your code status data to include only patients with an inpatient hospitalization. Since code status is at the patient level, include only patients whose hospitalization_id exists in your CLIF ADT table.
Step 3: Apply Mappings
Join the mapping file created in Step 1 to your code status table to add the code_status_category column.
Step 4: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Validate
code_status_categorymappings against mCIDE - ✓ Ensure all
hospitalization_idvalues exist in ADT table
crrt_therapy Click to flip
- • CRRT mode category mapping (mCIDE)
- • Unit conversions to CLIF standards
- • Verify missingness by modality
CRRT Therapy
Step 1: Create CRRT Mode Mappings
Before transforming your CRRT data, prepare the mapping file:
Mode Category Mapping: Map all site-specific crrt_mode_name values to standardized CLIF crrt_mode_category values
Step 2: Filter to Inpatient Hospitalizations
Filter your CRRT data to include only inpatient hospitalizations.
Step 3: Apply Mappings
Join the mapping file created in Step 1 to your CRRT table to add the crrt_mode_category column.
Step 4: Unit Conversions
Critical Requirement:
All CRRT parameters must be reported using CLIF units. Please refer to the data dictionary page for CRRT setting units.
If your source data uses different units, convert to CLIF standards. Eg:
| Parameter | CLIF Unit | Conversion Example |
|---|---|---|
blood_flow_rate | mL/min | 0.2 L/min → 200 mL/min |
dialysate_flow_rate | mL/hr | 2 L/hr → 2000 mL/hr |
ultrafiltration_out | mL/hr | 1.5 L/hr → 1500 mL/hr |
Step 5: Verify Missingness by Modality
Validate that parameter availability matches expected patterns for each CRRT modality:
| Modality | Blood Flow | Pre-Filter | Post-Filter | Dialysate | UF Out |
|---|---|---|---|---|---|
| SCUF | E | E | |||
| CVVH | E | E | E | E | |
| CVVHD | E | E | E | ||
| CVVHDF | E | E | E | E | E |
| AVVH | E | P | P | E | |
| AVVHD | E | P | E | ||
| AVVHF | E | P | P | P | E |
E = Expected | P = Possible | Empty = Not applicable
Step 6: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Validate
crrt_mode_categorymappings against mCIDE - ✓ Verify parameter missingness aligns with modality expectations
- ✓ Check unit conversions are applied correctly
hospital_diagnosis Click to flip
- • Primary vs secondary diagnosis assignment
- • Present on admission (POA) filtering
Hospital Diagnosis
Important Notes
- • This table only includes finalized billing diagnosis codes for hospital reimbursement
- • All other diagnosis codes for a patient are included in the concept table
patient_diagnosis
Step 1: Filter to Inpatient Hospitalizations
Filter your hospital diagnosis data to include only inpatient hospitalizations.
Step 2: Create Primary Diagnosis Flag
If your source data has diagnosis ranking, create the diagnosis_primary column:
- 1 = Primary diagnosis (rank = 1)
- 0 = Secondary diagnosis (rank ≥ 2)
Step 3: Filter and Assign POA Values
Only include rows where poa_present is definitively Yes or No:
Exclude rows with:
Unknown, Null, Missing, Clinically Undetermined, Exempt from POA Reporting, Unspecified, etc.
Assign values:
- 1 = Yes (present on admission)
- 0 = No (not present on admission)
No other values are permitted.
Step 4: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Verify
diagnosis_primarycontains only 0 or 1 - ✓ Verify
poa_presentcontains only 0 or 1 - ✓ Validate diagnosis codes are in expected format (ICD-10, etc.)
hospitalization Click to flip
- • Admission & discharge category mapping (mCIDE)
- • Age calculation & filtering
- • Geographic variable cleaning (FIPS codes)
Hospitalization
Important Notes
- • If a patient is discharged to Home/Hospice, then
discharge_category== Hospice - • Geographic indicators (
zipcode_nine_digit,census_block_code, etc.) should be added if available. zipcode_nine_digit is preferred over zipcode_five_digit; census_block_code is ideal for census-based indicators - • If a patient is transferred between different hospitals within a health system, create a new hospitalization_id
- • If a patient is seen in ER at hospital A then admitted to inpatient at hospital B, use one hospitalization_id for both stays
- • A
hospitalization_joined_idcan be created from contiguous hospitalization_ids
Step 1: Create Hospitalization Mappings
Before transforming your hospitalization data, prepare the mapping files:
a. Admission Type Mapping: Map admission_type_name to admission_type_category
b. Discharge Mapping: Map discharge_name to discharge_category
Step 2: Filter to Inpatient Hospitalizations
Filter your hospitalization data to include only inpatient hospitalizations.
Step 3: Apply Mappings
Join the mapping files created in Step 1 to add admission_type_category and discharge_category columns.
Step 4: Age Calculation & Filtering
- • If
age_at_admissionis not present, calculate it usingbirth_date - • Drop encounters where age > 120 years or < 18 years
Step 5: Clean Geographic Variables
Clean and standardize FIPS codes by removing spaces, dashes, and special characters:
| Variable | Digits | Description |
|---|---|---|
state_code | 2 | State FIPS |
county_code | 5 | State + County |
census_tract | 11 | State + County + Tract |
census_block_group_code | 12 | Tract + Block Group |
census_block_code | 15 | Full Block ID |
Derive from census_block_id:
fips_version:
Year of Census geography definitions (2000, 2010, 2020)
Step 6: Final Cleanup
- • Deduplicate: Remove duplicate rows by composite key
hospitalization_id - • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Ensure
discharge_dttm≥admission_dttm - ✓ Verify FIPS codes have correct digit lengths
- ✓ Validate category mappings against mCIDE
labs Click to flip
- • Lab order & category mapping (mCIDE)
- • Blood/plasma/serum specimens only
- • Unit standardization to CLIF reference units
- • Numeric value parsing
Labs
Step 1: Create Lab Mappings
Before transforming your lab data, prepare the mapping files:
a. Lab Order Mapping: Map lab_order_name to lab_order_category
b. Lab Category Mapping: Map lab_name to lab_category
Step 2: Filter to Inpatient Hospitalizations
Filter your lab data to include only inpatient hospitalizations.
Step 3: Apply Mappings
Join the mapping files created in Step 1 to add lab_order_category and lab_category columns.
Step 4: Filter Specimen Types
Include only:
Blood, Plasma, Serum samples
Exclude:
Urine and other fluid specimens
Step 5: Standardize Reference Units
Critical Requirement:
All lab values must be reported using CLIF reference units. Only listed reference units are permissible for respective lab categories. Sites must convert raw values during ETL.
Reference units for each lab category:
→ clif_lab_categories.csv (reference_unit column)Step 6: Create Numeric Value Field
Create lab_value_numeric by parsing the character field lab_value to extract only the numeric part of the string.
Step 7: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Validate
lab_categorymappings against mCIDE - ✓ Verify all units match CLIF reference units
- ✓ Check
lab_value_numericparsing accuracy
medication_admin_continuous Click to flip
- • MAR action & med category mapping (mCIDE)
- • Filter continuous medications
- • Combination drug handling
- • Trial drugs & placebos
Medication Admin Continuous
Step 1: Create Medication Mappings
Before transforming your medication data, prepare the mapping files:
a. Action Category Mapping: Map mar_action_name → mar_action_category
b. Med Category Mapping: Map med_name → med_category
c. Route Category Mapping: Map med_route_name → med_route_category
Step 2: Filter to Inpatient Hospitalizations
Filter your medication data to include only inpatient hospitalizations.
Step 3: Apply Mappings
Join the mapping files to add category columns.
Step 4: Filter Continuous Medications
Include only medications with time-based dosing units:
Step 5: Special Case - Inhaled Nitric Oxide
If inhaled nitric oxide is not in your medication admin table, refer to flowsheets data. The order typically goes in as an RT order (not medication order) and is documented in flowsheets rather than MAR.
Step 6: Handle Combination & Trial Drugs
A. Combination Drugs
When to duplicate rows:
Only duplicate combo meds that are not in the mCIDE as a med_category itself. Assign a duplicated row for each component med_category.
Example: "MedA MedB 5-325mg Tab"
If medA_medB is not a CLIF med_category, but medA and medB are separate categories:
→ Row 1: med_category = "medA", med_dose = 5mg
→ Row 2: med_category = "medB", med_dose = 325mg
Both rows maintain same: hospitalization_id, admin_dttm, med_name
Unit Conversion (mL → mg):
If original unit is mL, convert to mg using concentration from med_name.
e.g., "MedA-MedB 7.5-325 MG/15 ML" at 30 mL → 15mg MedA, 650mg MedB
B. Trial Drugs & Placebos
Map trial drugs to trial_drug as med_category. Leave med_group empty if with placebo.
Example:
"ACYCLOVIR OR PLACEBO CAPSULE (IRB 171591) 400 MG"
→ med_category: trial_drug
→ med_group: None
- • Brand Names: Map to generic, preserve brand in notes
Step 7: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Validate
med_categorymappings against mCIDE - ✓ Verify continuous med filter captures time-based units
- ✓ Check combo drug rows have correct dose splits
medication_admin_intermittent Click to flip
- • MAR action & med category mapping (mCIDE)
- • Filter intermittent medications
- • Combination drug handling
- • Trial drugs & placebos
Medication Admin Intermittent
Step 1: Create Medication Mappings
Before transforming your medication data, prepare the mapping files:
a. Action Category Mapping: Map mar_action_name → mar_action_category
b. Med Category Mapping: Map med_name → med_category
c. Route Category Mapping: Map med_route_name → med_route_category
Step 2: Filter to Inpatient Hospitalizations
Filter your medication data to include only inpatient hospitalizations.
Step 3: Apply Mappings
Join the mapping files to add category columns.
Step 4: Filter Intermittent Medications
Exclude medications with time-based dosing units (keep only intermittent):
Step 5: Special Case - Inhaled Nitric Oxide
If inhaled nitric oxide is not in your medication admin table, refer to flowsheets data. The order typically goes in as an RT order (not medication order) and is documented in flowsheets rather than MAR.
Step 6: Handle Combination & Trial Drugs
A. Combination Drugs
When to duplicate rows:
Only duplicate combo meds that are not in the mCIDE as a med_category itself. Assign a duplicated row for each component med_category.
Example: "MedA MedB 5-325mg Tab"
If medA_medB is not a CLIF med_category, but medA and medB are separate categories:
→ Row 1: med_category = "medA", med_dose = 5mg
→ Row 2: med_category = "medB", med_dose = 325mg
Both rows maintain same: hospitalization_id, admin_dttm, med_name
Unit Conversion (mL → mg):
If original unit is mL, convert to mg using concentration from med_name.
e.g., "MedA-MedB 7.5-325 MG/15 ML" at 30 mL → 15mg MedA, 650mg MedB
B. Trial Drugs & Placebos
Map trial drugs to trial_drug as med_category. Leave med_group empty if with placebo.
Example:
"ACYCLOVIR OR PLACEBO CAPSULE (IRB 171591) 400 MG"
→ med_category: trial_drug
→ med_group: None
- • Brand Names: Map to generic, preserve brand in notes
Step 7: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Validate
med_categorymappings against mCIDE - ✓ Verify intermittent med filter excludes time-based units
- ✓ Check combo drug rows have correct dose splits
patient Click to flip
- • Demographics mapping (mCIDE)
- • Handle missing category values
- • One record per patient_id
Patient
Step 1: Create Patient Mappings
Before transforming your patient data, prepare the mapping files:
a. Race Category Mapping: Map race_name → race_category
b. Sex Category Mapping: Map sex_name → sex_category
c. Ethnicity Category Mapping: Map ethnicity_name → ethnicity_category
d. Language Category Mapping: Map language_name → language_category
Step 2: Filter to Inpatient Patients
Filter your patient data to include only patients whose hospitalization_id exists in your CLIF ADT table.
Step 3: Apply Mappings & Handle Missing Values
Join the mapping files to add category columns, then fill missing values:
Fill missing/null values:
- •
race_category→ "Unknown" - •
sex_category→ "Unknown" - •
ethnicity_category→ "Unknown" - •
language_category→ "Unknown or NA"
Step 4: Handle External Death Data Sources
If using an external source like a state vital records registry to extract death_dttm and the death records are only in date format (no timestamp):
- 1. Same day as discharge: If the external death date matches the patient's
discharge_dttmdate, use thedischarge_dttmas thedeath_dttm - 2. After discharge: If the death date occurs after
discharge_dttm, use the death date and add midnight as the timestamp (format:YYYY-MM-DD 00:00:00+00:00)
Step 5: Final Cleanup
- • Deduplicate: Remove duplicate rows by composite key
patient_id. Ensure one record per patient_id - • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Ensure one record per
patient_id - ✓ Validate all category mappings against mCIDE
- ✓ Verify no null values remain in category fields
patient_assessments Click to flip
- • Assessment category mapping (mCIDE)
- • Value type casting (numeric/categorical/text)
Patient Assessments
Step 1: Create Assessment Mappings
Before transforming your assessment data, prepare the mapping file:
Assessment Category Mapping: Map assessment_name → assessment_category
Step 2: Filter to Inpatient Hospitalizations
Filter your assessment data to include only inpatient hospitalizations.
Step 3: Apply Mappings
Join the mapping file to add assessment_category and assessment_group columns.
Step 4: Create Value Type Columns
Identify and cast the measured value for each assessment to the appropriate column:
- •
numerical_value→ Cast numeric assessment values - •
categorical_value→ Cast categorical/ordinal values - •
text_value→ Cast free-text values
Step 5: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Validate
assessment_categorymappings against mCIDE - ✓ Verify value type casting is correct for each assessment
patient_procedures Click to flip
- • CPT codes (HCPCS Level 1) only
- • ICD-10-PCS procedure codes
- • Filter to relevant procedure codes
Patient Procedures
Important Notes
- • Contains standardized procedural billing codes (HCPCS) for hospitalizations
- • Only includes procedures that were performed/completed (not cancelled)
- • Includes: CPT codes billed by clinicians (HCPCS Level 1 - professional billing)
- • Excludes: Hospital billing codes (HCPCS Level 2 - products, supplies, services without CPT codes)
- • Also contains ICD-10-PCS procedure codes (used for DRG calculation, may appear in hospital_diagnosis)
Step 1: Identify Relevant Procedure Codes
Identify the relevant procedure codes required for CLIF:
Step 2: Filter Data
Filter to inpatient hospitalizations and only include the relevant procedure codes identified in Step 1.
Step 3: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Date-Only Columns
When your source data only contains a date (no time component):
- • Set the time to midnight (00:00:00) in your local timezone
- • Make it timezone-aware
- • Then convert to UTC
Example: A date of 2021-08-26 becomes 2021-08-26 00:00:00 local time, which converts to 2021-08-26 05:00:00+00:00 UTC (for US/Central timezone).
Data Quality Checks
- ✓ Verify procedure codes are valid CPT or ICD-10-PCS codes
- ✓ Ensure only completed procedures are included
position Click to expand
- • Free text position standardization
- • Prone position detection
- • Supine-type consolidation
- • Missing label handling
Position
Step 1: Identify Relevant Position Measurements
Identify relevant position measurement names from flowsheets and filter to them:
Example flowsheet names:
Step 2: Filter to Inpatient Hospitalizations
Filter your position data to include only inpatient hospitalizations.
Step 3: Identify Prone Positions
Prone detection using regex:
regex("prone", ignore_case = TRUE)), 1, 0))
Step 4: Identify Supine-Type Positions
Step 5: Consolidate Supine-Type
Group related positions (Fowlers, Trendelenburg, Sitting) as supine-type.
Step 6: Handle Missing Labels
Position Categories
Prone Positions:
Supine-Type Positions:
Step 7: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Verify prone/supine are mutually exclusive (if prone=1, then supine=0)
- ✓ Check position categories are correctly classified
respiratory_support Click to expand
- • Map device_name → device_category
- • Map mode_name → mode_category
- • Pivot flowsheets from long to wide
- • Create tracheostomy variable
- • Validate IMV vs Non-IMV settings
Respiratory Support
Step 1: Create Respiratory Support Mappings
Before transforming your respiratory support data, prepare the mapping files:
a. Device Category Mapping: Map device_name → device_category
b. Mode Category Mapping: Map mode_name → mode_category
c. Handling Multiple Source Measures:
Multiple flowsheet measure names can map to the same CLIF element (e.g., device_name). Include all matching measures and prioritize by counting unique hospitalizations where each appears.
Steps:
- Include all flowsheet measures that map to the CLIF element
- Pivot each measure into its own column (e.g., device_name_1, device_name_2, device_name_3)
- Use
coalesce()to select the first non-null value in priority order
Example:
| Source Measure Name | Hosp. Count | Priority | Pivoted Column |
|---|---|---|---|
| RT RS OXYGEN DEVICE | 100 | 1 | device_name_1 |
| RT O2 DELIVERY DEVICE | 75 | 2 | device_name_2 |
| ED RT RS OXYGEN DEVICE | 50 | 3 | device_name_3 |
Pivot and coalesce to get final value:
| patient_id | device_name_1 | device_name_2 | device_name_3 | device_name = COALESCE(device_name_1, device_name_2, device_name_3) |
|---|---|---|---|---|
| 123 | NULL | Nasal Cannula | Room Air | Nasal Cannula |
| 456 | High Flow NC | NULL | High Flow NC | High Flow NC |
| 789 | NULL | NULL | Room Air | Room Air |
Step 2: Filter to Inpatient Hospitalizations
Filter your respiratory support data to include only inpatient hospitalizations.
Step 3: Apply Mappings
Apply the device and mode mappings to your flowsheets data.
Step 4: Pivot Long to Wide
Pivot the long dataset to wide format. Data must be unique by index columns before pivoting:
index=["patient_id", "hospitalization_id", "recorded_dttm"],
columns="unique_names",
values="meas_value"
)
Note: Ensure no duplicate combinations of (patient_id, hospitalization_id, recorded_dttm, unique_names) before pivoting.
Step 5: Create Tracheostomy Variable
Create the tracheostomy variable by:
- • Identifying any fields capturing presence of tracheostomy
- • Checking if device_name contains "trach"
Special Case:
If device_name contains "Vent" (e.g., "Trach Collar: Vent") AND other respiratory settings are observed, then device_category ≠ "Trach Collar"
Step 6: Device Category Logic
Invasive Mechanical Ventilation (IMV):
!grepl('Venturi Mask', device_name, ignore.case = TRUE) ~ 'IMV'
Other Categories:
Mode-Based Category Override
Mode names can override device categories when conflicts exist:
Step 7: Unit Conversions
If your source data uses different units for device settings, convert to CLIF standards:
Example:
minute_vent_obs is required to be in liters. If source data is in mL, convert accordingly (divide by 1000).
Step 8: Sanity Check for IMV vs Non-IMV
Verify device_category matches expected settings. If a hospitalization has respiratory settings typically used for IMV, device_category should not be a non-IMV device like Room Air, Face Mask, Nasal Cannula, etc.
Non-IMV Expected Settings:
| Device | Mode | Req Settings | Req Obs |
|---|---|---|---|
| CPAP | PS/CPAP | fio2_set, peep_set | tidal_volume_obs, resp_rate_obs |
| Face Mask | - | lpm_set, fio2_set (possible) | - |
| High Flow NC | - | fio2_set, lpm_set | - |
| Nasal Cannula | - | lpm_set | - |
| NIPPV | Vol Support | fio2, peep, tv, rr, insp_time | - |
| NIPPV | PS/CPAP | fio2, peep, pip, rr, insp_time | - |
| Room Air | - | No settings | No obs |
| Trach Collar | - | lpm_set, fio2_set | No obs |
IMV Expected Settings by Mode (device_category == "IMV"):
| ventilator_setting | AC-VC | PS/CPAP | PC | PRVC | SIMV | Vol Sup |
|---|---|---|---|---|---|---|
| fio2_set | E | E | E | E | E | E |
| tidal_volume_set | E | E | P | E | ||
| resp_rate_set | E | E | E | E | ||
| pressure_control_set | E | P | ||||
| pressure_support_set | E | E | ||||
| flow_rate_set | P | P | ||||
| inspiratory_time_set | P | E | P | |||
| peep_set | E | E | E | E | E | E |
E = Expected | P = Possible | Empty = Not applicable
Step 9: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Verify device_category and mode_category are consistent
- ✓ Check IMV rows have expected observations populated
- ✓ Validate tracheostomy flag against device_name patterns
vitals Click to expand
- • Map vital_name → vital_category
- • Handle BP split (sbp/dbp extraction)
- • Standardize units (temp, spo2, weight, height)
- • Specify invasive vs non-invasive in meas_site_name
Vitals
Step 1: Create Vitals Mappings
Before transforming your vitals data, prepare the mapping file:
Vital Category Mapping: Map vital_name → vital_category
Step 2: Filter to Inpatient Hospitalizations
Filter your vitals data to include only inpatient hospitalizations.
Step 3: Apply Mappings
Apply the vital_category mappings to your flowsheets data.
Step 4: Blood Pressure Split
If blood pressure is not available separately as sbp and dbp, split to create the respective vital_category:
bp_df_pd['bp_split'] = bp_df_pd['vital_value'].str.strip().str.split('/')
bp_df_pd['sbp'] = bp_df_pd['bp_split'].str[0]
bp_df_pd['dbp'] = bp_df_pd['bp_split'].str[1]
Step 5: Standardize Vitals Units
Convert to CLIF standard units:
| Vital | Required Unit |
|---|---|
| temp_c | Celsius (°C) |
| spo2 | Percent (%) |
| map | mmHg |
| height_cm | Centimeters (cm) |
| weight_kg | Kilograms (kg) |
Step 6: Final Cleanup
- • Deduplicate: Remove duplicate rows
- • UTC Conversion: Convert all datetime fields to UTC timezone
- • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
- ✓ Check BP split created valid sbp/dbp values
- ✓ Validate meas_site_name populated for invasive measurements
Additional Resources
Ready to Start Your CLIF ETL Implementation?
Get hands-on support and connect with the CLIF community for guidance throughout your ETL journey.
