CLIF ETL Guide

General ETL guidance for transforming EHR data into CLIF format. Implementation details may vary by site depending on your source data structure.

Data Types & Formatting

Identifier Fields

All *_id variables must be character strings:

patient_id <- as.character(patient_id)

DateTime Handling

Convert all timestamps to UTC:

admit_dttm <- as_datetime(admit_dttm, tz = "UTC")

Numeric Values

Clean and validate numeric data:

value <- as.numeric(parse_number(value))

Data Quality & Validation

Essential Quality Checks

Check for duplicates using composite keys
Calculate missingness percentages by field
Validate date ranges and distributions
Remove rows with missing critical identifiers

Validation Tools

CLIF Assistant - AI-powered guidance
CLIFpy package - Python package

Terminology & Mapping

mCIDE Guidelines

Follow mCIDE (minimum Common ICU Data Elements) mapping guidelines for consistent data transformation across all institutions.

View mCIDE Documentation →

Sample Mapping

Example of mapping site-specific discharge names to standardized discharge_category values for the hospitalization table.

discharge_name (site)	discharge_category (CLIF)
Acute Rehab Facility	Acute Inpatient Rehab Facility
Rehab Facility - Inpatient	Acute Inpatient Rehab Facility
Short Term Hospital	Acute Care Hospital
Mental Health Jud Commit Anoka	Psychiatric Hospital
IRTS	Home

Table Specific Guidelines

adt Click to flip

• Location category & type mapping (mCIDE)
• Fix overlapping time intervals
• Deduplication & UTC conversion

Composite Keys: hospitalization_id, in_dttm

ADT - Admissions, Discharges, Transfers

Important Notes

• ADT represents the patient's physical location, NOT the patient "status"
• Procedural areas and operating rooms should be mapped to Procedural
• Pre/Intra/Post-procedural and OR EHR data (e.g., anesthesia flowsheet records from Labs, Vitals, Scores, Respiratory Support) are not currently represented in CLIF

Step 1: Create ADT Mappings

Before transforming your ADT data, prepare two mapping files:

a. Location Category Mapping: Map all site-specific location_name values to standardized CLIF location_category values

→ clif_adt_location_categories.csv

b. ICU Location Type Mapping: For ICU locations, map location_name to the appropriate location_type

→ clif_adt_location_type.csv

Step 2: Filter to Inpatient Hospitalizations

Filter your ADT data to include only inpatient hospitalizations. Exclude outpatient visits, observation stays, or other non-inpatient encounters.

Step 3: Apply Mappings

Join the mapping files created in Step 1 to your ADT table to add location_category and location_type columns.

Step 4: Fix Time Intervals

a. Remove zero-duration stays:

Filter out records where in_dttm == out_dttm

b. Fix overlapping intervals:

1. Generate unique time points per patient
2. Create non-overlapping intervals from consecutive time points
3. Map original records using foverlaps()
4. Resolve conflicts: select most recent when multiple locations exist

c. Merge consecutive same-location stays:

Use run-length encoding to group and merge consecutive identical locations

Step 5: Create hospital_type Column

Assign the appropriate hospital_type value from the CLIF permissible set based on your institution's characteristics.

Step 6: Final Cleanup

• Deduplicate: Remove duplicates by composite key hospitalization_id, in_dttm
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Ensure out_dttm > in_dttm for all records
✓ Validate location_category mappings against mCIDE
✓ Flag cases where ED visits occur after ICU/Ward
✓ Check for remaining duplicates by composite key

code_status Click to flip

• Code status category mapping (mCIDE)
• Deduplication & UTC conversion

Code Status

Important Notes

• This table should represent what the clinician intends for the patients' limitations in care, not just what the EHR banner displays
• It is NOT always equivalent to the code status display name in the EHR
• Sites will need to identify their code status orders. Additionally, sites may have separate orders for "comfort measures only" not found within the code status order that, if available, should be pulled to map it to allow_natural_death

Step 1: Create Code Status Mappings

Before transforming your code status data, prepare the mapping file:

Code Status Category Mapping: Map all site-specific code_status_name values to standardized CLIF code_status_category values

→ clif_code_status_categories.csv

Step 2: Filter to Patients with Inpatient Hospitalizations

Filter your code status data to include only patients with an inpatient hospitalization. Since code status is at the patient level, include only patients whose hospitalization_id exists in your CLIF ADT table.

Step 3: Apply Mappings

Join the mapping file created in Step 1 to your code status table to add the code_status_category column.

Step 4: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Validate code_status_category mappings against mCIDE
✓ Ensure all hospitalization_id values exist in ADT table

crrt_therapy Click to flip

• CRRT mode category mapping (mCIDE)
• Unit conversions to CLIF standards
• Verify missingness by modality

CRRT Therapy

Step 1: Create CRRT Mode Mappings

Before transforming your CRRT data, prepare the mapping file:

Mode Category Mapping: Map all site-specific crrt_mode_name values to standardized CLIF crrt_mode_category values

→ clif_crrt_therapy_mode_categories.csv

Step 2: Filter to Inpatient Hospitalizations

Filter your CRRT data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping file created in Step 1 to your CRRT table to add the crrt_mode_category column.

Step 4: Unit Conversions

Critical Requirement:

All CRRT parameters must be reported using CLIF units. Please refer to the data dictionary page for CRRT setting units.

If your source data uses different units, convert to CLIF standards. Eg:

Parameter	CLIF Unit	Conversion Example
`blood_flow_rate`	mL/min	0.2 L/min → 200 mL/min
`dialysate_flow_rate`	mL/hr	2 L/hr → 2000 mL/hr
`ultrafiltration_out`	mL/hr	1.5 L/hr → 1500 mL/hr

Step 5: Verify Missingness by Modality

Validate that parameter availability matches expected patterns for each CRRT modality:

Modality	Blood Flow	Pre-Filter	Post-Filter	Dialysate	UF Out
SCUF	E				E
CVVH	E	E	E		E
CVVHD	E			E	E
CVVHDF	E	E	E	E	E
AVVH	E	P	P		E
AVVHD	E			P	E
AVVHF	E	P	P	P	E

E = Expected | P = Possible | Empty = Not applicable

Step 6: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Validate crrt_mode_category mappings against mCIDE
✓ Verify parameter missingness aligns with modality expectations
✓ Check unit conversions are applied correctly

hospital_diagnosis Click to flip

• Primary vs secondary diagnosis assignment
• Present on admission (POA) filtering

Hospital Diagnosis

Important Notes

• This table only includes finalized billing diagnosis codes for hospital reimbursement
• All other diagnosis codes for a patient are included in the concept table patient_diagnosis

Step 1: Filter to Inpatient Hospitalizations

Filter your hospital diagnosis data to include only inpatient hospitalizations.

Step 2: Create Primary Diagnosis Flag

If your source data has diagnosis ranking, create the diagnosis_primary column:

1 = Primary diagnosis (rank = 1)
0 = Secondary diagnosis (rank ≥ 2)

Step 3: Filter and Assign POA Values

Only include rows where poa_present is definitively Yes or No:

Exclude rows with:

Unknown, Null, Missing, Clinically Undetermined, Exempt from POA Reporting, Unspecified, etc.

Assign values:

1 = Yes (present on admission)
0 = No (not present on admission)

No other values are permitted.

Step 4: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Verify diagnosis_primary contains only 0 or 1
✓ Verify poa_present contains only 0 or 1
✓ Validate diagnosis codes are in expected format (ICD-10, etc.)

hospitalization Click to flip

• Admission & discharge category mapping (mCIDE)
• Age calculation & filtering
• Geographic variable cleaning (FIPS codes)

Composite Key: hospitalization_id

Hospitalization

Important Notes

• If a patient is discharged to Home/Hospice, then discharge_category == Hospice
• Geographic indicators (zipcode_nine_digit, census_block_code, etc.) should be added if available. zipcode_nine_digit is preferred over zipcode_five_digit; census_block_code is ideal for census-based indicators
• If a patient is transferred between different hospitals within a health system, create a new hospitalization_id
• If a patient is seen in ER at hospital A then admitted to inpatient at hospital B, use one hospitalization_id for both stays
• A hospitalization_joined_id can be created from contiguous hospitalization_ids

Step 1: Create Hospitalization Mappings

Before transforming your hospitalization data, prepare the mapping files:

a. Admission Type Mapping: Map admission_type_name to admission_type_category

→ clif_hospitalization_admission_type_categories.csv

b. Discharge Mapping: Map discharge_name to discharge_category

→ clif_hospitalization_discharge_categories.csv

Step 2: Filter to Inpatient Hospitalizations

Filter your hospitalization data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping files created in Step 1 to add admission_type_category and discharge_category columns.

Step 4: Age Calculation & Filtering

• If age_at_admission is not present, calculate it using birth_date
• Drop encounters where age > 120 years or < 18 years

Step 5: Clean Geographic Variables

Clean and standardize FIPS codes by removing spaces, dashes, and special characters:

Variable	Digits	Description
`state_code`	2	State FIPS
`county_code`	5	State + County
`census_tract`	11	State + County + Tract
`census_block_group_code`	12	Tract + Block Group
`census_block_code`	15	Full Block ID

Derive from census_block_id:

state_code = census_block_id[0:2]

county_code = census_block_id[0:5]

census_tract = census_block_id[0:11]

census_block_group_code = census_block_id[0:12]

fips_version:

Year of Census geography definitions (2000, 2010, 2020)

Step 6: Final Cleanup

• Deduplicate: Remove duplicate rows by composite key hospitalization_id
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Ensure discharge_dttm ≥ admission_dttm
✓ Verify FIPS codes have correct digit lengths
✓ Validate category mappings against mCIDE

labs Click to flip

• Lab order & category mapping (mCIDE)
• Blood/plasma/serum specimens only
• Unit standardization to CLIF reference units
• Numeric value parsing

Labs

Step 1: Create Lab Mappings

Before transforming your lab data, prepare the mapping files:

a. Lab Order Mapping: Map lab_order_name to lab_order_category

→ clif_labs_order_categories.csv

b. Lab Category Mapping: Map lab_name to lab_category

→ clif_lab_categories.csv

Step 2: Filter to Inpatient Hospitalizations

Filter your lab data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping files created in Step 1 to add lab_order_category and lab_category columns.

Step 4: Filter Specimen Types

Include only:

Blood, Plasma, Serum samples

Exclude:

Urine and other fluid specimens

Step 5: Merge Sensitive Lab Variants

Important:

Map both regular and highly sensitive lab assays to the same lab_category.

Examples:

• Troponin I (ng/mL) + hs-Troponin I (ng/L) → troponin_i
• CRP (mg/L) + hs-CRP (mg/L) → crp

Note: Unit conversions will be handled in the next step.

Step 6: Standardize Reference Units

Critical Requirement:

All lab values must be reported using CLIF reference units. Only listed reference units are permissible for respective lab categories. Sites must convert raw values during ETL.

Reference units for each lab category:

→ clif_lab_categories.csv (reference_unit column)

Step 7: Create Numeric Value Field

Create lab_value_numeric by parsing the character field lab_value to extract only the numeric part of the string.

lab_value_numeric = parse_number(lab_value)

Step 8: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Validate lab_category mappings against mCIDE
✓ Verify all units match CLIF reference units
✓ Check lab_value_numeric parsing accuracy

medication_admin_continuous Click to flip

• MAR action & med category mapping (mCIDE)
• Filter continuous medications
• Combination drug handling
• Trial drugs & placebos

Medication Admin Continuous

Step 1: Create Medication Mappings

Before transforming your medication data, prepare the mapping files:

a. Action Category Mapping: Map mar_action_name → mar_action_category

→ clif_medication_admin_continuous_action_categories.csv

b. Med Category Mapping: Map med_name → med_category

→ clif_medication_admin_continuous_med_categories.csv

c. Route Category Mapping: Map med_route_name → med_route_category

→ clif_medication_admin_continuous_med_route_categories.csv

Step 2: Filter to Inpatient Hospitalizations

Filter your medication data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping files to add category columns.

Step 4: Filter Continuous Medications

Include only medications with time-based dosing units:

filter(grepl("min|hr|day", med_dose_unit, ignore.case = TRUE))

Step 5: Special Case - Inhaled Nitric Oxide

If inhaled nitric oxide is not in your medication admin table, refer to flowsheets data. The order typically goes in as an RT order (not medication order) and is documented in flowsheets rather than MAR.

Step 6: Handle Combination & Trial Drugs

A. Combination Drugs

When to duplicate rows:

Only duplicate combo meds that are not in the mCIDE as a med_category itself. Assign a duplicated row for each component med_category.

Example: "MedA MedB 5-325mg Tab"

If medA_medB is not a CLIF med_category, but medA and medB are separate categories:

→ Row 1: med_category = "medA", med_dose = 5mg

→ Row 2: med_category = "medB", med_dose = 325mg

Both rows maintain same: hospitalization_id, admin_dttm, med_name

Unit Conversion (mL → mg):

If original unit is mL, convert to mg using concentration from med_name.

e.g., "MedA-MedB 7.5-325 MG/15 ML" at 30 mL → 15mg MedA, 650mg MedB

B. Trial Drugs & Placebos

Map trial drugs to trial_drug as med_category. Leave med_group empty if with placebo.

Example:

"ACYCLOVIR OR PLACEBO CAPSULE (IRB 171591) 400 MG"

→ med_category: trial_drug

→ med_group: None

• Brand Names: Map to generic, preserve brand in notes

Step 7: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Validate med_category mappings against mCIDE
✓ Verify continuous med filter captures time-based units
✓ Check combo drug rows have correct dose splits

medication_admin_intermittent Click to flip

• MAR action & med category mapping (mCIDE)
• Filter intermittent medications
• Combination drug handling
• Trial drugs & placebos

Medication Admin Intermittent

Step 1: Create Medication Mappings

Before transforming your medication data, prepare the mapping files:

a. Action Category Mapping: Map mar_action_name → mar_action_category

→ clif_medication_admin_intermittent_action_categories.csv

b. Med Category Mapping: Map med_name → med_category

→ clif_medication_admin_intermittent_med_categories.csv

c. Route Category Mapping: Map med_route_name → med_route_category

→ clif_medication_admin_intermittent_med_route_categories.csv

Step 2: Filter to Inpatient Hospitalizations

Filter your medication data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping files to add category columns.

Step 4: Filter Intermittent Medications

Exclude medications with time-based dosing units (keep only intermittent):

filter(!grepl("min|hr|day", med_dose_unit, ignore.case = TRUE))

Step 5: Special Case - Inhaled Nitric Oxide

Step 6: Handle Combination & Trial Drugs

A. Combination Drugs

When to duplicate rows:

Only duplicate combo meds that are not in the mCIDE as a med_category itself. Assign a duplicated row for each component med_category.

Example: "MedA MedB 5-325mg Tab"

If medA_medB is not a CLIF med_category, but medA and medB are separate categories:

→ Row 1: med_category = "medA", med_dose = 5mg

→ Row 2: med_category = "medB", med_dose = 325mg

Both rows maintain same: hospitalization_id, admin_dttm, med_name

Unit Conversion (mL → mg):

If original unit is mL, convert to mg using concentration from med_name.

e.g., "MedA-MedB 7.5-325 MG/15 ML" at 30 mL → 15mg MedA, 650mg MedB

B. Trial Drugs & Placebos

Map trial drugs to trial_drug as med_category. Leave med_group empty if with placebo.

Example:

"ACYCLOVIR OR PLACEBO CAPSULE (IRB 171591) 400 MG"

→ med_category: trial_drug

→ med_group: None

• Brand Names: Map to generic, preserve brand in notes

Step 7: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Validate med_category mappings against mCIDE
✓ Verify intermittent med filter excludes time-based units
✓ Check combo drug rows have correct dose splits

patient Click to flip

• Demographics mapping (mCIDE)
• Handle missing category values
• One record per patient_id

Composite Key: patient_id

Patient

Step 1: Create Patient Mappings

Before transforming your patient data, prepare the mapping files:

a. Race Category Mapping: Map race_name → race_category

→ clif_patient_race_categories.csv

b. Sex Category Mapping: Map sex_name → sex_category

→ clif_patient_sex_categories.csv

c. Ethnicity Category Mapping: Map ethnicity_name → ethnicity_category

→ clif_patient_ethinicity_categories.csv

d. Language Category Mapping: Map language_name → language_category

→ clif_patient_language_categories.csv

Step 2: Filter to Inpatient Patients

Filter your patient data to include only patients whose hospitalization_id exists in your CLIF ADT table.

Step 3: Apply Mappings & Handle Missing Values

Join the mapping files to add category columns, then fill missing values:

Fill missing/null values:

• race_category → "Unknown"
• sex_category → "Unknown"
• ethnicity_category → "Unknown"
• language_category → "Unknown or NA"

Step 4: Handle External Death Data Sources

If using an external source like a state vital records registry to extract death_dttm and the death records are only in date format (no timestamp):

1. Same day as discharge: If the external death date matches the patient's discharge_dttm date, use the discharge_dttm as the death_dttm
2. After discharge: If the death date occurs after discharge_dttm, use the death date and add midnight as the timestamp (format: YYYY-MM-DD 00:00:00+00:00)

Step 5: Final Cleanup

• Deduplicate: Remove duplicate rows by composite key patient_id. Ensure one record per patient_id
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Ensure one record per patient_id
✓ Validate all category mappings against mCIDE
✓ Verify no null values remain in category fields

patient_assessments Click to flip

• Assessment category mapping (mCIDE)
• Value type casting (numeric/categorical/text)

Patient Assessments

Step 1: Create Assessment Mappings

Before transforming your assessment data, prepare the mapping file:

Assessment Category Mapping: Map assessment_name → assessment_category

→ clif_patient_assessment_categories.csv

Step 2: Filter to Inpatient Hospitalizations

Filter your assessment data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping file to add assessment_category and assessment_group columns.

Step 4: Create Value Type Columns

Identify and cast the measured value for each assessment to the appropriate column:

• numerical_value → Cast numeric assessment values
• categorical_value → Cast categorical/ordinal values
• text_value → Cast free-text values

Step 5: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Validate assessment_category mappings against mCIDE
✓ Verify value type casting is correct for each assessment

patient_procedures Click to flip

• CPT codes (HCPCS Level 1) only
• ICD-10-PCS procedure codes
• Filter to relevant procedure codes

Patient Procedures

Important Notes

• Contains standardized procedural billing codes (HCPCS) for hospitalizations
• Only includes procedures that were performed/completed (not cancelled)
• Includes: CPT codes billed by clinicians (HCPCS Level 1 - professional billing)
• Excludes: Hospital billing codes (HCPCS Level 2 - products, supplies, services without CPT codes)
• Also contains ICD-10-PCS procedure codes (used for DRG calculation, may appear in hospital_diagnosis)

Step 1: Identify Relevant Procedure Codes

Identify the relevant procedure codes required for CLIF:

→ clif_patient_procedure_codes.csv

Step 2: Filter Data

Filter to inpatient hospitalizations and only include the relevant procedure codes identified in Step 1.

Step 3: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Date-Only Columns

When your source data only contains a date (no time component):

• Set the time to midnight (00:00:00) in your local timezone
• Make it timezone-aware
• Then convert to UTC

Example: A date of 2021-08-26 becomes 2021-08-26 00:00:00 local time, which converts to 2021-08-26 05:00:00+00:00 UTC (for US/Central timezone).

Data Quality Checks

✓ Verify procedure codes are valid CPT or ICD-10-PCS codes
✓ Ensure only completed procedures are included

position Click to expand

• Free text position standardization
• Bed angle/position exclusion
• Prone position detection
• Supine-type consolidation
• Missing label handling

Position

Step 1: Identify Relevant Position Measurements

Identify relevant position measurement names from flowsheets and filter to them:

Example flowsheet names:

• "NUR RS POSITION"

Step 2: Filter to Inpatient Hospitalizations

Filter your position data to include only inpatient hospitalizations.

Step 3: Separate Patient Position from Bed Position

Patient position and bed position are distinct concepts and should be treated as separate categories. Patient position refers to the patient's body orientation (prone, supine, sitting, etc.), while bed position refers to bed angles and configurations (Trendelenburg, 30 degrees, etc.). Only patient position should be mapped to position_category.

Bed Position (exclude):

• Trendelenburg, Reverse Trendelenburg

• Bed angle degrees (30°, 45°, etc.)

• Head of bed elevation

Combined Records:

Sometimes both are recorded together in a single field. Extract only the patient position component:

• "prone/trendelenburg" → extract "prone"

• "supine, HOB 30 degrees" → extract "supine"

• "semi-fowlers/reverse trendelenburg" → extract "semi-fowlers"

Step 4: Identify Prone Positions

Prone detection using regex:

mutate(prone = if_else(str_detect(position_name,
regex("prone", ignore_case = TRUE)), 1, 0))

Step 5: Identify Supine-Type Positions

supine = if_else(str_detect(position_name, regex("supin|supine", ignore_case = TRUE)), 1, NaN)

fowlers = if_else(str_detect(position_name, regex("fowler", ignore_case = TRUE)), 1, NaN)

sitting = if_else(str_detect(position_name, regex("sitting", ignore_case = TRUE)), 1, NaN)

Step 6: Consolidate Supine-Type

Group related positions (Fowlers, Sitting) as supine-type.

Step 7: Handle Missing Labels

supine = if_else(is.na(supine) & prone == 0, 1, supine)

supine = if_else(is.na(supine) & prone == 1, 0, supine)

Position Categories

Prone Positions:

• "Prone" / "Proning" / "Prone positioning"

• "Face Down"

• "Prone Positioning for ARDS"

• "Prone Ventilation"

• "Swimming Position"

Supine-Type Positions:

• "Supine"

• "Fowlers" / "Semi-Fowlers"

• "Sitting"

Step 8: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Verify prone/supine are mutually exclusive (if prone=1, then supine=0)
✓ Check position categories are correctly classified

respiratory_support Click to expand

• Map device_name → device_category
• Map mode_name → mode_category
• Pivot flowsheets from long to wide
• Create tracheostomy variable
• Validate IMV vs Non-IMV settings

Respiratory Support

Step 1: Create Respiratory Support Mappings

Before transforming your respiratory support data, prepare the mapping files:

a. Device Category Mapping: Map device_name → device_category

→ clif_respiratory_support_device_categories.csv

b. Mode Category Mapping: Map mode_name → mode_category

→ clif_respiratory_support_mode_categories.csv

c. Handling Multiple Source Measures:

Multiple flowsheet measure names can map to the same CLIF element (e.g., device_name). Include all matching measures and prioritize by counting unique hospitalizations where each appears.

Steps:

Include all flowsheet measures that map to the CLIF element
Pivot each measure into its own column (e.g., device_name_1, device_name_2, device_name_3)
Use coalesce() to select the first non-null value in priority order

COALESCE(device_name_1, device_name_2, device_name_3) AS device_name

Example:

Source Measure Name	Hosp. Count	Priority	Pivoted Column
RT RS OXYGEN DEVICE	100	1	device_name_1
RT O2 DELIVERY DEVICE	75	2	device_name_2
ED RT RS OXYGEN DEVICE	50	3	device_name_3

Pivot and coalesce to get final value:

patient_id	device_name_1	device_name_2	device_name_3	device_name = COALESCE(device_name_1, device_name_2, device_name_3)
123	NULL	Nasal Cannula	Room Air	Nasal Cannula
456	High Flow NC	NULL	High Flow NC	High Flow NC
789	NULL	NULL	Room Air	Room Air

Step 2: Filter to Inpatient Hospitalizations

Filter your respiratory support data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Apply the device and mode mappings to your flowsheets data.

Step 4: Pivot Long to Wide

Pivot the long dataset to wide format. Data must be unique by index columns before pivoting:

wide_resp_df = resp_df.pivot(
  index=["patient_id", "hospitalization_id", "recorded_dttm"],
  columns="unique_names",
  values="meas_value"
)

Note: Ensure no duplicate combinations of (patient_id, hospitalization_id, recorded_dttm, unique_names) before pivoting.

Step 5: Create Tracheostomy Variable

Create the tracheostomy variable by:

• Identifying any fields capturing presence of tracheostomy
• Checking if device_name contains "trach"

Special Case:

If device_name contains "Vent" (e.g., "Trach Collar: Vent") AND other respiratory settings are observed, then device_category ≠ "Trach Collar"

Step 6: Device Category Logic

Invasive Mechanical Ventilation (IMV):

grepl('Vent', device_name, ignore.case = TRUE) &
!grepl('Venturi Mask', device_name, ignore.case = TRUE) ~ 'IMV'

Other Categories:

NIPPV|Bipap → 'NIPPV'

CPAP → 'CPAP'

High Flow NC → 'High Flow NC'

Cannula → 'Nasal Cannula'

Mask|Non-Rebreather → 'Face Mask'

Room Air → 'Room Air'

Trach Collar → 'Trach Collar'

Mode-Based Category Override

Mode names can override device categories when conflicts exist:

"SIMV - PC PS" → "IMV"

"A/C Volume" → "IMV"

"CPAP" → "CPAP"

"NIV-PC" → "NIPPV"

"BiPAP" → "NIPPV"

Step 7: Unit Conversions

If your source data uses different units for device settings, convert to CLIF standards:

Example:

minute_vent_obs is required to be in liters. If source data is in mL, convert accordingly (divide by 1000).

Step 8: Sanity Check for IMV vs Non-IMV

Verify device_category matches expected settings. If a hospitalization has respiratory settings typically used for IMV, device_category should not be a non-IMV device like Room Air, Face Mask, Nasal Cannula, etc.

Non-IMV Expected Settings:

Device	Mode	Req Settings	Req Obs
CPAP	PS/CPAP	fio2_set, peep_set	tidal_volume_obs, resp_rate_obs
Face Mask	-	lpm_set, fio2_set (possible)	-
High Flow NC	-	fio2_set, lpm_set	-
Nasal Cannula	-	lpm_set	-
NIPPV	Vol Support	fio2, peep, tv, rr, insp_time	-
NIPPV	PS/CPAP	fio2, peep, pip, rr, insp_time	-
Room Air	-	No settings	No obs
Trach Collar	-	lpm_set, fio2_set	No obs

IMV Expected Settings by Mode (device_category == "IMV"):

ventilator_setting	AC-VC	PS/CPAP	PC	PRVC	SIMV	Vol Sup
fio2_set	E	E	E	E	E	E
tidal_volume_set	E			E	P	E
resp_rate_set	E		E	E	E
pressure_control_set			E		P
pressure_support_set		E			E
flow_rate_set	P				P
inspiratory_time_set	P		E		P
peep_set	E	E	E	E	E	E

E = Expected | P = Possible | Empty = Not applicable

Step 9: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Verify device_category and mode_category are consistent
✓ Check IMV rows have expected observations populated
✓ Validate tracheostomy flag against device_name patterns

vitals Click to expand

• Map vital_name → vital_category
• Handle BP split (sbp/dbp extraction)
• Standardize units (temp, spo2, weight, height)
• Specify invasive vs non-invasive in meas_site_name

Vitals

Step 1: Create Vitals Mappings

Before transforming your vitals data, prepare the mapping file:

Vital Category Mapping: Map vital_name → vital_category

→ clif_vitals_categories.csv

Step 2: Filter to Inpatient Hospitalizations

Filter your vitals data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Apply the vital_category mappings to your flowsheets data.

sbp: Systolic blood pressure. Can include both invasive (arterial line) and non-invasive (cuff) measurements; specify in meas_site_name.

dbp: Diastolic blood pressure. Can include both invasive (arterial line) and non-invasive (cuff) measurements; specify in meas_site_name.

map: Mean arterial pressure. Can include both invasive and non-invasive. Source type should be indicated in meas_site_name.

Step 4: Blood Pressure Split

If blood pressure is not available separately as sbp and dbp, split to create the respective vital_category:

bp_df = filtered_df.filter(pl.col("vital_category") == "blood_pressure")
bp_df_pd['bp_split'] = bp_df_pd['vital_value'].str.strip().str.split('/')
bp_df_pd['sbp'] = bp_df_pd['bp_split'].str[0]
bp_df_pd['dbp'] = bp_df_pd['bp_split'].str[1]

Step 5: Standardize Vitals Units

Convert to CLIF standard units:

Vital	Required Unit
temp_c	Celsius (°C)
spo2	Percent (%)
map	mmHg
height_cm	Centimeters (cm)
weight_kg	Kilograms (kg)

Step 6: Final Cleanup

• Deduplicate: Remove duplicate rows
• UTC Conversion: Convert all datetime fields to UTC timezone
• Data Types: Ensure all variables match data dictionary specifications

Data Quality Checks

✓ Check BP split created valid sbp/dbp values
✓ Validate meas_site_name populated for invasive measurements

Ready to Start Your CLIF ETL Implementation?

Get hands-on support and connect with the CLIF community for guidance throughout your ETL journey.

Try CLIF Assistant