Skip to main content
CLIF Logo

CLIF ETL Guide

General ETL guidance for transforming EHR data into CLIF format. Implementation details may vary by site depending on your source data structure.

Data Types & Formatting

Identifier Fields

All *_id variables must be character strings:

patient_id <- as.character(patient_id)

DateTime Handling

Convert all timestamps to UTC:

admit_dttm <- as_datetime(admit_dttm, tz = "UTC")

Numeric Values

Clean and validate numeric data:

value <- as.numeric(parse_number(value))

Data Quality & Validation

Essential Quality Checks

  • Check for duplicates using composite keys
  • Calculate missingness percentages by field
  • Validate date ranges and distributions
  • Remove rows with missing critical identifiers

Validation Tools

Terminology & Mapping

mCIDE Guidelines

Follow mCIDE (minimum Common ICU Data Elements) mapping guidelines for consistent data transformation across all institutions.

View mCIDE Documentation →

Sample Mapping

Example of mapping site-specific discharge names to standardized discharge_category values for the hospitalization table.

discharge_name (site)discharge_category (CLIF)
Acute Rehab FacilityAcute Inpatient Rehab Facility
Rehab Facility - InpatientAcute Inpatient Rehab Facility
Short Term HospitalAcute Care Hospital
Mental Health Jud Commit AnokaPsychiatric Hospital
IRTSHome

Table Specific Guidelines

adt Click to flip

  • • Location category & type mapping (mCIDE)
  • • Fix overlapping time intervals
  • • Deduplication & UTC conversion
Composite Keys: hospitalization_id, in_dttm

ADT - Admissions, Discharges, Transfers

Important Notes
  • • ADT represents the patient's physical location, NOT the patient "status"
  • • Procedural areas and operating rooms should be mapped to Procedural
  • • Pre/Intra/Post-procedural and OR EHR data (e.g., anesthesia flowsheet records from Labs, Vitals, Scores, Respiratory Support) are not currently represented in CLIF
Step 1: Create ADT Mappings

Before transforming your ADT data, prepare two mapping files:

a. Location Category Mapping: Map all site-specific location_name values to standardized CLIF location_category values

→ clif_adt_location_categories.csv

b. ICU Location Type Mapping: For ICU locations, map location_name to the appropriate location_type

→ clif_adt_location_type.csv
Step 2: Filter to Inpatient Hospitalizations

Filter your ADT data to include only inpatient hospitalizations. Exclude outpatient visits, observation stays, or other non-inpatient encounters.

Step 3: Apply Mappings

Join the mapping files created in Step 1 to your ADT table to add location_category and location_type columns.

Step 4: Fix Time Intervals

a. Remove zero-duration stays:

Filter out records where in_dttm == out_dttm

b. Fix overlapping intervals:

  1. 1. Generate unique time points per patient
  2. 2. Create non-overlapping intervals from consecutive time points
  3. 3. Map original records using foverlaps()
  4. 4. Resolve conflicts: select most recent when multiple locations exist

c. Merge consecutive same-location stays:

Use run-length encoding to group and merge consecutive identical locations

Step 5: Create hospital_type Column

Assign the appropriate hospital_type value from the CLIF permissible set based on your institution's characteristics.

Step 6: Final Cleanup
  • Deduplicate: Remove duplicates by composite key hospitalization_id, in_dttm
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Ensure out_dttm > in_dttm for all records
  • Validate location_category mappings against mCIDE
  • Flag cases where ED visits occur after ICU/Ward
  • Check for remaining duplicates by composite key

code_status Click to flip

  • • Code status category mapping (mCIDE)
  • • Deduplication & UTC conversion

Code Status

Important Notes
  • • This table contains only code status orders placed by clinicians
  • • It is NOT equivalent to the code status display name in the EMR
Step 1: Create Code Status Mappings

Before transforming your code status data, prepare the mapping file:

Code Status Category Mapping: Map all site-specific code_status_name values to standardized CLIF code_status_category values

→ clif_code_status_categories.csv
Step 2: Filter to Patients with Inpatient Hospitalizations

Filter your code status data to include only patients with an inpatient hospitalization. Since code status is at the patient level, include only patients whose hospitalization_id exists in your CLIF ADT table.

Step 3: Apply Mappings

Join the mapping file created in Step 1 to your code status table to add the code_status_category column.

Step 4: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Validate code_status_category mappings against mCIDE
  • Ensure all hospitalization_id values exist in ADT table

crrt_therapy Click to flip

  • • CRRT mode category mapping (mCIDE)
  • • Unit conversions to CLIF standards
  • • Verify missingness by modality

CRRT Therapy

Step 1: Create CRRT Mode Mappings

Before transforming your CRRT data, prepare the mapping file:

Mode Category Mapping: Map all site-specific crrt_mode_name values to standardized CLIF crrt_mode_category values

→ clif_crrt_therapy_mode_categories.csv
Step 2: Filter to Inpatient Hospitalizations

Filter your CRRT data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping file created in Step 1 to your CRRT table to add the crrt_mode_category column.

Step 4: Unit Conversions

Critical Requirement:

All CRRT parameters must be reported using CLIF units. Please refer to the data dictionary page for CRRT setting units.

If your source data uses different units, convert to CLIF standards. Eg:

ParameterCLIF UnitConversion Example
blood_flow_ratemL/min0.2 L/min → 200 mL/min
dialysate_flow_ratemL/hr2 L/hr → 2000 mL/hr
ultrafiltration_outmL/hr1.5 L/hr → 1500 mL/hr
Step 5: Verify Missingness by Modality

Validate that parameter availability matches expected patterns for each CRRT modality:

ModalityBlood FlowPre-FilterPost-FilterDialysateUF Out
SCUFEE
CVVHEEEE
CVVHDEEE
CVVHDFEEEEE
AVVHEPPE
AVVHDEPE
AVVHFEPPPE

E = Expected | P = Possible | Empty = Not applicable

Step 6: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Validate crrt_mode_category mappings against mCIDE
  • Verify parameter missingness aligns with modality expectations
  • Check unit conversions are applied correctly

hospital_diagnosis Click to flip

  • • Primary vs secondary diagnosis assignment
  • • Present on admission (POA) filtering

Hospital Diagnosis

Important Notes
  • • This table only includes finalized billing diagnosis codes for hospital reimbursement
  • • All other diagnosis codes for a patient are included in the concept table patient_diagnosis
Step 1: Filter to Inpatient Hospitalizations

Filter your hospital diagnosis data to include only inpatient hospitalizations.

Step 2: Create Primary Diagnosis Flag

If your source data has diagnosis ranking, create the diagnosis_primary column:

  • 1 = Primary diagnosis (rank = 1)
  • 0 = Secondary diagnosis (rank ≥ 2)
Step 3: Filter and Assign POA Values

Only include rows where poa_present is definitively Yes or No:

Exclude rows with:

Unknown, Null, Missing, Clinically Undetermined, Exempt from POA Reporting, Unspecified, etc.

Assign values:

  • 1 = Yes (present on admission)
  • 0 = No (not present on admission)

No other values are permitted.

Step 4: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Verify diagnosis_primary contains only 0 or 1
  • Verify poa_present contains only 0 or 1
  • Validate diagnosis codes are in expected format (ICD-10, etc.)

hospitalization Click to flip

  • • Admission & discharge category mapping (mCIDE)
  • • Age calculation & filtering
  • • Geographic variable cleaning (FIPS codes)
Composite Key: hospitalization_id

Hospitalization

Important Notes
  • • If a patient is discharged to Home/Hospice, then discharge_category == Hospice
  • • Geographic indicators (zipcode_nine_digit, census_block_code, etc.) should be added if available. zipcode_nine_digit is preferred over zipcode_five_digit; census_block_code is ideal for census-based indicators
  • • If a patient is transferred between different hospitals within a health system, create a new hospitalization_id
  • • If a patient is seen in ER at hospital A then admitted to inpatient at hospital B, use one hospitalization_id for both stays
  • • A hospitalization_joined_id can be created from contiguous hospitalization_ids
Step 1: Create Hospitalization Mappings

Before transforming your hospitalization data, prepare the mapping files:

a. Admission Type Mapping: Map admission_type_name to admission_type_category

→ clif_hospitalization_admission_type_categories.csv

b. Discharge Mapping: Map discharge_name to discharge_category

→ clif_hospitalization_discharge_categories.csv
Step 2: Filter to Inpatient Hospitalizations

Filter your hospitalization data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping files created in Step 1 to add admission_type_category and discharge_category columns.

Step 4: Age Calculation & Filtering
  • • If age_at_admission is not present, calculate it using birth_date
  • Drop encounters where age > 120 years or < 18 years
Step 5: Clean Geographic Variables

Clean and standardize FIPS codes by removing spaces, dashes, and special characters:

VariableDigitsDescription
state_code2State FIPS
county_code5State + County
census_tract11State + County + Tract
census_block_group_code12Tract + Block Group
census_block_code15Full Block ID

Derive from census_block_id:

state_code = census_block_id[0:2]
county_code = census_block_id[0:5]
census_tract = census_block_id[0:11]
census_block_group_code = census_block_id[0:12]

fips_version:

Year of Census geography definitions (2000, 2010, 2020)

Step 6: Final Cleanup
  • Deduplicate: Remove duplicate rows by composite key hospitalization_id
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Ensure discharge_dttmadmission_dttm
  • Verify FIPS codes have correct digit lengths
  • Validate category mappings against mCIDE

labs Click to flip

  • • Lab order & category mapping (mCIDE)
  • • Blood/plasma/serum specimens only
  • • Unit standardization to CLIF reference units
  • • Numeric value parsing

Labs

Step 1: Create Lab Mappings

Before transforming your lab data, prepare the mapping files:

a. Lab Order Mapping: Map lab_order_name to lab_order_category

→ clif_labs_order_categories.csv

b. Lab Category Mapping: Map lab_name to lab_category

→ clif_lab_categories.csv
Step 2: Filter to Inpatient Hospitalizations

Filter your lab data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping files created in Step 1 to add lab_order_category and lab_category columns.

Step 4: Filter Specimen Types

Include only:

Blood, Plasma, Serum samples

Exclude:

Urine and other fluid specimens

Step 5: Standardize Reference Units

Critical Requirement:

All lab values must be reported using CLIF reference units. Only listed reference units are permissible for respective lab categories. Sites must convert raw values during ETL.

Reference units for each lab category:

→ clif_lab_categories.csv (reference_unit column)
Step 6: Create Numeric Value Field

Create lab_value_numeric by parsing the character field lab_value to extract only the numeric part of the string.

lab_value_numeric = parse_number(lab_value)
Step 7: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Validate lab_category mappings against mCIDE
  • Verify all units match CLIF reference units
  • Check lab_value_numeric parsing accuracy

medication_admin_continuous Click to flip

  • • MAR action & med category mapping (mCIDE)
  • • Filter continuous medications
  • • Combination drug handling
  • • Trial drugs & placebos

Medication Admin Continuous

Step 1: Create Medication Mappings

Before transforming your medication data, prepare the mapping files:

a. Action Category Mapping: Map mar_action_namemar_action_category

→ clif_medication_admin_continuous_action_categories.csv

b. Med Category Mapping: Map med_namemed_category

→ clif_medication_admin_continuous_med_categories.csv

c. Route Category Mapping: Map med_route_namemed_route_category

→ clif_medication_admin_continuous_med_route_categories.csv
Step 2: Filter to Inpatient Hospitalizations

Filter your medication data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping files to add category columns.

Step 4: Filter Continuous Medications

Include only medications with time-based dosing units:

filter(grepl("min|hr|day", med_dose_unit, ignore.case = TRUE))
Step 5: Special Case - Inhaled Nitric Oxide

If inhaled nitric oxide is not in your medication admin table, refer to flowsheets data. The order typically goes in as an RT order (not medication order) and is documented in flowsheets rather than MAR.

Step 6: Handle Combination & Trial Drugs
A. Combination Drugs

When to duplicate rows:

Only duplicate combo meds that are not in the mCIDE as a med_category itself. Assign a duplicated row for each component med_category.

Example: "MedA MedB 5-325mg Tab"

If medA_medB is not a CLIF med_category, but medA and medB are separate categories:

→ Row 1: med_category = "medA", med_dose = 5mg

→ Row 2: med_category = "medB", med_dose = 325mg

Both rows maintain same: hospitalization_id, admin_dttm, med_name

Unit Conversion (mL → mg):

If original unit is mL, convert to mg using concentration from med_name.

e.g., "MedA-MedB 7.5-325 MG/15 ML" at 30 mL → 15mg MedA, 650mg MedB

B. Trial Drugs & Placebos

Map trial drugs to trial_drug as med_category. Leave med_group empty if with placebo.

Example:

"ACYCLOVIR OR PLACEBO CAPSULE (IRB 171591) 400 MG"

→ med_category: trial_drug

→ med_group: None

  • Brand Names: Map to generic, preserve brand in notes
Step 7: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Validate med_category mappings against mCIDE
  • Verify continuous med filter captures time-based units
  • Check combo drug rows have correct dose splits

medication_admin_intermittent Click to flip

  • • MAR action & med category mapping (mCIDE)
  • • Filter intermittent medications
  • • Combination drug handling
  • • Trial drugs & placebos

Medication Admin Intermittent

Step 1: Create Medication Mappings

Before transforming your medication data, prepare the mapping files:

a. Action Category Mapping: Map mar_action_namemar_action_category

→ clif_medication_admin_intermittent_action_categories.csv

b. Med Category Mapping: Map med_namemed_category

→ clif_medication_admin_intermittent_med_categories.csv

c. Route Category Mapping: Map med_route_namemed_route_category

→ clif_medication_admin_intermittent_med_route_categories.csv
Step 2: Filter to Inpatient Hospitalizations

Filter your medication data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping files to add category columns.

Step 4: Filter Intermittent Medications

Exclude medications with time-based dosing units (keep only intermittent):

filter(!grepl("min|hr|day", med_dose_unit, ignore.case = TRUE))
Step 5: Special Case - Inhaled Nitric Oxide

If inhaled nitric oxide is not in your medication admin table, refer to flowsheets data. The order typically goes in as an RT order (not medication order) and is documented in flowsheets rather than MAR.

Step 6: Handle Combination & Trial Drugs
A. Combination Drugs

When to duplicate rows:

Only duplicate combo meds that are not in the mCIDE as a med_category itself. Assign a duplicated row for each component med_category.

Example: "MedA MedB 5-325mg Tab"

If medA_medB is not a CLIF med_category, but medA and medB are separate categories:

→ Row 1: med_category = "medA", med_dose = 5mg

→ Row 2: med_category = "medB", med_dose = 325mg

Both rows maintain same: hospitalization_id, admin_dttm, med_name

Unit Conversion (mL → mg):

If original unit is mL, convert to mg using concentration from med_name.

e.g., "MedA-MedB 7.5-325 MG/15 ML" at 30 mL → 15mg MedA, 650mg MedB

B. Trial Drugs & Placebos

Map trial drugs to trial_drug as med_category. Leave med_group empty if with placebo.

Example:

"ACYCLOVIR OR PLACEBO CAPSULE (IRB 171591) 400 MG"

→ med_category: trial_drug

→ med_group: None

  • Brand Names: Map to generic, preserve brand in notes
Step 7: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Validate med_category mappings against mCIDE
  • Verify intermittent med filter excludes time-based units
  • Check combo drug rows have correct dose splits

patient Click to flip

  • • Demographics mapping (mCIDE)
  • • Handle missing category values
  • • One record per patient_id
Composite Key: patient_id

Patient

Step 1: Create Patient Mappings

Before transforming your patient data, prepare the mapping files:

a. Race Category Mapping: Map race_namerace_category

→ clif_patient_race_categories.csv

b. Sex Category Mapping: Map sex_namesex_category

→ clif_patient_sex_categories.csv

c. Ethnicity Category Mapping: Map ethnicity_nameethnicity_category

→ clif_patient_ethinicity_categories.csv

d. Language Category Mapping: Map language_namelanguage_category

→ clif_patient_language_categories.csv
Step 2: Filter to Inpatient Patients

Filter your patient data to include only patients whose hospitalization_id exists in your CLIF ADT table.

Step 3: Apply Mappings & Handle Missing Values

Join the mapping files to add category columns, then fill missing values:

Fill missing/null values:

  • race_category"Unknown"
  • sex_category"Unknown"
  • ethnicity_category"Unknown"
  • language_category"Unknown or NA"
Step 4: Handle External Death Data Sources

If using an external source like a state vital records registry to extract death_dttm and the death records are only in date format (no timestamp):

  • 1. Same day as discharge: If the external death date matches the patient's discharge_dttm date, use the discharge_dttm as the death_dttm
  • 2. After discharge: If the death date occurs after discharge_dttm, use the death date and add midnight as the timestamp (format: YYYY-MM-DD 00:00:00+00:00)
Step 5: Final Cleanup
  • Deduplicate: Remove duplicate rows by composite key patient_id. Ensure one record per patient_id
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Ensure one record per patient_id
  • Validate all category mappings against mCIDE
  • Verify no null values remain in category fields

patient_assessments Click to flip

  • • Assessment category mapping (mCIDE)
  • • Value type casting (numeric/categorical/text)

Patient Assessments

Step 1: Create Assessment Mappings

Before transforming your assessment data, prepare the mapping file:

Assessment Category Mapping: Map assessment_nameassessment_category

→ clif_patient_assessment_categories.csv
Step 2: Filter to Inpatient Hospitalizations

Filter your assessment data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Join the mapping file to add assessment_category and assessment_group columns.

Step 4: Create Value Type Columns

Identify and cast the measured value for each assessment to the appropriate column:

  • numerical_value → Cast numeric assessment values
  • categorical_value → Cast categorical/ordinal values
  • text_value → Cast free-text values
Step 5: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Validate assessment_category mappings against mCIDE
  • Verify value type casting is correct for each assessment

patient_procedures Click to flip

  • • CPT codes (HCPCS Level 1) only
  • • ICD-10-PCS procedure codes
  • • Filter to relevant procedure codes

Patient Procedures

Important Notes
  • • Contains standardized procedural billing codes (HCPCS) for hospitalizations
  • • Only includes procedures that were performed/completed (not cancelled)
  • Includes: CPT codes billed by clinicians (HCPCS Level 1 - professional billing)
  • Excludes: Hospital billing codes (HCPCS Level 2 - products, supplies, services without CPT codes)
  • • Also contains ICD-10-PCS procedure codes (used for DRG calculation, may appear in hospital_diagnosis)
Step 1: Identify Relevant Procedure Codes

Identify the relevant procedure codes required for CLIF:

Step 2: Filter Data

Filter to inpatient hospitalizations and only include the relevant procedure codes identified in Step 1.

Step 3: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Date-Only Columns

When your source data only contains a date (no time component):

  • • Set the time to midnight (00:00:00) in your local timezone
  • • Make it timezone-aware
  • • Then convert to UTC

Example: A date of 2021-08-26 becomes 2021-08-26 00:00:00 local time, which converts to 2021-08-26 05:00:00+00:00 UTC (for US/Central timezone).

Data Quality Checks
  • Verify procedure codes are valid CPT or ICD-10-PCS codes
  • Ensure only completed procedures are included

position Click to expand

  • • Free text position standardization
  • • Prone position detection
  • • Supine-type consolidation
  • • Missing label handling

Position

Step 1: Identify Relevant Position Measurements

Identify relevant position measurement names from flowsheets and filter to them:

Example flowsheet names:

• "NUR RS POSITION"
• "NUR RS DEGREE OF HEAD OF BED ELEVATION"
Step 2: Filter to Inpatient Hospitalizations

Filter your position data to include only inpatient hospitalizations.

Step 3: Identify Prone Positions

Prone detection using regex:

mutate(prone = if_else(str_detect(position_name,
regex("prone", ignore_case = TRUE)), 1, 0))
Step 4: Identify Supine-Type Positions
supine = if_else(str_detect(position_name, regex("supin|supine", ignore_case = TRUE)), 1, NaN)
fowlers = if_else(str_detect(position_name, regex("fowler", ignore_case = TRUE)), 1, NaN)
sitting = if_else(str_detect(position_name, regex("sitting", ignore_case = TRUE)), 1, NaN)
Step 5: Consolidate Supine-Type

Group related positions (Fowlers, Trendelenburg, Sitting) as supine-type.

Step 6: Handle Missing Labels
supine = if_else(is.na(supine) & prone == 0, 1, supine)
supine = if_else(is.na(supine) & prone == 1, 0, supine)
Position Categories

Prone Positions:

• "Prone" / "Proning" / "Prone positioning"
• "Face Down"
• "Prone Positioning for ARDS"
• "Prone Ventilation"
• "Swimming Position"

Supine-Type Positions:

• "Supine"
• "Fowlers" / "Semi-Fowlers"
• "Trendelenburg"
• "Sitting"
Step 7: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Verify prone/supine are mutually exclusive (if prone=1, then supine=0)
  • Check position categories are correctly classified

respiratory_support Click to expand

  • • Map device_name → device_category
  • • Map mode_name → mode_category
  • • Pivot flowsheets from long to wide
  • • Create tracheostomy variable
  • • Validate IMV vs Non-IMV settings

Respiratory Support

Step 1: Create Respiratory Support Mappings

Before transforming your respiratory support data, prepare the mapping files:

a. Device Category Mapping: Map device_namedevice_category

→ clif_respiratory_support_device_categories.csv

b. Mode Category Mapping: Map mode_namemode_category

→ clif_respiratory_support_mode_categories.csv

c. Handling Multiple Source Measures:

Multiple flowsheet measure names can map to the same CLIF element (e.g., device_name). Include all matching measures and prioritize by counting unique hospitalizations where each appears.

Steps:

  1. Include all flowsheet measures that map to the CLIF element
  2. Pivot each measure into its own column (e.g., device_name_1, device_name_2, device_name_3)
  3. Use coalesce() to select the first non-null value in priority order
COALESCE(device_name_1, device_name_2, device_name_3) AS device_name

Example:

Source Measure NameHosp. CountPriorityPivoted Column
RT RS OXYGEN DEVICE1001device_name_1
RT O2 DELIVERY DEVICE752device_name_2
ED RT RS OXYGEN DEVICE503device_name_3

Pivot and coalesce to get final value:

patient_iddevice_name_1device_name_2device_name_3device_name = COALESCE(device_name_1, device_name_2, device_name_3)
123NULLNasal CannulaRoom AirNasal Cannula
456High Flow NCNULLHigh Flow NCHigh Flow NC
789NULLNULLRoom AirRoom Air
Step 2: Filter to Inpatient Hospitalizations

Filter your respiratory support data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Apply the device and mode mappings to your flowsheets data.

Step 4: Pivot Long to Wide

Pivot the long dataset to wide format. Data must be unique by index columns before pivoting:

wide_resp_df = resp_df.pivot(
  index=["patient_id", "hospitalization_id", "recorded_dttm"],
  columns="unique_names",
  values="meas_value"
)

Note: Ensure no duplicate combinations of (patient_id, hospitalization_id, recorded_dttm, unique_names) before pivoting.

Step 5: Create Tracheostomy Variable

Create the tracheostomy variable by:

  • • Identifying any fields capturing presence of tracheostomy
  • • Checking if device_name contains "trach"

Special Case:

If device_name contains "Vent" (e.g., "Trach Collar: Vent") AND other respiratory settings are observed, then device_category ≠ "Trach Collar"

Step 6: Device Category Logic

Invasive Mechanical Ventilation (IMV):

grepl('Vent', device_name, ignore.case = TRUE) &
  !grepl('Venturi Mask', device_name, ignore.case = TRUE) ~ 'IMV'

Other Categories:

NIPPV|Bipap → 'NIPPV'
CPAP → 'CPAP'
High Flow NC → 'High Flow NC'
Cannula → 'Nasal Cannula'
Mask|Non-Rebreather → 'Face Mask'
Room Air → 'Room Air'
Trach Collar → 'Trach Collar'
Mode-Based Category Override

Mode names can override device categories when conflicts exist:

"SIMV - PC PS" → "IMV"
"A/C Volume" → "IMV"
"CPAP" → "CPAP"
"NIV-PC" → "NIPPV"
"BiPAP" → "NIPPV"
Step 7: Unit Conversions

If your source data uses different units for device settings, convert to CLIF standards:

Example:

minute_vent_obs is required to be in liters. If source data is in mL, convert accordingly (divide by 1000).

Step 8: Sanity Check for IMV vs Non-IMV

Verify device_category matches expected settings. If a hospitalization has respiratory settings typically used for IMV, device_category should not be a non-IMV device like Room Air, Face Mask, Nasal Cannula, etc.

Non-IMV Expected Settings:

DeviceModeReq SettingsReq Obs
CPAPPS/CPAPfio2_set, peep_settidal_volume_obs, resp_rate_obs
Face Mask-lpm_set, fio2_set (possible)-
High Flow NC-fio2_set, lpm_set-
Nasal Cannula-lpm_set-
NIPPVVol Supportfio2, peep, tv, rr, insp_time-
NIPPVPS/CPAPfio2, peep, pip, rr, insp_time-
Room Air-No settingsNo obs
Trach Collar-lpm_set, fio2_setNo obs

IMV Expected Settings by Mode (device_category == "IMV"):

ventilator_settingAC-VCPS/CPAPPCPRVCSIMVVol Sup
fio2_setEEEEEE
tidal_volume_setEEPE
resp_rate_setEEEE
pressure_control_setEP
pressure_support_setEE
flow_rate_setPP
inspiratory_time_setPEP
peep_setEEEEEE

E = Expected | P = Possible | Empty = Not applicable

Step 9: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Verify device_category and mode_category are consistent
  • Check IMV rows have expected observations populated
  • Validate tracheostomy flag against device_name patterns

vitals Click to expand

  • • Map vital_name → vital_category
  • • Handle BP split (sbp/dbp extraction)
  • • Standardize units (temp, spo2, weight, height)
  • • Specify invasive vs non-invasive in meas_site_name

Vitals

Step 1: Create Vitals Mappings

Before transforming your vitals data, prepare the mapping file:

Vital Category Mapping: Map vital_namevital_category

→ clif_vitals_categories.csv
Step 2: Filter to Inpatient Hospitalizations

Filter your vitals data to include only inpatient hospitalizations.

Step 3: Apply Mappings

Apply the vital_category mappings to your flowsheets data.

sbp: Systolic blood pressure. Can include both invasive (arterial line) and non-invasive (cuff) measurements; specify in meas_site_name.
dbp: Diastolic blood pressure. Can include both invasive (arterial line) and non-invasive (cuff) measurements; specify in meas_site_name.
map: Mean arterial pressure. Can include both invasive and non-invasive. Source type should be indicated in meas_site_name.
Step 4: Blood Pressure Split

If blood pressure is not available separately as sbp and dbp, split to create the respective vital_category:

bp_df = filtered_df.filter(pl.col("vital_category") == "blood_pressure")
bp_df_pd['bp_split'] = bp_df_pd['vital_value'].str.strip().str.split('/')
bp_df_pd['sbp'] = bp_df_pd['bp_split'].str[0]
bp_df_pd['dbp'] = bp_df_pd['bp_split'].str[1]
Step 5: Standardize Vitals Units

Convert to CLIF standard units:

VitalRequired Unit
temp_cCelsius (°C)
spo2Percent (%)
mapmmHg
height_cmCentimeters (cm)
weight_kgKilograms (kg)
Step 6: Final Cleanup
  • Deduplicate: Remove duplicate rows
  • UTC Conversion: Convert all datetime fields to UTC timezone
  • Data Types: Ensure all variables match data dictionary specifications
Data Quality Checks
  • Check BP split created valid sbp/dbp values
  • Validate meas_site_name populated for invasive measurements

Additional Resources

Ready to Start Your CLIF ETL Implementation?

Get hands-on support and connect with the CLIF community for guidance throughout your ETL journey.