Complete Workflow: Patient Data Extraction and Filtering

Extract structured fields from unstructured medical records, filter by criteria, and export a refined dataset.

Overview

Starting from a parquet file with more than 150,000 rows of unstructured patient records, this workflow uses LLM-based extraction to create structured columns, filters the dataset by age, diagnosis, and symptom criteria, and exports a subset with selected columns.

Demo showing patient data extraction and filtering in Hyperparam

Steps

  1. Load the dataset

    Open Asclepius-Synthetic-Clinical-Notes/0000

  2. Extract structured fields using chat

    Open any cell from the 'note' column containing patient information

    > Note: Scroll down to view the full unstructured text of an individual chart.

    Use chat to request extraction: "extract age, diagnosis, symptoms, comorbidities, treatments, outcome in separate columns from 'note' column"

    > Note: Hyperparam will create 6 new columns and populate them with the extractions.

    Columns appear as: age, diagnosis, symptoms, comorbidities, treatments, outcome

    > Note: Scroll down to see Hyperparam populating all rows.

  3. Create a SQL view for the target cohort

    In chat, request: "create a view of patients over 50 with a respiratory diagnosis and fever symptoms, keeping only subject_id, age, diagnosis, symptoms, comorbidities, treatments, outcome"

    Hyperparam calls add_sql_view with a query like:

    SELECT subject_id, age, diagnosis, symptoms, comorbidities, treatments, outcome
    FROM "0000.parquet"
    WHERE age > 50
      AND diagnosis ILIKE '%respiratory%'
      AND symptoms ILIKE '%fever%'

    > Note: Views replace the old filter UI. They are composable, support JOINs across multiple sources, and can project specific columns. Views appear in the workspace file browser alongside sources.

  4. Export the view

    Open the view tab, click the ⋮ button, and choose "Write Table"

    Select export options: columns, max rows, file name (e.g., filtered_patients.parquet)

    Export runs the view's SQL over the full dataset.
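
The cohort filter from step 3 can be mirrored outside Hyperparam to sanity-check which rows the view should keep. What follows is a minimal Python sketch, not Hyperparam's implementation. The sample rows are made up for illustration, and `parse_age`, `contains_ci`, `matches_cohort`, and `cohort_view` are hypothetical helper names. It assumes that an LLM-extracted age may arrive as text (for example "72 years") and treats `ILIKE '%word%'` as a case-insensitive substring match.

```python
import re

# Columns kept by the view in step 3.
VIEW_COLUMNS = ["subject_id", "age", "diagnosis", "symptoms",
                "comorbidities", "treatments", "outcome"]


def parse_age(value):
    """Return the first integer found in an extracted age value, or None.

    Assumption: extracted ages may be numbers or text such as "72 years".
    """
    match = re.search(r"\d+", str(value))
    return int(match.group()) if match else None


def contains_ci(text, needle):
    """Case-insensitive substring test, comparable to ILIKE '%needle%'."""
    return needle.lower() in str(text or "").lower()


def matches_cohort(row):
    """True for patients over 50 with a respiratory diagnosis and fever."""
    age = parse_age(row.get("age"))
    return (
        age is not None
        and age > 50
        and contains_ci(row.get("diagnosis"), "respiratory")
        and contains_ci(row.get("symptoms"), "fever")
    )


def cohort_view(rows):
    """Filter rows to the cohort and project only the view's columns."""
    return [
        {col: row.get(col) for col in VIEW_COLUMNS}
        for row in rows
        if matches_cohort(row)
    ]


# Made-up sample rows, for illustration only.
SAMPLE_ROWS = [
    {"subject_id": 1, "age": "67", "diagnosis": "Acute respiratory failure",
     "symptoms": "Fever, cough", "comorbidities": "COPD",
     "treatments": "oxygen", "outcome": "discharged", "note": "(free text)"},
    {"subject_id": 2, "age": 45, "diagnosis": "Respiratory infection",
     "symptoms": "fever", "comorbidities": "none",
     "treatments": "antibiotics", "outcome": "recovered", "note": "(free text)"},
    {"subject_id": 3, "age": "72 years", "diagnosis": "Hip fracture",
     "symptoms": "pain", "comorbidities": "osteoporosis",
     "treatments": "surgery", "outcome": "rehabilitation", "note": "(free text)"},
]

cohort = cohort_view(SAMPLE_ROWS)
print([row["subject_id"] for row in cohort])
```

Only the first sample row satisfies all three conditions, and the 'note' column is dropped by the projection. Keeping the predicate and the projection in separate functions mirrors the WHERE and SELECT parts of the query.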

Expected Results

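  • Extracted columns: Structured fields (age, diagnosis, symptoms, comorbidities, treatments, outcome) parsed from the unstructured patient text
  • Final export: The exported file includes only patients matching the criteria (over 50, respiratory diagnosis, fever symptoms) and only the selected columns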
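
These expectations can be checked against the exported file with a short script. The sketch below is one possible check, not part of Hyperparam. `check_export` and `load_rows` are hypothetical helper names, `load_rows` assumes the third-party pyarrow package is installed, and the age check assumes the exported age values are numeric or numeric strings.

```python
# Columns the export is expected to contain, per step 3.
EXPECTED_COLUMNS = {"subject_id", "age", "diagnosis", "symptoms",
                    "comorbidities", "treatments", "outcome"}


def load_rows(path):
    """Read a parquet file into a list of dicts (assumes pyarrow is installed)."""
    import pyarrow.parquet as pq
    return pq.read_table(path).to_pylist()


def check_export(rows):
    """Return a list of problems found; an empty list means the rows look right."""
    problems = []
    for i, row in enumerate(rows):
        if set(row) != EXPECTED_COLUMNS:
            diff = sorted(set(row) ^ EXPECTED_COLUMNS)
            problems.append(f"row {i}: column mismatch {diff}")
            continue
        try:
            age_ok = float(row["age"]) > 50
        except (TypeError, ValueError):
            age_ok = False  # non-numeric age: flag it for manual review
        if not age_ok:
            problems.append(f"row {i}: age {row['age']!r} is not over 50")
        if "respiratory" not in str(row["diagnosis"]).lower():
            problems.append(f"row {i}: diagnosis lacks 'respiratory'")
        if "fever" not in str(row["symptoms"]).lower():
            problems.append(f"row {i}: symptoms lack 'fever'")
    return problems
```

With the example file name from step 4, a call would look like `check_export(load_rows("filtered_patients.parquet"))`.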
