Complete Workflow: Patient Data Extraction and Filtering

Extract structured fields from unstructured medical records, filter by criteria, and export a refined dataset.

Overview

Starting from a Parquet file of more than 150,000 unstructured patient records, use LLM-based extraction to create structured columns, filter the dataset by age, diagnosis, and symptom criteria, and export a subset with selected columns.

Demo showing patient data extraction and filtering in Hyperparam

Steps

1. Load the dataset
   - Open Asclepius-Synthetic-Clinical-Notes/0000
2. Extract structured fields using chat
   - Open any cell from the 'note' column containing patient information
     > Note: Scroll down to view the full unstructured text of an individual chart.
   - Use chat to request extraction: "extract age, diagnosis, symptoms, comorbidities, treatments, outcome in separate columns from 'note' column"
     > Note: Hyperparam creates 6 new columns and populates them with the extracted values.
   - Columns appear as: age, diagnosis, symptoms, comorbidities, treatments, outcome
     > Note: Scroll down to see Hyperparam filling in all rows.
3. Open a sample
   - Select either the first rows or a random sample to view 100 rows of the dataset, which can then be sorted by the generated columns
4. Apply filters
   - Add a filter by age: age > 50
   - Add a filter by diagnosis: diagnosis contains "respiratory"
   - Add a filter by symptoms: symptoms contains "fever"
   - Only matching patients are shown
5. Export filtered dataset
   - Click export
   - Select specific columns: subject_id, age, diagnosis, symptoms, comorbidities, treatments, outcome
   - Enable "Apply current table filters"
   - Set output filename: filtered_patients.parquet
   - Click export again to process the full dataset with the filters applied
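
For readers who want to reason about what the filter and export steps select, the following is a minimal sketch in plain Python. It is not Hyperparam's implementation. It assumes each row is a dict keyed by the column names above, that extracted values are free text (so `age` may arrive as "67" or "67 years old"), and that "contains" means a case-insensitive substring match. The helper names `parse_age`, `matches`, and `filter_and_select` are hypothetical.

```python
import re
from typing import Optional

# Columns selected in the export step above.
EXPORT_COLUMNS = [
    "subject_id", "age", "diagnosis", "symptoms",
    "comorbidities", "treatments", "outcome",
]


def parse_age(value) -> Optional[int]:
    """Return the first integer found in an extracted age value, or None.

    Assumption: ages are expressed in years; values such as "6 months"
    would be misread as 6 and need separate handling.
    """
    if value is None:
        return None
    match = re.search(r"\d+", str(value))
    return int(match.group()) if match else None


def matches(row: dict) -> bool:
    """Apply the three filters: age > 50, diagnosis and symptoms substrings."""
    age = parse_age(row.get("age"))
    diagnosis = str(row.get("diagnosis") or "").lower()
    symptoms = str(row.get("symptoms") or "").lower()
    return (
        age is not None
        and age > 50
        and "respiratory" in diagnosis
        and "fever" in symptoms
    )


def filter_and_select(rows: list) -> list:
    """Keep matching rows and only the export columns, in export order."""
    return [
        {col: row.get(col) for col in EXPORT_COLUMNS}
        for row in rows
        if matches(row)
    ]
```

Rows whose extracted age cannot be parsed are excluded here rather than guessed at. That is a design choice of this sketch; how the UI treats such values is not stated in this guide.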

Expected Results

Extracted columns: Structured fields parsed from the unstructured patient text.
Final export: The exported file includes only patients matching the filter criteria and only the selected columns.
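
One way to sanity-check the extracted columns is to measure how many rows have a non-empty value in each. The sketch below shows how such a check could look once the rows are loaded as dicts; it is an assumption, not a Hyperparam feature. `is_filled` and `fill_rates` are hypothetical helpers that treat `None` and whitespace-only strings as empty.

```python
# The six columns created by the extraction step.
EXTRACTED_COLUMNS = [
    "age", "diagnosis", "symptoms",
    "comorbidities", "treatments", "outcome",
]


def is_filled(value) -> bool:
    """True when a cell holds something other than None or blank text."""
    return value is not None and str(value).strip() != ""


def fill_rates(rows: list, columns=EXTRACTED_COLUMNS) -> dict:
    """Return the fraction of rows with a non-empty value, per column."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(is_filled(row.get(col)) for row in rows) / total
        for col in columns
    }
```

A low fill rate for a column suggests the notes rarely state that field or that the extraction missed it, and either case is worth inspecting before relying on a filter over that column.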

Other Use Cases

Dataset Discovery - Use natural language to search and discover datasets
Data Transformation - Categorize and derive insights from unstructured text
Quality Filtering - Remove low-quality responses from datasets
Deep Research - Multi-step AI workflow for dataset research and model comparison