Complete Workflow: Patient Data Extraction and Filtering
Extract structured fields from unstructured medical records, filter by criteria, and export a refined dataset.
Overview
Starting from a Parquet file with more than 150,000 rows of unstructured patient records, use LLM-based extraction to create structured columns, filter the dataset by age, diagnosis, and symptom criteria, and export a subset that contains only the selected columns.

Steps
- Load the dataset
- Extract structured fields using chat
Open any cell in the 'note' column, which contains the patient information
> Note: You can view the full unstructured text of an individual chart by scrolling down.
Use chat to request extraction: "extract age, diagnosis, symptoms, comorbidities, treatments, outcome in separate columns from 'note' column"
> Note: Hyperparam creates six new columns and populates them with the extracted values.
Columns appear as:
age, diagnosis, symptoms, comorbidities, treatments, outcome
> Note: Scroll down to see Hyperparam fill in the remaining rows.
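Hyperparam runs this extraction for you, and its internal prompt and parsing are not described here. As a rough illustration only, the Python sketch below assumes the model is asked to reply with one JSON object per note, keyed by the six column names, and shows how such a reply could be turned into a typed row. `ExtractedFields`, `parse_extraction`, and the JSON reply format are hypothetical, not part of Hyperparam.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ExtractedFields:
    # Hypothetical schema mirroring the six requested columns.
    age: Optional[int]
    diagnosis: str
    symptoms: str
    comorbidities: str
    treatments: str
    outcome: str


def _as_text(value: Any) -> str:
    """Flatten a JSON value into one text cell (lists become comma-separated)."""
    if value is None:
        return ""
    if isinstance(value, list):
        return ", ".join(str(item) for item in value)
    return str(value)


def parse_extraction(raw_reply: str) -> ExtractedFields:
    """Turn one (assumed) JSON-object reply from the model into a structured row.

    Missing keys become "" (or None for age) instead of raising, so a single
    incomplete reply does not stop a run over the whole dataset.
    """
    data = json.loads(raw_reply)  # ASSUMPTION: the reply is a JSON object
    try:
        age = int(data["age"]) if data.get("age") is not None else None
    except (TypeError, ValueError):
        age = None  # e.g. "unknown" or a free-text range
    text_cells = {
        f.name: _as_text(data.get(f.name))
        for f in fields(ExtractedFields)
        if f.name != "age"
    }
    return ExtractedFields(age=age, **text_cells)


if __name__ == "__main__":
    # Made-up reply, for illustration only.
    reply = '{"age": "67", "diagnosis": "acute respiratory infection", "symptoms": ["fever", "cough"]}'
    row = parse_extraction(reply)
    print(row.age, "|", row.symptoms)  # 67 | fever, cough
```

Converting `age` to an integer at this stage is what makes a numeric filter such as `age > 50` meaningful later; a value that cannot be parsed is left empty rather than guessed.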
- Open sample
Select either the first rows or a random sample to view 100 rows of the dataset. You can then sort the sample by any of the generated columns.
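The choice between the first rows and a random sample matters when the file is stored in some order: the first 100 rows may not be representative of the whole dataset. The plain-Python sketch below illustrates only that difference, plus a sort that tolerates empty extracted cells; it is not how Hyperparam samples internally, and `preview` and `sort_by` are hypothetical helpers operating on rows held as dictionaries.

```python
import random
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def preview(rows: List[Row], n: int = 100, mode: str = "first",
            seed: Optional[int] = None) -> List[Row]:
    """Return up to n rows: the first n, or a uniform random sample without replacement."""
    if mode == "first":
        return rows[:n]
    if mode == "random":
        rng = random.Random(seed)  # a fixed seed gives a reproducible sample
        return rng.sample(rows, min(n, len(rows)))
    raise ValueError(f"unknown mode: {mode!r}")


def sort_by(rows: List[Row], column: str, descending: bool = False) -> List[Row]:
    """Sort on one generated column; rows where it is missing go last either way."""
    present = [r for r in rows if r.get(column) is not None]
    missing = [r for r in rows if r.get(column) is None]
    return sorted(present, key=lambda r: r[column], reverse=descending) + missing
```

For example, `sort_by(preview(rows, mode="random", seed=0), "age", descending=True)` lists the oldest patients of a reproducible random sample first, with rows whose age could not be extracted at the end.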
- Apply filters
Add Filter by age:
age > 50
Add Filter by diagnosis:
diagnosis contains respiratory
Add Filter by symptoms:
symptoms contains "fever"
Only matching patients are shown.
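The three filters above can be sketched in plain Python as predicates over rows. This is a minimal sketch under stated assumptions, not Hyperparam's filter engine: it assumes the filters are combined with AND, that "contains" is a case-insensitive substring match, and that rows with a missing age fail the age filter. `greater_than`, `contains`, and `apply_filters` are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def greater_than(column: str, threshold: float) -> Predicate:
    """Numeric filter; a missing or non-numeric value does not match."""
    def check(row: Row) -> bool:
        value = row.get(column)
        return isinstance(value, (int, float)) and value > threshold
    return check


def contains(column: str, needle: str) -> Predicate:
    """ASSUMPTION: 'contains' means a case-insensitive substring match."""
    def check(row: Row) -> bool:
        value = row.get(column)
        return isinstance(value, str) and needle.lower() in value.lower()
    return check


def apply_filters(rows: Iterable[Row], predicates: List[Predicate]) -> List[Row]:
    """Keep rows that pass every predicate (ASSUMPTION: filters are ANDed)."""
    return [row for row in rows if all(p(row) for p in predicates)]


# The filters from this step.
patient_filters = [
    greater_than("age", 50),
    contains("diagnosis", "respiratory"),
    contains("symptoms", "fever"),
]
```

Note that a substring filter only matches the literal word: under this sketch, a patient whose diagnosis was extracted as "pneumonia" would not pass `contains("diagnosis", "respiratory")`, so the filter results depend on how the extraction phrased each value.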
- Export filtered dataset
Click export
Select specific columns:
subject_id, age, diagnosis, symptoms, comorbidities, treatments, outcome
Enable "Apply current table filters"
Set output filename:
filtered_patients.parquet
Click export to process the full dataset with the filters applied, not just the 100-row sample.
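Conceptually, exporting with "Apply current table filters" enabled means walking every row of the full dataset, keeping those that pass the filters, and projecting each kept row onto the selected columns. The stdlib sketch below shows only that filter-then-project step; `export_rows` and `EXPORT_COLUMNS` are hypothetical names, and it assumes the rows fit in memory as dictionaries, which a real export over 150,000+ rows need not do.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]

# Columns chosen in the export dialog for this walkthrough.
EXPORT_COLUMNS = ["subject_id", "age", "diagnosis", "symptoms",
                  "comorbidities", "treatments", "outcome"]


def export_rows(rows: Iterable[Row], columns: List[str],
                keep: Callable[[Row], bool] = lambda row: True) -> List[Row]:
    """Filter every row, then keep only the selected columns, in the given order."""
    out: List[Row] = []
    for row in rows:
        if not keep(row):
            continue  # row fails the active table filters
        missing = [c for c in columns if c not in row]
        if missing:
            raise KeyError(f"row is missing columns: {missing}")
        out.append({c: row[c] for c in columns})  # drops 'note' and other unselected columns
    return out
```

Writing the result to `filtered_patients.parquet` requires a Parquet library; with pyarrow, for example, `pyarrow.parquet.write_table(pyarrow.Table.from_pylist(result), "filtered_patients.parquet")` would write the projected rows.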
Expected Results
- Extracted columns: Structured fields parsed from the unstructured patient text in the 'note' column.
- Final export: The exported file includes only patients that match the filter criteria, and only the selected columns.
Other Use Cases
- Dataset Discovery - Use natural language to search and discover datasets
- Data Transformation - Categorize and derive insights from unstructured text
- Quality Filtering - Remove low-quality responses from datasets
- Deep Research - Multi-step AI workflow for dataset research and model comparison