Quality Filtering: Removing Sycophantic Responses

Filter out low-quality, overly agreeable responses from a chat log dataset using LLM-generated quality scores.

Overview

Starting with a 200k-row chat log dataset (ultrachat_200k), generate a sycophancy score for each conversation, then filter out highly sycophantic responses.

[Figure: demo of removing sycophantic responses in Hyperparam]

Steps

Load the dataset
Open ultrachat_200k/train_gen/0000
Generate sycophancy scores
Open the chat panel on the right-hand side
Use chat to request: "add a 0-1 sycophancy score for each row"
> Note: Hyperparam analyzes each conversation and creates a sycophancy_score column (0.0 = authentic, 1.0 = highly sycophantic).
View sorted sample
Select either the first rows or a random sample to view 100 rows of the dataset; this view can then be sorted by the generated column
Click the sycophancy_score column header to sort the table in ascending or descending order
Apply filter
Add filter: sycophancy_score < 0.2
> Note: Keeps only responses with low sycophancy (authentic, non-pandering).
Export filtered dataset
Click export
Enable "Apply current table filters"
Set output filename (e.g., ultrachat_200k_filtered.parquet)
> Note: The export processes the full dataset with the filter applied.
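
For readers who want to see what the filter and sort steps amount to outside the UI, here is a minimal Python sketch. It is not Hyperparam's implementation. It assumes each row is a dict carrying the generated sycophancy_score column; the helper name filter_low_sycophancy, the choice to drop unscored rows, and the sample rows are all hypothetical.

```python
from __future__ import annotations

from typing import Iterable

THRESHOLD = 0.2  # same cutoff as the filter in the steps above


def filter_low_sycophancy(rows: Iterable[dict], threshold: float = THRESHOLD) -> list[dict]:
    """Keep rows whose sycophancy_score is strictly below the threshold."""
    kept = []
    for row in rows:
        score = row.get("sycophancy_score")
        # Assumption: rows without a score are dropped rather than kept.
        if score is not None and score < threshold:
            kept.append(row)
    return kept


# Made-up rows for illustration only; they are not taken from ultrachat_200k.
sample = [
    {"prompt": "Is my plan good?", "sycophancy_score": 0.05},
    {"prompt": "Review my essay", "sycophancy_score": 0.85},
    {"prompt": "Explain recursion", "sycophancy_score": 0.2},
    {"prompt": "Unscored row", "sycophancy_score": None},
]

filtered = filter_low_sycophancy(sample)
# Sort ascending, mirroring a click on the column header.
filtered.sort(key=lambda r: r["sycophancy_score"])
print([r["prompt"] for r in filtered])
```

Note that the comparison is strict, so a row scored exactly 0.2 is excluded, matching the `sycophancy_score < 0.2` filter. With a pandas DataFrame, the equivalent selection would be `df[df["sycophancy_score"] < 0.2]`.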

Expected Results

Generated column: sycophancy_score, rating how sycophantic each response is (0.0 = authentic, 1.0 = highly sycophantic)
Filtered dataset: Only rows with sycophancy_score < 0.2, removing overly agreeable responses
Output: Cleaned parquet file ready for training or further analysis
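
One way to sanity-check the exported file against these expectations is sketched below. This is an illustrative check, not part of Hyperparam. It assumes the exported rows have already been loaded as a list of dicts (for example via `pandas.read_parquet(...).to_dict("records")`); the function name find_violations and the sample rows are hypothetical.

```python
def find_violations(rows, threshold=0.2):
    """Return (index, reason) pairs for rows that should not be in the filtered output."""
    problems = []
    for i, row in enumerate(rows):
        score = row.get("sycophancy_score")
        if score is None:
            problems.append((i, "missing score"))
        elif not 0.0 <= score <= 1.0:
            problems.append((i, "score outside 0-1"))
        elif score >= threshold:
            problems.append((i, "score not below threshold"))
    return problems


# Made-up rows standing in for an exported file; the last two are deliberately invalid.
exported = [
    {"sycophancy_score": 0.0},
    {"sycophancy_score": 0.19},
    {"sycophancy_score": 0.4},
    {"sycophancy_score": -0.1},
]

violations = find_violations(exported)
print(violations)
```

An empty result means every row has a score in the 0-1 range that falls below the threshold, which is what the filtered export should contain.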

Other Use Cases

Dataset Discovery - Use natural language to search and discover datasets
Data Transformation - Categorize and derive insights from unstructured text
Patient Data Workflow - Extract, filter, and export structured medical data
Deep Research - Multi-step AI workflow for dataset research and model comparison