Hyperparam Documentation

Hyperparam’s documentation shows how to explore and transform massive AI datasets directly in the browser. Use these guides to stream data, apply AI-assisted transformations, and work with large-scale ML workflows without backend infrastructure.

Why Hyperparam?

Browser-Native Performance

Handle massive datasets without downloading or setting up infrastructure
Stream data directly from sources like Hugging Face, S3, or local files
Lazy computation ensures you only process what you need
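
To make the idea of lazy computation concrete, here is a minimal sketch in TypeScript. It is not Hyperparam's implementation: the Row shape and the readRows, filterRows, and take helpers are hypothetical names invented for this illustration, and an in-memory generator stands in for a remote data source. It shows that when a pipeline is built from generators, the source only produces as many rows as the consumer actually asks for.

```typescript
// Hypothetical row shape, used only for this illustration.
interface Row {
  id: number;
  text: string;
}

// Counts how many rows the source has actually produced.
let rowsProduced = 0;

// Stand-in for a remote dataset: yields rows one at a time, on demand.
function* readRows(total: number): Generator<Row> {
  for (let id = 0; id < total; id++) {
    rowsProduced++;
    yield { id, text: `sample ${id}` };
  }
}

// Lazily pass through only the rows that match the predicate.
function* filterRows<T>(rows: Iterable<T>, keep: (row: T) => boolean): Generator<T> {
  for (const row of rows) {
    if (keep(row)) yield row;
  }
}

// Pull at most `limit` rows, then stop consuming the source.
function take<T>(rows: Iterable<T>, limit: number): T[] {
  const out: T[] = [];
  if (limit <= 0) return out;
  for (const row of rows) {
    out.push(row);
    if (out.length >= limit) break;
  }
  return out;
}

// Ask for the first five even-numbered rows of a one-million-row source.
const evenRows = filterRows(readRows(1000000), (row) => row.id % 2 === 0);
const firstPage = take(evenRows, 5);
```

In this sketch firstPage holds the rows with ids 0, 2, 4, 6, and 8, and rowsProduced ends at 9 rather than 1,000,000, because the pipeline stops pulling from the source as soon as the fifth matching row arrives.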

AI-Powered Dataset Curation

Use state-of-the-art models to assess data quality
Automatically identify and filter low-quality samples
Generate new features and transformations with natural language

Designed for ML Workflows

Purpose-built for pre-training, fine-tuning, and RAG datasets
Seamless integration with popular ML data formats (e.g., Parquet)
Export curated datasets ready for training

Workspace Mode

Create focused workspaces to edit, filter, and transform subsets of your data with full AI assistance.

Quality Assessment

Automatically grade and rank your data samples using advanced AI models, identifying both low-quality data and high-value examples.

Who Uses Hyperparam?

ML Engineers building better training datasets
Data Scientists exploring and understanding large-scale text data
Research Teams curating datasets for next-generation models
Enterprises improving their AI models through better data

Getting Started

Quick Start — Start exploring data in under 3 minutes
Authentication — Understand the benefits of signing in
Data Sources — Learn how to connect to various data sources
Exporting Chat Logs — Learn how to export chat logs from popular platforms

Use Cases

Explore practical examples of using Hyperparam for common AI and ML data workflows. Each use case shows how to work with large datasets using chat-assisted exploration, filtering, transformation, or export.

Dataset Discovery — Use natural language to search and discover datasets
Quality Filtering — Filter out low-quality responses using LLM-generated quality scores
Data Transformation — Derive categorical data from unstructured text fields
Complete Workflow — Extract structured fields, filter by criteria, and export refined datasets
Deep Research — Multi-step AI workflow for dataset research and model comparison

Open Source

To build Hyperparam we created an ecosystem of open source libraries for efficient data handling in the browser:

hightable — High-performance React table for large datasets
hyparquet — Apache Parquet reader for JavaScript and TypeScript
hyparquet-compressors — Decompressors for every Parquet compression codec
hysnappy — Snappy decompressor optimized with WebAssembly
icebird — Apache Iceberg table reader in JavaScript
hyllama — llama.cpp model parser in JavaScript

The Future of Dataset Engineering

As models become commoditized, data quality is the key differentiator for AI performance. Hyperparam gives you the tools to build datasets that produce state-of-the-art results.

"Model behavior is not determined by architecture, hyperparameters, or optimizer choices. It's determined by your dataset, nothing else." — James Betker, OpenAI

Start building better datasets today with Hyperparam.