Hyperparam Documentation
Why Hyperparam?
Browser-Native Performance
- Handle massive datasets without downloading them or setting up infrastructure
- Stream data directly from sources like Hugging Face, S3, or local files
- Lazy computation ensures you only process what you need
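As a rough illustration of what lazy computation means here, the sketch below uses TypeScript generators so that rows are produced only when a consumer asks for them. It is not Hyperparam's implementation: `Row`, `streamRows`, `filterRows`, and `take` are hypothetical names invented for this example, and the in-memory generator stands in for a remote source.

```typescript
// Hypothetical row shape, used only for this illustration.
interface Row {
  id: number
  text: string
}

// Counts how many rows the source actually produced, to make laziness visible.
let rowsProduced = 0

// Stand-in for a remote source: yields rows one at a time, on demand.
function* streamRows(total: number): Generator<Row> {
  for (let id = 0; id < total; id++) {
    rowsProduced++
    yield { id, text: `sample ${id}` }
  }
}

// Lazily keep only the rows that satisfy a predicate.
function* filterRows(rows: Iterable<Row>, keep: (row: Row) => boolean): Generator<Row> {
  for (const row of rows) {
    if (keep(row)) yield row
  }
}

// Pull at most `limit` rows, then stop consuming the source.
function take(rows: Iterable<Row>, limit: number): Row[] {
  const out: Row[] = []
  if (limit <= 0) return out
  for (const row of rows) {
    out.push(row)
    if (out.length >= limit) break
  }
  return out
}

// Ask for the first five even-numbered rows out of a million possible rows.
const firstPage = take(filterRows(streamRows(1000000), row => row.id % 2 === 0), 5)
```

In this sketch the source produces only as many rows as are needed to fill the first page, even though a million rows are nominally available. The same idea applies when the rows come from ranged reads of a remote file instead of a local generator.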
AI-Powered Dataset Curation
- Use state-of-the-art models to assess data quality
- Automatically identify and filter low-quality samples
- Generate new features and transformations with natural language
Designed for ML Workflows
- Purpose-built for pre-training, fine-tuning, and RAG datasets
- Seamless integration with popular ML data formats (e.g., Parquet)
- Export curated datasets ready for training
Workspace Mode
Create focused workspaces to edit, filter, and transform subsets of your data with full AI assistance.
Quality Assessment
Automatically grade and rank your data samples using advanced AI models, identifying both low-quality data and high-value examples.
Who Uses Hyperparam?
- ML Engineers building better training datasets
- Data Scientists exploring and understanding large-scale text data
- Research Teams curating datasets for next-generation models
- Enterprises improving their AI models through better data
Getting Started
- Quick Start — Start exploring data in under 3 minutes
- Export Chat Logs — Learn how to export chat logs from popular platforms
- Authentication — Understand the benefits of signing in
Use Cases
- Dataset Discovery — Use natural language to search and discover datasets
- Quality Filtering — Filter out low-quality responses using LLM-generated quality scores
- Data Transformation — Derive categorical data from unstructured text fields
- Complete Workflow — Extract structured fields, filter by criteria, and export refined datasets
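The filtering, transformation, and export steps in these use cases can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not Hyperparam's API: the `ScoredRow` shape, the 0 to 1 score range, the 0.5 threshold, the word-count rule standing in for a model-derived category, and the JSON Lines export format are all hypothetical choices made for the example.

```typescript
// Hypothetical record: a chat response with a model-generated quality score attached.
interface ScoredRow {
  prompt: string
  response: string
  qualityScore: number // assumed range 0..1, higher is better
}

// The same record with one derived categorical column.
interface CuratedRow extends ScoredRow {
  lengthBucket: 'short' | 'long'
}

// Quality filtering: keep rows at or above a minimum score.
function filterByQuality(rows: ScoredRow[], minScore: number): ScoredRow[] {
  return rows.filter(row => row.qualityScore >= minScore)
}

// Data transformation: derive a category from an unstructured text field.
// A simple word-count rule stands in for a model-generated label.
function addLengthBucket(row: ScoredRow): CuratedRow {
  const wordCount = row.response.trim().split(/\s+/).filter(Boolean).length
  return { ...row, lengthBucket: wordCount > 5 ? 'long' : 'short' }
}

// Export: serialize the curated rows as JSON Lines (one JSON object per line).
function toJsonLines(rows: CuratedRow[]): string {
  return rows.map(row => JSON.stringify(row)).join('\n')
}

const sampleRows: ScoredRow[] = [
  {
    prompt: 'What is Parquet?',
    response: 'Parquet is a columnar file format designed for efficient analytics.',
    qualityScore: 0.9,
  },
  { prompt: 'What is Parquet?', response: 'idk', qualityScore: 0.1 },
  { prompt: 'Is Parquet columnar?', response: 'Yes, it is.', qualityScore: 0.7 },
]

const curatedRows = filterByQuality(sampleRows, 0.5).map(addLengthBucket)
const exportedLines = toJsonLines(curatedRows)
```

With the sample data above, the low-scoring "idk" response is dropped and the two remaining rows each gain a `lengthBucket` value before being serialized.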
Open Source
To build Hyperparam we created an ecosystem of open source libraries for efficient data handling in the browser:
- hightable — High-performance React table for large datasets
- hyparquet — Apache Parquet reader for JavaScript and TypeScript
- hyparquet-compressors — Every Parquet compression codec
- hysnappy — Snappy decompressor optimized with WebAssembly
- icebird — Apache Iceberg table reader in JavaScript
- hyllama — llama.cpp model parser in JavaScript
The Future of Dataset Engineering
As models become commoditized, data quality is the key differentiator for AI performance. Hyperparam gives you the tools to build datasets that produce state-of-the-art results.
"Model behavior is not determined by architecture, hyperparameters, or optimizer choices. It's determined by your dataset, nothing else." — James Betker, OpenAI
Start building better datasets today with Hyperparam.
