Hyperparam Documentation

Why Hyperparam?

Browser-Native Performance

  • Handle massive datasets without downloading or setting up infrastructure
  • Stream data directly from sources like Hugging Face, S3, or local files
  • Lazy computation ensures you only process what you need
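The lazy-computation idea can be illustrated with a small TypeScript sketch. This is not Hyperparam's implementation: the `Row` type, the `readRows` stand-in source, and the `rowsProduced` counter are all hypothetical, introduced only to show that a lazy pipeline stops pulling data once the requested rows are found.

```typescript
// Hypothetical row type, for illustration only.
interface Row {
  id: number
  text: string
}

// Counts how many rows the source actually produced, to make laziness visible.
let rowsProduced = 0

// Stand-in for a remote source: yields rows one at a time, on demand.
function* readRows(total: number): Generator<Row> {
  for (let id = 0; id < total; id++) {
    rowsProduced++
    yield { id, text: `sample ${id}` }
  }
}

// Lazily keep only the rows that match a predicate.
function* filterRows(rows: Iterable<Row>, keep: (row: Row) => boolean): Generator<Row> {
  for (const row of rows) {
    if (keep(row)) yield row
  }
}

// Pull at most `limit` rows, then stop consuming the upstream source.
function take(rows: Iterable<Row>, limit: number): Row[] {
  const out: Row[] = []
  if (limit <= 0) return out
  for (const row of rows) {
    out.push(row)
    if (out.length >= limit) break
  }
  return out
}

// Ask for the first three even-numbered rows out of a million available.
const firstEvenRows = take(filterRows(readRows(1000000), row => row.id % 2 === 0), 3)
```

Although the source could produce a million rows, only the rows needed to satisfy the request are ever generated; the rest are never read.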

AI-Powered Dataset Curation

  • Use state-of-the-art models to assess data quality
  • Automatically identify and filter low-quality samples
  • Generate new features and transformations with natural language

Designed for ML Workflows

  • Purpose-built for pre-training, fine-tuning, and RAG datasets
  • Seamless integration with popular ML data formats (e.g., Parquet)
  • Export curated datasets ready for training

Workspace Mode

Create focused workspaces to edit, filter, and transform subsets of your data with full AI assistance.

Quality Assessment

Automatically grade and rank your data samples using advanced AI models, identifying both low-quality data and high-value examples.

Who Uses Hyperparam?

  • ML Engineers building better training datasets
  • Data Scientists exploring and understanding large-scale text data
  • Research Teams curating datasets for next-generation models
  • Enterprises improving their AI models through better data

Getting Started

  1. Quick Start — Start exploring data in under 3 minutes
  2. Export Chat Logs — Learn how to export chat logs from popular platforms
  3. Authentication — Understand the benefits of signing in

Use Cases

  1. Dataset Discovery — Use natural language to search and discover datasets
  2. Quality Filtering — Filter out low-quality responses using LLM-generated quality scores
  3. Data Transformation — Derive categorical data from unstructured text fields
  4. Complete Workflow — Extract structured fields, filter by criteria, and export refined datasets
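The filter, transform, and export steps named in these use cases can be sketched in TypeScript as follows. Everything here is an assumption made for illustration: the `ScoredSample` shape, the 1-to-5 `qualityScore` (taken to have been assigned already by an LLM grader), and the keyword-based `categorize` function, which is a toy stand-in for a model-derived category. It does not reflect Hyperparam's actual API.

```typescript
// Hypothetical record shape: a response with a quality score
// (1 = poor, 5 = excellent) assumed to come from an LLM grader.
interface ScoredSample {
  prompt: string
  response: string
  qualityScore: number
}

interface CuratedSample extends ScoredSample {
  category: string
}

// Toy stand-in for deriving a categorical field from unstructured text.
function categorize(sample: ScoredSample): string {
  const text = sample.prompt.toLowerCase()
  if (text.includes("code") || text.includes("function")) return "coding"
  if (text.includes("translate")) return "translation"
  return "general"
}

// Drop samples below the score threshold, then attach the derived category.
function curate(samples: ScoredSample[], minScore: number): CuratedSample[] {
  return samples
    .filter(s => s.qualityScore >= minScore)
    .map(s => ({ ...s, category: categorize(s) }))
}

// Serialize as JSON Lines: one JSON object per line.
function toJsonLines(samples: CuratedSample[]): string {
  return samples.map(s => JSON.stringify(s)).join("\n")
}

const samples: ScoredSample[] = [
  { prompt: "Write a function that reverses a string", response: "Use split, reverse, and join.", qualityScore: 5 },
  { prompt: "Translate 'hello' to French", response: "idk", qualityScore: 1 },
  { prompt: "What is the capital of France?", response: "Paris.", qualityScore: 4 },
]

const curated = curate(samples, 3)
const exported = toJsonLines(curated)
```

With a threshold of 3, the low-scoring translation sample is dropped and the two remaining samples are exported with their derived categories.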

Open Source

To build Hyperparam we created an ecosystem of open source libraries for efficient data handling in the browser:

  • hightable — High-performance React table for large datasets
  • hyparquet — Apache Parquet reader for JavaScript and TypeScript
  • hyparquet-compressors — Every Parquet compression codec
  • hysnappy — Snappy decompressor optimized with WebAssembly
  • icebird — Apache Iceberg table reader in JavaScript
  • hyllama — llama.cpp model parser in JavaScript

The Future of Dataset Engineering

As models become commoditized, data quality is the key differentiator for AI performance. Hyperparam gives you the tools to build datasets that produce state-of-the-art results.

"Model behavior is not determined by architecture, hyperparameters, or optimizer choices. It's determined by your dataset, nothing else." — James Betker, OpenAI

Start building better datasets today with Hyperparam.