Hyperparam OSS Universe


Hyperparam was founded to address a critical gap in the machine learning ecosystem: the lack of a user-friendly, scalable UI for exploring and curating massive datasets.

Our mission is grounded in the belief that data quality is the most important factor in ML success, and that better tools are needed to build better training sets. In practice, this means enabling data scientists and engineers to “look at your data” – even terabyte-scale text corpora – interactively and entirely in-browser without heavy infrastructure. By combining efficient data formats, high-performance JavaScript libraries, and emerging AI assistance, Hyperparam's vision is to put data quality front and center in model development. Our motto “the missing UI for AI data” reflects our goal to make massive data exploration, labeling, and quality management as intuitive as modern web apps, all while respecting privacy and compliance through a local-first design.

Mission and Vision: Data-Centric AI in the Browser

Our mission is to empower ML practitioners to create the best training datasets for the best models. This stems from an industry-wide realization that model performance is ultimately bounded by data quality, not just model architecture or hyperparameters. Hyperparam envisions a new workflow where:

  • Interactive Data Exploration at Scale: Users can freely explore huge datasets (millions or billions of records) with fast, free-form interactions to uncover insights. Unlike traditional Python notebooks that struggle with large data (often requiring downsampling or clunky pagination), Hyperparam leverages browser technology for a smooth UI.
  • AI-Assisted Curation: Hyperparam integrates ML models to help label, filter, and transform data at a scale that would be impractical to review manually. By combining a highly interactive UI with model assistance, we make it possible for users to express exactly what they want from a model through the data itself.
  • Local-First and Private: Hyperparam runs entirely client-side, with no server dependency. This design not only simplifies setup (no complex pipeline or cloud needed) but also addresses enterprise compliance and security concerns, since sensitive data need not leave the user's machine. Fully browser-contained tools can bypass major adoption hurdles.

Experts across data engineering and MLOps widely agree that better data exploration and labeling tools are needed to tackle today's bottlenecks. We believe the answer is data-centric AI workflows that are faster, easier to deploy, and more scalable – workflows that let users iteratively improve data quality, which in turn yields better models.

The Hyperparam OSS Universe

Hyperparam OSS Universe Flowchart

Hyperparam delivers on our vision through a suite of open-source tools that tackle different aspects of data curation. These tools are built in TypeScript/JavaScript for seamless browser and Node.js usage.

We care about performance, minimal dependencies, and standards compliance.

Hyparquet GitHub

Hyparquet: In-Browser Parquet Data Access

Hyparquet is a lightweight, pure-JS library for reading Apache Parquet files directly in the browser. Parquet is a popular columnar format for large datasets, and Hyparquet enables web applications to tap into that efficiency without any server.

Hyparquet allows data scientists to open large dataset files instantly in a browser UI for examination, without needing Python scripts, servers, or cloud databases. It's useful for quick dataset validation (e.g. checking a sample of a new dataset for quality issues) and for powering web-based data analysis tools. Because it's pure JS, developers can integrate Hyparquet into any web app or Electron application that needs to read Parquet. It is the core engine behind Hyperparam's own dataset viewer, enabling what was previously thought impossible: client-side big data exploration. A minimal read sketch follows the feature list below.

  • Browser-Native & Dependency-Free: Hyparquet has zero external dependencies and is designed to run in both modern browsers and Node.js. At ~9.7 KB gzipped, it's extremely lightweight. It implements the full Parquet specification, aiming to be the “world's most compliant Parquet parser” that can open more files (all encodings and types) than other libraries.
  • Efficient Streaming of Massive Data: Built with performance in mind, Hyparquet only loads the portions of data needed for a given query or view. It leverages Parquet's built-in indexing to fetch just the required rows or columns on the fly. This “load just in time” approach makes it feasible to interactively explore multi-gigabyte or even billion-row datasets in a web app.
  • Complete Compression Support: Parquet files often use compression (Snappy, Gzip, ZSTD, etc.). Hyparquet by default handles common cases (uncompressed, Snappy), and with a companion library Hyparquet-Compressors, it supports all Parquet compression codecs. This is achieved with WebAssembly-optimized decompressors – notably HySnappy, a WASM Snappy decoder that accelerates parsing with minimal footprint.
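
To make this concrete, here is a minimal JavaScript sketch of reading a slice of a remote Parquet file with Hyparquet. It assumes hyparquet's asyncBufferFromUrl and parquetReadObjects helpers, an illustrative URL, and illustrative column names; check the Hyparquet README for the exact current signatures.

  import { asyncBufferFromUrl, parquetReadObjects } from 'hyparquet'
  import { compressors } from 'hyparquet-compressors' // optional: enables all Parquet codecs

  // Wrap the remote file so only the needed byte ranges are fetched (HTTP range requests)
  const file = await asyncBufferFromUrl({ url: 'https://example.com/data.parquet' }) // placeholder URL

  // Read a slice of rows from selected columns; returns an array of row objects
  const rows = await parquetReadObjects({
    file,
    columns: ['id', 'text'], // illustrative column names
    rowStart: 0,
    rowEnd: 100,
    compressors,
  })
  console.log(rows[0])
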
Hyparquet-Writer GitHub

Hyparquet-Writer: Export Parquet Files from JavaScript

To complement Hyparquet's reading capabilities, Hyparquet-Writer provides a way to write or export data to Parquet format in JavaScript. It is designed to be as lightweight and efficient as its reading counterpart.

After exploring or filtering a dataset with Hyperparam's tools, a user might want to save a subset or annotations. Hyparquet-Writer makes it possible to export those results in-browser as a Parquet file (or in Node.js without needing Python/Java libraries). This is valuable for creating shareable “refined datasets” or for moving data between systems while staying in Parquet (avoiding expensive CSV conversions).

  • Fast Parquet Writing in JS: Hyparquet-Writer takes JavaScript data (arrays of values per column) and outputs a binary Parquet file. It provides high efficiency and compact storage, so even the results of in-browser data manipulation can be saved in a columnar format.
  • Extreme Data Compression: Parquet can represent large datasets very efficiently, and it is especially good at representing sparse annotation data, which is exactly what we need for annotating and curating datasets.
  • Tiny and Easy to Deploy: Before Hyparquet-Writer, the only way to write Parquet files from the browser was through huge WASM bundles (DuckDB, DataFusion). Hyparquet-Writer is less than 100 KB of pure JavaScript, so it's trivial to include in modern frontend applications. A usage sketch follows below.
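
As a rough sketch of what exporting looks like, the snippet below assumes hyparquet-writer exposes a column-oriented parquetWriteBuffer call; the exact option and type names may differ, so treat it as illustrative rather than definitive.

  import { parquetWriteBuffer } from 'hyparquet-writer'

  // Column-oriented input: one entry per column (names and types are illustrative)
  const arrayBuffer = parquetWriteBuffer({
    columnData: [
      { name: 'id', data: [1, 2, 3], type: 'INT32' },
      { name: 'label', data: ['good', 'bad', 'good'], type: 'STRING' },
    ],
  })

  // In the browser, the resulting buffer can be offered to the user as a download
  const blob = new Blob([arrayBuffer], { type: 'application/octet-stream' })
  const downloadUrl = URL.createObjectURL(blob)
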
HighTable GitHub

HighTable: Scalable React Data Table Component

Hyperparam HighTable table component

HighTable is a React-based virtualized table component for viewing extremely large tables in the browser. It is the UI workhorse that displays data fetched by Hyparquet or other sources.

HighTable is crucial for visual data exploration. In Hyperparam's dataset viewer, HighTable renders the content of Parquet files, allowing you to scroll through data that far exceeds memory limitations. You can also embed HighTable in custom web apps where a large results table is needed (for example, viewing logs, telemetry, or any big tabular data) without losing interactivity. By handling only what's visible, it bridges the gap between big data backends and a smooth front-end experience.

HighTable provides:

  • Virtual Scrolling for Large Data: Instead of rendering thousands or millions of rows (which would choke the browser), HighTable only renders the rows in the current viewport, dynamically loading more as you scroll. This ensures smooth performance even with datasets that have millions of entries.
  • Asynchronous Data Loading: HighTable works with a flexible data model that can fetch data on the fly. The table requests rows for a given range (e.g., rows 100–200) through a provided function, so the data could come from an in-memory array, an IndexedDB store, or a remote source via Hyparquet. HighTable is agnostic to the source as long as it can retrieve row slices. This design allows infinite scrolling through data of “any size” (see the sketch after this list).
  • Rich Table Features: Despite focusing on scale, HighTable offers convenient features expected in a spreadsheet-like interface: optional column sorting, adjustable column widths, and event hooks (e.g., double-click on a cell). It even displays per-cell loading placeholders to indicate when data is being fetched, maintaining a responsive feel.
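
The sketch below illustrates the range-fetching pattern described above: a data source that only materializes the rows HighTable asks for. The import form, the property names of the data object, and the fetchRows helper are assumptions for illustration; consult the HighTable README for the real interface.

  import { HighTable } from 'hightable' // import form is an assumption

  // Hypothetical range-fetching data source: HighTable only requests the rows
  // currently in view, so fetchRows could read from memory, IndexedDB, or a
  // Parquet file via hyparquet. Property names here are illustrative.
  const data = {
    header: ['id', 'text'],
    numRows: 1_000_000,
    rows({ start, end }) {
      return fetchRows(start, end) // your own async slice loader (hypothetical)
    },
  }

  export default function Viewer() {
    return <HighTable data={data} />
  }
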
Icebird GitHub

Icebird: JavaScript Apache Iceberg Table Reader

Icebird extends Hyperparam's reach into data stored in the Apache Iceberg format. Iceberg is a popular table format for data lakes (often used on Hadoop/S3 storage) that stores its data as Parquet files under the hood. Importantly, Iceberg lets you efficiently evolve large datasets (add/remove rows, add columns, etc.). Icebird is essentially a JavaScript Iceberg client that can read Iceberg table metadata and retrieve its data files, built on top of Hyparquet.

If you are using Data Lake/Lakehouse architectures, Icebird makes it possible to inspect large Iceberg tables without a big data engine. A data engineer can point Hyperparam's viewer at an S3 path of an Iceberg table and quickly peek at a few rows or columns for validation. This is dramatically simpler than launching Spark or Trino for a small inspection task. Icebird brings our “no backend” philosophy to another major data format.

  • Iceberg Table Access: Given a pointer to an Iceberg table (for example, a directory or catalog entry on cloud storage), Icebird can read the table's schema and metadata, then use Hyparquet to read the actual Parquet data files that make up the table. It supports Iceberg features like schema evolution (e.g., renamed columns) and position deletes, with a roadmap to cover more features as needed.
  • Time Travel Queries: Icebird lets users retrieve data from older snapshots of the dataset (a core Iceberg feature) by specifying which metadata version to read. This is useful for auditing changes in data over time or reproducing an experiment on a previous dataset state – all from a browser environment. Both basic reads and time travel are shown in the sketch below.
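
A minimal sketch of both use cases, assuming icebird exposes an icebergRead call keyed by table URL, row range, and optionally a metadata version; the function and option names are recalled from memory and may differ, and the table URL is a placeholder.

  import { icebergRead } from 'icebird'

  const tableUrl = 'https://example.com/warehouse/events' // placeholder Iceberg table root

  // Peek at the first 10 rows without spinning up Spark or Trino
  const rows = await icebergRead({ tableUrl, rowStart: 0, rowEnd: 10 })

  // Time travel: read the same range from an older snapshot by naming its metadata file
  const oldRows = await icebergRead({
    tableUrl,
    metadataFileName: 'v1.metadata.json', // assumed option name
    rowStart: 0,
    rowEnd: 10,
  })
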
Hyllama GitHub

Hyllama: Llama.cpp Model Metadata Parser

Hyllama is a slightly different tool in Hyperparam's suite – it's focused on model files rather than dataset files. Specifically, Hyllama is a JavaScript library to parse llama.cpp .gguf files (a format for LLaMA and related large language model weights) and extract their metadata.

Hyllama's primary use case is to let users inspect an LLM model file's contents (architecture parameters, vocab size, layer counts, etc.) and even query its listed tokens or other metadata in the browser. For instance, you can drag-and-drop a .gguf model file onto a web page built with Hyllama and quickly see what architecture and quantization it has, without running the model. You can use Hyllama to introspect model files easily or to verify that a model file matches a dataset's schema expectations. A minimal sketch follows the list below.

  • Efficient Metadata Extraction: LLM model files in GGUF format can be tens of gigabytes, which is impractical to load entirely in memory. Hyllama is designed to read just the metadata (and tensor indexes) from the file without loading full weights, by using partial reads (e.g., reading the first few MBs that contain the header and index).
  • No Dependencies & Web-Friendly: Like Hyparquet, Hyllama is dependency-free and can run in both Node and browser environments. For browser use, it suggests employing HTTP range requests to fetch just the needed bytes of a model file.
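
A minimal sketch, assuming hyllama's ggufMetadata entry point, a placeholder model URL, and an arbitrary guess at how many bytes cover the GGUF header and tensor index.

  import { ggufMetadata } from 'hyllama'

  // Fetch only the first several MB of the model file via an HTTP range request,
  // enough to cover the GGUF header and tensor index (the size needed varies by model)
  const res = await fetch('https://example.com/model.gguf', { // placeholder URL
    headers: { Range: 'bytes=0-9999999' },
  })
  const buffer = await res.arrayBuffer()

  // Parse metadata and tensor info without loading any weights
  const { metadata, tensorInfos } = ggufMetadata(buffer)
  console.log(metadata['general.architecture'], tensorInfos.length)
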
Hyperparam CLI GitHub

Hyperparam CLI: Local Dataset Viewer

The Hyperparam CLI ties everything together into a user-facing application. It is a command-line tool that, when run (npx hyperparam), launches a local web application for dataset viewing. Essentially, it's a one-command way to spin up the Hyperparam browser UI on your own local data.

  • Scalable Local Dataset Viewer: By running the CLI, users can point it to a file, folder, or URL containing data and open an interactive browser view. For example, npx hyperparam mydataset.parquet will open the Hyperparam web UI and display the contents of that Parquet file in a scrollable table. If a directory is given, it provides a file browser to pick a dataset. Under the hood, the CLI uses Node.js to serve the static app and utilizes Hyparquet/Icebird libraries (via a built-in API) to fetch data from local disk or remote URLs, then displays it with HighTable in the browser.

How the Tools Work Together

Hyperparam's suite of open-source tools is the backbone of a cohesive ecosystem tailored specifically for machine learning data workflows, enabling interactive exploration and management directly in the browser. By integrating efficient in-browser data handling (Hyparquet and Icebird), scalable visualization (HighTable), intuitive data export capabilities (Hyparquet-Writer), and model metadata inspection (Hyllama), we hope to show that there is a better way to build data-centric ML tools. We are releasing this work as open source because we believe that everyone benefits from having a strong ecosystem of AI data tools.

If you find these free open source tools useful, please show it! We love GitHub Stars ⭐
