A browser-native debugger for agent and chat logs

Understand what your agents and chatbots are actually doing. Read millions of rows of traces straight from S3, GCS, or Azure, find the failure modes before your users do, and ship fixes grounded in what happened.

What you can do with Hyperparam

Open Claude Code transcripts, Codex sessions, ChatGPT exports, or Langfuse, LangSmith, and Phoenix traces. Then explore them, surface issues, and ship improvements. Here are a few of the workflows teams run on their logs.

Debug wasted tool calls

Find avoidable tool failures in your agent traces and capture suggested prompt fixes:

  • Load your agent log file (Parquet or JSONL).
  • Ask chat to surface failed tool calls as a new column.
  • Classify each failure as avoidable (prompt or schema error) or necessary (probing the environment).
  • Generate a column of suggested prompt fixes for the avoidable ones.
  • Save the workflow as a skill to re-run on next week's logs.

Expected results: Failure categorization across the dataset, concrete prompt fixes you can ship, a reusable skill.
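In Hyperparam, chat handles these steps for you. The underlying logic is simple enough to sketch, though. The TypeScript below is a minimal illustration, not Hyperparam's implementation. It assumes a simplified, hypothetical JSONL record shape (`session`, `tool`, `isError`, `errorMessage`) rather than any real Claude Code, Codex, or Langfuse schema. It also replaces the model-based avoidable/necessary judgment with a crude keyword heuristic.

```typescript
// Hypothetical, simplified shape of one tool-call record in a JSONL agent log.
// Real Claude Code, Codex, or Langfuse exports use different field names.
interface ToolCallRecord {
  session: string;
  tool: string;
  isError: boolean;
  errorMessage?: string;
}

type FailureKind = "avoidable" | "necessary";

// Parse JSONL text (one JSON object per line), skipping blank lines.
function parseJsonl<T>(text: string): T[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as T);
}

// Crude keyword heuristic standing in for the LLM classification step:
// argument/schema errors suggest a fixable prompt or tool definition, while
// anything else is treated as the agent probing its environment.
function classifyFailure(message: string): FailureKind {
  const avoidablePatterns = [/invalid (argument|parameter)/i, /schema/i, /missing required/i];
  return avoidablePatterns.some((re) => re.test(message)) ? "avoidable" : "necessary";
}

// Keep only failed tool calls and attach a failure category to each one.
function classifyFailedCalls(records: ToolCallRecord[]) {
  return records
    .filter((r) => r.isError)
    .map((r) => ({ ...r, kind: classifyFailure(r.errorMessage ?? "") }));
}

const log = [
  '{"session":"s1","tool":"read_file","isError":false}',
  '{"session":"s1","tool":"read_file","isError":true,"errorMessage":"ENOENT: file not found"}',
  '{"session":"s2","tool":"search","isError":true,"errorMessage":"Missing required parameter: query"}',
].join("\n");

const failures = classifyFailedCalls(parseJsonl<ToolCallRecord>(log));
for (const f of failures) console.log(f.session, f.tool, f.kind);
// prints:
// s1 read_file necessary
// s2 search avoidable
```

The avoidable rows are the ones worth turning into prompt or tool-schema fixes. The necessary ones are the cost of exploring an unfamiliar environment.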

Filter out low-quality chat responses

Score and remove sycophantic or weak responses from your chatbot history:

  • Load your chat log dataset.
  • Ask chat to add a 0-1 sycophancy score for each row.
  • Sort or filter on the new column to inspect the worst offenders.
  • Create a SQL view of the rows you want to keep, then export it.

Expected results: New score column, filtered dataset, cleaned export ready for fine-tuning or eval.
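Here is a minimal TypeScript sketch of what the sort, filter, and export steps amount to. The row shape, the `sycophancy` field name, and the 0.5 cutoff are hypothetical. In Hyperparam, the equivalent of `keepRows` is a SQL view over the score column.

```typescript
// Hypothetical row shape: a chat response plus the 0-1 sycophancy score column
// added by the chat step. Field names are illustrative only.
interface ScoredRow {
  id: number;
  response: string;
  sycophancy: number; // 0 = not sycophantic, 1 = highly sycophantic
}

// Return the n highest-scoring rows, i.e. the worst offenders to inspect first.
function worstOffenders(rows: ScoredRow[], n: number): ScoredRow[] {
  return [...rows].sort((a, b) => b.sycophancy - a.sycophancy).slice(0, n);
}

// Keep rows at or below the threshold, roughly what a view like
// SELECT * FROM responses WHERE sycophancy <= 0.5 would return.
function keepRows(rows: ScoredRow[], threshold = 0.5): ScoredRow[] {
  return rows.filter((r) => r.sycophancy <= threshold);
}

// Serialize the kept rows as JSONL for a fine-tuning or eval pipeline.
function toJsonl(rows: ScoredRow[]): string {
  return rows.map((r) => JSON.stringify(r)).join("\n");
}

const rows: ScoredRow[] = [
  { id: 1, response: "Great question! You're absolutely right.", sycophancy: 0.9 },
  { id: 2, response: "The function returns null when the key is missing.", sycophancy: 0.1 },
  { id: 3, response: "That plan looks solid, but step 2 will fail.", sycophancy: 0.4 },
];

console.log(worstOffenders(rows, 1).map((r) => r.id)); // [ 1 ]
const kept = keepRows(rows);
console.log(kept.map((r) => r.id)); // [ 2, 3 ]
```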

Classify what your assistant is being asked to do

Categorize unstructured system prompts or user messages to see your real traffic mix:

  • Load your conversation dataset.
  • Ask chat to create a category column from the prompt text.
  • Sort and review to verify category assignments.
  • Build a SQL view to see the distribution of prompt patterns across the dataset.

Expected results: A categorical column and a clear picture of which prompt patterns dominate and where they fail.
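The distribution view is essentially a group-and-count over the new category column. Here is a minimal TypeScript sketch, assuming a hypothetical `category` field and made-up sample messages:

```typescript
// Hypothetical row shape: a user message plus the category column added by chat.
interface CategorizedRow {
  message: string;
  category: string;
}

// Count rows per category and compute each category's share of traffic,
// sorted from most to least common (like GROUP BY category ORDER BY count DESC).
function categoryDistribution(rows: CategorizedRow[]) {
  const counts = new Map<string, number>();
  for (const { category } of rows) {
    counts.set(category, (counts.get(category) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([category, count]) => ({ category, count, share: count / rows.length }))
    .sort((a, b) => b.count - a.count);
}

const sample: CategorizedRow[] = [
  { message: "Fix this failing test", category: "debugging" },
  { message: "Why does this throw a TypeError?", category: "debugging" },
  { message: "Write a README for this repo", category: "docs" },
  { message: "Rename this variable everywhere", category: "refactoring" },
];

for (const { category, count, share } of categoryDistribution(sample)) {
  console.log(`${category}: ${count} (${(share * 100).toFixed(0)}%)`);
}
// prints:
// debugging: 2 (50%)
// docs: 1 (25%)
// refactoring: 1 (25%)
```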

Compare model performance on real traces

Run multiple models against the same inputs from your log set and pick the best:

  • Load a dataset of inputs (your own or a public one).
  • Ask chat to run several models on the same inputs (for example, to summarize or answer them), with one output column per model.
  • Compare outputs side by side and have the agent explain the differences.
  • Get a recommendation for which model to use, grounded in your data.

Expected results: Side-by-side model outputs, qualitative comparison, a recommendation backed by real traces.

Features built for log-scale debugging

Powerful performance

Stream multi-gigabyte Parquet and JSONL traces directly from object storage. Lazy computation keeps billion-row tables responsive.

Time-to-first-data (lower is better):
  • Hyparquet: 155 ms
  • Parquet WASM: 593 ms
  • Parquet JS: 862 ms
  • DuckDB WASM: 3466 ms

Source: Hyparquet: The Quest for Instant Data
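One common way to keep very large tables responsive is to decode only the Parquet row groups that overlap the rows currently on screen. The sketch below illustrates that idea under stated assumptions. It assumes the row-group sizes have already been read from the file footer, and it uses a hypothetical one-billion-row file. It is not Hyperparam's or Hyparquet's actual code.

```typescript
// Given the row count of each Parquet row group (from the footer metadata),
// return the indexes of row groups overlapping the half-open window [start, end).
// A lazy viewer only needs to fetch and decode these groups.
function rowGroupsForWindow(rowGroupSizes: number[], start: number, end: number): number[] {
  if (end <= start) return []; // empty window: nothing to load
  const needed: number[] = [];
  let groupStart = 0;
  for (let i = 0; i < rowGroupSizes.length; i++) {
    const groupEnd = groupStart + rowGroupSizes[i];
    // Overlap test between [groupStart, groupEnd) and [start, end).
    if (groupStart < end && start < groupEnd) needed.push(i);
    if (groupEnd >= end) break; // every later group starts at or past the window end
    groupStart = groupEnd;
  }
  return needed;
}

// Hypothetical 1-billion-row file stored as 1,000 row groups of 1 million rows each.
const sizes = new Array<number>(1000).fill(1_000_000);

// Scrolling to rows 2,999,990 through 3,000,009 touches only row groups 2 and 3.
console.log(rowGroupsForWindow(sizes, 2_999_990, 3_000_010)); // [ 2, 3 ]
```

The cost of rendering a screenful of rows then depends on the size of a couple of row groups, not on the total row count.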

AI agent that works with you

Ask in plain language: which sessions failed, where the agent looped, which tool calls burned tokens. The agent searches your logs, adds columns, builds SQL views, and surfaces the answers.

Join across all your sources

Combine local files, S3, GCS, Azure Blob, Hugging Face, Iceberg tables, and GitHub repos in a single workspace. Run SQL across sources to correlate agent behavior with the code, issues, and prompts that drove it.

Browser-first by design

Credentials stay in the browser. HTTP range requests pull only the bytes needed, with no server-side proxy and no warehouse round-trip. Local files never leave your machine unless you choose to upload them.
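Here is a concrete version of the range-request point. A Parquet file ends with its metadata, then a 4-byte little-endian metadata length, then the magic bytes `PAR1`. A client can therefore locate the schema and row groups with two small requests instead of downloading the whole object. The TypeScript below is a minimal sketch of that pattern using the standard `fetch` API. It assumes the URL (for example a presigned S3 or GCS link) honors `Range` requests and exposes `Content-Range` through CORS. `readParquetMetadataBytes` is a hypothetical helper name, not part of Hyperparam.

```typescript
// A Parquet file ends with: <FileMetaData (Thrift)> <4-byte LE metadata length> "PAR1".
// Parse the final 8 bytes and return the metadata length.
function parseFooterTail(tail: ArrayBuffer): number {
  if (tail.byteLength !== 8) throw new Error("expected the last 8 bytes of the file");
  const view = new DataView(tail);
  const magic = String.fromCharCode(
    view.getUint8(4), view.getUint8(5), view.getUint8(6), view.getUint8(7),
  );
  if (magic !== "PAR1") throw new Error("not a Parquet file");
  return view.getUint32(0, true); // little-endian metadata length
}

// Inclusive byte range for an HTTP Range header covering just the metadata.
function metadataRange(fileSize: number, metadataLength: number): string {
  const start = fileSize - 8 - metadataLength;
  const end = fileSize - 9; // last byte before the 8-byte tail
  return `bytes=${start}-${end}`;
}

// Fetch only the footer of a remote Parquet file with two range requests.
async function readParquetMetadataBytes(url: string): Promise<ArrayBuffer> {
  const tailRes = await fetch(url, { headers: { Range: "bytes=-8" } });
  if (tailRes.status !== 206) throw new Error(`range request not honored: ${tailRes.status}`);
  // Content-Range looks like "bytes 1234-1241/1242"; the total size follows the slash.
  const contentRange = tailRes.headers.get("Content-Range") ?? "";
  const fileSize = Number(contentRange.split("/")[1]);
  if (!Number.isFinite(fileSize)) throw new Error("missing or unreadable Content-Range header");
  const metadataLength = parseFooterTail(await tailRes.arrayBuffer());
  const metaRes = await fetch(url, { headers: { Range: metadataRange(fileSize, metadataLength) } });
  if (metaRes.status !== 206) throw new Error(`range request not honored: ${metaRes.status}`);
  return metaRes.arrayBuffer(); // Thrift-encoded FileMetaData, still to be decoded
}
```

Decoding the returned Thrift bytes into a schema and row-group offsets is the job of a Parquet reader such as Hyparquet. After that, each column chunk can be fetched with further range requests in the same way.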

Supported formats

Open Parquet, JSONL, and CSV files at multi-gigabyte scale. Drill into nested conversations, tool calls, and reasoning steps without flattening them first.

Skills and export

Save a workflow once and re-run it on next week's logs. Export filtered datasets as Parquet, JSONL, or CSV to feed your eval, fine-tuning, or prompt-engineering pipelines.

Where Hyperparam fits: HypStack

Hyperparam is the analysis side of an open stack we call HypStack. Both pieces are open and work directly against storage you own, so your AI traces stay in your bucket, in your IAM, in your region. No vendor in the middle, no per-GB ingest fees, no copies of sensitive prompts behind someone else's API.

Collect with HypAware (OpenTelemetry)

HypAware is an OTel collector for AI workloads, deployable across your fleet via your existing MDM. It captures telemetry from developer laptops, IDE plugins, agents, MCP tool calls, chatbots, and production LLM features. Pure OTel underneath, so anything that already speaks OTLP just works. Traces land as Apache Iceberg or Parquet in your own S3, GCS, or Azure account: object-storage prices, unlimited retention, and same-day interop with Snowflake, Databricks, DuckDB, Spark, and Trino.

Analyze with Hyperparam

A browser-native client that reads Iceberg and Parquet directly from your bucket. No cluster, no SQL endpoint, no ingestion. Join trace tables with GitHub repos, other buckets, other Iceberg tables, and existing warehouse exports in one workspace. Credentials stay in the browser; an agent can query its own traces with no auth proxy in between.

The point is closing the loop. Hyperparam surfaces what is breaking in your traces (token-burn hotspots, retry loops, tool failures, rabbit-holes), then those findings feed back into the prompts, tools, and skills your team already maintains as markdown. The agents pick up the change. The cycle repeats.

Read the HypStack overview >>
