A browser-native debugger for agent and chat logs
Understand what your agents and chatbots are actually doing. Read millions of rows of traces straight from S3, GCS, or Azure, find the failure modes before your users do, and ship fixes grounded in what happened.
What you can do with Hyperparam
Open Claude Code transcripts, Codex sessions, ChatGPT exports, or traces from Langfuse, LangSmith, and Phoenix. Then explore, surface issues, and improve. Here are a few of the workflows teams run on their logs.
Debug wasted tool calls
Find avoidable tool failures in your agent traces and capture suggested prompt fixes:
- Load your agent log file (Parquet or JSONL).
- Ask chat to surface failed tool calls as a new column.
- Classify each failure as avoidable (prompt or schema error) or necessary (probing the environment).
- Generate a column of suggested prompt fixes for the avoidable ones.
- Save the workflow as a skill to re-run on next week's logs.
Expected results: Failure categorization across the dataset, concrete prompt fixes you can ship, and a reusable skill.
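To make the shape of this concrete, here is a rough sketch of the kind of view chat might build for the avoidable failures, in a DuckDB-style SQL dialect. The table and column names (`traces`, `tool_error`, `failure_kind`, `suggested_fix`) are hypothetical stand-ins for whatever chat generates against your actual schema.

```sql
-- Hypothetical schema: chat has added `tool_error` (failure message,
-- NULL on success), `failure_kind` ('avoidable' or 'necessary'),
-- and `suggested_fix` (a proposed prompt change) to the traces table.
CREATE VIEW avoidable_failures AS
SELECT session_id, tool_name, tool_error, suggested_fix
FROM traces
WHERE tool_error IS NOT NULL
  AND failure_kind = 'avoidable';
```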
Filter out low-quality chat responses
Score and remove sycophantic or weak responses from your chatbot history:
- Load your chat log dataset.
- Ask chat to add a 0-1 sycophancy score for each row.
- Sort or filter on the new column to inspect the worst offenders.
- Create a SQL view of the rows you want to keep and export.
Expected results: New score column, filtered dataset, and a cleaned export ready for fine-tuning or eval.
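The keep-view from step 4 might look like the sketch below, again in a DuckDB-style dialect. `chat_logs`, `sycophancy_score`, and the 0.3 cutoff are illustrative, not fixed parts of the product.

```sql
-- Keep only rows scoring below a chosen sycophancy threshold.
-- The 0.3 cutoff is a judgment call; inspect the worst offenders
-- first (step 3) and tune it to your data.
CREATE VIEW keep_rows AS
SELECT *
FROM chat_logs
WHERE sycophancy_score < 0.3;
```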
Classify what your assistant is being asked to do
Categorize unstructured system prompts or user messages to see your real traffic mix:
- Load your conversation dataset.
- Ask chat to create a category column from the prompt text.
- Sort and review to verify category assignments.
- Build a SQL view to see the distribution of prompt patterns across the dataset.
Expected results: A categorical column, plus a clear picture of which prompt patterns dominate and where they fail.
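A distribution view like the one in step 4 can be as simple as a grouped count. `conversations` and `category` are hypothetical names; the window function computes each category's share of total traffic.

```sql
-- One row per category, with count and percentage of all prompts.
SELECT category,
       COUNT(*) AS prompts,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM conversations
GROUP BY category
ORDER BY prompts DESC;
```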
Compare model performance on real traces
Run multiple models against the same inputs from your log set and pick the best:
- Load a dataset of inputs (your own or a public one).
- Ask chat to summarize or answer with multiple models in parallel columns.
- Compare outputs side by side and have the agent explain the differences.
- Get a recommendation for which model to use, grounded in your data.
Expected results: Side-by-side model outputs, qualitative comparison, and a recommendation backed by real traces.
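Once chat has filled one output column per model, a disagreement filter is a natural way to focus the side-by-side review. The table and column names below are hypothetical.

```sql
-- Surface only the rows where the two models diverge, so the
-- side-by-side review spends time on real differences.
SELECT input,
       answer_model_a,
       answer_model_b
FROM model_runs
WHERE answer_model_a IS DISTINCT FROM answer_model_b
LIMIT 50;
```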
Features built for log-scale debugging
Powerful performance
Stream multi-gigabyte Parquet and JSONL traces directly from object storage. Lazy computation keeps billion-row tables responsive.
AI agent that works with you
Ask in plain language: which sessions failed, where the agent looped, which tool calls burned tokens. The agent searches your logs, adds columns, builds SQL views, and surfaces the answers.
Join across all your sources
Combine local files, S3, GCS, Azure Blob, Hugging Face, Iceberg tables, and GitHub repos in a single workspace. Run SQL across sources to correlate agent behavior with the code, issues, and prompts that drove it.
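As an illustration of a cross-source query, the sketch below joins an S3 trace file against a local CSV of prompt versions. The file paths, the join key, and the DuckDB-style ability to reference files directly in FROM are assumptions made for the example; Hyperparam's exact source-reference syntax may differ.

```sql
-- Correlate tool failures in S3 traces with the prompt version
-- that produced them (paths and columns are hypothetical).
SELECT t.session_id, t.tool_error, p.prompt_version
FROM 's3://my-bucket/agent-traces.parquet' AS t
JOIN 'prompt_versions.csv' AS p
  ON t.prompt_id = p.prompt_id
WHERE t.tool_error IS NOT NULL;
```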
Browser-first by design
Credentials stay in the browser. HTTP range requests pull only the bytes needed, with no server-side proxy and no warehouse round-trip. Local files never leave your machine unless you choose to upload them.
Supported formats
Open Parquet, JSONL, and CSV files at multi-gigabyte scale. Drill into nested conversations, tool calls, and reasoning steps without flattening them first.
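For example, a nested list of tool-call structs can be queried in place rather than flattened ahead of time. The sketch below uses DuckDB-style UNNEST; `tool_calls` and its fields are a hypothetical schema.

```sql
-- One row per tool call, pulled out of a nested list-of-structs
-- column without preprocessing the file.
SELECT session_id, call.name AS tool, call.error AS error
FROM (
  SELECT session_id, UNNEST(tool_calls) AS call
  FROM traces
) AS calls
WHERE call.error IS NOT NULL;
```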
Skills and export
Save a workflow once and re-run it on next week's logs. Export filtered datasets as Parquet, JSONL, or CSV to feed your eval, fine-tuning, or prompt-engineering pipelines.
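In DuckDB-style SQL the export step can also be expressed directly. The view name (`keep_rows`, from the earlier filtering sketch) and output path are placeholders.

```sql
-- Write the filtered view out as Parquet for downstream pipelines.
COPY (SELECT * FROM keep_rows)
TO 'cleaned_chat_logs.parquet' (FORMAT PARQUET);
```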
