How to debug wasted tool calls in LLM logs

Even when a failed tool call succeeds on a subsequent attempt, it still adds cost, latency and trace noise. This is my workflow for debugging wasted tool calls in LLM logs, separating necessary failures from avoidable ones and recording suggested prompt fixes. I'll also show you how to save the workflow as a skill.

Step 1: Load the dataset into Hyperparam

I start by loading the file into Hyperparam. In this example, the file contains chat logs from an AI assistant designed to fix GitHub issues. Once it's loaded, I can review each conversation as a row and start working with the dataset in chat.

Load the dataset into Hyperparam

Step 2: Add a column for failed tool calls

In the chat, I ask Hyperparam to surface failed tool calls across the dataset and add them as a new column. Reading each chat one by one isn't feasible, so this derived column gives me a row-level view of which conversations contain tool-call failures.

Add a column for failed tool calls

Step 3: Classify tool call failures as necessary or avoidable

Next, I classify each failed tool call as necessary or avoidable in a new column:

  • Necessary failures: the call was necessary to determine whether something existed or a path was correct.
  • Avoidable failures: the call failed due to wrong attribute names, shell syntax mistakes, unescaped pipe characters or other prompt-related errors.

Classify tool call failures

Step 4: Add a column with suggested prompt fixes

Then I ask Hyperparam to add a column with suggested prompt fixes for each avoidable failure.

Add suggested prompt fixes

This suggests updates to the prompt that I can implement in my production environment.

Suggested prompt fixes detail

Step 5: Capture the workflow as a reusable skill

Finally, I can save the workflow as a skill, which allows me to apply it to other datasets with similar tool-call issues, even if the underlying format is different. Now I can re-run this analysis every week to see how my model, tool and user behaviors change.

Save as a reusable skill

How to debug wasted tool calls in LLM logs - Hyperparam