Example Input
- app_name: 'CodeReviewBot'
- claude_model: 'claude-opus-4-6'
- observability_backend: 'Arthur.ai'
- observability_goal: 'Understand why some tool call chains take 3x longer than others and identify which tools are responsible for cost spikes'
- language: 'python'
- debug_scenario: 'A user reports that their PR review request hung for 45 seconds before returning an empty result'
Example Output
## Step 1 — Instrumentation Architecture
For CodeReviewBot, the typical execution graph looks like: user message → LLM turn 1 (plan) → tool calls (read_file, search_code, list_diff) in parallel or sequence → LLM turn 2 (synthesize) → final response. Each LLM turn and each tool invocation is a discrete span worth capturing.
Metadata to capture per LLM span: model ID, input/output token counts, stop reason, latency, request ID from the Anthropic response header. Per tool span: tool name, input arguments (truncated at 2 KB), output size in bytes, wall-clock duration, exception type if failed.
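The two span shapes above can be modeled explicitly before wiring in any backend. A sketch; the class and field names here are illustrative assumptions, not an Arthur.ai schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMSpan:
    # One record per Anthropic API call.
    model: str
    input_tokens: int
    output_tokens: int
    stop_reason: str
    latency_ms: int
    request_id: str  # from the response's request-id header

@dataclass
class ToolSpan:
    # One record per tool invocation.
    tool_name: str
    input_args: str  # repr of the arguments, truncated at 2 KB
    output_bytes: int
    latency_ms: int
    exception_type: Optional[str] = None  # None if the call succeeded

span = LLMSpan(model='claude-opus-4-6', input_tokens=1200, output_tokens=340,
               stop_reason='tool_use', latency_ms=2100, request_id='req_123')
```

Keeping the schema explicit makes it easy to validate spans locally before they ever reach the observability backend.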
## Step 2 — Implementation
```python
import functools
import os
import time
import uuid

from anthropic import Anthropic
from arthur_bench.client import ArthurClient  # Arthur.ai SDK

arthur = ArthurClient(api_key=os.environ['ARTHUR_API_KEY'])

class TracedAnthropic(Anthropic):
    """Anthropic client that logs one Arthur span per LLM request."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._root_trace_id = None

    def new_session(self):
        # One root trace ID per user session groups all spans together.
        self._root_trace_id = str(uuid.uuid4())
        return self._root_trace_id

    def messages_create(self, **kwargs):
        trace_id = self._root_trace_id or str(uuid.uuid4())
        start = time.perf_counter()
        try:
            # Distinct method name, so calling self.messages.create does not recurse.
            response = self.messages.create(**kwargs)
            arthur.log_trace(
                trace_id=trace_id, span='llm_request',
                model=kwargs.get('model'),
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
                latency_ms=round((time.perf_counter() - start) * 1000),
                stop_reason=response.stop_reason,
            )
            return response
        except Exception as e:
            arthur.log_trace(trace_id=trace_id, span='llm_request', error=str(e))
            raise

def traced_tool(fn):
    """Decorator: log one Arthur span per tool invocation."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            arthur.log_trace(span='tool_call', tool=fn.__name__,
                             latency_ms=round((time.perf_counter() - start) * 1000),
                             status='ok')
            return result
        except Exception as e:
            arthur.log_trace(span='tool_call', tool=fn.__name__,
                             status='error', error=str(e))
            raise
    return wrapper
```
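To see what the decorator pattern emits without an Arthur account, the same structure can be exercised against an in-memory list standing in for `arthur.log_trace`:

```python
import functools
import time

spans = []  # in-memory stand-in for arthur.log_trace

def traced_tool(fn):
    """Same shape as the production decorator, logging to a local list."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            spans.append({'span': 'tool_call', 'tool': fn.__name__,
                          'latency_ms': round((time.perf_counter() - start) * 1000),
                          'status': 'ok'})
            return result
        except Exception as e:
            spans.append({'span': 'tool_call', 'tool': fn.__name__,
                          'status': 'error', 'error': str(e)})
            raise
    return wrapper

@traced_tool
def read_file(path):
    return f'contents of {path}'

read_file('src/app.py')
print(spans[0]['tool'], spans[0]['status'])  # read_file ok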
## Step 3 — Dashboards
1. **Cost by Session** — group spans by root_trace_id, compute each span's cost as input_tokens × input rate + output_tokens × output rate for the model (the two rates differ), and plot session totals as a time series. Alert when a session's cost exceeds $0.50.
2. **Tool Latency p95** — histogram of tool_call latency_ms grouped by tool name. Identify which tool drives the long tail.
3. **LLM Request Anomaly** — rolling z-score on requests-per-minute; page when z > 3 for 5 consecutive minutes.
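The first two dashboards reduce to simple aggregations over logged spans. A sketch of the underlying math; the per-token rates are placeholders, not published pricing:

```python
from collections import defaultdict
from statistics import quantiles

# Placeholder dollar-per-token rates -- substitute the real model pricing.
INPUT_RATE, OUTPUT_RATE = 15e-6, 75e-6

def session_costs(llm_spans):
    """Dashboard 1: total dollar cost per root trace ID."""
    totals = defaultdict(float)
    for s in llm_spans:
        totals[s['trace_id']] += (s['input_tokens'] * INPUT_RATE
                                  + s['output_tokens'] * OUTPUT_RATE)
    return dict(totals)

def tool_p95(tool_spans, tool):
    """Dashboard 2: p95 latency for one tool (needs at least 2 samples)."""
    latencies = [s['latency_ms'] for s in tool_spans if s['tool'] == tool]
    return quantiles(latencies, n=20)[-1]  # last cut point = 95th percentile

costs = session_costs([
    {'trace_id': 'a', 'input_tokens': 10_000, 'output_tokens': 2_000},
    {'trace_id': 'a', 'input_tokens': 5_000, 'output_tokens': 1_000},
])
over_budget = {t for t, c in costs.items() if c > 0.50}
```

The anomaly dashboard (rolling z-score on request rate) is usually cheaper to build in the backend's query language than in application code, since the backend already buckets spans by minute.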
## Step 4 — Debug Walkthrough
To investigate the 45-second hang: filter Arthur traces where `root_trace_id` matches the affected session and sort spans by start_time. The trace will reveal a `read_file` tool span with latency_ms = 44,200 — the tool was reading a 180 MB binary asset instead of source files. The fix: add a file-size guard (raise if > 1 MB) in the `read_file` handler and log skipped files as warning spans.
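The file-size guard could look like the sketch below; `MAX_READ_BYTES`, the exception name, and the handler shape are assumptions about CodeReviewBot's internals, not code from the trace:

```python
import os

MAX_READ_BYTES = 1_000_000  # 1 MB guard from the walkthrough

class FileTooLargeError(ValueError):
    """Raised instead of blocking the review on a huge binary asset."""

def read_file(path: str) -> str:
    size = os.path.getsize(path)
    if size > MAX_READ_BYTES:
        # Callers should catch this and log a warning span rather than fail the review.
        raise FileTooLargeError(f'{path} is {size} bytes (> {MAX_READ_BYTES})')
    with open(path, encoding='utf-8', errors='replace') as f:
        return f.read()
```

Checking `os.path.getsize` before opening avoids ever buffering the 180 MB asset, which is what turned one tool span into a 44-second stall.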
**Remaining gaps:** Claude's internal chain-of-thought between tool calls is not observable via the SDK. Consider streaming responses and capturing partial text deltas as sub-spans to approximate reasoning latency per step.
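One way to approximate per-step latency is to time the gap before each streamed text delta: long gaps hint at tool execution or server-side processing between visible tokens. A minimal sketch that works over any chunk iterator (such as the `text_stream` from the Anthropic SDK's streaming helper); the pause threshold is an arbitrary assumption:

```python
import time

def time_chunks(chunks):
    """Yield (gap_ms, chunk): milliseconds elapsed before each chunk arrived."""
    last = time.perf_counter()
    for chunk in chunks:
        now = time.perf_counter()
        yield round((now - last) * 1000), chunk
        last = now

def stall_spans(chunks, pause_ms=500):
    """Collect the chunks preceded by a suspiciously long gap as sub-spans."""
    return [(gap, chunk) for gap, chunk in time_chunks(chunks) if gap >= pause_ms]
```

Because the function takes a plain iterator, it can be tested against a fake generator and later pointed at a live stream unchanged.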