Example Input
agent_runtime: OpenClaw
skill_format: markdown SKILL.md with YAML frontmatter
trace_store: Postgres (table: agent_traces)
deploy_surface: Git-tracked skills/ directory, PR-based rollout
improvement_goals: Reduce 'missing tool' failures by 50% over 30 days without any skill regressing on its existing eval set.
language: Python 3.12
Example Output
[Architecture diagram] trace_ingest → failure_classifier (Claude Sonnet) → candidate_generator (Claude Opus) → shadow_evaluator (parallel replay) → PR bot → human review → merge.
[Trace schema] pydantic Trace(id, agent_id, skill_id, tool_calls: list[ToolCall], outcome: Literal['ok','error','timeout'], latency_ms, tokens_in, tokens_out, created_at). SkillRevision(skill_id, parent_sha, diff_text, rationale, linked_traces: list[UUID], shadow_metrics: ShadowMetrics, status: Literal['shadow','promoted','rolled_back']).
[Classifier prompt] …
[Generator] reads ≥ 5 traces sharing a failure class, emits a unified diff against SKILL.md, and attaches a rationale referencing trace IDs.
[Shadow harness] replays the last 200 traces in parallel (asyncio.gather, semaphore=16) and computes pass_rate_delta, cost_delta, p95_delta.
[Promotion rule] promote iff pass_rate_delta ≥ +0.05 AND cost_delta ≤ +10% AND p95_delta ≤ +15% AND no regression on core_eval_suite.
[Rollback] wired to the live SLO dashboard: if error_rate > 2× baseline for 10 min, auto-revert the last 3 merged SkillRevisions and page on-call.
[Kill switch] SKILLS_AUTOPILOT=off disables the generator; existing skills are untouched.
Full Python code for each component provided.