## Example Input
**prompt_concept:** "Code review assistant that identifies bugs and suggests improvements"
**use_case:** "Reviewing Python data science code for junior developers"
**target_models:** "Claude-3.5-Sonnet, GPT-4, Gemini-Pro"
## Example Output
### SIMPLE VERSION
**Prompt:** Review this Python code for bugs and suggest improvements. Focus on data science best practices.
**Predicted Strengths:** Fast execution, low token cost, works across all models, clear directive
**Predicted Weaknesses:** May miss subtle issues, inconsistent output format, lacks context for junior developers
**Best For:** Quick reviews, experienced developers, budget-conscious applications
**Test Scenarios:**
1. Pandas DataFrame manipulation with memory leaks
2. Machine learning pipeline with data leakage
3. Visualization code with performance issues
### DETAILED VERSION
**Prompt:** You are a senior Python data scientist conducting a thorough code review. Analyze the provided code and deliver feedback structured as: 1) BUGS (critical errors that break functionality) 2) IMPROVEMENTS (performance, readability, best practices) 3) LEARNING NOTES (explanations for junior developers). For each issue, provide: severity level, specific line references, corrected code example, and educational context about why this matters in data science workflows.
**Predicted Strengths:** Comprehensive analysis, consistent formatting, educational value, catches subtle issues
**Predicted Weaknesses:** Higher token cost, slower processing, may over-analyze simple code
**Best For:** Training environments, critical production code, thorough documentation needs
**Test Scenarios:** [Same 3 scenarios as above]
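To make the scenarios concrete, a test input for scenario 2 (ML pipeline with data leakage) might look like the sketch below: normalization statistics are computed on the full dataset before the train/test split, a classic leakage bug that both prompt versions would be scored on catching. The data values and split are hypothetical placeholders, not part of the protocol.

```python
# Sample input for test scenario 2: ML pipeline with data leakage.
# BUG: the feature is normalized with statistics computed on the FULL
# dataset before splitting, so test-set information leaks into training.

data = [3.1, 2.9, 5.0, 4.2, 6.8, 1.5, 3.3, 7.1]

# Leaky preprocessing: mean/std see every sample, including future test rows.
mean = sum(data) / len(data)
std = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
normalized = [(x - mean) / std for x in data]

# Split AFTER normalizing -- the test fold influenced the training scale.
train_leaky, test_leaky = normalized[:6], normalized[6:]

# Correct order: split first, then fit the scaler on the training fold only.
train_raw, test_raw = data[:6], data[6:]
train_mean = sum(train_raw) / len(train_raw)
train_std = (sum((x - train_mean) ** 2 for x in train_raw) / len(train_raw)) ** 0.5
train_ok = [(x - train_mean) / train_std for x in train_raw]
test_ok = [(x - train_mean) / train_std for x in test_raw]
```

A strong review response would flag the leaky ordering, reference the offending lines, and show the corrected split-then-fit version, which is what the detailed prompt's BUGS section explicitly asks for.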
### TESTING PROTOCOL
**Models to Test:** Claude-3.5-Sonnet (best for code analysis), GPT-4 (balanced performance), Gemini-Pro (cost-effective alternative)
**Success Metrics:** Bug detection rate (>90%), improvement relevance score, educational value rating
**Cost Analysis:** Simple version ~150 tokens, Detailed version ~400 tokens
**Timeline:** 48-hour testing window with 10 code samples per scenario
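The protocol above can be sketched as a scoring harness. The model call here is a stub, and the sample code and bug tags are placeholders; a real run would substitute API clients for Claude-3.5-Sonnet, GPT-4, and Gemini-Pro and the 10 seeded code samples per scenario.

```python
# Sketch of the testing protocol as a scoring harness (stubbed model call).

SIMPLE_PROMPT = (
    "Review this Python code for bugs and suggest improvements. "
    "Focus on data science best practices."
)
DETAILED_PROMPT = "You are a senior Python data scientist conducting a thorough code review. ..."

# In the full protocol: 10 samples per scenario, 3 scenarios. Each sample
# carries the set of bugs seeded into it so detection can be scored.
SAMPLES = [
    {"code": "...", "known_bugs": {"chained_assignment"}},  # DataFrame scenario
    {"code": "...", "known_bugs": {"fit_before_split"}},    # data-leakage scenario
]

def stub_model(prompt: str, code: str) -> set:
    """Hypothetical model: only the detailed prompt flags the leakage bug."""
    if "senior Python data scientist" in prompt:
        return {"fit_before_split"}
    return set()

def bug_detection_rate(prompt: str, samples, model) -> float:
    """Success metric: fraction of seeded bugs the model's review reports."""
    found = sum(len(model(prompt, s["code"]) & s["known_bugs"]) for s in samples)
    total = sum(len(s["known_bugs"]) for s in samples)
    return found / total

detailed_rate = bug_detection_rate(DETAILED_PROMPT, SAMPLES, stub_model)
simple_rate = bug_detection_rate(SIMPLE_PROMPT, SAMPLES, stub_model)
print(f"detailed: {detailed_rate:.0%}, simple: {simple_rate:.0%}")
```

Running each prompt version through the same scorer keeps the comparison fair: the >90% bug detection threshold and the cost figures can then be read off one results table per model.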
### HYPOTHESIS
For junior developers, the detailed version is predicted to outperform the simple version by 35% on comprehension metrics, while the simple version should be roughly 60% more cost-effective for experienced teams. Claude-3.5-Sonnet is expected to show the smallest performance gap between versions due to strong instruction-following.
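The ~60% cost-effectiveness figure is consistent with the token estimates in the testing protocol, as a quick back-of-the-envelope check shows (the 90-review total assumes one review per sample per model):

```python
# Arithmetic check of the hypothesis' cost claim against the protocol's
# token estimates (~150 tokens simple vs ~400 tokens detailed per review).
simple_tokens, detailed_tokens = 150, 400

# Per-review savings of the simple version: (400 - 150) / 400 = 62.5%,
# in line with the ~60% cost-effectiveness figure in the hypothesis.
savings = (detailed_tokens - simple_tokens) / detailed_tokens

# Over the full protocol: 10 samples x 3 scenarios x 3 models = 90 reviews.
runs = 10 * 3 * 3
simple_total = runs * simple_tokens      # 13,500 tokens
detailed_total = runs * detailed_tokens  # 36,000 tokens
```

Whether the detailed version's extra ~22,500 tokens buys a 35% comprehension gain is exactly what the 48-hour test window is meant to settle.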