Building Effective Agents

Published: December 19, 2024

Overview

Anthropic shares insights from working with dozens of teams building LLM agents. The key finding: “the most successful implementations use simple, composable patterns rather than complex frameworks.”

What Are Agents?

Anthropic distinguishes between two types of agentic systems:

  • Workflows: LLMs and tools orchestrated through predefined code paths
  • Agents: Systems where LLMs dynamically direct their own processes and tool usage

When to Use Agents

Not all applications need agents. The guidance: start simple and increase complexity only when necessary. Agentic systems trade increased latency and cost for better task performance—a tradeoff worth considering carefully.

Workflows suit well-defined tasks requiring predictability. Agents work better for open-ended problems needing flexibility and model-driven decision-making at scale.

Framework Considerations

Popular frameworks include:

  • Claude Agent SDK
  • Strands Agents SDK by AWS
  • Rivet (GUI LLM workflow builder)
  • Vellum (GUI tool for complex workflows)

While frameworks simplify implementation, they can obscure underlying prompts and encourage unnecessary complexity. Recommendation: “start by using LLM APIs directly; many patterns can be implemented in a few lines of code.”

Core Patterns

Building Block: The Augmented LLM

The foundation combines an LLM with retrieval, tools, and memory capabilities. Focus on tailoring these to your use case and providing clear interfaces.
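As a rough sketch of this building block, the following combines a model call with retrieval, tools, and memory behind one interface. The `call_llm` and `retrieve` functions are hypothetical stubs standing in for a real model API and vector search:

```python
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    # Placeholder for a real model API call.
    return f"[model response to: {prompt[:40]}]"

def retrieve(query: str) -> list[str]:
    # Placeholder retrieval step (e.g. vector search).
    return [f"doc about {query}"]

@dataclass
class AugmentedLLM:
    """An LLM wired to retrieval, tools, and short-term memory."""
    tools: dict = field(default_factory=dict)
    memory: list = field(default_factory=list)

    def answer(self, question: str) -> str:
        context = retrieve(question)        # retrieval
        self.memory.append(question)        # memory
        prompt = (
            f"Context: {context}\n"
            f"History: {self.memory[-5:]}\n"
            f"Available tools: {list(self.tools)}\n"
            f"Question: {question}"
        )
        return call_llm(prompt)             # generation
```

The point is the clean interface: the rest of the system calls `answer` and never needs to know how retrieval, memory, or tools are wired in.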

Workflow: Prompt Chaining

Decomposes a task into sequential steps, where each LLM call processes the output of the previous one. Ideal for tasks that decompose cleanly into fixed subtasks.

Examples: Marketing copy generation then translation; document outline creation before writing.
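A chain like the marketing-copy example can be a few lines of code. `call_llm` below is a hypothetical stub for a real model call:

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real model API call.
    return f"output({prompt})"

def chain(task: str, steps: list[str]) -> str:
    """Run a fixed sequence of prompts, feeding each output forward."""
    result = task
    for step in steps:
        result = call_llm(f"{step}\n\nInput: {result}")
        # A programmatic "gate" could check each intermediate
        # result here before continuing.
    return result

copy = chain(
    "Announce our new product.",
    ["Write marketing copy for:", "Translate into French:"],
)
```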

Workflow: Routing

Classifies inputs and directs them to specialized downstream handlers. This separation of concerns lets you optimize handling for each category without improvements for one input type degrading performance on others.

Examples: Categorizing customer service queries; routing simple questions to smaller models, complex ones to capable models.
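A minimal routing sketch, with a keyword heuristic standing in for the classifier (in practice a small LLM call) and stubbed model calls:

```python
def call_llm(prompt: str, model: str = "large") -> str:
    # Placeholder for a real model API call.
    return f"[{model}] reply"

def classify(query: str) -> str:
    # In practice an LLM or trained classifier picks the route;
    # a keyword heuristic stands in here.
    if "refund" in query.lower():
        return "refund"
    if len(query.split()) < 8:
        return "simple"
    return "complex"

ROUTES = {
    "refund": lambda q: call_llm(f"Handle refund request: {q}"),
    "simple": lambda q: call_llm(q, model="small"),   # cheap model
    "complex": lambda q: call_llm(q, model="large"),  # capable model
}

def route(query: str) -> str:
    return ROUTES[classify(query)](query)
```

Each route gets its own specialized prompt (or model tier), so improving one category cannot regress the others.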

Workflow: Parallelization

Runs multiple LLM calls simultaneously:

  • Sectioning: Breaking tasks into independent parallel subtasks
  • Voting: Running the same task multiple times for diverse outputs

Examples: Parallel guardrails; code vulnerability reviews; content appropriateness evaluation.
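Both variants can be sketched with a thread pool; `call_llm` is a deterministic stub here, whereas a real model would give the vote its diversity:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Placeholder guardrail check; a real call would be nondeterministic.
    return "unsafe" if "attack" in prompt else "safe"

def vote(prompt: str, n: int = 3) -> str:
    """Voting: run the same check n times and take the majority answer."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(call_llm, [prompt] * n))
    return Counter(results).most_common(1)[0][0]

def section(subtasks: list[str]) -> list[str]:
    """Sectioning: run independent subtasks in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(call_llm, subtasks))
```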

Workflow: Orchestrator-Workers

A central LLM breaks down tasks dynamically and delegates to worker LLMs. Differs from parallelization through flexibility—subtasks aren’t pre-defined.

Examples: Complex code changes across multiple files; multi-source information gathering.
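A sketch of the pattern, with stubbed calls; the key difference from parallelization is that `plan` would be an LLM call deciding the subtasks at runtime (a fixed split stands in here):

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real model API call.
    return f"done: {prompt}"

def plan(task: str) -> list[str]:
    # In a real system the orchestrator LLM decides the subtasks
    # dynamically based on the input.
    return [f"{task} (file {i})" for i in range(2)]

def orchestrate(task: str) -> str:
    subtasks = plan(task)                       # orchestrator decomposes
    results = [call_llm(s) for s in subtasks]   # workers execute
    return call_llm(f"Synthesize: {results}")   # orchestrator combines
```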

Workflow: Evaluator-Optimizer

One LLM generates responses while another provides evaluation feedback in a loop. Most effective when there are clear evaluation criteria and iteration measurably improves output.

Examples: Literary translation refinement; iterative search tasks.
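The generate-evaluate loop can be sketched as below. Both `generate` and `evaluate` are toy stand-ins for separate LLM calls, with an artificial "all caps" criterion playing the role of a real rubric:

```python
def generate(task: str, feedback: str = "") -> str:
    # Placeholder generator; feedback would steer a real model.
    return task.upper() if feedback else task

def evaluate(draft: str) -> tuple[bool, str]:
    # Placeholder evaluator with an explicit, checkable criterion.
    ok = draft.isupper()
    return ok, "" if ok else "use formal register"

def refine(task: str, max_rounds: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_rounds):
        ok, feedback = evaluate(draft)
        if ok:
            break
        draft = generate(task, feedback)  # loop until criteria pass
    return draft
```

The `max_rounds` cap matters: without clear stopping criteria the loop can burn tokens without converging.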

Agents

Agents operate autonomously based on environmental feedback. They begin with user direction, plan independently, and can request human input at checkpoints.

“Agents are typically just LLMs using tools based on environmental feedback in a loop.” Therefore, clear toolset design and documentation are crucial.

When appropriate: Open-ended problems where steps can’t be predicted or hardcoded.

Examples: Resolving GitHub issues (SWE-bench); computer use implementations.

Important: Higher costs and potential for compounding errors require sandboxed testing and appropriate guardrails.
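The "LLM using tools in a loop" described above reduces to a short sketch. The model stub, the `search` tool, and the message format are all hypothetical; a real agent would use an actual model API and tool schema, but the control flow is the same:

```python
def call_llm(messages: list[dict]) -> dict:
    # Placeholder: a real model returns either a tool call or a final answer.
    last = messages[-1]["content"]
    if "result" in last:
        return {"type": "final", "content": "task complete"}
    return {"type": "tool", "name": "search", "args": last}

TOOLS = {"search": lambda q: f"result for {q}"}  # hypothetical tool

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):               # guardrail: hard step cap
        action = call_llm(messages)
        if action["type"] == "final":
            return action["content"]
        # Execute the tool and feed environmental feedback back in.
        observation = TOOLS[action["name"]](action["args"])
        messages.append({"role": "tool", "content": observation})
    return "stopped: step limit reached"
```

The step cap is one of the guardrails the text calls for; without it, a compounding-error loop runs indefinitely.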

Three Core Principles for Implementation

  1. Simplicity: Keep agent design straightforward
  2. Transparency: Explicitly show planning steps
  3. Tool Documentation and Testing: Invest in agent-computer interfaces (ACI) as much as human-computer interfaces (HCI)

Appendix 1: Agents in Practice

Customer Support

Agents excel here because support combines conversational flow with access to external data and actions. Tools retrieve customer information; agents execute refunds and update tickets. Success is clearly measurable: whether the issue was resolved.

Coding Agents

Software development suits agents because:

  • Code solutions are verifiable through tests
  • Agents iterate using test feedback
  • Problem space is well-defined
  • Quality is objectively measurable

Anthropic’s agents now solve real GitHub issues on the SWE-bench Verified benchmark, though human review remains essential for broader system alignment.

Appendix 2: Prompt Engineering Your Tools

Tool specifications deserve as much prompt-engineering attention as the prompts themselves.

Format Selection Guidance

  • Give the model sufficient tokens to “think” before committing to output
  • Keep formats close to natural internet text
  • Eliminate formatting overhead (avoiding unnecessary counting or escaping)

Agent-Computer Interface Design

  • Test thoroughly: Run diverse inputs to identify mistakes
  • Use clear parameter names and descriptions
  • Include example usage, edge cases, and input requirements
  • Apply “poka-yoke” principles—design tools so mistakes are harder to make

Anthropic spent more time optimizing tools than the overall prompt for SWE-bench. Example: requiring absolute filepaths eliminated a whole class of relative-path errors.
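A hypothetical tool specification illustrating this guidance: clear names, an example in the description, explicit input requirements, and a poka-yoke design that rejects the relative-path mistake outright (the schema shape below is illustrative, not any particular API's format):

```python
READ_FILE_TOOL = {
    "name": "read_file",
    "description": (
        "Read a text file and return its contents.\n"
        "Example: read_file(path='/repo/src/main.py')\n"
        "Edge cases: returns an error string if the file is missing."
    ),
    "parameters": {
        "path": {
            "type": "string",
            "description": "Absolute filepath (must start with '/'). "
                           "Relative paths are rejected.",
        }
    },
}

def read_file(path: str) -> str:
    # Poka-yoke: make the common mistake impossible rather than
    # documenting around it.
    if not path.startswith("/"):
        return "error: path must be absolute"
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:
        return f"error: {e}"
```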

Summary

Success means building the right system for your needs, not the most sophisticated. “Start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when simpler solutions fall short.”