AI Agent Training & Tuning

Learn to build, train, and deploy custom AI agents — from persona design and prompt engineering to RAG, evaluation, and production monitoring.

19 lessons in 5 modules

Course curriculum

Module 1: Agent Fundamentals

Understand what makes a good AI agent, master system prompt design, and build your first agent persona.

4 lessons

What Makes a Good AI Agent?

15m
Free preview

Characteristics of effective agents, agent vs chatbot, reliability

Agent vs. Chatbot

The terms are often used interchangeably, but they describe fundamentally different systems:

Chatbot:

  • Responds to messages in a conversation
  • Follows a script or generates text
  • Cannot take actions in the real world
  • Forgets everything between sessions
  • Example: Customer support FAQ bot

Agent:

  • Receives goals, not just messages
  • Plans a sequence of steps to achieve the goal
  • Uses tools (APIs, databases, search) to take actions
  • Maintains memory across interactions
  • Evaluates its own progress and adjusts
  • Example: Sales assistant that researches prospects, drafts emails, schedules meetings
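The distinction above can be sketched as a minimal agent loop: a goal comes in, the agent plans a step, acts with a tool, and keeps memory between steps. The `plan` and `lookup` functions here are hypothetical placeholders — in a real agent, planning would be an LLM call and tools would hit real APIs.

```python
# Minimal agent loop: goal in, plan a step, act with a tool, check progress.
# The planner and tool below are stand-ins for an LLM call and a real API.

def lookup(query):
    """Stand-in tool (in practice: an API call, database query, or search)."""
    return f"result for {query!r}"

TOOLS = {"lookup": lookup}

def plan(goal, memory):
    """Stand-in planner: a real agent would ask an LLM for the next step."""
    if not memory:
        return ("lookup", goal)   # first step: gather information
    return None                   # stop once we have a result

def run_agent(goal, max_steps=5):
    memory = []                   # persists across steps, unlike a chatbot turn
    for _ in range(max_steps):
        step = plan(goal, memory)
        if step is None:          # the agent judges the goal achieved
            break
        tool_name, arg = step
        memory.append(TOOLS[tool_name](arg))
    return memory

print(run_agent("research Acme Corp"))
```

A chatbot, by contrast, collapses to a single `respond(message)` call with no plan, tools, or memory.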

The 5 Qualities of a Good Agent

1. Reliability

The agent produces consistent, correct results. If you give it the same task 10 times, it should succeed 9+ times.

How to measure: Run the same task 20 times and count the success rate. Target: 90%+ for production agents.

2. Accuracy

The agent's outputs are factually correct and well-reasoned.

How to measure: Human review of output quality on a 1-5 scale. Target: Average 4.0+ across evaluations.

3. Relevance

The agent stays on topic and produces outputs that match the user's intent.

How to measure: For each output, make a binary yes/no judgment: does it answer the actual question? Target: 95%+ relevance rate.

4. Safety

The agent does not produce harmful, biased, or inappropriate content.

How to measure: Red-team testing with adversarial prompts. Target: 0% harmful outputs in red-team testing.

5. Efficiency

The agent completes tasks with minimal tokens, API calls, and time.

How to measure: Track tokens used, API calls made, and wall-clock time. Target: Decreasing cost per task over time through optimization.

Task Completion as the Core Metric

Ultimately, an agent is good if it completes the task the user intended. All other metrics are proxies for this.

Measuring Task Completion:

  1. Define clear success criteria for each task type
  2. Run the agent on 50+ test cases
  3. Score each: Complete (1), Partial (0.5), Failed (0)
  4. Calculate completion rate
  5. Investigate all failures and partials
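The scoring scheme above is easy to automate. A small sketch, assuming your test run produces a list of `{"case": ..., "outcome": ...}` records (the field names and example cases are illustrative):

```python
# Score task-completion results: Complete = 1.0, Partial = 0.5, Failed = 0.0.
# `results` would come from running the agent on your 50+ test cases.

SCORES = {"complete": 1.0, "partial": 0.5, "failed": 0.0}

def completion_rate(results):
    """Return the overall completion rate plus the cases to investigate."""
    total = sum(SCORES[r["outcome"]] for r in results)
    rate = total / len(results)
    # Per step 5: every partial or failed case deserves a closer look.
    to_investigate = [r["case"] for r in results if r["outcome"] != "complete"]
    return rate, to_investigate

results = [
    {"case": "lead-research-01", "outcome": "complete"},
    {"case": "lead-research-02", "outcome": "partial"},
    {"case": "email-draft-01", "outcome": "complete"},
    {"case": "email-draft-02", "outcome": "failed"},
]
rate, todo = completion_rate(results)
print(f"Completion rate: {rate:.1%}")  # (1 + 0.5 + 1 + 0) / 4 = 62.5%
```

The returned `to_investigate` list keeps step 5 honest: a completion rate without a failure review hides exactly the cases you need to study.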

The Agent Development Lifecycle

  1. Define — What task should the agent accomplish?
  2. Design — System prompt, tools, persona
  3. Build — Connect LLM, tools, memory
  4. Test — Run against test cases
  5. Evaluate — Measure quality metrics
  6. Iterate — Improve prompts, tools, logic
  7. Deploy — Put in production with monitoring
  8. Monitor — Watch for drift, errors, cost
  9. Improve — Continuous refinement based on real usage

Pro Tip

Start with the simplest possible agent that could solve the task. Add complexity only when you have evidence that simplicity is not enough. The most reliable agents are the simplest ones.

Prompt Engineering for Agents — System Prompts & Personas

25m
Free preview

System prompt anatomy, persona design, and instruction following

The System Prompt Is Everything

The system prompt is the DNA of your agent. It defines WHO the agent is, WHAT it knows, HOW it behaves, and WHAT it will NOT do.

Anatomy of a System Prompt

Section 1: Identity & Role

You are a senior sales development representative AI agent for TechCorp, a B2B SaaS company that sells project management software to mid-market companies (100-5000 employees).

Section 2: Core Responsibilities

Your responsibilities:
1. Research incoming leads to understand their company and role
2. Classify leads by fit (ICP match) and intent (buying signals)
3. Draft personalized outreach emails based on research
4. Score leads on a 0-100 scale with explanation

Section 3: Tools Available

You have access to these tools:
- search_company(domain): Returns company info (size, industry, revenue)
- search_linkedin(name, company): Returns role, tenure, connections
- search_crm(email): Returns existing CRM records and history
- send_email(to, subject, body): Sends an email (requires approval)

Section 4: Rules & Constraints

Rules:
- NEVER fabricate company data — if you cannot find it, say so
- NEVER send emails without human approval
- Keep email drafts under 150 words
- Always include a specific pain point relevant to their industry
- If lead score is below 30, recommend "Do Not Contact"

Section 5: Output Format

Always respond in this JSON format:
{
  "lead_score": 0-100,
  "classification": "hot|warm|cold|disqualified",
  "research_summary": "2-3 sentence summary",
  "email_draft": {
    "subject": "...",
    "body": "..."
  },
  "reasoning": "Why this score and classification"
}

Section 6: Examples (Few-Shot)

Example 1:
Input: jane@acme.com, VP of Engineering, 500-person company
Output: {"lead_score": 85, "classification": "hot", ...}

Example 2:
Input: bob@freelancer.com, Solo consultant
Output: {"lead_score": 15, "classification": "disqualified", "reasoning": "Below minimum company size"}
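A structured output format like Section 5's is only useful if you enforce it. Here is a sketch of a validator for that format — the field names come from the format above, but the specific checks (and the word-count rule pulled from Section 4) are illustrative:

```python
import json

# Validate one agent reply against the output format in Section 5.
REQUIRED_KEYS = {"lead_score", "classification", "research_summary",
                 "email_draft", "reasoning"}
CLASSIFICATIONS = {"hot", "warm", "cold", "disqualified"}

def validate_output(raw):
    """Return (ok, errors) for one raw agent response string."""
    errors = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, ["response is not valid JSON"]
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    score = data.get("lead_score")
    if not isinstance(score, (int, float)) or not 0 <= score <= 100:
        errors.append("lead_score must be a number in 0-100")
    if data.get("classification") not in CLASSIFICATIONS:
        errors.append("classification must be hot|warm|cold|disqualified")
    draft = data.get("email_draft", {})
    if isinstance(draft, dict) and len(draft.get("body", "").split()) > 150:
        errors.append("email body exceeds 150 words")  # rule from Section 4
    return not errors, errors

ok, errs = validate_output('{"lead_score": 85, "classification": "hot", '
                           '"research_summary": "VP Eng at 500-person firm.", '
                           '"email_draft": {"subject": "s", "body": "b"}, '
                           '"reasoning": "Strong ICP match."}')
print(ok, errs)
```

Running every response through a validator like this turns silent format drift into a logged, countable failure.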

Persona Design Principles

1. Be Specific About Expertise

Bad: "You are a helpful assistant"
Good: "You are a senior sales development representative with 8 years of experience in B2B SaaS, specializing in mid-market enterprise sales"

2. Define Boundaries

Bad: "Help with anything sales-related"
Good: "You handle lead research and outreach ONLY. You do NOT handle: pricing negotiations, contract reviews, or customer support."

3. Set the Tone

Bad: "Be professional"
Good: "Write in a warm, conversational tone. Use short sentences. Avoid jargon. Mirror the prospect's communication style when possible."

4. Include Anti-Patterns

Explicitly state what NOT to do:

  • "Do NOT use phrases like 'I hope this email finds you well'"
  • "Do NOT include more than one call-to-action per email"
  • "Do NOT mention competitors by name"
  • "Do NOT make claims about ROI without data"

Testing Your System Prompt

Method: The 20-Question Test

  1. Write 20 realistic inputs your agent will receive
  2. Include 5 easy cases, 10 medium, and 5 edge cases
  3. Run each through the agent
  4. Score the outputs on accuracy, relevance, and format
  5. Identify patterns in failures
  6. Revise the system prompt to address failure patterns
  7. Re-run and compare
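The 20-Question Test can be wired up as a small harness. In this sketch, `fake_agent` and `score` are placeholders for your real agent call and scoring rubric; the point is the structure — run all 20 cases, score each, and group failures by difficulty so patterns stand out:

```python
# Sketch of the 20-Question Test harness: run cases through the agent,
# score each output, and surface failure patterns by difficulty tier.

def fake_agent(prompt):
    """Stand-in for a real LLM-backed agent call."""
    return f"answer to {prompt}"

def score(output):
    """Stand-in rubric: return pass/fail per quality dimension."""
    return {"accuracy": True, "relevance": True, "format": "answer" in output}

# 5 easy, 10 medium, 5 edge cases, per step 2 above.
cases = (
    [{"input": f"easy case {i}", "difficulty": "easy"} for i in range(5)]
    + [{"input": f"medium case {i}", "difficulty": "medium"} for i in range(10)]
    + [{"input": f"edge case {i}", "difficulty": "edge"} for i in range(5)]
)

failures_by_difficulty = {}
for case in cases:
    result = score(fake_agent(case["input"]))
    if not all(result.values()):  # any failed dimension marks the case
        failures_by_difficulty.setdefault(case["difficulty"], []).append(case)

summary = {k: len(v) for k, v in failures_by_difficulty.items()}
print(f"{len(cases)} cases run, failures by difficulty: {summary}")
```

Grouping failures by difficulty (step 5) is what makes the re-run comparison in step 7 meaningful: you can see whether a prompt revision fixed edge cases or merely reshuffled medium ones.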

Pro Tip

Version control your system prompts just like code. Every change should be tracked, tested, and documented. A prompt that works today might break when the model updates.

Agent Architecture — Single vs Multi-Agent Systems

20m
Members only

When to use one agent vs many, communication protocols

Build Your First Agent Persona

25m
Members only

Design a business agent with system prompt and test conversations — Milestone 1

Module 2: Training Agents

Fine-tuning concepts, few-shot learning, RAG pipelines, and building knowledge-powered agents.

5 lessons

Fine-Tuning Concepts — When & Why

20m
Members only

Fine-tuning vs prompting, cost/benefit analysis, and data requirements

Few-Shot Learning — Teaching by Example

20m
Members only

Example selection, formatting, and chain-of-thought

RAG Basics — Giving Agents Knowledge

25m
Members only

Retrieval augmented generation, embeddings, chunking strategies

Knowledge Bases & Vector Search

25m
Members only

Pinecone, pgvector, indexing, similarity search, and metadata filtering

Build a RAG-Powered Agent

30m
Members only

Document ingestion, embedding pipeline, and evaluation — Milestone 2

Module 3: Tuning & Evaluation

Measure agent quality, iterate systematically, build guardrails, and A/B test agent configurations.

4 lessons

Testing Agent Outputs — Quality Metrics

20m
Members only

Accuracy, relevance, coherence, latency, cost, and evaluation frameworks

Iteration Loops — Systematic Prompt Refinement

25m
Members only

A/B testing prompts, failure analysis, versioning, and regression testing

Guardrails & Safety — Preventing Bad Outputs

20m
Members only

Output validation, content filtering, jailbreak prevention, PII handling

A/B Testing Agents — Measuring What Works

20m
Members only

Experiment design, statistical significance, and decision framework — Milestone 3

Module 4: Custom Agent Deployment

Connect agents to apps via APIs, give them tools, build multi-agent systems, and monitor in production.

4 lessons

API Integration — Connecting Agents to Apps

25m
Members only

REST endpoints, WebSocket streaming, authentication, and rate limiting

Tool Use — Giving Agents Superpowers

25m
Members only

Function calling, tool definitions, error handling, and security

Multi-Agent Systems — Orchestration Patterns

25m
Members only

Supervisor, pipeline, debate, and consensus patterns

Monitoring Performance & Costs in Production

20m
Members only

Logging, token tracking, cost alerts, and scaling strategies

Module 5: Capstone

Design, build, train, and deploy a custom AI agent for a real business use case.

2 lessons

Design Your Business Agent — Requirements & Architecture

20m
Members only

Use case selection, architecture diagram, and success criteria

Capstone: Build, Train & Deploy a Custom AI Agent

1h
Members only

Full requirements, deploy live, and demonstrate — Milestone 4
