Build an AI Sales Agent

Free spec to build an AI sales agent: a research-personalize-outreach loop with a reply classifier closing the cycle. Framework-agnostic: pick the model, language, CRM, email provider, and signal source in STACK.md. Includes typed contracts, signal-driven personalization, RFC 8058 unsubscribe headers, a hand-labeled classifier eval, and deliverability hardening.

SAL-1: Lay the foundation for your AI sales agent
SAL-2: Define the ICP and load it as a typed filter the agent reads on every loop
SAL-3: Build the signal scraper that finds verified buying signals for one prospect
SAL-4: Build the personalizer that turns one signal plus the ICP into a draft email
SAL-5: Wire the email-send tool with idempotency, plain-text fallback, and one-click unsubscribe
SAL-6: Build the reply classifier that routes every incoming reply into five categories
SAL-7: Build the followup brain that decides when (and whether) to send turn two
SAL-8: Sync every touch into the CRM as the system of record for prospect state
SAL-9: Build the eval harness that grades the classifier and runs a dry-run campaign
SAL-10: Ship the sales agent: deliverability infra, rate limits, opt-out compliance, traces

5 minutes of planning, ~25 hours saved. Tekk wires this 10-task spec into your repo in your stack — sign up to seed your workspace.

How to use this spec

Click any row above to open the full task — title, description, subtasks, AI instructions, the works. Same layout the product uses internally.
Hit Copy as Prompt in the right sidebar of any task. You'll get the XML-wrapped prompt Tekk uses internally — paste it into Cursor, Claude Code, Codex, ChatGPT, or anywhere else and the agent has the full task context.
The "Open in" button sends the same prompt directly to v0, Lovable, Bolt, Magic Patterns, Replit, or Cursor with one click.

What you're building

Goal 1: Ship a working outbound sales agent that sends only signal-driven personalized emails, classifies every reply into five categories, hands off positive replies to a human within minutes, never re-emails an unsubscribed prospect, and survives production with idempotent sends, RFC 8058 unsubscribe headers, bounce-webhook suppression, and a hand-labeled classifier eval.
The architecture is a research-personalize-outreach loop with a reply classifier closing the cycle. Five pick-once tooling decisions in STACK.md (model, language, email provider, CRM, signal source) parameterize every later task. Acceptance bar: classifier hits ≥85% accuracy on positive_interest and ≥95% on unsubscribe across a 50-example hand-labeled set; dry-run campaign on 10 mocked prospects emits zero sends to suppressed addresses, cites a verified signal_id in every email body, and produces structured trace events one-per-loop-phase.
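
The classifier part of the acceptance bar can be expressed as a small gate over the hand-labeled set. The sketch below is a minimal illustration, not the spec's eval harness: the function names and the (hand_label, predicted_label) pair format are assumptions, and "accuracy on a category" is read here as the share of examples hand-labeled with that category that the classifier also assigned to it.

```python
from collections import Counter

# Thresholds come from the acceptance bar; all names here are hypothetical.
THRESHOLDS = {"positive_interest": 0.85, "unsubscribe": 0.95}


def per_category_accuracy(examples):
    """examples: iterable of (hand_label, predicted_label) pairs."""
    totals, correct = Counter(), Counter()
    for gold, predicted in examples:
        totals[gold] += 1
        if predicted == gold:
            correct[gold] += 1
    return {label: correct[label] / totals[label] for label in totals}


def failed_gates(examples, thresholds=THRESHOLDS):
    """Return the gated categories that miss their threshold (empty dict = pass)."""
    accuracy = per_category_accuracy(examples)
    # A gated category with no labeled examples is treated as a failure.
    return {
        label: accuracy.get(label, 0.0)
        for label, minimum in thresholds.items()
        if accuracy.get(label, 0.0) < minimum
    }
```

Returning the failing categories rather than a single boolean makes it obvious which bar was missed when the gate is wired into CI.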

Architecture

flowchart LR
  CRM[(CRM<br/>system of record)] -->|list_prospects| ICP[ICP Filter<br/>passes_icp]
  ICP -->|prospect| Signal[Signal Scraper<br/>scrape_signal]
  Signal -->|BuyingSignal| Personalize[Personalizer<br/>structured output]
  Personalize -->|DraftEmail| Send[Email Sender<br/>idempotent + RFC 8058]
  Send -->|SendResult| Wait((wait N days))
  Wait --> Classifier[Reply Classifier<br/>5 categories]
  Classifier -->|positive_interest| Handoff[Human Handoff]
  Classifier -->|soft_positive / objection| Drafted[Drafted Reply Queue]
  Classifier -->|ooo| Followup[Followup Brain<br/>should_followup]
  Classifier -->|unsubscribe| Suppress[Suppression List<br/>+ CRM do_not_contact]
  Followup -->|send_now + fresh signal| Personalize
  Followup -->|mark_dormant| CRM
  Send -.->|log_activity| CRM
  Classifier -.->|log_activity| CRM
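
The classifier branch of the flowchart can be read as a lookup table from category to destination. The sketch below assumes that reading; the five category strings come from the diagram, while the enum, the destination names, and route_reply are hypothetical.

```python
from enum import Enum


class ReplyCategory(str, Enum):
    POSITIVE_INTEREST = "positive_interest"
    SOFT_POSITIVE = "soft_positive"
    OBJECTION = "objection"
    OOO = "ooo"
    UNSUBSCRIBE = "unsubscribe"


# Destination names are hypothetical labels for the boxes in the flowchart.
ROUTES = {
    ReplyCategory.POSITIVE_INTEREST: "human_handoff",
    ReplyCategory.SOFT_POSITIVE: "drafted_reply_queue",
    ReplyCategory.OBJECTION: "drafted_reply_queue",
    ReplyCategory.OOO: "followup_brain",
    ReplyCategory.UNSUBSCRIBE: "suppression_list",
}


def route_reply(category: str) -> str:
    """Map a classifier label to its destination.

    Labels outside the five categories raise ValueError instead of being
    routed silently.
    """
    return ROUTES[ReplyCategory(category)]
```

Failing loudly on an unknown label keeps a malformed classifier output from falling through to a default queue.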

The three load-bearing decisions:

Signal quality over volume. The personalizer raises NoSignalError when scrape_signal returns None, so signal-less sends are blocked in code, not policy.
Reply classifier as the value gate. Five-category routing turns the inbound reply flood into a triaged queue: positive_interest reaches a human in minutes, and unsubscribe suppresses the prospect immediately.
CRM as the system of record. The agent reads prospect state from the CRM at the top of every loop and only ever appends activities or sets lifecycle_stage='unsubscribed'; an editor-set do_not_contact is never overridden.
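
To make the first decision concrete, here is a minimal sketch of a personalizer that refuses to draft without a signal. NoSignalError, scrape_signal, BuyingSignal, DraftEmail, and signal_id are names from this page; the dataclass fields, the function signature, and the templated body are assumptions (a real personalizer would call the model chosen in STACK.md).

```python
from dataclasses import dataclass
from typing import Callable, Optional


class NoSignalError(Exception):
    """Raised when no verified buying signal exists for a prospect."""


@dataclass(frozen=True)
class BuyingSignal:
    signal_id: str
    summary: str  # hypothetical field: one-line description of the signal


@dataclass(frozen=True)
class DraftEmail:
    prospect_email: str
    signal_id: str
    body: str


def personalize(
    prospect_email: str,
    scrape_signal: Callable[[str], Optional[BuyingSignal]],
) -> DraftEmail:
    """Draft an email only when a verified signal is available."""
    signal = scrape_signal(prospect_email)
    if signal is None:
        # The block lives in code: there is no path to a signal-less draft.
        raise NoSignalError(f"no verified signal for {prospect_email}")
    # Placeholder template standing in for the model call.
    body = f"Saw that {signal.summary}. [ref {signal.signal_id}]"
    return DraftEmail(prospect_email, signal.signal_id, body)
```

Because the draft carries the signal_id from the scrape, a later check can confirm that every email cites a real signal.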

When an autonomous sales agent makes sense (and when it doesn't)

The 2026 data is in: fully autonomous AI SDRs have not replaced human sales teams. AI cold-email reply rate is 4.1% vs human 5.2%; spam-flag rate is 8% AI vs 3% human; 47% of AI SDR deployments suffer domain reputation collapse inside 90 days. Build this when the loop is signal-driven and reply-triaged — not as a volume play.

✓ Build an AI sales agent when

You have a verified-signal source. Funding announcements, hires, tech-stack changes, job posts. Signal-driven AI SDRs average a 3.8% reply rate across 500K+ emails (Digital Applied 2026) because every message cites a real reason to reach out.
Reply triage is your bottleneck, not send volume. Founders running their own outbound find the value gate is classifying replies fast and drafting context-aware responses — the classifier is where this agent earns its keep.
You have a CRM as the system of record. HubSpot, Salesforce, Attio, Pipedrive, Close. The agent reads prospect state from the CRM and never overrides editor-set do_not_contact — that boundary prevents the failure mode that ends customer relationships.
Hybrid human + AI math works for your team. Hybrid pods (1 human SDR : 2 AI seats) book 1.9x more meetings per dollar than pure-AI configurations (Digital Applied 2026). The agent does the loop; the human handles positive replies and the drafted-response queue.
You can run the hand-labeled classifier eval. A 50-example reply set graded with hard accuracy bars (≥85% positive_interest, ≥95% unsubscribe) is the difference between catching demos and burning relationships. If you won't author the eval, you have no way to tell which of the two the classifier is doing.

⚠ Hire (or keep) a human SDR when

You can't run a 6-week domain warmup. 47% of AI SDR deployments collapse inside 90 days without it (Digital Applied 2026). Microsoft 365 inboxes are the strictest filter — most B2B prospects sit behind M365. Sending 500 cold emails from a fresh domain on day one is unrecoverable.
Your team won't honor CAN-SPAM and RFC 8058. CAN-SPAM penalties run up to $53,088 per violation; Gmail, Yahoo, and Microsoft require one-click unsubscribe, a complaint rate below 0.3%, and a bounce rate below 2%. Systematic opt-out failures trigger enforcement and torch the domain, not just the campaign.
Your ICP is broad or unverified. AI cold-email reply rate drops from 6.1% (SaaS) to 1.9% (financial services) — a 3.2x spread (Digital Applied 2026). Without a tight ICP, you're spending 6x more tokens and burning sender reputation for nothing.
You're optimizing for volume over signal. Per-rep monthly sends went 1,150 → 7,400 with AI; raw reply rates fell 4.7% → 2.9%. The agent makes signal-less sends impossible by design; if you'd override that constraint, hire a human and send 50 thoughtful emails.
You have no real-reply data to seed the classifier. The 50-example hand-labeled set is the load-bearing eval. With zero historical replies to draw from, the classifier is graded on synthetic data — false negatives on positive_interest (lost demos) and unsubscribe (domain reputation) won't surface until production.
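
The complaint-rate and bounce-rate limits cited in the list above lend themselves to a simple pre-send health check. This is a sketch under assumptions: the function name and the idea of pausing sends when it returns False are illustrative, and only the 0.3% and 2% limits come from this page.

```python
def deliverability_ok(
    sent: int,
    complaints: int,
    bounces: int,
    max_complaint_rate: float = 0.003,  # complaint rate must stay below 0.3%
    max_bounce_rate: float = 0.02,  # bounce rate must stay below 2%
) -> bool:
    """Return True when both rates are strictly below their limits."""
    if sent <= 0:
        raise ValueError("sent must be positive to compute rates")
    return (
        complaints / sent < max_complaint_rate
        and bounces / sent < max_bounce_rate
    )
```

A campaign runner could call this on a rolling window of recent sends and stop sending as soon as it returns False.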

What the community says

"Tried to conduct a survey last autumn to learn about people's experiences with using AI SDR products. We pretty heavily promoted it in sales spaces, and got a jack total of less than 20 fills, with an overwhelmingly negative sentiment — vaporware the whole lot of them, with spoofed ARR numbers to trigger investor FOMO."

greg_V · Hacker News — "11x has been claiming customers it doesn't have" · 2025-03-25 · news.ycombinator.com

"Per-rep monthly volume rose from a 1,150 human baseline to 7,400 AI-augmented, but raw reply rates fell from 4.7% to 2.9%. Signal-driven AI SDRs average 3.8% reply across 500,000+ emails because every message references a verified buying signal."

Digital Applied · 100K-email AI SDR analysis 2026 · 2026-01-15 · digitalapplied.com

"11x rebuilt Alice from a single ReAct agent (10-20 tools, infinite loops, the agent equivalent of a stack overflow) into a hierarchical supervisor with four named sub-agents: Researcher, Positioning Report Generator, LinkedIn Message Writer, Email Writer. Audience creation was kept separate from the agent system — a deliberate boundary."

ZenML LLMOps Database · Rebuilding an AI SDR Agent with Multi-Agent Architecture · 2025-08-12 · zenml.io

How to know it's working

Ship the spec, then measure on these criteria (the eval harness task grades them):

Classifier eval: ≥85% accuracy on positive_interest across the 50-example hand-labeled set (false negatives cost demos)
Classifier eval: ≥95% accuracy on unsubscribe across the 50-example hand-labeled set (false negatives cost the sending domain)
End-to-end dry run: zero sends to any prospect in the suppression list across 10 mocked prospects (suppression is checked in code before the provider call)
End-to-end dry run: every SendResult carries a signal_id matching a real BuyingSignal from the scrape (no fabricated personalization)
Idempotency: re-running the loop on a partially-completed campaign produces zero duplicate sends (provider returns the same email_id for the same idempotency key)
Deliverability: every outbound email carries List-Unsubscribe + List-Unsubscribe-Post headers (RFC 8058) plus a plain-text body alongside HTML
Webhook compliance: email.bounced and email.complained events route to suppression-list append AND CRM do_not_contact=true within one request lifecycle
Production hardening: regression run of run_e2e.py stays within ±5% of evals/baseline.json on p50 latency and median tokens-per-send
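
Three of the criteria above (suppression checked before the provider call, a stable idempotency key, and RFC 8058 headers on a plain-text plus HTML message) can be sketched with the Python standard library alone. The function names, the key format, and SuppressedRecipientError are assumptions; the two header names and the One-Click value are the ones RFC 8058 defines.

```python
import hashlib
from email.message import EmailMessage


class SuppressedRecipientError(Exception):
    """Raised when the recipient is on the suppression list."""


def idempotency_key(campaign_id: str, prospect_email: str, turn: int) -> str:
    """Derive the same key for the same (campaign, prospect, turn) triple."""
    raw = f"{campaign_id}:{prospect_email.lower()}:{turn}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def build_message(
    sender: str,
    recipient: str,
    subject: str,
    text_body: str,
    html_body: str,
    unsubscribe_url: str,  # must be an HTTPS URL for one-click unsubscribe
    suppressed: set,  # lowercased addresses that must never be emailed
) -> EmailMessage:
    # Suppression is checked before anything is handed to a provider.
    if recipient.lower() in suppressed:
        raise SuppressedRecipientError(recipient)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["List-Unsubscribe"] = f"<{unsubscribe_url}>"
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
    msg.set_content(text_body)  # plain-text part
    msg.add_alternative(html_body, subtype="html")  # HTML alternative
    return msg
```

The key would be passed to whichever provider is chosen in STACK.md, assuming that provider supports idempotency keys. RFC 8058 also expects the message to carry a DKIM signature covering both unsubscribe headers, which is left to the sending infrastructure here.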

Sources

Every claim, pattern, and acceptance threshold on this page maps back to one of these. Read them before deviating from the spec.

Build this in your codebase tonight

Sign up — Tekk reads your repo, picks your stack from the five decisions in STACK.md, and writes a personalized version of this 10-task spec. Same architecture, your patterns, your dependencies. Want to do it yourself? Open any task above and hit Copy as Prompt — paste into Cursor, Claude Code, or Codex.
