AgentHerder · Course · Week 3

Tighter integrations + the autonomous-agents question

The chapter that makes Week 1's coordination-routing actually sustainable.

Week 1 set the policy: every coordination interaction routes through the agent. Week 2 installed the mechanical fleet. Week 3 is what makes the policy real across the long tail — the deep practice of browser automation, the deep practice of comms automation, and an honest answer to the question every cohort asks: "why don't we just make these agents autonomous?"

The integration thesis

Each new integration point removes a class of coordination friction. The fleet's effective throughput isn't the agent's raw speed — it's the surface area of what the agent can reach without you, multiplied by how much of your attention is left over for the work that compounds.

Give your agents more and more integration points so that all the small things across every parallel work-stream cancel out.

The implication: at flying-stage, your week's work is shaped by the gaps in integration, not the gaps in skill. The interesting question every week is: what's the next coordination surface I haven't yet routed through the agent? The answer is usually visible in the friction you're still feeling — the dashboard you still log into, the message you still draft from scratch, the partner portal where you still chase MFA codes by hand.

Week 3 is about closing those gaps, deeply.

Browser automation as a deep discipline

In the Augmented Mind Course Week 2, browser automation is taught as a baseline — stop clicking through routine SaaS yourself. At flying-stage, the deep discipline starts where the baseline ends.

Authentication at scale

The first thing that goes wrong when you ask Claude to drive a browser at the fleet level: every workflow needs auth, every auth flow is slightly different, and password managers don't natively talk to Claude's browser tools.

Patterns that work:

  • Session-cookie auth. Log in once manually, capture the session cookies, hand them to the agent. The agent uses them for read operations that don't require fresh auth. Works for most dashboards. The cookies expire eventually — when they do, re-auth and re-capture.
  • Per-stream Claude browser contexts. Each cctab can hold its own browser state. If session-cookie auth is set up in tab A, tab A's agent keeps working with it; tab B's agent has its own context and its own cookies.
  • Service-account credentials in a secrets vault (1Password CLI, op integration). The agent reads the credential at the moment of need, never logs it. Don't put credentials in CLAUDE.md or repo files.
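The session-cookie pattern above depends on noticing expiry before the agent hits a login wall mid-flow. A minimal sketch of that check follows. It assumes the captured cookies are a list of dicts with an "expires" field in Unix seconds, with -1 marking a session cookie (the shape Playwright writes in its storage-state JSON). The function names, cookie names, and the five-minute safety margin are illustrative choices, not part of any tool.

```python
import time


def split_cookies(cookies, now=None, min_remaining_s=300):
    """Partition cookies into (usable, stale).

    Assumption: each cookie dict has an "expires" Unix timestamp in seconds,
    and -1 means "session cookie, no fixed expiry".
    """
    now = time.time() if now is None else now
    usable, stale = [], []
    for cookie in cookies:
        expires = cookie.get("expires", -1)
        if expires == -1 or expires - now > min_remaining_s:
            usable.append(cookie)
        else:
            stale.append(cookie)
    return usable, stale


def needs_reauth(cookies, required_names, now=None):
    """True if any cookie the dashboard depends on is missing or about to expire."""
    usable, _ = split_cookies(cookies, now=now)
    usable_names = {c["name"] for c in usable}
    return not set(required_names) <= usable_names


# Hypothetical captured cookies; values are redacted on purpose.
captured = [
    {"name": "sessionid", "value": "<redacted>", "expires": 1_700_003_600},
    {"name": "csrftoken", "value": "<redacted>", "expires": -1},
]
print(needs_reauth(captured, ["sessionid"], now=1_700_000_000))  # False
print(needs_reauth(captured, ["sessionid"], now=1_700_003_500))  # True
```

Running a check like this before the agent starts a flow turns "the cookies expired" from a mid-flow stall into a planned re-capture.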

MFA without breaking the flow

The hard part. The agent can drive the username/password step, then hits a TOTP prompt or a push notification. Several patterns:

  • Interactive prompt. The agent pauses, asks you for the TOTP code, you type it, the agent continues. Fine for a few times a day; brain-fry territory at scale.
  • TOTP via CLI. Tools like oathtool or the 1Password CLI (op item get <item> --otp) print the current code on the shell. The agent calls the shell tool, gets the code, types it. No human interruption is needed if the agent has shell access and the credential is in the vault.
  • Push-MFA on your phone. The agent pauses, you approve on your phone, agent continues. Lower friction than typing a code but still requires you to be physically present.

The discipline: pick one MFA pattern per integration and standardize it. Mixing patterns mid-fleet (one tab needs you to type a code, another sends a push, a third just stalls) is brain-fry territory.
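To make the TOTP-via-CLI pattern less of a black box, here is a sketch of the computation those tools perform: RFC 6238 TOTP with HMAC-SHA1, 30-second steps, and six digits, which are the common defaults. It is an illustration, not a replacement for the vault. In a real fleet the agent would more likely call the CLI so the secret never sits in a script. The function name and parameters are this sketch's own.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code using HMAC-SHA1."""
    # Base32 secrets are often shown lowercase, with spaces, and unpadded.
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)
    # The moving factor is the number of whole time steps since the epoch.
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, RFC 4226 section 5.3
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Secret taken from the RFC 6238 test vectors, not a real credential.
rfc_secret = base64.b32encode(b"12345678901234567890").decode()
print(totp(rfc_secret, at=59, digits=8))  # 94287082
```

Because the code depends only on the shared secret and the clock, whoever can read the secret can pass the MFA step. That is the reason to keep the secret in the vault and read it at the moment of need.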

Broken JS and slow pages

The pain class you discover only at scale: SaaS dashboards whose JS breaks the agent's selectors after a UI refresh; pages that lazy-load content the agent needs but doesn't wait for; and single-page apps where the agent's "click" never triggers the framework's handler, because the handler had not yet been attached when the click fired.

The deep-practice patterns:

  • Wait-for-condition, not wait-for-time. Don't tell the agent "wait 3 seconds." Tell it "wait until the table has at least one row" or "wait until the loading spinner disappears." Time-based waits are flaky and slow; condition-based waits are robust.
  • Robust selectors. Prefer text content (text="Sign in"), ARIA labels (role="button", name="Submit"), or data-test-ids over fragile CSS paths. Class-based selectors break the next time the UI is restyled.
  • Fall back to screenshots + visual reasoning when DOM selectors fail. Claude can look at a screenshot and click the right thing even when the markup is unreadable. Slower but recoverable.
  • Record + replay for repeated flows. If you're going to drive the same flow 50 times this week, spend an hour scripting it tight, then invoke the script. Don't redo the trial-and-error every time.
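The wait-for-condition pattern above can be sketched as a small polling helper. This is a generic illustration with no browser attached: the condition is any callable, and the helper name, defaults, and the fake table below are assumptions of the sketch. Browser drivers usually ship their own condition-based waits, and those are preferable where they exist.

```python
import time


def wait_until(condition, timeout_s=15.0, poll_s=0.25, describe="condition"):
    """Poll `condition` until it returns a truthy value or the timeout passes.

    Returns the truthy value so the caller can use what it waited for.
    Raises TimeoutError with a readable message otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"gave up after {timeout_s}s waiting for {describe}")
        time.sleep(poll_s)


# Stand-in for a page whose table fills in on the third poll.
polls = {"count": 0}


def table_rows():
    polls["count"] += 1
    return ["row-1"] if polls["count"] >= 3 else []


rows = wait_until(table_rows, timeout_s=2, poll_s=0.01, describe="table to have a row")
print(rows)  # ['row-1']
```

The helper returns as soon as the condition holds, so a fast page costs almost nothing, and a slow page fails with a message that names what never appeared.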

The "Claude says it can't, push back" routine

The most underrated practice at the fleet level. Claude is conservative by default — when a flow stalls, the default is to report back "I tried, didn't work, here's what happened." Most of the time, the right response is not to accept that.

Most "Claude can't do X" is "Claude tried one approach, it failed, and now needs to try a different one." Your job is to push back.

Concrete escalation ladder when Claude says it can't:

  1. Did it actually try? Read the agent's last few actions. Did it run into the failure or assume the failure?
  2. Try a different selector strategy. Often the agent picked a CSS path that broke; tell it to look at the screenshot and click visually instead.
  3. Try a different auth path. Maybe the session cookie expired; have it re-auth and try again.
  4. Decompose the action. Big "do this flow" prompts hit walls. Break into individual steps and have the agent narrate each one.
  5. Provide a screenshot of the current state and ask "what do you see, what would you do next."
  6. Take over manually only as the last resort — and when you do, capture what you did so next time the agent has a worked example.

The shift from running to flying: at running stage, "Claude says it can't" becomes "I'll do it myself." At flying stage, "Claude says it can't" becomes "let me find what's stopping it and unblock it." The fleet's coverage of coordination surface keeps growing because you treat agent stuckness as a debug target, not a hand-off.

Comms automation as a deep discipline

Same logic, applied to the second-biggest coordination surface: messages. Baseline (Week 1 territory): the agent drafts the routine reply, you approve. Deep practice (this week): the full inbox-to-outbox AI surface.

Email triage

The pattern: a meta-agent runs periodically (every few hours or on demand), reads new mail, categorises (action-needed, reply-needed, FYI, junk), drafts replies for the reply-needed bucket, queues them for you to skim-and-send. You review the queue once or twice a day instead of context-switching to email constantly.

What this needs:

  • An email integration with read + draft capability (the Gmail API via a skill, or a Gmail MCP server).
  • A rule set encoding your triage logic: which senders are noisy, which always need a reply, which can be archived without reading.
  • A drafting voice. Give the agent your last 50 sent emails as style examples so the drafts sound like you, not like a template.
  • A clear escalation path for "I don't know what to do with this one": the agent drops the email back to you for direct handling.
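One way to picture the triage rule set is as an ordered list of checks where the first match wins and anything unmatched goes back to you. The sketch below uses made-up senders, keywords, and a hypothetical Email type. A real rule set would come from your own inbox, and the agent would apply judgement on top of it.

```python
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str
    body: str = ""


# Hypothetical rule data; edit to match your own inbox.
NOISY_SENDERS = {"noreply@ci.example.com", "digest@news.example.com"}
ALWAYS_REPLY = {"boss@example.com", "key-customer@example.org"}
ACTION_WORDS = ("please approve", "action required", "sign by")


def triage(email):
    """Return one of: junk, fyi, action-needed, reply-needed, escalate."""
    sender = email.sender.lower()
    text = f"{email.subject} {email.body}".lower()
    if "unsubscribe" in text and sender not in ALWAYS_REPLY:
        return "junk"
    if sender in NOISY_SENDERS:
        return "fyi"
    if any(word in text for word in ACTION_WORDS):
        return "action-needed"
    if sender in ALWAYS_REPLY or "?" in email.subject:
        return "reply-needed"
    return "escalate"  # no rule matched: hand the email back to the human


print(triage(Email("boss@example.com", "Budget?")))  # reply-needed
print(triage(Email("stranger@example.net", "hello")))  # escalate
```

Defaulting to "escalate" rather than "fyi" keeps the failure mode visible: an email the rules don't understand lands in front of you instead of disappearing.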

Slack / Teams drafts

Two patterns, depending on the org culture:

  • Team channel updates. The agent watches your work-streams (via git activity, PR merges, deploy events) and drafts the periodic team-channel update. You review, edit, post.
  • Direct-message replies. Trickier — DM context is usually richer than the agent has access to. Often safer to let the agent draft replies for informational DMs ("can you point me at X?") and leave decision/relationship DMs to you directly.

Calendar response automation

Specific case worth singling out: scheduling. The agent reads incoming meeting requests, cross-references your calendar, your stated availability rules, the request priority, and either accepts, declines, or proposes alternatives. Often the highest-leverage comms automation because scheduling back-and-forth is uniquely soul-draining and uniquely automatable.

Tools that work today: the Google Calendar MCP (or equivalent) plus a meeting-acceptance skill encoding your rules ("decline anything after 5 PM," "auto-accept 1:1s with my team," "propose alternatives for anything that conflicts with deep-work blocks").
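The three example rules can be written down as a small decision function. This is a sketch under stated assumptions: the team list, the 5 PM cut-off, the deep-work block, and the rule priority (decline first, then protect deep work, then auto-accept) are all illustrative, and it assumes meetings start and end on the same day in local time. Anything the rules don't cover goes back to you.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class MeetingRequest:
    organizer: str
    start: datetime
    end: datetime
    attendee_count: int


# Hypothetical values standing in for your stated availability rules.
MY_TEAM = {"ana@example.com", "raj@example.com"}
DAY_END = time(17, 0)
DEEP_WORK = [(time(9, 0), time(11, 30))]  # daily blocks, local time


def overlaps_deep_work(req):
    """True if the meeting overlaps any deep-work block."""
    return any(
        req.start.time() < block_end and req.end.time() > block_start
        for block_start, block_end in DEEP_WORK
    )


def decide(req):
    """Return "decline", "propose-alternative", "accept", or "ask-human"."""
    if req.end.time() > DAY_END:  # runs past 5 PM
        return "decline"
    if overlaps_deep_work(req):
        return "propose-alternative"
    if req.attendee_count == 2 and req.organizer in MY_TEAM:  # 1:1 with my team
        return "accept"
    return "ask-human"


ana_1on1 = MeetingRequest(
    "ana@example.com", datetime(2025, 3, 4, 14, 0), datetime(2025, 3, 4, 14, 30), 2
)
print(decide(ana_1on1))  # accept
```

Writing the rules as ordered checks forces you to decide the conflicts in advance, for example whether a team 1:1 may land inside a deep-work block. Here it may not.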

Status updates

The recurring weekly "what did the team ship" message. At fleet scale, you ship so much that you can't remember it all by Friday. The agent:

  • Scans the week's PRs across the repos you specified.
  • Cross-references with NEXT-STEPS.md updates to identify which work was deliberate vs opportunistic.
  • Drafts a summary in your voice, structured around the week's themes.
  • Queues it for you to edit and post.
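The mechanical half of those steps, sorting merged PRs into deliberate versus opportunistic work, can be sketched as below. It assumes the PR list has already been fetched as plain dicts and that NEXT-STEPS.md refers to planned PRs as "#<number>". Both are assumptions of this sketch, as is the output layout. Writing the summary in your voice and finding the week's themes is left to the agent.

```python
import re
from collections import defaultdict


def draft_status(prs, next_steps_text):
    """Group merged PRs into planned vs opportunistic and render a plain draft.

    `prs` is a list of dicts with "number", "title", and "repo" keys.
    A PR counts as planned if NEXT-STEPS.md mentions "#<number>".
    """
    planned_numbers = {int(n) for n in re.findall(r"#(\d+)", next_steps_text)}
    buckets = defaultdict(list)
    for pr in sorted(prs, key=lambda p: (p["repo"], p["number"])):
        kind = "Planned" if pr["number"] in planned_numbers else "Opportunistic"
        buckets[kind].append(f"- {pr['repo']}#{pr['number']}: {pr['title']}")
    lines = []
    for kind in ("Planned", "Opportunistic"):
        if buckets[kind]:
            lines.append(f"{kind} ({len(buckets[kind])})")
            lines.extend(buckets[kind])
    return "\n".join(lines)


# Hypothetical week of merged PRs and a hypothetical NEXT-STEPS.md excerpt.
week_prs = [
    {"number": 41, "title": "Add retry to sync job", "repo": "api"},
    {"number": 7, "title": "Fix typo in README", "repo": "docs"},
]
next_steps = "- [x] ship sync retries (#41)\n- [ ] billing export"
print(draft_status(week_prs, next_steps))
```

Matching on the bare number will confuse PRs that share a number across repos. A real version would need a repo-qualified reference.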

Your editing time per week: 5–10 minutes. Constructing the same update from memory used to take 30–60 minutes. Those savings compound week over week.

The "fully autonomous agents" question

Every cohort asks it. Usually around Week 2 or Week 3. The framing varies: "why are we still in the loop?" or "couldn't we just let the agents run?" or "what's stopping me from setting up an agent that fixes its own CI failures, deploys itself, and handles inbound emails without me?"

Fred's stance, in his own words:

You need supervision and human guidance — that's why parallelism is important. Parallelism is the human operator. We're not here to replace ourselves with agents that don't need us. The value-add is YOU — the domain expertise, the seeing where the agent should be, the guiding.

Translated into the AgentHerder posture:

This bootcamp is not training you to set up agents that run without you. It's training you to be the irreplaceable orchestrator of a large agent fleet. Those are different jobs.

The autonomous-agents path is a real path — there are products and companies pursuing it. It has different trade-offs: more brittle, harder to course-correct, lower-judgement output, more expensive incident-response when things go wrong (because nobody was watching). At the current state of model capability and tooling, autonomous-agent setups break in ways the human-in-the-loop fleet doesn't.

At fleet level (10+ parallel), the human operator is the source of:

  • Domain judgement the agents don't have. (Your company's specific risk appetite, the customer-history that explains why a feature was built the way it was, the political constraint that means option B is off the table even though option A is technically worse.)
  • Cross-stream pattern recognition the agents can't do because they don't see across each other's tabs.
  • Recovery when an integration breaks. The autonomous setup breaks silently until something falls over; the supervised setup catches it on the next scan.
  • Direction setting. Agents do what they're briefed. The brief comes from somewhere. That somewhere is you.

The mental error to avoid: assuming "autonomous" means "more leveraged." At today's capability level, autonomous usually means less leveraged, because the orchestrator role gets shifted to incident-response after the fact instead of to active direction-setting upfront.

This may change as model capability improves. Some serious people are betting against the premise of the AgentHerder practice, that the human operator's judgement is load-bearing. It's worth knowing this is a position, not a settled fact. But it's the position the course teaches, and it's the one Fred is staking the brand on.

If you find yourself drawn to the autonomous-agents path: that's a legitimate direction, just not this course's direction. The skills overlap, the philosophy doesn't. Don't try to install both at once.

The assignment — ship one missing integration

Do this — Assignment 1 of 1

Identify one integration your current setup is missing, and ship it this week.

  • Look at where your attention still goes to coordination work — the dashboard you still log into, the message thread you still write from scratch, the partner portal where you still chase MFA codes, the weekly status update you still construct from memory.
  • Pick the single most-painful one. The one that, if it were gone, would visibly free up the most attention for actual work.
  • Build the integration. Could be: a browser-automation skill for that dashboard, a Gmail-MCP-driven triage script, a Slack draft-and-queue meta-agent, a scheduling-decision skill against your calendar. Whatever the gap is.
  • Ship it. Use it for the rest of the week.
  • Friday demo: show it working. The kind of thing that makes a teammate ask "wait, you can just do that?"

Time estimate: 2–6 hours depending on how custom the integration is and how cleanly the target system exposes the surface you need.

If you find that the most painful gap is something where the integration just doesn't exist yet — no MCP, no clean API, no scriptable surface — that's also valuable information. Write it up. The gaps are where the next generation of tools and skills come from.

Self-check — did this week land?

  • Can you describe at least one auth + MFA pattern you've made reliable for an agent-driven browser flow on a real (non-toy) system?
  • Have you applied the "Claude says it can't, push back" routine on a flow that previously stalled? Did the agent ultimately succeed?
  • Do you have at least one deep comms-automation pattern installed and running (email triage, Slack drafts, calendar handling, or weekly status)?
  • Have you formed your own view on the autonomous-agents question, even if it differs from the course's stance? (The course wants you to engage with the question, not just adopt its answer.)
  • Did you ship the one missing integration?
  • Are you noticing that your attention has shifted further from "what's the agent doing right now" to "where's the next coordination surface to close"? That's the directional indicator that flying-stage is starting to feel native.

What's next

Week 4 — Coordination patterns. The Team Ops sweep skill. Cross-stream context sharing. PR-nudging automation. Friday demo + weekly planning at flying-stage. The operational layer that turns 6+ parallel sessions from chaos into flow.

AgentHerder: the professional practice. cctabs: the tool (MIT). A line of products by Augmented Mind.