Read this before you use AI agents
What agentic AI means, what the risks are, and how to start anyway.
The state of agents in 2026: TL;DR
Agentic AI is software that plans and executes multi-step goals autonomously, moving beyond passive chat.
Systemic Risk: Context Compaction can cause agents to “summarize away” your safety instructions mid-task.
Industry Reality: Gartner’s 2026 Market Guide flags “agent washing”: roughly 90% of tools that claim to be agents lack true autonomous planning.
Safety Protocol: Only assign agents Reversible Actions (drafting, not sending) to minimize the “blast radius.”
Two weekends ago, Summer Yue sprinted across her apartment to literally pull the plug on her Mac mini.
As Director of Alignment at Meta’s Superintelligence Lab, Yue’s job is making sure AI doesn’t go off the rails. She’d spent weeks testing a free, open-source, self-hosted autonomous AI agent called OpenClaw on safe data. It was surgical. So she connected it to her real Gmail with one explicit boundary: “Suggest what to delete. Don’t act until I tell you to.”
But her agent ignored her boundary. It initiated a “speedrun,” deleting hundreds of emails in seconds. As Yue watched it play out, she sent “stop” commands, but the agent kept going. It wasn’t broken, it was carrying out its main goal (a clean inbox) and had summarized away her safety constraints during a routine memory process called “context compaction.”
Yue had no centralized kill switch; according to recent AI safety audits, neither do 60% of enterprises.
How are AI agents different from chatbots?
Most people’s experience of AI so far is conversational: you type, it responds. That’s very useful, but fundamentally passive: it produces output, and you decide what to do with it.
An agent is different because it can act. It can send an email, book a meeting, update a document, search the web, run a script, or fill out a form.
A lot of what’s marketed as agentic AI right now isn’t. Tools that retrieve, summarize, or surface information are AI-assisted software. But a genuine agent plans and executes a sequence of actions toward a goal without you prompting each step.
Gartner estimates in its 2026 Market Guide that of the thousands of vendors currently claiming to offer agentic AI, only around 130 deliver it. The rest are rebranded chatbots. The term for this is “agent washing,” and it’s rampant.
The practical distinction is this: a chatbot that misunderstands your instructions gives you a bad answer. An agent that misunderstands your instructions does something in the world that you may not be able to undo.
What are the main security risks of AI Agents?
The Summer Yue story isn’t a cautionary tale about moving too fast. She’s one of the most qualified people on the planet to test an agent. She followed a reasonable protocol: weeks of testing on safe data, one clear constraint, then a controlled real-world trial. What she didn’t seem to think about was what failure looked like before it was too late.
This is the part of the agentic AI coverage that most pieces skip. Three compounding risks make autonomous agents genuinely risky, and they interact badly:
Access: Agents need to connect to real systems (your email, calendar, or files). That access is also the blast radius when something goes wrong.
Prompt Injection: When an agent reads your emails or browses the web, it can encounter instructions disguised as content. A malicious email might contain hidden text that tells the agent to forward your messages. This is a documented attack, and it remains a primary vulnerability in 2026.
The Override Problem: You can tell an agent to “always check with me.” But if a prompt injection is strong enough, or the model uses aggressive Context Compaction, that constraint can disappear mid-task. The goal always survives, but the guardrail may not.
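The failure mode Yue hit can be sketched in a few lines. This is a toy illustration, not any real agent framework: a naive compaction pass keeps only the most recent messages, and the guardrail, stated once at the start, simply falls out of the window.

```python
# Hypothetical sketch of context compaction losing a guardrail.
# Not a real agent framework; all names here are illustrative.

def compact(history, max_messages=3):
    """Naive compaction: keep only the most recent messages.
    A real system summarizes rather than truncates, but the failure
    mode is the same: anything not in the kept window is gone."""
    return history[-max_messages:]

history = [
    {"role": "user", "content": "Suggest what to delete. Don't act until I tell you to."},
    {"role": "assistant", "content": "Found 312 promotional emails."},
    {"role": "assistant", "content": "Grouped them by sender."},
    {"role": "assistant", "content": "Ranked senders by volume."},
    {"role": "assistant", "content": "Ready to clean the inbox."},
]

compacted = compact(history)
constraint_survived = any("Don't act" in m["content"] for m in compacted)
print(constraint_survived)  # False: the safety constraint was compacted away
```

The goal ("clean the inbox") keeps being restated in recent messages, so it survives every compaction. The constraint was said once, early, so it doesn't.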
To bridge this gap, the industry is moving toward the Model Context Protocol (MCP), a standard way for AI to safely connect to outside tools and information so it can do useful work. Because requests are structured, they’re easier to control, so you’re not giving the AI more access than it needs.
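As a schematic sketch of why structured requests are easier to control (this is not the actual MCP SDK; the tool names and fields are invented), the idea is that every request names a tool and a scope, and anything not explicitly granted is rejected:

```python
# Schematic of scoped, structured tool access. Invented names,
# not the real MCP SDK.

ALLOWED_TOOLS = {
    "search_email": {"scopes": ["read"]},
    "draft_reply":  {"scopes": ["draft"]},
    # Note what's absent: no send_email, no delete_email.
}

def dispatch(request):
    """Reject any request for a tool or scope the agent wasn't granted."""
    tool = ALLOWED_TOOLS.get(request["tool"])
    if tool is None:
        return {"error": f"tool {request['tool']!r} not granted"}
    if request["scope"] not in tool["scopes"]:
        return {"error": f"scope {request['scope']!r} not granted"}
    return {"ok": True}

print(dispatch({"tool": "draft_reply", "scope": "draft"}))   # allowed
print(dispatch({"tool": "delete_email", "scope": "write"}))  # rejected
```

The enforcement lives outside the model, so a prompt injection can't talk its way past it: a tool that was never granted simply isn't callable.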
This doesn’t mean we shouldn’t use agents. It means the question to ask before you connect something is: “What is this agent not allowed to do, and how does it enforce that?”
A tool that can answer this clearly was designed with the right defaults. If a tool can’t, you are the safety constraint, and if you get distracted, or the task runs long, or the scope expands, you’ll be the one running across the apartment.
What is prompt injection and how can I protect myself?
If you’re using AI to speed up your work, or letting it act on your behalf through an AI agent, you need to understand how prompt injection works. It poses a serious risk to your personal data: credit card numbers, and details that could be used to steal your identity.
In summary, the key agentic AI risks: unrestricted access (a wide blast radius), prompt injection (malicious instructions hidden in content), and context compaction (guardrails summarized away mid-task).
The principle that keeps you safe
One rule covers most situations: give agents narrow tasks with reversible actions.
Narrow: Not “manage my inbox,” but “find emails I haven’t replied to in the last week.”
Reversible: Drafting is safe because you review before anything sends. Sending, deleting, and moving money are not.
The mental test: If this goes wrong and I can’t undo it, how bad is it? If the answer is anything other than “fine,” add a human review step.
This operational pattern separates people who get real value from agents from people who either avoid them entirely or hand over too much control.
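The rule can be made mechanical. Here is a toy gate, with illustrative action names, that treats reversible actions as safe defaults and blocks irreversible ones unless a human approves:

```python
# Toy "reversible by default" gate. Action names are illustrative,
# not from any real agent product.

REVERSIBLE = {"draft_email", "create_label", "propose_schedule"}
IRREVERSIBLE = {"send_email", "delete_email", "move_money"}

def execute(action, approved=False):
    """Run reversible actions freely; gate irreversible ones on approval."""
    if action in REVERSIBLE:
        return f"done: {action}"
    if action in IRREVERSIBLE and approved:
        return f"done: {action} (human approved)"
    return f"blocked: {action} needs human approval"

print(execute("draft_email"))               # runs: you review before anything sends
print(execute("delete_email"))              # blocked by default
print(execute("delete_email", approved=True))  # runs only after explicit sign-off
```

The point of the sketch is where the default sits: an unknown or irreversible action is blocked unless a human says otherwise, not the other way around.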
Your 3-step agentic AI training guide
These exercises focus on building your orchestration skills before you start using agents.
Exercise 1: Move from tasking to orchestrating
Open Claude 3.5 (or higher) with web search on. Type:
I’m preparing for a meeting with [Person] next week. I want to understand their recent priorities. Figure out what you need to find and go get it.
Orchestrating is when you describe an outcome and let the agent plan the route.
Watch how it breaks the goal into sub-tasks.
Interrogate the output: What did it oversimplify?
What you’re learning: The judgement that makes you valuable as an agent manager. It’s not your ability to prompt well, but your ability to evaluate what comes back.
Exercise 2: The junior associate workflow
Pick a real project you need to produce this week: a briefing, proposal, or piece of writing. Run the whole thing in one Claude conversation, treating it like a capable junior colleague you’re managing toward a deadline.
Start with:
Here’s what I need to produce and why: [describe it]. Before you start, tell me your plan and ask me 3 clarifying questions.
Iterative Execution: Let it work in small sections. Review section A before it moves to section B.
What you’re learning: At the end, write one sentence: What did I have to correct, and why? The answer is your learning. Do it three times and you’ll have a personal map of where your judgement needs to stay in the loop.
Exercise 3: Role-playing
Ask AI to help evaluate something you’re working on. You don’t have to follow all of its advice, but this helps you to see your idea from different perspectives.
Evaluate my work from four perspectives: a skeptical investor, a creative director, a policy expert, and a contrarian who wants to prove me wrong.
When you’re ready to manage real agents, you might even create some with these personas.
Your 2026 agent starter kit
When you’re ready to work with real agents, look for tools with reversible defaults:
Motion for scheduling
Motion is an easy-to-use agentic out-of-the-box tool available right now. It takes your tasks and calendar and autonomously rebuilds your schedule around your priorities without you prompting it. Because it operates in a closed loop (it doesn't send emails to others), its actions are entirely reversible within your own calendar.
If your work systems are off-limits for third-party tools, use Motion for your life admin. Once you’ve experienced what it feels like to have an agent rebuild your Saturday when a plan changes, you’ll be faster at agentic workflows in your 9-to-5.
Claude with Gmail and Calendar connected
Available under Integrations at claude.ai, connecting Gmail and Google Calendar to Claude unlocks an agentic layer on top of what you already have. Instead of summarizing a thread yourself and asking for a draft reply, Claude reads the thread directly, checks your calendar, and handles the full loop. You review before anything sends. High value, low blast radius.
If you’re wary of giving an AI agent live access to your Gmail feed, you’re not wrong to be. Prompt injection is a real threat.
The lower-risk version: manual uploads. Copy-paste a thread or upload a PDF of a conversation into Claude. It takes longer, but it ensures the agent only sees what you deliberately put in front of it. Start here if you’re uncertain.
Zapier’s AI agents
Zapier’s agents watch for a trigger (a specific sender, a form submission, or a calendar event) and execute a multi-step workflow in response.
Set it to “Wait for Approval” at every decision point. It prepares the work, and you click a manual “Go” before anything moves. Best for repetitive admin once you’ve built the habit of reviewing.
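The trigger-then-approve pattern might be sketched like this (illustrative only, not Zapier’s real API): the workflow prepares each step, and nothing executes until the approval callback says go.

```python
# Illustrative trigger -> workflow -> approval loop. Not Zapier's API;
# step and field names are invented.

def on_trigger(event, steps, approve):
    """Prepare each step, pausing for human approval before it executes."""
    results = []
    for step in steps:
        prepared = f"{step}({event['sender']})"
        if approve(prepared):
            results.append(prepared)
        else:
            results.append(f"held: {prepared}")
            break  # stop the workflow at the first unapproved step
    return results

event = {"sender": "billing@example.com"}
steps = ["label_invoice", "draft_forward_to_accounting"]
print(on_trigger(event, steps, approve=lambda s: True))
```

Replacing the lambda with a real prompt to a human is the whole safety model: the agent does the preparation, you click the manual “Go.”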
What you’re learning: The rhythm of human-agent collaboration. Review, correct, proceed.
The real purpose of AI agents
The goal isn’t to do more. It’s to stop doing things that don’t require you.
If agents only make your existing work faster, you’ve built a faster assembly line. The transformation happens when you use the reclaimed time to do the thing that needed your judgement: the strategy, the relationship, or the tough decision.
That’s what agentic AI is really about. It’s not a productivity upgrade, but a rebuilding of your processes and where your attention goes.
State your constraints explicitly. Set them up in writing, at the beginning of every session. Review before you proceed. And keep asking: who’s making the decisions here?
If the answer is “me,” you’re on the right track.
AI in the news
AI Minister tells OpenAI Canadian experts must assess flagged ChatGPT conversations (Globe and Mail) After a mass shooting in Tumbler Ridge, B.C., Canada’s AI minister asked OpenAI to involve Canadian legal, mental-health, and privacy experts when assessing ChatGPT conversations flagged for potential violence, rather than relying solely on U.S.-based decision-making. The request comes amid broader concerns about how AI companies report threats to law enforcement, highlighting gaps in Canada’s current lack of chatbot-specific regulation.
Nvidia is planning to launch an open-source AI agent platform (Wired) Nvidia is reportedly preparing to launch an open-source AI agent platform called NemoClaw that would allow companies to deploy autonomous agents to complete multi-step workplace tasks. The move signals Nvidia’s push deeper into the fast-growing “AI agent” ecosystem while offering security tools and partnerships to enterprise software companies ahead of its annual developer conference.
ChatGPT driving rise in reports of ‘satanic’ organised ritual abuse, UK experts say (Guardian) UK support organizations say more survivors of alleged ritual abuse are contacting helplines after using ChatGPT as a way to explore their experiences or seek informal guidance. Experts say the trend may be increasing reporting of abuse that has historically been under-reported, though police and advocacy groups are still trying to better understand the scale and nature of the issue.