What an AI SOC agent actually does on a Tier 1 alert

Marta K. · Senior Detection Engineer & Incident Responder

AI in Security Operations · June 15, 2026 · 11 min read

An AI SOC agent closed an impossible-travel alert with a full evidence chain in under four minutes. It also recommended isolating a production server over clean traffic the same week. Marta Kowalska walks one real Entra ID alert through the agent's full investigation chain — and shows exactly where the reasoning broke on a different alert class.

An AI SOC agent can close a Tier 1 alert with a full evidence chain in under four minutes. It can also confidently recommend the wrong response when it doesn't inspect the right data. I learned both lessons in the same week, on the same shift pattern.

The first lesson came on a night shift, when I decided to let an AI SOC agent run an impossible-travel alert instead of working it myself. It was 2:47 a.m., the queue had fourteen alerts in it, and this one was an Entra ID atypical-travel alert on a successful login from Warsaw and then Bucharest not long after.

It was exactly the kind of thing I'd triaged two hundred times. I knew the manual workflow cold, and I also knew it would take me thirty minutes I didn't have.

Rather than work the alert manually, I watched the agent work it and took notes on each step and query. I also noted the one place I later saw its reasoning break on another alert class that week. Before I walk through that investigation, the summary below captures the takeaways the rest of the article defends.

In brief:

Parallel investigation: an AI SOC agent generates hypotheses and dispatches enrichment queries in parallel, then lets findings determine the next investigative step, as outlined in the Google SecOps documentation.
Speed and depth: the platform enriches impossible-travel alerts with contextual telemetry and surfaces the investigation as it unfolds. The same alert class used to take me thirty to forty-five minutes by hand because I queried tools sequentially.
Reasoning limits: in my experience, the agent's reasoning can still fail, especially on alerts that require payload or traffic inspection the agent doesn't perform.
Coverage gain: AI first-pass triage closes the coverage gap that alert fatigue creates in traditional SOC setups, where some alerts go uninvestigated entirely.

Those four takeaways map directly to the rest of the article. The next section walks through the specific alert I handed off, and the section after that explains how the agent's evidence-led approach differs from a SOAR playbook. The closing sections cover where the agent's reasoning broke and what Tier 1 work looks like once an agent owns the first pass.

The alert I'd have triaged by hand at 3 a.m.

I picked this alert because it represents the bulk of what slows a night shift down: ordinary, frequent, and expensive to triage by hand. Microsoft Entra ID fired an atypical-travel alert covering two successful logins for the same user, one from a Warsaw IP and one from a Bucharest IP, forty minutes apart.

Forty minutes is far too short to cover the distance between those cities by road or by air, which is what makes the sequence operationally suspicious. The detection fires only on successful authentications, per Microsoft's impossible travel detection documentation.
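The arithmetic behind "impossible" is easy to check by hand. The sketch below is a minimal illustration, not Microsoft's detection logic: the city-centre coordinates are approximate, and `MAX_PLAUSIBLE_KMH` is a hypothetical threshold chosen to sit near airliner cruise speed.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900  # hypothetical threshold, roughly airliner cruise speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def implied_speed_kmh(distance_km, minutes_apart):
    """Speed a person would need to appear at both locations."""
    return distance_km / (minutes_apart / 60)


# Approximate city-centre coordinates, assumed for illustration.
WARSAW = (52.23, 21.01)
BUCHAREST = (44.43, 26.10)

distance_km = haversine_km(*WARSAW, *BUCHAREST)
speed_kmh = implied_speed_kmh(distance_km, minutes_apart=40)
is_impossible = speed_kmh > MAX_PLAUSIBLE_KMH
```

With the two logins forty minutes apart, the implied speed lands well above what a commercial flight could manage, so the check flags the pair. IP geolocation is imprecise and VPN egress points distort it further, which is why this number opens an investigation rather than settling one.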

Both of those successful logins triggered MFA, which raised the bar on what the alert could mean. The user is a mid-level finance analyst with access to internal dashboards and a shared reporting tool, which is enough access to matter if the second login is an attacker rather than the legitimate user.

To answer that question, manual triage on this alert class used to cost me thirty to forty-five minutes. I'd check both IPs against MaxMind and AbuseIPDB and query sign-in logs for the user's last 30 days of login geography. Then I'd check HR records or message the user's manager on Slack to see if there was a travel request on file.

Once travel context came back, I'd examine post-authentication activity in the M365 audit log for email forwarding rules, OAuth consents, or bulk file downloads, and only then contact the user. At 3 a.m., that contact step alone could take hours, and the alert sat open while the queue backed up behind it. The cost compounds across every shift, which is why the contrast with the agent's evidence-led approach matters.

An AI SOC agent builds an evidence-led investigation

The difference between SOAR and an AI SOC agent is the difference between a script and an investigator. That distinction shapes every choice the agent makes on the alert I just described.

A security orchestration, automation, and response (SOAR) playbook runs a fixed sequence of steps authored by a human at some earlier date. If the attacker's path falls outside the scripted logic, the workflow can confidently take the wrong action.

An AI SOC agent works the opposite way. It pursues the goal of investigating the alert and determining whether it is a genuine threat, and it lets findings at each step determine the next query. Because the investigation path can change as evidence comes in, two alerts that look superficially similar can take different investigative paths if the evidence warrants it.

I use agentic security as the shorthand for that split between fixed execution and evidence-led investigation. The split shows up most clearly in how the two approaches handle a familiar technique.

Take a SOAR playbook built against T1078 Valid Accounts. It reflects the steps someone authored when the playbook was written, while an agentic system can adjust its investigation as new endpoint and lateral movement evidence emerges. The system follows the active kill chain instead of the kill chain the playbook assumed would be present.

That doesn't make SOAR obsolete. It retains a real role for compliance-mandated response workflows where an exact, auditable sequence is required, and the agent earns its value during investigation. Implementations vary, but the common thread is the same one I saw on this alert: the system investigates toward a goal instead of stepping through a fixed sequence.

Step by step: how the agent worked the alert

With that goal-driven approach in mind, my notes captured the agent's investigation path on this specific impossible-travel alert. The three subsections below break it down in the order it happened: the parallel enrichment phase, the pivots the agent took when evidence surfaced a gap, and the verdict and the case file it produced.

Enrichment: what it pulled before deciding anything

Before classifying anything, the agent pulled context from every source at once, which is the first place the manual workflow and the agentic workflow diverge. Where I would have opened five browser tabs sequentially, the agent pulled relevant context from multiple systems simultaneously to investigate the alert.

I query tools one at a time because I'm one person with one browser. The agent queries all of them at once and reconciles the results, dispatching enrichment across connected data sources in parallel.

That parallel dispatch returned several findings at once. The Warsaw IP matched the user's home ISP and was consistent with 90 days of login history, the Bucharest IP resolved to a commercial VPN provider with no threat intel flags, the device ID on the Bucharest session didn't match any registered corporate endpoint, and no travel request was on file.
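The parallel dispatch described above can be sketched with a thread pool. This is an illustration under stated assumptions, not the platform's implementation: `lookup` is a hypothetical stand-in for real API calls, and the source names and example identifiers are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup(source, entity):
    """Hypothetical stand-in for a real enrichment API call."""
    return {"source": source, "entity": entity, "status": "ok"}


def enrich(user, ips):
    """Dispatch every enrichment query at once and collect the results."""
    jobs = [("ip_reputation", ip) for ip in ips]
    jobs += [("geo", ip) for ip in ips]
    jobs += [
        ("signin_history", user),
        ("device_registry", user),
        ("travel_requests", user),
    ]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(lookup, source, entity) for source, entity in jobs]
        # Results come back in submission order, ready to reconcile together.
        return [future.result() for future in futures]


# Documentation-range IPs and an example address, used as placeholders.
findings = enrich("analyst@example.com", ["203.0.113.10", "198.51.100.20"])
```

The point of the design is that total wall-clock time is bounded by the slowest source rather than the sum of all of them, which is where most of the gap between four minutes and thirty comes from.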

The agent held all of those findings in working memory before making any classification decision. While holding them, it mapped the activity to T1078.004 Cloud Accounts, the primary technique for cloud identity compromise via legitimate credentials, with T1133 External Remote Services flagged because the VPN access pattern matched that technique's detection criteria.

The unregistered device on a VPN with no travel request was the gap that mapping set up, and the agent acted on it next.

The pivots: where the evidence took it next

Working from that gap, the agent evaluated findings from the parallel queries, named the unknown device as the specific thing it needed more evidence on, and generated a new investigation plan. The Triage and Investigation Agent documentation from Google SecOps describes this as an explicit evaluate, identify gaps, generate new plan, and execute loop, and that loop is exactly what ran next.
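A minimal sketch of that evaluate, identify gaps, plan, and execute loop follows. It is not Google's implementation: the function names, the canned results, and the follow-up mapping are all hypothetical, chosen to mirror the findings on this alert.

```python
def investigate(initial_plan, run_query, find_gaps, plan_for, max_rounds=5):
    """Run queries, look for gaps in the evidence, and plan the next round."""
    evidence = {}
    plan = list(initial_plan)
    for _ in range(max_rounds):
        for query in plan:                      # execute
            evidence[query] = run_query(query)
        gaps = find_gaps(evidence)              # evaluate and identify gaps
        if not gaps:
            break
        # Generate a new plan from the gaps, skipping queries already answered.
        plan = [q for gap in gaps for q in plan_for(gap) if q not in evidence]
        if not plan:
            break
    return evidence


# Canned results and follow-ups, invented for illustration.
CANNED = {
    "ip_reputation": "commercial_vpn_no_flags",
    "device_registry": "unregistered",
    "post_auth_activity": "clean",
    "vpn_history": "seen_3_times_business_hours",
}
FOLLOW_UPS = {"unknown_device": ["post_auth_activity", "vpn_history"]}


def run_query(name):
    return CANNED[name]


def find_gaps(evidence):
    unexplained_device = (
        evidence.get("device_registry") == "unregistered"
        and "post_auth_activity" not in evidence
    )
    return ["unknown_device"] if unexplained_device else []


def plan_for(gap):
    return FOLLOW_UPS.get(gap, [])


evidence = investigate(
    ["ip_reputation", "device_registry"], run_query, find_gaps, plan_for
)
```

The `max_rounds` cap is a deliberate choice in this sketch: an evidence-led loop needs a stopping condition so that an unanswerable gap ends in escalation rather than endless querying.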

The new plan started with post-authentication activity. The agent queried the M365 audit log for the Bucharest session, looking for any new email forwarding rules, OAuth application consents, admin role changes, or bulk file downloads in the window after the second login.
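The shape of that post-authentication check can be sketched as a filter over audit records. Everything here is an assumption for illustration: the record shape is simplified to two keys, the operation names are examples that should be verified against your own tenant's audit log, and the two-hour window and download threshold are hypothetical.

```python
from datetime import datetime, timedelta

# Example operation names; verify against your tenant's audit log before use.
SUSPICIOUS_OPERATIONS = {
    "New-InboxRule": "email_forwarding_rule",
    "Set-InboxRule": "email_forwarding_rule",
    "Consent to application.": "oauth_consent",
    "Add member to role.": "admin_role_change",
}
BULK_DOWNLOAD_THRESHOLD = 50  # hypothetical cutoff for "bulk"


def post_auth_signals(records, login_time, window=timedelta(hours=2)):
    """Return the abuse signals seen in the window after a login."""
    end = login_time + window
    in_window = [r for r in records if login_time <= r["time"] <= end]
    signals = {
        SUSPICIOUS_OPERATIONS[r["operation"]]
        for r in in_window
        if r["operation"] in SUSPICIOUS_OPERATIONS
    }
    downloads = sum(1 for r in in_window if r["operation"] == "FileDownloaded")
    if downloads >= BULK_DOWNLOAD_THRESHOLD:
        signals.add("bulk_file_download")
    return sorted(signals)


# Hypothetical session: ordinary mailbox and file activity only.
login = datetime(2026, 6, 10, 2, 40)
records = [
    {"operation": "MailItemsAccessed", "time": login + timedelta(minutes=3)},
    {"operation": "FileDownloaded", "time": login + timedelta(minutes=9)},
]
signals = post_auth_signals(records, login)
```

An empty result here means only that none of the listed signals appeared in the window, which is why it justifies more digging rather than an immediate close.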

That post-authentication query came back clean enough to keep digging rather than escalate, with no email-rule, OAuth, admin-role, or unusual data-access signal surfacing in that window. With no abuse signal on the suspicious session, the agent reviewed the user's login IP addresses over the prior 30 days and found that same VPN provider three previous times, all during normal business hours.

That historical pattern shifted the investigation path. The known VPN usage explained both the geography and device mismatch, and the timing lined up with the user's normal working pattern. The agent re-scored the evidence and downweighted the geographic anomaly because of the established VPN history, and with the geographic signal downweighted, the agent had enough to close out the alert.
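One way to picture the re-scoring step is as a weighted sum in which corroborating history shrinks a signal's weight. The sketch below is illustrative only: the agent's actual scoring is not documented in this article, and the signal names, point values, and close threshold are all hypothetical.

```python
CLOSE_BELOW = 50  # hypothetical threshold for closing as benign


def risk_score(signals, weights):
    """Sum the weights of every signal that is present."""
    return sum(weights[name] for name, present in signals.items() if present)


# Hypothetical point values per signal.
weights = {
    "geo_anomaly": 40,
    "unregistered_device": 30,
    "post_auth_abuse": 60,
    "threat_intel_hit": 50,
}
signals = {
    "geo_anomaly": True,
    "unregistered_device": True,
    "post_auth_abuse": False,
    "threat_intel_hit": False,
}

before = risk_score(signals, weights)  # 70

# Established VPN history explains the geography, so that weight is cut.
adjusted = dict(weights)
adjusted["geo_anomaly"] = weights["geo_anomaly"] // 4

after = risk_score(signals, adjusted)  # 40
can_close = after < CLOSE_BELOW
```

The signals themselves do not change between the two scores. What changes is how much the geographic anomaly is allowed to count once history explains it.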

The verdict and case file it left behind

When it closed the alert, the agent classified it as benign with high confidence and auto-closed it. The case file it generated included every query it executed with timestamps, the raw evidence returned from each source, the reasoning chain connecting evidence to verdict, and the ATT&CK technique mapping for T1078.004 and T1133.
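A case file with those four components could be modelled as below. The field names and example values are hypothetical, since the article does not show the platform's actual schema; the sketch only shows how queries, raw evidence, reasoning, and technique mapping can live in one serializable record.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class QueryRecord:
    source: str
    query: str
    executed_at: str   # ISO 8601 timestamp
    raw_result: dict   # evidence exactly as the source returned it


@dataclass
class CaseFile:
    alert_id: str
    verdict: str
    confidence: str
    attack_techniques: list
    reasoning: list = field(default_factory=list)
    queries: list = field(default_factory=list)

    def to_json(self):
        """Serialize the whole case, nested query records included."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical example values.
case = CaseFile(
    alert_id="ALERT-0001",
    verdict="benign",
    confidence="high",
    attack_techniques=["T1078.004", "T1133"],
    reasoning=["VPN provider seen three times before, all in business hours"],
    queries=[
        QueryRecord(
            source="signin_logs",
            query="logins for user, prior 30 days",
            executed_at="2026-06-10T02:48:10Z",
            raw_result={"vpn_provider_matches": 3},
        )
    ],
)
exported = case.to_json()
```

Keeping the raw result next to the query that produced it is what lets a reviewer replay the reasoning chain later instead of trusting the summary.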

The full evidence package was structured for compliance defensibility, and the total time from alert ingestion to verdict came in under four minutes.

I agreed with the verdict, and I would have reached the same conclusion thirty minutes later after querying the same data sources sequentially and waiting for the user to respond to a Slack message they wouldn't see until morning.

While I was watching the agent work that one alert, the other thirteen in the queue were either already being processed by parallel agent instances or queued for immediate pickup. On a manual night shift, those thirteen would have waited. That coverage gain is the real upside, but it doesn't mean the agent gets every alert right, and the next section is the alert where it didn't.

Where the agent's reasoning went sideways

The impossible-travel verdict held up under my review, but an IDS alert that came through the same week did not. The contrast is the clearest evidence I have for why human oversight still matters.

That IDS alert fired for a possible SQL injection pattern involving a REVERSE function on HTTP traffic from an external source to a web server, and the agent classified it as malicious and recommended isolating the host and blocking the external IP.

I overrode the verdict because the traffic was clean. The signature had a known false positive pattern on that web server's request format, and without the override, the agent would have isolated a production web server and blocked a legitimate external partner IP.

That would have created two problems at once: a false positive and a service disruption requiring incident management. That failure mode is also documented in an NDSS workshop paper on AI SOC agents, where an agent over-relied on alert metadata instead of packet content.

The mechanism behind that failure is straightforward. The agent delivered a confident verdict without enough investigation depth to justify the response it recommended, because in practice the agent only reasons over what it checks.

The harder problem behind that limit is that many current deployments still lack a systematic mechanism to catch false negatives. If the agent closes an alert, the workflow should retain an auditable record and visibility into the decision. AI SOC auditability covers the trust and verification architecture behind those audit mechanisms, and that architecture is what makes the oversight model in the next section workable.

What Tier 1 looks like when the agent handles the first pass

Building on that oversight architecture, Tier 1 work doesn't disappear when an agent owns the first pass, but it does change shape. The operating model I run now is built around three things: review, coverage, and override discipline.

I've worked through the transition from manual triage to AI first-pass on two teams now, and Tier 1 work shifts toward reviewing the agent and calibrating it. The repetitive enrichment work and sequential copy-paste between console tabs at 3 a.m. go away.

Review is the first part of that model. Analysts review AI-generated investigation summaries and handle the exceptions the agent flags as low-confidence, and they also authorize response actions the agent recommends but can't execute alone. SANS integration guidance emphasizes human ownership and approval of automated security actions, and that ownership is what keeps the review loop meaningful.

Coverage is the second. In my last SOC, alerts sometimes aged out before anyone touched them, and every one of those was an uninvestigated potential threat. That is why I care about every alert getting an investigation and why investigation coverage is the metric I track first.
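Investigation coverage is simple to compute once every alert carries a final status. The status labels in this sketch are hypothetical and would need mapping to whatever your queue actually records.

```python
def investigation_coverage(alerts):
    """Fraction of alerts that received an investigation."""
    if not alerts:
        raise ValueError("no alerts in the measurement window")
    investigated = sum(1 for a in alerts if a["status"] == "investigated")
    return investigated / len(alerts)


# Hypothetical shift: fourteen alerts, four aged out untouched.
shift = [{"status": "investigated"}] * 10 + [{"status": "aged_out"}] * 4
coverage = investigation_coverage(shift)
```

Anything below full coverage is a count of alerts nobody looked at, which is a different problem from alerts that were looked at and judged wrongly.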

Override discipline is the third. If analysts constantly override, the agent is poorly tuned, and if analysts never override, they may be overtrusting it.
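That two-sided check can be sketched as a band around the override rate. The `low` and `high` bounds are hypothetical placeholders, not recommended values; the right band depends on alert mix and how the agent is tuned.

```python
def override_rate(reviewed, overridden):
    """Share of analyst-reviewed verdicts that were overridden."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return overridden / reviewed


def assess(rate, low=0.02, high=0.20):
    """Classify an override rate against a hypothetical acceptable band."""
    if rate > high:
        return "agent may be poorly tuned"
    if rate < low:
        return "analysts may be overtrusting"
    return "within expected band"


status = assess(override_rate(reviewed=200, overridden=20))
```

A rate inside the band is not proof of health on its own, because it only reflects verdicts that an analyst actually reviewed.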

Because the override rate only tells you about the alerts analysts saw, I also build a separate, deliberate audit cadence for false negatives. The agent won't tell you about the ones it missed, and I've learned to treat that audit loop as a required part of the operating model.
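One way to run that audit is to pull a random sample of auto-closed alerts from every alert class on a fixed cadence and re-investigate them by hand. The sketch below assumes a hypothetical `alert_class` field and an arbitrary sample size per class.

```python
import random
from collections import defaultdict


def audit_sample(closed_alerts, per_class=2, seed=None):
    """Pick up to per_class auto-closed alerts from every alert class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for alert in closed_alerts:
        by_class[alert["alert_class"]].append(alert)
    sample = []
    for alert_class in sorted(by_class):
        candidates = by_class[alert_class]
        sample.extend(rng.sample(candidates, min(per_class, len(candidates))))
    return sample


# Hypothetical week of auto-closed alerts.
closed = (
    [{"id": f"IT-{i}", "alert_class": "impossible_travel"} for i in range(20)]
    + [{"id": f"IDS-{i}", "alert_class": "ids_signature"} for i in range(5)]
    + [{"id": "DLP-0", "alert_class": "dlp"}]
)
to_review = audit_sample(closed, per_class=2, seed=7)
```

Sampling per class rather than across the whole pool keeps low-volume alert classes from being crowded out by the high-volume ones, which matters because the agent's blind spots tend to be class-specific.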

Together, those three disciplines keep humans responsible for higher-consequence response actions. The shape of Tier 1 work changes, but it remains analyst-owned.

Frequently asked questions about AI SOC agents

How does an AI SOC agent handle Tier 1 triage?

An AI SOC agent ingests security alerts and enriches them with context from connected tools such as SIEM, EDR, identity providers, and threat intel feeds. It generates and tests hypotheses about the activity, then delivers a verdict with a full evidence chain, auto-closing benign alerts and escalating suspicious findings with the investigation already documented.

When should a SOC use an AI agent instead of a SOAR playbook?

SOAR executes a fixed sequence of steps authored by a human before the alert arrived, while an AI SOC agent determines its next investigative step based on findings from the previous one. When the attacker's path falls outside a playbook's scripted logic, SOAR can take the wrong action or hit a dead end, while the agent follows the evidence wherever it leads, including ATT&CK techniques the original alert rule did not target.

Does an AI SOC agent replace Tier 1 analysts or change their job?

It replaces the repetitive enrichment and triage work that dominates Tier 1 shifts, while analysts still own judgment calls, feedback loops, and override decisions that keep the agent calibrated. The updated SANS curriculum added AI integration and detection engineering modules for SOC analysts, and that update points to a Tier 1 role centered on oversight and exception handling.

How does an AI SOC agent actually investigate an alert step by step?

It normalizes the alert, extracts entities such as user, IP, host, and hash, dispatches parallel enrichment queries to all connected data sources simultaneously, generates candidate hypotheses, tests each hypothesis against the evidence, scores confidence, and delivers a verdict with a natural-language reasoning summary. Google SecOps documents describe the Triage and Investigation Agent as evaluating incoming alerts, executing an investigation plan, and providing a structured analysis with findings and reasoning.

Which alert types do AI SOC agents still get wrong in practice?

Alerts requiring payload or traffic inspection still cause mistakes in practice, along with novel attack patterns outside the model's training distribution and environments where incomplete log coverage lets agents automate failure instead of investigation. Multi-stage lateral movement can also run into context window limits in large language models, where earlier critical signals drop out of the agent's working memory.

About the author

Marta K. is a senior detection engineer and incident responder with over eight years of hands-on experience operating and scaling security operations in high-growth SaaS and fintech environments. She started her career as a SOC analyst, working night shifts triaging alerts and investigating suspicious activity across endpoint, identity, and cloud environments. Over time, she moved into detection engineering, where she focused on building and tuning detection pipelines, reducing false positives, and mapping coverage to frameworks like MITRE ATT&CK. Marta has led incident response efforts for ransomware, credential compromise, and insider threat scenarios, and has helped teams transition from reactive alert handling to structured investigation workflows and proactive detection strategies. Her work has included implementing detection-as-code practices, improving alert fidelity, and designing playbooks that actually get used during real incidents. She writes about the reality of running security operations, from alert fatigue and broken escalation paths to what actually works when building detections and responding to incidents under pressure.
