# Are We Training AI Too Late? The Emerging Threat Blind Spot in Cybersecurity


Cybersecurity teams are building AI defenses based on yesterday's threats. As novel threat actors and unconventional attack sources emerge, organizations risk leaving themselves defenseless against adversaries they've never seen before.


## The Training Gap: A Critical Vulnerability


Modern cybersecurity relies increasingly on artificial intelligence and machine learning to detect threats—analyzing millions of events per second to identify anomalies, malware signatures, and suspicious behavior patterns. But there's a fundamental problem at the heart of this strategy: security AI systems are trained primarily on known threat actors and historical attack data.


This creates a dangerous blind spot. While organizations excel at detecting threats from established groups like APT28 and Lazarus, or from malware families like Emotet they've encountered before, they remain vulnerable to novel threat sources—attackers and attack methodologies that fall outside historical datasets.


"Cybersecurity teams need to expand their field of view to include new, unique threat sources, rather than relying on past, proven threat actors," security experts increasingly warn. The implications are stark: as threat landscapes shift and new actors emerge, traditional AI training approaches may leave organizations dangerously exposed.


## Background: How AI Became the Security Industry's Solution


Over the past decade, cybersecurity has undergone a transformation. Manual threat hunting gave way to machine learning models that could process vast amounts of network traffic, endpoint telemetry, and security logs. These systems promised scale and speed—the ability to detect threats faster and more comprehensively than human analysts ever could.


The training process seemed straightforward:

  • Collect historical threat data from known attacks
  • Label traffic, files, and behaviors as "malicious" or "benign"
  • Train AI models to recognize these patterns
  • Deploy the model to production environments
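
The four steps above can be sketched as a toy pipeline. Everything here is a deliberately trivial stand-in—hypothetical labeled events and a token-count "model"—meant only to show the collect/label/train/deploy shape, not a production detector.

```python
# Minimal sketch of the historical-data training loop described above.
from collections import Counter

# Step 1 & 2: collect historical events with "malicious"/"benign" labels
# (all samples here are hypothetical)
historical_events = [
    ("powershell -enc JABzAD0A...", "malicious"),
    ("svchost.exe -k netsvcs", "benign"),
    ("rundll32.exe payload.dll,Run", "malicious"),
    ("explorer.exe", "benign"),
]

# Step 3: "train" by counting which tokens appear under each label
def train(events):
    model = {"malicious": Counter(), "benign": Counter()}
    for text, label in events:
        model[label].update(text.lower().split())
    return model

# Step 4: deploy — score new events against the learned token counts
def classify(model, text):
    scores = {
        label: sum(counts[tok] for tok in text.lower().split())
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)

model = train(historical_events)
print(classify(model, "rundll32.exe other.dll,Run"))  # matches a known pattern
```

Note what the sketch can and cannot do: it recognizes variations of tokens it has already counted, and has no basis at all for scoring an event built from tokens it has never seen.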

This approach has proven effective for known threats. Security teams can confidently detect:

  • Signature-based malware: Variants of Emotet, Trickbot, and other established malware families
  • Known APT tactics: Exploitation techniques attributed to specific nation-state groups
  • Familiar attack chains: Phishing campaigns using established lures and infrastructure

The problem: this retrospective focus creates a vulnerability to prospective threats.


## The Emerging Threat Problem: What We're Missing


Consider the landscape shifts of recent years:


| Threat Category | Historical Focus | Emerging Gap |
|---|---|---|
| Threat Actors | Nation-states, organized crime syndicates | Ideologically motivated groups, hacktivist collectives, lone actors |
| Attack Vectors | Email, network exploitation, known CVEs | Supply chain manipulation, AI-generated content, zero-days |
| Infrastructure | Bulletproof hosting, dark web C2 servers | Legitimate cloud infrastructure abuse, IoT botnets |
| Objectives | Data theft, financial gain, espionage | Disinformation campaigns, operational disruption, brand damage |


When a threat emerges from an actor not well-represented in historical data—perhaps a regional cybercriminal group pivoting to a new industry, or an ideological collective using novel techniques—AI systems trained on legacy threat data perform poorly.


### Why This Matters Technically


Machine learning models are fundamentally pattern-recognition systems. They excel at recognizing variations of patterns they've seen thousands or millions of times. They fail at recognizing genuinely novel patterns.


Example scenarios where training gaps appear:


  • New attack infrastructure: An emerging threat group uses freshly registered domains with different registration patterns than known actors. Legacy AI models trained on APT domain registration behavior don't flag these as anomalous.

  • Unconventional lateral movement: A regional cybercrime group uses a legitimate-but-unusual administrative technique not represented in historical breach data. The AI model sees it as normal system behavior.

  • Industry-specific targeting: A newly emerged threat actor focuses on a vertical (water utilities, agricultural cooperatives, maritime shipping) that wasn't heavily targeted in historical datasets. AI trained on financial services and tech industry attacks misses the pattern.

  • Novel payload delivery: An attacker uses a delivery mechanism not present in historical malware datasets—perhaps embedding payloads in legitimate documents in a new way. Signature-based detection fails because the signature doesn't exist.
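
The last scenario can be made concrete: signature-based detection is ultimately a set lookup, so a payload whose hash was never catalogued is invisible by construction. The payload bytes and signature database below are hypothetical.

```python
# Why signature matching fails on novel payloads: detection is membership
# in a fixed set of hashes built from historical samples.
import hashlib

# Hypothetical signature database derived from known malware samples
known_payloads = [b"emotet-loader-v4", b"trickbot-dropper"]
signatures = {hashlib.sha256(p).hexdigest() for p in known_payloads}

def is_flagged(payload: bytes) -> bool:
    # A payload is detected only if its exact hash was seen during training
    return hashlib.sha256(payload).hexdigest() in signatures

print(is_flagged(b"emotet-loader-v4"))          # True: seen before
print(is_flagged(b"novel-delivery-technique"))  # False: no signature exists
```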

## The Real-World Implications


Organizations relying heavily on AI-driven security face a strategic dilemma:


**Overconfidence in Known-Threat Detection**

  • Leadership believes their security posture is adequate because AI successfully stops 99% of detected threats
  • What remains undetected: threats that don't match historical patterns
  • Result: A sophisticated attack from an unfamiliar actor can establish persistence before detection

**Resource Drain on Incident Response**

  • When novel threats do slip through, analysts lack historical data or threat intelligence to understand what they're facing
  • Response times slow. Attribution becomes difficult
  • Resources meant for proactive hunting are consumed by reactive investigation

**Cascading Risk Across Supply Chains**

  • If Organization A is compromised by a novel threat that evaded their AI defenses, they may unknowingly become a stepping stone to Organization B
  • The threat remains undetected in A's environment because it doesn't match trained patterns
  • By the time it's discovered, lateral movement to trusted partners may be well underway

## What Organizations Should Do: Broadening the Field of View


Security teams must broaden their approach along several fronts:


### 1. Diversify Training Data Sources

  • Don't rely solely on your own historical breaches and industry-wide threat feeds
  • Intentionally include data from emerging threat actors, regional cybercriminal forums, and threat sources not yet heavily studied
  • Partner with threat intelligence organizations tracking novel actors
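
One practical shape this takes is merging indicators from multiple feeds so the training corpus isn't dominated by a single historical source. The feeds, indicators, and actor names below are hypothetical; the point is the merge, which adds coverage from emerging sources without overwriting existing attributions.

```python
# Merging indicators from several (hypothetical) intelligence feeds into one
# training corpus. Each feed maps an indicator to an attributed actor.
internal_incidents  = {"203.0.113.7": "APT28", "evil.example": "Lazarus"}
industry_feed       = {"203.0.113.7": "APT28", "198.51.100.9": "Emotet"}
emerging_actor_feed = {"drop.example": "regional-group-X"}  # novel source

def merge_feeds(*feeds):
    merged = {}
    for feed in feeds:
        for indicator, actor in feed.items():
            # Keep the first attribution seen; later feeds only add coverage
            merged.setdefault(indicator, actor)
    return merged

training_indicators = merge_feeds(
    internal_incidents, industry_feed, emerging_actor_feed
)
print(len(training_indicators))  # 4 distinct indicators across three sources
```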

### 2. Implement Behavioral Anomaly Detection

  • Move beyond pattern-matching for known threats
  • Build models that detect *deviations from baseline behavior*, regardless of whether those deviations match historical patterns
  • Focus on: unusual privilege escalation chains, unexpected data access patterns, atypical system administration activities
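
A minimal sketch of baseline-deviation detection: model what "normal" looks like for a metric and flag departures, regardless of whether they match any known-bad pattern. The metric (daily privileged-command count per host), the baseline values, and the z-score threshold are all illustrative assumptions.

```python
# Flag deviations from baseline behavior instead of matching known patterns.
import statistics

# Hypothetical daily counts of privileged commands observed during a quiet period
baseline = [3, 4, 2, 5, 3, 4, 3, 4, 2, 3]
mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

def is_anomalous(observed: float, z_threshold: float = 3.0) -> bool:
    # Anomalous if the observation sits more than z_threshold
    # standard deviations from the baseline mean
    return abs(observed - mean) / stdev > z_threshold

print(is_anomalous(4))   # within normal variation
print(is_anomalous(40))  # flagged, even though 40 matches no known signature
```

The design choice matters: a signature model would have no opinion about the value 40, but a baseline model flags it because it deviates from established behavior.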

### 3. Maintain Human Expertise in the Loop

  • Don't outsource threat assessment entirely to AI
  • Retain threat hunting capabilities to investigate anomalies the AI flags
  • Build internal expertise in emerging threat landscapes

### 4. Create Feedback Loops

  • When novel threats are discovered, immediately incorporate that data back into training pipelines
  • Establish processes for AI model retraining on shorter cycles (monthly, rather than quarterly or annually)
  • Share anonymized novel threat intelligence across industry peers
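
The feedback loop can be sketched as follows: when analysts confirm a novel threat, its indicators go straight into the training corpus and the detector is rebuilt immediately, rather than waiting for a calendar-driven refresh. The indicator names are hypothetical, and the signature-set rebuild stands in for a real model-training job.

```python
# Short-cycle retraining sketch: novel threats feed straight back into training.
signatures: set[str] = set()
training_corpus: list[str] = ["known-indicator-1", "known-indicator-2"]

def retrain():
    # Stand-in for a real training job: rebuild detections from the corpus
    global signatures
    signatures = set(training_corpus)

def incorporate_novel_threat(indicator: str):
    # Close the loop: discovered threat -> corpus -> immediate retrain
    training_corpus.append(indicator)
    retrain()

retrain()
print("novel-c2.example" in signatures)   # False: not yet learned
incorporate_novel_threat("novel-c2.example")
print("novel-c2.example" in signatures)   # True once the loop closes
```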

### 5. Monitor Threat Intelligence for Leading Indicators

  • Subscribe to threat feeds tracking emerging actors and novel techniques before they impact your organization
  • Use this intelligence to intentionally stress-test your defenses against hypothetical novel threats
  • Red-team against unfamiliar attack methodologies

## The Path Forward


The cybersecurity industry faces a maturity challenge: we've optimized our defenses against the threats we know, at the expense of visibility into threats we don't know. This worked when threat landscapes moved slowly and novel actors were rare. It doesn't work in an environment of rapid innovation and emerging threat sources.


The question isn't whether to use AI in security—the scale of modern threats demands it. The question is whether organizations will expand their training datasets and methodologies to account for novel threats before those threats cause significant damage.


The answer, increasingly, is clear: we need to start training our defenses for threats we haven't seen yet. The alternative is to keep training AI systems for yesterday's adversaries while tomorrow's threats go undetected.