MITRE ATT&CK Evaluations Return: More Coverage, More Nuance

MITRE released a new round of MITRE ATT&CK enterprise evaluations today. This round brought several big changes. First, only 11 vendors participated, down from the 19 that participated in 2024. Notable absences include SentinelOne, Microsoft, and Palo Alto Networks. It appears that some vendors prioritized their own internal product efforts over the evaluation, likely because of investment in other areas, market and economic dynamics, and changes in the landscape.

Forrester strongly believes in the power of unbiased, third-party evaluations, especially of security products. Security products can sometimes be a black box. Evaluations like these, especially when the data is shared, make capabilities a little less opaque.

Round Seven: Breaking New Ground

This round emulated Scattered Spider, a financially motivated cybercriminal collective, and Mustang Panda, a PRC-based espionage group.

The MITRE ATT&CK team made big changes to the evaluation infrastructure so that it more closely resembles a real-world scenario. The environment had more endpoints and subnets, built out into a realistic and complex network topology. Just as the last round expanded coverage to macOS, this round expanded coverage to the cloud, in addition to Windows and Linux devices.

The evaluations also expanded the scope to additional telemetry sources like identity, email, and cloud. For example, some of the emulations included identity compromise through single sign-on and multifactor authentication as well as the abuse of cloud services.

MITRE included unmanaged devices in the evaluation, which exposed a blind spot for many providers. Unmanaged devices reflect real-world environments where organizations have bring-your-own devices without managed agents, third-party contractors connecting on-premises or remotely, or test networks where endpoints won’t run standard protections.

A nuance worth noting is that the vendor tools used in this round vary widely. In past years, most vendors tested their EDR tool, but in this round, vendors used a variety of modules together. For example, Trend Micro used modules from its Vision One platform, including endpoint security, network security, cloud security, and exposure management. WithSecure used its EPP, XDR, and exposure management capabilities. Cyberani used a combination of SIEM, XDR, TIP, and sandbox analysis, all part of its MDR service.

Detection Tests: Why Are We Still Dealing With Hundreds Of Alerts?

There were two detection tests, emulating Scattered Spider and Mustang Panda. Both leveraged an array of LOLBins, tool downloads, and many different devices across the network. Within the detection tests, MITRE included the reconnaissance tactic, specifically phishing, to expand the detection window. This is new for this round.

Importantly, there’s a clear distinction between the vendors that generated many alerts and those that generated very few, each correlated with full context. Vendors like CrowdStrike, Cybereason, and ESET generated only a handful of detections for each scenario. Those vendors were not necessarily seeing less. Instead, reflecting a theme across the industry, they are consolidating related alerts into single cases rather than inundating users with a barrage of disconnected alerts. Others, such as Sophos and Trend Micro, generated hundreds of alerts. Some of those may be suppressed in the console, as many fall into the medium or low severity categories. Even so, the market is moving toward consolidating alerts into cases, and all vendors in this evaluation should be, as well.
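
To make the idea of consolidating alerts into cases concrete, here is a minimal Python sketch. It is an illustration only, not how any vendor in the evaluation actually correlates alerts: the Alert and Case classes, the field names, and the rule itself (same host, no more than 15 minutes between consecutive alerts) are all assumptions made for this example. Real products typically correlate across hosts, identities, and process lineage as well.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale, used only to rank alerts inside a case.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Alert:
    alert_id: str
    host: str
    severity: str   # "low", "medium", or "high"
    timestamp: int  # seconds since the start of the scenario
    technique: str  # ATT&CK technique ID, e.g. "T1566"


@dataclass
class Case:
    host: str
    alerts: list = field(default_factory=list)

    @property
    def max_severity(self) -> str:
        # A case is as severe as its most severe alert.
        return max((a.severity for a in self.alerts), key=SEVERITY_RANK.__getitem__)


def consolidate(alerts, max_gap_seconds=900):
    """Group alerts into cases: same host, and no more than
    max_gap_seconds between consecutive alerts on that host."""
    cases = []
    open_case_by_host = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        case = open_case_by_host.get(alert.host)
        too_old = case is not None and (
            alert.timestamp - case.alerts[-1].timestamp > max_gap_seconds
        )
        if case is None or too_old:
            # Start a new case for this host.
            case = Case(host=alert.host)
            cases.append(case)
            open_case_by_host[alert.host] = case
        case.alerts.append(alert)
    return cases


if __name__ == "__main__":
    sample = [
        Alert("a1", "ws1", "low", 0, "T1566"),
        Alert("a2", "ws1", "medium", 60, "T1059"),
        Alert("a3", "ws1", "high", 120, "T1003"),
        Alert("a4", "srv1", "medium", 30, "T1059"),
    ]
    for case in consolidate(sample):
        print(case.host, len(case.alerts), case.max_severity)
```

Under this rule, an analyst reviews one case per burst of activity on a host rather than every individual alert, and low- and medium-severity alerts stay attached to the case as context instead of surfacing on their own.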

Protection Tests

There were seven protection tests, one for each stage: credential theft, identity providers, unmanaged to managed devices, initial access malware execution, malware execution and lateral movement, false positives, and AWS compromise.

The goal of the protection tests wasn’t just to show an instance of “stopping of the threat” but to measure its impact. Was the attack stopped before the threat actor had a chance to gain persistence or steal credentials? This shows the importance of not only detecting an attack in progress but stopping it before it exposes the environment.

The MITRE ATT&CK team also included a protection test that incorporated false positives. In this test, every activity that took place was considered non-malicious and was supposed to be reported as such. If the vendor blocked a particular action, it was a false positive. Ideally, zero security alerts should be generated in that test. Cybereason, Cynet, and Sophos blocked activity during that test, and each of those blocks was a false positive.
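
The logic of the false-positive test can be sketched in a few lines of Python. This is a hedged illustration, not MITRE’s scoring method: the function name, the action names, and the three outcome labels ("allowed", "alerted", "blocked") are assumptions made for the example. It simply counts how many benign actions a product blocked or alerted on, with the ideal result being zero of each.

```python
from collections import Counter

# Hypothetical outcome labels for this sketch.
VALID_OUTCOMES = {"allowed", "alerted", "blocked"}


def summarize_benign_test(outcomes):
    """Summarize a test in which every action is benign.

    outcomes maps an action name to what the product did with it.
    Any block is a false positive; any alert is unwanted noise.
    """
    unknown = set(outcomes.values()) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    counts = Counter(outcomes.values())
    return {
        "false_positive_blocks": counts["blocked"],
        "alerts_on_benign_activity": counts["alerted"],
        "clean": counts["blocked"] == 0 and counts["alerted"] == 0,
    }


if __name__ == "__main__":
    # Action names are made up for illustration.
    result = summarize_benign_test({
        "admin_password_reset": "allowed",
        "software_install": "blocked",
        "scheduled_backup": "alerted",
    })
    print(result)
```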

Test two, which focused on an adversary manipulating IdP trust relationships, was dropped due to difficulty distinguishing legitimate administrative activities from malicious actions. This is why you’ll see no responses for that test if you’re looking at the results.

The Need For Third-Party Testing

Given the many market conversations and lower-than-average turnout in this round of testing, it’s worth addressing the future of third-party testing like this and its impact on the security community. Many practitioners Forrester speaks with struggle to interpret and understand the results of these evaluations, and for good reason: There’s a lot of data, and the MITRE ATT&CK team hasn’t made a judgment call on which outcomes signal better performance. Even so, tests like these are important, especially when they are given room to evolve.

MITRE ATT&CK made many changes in this round for the better: incorporating cloud, building a more realistic environment, continuing to incorporate noise/false positive tests, and expanding coverage to reconnaissance. Although not every practitioner will have the time or resources to dig through the data, the testing is still important for pushing detection and response vendors forward. The evaluation offers a critical lens into where visibility and prevention fall short, and where each vendor performs most effectively.

If you’re a Forrester client, book an inquiry or guidance session with either of us if you have questions about the results.
