Kerry Matre, senior director at Mandiant, discusses the appropriate metrics to use to measure SOC and analyst performance, and how MTTR leads to bad behavior.
Mean time to resolution (MTTR) is a commonly used metric in the security industry. While it has utility to a business’s risk function, it does not belong in security operations (SecOps).
First, let us level-set on reporting versus metrics. Reporting measures activity and does not drive specific action based on the numbers. In a security operations center (SOC), reporting can include the number of alerts or incidents, the number of false positives, or the number of analysts on staff. Metrics, on the other hand, provide insight into how a SOC is operating and help identify opportunities for improvement. Metrics give the business confidence in the service the security operations organization provides. If a metric cannot inform the business and drive change, it is not worth collecting.
MTTR is a poor metric for a SOC, and it becomes actively problematic when used to report on analyst activity.
In a network operations center (NOC), uptime is the priority, and MTTR is an effective measure of performance. In a SOC, however, measuring analyst activity with MTTR can drive the wrong behavior. If analysts are rated on how quickly they close out an alert or incident, they are incentivized to rush investigations and skip feeding updates back into the controls. The result is the same attackers making repeat appearances in an analyst's console, because prior incidents never translated into effective blocks.
Even worse than motivating rushed investigations, MTTR can lead analysts to ignore alerts that should otherwise be investigated. A recent IDC InfoBrief from FireEye, “The Voice of the Analysts: Improving Security Operations Center Processes Through Adapted Technologies,” confirmed that analysts do in fact ignore alerts: 35 percent of in-house analysts and 44 percent of analysts working at managed security service providers (MSSPs) admit to ignoring alerts because they are overwhelmed by false positives and excessive alert volume. Measuring productivity with MTTR adds to this stress and, in turn, provokes poor alert-handling behavior.
Another example of poorly motivated analyst behavior driven by MTTR in a SOC is the practice of cherry-picking alerts. When analysts’ efficiency is measured by MTTR, they are inclined to favor alerts they know they can close out quickly. This skews any comparison of one analyst’s efficiency against another’s. Cherry-picking also causes more difficult or involved investigations to be delayed, potentially increasing attacker dwell time.
When is MTTR Valuable to a SOC?
On the other hand, MTTR is valuable for reporting within a SOC when evaluating automation tools. If analysts are consistent in their investigations and remediation activities, then MTTR can be used to evaluate the effect of additional automation. If a new technology is implemented that allows analysts to perform the duties of their job faster, then MTTR can be used to validate and quantify the gains.
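This automation-evaluation use of MTTR can be sketched in a few lines. The figures and the helper function below are illustrative, not from the article: the idea is simply to compare mean resolution time for comparable incident batches before and after a new tool is rolled out.

```python
# Illustrative sketch: using MTTR only to quantify automation gains,
# by comparing mean resolution time before and after a new tool.
# All numbers below are made up for the example.
from statistics import mean

def mttr_minutes(resolution_minutes: list[float]) -> float:
    """Mean time to resolution for a batch of closed incidents."""
    return mean(resolution_minutes)

before = [90, 120, 75, 110, 95]  # minutes per incident, pre-automation
after = [40, 55, 35, 60, 45]     # minutes per incident, post-automation

gain = 1 - mttr_minutes(after) / mttr_minutes(before)
print(f"MTTR improved {gain:.0%}")  # MTTR improved 52%
```

Note the precondition in the text: this comparison is only meaningful if analysts investigate and remediate consistently, so the change in MTTR can be attributed to the tooling rather than to behavior.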
Good Metrics for SOC Performance
If MTTR is bad at measuring the effectiveness of a SOC, then what are good metrics for this?
Events per analyst hour: Good metrics enable an organization to take action to improve its operations. The gold standard for security operations is events per analyst hour (EPAH), a solid gauge of how overwhelmed analysts currently are. An EPAH of 100 means analysts are overwhelmed, and overwhelmed analysts ignore alerts and rush investigations. An appropriate EPAH is eight to 13 events. When EPAH climbs beyond that range, it alerts the business that action is needed: staff education, increased automation, or additional staff to handle the alert load.
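As a minimal sketch of how a SOC might compute and act on this metric, the function below divides event volume by total analyst hours and classifies the result against the eight-to-13 guideline above. The function names and example figures are illustrative, not part of the article.

```python
# Illustrative sketch: computing events per analyst hour (EPAH) and
# classifying it against the 8-13 events/hour guideline from the text.

def events_per_analyst_hour(total_events: int, analysts: int, shift_hours: float) -> float:
    """Events handled divided by total analyst hours worked."""
    analyst_hours = analysts * shift_hours
    if analyst_hours == 0:
        raise ValueError("no analyst hours recorded")
    return total_events / analyst_hours

def epah_status(epah: float) -> str:
    """Classify EPAH: below 8 is slack, 8-13 is healthy, above is overload."""
    if epah < 8:
        return "under-utilized"
    if epah <= 13:
        return "healthy"
    return "overloaded"

# Made-up example: 2,400 events across 10 analysts on 8-hour shifts.
epah = events_per_analyst_hour(2400, 10, 8)
print(round(epah, 1), epah_status(epah))  # 30.0 overloaded
```

A reading of 30 here would signal that the business needs to intervene, whether through tuning, automation or hiring.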
Tunes per technology: Another operational issue in SOCs is the overabundance of false positives. The IDC study referenced above reported that 45 percent of the alerts barraging analysts are false positives. Tracking the number of false positives and the number of tunes per technology can reveal which technologies create the most excess work for analysts. Constant tuning of technology is an administrative burden; carefully weighing each technology's effectiveness against that burden can prove the value of your technology investments as well as expose their negative effect on analysts.
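A simple tally per technology is enough to surface the noisiest tools. This is an illustrative sketch, with made-up technology names and counts, not a prescribed implementation.

```python
# Illustrative sketch: tallying false positives and tuning actions per
# technology to see which tools generate the most excess analyst work.
# Technology names and counts are made up for the example.
from collections import Counter

false_positives = Counter()  # false-positive alerts per technology
tunes = Counter()            # tuning changes made per technology

def record_false_positive(technology: str) -> None:
    false_positives[technology] += 1

def record_tune(technology: str) -> None:
    tunes[technology] += 1

def noisiest_technologies(top_n: int = 3) -> list[tuple[str, int]]:
    """Rank technologies by combined false-positive and tuning burden."""
    return (false_positives + tunes).most_common(top_n)

# Illustrative week of data: the IDS is far noisier than EDR.
for _ in range(40):
    record_false_positive("ids")
for _ in range(5):
    record_false_positive("edr")
record_tune("ids")
record_tune("ids")

print(noisiest_technologies())  # [('ids', 42), ('edr', 5)]
```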
Realized value of technology: Underutilized technologies are a huge setback. Executives believe they reduce risk to their organization by investing in new technologies; in practice, however, those protections often sit in a backlog of undeployed products, or are deployed with only a minimal set of capabilities turned on. Leaving protections or features disabled (e.g., SSL inspection, URL filtering) prevents a SOC from effectively blocking attackers. A security organization should report metrics on undeployed technologies, the percentage of capabilities in use within deployed technologies, and the effectiveness of those technologies against real-world attacks.
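The first two of those realized-value figures reduce to straightforward ratios. The sketch below, with hypothetical tool and capability names, shows one way to report them: share of purchased tools actually deployed, and share of each deployed tool's licensed capabilities actually enabled.

```python
# Illustrative sketch: realized-value reporting. Tool and capability
# names are hypothetical; only the ratios matter.

purchased = {"ngfw", "edr", "email_gateway", "casb"}  # tools bought

deployed = {
    "ngfw": {"enabled": {"ips", "url_filtering"},
             "licensed": {"ips", "url_filtering", "ssl_inspection", "sandboxing"}},
    "edr": {"enabled": {"detection", "response", "isolation"},
            "licensed": {"detection", "response", "isolation"}},
}

# Share of purchased technologies that made it out of the backlog.
deployment_rate = len(deployed) / len(purchased)

def capability_utilization(tech: str) -> float:
    """Fraction of a deployed tool's licensed capabilities enabled."""
    caps = deployed[tech]
    return len(caps["enabled"]) / len(caps["licensed"])

print(f"deployed: {deployment_rate:.0%}")  # deployed: 50%
for tech in sorted(deployed):
    print(f"{tech}: {capability_utilization(tech):.0%} of capabilities enabled")
```

In this made-up example, the firewall running at half its licensed capability (SSL inspection and sandboxing off) is exactly the kind of gap the article argues a SOC should surface to the business.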
Ultimately, SecOps provides a critical service to the business: confidence that the right controls are in place to prevent or detect an attack, and that the right processes are in place to enable the security team to do so. The right metrics help provide that confidence, give visibility into functional effectiveness, and identify opportunities for improvement.
Kerry Matre is senior director at Mandiant.