You’re sitting at your desk with a half-eaten sandwich, slouching in your chair and enjoying the frustration-free morning you’ve had because all your dashboards have been looking good.
Suddenly your product manager startles you and tells you customers can’t log in to your app—and they’re upset. But your dashboards still look nominal. So you rush to your log management tool to check for errors and start the fun process of root cause analysis. That’s when you see 5xx messages exploding in front of your eyes and wonder why you didn’t receive your “proactive” alert from your logging suite.
The other half of your sandwich sits out all afternoon while you scramble to get your servers back up. Your CTO is now standing over your shoulder telling you every minute to resolution is costing the company $5,600. (That’s a real figure by the way—check out this research by Gartner from 2014. It’s undoubtedly a larger number by now.)
The Problem with Proactive Alerts
We’ve all experienced this before. You see, there’s a problem with alerting in log management tools today: they’re not really “proactive.” Most tools poll your logs at intervals, say every 5 to 15 minutes (depending on how much you’re paying on your plan). And to set up the conditions that make the software send you alerts, you generally must have already experienced the downtime, so you know what to alert on. After that, you get to spend weeks tweaking the alerts so they catch every incident without filling your inbox with false positives.
So what happens when your alerts aren’t predictive and real-time, as in the scenario above? That’s right. Your customers become your alerting system. And that’s when your product manager alerts you before your centralized log management tool does.
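To put a rough number on what polling costs, here is a back-of-the-envelope sketch in Python. It assumes the failure starts at a random moment between two polls and that the very next poll catches it, so the wait is half the interval on average and the full interval at worst. The function names are made up for illustration, and the per-minute cost is the $5,600 figure quoted earlier.

```python
# Downtime cost per minute, taken from the figure quoted earlier in this post.
COST_PER_MINUTE = 5600


def detection_delay_minutes(poll_interval_minutes):
    """Return (average, worst-case) minutes before the next poll sees a new error.

    Assumption: the failure starts at a uniformly random point between polls
    and the next poll detects it immediately.
    """
    return poll_interval_minutes / 2, poll_interval_minutes


def cost_before_detection(poll_interval_minutes):
    """Return (average, worst-case) dollars lost before anyone is alerted."""
    average, worst = detection_delay_minutes(poll_interval_minutes)
    return average * COST_PER_MINUTE, worst * COST_PER_MINUTE
```

Under these assumptions, `cost_before_detection(5)` and `cost_before_detection(15)` give the money spent before the alert even fires, and that is before root cause analysis begins.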
When you rely on past downtime issues to create criteria and thresholds to prevent future downtime, you know something’s broken, and it’s not your server. You can’t quickly fix downtime problems when your log management tool’s alerting technology is essentially broken itself.
Predictive Alerts Prevent Downtime
What does predictive alerting look like? Imagine your logging tool parses your logs in real time as they are generated on your servers. Because the daemon running on your server is a smart agent written in Go, Google’s lightning-quick language, and it doesn’t poll entries at intervals, alerts are generated in real time (2 seconds, not 5 minutes). And because the agent, stream processor, and alert detector are all built with enterprise-grade artificial intelligence, your logs are analyzed for predictive trends. This means you and your DevOps team get alerts based on events that lead to server sickness as they happen, not 5 minutes after the fact. What we’ve just described is Lumberjack, Blue Matador’s logging tool from the future, available to the public in May.
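To make the difference from polling concrete, here is a minimal Python sketch of evaluating each log line the moment it arrives. This is not Lumberjack’s implementation and there is no AI in it: it is a plain sliding-window threshold on 5xx responses. The log format, class name, window, and threshold are all assumptions chosen for illustration.

```python
import re
from collections import deque

# Hypothetical log format, assumed for this sketch:
#   "<unix_timestamp> <http_status> <path>"
LINE_RE = re.compile(r"^(\d+(?:\.\d+)?) (\d{3}) ")


class StreamingErrorDetector:
    """Flags a burst of 5xx responses as each line arrives, with no polling."""

    def __init__(self, window_seconds=2.0, threshold=3):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.error_times = deque()  # timestamps of recent 5xx lines

    def feed(self, line):
        """Process one log line; return True if an alert should fire now.

        Assumes lines arrive in timestamp order.
        """
        match = LINE_RE.match(line)
        if not match:
            return False  # ignore lines that do not fit the assumed format
        timestamp = float(match.group(1))
        status = int(match.group(2))
        if status < 500:
            return False
        self.error_times.append(timestamp)
        # Drop errors that are older than the window.
        while timestamp - self.error_times[0] > self.window_seconds:
            self.error_times.popleft()
        return len(self.error_times) >= self.threshold
```

An agent tailing a log file would call `feed` once per new line, so the delay between the third error and the alert is the time it takes to read that line, not the time until the next scheduled poll.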
We think predictive alerting is essential in centralized log management. Let’s be honest here. Brand it however you want, but without AI-powered prediction, “proactive” alerts are really just “reactive,” only notifying you of problems after things have already gone awry. Why spend those critical seconds of downtime performing root cause analysis when your logging tool could do it for you? In fact, your tool should be doing it for you. With predictive alerts, it’s possible to be notified of problems 5 minutes before they ever affect customers, giving you more time to prevent problems without the product manager or CTO standing over your shoulder.
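As a toy illustration of alerting on a trend instead of a breach, the Python sketch below fits a straight line to recent per-minute error rates and estimates how long until that line reaches a threshold. It stands in for the idea only: simple least-squares extrapolation is a far cruder technique than the AI-powered prediction described here, and the function name and sampling scheme are assumptions.

```python
def minutes_until_threshold(rates, threshold):
    """Estimate minutes from the latest sample until the trend hits threshold.

    rates: error-rate samples, one per minute, oldest first (assumed spacing).
    Returns None when there are too few samples or the trend is not rising.
    """
    n = len(rates)
    if n < 2:
        return None
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(rates) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or falling error rate: nothing to predict
    intercept = mean_y - slope * mean_x
    crossing_x = (threshold - intercept) / slope
    # Measure from the most recent sample; 0.0 means already at or past it.
    return max(0.0, crossing_x - (n - 1))
```

With a steadily climbing error rate, the function returns a lead time you could alert on while the rate is still below the level customers would notice.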
And that means your sandwich gets eaten at lunch and you go home on time because customers are happy. That’s the way alerting should work.
Centralized Log Management with Predictive Alerting
Interested in predictive alerts? Request a beta invite to Lumberjack today. Coming in May 2017.
Blue Matador Staff
We began with the goal of making monitoring truly proactive — not reactive. At Blue Matador, we provide peace of mind for DevOps professionals by enabling them to proactively monitor their infrastructure for the first time.