AddSense

Full Stack Monitoring



In a world of complex enterprise applications, investing in a good full stack monitoring platform is a necessity. Timely, actionable insight into anomalous application behavior can save millions of dollars and protect a company's brand image and customer relationships. A good full stack monitoring platform should provide across-the-board, high-level actionable metrics as well as the ability to drill down into granular details with full context, and it should make it easy to move between the high-level and lower-level views.

 
What to Monitor?

How should you choose metrics so that you receive meaningful, automated alerts for potential problems and can quickly investigate and get to the bottom of issues? Instrument and collect as many metrics, events and logs as you reasonably can, since monitoring complex systems demands comprehensive measurements. Collect metrics with sufficient granularity to make important information visible.
 
Server:
Uptime
Processor Utilization
Memory Utilization
Reads & Writes
Network In & Out
Disk Space Utilization
Error Logs

Middleware:
Uptime
Connections
Messages Count
JVM Memory
Error Logs

Database:
Connection Check
Active / Total Sessions
Table Space Usage %
Full Table Scans
Error Logs

Application:
Login Status Check
Key Services Status
Batch Job Status
Job Completion Time
Key Transaction Counts
Users Logged-in
Error Logs

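
To make the point about granularity concrete, here is a minimal Python sketch that checks a few of the metrics listed above against alert limits. The `Threshold` class, the `breaches` function, the limit values and the sample readings are all hypothetical and chosen only for illustration; they are not taken from any particular monitoring product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Threshold:
    layer: str    # "Server", "Middleware", "Database" or "Application"
    metric: str   # metric name as it appears in the lists above
    limit: float  # alert when the observed value exceeds this limit


# Hypothetical limits, for illustration only.
THRESHOLDS = [
    Threshold("Server", "Processor Utilization", 85.0),
    Threshold("Server", "Disk Space Utilization", 90.0),
    Threshold("Database", "Table Space Usage %", 80.0),
]


def breaches(samples: dict, use_peak: bool) -> list:
    """Return "layer/metric" for every metric whose peak (or average) exceeds its limit."""
    found = []
    for t in THRESHOLDS:
        values = samples.get(t.metric)
        if not values:
            continue  # no data collected for this metric
        observed = max(values) if use_peak else mean(values)
        if observed > t.limit:
            found.append(f"{t.layer}/{t.metric}")
    return found


# Five one-minute CPU readings containing a short spike, plus one table space reading.
samples = {
    "Processor Utilization": [40.0, 42.0, 97.0, 41.0, 40.0],
    "Table Space Usage %": [82.0],
}

coarse = breaches(samples, use_peak=False)  # five-minute average view
fine = breaches(samples, use_peak=True)     # one-minute view

print(coarse)  # ['Database/Table Space Usage %']
print(fine)    # ['Server/Processor Utilization', 'Database/Table Space Usage %']
```

The five-minute average of the CPU readings is 52%, so the coarse view never sees the 97% spike that the one-minute view catches. That is the kind of information that disappears when metrics are collected too coarsely.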
Monitoring Technology  

I had the chance to work hands-on with two leading monitoring platforms for a cloud-based Lending and Leasing system: AWS CloudWatch and Datadog. Both are leaders in the monitoring space, and they enable full system visibility by bringing all the data about full stack application performance and customer experience into one platform. Investing in full stack monitoring gave us access to the data for our entire infrastructure and all applications, so teams could visualize the connections between services and components and quickly identify the source of issues.
  
Dashboards and Visualizations

Both AWS CloudWatch and Datadog offer modern visualization through dashboards, which let you create reusable graphs and see your infrastructure resources and applications in one cohesive view. You can graph metrics and log data side by side in a dashboard to quickly get context and move from diagnosing a problem to understanding its root cause. You can visualize key metrics, like CPU utilization and memory, and compare them to capacity. You can also correlate log patterns with a specific metric and set alarms so that you are proactively alerted about performance and operational issues. With a single source of truth, our organization was able to track the interactions between separate components and effectively troubleshoot issues that involved multiple teams and their interdependent services.
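
As an example of setting an alarm on a key metric, the sketch below builds the parameters for a CloudWatch alarm on EC2 CPU utilization and passes them to boto3's `put_metric_alarm`. The helper names, the alarm name, the instance ID and the 80% threshold are assumptions made for illustration, not the setup we used. The `create_alarm` function needs the third-party `boto3` package plus AWS credentials and a default region, so it is defined here but not called.

```python
def cpu_alarm_params(instance_id: str, threshold_percent: float = 80.0) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # illustrative naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per evaluation window
        "EvaluationPeriods": 2,   # consecutive windows that must breach
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(params: dict) -> None:
    """Create or update the alarm; requires boto3, AWS credentials and a region."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder instance ID, not a real resource.
params = cpu_alarm_params("i-0123456789abcdef0")
print(params["AlarmName"])  # high-cpu-i-0123456789abcdef0
```

With these values the alarm would fire only after average CPU stays above 80% for two consecutive five-minute windows, which trades a little detection delay for fewer false alerts. In practice you would usually also pass `AlarmActions` so that the alarm notifies someone.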
About the Author:  
Vinay Bhatia is an experienced product leader, passionate about building financial products that delight customers. He has an extensive background in technology, cloud and agile software development.  
