A Data-Driven Monitoring Selection Approach for Azure Applications

Or how to keep your Insights bill in check

Stas (Stanislav) Lebedenko
Geek Culture


TL;DR: APM services such as Application Insights are great, but expensive, so you need to carefully consider the amount of log data your application generates and do at least a ballpark estimation. Below I explain what to do with this calculation and how to choose a proper monitoring solution.

Intro

As more applications migrate to the cloud, ensuring that they run smoothly becomes increasingly important. One method is data-driven monitoring and logging for Azure applications. In this article, we’ll look at three scenarios for monitoring and logging Azure applications and discuss when each is appropriate.

How much is the fish (data)? :)

Before you begin, you need to estimate the amount of data with an educated guess and your best possible assumptions, something like this.

A possible approach to estimating the number of logs generated by an application:

  1. Determine the average number of API requests per unit of time. This can be historical data or projections for expected traffic, if available.
  2. Pick the average number of log entries generated per API request. If you have no data, a guess of around 6 log entries per request is a reasonable start.
  3. Estimate the average size of each log entry in bytes. A fair estimate for text-based logs is around 256 bytes per entry.
  4. Use these numbers to calculate the average amount of log data in kilobytes. If the application receives 10,000 API requests per minute and generates 6 log entries of 256 bytes each per request, the total log data generated per minute would be:
    10,000 req/min x 6 logs/req x 256 bytes/entry = 15,360,000 bytes/min
    dividing by 1024 gives the result in kilobytes:
    15,360,000 bytes/min / 1024 bytes/kb = 15,000 kilobytes/min
  5. This is equivalent to roughly 15 megabytes of log data per minute, or about 21 GB per day; the daily number is what you will compare against the scenario thresholds below (see the calculation sketch after this list).
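Here is a minimal sketch of the same estimation in Python; the traffic numbers are the assumptions from the list above, so replace them with your own measurements or projections.

```python
# Rough log-volume estimator. All inputs are assumptions from the
# walkthrough above; substitute your own traffic data.
requests_per_minute = 10_000  # average API requests per minute
logs_per_request = 6          # average log entries per request
bytes_per_entry = 256         # average size of a text-based log entry

bytes_per_minute = requests_per_minute * logs_per_request * bytes_per_entry
mb_per_minute = bytes_per_minute / 1024 / 1024
gb_per_day = mb_per_minute * 60 * 24 / 1024

print(f"{mb_per_minute:.1f} MB/min, {gb_per_day:.1f} GB/day")
# prints: 14.6 MB/min, 20.6 GB/day
```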

Of course, these are just rough estimates, but they give you a better projection than picking a service at random. With the data calculated, we can now look into the three scenarios below.

Scenario 1. Application Performance Management (APM) services

APM services like Application Insights, Container Insights, VM Insights, and Network Insights are the easiest way to monitor complex applications, and in the end also the most expensive. The main features:

  • An auto-generated correlation ID.
  • Performance and availability alerts.
  • Easy discovery of connected applications and issues.

APM services are an excellent choice for applications generating less than 10 GB of logs per day; above that limit, the monthly cost quickly climbs past 1,000 USD.
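As an illustration, here is a minimal sketch (assuming the Python azure-monitor-opentelemetry distro; the connection string is a placeholder) of how little code it takes to get the auto-correlated telemetry described above, and where the sampling knob that controls ingestion cost sits:

```python
# Minimal Application Insights wiring via the Azure Monitor OpenTelemetry
# distro (pip install azure-monitor-opentelemetry). The connection string
# is a placeholder; sampling_ratio directly controls how much telemetry
# (and therefore cost) you generate.
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

configure_azure_monitor(
    connection_string="InstrumentationKey=00000000-0000-0000-0000-000000000000",
    sampling_ratio=0.1,  # keep ~10% of traces to stay inside the budget
)

tracer = trace.get_tracer(__name__)

# Spans created here are exported with an auto-generated operation
# (correlation) ID, with no manual plumbing needed.
with tracer.start_as_current_span("process-order"):
    pass  # your request handling goes here
```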

Scenario 2. Log Analytics without Insights APM

This approach fits applications that generate basic logs (plain text rather than structured analytics objects) between 10 GB and 30 GB per day.

This approach is around ten times cheaper than APM Insights analytics, but it requires an extensive collection of queries to analyze the logs and external dashboards to visualize the data. Kusto queries against the stored logs are billed separately, per gigabyte of data scanned by each search.

Log Analytics uses a custom query language called Kusto Query Language (KQL) to analyze data. KQL lets you search for specific data within logs and perform complex operations on that data, but it has a learning curve, because it is not the same as SQL.
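As a hedged sketch, this is how a KQL query can be run from Python with the azure-monitor-query package; the workspace ID is a placeholder, and the AppTraces table and SeverityLevel column are assumptions that depend on how your logs are ingested.

```python
# Query a Log Analytics workspace with KQL from Python
# (pip install azure-monitor-query azure-identity).
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Count error-level entries per hour over the last day.
query = """
AppTraces
| where SeverityLevel >= 3
| summarize errors = count() by bin(TimeGenerated, 1h)
| order by TimeGenerated asc
"""

response = client.query_workspace(
    workspace_id="<your-workspace-id>",  # placeholder
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```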

And the most critical thing: you’ll need to build custom correlation ID propagation programmatically to keep distributed operations traceable, so consider this twice before going for this money-saving option.
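A hypothetical sketch of the idea, assuming an x-correlation-id header convention (the header name and helper are illustrative, not a standard):

```python
# Manual correlation-ID propagation: reuse the caller's ID when present,
# otherwise mint one, stamp it on every log line, and forward it on
# downstream calls so queries can join the distributed trace on one field.
import logging
import uuid

logging.basicConfig(format="%(asctime)s cid=%(cid)s %(message)s",
                    level=logging.INFO)

CORRELATION_HEADER = "x-correlation-id"  # assumed convention

def handle_request(headers: dict) -> dict:
    cid = headers.get(CORRELATION_HEADER) or str(uuid.uuid4())

    # Every entry logged through this adapter carries the correlation ID.
    log = logging.LoggerAdapter(logging.getLogger("app"), {"cid": cid})
    log.info("request received")

    # Return headers to attach to any downstream HTTP call.
    return {CORRELATION_HEADER: cid}

downstream_headers = handle_request({})  # new request, no inbound ID
```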

Azure Managed Grafana or Power BI should be used for data visualization, which results in additional costs of their own.

One area where Grafana stands out is in its ability to handle time-series data. Grafana offers a variety of built-in functions for time-series data manipulation and the ability to use custom JavaScript functions.

Be aware that only the Grafana Enterprise version (still in public preview) can use data connectors such as ServiceNow, DataDog, and Azure DevOps (direct connection), and it requires an expensive license.

Scenario 3. A custom containerized solution

A custom containerized solution is a viable way for applications generating more than 30 GB of logs daily, provided you expect the solution to keep growing and have the budget for an AKS cluster and a DevOps engineer to maintain it. In that case, compare these numbers against the cost of the Azure APM platform and capture the reasoning in an architecture decision record.

There are a variety of platforms to choose from, including GrayLog, DataDog, Grafana, Grafana Loki, and Elastic stack. It’s important to consider each platform’s features, capabilities, and costs before selecting the one that best meets your needs.

GrayLog is the easiest and most modern option available. It is an open-source log management and analysis platform that provides advanced searching, dashboards, alerts, and sidecars for Linux containers.

DataDog is a monster that can do everything for you, at a considerable cost. It is a cloud-based monitoring and analytics platform that offers real-time monitoring, dashboards, advanced correlations, and analysis capabilities.

Grafana is an open-source platform for creating and exploring monitoring and analytics dashboards, while Grafana Loki is a logging solution best suited to storing and searching large amounts of log data. The Elastic stack (Elasticsearch, Logstash, and Kibana) is a classic open-source monitoring stack, but it requires more overhead to support.

Summary.

In conclusion, selecting the right monitoring and logging option for your Azure applications is crucial: it ensures smooth application performance and lets you extract valuable insights quickly, for both performance counters and security.
