Observability Patterns

Topic: Software Design
Level: Intermediate

Observability Patterns - What?

Observability patterns cover the logging, tracing, and monitoring of multiple instances of distributed services running across numerous servers.

1. Log Aggregation

In a microservices design, an application is split into multiple services, each fulfilling part of a user request in a discrete, loosely coupled manner within its own process boundary. Several instances of each service may also run on additional machines to support load balancing and scaling with demand.

As each service processes a request, it generates log statements (information, warning, error, debug) about its processing logic, written in a specified format to a defined log file.
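As a rough illustration of "a specified format", here is a minimal sketch using Python's standard logging module. The format string, the service name "order-service", and the helper build_logger are assumptions made for this example; a real service would usually attach a FileHandler pointing at its log file instead of an in-memory stream.

```python
import io
import logging

# Hypothetical format: timestamp, level, service name, then the message.
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s %(message)s"


def build_logger(service_name, stream):
    """Return a logger that writes formatted records to `stream`.

    A stream keeps the sketch easy to run; swap the handler for
    logging.FileHandler to write to an actual log file.
    """
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()      # avoid duplicate handlers on re-runs
    logger.propagate = False     # keep output confined to our handler
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger("order-service", buffer)
log.info("order received")
log.warning("inventory running low")
log.error("payment gateway timed out")
```

Each line in the buffer now carries the timestamp, level, and service name, which is what a central service later needs in order to sort and filter the records.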

Log aggregation consolidates the logs from all of these instances into a centralized service, so that we can understand the sequence flow, debug an issue, filter and validate warnings, and troubleshoot errors and exceptions.

A centralized logging service also makes log tracing easier: with the same request CorrelationID passed through all the services, we can follow how the initial request evolves as it traverses multiple services.

It enables alerting and monitoring agents to be set up on a single source of log information accumulated across services.

It also helps in understanding, in chronological order, the data transitions that lead up to an error.
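The ideas in this section can be sketched in a few lines of Python. This is only a toy model, not how any particular log platform works: the record layout, the sample instances, and the function names aggregate, trace, and services_to_alert are all assumptions made for the example.

```python
import heapq
from collections import Counter

# Hypothetical per-instance logs, each already ordered by timestamp.
# Record layout: (timestamp, service, level, correlation_id, message)
instance_a = [
    (1.0, "gateway", "INFO", "req-1", "request accepted"),
    (4.0, "gateway", "INFO", "req-2", "request accepted"),
]
instance_b = [
    (2.0, "orders", "INFO", "req-1", "order created"),
    (5.0, "orders", "ERROR", "req-2", "order validation failed"),
]
instance_c = [
    (3.0, "payments", "WARNING", "req-1", "retrying card authorization"),
]


def aggregate(*instance_logs):
    """Merge sorted per-instance logs into one chronological stream."""
    return list(heapq.merge(*instance_logs, key=lambda rec: rec[0]))


def trace(records, correlation_id):
    """Return the records that belong to one request, in time order."""
    return [rec for rec in records if rec[3] == correlation_id]


def services_to_alert(records, level="ERROR", threshold=1):
    """Return services whose count of `level` records reaches `threshold`."""
    counts = Counter(rec[1] for rec in records if rec[2] == level)
    return sorted(svc for svc, n in counts.items() if n >= threshold)


central_log = aggregate(instance_a, instance_b, instance_c)
```

Here trace(central_log, "req-1") follows one request through the gateway, orders, and payments services, while services_to_alert(central_log) stands in for an alerting agent reading from the single aggregated source.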

2. Performance Metrics

Performance metrics enablement and evaluation means instrumenting the decoupled services to capture details such as response latency, error rate, request thresholds, thread consumption, and CPU and memory utilization. These details help in fine-tuning the service or server instance for better scalability, robust fault tolerance, and graceful termination.

Metrics compiled for an application involving multiple services provide insight into end-to-end (E2E) system usage, help determine the threshold points for optimization, and narrow down the service responsible for a degradation.

The analysis can also reveal which services are a bottleneck for incoming requests and should be scaled, enabling the application to expand dynamically in real time.
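To make this concrete, below is a minimal in-memory sketch of collecting latency and error metrics per service and using them to find the service behind a degradation. The MetricsRegistry class, the service names, and the sample numbers are hypothetical; production systems would normally rely on a dedicated metrics library and backend instead.

```python
import statistics
from collections import defaultdict


class MetricsRegistry:
    """Collects per-service latency samples and error counts in memory."""

    def __init__(self):
        self.latencies_ms = defaultdict(list)
        self.errors = defaultdict(int)

    def record(self, service, latency_ms, ok=True):
        """Store one request's latency and whether it succeeded."""
        self.latencies_ms[service].append(latency_ms)
        if not ok:
            self.errors[service] += 1

    def error_rate(self, service):
        """Fraction of recorded requests that failed (0.0 if none seen)."""
        total = len(self.latencies_ms.get(service, []))
        return self.errors.get(service, 0) / total if total else 0.0

    def mean_latency(self, service):
        return statistics.mean(self.latencies_ms[service])

    def slowest_service(self):
        """Return the service with the highest mean latency."""
        return max(self.latencies_ms, key=self.mean_latency)


# Hypothetical samples for two services.
registry = MetricsRegistry()
for latency in (20, 30, 40):
    registry.record("catalog", latency)
for latency, ok in ((200, True), (400, False), (300, True), (300, False)):
    registry.record("checkout", latency, ok)
```

With these samples, registry.slowest_service() and registry.error_rate("checkout") both point at the checkout service, which would make it the first candidate for tuning or scaling.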

3. Distributed Tracing

Distributed tracing follows a particular user request as it traverses multiple services, in order to understand the effect of the request, pinpoint where it has been in the different services, and obtain the final data response for it.

A unique transaction ID (correlation ID) is passed through the call chain of each transaction in a distributed topology. One example of a transaction is a user interaction with a website.

The unique ID is generated at the entry point of the transaction. The ID is then passed to each service that is used to finish the job and is written as part of each service's log information. It is equally important to include timestamps in your log messages along with the ID. The ID and timestamp are combined with the action that a service is taking and the state of that action.

For instance, in the illustration above, when a request enters the application a unique alphanumeric ID is assigned to it. As the request context flows through multiple services that process its business logic, the ID is passed along with the context, which makes tracing manageable in a distributed architecture.

With distributed tracing, you can:

Chronologically track the sequence of processes performed
Establish interlinkages between multiple services
Resolve the request data lifecycle
Investigate and troubleshoot issues in decoupled services
Build an audit trail of events for the request
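The correlation ID flow described above can be sketched within a single Python process using contextvars. The service functions, the trace_log list, and the field names are assumptions made for the example; across real process boundaries the ID would typically travel in a request header or message attribute rather than in a context variable.

```python
import contextvars
import time
import uuid

# Holds the correlation ID for the request currently being processed.
correlation_id = contextvars.ContextVar("correlation_id", default=None)

# Stand-in for the centralized log store.
trace_log = []


def log_event(service, action, state):
    """Record the ID and timestamp with the service's action and its state."""
    trace_log.append({
        "correlation_id": correlation_id.get(),
        "timestamp": time.time(),
        "service": service,
        "action": action,
        "state": state,
    })


def payment_service():
    log_event("payment", "charge_card", "completed")


def order_service():
    log_event("order", "create_order", "started")
    payment_service()            # the ID travels with the context
    log_event("order", "create_order", "completed")


def handle_request():
    """Entry point: generate the unique ID once, then call downstream."""
    token = correlation_id.set(uuid.uuid4().hex)
    try:
        order_service()
        return correlation_id.get()
    finally:
        correlation_id.reset(token)   # do not leak the ID to other requests


request_id = handle_request()
```

Every entry in trace_log now shares the same ID, so filtering on request_id yields the chronological sequence of actions and states for that one request, which is effectively its audit trail.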

4. Health Check

Consistently monitoring the distributed services for availability can reduce unanticipated application downtime and so improve resiliency. This is done with a service client that periodically invokes the service endpoints to inspect the health and state of each service instance, reconciles the result with the expected response (a predefined static or dynamically evaluated response), and otherwise raises an alert so that corrective action can be taken.
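A minimal sketch of such a checking client might look like the following. The URLs, the fake_fetch stand-in, and the function names are hypothetical, and the HTTP call is injected as a plain callable so the example runs without a network; a real client would make an actual HTTP request with a timeout and run one round on a schedule.

```python
def check_instance(fetch, url, expected_status=200):
    """Call one health endpoint and compare it with the expected response.

    `fetch` is any callable that takes a URL and returns an HTTP status
    code; injecting it keeps the sketch runnable without a network.
    """
    try:
        status = fetch(url)
    except OSError:
        return False             # an unreachable instance counts as unhealthy
    return status == expected_status


def run_health_checks(fetch, urls, alert):
    """Check every instance once and alert on the unhealthy ones."""
    unhealthy = [url for url in urls if not check_instance(fetch, url)]
    for url in unhealthy:
        alert(url)
    return unhealthy


# Hypothetical instances and a fake fetch function standing in for HTTP.
FAKE_RESPONSES = {
    "http://orders-1/health": 200,
    "http://orders-2/health": 503,
}


def fake_fetch(url):
    if url not in FAKE_RESPONSES:
        raise ConnectionError(f"cannot reach {url}")
    return FAKE_RESPONSES[url]


alerts = []
unhealthy = run_health_checks(
    fake_fetch,
    [
        "http://orders-1/health",
        "http://orders-2/health",
        "http://orders-3/health",
    ],
    alerts.append,
)
```

In this run the second instance reports an unexpected status and the third cannot be reached, so both end up in the alerts list.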

At a minimum, a health check API is a separate REST service that is implemented within a microservice component that quickly returns the operational status of the service and an indication of its ability to connect to downstream dependent services. An advanced health check API can be extended to return performance information, such as connection times. The results must be returned as an HTTP status code with JSON data.
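As a sketch of a minimal health check API, the example below uses only Python's standard http.server module. The /health path, the dependency names, the "UP"/"DOWN" values, and the choice of 503 for an unhealthy service are assumptions made for the example rather than a prescribed contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies():
    """Hypothetical probes; real ones would ping the database, queue, etc."""
    return {"database": True, "message_queue": True}


def health_report(dependencies):
    """Build the HTTP status code and JSON body for a health response."""
    healthy = all(dependencies.values())
    body = {
        "status": "UP" if healthy else "DOWN",
        "dependencies": {
            name: "UP" if ok else "DOWN" for name, ok in dependencies.items()
        },
    }
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_report(check_dependencies())
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(port=0):
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), HealthHandler)
```

Calling make_server(8080).serve_forever() would expose the endpoint locally. Keeping health_report separate from the handler makes the status logic easy to test and to extend with performance information such as connection times.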

The health check APIs can be further enhanced to assess the following on the instrumented services:

Bugs
Memory Leaks
Thread Leaks
Configuration Issues
Deadlocks
Connection Pool Management
Process Redundancies
External Connection Dependencies

The Lance
