Observability: Introduction
Observability is the degree to which the internal states of a system can be determined from its external outputs.
Pillars of Observability
Observability relies on three main types of telemetry data: metrics, logs, and traces.
- Traces: Provide insight into the flow of the application. A trace represents a single execution flow through an application or its services. It has a start, when the user initiates the session, and an end, when the user session ends or expires. Traces are recorded at defined points in the code and capture relevant details such as timings, events, attributes, sub-routine executions, and custom data.
More information: Traces
- Metrics: Provide real-time insight into the health and performance of applications or infrastructure.
More information: Metrics
- Logs: Provide insight into all events and errors within a software environment.
More information: Logs
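To make the description of traces above more concrete, here is a minimal sketch of how timed, nested steps of one execution flow could be recorded. It uses only the Python standard library and is not the API of any tracing product; the names `span`, `FINISHED_SPANS`, and `handle_request` are hypothetical and chosen just for illustration.

```python
import time
import uuid
from contextlib import contextmanager

# Hypothetical in-memory store; a real system would export spans to a backend.
FINISHED_SPANS = []
_active = []  # stack of currently open spans (single-threaded sketch only)


@contextmanager
def span(name, **attributes):
    """Record one timed step of a trace, nested under the currently open span."""
    parent = _active[-1] if _active else None
    record = {
        # All spans of one execution flow share the same trace_id.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "attributes": dict(attributes),
        "start": time.perf_counter(),
    }
    _active.append(record)
    try:
        yield record
    finally:
        record["end"] = time.perf_counter()
        _active.pop()
        FINISHED_SPANS.append(record)


def handle_request(user_id):
    """Assumed example workload: one request with two sub-steps."""
    with span("handle_request", user_id=user_id):
        with span("load_profile"):
            time.sleep(0.01)  # stand-in for a database call
        with span("render_page") as current:
            current["attributes"]["template"] = "home"  # custom data


handle_request(42)
for s in FINISHED_SPANS:
    print(s["name"], "parent:", s["parent_id"], "seconds:", round(s["end"] - s["start"], 4))
```

Inner spans finish first, so `load_profile` and `render_page` are stored before `handle_request`. All three share one `trace_id`, which is what ties the individual timings back to a single execution flow.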
Other Telemetry Data:
- Profiling: Continuous profiling is another telemetry type used to precisely determine how an application consumes resources.
- Event-based profilers
- Statistical profilers
More Information
Profiling
Telemetry Information
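As a small illustration of profiling, the sketch below uses Python's built-in `cProfile` module. `cProfile` is a deterministic profiler that hooks function call and return events, so it sits closest to the event-based category listed above; a statistical profiler would instead sample the call stack at intervals. The functions `busy_sum` and `workload` are made-up stand-ins for real application code.

```python
import cProfile
import io
import pstats


def busy_sum(n):
    """Assumed CPU-bound helper used only to give the profiler something to measure."""
    return sum(i * i for i in range(n))


def workload():
    return [busy_sum(20_000) for _ in range(5)]


# Collect call/return events only while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Render the collected statistics into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

The report lists each function with its call count and the time spent in it, which is the kind of resource-consumption detail that profiling adds on top of metrics, logs, and traces.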
OpenTelemetry
OpenTelemetry is an open-source observability framework designed to provide comprehensive insights into software systems' health, performance, and behavior. It serves as a standard for collecting, processing, and exporting telemetry data, such as traces, metrics, and logs, from distributed systems, applications, and services.
Instrumentation
Instrumentation adds instructions to the target program to collect the required information. It can be applied in several ways:
- Manual: Performed by the programmer, e.g. by adding instructions that explicitly calculate runtimes, count events, or call measurement APIs such as the Application Response Measurement standard.
- Automatic source level: instrumentation added to the source code by an automatic tool according to an instrumentation policy.
- Intermediate language: Instrumentation added to assembly or decompiled bytecode, which supports multiple higher-level source languages and avoids (non-symbolic) binary offset rewriting issues.
- Compiler assisted
- Binary translation: The tool adds instrumentation to a compiled executable.
- Runtime instrumentation: The code is instrumented directly before execution. The program run is fully supervised and controlled by the tool.
- Runtime injection: More lightweight than runtime instrumentation. Code is modified at runtime to have jumps to helper functions.
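Two of the approaches above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the decorator `instrumented`, the stores `CALL_COUNTS` and `TOTAL_SECONDS`, and the `parse_order` and `PaymentClient` examples are all hypothetical names. The decorator applied in the source shows manual instrumentation; replacing a method on an existing class at runtime is a loose, Python-level analogue of runtime injection.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory stores for the collected measurements.
CALL_COUNTS = defaultdict(int)
TOTAL_SECONDS = defaultdict(float)


def instrumented(func):
    """Wrap func so that every call is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even when func raises, so failures are still counted.
            CALL_COUNTS[func.__name__] += 1
            TOTAL_SECONDS[func.__name__] += time.perf_counter() - start
    return wrapper


# Manual instrumentation: the programmer adds the decorator in the source code.
@instrumented
def parse_order(raw):
    return [int(part) for part in raw.split(",")]


class PaymentClient:
    """Stand-in for existing code that we do not want to edit."""
    def charge(self, amount):
        return f"charged {amount}"


# Runtime-style injection: the method is swapped for a wrapped version
# while the program is running, without touching the class definition.
PaymentClient.charge = instrumented(PaymentClient.charge)

parse_order("1,2,3")
parse_order("4,5")
PaymentClient().charge(10)
print(dict(CALL_COUNTS))  # {'parse_order': 2, 'charge': 1}
```

The same wrapper serves both cases; what differs is who applies it and when, which is the distinction the list above draws between the instrumentation approaches.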
AIOps: Introduction
AIOps is the use of AI and machine learning to help address challenges faced by IT teams. AIOps can help engineers do things like find the root cause of complex application performance problems or automatically remediate infrastructure failures.
AIOps Capabilities
- Event & Incident Management
- Automated discovery & Dependency Mapping
- Proactive Monitoring
- Root cause analysis
- Anomaly detection
- Automated remediation
AIOps tools offer a range of features, such as intelligent event correlation, automated incident management, predictive analytics, and anomaly detection. By using them, organizations can enhance their proactive monitoring capabilities, gain actionable insights, and streamline their monitoring processes. Each tool has its own strengths and focuses on different aspects of AIOps, so organizations can choose the most suitable solution for their specific monitoring needs.
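To give a feel for one of these capabilities, anomaly detection, here is a deliberately simple sketch that flags metric values far from the recent average using a z-score over a trailing window. Real AIOps tools use far more sophisticated models; the function `find_anomalies`, its `window` and `threshold` parameters, and the sample latency values are assumptions made up for this example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=10, threshold=3.0):
    """Return the indexes of values that deviate from the mean of the
    preceding `window` values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: the z-score is undefined, so skip
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times in milliseconds with one obvious spike.
latency_ms = [101, 99, 100, 102, 98, 100, 101, 99, 100, 102, 100, 350, 101, 99]
print(find_anomalies(latency_ms))  # [11]
```

Only the spike at index 11 is reported. The values right after it are not flagged because the spike itself inflates the standard deviation of their window, a known weakness of this naive approach.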
AIOps Tutorial
Disclaimer: This page is based purely on my own learning and knowledge, with reference to tutorials and documentation.