My Page
Topics Covered
- 1. Observability
- 2. AIOps
Observability
Observability is the degree to which the internal states of a system can be determined from its external outputs.
Pillars of Observability
Observability relies on three main types of telemetry data: metrics, logs, and traces.
- Traces: Provide insight into the flow of the application. A trace represents a single execution flow through an application or its services. It starts when the user initiates the session and ends when the session ends or expires. Traces are captured at defined points in the code and record relevant details such as timings, events, attributes, sub-routine executions, and custom data.
More information: Traces
- Metrics: Provide real-time insight into the health and performance of applications or infrastructure.
More information: Metrics
- Logs: Provide insight into all events and errors within a software environment.
More information: Logs
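To make the three pillars concrete, here is a minimal sketch that emits all three signals for one request using only the Python standard library. The service name `checkout`, the `span` helper, and the `handle_request` function are hypothetical names chosen for illustration; this is not the API of any observability product.

```python
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("checkout")   # hypothetical service name

request_count = Counter()             # metric: a simple in-memory counter
finished_spans = []                   # trace data: one dict per finished span


@contextmanager
def span(name, trace_id):
    """Record the start, end, and duration of one step in a trace."""
    record = {"trace_id": trace_id, "name": name, "start": time.perf_counter()}
    try:
        yield record
    finally:
        record["end"] = time.perf_counter()
        record["duration_s"] = record["end"] - record["start"]
        finished_spans.append(record)


def handle_request(order_id):
    trace_id = uuid.uuid4().hex       # ties every span of this request together
    with span("handle_request", trace_id) as root:
        root["order_id"] = order_id   # custom attribute attached to the span
        request_count["requests_total"] += 1          # metric
        with span("charge_card", trace_id):
            # log line that carries the trace id so it can be joined to the trace
            log.info("charging order %s (trace %s)", order_id, trace_id)
    return trace_id


trace_id = handle_request(42)
```

The inner span finishes first, so `finished_spans` holds `charge_card` followed by `handle_request`, both sharing the same `trace_id`.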
Other Telemetry Data:
- Profiling: Continuous profiling is another telemetry type used to precisely determine how an application consumes resources.
- Event-based profilers
- Statistical profilers
More information: Profiling, Telemetry Information
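As a rough illustration of the event-based approach, the sketch below uses Python's built-in `sys.setprofile` hook, which the interpreter invokes on every function call and return. The `fib` workload and the `on_event` callback are hypothetical examples. A statistical profiler would instead sample the call stack at intervals, trading precision for lower overhead.

```python
import sys
from collections import Counter

call_counts = Counter()


def on_event(frame, event, arg):
    # Invoked by the interpreter for call/return events; count Python calls only.
    if event == "call":
        call_counts[frame.f_code.co_name] += 1


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


sys.setprofile(on_event)      # start receiving events
try:
    result = fib(10)
finally:
    sys.setprofile(None)      # always remove the hook again
```

After the run, `call_counts["fib"]` holds the exact number of times `fib` was entered, which is the kind of precise count an event-based profiler can give.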
OpenTelemetry
OpenTelemetry is an open-source observability framework designed to provide comprehensive insights into software systems' health, performance, and behavior. It serves as a standard for collecting, processing, and exporting telemetry data, such as traces, metrics, and logs, from distributed systems, applications, and services.
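The collect, process, and export stages described above can be pictured as a small pipeline. The sketch below is plain Python and deliberately does not use the OpenTelemetry API; the function names, record fields, and the `token` attribute are all assumptions made for illustration.

```python
import json


def collect():
    """Receive raw telemetry records (hard-coded, hypothetical data)."""
    return [
        {"type": "metric", "name": "cpu.utilization", "value": 0.71, "host": "web-01"},
        {"type": "log", "body": "user login ok", "host": "web-01", "token": "abc123"},
        {"type": "trace", "name": "GET /cart", "duration_ms": 18, "host": "web-02"},
    ]


def process(records, environment):
    """Drop a sensitive field and enrich each record with a shared attribute."""
    cleaned = []
    for record in records:
        record = {k: v for k, v in record.items() if k != "token"}
        record["environment"] = environment
        cleaned.append(record)
    return cleaned


def export(records):
    """Serialize records as JSON lines; a real exporter would send them to a backend."""
    return [json.dumps(r, sort_keys=True) for r in records]


lines = export(process(collect(), environment="staging"))
```

Keeping the stages separate is what makes such a pipeline vendor-agnostic: only the `export` step needs to change when the backend changes.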
Instrumentation
Instrumentation adds instructions to the target program to collect the required information.
- Manual: Performed by the programmer, e.g. by adding instructions that explicitly calculate runtimes, count events, or call measurement APIs such as the Application Response Measurement standard.
- Automatic source level: instrumentation added to the source code by an automatic tool according to an instrumentation policy.
- Intermediate language: instrumentation added to assembly or decompiled bytecode, which supports multiple higher-level source languages and avoids (non-symbolic) binary offset rewriting issues.
- Compiler assisted
- Binary translation: The tool adds instrumentation to a compiled executable.
- Runtime instrumentation: The code is instrumented directly before execution. The program run is fully supervised and controlled by the tool.
- Runtime injection: More lightweight than runtime instrumentation. Code is modified at runtime to have jumps to helper functions.
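Manual instrumentation, the first technique in the list above, can be sketched with a decorator that counts calls and accumulates runtime. The names `instrumented`, `stats`, and `parse` are hypothetical and exist only for this example.

```python
import functools
import time
from collections import defaultdict

# Per-function call count and total runtime, keyed by function name.
stats = defaultdict(lambda: {"calls": 0, "total_s": 0.0})


def instrumented(func):
    """Wrap `func` so that every call is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = stats[func.__name__]
            entry["calls"] += 1
            entry["total_s"] += time.perf_counter() - start
    return wrapper


@instrumented
def parse(line):
    return line.strip().split(",")


rows = [parse(line) for line in ["a,b\n", "c,d\n", "e,f\n"]]
```

The measurements are recorded in the `finally` clause, so calls that raise an exception are still counted and timed.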
AIOps
AIOps is the use of AI and machine learning to help address challenges faced by IT teams. AIOps can help engineers do things like find the root cause of complex application performance problems or automatically remediate infrastructure failures.
AIOps Capabilities
- Event & Incident Management
- Automated discovery & Dependency Mapping
- Proactive Monitoring
- Root cause analysis
- Anomaly detection
- Automated remediation
AIOps tools offer a range of features, such as intelligent event correlation, automated incident management, predictive analytics, and anomaly detection. By leveraging them, organizations can enhance their proactive monitoring capabilities, gain actionable insights, and streamline their monitoring processes. Each tool has its own strengths and focuses on different aspects of AIOps, so organizations can choose the most suitable solution for their specific monitoring needs.
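Anomaly detection is one of the easier capabilities to illustrate. The sketch below flags a value when it lies more than `threshold` standard deviations from the mean of the preceding `window` values. This is a simple z-score rule chosen for illustration, not the method of any particular AIOps product, and the latency figures are made up.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes of values that deviate strongly from the trailing window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat windows (sigma == 0) to avoid dividing by zero.
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


latency_ms = [101, 99, 100, 102, 98, 100, 250, 101, 99, 100]
spikes = find_anomalies(latency_ms)   # -> [6], the 250 ms spike
```

Note that the spike inflates the standard deviation of the windows that follow it, so the values right after an outlier are judged leniently; production tools typically use more robust statistics.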
My Observability Design
Follow the steps below to build end-to-end visibility across your organization and identify issues proactively.
- Step 1 - Set up ITOps: configuration management database (CMDB), configuration items (CIs), mapping & discovery
- Step 2 - Data Collection: Collect, process, and route observability data from various technologies using observability tool agents or vendor-agnostic methods.
- Step 3 - Observability: Analyze the data and generate alerts and visualizations using observability tools.
- Step 4 - AIOps: Aggregate, correlate, and prioritize incident data. Filter out the alert noise and detect IT system performance issues.
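Step 4 can be sketched as follows: alerts on the same configuration item that arrive close together are merged into one incident, and incidents are then ordered by severity. The `Alert` and `Incident` classes, the 300-second window, and the sample alerts are assumptions for illustration; real AIOps tools use far richer correlation logic.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    ci: str          # configuration item the alert refers to
    timestamp: int   # seconds
    severity: int    # higher means more severe
    message: str


@dataclass
class Incident:
    ci: str
    alerts: list = field(default_factory=list)

    @property
    def severity(self):
        return max(a.severity for a in self.alerts)


def correlate(alerts, window_s=300):
    """Merge alerts on the same CI that arrive within `window_s` of the previous one."""
    incidents = []
    latest_by_ci = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest_by_ci.get(alert.ci)
        if current is not None and alert.timestamp - current.alerts[-1].timestamp <= window_s:
            current.alerts.append(alert)          # same burst: aggregate
        else:
            current = Incident(ci=alert.ci, alerts=[alert])
            latest_by_ci[alert.ci] = current
            incidents.append(current)
    # Prioritize: most severe incidents first.
    return sorted(incidents, key=lambda inc: inc.severity, reverse=True)


alerts = [
    Alert("db-01", 0, 2, "slow queries"),
    Alert("db-01", 60, 4, "connection pool exhausted"),
    Alert("web-01", 90, 3, "HTTP 500 rate high"),
    Alert("db-01", 1000, 1, "disk 70% full"),
]
incidents = correlate(alerts)
```

Here four alerts collapse into three incidents, with the two related `db-01` alerts grouped and ranked first.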
My Observability Steps
Follow the tutorials below to build end-to-end visibility across your organization and identify issues proactively.
Step 1 - Data Collection/Edge Processing
Visit My OpenTelemetry Tutorial - OpenTelemetry
Visit My Splunk Edge Processing Tutorial - Splunk Edge Processing
Visit My Cribl Tutorial - Cribl
Step 2 - Observability
Visit My Splunk Observability Tutorial - Splunk Observability
Visit My Dynatrace Tutorial - Dynatrace
Visit My New Relic Tutorial - New Relic
Visit My Grafana & Prometheus Tutorial - Grafana & Prometheus
Visit My AppDynamics Tutorial - AppDynamics
Visit My Elastic Tutorial - Elastic
Visit My Datadog Tutorial - Datadog
Step 3 - AIOps
Visit My AIOps Introduction Tutorial - AIOps
Visit My Moogsoft Tutorial - Moogsoft
Visit My ServiceNow Tutorial - ServiceNow
Visit My BigPanda Tutorial - BigPanda
Visit My Splunk ITSI Tutorial - Splunk ITSI
Additional Information
Splunk
Disclaimer: This content is based purely on my own learning and knowledge, with reference to tutorials and documentation.
My Contact Information