MyObservability

Key Performance Indicators (KPIs)

A KPI (Key Performance Indicator) is a recurring saved search that returns the value of an IT performance metric.

Recommended number of KPIs per service

Path: Log in to ITSI -> Configuration -> Services -> Select Service -> KPIs -> New.

Select one of the following options:

Steps:

  1. Step 1 - (Required) Define a KPI source search
  2. Step 2 - (Optional) Split and filter by entities
  3. Step 3 - (Required) Configure KPI monitoring calculations
  4. Step 4 - (Optional) Define KPI unit and monitoring lag
  5. Step 5 - (Optional) Enable backfill
  6. Step 6 - (Required) Configure KPI thresholds

Step 1 - (Required) Define a KPI source search

Consider the performance implications for your particular deployment.

Search types:

Note:

Step 2 - (Optional) Split and filter by entities

Split a KPI by entities in IT Service Intelligence (ITSI) to monitor each individual entity against which the KPI search runs.

Note:

Step 3 - (Required) Configure KPI monitoring calculations

KPI monitoring calculations determine how and when ITSI performs statistical calculations on the KPI. They also determine how ITSI displays gaps in your data.

Options:

Note:

Step 4 - (Optional) Define KPI unit and monitoring lag

Configure the monitoring lag to offset indexing lag and improve performance. The unit is the measurement displayed for the KPI, such as %, Secs, or MBps.

The monitoring lag time, in seconds, is used to offset the indexing lag. Monitoring lag is an estimate of the number of seconds it takes for new events to move from the source to the index. When indexing large quantities of data, an indexing lag can occur, which can cause performance issues. Delay the search time window to ensure that events are actually in the index before running the search.
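The effect of the monitoring lag on the search time window can be sketched as follows. This is a minimal illustration, not ITSI's implementation: the function name `lagged_search_window` and the 300-second window and 30-second lag are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def lagged_search_window(now, window_seconds, monitoring_lag_seconds):
    """Return (earliest, latest) for a search window delayed by the monitoring lag.

    Shifting the whole window back by the lag gives events time to reach
    the index before the search reads them.
    """
    latest = now - timedelta(seconds=monitoring_lag_seconds)
    earliest = latest - timedelta(seconds=window_seconds)
    return earliest, latest


# Hypothetical values: a 5-minute window with a 30-second monitoring lag.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
earliest, latest = lagged_search_window(now, window_seconds=300, monitoring_lag_seconds=30)
print(earliest.isoformat(), latest.isoformat())
```

With these values the search covers 11:54:30 to 11:59:30 rather than 11:55:00 to 12:00:00, so the most recent 30 seconds of possibly unindexed events are left for the next run.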

Step 5 - (Optional) Enable backfill

Enable backfill for a KPI in IT Service Intelligence (ITSI) to fill the summary index with historical raw KPI data. For example, even if the summary index only started collecting data at the start of this week, when the KPI was created, you can use the backfill option to fill the summary index with data from the past month.

Prerequisite:

Note: Backfill is a one-time operation. Once started, it cannot be redone or undone. For example, if you backfill 60 days of data and then later decide that you want 120 days, you cannot go back and change the backfill period. Think carefully about how many days of data you want to backfill before saving the service.

How backfill fills data gaps

Status

ITSI supports a maximum of 60 days of data in the summary index. Therefore, after you configure backfill, you see one of the following messages:

Step 6 - (Required) Configure KPI thresholds

Severity-level thresholds determine the current status of your KPI in IT Service Intelligence (ITSI). When KPI values meet or exceed threshold conditions, the KPI status changes.
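The "meet or exceed" rule can be sketched as a lookup against sorted lower bounds. This is an illustrative sketch only: the threshold values, the set of severity names used, and the function `severity_for` are assumptions for the example, not ITSI defaults or ITSI code.

```python
from bisect import bisect_right

# Hypothetical thresholds for a CPU % KPI: (lower bound, severity),
# sorted by lower bound. A value takes the severity of the highest
# bound that it meets or exceeds.
THRESHOLDS = [(0, "normal"), (70, "medium"), (85, "high"), (95, "critical")]


def severity_for(value, thresholds=THRESHOLDS):
    """Return the severity whose lower bound the value meets or exceeds."""
    bounds = [bound for bound, _ in thresholds]
    # bisect_right counts the bounds that are <= value.
    index = bisect_right(bounds, value) - 1
    # Values below the first bound fall back to the lowest level.
    return thresholds[max(index, 0)][1]


for reading in (42, 70, 84.9, 99):
    print(reading, severity_for(reading))
```

A reading of exactly 70 maps to "medium" rather than "normal", because meeting a threshold is enough to change the status.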

Threshold Types:

Set KPI Importance values in ITSI

After you create a KPI in IT Service Intelligence (ITSI), assign the KPI an importance value. ITSI uses KPI importance values, along with the KPI severity levels, to calculate the overall service health score. A service’s health score is a weighted average of the severity levels of a service’s KPIs and dependencies.

Importance values range from 0 to 11. KPI importance values from 1-11 are included in the health score calculation, with 1 being the least important and 11 being the most important. KPIs with an importance value of 0 aren’t included in the health score calculation. The greater the KPI importance value, the greater the impact that KPI has on the service health score.

ITSI treats a KPI with an importance value of 11 as a special case that represents a “minimum health indicator” for the service. When a KPI with an importance value of 11 reaches the critical state, the overall health score for the service turns critical, regardless of the status of the other KPIs in the service.

How service health scores are calculated

Note: The Info severity level isn’t included in the service health score calculation.

For example, a service contains 2 KPIs. One KPI is Critical, so the score_contribution value is 0. The other KPI is Normal, so the score_contribution value is 100. Assuming both KPIs have the same importance values, the service health score will be 50.

The following formula is used to calculate service health scores:

Where:

For example, if you set KPI importance values as follows:

The service health score is calculated as follows:

Service health score = (100 ∗ 10/22) + (70 ∗ 7/22) + (30 ∗ 5/22) = 45.45 + 22.27 + 6.82 ≈ 74.55
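The weighted average in the worked example can be reproduced with a short sketch. It assumes the score is the importance-weighted mean of each KPI's score_contribution, as the example's arithmetic implies. The function name `service_health_score` is hypothetical, and the sketch models neither service dependencies nor the special handling of importance 11.

```python
def service_health_score(kpis):
    """Importance-weighted average of KPI score contributions.

    kpis: iterable of (score_contribution, importance) pairs.
    KPIs with importance 0 are left out of the calculation.
    """
    scored = [(score, importance) for score, importance in kpis if importance > 0]
    total_importance = sum(importance for _, importance in scored)
    if total_importance == 0:
        raise ValueError("at least one KPI must have an importance above 0")
    return sum(score * importance for score, importance in scored) / total_importance


# Two KPIs with equal importance, one Critical (0) and one Normal (100).
print(service_health_score([(0, 5), (100, 5)]))

# The worked example: score contributions 100, 70, 30 with importance 10, 7, 5.
print(round(service_health_score([(100, 10), (70, 7), (30, 5)]), 2))
```

The first call returns 50, matching the two-KPI example, and the second returns about 74.5 for the importance values 10, 7, and 5.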

Impact of per-entity thresholds on service health scores

In some cases, entity severity contributions can cause the overall service health score to change significantly, while the aggregate KPI severity level changes only marginally or not at all. For example, if you have a CPU % utilization KPI that is running against three entities, and two of those entities show normal severity, while the third shows critical, the overall service health score might show critical, while the aggregate KPI severity level remains normal.

Create KPI base searches in ITSI

KPI base searches let you share a search definition across multiple KPIs in IT Service Intelligence (ITSI). Create base searches to consolidate multiple similar KPIs, reduce search load, and improve search performance.

ITSI module base searches

ITSI includes several pre-configured KPI base searches based on ITSI modules that you can use with your services.

Path: Configuration -> KPI Base Searches -> Create KPI Base Search

The saved searches are stored in savedsearches.conf as “Indicator - shared - <-name-> Search”.

Service templates and base searches

Overview of Service Template

Path: Configuration -> Service/Service Template -> Select Service/Service Template -> KPIs -> New -> Select KPI

Delete a KPI base search

Wildcards in KPI base searches

KPI base search performance considerations

The performance of KPI base searches is dependent on the following factors:

Note:

Fix truncated or incorrect KPI values

Search results are processed, created, and written to the itsi_summary index via an alert action. The default limit on the number of rows that can be written is 50,000.

Calculate the number of the result rows generated by a shared base search using the following formula:

(<number of services> x <number of KPIs in each service> x <number of entities per service entity rule>) + (<number of services> x 2)

The factor of 2 accounts for one service aggregation result and one service maximum result per service.

For example, for 500 services with 10 KPIs in each service and 15 matching entities, the expected number of result rows is 500 x 10 x 15 + 500 x 2 = 76,000 rows.

If the number of result rows expected is more than 50,000, ITSI truncates the results and displays incorrect KPI values.
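The row estimate is easy to check before attaching a base search to many services. The sketch below applies the formula above; the function name `expected_result_rows` and the constant `ROW_LIMIT` are illustrative names, not ITSI settings.

```python
# Default limit on rows written to the itsi_summary index, as stated above.
ROW_LIMIT = 50_000


def expected_result_rows(services, kpis_per_service, entities_per_service):
    """Estimate the result rows generated by a shared base search."""
    entity_rows = services * kpis_per_service * entities_per_service
    # Two extra rows per service: the aggregation result and the maximum result.
    service_rows = services * 2
    return entity_rows + service_rows


rows = expected_result_rows(services=500, kpis_per_service=10, entities_per_service=15)
print(rows, "exceeds limit" if rows > ROW_LIMIT else "within limit")
```

For 500 services, 10 KPIs, and 15 entities this gives 76,000 rows, which is above the 50,000-row default and would be truncated.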

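Increase the limit in limits.conf. Make the change in a local copy, such as $SPLUNK_HOME/etc/system/local/limits.conf, rather than in $SPLUNK_HOME/etc/system/default/limits.conf, because Splunk overwrites files in default directories on upgrade.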

Increase the KV store bulk get limit

The KPI base search tries to get all the relevant services from the KV store internally for thresholding related operations. When a KPI base search is attached to a lot of services, the bulk get might reach the KV store bulk get size limit. The default limit is 500MB.

Synchronize KPI searches in ITSI

By default, ITSI staggers the search scheduling of KPIs to reduce search load. For example, if you have five KPIs that are scheduled to run every 5 minutes, the search to update the value of each KPI from the summary index is staggered over the 5-minute interval (the first KPI at minute 1, the second KPI at minute 2, and so on).

You can synchronize KPI searches so they update at the same time during the scheduled interval.
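The difference between staggered and synchronized scheduling can be sketched as follows. How ITSI actually assigns offsets is not described here, so the round-robin assignment, the function names, and the KPI names are assumptions made only to mirror the five-KPI example.

```python
def staggered_offsets(kpi_names, interval_minutes):
    """Spread KPI searches across the interval, one minute apart (round-robin)."""
    return {
        name: (position % interval_minutes) + 1
        for position, name in enumerate(kpi_names)
    }


def synchronized_offsets(kpi_names, minute=1):
    """Run every KPI search at the same minute of the interval."""
    return {name: minute for name in kpi_names}


# Hypothetical KPI names for a service with five KPIs on a 5-minute schedule.
kpis = ["cpu", "memory", "disk", "latency", "errors"]
print(staggered_offsets(kpis, interval_minutes=5))
print(synchronized_offsets(kpis))
```

Staggering places one search in each minute of the interval, while synchronizing runs all five searches in the same minute, which gives values from the same moment at the cost of a higher peak search load.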


Next Chapter: Advanced Thresholding