Summary Index for Large Volumes of Data

If you ingest large volumes of NetFlow data, consider using summary indexes to improve search efficiency and application performance. For more information about Splunk summary indexes, see "Use summary indexing" in the Splunk documentation.

This App provides two sets of saved searches:

  1. Saved searches scheduled to run every 5 minutes: these searches create 5 minute roll-up data from indexed events

  2. Saved searches scheduled to run every hour: these searches create 1 hour roll-up data from the 5 minute summary index

Setting Up the Metrics Summary Index (SI)

Please follow the steps listed below to use summary index based dashboards.

1. Create Summary Index

  1. In Splunk, go to Settings > Indexes and click the "New Index" button

  2. In the Index Name text field, enter the name of the new index, for example "summary_metrics". The SI saved searches use "summary_metrics" by default

  3. In the Index Data Type row, select the Metrics tab

  4. Click the Save button
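
The UI steps above can probably also be done from the Splunk command line. The following is a minimal sketch, not something shipped with the App: the helper name is made up, the /opt/splunk default is an assumption about where Splunk is installed, and the helper only prints the command so it can be reviewed before it is run.

```shell
# Hypothetical helper: prints the Splunk CLI command that creates a metrics index.
# Assumes SPLUNK_HOME points at your Splunk installation (the default below is a guess).
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

create_metrics_index_cmd() {
    index_name="${1:-summary_metrics}"   # default index name used by the SI saved searches
    printf '%s/bin/splunk add index %s -datatype metric\n' "$SPLUNK_HOME" "$index_name"
}

# Print the command for review; nothing is executed against Splunk here.
create_metrics_index_cmd summary_metrics
```

In a distributed deployment the index normally has to exist on the indexers, so check where the command should be run before using it.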

To use a different summary index please perform the following on your search heads:

In Settings > Advanced search > Search macros, find the netflow_si_index macro, click on it, and change the value in the Definition field

from: index=summary_metrics to: index=<your SI index>
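
If you manage configuration in files rather than in the UI, the same change can presumably be expressed as a local macros.conf override. The sketch below only prints the stanza. The helper name and the index name my_si_index are made up for illustration, the app directory name netflow is taken from the backfill command later on this page, and the target path in the comment assumes a standard Splunk app layout.

```shell
# Hypothetical helper: prints a macros.conf stanza that overrides one search macro.
macro_stanza() {
    macro_name="$1"
    definition="$2"
    printf '[%s]\ndefinition = %s\n' "$macro_name" "$definition"
}

# Example with a made-up index name. The output would go into
# $SPLUNK_HOME/etc/apps/netflow/local/macros.conf (assumed path) on each search head.
macro_stanza netflow_si_index 'index=my_si_index'
```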

2. Assign your SI Index in Savedsearches

  1. In Settings > Searches, reports, and alerts, find the 20067_summary_5min search. Under Actions, open the Edit drop-down, select Edit Summary Indexing, and assign <your SI index>

  2. Enable the saved search (under Actions, select Enable)

Repeat these steps for the other saved searches.

3. (Optional) Select 5 minute or 1 hour Summary data

SI dashboards use 5 minute summary data by default.

To use 1 hour summary data instead of 5 minute summary data, please perform the following:

In Settings > Advanced search > Search macros, find the netflow_si_source macro, click on it, and change the value in the Definition field

from: source=20067_summary_5min to: source=20067_summary_1h

Another macro, netflow_subnet_groups_si_source, sets the source for the Traffic by Subnet Groups SI dashboard. To use the 1 hour summary instead of the 5 minute summary, click on this macro and change the value in the Definition field

from: source=20067_summary_5min_subnet_groups to: source=20067_summary_1h_subnet_groups
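
Both source macros follow the same naming pattern: the 1 hour source is the 5 minute source with "5min" replaced by "1h". The sketch below illustrates that mapping and prints what the two macro definitions would look like after the switch. The function names are made up, and the stanza format assumes you would apply the change through a local macros.conf instead of the UI.

```shell
# Hypothetical sketch: derive the 1 hour source name from a 5 minute source name.
to_1h_source() {
    printf '%s\n' "$1" | sed 's/_5min/_1h/'
}

print_1h_macros() {
    # Each entry is "macro name:default 5 minute source", as listed above.
    for pair in \
        'netflow_si_source:20067_summary_5min' \
        'netflow_subnet_groups_si_source:20067_summary_5min_subnet_groups'
    do
        macro=${pair%%:*}
        source_5min=${pair#*:}
        printf '[%s]\ndefinition = source=%s\n' "$macro" "$(to_1h_source "$source_5min")"
    done
}

# Print both stanzas for review.
print_1h_macros
```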

5 Minute Roll-up Saved Searches

Saved search | Description

20067_summary_5min | This search rolls up events sent to Splunk by NFO Top Traffic Monitor (events with nfc_id=20067). Use the Overview > Traffic Overview SI dashboard to view this data. You can find other dashboards to view this data by going to Metrics SI > [dashboard name] SI.

20067_summary_5min_subnet_groups | This search rolls up events sent to Splunk by NFO Top Traffic Monitor (events with nfc_id=20067). Use the More Traffic Statistics > Traffic by Subnets Groups SI dashboard to view this data.

One Hour Roll-up Saved Searches

Saved search | Description

20067_summary_1h | This search rolls up data created by the corresponding 5 minute roll-up search. Use the Overview > Traffic Overview SI dashboard to view this data. You can find other dashboards to view this data by going to Metrics SI > [dashboard name] SI.

20067_summary_1h_subnet_groups | This search rolls up data created by the corresponding 5 minute roll-up search. Use the More Traffic Statistics > Traffic by Subnets Groups SI dashboard to view this data.

Manage Summary Index Gaps

Use the backfill script (fill_summary_index.py) to ensure the accuracy of your summary index based dashboards, or to create a summary index for flow data that has already been ingested and indexed.

For more information about the backfill script, see "Manage summary index gaps" in the Splunk documentation.

If you have Splunk Enterprise, you can use the following command to run the backfill script:

./splunk cmd python fill_summary_index.py -app netflow -name 20067_summary_5min -et -30min -lt now -j 2 -owner admin -showprogress true -dedup true -dedupsearch '| mpreview `netflow_si_index` filter="source=20067_summary_5min" | stats count by time | eval search_now = time | table search_now count'

Where:

  -et is the earliest time: how far back the backfilling should go

  -lt is the latest time for backfilling

  -j is how many searches should run in parallel; adjust it based on your Splunk deployment resources

  20067_summary_5min backfills the 5 minute summary index; replace it with 20067_summary_1h for the 1 hour summary index

NOTE: replace the saved search name in both places in the command above (the -name argument and the source= filter in -dedupsearch)
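
Because the saved search name appears twice in the command, it is easy to change one occurrence and miss the other. The following is a minimal sketch of a wrapper that fills in both from a single argument. The function name is hypothetical and not part of the App; all other options are copied unchanged from the command above, and the function only prints the command, it does not run it.

```shell
# Hypothetical helper (not part of the App): prints the backfill command for one
# summary saved search, placing the name in both -name and the source= filter.
backfill_cmd() {
    search_name="$1"   # e.g. 20067_summary_5min or 20067_summary_1h
    earliest="$2"      # e.g. -30min
    printf '%s\n' "./splunk cmd python fill_summary_index.py -app netflow -name $search_name -et $earliest -lt now -j 2 -owner admin -showprogress true -dedup true -dedupsearch '| mpreview \`netflow_si_index\` filter=\"source=$search_name\" | stats count by time | eval search_now = time | table search_now count'"
}

# Print the 1 hour variant for review. The ./splunk prefix suggests the command
# is meant to be run from the Splunk bin directory (an assumption).
backfill_cmd 20067_summary_1h -30min
```

Since the 1 hour searches read from the 5 minute summary index, it likely makes sense to backfill the 5 minute summary first and the 1 hour summary afterwards.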