The diagram below shows how data reduction is implemented in the Top Bandwidth Consumers Module. The Module employs an in-memory Map-Shuffle-Reduce algorithm. To report the top 50 bandwidth consumers, it processes every flow record received over a short interval (e.g. 30 seconds) and sums bytes by source IP/port, destination IP/port, and protocol (Map). The aggregated data is then sorted by accumulated bytes (Shuffle), and finally the top 50 records are retrieved (Reduce), converted to syslog, and sent to a SIEM system (e.g. Splunk Enterprise). Thus, although the Module processes thousands of flow records per second, it reports only the top 50 bandwidth consumers every 30 seconds; these are typically responsible for 98%-99% of all traffic bandwidth consumption.
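
The three steps can be illustrated with a minimal Python sketch. This is not the Module's actual implementation: the `FlowRecord` fields, the function names, and the key=value message layout are all assumptions made for illustration, and a real flow record carries many more fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    # Hypothetical field names chosen for this sketch.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int
    byte_count: int


# Aggregation key: (src IP, src port, dst IP, dst port, protocol).
FlowKey = Tuple[str, int, str, int, int]


def top_consumers(records: Iterable[FlowRecord], n: int = 50) -> List[Tuple[FlowKey, int]]:
    """Return the n keys with the most accumulated bytes in one interval."""
    # Map: sum bytes per key across every flow record in the interval.
    totals: Dict[FlowKey, int] = defaultdict(int)
    for r in records:
        key = (r.src_ip, r.src_port, r.dst_ip, r.dst_port, r.protocol)
        totals[key] += r.byte_count

    # Shuffle: order the aggregated entries by accumulated bytes, largest first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    # Reduce: keep only the top n entries.
    return ranked[:n]


def to_message(key: FlowKey, total_bytes: int) -> str:
    """Format one entry as a key=value string (assumed layout, not the real syslog output)."""
    src_ip, src_port, dst_ip, dst_port, protocol = key
    return (
        f"src_ip={src_ip} src_port={src_port} "
        f"dst_ip={dst_ip} dst_port={dst_port} "
        f"protocol={protocol} bytes={total_bytes}"
    )


if __name__ == "__main__":
    # Three records collected during one interval; two share the same key.
    interval = [
        FlowRecord("10.0.0.1", 1234, "10.0.0.2", 443, 6, 1000),
        FlowRecord("10.0.0.3", 5555, "10.0.0.2", 53, 17, 200),
        FlowRecord("10.0.0.1", 1234, "10.0.0.2", 443, 6, 500),
    ]
    for key, total in top_consumers(interval, n=50):
        print(to_message(key, total))
```

In this sketch the caller is expected to collect the records for one interval (e.g. 30 seconds) and pass them in as a batch; the output size is bounded by `n` regardless of how many records arrive, which is where the data reduction comes from.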