Data aggregation with dogstatsd/Threadstats - Am I seeing the exact values sampled in my code? How and why is the aggregation performed?

The statsd protocol is a convenient way to emit many metrics and monitor your application code without blocking HTTP calls. Data is sampled in your application code, then transmitted via UDP to the dogstatsd server (embedded in the dd-agent), which aggregates it and sends it to the Datadog API endpoint. You can read more about the dogstatsd setup here: http://docs.datadoghq.com/guides/dogstatsd/
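As a rough illustration of the fire-and-forget UDP transport described above, here is a minimal Python sketch. The helper names `format_datagram` and `send_metric` are hypothetical, and the host and port (127.0.0.1:8125) are assumed defaults for a local agent. In practice you would normally use a dogstatsd client library rather than raw sockets.

```python
import socket


def format_datagram(name, value, metric_type, tags=None):
    """Build a statsd-style datagram: name:value|type, plus optional |#tag1,tag2.

    metric_type is a short code such as "c" (counter), "g" (gauge),
    "h" (histogram) or "s" (set). No timestamp is included.
    """
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload.encode("utf-8")


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP (host and port are assumptions)."""
    # UDP is connectionless: sendto() returns without waiting for a reply,
    # so the application code is not blocked by the metrics pipeline.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))
```

For example, `send_metric(format_datagram("page.views", 1, "c", ["env:prod"]))` would emit a single counter increment tagged `env:prod`.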

This article describes why and how the aggregation is performed.
(Python threadstats library variations are mentioned at the end of this article.)

Why aggregate metrics?

HTTP calls take time. Aggregation improves performance by reducing the number of API calls.

For instance, if a counter is incremented 1,000 times (+1 each time) over a short period, the dogstatsd server aggregates the increments into a few API calls instead of making 1,000 separate ones. Depending on the situation (see below), the result may be, for instance, 1 datapoint with value 1,000 or X aggregate datapoints whose values sum to 1,000.

How is aggregation performed with the dogstatsd server?

Dogstatsd uses a flush interval of 10 seconds. Every 10 seconds, dogstatsd checks all data received since the last flush (that is, in the last 10 seconds). All values that correspond to the same metric name and the same tags are aggregated together into a single value.

Note: with the statsd protocol, the statsd client doesn't send timestamps with its metrics. The timestamp is added at flush time. So for a flush occurring at 10:00:10, all data received by the dogstatsd server (embedded in the Datadog Agent) between 10:00:00 and 10:00:10 is rolled up into a single datapoint that gets 10:00:00 as its timestamp.
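To make the flush behaviour concrete, here is a minimal sketch of one flush for counters. It is not the Agent's actual code: `flush_counters` and `FLUSH_INTERVAL` are hypothetical names, and timestamps are plain seconds for readability.

```python
from collections import defaultdict

FLUSH_INTERVAL = 10  # seconds, as described in this article


def flush_counters(samples, flush_time):
    """Aggregate the counter samples received since the last flush.

    samples: iterable of (name, tags, value); note there is no timestamp,
    because the statsd client does not send one.
    flush_time: time of this flush, in seconds.
    """
    totals = defaultdict(int)
    for name, tags, value in samples:
        # Same metric name + same tags => same aggregated value.
        key = (name, tuple(sorted(tags)))
        totals[key] += value

    # The single datapoint is stamped with the start of the flush window.
    timestamp = flush_time - FLUSH_INTERVAL
    return [(name, tags, timestamp, total)
            for (name, tags), total in totals.items()]
```

With this sketch, 1,000 increments of `page.views` tagged `env:prod` received within one window come out as one datapoint with value 1,000, while the same metric name with a different tag set is reported separately.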

Aggregation rules per metric type

Among all values received during the same flush interval, the aggregated value is:

  • Gauge: the most recent datapoint received
  • Count/Counter: the sum of the received values
  • Histogram: the min, max, sum, avg, 95th percentile, count, and median of all values received (see http://help.datadoghq.com/hc/en-us/articles/205638045-What-is-the-histogram-metric-type- for more details)
  • Set: the number of different values seen
  • Rate: the value difference divided by the time difference of the last 2 datapoints received
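The rules above can be sketched as small Python functions, each taking the values received during one flush interval in arrival order. All function names are hypothetical. The percentile uses a simple nearest-rank method and the median uses `statistics.median`; both are assumptions for illustration, and the Agent's exact methods may differ.

```python
import statistics


def aggregate_gauge(values):
    # Gauge: keep only the most recent value received.
    return values[-1]


def aggregate_count(values):
    # Count/Counter: sum of everything received.
    return sum(values)


def aggregate_set(values):
    # Set: number of distinct values seen.
    return len(set(values))


def aggregate_histogram(values):
    # Histogram: several summary statistics over the same samples.
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank 95th percentile, computed with integer arithmetic:
    # rank = ceil(0.95 * n), converted to a 0-based index.
    p95_index = (95 * n + 99) // 100 - 1
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "sum": sum(ordered),
        "avg": sum(ordered) / n,
        "count": n,
        "median": statistics.median(ordered),
        "95percentile": ordered[p95_index],
    }


def aggregate_rate(points):
    # Rate: value difference over time difference of the last 2 datapoints.
    # points is a list of (time_in_seconds, value) pairs.
    (t0, v0), (t1, v1) = points[-2:]
    return (v1 - v0) / (t1 - t0)
```

For example, the gauge values 3, 7, 5 received in one interval aggregate to 5, whereas the same values sent as counter increments aggregate to 15.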

You may find more information about each metric type here: http://docs.datadoghq.com/guides/metrics/

Threadstats variations

Like dogstatsd, Threadstats aggregates data for performance reasons.

Variations

  • the main difference is that metrics received by Threadstats may already have a timestamp
  • in addition, metrics are not aggregated by a centralized server; they are aggregated and flushed in a Python thread of your script. So you get per-script aggregation instead of per-host aggregation

To handle timestamps, Threadstats uses 2 parameters: a flush interval and a roll-up interval.

  • The flush interval defines the time between two consecutive rounds of data aggregation and submission.
  • The roll-up interval defines the data granularity after aggregation.

Example with flush_interval=10 and roll_up_interval=5

For instance, suppose that during a flush interval of 10 seconds (between 10:00:00 and 10:00:10), Threadstats has received 5 datapoints for the same metric name (a counter) and the same tags, with {timestamp, value} pairs being:

  1. {09:30:15, 1}, {10:00:00, 2}, {10:00:04, 1}, {10:00:05, 1}, {10:00:09, 1} # 1- original datapoints
  2. {09:30:15, 1}, {10:00:00, 2}, {10:00:00, 1}, {10:00:05, 1}, {10:00:05, 1} # 2- every datapoint in the same roll_up_interval (5 seconds) gets the same timestamp, the start of that interval
  3. {09:30:15, 1}, {10:00:00, 3}, {10:00:05, 2} # 3- data is aggregated and only 3 values are eventually submitted to Datadog
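The roll-up steps above can be sketched in a few lines of Python. This is an illustration rather than the library's code: `roll_up_counter` and `hms` are hypothetical helpers, timestamps are expressed as seconds since midnight instead of epoch seconds, and each timestamp is assumed to be floored to a multiple of the roll-up interval.

```python
from collections import defaultdict


def hms(hours, minutes, seconds):
    """Hypothetical helper: clock time to seconds since midnight."""
    return hours * 3600 + minutes * 60 + seconds


def roll_up_counter(points, roll_up_interval):
    """Group timestamped counter values into roll-up buckets and sum them.

    points: iterable of (timestamp_in_seconds, value).
    Returns a sorted list of (bucket_start, summed_value).
    """
    buckets = defaultdict(int)
    for timestamp, value in points:
        # Step 2: floor the timestamp to the start of its roll-up interval.
        bucket_start = timestamp - timestamp % roll_up_interval
        # Step 3: counters in the same bucket are summed.
        buckets[bucket_start] += value
    return sorted(buckets.items())


points = [
    (hms(9, 30, 15), 1),   # late datapoint carrying an older timestamp
    (hms(10, 0, 0), 2),
    (hms(10, 0, 4), 1),
    (hms(10, 0, 5), 1),
    (hms(10, 0, 9), 1),
]
rolled = roll_up_counter(points, roll_up_interval=5)
# Three buckets remain: the late datapoint on its own,
# 10:00:00 with value 3, and 10:00:05 with value 2.
```

Because the datapoints carry their own timestamps, the late datapoint is not merged into the current window; it keeps its own, older bucket.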

You may find more information about Threadstats aggregation here: https://github.com/DataDog/datadogpy/blob/master/datadog/threadstats/metrics.py
