Pravega Metrics

In the Pravega Metrics Framework, we use Micrometer Metrics as the underlying library, and provide our own API to make it easier to use.

Metrics Interfaces and Example Usage

  • StatsProvider: The Statistics Provider which provides the whole Metric service.

  • StatsLogger: The Statistics Logger is where the required Metrics (Counter/Gauge/Timer/Distribution Summary) are registered.

  • OpStatsLogger: The Operation Statistics Logger is a sub-metric for the complex ones (Timer/Distribution Summary). It is included in StatsLogger and DynamicLogger.

Metrics Service Provider — Interface StatsProvider

The Pravega Metrics Framework is initialized through the StatsProvider interface, which provides the start and stop methods for the Metric service. It also provides startWithoutExporting() for testing purposes, which stores metrics only in memory without exporting them to external systems. Currently we have support for InfluxDB, Prometheus, and StatsD registries.


  • start(): Initializes the MetricRegistry and Reporters for our Metric service.
  • startWithoutExporting(): Initializes SimpleMeterRegistry that holds the latest value of each Meter in memory and does not export the data anywhere, typically for unit tests.
  • close(): Shuts down the Metric Service.
  • createStatsLogger(): Creates a StatsLogger instance, which is used to register and return metric objects. Application code can then perform metric operations directly on the returned metric objects.
  • createDynamicLogger(): Creates a Dynamic Logger.

Metric Logger — Interface StatsLogger

This interface is used to register the required metrics: simple types such as Counter and Gauge, and complex statistics types such as OpStatsLogger, through which we provide Timer and Distribution Summary.


  • createStats(): Register and get an OpStatsLogger, which is used for complex types of metrics. Notice the optional metric tags.
  • createCounter(): Register and get a Counter Metric.
  • createMeter(): Create and register a Meter Metric.
  • registerGauge(): Register a Gauge Metric.
  • createScopeLogger(): Create the StatsLogger under the given scope name.

Metric Sub Logger — OpStatsLogger

OpStatsLogger can be used to measure the latency of operations such as CreateSegment and ReadSegment. It can also record the number of operations and the time/duration of each operation.


  • reportSuccessEvent(): Used to track the Timer of a successful operation; records the latency in nanoseconds in the required metric.
  • reportFailEvent(): Used to track the Timer of a failed operation; records the latency in nanoseconds in the required metric.
  • reportSuccessValue(): Used to track the Histogram of a success value.
  • reportFailValue(): Used to track the Histogram of a failed value.
  • toOpStatsData(): Used to support the JMX Reporters and unit tests.
  • clear(): Used to clear the stats for this operation.
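
To make the success/failure reporting model above more concrete, here is a minimal sketch of an OpStatsLogger-style metric. SimpleOpStats and all of its members are hypothetical stand-ins written for illustration; they are not Pravega's implementation, and the real interface may differ in signatures and in what it aggregates.

```java
import java.time.Duration;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, simplified stand-in for an OpStatsLogger-style metric (not Pravega code).
class SimpleOpStats {
    private final LongAdder successCount = new LongAdder();
    private final LongAdder failCount = new LongAdder();
    private final LongAdder successNanos = new LongAdder();
    private final LongAdder failNanos = new LongAdder();

    // Record the latency of a successful operation, stored in nanoseconds.
    void reportSuccessEvent(Duration latency) {
        successCount.increment();
        successNanos.add(latency.toNanos());
    }

    // Record the latency of a failed operation, stored in nanoseconds.
    void reportFailEvent(Duration latency) {
        failCount.increment();
        failNanos.add(latency.toNanos());
    }

    long successCount() { return successCount.sum(); }

    long failCount() { return failCount.sum(); }

    // Average success latency in nanoseconds, or 0 when nothing has been recorded.
    double avgSuccessNanos() {
        long n = successCount.sum();
        return n == 0 ? 0.0 : (double) successNanos.sum() / n;
    }

    // Reset everything recorded for this operation.
    void clear() {
        successCount.reset();
        failCount.reset();
        successNanos.reset();
        failNanos.reset();
    }

    public static void main(String[] args) {
        SimpleOpStats createSegment = new SimpleOpStats();
        long start = System.nanoTime();
        try {
            // ... the operation being measured would run here ...
            createSegment.reportSuccessEvent(Duration.ofNanos(System.nanoTime() - start));
        } catch (RuntimeException e) {
            createSegment.reportFailEvent(Duration.ofNanos(System.nanoTime() - start));
        }
        System.out.println("successes=" + createSegment.successCount());
    }
}
```

The sketch keeps only counts and latency sums; a Timer or Distribution Summary backed by a real registry would typically also expose percentiles and maxima.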

Metric Logger — Interface DynamicLogger

The following is a simple interface that exposes only the simple metric types (Counter/Gauge/Meter).


  • incCounterValue(): Increases the Counter with the given value. Notice the optional metric tags.
  • updateCounterValue(): Updates the Counter with the given value.
  • freezeCounter(): Notifies that the Counter will no longer be updated.
  • reportGaugeValue(): Reports the Gauge value.
  • freezeGaugeValue(): Notifies that the Gauge value will no longer be updated.
  • recordMeterEvents(): Records the occurrences of a given number of events in Meter.
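
The idea behind a dynamic logger, where a metric is identified by its name plus optional tags and is created on first use, can be sketched as follows. InMemoryDynamicLogger is a hypothetical in-memory stand-in, not Pravega's DynamicLogger; in particular, treating "freeze" as simply dropping the stored value is an assumption made for this sketch.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a DynamicLogger-style API (not Pravega code).
class InMemoryDynamicLogger {
    private final Map<String, Long> counters = new ConcurrentHashMap<>();
    private final Map<String, Double> gauges = new ConcurrentHashMap<>();

    // A metric is identified by its base name plus its optional tags.
    static String key(String name, String... tags) {
        return tags.length == 0 ? name : name + Arrays.toString(tags);
    }

    // Add delta to the counter, creating the counter on first use.
    void incCounterValue(String name, long delta, String... tags) {
        counters.merge(key(name, tags), delta, Long::sum);
    }

    // Overwrite the counter with the given value.
    void updateCounterValue(String name, long value, String... tags) {
        counters.put(key(name, tags), value);
    }

    // Assumption for this sketch: freezing drops the counter from memory.
    void freezeCounter(String name, String... tags) {
        counters.remove(key(name, tags));
    }

    // Store the latest value of the gauge.
    void reportGaugeValue(String name, double value, String... tags) {
        gauges.put(key(name, tags), value);
    }

    // Assumption for this sketch: freezing drops the gauge from memory.
    void freezeGaugeValue(String name, String... tags) {
        gauges.remove(key(name, tags));
    }

    long counterValue(String name, String... tags) {
        return counters.getOrDefault(key(name, tags), 0L);
    }

    double gaugeValue(String name, String... tags) {
        return gauges.getOrDefault(key(name, tags), 0.0);
    }
}
```

Because the tags are part of the key, the same base name can back both an untagged counter and any number of tagged ones without registering them ahead of time.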

Example for Starting a Metric Service

The code for this example can be found here. It starts the Pravega Segment Store service, and the Metrics Service is started as a sub-service.

Example for Dynamic Counter and OpStatsLogger(Timer)

The code for this example can be found here. In the class PravegaRequestProcessor, we have registered two metrics:

  • one Timer (createStreamSegment)
  • one dynamic counter (dynamicLogger)

From the above example, we can see the required steps to register and use a dynamic counter:

  1. Get a dynamic logger from MetricsProvider:

     DynamicLogger dynamicLogger = MetricsProvider.getDynamicLogger();

  2. Increase the counter by providing the metric base name and the optional tags associated with the metric:

     DynamicLogger dl = getDynamicLogger();
     dl.incCounterValue(globalMetricName(SEGMENT_WRITE_BYTES), dataLength);
     dl.incCounterValue(SEGMENT_WRITE_BYTES, dataLength, segmentTags(streamSegmentName));

     Here SEGMENT_WRITE_BYTES is the base name of the metric. Below are the two metrics associated with it:

       • The global Counter, which has no tags associated.
       • A Segment-specific Counter, which has a list of Segment tags associated.

Note that segmentTags is a method that generates tags based on the fully qualified Segment name.

The following are the required steps to register and use OpStatsLogger (Timer):

  1. Get a StatsLogger from MetricsProvider:

     StatsLogger STATS_LOGGER = MetricsProvider.getStatsLogger("segmentstore");

  2. Register all the desired metrics through StatsLogger:

     final OpStatsLogger createStreamSegment = STATS_LOGGER.createStats(SEGMENT_CREATE_LATENCY);

  3. Use these metrics within the code at the appropriate places where the values should be collected and recorded.

Here SEGMENT_CREATE_LATENCY is the name of this metric, and createStreamSegment is the metric object, which tracks createSegment operations and records the time taken by each operation, along with other numbers computed from those times.

Example for Dynamic Gauge

This is an example from io.pravega.controller.metrics.StreamMetrics. In this class, we report a Dynamic Gauge which represents the open Transactions of a Stream. The code for this example can be found here.

Example for Dynamic Meter

This is an example from io.pravega.segmentstore.server.SegmentStoreMetrics. The code for this example can be found here. In the class SegmentStoreMetrics, we report a Dynamic Meter which represents the Segments created with a particular container.

Metric Registries and Configurations

With Micrometer, each meter registry is responsible for both storing and exporting metric objects. To offer a unified interface, Micrometer provides the CompositeMeterRegistry for the application to interact with; the CompositeMeterRegistry forwards metric operations to all the concrete registries bound to it.

Note that when the metrics service is started with start(), initially only a global registry (of type CompositeMeterRegistry) is provided, which binds concrete registries (e.g. StatsD, InfluxDB) based on the configuration. If no registry is switched on in the configuration, the metrics service throws an error to prevent the global registry from running in no-op mode.

Mainly for testing purposes, the metrics service can also be started with startWithoutExporting(), in which case a SimpleMeterRegistry is bound to the global registry. SimpleMeterRegistry keeps metrics in memory only and does not export them, which makes it ideal for tests that verify metric objects.
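
The forwarding behaviour and the start-up rule described above can be illustrated with a small self-contained sketch. The types below (Registry, InMemoryRegistry, CompositeRegistry, MetricsServiceSketch) are hypothetical simplifications written for this example; they are not Micrometer's or Pravega's classes, whose APIs are considerably richer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal registry abstraction (not Micrometer's API).
interface Registry {
    void increment(String name, double amount);
}

// Keeps values in memory only and exports nothing, in the spirit of a test-only registry.
class InMemoryRegistry implements Registry {
    private final Map<String, Double> values = new HashMap<>();

    @Override
    public void increment(String name, double amount) {
        values.merge(name, amount, Double::sum);
    }

    double value(String name) {
        return values.getOrDefault(name, 0.0);
    }
}

// Forwards every operation to all registries bound to it.
class CompositeRegistry implements Registry {
    private final List<Registry> bound = new ArrayList<>();

    void bind(Registry registry) {
        bound.add(registry);
    }

    @Override
    public void increment(String name, double amount) {
        for (Registry registry : bound) {
            registry.increment(name, amount);
        }
    }
}

class MetricsServiceSketch {
    // Mirrors the rule described above: refuse to start when no registry is enabled,
    // so the composite never silently runs as a no-op.
    static CompositeRegistry start(List<? extends Registry> configured) {
        if (configured.isEmpty()) {
            throw new IllegalStateException("No metric registry enabled in configuration");
        }
        CompositeRegistry global = new CompositeRegistry();
        configured.forEach(global::bind);
        return global;
    }

    // Test-oriented start: bind only an in-memory registry.
    static CompositeRegistry startWithoutExporting(InMemoryRegistry simple) {
        CompositeRegistry global = new CompositeRegistry();
        global.bind(simple);
        return global;
    }
}
```

Application code only ever talks to the composite, so switching exporters on or off becomes a configuration concern rather than a code change.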

Currently Pravega supports the following:

  • StatsD registry in Telegraf flavor.
  • Dimensional metrics data model (or metric tags).
  • UDP as communication protocol.
  • Direct InfluxDB connection.

The reporter could be configured using the MetricsConfig. Please refer to the example.

Creating Own Metrics

  1. When starting a Segment Store/Controller Service, start a Metric Service as a sub-service. Please check ServiceStarter.start().

     public class AddMetrics {
         // Obtain the metrics provider that backs the Metric Service.
         StatsProvider statsProvider = MetricsProvider.getMetricsProvider();

  2. Create a new StatsLogger instance through MetricsProvider.createStatsLogger(String loggerName), and register metrics by name, e.g. STATS_LOGGER.createCounter(String name); then update the metric objects as appropriate in the code.

         static final StatsLogger STATS_LOGGER = MetricsProvider.getStatsLogger(); // <--- 1
         DynamicLogger dynamicLogger = MetricsProvider.getDynamicLogger();

         static class Metrics { // <--- 2
             // Using Stats Logger (the name constant is renamed so it does not clash with the metric object)
             static final String CREATE_STREAM_NAME = "stream_created";
             static final OpStatsLogger CREATE_STREAM = STATS_LOGGER.createStats(CREATE_STREAM_NAME);
             static final String SEGMENT_CREATE_LATENCY = "segmentstore.segment.create_latency_ms";
             static final OpStatsLogger createStreamSegment = STATS_LOGGER.createStats(SEGMENT_CREATE_LATENCY);

             // Using Dynamic Logger
             static final String SEGMENT_READ_BYTES = "segmentstore.segment.read_bytes";  // Dynamic Counter
             static final String OPEN_TRANSACTIONS = "controller.transactions.opened";    // Dynamic Gauge
         }

         // To report success or increment
         Metrics.CREATE_STREAM.reportSuccessValue(1); // <--- 3
         dynamicLogger.incCounterValue(Metrics.SEGMENT_READ_BYTES, 1);
         dynamicLogger.reportGaugeValue(Metrics.OPEN_TRANSACTIONS, 0);

         // In case of failure
         Metrics.CREATE_STREAM.reportFailValue(1);

         // To freeze
         dynamicLogger.freezeCounter(Metrics.SEGMENT_READ_BYTES);
     }
Metrics Naming Conventions

All metric names are in the following format:

Metrics Prefix + Component Origin + Sub-Component (or Abstraction) + Metric Base Name

  1. Metric Prefix: pravega by default; it is configurable.

  2. Component Origin: Indicates which component generates the metric, such as segmentstore, controller.

  3. Sub-Component (or Abstraction): Indicates the second level component or abstraction, such as cache, transaction, storage.

  4. Metric Base Name: Indicates what is being measured, such as read_latency_ms, create_count.

For example: pravega.segmentstore.segment.create_latency_ms

Following are some common combinations of component and sub-components (or abstractions) being used:

  • segmentstore.segment: Metrics for individual Segments
  • Metrics related to long-term storage (Tier 2)
  • segmentstore.bookkeeper: Metrics related to Bookkeeper (Tier 1)
  • segmentstore.container: Metrics for Segment Containers
  • segmentstore.thread_pool: Metrics for Segment Store thread pool
  • segmentstore.cache: Cache-related metrics
  • Metrics for operations on Streams (e.g., number of streams created)
  • controller.segments: Metrics about Segments, per Stream (e.g., count, splits, merges)
  • controller.transactions: Metrics related to Transactions (e.g., created, committed, aborted)
  • controller.retention: Metrics related to data retention, per Stream (e.g., frequency, size of truncated data)
  • controller.hosts: Metrics related to Pravega servers in the cluster (e.g., number of servers, failures)
  • controller.container: Metrics related to Container lifecycle (e.g., failovers)
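
The naming format described above (prefix, component origin, sub-component, base name) amounts to joining four parts with dots. A minimal sketch follows; MetricNames and fullName are hypothetical names used only for illustration and are not part of Pravega.

```java
// Hypothetical helper illustrating the naming format (not Pravega code).
class MetricNames {
    // Join prefix, component origin, sub-component and base name with dots.
    static String fullName(String prefix, String component, String subComponent, String baseName) {
        return String.join(".", prefix, component, subComponent, baseName);
    }

    public static void main(String[] args) {
        System.out.println(fullName("pravega", "segmentstore", "segment", "create_latency_ms"));
    }
}
```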

Following are the two types of metrics:

  1. Global Metric: _global metrics report global values per component (Segment Store or Controller) instance; further aggregation logic is needed to obtain Pravega cluster-wide values. For instance, STORAGE_READ_BYTES can be classified as a Global metric.

  2. Object-based Metric: Sometimes we need to report metrics only for specific objects, such as Streams or Segments. These metrics use the metric name as a base name and are created "dynamically" for the objects being measured. For instance, for CONTAINER_APPEND_COUNT we actually report multiple metrics, one per containerId measured, each with a different container tag (e.g. ["containerId", "3"]).

There are cases in which we may want both Global and Object-based versions of the same metric. For example, for SEGMENT_READ_BYTES we publish the Global version by adding the _global suffix to the base name (segmentstore.segment.read_bytes_global) to track the total number of bytes read globally, as well as the per-Segment version, which uses the same base name and supplies additional Segment tags to report reads per Segment at a finer granularity:

segmentstore.segment.read_bytes, ["scope", "...", "stream", "...", "segment", "...", "epoch", "..."]
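
As a rough illustration of how the Global and per-Segment variants relate, the sketch below derives both from one base name. MetricKeys, globalName and segmentTagsFromParts are hypothetical helpers; Pravega's own helpers work from the fully qualified Segment name, whereas this sketch takes the individual parts as arguments to stay self-contained.

```java
// Hypothetical helpers for illustration only (not Pravega code).
class MetricKeys {
    // Global variant: the base name with the _global suffix appended.
    static String globalName(String baseName) {
        return baseName + "_global";
    }

    // Per-Segment variant: same base name, distinguished by key/value tag pairs.
    static String[] segmentTagsFromParts(String scope, String stream, String segment, String epoch) {
        return new String[] {"scope", scope, "stream", stream, "segment", segment, "epoch", epoch};
    }

    public static void main(String[] args) {
        String base = "segmentstore.segment.read_bytes";
        System.out.println(globalName(base));
        System.out.println(base + " " + String.join(",", segmentTagsFromParts("s", "st", "0", "0")));
    }
}
```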

Available Metrics and Their Names

Metrics in JVM


Metrics in Segment Store Service

  • Segment Store Read/Write latency of storage operations (Histograms):

      segmentstore.segment.create_latency_ms
      segmentstore.segment.read_latency_ms
      segmentstore.segment.write_latency_ms


  • Segment Store global and per-segment Read/Write Metrics (Counters):

      // Global counters
      segmentstore.segment.read_bytes_global
      segmentstore.segment.write_bytes_global
      segmentstore.segment.write_events_global

      // Per segment counters - all with tags {"scope", $scope, "stream", $stream, "segment", $segment, "epoch", $epoch}



  • Segment Store cache Read/Write latency Metrics (Histogram):
  • Segment Store cache Read/Write Metrics (Counters):
  • Segment Store cache size (Gauge) and generation spread (Histogram) Metrics:
  • Tier 1 Storage DurableDataLog Read/Write latency and queuing Metrics (Histogram):
  • Tier 1 Storage DurableDataLog Read/Write (Counter) and per-container ledger count Metrics (Gauge):
  segmentstore.bookkeeper.bookkeeper_ledger_count - with tag {"container", $containerId}
  • Tier 2 Storage Read/Write latency Metrics (Histogram):
  • Tier 2 Storage Read/Write data and file creation Metrics (Counters):
  • Segment Store container-specific operation Metrics:
  // Histograms - all with tags {"container", $containerId}


  // Gauge
  • Segment Store operation processor (Counter) Metrics - all with tags {"container", $containerId}.
  // Counters/Meters
  • Segment Store active Segments (Gauge) and thread pool status (Histogram) Metrics:
      // Gauge - with tags {"container", $containerId}
      // Histograms

Metrics in Controller Service

  • Controller Stream operation latency Metrics (Histograms):

  • Controller global and per-Stream operation Metrics (Counters); the per-Stream versions carry tags {"scope", $scope, "stream", $stream}:

  • Controller Stream retention frequency (Counter) and truncated size (Gauge) Metrics:

      controller.retention.frequency - with tags {"scope", $scope, "stream", $stream}
      controller.retention.truncated_size - with tags {"scope", $scope, "stream", $stream}

  • Controller Stream Segment operations (Counters) and open/timed out Transactions on a Stream (Gauge) Metrics - all with tags {"scope", $scope, "stream", $stream}:


  • Controller Transaction operation latency Metrics:


  • Controller Transaction operation counter Metrics:

      controller.transactions.created_global
      controller.transactions.created - with tags {"scope", $scope, "stream", $stream}
      controller.transactions.create_failed_global
      controller.transactions.create_failed - with tags {"scope", $scope, "stream", $stream}
      controller.transactions.committed_global
      controller.transactions.committed - with tags {"scope", $scope, "stream", $stream}
      controller.transactions.commit_failed_global
      controller.transactions.commit_failed - with tags {"scope", $scope, "stream", $stream}
      controller.transactions.commit_failed - with tags {"scope", $scope, "stream", $stream, "transaction", $txnId}
      controller.transactions.aborted_global
      controller.transactions.aborted - with tags {"scope", $scope, "stream", $stream}
      controller.transactions.abort_failed_global
      controller.transactions.abort_failed - with tags {"scope", $scope, "stream", $stream}
      controller.transactions.abort_failed - with tags {"scope", $scope, "stream", $stream, "transaction", $txnId}

  • Controller hosts available (Gauge) and host failure (Counter) Metrics:

      controller.hosts.failures - with tags {"host", $host}

  • Controller Container count per host (Gauge) and failover (Counter) Metrics:

      controller.container.failovers - with tags {"container", $containerId}

  • Controller Zookeeper session expiration (Counter) metrics: