Pravega Metrics¶
- Metrics Interfaces and Examples Usage
- Metrics Service Provider — Interface StatsProvider
- Metric Logger — Interface StatsLogger
- Metric Sub Logger — OpStatsLogger
- Metric Logger — Interface DynamicLogger
- Example for Starting a Metric Service
- Example for Dynamic Counter and OpStatsLogger(Timer)
- Example for Dynamic Gauge
- Example for Dynamic Meter
- Metric Registries and Configurations
- Creating Own Metrics
- Metrics Naming Conventions
- Available Metrics and Their Names
- Metrics in JVM
- Metrics in Segment Store Service
- Metrics in Controller Service
- Resources
In the Pravega Metrics Framework, we use Micrometer Metrics as the underlying library, and provide our own API to make it easier to use.
Metrics Interfaces and Examples Usage¶
-
StatsProvider
: The Statistics Provider which provides the whole Metric service. -
StatsLogger
: The Statistics Logger is where the required Metrics (Counter/Gauge/Timer/Distribution Summary) are registered. -
OpStatsLogger
: The Operation Statistics Logger is a sub-metric for the complex ones (Timer/Distribution Summary). It is included inStatsLogger
andDynamicLogger
.
Metrics Service Provider — Interface StatsProvider¶
Pravega Metric Framework is initiated using the StatsProvider
interface: it provides the start and stop methods for the Metric service. It also provides startWithoutExporting()
for testing purpose, which only stores metrics in memory without exporting them to external systems. Currently we have support for StatsD and InfluxDB registries.
start()
: Initializes the MetricRegistry and Reporters for our Metric service.startWithoutExporting()
: InitializesSimpleMeterRegistry
that holds the latest value of each Meter in memory and does not export the data anywhere, typically for unit tests.close()
: Shuts down the Metric Service.createStatsLogger()
: Create aStatsLogger
instance which is used to register and return metric objects. Application code could then perform metric operations directly with the returned metric objects.createDynamicLogger()
: Creates a Dynamic Logger.
Metric Logger — Interface StatsLogger¶
This interface can be used to register the required metrics for simple types like Counter and Gauge and some complex statistics type of Metric like OpStatsLogger
, through which we provide Timer and
Distribution Summary.
createStats()
: Register and get aOpStatsLogger
, which is used for complex type of metrics. Notice the optional metric tags.createCounter()
: Register and get a Counter Metric.createMeter()
: Create and register a Meter Metric.registerGauge()
: Register a Gauge Metric.createScopeLogger()
: Create theStatsLogger
under the given scope name.
Metric Sub Logger — OpStatsLogger¶
OpStatsLogger
can be used if the user is interested in measuring the latency of operations like CreateSegment
and ReadSegment
. Further, we could use it to record the number of operation and time/duration of each operation.
reportSuccessEvent()
: Used to track the Timer of a successful operation and will record the latency in nanoseconds in the required metric.reportFailEvent()
: Used to track the Timer of a failed operation and will record the latency in nanoseconds in required metric.reportSuccessValue()
: Used to track the Histogram of a success value.reportFailValue()
: Used to track the Histogram of a failed value.toOpStatsData()
: Used to support the JMX Reporters and unit tests.clear
: Used to clear the stats for this operation.
Metric Logger — Interface DynamicLogger¶
The following is an example of a simple interface that exposes only the simple type metrics: (Counter/Gauge/Meter).
incCounterValue()
: Increases the Counter with the given value. Notice the optional metric tags.updateCounterValue()
: Updates the Counter with the given value.freezeCounter()
: Notifies that, the Counter will not be updated.reportGaugeValue()
: Reports the Gauge value.freezeGaugeValue()
: Notifies that, the Gauge value will not be updated.recordMeterEvents()
: Records the occurrences of a given number of events in Meter.
Example for Starting a Metric Service¶
This example is from io.pravega.segmentstore.server.host.ServiceStarter
. The code for this example can be found here. It starts Pravega Segment Store service and the Metrics Service is started as a sub service.
Example for Dynamic Counter and OpStatsLogger(Timer)¶
This is an example from io.pravega.segmentstore.server.host.stat.SegmentStatsRecorderImpl.java
. The code for this example can be found here. In the class PravegaRequestProcessor
, we have registered two metrics:
- one Timer (
createStreamSegment
) - one dynamic counter (
dynamicLogger
)
From the above example, we can see the required steps to register and use dynamic counter:
- Get a dynamic logger from MetricsProvider:
DynamicLogger dynamicLogger = MetricsProvider.getDynamicLogger();
-
Increase the counter by providing metric base name and optional tags associated with the metric.
HereDynamicLogger dl = getDynamicLogger(); dl.incCounterValue(globalMetricName(SEGMENT_WRITE_BYTES), dataLength); ... dl.incCounterValue(SEGMENT_WRITE_BYTES, dataLength, segmentTags(streamSegmentName));
SEGMENT_WRITE_BYTES
is the base name of the metric. Below are the two metrics associated with it: -
The global Counter which has no tags associated.
- A Segment specific Counter which has a list of Segment tags associated.
Note that, the segmentTags
is a method to generate tags based on fully qualified Segment name.
The following are the required steps to register and use OpStatsLogger(Timer)
:
- Get a
StatsLogger
fromMetricsProvider
.
StatsLogger STATS_LOGGER = MetricsProvider.getStatsLogger("segmentstore");
StatsLogger
.
@Getter(AccessLevel.PROTECTED) final OpStatsLogger createStreamSegment = STATS_LOGGER.createStats(SEGMENT_CREATE_LATENCY);
getCreateStreamSegment().reportSuccessEvent(elapsed);
SEGMENT_CREATE_LATENCY
is the name of this metric, and createStreamSegment
is the metric object, which tracks operations of createSegment
and we will get the time (i.e. time taken by each operation and other numbers computed based on them) for each createSegment
operation happened.
Example for Dynamic Gauge¶
This is an example from io.pravega.controller.metrics.StreamMetrics
. In this class, we report
a Dynamic Gauge which represents the open Transactions of a Stream. The code for this example can be found here.
Example for Dynamic Meter¶
This is an example from io.pravega.segmentstore.server.SegmentStoreMetrics
. The code for this example can be found here. In the class SegmentStoreMetrics
, we report a Dynamic Meter which represents the Segments created with a particular container.
Metric Registries and Configurations¶
With Micrometer, each meter registry is responsible for both storage and exporting of metrics objects.
In order to have a unified interface, Micrometer provides the CompositeMeterRegistry
for the application to interact with, CompositeMeterRegistry
will forward metric operations to all the concrete registries bounded to it.
Note that when metrics service start()
, initially only a global registry (of type CompositeMeterRegistry
) is provided, which will bind concrete registries (e.g. statsD, Influxdb) based on the configurations. If no registry is switched on in config
, metrics service throws error to prevent the global registry runs into no-op mode.
Mainly for testing purpose, metrics service can also startWithoutExporting()
, where a SimpleMeterRegistry
is bound to the global registry. SimpleMeterRegistry
holds memory only storage but does not export metrics, makes it ideal for tests to verify metrics objects.
Currently Pravega supports the following:
- StatsD registry in Telegraf
flavor.
- Dimensional metrics data model (or metric tags).
- UDP as Communication protocol.
- Direct InfluxDB connection.
The reporter could be configured using the MetricsConfig
. Please refer to the example.
Creating Own Metrics¶
- When starting a Segment Store/Controller Service, start a Metric Service as a sub service. Please check
ServiceStarter.start()
public class AddMetrics { MetricsProvider.initialize(Config.METRICS_CONFIG); statsProvider.start(metricsConfig); statsProvider = MetricsProvider.getMetricsProvider(); statsProvider.start();
- Create a new
StatsLogger
instance through theMetricsProvider.createStatsLogger(String loggerName)
, and register metric using name, e.g.STATS_LOGGER.createCounter(String name)
; and then update the metric object as appropriately in the code.
static final StatsLogger STATS_LOGGER = MetricsProvider.getStatsLogger(); // <--- 1 DynamicLogger dynamicLogger = MetricsProvider.getDynamicLogger(); static class Metrics { // < --- 2 //Using Stats Logger static final String CREATE_STREAM = "stream_created"; static final OpStatsLogger CREATE_STREAM = STATS_LOGGER.createStats(CREATE_STREAM); static final String SEGMENT_CREATE_LATENCY = "segmentstore.segment.create_latency_ms"; static final OpStatsLogger createStreamSegment = STATS_LOGGER.createStats(SEGMENT_CREATE_LATENCY); //Using Dynamic Logger static final String SEGMENT_READ_BYTES = "segmentstore.segment.read_bytes"; //Dynamic Counter static final String OPEN_TRANSACTIONS = "controller.transactions.opened"; //Dynamic Gauge ... } //to report success or increment Metrics.CREATE_STREAM.reportSuccessValue(1); // < --- 3 Metrics.createStreamSegment.reportSuccessEvent(timer.getElapsed()); dynamicLogger.incCounterValue(Metrics.SEGMENT_READ_BYTES, 1); dynamicLogger.reportGaugeValue(OPEN_TRANSACTIONS, 0); //in case of failure Metrics.CREATE_STREAM.reportFailValue(1); Metrics.createStreamSegment.reportFailEvent(timer.getElapsed()); //to freeze dynamicLogger.freezeCounter(Metrics.SEGMENT_READ_BYTES); dynamicLogger.freezeGaugeValue(OPEN_TRANSACTIONS); }
Metrics Naming Conventions¶
All metric names are in the following format:
Metrics Prefix + Component Origin + Sub-Component (or Abstraction) + Metric Base Name
pravega
is configurable.
-
Component Origin: Indicates which component generates the metric, such as
segmentstore
,controller
. -
Sub-Component (or Abstraction): Indicates the second level component or abstraction, such as
cache
,transaction
,storage
. -
Metric Base Name: Indicates the
read_latency_ms
,create_count
.
For example:
pravega.segmentstore.segment.create_latency_ms
Following are some common combinations of component and sub-components (or abstractions) being used:
segmentstore.segment
: Metrics for individual Segmentssegmentstore.storage
: Metrics related to long-term storage (Tier 2)segmentstore.bookkeeper
: Metrics related to Bookkeeper (Tier 1)segmentstore.container
: Metrics for Segment Containerssegmentstore.thread_pool
: Metrics for Segment Store thread poolsegmentstore.cache
: Cache-related metricscontroller.stream
: Metrics for operations on Streams (e.g., number of streams created)controller.segments
: Metrics about Segments, per Stream (e.g., count, splits, merges)controller.transactions
: Metrics related to Transactions (e.g., created, committed, aborted)controller.retention
: Metrics related to data retention, per Stream (e.g., frequency, size of truncated data)controller.hosts
: Metrics related to Pravega servers in the cluster (e.g., number of servers, failures)controller.container
: Metrics related to Container lifecycle (e.g., failovers)
Following are the two types of metrics:
-
Global Metric:
_global
metrics are reporting global values per component (Segment Store or Controller) instance, and further aggregation logic is needed if looking for Pravega cluster globals. For instance,STORAGE_READ_BYTES
can be classified as a Global metric. -
Object-based Metric: Sometimes, we need to report metrics only based on specific objects, such as Streams or Segments. This kind of metrics use metric name as a base name in the file and are "dynamically" created based on the objects to be measured. For instance, in
CONTAINER_APPEND_COUNT
we actually report multiple metrics, one per eachcontainerId
measured, with different container tag (e.g.["containerId", "3"]
).
There are cases in which we may want both a Global and Object-based versions for the same metric. For example, regarding SEGMENT_READ_BYTES
we publish the Global version of it by adding _global
suffix to the base name
segmentstore.segment.read_bytes_global
segmentstore.segment.read_bytes, ["scope", "...", "stream", "...", "segment", "...", "epoch", "..."])
Available Metrics and Their Names¶
Metrics in JVM¶
jvm_gc_live_data_size jvm_gc_max_data_size jvm_gc_memory_allocated jvm_gc_memory_prompted jvm_gc_pause jvm_memory_committed jvm_memory_max jvm_memory_used jvm_threads_daemon jvm_threads_live jvm_threads_peak jvm_threads_states
Metrics in Segment Store Service¶
- Segment Store Read/Write latency of storage operations (Histograms):
``` segmentstore.segment.create_latency_ms segmentstore.segment.read_latency_ms segmentstore.segment.write_latency_ms
```
- Segment Store global and per-segment Read/Write Metrics (Counters):
``` // Global counters segmentstore.segment.read_bytes_global segmentstore.segment.write_bytes_global segmentstore.segment.write_events_global
// Per segment counters - all with tags {"scope", $scope, "stream", $stream, "segment", $segment, "epoch", $epoch} segmentstore.segment.write_bytes segmentstore.segment.read_bytes segmentstore.segment.write_events
```
- Segment Store cache Read/Write latency Metrics (Histogram):
segmentstore.cache.insert_latency_ms segmentstore.cache.get_latency
- Segment Store cache Read/Write Metrics (Counters):
segmentstore.cache.write_bytes segmentstore.cache.read_bytes
segmentstore.cache.size_bytes segmentstore.cache.gen
- Tier 1 Storage
DurableDataLog
Read/Write latency and queuing Metrics (Histogram):
segmentstore.bookkeeper.total_write_latency_ms segmentstore.bookkeeper.write_latency_ms segmentstore.bookkeeper.write_queue_size segmentstore.bookkeeper.write_queue_fill
segmentstore.bookkeeper.write_bytes segmentstore.bookkeeper.bookkeeper_ledger_count - with tag {"container", $containerId}
- Tier 2 Storage Read/Write latency Metrics (Histogram):
segmentstore.storage.read_latency_ms segmentstore.storage.write_latency_ms
- Tier 2 Storage Read/Write data and file creation Metrics (Counters):
segmentstore.storage.read_bytes segmentstore.storage.write_bytes segmentstore.storage.create_count
- Segment Store container-specific operation Metrics:
// Histograms - all with tags {"container", $containerId} segmentstore.container.process_operations.latency_ms segmentstore.container.process_operations.batch_size segmentstore.container.operation_queue.size segmentstore.container.operation_processor.in_flight segmentstore.container.operation_queue.wait_time segmentstore.container.operation_processor.delay_ms segmentstore.container.operation_commit.latency_ms segmentstore.container.operation.latency_ms segmentstore.container.operation_commit.metadata_txn_count segmentstore.container.operation_commit.memory_latency_ms // Gauge segmentstore.container.operation.log_size
- Segment Store operation processor (Counter) Metrics - all with tags {"container", $containerId}.
// Counters/Meters segmentstore.container.append_count segmentstore.container.append_offset_count segmentstore.container.update_attributes_count segmentstore.container.get_attributes_count segmentstore.container.read_count segmentstore.container.get_info_count segmentstore.container.create_segment_count segmentstore.container.delete_segment_count segmentstore.container.merge_segment_count segmentstore.container.seal_count segmentstore.container.truncate_count
- Segment Store active Segments (Gauge) and thread pool status (Histogram) Metrics:
// Gauge - with tags {"container", $containerId} segmentstore.active_segments // Histograms segmentstore.thread_pool.queue_size segmentstore.thread_pool.active_threads
Metrics in Controller Service¶
-
Controller Stream operation latency Metrics (Histograms):
controller.stream.created_latency_ms controller.stream.sealed_latency_ms controller.stream.deleted_latency_ms controller.stream.updated_latency_ms controller.stream.truncated_latency_ms
-
Controller global and per-Stream operation Metrics (Counters):
controller.stream.created controller.stream.create_failed_global controller.stream.create_failed - with tags {"scope", $scope, "stream", $stream} controller.stream.sealed controller.stream.seal_failed_global controller.stream.seal_failed - with tags {"scope", $scope, "stream", $stream} controller.stream.deleted controller.stream.delete_failed_global controller.stream.delete_failed - with tags {"scope", $scope, "stream", $stream} controller.stream.updated_global controller.stream.updated - with tags {"scope", $scope, "stream", $stream} controller.stream.update_failed_global controller.stream.update_failed - with tags {"scope", $scope, "stream", $stream} controller.stream.truncated_global controller.stream.truncated - with tags {"scope", $scope, "stream", $stream} controller.stream.truncate_failed_global controller.stream.truncate_failed - with tags {"scope", $scope, "stream", $stream}
-
Controller Stream retention frequency (Counter) and truncated size (Gauge) Metrics:
controller.retention.frequency - with tags {"scope", $scope, "stream", $stream} controller.retention.truncated_size - with tags {"scope", $scope, "stream", $stream}
-
Controller Stream Segment operations (Counters) and open/timed out Transactions on a Stream (Gauge) Metrics - all with tags {"scope", $scope, "stream", $stream}:
controller.transactions.opened controller.transactions.timedout controller.segments.count controller.segment.splits controller.segment.merges
-
Controller Transaction operation latency Metrics:
controller.transactions.created_latency_ms controller.transactions.committed_latency_ms controller.transactions.aborted_latency_ms
-
Controller Transaction operation counter Metrics:
controller.transactions.created_global controller.transactions.created - with tags {"scope", $scope, "stream", $stream} controller.transactions.create_failed_global controller.transactions.create_failed - with tags {"scope", $scope, "stream", $stream} controller.transactions.committed_global controller.transactions.committed - with tags {"scope", $scope, "stream", $stream} controller.transactions.commit_failed_global controller.transactions.commit_failed - with tags {"scope", $scope, "stream", $stream} controller.transactions.commit_failed - with tags {"scope", $scope, "stream", $stream, "transaction", $txnId} controller.transactions.aborted_global controller.transactions.aborted - with tags {"scope", $scope, "stream", $stream} controller.transactions.abort_failed_global controller.transactions.abort_failed - with tags {"scope", $scope, "stream", $stream} controller.transactions.abort_failed - with tags {"scope", $scope, "stream", $stream, "transaction", $txnId}
-
Controller hosts available (Gauge) and host failure (Counter) Metrics:
controller.hosts.count controller.hosts.failures_global controller.hosts.failures - with tags {"host", $host}
-
Controller Container count per host (Gauge) and failover (Counter) Metrics:
controller.hosts.container_count controller.container.failovers_global controller.container.failovers - with tags {"container", $containerId}
- Controller Zookeeper session expiration (Counter) metrics:
controller.zookeeper.session_expiration