Terminology
Here is a glossary of terms related to Pravega:
Term | Definition |
---|---|
Pravega | A data storage primitive based on append-only logs and tiered storage. |
Stream | A persistent, unbounded, append-only collection of Events. |
A Stream is identified by a name and a Scope. | |
A Stream is composed of one or more Stream Segments. | |
Stream Segment | A shard of a Stream. |
The number of Stream Segments in a Stream might vary over time according to load and Scaling Policy. | |
In the absence of a Scale Event, Events written to a Stream with the same Routing Key are stored in the same Stream Segment and are totally ordered. | |
When a Scale Event occurs, the set of Stream Segments of a Stream changes, and Events written with a given Routing Key K before the Scale Event are stored in a different Stream Segment from Events written with the same Routing Key K after it. | |
In conjunction with Reader Groups, the number of Stream Segments is the maximum amount of read parallelism of a Stream. | |
Scope | A namespace for Stream names. |
A Stream name must be unique within a Scope. | |
Event | A collection of bytes within a Stream. |
An Event is associated with a Routing Key. | |
Routing Key | A property of an Event used to route messages to Readers. |
Two Events with the same Routing Key will be read by Readers in exactly the same order they were written. | |
Reader | A software application that reads data from one or more Streams. |
Writer | A software application that writes data to one or more Streams. |
Pravega Java Client Library | A Java library that applications use to interface with Pravega. |
Reader Group | A named collection of one or more Readers that read from a Stream in parallel. |
Pravega assigns Stream Segments to the Readers, making sure that all Stream Segments are assigned to at least one Reader and that they are balanced across the Readers. | |
Position | An offset within a Stream, representing a type of recovery point for a Reader. |
If a Reader crashes, a Position can be used to initialize the failed Reader's replacement so that the replacement resumes processing the Stream from where the failed Reader left off. | |
Tier 1 Storage | Short term, low-latency, data storage that guarantees the durability of data written to Streams. |
The current implementation of Tier 1 uses Apache BookKeeper. | |
Tier 1 Storage keeps the most recent appends to Streams in Pravega. | |
As data in Tier 1 ages, it is moved out of Tier 1 into Tier 2. | |
Tier 2 Storage | A portion of Pravega storage based on cheap and deep persistent storage technology such as HDFS, DellEMC's Isilon or DellEMC's Elastic Cloud Storage. |
Pravega Server | A component of Pravega that implements the Pravega data plane API for operations such as reading from and writing to Pravega Streams. |
The data plane of Pravega, also called the Segment Store, is composed of one or more Pravega Server instances. | |
Segment Store | A collection of Pravega Servers that in aggregate form the data plane of a Pravega cluster. |
Controller | A component of Pravega that implements the Pravega control plane API for operations such as creating and retrieving information about Streams. |
The control plane of Pravega is composed of one or more Controller instances coordinated by ZooKeeper. | |
Auto Scaling | A Pravega concept that allows the number of Stream Segments in a Stream to change over time, based on Scaling Policy. |
Scaling Policy | A configuration item of a Stream that determines how the number of Stream Segments in the Stream should change over time. |
There are three kinds of Scaling Policy; a Stream has exactly one of them at any given time. | |
- Fixed number of Stream Segments | |
- Change the number of Stream Segments based on the number of bytes per second written to the Stream | |
- Change the number of Stream Segments based on the number of Events per second written to the Stream | |
Scale Event | There are two types of Scale Event: Scale-Up Event and Scale-Down Event. A Scale Event triggers Auto Scaling. |
A Scale-Up Event is a situation where an increase in load causes one or more Stream Segments to be split, increasing the number of Stream Segments in the Stream. | |
A Scale-Down Event is a situation where a decrease in load causes one or more Stream Segments to be merged, reducing the number of Stream Segments in the Stream. | |
Transaction | A collection of Stream write operations that are applied atomically to the Stream. |
Either all of the bytes in a Transaction are written to the Stream or none of them are. | |
State Synchronizer | An abstraction built on top of Pravega to enable the implementation of replicated state using a Pravega segment to back up the state transformations. |
A State Synchronizer allows a piece of data to be shared between multiple processes with strong consistency and optimistic concurrency. | |
Checkpoint | A kind of Event that signals all Readers within a Reader Group to persist their state. |
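
The relationship between Routing Keys, Stream Segments, and Scale Events defined above can be illustrated with a small toy model. The sketch below is not Pravega's implementation or client API: `ToyStream`, `Segment`, `key_position`, the SHA-256 hash, and the 32-bit key space are all assumptions introduced for illustration. It only shows the two properties stated in the glossary: Events with the same Routing Key land in the same Stream Segment until a Scale Event occurs, and in a different Stream Segment afterwards.

```python
import hashlib
from dataclasses import dataclass

KEY_SPACE = 2**32  # assumed size of the hashed key space (illustrative only)


@dataclass(frozen=True)
class Segment:
    segment_id: int
    low: int   # inclusive lower bound of the owned key range
    high: int  # exclusive upper bound of the owned key range


def key_position(routing_key: str) -> int:
    """Hash a routing key to a stable integer in [0, KEY_SPACE)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ToyStream:
    """Hypothetical in-memory stand-in for a Stream; not a Pravega class."""

    def __init__(self, num_segments: int) -> None:
        # Divide the key space evenly among the initial segments.
        self.segments = [
            Segment(i, i * KEY_SPACE // num_segments, (i + 1) * KEY_SPACE // num_segments)
            for i in range(num_segments)
        ]
        self.log = {seg.segment_id: [] for seg in self.segments}
        self._next_id = num_segments

    def segment_for(self, routing_key: str) -> Segment:
        # Find the active segment whose key range contains the hashed key.
        pos = key_position(routing_key)
        return next(s for s in self.segments if s.low <= pos < s.high)

    def write(self, routing_key: str, event: str) -> int:
        # Append the event to the owning segment and return that segment's id.
        seg = self.segment_for(routing_key)
        self.log[seg.segment_id].append(event)
        return seg.segment_id

    def scale_up(self, segment_id: int) -> None:
        # Scale-Up Event: split one segment's key range into two new segments.
        old = next(s for s in self.segments if s.segment_id == segment_id)
        mid = (old.low + old.high) // 2
        left = Segment(self._next_id, old.low, mid)
        right = Segment(self._next_id + 1, mid, old.high)
        self._next_id += 2
        self.segments.remove(old)  # the old segment stops receiving new writes
        self.segments += [left, right]
        self.log[left.segment_id] = []
        self.log[right.segment_id] = []


stream = ToyStream(num_segments=2)
before = [stream.write("sensor-7", e) for e in ("a", "b")]  # same key, same segment
stream.scale_up(before[0])                                  # split that segment
after = stream.write("sensor-7", "c")                       # same key, new segment
```

In this model the Events written before the split stay in the original segment's log and keep their order, while later Events with the same Routing Key go to one of the two successor segments.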