Enable Durable Data Log
Issue
A DurableDataLog is the abstraction that the Segment Store uses to interface with the specifics of
writing to Tier-1 (i.e., Bookkeeper). As visible in that interface, a DurableDataLog can be enabled
(writable) or disabled (non-writable). In general, a log becomes disabled when an error has occurred
for which the Segment Store considers it safer to stop writing to the log. A typical message
indicating this situation is as follows:
2022-11-03 06:14:07,826 4991892 [core-11] ERROR i.p.s.server.logs.DurableLog - DurableLog[18] Recovery FAILED.
io.pravega.segmentstore.storage.DataLogDisabledException: BookKeeperLog is disabled. Cannot initialize.
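When several Segment Containers are affected, it can help to scan a Segment Store log file for this pair of lines. The following is a minimal sketch, not part of Pravega: the function name `find_disabled_containers` is hypothetical, the patterns are derived only from the sample message above, and it assumes that the number in `DurableLog[18]` is the Segment Container ID and that the exception appears on the line right after the "Recovery FAILED" message.

```python
import re

# Pattern assumed from the sample message above: "DurableLog[<id>] Recovery FAILED."
RECOVERY_FAILED = re.compile(r"DurableLog\[(\d+)\] Recovery FAILED")
DISABLED_MARKER = "DataLogDisabledException"


def find_disabled_containers(log_lines):
    """Return the sorted ids whose failed recovery is immediately
    followed by a DataLogDisabledException line."""
    ids = set()
    pending = None  # id taken from the previous line, if it was a failed recovery
    for line in log_lines:
        match = RECOVERY_FAILED.search(line)
        if match:
            pending = int(match.group(1))
        else:
            if pending is not None and DISABLED_MARKER in line:
                ids.add(pending)
            pending = None
    return sorted(ids)


sample = [
    "2022-11-03 06:14:07,826 4991892 [core-11] ERROR i.p.s.server.logs.DurableLog"
    " - DurableLog[18] Recovery FAILED.",
    "io.pravega.segmentstore.storage.DataLogDisabledException:"
    " BookKeeperLog is disabled. Cannot initialize.",
]
print(find_disabled_containers(sample))  # [18]
```

For a real log file, the same function can be fed the open file object, since it only iterates over lines.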
We can classify the errors that can lead to disabling a DurableDataLog into persistent and transient.
Persistent errors are those that impact the data and/or metadata of the log (i.e., data corruption).
Recovering from these errors may require data recovery procedures as explained in other documents of this
section. Attempts to enable a log in this state are not advisable and will likely result in the Segment
Store disabling the log again once the data corruption is detected. Transient errors, on the other hand,
are severe but recoverable errors that can also lead to disabling the log, such as an out-of-memory problem.
This article focuses on the latter category and describes how to re-enable a DurableDataLog that has been
disabled for reasons other than data corruption.
Repair Procedure
- First, configure the Pravega Admin CLI from a location/instance that can access the Zookeeper and Bookkeeper services deployed in your cluster.
- With the Pravega Admin CLI in place, run the following command:
bk list
...
{
  "key": "log_summary",
  "value": {
    "logId": 1,
    "epoch": 81846,
    "version": 1062379,
    "enabled": false,
    "ledgers": 4,
    "truncation": "Sequence = 344181499234420, LedgerId = 3781242, EntryId = 2164"
  }
}
...
This command lists all the available Bookkeeper logs in this cluster (make sure that the Pravega Admin CLI has the pravegaservice.container.count parameter set to the same number of Segment Containers as in the cluster itself). In the output of the above command, every disabled Bookkeeper log shows "enabled": false.
- Next, we need to make sure that the disabled Bookkeeper log(s) can be recovered, as a way to verify that there are no data corruption issues. To this end, run the following Pravega Admin CLI command for each disabled log:
container recover [ID_OF_DISABLED_CONTAINER]
If the Segment Container recovers successfully, there is no data corruption issue and it is safe to enable the log again.
- Finally, we have to enable the impacted Bookkeeper log(s) that are safe to enable again. To this end,
run the following Pravega Admin CLI command for each disabled log that recovered successfully:
bk enable [ID_OF_DISABLED_CONTAINER]
After this command, the Segment Container associated with the re-enabled log should be able to resume
its operation. You can run the bk list command again to check that the Bookkeeper logs are now
enabled.
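The steps above can be sketched as a small helper that reads captured `bk list` output and lists, for each disabled log, the commands to run. This is only an illustration under stated assumptions: the helper names are hypothetical, the `bk list` output is assumed to have been saved as text containing JSON objects shaped like the sample above, the `logId` is assumed to match the Segment Container ID, and the second entry in `sample_output` is made up for the example. The script does not invoke the Pravega Admin CLI itself.

```python
import json


def extract_log_summaries(text):
    """Yield the "value" object of every log_summary JSON object found in text."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not a complete JSON object here; try the next opening brace.
            pos = text.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and obj.get("key") == "log_summary":
            yield obj["value"]
        pos = text.find("{", end)


def disabled_log_ids(text):
    """Return the logId of every summary that reports "enabled": false."""
    return [s["logId"] for s in extract_log_summaries(text) if s.get("enabled") is False]


def repair_plan(log_ids):
    """Pair each disabled log with the two commands from the procedure.
    The second command should only be run if the first one succeeds."""
    return [(f"container recover {i}", f"bk enable {i}") for i in log_ids]


# First entry copied from the sample output above; second entry is hypothetical.
sample_output = """
...
{ "key": "log_summary", "value": { "logId": 1, "epoch": 81846, "version": 1062379,
  "enabled": false, "ledgers": 4,
  "truncation": "Sequence = 344181499234420, LedgerId = 3781242, EntryId = 2164" } }
{ "key": "log_summary", "value": { "logId": 2, "epoch": 5, "version": 10,
  "enabled": true, "ledgers": 1, "truncation": "" } }
...
"""

for recover_cmd, enable_cmd in repair_plan(disabled_log_ids(sample_output)):
    print(recover_cmd)
    print(enable_cmd + "   # only if the recovery above succeeded")
```

Keeping the recover and enable commands paired per log makes it harder to enable a log whose recovery was never verified.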