Added (gauge) metric "rocksdb_read_only" by jsteemann · Pull Request #14470 · arangodb/arangodb
Added (gauge) metric "rocksdb_read_only" #14470

Merged: 3 commits, Jul 9, 2021
7 changes: 7 additions & 0 deletions CHANGELOG
@@ -1,6 +1,13 @@
devel
-----

* APM-107: Added metric "rocksdb_read_only" to determine whether RocksDB is
currently in read-only mode due to a background error. The metric will have
a value of "1" if RocksDB is in read-only mode and "0" if RocksDB is in
normal operations mode. If the metric value is "1" it means all writes into
RocksDB will fail, so inspecting the logfiles and acting on the actual error
situation is required.

* Fix potential memleak in Pregel conductor garbage collection.

* Added a retry loop for arangorestore during the initial connection phase. The
31 changes: 31 additions & 0 deletions Documentation/Metrics/rocksdb_read_only.yaml
@@ -0,0 +1,31 @@
name: rocksdb_read_only
introducedIn: "3.8.1"
help: |
  RocksDB metric "background-errors"
unit: number
type: gauge
category: RocksDB
complexity: simple
exposedBy:
  - dbserver
  - agent
  - single
description: |
  This metric indicates whether RocksDB currently is in read-only
  mode, due to a background error. If RocksDB is in read-only mode,
  this metric will have a value of "1". When in read-only mode, all
  writes into RocksDB will fail. When RocksDB is in normal operations
  mode, this metric will have a value of "0".
troubleshoot: |
  If this value is non-zero, it means that all write operations in
  RocksDB will fail until the RocksDB background error is resolved.
  The arangod server logfile should show more details about the exact
  errors that are happening, so logs should be inspected first.
  RocksDB can set a background error when some I/O operation fails.
  This is often due to disk space usage issues, so often either freeing
  disk space or increasing the disk capacity will help.
  Under some conditions, RocksDB can automatically resume from the
  background error and go back into normal operations. However, if the
  background error happens during certain RocksDB operations, it cannot
  resume operations automatically, so the instance will need a manual
  restart after the error condition is removed.
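The description above defines the gauge as 1 in read-only mode and 0 otherwise. As a rough illustration of how a monitoring client might act on it, here is a minimal sketch that assumes the server output is in the Prometheus text exposition format. The helper `readOnlyGaugeValue` is hypothetical and not part of ArangoDB, and the parsing assumes label values do not contain a `}` character.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not ArangoDB code): scan metrics text for a sample
// named exactly "rocksdb_read_only" and return its numeric value, if any.
std::optional<double> readOnlyGaugeValue(std::string const& metricsText) {
  static std::string const name = "rocksdb_read_only";
  std::istringstream in(metricsText);
  std::string line;
  while (std::getline(in, line)) {
    // Comment lines start with '#', so they never match the name prefix.
    if (line.compare(0, name.size(), name) != 0) continue;
    std::size_t pos = name.size();
    // Skip an optional label set such as {role="dbserver"}.
    if (pos < line.size() && line[pos] == '{') {
      pos = line.find('}', pos);
      if (pos == std::string::npos) continue;
      ++pos;
    }
    // A space must follow, otherwise this is a longer metric name.
    if (pos >= line.size() || line[pos] != ' ') continue;
    try {
      return std::stod(line.substr(pos));  // ignores a trailing timestamp
    } catch (std::exception const&) {
      continue;  // malformed value, keep scanning
    }
  }
  return std::nullopt;
}
```

A caller could treat any non-zero value as "writes will fail, inspect the arangod log", which matches the troubleshooting text, and treat a missing sample as "unknown" rather than healthy.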
12 changes: 10 additions & 2 deletions arangod/RocksDBEngine/Listeners/RocksDBBackgroundErrorListener.cpp
@@ -39,7 +39,7 @@ void RocksDBBackgroundErrorListener::OnBackgroundError(rocksdb::BackgroundErrorR
}

if (!_called.exchange(true)) {
- std::string operation = "unknown";
+ char const* operation = "unknown";
switch (reason) {
case rocksdb::BackgroundErrorReason::kFlush: {
operation = "flush";
@@ -61,8 +61,16 @@ void RocksDBBackgroundErrorListener::OnBackgroundError(rocksdb::BackgroundErrorR

LOG_TOPIC("fae2c", ERR, Logger::ROCKSDB)
<< "RocksDB encountered a background error during a " << operation << " operation: "
- << (status != nullptr ? status->ToString() : "unknown error") << "; The database will be put in read-only mode, and subsequent write errors are likely. It is advised to shut down this instance, resolve the error offline and then restart it.";
+ << (status != nullptr ? status->ToString() : "unknown error")
+ << "; The database will be put in read-only mode, and subsequent write errors are likely. It is advised to shut down this instance, resolve the error offline and then restart it.";
}
}

void RocksDBBackgroundErrorListener::OnErrorRecoveryCompleted(rocksdb::Status /* old_bg_error */) {
_called.store(false, std::memory_order_relaxed);

LOG_TOPIC("8ff56", WARN, Logger::ROCKSDB)
<< "RocksDB resuming operations after background error";
}

} // namespace arangodb
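The flag handling in this file can be read in isolation from RocksDB. The following is a minimal sketch, not the actual listener: the class name `BackgroundErrorFlag` and its method names are made up for illustration, and no RocksDB types are involved. It shows why `exchange(true)` lets only the first error after a recovery trigger the log message, and how the recovery callback re-arms it.

```cpp
#include <atomic>

// Illustrative stand-in for the listener's flag logic (hypothetical names).
class BackgroundErrorFlag {
 public:
  // exchange(true) returns the previous value, so this is true only for the
  // first error since construction or since the last recovery. The caller
  // would log the error message in that case.
  bool onBackgroundError() { return !_called.exchange(true); }

  // Mirrors OnErrorRecoveryCompleted in the diff: clear the flag so that the
  // gauge drops back to 0 and a later error is logged again.
  void onErrorRecoveryCompleted() {
    _called.store(false, std::memory_order_relaxed);
  }

  // Mirrors called() in the header: a relaxed read of the current state.
  bool called() const { return _called.load(std::memory_order_relaxed); }

 private:
  std::atomic<bool> _called{false};
};
```

Before this change the flag was never reset, so without the recovery callback a metric derived from `called()` would stay at 1 even after RocksDB resumed on its own.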
@@ -37,6 +37,8 @@ class RocksDBBackgroundErrorListener : public rocksdb::EventListener {

void OnBackgroundError(rocksdb::BackgroundErrorReason reason, rocksdb::Status* error) override;

void OnErrorRecoveryCompleted(rocksdb::Status /* old_bg_error */) override;

bool called() const { return _called.load(std::memory_order_relaxed); }

private:
5 changes: 5 additions & 0 deletions arangod/RocksDBEngine/RocksDBEngine.cpp
@@ -2406,6 +2406,7 @@ DECLARE_GAUGE(rocksdb_total_disk_space, uint64_t, "rocksdb_total_disk_space");
DECLARE_GAUGE(rocksdb_total_inodes, uint64_t, "rocksdb_total_inodes");
DECLARE_GAUGE(rocksdb_total_sst_files_size, uint64_t, "rocksdb_total_sst_files_size");
DECLARE_GAUGE(rocksdb_engine_throttle_bps, uint64_t, "rocksdb_engine_throttle_bps");
DECLARE_GAUGE(rocksdb_read_only, uint64_t, "rocksdb_read_only");

void RocksDBEngine::getStatistics(std::string& result, bool v2) const {
VPackBuilder stats;
@@ -2618,6 +2619,10 @@ void RocksDBEngine::getStatistics(VPackBuilder& builder, bool v2) const {
}
}

if (_errorListener) {
builder.add("rocksdb.read-only", VPackValue(_errorListener->called() ? 1 : 0));
}

builder.close();
}
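The hunk above only emits the statistic when an error listener is installed, and it writes an integer rather than a boolean. A small sketch of that mapping follows, using a plain `std::map` in place of `VPackBuilder` and a hypothetical `ErrorListenerStub` in place of the real listener; both substitutions are assumptions made so the example is self-contained.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the error listener; only called() matters here.
struct ErrorListenerStub {
  bool flagged = false;
  bool called() const { return flagged; }
};

// Adds "rocksdb.read-only" only when a listener exists, mirroring the null
// check in the diff. The stored value is 1 (read-only) or 0 (normal).
void addReadOnlyStat(std::map<std::string, std::uint64_t>& stats,
                     ErrorListenerStub const* listener) {
  if (listener != nullptr) {
    stats["rocksdb.read-only"] = listener->called() ? 1 : 0;
  }
}
```

With no listener the key is absent, so a consumer should not assume that a missing value means 0.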
