Cluster Metrics by maierlars · Pull Request #11234 · arangodb/arangodb · GitHub

Cluster Metrics #11234


Merged: 102 commits, merged Mar 16, 2020
Commits (102; the diff below shows the changes from 19 of them)
c3a2e25
some missing metrics
kvahed Dec 3, 2019
a6187c7
added the missing from statistics
kvahed Dec 4, 2019
f2dd77f
minimise reallocs
kvahed Dec 4, 2019
f7a41a4
Merge branch 'devel' into bug-fix/missing-metrics
jsteemann Jan 30, 2020
767a664
Update MetricsFeature.cpp
Simran-B Feb 3, 2020
83abd1d
devel merge
kvahed Mar 3, 2020
6274fa0
Merge branch 'bug-fix/missing-metrics' of https://github.com/arangodb…
kvahed Mar 3, 2020
e6be5fb
all in
kvahed Mar 4, 2020
f7ba3fd
oh my
kvahed Mar 4, 2020
dac8b9f
Merge branch 'devel' of https://github.com/arangodb/arangodb into bug…
kvahed Mar 4, 2020
86ddd5b
logarithmic histogram
kvahed Mar 4, 2020
72c67b3
Supervision and maintenance histogram.
Mar 5, 2020
7cf1a48
Heartbeat Timings and failure counter.
Mar 5, 2020
145a53e
rewrote the whole interface of logarithmic histograms to accommodate a…
kvahed Mar 5, 2020
b559bf4
Merge branch 'devel' of https://github.com/arangodb/arangodb into bug…
kvahed Mar 5, 2020
5cd6bca
Count number of out of sync shards.
Mar 5, 2020
0d456fb
Added counter for dropped follower event.
Mar 5, 2020
987c5dc
log output
kvahed Mar 5, 2020
08ac3bc
Query Time counter, scheduler counter.
Mar 6, 2020
e0d1aa5
More shard statistics.
Mar 6, 2020
9df019d
log tests
kvahed Mar 6, 2020
aef8899
log tests
kvahed Mar 6, 2020
2737ca2
more tests
kvahed Mar 6, 2020
677829d
Added testing framework for cluster metrics. No test implemented yet
mchacki Mar 6, 2020
138e225
Merge branch 'feature/cluster-metrics' of ssh://github.com/arangodb/A…
mchacki Mar 6, 2020
14ac1df
metrics fixing on
kvahed Mar 6, 2020
8bb019d
Added a test for Query Metrics and one for Maintenance Metrics
mchacki Mar 6, 2020
e9d4275
Small changes to histogram parameters.
Mar 9, 2020
198c44a
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 9, 2020
d8606a5
Improved test framework for Cluster metrics. It now is rather straigh…
mchacki Mar 9, 2020
fc474cf
Removed dead test code
mchacki Mar 9, 2020
cbe2765
Merge branch 'devel' of https://github.com/arangodb/arangodb into bug…
kvahed Mar 9, 2020
064933a
added tests
kvahed Mar 9, 2020
17c5e95
corrected prometheus export
kvahed Mar 9, 2020
0d62d49
More metrics.
Mar 9, 2020
2148be5
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 9, 2020
7f7b4be
log tests
kvahed Mar 9, 2020
effafe9
Merge branch 'feature/cluster-metrics' of github.com:arangodb/arangod…
Mar 9, 2020
5d4d356
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 9, 2020
2fddbd1
Improved shard count and added a heartbeat test
mchacki Mar 9, 2020
4532c74
Static::Strings!
Mar 9, 2020
cc31a2d
Merge branch 'feature/cluster-metrics' of github.com:arangodb/arangod…
Mar 9, 2020
b5b36c1
All metrics are prefixed with arangodb_
Mar 9, 2020
b06399c
Last static string.
Mar 9, 2020
e84468f
Added supervision test
mchacki Mar 9, 2020
c05bdf4
metrics with label discrimination
kvahed Mar 9, 2020
77810dc
metrics with label discrimination
kvahed Mar 9, 2020
1750900
only counters outstanding with new key scheme
kvahed Mar 10, 2020
700796c
only counters outstanding with new key scheme
kvahed Mar 10, 2020
7248328
only counters outstanding with new key scheme
kvahed Mar 10, 2020
8385334
More static strings.
Mar 10, 2020
a8d44c0
fixed label behaviour
kvahed Mar 10, 2020
fca1583
corrected retrieve behaviour.
kvahed Mar 10, 2020
bb12555
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 10, 2020
365fa26
feature tests
kvahed Mar 10, 2020
fbd7bea
feature tests
kvahed Mar 10, 2020
52e6c8f
fetching bug
kvahed Mar 10, 2020
abd6669
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 10, 2020
99eaed8
Merge branch 'devel' of https://github.com/arangodb/arangodb into bug…
kvahed Mar 10, 2020
a528c86
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 10, 2020
6bd179a
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 10, 2020
51c50ee
fixed uint64_t
kvahed Mar 10, 2020
a77d853
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 10, 2020
ef284bb
Merge branch 'devel' of https://github.com/arangodb/arangodb into bug…
kvahed Mar 10, 2020
ce4e870
wtf
kvahed Mar 10, 2020
ee61f66
histogram count labels
kvahed Mar 11, 2020
a80de52
Merge branch 'devel' of https://github.com/arangodb/arangodb into bug…
kvahed Mar 11, 2020
3bb247a
More metrics.
Mar 11, 2020
7b98405
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 11, 2020
3b2cb84
FailedServer counter.
Mar 11, 2020
6ebb0b9
histogram feature tests
kvahed Mar 11, 2020
3a800b4
histogram feature tests
kvahed Mar 11, 2020
e0b2498
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 11, 2020
10c3aa7
gauge feature tests
kvahed Mar 11, 2020
8277d0a
long int ambiguity
kvahed Mar 11, 2020
601261d
Fixing startup.
Mar 11, 2020
ea159b2
windows warning
kvahed Mar 11, 2020
94be7d7
separator on wrong side
kvahed Mar 11, 2020
95c1956
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 11, 2020
6e5b049
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 11, 2020
853f0fc
windows warnings
kvahed Mar 11, 2020
2c05c51
added role to labels
kvahed Mar 12, 2020
3b34ab5
http requests
kvahed Mar 12, 2020
8166879
Merge remote-tracking branch 'origin/devel' into feature/cluster-metrics
Mar 12, 2020
c955d13
added agency commit histogram
kvahed Mar 12, 2020
1277e9f
Merge branch 'feature/cluster-metrics' of ssh://github.com/arangodb/A…
mchacki Mar 12, 2020
36fe26d
Fixing metric names in tests.
Mar 13, 2020
15e496e
Merge remote-tracking branch 'origin/bug-fix/missing-metrics' into fe…
Mar 13, 2020
89b54d0
Merge remote-tracking branch 'origin/devel' into feature/cluster-metrics
Mar 13, 2020
c029c7b
Attempt to fix JSLINT
mchacki Mar 15, 2020
cba19c9
Merge branch 'devel' of ssh://github.com/arangodb/ArangoDB into featu…
mchacki Mar 15, 2020
dcba604
Added a chrono-cast to make macos compile
mchacki Mar 15, 2020
d9f7cca
Fixed Prometheus => JSON parser to include all labels.
mchacki Mar 15, 2020
c62cea0
Merge branch 'devel' of ssh://github.com/arangodb/ArangoDB into featu…
mchacki Mar 15, 2020
1faf8ae
JSLINT yolo
mchacki Mar 15, 2020
bbe3767
Fun with clocks... another attempt to make usage of steady and system…
mchacki Mar 15, 2020
b05137b
Yes sure every compiler is able to cast 2.0e3 to a uint64_t, and 50. …
mchacki Mar 15, 2020
5a65eef
Another cast in the wall...
mchacki Mar 15, 2020
0d4d902
Included Metrics feature into Test servers. NOTE: MaintenanceFeature …
mchacki Mar 15, 2020
aa27b75
Initialize the Metrics feature
mchacki Mar 16, 2020
006274e
Revert steady clock and use systemclock instead.
mchacki Mar 16, 2020
e54dfb6
Add the metrics feature within tests to make them pass again
mchacki Mar 16, 2020
Files changed
6 changes: 3 additions & 3 deletions CHANGELOG
@@ -16,11 +16,11 @@ devel

The traversal speedups observed by these changes alone were around 8 to 10% for
single-server traversals and traversals in OneShard setups. Cluster traversals
-will also benefit from these changes, but to a lesser extent. This is because the
+will also benefit from these changes, but to a lesser extent. This is because the
network roundtrips have a higher share of the total query execution times there.

* Traversal performance can also be improved by not fetching the visited vertices
-from the storage engine in case the traversal query does not refer to them.
+from the storage engine in case the traversal query does not refer to them.
For example, in the query

FOR v, e, p IN 1..3 OUTBOUND 'collection/startVertex' edges
@@ -56,7 +56,7 @@ devel
specified by providing the new `validation` collection property when creating a
new collection or when updating the properties of an existing collection:

-db.mycollection.properties({
+db.mycollection.properties({
validation: {
rule : { nums : { type : "array", items : { type : "number", maximum : 6 }}},
message : "Json-Schema validation failed"
5 changes: 3 additions & 2 deletions arangod/Agency/Agent.cpp
@@ -88,8 +88,9 @@ Agent::Agent(ApplicationServer& server, config_t const& config)
_server.getFeature<arangodb::MetricsFeature>().counter(
"agency_agent_read_no_leader", 0, "Agency write no leader")),
_write_hist_msec(
-_server.getFeature<arangodb::MetricsFeature>().histogram(
-"agency_agent_write_hist", 10, 0., 20., "Agency write histogram [ms]")) {
+_server.getFeature<arangodb::MetricsFeature>().histogram<log_scale_t<float>>(
+"agency_agent_write_hist", log_scale_t<float>(2., 0.2, 200., 20),
+"Agency write histogram [ms]")) {
_state.configure(this);
_constituent.configure(this);
if (size() > 1) {
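The new Agency write histogram is built with `log_scale_t<float>(2., 0.2, 200., 20)`, i.e. base 2, lower limit 0.2 ms, upper limit 200 ms and 20 buckets. The sketch below recomputes its inner bucket boundaries, assuming the boundary formula from the `log_scale_t` constructor in the `Metrics.h` hunk of this diff; `logScaleDelimiters` is a hypothetical stand-alone helper (not an ArangoDB function) and uses `double` instead of `float`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: inner bucket boundaries of a logarithmic scale, using
// the formula from the log_scale_t constructor in this PR. Boundary k
// (k = 0 .. n-2) is (high - low) * base^(k - n) + low, so each boundary is
// `base` times farther from `low` than the previous one.
std::vector<double> logScaleDelimiters(double base, double low, double high,
                                       std::size_t n) {
  assert(n > 1 && base > 1.0 && high > low);
  std::vector<double> delims(n - 1);
  double exponent = -static_cast<double>(n);
  for (double& d : delims) {
    d = (high - low) * std::pow(base, exponent) + low;
    exponent += 1.0;
  }
  return delims;
}
```

If that formula is applied as written, the 19 inner boundaries run from about 0.2002 ms to (200 - 0.2) / 4 + 0.2 = 50.15 ms, so every write slower than roughly 50 ms lands in the top bucket, which the Prometheus export labels with the 200 ms upper limit.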
2 changes: 1 addition & 1 deletion arangod/Agency/Agent.h
@@ -499,7 +499,7 @@ class Agent final : public arangodb::Thread, public AgentInterface {
Counter& _write_no_leader;
Counter& _read_ok;
Counter& _read_no_leader;
-Histogram<double>& _write_hist_msec;
+Histogram<log_scale_t<float>>& _write_hist_msec;

};
} // namespace consensus
238 changes: 207 additions & 31 deletions arangod/RestServer/Metrics.h
@@ -40,7 +40,8 @@
#include "counter.h"

class Counter;
-template<typename T> class Histogram;
+
+template<typename Scale> class Histogram;

class Metric {
public:
@@ -136,55 +137,233 @@ template<typename T> class Gauge : public Metric {

std::ostream& operator<< (std::ostream&, Metrics::hist_type const&);

+enum ScaleType {LINEAR, LOGARITHMIC};
+
+template<typename T>
+struct scale_t {
+public:
+
+using value_type = T;
+
+scale_t(T const& low, T const& high, size_t n) :
+_low(low), _high(high), _n(n) {
+TRI_ASSERT(n > 1);
+_delim.resize(n-1);
+}
+virtual ~scale_t() {}
+/**
+* @brief number of buckets
+*/
+size_t n() const {
+return _n;
+}
+/**
+* @brief number of buckets
+*/
+T low() const {
+return _low;
+}
+/**
+* @brief number of buckets
+*/
+T high() const {
+return _high;
+}
+/**
+* @brief number of buckets
+*/
+T const& delim(size_t const& s) const {
+return (s < _delim.size()) ? _delim.at(s) : _high;
+}
+/**
+* @brief number of buckets
+*/
+std::vector<T> const& delims() const {
+return _delim;
+}
+/**
+* @brief dump to builder
+*/
+virtual void toVelocyPack(VPackBuilder& b) const {
+TRI_ASSERT(b.isOpenObject());
+b.add("lower-limit", VPackValue(_low));
+b.add("upper-limit", VPackValue(_high));
+b.add("value-type", VPackValue(typeid(T).name()));
+b.add(VPackValue("range"));
+VPackArrayBuilder abb(&b);
+for (auto const& i : _delim) {
+b.add(VPackValue(i));
+}
+}
+/**
+* @brief dump to
+*/
+std::ostream& print(std::ostream& o) const {
+VPackBuilder b;
+{
+VPackObjectBuilder bb(&b);
+this->toVelocyPack(b);
+}
+o << b.toJson();
+return o;
+}
+protected:
+T _low, _high;
+std::vector<T> _delim;
+size_t _n;
+};
+
+template<typename T>
+std::ostream& operator<< (std::ostream& o, scale_t<T> const& s) {
+return s.print(o);
+}
+
+template<typename T>
+struct log_scale_t : public scale_t<T> {
+public:
+
+using value_type = T;
+static constexpr ScaleType scale_type = LOGARITHMIC;
+
+log_scale_t(T const& base, T const& low, T const& high, size_t n) :
+scale_t<T>(low, high, n), _base(base) {
+double nn = -1.0*n;
+for (auto& i : this->_delim) {
+i = (high-low) * std::pow(base, nn++) + low;
+}
+_div = this->_delim.front() - low;
+TRI_ASSERT(_div > 0);
+_lbase = logf(_base);
+}
+virtual ~log_scale_t() {}
+/**
+* @brief index for val
+* @param val value
+* @return index
+*/
+size_t pos(T const& val) const {
+return static_cast<size_t>(1+std::floor(logf((val - this->_low)/_div)/_lbase));
+}
+/**
+* @brief Dump to builder
+* @param b Envelope
+*/
+virtual void toVelocyPack(VPackBuilder& b) const {
+b.add("scale-type", VPackValue("logarithmic"));
+b.add("base", VPackValue(_base));
+scale_t<T>::toVelocyPack(b);
+}
+/**
+* @brief Base
+* @return base
+*/
+T base() const {
+return _base;
+}
+private:
+T _base, _div, _lbase;
+};
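`log_scale_t::pos` inverts the boundary formula: since boundary k lies `div * base^k` above `low` (with `div` the width of the first bucket), a value `val` maps to `1 + floor(log_base((val - low) / div))`. `Histogram::count` adds the clamping around it, sending values below the first boundary to bucket 0 and values at or above the last boundary to the top bucket. The sketch below reproduces both steps under those assumptions; `LogScale` and `bucket()` are hypothetical names, and it uses `double` with `std::log` where the diff uses `T` with `logf`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical logarithmic scale modeled on log_scale_t in this diff.
struct LogScale {
  double base, low, high;
  std::size_t n;                // number of buckets
  std::vector<double> delims;   // the n - 1 inner boundaries
  double div = 0, lbase = 0;    // first-bucket width and ln(base)

  LogScale(double b, double lo, double hi, std::size_t buckets)
      : base(b), low(lo), high(hi), n(buckets), delims(buckets - 1) {
    assert(buckets > 1 && b > 1.0 && hi > lo);
    double exponent = -static_cast<double>(buckets);
    for (double& d : delims) {
      d = (hi - lo) * std::pow(b, exponent) + lo;
      exponent += 1.0;
    }
    div = delims.front() - lo;
    lbase = std::log(b);
  }

  // Raw index for values inside [delims.front(), delims.back()).
  std::size_t pos(double val) const {
    return static_cast<std::size_t>(
        1 + std::floor(std::log((val - low) / div) / lbase));
  }

  // Index after the clamping that Histogram::count applies.
  std::size_t bucket(double val) const {
    if (val < delims.front()) return 0;
    if (val >= delims.back()) return n - 1;
    return pos(val);
  }
};
```

With base 2, limits 0 and 16 and four buckets, the boundaries are 1, 2 and 4, so 0.5, 1.5, 3 and 100 fall into buckets 0, 1, 2 and 3.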
+
+template<typename T>
+struct lin_scale_t : public scale_t<T> {
+public:
+
+using value_type = T;
+static constexpr ScaleType scale_type = LINEAR;
+
+lin_scale_t(T const& low, T const& high, size_t n) :
+scale_t<T>(low, high, n) {
+this->_delim.resize(n-1);
+_div = (high - low) / (T)n;
+if (_div <= 0) {
+}
+TRI_ASSERT(_div > 0);
+T le = low;
+for (auto& i : this->_delim) {
+le += _div;
+i = le;
+}
+}
+virtual ~lin_scale_t() {}
+/**
+* @brief index for val
+* @param val value
+* @return index
+*/
+size_t pos(T const& val) const {
+return static_cast<size_t>(std::floor((val - this->_low)/ _div));
+}
+
+virtual void toVelocyPack(VPackBuilder& b) const {
+b.add("scale-type", VPackValue("linear"));
+scale_t<T>::toVelocyPack(b);
+}
+private:
+T _base, _div;
+};
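`lin_scale_t` splits `[low, high)` into `n` buckets of equal width `(high - low) / n` and finds a bucket with a single division. A self-contained sketch of the same idea, again with the clamping from `Histogram::count`, might look like this; `LinScale` is a hypothetical name, and it computes boundaries as `low + k * div` rather than by repeated addition, which avoids accumulating rounding error when the width is not exactly representable.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical linear scale modeled on lin_scale_t in this diff.
struct LinScale {
  double low, high;
  std::size_t n;               // number of buckets
  double div;                  // bucket width
  std::vector<double> delims;  // the n - 1 inner boundaries

  LinScale(double lo, double hi, std::size_t buckets)
      : low(lo), high(hi), n(buckets),
        div((hi - lo) / static_cast<double>(buckets)) {
    assert(buckets > 1 && div > 0);
    for (std::size_t k = 1; k < buckets; ++k) {
      delims.push_back(lo + static_cast<double>(k) * div);
    }
  }

  // Raw index for values inside [delims.front(), delims.back()).
  std::size_t pos(double val) const {
    return static_cast<std::size_t>(std::floor((val - low) / div));
  }

  // Index after the clamping that Histogram::count applies.
  std::size_t bucket(double val) const {
    if (val < delims.front()) return 0;
    if (val >= delims.back()) return n - 1;
    return pos(val);
  }
};
```

Taking the Agency histogram's previous parameters (0 to 20 ms, 10 buckets) as an example, these rules give a width of 2 ms, so 5 ms falls into bucket 2 and anything at or above 18 ms into bucket 9.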
+
+
+template<typename ... Args>
+std::string strfmt (std::string const& format, Args ... args) {
+size_t size = snprintf( nullptr, 0, format.c_str(), args ... ) + 1;
+if( size <= 0 ) {
+throw std::runtime_error( "Error during formatting." );
+}
+std::unique_ptr<char[]> buf(new char[size]);
+snprintf(buf.get(), size, format.c_str(), args ...);
+return std::string(buf.get(), buf.get() + size - 1); // We don't want the '\0' inside
+}
+
+
+
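`strfmt` uses the common two-pass `snprintf` idiom: a first call with a null buffer and size 0 returns the number of characters the output needs (excluding the terminating `'\0'`), and a second call formats into a buffer of that size plus one. The sketch below shows the same idiom; it keeps the measured length as `int`, so any negative return value from `snprintf` (an encoding error) is detected before it reaches unsigned `size_t` arithmetic. As with any `printf`-style wrapper, each argument must match its format specifier, for example a `const char*` rather than a `std::string` for `%s`.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

template <typename... Args>
std::string strfmt(std::string const& format, Args... args) {
  // First pass: measure the formatted length, excluding the '\0'.
  int len = std::snprintf(nullptr, 0, format.c_str(), args...);
  if (len < 0) {
    throw std::runtime_error("Error during formatting.");
  }
  // Second pass: format into a buffer with room for the '\0'.
  std::vector<char> buf(static_cast<std::size_t>(len) + 1);
  std::snprintf(buf.data(), buf.size(), format.c_str(), args...);
  // Copy exactly `len` characters so the terminator is not included.
  return std::string(buf.data(), static_cast<std::size_t>(len));
}
```

For example, `strfmt("%s=%d", "x", 42)` yields `"x=42"`.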
/**
* @brief Histogram functionality
*/
-template<typename T> class Histogram : public Metric {
+template<typename Scale> class Histogram : public Metric {

public:

+using value_type = typename Scale::value_type;
+
Histogram() = delete;

-Histogram (size_t const& buckets, T const& low, T const& high, std::string const& name, std::string const& help = "")
-: Metric(name, help), _c(Metrics::hist_type(buckets)), _low(low), _high(high),
-_lowr(std::numeric_limits<T>::max()), _highr(std::numeric_limits<T>::min()) {
-TRI_ASSERT(_c.size() > 0);
-_n = _c.size() - 1;
-_div = (high - low) / (double)_c.size();
-TRI_ASSERT(_div != 0);
-}
+Histogram (Scale scale, std::string const& name, std::string const& help = "")
+: Metric(name, help), _c(Metrics::hist_type(scale.n())), _scale(std::move(scale)),
+_lowr(std::numeric_limits<value_type>::max()),
+_highr(std::numeric_limits<value_type>::min()),
+_n(scale.n()-1) {}

~Histogram() = default;

-void records(T const& t) {
-if(t < _lowr) {
-_lowr = t;
-} else if (t > _highr) {
-_highr = t;
+void records(value_type const& val) {
+if(val < _lowr) {
+_lowr = val;
+} else if (val > _highr) {
+_highr = val;
}
}

-size_t pos(T const& t) const {
-return static_cast<size_t>(std::floor((t - _low)/ _div));
+size_t pos(value_type const& t) const {
+return _scale.pos(t);
}

-void count(T const& t) {
+void count(value_type const& t) {
count(t, 1);
}

-void count(T const& t, uint64_t n) {
-if (t < _low) {
+void count(value_type const& t, uint64_t n) {
+if (t < _scale.delims().front()) {
_c[0] += n;
-} else if (t >= _high) {
+} else if (t >= _scale.delims().back()) {
_c[_n] += n;
} else {
_c[pos(t)] += n;
}
records(t);
}
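`Histogram::count` only consults `Scale::pos` for values between the first and last inner boundaries: anything below the first boundary is added to bucket 0 and anything at or above the last one to the top bucket. The sketch below shows that dispatch with a hypothetical minimal linear scale (`TinyLinearScale` and `SketchHistogram` are illustrative names, not ArangoDB types). It also tracks the extremes with two independent comparisons and seeds the maximum with `std::numeric_limits<double>::lowest()`: for floating-point types `std::numeric_limits<T>::min()` is the smallest positive normalized value rather than the most negative one, and chaining the two comparisons with `else if` means a single value can never update both extremes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical minimal linear scale: n buckets over [low, high).
struct TinyLinearScale {
  double low, width;
  std::size_t n;
  std::vector<double> delims;  // the n - 1 inner boundaries
  TinyLinearScale(double lo, double hi, std::size_t buckets)
      : low(lo), width((hi - lo) / static_cast<double>(buckets)), n(buckets) {
    assert(buckets > 1 && width > 0);
    for (std::size_t k = 1; k < buckets; ++k) {
      delims.push_back(lo + static_cast<double>(k) * width);
    }
  }
  std::size_t pos(double v) const {
    return static_cast<std::size_t>(std::floor((v - low) / width));
  }
};

// Counting logic modeled on Histogram::count in this diff.
template <typename Scale>
class SketchHistogram {
 public:
  explicit SketchHistogram(Scale scale)
      : _scale(std::move(scale)), _c(_scale.n, std::uint64_t{0}) {}

  void count(double v, std::uint64_t k = 1) {
    if (v < _scale.delims.front()) {
      _c.front() += k;  // first bucket also absorbs underflow
    } else if (v >= _scale.delims.back()) {
      _c.back() += k;   // top bucket is open-ended
    } else {
      _c[_scale.pos(v)] += k;
    }
    // Independent comparisons: the first value updates both extremes.
    if (v < _min) _min = v;
    if (v > _max) _max = v;
  }

  std::uint64_t operator[](std::size_t i) const { return _c[i]; }
  double min() const { return _min; }
  double max() const { return _max; }

 private:
  Scale _scale;  // declared before _c, so it is initialized first
  std::vector<std::uint64_t> _c;
  double _min = std::numeric_limits<double>::max();
  double _max = std::numeric_limits<double>::lowest();
};
```

Counting -3, 3, 9 and 100 (the last with weight 2) into five buckets over [0, 10) fills bucket 0 once, bucket 1 once and bucket 4 three times.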

-T const& low() const { return _low; }
-T const& high() const { return _high; }
+value_type const& low() const { return _scale.low(); }
+value_type const& high() const { return _scale.high(); }

Metrics::hist_type::value_type& operator[](size_t n) {
return _c[n];
@@ -205,36 +384,33 @@ template<typename T> class Histogram : public Metric {
virtual void toPrometheus(std::string& result) const override {
result += "#TYPE " + name() + " histogram\n";
result += "#HELP " + name() + " " + help() + "\n";
-T le = _low;
-T sum = T(0);
+value_type sum(0);
for (size_t i = 0; i < size(); ++i) {
uint64_t n = load(i);
sum += n;
-result += name() + "_bucket{le=\"" + std::to_string(le) + "\"} " +
+result += name() + "_bucket{le=\"" + std::to_string(_scale.delim(i)) + "\"} " +
std::to_string(n) + "\n";
-le += _div;
}
result += name() + "_count " + std::to_string(sum) + "\n";
}
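In the Prometheus text exposition format, each `_bucket{le="..."}` sample is cumulative (it counts all observations less than or equal to `le`), the bucket series ends with `le="+Inf"`, a histogram also exposes `_sum` and `_count`, and the metadata lines are written `# HELP` and `# TYPE`. The sketch below renders per-bucket counts, like those held in `Histogram::_c`, in that form. `histogramToPrometheus` is a hypothetical free function, and the sum of observed values is passed in because the `Histogram` class in this diff does not track it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical free function: renders one histogram in the Prometheus text
// exposition format. `upper` holds the finite upper bounds of every bucket
// but the last; `counts` holds per-bucket (non-cumulative) counts, with one
// entry more than `upper`. The last bucket is emitted as le="+Inf".
std::string histogramToPrometheus(std::string const& name,
                                  std::string const& help,
                                  std::vector<double> const& upper,
                                  std::vector<std::uint64_t> const& counts,
                                  double sum) {
  assert(counts.size() == upper.size() + 1);
  std::string out;
  out += "# HELP " + name + " " + help + "\n";
  out += "# TYPE " + name + " histogram\n";
  std::uint64_t cumulative = 0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    cumulative += counts[i];  // each bucket includes all lower buckets
    std::string le =
        i < upper.size() ? std::to_string(upper[i]) : std::string("+Inf");
    out += name + "_bucket{le=\"" + le + "\"} " +
           std::to_string(cumulative) + "\n";
  }
  out += name + "_sum " + std::to_string(sum) + "\n";
  out += name + "_count " + std::to_string(cumulative) + "\n";
  return out;
}
```

For upper bounds {1, 2} and per-bucket counts {3, 4, 5}, the bucket lines read 3, 7 and 12, and `_count` is 12.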

std::ostream& print(std::ostream& o) const {
-o << "_div: " << _div << ", _c: " << _c << ", _r: [" << _lowr << ", " << _highr << "] " << name();
+o << name() << " scale: " << _scale << " extremes: [" << _lowr << ", " << _highr << "]";
return o;
}

private:
Metrics::hist_type _c;
-T _low, _high, _div, _lowr, _highr;
+Scale _scale;
+value_type _lowr, _highr;
size_t _n;

};


std::ostream& operator<< (std::ostream&, Metrics::counter_type const&);
template<typename T>
std::ostream& operator<<(std::ostream& o, Histogram<T> const& h) {
return h.print(o);
}


#endif
4 changes: 4 additions & 0 deletions arangod/RestServer/MetricsFeature.cpp
@@ -75,6 +75,10 @@ bool MetricsFeature::exportAPI() const {
void MetricsFeature::validateOptions(std::shared_ptr<ProgramOptions>) {}

void MetricsFeature::toPrometheus(std::string& result) const {
+
+// minimize reallocs
+result.reserve(65536);
+
{
std::lock_guard<std::mutex> guard(_lock);
for (auto const& i : _registry) {