Improve supportability (#14639) · arangodb/arangodb@e6b1770

Commit e6b1770

jsteemann and Simran-B authored
Improve supportability (#14639)
* Improve supportability
* Added metrics for the number of errors and warnings logged:
  - `arangodb_logger_warnings_total`: total number of warnings (WARN messages) logged since server start.
  - `arangodb_logger_errors_total`: total number of errors (ERR messages) logged since server start.
* Added REST API `/_admin/support-info` to retrieve deployment information. As this API may reveal sensitive data about the deployment, it can only be accessed from inside the system database and only if `--server.harden` is not set. The API can also be turned off entirely using the new startup option flag `--server.support-api`.
* let support-info requests through
* handle active failover cases
* adjust CHANGELOG
* added tests
* added tests for startup option
* rename startup option
* remove redundant code
* extend support info API tests
* add tests
* remove leftover conflict marker
* Add DocuBlock with single server and cluster examples
* HTTP 404 in case the support info API is turned off
  HTTP 403 if permissions are insufficient, but that error can be raised for any endpoint

Co-authored-by: Simran Spiller <simran@arangodb.com>
1 parent be8b157 commit e6b1770

43 files changed, +1266 -63 lines changed

CHANGELOG

Lines changed: 25 additions & 2 deletions
@@ -1,10 +1,33 @@
 devel
 -----

+* Added metrics for the number of errors and warnings logged:
+  - `arangodb_logger_warnings_total`: total number of warnings (WARN messages)
+    logged since server start.
+  - `arangodb_logger_errors_total`: total number of errors (ERR messages)
+    logged since server start.
+
+* Added REST API `/_admin/support-info` to retrieve deployment information.
+  As this API may reveal sensitive data about the deployment, it can only
+  be accessed from inside the system database. In addition, there is a
+  policy control startup option `--server.support-info-api` that
+  determines if and to whom the API is made available. This option can
+  have the following values:
+  - `disabled`: support info API is disabled.
+  - `jwt`: support info API can only be accessed via superuser JWT.
+  - `hardened`: if `--server.harden` is set, the support info API can
+    only be accessed via superuser JWT. Otherwise it can be accessed
+    by admin users only.
+  - `public`: everyone with access to `_system` database can access the
+    support info API.
+
 * Fixes a bug in the maintenance's error-handling code. A shard error would
   result in log messages like
-      WARNING [ceb1a] {maintenance} caught exception in Maintenance shards error reporting: Expecting Object
-      ERROR [c9a75] {maintenance} Error reporting in current: Expecting Object
+
+      WARNING [ceb1a] {maintenance} caught exception in Maintenance shards
+      error reporting: Expecting Object
+      ERROR [c9a75] {maintenance} Error reporting in current: Expecting Object
+
   and also prevent the maintenance from reporting the current state to the
   agency, which in turn can prevent cluster-wide progress of various actions.
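
The four policy values described in the CHANGELOG entry can be read as a small decision table. The following is a minimal sketch of that table, not the code from this commit: the `Caller` struct, its fields and the function name `maySeeSupportInfo` are hypothetical, and it assumes that a superuser JWT also counts as admin access when `--server.harden` is not set.

```cpp
#include <string>

// Hypothetical description of the caller. The field names are illustrative
// and are not taken from the ArangoDB sources.
struct Caller {
  bool superuserJwt;  // request was authenticated with a superuser JWT
  bool adminUser;     // request comes from a user with admin rights
  bool systemAccess;  // caller has access to the _system database
};

// Returns true if the caller may use the support info API under the given
// policy, following the CHANGELOG wording for --server.support-info-api.
// `hardened` mirrors whether --server.harden is set.
bool maySeeSupportInfo(std::string const& policy, bool hardened,
                       Caller const& caller) {
  if (policy == "disabled") {
    return false;  // API is turned off for everyone
  }
  if (policy == "jwt") {
    return caller.superuserJwt;
  }
  if (policy == "hardened") {
    if (hardened) {
      return caller.superuserJwt;
    }
    // Assumption: a superuser JWT is treated as at least admin access.
    return caller.adminUser || caller.superuserJwt;
  }
  if (policy == "public") {
    return caller.systemAccess;
  }
  return false;  // assumption: unknown policy values grant nothing
}
```

For example, an admin user without a superuser JWT would be accepted under `hardened` only while `--server.harden` is not set, and rejected under `jwt` in either case.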

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+@startDocuBlock get_admin_support_info
+@brief Get deployment information
+
+@RESTHEADER{GET /_admin/support-info, Get information about the deployment, getSupportInfo}
+
+@RESTDESCRIPTION
+Retrieves deployment information for support purposes. The endpoint returns data
+about the ArangoDB version used, the host (operating system, server ID, CPU and
+storage capacity, current utilization, a few metrics) and the other servers in
+the deployment (in case of Active Failover or cluster deployments).
+
+As this API may reveal sensitive data about the deployment, it can only be
+accessed from inside the `_system` database. In addition, there is a policy
+control startup option `--server.support-info-api` that controls if and to whom
+the API is made available.
+
+@RESTRETURNCODES
+
+@RESTRETURNCODE{200}
+
+@RESTREPLYBODY{date,string,required,}
+ISO 8601 datetime string of when the information was requested.
+
+@RESTREPLYBODY{deployment,object,required,}
+An object with at least a `type` attribute, indicating the deployment type.
+
+In case of a `"single"` server, additional information is provided in the
+top-level `host` attribute.
+
+In case of a `"cluster"`, there is a `servers` object that contains a nested
+object for each Coordinator and DB-Server, using the server ID as key. Each
+object holds information about the ArangoDB instance as well as the host machine.
+There are additional attributes for the number of `agents`, `coordinators`,
+`dbServers`, and `shards`.
+
+@RESTREPLYBODY{host,object,optional,}
+An object that holds information about the ArangoDB instance as well as the
+host machine. Only set in case of single servers.
+
+@RESTRETURNCODE{404}
+The support info API is turned off.
+
+@EXAMPLES
+
+Query support information from a single server
+
+@EXAMPLE_ARANGOSH_RUN{RestAdminSupportInfo}
+    var url = "/_admin/support-info";
+    var response = logCurlRequest("GET", url);
+    assert(response.code === 200);
+    assert(JSON.parse(response.body).host !== undefined);
+    logJsonResponse(response);
+@END_EXAMPLE_ARANGOSH_RUN
+
+Query support information from a cluster
+
+@EXAMPLE_ARANGOSH_RUN{RestAdminSupportInfo_cluster}
+    var url = "/_admin/support-info";
+    var response = logCurlRequest("GET", url);
+    assert(response.code === 200);
+    assert(JSON.parse(response.body).deployment.servers !== undefined);
+    logJsonResponse(response);
+@END_EXAMPLE_ARANGOSH_RUN
+
+@endDocuBlock
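
The reply body described in the DocuBlock differs by deployment type: a single server reports its details in the top-level `host` attribute, while a cluster reports one nested object per server ID under `deployment.servers`. The sketch below models that shape with plain structs to show where a client might look. It is a heavily reduced, hypothetical model: only the attribute names `date`, `deployment`, `type`, `servers` and `host` come from the DocuBlock, and `InstanceInfo`, `role`, `hasHost` and `instanceKeys` are made up for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-instance object in the reply.
struct InstanceInfo {
  std::string role;  // illustrative field, not taken from the API description
};

struct Deployment {
  std::string type;                             // "single" or "cluster"
  std::map<std::string, InstanceInfo> servers;  // cluster only, keyed by server ID
};

struct SupportInfo {
  std::string date;      // ISO 8601 datetime string
  Deployment deployment;
  bool hasHost = false;  // host is optional: only set for single servers
  InstanceInfo host;
};

// Lists where per-instance details can be found in a reply: the top-level
// "host" attribute for a single server, or one entry per server ID otherwise.
std::vector<std::string> instanceKeys(SupportInfo const& info) {
  std::vector<std::string> keys;
  if (info.deployment.type == "single" && info.hasHost) {
    keys.push_back("host");
  }
  for (auto const& entry : info.deployment.servers) {
    keys.push_back("deployment.servers." + entry.first);
  }
  return keys;
}
```

This mirrors the two assertions in the arangosh examples above: the single server example checks `host`, the cluster example checks `deployment.servers`.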
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+name: arangodb_logger_errors_total
+introducedIn: "3.9.0"
+help: |
+  Total number of errors logged.
+unit: number
+type: counter
+category: Errors
+complexity: simple
+exposedBy:
+  - agent
+  - coordinator
+  - dbserver
+  - single
+description: |
+  Total number of errors (ERR messages) logged by the logger.
+
+  If a problem is encountered which is fatal to some operation, but not for
+  the service or the application as a whole, then an _error is logged.
+
+  Reasons for log entries of this severity are for example include missing
+  data, inability to open required files, incorrect connection strings,
+  missing services.
+
+  If an error is logged then it should be taken seriously as it may require
+  user intervention to solve.
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+name: arangodb_logger_warnings_total
+introducedIn: "3.9.0"
+help: |
+  Total number of warnings logged.
+unit: number
+type: counter
+category: Errors
+complexity: simple
+exposedBy:
+  - agent
+  - coordinator
+  - dbserver
+  - single
+description: |
+  Total number of warnings (WARN messages) logged by the logger,
+  including startup warnings.
+
+  Warnings might indicate problems, or might not. For example,
+  expected transient environmental conditions such as short loss of
+  network or database connectivity are logged as warnings, not errors.
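
Both metrics are declared as `type: counter`, so they only ever increase from zero at server start. A minimal sketch of that behavior follows, assuming one counter per severity that is bumped whenever a message of that level is logged. The `LogLevel` enum, the `LoggerCounters` struct and `onMessage` are hypothetical names and do not reflect how the ArangoDB logger is wired up.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative set of log levels; only WARN and ERR matter for the metrics.
enum class LogLevel { Fatal, Err, Warn, Info, Debug, Trace };

// Hypothetical holder for the two counters described by the metric files.
struct LoggerCounters {
  std::atomic<std::uint64_t> warningsTotal{0};  // arangodb_logger_warnings_total
  std::atomic<std::uint64_t> errorsTotal{0};    // arangodb_logger_errors_total

  // Called once per logged message. Relaxed ordering is enough here because
  // the counters are independent statistics and guard no other data.
  void onMessage(LogLevel level) noexcept {
    if (level == LogLevel::Warn) {
      warningsTotal.fetch_add(1, std::memory_order_relaxed);
    } else if (level == LogLevel::Err) {
      errorsTotal.fetch_add(1, std::memory_order_relaxed);
    }
  }
};
```

Because counters are never reset while the process runs, a monitoring system would typically look at the rate of change rather than the absolute value.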

arangod/CMakeLists.txt

Lines changed: 8 additions & 1 deletion
@@ -742,6 +742,7 @@ set(LIB_ARANGOSERVER_SOURCES
   RestHandler/RestSimpleQueryHandler.cpp
   RestHandler/RestStatusHandler.cpp
   RestHandler/RestSupervisionStateHandler.cpp
+  RestHandler/RestSupportInfoHandler.cpp
   RestHandler/RestSystemReportHandler.cpp
   RestHandler/RestTasksHandler.cpp
   RestHandler/RestTimeHandler.cpp
@@ -761,13 +762,16 @@ set(LIB_ARANGOSERVER_SOURCES
   RestServer/DatabaseFeature.cpp
   RestServer/DatabasePathFeature.cpp
   RestServer/EndpointFeature.cpp
+  RestServer/EnvironmentFeature.cpp
   RestServer/FileDescriptorsFeature.cpp
   RestServer/FlushFeature.cpp
   RestServer/FortuneFeature.cpp
   RestServer/FrontendFeature.cpp
   RestServer/InitDatabaseFeature.cpp
   RestServer/LanguageCheckFeature.cpp
   RestServer/LockfileFeature.cpp
+  RestServer/LogBufferFeature.cpp
+  RestServer/MaxMapCountFeature.cpp
   RestServer/QueryRegistryFeature.cpp
   RestServer/ScriptFeature.cpp
   RestServer/ServerFeature.cpp
@@ -820,7 +824,10 @@ if (USE_MAINTAINER_MODE)
 endif()

 if (NOT MSVC)
-  set(LIB_ARANGOSERVER_SOURCES ${LIB_ARANGOSERVER_SOURCES} GeneralServer/AcceptorUnixDomain.cpp)
+  set(LIB_ARANGOSERVER_SOURCES ${LIB_ARANGOSERVER_SOURCES}
+    GeneralServer/AcceptorUnixDomain.cpp
+    RestServer/DaemonFeature.cpp
+    RestServer/SupervisorFeature.cpp)
 endif()

 include(ClusterEngine/CMakeLists.txt)

arangod/FeaturePhases/BasicFeaturePhaseServer.cpp

Lines changed: 4 additions & 4 deletions
@@ -23,17 +23,17 @@

 #include "BasicFeaturePhaseServer.h"

-#include "ApplicationFeatures/DaemonFeature.h"
-#include "ApplicationFeatures/EnvironmentFeature.h"
 #include "ApplicationFeatures/GreetingsFeaturePhase.h"
 #include "ApplicationFeatures/LanguageFeature.h"
-#include "ApplicationFeatures/MaxMapCountFeature.h"
 #include "ApplicationFeatures/NonceFeature.h"
 #include "ApplicationFeatures/PrivilegeFeature.h"
-#include "ApplicationFeatures/SupervisorFeature.h"
 #include "ApplicationFeatures/TempFeature.h"
+#include "RestServer/DaemonFeature.h"
 #include "RestServer/DatabasePathFeature.h"
+#include "RestServer/EnvironmentFeature.h"
 #include "RestServer/FileDescriptorsFeature.h"
+#include "RestServer/MaxMapCountFeature.h"
+#include "RestServer/SupervisorFeature.h"
 #include "Scheduler/SchedulerFeature.h"
 #include "Sharding/ShardingFeature.h"
 #include "Ssl/SslFeature.h"

arangod/GeneralServer/CommTask.cpp

Lines changed: 1 addition & 0 deletions
@@ -191,6 +191,7 @@ CommTask::Flow CommTask::prepareExecution(auth::TokenCache::Entry const& authTok
         !::startsWith(path, "/_admin/server/") &&
         !::startsWith(path, "/_admin/status") &&
         !::startsWith(path, "/_admin/statistics") &&
+        !::startsWith(path, "/_admin/support-info") &&
         !::startsWith(path, "/_api/agency/agency-callbacks") &&
         !(req.requestType() == RequestType::GET && ::startsWith(path, "/_api/collection")) &&
         !::startsWith(path, "/_api/cluster/") &&
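
The hunk above adds `/_admin/support-info` to a chain of negated prefix checks. The surrounding condition is not visible in this excerpt, so the sketch below only illustrates the general technique of a path prefix allowlist, assuming the listed paths are exempt from some restriction applied to all other requests. The names `startsWith` and `isExemptPath` are written for this example, and the method-specific `/_api/collection` case is left out for brevity.

```cpp
#include <string>
#include <vector>

// Returns true if `path` begins with `prefix`. A path shorter than the prefix
// never matches, because compare() then sees a shorter substring.
bool startsWith(std::string const& path, std::string const& prefix) {
  return path.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: true if the request path matches one of the prefixes
// visible in the hunk above.
bool isExemptPath(std::string const& path) {
  static std::vector<std::string> const prefixes = {
      "/_admin/server/",      "/_admin/status",
      "/_admin/statistics",   "/_admin/support-info",
      "/_api/agency/agency-callbacks", "/_api/cluster/"};
  for (auto const& prefix : prefixes) {
    if (startsWith(path, prefix)) {
      return true;
    }
  }
  return false;
}
```

Note that a prefix test is broader than an exact match: any path that merely begins with `/_admin/support-info` would also pass this check.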

arangod/GeneralServer/GeneralServerFeature.cpp

Lines changed: 20 additions & 2 deletions
@@ -84,9 +84,10 @@
 #include "RestHandler/RestShutdownHandler.h"
 #include "RestHandler/RestSimpleHandler.h"
 #include "RestHandler/RestSimpleQueryHandler.h"
-#include "RestHandler/RestSystemReportHandler.h"
 #include "RestHandler/RestStatusHandler.h"
 #include "RestHandler/RestSupervisionStateHandler.h"
+#include "RestHandler/RestSupportInfoHandler.h"
+#include "RestHandler/RestSystemReportHandler.h"
 #include "RestHandler/RestTasksHandler.h"
 #include "RestHandler/RestTestHandler.h"
 #include "RestHandler/RestTimeHandler.h"
@@ -125,6 +126,7 @@ GeneralServerFeature::GeneralServerFeature(application_features::ApplicationServ
       _proxyCheck(true),
       _permanentRootRedirect(true),
       _redirectRootTo("/_admin/aardvark/index.html"),
+      _supportInfoApiPolicy("hardened"),
       _numIoThreads(0) {
   setOptional(true);
   startsAfter<application_features::AqlFeaturePhase>();
@@ -150,9 +152,16 @@ void GeneralServerFeature::collectOptions(std::shared_ptr<ProgramOptions> option
   options->addOldOption("no-server", "server.rest-server");

   options->addOption("--server.io-threads",
-                     "Number of threads used to handle IO",
+                     "number of threads used to handle IO",
                      new UInt64Parameter(&_numIoThreads),
                      arangodb::options::makeDefaultFlags(arangodb::options::Flags::Dynamic));
+
+  options->addOption("--server.support-info-api",
+                     "policy for exposing support info API",
+                     new DiscreteValuesParameter<StringParameter>(
+                         &_supportInfoApiPolicy,
+                         std::unordered_set<std::string>{"disabled", "jwt", "hardened", "public"}))
+                     .setIntroducedIn(30900);

   options->addSection("http", "HTTP server features");

@@ -323,6 +332,10 @@ bool GeneralServerFeature::permanentRootRedirect() const {
 std::string GeneralServerFeature::redirectRootTo() const {
   return _redirectRootTo;
 }
+
+std::string const& GeneralServerFeature::supportInfoApiPolicy() const noexcept {
+  return _supportInfoApiPolicy;
+}

 rest::RestHandlerFactory& GeneralServerFeature::handlerFactory() {
   return *_handlerFactory;
@@ -554,6 +567,11 @@ void GeneralServerFeature::defineHandlers() {

   _handlerFactory->addHandler("/_admin/status",
                               RestHandlerCreator<RestStatusHandler>::createNoData);
+
+  if (_supportInfoApiPolicy != "disabled") {
+    _handlerFactory->addHandler("/_admin/support-info",
+                                RestHandlerCreator<RestSupportInfoHandler>::createNoData);
+  }

   _handlerFactory->addHandler("/_admin/system-report",
                               RestHandlerCreator<RestSystemReportHandler>::createNoData);
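
In `defineHandlers()` the support info handler is only registered when the policy is not `disabled`, which fits the documented HTTP 404 for a turned-off API: with no handler registered, the route simply does not exist. The sketch below illustrates that idea with a toy registry. `HandlerRegistry`, `dispatch` and `buildRegistry` are hypothetical names, handlers are reduced to functions returning a status code, and the exact-match lookup is a simplification that is not claimed to match `RestHandlerFactory`.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy route table: maps a path to a handler that returns an HTTP status code.
class HandlerRegistry {
 public:
  void addHandler(std::string const& path, std::function<int()> handler) {
    _handlers[path] = std::move(handler);
  }

  // Unregistered paths yield 404, as for any unknown route.
  int dispatch(std::string const& path) const {
    auto it = _handlers.find(path);
    if (it == _handlers.end()) {
      return 404;
    }
    return it->second();
  }

 private:
  std::map<std::string, std::function<int()>> _handlers;
};

// Mirrors the conditional registration in the hunk above.
HandlerRegistry buildRegistry(std::string const& supportInfoApiPolicy) {
  HandlerRegistry registry;
  registry.addHandler("/_admin/status", [] { return 200; });
  if (supportInfoApiPolicy != "disabled") {
    registry.addHandler("/_admin/support-info", [] { return 200; });
  }
  return registry;
}
```

With this layout the `disabled` case needs no check inside the handler itself, while the finer-grained policies (`jwt`, `hardened`, `public`) would still have to be enforced when a request arrives.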

arangod/GeneralServer/GeneralServerFeature.h

Lines changed: 2 additions & 0 deletions
@@ -52,6 +52,7 @@ class GeneralServerFeature final : public application_features::ApplicationFeatu
   Result reloadTLS();
   bool permanentRootRedirect() const;
   std::string redirectRootTo() const;
+  std::string const& supportInfoApiPolicy() const noexcept;

   rest::RestHandlerFactory& handlerFactory();
   rest::AsyncJobManager& jobManager();
@@ -69,6 +70,7 @@ class GeneralServerFeature final : public application_features::ApplicationFeatu
   std::vector<std::string> _trustedProxies;
   std::vector<std::string> _accessControlAllowOrigins;
   std::string _redirectRootTo;
+  std::string _supportInfoApiPolicy;
   std::unique_ptr<rest::RestHandlerFactory> _handlerFactory;
   std::unique_ptr<rest::AsyncJobManager> _jobManager;
   std::vector<std::unique_ptr<rest::GeneralServer>> _servers;

arangod/RestHandler/RestAdminClusterHandler.cpp

Lines changed: 1 addition & 1 deletion
@@ -1521,7 +1521,7 @@ RestStatus RestAdminClusterHandler::handleHealth() {
                        AsyncAgencyComm::RequestType::READ, VPackBuffer<uint8_t>())
       .thenValue([self](AsyncAgencyCommResult&& result) {
         // this lambda has to capture self since collect returns early on an
-        // exception and the RestHandle might be freed too early otherwise
+        // exception and the RestHandler might be freed too early otherwise

         if (result.fail() || result.statusCode() != fuerte::StatusOK) {
           THROW_ARANGO_EXCEPTION(result.asResult());

arangod/RestHandler/RestAdminLogHandler.cpp

Lines changed: 1 addition & 1 deletion
@@ -35,12 +35,12 @@
 #include "Cluster/ServerState.h"
 #include "GeneralServer/AuthenticationFeature.h"
 #include "GeneralServer/ServerSecurityFeature.h"
-#include "Logger/LogBufferFeature.h"
 #include "Logger/Logger.h"
 #include "Logger/LoggerFeature.h"
 #include "Network/Methods.h"
 #include "Network/NetworkFeature.h"
 #include "Network/Utils.h"
+#include "RestServer/LogBufferFeature.h"
 #include "Utils/ExecContext.h"

 using namespace arangodb;
