Improve supportability by jsteemann · Pull Request #14639 · arangodb/arangodb · GitHub

Improve supportability #14639

Merged 16 commits on Aug 18, 2021
27 changes: 25 additions & 2 deletions CHANGELOG
@@ -1,10 +1,33 @@
devel
-----

+ * Added metrics for the number of errors and warnings logged:
+ - `arangodb_logger_warnings_total`: total number of warnings (WARN messages)
+ logged since server start.
+ - `arangodb_logger_errors_total`: total number of errors (ERR messages)
+ logged since server start.
+
+ * Added REST API `/_admin/support-info` to retrieve deployment information.
+ As this API may reveal sensitive data about the deployment, it can only
+ be accessed from inside the system database. In addition, there is a
+ policy control startup option `--server.support-info-api` that
+ determines if and to whom the API is made available. This option can
+ have the following values:
+ - `disabled`: support info API is disabled.
+ - `jwt`: support info API can only be accessed via superuser JWT.
+ - `hardened`: if `--server.harden` is set, the support info API can
+ only be accessed via superuser JWT. Otherwise it can be accessed
+ by admin users only.
+ - `public`: everyone with access to `_system` database can access the
+ support info API.
+
* Fixes a bug in the maintenance's error-handling code. A shard error would
result in log messages like
- WARNING [ceb1a] {maintenance} caught exception in Maintenance shards error reporting: Expecting Object
- ERROR [c9a75] {maintenance} Error reporting in current: Expecting Object
+
+ WARNING [ceb1a] {maintenance} caught exception in Maintenance shards
+ error reporting: Expecting Object
+ ERROR [c9a75] {maintenance} Error reporting in current: Expecting Object
+
and also prevent the maintenance from reporting the current state to the
agency, which in turn can prevent cluster-wide progress of various actions.
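
The four `--server.support-info-api` values described in the CHANGELOG entry above can be read as a small decision function. The following is only a sketch under stated assumptions, not the code of `RestSupportInfoHandler`: the `RequestContext` struct and the function name `maySeeSupportInfo` are hypothetical, and it assumes that a superuser JWT is also accepted wherever admin users are.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of one request; these names are illustrative
// and do not come from the ArangoDB code base.
struct RequestContext {
  bool inSystemDatabase;  // request was sent to the _system database
  bool isSuperuserJwt;    // request is authenticated with a superuser JWT
  bool isAdminUser;       // authenticated user has admin rights
};

// Returns true if the request may use the support info API under `policy`.
// `hardened` mirrors whether --server.harden is set.
bool maySeeSupportInfo(std::string const& policy, bool hardened,
                       RequestContext const& req) {
  if (!req.inSystemDatabase || policy == "disabled") {
    return false;
  }
  if (policy == "jwt") {
    return req.isSuperuserJwt;
  }
  if (policy == "hardened") {
    // Assumption: a superuser JWT is also accepted when not hardened.
    return hardened ? req.isSuperuserJwt
                    : (req.isSuperuserJwt || req.isAdminUser);
  }
  // "public": everyone with access to _system; unknown values are rejected.
  return policy == "public";
}
```

With the default policy `hardened`, an admin user would be let through on a server started without `--server.harden` and rejected on one started with it.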

@@ -0,0 +1,65 @@
@startDocuBlock get_admin_support_info
@brief Get deployment information

@RESTHEADER{GET /_admin/support-info, Get information about the deployment, getSupportInfo}

@RESTDESCRIPTION
Retrieves deployment information for support purposes. The endpoint returns data
about the ArangoDB version used, the host (operating system, server ID, CPU and
storage capacity, current utilization, a few metrics) and the other servers in
the deployment (in case of Active Failover or cluster deployments).

As this API may reveal sensitive data about the deployment, it can only be
accessed from inside the `_system` database. In addition, there is a policy
control startup option `--server.support-info-api` that controls if and to whom
the API is made available.

@RESTRETURNCODES

@RESTRETURNCODE{200}

@RESTREPLYBODY{date,string,required,}
ISO 8601 datetime string of when the information was requested.

@RESTREPLYBODY{deployment,object,required,}
An object with at least a `type` attribute, indicating the deployment type.

In case of a `"single"` server, additional information is provided in the
top-level `host` attribute.

In case of a `"cluster"`, there is a `servers` object that contains a nested
object for each Coordinator and DB-Server, using the server ID as key. Each
object holds information about the ArangoDB instance as well as the host machine.
There are additional attributes for the number of `agents`, `coordinators`,
`dbServers`, and `shards`.

@RESTREPLYBODY{host,object,optional,}
An object that holds information about the ArangoDB instance as well as the
host machine. Only set in case of single servers.

@RESTRETURNCODE{404}
The support info API is turned off.

@EXAMPLES

Query support information from a single server

@EXAMPLE_ARANGOSH_RUN{RestAdminSupportInfo}
var url = "/_admin/support-info";
var response = logCurlRequest("GET", url);
assert(response.code === 200);
assert(JSON.parse(response.body).host !== undefined);
logJsonResponse(response);
@END_EXAMPLE_ARANGOSH_RUN

Query support information from a cluster

@EXAMPLE_ARANGOSH_RUN{RestAdminSupportInfo_cluster}
var url = "/_admin/support-info";
var response = logCurlRequest("GET", url);
assert(response.code === 200);
assert(JSON.parse(response.body).deployment.servers !== undefined);
logJsonResponse(response);
@END_EXAMPLE_ARANGOSH_RUN

@endDocuBlock
25 changes: 25 additions & 0 deletions Documentation/Metrics/arangodb_logger_errors_total.yaml
@@ -0,0 +1,25 @@
name: arangodb_logger_errors_total
introducedIn: "3.9.0"
help: |
Total number of errors logged.
unit: number
type: counter
category: Errors
complexity: simple
exposedBy:
- agent
- coordinator
- dbserver
- single
description: |
Total number of errors (ERR messages) logged by the logger.

If a problem is encountered which is fatal to some operation, but not for
the service or the application as a whole, then an error is logged.

Reasons for log entries of this severity include, for example, missing
data, inability to open required files, incorrect connection strings, and
missing services.

If an error is logged then it should be taken seriously as it may require
user intervention to solve.
20 changes: 20 additions & 0 deletions Documentation/Metrics/arangodb_logger_warnings_total.yaml
@@ -0,0 +1,20 @@
name: arangodb_logger_warnings_total
introducedIn: "3.9.0"
help: |
Total number of warnings logged.
unit: number
type: counter
category: Errors
complexity: simple
exposedBy:
- agent
- coordinator
- dbserver
- single
description: |
Total number of warnings (WARN messages) logged by the logger,
including startup warnings.

Warnings might indicate problems, or might not. For example,
expected transient environmental conditions such as short loss of
network or database connectivity are logged as warnings, not errors.
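
As a rough illustration of what the two counter metrics above measure, the sketch below bumps one of two monotonically increasing counters for each logged message. It is not ArangoDB's logger: `LogLevel`, `LoggerCounters` and `onMessage` are hypothetical names, and the sketch assumes that only WARN and ERR messages are counted, as the metric descriptions state.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical log levels, for illustration only.
enum class LogLevel { Fatal, Err, Warn, Info, Debug, Trace };

// Hypothetical stand-in for the two counter metrics.
struct LoggerCounters {
  std::atomic<std::uint64_t> warningsTotal{0};
  std::atomic<std::uint64_t> errorsTotal{0};

  // Called once per logged message; counters only ever increase.
  void onMessage(LogLevel level) noexcept {
    if (level == LogLevel::Warn) {
      warningsTotal.fetch_add(1, std::memory_order_relaxed);
    } else if (level == LogLevel::Err) {
      errorsTotal.fetch_add(1, std::memory_order_relaxed);
    }
  }
};
```

Because both values are counters that start at zero on server start, a monitoring system would typically look at their rate of change rather than at the absolute value.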
9 changes: 8 additions & 1 deletion arangod/CMakeLists.txt
@@ -742,6 +742,7 @@ set(LIB_ARANGOSERVER_SOURCES
RestHandler/RestSimpleQueryHandler.cpp
RestHandler/RestStatusHandler.cpp
RestHandler/RestSupervisionStateHandler.cpp
+ RestHandler/RestSupportInfoHandler.cpp
RestHandler/RestSystemReportHandler.cpp
RestHandler/RestTasksHandler.cpp
RestHandler/RestTimeHandler.cpp
@@ -761,13 +762,16 @@ set(LIB_ARANGOSERVER_SOURCES
RestServer/DatabaseFeature.cpp
RestServer/DatabasePathFeature.cpp
RestServer/EndpointFeature.cpp
+ RestServer/EnvironmentFeature.cpp
RestServer/FileDescriptorsFeature.cpp
RestServer/FlushFeature.cpp
RestServer/FortuneFeature.cpp
RestServer/FrontendFeature.cpp
RestServer/InitDatabaseFeature.cpp
RestServer/LanguageCheckFeature.cpp
RestServer/LockfileFeature.cpp
+ RestServer/LogBufferFeature.cpp
+ RestServer/MaxMapCountFeature.cpp
RestServer/QueryRegistryFeature.cpp
RestServer/ScriptFeature.cpp
RestServer/ServerFeature.cpp
@@ -820,7 +824,10 @@ if (USE_MAINTAINER_MODE)
endif()

if (NOT MSVC)
- set(LIB_ARANGOSERVER_SOURCES ${LIB_ARANGOSERVER_SOURCES} GeneralServer/AcceptorUnixDomain.cpp)
+ set(LIB_ARANGOSERVER_SOURCES ${LIB_ARANGOSERVER_SOURCES}
+ GeneralServer/AcceptorUnixDomain.cpp
+ RestServer/DaemonFeature.cpp
+ RestServer/SupervisorFeature.cpp)
endif()

include(ClusterEngine/CMakeLists.txt)
8 changes: 4 additions & 4 deletions arangod/FeaturePhases/BasicFeaturePhaseServer.cpp
@@ -23,17 +23,17 @@

#include "BasicFeaturePhaseServer.h"

- #include "ApplicationFeatures/DaemonFeature.h"
- #include "ApplicationFeatures/EnvironmentFeature.h"
#include "ApplicationFeatures/GreetingsFeaturePhase.h"
#include "ApplicationFeatures/LanguageFeature.h"
- #include "ApplicationFeatures/MaxMapCountFeature.h"
#include "ApplicationFeatures/NonceFeature.h"
#include "ApplicationFeatures/PrivilegeFeature.h"
- #include "ApplicationFeatures/SupervisorFeature.h"
#include "ApplicationFeatures/TempFeature.h"
+ #include "RestServer/DaemonFeature.h"
#include "RestServer/DatabasePathFeature.h"
+ #include "RestServer/EnvironmentFeature.h"
#include "RestServer/FileDescriptorsFeature.h"
+ #include "RestServer/MaxMapCountFeature.h"
+ #include "RestServer/SupervisorFeature.h"
#include "Scheduler/SchedulerFeature.h"
#include "Sharding/ShardingFeature.h"
#include "Ssl/SslFeature.h"
1 change: 1 addition & 0 deletions arangod/GeneralServer/CommTask.cpp
@@ -191,6 +191,7 @@ CommTask::Flow CommTask::prepareExecution(auth::TokenCache::Entry const& authTok
!::startsWith(path, "/_admin/server/") &&
!::startsWith(path, "/_admin/status") &&
!::startsWith(path, "/_admin/statistics") &&
+ !::startsWith(path, "/_admin/support-info") &&
!::startsWith(path, "/_api/agency/agency-callbacks") &&
!(req.requestType() == RequestType::GET && ::startsWith(path, "/_api/collection")) &&
!::startsWith(path, "/_api/cluster/") &&
22 changes: 20 additions & 2 deletions arangod/GeneralServer/GeneralServerFeature.cpp
@@ -84,9 +84,10 @@
#include "RestHandler/RestShutdownHandler.h"
#include "RestHandler/RestSimpleHandler.h"
#include "RestHandler/RestSimpleQueryHandler.h"
- #include "RestHandler/RestSystemReportHandler.h"
#include "RestHandler/RestStatusHandler.h"
#include "RestHandler/RestSupervisionStateHandler.h"
+ #include "RestHandler/RestSupportInfoHandler.h"
+ #include "RestHandler/RestSystemReportHandler.h"
#include "RestHandler/RestTasksHandler.h"
#include "RestHandler/RestTestHandler.h"
#include "RestHandler/RestTimeHandler.h"
@@ -125,6 +126,7 @@ GeneralServerFeature::GeneralServerFeature(application_features::ApplicationServ
_proxyCheck(true),
_permanentRootRedirect(true),
_redirectRootTo("/_admin/aardvark/index.html"),
+ _supportInfoApiPolicy("hardened"),
_numIoThreads(0) {
setOptional(true);
startsAfter<application_features::AqlFeaturePhase>();
@@ -150,9 +152,16 @@ void GeneralServerFeature::collectOptions(std::shared_ptr<ProgramOptions> option
options->addOldOption("no-server", "server.rest-server");

options->addOption("--server.io-threads",
- "Number of threads used to handle IO",
+ "number of threads used to handle IO",
new UInt64Parameter(&_numIoThreads),
arangodb::options::makeDefaultFlags(arangodb::options::Flags::Dynamic));
+
+ options->addOption("--server.support-info-api",
+ "policy for exposing support info API",
+ new DiscreteValuesParameter<StringParameter>(
+ &_supportInfoApiPolicy,
+ std::unordered_set<std::string>{"disabled", "jwt", "hardened", "public"}))
+ .setIntroducedIn(30900);

options->addSection("http", "HTTP server features");
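
The new option is registered with a fixed set of accepted strings. To illustrate the general idea of such a discrete-values parameter, here is a minimal standalone sketch. It is not ArangoDB's `DiscreteValuesParameter`: the class `DiscreteStringOption`, its `set` method and the choice to throw `std::invalid_argument` are assumptions made for this example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical helper: binds a string variable to a closed set of values.
class DiscreteStringOption {
 public:
  DiscreteStringOption(std::string* target,
                       std::unordered_set<std::string> allowed)
      : _target(target), _allowed(std::move(allowed)) {}

  // Stores `value` in the bound variable, or throws if it is not allowed.
  // On rejection the bound variable keeps its previous value.
  void set(std::string const& value) {
    if (_allowed.count(value) == 0) {
      throw std::invalid_argument("invalid value '" + value + "'");
    }
    *_target = value;
  }

 private:
  std::string* _target;
  std::unordered_set<std::string> _allowed;
};
```

Validating at option-parsing time means later code, such as the handler registration, can compare against the known strings without handling unexpected values.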

@@ -323,6 +332,10 @@ bool GeneralServerFeature::permanentRootRedirect() const {
std::string GeneralServerFeature::redirectRootTo() const {
return _redirectRootTo;
}

+ std::string const& GeneralServerFeature::supportInfoApiPolicy() const noexcept {
+ return _supportInfoApiPolicy;
+ }
+
rest::RestHandlerFactory& GeneralServerFeature::handlerFactory() {
return *_handlerFactory;
@@ -554,6 +567,11 @@ void GeneralServerFeature::defineHandlers() {

_handlerFactory->addHandler("/_admin/status",
RestHandlerCreator<RestStatusHandler>::createNoData);

+ if (_supportInfoApiPolicy != "disabled") {
+ _handlerFactory->addHandler("/_admin/support-info",
+ RestHandlerCreator<RestSupportInfoHandler>::createNoData);
+ }
+
_handlerFactory->addHandler("/_admin/system-report",
RestHandlerCreator<RestSystemReportHandler>::createNoData);
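
The hunk above registers the `/_admin/support-info` handler only when the policy is not `disabled`, and the endpoint documentation in this PR lists HTTP 404 for "the support info API is turned off". One plausible reading is that an unregistered path is simply unknown to the server. The sketch below models that with a toy exact-match lookup. `TinyHandlerFactory`, `dispatch` and `makeFactory` are hypothetical names, and real routing in ArangoDB may work differently.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical miniature of a handler factory: maps a path to a callable
// that produces an HTTP status code.
class TinyHandlerFactory {
 public:
  void addHandler(std::string const& path, std::function<int()> handler) {
    _handlers[path] = std::move(handler);
  }

  // Returns the handler's status code, or 404 if no handler is registered.
  int dispatch(std::string const& path) const {
    auto it = _handlers.find(path);
    return it == _handlers.end() ? 404 : it->second();
  }

 private:
  std::map<std::string, std::function<int()>> _handlers;
};

// Mirrors the conditional registration: the support-info path is only
// added when the policy string is not "disabled".
TinyHandlerFactory makeFactory(std::string const& supportInfoApiPolicy) {
  TinyHandlerFactory factory;
  factory.addHandler("/_admin/status", [] { return 200; });
  if (supportInfoApiPolicy != "disabled") {
    factory.addHandler("/_admin/support-info", [] { return 200; });
  }
  return factory;
}
```

Deciding at registration time keeps the `disabled` case out of the handler itself, which then only has to distinguish the `jwt`, `hardened` and `public` policies.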
2 changes: 2 additions & 0 deletions arangod/GeneralServer/GeneralServerFeature.h
@@ -52,6 +52,7 @@ class GeneralServerFeature final : public application_features::ApplicationFeatu
Result reloadTLS();
bool permanentRootRedirect() const;
std::string redirectRootTo() const;
+ std::string const& supportInfoApiPolicy() const noexcept;

rest::RestHandlerFactory& handlerFactory();
rest::AsyncJobManager& jobManager();
@@ -69,6 +70,7 @@ class GeneralServerFeature final : public application_features::ApplicationFeatu
std::vector<std::string> _trustedProxies;
std::vector<std::string> _accessControlAllowOrigins;
std::string _redirectRootTo;
+ std::string _supportInfoApiPolicy;
std::unique_ptr<rest::RestHandlerFactory> _handlerFactory;
std::unique_ptr<rest::AsyncJobManager> _jobManager;
std::vector<std::unique_ptr<rest::GeneralServer>> _servers;
2 changes: 1 addition & 1 deletion arangod/RestHandler/RestAdminClusterHandler.cpp
@@ -1521,7 +1521,7 @@ RestStatus RestAdminClusterHandler::handleHealth() {
AsyncAgencyComm::RequestType::READ, VPackBuffer<uint8_t>())
.thenValue([self](AsyncAgencyCommResult&& result) {
// this lambda has to capture self since collect returns early on an
- // exception and the RestHandle might be freed too early otherwise
+ // exception and the RestHandler might be freed too early otherwise

if (result.fail() || result.statusCode() != fuerte::StatusOK) {
THROW_ARANGO_EXCEPTION(result.asResult());
2 changes: 1 addition & 1 deletion arangod/RestHandler/RestAdminLogHandler.cpp
@@ -35,12 +35,12 @@
#include "Cluster/ServerState.h"
#include "GeneralServer/AuthenticationFeature.h"
#include "GeneralServer/ServerSecurityFeature.h"
- #include "Logger/LogBufferFeature.h"
#include "Logger/Logger.h"
#include "Logger/LoggerFeature.h"
#include "Network/Methods.h"
#include "Network/NetworkFeature.h"
#include "Network/Utils.h"
+ #include "RestServer/LogBufferFeature.h"
#include "Utils/ExecContext.h"

using namespace arangodb;