Prometheus is an open-source monitoring and alerting toolkit that has grown in popularity alongside Kubernetes. Originally built at SoundCloud, Prometheus traces its roots to Borgmon, a monitoring system at Google.
Prometheus is a leading proponent of collecting metrics from your applications and infrastructure using a pull-based model. This means that Prometheus periodically sends an HTTP request to each of your targets and stores the metrics it collects in a time-series database. In self-managed Prometheus instances, you provide the infrastructure to collect, store, and query the metrics. In managed versions of Prometheus, a vendor provides all of the infrastructure and a Service Level Agreement (SLA) for uptime of the service.
Prometheus provides a query language called PromQL for querying your time-series data. Other tools in the ecosystem build on it for analytical and operational tasks: Grafana uses PromQL to visualize data, and Prometheus alerting rules are PromQL expressions whose firing alerts are passed to Alertmanager for grouping, routing, and notification.
Prometheus is used to collect metrics from your applications and infrastructure. These metrics can be used to track the health of your systems, identify potential problems, and troubleshoot issues.
Two specific benefits of using Prometheus are its pull-based collection model and its query language, PromQL.
Prometheus uses a pull model to collect metrics, which means that the Prometheus server polls the systems or applications that it is monitoring for metrics. This is in contrast to the push model, used by many other monitoring systems, where you modify application code to send metrics to the server periodically.
With the pull model, the systems and applications being monitored do not need to know where the Prometheus server is. They only need to expose their metrics over HTTP, either directly or through an exporter that runs alongside them, so a Prometheus server can be added or moved without reconfiguring each application. Because the server decides which targets to scrape and how often, collection is controlled centrally in the Prometheus configuration rather than in each application.
Additionally, Prometheus can automatically discover the targets it should scrape through its service discovery integrations, such as Kubernetes, Consul, DNS, and file-based discovery.
Once Prometheus has discovered a target, it scrapes that target for metrics at a regular interval. The metrics are then stored in Prometheus's time-series database, where they can be queried and visualized.
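One way to see the result of this scrape loop is the `up` series, which Prometheus records automatically for every target on each scrape. The queries below are a minimal sketch; the `job="my-app"` label value is a hypothetical name, not one taken from this article.

```promql
# 1 if the most recent scrape of the target succeeded, 0 if it failed.
# "my-app" is a hypothetical job name; use the job label from your own scrape configuration.
up{job="my-app"}

# Separate query: every target, across all jobs, whose most recent scrape failed.
up == 0
```

Each expression is run on its own. The second is a common starting point for an alert that fires when a target stops responding.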
PromQL is a powerful and flexible query language for the metrics collected by Prometheus that can be used to create ad-hoc graphs, tables, and alerts.
PromQL is a functional query language that lets the user select and aggregate time-series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
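The difference between selecting and aggregating can be sketched with three progressively richer expressions. They assume a hypothetical counter named `http_requests_total` with `job` and `handler` labels; your own metric and label names will depend on how your application is instrumented.

```promql
# Selection: the latest sample of every series that matches the label filter.
# The metric and label names here are illustrative assumptions.
http_requests_total{job="my-app"}

# Selection over a time range, passed to a function:
# the per-second rate of increase over the last 5 minutes, one result per series.
rate(http_requests_total{job="my-app"}[5m])

# Aggregation: add those rates together, keeping one result per handler.
sum by (handler) (rate(http_requests_total{job="my-app"}[5m]))
```

Any of these can be pasted into the expression browser, or sent as the `query` parameter to the `/api/v1/query` endpoint of the HTTP API.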
Here are some of the things that Prometheus and PromQL can be used for:
Here are some examples of PromQL queries:
To get the average per-second CPU usage for a specific application, you could use the following query:
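A minimal sketch of such a query, assuming the application is scraped under a hypothetical job name `my-app` and exposes the `process_cpu_seconds_total` counter that many of the official client libraries export by default:

```promql
# rate() turns the CPU-seconds counter into CPU-seconds used per second
# (roughly, the number of cores in use), averaged over the last 5 minutes.
# avg() then averages that value across all instances of the job.
# "my-app" is a hypothetical job name.
avg(rate(process_cpu_seconds_total{job="my-app"}[5m]))
```

A result of 0.25 would mean that each instance is using, on average, a quarter of one CPU core.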
To get the number of requests that have been made to a specific endpoint, you could use the following query:
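As an illustration, assuming a hypothetical counter `http_requests_total` whose `handler` label holds the endpoint path (both names depend on your instrumentation), a query along these lines counts requests over the last hour:

```promql
# increase() estimates how much the counter grew within the 1-hour window.
# sum() combines the result across all instances and any other labels.
# Metric name, label names, and the endpoint path are illustrative assumptions.
sum(increase(http_requests_total{job="my-app", handler="/api/orders"}[1h]))
```

Counters reset to zero when a process restarts, so a windowed function such as `increase()` is usually more meaningful than the raw counter value. Because `increase()` extrapolates to the edges of the window, the result may not be a whole number.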
To get the total number of errors that have been returned from a specific endpoint, you could use the following query:
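One possible form, again assuming the hypothetical `http_requests_total` counter and additionally assuming that it carries a `status` label with the HTTP response code:

```promql
# The regex matcher status=~"5.." keeps only series whose status is a 5xx code.
# Metric name, label names, and the endpoint path are illustrative assumptions;
# some instrumentation uses a label called "code" instead of "status".
sum(increase(http_requests_total{job="my-app", handler="/api/orders", status=~"5.."}[1h]))
```

Dividing this expression by the same query without the `status` matcher gives the error ratio for the endpoint over the same window.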