This document applies to the unreleased Crossplane master branch, not to the latest release, v2.1.

Crossplane produces Prometheus-style metrics for effective monitoring and alerting in your environment. These metrics are essential for identifying and resolving potential issues, and understanding them helps you maintain the health and performance of your resources. This page explains all the metrics gathered from Crossplane. Note that this document focuses on Crossplane-specific metrics and doesn't cover standard Go metrics.

To enable the export of metrics, set `metrics.enabled=true` in the Crossplane Helm chart, either by passing the `--set metrics.enabled=true` flag or in the chart values:

metrics:
  enabled: true

These Prometheus annotations on the Crossplane pod expose the metrics:

prometheus.io/path: /metrics
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
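
If your cluster uses the Prometheus Operator rather than annotation-based scraping, a PodMonitor can scrape the same endpoint. The following is a minimal sketch, not part of the Crossplane chart: the `app: crossplane` pod label and the `metrics` port name are assumptions you should verify against your installation.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: crossplane
  namespace: crossplane-system
spec:
  # Assumes the Crossplane pods carry the app: crossplane label.
  selector:
    matchLabels:
      app: crossplane
  podMetricsEndpoints:
    # Assumes the metrics container port is named "metrics" and serves
    # the /metrics path on port 8080, matching the annotations above.
    - port: metrics
      path: /metrics
      interval: 30s
```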

Crossplane core metrics

The Crossplane pod emits these metrics.

| Metric Name | Description |
| --- | --- |
| `function_run_function_request_total` | Total number of RunFunctionRequests sent |
| `function_run_function_response_total` | Total number of RunFunctionResponses received |
| `function_run_function_seconds` | Histogram of RunFunctionResponse latency (seconds) |
| `function_run_function_response_cache_hits_total` | Total number of RunFunctionResponse cache hits |
| `function_run_function_response_cache_misses_total` | Total number of RunFunctionResponse cache misses |
| `function_run_function_response_cache_errors_total` | Total number of RunFunctionResponse cache errors |
| `function_run_function_response_cache_writes_total` | Total number of RunFunctionResponse cache writes |
| `function_run_function_response_cache_deletes_total` | Total number of RunFunctionResponse cache deletes |
| `function_run_function_response_cache_bytes_written_total` | Total number of RunFunctionResponse bytes written to cache |
| `function_run_function_response_cache_bytes_deleted_total` | Total number of RunFunctionResponse bytes deleted from cache |
| `function_run_function_response_cache_read_seconds` | Histogram of cache read latency (seconds) |
| `function_run_function_response_cache_write_seconds` | Histogram of cache write latency (seconds) |
| `circuit_breaker_opens_total` | Number of times the XR circuit breaker transitioned from closed to open |
| `circuit_breaker_closes_total` | Number of times the XR circuit breaker transitioned from open to closed |
| `circuit_breaker_events_total` | Number of XR watch events handled by the circuit breaker, labeled by outcome |
| `engine_controllers_started_total` | Total number of controllers started |
| `engine_controllers_stopped_total` | Total number of controllers stopped |
| `engine_watches_started_total` | Total number of watches started |
| `engine_watches_stopped_total` | Total number of watches stopped |
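
The `*_seconds` metrics above are Prometheus histograms, so they also expose `_bucket`, `_sum`, and `_count` series. As an illustration only, the following hypothetical PrometheusRule (not shipped with Crossplane) alerts when the 95th percentile of composition function call latency stays above an arbitrary threshold; adjust the threshold, window, and duration to your environment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: crossplane-function-latency   # hypothetical name
  namespace: crossplane-system
spec:
  groups:
    - name: crossplane-functions
      rules:
        - alert: FunctionRunLatencyHigh
          # 95th percentile of RunFunctionResponse latency over 5 minutes.
          expr: |
            histogram_quantile(0.95, sum by (le) (rate(function_run_function_seconds_bucket[5m]))) > 10
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Composition function calls are slower than expected.
```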

Provider metrics

Crossplane providers emit these metrics. All providers built with crossplane-runtime emit the `crossplane_managed_resource_*` metrics.

Providers expose metrics on the metrics port (default 8080). To scrape these metrics, configure a PodMonitor or add Prometheus annotations to the provider’s DeploymentRuntimeConfig.
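
For annotation-based scraping, the following sketch adds the annotations through a DeploymentRuntimeConfig and references it from a Provider. The names used here (`scrape-metrics`, `provider-example`, and the package reference) are placeholders, not a specific provider.

```yaml
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: scrape-metrics
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        metadata:
          # The same Prometheus annotations used for the Crossplane pod,
          # applied to the provider pod.
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "8080"
            prometheus.io/path: /metrics
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-example
spec:
  # Placeholder package reference.
  package: xpkg.crossplane.io/crossplane-contrib/provider-example:v1.0.0
  runtimeConfigRef:
    name: scrape-metrics
```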

| Metric Name | Description |
| --- | --- |
| `crossplane_managed_resource_exists` | The number of managed resources that exist |
| `crossplane_managed_resource_ready` | The number of managed resources in `Ready=True` state |
| `crossplane_managed_resource_synced` | The number of managed resources in `Synced=True` state |
| `crossplane_managed_resource_deletion_seconds` | The time it took to delete a managed resource |
| `crossplane_managed_resource_first_time_to_readiness_seconds` | The time it took for a managed resource to become ready for the first time after creation |
| `crossplane_managed_resource_first_time_to_reconcile_seconds` | The time it took for the controller to first detect a managed resource |
| `crossplane_managed_resource_drift_seconds` | The time elapsed since the last successful reconcile when the controller detects an out-of-sync resource |
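
As an illustration of how you might use these gauges, the following hypothetical PrometheusRule (not shipped with Crossplane) fires when managed resources exist but haven't reached `Ready=True` for a sustained period; tune the duration to the provisioning times you expect for your resources.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: crossplane-managed-resources   # hypothetical name
  namespace: crossplane-system
spec:
  groups:
    - name: crossplane-managed-resources
      rules:
        - alert: ManagedResourcesNotReady
          # Managed resources that exist but aren't Ready.
          expr: sum(crossplane_managed_resource_exists) - sum(crossplane_managed_resource_ready) > 0
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: One or more managed resources haven't become ready.
```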

Upjet provider metrics

These metrics are only emitted by Upjet-based providers (such as provider-upjet-aws, provider-upjet-azure, provider-upjet-gcp).

| Metric Name | Description |
| --- | --- |
| `upjet_resource_ext_api_duration` | Measures in seconds how long it takes a Cloud SDK call to complete |
| `upjet_resource_external_api_calls_total` | The number of external API calls to cloud providers, with labels describing the endpoints and resources |
| `upjet_resource_reconcile_delay_seconds` | Measures in seconds how long the reconciles for a resource are delayed from the configured poll periods |
| `upjet_resource_ttr` | Measures in seconds the time-to-readiness (TTR) for managed resources |
| `upjet_resource_cli_duration` | Measures in seconds how long it takes a Terraform CLI invocation to complete |
| `upjet_resource_active_cli_invocations` | The number of active (running) Terraform CLI invocations |
| `upjet_resource_running_processes` | The number of running Terraform CLI and Terraform provider processes |

Controller-runtime and Kubernetes client metrics

These metrics come from the controller-runtime framework and Kubernetes client libraries. Both Crossplane and providers emit these metrics.

| Metric Name | Description |
| --- | --- |
| `certwatcher_read_certificate_errors_total` | Total number of certificate read errors |
| `certwatcher_read_certificate_total` | Total number of certificate reads |
| `controller_runtime_active_workers` | Number of workers (threads processing jobs from the work queue) per controller |
| `controller_runtime_max_concurrent_reconciles` | Maximum number of concurrent reconciles per controller |
| `controller_runtime_reconcile_errors_total` | Total number of reconciliation errors per controller. A sharp or continuous rise in this metric indicates a problem (see the example alert after this table). |
| `controller_runtime_reconcile_time_seconds` | Histogram of time per reconciliation per controller |
| `controller_runtime_reconcile_total` | Total number of reconciliations per controller |
| `controller_runtime_webhook_latency_seconds` | Histogram of the latency of processing admission requests |
| `controller_runtime_webhook_requests_in_flight` | Current number of admission requests being served |
| `controller_runtime_webhook_requests_total` | Total number of admission requests by HTTP status code |
| `rest_client_requests_total` | Number of HTTP requests, partitioned by status code, method, and host |
| `workqueue_adds_total` | Total number of adds handled by the workqueue |
| `workqueue_depth` | Current depth of the workqueue |
| `workqueue_longest_running_processor_seconds` | How long the longest running workqueue processor has been running |
| `workqueue_queue_duration_seconds` | Histogram of how long an item stays in the workqueue before processing starts |
| `workqueue_retries_total` | Total number of retries handled by the workqueue |
| `workqueue_unfinished_work_seconds` | Seconds of work in progress not yet observed by `work_duration`. Large values suggest stuck threads. |
| `workqueue_work_duration_seconds` | Histogram of how long it takes to process an item from the workqueue (from start to completion) |
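
Building on the note about `controller_runtime_reconcile_errors_total`, a hypothetical alert like the following (not shipped with Crossplane) catches a sustained rise in reconcile errors per controller; the rate window, duration, and severity are arbitrary starting points.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: crossplane-reconcile-errors   # hypothetical name
  namespace: crossplane-system
spec:
  groups:
    - name: crossplane-reconcile-errors
      rules:
        - alert: ReconcileErrorsRising
          # Reconcile errors occurring continuously for 15 minutes, per controller.
          expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m])) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Controller {{ $labels.controller }} is reporting reconcile errors.
```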