Metrics
The Agones controller exposes metrics via OpenCensus. OpenCensus is a single distribution of libraries that collect metrics and distributed traces from your services. We only use it for metrics, but it will allow us to support multiple exporters in the future.
We chose to start with Prometheus because it is the most popular option with Kubernetes, but Stackdriver is also supported. If you need another exporter, check the list of supported exporters. It should be pretty straightforward to register a new one. (GitHub PRs are more than welcome.)
We plan to support multiple exporters in the future via environment variables and helm flags.
Backend integrations
Prometheus
If you are running a Prometheus instance, you just need to ensure that metrics and Kubernetes service discovery are enabled (helm chart values agones.metrics.prometheusEnabled and agones.metrics.prometheusServiceDiscovery). This will automatically add the annotations required by Prometheus to discover Agones metrics and start collecting them. (see example)
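To make the discovery step more concrete, here is a minimal Python sketch of the decision an annotation-based scrape job typically makes for each pod. It assumes the common prometheus.io/scrape, prometheus.io/port and prometheus.io/path annotation convention. The function name scrape_target, the port 8080 and the pod IP are illustrative assumptions, not values taken from the Agones chart.

```python
from typing import Optional


def scrape_target(annotations: dict, pod_ip: str) -> Optional[str]:
    """Return the URL a scrape job would use for a pod, or None if the pod is skipped.

    Assumed convention (check your own scrape configuration):
      prometheus.io/scrape: "true" -> the pod is scraped
      prometheus.io/port:   "<n>"  -> port to scrape
      prometheus.io/path:   "<p>"  -> metrics path, "/metrics" when absent
    """
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    port = annotations.get("prometheus.io/port")
    if port is None:
        return None  # this sketch requires an explicit port annotation
    path = annotations.get("prometheus.io/path", "/metrics")
    return f"http://{pod_ip}:{port}{path}"


if __name__ == "__main__":
    # Hypothetical pod annotations and IP, for illustration only.
    annotated = {"prometheus.io/scrape": "true", "prometheus.io/port": "8080"}
    print(scrape_target(annotated, "10.0.0.5"))
    print(scrape_target({}, "10.0.0.5"))
```

A pod without the scrape annotation is simply ignored, which is why enabling the two helm values is enough for an existing Prometheus instance to pick up the controller.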
Prometheus Operator
If you have the Prometheus Operator installed in your cluster, make sure to add a ServiceMonitor to discover Agones metrics, as shown below:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: agones
  labels:
    app: agones
spec:
  selector:
    matchLabels:
      agones.dev/role: controller
  endpoints:
  - port: web
Finally, include that ServiceMonitor in your Prometheus instance CRD. This is usually done by adding a label to the ServiceMonitor above that is matched by the Prometheus instance of your choice.
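The label matching described here follows the usual Kubernetes matchLabels rule: a selector matches when every key/value pair it lists is present on the object. The Python sketch below illustrates that rule. The selector shown is a hypothetical value for a Prometheus resource's spec.serviceMonitorSelector.matchLabels, and the helper name matches is an assumption for illustration.

```python
def matches(match_labels: dict, labels: dict) -> bool:
    """True when every key/value pair in match_labels is also present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


if __name__ == "__main__":
    # Labels carried by the ServiceMonitor defined above.
    service_monitor_labels = {"app": "agones"}
    # Hypothetical selector on the Prometheus instance.
    prometheus_selector = {"app": "agones"}

    print(matches(prometheus_selector, service_monitor_labels))
    print(matches({"app": "something-else"}, service_monitor_labels))
```

If the selector on your Prometheus instance uses a different label, add that label to the ServiceMonitor rather than changing the selector.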
Stackdriver
We support the OpenCensus Stackdriver exporter. To use it, enable the Stackdriver Monitoring API in the Google Cloud Console. Follow the Stackdriver installation steps to see your metrics on the Stackdriver Monitoring website.
Metrics available
Name | Description | Type |
---|---|---|
agones_gameservers_count | The number of gameservers per fleet and status | gauge |
agones_gameserver_allocations_duration_seconds | The distribution of gameserver allocation requests latencies | histogram |
agones_gameservers_total | The total of gameservers per fleet and status | counter |
agones_fleets_replicas_count | The number of replicas per fleet (total, desired, ready, allocated) | gauge |
agones_fleet_autoscalers_able_to_scale | The fleet autoscaler can access the fleet to scale | gauge |
agones_fleet_autoscalers_buffer_limits | The limits of buffer based fleet autoscalers (min, max) | gauge |
agones_fleet_autoscalers_buffer_size | The buffer size of fleet autoscalers (count or percentage) | gauge |
agones_fleet_autoscalers_current_replicas_count | The current replicas count as seen by autoscalers | gauge |
agones_fleet_autoscalers_desired_replicas_count | The desired replicas count as seen by autoscalers | gauge |
agones_fleet_autoscalers_limited | The fleet autoscaler is capped (1) | gauge |
agones_gameservers_node_count | The distribution of gameservers per node | histogram |
agones_nodes_count | The count of nodes empty and with gameservers | gauge |
agones_k8s_client_http_request_total | The total of HTTP requests to the Kubernetes API by status code | counter |
agones_k8s_client_http_request_duration_seconds | The distribution of HTTP requests latencies to the Kubernetes API by status code | histogram |
agones_k8s_client_cache_list_total | The total number of list operations for client-go caches | counter |
agones_k8s_client_cache_list_duration_seconds | Duration of a Kubernetes list API call in seconds | histogram |
agones_k8s_client_cache_list_items | Count of items in a list from the Kubernetes API | histogram |
agones_k8s_client_cache_watches_total | The total number of watch operations for client-go caches | counter |
agones_k8s_client_cache_last_resource_version | Last resource version from the Kubernetes API | gauge |
agones_k8s_client_workqueue_depth | Current depth of the work queue | gauge |
agones_k8s_client_workqueue_latency_seconds | How long an item stays in the work queue | histogram |
agones_k8s_client_workqueue_items_total | Total number of items added to the work queue | counter |
agones_k8s_client_workqueue_work_duration_seconds | How long processing an item from the work queue takes | histogram |
agones_k8s_client_workqueue_retries_total | Total number of items retried to the work queue | counter |
agones_k8s_client_workqueue_longest_running_processor | How long the longest running workqueue processor has been running in microseconds | gauge |
agones_k8s_client_workqueue_unfinished_work_seconds | How long unfinished work has been sitting in the workqueue in seconds | gauge |
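Several of the metrics above are histograms, which are exposed as cumulative buckets rather than raw observations. As a rough illustration of how a quantile is estimated from such buckets (similar in spirit to PromQL's histogram_quantile, but not a copy of its implementation), here is a small Python sketch. The bucket bounds and counts are made-up sample values, and the function name is hypothetical.

```python
import math


def quantile_from_buckets(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound
    and ending with (math.inf, total_count).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate inside the +Inf bucket.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


if __name__ == "__main__":
    # Made-up allocation latency buckets in seconds (illustrative only).
    sample = [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)]
    print(quantile_from_buckets(0.5, sample))
```

The estimate is only as precise as the bucket boundaries, so treat quantiles derived from these metrics as approximations.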
Dashboard
Grafana Dashboards
We provide a set of useful Grafana dashboards to monitor Agones workloads. They are located under the grafana folder:
Agones Autoscalers allows you to monitor your current autoscalers' replica requests as well as fleet replica allocation and readiness statuses. You can only select one autoscaler at a time using the provided dropdown.
Agones GameServers displays your current game servers workload status (allocations, game servers statuses, fleets replicas) with optional fleet name filtering.
Agones GameServer Allocations displays Agones gameservers allocations rates and counts per fleet.
Agones Allocator Resource displays Agones Allocators CPU, memory usage and also some useful Golang runtime metrics.
Agones Status displays Agones controller health status.
Agones Controller Resource Usage displays Agones Controller CPU and memory usage and also some Golang runtime metrics.
Agones Controller go-client requests displays Agones Controller Kubernetes API consumption.
Agones Controller go-client caches displays Agones Controller Kubernetes Watches/Lists operations used.
Agones Controller go-client workqueues displays Agones Controller workqueue processing time and rates.
Agones Controller API Server requests displays your current API server request rate, errors rate and request latencies with optional CustomResourceDefinition filtering by Types: fleets, gameserversets, gameservers, gameserverallocations.
Dashboard screenshots:
Note
You can import our dashboards by copying the JSON content from each config map into your own instance of Grafana (+ > Create > Import > Or paste json) or follow the installation guide.
Installation
When operating a live multiplayer game, you will need to observe performance, resource usage and availability to learn more about your system. This guide explains how to set up Prometheus and Grafana in your own Kubernetes cluster to monitor your Agones workload.
Before attempting this guide, make sure you have kubectl and helm installed and configured to reach your Kubernetes cluster.
Prometheus installation
Prometheus is an open source monitoring solution. We will use it to store Agones controller metrics and query the data back.
Let’s install Prometheus using the helm stable repository.
helm upgrade --install --wait prom stable/prometheus --namespace metrics \
--set server.global.scrape_interval=30s \
--set server.persistentVolume.enabled=true \
--set server.persistentVolume.size=64Gi \
-f ./build/prometheus.yaml
Note
You can also run our Makefile target make setup-prometheus, or make kind-setup-prometheus and make minikube-setup-prometheus for Kind and Minikube.
For resiliency, it is recommended to run Prometheus on a dedicated node, separate from the nodes where Game Servers are scheduled. If you use the above command with our prometheus.yaml to set up Prometheus, it will schedule Prometheus pods on nodes tainted with agones.dev/agones-metrics=true:NoExecute and labeled with agones.dev/agones-metrics=true, if available.
As an example, to set up a dedicated node pool for Prometheus on GKE, run the following command before installing Prometheus. Alternatively you can taint and label nodes manually.
gcloud container node-pools create agones-metrics --cluster=... --zone=... \
--node-taints agones.dev/agones-metrics=true:NoExecute \
--node-labels agones.dev/agones-metrics=true \
--num-nodes=1
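A taint keeps pods off a node unless they carry a matching toleration, which is how the dedicated pool stays free of game servers. The Python sketch below models the standard Kubernetes matching rule for a toleration against the taint used above. The tolerations shown are illustrative assumptions, since the exact contents of prometheus.yaml are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key with "Exists" matches every taint key
    operator: str = "Equal"   # "Equal" or "Exists"
    value: str = ""
    effect: str = ""          # empty effect matches every taint effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True when the toleration matches the taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    if toleration.operator == "Equal":
        return toleration.value == taint.value
    return False


if __name__ == "__main__":
    metrics_taint = Taint("agones.dev/agones-metrics", "true", "NoExecute")
    # Hypothetical toleration of the kind a Prometheus pod would need.
    prometheus_toleration = Toleration(
        key="agones.dev/agones-metrics",
        operator="Equal",
        value="true",
        effect="NoExecute",
    )
    print(tolerates(prometheus_toleration, metrics_taint))
    print(tolerates(Toleration(key="unrelated", operator="Exists"), metrics_taint))
```

A game server pod without such a toleration fails the check and is kept off the metrics node, while the node label lets a nodeSelector or affinity rule steer Prometheus onto it.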
By default we will disable the push gateway (we don’t need it for Agones) and other exporters.
The helm chart supports nodeSelector, affinity and tolerations. You can use them to schedule Prometheus deployments on one or more isolated nodes, so that the nodes running game servers keep a homogeneous workload.
This will install a Prometheus Server in your current cluster with a Persistent Volume Claim (deactivated for Minikube and Kind) for storing and querying time series. It will automatically start collecting metrics from the Agones controller.
Finally, to access the Prometheus metrics, rules and alerts explorer, use:
kubectl port-forward deployments/prom-prometheus-server 9090 -n metrics
Note
Again, you can use our Makefile target make prometheus-portforward. (For Kind and Minikube, use their specific targets make kind-prometheus-portforward and make minikube-prometheus-portforward.)
Now you can access the Prometheus dashboard at http://localhost:9090.
On the landing page you can start exploring metrics by creating queries. You can also verify which targets Prometheus currently monitors (Header Status > Targets). You should see the Agones controller pod in the kubernetes-pods section.
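Queries can also be sent programmatically through the Prometheus HTTP API. The following Python sketch builds instant-query URLs for a few metrics from the table in this guide and shows how a result could be fetched. It assumes the port-forward above is active on http://localhost:9090. The PromQL expressions and function names are illustrative suggestions, not queries shipped with Agones.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address: the local end of the port-forward set up above.
PROMETHEUS_URL = "http://localhost:9090"

# Example PromQL expressions (illustrative, adjust to your needs).
QUERIES = {
    "gameservers": "sum(agones_gameservers_count)",
    "k8s_api_request_rate": "sum(rate(agones_k8s_client_http_request_total[5m]))",
    "allocation_latency_p95": (
        "histogram_quantile(0.95, "
        "sum(rate(agones_gameserver_allocations_duration_seconds_bucket[5m])) by (le))"
    ),
}


def instant_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def run_query(expr, base_url=PROMETHEUS_URL):
    """Run an instant query and return the list under data.result."""
    with urlopen(instant_query_url(expr, base_url)) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


if __name__ == "__main__":
    for name, expr in QUERIES.items():
        print(name, instant_query_url(expr))
```

Calling run_query requires a reachable Prometheus server. An empty result usually means the metric has not been registered yet.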
Note
Metrics will first be registered when you start using Agones.
Now let’s install some Grafana dashboards.
Grafana installation
Grafana is an open source time series analytics platform that supports Prometheus as a data source. It also lets us easily import pre-built dashboards.
First we will install the Agones dashboards as config maps in our cluster.
kubectl apply -f ./build/grafana/
Now we can install the Grafana chart from the stable repository. (Replace <your-admin-password> with the admin password of your choice.)
helm install --wait --name grafana stable/grafana --version=5.0.13 --namespace metrics \
--set adminPassword=<your-admin-password> -f ./build/grafana.yaml
This will install Grafana with our prepopulated dashboards and the previously installed Prometheus instance as its data source.
Note
You can also use our Makefile targets (setup-grafana, minikube-setup-grafana and kind-setup-grafana).
Finally, to access the dashboards, run:
kubectl port-forward deployments/grafana 3000 -n metrics
Open a web browser at http://localhost:3000. You should see the Agones dashboards after logging in as admin.
Note
You can also use our Makefile targets make grafana-portforward, make kind-grafana-portforward and make minikube-grafana-portforward.
Stackdriver installation
To use Stackdriver monitoring, enable the Stackdriver Monitoring API in the Google Cloud Console. You need to grant all the necessary permissions to the users (see Access Control Guide). The Stackdriver exporter uses a strategy called Application Default Credentials (ADC) to find your application’s credentials. Details can be found in Setting Up Authentication for Server to Server Production Applications.
Note that Stackdriver monitoring is enabled by default on GKE clusters, however you can follow this guide if it was disabled on your GKE cluster.
The default metrics exporter is Prometheus. If you are using the Helm installation, you can install or upgrade Agones to use Stackdriver with the following chart parameters:
helm upgrade --install --wait --set agones.metrics.stackdriverEnabled=true --set agones.metrics.prometheusEnabled=false --set agones.metrics.prometheusServiceDiscovery=false my-release-name agones/agones --namespace=agones-system
With this configuration, only the Stackdriver exporter is used, and the Prometheus exporter is disabled.
Create a Fleet or a GameServer to check that the connection to the Stackdriver API is configured properly and to generate metrics data you can look at.
Visit the Stackdriver monitoring website and select your project, or choose Create a new Workspace and select the GCP project where your cluster resides. In the Stackdriver metrics explorer you should be able to find new metrics with the prefix agones/ after a couple of minutes. Choose the metrics you are interested in and add them to a single graph or to separate graphs. Select the Kubernetes Container resource type for each of them. You can create multiple graphs, save them into your dashboard and use various aggregation parameters and reducers for each graph.
Example of the dashboard appearance is provided below:
Currently, a Stackdriver dashboard can only be configured manually. It is up to you to set an Alignment Period (the minimum is 1 minute), GroupBy, Filter parameters and other graph settings.
Troubleshooting
If you can’t see Agones metrics, have a look at the controller logs for connection errors. Also ensure that your cluster has the necessary credentials to interact with Stackdriver Monitoring. You can configure stackdriverProjectID manually if the automatic discovery is not working.
Permissions problem example from controller logs:
Failed to export to Stackdriver: rpc error: code = PermissionDenied desc = Permission monitoring.metricDescriptors.create denied (or the resource may not exist).
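When the logs are long, a small script can help spot this kind of error. The Python sketch below scans log lines for Stackdriver export failures and extracts the denied permission. The regular expression is derived only from the example line above, so other error formats will not be matched, and the function name is hypothetical.

```python
import re
import sys

# Pattern based on the example log line above (an assumption about the format).
EXPORT_ERROR = re.compile(
    r"Failed to export to Stackdriver: .*?Permission (\S+) denied"
)


def denied_permissions(log_lines):
    """Return the permissions named in export failures, in order, without duplicates."""
    found = []
    for line in log_lines:
        match = EXPORT_ERROR.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    # Reads controller logs from standard input, one entry per line.
    for permission in denied_permissions(sys.stdin):
        print(f"missing permission: {permission}")
```

Each permission it reports has to be granted to the credentials the controller runs with before metrics can be exported.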