# Monitoring & Telemetry
GenLayer validators expose comprehensive metrics that are ready for consumption by Prometheus and other monitoring tools. This allows you to monitor your validator's performance, health, and resource usage.
## Accessing Metrics
The metrics endpoint is exposed on the operations port (default: 9153) configured in your `config.yaml`:
```yaml
node:
  ops:
    port: 9153 # Metrics port
    endpoints:
      metrics: true # Enable metrics endpoint
```

Once your node is running, you can access the metrics at `http://localhost:9153/metrics`.

## Available Metrics
The validator exposes various metric collectors that can be individually configured:
- Node Metrics: Core validator performance metrics including block processing, transaction handling, and consensus participation
- GenVM Metrics: Virtual machine performance metrics, including execution times and resource usage
- WebDriver Metrics: Metrics related to web access and external data fetching
## Configuring Metrics Collection
You can customize metrics collection in your `config.yaml`:
```yaml
metrics:
  interval: "15s" # Default collection interval
  collectors:
    node:
      enabled: true
      interval: "30s" # Override interval for specific collector
    genvm:
      enabled: true
      interval: "20s"
    webdriver:
      enabled: true
      interval: "60s"
```

## Example Metrics Query
To check if metrics are working correctly:
```bash
# Get all available metrics
curl http://localhost:9153/metrics

# Check specific metric (example)
curl -s http://localhost:9153/metrics | grep genlayer_node_
```

The metrics endpoint also provides `/health` and `/balance` endpoints on the same port for additional monitoring capabilities.
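Because `/health` sits on the same ops port, it can also back container-level liveness checks. The snippet below is only a sketch of a Docker Compose healthcheck: it assumes `curl` is available inside the node's container and that `/health` returns a success status when the node is healthy; adjust the intervals and command to your setup.

```yaml
# Sketch: compose healthcheck against the node's ops /health endpoint
# (assumes curl exists in the image and /health returns a 2xx status when healthy)
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:9153/health"]
  interval: 30s
  timeout: 5s
  retries: 3
```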
## Monitoring Best Practices
- Set up alerts for critical metrics like node synchronization status and missed blocks
- Monitor resource usage to ensure your validator has sufficient CPU, memory, and disk space
- Track GenVM performance to optimize LLM provider selection and configuration
- Use visualization tools like Grafana to create dashboards for easy monitoring
For production validators, we recommend setting up a complete monitoring stack with Prometheus and Grafana. This enables real-time visibility into your validator's performance and helps identify issues before they impact your validator's operation.
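As a concrete starting point, a minimal `prometheus.yml` scrape job for the endpoint shown above might look like the following. This is a sketch rather than a prescribed setup: the job name, file names, and target assume a single validator reachable at `localhost:9153`.

```yaml
# prometheus.yml (sketch): scrape the validator's metrics endpoint
global:
  scrape_interval: 15s
rule_files:
  - alerts.yml
scrape_configs:
  - job_name: genlayer-node
    static_configs:
      - targets: ["localhost:9153"]
```

For alerting, a rule on Prometheus's built-in `up` metric covers the basic "metrics endpoint down" case; extend it with GenLayer-specific series (sync status, missed blocks) once you know their exact names from your `/metrics` output.

```yaml
# alerts.yml (sketch): fire when the validator's metrics endpoint stops responding
groups:
  - name: genlayer-node
    rules:
      - alert: ValidatorMetricsDown
        expr: up{job="genlayer-node"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "GenLayer validator metrics endpoint is unreachable"
```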
## Logs and Metrics Forwarding
You can forward your logs and metrics to external systems for centralized monitoring and alerting by using the `alloy` service provided in the `docker-compose.yaml` from the extracted tarball.
### Centralized Push to GenLayer Foundation Grafana Cloud (using Alloy)
To contribute your node's metrics and logs to the centralized GenLayer Foundation Grafana Cloud dashboard (improving aggregate network visibility, alerts, and community monitoring), use the built-in Alloy service.
#### Why contribute?
- Helps the Foundation and community track overall testnet health (validator participation, latency, resource usage).
- May positively influence testnet points/rewards (visible healthy nodes are prioritized).
- Setup takes 10–15 minutes once credentials are provided.
#### Prerequisites
- Metrics enabled in `config.yaml` (`endpoints.metrics: true`, the default in recent versions).
- Ops port 9153 exposed in docker-compose (`ports: - "9153:9153"`).
- Credentials from the Foundation team (ask in #testnet-asimov):
  - `CENTRAL_MONITORING_URL`: Prometheus remote write URL
  - `CENTRAL_LOKI_URL`: Loki push URL
  - `CENTRAL_MONITORING_USERNAME` / `CENTRAL_MONITORING_PASSWORD`: metrics (Prometheus) credentials
  - `CENTRAL_LOKI_USERNAME` / `CENTRAL_LOKI_PASSWORD`: logs (Loki) credentials
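If you are unsure whether the ops port is reachable from outside the container, the relevant part of `docker-compose.yaml` looks roughly like this; the service name below is a placeholder, so match it to the one in your compose file.

```yaml
# Sketch: expose the ops/metrics port on the host (service name is a placeholder)
services:
  genlayer-node:
    ports:
      - "9153:9153"
```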
#### Steps
- Create or update `.env` (next to your `docker-compose.yaml`):
```bash
# Central monitoring server endpoints for GenLayer Foundation
CENTRAL_MONITORING_URL=https://prometheus-prod-66-prod-us-east-3.grafana.net/api/prom/push
CENTRAL_LOKI_URL=https://logs-prod-042.grafana.net/loki/api/v1/push
# Authentication for central monitoring
# Metrics (Prometheus) credentials
CENTRAL_MONITORING_USERNAME=your-metrics-username
CENTRAL_MONITORING_PASSWORD=your-metrics-password
# Logs (Loki) credentials
CENTRAL_LOKI_USERNAME=your-logs-username
CENTRAL_LOKI_PASSWORD=your-logs-password
# Node identification
NODE_ID=validator-001
VALIDATOR_NAME=MyValidator
# Usually defaults are fine
NODE_METRICS_ENDPOINT=host.docker.internal:9153
LOG_FILE_PATTERN=/var/log/genlayer/node*.log
METRICS_SCRAPE_INTERVAL=15s
```
- Add or verify the `alloy` service in `docker-compose.yaml` (copy it in if missing; a multi-node variant is sketched after these steps):
```yaml
# Grafana Alloy for both logs and metrics forwarding
# Supports both single node and multi-node configurations
#
# Single Node Mode:
#   Set NODE_ID, VALIDATOR_NAME, NODE_METRICS_ENDPOINT in .env
#   docker compose --profile monitoring up -d
#
# Multi-Node Mode:
#   Set SCRAPE_TARGETS_JSON in .env
#   docker compose --profile monitoring up -d
alloy:
  image: grafana/alloy:v1.12.0
  container_name: genlayer-node-alloy
  command:
    - run
    - /etc/alloy/config.river
    - --server.http.listen-addr=0.0.0.0:12345
    - --storage.path=/var/lib/alloy/data
  volumes:
    - ./alloy-config.river:/etc/alloy/config.river:ro
    - ${NODE_LOGS_PATH:-./data/node/logs}:/var/log/genlayer:ro
    - alloy_data:/var/lib/alloy
  environment:
    # Central monitoring endpoints
    - CENTRAL_LOKI_URL=${CENTRAL_LOKI_URL:-https://logs-prod-042.grafana.net/loki/api/v1/push}
    - CENTRAL_MONITORING_URL=${CENTRAL_MONITORING_URL:-https://prometheus-prod-66-prod-us-east-3.grafana.net/api/prom/push}
    # Metrics (Prometheus) authentication
    - CENTRAL_MONITORING_USERNAME=${CENTRAL_MONITORING_USERNAME:-telemetric}
    - CENTRAL_MONITORING_PASSWORD=${CENTRAL_MONITORING_PASSWORD:-12345678}
    # Logs (Loki) authentication
    - CENTRAL_LOKI_USERNAME=${CENTRAL_LOKI_USERNAME:-telemetric}
    - CENTRAL_LOKI_PASSWORD=${CENTRAL_LOKI_PASSWORD:-12345678}
    # Single node configuration
    - NODE_ID=${NODE_ID:-local}
    - VALIDATOR_NAME=${VALIDATOR_NAME:-default}
    - NODE_METRICS_ENDPOINT=${NODE_METRICS_ENDPOINT:-host.docker.internal:9153}
    # Multi-node configuration
    # When set, overrides single node config above
    - SCRAPE_TARGETS_JSON=${SCRAPE_TARGETS_JSON:-}
    # Scraping configuration
    - METRICS_SCRAPE_INTERVAL=${METRICS_SCRAPE_INTERVAL:-15s}
    - METRICS_SCRAPE_TIMEOUT=${METRICS_SCRAPE_TIMEOUT:-10s}
    - ALLOY_SELF_MONITORING_INTERVAL=${ALLOY_SELF_MONITORING_INTERVAL:-60s}
    # Log collection configuration
    - LOG_FILE_PATTERN=${LOG_FILE_PATTERN:-/var/log/genlayer/node*.log}
    # Log batching configuration
    - LOKI_BATCH_SIZE=${LOKI_BATCH_SIZE:-1MiB}
    - LOKI_BATCH_WAIT=${LOKI_BATCH_WAIT:-1s}
  ports:
    - "12345:12345" # Alloy UI for debugging
  restart: unless-stopped
  profiles:
    - monitoring
  extra_hosts:
    - "host.docker.internal:host-gateway"

volumes:
  alloy_data:
```
- Create or update `./alloy-config.river` (use the provided version; it handles both logs and metrics forwarding):
```river
// Grafana Alloy Configuration for GenLayer Node Telemetry
// Handles both log collection and metrics forwarding

// ==========================================
// Log Collection and Forwarding
// ==========================================

// Discovery component to find log files using local.file_match
// Supports different log file patterns:
//   - Single node: "/var/log/genlayer/node.log"
//   - Multi-node: "/var/log/genlayer/*/logs/node.log" (each node in subdirectory)
//   - Custom pattern via LOG_FILE_PATTERN env var
local.file_match "genlayer_logs" {
  path_targets = [{
    __path__ = coalesce(sys.env("LOG_FILE_PATTERN"), "/var/log/genlayer/node*.log"),
  }]
}

// Relabel to add metadata labels to log entries
discovery.relabel "add_labels" {
  targets = local.file_match.genlayer_logs.targets

  // Add instance label from environment variable
  rule {
    target_label = "instance"
    replacement  = sys.env("NODE_ID")
  }

  // Add validator_name label from environment variable
  rule {
    target_label = "validator_name"
    replacement  = sys.env("VALIDATOR_NAME")
  }

  // Add component label
  rule {
    target_label = "component"
    replacement  = "alloy"
  }

  // Add job label
  rule {
    target_label = "job"
    replacement  = "genlayer-node"
  }
}

// Source component to read log files
loki.source.file "genlayer" {
  targets    = discovery.relabel.add_labels.output
  forward_to = [loki.write.central.receiver]

  // Tail from end to avoid ingesting entire log history on startup
  tail_from_end = true
}

// Write logs to central Loki instance
loki.write "central" {
  endpoint {
    url = sys.env("CENTRAL_LOKI_URL")

    // HTTP Basic Authentication
    basic_auth {
      username = sys.env("CENTRAL_LOKI_USERNAME")
      password = sys.env("CENTRAL_LOKI_PASSWORD")
    }

    // Configurable batch settings for efficient log sending
    batch_size = coalesce(sys.env("LOKI_BATCH_SIZE"), "1MiB")
    batch_wait = coalesce(sys.env("LOKI_BATCH_WAIT"), "1s")
  }
}

// ==========================================
// Prometheus Metrics Collection and Forwarding
// ==========================================

// Scrape metrics from GenLayer node(s)
// Supports both single node and multi-node configurations
//
// Single Node Mode:
//   Set NODE_METRICS_ENDPOINT, NODE_ID, VALIDATOR_NAME
//
// Multi-Node Mode:
//   Set SCRAPE_TARGETS_JSON with JSON array of target objects
//   Example: [{"__address__":"host.docker.internal:9250","instance":"0x...","validator_name":"node-1"}]
prometheus.scrape "genlayer_node" {
  // Dynamic targets based on environment variable
  // If SCRAPE_TARGETS_JSON is set, use it (multi-node mode)
  // Otherwise, build single target from individual env vars (single node mode)
  targets = encoding.from_json(coalesce(sys.env("SCRAPE_TARGETS_JSON"), string.format("[{\"__address__\":\"%s\",\"instance\":\"%s\",\"validator_name\":\"%s\"}]", coalesce(sys.env("NODE_METRICS_ENDPOINT"), "host.docker.internal:9153"), coalesce(sys.env("NODE_ID"), "local"), coalesce(sys.env("VALIDATOR_NAME"), "default"))))

  forward_to = [prometheus.relabel.metrics.receiver]

  // Configurable scrape intervals
  scrape_interval = coalesce(sys.env("METRICS_SCRAPE_INTERVAL"), "15s")
  scrape_timeout  = coalesce(sys.env("METRICS_SCRAPE_TIMEOUT"), "10s")
}

// Relabel metrics to filter before forwarding
prometheus.relabel "metrics" {
  forward_to = [prometheus.remote_write.central.receiver]

  // Option 1: Forward all metrics (default)
  // Currently forwarding all metrics from the node.

  // Option 2: Only keep genlayer_node_* metrics to reduce bandwidth (recommended)
  // To enable filtering and reduce bandwidth, uncomment the following rule:
  /*
  rule {
    source_labels = ["__name__"]
    regex         = "genlayer_node_.*"
    action        = "keep"
  }
  */
}

// Remote write configuration for sending metrics to central Prometheus
prometheus.remote_write "central" {
  endpoint {
    url = sys.env("CENTRAL_MONITORING_URL")

    // HTTP Basic Authentication
    basic_auth {
      username = sys.env("CENTRAL_MONITORING_USERNAME")
      password = sys.env("CENTRAL_MONITORING_PASSWORD")
    }

    // Queue configuration for reliability
    queue_config {
      capacity             = 10000
      max_shards           = 5
      max_samples_per_send = 500
      batch_send_deadline  = "15s"
    }
  }
}

// ==========================================
// Alloy Self-Monitoring
// ==========================================

// Alloy internal exporter for health monitoring
prometheus.exporter.self "alloy" {}

// Expose Alloy's own metrics on the HTTP server
prometheus.scrape "alloy" {
  targets    = prometheus.exporter.self.alloy.targets
  forward_to = [] // Not forwarding Alloy metrics to reduce noise

  // Configurable scrape interval for Alloy's internal health monitoring
  scrape_interval = coalesce(sys.env("ALLOY_SELF_MONITORING_INTERVAL"), "60s")
}
```
- Start Alloy:
```bash
docker compose --profile monitoring up -d
```
- Verify it works:
  - Open the Alloy UI at http://localhost:12345/targets; the "genlayer_node" scrape target should show status UP.
  - Check the logs for successful sends; look for messages indicating successful batch sending (no error codes like 401, 403, 500):
```bash
docker logs genlayer-node-alloy | grep "sent batch"
docker logs genlayer-node-alloy | grep "remote_write"
```
  - In the Foundation Grafana Cloud, search for metrics with the labels `instance="${NODE_ID}"` or `validator_name="${VALIDATOR_NAME}"` (example: `genlayer_node_uptime_seconds{instance="0xYourID"}`).
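If one Alloy instance should scrape several nodes (the multi-node mode referenced in the compose and Alloy config comments above), set `SCRAPE_TARGETS_JSON` instead of the single-node variables. The sketch below uses placeholder addresses, instance IDs, and names; the second port (9250) is only an example taken from the Alloy config comment.

```yaml
# Sketch: multi-node scraping via SCRAPE_TARGETS_JSON
# (set it in .env or directly on the alloy service; values are placeholders)
alloy:
  environment:
    - 'SCRAPE_TARGETS_JSON=[{"__address__":"host.docker.internal:9153","instance":"validator-001","validator_name":"node-1"},{"__address__":"host.docker.internal:9250","instance":"validator-002","validator_name":"node-2"}]'
```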
#### Troubleshooting
- No local metrics: run `curl http://localhost:9153/metrics`; it should return Prometheus-formatted data.
- Authentication errors (401/403): double-check `CENTRAL_MONITORING_USERNAME`, `CENTRAL_MONITORING_PASSWORD`, `CENTRAL_LOKI_USERNAME`, and `CENTRAL_LOKI_PASSWORD` in `.env`.
- No data pushed: ensure the URLs in `.env` have no trailing slash.
- Help: share the Alloy logs (`docker logs genlayer-node-alloy`).