Monitoring & Telemetry

GenLayer validators expose comprehensive metrics that are ready for consumption by Prometheus and other monitoring tools. This allows you to monitor your validator's performance, health, and resource usage.

Accessing Metrics

The metrics endpoint is exposed on the operations port (default: 9153) configured in your config.yaml:

node:
  ops:
    port: 9153 # Operations (ops) port; serves the metrics endpoint
    endpoints:
      metrics: true # Enable metrics endpoint

Once your node is running, you can access the metrics at:

http://localhost:9153/metrics
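
If you run your own Prometheus server, a minimal scrape job pointing at this endpoint could look like the following sketch (the job name is illustrative; adjust the target to wherever your node is reachable):

scrape_configs:
  - job_name: "genlayer-node" # illustrative job name
    static_configs:
      - targets: ["localhost:9153"] # the ops port from config.yaml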

Available Metrics

The validator exposes various metric collectors that can be individually configured:

  • Node Metrics: Core validator performance metrics including block processing, transaction handling, and consensus participation
  • GenVM Metrics: Virtual machine performance metrics, including execution times and resource usage
  • WebDriver Metrics: Metrics related to web access and external data fetching
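
Once scraped into Prometheus, these metrics can be queried with PromQL. Two illustrative queries, using the genlayer_node_uptime_seconds metric referenced later in this guide (other metric names vary by version):

# Node uptime in hours
genlayer_node_uptime_seconds / 3600

# List everything exposed under the node metric prefix
{__name__=~"genlayer_node_.*"}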

Configuring Metrics Collection

You can customize metrics collection in your config.yaml:

metrics:
  interval: "15s"  # Default collection interval
  collectors:
    node:
      enabled: true
      interval: "30s"  # Override interval for specific collector
    genvm:
      enabled: true
      interval: "20s"
    webdriver:
      enabled: true
      interval: "60s"

Example Metrics Query

To check if metrics are working correctly:

# Get all available metrics
curl http://localhost:9153/metrics
 
# Check specific metric (example)
curl -s http://localhost:9153/metrics | grep genlayer_node_
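
A healthy node returns data in the Prometheus text exposition format, i.e. lines such as the following (the value is illustrative; genlayer_node_uptime_seconds is referenced again below):

genlayer_node_uptime_seconds 12345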

The same ops port also serves /health and /balance endpoints for additional monitoring capabilities.
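
For example, you can probe these endpoints directly (response formats may vary by node version):

# Health check
curl http://localhost:9153/health

# Balance check
curl http://localhost:9153/balance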

Monitoring Best Practices

  1. Set up alerts for critical metrics like node synchronization status and missed blocks (see the example alert rule after this list)
  2. Monitor resource usage to ensure your validator has sufficient CPU, memory, and disk space
  3. Track GenVM performance to optimize LLM provider selection and configuration
  4. Use visualization tools like Grafana to create dashboards for easy monitoring
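
As mentioned in item 1, a minimal Prometheus alerting rule that fires when the validator's metrics endpoint stops responding might look like this sketch (the group name, rule name, job label, and threshold are illustrative):

groups:
  - name: genlayer-validator # illustrative group name
    rules:
      - alert: ValidatorMetricsDown
        expr: up{job="genlayer-node"} == 0 # job label from your scrape config
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "GenLayer validator metrics endpoint is unreachable"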

For production validators, we recommend setting up a complete monitoring stack with Prometheus and Grafana. This enables real-time visibility into your validator's performance and helps identify issues before they impact your validator's operation.
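
As a sketch, such a stack can run next to the node from a separate compose file (image tags and host ports below are illustrative; mount a prometheus.yml containing the scrape job shown earlier):

services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090" # Prometheus UI
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000" # Grafana UI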

Logs and Metrics Forwarding

You can forward your logs and metrics to external systems for centralized monitoring and alerting by using the alloy service provided in the docker-compose.yaml from the extracted node tarball.

Centralized Push to GenLayer Foundation Grafana Cloud (using Alloy)

To contribute your node's metrics and logs to the centralized GenLayer Foundation Grafana Cloud dashboard (improving aggregate network visibility, alerts, and community monitoring), use the built-in Alloy service.

Why contribute?

  • Helps the Foundation and community track overall testnet health (validator participation, latency, resource usage).
  • May positively influence testnet points/rewards (visible healthy nodes are prioritized).
  • Setup takes 10–15 minutes once credentials are provided.

Prerequisites

  • Metrics enabled in config.yaml (endpoints.metrics: true — default in recent versions).
  • Ops port 9153 exposed in docker-compose (ports: - "9153:9153").
  • Credentials from the Foundation team (ask in #testnet-asimov):
    • CENTRAL_MONITORING_URL — Prometheus remote write URL
    • CENTRAL_LOKI_URL — Loki push URL
    • CENTRAL_MONITORING_USERNAME / CENTRAL_MONITORING_PASSWORD — Metrics (Prometheus) credentials
    • CENTRAL_LOKI_USERNAME / CENTRAL_LOKI_PASSWORD — Logs (Loki) credentials

Steps

  1. Create or update .env (next to your docker-compose.yaml):
# Central monitoring server endpoints for GenLayer Foundation
CENTRAL_MONITORING_URL=https://prometheus-prod-66-prod-us-east-3.grafana.net/api/prom/push
CENTRAL_LOKI_URL=https://logs-prod-042.grafana.net/loki/api/v1/push

# Authentication for central monitoring
# Metrics (Prometheus) credentials
CENTRAL_MONITORING_USERNAME=your-metrics-username
CENTRAL_MONITORING_PASSWORD=your-metrics-password
# Logs (Loki) credentials
CENTRAL_LOKI_USERNAME=your-logs-username
CENTRAL_LOKI_PASSWORD=your-logs-password

# Node identification
NODE_ID=validator-001
VALIDATOR_NAME=MyValidator

# Usually defaults are fine
NODE_METRICS_ENDPOINT=host.docker.internal:9153
LOG_FILE_PATTERN=/var/log/genlayer/node*.log
METRICS_SCRAPE_INTERVAL=15s
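
If one Alloy instance should scrape several nodes, set SCRAPE_TARGETS_JSON instead; it overrides the single-node variables above. The addresses and names here are illustrative, following the format shown in the Alloy config further down:

# Multi-node mode
SCRAPE_TARGETS_JSON=[{"__address__":"host.docker.internal:9153","instance":"validator-001","validator_name":"node-1"},{"__address__":"host.docker.internal:9250","instance":"validator-002","validator_name":"node-2"}]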
  2. Add or verify the Alloy service in docker-compose.yaml (copy it in if missing):
# Grafana Alloy for both logs and metrics forwarding
# Supports both single node and multi-node configurations
#
# Single Node Mode:
#   Set NODE_ID, VALIDATOR_NAME, NODE_METRICS_ENDPOINT in .env
#   docker compose --profile monitoring up -d
#
# Multi-Node Mode:
#   Set SCRAPE_TARGETS_JSON in .env
#   docker compose --profile monitoring up -d
alloy:
  image: grafana/alloy:v1.12.0
  container_name: genlayer-node-alloy
  command:
    - run
    - /etc/alloy/config.river
    - --server.http.listen-addr=0.0.0.0:12345
    - --storage.path=/var/lib/alloy/data
  volumes:
    - ./alloy-config.river:/etc/alloy/config.river:ro
    - ${NODE_LOGS_PATH:-./data/node/logs}:/var/log/genlayer:ro
    - alloy_data:/var/lib/alloy
  environment:
    # Central monitoring endpoints
    - CENTRAL_LOKI_URL=${CENTRAL_LOKI_URL:-https://logs-prod-042.grafana.net/loki/api/v1/push}
    - CENTRAL_MONITORING_URL=${CENTRAL_MONITORING_URL:-https://prometheus-prod-66-prod-us-east-3.grafana.net/api/prom/push}
 
    # Metrics (Prometheus) authentication
    - CENTRAL_MONITORING_USERNAME=${CENTRAL_MONITORING_USERNAME:-telemetric}
    - CENTRAL_MONITORING_PASSWORD=${CENTRAL_MONITORING_PASSWORD:-12345678}
 
    # Logs (Loki) authentication
    - CENTRAL_LOKI_USERNAME=${CENTRAL_LOKI_USERNAME:-telemetric}
    - CENTRAL_LOKI_PASSWORD=${CENTRAL_LOKI_PASSWORD:-12345678}
 
    # Single node configuration
    - NODE_ID=${NODE_ID:-local}
    - VALIDATOR_NAME=${VALIDATOR_NAME:-default}
    - NODE_METRICS_ENDPOINT=${NODE_METRICS_ENDPOINT:-host.docker.internal:9153}
 
    # Multi-node configuration
    # When set, overrides single node config above
    - SCRAPE_TARGETS_JSON=${SCRAPE_TARGETS_JSON:-}
 
    # Scraping configuration
    - METRICS_SCRAPE_INTERVAL=${METRICS_SCRAPE_INTERVAL:-15s}
    - METRICS_SCRAPE_TIMEOUT=${METRICS_SCRAPE_TIMEOUT:-10s}
    - ALLOY_SELF_MONITORING_INTERVAL=${ALLOY_SELF_MONITORING_INTERVAL:-60s}
 
    # Log collection configuration
    - LOG_FILE_PATTERN=${LOG_FILE_PATTERN:-/var/log/genlayer/node*.log}
 
    # Log batching configuration
    - LOKI_BATCH_SIZE=${LOKI_BATCH_SIZE:-1MiB}
    - LOKI_BATCH_WAIT=${LOKI_BATCH_WAIT:-1s}
  ports:
    - "12345:12345"  # Alloy UI for debugging
  restart: unless-stopped
  profiles:
    - monitoring
  extra_hosts:
    - "host.docker.internal:host-gateway"
 
volumes:
  alloy_data:
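
Before starting, you can optionally render the merged configuration to confirm that the service and its environment variables resolve as expected:

docker compose --profile monitoring config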
  3. Create or update ./alloy-config.river (use the provided version; it handles both logs and metrics forwarding):
// Grafana Alloy Configuration for GenLayer Node Telemetry
// Handles both log collection and metrics forwarding

// ==========================================
// Log Collection and Forwarding
// ==========================================

// Discovery component to find log files using local.file_match
// Supports different log file patterns:
// - Single node: "/var/log/genlayer/node.log"
// - Multi-node: "/var/log/genlayer/*/logs/node.log" (each node in subdirectory)
// - Custom pattern via LOG_FILE_PATTERN env var
local.file_match "genlayer_logs" {
  path_targets = [{
    __path__ = coalesce(sys.env("LOG_FILE_PATTERN"), "/var/log/genlayer/node*.log"),
  }]
}

// Relabel to add metadata labels to log entries
discovery.relabel "add_labels" {
  targets = local.file_match.genlayer_logs.targets

  // Add instance label from environment variable
  rule {
    target_label = "instance"
    replacement  = sys.env("NODE_ID")
  }

  // Add validator_name label from environment variable
  rule {
    target_label = "validator_name"
    replacement  = sys.env("VALIDATOR_NAME")
  }

  // Add component label
  rule {
    target_label = "component"
    replacement  = "alloy"
  }

  // Add job label
  rule {
    target_label = "job"
    replacement  = "genlayer-node"
  }
}

// Source component to read log files
loki.source.file "genlayer" {
  targets    = discovery.relabel.add_labels.output
  forward_to = [loki.write.central.receiver]

  // Tail from end to avoid ingesting entire log history on startup
  tail_from_end = true
}

// Write logs to central Loki instance
loki.write "central" {
  endpoint {
    url = sys.env("CENTRAL_LOKI_URL")

    // HTTP Basic Authentication
    basic_auth {
      username = sys.env("CENTRAL_LOKI_USERNAME")
      password = sys.env("CENTRAL_LOKI_PASSWORD")
    }

    // Configurable batch settings for efficient log sending
    batch_size = coalesce(sys.env("LOKI_BATCH_SIZE"), "1MiB")
    batch_wait = coalesce(sys.env("LOKI_BATCH_WAIT"), "1s")
  }
}

// ==========================================
// Prometheus Metrics Collection and Forwarding
// ==========================================

// Scrape metrics from GenLayer node(s)
// Supports both single node and multi-node configurations
//
// Single Node Mode:
//   Set NODE_METRICS_ENDPOINT, NODE_ID, VALIDATOR_NAME
//
// Multi-Node Mode:
//   Set SCRAPE_TARGETS_JSON with JSON array of target objects
//   Example: [{"__address__":"host.docker.internal:9250","instance":"0x...","validator_name":"node-1"}]
prometheus.scrape "genlayer_node" {
  // Dynamic targets based on environment variable
  // If SCRAPE_TARGETS_JSON is set, use it (multi-node mode)
  // Otherwise, build single target from individual env vars (single node mode)
  targets = encoding.from_json(coalesce(sys.env("SCRAPE_TARGETS_JSON"), string.format("[{\"__address__\":\"%s\",\"instance\":\"%s\",\"validator_name\":\"%s\"}]", coalesce(sys.env("NODE_METRICS_ENDPOINT"), "host.docker.internal:9153"), coalesce(sys.env("NODE_ID"), "local"), coalesce(sys.env("VALIDATOR_NAME"), "default"))))

  forward_to = [prometheus.relabel.metrics.receiver]

  // Configurable scrape intervals
  scrape_interval = coalesce(sys.env("METRICS_SCRAPE_INTERVAL"), "15s")
  scrape_timeout  = coalesce(sys.env("METRICS_SCRAPE_TIMEOUT"), "10s")
}

// Relabel metrics to filter before forwarding
prometheus.relabel "metrics" {
  forward_to = [prometheus.remote_write.central.receiver]

  // Option 1: Forward all metrics (default)
  // Currently forwarding all metrics from the node.

  // Option 2: Only keep genlayer_node_* metrics to reduce bandwidth (recommended)
  // To enable filtering and reduce bandwidth, uncomment the following rule:
  /*
  rule {
    source_labels = ["__name__"]
    regex        = "genlayer_node_.*"
    action       = "keep"
  }
  */
}

// Remote write configuration for sending metrics to central Prometheus
prometheus.remote_write "central" {
  endpoint {
    url = sys.env("CENTRAL_MONITORING_URL")

    // HTTP Basic Authentication
    basic_auth {
      username = sys.env("CENTRAL_MONITORING_USERNAME")
      password = sys.env("CENTRAL_MONITORING_PASSWORD")
    }

    // Queue configuration for reliability
    queue_config {
      capacity          = 10000
      max_shards        = 5
      max_samples_per_send = 500
      batch_send_deadline = "15s"
    }
  }
}

// ==========================================
// Alloy Self-Monitoring
// ==========================================

// Alloy internal exporter for health monitoring
prometheus.exporter.self "alloy" {}

// Expose Alloy's own metrics on the HTTP server
prometheus.scrape "alloy" {
  targets    = prometheus.exporter.self.alloy.targets
  forward_to = []  // Not forwarding Alloy metrics to reduce noise

  // Configurable scrape interval for Alloy's internal health monitoring
  scrape_interval = coalesce(sys.env("ALLOY_SELF_MONITORING_INTERVAL"), "60s")
}
  4. Start Alloy:
docker compose --profile monitoring up -d
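
To confirm the container is up (Alloy's debugging UI on port 12345, exposed in the compose file above, is also handy here):

docker compose ps alloy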
  5. Verify that it works:
docker logs genlayer-node-alloy | grep "sent batch"
docker logs genlayer-node-alloy | grep "remote_write"

Look for messages indicating successful batch sending (no error codes like 401, 403, 500).

  • In the Foundation Grafana Cloud, search for metrics with the labels instance="${NODE_ID}" or validator_name="${VALIDATOR_NAME}" (example: genlayer_node_uptime_seconds{instance="0xYourID"}).

Troubleshooting

  • No local metrics: run

curl http://localhost:9153/metrics

    It should return Prometheus-formatted data.
  • Authentication errors (401/403): double-check CENTRAL_MONITORING_USERNAME, CENTRAL_MONITORING_PASSWORD, CENTRAL_LOKI_USERNAME, and CENTRAL_LOKI_PASSWORD in .env.
  • No data pushed: ensure the URLs in .env have no trailing slash.
  • Help: share your Alloy logs:

docker logs genlayer-node-alloy