Kubernetes-Native Network Performance Monitoring Service

This project provides a comprehensive solution for continuous network validation within a Kubernetes cluster. Leveraging industry-standard tools like iperf3, Prometheus, and Grafana, it offers proactive monitoring of network performance between nodes, helping to identify and troubleshoot latency, bandwidth, and packet loss issues before they impact applications.

Features

  • Continuous N-to-N Testing: Automatically measures network performance between all nodes in the cluster.
  • Kubernetes-Native: Deploys as standard Kubernetes workloads (DaemonSet, Deployment).
  • Dynamic Discovery: Exporter automatically discovers iperf3 server pods using the Kubernetes API.
  • Prometheus Integration: Translates iperf3 results into standard Prometheus metrics for time-series storage.
  • Grafana Visualization: Provides a rich, interactive dashboard with heatmaps and time-series graphs.
  • Helm Packaging: Packaged as a Helm chart for easy deployment and configuration management.
  • Automated CI/CD: Includes a GitHub Actions workflow for building and publishing the exporter image and Helm chart.

Architecture

The service is based on a decoupled architecture:

  1. iperf3-server DaemonSet: Deploys an iperf3 server pod on every node to act as a test endpoint. The pods run on the host network so that tests measure raw node-to-node performance.
  2. iperf3-exporter Deployment: A centralized service that uses the Kubernetes API to discover server pods, orchestrates iperf3 client tests against them, parses the JSON output, and exposes performance metrics via an HTTP endpoint.
  3. Prometheus & Grafana Stack: A standard monitoring backend (like kube-prometheus-stack) that scrapes the exporter's metrics and visualizes them in a custom dashboard.

This separation of concerns ensures scalability and resilience, and aligns with Kubernetes operational principles.
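
The exporter's test loop can be pictured with a short sketch. The Python below is illustrative only and is not the project's exporter/exporter.py: the app=iperf3-server label selector, the default iperf3 port 5201, and the source-node placeholder are assumptions, while the metric name and port 9876 are taken from the dashboard and chart values shown later in this README.

# Illustrative sketch of the exporter loop; not the actual exporter/exporter.py.
import json
import subprocess
import time

from kubernetes import client, config
from prometheus_client import Gauge, start_http_server

# Metric name reused from the Grafana dashboard shipped with this project.
BANDWIDTH = Gauge(
    "iperf_network_bandwidth_mbps",
    "Measured iperf3 bandwidth in Mbps",
    ["source_node", "destination_node", "protocol"],
)

def discover_servers(v1):
    """Return (node_name, pod_ip) for every iperf3 server pod.

    The label selector is an assumption for this sketch; the chart's
    DaemonSet may label its pods differently.
    """
    pods = v1.list_pod_for_all_namespaces(label_selector="app=iperf3-server")
    return [(p.spec.node_name, p.status.pod_ip) for p in pods.items]

def run_test(target_ip, timeout=10):
    """Run one iperf3 client test against a server pod and parse its JSON output."""
    out = subprocess.run(
        ["iperf3", "-c", target_ip, "-p", "5201", "--json", "-t", "5"],
        capture_output=True, timeout=timeout, check=True,
    )
    return json.loads(out.stdout)

if __name__ == "__main__":
    config.load_incluster_config()        # the exporter runs inside the cluster
    v1 = client.CoreV1Api()
    start_http_server(9876)               # matches service.targetPort in values.yaml
    source_node = "this-node"             # placeholder; typically injected via the Downward API
    while True:
        for node, ip in discover_servers(v1):
            result = run_test(ip)
            mbps = result["end"]["sum_received"]["bits_per_second"] / 1e6
            BANDWIDTH.labels(source_node, node, "tcp").set(mbps)
        time.sleep(300)                   # exporter.testInterval default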

Getting Started

Prerequisites

  • A running Kubernetes cluster.
  • kubectl configured to connect to your cluster.
  • Helm v3+ installed.
  • A Prometheus instance configured to scrape services (ideally using the Prometheus Operator and ServiceMonitors).
  • A Grafana instance accessible and configured with Prometheus as a data source.

Installation with Helm

  1. Add the Helm chart repository (replace with your actual repo URL once published):

    helm repo add iperf3-monitor https://malarinv.github.io/iperf3-monitor/
    
  2. Update your Helm repositories:

    helm repo update
    
  3. Install the chart:

    # Installs into the monitoring namespace (adjust to your preferred namespace).
    # Add --values my-values.yaml to supply a custom values file.
    helm install iperf3-monitor iperf3-monitor/iperf3-monitor \
      --namespace monitoring \
      --create-namespace
    

    Note: Ensure your Prometheus instance is configured to scrape services in the namespace where you install the chart and that it recognizes ServiceMonitor resources with the label release: prometheus-operator (if using the standard kube-prometheus-stack setup).
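
After installation, a quick sanity check helps confirm the pieces are running. The commands below assume the monitoring namespace used above and a Service named iperf3-monitor; the Service name and the metric prefix in the grep are assumptions based on the defaults in this README.

    # Check that the server DaemonSet, exporter Deployment, and ServiceMonitor exist.
    kubectl get daemonsets,deployments,servicemonitors -n monitoring

    # Inspect the raw metrics exposed by the exporter (service port 9876).
    kubectl port-forward -n monitoring svc/iperf3-monitor 9876:9876 &
    curl -s http://localhost:9876/metrics | grep iperf_network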

Configuration

The Helm chart is highly configurable via the values.yaml file. You can override default settings by creating your own values.yaml and passing it during installation (--values my-values.yaml).

Refer to the comments in the default values.yaml for a detailed explanation of each parameter:

# Default values for iperf3-monitor.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# -- Override the name of the chart.
nameOverride: ""

# -- Override the fully qualified app name.
fullnameOverride: ""

exporter:
  # -- Configuration for the exporter container image.
  image:
    # -- The container image repository for the exporter.
    repository: ghcr.io/malarinv/iperf3-monitor
    # -- The container image tag for the exporter. If not set, the chart's appVersion is used.
    tag: ""
    # -- The image pull policy for the exporter container.
    pullPolicy: IfNotPresent

  # -- Number of exporter pod replicas. Typically 1 is sufficient.
  replicaCount: 1

  # -- Interval in seconds between complete test cycles (i.e., testing all server nodes).
  testInterval: 300

  # -- Timeout in seconds for a single iperf3 test run.
  testTimeout: 10

  # -- Protocol to use for testing (tcp or udp).
  testProtocol: tcp

  # -- CPU and memory resource requests and limits for the exporter pod.
  # @default -- {} (no requests or limits are set; a small example is provided commented out below).
  resources: {}
    # requests:
    #   cpu: "100m"
    #   memory: "128Mi"
    # limits:
    #   cpu: "500m"
    #   memory: "256Mi"

server:
  # -- Configuration for the iperf3 server container image (DaemonSet).
  image:
    # -- The container image repository for the iperf3 server.
    repository: networkstatic/iperf3
    # -- The container image tag for the iperf3 server.
    tag: latest

  # -- CPU and memory resource requests and limits for the iperf3 server pods (DaemonSet).
  # These should be very low as the server is mostly idle.
  # @default -- {} (no requests or limits are set; a small example is provided commented out below).
  resources: {}
    # requests:
    #   cpu: "50m"
    #   memory: "64Mi"
    # limits:
    #   cpu: "100m"
    #   memory: "128Mi"

  # -- Node selector for scheduling iperf3 server pods.
  # Use this to restrict the DaemonSet to a subset of nodes.
  # @default -- {} (schedule on all nodes)
  nodeSelector: {}

  # -- Tolerations for scheduling iperf3 server pods on tainted nodes (e.g., control-plane nodes).
  # This is often necessary to include master nodes in the test mesh.
  # @default -- Tolerates control-plane and master taints.
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"

rbac:
  # -- If true, create ServiceAccount, ClusterRole, and ClusterRoleBinding for the exporter.
  # Set to false if you manage RBAC externally.
  create: true

serviceAccount:
  # -- The name of the ServiceAccount to use for the exporter pod.
  # Only used if rbac.create is false. If not set, it defaults to the chart's fullname.
  name: ""

serviceMonitor:
  # -- If true, create a ServiceMonitor resource for integration with Prometheus Operator.
  # Requires a running Prometheus Operator in the cluster.
  enabled: true

  # -- Scrape interval for the ServiceMonitor. How often Prometheus scrapes the exporter metrics.
  interval: 60s

  # -- Scrape timeout for the ServiceMonitor. How long Prometheus waits for metrics response.
  scrapeTimeout: 30s

# -- Configuration for the exporter Service.
service:
  # -- Service type. ClusterIP is typically sufficient.
  type: ClusterIP
  # -- Port on which the exporter service is exposed.
  port: 9876
  # -- Target port on the exporter pod.
  targetPort: 9876

# -- Optional configuration for a network policy to allow traffic to the iperf3 server DaemonSet.
# This is often necessary if you are using a network policy controller.
networkPolicy:
  # -- If true, create a NetworkPolicy resource.
  enabled: false
  # -- Specify source selectors if needed (e.g., pods in a specific namespace).
  from: []
  # -- Specify namespace selectors if needed.
  namespaceSelector: {}
  # -- Specify pod selectors if needed.
  podSelector: {}
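
As an example, a minimal override file might look like the following. The parameter names come from the default values above; the node selector value and the file name my-values.yaml are only illustrative.

# my-values.yaml -- example overrides (illustrative)
exporter:
  testInterval: 120        # run a full test cycle every 2 minutes
  testProtocol: udp        # UDP tests also drive the jitter panel in the dashboard

server:
  nodeSelector:
    kubernetes.io/os: linux   # restrict the iperf3 server DaemonSet to Linux nodes

serviceMonitor:
  interval: 30s              # scrape the exporter more frequently

Apply the overrides with:

    helm upgrade --install iperf3-monitor iperf3-monitor/iperf3-monitor \
      --namespace monitoring \
      --values my-values.yaml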

Grafana Dashboard

A custom Grafana dashboard is provided to visualize the collected iperf3 metrics.

  1. Log in to your Grafana instance.
  2. Navigate to Dashboards -> Import.
  3. Paste the full JSON model provided below into the text area and click Load.
  4. Select your Prometheus data source and click Import.
{
"__inputs": [],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "8.0.0"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"targets": [
{
"expr": "avg(iperf_network_bandwidth_mbps) by (source_node, destination_node)",
"format": "heatmap",
"legendFormat": "{{source_node}} -> {{destination_node}}",
"refId": "A"
}
],
"cards": { "cardPadding": null, "cardRound": null },
"color": {
"mode": "spectrum",
"scheme": "red-yellow-green",
"exponent": 0.5,
"reverse": false
},
"dataFormat": "tsbuckets",
"yAxis": { "show": true, "format": "short" },
"xAxis": { "show": true }
},
{
"title": "Bandwidth Over Time (Source: $source_node, Dest: $destination_node)",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 9
},
"targets": [
{
"expr": "iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\", destination_node=~\"^$destination_node$\", protocol=~\"^$protocol$\"}",
"legendFormat": "Bandwidth",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "mbps"
}
}
},
{
"title": "Jitter Over Time (Source: $source_node, Dest: $destination_node)",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 9
},
"targets": [
{
"expr": "iperf_network_jitter_ms{source_node=~\"^$source_node$\", destination_node=~\"^$destination_node$\", protocol=\"udp\"}",
"legendFormat": "Jitter",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "ms"
}
}
}
],
"refresh": "30s",
"schemaVersion": 36,
"style": "dark",
"tags": ["iperf3", "network", "kubernetes"],
"templating": {
"list": [
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(iperf_network_bandwidth_mbps, source_node)",
"hide": 0,
"includeAll": false,
"multi": false,
"name": "source_node",
"options": [],
"query": "label_values(iperf_network_bandwidth_mbps, source_node)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\"}, destination_node)",
"hide": 0,
"includeAll": false,
"multi": false,
"name": "destination_node",
"options": [],
"query": "label_values(iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\"}, destination_node)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": { "selected": true, "text": "tcp", "value": "tcp" },
"hide": 0,
"includeAll": false,
"multi": false,
"name": "protocol",
"options": [
{ "selected": true, "text": "tcp", "value": "tcp" },
{ "selected": false, "text": "udp", "value": "udp" }
],
"query": "tcp,udp",
"skipUrlSync": false,
"type": "custom"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Kubernetes iperf3 Network Performance",
"uid": "k8s-iperf3-dashboard",
"version": 1,
"weekStart": ""
}
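
The Helm chart also ships this dashboard as a ConfigMap (template charts/iperf3-monitor/templates/grafana-dashboard-configmap.yaml, sourced from charts/iperf3-monitor/grafana/iperf3-dashboard.json) labeled grafana_dashboard: "1". If your Grafana runs a dashboard-provisioning sidecar (as in a typical kube-prometheus-stack install), it should discover and load the dashboard automatically, making the manual import above optional. Assuming the chart was installed in the monitoring namespace, you can confirm the ConfigMap exists with:

    kubectl get configmaps -n monitoring -l grafana_dashboard=1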

Repository Structure

The project follows a standard structure:

.
├── .github/
│   └── workflows/
│       └── release.yml    # GitHub Actions workflow for CI/CD
├── charts/
│   └── iperf3-monitor/    # The Helm chart for the service
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
│           ├── _helpers.tpl
│           ├── server-daemonset.yaml
│           ├── exporter-deployment.yaml
│           ├── rbac.yaml
│           ├── service.yaml
│           └── servicemonitor.yaml
├── exporter/
│   ├── Dockerfile         # Dockerfile for the exporter
│   ├── requirements.txt   # Python dependencies
│   └── exporter.py        # Exporter source code
├── .gitignore             # Specifies intentionally untracked files
├── LICENSE                # Project license
└── README.md              # This file

Development and CI/CD

The project includes a GitHub Actions workflow (.github/workflows/release.yml), triggered on Git tags matching v*.*.*, that automates the following (see the example after this list):

  1. Linting the Helm chart.
  2. Building and publishing the Docker image for the exporter to GitHub Container Registry (ghcr.io).
  3. Updating the Helm chart version based on the Git tag.
  4. Packaging and publishing the Helm chart to GitHub Pages.
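
For example, a release could be cut as follows. The tag value is illustrative; any tag matching v*.*.* triggers the workflow, and the lint command mirrors the check the workflow performs:

    # Lint the chart locally before tagging.
    helm lint charts/iperf3-monitor

    # Tag and push to trigger the release workflow.
    git tag v0.1.0
    git push origin v0.1.0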

License

This project is licensed under the GNU Affero General Public License v3. See the LICENSE file for details.