Kubernetes-Native Network Performance Monitoring Service
This project provides a comprehensive solution for continuous network validation within a Kubernetes cluster. Leveraging industry-standard tools like iperf3, Prometheus, and Grafana, it offers proactive monitoring of network performance between nodes, helping to identify and troubleshoot latency, bandwidth, and packet loss issues before they impact applications.
Features
- Continuous N-to-N Testing: Automatically measures network performance between all nodes in the cluster.
- Kubernetes-Native: Deploys as standard Kubernetes workloads (DaemonSet, Deployment).
- Dynamic Discovery: Exporter automatically discovers iperf3 server pods using the Kubernetes API.
- Prometheus Integration: Translates iperf3 results into standard Prometheus metrics for time-series storage.
- Grafana Visualization: Provides a rich, interactive dashboard with heatmaps and time-series graphs.
- Helm Packaging: Packaged as a Helm chart for easy deployment and configuration management.
- Automated CI/CD: Includes a GitHub Actions workflow for building and publishing the exporter image and Helm chart.
Architecture
The service is based on a decoupled architecture:
- iperf3-server DaemonSet: Deploys an iperf3 server pod on every node to act as a test endpoint. The pods run on the host network so that raw node-to-node performance is measured.
- iperf3-exporter Deployment: A centralized service that uses the Kubernetes API to discover server pods, orchestrates iperf3 client tests against them, parses the JSON output, and exposes performance metrics via an HTTP endpoint.
- Prometheus & Grafana Stack: A standard monitoring backend (such as kube-prometheus-stack) that scrapes the exporter's metrics and visualizes them in a custom dashboard.
This separation of concerns ensures scalability, resilience, and aligns with Kubernetes operational principles.
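To see the dynamic discovery at work after deployment, you can list the server pods with the same label selector the exporter uses by default (the monitoring namespace and the release name iperf3-monitor below are assumptions; substitute your own):

  kubectl get pods -n monitoring -o wide \
    -l app.kubernetes.io/name=iperf3-monitor,app.kubernetes.io/instance=iperf3-monitor,app.kubernetes.io/component=server

The -o wide output shows which node each server pod is running on, which is exactly the set of endpoints the exporter iterates over for its N-to-N test matrix.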
Getting Started
Prerequisites
- A running Kubernetes cluster.
- kubectl configured to connect to your cluster.
- Helm v3+ installed.
- A Prometheus instance configured to scrape services (ideally using the Prometheus Operator and ServiceMonitors).
- A Grafana instance accessible and configured with Prometheus as a data source.
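If you are unsure whether the Prometheus Operator prerequisite is met, a quick optional check is to confirm that the ServiceMonitor CRD exists in the cluster:

  kubectl get crd servicemonitors.monitoring.coreos.com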
Installation with Helm
- Add the Helm chart repository (replace with your actual repo URL once published):

  helm repo add iperf3-monitor https://malarinv.github.io/iperf3-monitor/

- Update your Helm repositories:

  helm repo update

- Install the chart:

  # "monitoring" is a suggested namespace; use whichever you prefer
  helm install iperf3-monitor iperf3-monitor/iperf3-monitor \
    --namespace monitoring \
    --create-namespace \
    --values values.yaml   # Optional: use a custom values file

Note: Ensure your Prometheus instance is configured to scrape services in the namespace where you install the chart and that it recognizes ServiceMonitor resources with the label release: prometheus-operator (if using the standard kube-prometheus-stack setup).
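Once the release is up, you can optionally verify that the exporter is serving metrics before wiring up dashboards. A minimal check, assuming the release is named iperf3-monitor in the monitoring namespace and the chart's default service port of 9876 (the actual Service name follows the chart's fullname template, so adjust if you used a different release name):

  kubectl -n monitoring port-forward svc/iperf3-monitor 9876:9876 &
  curl -s http://localhost:9876/metrics | grep iperf_network_

Metrics such as iperf_network_bandwidth_mbps appear after the first test cycle completes (testInterval, 300 seconds by default).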
Configuration
The Helm chart is highly configurable via its values.yaml file. You can override the default settings by creating your own values file and passing it during installation (--values my-values.yaml); an example override file is shown after the default values below.
Refer to the comments in the default values.yaml for a detailed explanation of each parameter:
# Default values for iperf3-monitor.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# -- Override the name of the chart.
nameOverride: ""
# -- Override the fully qualified app name.
fullnameOverride: ""
# Exporter Configuration (`controllers.exporter`)
# The iperf3 exporter is managed under the `controllers.exporter` section,
# leveraging the `bjw-s/common-library` for robust workload management.
controllers:
exporter:
# -- Enable the exporter controller.
enabled: true
# -- Set the controller type for the exporter.
# Valid options are "deployment" or "daemonset".
# Use "daemonset" for N-to-N node monitoring where an exporter runs on each node (or selected nodes).
# Use "deployment" for a centralized exporter (typically with replicaCount: 1).
# @default -- "deployment"
type: deployment
# -- Number of desired exporter pods. Only used if type is "deployment".
# @default -- 1
replicas: 1
# -- Application-specific configuration for the iperf3 exporter.
# These values are used to populate environment variables for the exporter container.
appConfig:
# -- Interval in seconds between complete test cycles (i.e., testing all server nodes).
testInterval: 300
# -- Log level for the iperf3 exporter (e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL).
logLevel: INFO
# -- Timeout in seconds for a single iperf3 test run.
testTimeout: 10
# -- Protocol to use for testing (tcp or udp).
testProtocol: tcp
# -- iperf3 server port to connect to. Should match the server's listening port.
serverPort: "5201"
# -- Label selector to find iperf3 server pods.
# This is templated. Default: 'app.kubernetes.io/name=<chart-name>,app.kubernetes.io/instance=<release-name>,app.kubernetes.io/component=server'
serverLabelSelector: 'app.kubernetes.io/name={{ include "iperf3-monitor.name" . }},app.kubernetes.io/instance={{ .Release.Name }},app.kubernetes.io/component=server'
# -- Pod-level configurations for the exporter.
pod:
# -- Annotations for the exporter pod.
annotations: {}
# -- Labels for the exporter pod (the common library adds its own defaults too).
labels: {}
# -- Node selector for scheduling exporter pods. Useful for DaemonSet or specific scheduling with Deployments.
# Example:
# nodeSelector:
# kubernetes.io/os: linux
nodeSelector: {}
# -- Tolerations for scheduling exporter pods.
# Example:
# tolerations:
# - key: "node-role.kubernetes.io/control-plane"
# operator: "Exists"
# effect: "NoSchedule"
tolerations: []
# -- Affinity rules for scheduling exporter pods.
# Example:
# affinity:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: "kubernetes.io/arch"
# operator: In
# values:
# - amd64
affinity: {}
# -- Security context for the exporter pod.
# securityContext:
# fsGroup: 65534
# runAsUser: 65534
# runAsGroup: 65534
# runAsNonRoot: true
securityContext: {}
# -- Automount service account token for the pod.
automountServiceAccountToken: true
# -- Container-level configurations for the main exporter container.
containers:
exporter: # Name of the primary container
image:
repository: ghcr.io/malarinv/iperf3-monitor
tag: "" # Defaults to .Chart.AppVersion
pullPolicy: IfNotPresent
# -- Custom environment variables for the exporter container.
# These are merged with the ones generated from appConfig.
# env:
# MY_CUSTOM_VAR: "my_value"
env: {}
# -- Ports for the exporter container.
ports:
metrics: # Name of the port
port: 9876 # Container port for metrics
protocol: TCP
enabled: true
# -- CPU and memory resource requests and limits.
# resources:
# requests:
# cpu: "100m"
# memory: "128Mi"
# limits:
# cpu: "500m"
# memory: "256Mi"
resources: {}
# -- Probes configuration for the exporter container.
# probes:
# liveness:
# enabled: true # Example: enable liveness probe
# spec: # Customize probe spec if needed
# initialDelaySeconds: 30
# periodSeconds: 15
# timeoutSeconds: 5
# failureThreshold: 3
probes:
liveness:
enabled: false
readiness:
enabled: false
startup:
enabled: false
server:
# -- Configuration for the iperf3 server container image (DaemonSet).
image:
# -- The container image repository for the iperf3 server.
repository: networkstatic/iperf3
# -- The container image tag for the iperf3 server.
tag: latest
# -- CPU and memory resource requests and limits for the iperf3 server pods (DaemonSet).
# These should be very low as the server is mostly idle.
# @default -- A small default is provided if commented out.
resources: {}
# requests:
# cpu: "50m"
# memory: "64Mi"
# limits:
# cpu: "100m"
# memory: "128Mi"
# -- Node selector for scheduling iperf3 server pods.
# Use this to restrict the DaemonSet to a subset of nodes.
# @default -- {} (schedule on all nodes)
nodeSelector: {}
# -- Tolerations for scheduling iperf3 server pods on tainted nodes (e.g., control-plane nodes).
# This is often necessary to include master nodes in the test mesh.
# @default -- Tolerates control-plane and master taints.
tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
rbac:
# -- If true, create ServiceAccount, ClusterRole, and ClusterRoleBinding for the exporter.
# Set to false if you manage RBAC externally.
create: true
serviceAccount:
# -- The name of the ServiceAccount to use for the exporter pod.
# Only used if rbac.create is false. If not set, it defaults to the chart's fullname.
name: ""
serviceMonitor:
# -- If true, create a ServiceMonitor resource for integration with Prometheus Operator.
# Requires a running Prometheus Operator in the cluster.
enabled: true
# -- Scrape interval for the ServiceMonitor. How often Prometheus scrapes the exporter metrics.
interval: 60s
# -- Scrape timeout for the ServiceMonitor. How long Prometheus waits for metrics response.
scrapeTimeout: 30s
# -- Configuration for the exporter Service.
service:
# -- Service type. ClusterIP is typically sufficient.
type: ClusterIP
# -- Port on which the exporter service is exposed.
port: 9876
# -- Target port on the exporter pod.
targetPort: 9876
# -- Optional configuration for a network policy to allow traffic to the iperf3 server DaemonSet.
# This is often necessary if you are using a network policy controller.
networkPolicy:
# -- If true, create a NetworkPolicy resource.
enabled: false
# -- Specify source selectors if needed (e.g., pods in a specific namespace).
from: []
# -- Specify namespace selectors if needed.
namespaceSelector: {}
# -- Specify pod selectors if needed.
podSelector: {}
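As a concrete illustration, a minimal override file might look like the sketch below (the specific values are examples only): it switches the exporter to a DaemonSet so each node gets its own vantage point, lets it schedule onto control-plane nodes, slows down the test cycle, and relaxes the scrape interval.

  # my-values.yaml -- example overrides (illustrative values only)
  controllers:
    exporter:
      type: daemonset            # one exporter per node instead of a single Deployment
      appConfig:
        testInterval: 600        # run a full test cycle every 10 minutes
        logLevel: DEBUG
      pod:
        tolerations:
          - key: "node-role.kubernetes.io/control-plane"
            operator: "Exists"
            effect: "NoSchedule"

  serviceMonitor:
    interval: 120s

Apply it with:

  helm upgrade --install iperf3-monitor iperf3-monitor/iperf3-monitor \
    --namespace monitoring --values my-values.yaml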
Grafana Dashboard
A custom Grafana dashboard is provided to visualize the collected iperf3 metrics.
- Log in to your Grafana instance.
- Navigate to Dashboards -> Import.
- Paste the full JSON model provided below into the text area and click Load.
- Select your Prometheus data source and click Import.
{
"__inputs": [],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "8.0.0"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"title": "Node-to-Node Bandwidth Heatmap",
"type": "heatmap",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"targets": [
{
"expr": "avg(iperf_network_bandwidth_mbps) by (source_node, destination_node)",
"format": "heatmap",
"legendFormat": "{{source_node}} -> {{destination_node}}",
"refId": "A"
}
],
"cards": { "cardPadding": null, "cardRound": null },
"color": {
"mode": "spectrum",
"scheme": "red-yellow-green",
"exponent": 0.5,
"reverse": false
},
"dataFormat": "tsbuckets",
"yAxis": { "show": true, "format": "short" },
"xAxis": { "show": true }
},
{
"title": "Bandwidth Over Time (Source: $source_node, Dest: $destination_node)",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 9
},
"targets": [
{
"expr": "iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\", destination_node=~\"^$destination_node$\", protocol=~\"^$protocol$\"}",
"legendFormat": "Bandwidth",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "mbps"
}
}
},
{
"title": "Jitter Over Time (Source: $source_node, Dest: $destination_node)",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 9
},
"targets": [
{
"expr": "iperf_network_jitter_ms{source_node=~\"^$source_node$\", destination_node=~\"^$destination_node$\", protocol=\"udp\"}",
"legendFormat": "Jitter",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "ms"
}
}
}
],
"refresh": "30s",
"schemaVersion": 36,
"style": "dark",
"tags": ["iperf3", "network", "kubernetes"],
"templating": {
"list": [
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(iperf_network_bandwidth_mbps, source_node)",
"hide": 0,
"includeAll": false,
"multi": false,
"name": "source_node",
"options": [],
"query": "label_values(iperf_network_bandwidth_mbps, source_node)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\"}, destination_node)",
"hide": 0,
"includeAll": false,
"multi": false,
"name": "destination_node",
"options": [],
"query": "label_values(iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\"}, destination_node)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": { "selected": true, "text": "tcp", "value": "tcp" },
"hide": 0,
"includeAll": false,
"multi": false,
"name": "protocol",
"options": [
{ "selected": true, "text": "tcp", "value": "tcp" },
{ "selected": false, "text": "udp", "value": "udp" }
],
"query": "tcp,udp",
"skipUrlSync": false,
"type": "custom"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Kubernetes iperf3 Network Performance",
"uid": "k8s-iperf3-dashboard",
"version": 1,
"weekStart": ""
}
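If you prefer to provision the dashboard declaratively rather than importing it by hand, one common approach is to ship the JSON in a ConfigMap that the Grafana dashboard sidecar picks up. This assumes a kube-prometheus-stack style setup with the sidecar watching for its default grafana_dashboard label; adjust the label and namespace to match your installation:

  # Save the JSON model above as iperf3-dashboard.json, then:
  kubectl create configmap iperf3-dashboard \
    --namespace monitoring \
    --from-file=iperf3-dashboard.json
  kubectl label configmap iperf3-dashboard \
    --namespace monitoring grafana_dashboard="1"

The sidecar loads dashboards from labeled ConfigMaps automatically, so no manual import step is needed.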
Repository Structure
The project follows a standard structure:
.
├── .github/
│   └── workflows/
│       └── release.yml              # GitHub Actions workflow for CI/CD
├── charts/
│   └── iperf3-monitor/              # The Helm chart for the service
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
│           ├── _helpers.tpl
│           ├── server-daemonset.yaml
│           ├── exporter-deployment.yaml
│           ├── rbac.yaml
│           ├── service.yaml
│           └── servicemonitor.yaml
├── exporter/
│   ├── Dockerfile                   # Dockerfile for the exporter
│   ├── requirements.txt             # Python dependencies
│   └── exporter.py                  # Exporter source code
├── .gitignore                       # Specifies intentionally untracked files
├── LICENSE                          # Project license
└── README.md                        # This file
Development and CI/CD
The project includes a GitHub Actions workflow (.github/workflows/release.yml) triggered on Git tags (v*.*.*) to automate:
- Linting the Helm chart.
- Building and publishing the exporter's Docker image to GitHub Container Registry (ghcr.io).
- Updating the Helm chart version based on the Git tag.
- Packaging and publishing the Helm chart to GitHub Pages.
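For example, cutting a release typically amounts to pushing a tag that matches the trigger pattern (the version number below is illustrative):

  git tag v0.1.0
  git push origin v0.1.0

The workflow then builds and publishes the exporter image to ghcr.io and the packaged chart to GitHub Pages, as described above.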
License
This project is licensed under the GNU Affero General Public License v3. See the LICENSE file for details.