diff --git a/charts/iperf3-monitor/values.yaml b/charts/iperf3-monitor/values.yaml index 0aec506..d2de952 100644 --- a/charts/iperf3-monitor/values.yaml +++ b/charts/iperf3-monitor/values.yaml @@ -85,7 +85,7 @@ rbac: serviceAccount: # -- The name of the ServiceAccount to use for the exporter pod. # Only used if rbac.create is false. If not set, it defaults to the chart's fullname. - name: "" + name: "iperf3-monitor" serviceMonitor: # -- If true, create a ServiceMonitor resource for integration with Prometheus Operator. diff --git a/docs/DESIGN.md b/docs/DESIGN.md new file mode 100644 index 0000000..0b4c1c4 --- /dev/null +++ b/docs/DESIGN.md @@ -0,0 +1,1418 @@ +# **Architecting a Kubernetes-Native Network Performance Monitoring Service with iperf3, Prometheus, and Helm** + +## **Section 1: Architectural Blueprint for Continuous Network Validation** + +### **1.1 Introduction to Proactive Network Monitoring in Kubernetes** {#introduction-to-proactive-network-monitoring-in-kubernetes} + +In modern cloud-native infrastructures, Kubernetes has emerged as the de +facto standard for container orchestration, simplifying the deployment, +scaling, and management of complex applications.^1^ However, the very +dynamism and abstraction that make Kubernetes powerful also introduce +significant challenges in diagnosing network performance issues. The +ephemeral nature of pods, the complexity of overlay networks provided by +Container Network Interfaces (CNIs), and the multi-layered traffic +routing through Services and Ingress controllers can obscure the root +causes of latency, packet loss, and throughput degradation. + +Traditional, reactive troubleshooting---investigating network problems +only after an application has failed---is insufficient in these +environments. 
Performance bottlenecks can be subtle, intermittent, and +difficult to reproduce, often manifesting as degraded user experience +long before they trigger hard failures.^1^ To maintain the reliability +and performance of critical workloads, engineering teams must shift from +a reactive to a proactive stance. This requires a system that performs +continuous, automated validation of the underlying network fabric, +treating network health not as an assumption but as a measurable, +time-series metric. + +This document outlines the architecture and implementation of a +comprehensive, Kubernetes-native network performance monitoring service. +The solution leverages a suite of industry-standard, open-source tools +to provide continuous, actionable insights into cluster network health. +The core components are: + +- **iperf3:** A widely adopted tool for active network performance + > measurement, used to generate traffic and measure maximum achievable + > bandwidth, jitter, and packet loss between two points.^2^ + +- **Prometheus:** A powerful, open-source monitoring and alerting system + > that has become the standard for collecting and storing time-series + > metrics in the Kubernetes ecosystem.^3^ + +- **Grafana:** A leading visualization tool for creating rich, + > interactive dashboards from various data sources, including + > Prometheus, enabling intuitive analysis of complex datasets.^4^ + +By combining these components into a cohesive, automated service, we can +transform abstract network performance into a concrete, queryable, and +visualizable stream of data, enabling teams to detect and address +infrastructure-level issues before they impact end-users.^6^ + +### **1.2 The Core Architectural Pattern: Decoupled Test Endpoints and a Central Orchestrator** {#the-core-architectural-pattern-decoupled-test-endpoints-and-a-central-orchestrator} + +The foundation of this monitoring service is a robust, decoupled +architectural pattern designed for scalability and 
resilience within a +dynamic Kubernetes environment. The design separates the passive test +endpoints from the active test orchestrator, a critical distinction that +ensures the system is both efficient and aligned with Kubernetes +operational principles. + +The data flow and component interaction can be visualized as follows: + +1. A **DaemonSet** deploys an iperf3 server pod onto every node in the + > cluster, creating a mesh of passive test targets. + +2. A central **Deployment**, the iperf3-exporter, uses the Kubernetes + > API to discover the IP addresses of all iperf3 server pods. + +3. The iperf3-exporter periodically orchestrates tests, running an + > iperf3 client to connect to each server pod and measure network + > performance. + +4. The exporter parses the JSON output from iperf3, transforms the + > results into Prometheus metrics, and exposes them on a /metrics + > HTTP endpoint. + +5. A **Prometheus** server, configured via a **ServiceMonitor**, + > scrapes the /metrics endpoint of the exporter, ingesting the + > performance data into its time-series database. + +6. A **Grafana** instance, using Prometheus as a data source, + > visualizes the metrics in a purpose-built dashboard, providing + > heatmaps and time-series graphs of node-to-node bandwidth, jitter, + > and packet loss. + +This architecture is composed of three primary logical components: + +- **Component 1: The iperf3-server DaemonSet.** To accurately measure + > network performance between any two nodes (N-to-N), an iperf3 server + > process must be running and accessible on every node. The DaemonSet + > is the canonical Kubernetes controller for this exact use case. It + > guarantees that a copy of a specific pod runs on all, or a selected + > subset of, nodes within the cluster.^7^ When a new node joins the + > cluster, the + > DaemonSet controller automatically deploys an iperf3-server pod to + > it; conversely, when a node is removed, the pod is garbage + > collected. 
This ensures the mesh of test endpoints is always in sync + > with the state of the cluster, requiring zero manual + > intervention.^9^ This pattern of using a + > DaemonSet to deploy iperf3 across a cluster is a well-established + > practice for network validation.^11^ + +- **Component 2: The iperf3-exporter Deployment.** A separate, + > centralized component is required to act as the test orchestrator. + > This component is responsible for initiating the iperf3 client + > connections, executing the tests, parsing the results, and exposing + > them as Prometheus metrics. Since this is a stateless service whose + > primary function is to perform a periodic task, a Deployment is the + > ideal controller.^8^ A + > Deployment ensures a specified number of replicas are running, + > provides mechanisms for rolling updates, and allows for independent + > resource management and lifecycle control, decoupled from the + > iperf3-server pods it tests against.^10^ + +- **Component 3: The Prometheus & Grafana Stack.** The monitoring + > backend is provided by the kube-prometheus-stack, a comprehensive + > Helm chart that deploys Prometheus, Grafana, Alertmanager, and the + > necessary exporters for cluster monitoring.^4^ Our custom monitoring + > service is designed to integrate seamlessly with this stack, + > leveraging its Prometheus Operator for automatic scrape + > configuration and its Grafana instance for visualization. + +### **1.3 Architectural Justification and Design Rationale** {#architectural-justification-and-design-rationale} + +The primary strength of this architecture lies in its deliberate +separation of concerns, a design choice that yields significant benefits +in resilience, scalability, and operational efficiency. The DaemonSet is +responsible for the *presence* of test endpoints, while the Deployment +handles the *orchestration* of the tests. 
This decoupling is not +arbitrary; it is a direct consequence of applying Kubernetes-native +principles to the problem. + +The logical progression is as follows: The requirement to continuously +measure N-to-N node bandwidth necessitates that iperf3 server processes +are available on all N nodes to act as targets. The most reliable, +self-healing, and automated method to achieve this \"one-pod-per-node\" +pattern in Kubernetes is to use a DaemonSet.^7^ This makes the server +deployment automatically scale with the cluster itself. Next, a process +is needed to trigger the tests against these servers. This +\"orchestrator\" is a logically distinct, active service. It needs to be +reliable and potentially scalable, but it does not need to run on every +single node. The standard Kubernetes object for managing such stateless +services is a + +Deployment.^8^ + +This separation allows for independent and appropriate resource +allocation. The iperf3-server pods are extremely lightweight, consuming +minimal resources while idle. The iperf3-exporter, however, may be more +CPU-intensive during the brief periods it is actively running tests. By +placing them in different workload objects (DaemonSet and Deployment), +we can configure their resource requests and limits independently. This +prevents the monitoring workload from interfering with or being starved +by application workloads, a crucial consideration for any +production-grade system. This design is fundamentally more robust and +scalable than simpler, monolithic approaches, such as a single script +that attempts to manage both server and client lifecycles.^12^ + +## **Section 2: Implementing the iperf3-prometheus-exporter** + +The heart of this monitoring solution is the iperf3-prometheus-exporter, +a custom application responsible for orchestrating the network tests and +translating their results into a format that Prometheus can ingest. 
This +section provides a detailed breakdown of its implementation, from +technology selection to the final container image. + +### **2.1 Technology Selection: Python for Agility and Ecosystem** {#technology-selection-python-for-agility-and-ecosystem} + +Python was selected as the implementation language for the exporter due +to its powerful ecosystem and rapid development capabilities. The +availability of mature, well-maintained libraries for interacting with +both Prometheus and Kubernetes significantly accelerates the development +of a robust, cloud-native application. + +The key libraries leveraged are: + +- **prometheus-client:** The official Python client library for + > instrumenting applications with Prometheus metrics. It provides a + > simple API for defining metrics (Gauges, Counters, etc.) and + > exposing them via an HTTP server, handling much of the boilerplate + > required for creating a valid exporter.^13^ + +- **iperf3-python:** A clean, high-level Python wrapper around the + > iperf3 C library. It allows for programmatic control of iperf3 + > clients and servers, and it can directly parse the JSON output of a + > test into a convenient Python object, eliminating the need for + > manual process management and output parsing.^15^ + +- **kubernetes:** The official Python client library for the Kubernetes + > API. This library is essential for the exporter to become + > \"Kubernetes-aware,\" enabling it to dynamically discover the + > iperf3-server pods it needs to test against by querying the API + > server directly. + +### **2.2 Core Exporter Logic (Annotated Python Code)** {#core-exporter-logic-annotated-python-code} + +The exporter\'s logic can be broken down into five distinct steps, which +together form a continuous loop of discovery, testing, and reporting. + +#### **Step 1: Initialization and Metric Definition** + +The application begins by importing the necessary libraries and defining +the Prometheus metrics that will be exposed. 
We use a Gauge metric, as +bandwidth is a value that can go up or down. Labels are crucial for +providing context; they allow us to filter and group the data in +Prometheus and Grafana. + +> Python + +import os +import time +import logging +from kubernetes import client, config +from prometheus_client import start_http_server, Gauge +import iperf3 + +\# \-\-- Configuration \-\-- +\# Configure logging +logging.basicConfig(level=logging.INFO, format=\'%(asctime)s - %(levelname)s - %(message)s\') + +\# \-\-- Prometheus Metrics Definition \-\-- +IPERF_BANDWIDTH_MBPS = Gauge( +\'iperf_network_bandwidth_mbps\', +\'Network bandwidth measured by iperf3 in Megabits per second\', +\[\'source_node\', \'destination_node\', \'protocol\'\] +) +IPERF_JITTER_MS = Gauge( +\'iperf_network_jitter_ms\', +\'Network jitter measured by iperf3 in milliseconds\', +\[\'source_node\', \'destination_node\', \'protocol\'\] +) +IPERF_PACKETS_TOTAL = Gauge( +\'iperf_network_packets_total\', +\'Total packets transmitted or received during the iperf3 test\', +\[\'source_node\', \'destination_node\', \'protocol\'\] +) +IPERF_LOST_PACKETS = Gauge( +\'iperf_network_lost_packets_total\', +\'Total lost packets during the iperf3 UDP test\', +\[\'source_node\', \'destination_node\', \'protocol\'\] +) +IPERF_TEST_SUCCESS = Gauge( +\'iperf_test_success\', +\'Indicates if the iperf3 test was successful (1) or failed (0)\', +\[\'source_node\', \'destination_node\', \'protocol\'\] +) + +#### **Step 2: Kubernetes-Aware Target Discovery** + +A static list of test targets is an anti-pattern in a dynamic +environment like Kubernetes.^16^ The exporter must dynamically discover +its targets. This is achieved by using the Kubernetes Python client to +query the API server for all pods that match the label selector of our +iperf3-server DaemonSet (e.g., app=iperf3-server). The function returns +a list of dictionaries, each containing the pod\'s IP address and the +name of the node it is running on.
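
The filtering that the discovery step applies can be exercised without a cluster. The following is a minimal sketch, assuming the pod objects have already been reduced to plain dictionaries; the function name `to_targets` and the record keys `phase`, `pod_ip`, and `node_name` are hypothetical stand-ins for the corresponding fields of the Kubernetes client's pod objects, not part of the exporter itself.

```python
def to_targets(pods):
    """Keep only running pods that already have an IP address."""
    targets = []
    for pod in pods:
        # Pods that are still Pending (or have no IP yet) cannot be tested.
        if pod.get("pod_ip") and pod.get("phase") == "Running":
            targets.append({"ip": pod["pod_ip"], "node_name": pod["node_name"]})
    return targets


# Illustrative records only; the addresses and node names are made up.
sample_pods = [
    {"phase": "Running", "pod_ip": "10.0.0.11", "node_name": "node-a"},
    {"phase": "Pending", "pod_ip": None, "node_name": "node-b"},
    {"phase": "Running", "pod_ip": "10.0.0.13", "node_name": "node-c"},
]
targets = to_targets(sample_pods)
```

Keeping this step as a pure function makes it easy to unit-test separately from the API call that produces the pod list.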
+ +This dynamic discovery is what transforms the exporter from a simple +script into a resilient, automated service. It adapts to cluster scaling +events without any manual intervention. The logical path is clear: +Kubernetes clusters are dynamic, so a hardcoded list of IPs would quickly +become stale. The API server is the single source of truth for the +cluster\'s state. Therefore, the exporter must query this API, which in +turn necessitates including the Kubernetes client library and +configuring the appropriate Role-Based Access Control (RBAC) permissions +for its ServiceAccount. + +> Python + +def discover_iperf_servers(): +\"\"\" +Discover iperf3 server pods in the cluster using the Kubernetes API. +Returns a list of dicts with the keys \'ip\' and \'node_name\'. +\"\"\" +try: +\# Load in-cluster configuration +config.load_incluster_config() +v1 = client.CoreV1Api() + +namespace = os.getenv(\'IPERF_SERVER_NAMESPACE\', \'default\') +label_selector = os.getenv(\'IPERF_SERVER_LABEL_SELECTOR\', +\'app=iperf3-server\') + +logging.info(f\"Discovering iperf3 servers with label \'{label_selector}\' in namespace \'{namespace}\'\") + +ret = v1.list_pod_for_all_namespaces(label_selector=label_selector, +watch=False) + +servers = \[\] +for i in ret.items: +\# Ensure pod has an IP and is running +if i.status.pod_ip and i.status.phase == \'Running\': +servers.append({ +\'ip\': i.status.pod_ip, +\'node_name\': i.spec.node_name +}) +logging.info(f\"Discovered {len(servers)} iperf3 server pods.\") +return servers +except Exception as e: +logging.error(f\"Error discovering iperf servers: {e}\") +\# Return an empty list so the caller can still iterate safely +return \[\] + +#### **Step 3: The Test Orchestration Loop** + +The main function of the application contains an infinite while True +loop that orchestrates the entire process. It periodically discovers the +servers, creates a list of test pairs (node-to-node), and then executes +an iperf3 test for each pair.
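
The body of one test cycle can be sketched in isolation from iperf3 and the Kubernetes API. This is a minimal illustration under stated assumptions: `run_cycle` and its `run_test` callback are hypothetical names introduced here so that the loop logic can be checked with a stub instead of a real network test.

```python
def run_cycle(servers, source_node, run_test):
    """Run one test per remote server and return the nodes that were tested."""
    tested = []
    for server in servers:
        if server["node_name"] == source_node:
            continue  # a node is never tested against itself
        run_test(server["ip"], server["node_name"])
        tested.append(server["node_name"])
    return tested


# Stub callback that records its arguments instead of running iperf3.
calls = []
servers = [
    {"ip": "10.0.0.11", "node_name": "node-a"},
    {"ip": "10.0.0.12", "node_name": "node-b"},
]
tested = run_cycle(servers, "node-a", lambda ip, node: calls.append((ip, node)))
```

In this sketch each cycle covers the paths that start at the exporter's own node and end at every other discovered server.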
+ +> Python + +def run_iperf_test(server_ip, server_port, protocol, source_node, +dest_node): +\"\"\" +Runs a single iperf3 test and updates Prometheus metrics. +\"\"\" +logging.info(f\"Running iperf3 test from {source_node} to {dest_node} ({server_ip}:{server_port}) using {protocol.upper()}\") + +\# Named iperf_client to avoid shadowing the imported kubernetes client module +iperf_client = iperf3.Client() +iperf_client.server_hostname = server_ip +iperf_client.port = server_port +iperf_client.protocol = protocol +iperf_client.duration = int(os.getenv(\'IPERF_TEST_DURATION\', 5)) +iperf_client.json_output = True \# Critical for parsing + +result = iperf_client.run() + +\# Parse results and update metrics +parse_and_publish_metrics(result, source_node, dest_node, protocol) + +def main_loop(): +\"\"\" +Main orchestration loop. +\"\"\" +test_interval = int(os.getenv(\'IPERF_TEST_INTERVAL\', 300)) +server_port = int(os.getenv(\'IPERF_SERVER_PORT\', 5201)) +protocol = os.getenv(\'IPERF_TEST_PROTOCOL\', \'tcp\').lower() +\# Injected via the Downward API +source_node_name = os.getenv(\'SOURCE_NODE_NAME\') + +if not source_node_name: +logging.error(\"SOURCE_NODE_NAME environment variable not set. Exiting.\") +return + +while True: +servers = discover_iperf_servers() + +for server in servers: +\# Avoid testing a node against itself +if server\[\'node_name\'\] == source_node_name: +continue + +run_iperf_test(server\[\'ip\'\], server_port, protocol, +source_node_name, server\[\'node_name\'\]) + +logging.info(f\"Completed test cycle. Sleeping for {test_interval} seconds.\") +time.sleep(test_interval) + +#### **Step 4: Parsing and Publishing Metrics** + +After each test run, a dedicated function parses the JSON result object +provided by the iperf3-python library.^15^ It extracts the key +performance indicators and uses them to set the value of the +corresponding Prometheus Gauge, applying the correct labels for source +and destination nodes. Robust error handling ensures that failed tests +are also recorded as a metric, which is vital for alerting.
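
The mapping from raw iperf3 output to gauge values can also be sketched without the wrapper library. This is a hedged illustration, not the exporter's implementation: the function name `summarize` is hypothetical, and the key names (`end`, `sum_sent`, `sum`, `bits_per_second`, `jitter_ms`, `lost_packets`, `packets`, `error`) are assumed from the JSON layout that `iperf3 --json` typically emits and should be checked against the iperf3 version in use.

```python
import json


def summarize(raw_json, protocol):
    """Reduce iperf3 JSON text to the values the gauges need."""
    data = json.loads(raw_json)
    if "error" in data:
        # A failed run is reported as success=0 with no measurements.
        return {"success": 0}
    end = data["end"]
    if protocol == "udp":
        udp = end["sum"]
        return {
            "success": 1,
            "bandwidth_mbps": udp["bits_per_second"] / 1e6,
            "jitter_ms": udp["jitter_ms"],
            "lost_packets": udp["lost_packets"],
            "packets": udp["packets"],
        }
    # TCP: report the sender-side summary, converted from bit/s to Mbit/s.
    return {"success": 1, "bandwidth_mbps": end["sum_sent"]["bits_per_second"] / 1e6}


# Made-up sample payloads for illustration only.
tcp_sample = json.dumps({"end": {"sum_sent": {"bits_per_second": 940_000_000.0}}})
failed_sample = json.dumps({"error": "unable to connect to server"})
tcp_summary = summarize(tcp_sample, "tcp")
failed_summary = summarize(failed_sample, "tcp")
```

Recording the failed case as an explicit value, rather than skipping it, is what allows an alert to fire on a path that stops responding.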
+ +> Python + +def parse_and_publish_metrics(result, source_node, dest_node, +protocol): +\"\"\" +Parses the iperf3 result and updates Prometheus gauges. +\"\"\" +labels = {\'source_node\': source_node, \'destination_node\': dest_node, +\'protocol\': protocol} + +if result.error: +logging.error(f\"Test from {source_node} to {dest_node} failed: {result.error}\") +IPERF_TEST_SUCCESS.labels(\*\*labels).set(0) +\# Clear previous successful metrics for this path +IPERF_BANDWIDTH_MBPS.labels(\*\*labels).set(0) +IPERF_JITTER_MS.labels(\*\*labels).set(0) +return + +IPERF_TEST_SUCCESS.labels(\*\*labels).set(1) + +\# The summary data is in result.sent_Mbps or result.received_Mbps, +\# depending on direction. +\# For simplicity, we check for available attributes. +if hasattr(result, \'sent_Mbps\'): +bandwidth_mbps = result.sent_Mbps +elif hasattr(result, \'received_Mbps\'): +bandwidth_mbps = result.received_Mbps +else: +\# Fallback for different iperf3 versions/outputs +bandwidth_mbps = result.Mbps if hasattr(result, \'Mbps\') else 0 + +IPERF_BANDWIDTH_MBPS.labels(\*\*labels).set(bandwidth_mbps) + +if protocol == \'udp\': +IPERF_JITTER_MS.labels(\*\*labels).set(result.jitter_ms if +hasattr(result, \'jitter_ms\') else 0) +IPERF_PACKETS_TOTAL.labels(\*\*labels).set(result.packets if +hasattr(result, \'packets\') else 0) +IPERF_LOST_PACKETS.labels(\*\*labels).set(result.lost_packets if +hasattr(result, \'lost_packets\') else 0) + +#### **Step 5: Exposing the /metrics Endpoint** + +Finally, the main execution block starts a simple HTTP server using the +prometheus-client library.
This server exposes the collected metrics on +the standard /metrics path, ready to be scraped by Prometheus.^13^ + +> Python + +if \_\_name\_\_ == \'\_\_main\_\_\': +\# Start the Prometheus metrics server +listen_port = int(os.getenv(\'LISTEN_PORT\', 9876)) +start_http_server(listen_port) +logging.info(f\"Prometheus exporter listening on port {listen_port}\") + +\# Start the main orchestration loop +main_loop() + +### **2.3 Containerizing the Exporter (Dockerfile)** {#containerizing-the-exporter-dockerfile} + +To deploy the exporter in Kubernetes, it must be packaged into a +container image. A multi-stage Dockerfile is used to create a minimal +and more secure final image by separating the build environment from the +runtime environment. This is a standard best practice for producing +production-ready containers.^14^ + +> Dockerfile + +\# Stage 1: Build stage with dependencies +FROM python:3.9-slim AS builder + +WORKDIR /app + +\# Install iperf3 and build dependencies +RUN apt-get update && \\ +apt-get install -y \--no-install-recommends gcc iperf3 libiperf-dev && \\ +rm -rf /var/lib/apt/lists/\* + +\# Install Python dependencies +COPY requirements.txt . +RUN pip install \--no-cache-dir -r requirements.txt + +\# Stage 2: Final runtime stage +FROM python:3.9-slim + +WORKDIR /app + +\# Copy iperf3 binary and library from the builder stage +COPY \--from=builder /usr/bin/iperf3 /usr/bin/iperf3 +COPY \--from=builder /usr/lib/x86_64-linux-gnu/libiperf.so.0 /usr/lib/x86_64-linux-gnu/libiperf.so.0 + +\# Copy installed Python packages from the builder stage +COPY \--from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages + +\# Copy the exporter application code +COPY exporter.py .
+ +\# Expose the metrics port +EXPOSE 9876 + +\# Set the entrypoint +CMD \[\"python\", \"exporter.py\"\] + +The corresponding requirements.txt would contain: + +prometheus-client +iperf3 +kubernetes + +## **Section 3: Kubernetes Manifests and Deployment Strategy** + +With the architectural blueprint defined and the exporter application +containerized, the next step is to translate this design into +declarative Kubernetes manifests. These YAML files define the necessary +Kubernetes objects to deploy, configure, and manage the monitoring +service. Using static manifests here provides a clear foundation before +they are parameterized into a Helm chart in the next section. + +### **3.1 The iperf3-server DaemonSet** {#the-iperf3-server-daemonset} + +The iperf3-server component is deployed as a DaemonSet to ensure an +instance of the server pod runs on every eligible node in the +cluster.^7^ This creates the ubiquitous grid of test endpoints required +for comprehensive N-to-N testing. + +Key fields in this manifest include: + +- **spec.selector**: Connects the DaemonSet to the pods it manages via + > labels. + +- **spec.template.metadata.labels**: The label app: iperf3-server is + > applied to the pods, which is crucial for discovery by both the + > iperf3-exporter and Kubernetes Services. + +- **spec.template.spec.containers**: Defines the iperf3 container, using + > a public image and running the iperf3 -s command to start it in + > server mode. + +- **spec.template.spec.tolerations**: This is often necessary to allow + > the DaemonSet to schedule pods on control-plane (master) nodes, + > which may have taints preventing normal workloads from running + > there. This ensures the entire cluster, including masters, is part + > of the test mesh. + +- **spec.template.spec.hostNetwork: true**: This is a critical setting. + > By running the server pods on the host\'s network namespace, we + > bypass the Kubernetes network overlay (CNI) for the server side. 
+ > This allows the test to measure the raw performance of the + > underlying node network interface, which is often the primary goal + > of infrastructure-level testing. + +> YAML + +apiVersion: apps/v1 +kind: DaemonSet +metadata: +name: iperf3-server +labels: +app: iperf3-server +spec: +selector: +matchLabels: +app: iperf3-server +template: +metadata: +labels: +app: iperf3-server +spec: +\# Run on the host network to measure raw node-to-node performance +hostNetwork: true +\# Tolerations to allow scheduling on control-plane nodes +tolerations: +- key: \"node-role.kubernetes.io/control-plane\" +operator: \"Exists\" +effect: \"NoSchedule\" +- key: \"node-role.kubernetes.io/master\" +operator: \"Exists\" +effect: \"NoSchedule\" +containers: +- name: iperf3-server +image: networkstatic/iperf3:latest +args: \[\"-s\"\] \# Start in server mode +ports: +- containerPort: 5201 +name: iperf3 +protocol: TCP +- containerPort: 5201 +name: iperf3-udp +protocol: UDP +resources: +requests: +cpu: \"50m\" +memory: \"64Mi\" +limits: +cpu: \"100m\" +memory: \"128Mi\" + +### **3.2 The iperf3-exporter Deployment** {#the-iperf3-exporter-deployment} + +The iperf3-exporter is deployed as a Deployment, as it is a stateless +application that orchestrates the tests.^14^ Only one replica is +typically needed, as it can sequentially test all nodes. + +Key fields in this manifest are: + +- **spec.replicas: 1**: A single instance is sufficient for most + > clusters. + +- **spec.template.spec.serviceAccountName**: This assigns the custom + > ServiceAccount (defined next) to the pod, granting it the necessary + > permissions to talk to the Kubernetes API. + +- **spec.template.spec.containers.env**: The SOURCE_NODE_NAME + > environment variable is populated using the Downward API. This is + > how the exporter pod knows which node *it* is running on, allowing + > it to skip testing against itself. 
+ +- **spec.template.spec.containers.image**: This points to the custom + > exporter image built in the previous section. + +> YAML + +apiVersion: apps/v1 +kind: Deployment +metadata: +name: iperf3-exporter +labels: +app: iperf3-exporter +spec: +replicas: 1 +selector: +matchLabels: +app: iperf3-exporter +template: +metadata: +labels: +app: iperf3-exporter +spec: +serviceAccountName: iperf3-exporter-sa +containers: +- name: iperf3-exporter +image: your-repo/iperf3-prometheus-exporter:latest \# Replace with your +image +ports: +- containerPort: 9876 +name: metrics +env: +\# Use the Downward API to inject the node name this pod is running on +- name: SOURCE_NODE_NAME +valueFrom: +fieldRef: +fieldPath: spec.nodeName +\# Other configurations for the exporter script +- name: IPERF_TEST_INTERVAL +value: \"300\" +- name: IPERF_SERVER_LABEL_SELECTOR +value: \"app=iperf3-server\" +resources: +requests: +cpu: \"100m\" +memory: \"128Mi\" +limits: +cpu: \"500m\" +memory: \"256Mi\" + +### **3.3 RBAC: Granting Necessary Permissions** {#rbac-granting-necessary-permissions} + +For the exporter to perform its dynamic discovery of iperf3-server pods, +it must be granted specific, limited permissions to read information +from the Kubernetes API. This is accomplished through a ServiceAccount, +a ClusterRole, and a ClusterRoleBinding. + +- **ServiceAccount**: Provides an identity for the exporter pod within + > the cluster. + +- **ClusterRole**: Defines a set of permissions. Here, we grant get, + > list, and watch access to pods. These are the minimum required + > permissions for the discovery function to work. The role is a + > ClusterRole because the exporter needs to find pods across all + > namespaces where servers might be running. + +- **ClusterRoleBinding**: Links the ServiceAccount to the ClusterRole, + > effectively granting the permissions to any pod that uses the + > ServiceAccount. 
+ +> YAML + +apiVersion: v1 +kind: ServiceAccount +metadata: +name: iperf3-exporter-sa +\-\-- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: +name: iperf3-exporter-role +rules: +- apiGroups: \[\"\"\] +resources: \[\"pods\"\] +verbs: \[\"get\", \"list\", \"watch\"\] +\-\-- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: +name: iperf3-exporter-rb +subjects: +- kind: ServiceAccount +name: iperf3-exporter-sa +namespace: default \# The namespace where the exporter is deployed +roleRef: +kind: ClusterRole +name: iperf3-exporter-role +apiGroup: rbac.authorization.k8s.io + +### **3.4 Network Exposure: Service and ServiceMonitor** {#network-exposure-service-and-servicemonitor} + +To make the exporter\'s metrics available to Prometheus, we need two +final objects. The Service exposes the exporter pod\'s metrics port +within the cluster, and the ServiceMonitor tells the Prometheus Operator +how to find and scrape that service. + +This ServiceMonitor-based approach is the linchpin for a GitOps-friendly +integration. Instead of manually editing the central Prometheus +configuration file---a brittle and non-declarative process---we deploy a +ServiceMonitor custom resource alongside our application.^14^ The +Prometheus Operator, a key component of the + +kube-prometheus-stack, continuously watches for these objects. When it +discovers our iperf3-exporter-sm, it automatically generates the +necessary scrape configuration and reloads Prometheus without any manual +intervention.^4^ This empowers the application team to define + +*how their application should be monitored* as part of the +application\'s own deployment package, a cornerstone of scalable, \"you +build it, you run it\" observability. 
+ +> YAML + +apiVersion: v1 +kind: Service +metadata: +name: iperf3-exporter-svc +labels: +app: iperf3-exporter +spec: +selector: +app: iperf3-exporter +ports: +- name: metrics +port: 9876 +targetPort: metrics +protocol: TCP +\-\-- +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: +name: iperf3-exporter-sm +labels: +\# Label for Prometheus Operator to discover this ServiceMonitor +release: prometheus-operator +spec: +selector: +matchLabels: +\# This must match the labels on the Service object above +app: iperf3-exporter +endpoints: +- port: metrics +interval: 60s +scrapeTimeout: 30s + +## **Section 4: Packaging with Helm for Reusability and Distribution** + +While static YAML manifests are excellent for defining Kubernetes +resources, they lack the flexibility needed for easy configuration, +distribution, and lifecycle management. Helm, the package manager for +Kubernetes, solves this by bundling applications into +version-controlled, reusable packages called charts.^17^ This section +details how to package the entire + +iperf3 monitoring service into a professional, flexible, and +distributable Helm chart. + +### **4.1 Helm Chart Structure** {#helm-chart-structure} + +A well-organized Helm chart follows a standard directory structure. This +convention makes charts easier to understand and maintain.^19^ + +iperf3-monitor/ +├── Chart.yaml \# Metadata about the chart (name, version, etc.) 
+├── values.yaml \# Default configuration values for the chart +├── charts/ \# Directory for sub-chart dependencies (empty for this project) +├── templates/ \# Directory containing the templated Kubernetes manifests +│ ├── \_helpers.tpl \# A place for reusable template helpers +│ ├── server-daemonset.yaml +│ ├── exporter-deployment.yaml +│ ├── rbac.yaml +│ ├── service.yaml +│ └── servicemonitor.yaml +└── README.md \# Documentation for the chart + +### **4.2 Templating the Kubernetes Manifests** {#templating-the-kubernetes-manifests} + +The core of Helm\'s power lies in its templating engine, which uses Go +templates. We convert the static manifests from Section 3 into dynamic +templates by replacing hardcoded values with references to variables +defined in the values.yaml file. + +A crucial best practice is to use a \_helpers.tpl file to define common +functions and partial templates, especially for generating resource +names and labels. This reduces boilerplate, ensures consistency, and +makes the chart easier to manage.^19^ + +**Example: templates/\_helpers.tpl** + +> Code snippet + +{{/\* +Expand the name of the chart. +\*/}} +{{- define \"iperf3-monitor.name\" -}} +{{- default .Chart.Name .Values.nameOverride \| trunc 63 \| trimSuffix \"-\" }} +{{- end -}} + +{{/\* +Create a default fully qualified app name. +We truncate at 63 chars because some Kubernetes name fields are limited +to this (by the DNS naming spec).
+\*/}} +{{- define \"iperf3-monitor.fullname\" -}} +{{- if .Values.fullnameOverride }} +{{- .Values.fullnameOverride \| trunc 63 \| trimSuffix \"-\" }} +{{- else }} +{{- \$name := default .Chart.Name .Values.nameOverride }} +{{- if contains \$name .Release.Name }} +{{- .Release.Name \| trunc 63 \| trimSuffix \"-\" }} +{{- else }} +{{- printf \"%s-%s\" .Release.Name \$name \| trunc 63 \| trimSuffix \"-\" }} +{{- end }} +{{- end }} +{{- end -}} + +{{/\* +Common labels +\*/}} +{{- define \"iperf3-monitor.labels\" -}} +helm.sh/chart: {{ include \"iperf3-monitor.name\" . }} +{{ include \"iperf3-monitor.selectorLabels\" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion \| quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end -}} + +{{/\* +Selector labels +\*/}} +{{- define \"iperf3-monitor.selectorLabels\" -}} +app.kubernetes.io/name: {{ include \"iperf3-monitor.name\" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end -}} + +**Example: Templated exporter-deployment.yaml** + +> YAML + +apiVersion: apps/v1 +kind: Deployment +metadata: +name: {{ include \"iperf3-monitor.fullname\" . }}-exporter +labels: +{{- include \"iperf3-monitor.labels\" . \| nindent 4 }} +app.kubernetes.io/component: exporter +spec: +replicas: {{ .Values.exporter.replicaCount }} +selector: +matchLabels: +{{- include \"iperf3-monitor.selectorLabels\" . \| nindent 6 }} +app.kubernetes.io/component: exporter +template: +metadata: +labels: +{{- include \"iperf3-monitor.selectorLabels\" . \| nindent 8 }} +app.kubernetes.io/component: exporter +spec: +{{- if .Values.rbac.create }} +serviceAccountName: {{ include \"iperf3-monitor.fullname\" .
}}-sa +{{- else }} +serviceAccountName: {{.Values.serviceAccount.name }} +{{- end }} +containers: +- name: iperf3-exporter +image: \"{{.Values.exporter.image.repository +}}:{{.Values.exporter.image.tag \| default.Chart.AppVersion }}\" +imagePullPolicy: {{.Values.exporter.image.pullPolicy }} +ports: +- containerPort: 9876 +name: metrics +env: +- name: SOURCE_NODE_NAME +valueFrom: +fieldRef: +fieldPath: spec.nodeName +- name: IPERF_TEST_INTERVAL +value: \"{{.Values.exporter.testInterval }}\" +resources: +{{- toYaml.Values.exporter.resources \| nindent 10 }} + +### **4.3 Designing a Comprehensive values.yaml** {#designing-a-comprehensive-values.yaml} + +The values.yaml file is the public API of a Helm chart. A well-designed +values file is intuitive, clearly documented, and provides users with +the flexibility to adapt the chart to their specific needs. Best +practices include using clear, camelCase naming conventions and +providing comments for every parameter.^21^ + +A particularly powerful feature of Helm is conditional logic. By +wrapping entire resource definitions in if blocks based on boolean flags +in values.yaml (e.g., {{- if.Values.rbac.create }}), the chart becomes +highly adaptable. A user in a high-security environment can disable the +automatic creation of ClusterRoles by setting rbac.create: false, +allowing them to manage permissions manually without causing the Helm +installation to fail.^20^ Similarly, a user not running the Prometheus +Operator can set + +serviceMonitor.enabled: false. This adaptability transforms the chart +from a rigid, all-or-nothing package into a flexible building block, +dramatically increasing its utility across different organizations and +security postures. + +The following table documents the comprehensive set of configurable +parameters for the iperf3-monitor chart. This serves as the primary +documentation for any user wishing to install and customize the service. 
+
+| Parameter | Description | Type | Default |
+|------------------------------|----------------------------------------------------------------------|---------|-------------------------------------------|
+| nameOverride | Override the name of the chart. | string | \"\" |
+| fullnameOverride | Override the fully qualified app name. | string | \"\" |
+| exporter.image.repository | The container image repository for the exporter. | string | ghcr.io/my-org/iperf3-prometheus-exporter |
+| exporter.image.tag | The container image tag for the exporter. | string | (Chart.AppVersion) |
+| exporter.image.pullPolicy | The image pull policy for the exporter. | string | IfNotPresent |
+| exporter.replicaCount | Number of exporter pod replicas. | integer | 1 |
+| exporter.testInterval | Interval in seconds between test cycles. | integer | 300 |
+| exporter.testTimeout | Timeout in seconds for a single iperf3 test. | integer | 10 |
+| exporter.testProtocol | Protocol to use for testing (tcp or udp). | string | tcp |
+| exporter.resources | CPU/memory resource requests and limits for the exporter. | object | {} |
+| server.image.repository | The container image repository for the iperf3 server. | string | networkstatic/iperf3 |
+| server.image.tag | The container image tag for the iperf3 server. | string | latest |
+| server.resources | CPU/memory resource requests and limits for the server pods. | object | {} |
+| server.nodeSelector | Node selector for scheduling server pods. | object | {} |
+| server.tolerations | Tolerations for scheduling server pods on tainted nodes. | array | \[\] |
+| rbac.create | If true, create ServiceAccount, ClusterRole, and ClusterRoleBinding. | boolean | true |
+| serviceAccount.name | The name of the ServiceAccount to use. Used if rbac.create is false. | string | \"\" |
+| serviceMonitor.enabled | If true, create a ServiceMonitor for Prometheus Operator. | boolean | true |
+| serviceMonitor.interval | Scrape interval for the ServiceMonitor. | string | 60s |
+| serviceMonitor.scrapeTimeout | Scrape timeout for the ServiceMonitor. | string | 30s |
+
+## **Section 5: Visualizing Network Performance with a Custom Grafana Dashboard**
+
+The final piece of the user experience is a purpose-built Grafana
+dashboard that transforms the raw, time-series metrics from Prometheus
+into intuitive, actionable visualizations. A well-designed dashboard
+does more than just display data; it tells a story, guiding an operator
+from a high-level overview of cluster health to a deep-dive analysis of
+a specific problematic network path.^5^
+
+### **5.1 Dashboard Design Principles** {#dashboard-design-principles}
+
+The primary goals for this network performance dashboard are:
+
+1. **At-a-Glance Overview:** Provide an immediate, cluster-wide view of
+   > network health, allowing operators to quickly spot systemic issues
+   > or anomalies.
+
+2. **Intuitive Drill-Down:** Enable users to seamlessly transition from
+   > a high-level view to a detailed analysis of performance between
+   > specific nodes.
+
+3. **Correlation:** Display multiple related metrics (bandwidth,
+   > jitter, packet loss) on the same timeline to help identify causal
+   > relationships.
+
+4. **Clarity and Simplicity:** Avoid clutter and overly complex panels
+   > that can obscure meaningful data.^4^
+
+### **5.2 Key Visualizations and Panels** {#key-visualizations-and-panels}
+
+The dashboard is constructed from several key panel types, each serving
+a specific analytical purpose.
+
+- **Panel 1: Node-to-Node Bandwidth Heatmap.** This is the centerpiece
+  > of the dashboard\'s overview. It uses Grafana\'s \"Heatmap\"
+  > visualization to create a matrix of network performance.
+
+  - **Y-Axis:** Source Node (source_node label).
+
+  - **X-Axis:** Destination Node (destination_node label).
+
+  - **Cell Color:** The value of the iperf_network_bandwidth_mbps
+    > metric.
+
+  - **PromQL Query:** avg(iperf_network_bandwidth_mbps) by (source_node,
+    > destination_node)
+
+  > This panel provides an instant visual summary of the entire
+  > cluster\'s network fabric. A healthy cluster will show a uniformly
+  > \"hot\" (high bandwidth) grid, while any \"cold\" spots
+  > immediately draw attention to underperforming network paths.
+
+- **Panel 2: Time-Series Performance Graphs.** These panels use the
+  > \"Time series\" visualization to plot performance over time,
+  > allowing for trend analysis and historical investigation.
+
+  - **Bandwidth (Mbps):** Plots
+    > iperf_network_bandwidth_mbps{source_node=\"\$source_node\",
+    > destination_node=\"\$destination_node\"}.
+
+  - **Jitter (ms):** Plots
+    > iperf_network_jitter_ms{source_node=\"\$source_node\",
+    > destination_node=\"\$destination_node\", protocol=\"udp\"}.
+
+  - **Packet Loss (%):** Plots (iperf_network_lost_packets_total{\...} /
+    > iperf_network_packets_total{\...}) \* 100.
+
+  > These graphs are filtered by the dashboard variables, enabling the
+  > drill-down analysis.
+
+- **Panel 3: Stat Panels.** These panels use the \"Stat\" visualization
+  > to display single, key performance indicators (KPIs) for the
+  > selected time range and nodes.
+
+  - **Average Bandwidth:** avg(iperf_network_bandwidth_mbps{\...})
+
+  - **Minimum Bandwidth:** min(iperf_network_bandwidth_mbps{\...})
+
+  - **Maximum Jitter:** max(iperf_network_jitter_ms{\...})
+
+### **5.3 Enabling Interactivity with Grafana Variables** {#enabling-interactivity-with-grafana-variables}
+
+The dashboard\'s interactivity is powered by Grafana\'s template
+variables. These variables are dynamically populated from Prometheus and
+are used to filter the data displayed in the panels.^4^
+
+- **\$source_node**: A dropdown variable populated by the variable query
+  > label_values(iperf_network_bandwidth_mbps, source_node).
+
+- **\$destination_node**: A dropdown variable populated by
+  > label_values(iperf_network_bandwidth_mbps{source_node=\"\$source_node\"},
+  > destination_node). This query is cascaded, meaning it only shows
+  > destinations relevant to the selected source.
+
+- **\$protocol**: A custom variable with the options tcp and udp.
+
+This combination of a high-level heatmap with interactive,
+variable-driven drill-down graphs creates a powerful analytical
+workflow. An operator can begin with a bird\'s-eye view of the cluster.
+Upon spotting an anomaly on the heatmap (e.g., a low-bandwidth link
+between Node-5 and Node-8), they can use the \$source_node and
+\$destination_node dropdowns to select that specific path. All the
+time-series panels will instantly update to show the detailed
+performance history for that link, allowing the operator to correlate
+bandwidth drops with jitter spikes or other events. This workflow
+transforms raw data into actionable insight, dramatically reducing the
+Mean Time to Identification (MTTI) for network issues.
+
+### **5.4 The Complete Grafana Dashboard JSON Model** {#the-complete-grafana-dashboard-json-model}
+
+To facilitate easy deployment, the entire dashboard is defined in a
+single JSON model. This model can be imported directly into any Grafana
+instance.
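
Because the whole dashboard lives in one JSON document, a single stray comma or missing bracket is enough to make the import fail. A minimal pre-import check might look like the sketch below. The helper name `check_dashboard` and the choice of checks are illustrative assumptions rather than part of the project; the expected variable names are taken from Section 5.3.

```python
import json

# Variable names described in Section 5.3; this set is an assumption of the sketch.
REQUIRED_VARIABLES = {"source_node", "destination_node", "protocol"}


def check_dashboard(text: str) -> list:
    """Return a list of problems found in a dashboard JSON model (empty if none)."""
    try:
        model = json.loads(text)
    except json.JSONDecodeError as exc:
        # Report the parse error with its line number so it is easy to locate.
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(model, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    panels = model.get("panels")
    if not isinstance(panels, list) or not panels:
        problems.append("no panels defined")

    # Collect the names of all template variables declared in the model.
    variables = model.get("templating", {}).get("list", [])
    names = {v.get("name") for v in variables if isinstance(v, dict)}
    missing = REQUIRED_VARIABLES - names
    if missing:
        problems.append("missing variables: " + ", ".join(sorted(missing)))
    return problems
```

Running such a check against the model file before importing it gives a readable error message instead of a failed import dialog.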
+
+> JSON
+
+{
+  \"\_\_inputs\": \[\],
+  \"\_\_requires\": \[
+    {
+      \"type\": \"grafana\",
+      \"id\": \"grafana\",
+      \"name\": \"Grafana\",
+      \"version\": \"8.0.0\"
+    },
+    {
+      \"type\": \"datasource\",
+      \"id\": \"prometheus\",
+      \"name\": \"Prometheus\",
+      \"version\": \"1.0.0\"
+    }
+  \],
+  \"annotations\": {
+    \"list\": \[
+      {
+        \"builtIn\": 1,
+        \"datasource\": {
+          \"type\": \"grafana\",
+          \"uid\": \"\-- Grafana \--\"
+        },
+        \"enable\": true,
+        \"hide\": true,
+        \"iconColor\": \"rgba(0, 211, 255, 1)\",
+        \"name\": \"Annotations & Alerts\",
+        \"type\": \"dashboard\"
+      }
+    \]
+  },
+  \"editable\": true,
+  \"fiscalYearStartMonth\": 0,
+  \"gnetId\": null,
+  \"graphTooltip\": 0,
+  \"id\": null,
+  \"links\": \[\],
+  \"panels\": \[
+    {
+      \"title\": \"Node-to-Node Bandwidth Heatmap\",
+      \"type\": \"heatmap\",
+      \"datasource\": {
+        \"type\": \"prometheus\",
+        \"uid\": \"prometheus\"
+      },
+      \"gridPos\": { \"h\": 9, \"w\": 24, \"x\": 0, \"y\": 0 },
+      \"targets\": \[
+        {
+          \"expr\": \"avg(iperf_network_bandwidth_mbps) by (source_node, destination_node)\",
+          \"format\": \"heatmap\",
+          \"legendFormat\": \"{{source_node}} -\> {{destination_node}}\",
+          \"refId\": \"A\"
+        }
+      \],
+      \"cards\": { \"cardPadding\": null, \"cardRound\": null },
+      \"color\": {
+        \"mode\": \"spectrum\",
+        \"scheme\": \"red-yellow-green\",
+        \"exponent\": 0.5,
+        \"reverse\": false
+      },
+      \"dataFormat\": \"tsbuckets\",
+      \"yAxis\": { \"show\": true, \"format\": \"short\" },
+      \"xAxis\": { \"show\": true }
+    },
+    {
+      \"title\": \"Bandwidth Over Time (Source: \$source_node, Dest: \$destination_node)\",
+      \"type\": \"timeseries\",
+      \"datasource\": {
+        \"type\": \"prometheus\",
+        \"uid\": \"prometheus\"
+      },
+      \"gridPos\": { \"h\": 8, \"w\": 12, \"x\": 0, \"y\": 9 },
+      \"targets\": \[
+        {
+          \"expr\": \"iperf_network_bandwidth_mbps{source_node=\\\"\$source_node\\\", destination_node=\\\"\$destination_node\\\"}\",
+          \"legendFormat\": \"Bandwidth\",
+          \"refId\": \"A\"
+        }
+      \],
+      \"fieldConfig\": {
+        \"defaults\": {
+          \"unit\": \"Mbits\"
+        }
+      }
+    },
+    {
+      \"title\": \"Jitter Over Time (Source: \$source_node, Dest: \$destination_node)\",
+      \"type\": \"timeseries\",
+      \"datasource\": {
+        \"type\": \"prometheus\",
+        \"uid\": \"prometheus\"
+      },
+      \"gridPos\": { \"h\": 8, \"w\": 12, \"x\": 12, \"y\": 9 },
+      \"targets\": \[
+        {
+          \"expr\": \"iperf_network_jitter_ms{source_node=\\\"\$source_node\\\", destination_node=\\\"\$destination_node\\\", protocol=\\\"udp\\\"}\",
+          \"legendFormat\": \"Jitter\",
+          \"refId\": \"A\"
+        }
+      \],
+      \"fieldConfig\": {
+        \"defaults\": {
+          \"unit\": \"ms\"
+        }
+      }
+    }
+  \],
+  \"refresh\": \"30s\",
+  \"schemaVersion\": 36,
+  \"style\": \"dark\",
+  \"tags\": \[\"iperf3\", \"network\", \"kubernetes\"\],
+  \"templating\": {
+    \"list\": \[
+      {
+        \"current\": {},
+        \"datasource\": {
+          \"type\": \"prometheus\",
+          \"uid\": \"prometheus\"
+        },
+        \"definition\": \"label_values(iperf_network_bandwidth_mbps, source_node)\",
+        \"hide\": 0,
+        \"includeAll\": false,
+        \"multi\": false,
+        \"name\": \"source_node\",
+        \"options\": \[\],
+        \"query\": \"label_values(iperf_network_bandwidth_mbps, source_node)\",
+        \"refresh\": 1,
+        \"regex\": \"\",
+        \"skipUrlSync\": false,
+        \"sort\": 1,
+        \"type\": \"query\"
+      },
+      {
+        \"current\": {},
+        \"datasource\": {
+          \"type\": \"prometheus\",
+          \"uid\": \"prometheus\"
+        },
+        \"definition\": \"label_values(iperf_network_bandwidth_mbps{source_node=\\\"\$source_node\\\"}, destination_node)\",
+        \"hide\": 0,
+        \"includeAll\": false,
+        \"multi\": false,
+        \"name\": \"destination_node\",
+        \"options\": \[\],
+        \"query\": \"label_values(iperf_network_bandwidth_mbps{source_node=\\\"\$source_node\\\"}, destination_node)\",
+        \"refresh\": 1,
+        \"regex\": \"\",
+        \"skipUrlSync\": false,
+        \"sort\": 1,
+        \"type\": \"query\"
+      },
+      {
+        \"current\": { \"selected\": true, \"text\": \"tcp\", \"value\": \"tcp\" },
+        \"hide\": 0,
+        \"includeAll\": false,
+        \"multi\": false,
+        \"name\": \"protocol\",
+        \"options\": \[
+          { \"selected\": true, \"text\": \"tcp\", \"value\": \"tcp\" },
+          { \"selected\": false, \"text\": \"udp\", \"value\": \"udp\" }
+        \],
+        \"query\": \"tcp,udp\",
+        \"skipUrlSync\": false,
+        \"type\": \"custom\"
+      }
+    \]
+  },
+  \"time\": {
+    \"from\": \"now-1h\",
+    \"to\": \"now\"
+  },
+  \"timepicker\": {},
+  \"timezone\": \"browser\",
+  \"title\": \"Kubernetes iperf3 Network Performance\",
+  \"uid\": \"k8s-iperf3-dashboard\",
+  \"version\": 1,
+  \"weekStart\": \"\"
+}
+
+## **Section 6: GitHub Repository Structure and CI/CD Workflow**
+
+To deliver this monitoring service as a professional, open-source-ready
+project, it is essential to package it within a well-structured GitHub
+repository and implement a robust
Continuous Integration and Continuous
+Deployment (CI/CD) pipeline. This automates the build, test, and release
+process, ensuring that every version of the software is consistent,
+trustworthy, and easy for consumers to adopt.
+
+### **6.1 Recommended Repository Structure** {#recommended-repository-structure}
+
+A clean, logical directory structure is fundamental for project
+maintainability and ease of navigation for contributors and users.
+
+.
+├── .github/
+│   └── workflows/
+│       └── release.yml \# GitHub Actions workflow for CI/CD
+├── charts/
+│   └── iperf3-monitor/ \# The Helm chart for the service
+│       ├── Chart.yaml
+│       ├── values.yaml
+│       └── templates/
+│           └── \...
+├── exporter/
+│   ├── Dockerfile \# Dockerfile for the exporter
+│   ├── requirements.txt \# Python dependencies
+│   └── exporter.py \# Exporter source code
+├── .gitignore
+├── LICENSE
+└── README.md
+
+This structure cleanly separates the exporter application code
+(/exporter) from its deployment packaging (/charts/iperf3-monitor) and
+its release automation (/.github/workflows).
+
+### **6.2 CI/CD Pipeline with GitHub Actions** {#cicd-pipeline-with-github-actions}
+
+A fully automated CI/CD pipeline is the hallmark of a mature software
+project. It eliminates manual, error-prone release steps and provides
+strong guarantees about the integrity of the published artifacts. By
+triggering the pipeline on the creation of a Git tag (e.g., v1.2.3), we
+use the tag as a single source of truth for versioning both the Docker
+image and the Helm chart. This ensures that chart version 1.2.3 is built
+to use image version 1.2.3, and that both have been validated before
+release. This automated, atomic release process provides trust and
+velocity, elevating the project from a collection of files into a
+reliable, distributable piece of software.
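
The tag-to-version rule described above is small enough to state as code. The sketch below is only an illustration of that rule in Python; the function name `release_versions` and the image repository passed to it are hypothetical, and pre-release suffixes such as `-rc.1` are deliberately left out.

```python
import re

# Release tags are expected to look like v1.2.3; anything else is rejected.
_RELEASE_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def release_versions(tag: str, image_repository: str) -> dict:
    """Derive the chart version, appVersion and image reference from one Git tag."""
    if _RELEASE_TAG.match(tag) is None:
        raise ValueError(f"not a release tag: {tag!r}")
    version = tag[1:]  # drop the leading "v"
    return {
        "chart_version": version,
        "app_version": version,
        "image": f"{image_repository}:{version}",
    }
```

For example, `release_versions("v1.2.3", "ghcr.io/example/exporter")` yields chart version `1.2.3`, appVersion `1.2.3`, and the image reference `ghcr.io/example/exporter:1.2.3`, so all three artifacts are tied to the same tag.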
+
+The following GitHub Actions workflow automates the entire release
+process:
+
+> YAML
+
+\# .github/workflows/release.yml
+name: Release iperf3-monitor
+
+on:
+  push:
+    tags:
+      - \'v\*.\*.\*\'
+
+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: \${{ github.repository }}
+
+jobs:
+  lint-and-test:
+    name: Lint and Test
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v3
+
+      - name: Set up Helm
+        uses: azure/setup-helm@v3
+        with:
+          version: v3.10.0
+
+      - name: Helm Lint
+        run: helm lint ./charts/iperf3-monitor
+
+  build-and-publish-image:
+    name: Build and Publish Docker Image
+    runs-on: ubuntu-latest
+    needs: lint-and-test
+    permissions:
+      contents: read
+      packages: write
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v3
+
+      - name: Log in to GitHub Container Registry
+        uses: docker/login-action@v2
+        with:
+          registry: \${{ env.REGISTRY }}
+          username: \${{ github.actor }}
+          password: \${{ secrets.GITHUB_TOKEN }}
+
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@v4
+        with:
+          images: \${{ env.REGISTRY }}/\${{ env.IMAGE_NAME }}
+          \# Tag the image without the leading \"v\" (v1.2.3 -\> 1.2.3) so that
+          \# it matches the appVersion written to Chart.yaml below.
+          tags: type=semver,pattern={{version}}
+
+      - name: Build and push Docker image
+        uses: docker/build-push-action@v4
+        with:
+          context: ./exporter
+          push: true
+          tags: \${{ steps.meta.outputs.tags }}
+          labels: \${{ steps.meta.outputs.labels }}
+
+  package-and-publish-chart:
+    name: Package and Publish Helm Chart
+    runs-on: ubuntu-latest
+    needs: build-and-publish-image
+    permissions:
+      contents: write
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+
+      - name: Set up Helm
+        uses: azure/setup-helm@v3
+        with:
+          version: v3.10.0
+
+      - name: Set Chart Version
+        run: \|
+          \# Export VERSION so that yq\'s strenv() can read it from the environment.
+          export VERSION=\$(echo \"\${{ github.ref_name }}\" \| sed \'s/\^v//\')
+          \# Assumes helm-docs is already installed on the runner.
+          helm-docs \--sort-values-order file
+          yq e -i \'.version = strenv(VERSION)\' ./charts/iperf3-monitor/Chart.yaml
+          yq e -i \'.appVersion = strenv(VERSION)\' ./charts/iperf3-monitor/Chart.yaml
+
+      - name: Publish Helm chart
+        uses: stefanprodan/helm-gh-pages@v1.6.0
+        with:
+          token: \${{ secrets.GITHUB_TOKEN }}
+          charts_dir: ./charts
+          charts_url: https://\${{ github.repository_owner }}.github.io/\${{ github.event.repository.name }}
+
+### **6.3 Documentation and Usability** {#documentation-and-usability}
+
+The final, and arguably most critical, component for project success is
+high-quality documentation. The README.md file at the root of the
+repository is the primary entry point for any user. It should clearly
+explain what the project does, its architecture, and how to deploy and
+use it.
+
+A common failure point in software projects is documentation that falls
+out of sync with the code. For Helm charts, the values.yaml file
+changes frequently as parameters and options are added. To combat this,
+it is a best practice to automate the documentation of these parameters.
+The helm-docs tool can be integrated directly into the CI/CD pipeline to
+automatically generate the \"Parameters\" section of the README.md by
+parsing the comments directly from the values.yaml file.^20^ This
+keeps the documentation an accurate reflection of the chart\'s
+configurable options.
+
+## **Conclusion**
+
+The proliferation of distributed microservices on Kubernetes has made
+network performance a critical, yet often opaque, component of overall
+application health. This document has detailed a comprehensive,
+production-grade solution for establishing continuous network validation
+within a Kubernetes cluster. By architecting a system around the robust,
+decoupled pattern of an iperf3-server DaemonSet and a Kubernetes-aware
+iperf3-exporter Deployment, this service provides a resilient and
+automated foundation for network observability.
+
+The implementation leverages industry-standard tools (Python for the
+exporter, Prometheus for metrics storage, and Grafana for
+visualization) to create a powerful and flexible monitoring pipeline.
+The entire service is packaged into a professional Helm chart, following
+best practices for templating, configuration, and adaptability. This
+allows for simple, version-controlled deployment across a wide range of
+environments. The final Grafana dashboard transforms the collected data
+into an intuitive, interactive narrative, enabling engineers to move
+swiftly from high-level anomaly detection to root-cause analysis.
+
+Ultimately, by treating network performance not as a given but as a
+continuously measured metric, organizations can proactively identify and
+resolve infrastructure bottlenecks, enhance application reliability, and
+ensure a consistent, high-quality experience for their users in the
+dynamic world of Kubernetes.