# **Architecting a Kubernetes-Native Network Performance Monitoring Service with iperf3, Prometheus, and Helm**
## **Section 1: Architectural Blueprint for Continuous Network Validation**
### **1.1 Introduction to Proactive Network Monitoring in Kubernetes** {#introduction-to-proactive-network-monitoring-in-kubernetes}

In modern cloud-native infrastructures, Kubernetes has emerged as the de facto standard for container orchestration, simplifying the deployment, scaling, and management of complex applications.^1^ However, the very dynamism and abstraction that make Kubernetes powerful also introduce significant challenges in diagnosing network performance issues. The ephemeral nature of pods, the complexity of overlay networks provided by Container Network Interfaces (CNIs), and the multi-layered traffic routing through Services and Ingress controllers can obscure the root causes of latency, packet loss, and throughput degradation.

Traditional, reactive troubleshooting (investigating network problems only after an application has failed) is insufficient in these environments. Performance bottlenecks can be subtle, intermittent, and difficult to reproduce, often manifesting as degraded user experience long before they trigger hard failures.^1^ To maintain the reliability and performance of critical workloads, engineering teams must shift from a reactive to a proactive stance. This requires a system that performs continuous, automated validation of the underlying network fabric, treating network health not as an assumption but as a measurable, time-series metric.

This document outlines the architecture and implementation of a comprehensive, Kubernetes-native network performance monitoring service. The solution leverages a suite of industry-standard, open-source tools to provide continuous, actionable insights into cluster network health. The core components are:

- **iperf3:** A widely adopted tool for active network performance measurement, used to generate traffic and measure maximum achievable bandwidth, jitter, and packet loss between two points.^2^
- **Prometheus:** A powerful, open-source monitoring and alerting system that has become the standard for collecting and storing time-series metrics in the Kubernetes ecosystem.^3^
- **Grafana:** A leading visualization tool for creating rich, interactive dashboards from various data sources, including Prometheus, enabling intuitive analysis of complex datasets.^4^

By combining these components into a cohesive, automated service, we can transform abstract network performance into a concrete, queryable, and visualizable stream of data, enabling teams to detect and address infrastructure-level issues before they impact end-users.^6^

### **1.2 The Core Architectural Pattern: Decoupled Test Endpoints and a Central Orchestrator** {#the-core-architectural-pattern-decoupled-test-endpoints-and-a-central-orchestrator}

The foundation of this monitoring service is a robust, decoupled architectural pattern designed for scalability and resilience within a dynamic Kubernetes environment. The design separates the passive test endpoints from the active test orchestrator, a critical distinction that ensures the system is both efficient and aligned with Kubernetes operational principles.

The data flow and component interaction can be visualized as follows (a quick manual sanity check of the same flow appears after the list):

1. A **DaemonSet** deploys an iperf3 server pod onto every node in the cluster, creating a mesh of passive test targets.
2. A central **Deployment**, the iperf3-exporter, uses the Kubernetes API to discover the IP addresses of all iperf3 server pods.
3. The iperf3-exporter periodically orchestrates tests, running an iperf3 client to connect to each server pod and measure network performance.
4. The exporter parses the JSON output from iperf3, transforms the results into Prometheus metrics, and exposes them on a /metrics HTTP endpoint.
5. A **Prometheus** server, configured via a **ServiceMonitor**, scrapes the /metrics endpoint of the exporter, ingesting the performance data into its time-series database.
6. A **Grafana** instance, using Prometheus as a data source, visualizes the metrics in a purpose-built dashboard, providing heatmaps and time-series graphs of node-to-node bandwidth, jitter, and packet loss.
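
The same flow can be exercised by hand before any automation exists. A one-off client run against any server pod, sketched below with a placeholder pod name and target IP, produces the same JSON output the exporter will later parse:

```bash
# Run a 5-second JSON-formatted test from inside one server pod against another
# (pod name and target IP are illustrative placeholders)
kubectl exec -it iperf3-server-abcde -- iperf3 -c 10.0.0.12 -t 5 -J
```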

This architecture is composed of three primary logical components:

- **Component 1: The iperf3-server DaemonSet.** To accurately measure network performance between any two nodes (N-to-N), an iperf3 server process must be running and accessible on every node. The DaemonSet is the canonical Kubernetes controller for this exact use case. It guarantees that a copy of a specific pod runs on all, or a selected subset of, nodes within the cluster.^7^ When a new node joins the cluster, the DaemonSet controller automatically deploys an iperf3-server pod to it; conversely, when a node is removed, the pod is garbage collected. This ensures the mesh of test endpoints is always in sync with the state of the cluster, requiring zero manual intervention.^9^ This pattern of using a DaemonSet to deploy iperf3 across a cluster is a well-established practice for network validation.^11^

- **Component 2: The iperf3-exporter Deployment.** A separate, centralized component is required to act as the test orchestrator. This component is responsible for initiating the iperf3 client connections, executing the tests, parsing the results, and exposing them as Prometheus metrics. Since this is a stateless service whose primary function is to perform a periodic task, a Deployment is the ideal controller.^8^ A Deployment ensures a specified number of replicas are running, provides mechanisms for rolling updates, and allows for independent resource management and lifecycle control, decoupled from the iperf3-server pods it tests against.^10^

- **Component 3: The Prometheus & Grafana Stack.** The monitoring backend is provided by the kube-prometheus-stack, a comprehensive Helm chart that deploys Prometheus, Grafana, Alertmanager, and the necessary exporters for cluster monitoring.^4^ Our custom monitoring service is designed to integrate seamlessly with this stack, leveraging its Prometheus Operator for automatic scrape configuration and its Grafana instance for visualization.

### **1.3 Architectural Justification and Design Rationale** {#architectural-justification-and-design-rationale}

The primary strength of this architecture lies in its deliberate separation of concerns, a design choice that yields significant benefits in resilience, scalability, and operational efficiency. The DaemonSet is responsible for the *presence* of test endpoints, while the Deployment handles the *orchestration* of the tests. This decoupling is not arbitrary; it is a direct consequence of applying Kubernetes-native principles to the problem.

The logical progression is as follows: The requirement to continuously measure N-to-N node bandwidth necessitates that iperf3 server processes are available on all N nodes to act as targets. The most reliable, self-healing, and automated method to achieve this "one-pod-per-node" pattern in Kubernetes is to use a DaemonSet.^7^ This makes the server deployment automatically scale with the cluster itself. Next, a process is needed to trigger the tests against these servers. This "orchestrator" is a logically distinct, active service. It needs to be reliable and potentially scalable, but it does not need to run on every single node. The standard Kubernetes object for managing such stateless services is a Deployment.^8^

This separation allows for independent and appropriate resource allocation. The iperf3-server pods are extremely lightweight, consuming minimal resources while idle. The iperf3-exporter, however, may be more CPU-intensive during the brief periods it is actively running tests. By placing them in different workload objects (DaemonSet and Deployment), we can configure their resource requests and limits independently. This prevents the monitoring workload from interfering with or being starved by application workloads, a crucial consideration for any production-grade system. This design is fundamentally more robust and scalable than simpler, monolithic approaches, such as a single script that attempts to manage both server and client lifecycles.^12^

## **Section 2: Implementing the iperf3-prometheus-exporter**

The heart of this monitoring solution is the iperf3-prometheus-exporter, a custom application responsible for orchestrating the network tests and translating their results into a format that Prometheus can ingest. This section provides a detailed breakdown of its implementation, from technology selection to the final container image.

### **2.1 Technology Selection: Python for Agility and Ecosystem** {#technology-selection-python-for-agility-and-ecosystem}

Python was selected as the implementation language for the exporter due to its powerful ecosystem and rapid development capabilities. The availability of mature, well-maintained libraries for interacting with both Prometheus and Kubernetes significantly accelerates the development of a robust, cloud-native application.

The key libraries leveraged are:

- **prometheus-client:** The official Python client library for instrumenting applications with Prometheus metrics. It provides a simple API for defining metrics (Gauges, Counters, etc.) and exposing them via an HTTP server, handling much of the boilerplate required for creating a valid exporter.^13^
- **iperf3-python:** A clean, high-level Python wrapper around the iperf3 C library. It allows for programmatic control of iperf3 clients and servers, and it can directly parse the JSON output of a test into a convenient Python object, eliminating the need for manual process management and output parsing.^15^ (A minimal usage sketch follows this list.)
- **kubernetes:** The official Python client library for the Kubernetes API. This library is essential for the exporter to become "Kubernetes-aware," enabling it to dynamically discover the iperf3-server pods it needs to test against by querying the API server directly.
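
Before walking through the full exporter, here is a minimal sketch of the iperf3-python API referenced above (the server address is a placeholder; the attribute names match those used in the exporter code below):

```python
import iperf3

# Point a client at any reachable iperf3 server (placeholder address)
client = iperf3.Client()
client.server_hostname = '10.0.0.12'
client.port = 5201
client.duration = 5

result = client.run()  # Blocks for the test duration, returns a parsed result object
if result.error:
    print(f'test failed: {result.error}')
else:
    # TCP results expose sent/received throughput in Mbps
    print(f'sent: {result.sent_Mbps:.1f} Mbps, received: {result.received_Mbps:.1f} Mbps')
```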
### **2.2 Core Exporter Logic (Annotated Python Code)** {#core-exporter-logic-annotated-python-code}

The exporter's logic can be broken down into five distinct steps, which together form a continuous loop of discovery, testing, and reporting.

#### **Step 1: Initialization and Metric Definition**

The application begins by importing the necessary libraries and defining the Prometheus metrics that will be exposed. We use a Gauge metric, as bandwidth is a value that can go up or down. Labels are crucial for providing context; they allow us to slice and dice the data in Prometheus and Grafana.

```python
import os
import time
import logging

from kubernetes import client, config
from prometheus_client import start_http_server, Gauge
import iperf3

# --- Configuration ---
# Configure logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')

# --- Prometheus Metrics Definition ---
IPERF_BANDWIDTH_MBPS = Gauge(
    'iperf_network_bandwidth_mbps',
    'Network bandwidth measured by iperf3 in Megabits per second',
    ['source_node', 'destination_node', 'protocol']
)
IPERF_JITTER_MS = Gauge(
    'iperf_network_jitter_ms',
    'Network jitter measured by iperf3 in milliseconds',
    ['source_node', 'destination_node', 'protocol']
)
IPERF_PACKETS_TOTAL = Gauge(
    'iperf_network_packets_total',
    'Total packets transmitted or received during the iperf3 test',
    ['source_node', 'destination_node', 'protocol']
)
IPERF_LOST_PACKETS = Gauge(
    'iperf_network_lost_packets_total',
    'Total lost packets during the iperf3 UDP test',
    ['source_node', 'destination_node', 'protocol']
)
IPERF_TEST_SUCCESS = Gauge(
    'iperf_test_success',
    'Indicates if the iperf3 test was successful (1) or failed (0)',
    ['source_node', 'destination_node', 'protocol']
)
```

#### **Step 2: Kubernetes-Aware Target Discovery**

A static list of test targets is an anti-pattern in a dynamic environment like Kubernetes.^16^ The exporter must dynamically discover its targets. This is achieved by using the Kubernetes Python client to query the API server for all pods that match the label selector of our iperf3-server DaemonSet (e.g., app=iperf3-server). The function returns a list of dictionaries, each containing the pod's IP address and the name of the node it is running on.

This dynamic discovery is what transforms the exporter from a simple script into a resilient, automated service. It adapts to cluster scaling events without any manual intervention. The logical path is clear: Kubernetes clusters are dynamic, so a hardcoded list of IPs would become stale instantly. The API server is the single source of truth for the cluster's state. Therefore, the exporter must query this API, which in turn necessitates including the Kubernetes client library and configuring the appropriate Role-Based Access Control (RBAC) permissions for its ServiceAccount.

```python
def discover_iperf_servers():
    """
    Discover iperf3 server pods in the cluster using the Kubernetes API.
    """
    try:
        # Load in-cluster configuration
        config.load_incluster_config()
        v1 = client.CoreV1Api()

        # An empty IPERF_SERVER_NAMESPACE means: search all namespaces
        namespace = os.getenv('IPERF_SERVER_NAMESPACE', '')
        label_selector = os.getenv('IPERF_SERVER_LABEL_SELECTOR', 'app=iperf3-server')

        logging.info(f"Discovering iperf3 servers with label '{label_selector}' "
                     f"in namespace '{namespace or 'all namespaces'}'")

        if namespace:
            ret = v1.list_namespaced_pod(namespace, label_selector=label_selector,
                                         watch=False)
        else:
            ret = v1.list_pod_for_all_namespaces(label_selector=label_selector,
                                                 watch=False)

        servers = []
        for i in ret.items:
            # Ensure pod has an IP and is running
            if i.status.pod_ip and i.status.phase == 'Running':
                servers.append({
                    'ip': i.status.pod_ip,
                    'node_name': i.spec.node_name
                })
        logging.info(f"Discovered {len(servers)} iperf3 server pods.")
        return servers
    except Exception as e:
        logging.error(f"Error discovering iperf servers: {e}")
        return []
```

#### **Step 3: The Test Orchestration Loop**

The main function of the application contains an infinite while True loop that orchestrates the entire process. It periodically discovers the servers, creates a list of test pairs (node-to-node), and then executes an iperf3 test for each pair.

```python
def run_iperf_test(server_ip, server_port, protocol, source_node, dest_node):
    """
    Runs a single iperf3 test and updates Prometheus metrics.
    """
    logging.info(f"Running iperf3 test from {source_node} to {dest_node} "
                 f"({server_ip}:{server_port}) using {protocol.upper()}")

    client = iperf3.Client()
    client.server_hostname = server_ip
    client.port = server_port
    client.protocol = protocol
    client.duration = int(os.getenv('IPERF_TEST_DURATION', 5))
    client.json_output = True  # Critical for parsing

    result = client.run()

    # Parse results and update metrics
    parse_and_publish_metrics(result, source_node, dest_node, protocol)


def main_loop():
    """
    Main orchestration loop.
    """
    test_interval = int(os.getenv('IPERF_TEST_INTERVAL', 300))
    server_port = int(os.getenv('IPERF_SERVER_PORT', 5201))
    protocol = os.getenv('IPERF_TEST_PROTOCOL', 'tcp').lower()
    source_node_name = os.getenv('SOURCE_NODE_NAME')  # Injected via the Downward API

    if not source_node_name:
        logging.error("SOURCE_NODE_NAME environment variable not set. Exiting.")
        return

    while True:
        servers = discover_iperf_servers()

        for server in servers:
            # Avoid testing a node against itself
            if server['node_name'] == source_node_name:
                continue

            run_iperf_test(server['ip'], server_port, protocol,
                           source_node_name, server['node_name'])

        logging.info(f"Completed test cycle. Sleeping for {test_interval} seconds.")
        time.sleep(test_interval)
```

#### **Step 4: Parsing and Publishing Metrics**

After each test run, a dedicated function parses the JSON result object provided by the iperf3-python library.^15^ It extracts the key performance indicators and uses them to set the value of the corresponding Prometheus Gauge, applying the correct labels for source and destination nodes. Robust error handling ensures that failed tests are also recorded as a metric, which is vital for alerting.

```python
def parse_and_publish_metrics(result, source_node, dest_node, protocol):
    """
    Parses the iperf3 result and updates Prometheus gauges.
    """
    labels = {'source_node': source_node, 'destination_node': dest_node,
              'protocol': protocol}

    if result.error:
        logging.error(f"Test from {source_node} to {dest_node} failed: {result.error}")
        IPERF_TEST_SUCCESS.labels(**labels).set(0)
        # Clear previous successful metrics for this path
        IPERF_BANDWIDTH_MBPS.labels(**labels).set(0)
        IPERF_JITTER_MS.labels(**labels).set(0)
        return

    IPERF_TEST_SUCCESS.labels(**labels).set(1)

    # The summary data is in result.sent_Mbps or result.received_Mbps,
    # depending on direction. For simplicity, we check for available attributes.
    if hasattr(result, 'sent_Mbps'):
        bandwidth_mbps = result.sent_Mbps
    elif hasattr(result, 'received_Mbps'):
        bandwidth_mbps = result.received_Mbps
    else:
        # Fallback for different iperf3 versions/outputs
        bandwidth_mbps = result.Mbps if hasattr(result, 'Mbps') else 0

    IPERF_BANDWIDTH_MBPS.labels(**labels).set(bandwidth_mbps)

    if protocol == 'udp':
        IPERF_JITTER_MS.labels(**labels).set(
            result.jitter_ms if hasattr(result, 'jitter_ms') else 0)
        IPERF_PACKETS_TOTAL.labels(**labels).set(
            result.packets if hasattr(result, 'packets') else 0)
        IPERF_LOST_PACKETS.labels(**labels).set(
            result.lost_packets if hasattr(result, 'lost_packets') else 0)
```

#### **Step 5: Exposing the /metrics Endpoint**

Finally, the main execution block starts a simple HTTP server using the prometheus-client library. This server exposes the collected metrics on the standard /metrics path, ready to be scraped by Prometheus.^13^

```python
if __name__ == '__main__':
    # Start the Prometheus metrics server
    listen_port = int(os.getenv('LISTEN_PORT', 9876))
    start_http_server(listen_port)
    logging.info(f"Prometheus exporter listening on port {listen_port}")

    # Start the main orchestration loop
    main_loop()
```
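
Once running, the exporter serves the standard Prometheus text exposition format. An illustrative scrape (node names and values are examples, not measurements):

```text
# HELP iperf_network_bandwidth_mbps Network bandwidth measured by iperf3 in Megabits per second
# TYPE iperf_network_bandwidth_mbps gauge
iperf_network_bandwidth_mbps{source_node="node-1",destination_node="node-2",protocol="tcp"} 9412.7
# HELP iperf_test_success Indicates if the iperf3 test was successful (1) or failed (0)
# TYPE iperf_test_success gauge
iperf_test_success{source_node="node-1",destination_node="node-2",protocol="tcp"} 1.0
```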
### **2.3 Containerizing the Exporter (Dockerfile)** {#containerizing-the-exporter-dockerfile}

To deploy the exporter in Kubernetes, it must be packaged into a container image. A multi-stage Dockerfile is used to create a minimal and more secure final image by separating the build environment from the runtime environment. This is a standard best practice for producing production-ready containers.^14^

```dockerfile
# Stage 1: Build stage with dependencies
FROM python:3.9-slim as builder

WORKDIR /app

# Install iperf3 and build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc iperf3 libiperf-dev && \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: Final runtime stage
FROM python:3.9-slim

WORKDIR /app

# Copy iperf3 binary and library from the builder stage
COPY --from=builder /usr/bin/iperf3 /usr/bin/iperf3
COPY --from=builder /usr/lib/x86_64-linux-gnu/libiperf.so.0 /usr/lib/x86_64-linux-gnu/libiperf.so.0

# Copy installed Python packages from the builder stage
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages

# Copy the exporter application code
COPY exporter.py .

# Expose the metrics port
EXPOSE 9876

# Set the entrypoint
CMD ["python", "exporter.py"]
```

The corresponding requirements.txt would contain:

```text
prometheus-client
iperf3
kubernetes
```
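
Before wiring this into CI (Section 6), the image can be built and pushed manually. The registry path below is a placeholder; it should match the exporter.image.repository value configured later in the Helm chart:

```bash
# Build from the exporter/ directory and tag for your registry
docker build -t ghcr.io/my-org/iperf3-prometheus-exporter:0.1.0 ./exporter

# Push so cluster nodes can pull the image
docker push ghcr.io/my-org/iperf3-prometheus-exporter:0.1.0
```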
## **Section 3: Kubernetes Manifests and Deployment Strategy**

With the architectural blueprint defined and the exporter application containerized, the next step is to translate this design into declarative Kubernetes manifests. These YAML files define the necessary Kubernetes objects to deploy, configure, and manage the monitoring service. Using static manifests here provides a clear foundation before they are parameterized into a Helm chart in the next section.

### **3.1 The iperf3-server DaemonSet** {#the-iperf3-server-daemonset}

The iperf3-server component is deployed as a DaemonSet to ensure an instance of the server pod runs on every eligible node in the cluster.^7^ This creates the ubiquitous grid of test endpoints required for comprehensive N-to-N testing.

Key fields in this manifest include:

- **spec.selector**: Connects the DaemonSet to the pods it manages via labels.
- **spec.template.metadata.labels**: The label app: iperf3-server is applied to the pods, which is crucial for discovery by both the iperf3-exporter and Kubernetes Services.
- **spec.template.spec.containers**: Defines the iperf3 container, using a public image and running the iperf3 -s command to start it in server mode.
- **spec.template.spec.tolerations**: This is often necessary to allow the DaemonSet to schedule pods on control-plane (master) nodes, which may have taints preventing normal workloads from running there. This ensures the entire cluster, including masters, is part of the test mesh.
- **spec.template.spec.hostNetwork: true**: This is a critical setting. By running the server pods on the host's network namespace, we bypass the Kubernetes network overlay (CNI) for the server side. This allows the test to measure the raw performance of the underlying node network interface, which is often the primary goal of infrastructure-level testing.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: iperf3-server
  labels:
    app: iperf3-server
spec:
  selector:
    matchLabels:
      app: iperf3-server
  template:
    metadata:
      labels:
        app: iperf3-server
    spec:
      # Run on the host network to measure raw node-to-node performance
      hostNetwork: true
      # Tolerations to allow scheduling on control-plane nodes
      tolerations:
        - key: "node-role.kubernetes.io/control-plane"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: iperf3-server
          image: networkstatic/iperf3:latest
          args: ["-s"]  # Start in server mode
          ports:
            - containerPort: 5201
              name: iperf3
              protocol: TCP
            - containerPort: 5201
              name: iperf3-udp
              protocol: UDP
          resources:
            requests:
              cpu: "50m"
              memory: "64Mi"
            limits:
              cpu: "100m"
              memory: "128Mi"
```
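
Once applied, it is worth confirming that the DaemonSet converged to one Running pod per node (including control-plane nodes, thanks to the tolerations):

```bash
# Each node should appear exactly once; with hostNetwork the pod IP equals the node IP
kubectl get pods -l app=iperf3-server -o wide
```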
### **3.2 The iperf3-exporter Deployment** {#the-iperf3-exporter-deployment}

The iperf3-exporter is deployed as a Deployment, as it is a stateless application that orchestrates the tests.^14^ Only one replica is typically needed, as it can sequentially test all nodes.

Key fields in this manifest are:

- **spec.replicas: 1**: A single instance is sufficient for most clusters.
- **spec.template.spec.serviceAccountName**: This assigns the custom ServiceAccount (defined next) to the pod, granting it the necessary permissions to talk to the Kubernetes API.
- **spec.template.spec.containers.env**: The SOURCE_NODE_NAME environment variable is populated using the Downward API. This is how the exporter pod knows which node *it* is running on, allowing it to skip testing against itself.
- **spec.template.spec.containers.image**: This points to the custom exporter image built in the previous section.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iperf3-exporter
  labels:
    app: iperf3-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iperf3-exporter
  template:
    metadata:
      labels:
        app: iperf3-exporter
    spec:
      serviceAccountName: iperf3-exporter-sa
      containers:
        - name: iperf3-exporter
          image: your-repo/iperf3-prometheus-exporter:latest  # Replace with your image
          ports:
            - containerPort: 9876
              name: metrics
          env:
            # Use the Downward API to inject the node name this pod is running on
            - name: SOURCE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Other configurations for the exporter script
            - name: IPERF_TEST_INTERVAL
              value: "300"
            - name: IPERF_SERVER_LABEL_SELECTOR
              value: "app=iperf3-server"
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
```

### **3.3 RBAC: Granting Necessary Permissions** {#rbac-granting-necessary-permissions}

For the exporter to perform its dynamic discovery of iperf3-server pods, it must be granted specific, limited permissions to read information from the Kubernetes API. This is accomplished through a ServiceAccount, a ClusterRole, and a ClusterRoleBinding.

- **ServiceAccount**: Provides an identity for the exporter pod within the cluster.
- **ClusterRole**: Defines a set of permissions. Here, we grant get, list, and watch access to pods. These are the minimum required permissions for the discovery function to work. The role is a ClusterRole because the exporter needs to find pods across all namespaces where servers might be running.
- **ClusterRoleBinding**: Links the ServiceAccount to the ClusterRole, effectively granting the permissions to any pod that uses the ServiceAccount.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: iperf3-exporter-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: iperf3-exporter-role
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: iperf3-exporter-rb
subjects:
  - kind: ServiceAccount
    name: iperf3-exporter-sa
    namespace: default  # The namespace where the exporter is deployed
roleRef:
  kind: ClusterRole
  name: iperf3-exporter-role
  apiGroup: rbac.authorization.k8s.io
```

### **3.4 Network Exposure: Service and ServiceMonitor** {#network-exposure-service-and-servicemonitor}

To make the exporter's metrics available to Prometheus, we need two final objects. The Service exposes the exporter pod's metrics port within the cluster, and the ServiceMonitor tells the Prometheus Operator how to find and scrape that service.

This ServiceMonitor-based approach is the linchpin for a GitOps-friendly integration. Instead of manually editing the central Prometheus configuration file (a brittle and non-declarative process), we deploy a ServiceMonitor custom resource alongside our application.^14^ The Prometheus Operator, a key component of the kube-prometheus-stack, continuously watches for these objects. When it discovers our iperf3-exporter-sm, it automatically generates the necessary scrape configuration and reloads Prometheus without any manual intervention.^4^ This empowers the application team to define *how their application should be monitored* as part of the application's own deployment package, a cornerstone of scalable, "you build it, you run it" observability.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: iperf3-exporter-svc
  labels:
    app: iperf3-exporter
spec:
  selector:
    app: iperf3-exporter
  ports:
    - name: metrics
      port: 9876
      targetPort: metrics
      protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: iperf3-exporter-sm
  labels:
    # Label for Prometheus Operator to discover this ServiceMonitor
    release: prometheus-operator
spec:
  selector:
    matchLabels:
      # This must match the labels on the Service object above
      app: iperf3-exporter
  endpoints:
    - port: metrics
      interval: 60s
      scrapeTimeout: 30s
```
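
The same Operator pattern extends naturally to alerting on the iperf_test_success gauge recorded in Step 4. A sketch of a PrometheusRule (the duration and severity are illustrative choices, not values from the project):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: iperf3-exporter-rules
  labels:
    release: prometheus-operator
spec:
  groups:
    - name: iperf3.rules
      rules:
        - alert: IperfTestFailing
          # Fires when a node-to-node test path has reported failure for 10 minutes
          expr: iperf_test_success == 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "iperf3 test from {{ $labels.source_node }} to {{ $labels.destination_node }} is failing"
```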
## **Section 4: Packaging with Helm for Reusability and Distribution**

While static YAML manifests are excellent for defining Kubernetes resources, they lack the flexibility needed for easy configuration, distribution, and lifecycle management. Helm, the package manager for Kubernetes, solves this by bundling applications into version-controlled, reusable packages called charts.^17^ This section details how to package the entire iperf3 monitoring service into a professional, flexible, and distributable Helm chart.

### **4.1 Helm Chart Structure** {#helm-chart-structure}

A well-organized Helm chart follows a standard directory structure. This convention makes charts easier to understand and maintain.^19^

```text
iperf3-monitor/
├── Chart.yaml          # Metadata about the chart (name, version, etc.)
├── values.yaml         # Default configuration values for the chart
├── charts/             # Directory for sub-chart dependencies (empty for this project)
├── templates/          # Directory containing the templated Kubernetes manifests
│   ├── _helpers.tpl    # A place for reusable template helpers
│   ├── server-daemonset.yaml
│   ├── exporter-deployment.yaml
│   ├── rbac.yaml
│   ├── service.yaml
│   └── servicemonitor.yaml
└── README.md           # Documentation for the chart
```

### **4.2 Templating the Kubernetes Manifests** {#templating-the-kubernetes-manifests}

The core of Helm's power lies in its templating engine, which uses Go templates. We convert the static manifests from Section 3 into dynamic templates by replacing hardcoded values with references to variables defined in the values.yaml file.

A crucial best practice is to use a _helpers.tpl file to define common functions and partial templates, especially for generating resource names and labels. This reduces boilerplate, ensures consistency, and makes the chart easier to manage.^19^

**Example: templates/_helpers.tpl**

```
{{/*
Expand the name of the chart.
*/}}
{{- define "iperf3-monitor.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end -}}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited
to this (by the DNS naming spec).
*/}}
{{- define "iperf3-monitor.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end -}}

{{/*
Common labels
*/}}
{{- define "iperf3-monitor.labels" -}}
helm.sh/chart: {{ include "iperf3-monitor.name" . }}
{{ include "iperf3-monitor.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}

{{/*
Selector labels
*/}}
{{- define "iperf3-monitor.selectorLabels" -}}
app.kubernetes.io/name: {{ include "iperf3-monitor.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}
```

**Example: Templated exporter-deployment.yaml**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "iperf3-monitor.fullname" . }}-exporter
  labels:
    {{- include "iperf3-monitor.labels" . | nindent 4 }}
    app.kubernetes.io/component: exporter
spec:
  replicas: {{ .Values.exporter.replicaCount }}
  selector:
    matchLabels:
      {{- include "iperf3-monitor.selectorLabels" . | nindent 6 }}
      app.kubernetes.io/component: exporter
  template:
    metadata:
      labels:
        {{- include "iperf3-monitor.selectorLabels" . | nindent 8 }}
        app.kubernetes.io/component: exporter
    spec:
      {{- if .Values.rbac.create }}
      serviceAccountName: {{ include "iperf3-monitor.fullname" . }}-sa
      {{- else }}
      serviceAccountName: {{ .Values.serviceAccount.name }}
      {{- end }}
      containers:
        - name: iperf3-exporter
          image: "{{ .Values.exporter.image.repository }}:{{ .Values.exporter.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.exporter.image.pullPolicy }}
          ports:
            - containerPort: 9876
              name: metrics
          env:
            - name: SOURCE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: IPERF_TEST_INTERVAL
              value: "{{ .Values.exporter.testInterval }}"
          resources:
            {{- toYaml .Values.exporter.resources | nindent 12 }}
```

### **4.3 Designing a Comprehensive values.yaml** {#designing-a-comprehensive-values.yaml}

The values.yaml file is the public API of a Helm chart. A well-designed values file is intuitive, clearly documented, and provides users with the flexibility to adapt the chart to their specific needs. Best practices include using clear, camelCase naming conventions and providing comments for every parameter.^21^

A particularly powerful feature of Helm is conditional logic. By wrapping entire resource definitions in if blocks based on boolean flags in values.yaml (e.g., {{- if .Values.rbac.create }}), the chart becomes highly adaptable. A user in a high-security environment can disable the automatic creation of ClusterRoles by setting rbac.create: false, allowing them to manage permissions manually without causing the Helm installation to fail.^20^ Similarly, a user not running the Prometheus Operator can set serviceMonitor.enabled: false. This adaptability transforms the chart from a rigid, all-or-nothing package into a flexible building block, dramatically increasing its utility across different organizations and security postures. A concrete sketch of the conditional pattern follows.
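
Applied to this chart, the entire templates/servicemonitor.yaml can be guarded by the flag (a minimal sketch, reusing the helpers defined above):

```yaml
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "iperf3-monitor.fullname" . }}-sm
spec:
  selector:
    matchLabels:
      {{- include "iperf3-monitor.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: metrics
      interval: {{ .Values.serviceMonitor.interval }}
      scrapeTimeout: {{ .Values.serviceMonitor.scrapeTimeout }}
{{- end }}
```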

The following table documents the comprehensive set of configurable parameters for the iperf3-monitor chart. This serves as the primary documentation for any user wishing to install and customize the service.

| Parameter | Description | Type | Default |
|---|---|---|---|
| nameOverride | Override the name of the chart. | string | `""` |
| fullnameOverride | Override the fully qualified app name. | string | `""` |
| exporter.image.repository | The container image repository for the exporter. | string | ghcr.io/my-org/iperf3-prometheus-exporter |
| exporter.image.tag | The container image tag for the exporter. | string | (Chart.AppVersion) |
| exporter.image.pullPolicy | The image pull policy for the exporter. | string | IfNotPresent |
| exporter.replicaCount | Number of exporter pod replicas. | integer | 1 |
| exporter.testInterval | Interval in seconds between test cycles. | integer | 300 |
| exporter.testTimeout | Timeout in seconds for a single iperf3 test. | integer | 10 |
| exporter.testProtocol | Protocol to use for testing (tcp or udp). | string | tcp |
| exporter.resources | CPU/memory resource requests and limits for the exporter. | object | `{}` |
| server.image.repository | The container image repository for the iperf3 server. | string | networkstatic/iperf3 |
| server.image.tag | The container image tag for the iperf3 server. | string | latest |
| server.resources | CPU/memory resource requests and limits for the server pods. | object | `{}` |
| server.nodeSelector | Node selector for scheduling server pods. | object | `{}` |
| server.tolerations | Tolerations for scheduling server pods on tainted nodes. | array | `[]` |
| rbac.create | If true, create ServiceAccount, ClusterRole, and ClusterRoleBinding. | boolean | true |
| serviceAccount.name | The name of the ServiceAccount to use. Used if rbac.create is false. | string | `""` |
| serviceMonitor.enabled | If true, create a ServiceMonitor for Prometheus Operator. | boolean | true |
| serviceMonitor.interval | Scrape interval for the ServiceMonitor. | string | 60s |
| serviceMonitor.scrapeTimeout | Scrape timeout for the ServiceMonitor. | string | 30s |
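
A values.yaml skeleton consistent with these defaults might look like the following (comments abbreviated; the full file should document every parameter):

```yaml
nameOverride: ""
fullnameOverride: ""

exporter:
  image:
    repository: ghcr.io/my-org/iperf3-prometheus-exporter
    tag: ""             # Empty means: default to Chart.AppVersion
    pullPolicy: IfNotPresent
  replicaCount: 1
  testInterval: 300     # Seconds between test cycles
  testTimeout: 10       # Seconds before a single test is abandoned
  testProtocol: tcp     # tcp or udp
  resources: {}

server:
  image:
    repository: networkstatic/iperf3
    tag: latest
  resources: {}
  nodeSelector: {}
  tolerations: []

rbac:
  create: true

serviceAccount:
  name: ""

serviceMonitor:
  enabled: true
  interval: 60s
  scrapeTimeout: 30s
```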
## **Section 5: Visualizing Network Performance with a Custom Grafana Dashboard**

The final piece of the user experience is a purpose-built Grafana dashboard that transforms the raw, time-series metrics from Prometheus into intuitive, actionable visualizations. A well-designed dashboard does more than just display data; it tells a story, guiding an operator from a high-level overview of cluster health to a deep-dive analysis of a specific problematic network path.^5^

### **5.1 Dashboard Design Principles** {#dashboard-design-principles}

The primary goals for this network performance dashboard are:

1. **At-a-Glance Overview:** Provide an immediate, cluster-wide view of network health, allowing operators to quickly spot systemic issues or anomalies.
2. **Intuitive Drill-Down:** Enable users to seamlessly transition from a high-level view to a detailed analysis of performance between specific nodes.
3. **Correlation:** Display multiple related metrics (bandwidth, jitter, packet loss) on the same timeline to help identify causal relationships.
4. **Clarity and Simplicity:** Avoid clutter and overly complex panels that can obscure meaningful data.^4^

### **5.2 Key Visualizations and Panels** {#key-visualizations-and-panels}

The dashboard is constructed from several key panel types, each serving a specific analytical purpose. (The concrete PromQL for these panels is collected in the snippet after this list.)

- **Panel 1: Node-to-Node Bandwidth Heatmap.** This is the centerpiece of the dashboard's overview. It uses Grafana's "Heatmap" visualization to create a matrix of network performance.
  - **Y-Axis:** Source Node (source_node label).
  - **X-Axis:** Destination Node (destination_node label).
  - **Cell Color:** The value of the iperf_network_bandwidth_mbps metric.
  - **PromQL Query:** avg(iperf_network_bandwidth_mbps) by (source_node, destination_node)

  This panel provides an instant visual summary of the entire cluster's network fabric. A healthy cluster will show a uniformly "hot" (high bandwidth) grid, while any "cold" spots immediately draw attention to underperforming network paths.

- **Panel 2: Time-Series Performance Graphs.** These panels use the "Time series" visualization to plot performance over time, allowing for trend analysis and historical investigation.
  - **Bandwidth (Mbps):** Plots iperf_network_bandwidth_mbps{source_node="$source_node", destination_node="$destination_node"}.
  - **Jitter (ms):** Plots iperf_network_jitter_ms{source_node="$source_node", destination_node="$destination_node", protocol="udp"}.
  - **Packet Loss (%):** Plots (iperf_network_lost_packets_total{...} / iperf_network_packets_total{...}) * 100.

  These graphs are filtered by the dashboard variables, enabling the drill-down analysis.

- **Panel 3: Stat Panels.** These panels use the "Stat" visualization to display single, key performance indicators (KPIs) for the selected time range and nodes.
  - **Average Bandwidth:** avg(iperf_network_bandwidth_mbps{...})
  - **Minimum Bandwidth:** min(iperf_network_bandwidth_mbps{...})
  - **Maximum Jitter:** max(iperf_network_jitter_ms{...})
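
For reference, the drill-down queries written out in full; the {...} selectors abbreviated above expand to the same label matchers:

```promql
# Heatmap: average bandwidth per node pair
avg(iperf_network_bandwidth_mbps) by (source_node, destination_node)

# Bandwidth over time for the selected path
iperf_network_bandwidth_mbps{source_node="$source_node", destination_node="$destination_node"}

# Jitter over time (UDP tests only)
iperf_network_jitter_ms{source_node="$source_node", destination_node="$destination_node", protocol="udp"}

# Packet loss percentage for the selected path
(iperf_network_lost_packets_total{source_node="$source_node", destination_node="$destination_node"}
  / iperf_network_packets_total{source_node="$source_node", destination_node="$destination_node"}) * 100
```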
### **5.3 Enabling Interactivity with Grafana Variables** {#enabling-interactivity-with-grafana-variables}

The dashboard's interactivity is powered by Grafana's template variables. These variables are dynamically populated from Prometheus and are used to filter the data displayed in the panels.^4^

- **$source_node**: A dropdown variable populated by the PromQL query label_values(iperf_network_bandwidth_mbps, source_node).
- **$destination_node**: A dropdown variable populated by label_values(iperf_network_bandwidth_mbps{source_node="$source_node"}, destination_node). This query is cascaded, meaning it only shows destinations relevant to the selected source.
- **$protocol**: A custom variable with the options tcp and udp.

This combination of a high-level heatmap with interactive, variable-driven drill-down graphs creates a powerful analytical workflow. An operator can begin with a bird's-eye view of the cluster. Upon spotting an anomaly on the heatmap (e.g., a low-bandwidth link between Node-5 and Node-8), they can use the $source_node and $destination_node dropdowns to select that specific path. All the time-series panels will instantly update to show the detailed performance history for that link, allowing the operator to correlate bandwidth drops with jitter spikes or other events. This workflow transforms raw data into actionable insight, dramatically reducing the Mean Time to Identification (MTTI) for network issues.

### **5.4 The Complete Grafana Dashboard JSON Model** {#the-complete-grafana-dashboard-json-model}

To facilitate easy deployment, the entire dashboard is defined in a single JSON model. This model can be imported directly into any Grafana instance. The heatmap and bandwidth panel targets use the queries from Section 5.2.

```json
{
  "__inputs": [],
  "__requires": [
    {
      "type": "grafana",
      "id": "grafana",
      "name": "Grafana",
      "version": "8.0.0"
    },
    {
      "type": "datasource",
      "id": "prometheus",
      "name": "Prometheus",
      "version": "1.0.0"
    }
  ],
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "gnetId": null,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "panels": [
    {
      "title": "Node-to-Node Bandwidth Heatmap",
      "type": "heatmap",
      "datasource": {
        "type": "prometheus",
        "uid": "prometheus"
      },
      "gridPos": { "h": 9, "w": 24, "x": 0, "y": 0 },
      "targets": [
        {
          "expr": "avg(iperf_network_bandwidth_mbps) by (source_node, destination_node)",
          "format": "heatmap",
          "legendFormat": "{{source_node}} -> {{destination_node}}",
          "refId": "A"
        }
      ],
      "cards": { "cardPadding": null, "cardRound": null },
      "color": {
        "mode": "spectrum",
        "scheme": "red-yellow-green",
        "exponent": 0.5,
        "reverse": false
      },
      "dataFormat": "tsbuckets",
      "yAxis": { "show": true, "format": "short" },
      "xAxis": { "show": true }
    },
    {
      "title": "Bandwidth Over Time (Source: $source_node, Dest: $destination_node)",
      "type": "timeseries",
      "datasource": {
        "type": "prometheus",
        "uid": "prometheus"
      },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 },
      "targets": [
        {
          "expr": "iperf_network_bandwidth_mbps{source_node=\"$source_node\", destination_node=\"$destination_node\"}",
          "legendFormat": "Bandwidth",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "mbps"
        }
      }
    },
    {
      "title": "Jitter Over Time (Source: $source_node, Dest: $destination_node)",
      "type": "timeseries",
      "datasource": {
        "type": "prometheus",
        "uid": "prometheus"
      },
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 },
      "targets": [
        {
          "expr": "iperf_network_jitter_ms{source_node=\"$source_node\", destination_node=\"$destination_node\", protocol=\"udp\"}",
          "legendFormat": "Jitter",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "ms"
        }
      }
    }
  ],
  "refresh": "30s",
  "schemaVersion": 36,
  "style": "dark",
  "tags": ["iperf3", "network", "kubernetes"],
  "templating": {
    "list": [
      {
        "current": {},
        "datasource": {
          "type": "prometheus",
          "uid": "prometheus"
        },
        "definition": "label_values(iperf_network_bandwidth_mbps, source_node)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "source_node",
        "options": [],
        "query": "label_values(iperf_network_bandwidth_mbps, source_node)",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      },
      {
        "current": {},
        "datasource": {
          "type": "prometheus",
          "uid": "prometheus"
        },
        "definition": "label_values(iperf_network_bandwidth_mbps{source_node=\"$source_node\"}, destination_node)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "destination_node",
        "options": [],
        "query": "label_values(iperf_network_bandwidth_mbps{source_node=\"$source_node\"}, destination_node)",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      },
      {
        "current": { "selected": true, "text": "tcp", "value": "tcp" },
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "protocol",
        "options": [
          { "selected": true, "text": "tcp", "value": "tcp" },
          { "selected": false, "text": "udp", "value": "udp" }
        ],
        "query": "tcp,udp",
        "skipUrlSync": false,
        "type": "custom"
      }
    ]
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "browser",
  "title": "Kubernetes iperf3 Network Performance",
  "uid": "k8s-iperf3-dashboard",
  "version": 1,
  "weekStart": ""
}
```

## **Section 6: GitHub Repository Structure and CI/CD Workflow**

To deliver this monitoring service as a professional, open-source-ready project, it is essential to package it within a well-structured GitHub repository and implement a robust Continuous Integration and Continuous Deployment (CI/CD) pipeline. This automates the build, test, and release process, ensuring that every version of the software is consistent, trustworthy, and easy for consumers to adopt.

### **6.1 Recommended Repository Structure** {#recommended-repository-structure}

A clean, logical directory structure is fundamental for project maintainability and ease of navigation for contributors and users.

```text
.
├── .github/
│   └── workflows/
│       └── release.yml        # GitHub Actions workflow for CI/CD
├── charts/
│   └── iperf3-monitor/        # The Helm chart for the service
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
│           └── ...
├── exporter/
│   ├── Dockerfile             # Dockerfile for the exporter
│   ├── requirements.txt       # Python dependencies
│   └── exporter.py            # Exporter source code
├── .gitignore
├── LICENSE
└── README.md
```

This structure cleanly separates the exporter application code (/exporter) from its deployment packaging (/charts/iperf3-monitor) and its release automation (/.github/workflows).

### **6.2 CI/CD Pipeline with GitHub Actions** {#cicd-pipeline-with-github-actions}

A fully automated CI/CD pipeline is the hallmark of a mature software project. It eliminates manual, error-prone release steps and provides strong guarantees about the integrity of the published artifacts. By triggering the pipeline on the creation of a Git tag (e.g., v1.2.3), we use the tag as a single source of truth for versioning both the Docker image and the Helm chart. This ensures that chart version 1.2.3 is built to use image version 1.2.3, and that both have been validated before release. This automated, atomic release process provides trust and velocity, elevating the project from a collection of files into a reliable, distributable piece of software.

The following GitHub Actions workflow automates the entire release process:
> YAML

# .github/workflows/release.yml
name: Release iperf3-monitor

on:
  push:
    tags:
      - 'v*.*.*'

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  lint-and-test:
    name: Lint and Test
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v3

      - name: Set up Helm
        uses: azure/setup-helm@v3
        with:
          version: v3.10.0

      - name: Helm Lint
        run: helm lint ./charts/iperf3-monitor

  build-and-publish-image:
    name: Build and Publish Docker Image
    runs-on: ubuntu-latest
    needs: lint-and-test
    permissions:
      contents: read
      packages: write
    steps:
      - name: Check out code
        uses: actions/checkout@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: ./exporter
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

  package-and-publish-chart:
    name: Package and Publish Helm Chart
    runs-on: ubuntu-latest
    needs: build-and-publish-image
    permissions:
      contents: write
    steps:
      - name: Check out code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Set up Helm
        uses: azure/setup-helm@v3
        with:
          version: v3.10.0

      - name: Set Chart Version
        run: |
          # Export so that yq's strenv() can read the variable.
          export VERSION=$(echo "${{ github.ref_name }}" | sed 's/^v//')
          # Regenerate the chart README from values.yaml comments
          # (assumes helm-docs is available on the runner).
          helm-docs --sort-values-order file
          yq e -i '.version = strenv(VERSION)' ./charts/iperf3-monitor/Chart.yaml
          yq e -i '.appVersion = strenv(VERSION)' ./charts/iperf3-monitor/Chart.yaml

      - name: Publish Helm chart
        uses: stefanprodan/helm-gh-pages@v1.6.0
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          charts_dir: ./charts
          charts_url: https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }}
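With this workflow in place, cutting a release requires no manual steps beyond pushing a tag, for example `git tag -a v1.2.3 -m "Release v1.2.3"` followed by `git push origin v1.2.3`. The pipeline then lints the chart, publishes the image to ghcr.io, rewrites the chart versions, and publishes the chart to GitHub Pages, after which consumers can install the matching image/chart pair with a command along the lines of `helm install net-mon <repo-alias>/iperf3-monitor --version 1.2.3` (the release name and repository alias here are illustrative).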
### **6.3 Documentation and Usability** {#documentation-and-usability}

The final, and arguably most critical, component for project success is high-quality documentation. The README.md file at the root of the repository is the primary entry point for any user. It should clearly explain what the project does, its architecture, and how to deploy and use it.

A common failure point in software projects is documentation that falls out of sync with the code. For Helm charts, the values.yaml file frequently changes, adding new parameters and options. To combat this, it is a best practice to automate the documentation of these parameters. The helm-docs tool can be integrated directly into the CI/CD pipeline to automatically generate the "Parameters" section of the README.md by parsing the comments directly from the values.yaml file.^20^ This ensures that the documentation is always an accurate reflection of the chart's configurable options, providing a seamless and trustworthy experience for users.
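As a brief sketch of this convention (assuming the widely used norwoodj/helm-docs tool, which treats comments prefixed with # -- as parameter descriptions; the parameter names below are hypothetical):

> YAML

# charts/iperf3-monitor/values.yaml (illustrative excerpt)
# -- Number of exporter replicas to run
replicaCount: 1

image:
  # -- Container image repository for the exporter
  repository: ghcr.io/example/iperf3-monitor
  # -- Image tag; when empty, the chart's appVersion is used
  tag: ""

Running helm-docs then regenerates the "Parameters" table in README.md from these comments, so the documentation cannot drift from the chart itself.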
## **Conclusion**

The proliferation of distributed microservices on Kubernetes has made network performance a critical, yet often opaque, component of overall application health. This report has detailed a comprehensive, production-grade solution for establishing continuous network validation within a Kubernetes cluster. By architecting a system around the robust, decoupled pattern of an iperf3-server DaemonSet and a Kubernetes-aware iperf3-exporter Deployment, this service provides a resilient and automated foundation for network observability.

The implementation leverages industry-standard tools (Python for the exporter, Prometheus for metrics storage, and Grafana for visualization) to create a powerful and flexible monitoring pipeline. The entire service is packaged into a professional Helm chart, following best practices for templating, configuration, and adaptability. This allows for simple, version-controlled deployment across a wide range of environments. The final Grafana dashboard transforms the collected data into an intuitive, interactive narrative, enabling engineers to move swiftly from high-level anomaly detection to root-cause analysis.

Ultimately, by treating network performance not as a given but as a continuously measured metric, organizations can proactively identify and resolve infrastructure bottlenecks, enhance application reliability, and ensure a consistent, high-quality experience for their users in the dynamic world of Kubernetes.
#### Works cited

1. How to Identify Performance Issues in Kubernetes - LabEx, accessed June 17, 2025, https://labex.io/questions/how-to-identify-performance-issues-in-kubernetes-11358
2. Performing large-scale network testing on Red Hat OpenShift: A 100 Gbps approach, accessed June 17, 2025, https://www.redhat.com/en/blog/performing-large-scale-network-testing-red-hat-openshift
3. How to Implement Full-Stack Monitoring with Prometheus/Grafana on FreeBSD Operating System | Siberoloji, accessed June 17, 2025, https://www.siberoloji.com/how-to-implement-full-stack-monitoring-with-prometheusgrafana-on-freebsd-operating-system/
4. Kubernetes Metrics and Monitoring with Prometheus and Grafana - DEV Community, accessed June 17, 2025, https://dev.to/abhay_yt_52a8e72b213be229/kubernetes-metrics-and-monitoring-with-prometheus-and-grafana-4e9n
5. The Top 30 Grafana Dashboard Examples - Logit.io, accessed June 17, 2025, https://logit.io/blog/post/top-grafana-dashboards-and-visualisations/
6. Autopilot Metrics | Grafana Labs, accessed June 17, 2025, https://grafana.com/grafana/dashboards/23123-autopilot-metrics/
7. Kubernetes DaemonSet: Practical Guide to Monitoring in Kubernetes - Cast AI, accessed June 17, 2025, https://cast.ai/blog/kubernetes-daemonset-practical-guide-to-monitoring-in-kubernetes/
8. Kubernetes DaemonSets vs Deployments: Key Differences and Use Cases - RubixKube™, accessed June 17, 2025, https://www.rubixkube.io/blog/kubernetes-daemonsets-vs-deployments-key-differences-and-use-cases-4a5i
9. Kubernetes DaemonSet: Examples, Use Cases & Best Practices - groundcover, accessed June 17, 2025, https://www.groundcover.com/blog/kubernetes-daemonset
10. Complete Comparison of Kubernetes Daemonset Vs Deployment | Zeet.co, accessed June 17, 2025, https://zeet.co/blog/kubernetes-daemonset-vs-deployment
11. Testing Connectivity Between Kubernetes Pods with Iperf3 | Support - SUSE, accessed June 17, 2025, https://www.suse.com/support/kb/doc/?id=000020954
12. Pharb/kubernetes-iperf3: Simple wrapper around iperf3 to ... - GitHub, accessed June 17, 2025, https://github.com/Pharb/kubernetes-iperf3
13. Building a Custom Prometheus Exporter in Python - SysOpsPro, accessed June 17, 2025, https://sysopspro.com/how-to-build-your-own-prometheus-exporter-in-python/
14. How to Create a Prometheus Exporter? - Enix.io, accessed June 17, 2025, https://enix.io/en/blog/create-prometheus-exporter/
15. Examples — iperf3 0.1.10 documentation - iperf3 python wrapper, accessed June 17, 2025, https://iperf3-python.readthedocs.io/en/latest/examples.html
16. markormesher/iperf-prometheus-collector - GitHub, accessed June 17, 2025, https://github.com/markormesher/iperf-prometheus-collector
17. Using Helm with Kubernetes: A Guide to Helm Charts and Their Implementation, accessed June 17, 2025, https://dev.to/alexmercedcoder/using-helm-with-kubernetes-a-guide-to-helm-charts-and-their-implementation-8dg
18. Kubernetes Helm Charts: The Basics and a Quick Tutorial - Spot.io, accessed June 17, 2025, https://spot.io/resources/kubernetes-architecture/kubernetes-helm-charts-the-basics-and-a-quick-tutorial/
19. Understand a Helm chart structure - Bitnami Documentation, accessed June 17, 2025, https://docs.bitnami.com/kubernetes/faq/administration/understand-helm-chart/
20. Helm Chart Essentials & Writing Effective Charts - DEV Community, accessed June 17, 2025, https://dev.to/hkhelil/helm-chart-essentials-writing-effective-charts-11ca
21. Values - Helm, accessed June 17, 2025, https://helm.sh/docs/chart_best_practices/values/
22. grafana.com, accessed June 17, 2025, https://grafana.com/grafana/dashboards/22901-traffic-monitoring/