18 Commits

Author SHA1 Message Date
13ca1a7fc3 fix: service account condition 2025-06-21 01:38:23 +05:30
0c490e95d2 feat: Implement iperf3 exporter core logic and log level configuration (#13)
* feat: Implement iperf3 exporter core logic and log level configuration

This commit completes the core functionality of the iperf3 exporter and adds flexible log level configuration.

Key changes:
- Added command-line (`--log-level`) and environment variable (`LOG_LEVEL`) options to configure the logging level.
- Implemented the main test orchestration loop (`main_loop`) which:
    - Discovers iperf3 server pods via the Kubernetes API.
    - Periodically runs iperf3 tests (TCP/UDP) between the exporter pod and discovered server pods.
    - Avoids self-testing.
    - Uses configurable test intervals, server ports, and protocols.
    - Requires `SOURCE_NODE_NAME` to be set.
- Refined the `parse_and_publish_metrics` function to:
    - Accurately parse iperf3 results for bandwidth, jitter, packets, and lost packets.
    - Set `IPERF_TEST_SUCCESS` metric (0 for failure, 1 for success).
    - Zero out all relevant metrics for a given path upon test failure to prevent stale data.
    - Handle UDP-specific metrics correctly, zeroing them for TCP tests.
    - Improved robustness in accessing iperf3 result attributes.
- Updated the main execution block to initialize logging, start the Prometheus HTTP server, and invoke the main loop.
- Added comprehensive docstrings and inline comments throughout `exporter/exporter.py` for improved readability and maintainability.

These changes align the exporter's implementation with the details specified in the design document (docs/DESIGN.MD).
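For reference, a minimal sketch of how the two options interact at runtime (hypothetical invocations; the command-line flag wins over the environment variable):

```sh
# Default level (INFO) when neither option is set
python exporter/exporter.py

# Environment variable alone
LOG_LEVEL=DEBUG python exporter/exporter.py

# The --log-level flag overrides LOG_LEVEL, so this runs at DEBUG
LOG_LEVEL=WARNING python exporter/exporter.py --log-level DEBUG
```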

* feat: Update Helm chart and CI for exporter enhancements

This commit introduces updates to the Helm chart to support log level
configuration for the iperf3 exporter, and modifies the CI workflow
to improve image tagging for pull requests.

Helm Chart Changes (`charts/iperf3-monitor`):
- Added `exporter.logLevel` to `values.yaml` (default: "INFO") to allow
  you to set the exporter's log level.
- Updated `templates/exporter-deployment.yaml` to use the
  `exporter.logLevel` value to set the `LOG_LEVEL` environment
  variable in the exporter container.
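As a sketch, assuming the chart repo is added as `iperf3-monitor` (release and repo names are placeholders), the new value can be set at install time:

```sh
# Raise the exporter's verbosity via the new chart value
helm upgrade --install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --set exporter.logLevel=DEBUG
```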

CI Workflow Changes (`.github/workflows/ci.yaml`):
- Modified the Docker image build process to tag images built from
  pull requests with `pr-<PR_NUMBER>`.
- Ensured that these PR-specific images are pushed to the container
  registry.
- Preserved existing tagging mechanisms (e.g., SHA-based tags).
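For example, a reviewer could pull the image built for a pull request (a sketch; the tag follows the `pr-<PR_NUMBER>` shape described above, so PR #13 would be):

```sh
docker pull ghcr.io/malarinv/iperf3-monitor:pr-13
```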

* fix: Add Docker login and permissions to CI workflow

This commit fixes the Docker image push failure in the CI workflow
by adding the necessary Docker login step and ensuring the correct
permissions are set for the GITHUB_TOKEN.

- Added a Docker login step using `docker/login-action@v3` to the
  `Build Docker Image` job in `.github/workflows/ci.yaml`. This
  authenticates to GHCR before attempting to push images.
- Added a `permissions` block to the `Build Docker Image` job, granting
  `packages: write` scope to the `GITHUB_TOKEN`. This is required
  to allow pushing packages to the GitHub Container Registry.
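The equivalent manual login, for anyone reproducing the push locally (a sketch; assumes a personal access token with the `packages: write` scope):

```sh
# Authenticate to GHCR the same way the workflow does
echo "$GITHUB_TOKEN" | docker login ghcr.io -u <github-username> --password-stdin
```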

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
2025-06-21 01:02:45 +05:30
81b771d1ee docs: Set service account name to iperf3-monitor (#12)
The default service account name was empty, causing issues when
rbac.create was disabled. This commit sets a reasonable default.
Also, add a comprehensive design document.
2025-06-21 00:10:08 +05:30
3e21f978ee feat: Add support for arm64 architecture (#11)
* feat: Add support for arm64 architecture

This commit introduces support for the arm64 architecture by:

1.  **Updating the Dockerfile:**
    *   The `exporter/Dockerfile` now uses the `TARGETARCH` build argument to dynamically determine the correct path for `libiperf.so.0`. This allows the same Dockerfile to be used for building both `amd64` and `arm64` images.

2.  **Modifying GitHub Workflows:**
    *   The CI workflow (`.github/workflows/ci.yaml`) and the Release workflow (`.github/workflows/release.yml`) have been updated to build and push multi-architecture Docker images (`linux/amd64` and `linux/arm64`).
    *   This involves adding the `docker/setup-qemu-action` for cross-compilation and specifying the target platforms in the `docker/build-push-action`.

3.  **Helm Chart:**
    *   No changes were required for the Helm chart as the image tag will now point to a multi-arch manifest, and the default iperf3 server image (`networkstatic/iperf3:latest`) is assumed to be multi-arch. Node selectors in the chart are not architecture-specific.

These changes enable the deployment of the iperf3-monitor on Kubernetes clusters with arm64 nodes.
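Condensed to a local command, the updated workflows do roughly the following (a sketch; assumes Buildx and QEMU binfmt handlers are installed, and the `:dev` tag is a placeholder):

```sh
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t ghcr.io/malarinv/iperf3-monitor:dev \
  --push \
  ./exporter
```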

* fix: Ensure multi-platform builds with Docker Buildx

This commit updates the GitHub Actions workflows to correctly set up
Docker Buildx for multi-platform (amd64, arm64) image builds.

Previously, the workflows were missing the `docker/setup-buildx-action`
step, which led to errors when attempting multi-platform builds as the
default Docker driver does not support this.

The following changes were made:
1.  **Added `docker/setup-buildx-action@v3`:**
    - This step is now included in both the CI (`.github/workflows/ci.yaml`) and Release (`.github/workflows/release.yml`) workflows before the QEMU setup and build/push actions.

2.  **Dockerfile (`exporter/Dockerfile`):**
    - Remains as per the previous commit, using `TARGETARCH` to correctly copy architecture-specific libraries. This part was already correct for multi-arch builds.

3.  **Helm Chart:**
    - No changes were required for the Helm chart.

This ensures that the CI/CD pipeline can successfully build and push
Docker images for both `linux/amd64` and `linux/arm64` architectures.

* fix: Correct Dockerfile lib path and add Helm dependency toggle

This commit includes two main changes:

1.  **Fix Dockerfile library path for amd64:**
    - I updated the `exporter/Dockerfile` to correctly determine the source path for `libiperf.so.0` when building for different architectures.
    - Specifically, for `TARGETARCH=amd64`, the path `/usr/lib/x86_64-linux-gnu/libiperf.so.0` is now used.
    - For `TARGETARCH=arm64`, the path `/usr/lib/aarch64-linux-gnu/libiperf.so.0` is used.
    - I achieved this by copying the library to a canonical temporary location in the builder stage based on `TARGETARCH`, and then copying it from this location into the final image. This resolves an issue where builds for `amd64` would fail to find the library.

2.  **Add Helm chart option to disable dependencies:**
    - I added a new option `dependencies.install` (default: `true`) to `charts/iperf3-monitor/values.yaml`.
    - This allows you to disable the installation of managed dependencies (i.e., Prometheus Operator via `kube-prometheus-stack` or `prometheus-operator` from TrueCharts) even if `serviceMonitor.enabled` is true.
    - I updated the `condition` for these dependencies in `charts/iperf3-monitor/Chart.yaml` to `dependencies.install, serviceMonitor.enabled, ...`.
    - This is useful if you manage your Prometheus Operator installation separately.
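A sketch of the opt-out for clusters that already run a Prometheus Operator (release and repo names are placeholders):

```sh
# Keep the ServiceMonitor, but do not pull in an operator chart
helm upgrade --install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --set serviceMonitor.enabled=true \
  --set dependencies.install=false
```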

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
2025-06-20 19:07:36 +05:30
458b786ff4 Fix: Ignore packaged chart files and remove markdown files (#10)
Removes the outdated markdown files and fixes the .gitignore to ignore packaged chart files in the correct directories. This prevents them from being committed to the repository.
2025-06-20 17:42:17 +05:30
96be13a23c Fix: Use dependencies scope for truecharts prometheus-operator (#9)
Fixes an issue where truecharts prometheus operator version and
repository values were not accessible because they were not under the
`dependencies` scope.
2025-06-20 13:35:13 +05:30
8d51afc24e Feat: Add optional TrueCharts Prometheus Operator dependency (#8)
This commit introduces a configurable dependency for the Prometheus Operator,
allowing you to choose between the standard kube-prometheus-stack and
the TrueCharts version of prometheus-operator.

Changes include:

1.  **values.yaml:**
    *   Added a `dependencies` section with the following new values:
        *   `useTrueChartsPrometheusOperator` (boolean, default: false):
            Controls which operator dependency is enabled.
        *   `trueChartsPrometheusOperatorRepository` (string, default:
            "oci://tccr.io/truecharts"): Repository for the TrueCharts operator.
        *   `trueChartsPrometheusOperatorVersion` (string, default: "8.11.1"):
            Chart version for the TrueCharts operator.

2.  **Chart.yaml:**
    *   The `kube-prometheus-stack` dependency condition is updated to
        `"serviceMonitor.enabled, !values.dependencies.useTrueChartsPrometheusOperator"`.
    *   A new dependency for `prometheus-operator` (TrueCharts) is added:
        *   `name: prometheus-operator`
        *   `version: "{{ .Values.dependencies.trueChartsPrometheusOperatorVersion }}"`
        *   `repository: "{{ .Values.dependencies.trueChartsPrometheusOperatorRepository }}"`
        *   `condition: "serviceMonitor.enabled, values.dependencies.useTrueChartsPrometheusOperator"`

This provides you with more flexibility in choosing your Prometheus
Operator stack while using the iperf3-monitor chart.
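Illustrative installs for the two operator choices (a sketch; release and repo names are placeholders):

```sh
# Default: pulls in kube-prometheus-stack
helm install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --set serviceMonitor.enabled=true

# Opt into the TrueCharts prometheus-operator instead
helm install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --set serviceMonitor.enabled=true \
  --set dependencies.useTrueChartsPrometheusOperator=true
```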

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
2025-06-20 13:25:30 +05:30
a2d57908f6 Merge pull request #7 from malarinv/fix/readme-license-ci-helm
Fix: Final correction for yq command in release workflow
2025-06-20 03:07:58 +05:30
google-labs-jules[bot]
4298031a2d Fix: Final correction for yq command in release workflow
This commit implements a verified yq command syntax in the
`.github/workflows/release.yml` file to ensure correct and reliable
updating of Chart.yaml version and appVersion from Git tags.

The previous attempts faced issues with yq argument parsing and
environment variable substitution. The new commands:
  VERSION=$VERSION yq e -i '.version = strenv(VERSION)' ./charts/iperf3-monitor/Chart.yaml
  VERSION=$VERSION yq e -i '.appVersion = strenv(VERSION)' ./charts/iperf3-monitor/Chart.yaml
were tested and confirmed to correctly modify
the Chart.yaml file as intended.

This change should resolve the issues where chart versions were being
set incorrectly or to empty strings during the release process.
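The `VERSION=$VERSION` prefix matters because yq's `strenv()` reads from yq's own process environment; an unexported shell variable is invisible to it. A minimal reproduction (assuming mikefarah yq v4):

```sh
VERSION=1.2.3                                          # shell variable, not exported
yq e -n '.version = strenv(VERSION)'                   # errors: VERSION not in yq's environment
VERSION=$VERSION yq e -n '.version = strenv(VERSION)'  # prints: version: 1.2.3
```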
2025-06-19 21:36:48 +00:00
e6d1a8fb91 Merge pull request #6 from malarinv/fix/readme-license-ci-helm
Fix: Update README, license consistency, and Helm chart configurations
2025-06-20 02:44:04 +05:30
google-labs-jules[bot]
a9f2a49549 Fix: Update README, license consistency, and Helm chart configurations
This commit addresses several issues to improve repository accuracy and CI reliability:

1.  **README.md Updates:**
    *   I corrected the Helm repository URL to `https://malarinv.github.io/iperf3-monitor/`.
    *   I updated the default exporter image name to `ghcr.io/malarinv/iperf3-monitor` in examples.
    *   I revised the License section to accurately reflect the AGPLv3 license present in the `LICENSE` file, removing contradictory statements.

2.  **License Consistency:**
    *   I confirmed the `LICENSE` file contains the AGPLv3 license text. The README now correctly refers to it.

3.  **Helm Chart Adjustments:**
    *   `charts/iperf3-monitor/Chart.yaml`: I removed placeholder comments for clarity. Versioning is handled by the release workflow.
    *   `charts/iperf3-monitor/values.yaml`: I updated `exporter.image.repository` to `ghcr.io/malarinv/iperf3-monitor` to match the CI build image name.

4.  **CI Workflow Verification:**
    *   I verified that `.github/workflows/release.yml` correctly uses `yq` to set chart versions from Git tags and publishes to the correct GitHub Pages URL. This should prevent the previously noted `chart.metadata.version is required` error, which was associated with an older version of the release workflow.

These changes ensure that the documentation is up-to-date, the Helm chart defaults are correct, and the CI pipeline for chart publishing is robust.
2025-06-19 21:13:13 +00:00
7f0784d382 Merge pull request #5 from malarinv/fix_helm_add_devbox
feat: Add devbox configuration and lock files; clean up YAML files by removing trailing newlines
2025-06-20 02:29:45 +05:30
050fbcbf3c feat: Add devbox configuration and lock files; clean up YAML files by removing trailing newlines 2025-06-20 02:28:10 +05:30
e22d2ff71d Merge pull request #4 from malarinv/fix-helm-lint-errors
I've fixed the Helm lint errors in the iperf3-monitor chart.
2025-06-20 02:18:11 +05:30
1487901337 Merge pull request #3 from malarinv/ci
feat: Add GitHub Actions CI workflow
2025-06-20 02:18:01 +05:30
fec4cf64b9 fix: Remove unnecessary dependency section from Chart.yaml and correct formatting in exporter-deployment.yaml 2025-06-20 02:17:27 +05:30
google-labs-jules[bot]
c08f4a5667 I've fixed the Helm lint errors in the iperf3-monitor chart.
This involved addressing several Helm linting issues I identified in the iperf3-monitor chart.

Here's what I changed:
- I corrected a syntax error (an unexpected backslash) in the label value within `charts/iperf3-monitor/templates/exporter-deployment.yaml`.
- I resolved a missing dependency by:
    - Adding the `prometheus-community` Helm repository.
    - Updating the dependency name in `Chart.yaml` to `kube-prometheus-stack` when a repository URL is specified.
    - Running `helm dependency update` to fetch the `kube-prometheus-stack` dependency.
- I fixed YAML parsing errors in `charts/iperf3-monitor/templates/exporter-deployment.yaml` caused by incorrect newline handling in the Helm helper templates (`charts/iperf3-monitor/templates/_helpers.tpl`). This involved:
    - Ensuring the `iperf3-monitor.selectorLabels` helper template output ends with a newline.
    - Adjusting whitespace control in the `iperf3-monitor.labels` helper template to preserve newlines between label entries.
- I restored the `app.kubernetes.io/component: exporter` label to the top-level metadata in `charts/iperf3-monitor/templates/exporter-deployment.yaml`.

After these modifications, `helm lint charts/iperf3-monitor` passes without any errors or warnings.
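The sequence described above boils down to roughly these commands (a sketch of what the message references):

```sh
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm dependency update charts/iperf3-monitor
helm lint charts/iperf3-monitor   # passes after the fixes above
```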
2025-06-19 20:45:00 +00:00
f6c26c02b1 feat: Add GitHub Actions CI workflow
Configure automated checks for pull requests including:

- Linting the Helm chart.
- Building the exporter Docker image.
- A placeholder for future tests.
2025-06-19 01:01:01 +05:30
16 changed files with 395 additions and 1607 deletions

.github/workflows/ci.yaml (new file)

@@ -0,0 +1,83 @@
name: CI

on:
  pull_request:
    branches: ["main"] # Or your main development branch

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  validate-chart:
    name: Validate Helm Chart
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v3
      - name: Set up Helm
        uses: azure/setup-helm@v3
        with:
          version: v3.10.0
      - name: Helm Lint
        run: helm lint ./charts/iperf3-monitor

  build:
    name: Build Docker Image
    runs-on: ubuntu-latest
    permissions:
      contents: read # Needed to checkout the repository
      packages: write # Needed to push Docker images to GHCR
    steps:
      - name: Check out code
        uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            # Tag with the PR number if it's a pull request event
            type=match,pattern=pull_request,value=pr-{{number}}
            # Tag with the git SHA
            type=sha,prefix=
            # Tag with 'latest' if on the main branch (though this workflow only runs on PRs to main)
            type=ref,event=branch,pattern=main,value=latest
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build Docker image
        uses: docker/build-push-action@v4
        with:
          context: ./exporter
          # Push the image if the event is a pull request.
          # The workflow currently only triggers on pull_request events.
          push: ${{ github.event_name == 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          platforms: linux/amd64,linux/arm64

  test:
    name: Run Tests
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v3
      # Replace this step with your actual test command(s)
      - name: Placeholder Test Step
        run: echo "No tests configured yet. Add your test commands here."

.github/workflows/release.yml

@@ -36,6 +36,12 @@ jobs:
      - name: Check out code
        uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v2
        with:
@@ -56,6 +62,7 @@ jobs:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          platforms: linux/amd64,linux/arm64

  package-and-publish-chart:
    name: Package and Publish Helm Chart
@@ -82,8 +89,8 @@ jobs:
      - name: Set Chart Version from Tag
        run: |
          VERSION=$(echo "${{ github.ref_name }}" | sed 's/^v//')
          yq e -i '.version = strenv(VERSION)' ./charts/iperf3-monitor/Chart.yaml
          yq e -i '.appVersion = strenv(VERSION)' ./charts/iperf3-monitor/Chart.yaml
          VERSION=$VERSION yq e -i '.version = strenv(VERSION)' ./charts/iperf3-monitor/Chart.yaml
          VERSION=$VERSION yq e -i '.appVersion = strenv(VERSION)' ./charts/iperf3-monitor/Chart.yaml
          cat ./charts/iperf3-monitor/Chart.yaml # Optional: print updated Chart.yaml
      - name: Publish Helm chart

.gitignore

@@ -36,4 +36,4 @@ Thumbs.db
# Helm
!charts/iperf3-monitor/.helmignore
charts/*.tgz # Ignore packaged chart files
charts/iperf3-monitor/charts/

(File diff suppressed because it is too large.)

README.md

@@ -37,7 +37,7 @@ This separation of concerns ensures scalability, resilience, and aligns with Kub
1. Add the Helm chart repository (replace with your actual repo URL once published):

   ```/dev/null/helm-install.sh#L1-1
   helm repo add iperf3-monitor https://your-github-org.github.io/iperf3-monitor/
   helm repo add iperf3-monitor https://malarinv.github.io/iperf3-monitor/
   ```
2. Update your Helm repositories:
@@ -78,7 +78,7 @@ exporter:
  # -- Configuration for the exporter container image.
  image:
    # -- The container image repository for the exporter.
    repository: ghcr.io/my-org/iperf3-prometheus-exporter # Replace with your repo URL
    repository: ghcr.io/malarinv/iperf3-monitor
    # -- The container image tag for the exporter. If not set, the chart's appVersion is used.
    tag: ""
    # -- The image pull policy for the exporter container.
@@ -430,8 +430,4 @@ The project includes a GitHub Actions workflow (`.github/workflows/release.yml`)
## License
This project is licensed under the terms defined in the `LICENSE` file.
```iperf3-monitor/LICENSE
This project is currently unlicensed. Please see the project's documentation or repository for licensing information when it becomes available.
```
This project is licensed under the GNU Affero General Public License v3. See the `LICENSE` file for details.

charts/iperf3-monitor/Chart.lock (new file)

@@ -0,0 +1,9 @@
dependencies:
- name: kube-prometheus-stack
  repository: https://prometheus-community.github.io/helm-charts
  version: 75.3.6
- name: prometheus-operator
  repository: oci://tccr.io/truecharts
  version: 11.5.1
digest: sha256:3000e63445f8ba8df601cb483f4f77d14c5c4662bff2d16ffcf5cf1f7def314b
generated: "2025-06-20T17:25:44.538372209+05:30"

charts/iperf3-monitor/Chart.yaml

@@ -12,19 +12,23 @@ keywords:
  - kubernetes
  - prometheus
  - grafana
home: https://github.com/malarinv/iperf3-monitor # Replace with your repo URL
home: https://github.com/malarinv/iperf3-monitor
sources:
  - https://github.com/malarinv/iperf3-monitor # Replace with your repo URL
  - https://github.com/malarinv/iperf3-monitor
maintainers:
  - name: Malar Invention # Replace with your name
    email: malarkannan.invention@gmail.com # Replace with your email
  - name: Malar Invention
    email: malarkannan.invention@gmail.com
icon: https://raw.githubusercontent.com/malarinv/iperf3-monitor/main/icon.png # Optional icon URL
annotations:
  artifacthub.io/changes: |
    - Add initial Helm chart structure.
  artifacthub.io/category: networking
dependencies:
  - name: prometheus-community/kube-prometheus-stack # Example dependency if you package the whole stack
  - name: kube-prometheus-stack # Example dependency if you package the whole stack
    version: ">=30.0.0" # Specify a compatible version range
    repository: https://prometheus-community.github.io/helm-charts
    condition: serviceMonitor.enabled # Only include if ServiceMonitor is enabled (assuming Prometheus Operator)
    condition: "dependencies.install, serviceMonitor.enabled, !dependencies.useTrueChartsPrometheusOperator"
  - name: prometheus-operator
    version: ">=8.11.1"
    repository: "oci://tccr.io/truecharts"
    condition: "dependencies.install, serviceMonitor.enabled, dependencies.useTrueChartsPrometheusOperator"

charts/iperf3-monitor/templates/_helpers.tpl

@@ -29,9 +29,9 @@ Create chart's labels
{{- define "iperf3-monitor.labels" -}}
helm.sh/chart: {{ include "iperf3-monitor.name" . }}-{{ .Chart.Version | replace "+" "_" }}
{{ include "iperf3-monitor.selectorLabels" . }}
{{- if .Chart.AppVersion -}}
{{ if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end -}}
{{ end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}
@@ -41,15 +41,15 @@ Selector labels
{{- define "iperf3-monitor.selectorLabels" -}}
app.kubernetes.io/name: {{ include "iperf3-monitor.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}
{{ end }}
{{/*
Create the name of the service account to use
*/}}
{{- define "iperf3-monitor.serviceAccountName" -}}
{{- if .Values.serviceAccount.create -}}
{{- if .Values.rbac.create -}}
{{- default (include "iperf3-monitor.fullname" .) .Values.serviceAccount.name -}}
{{- else -}}
{{- default "default" .Values.serviceAccount.name -}}
{{- end -}}
{{- end -}}
{{- end -}}

charts/iperf3-monitor/templates/exporter-deployment.yaml

@@ -34,6 +34,8 @@ spec:
          value: "{{ .Values.exporter.testInterval }}"
        - name: IPERF_TEST_PROTOCOL
          value: "{{ .Values.exporter.testProtocol }}"
        - name: LOG_LEVEL
          value: "{{ .Values.exporter.logLevel }}"
        - name: IPERF_SERVER_PORT
          value: "5201" # Hardcoded as per server DaemonSet
        - name: IPERF_SERVER_NAMESPACE
@@ -41,7 +43,7 @@ spec:
            fieldRef:
              fieldPath: metadata.namespace
        - name: IPERF_SERVER_LABEL_SELECTOR
          value: "app.kubernetes.io/name={{ include \"iperf3-monitor.name\" . }},app.kubernetes.io/instance={{ .Release.Name }},app.kubernetes.io/component=server"
          value: 'app.kubernetes.io/name={{ include "iperf3-monitor.name" . }},app.kubernetes.io/instance={{ .Release.Name }},app.kubernetes.io/component=server'
        {{- with .Values.exporter.resources }}
        resources:
          {{- toYaml . | nindent 10 }}

charts/iperf3-monitor/templates/ (server DaemonSet template)

@@ -41,4 +41,4 @@ spec:
          {{- with .Values.server.resources }}
          resources:
            {{- toYaml . | nindent 12 }}
          {{- end }}
          {{- end }}

charts/iperf3-monitor/values.yaml

@@ -12,7 +12,7 @@ exporter:
  # -- Configuration for the exporter container image.
  image:
    # -- The container image repository for the exporter.
    repository: ghcr.io/malarinv/iperf3-prometheus-exporter # Replace with your repo URL
    repository: ghcr.io/malarinv/iperf3-monitor
    # -- The container image tag for the exporter. If not set, the chart's appVersion is used.
    tag: ""
    # -- The image pull policy for the exporter container.
@@ -24,6 +24,9 @@ exporter:
  # -- Interval in seconds between complete test cycles (i.e., testing all server nodes).
  testInterval: 300
  # -- Log level for the iperf3 exporter (e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL).
  logLevel: INFO
  # -- Timeout in seconds for a single iperf3 test run.
  testTimeout: 10
@@ -85,7 +88,7 @@ rbac:
serviceAccount:
  # -- The name of the ServiceAccount to use for the exporter pod.
  # Only used if rbac.create is false. If not set, it defaults to the chart's fullname.
  name: ""
  name: "iperf3-monitor"

serviceMonitor:
  # -- If true, create a ServiceMonitor resource for integration with Prometheus Operator.
@@ -118,3 +121,19 @@ networkPolicy:
    namespaceSelector: {}
  # -- Specify pod selectors if needed.
  podSelector: {}

# -----------------------------------------------------------------------------
# Dependency Configuration
# -----------------------------------------------------------------------------
dependencies:
  # -- Set to true to install Prometheus operator dependency if serviceMonitor.enabled is also true.
  # -- Set to false to disable the installation of Prometheus operator dependency,
  # -- regardless of serviceMonitor.enabled. This is useful if you have Prometheus
  # -- Operator installed and managed separately in your cluster.
  install: true
  # -- Set to true to use the TrueCharts Prometheus Operator instead of kube-prometheus-stack.
  # This chart's ServiceMonitor resources require a Prometheus Operator to be functional.
  # If serviceMonitor.enabled is true and dependencies.install is true,
  # one of these two dependencies will be pulled based on this flag.
  useTrueChartsPrometheusOperator: false

devbox.json (new file)

@@ -0,0 +1,14 @@
{
  "$schema": "https://raw.githubusercontent.com/jetify-com/devbox/0.13.7/.schema/devbox.schema.json",
  "packages": [],
  "shell": {
    "init_hook": [
      "echo 'Welcome to devbox!' > /dev/null"
    ],
    "scripts": {
      "test": [
        "echo \"Error: no test specified\" && exit 1"
      ]
    }
  }
}

devbox.lock (new file)

@@ -0,0 +1,4 @@
{
  "lockfile_version": "1",
  "packages": {}
}

exporter/Dockerfile

@@ -1,6 +1,8 @@
# Stage 1: Build stage with dependencies
FROM python:3.9-slim as builder

# Declare TARGETARCH for use in this stage
ARG TARGETARCH

WORKDIR /app

# Install iperf3 and build dependencies
@@ -8,6 +10,20 @@ RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc iperf3 libiperf-dev && \
    rm -rf /var/lib/apt/lists/*

# Determine the correct libiperf source directory based on TARGETARCH
# and copy libiperf.so.0 to a canonical temporary location /tmp/lib/ within the builder stage.
RUN echo "Builder stage TARGETARCH: ${TARGETARCH}" && \
    LIBIPERF_SRC_DIR_SEGMENT="" && \
    if [ "${TARGETARCH}" = "amd64" ]; then \
        LIBIPERF_SRC_DIR_SEGMENT="x86_64-linux-gnu"; \
    elif [ "${TARGETARCH}" = "arm64" ]; then \
        LIBIPERF_SRC_DIR_SEGMENT="aarch64-linux-gnu"; \
    else \
        echo "Unsupported TARGETARCH in builder: ${TARGETARCH}" && exit 1; \
    fi && \
    mkdir -p /tmp/lib && \
    cp "/usr/lib/${LIBIPERF_SRC_DIR_SEGMENT}/libiperf.so.0" /tmp/lib/libiperf.so.0

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
@@ -17,9 +33,11 @@ FROM python:3.9-slim
WORKDIR /app

# Copy iperf3 binary and library from the builder stage
# Copy iperf3 binary from the builder stage
COPY --from=builder /usr/bin/iperf3 /usr/bin/iperf3
COPY --from=builder /usr/lib/x86_64-linux-gnu/libiperf.so.0 /usr/lib/x86_64-linux-gnu/libiperf.so.0
# Copy the prepared libiperf.so.0 from the builder's canonical temporary location
# into a standard library path in the final image.
COPY --from=builder /tmp/lib/libiperf.so.0 /usr/lib/libiperf.so.0

# Copy installed Python packages from the builder stage
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages

exporter/exporter.py

@@ -1,28 +1,60 @@
"""
Prometheus exporter for iperf3 network performance monitoring.
This script runs iperf3 tests between the node it's running on (source) and
other iperf3 server pods discovered in a Kubernetes cluster. It then exposes
these metrics for Prometheus consumption.
Configuration is primarily through environment variables and command-line arguments
for log level.
"""
import os
import time
import logging
import argparse
import sys
from kubernetes import client, config
from prometheus_client import start_http_server, Gauge
import iperf3
# --- Configuration ---
# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
# --- Global Configuration & Setup ---
# Argument parsing for log level configuration
# The command-line --log-level argument takes precedence over the LOG_LEVEL env var.
# Defaults to INFO if neither is set.
parser = argparse.ArgumentParser(description="iperf3 Prometheus exporter.")
parser.add_argument(
'--log-level',
default=os.environ.get('LOG_LEVEL', 'INFO').upper(),
choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'],
help='Set the logging level. Overrides LOG_LEVEL environment variable. (Default: INFO)'
)
args = parser.parse_args()
log_level_str = args.log_level
# Convert log level string (e.g., 'INFO') to its numeric representation (e.g., logging.INFO)
numeric_level = getattr(logging, log_level_str.upper(), None)
if not isinstance(numeric_level, int):
# This case should ideally not be reached if choices in argparse are respected.
logging.error(f"Invalid log level: {log_level_str}. Defaulting to INFO.")
numeric_level = logging.INFO
logging.basicConfig(level=numeric_level, format='%(asctime)s - %(levelname)s - %(message)s')
# --- Prometheus Metrics Definition ---
# These gauges will be used to expose iperf3 test results.
IPERF_BANDWIDTH_MBPS = Gauge(
'iperf_network_bandwidth_mbps',
'Network bandwidth measured by iperf3 in Megabits per second',
'Network bandwidth measured by iperf3 in Megabits per second (Mbps)',
['source_node', 'destination_node', 'protocol']
)
IPERF_JITTER_MS = Gauge(
'iperf_network_jitter_ms',
'Network jitter measured by iperf3 in milliseconds',
'Network jitter measured by iperf3 in milliseconds (ms) for UDP tests',
['source_node', 'destination_node', 'protocol']
)
IPERF_PACKETS_TOTAL = Gauge(
'iperf_network_packets_total',
'Total packets transmitted or received during the iperf3 test',
'Total packets transmitted/received during the iperf3 UDP test',
['source_node', 'destination_node', 'protocol']
)
IPERF_LOST_PACKETS = Gauge(
@@ -38,12 +70,21 @@ IPERF_TEST_SUCCESS = Gauge(
def discover_iperf_servers():
    """
    Discover iperf3 server pods in the cluster using the Kubernetes API.
    Discovers iperf3 server pods within a Kubernetes cluster.

    It uses the in-cluster Kubernetes configuration to connect to the API.
    The target namespace and label selector for iperf3 server pods are configured
    via environment variables:
    - IPERF_SERVER_NAMESPACE (default: 'default')
    - IPERF_SERVER_LABEL_SELECTOR (default: 'app=iperf3-server')

    Returns:
        list: A list of dictionaries, where each dictionary contains the 'ip'
              and 'node_name' of a discovered iperf3 server pod. Returns an
              empty list if discovery fails or no servers are found.
    """
    try:
        # Load in-cluster configuration
        # Assumes the exporter runs in a pod with a service account having permissions
        config.load_incluster_config()
        config.load_incluster_config() # Assumes running inside a Kubernetes pod
        v1 = client.CoreV1Api()

        namespace = os.getenv('IPERF_SERVER_NAMESPACE', 'default')
@@ -51,110 +92,206 @@ def discover_iperf_servers():
logging.info(f"Discovering iperf3 servers with label '{label_selector}' in namespace '{namespace}'")
# List pods across all namespaces with the specified label selector
# Note: list_pod_for_all_namespaces requires cluster-wide permissions
ret = v1.list_pod_for_all_namespaces(label_selector=label_selector, watch=False)
servers = []
for i in ret.items:
# Ensure pod has an IP and is running
if i.status.pod_ip and i.status.phase == 'Running':
for item in ret.items:
if item.status.pod_ip and item.status.phase == 'Running':
servers.append({
'ip': i.status.pod_ip,
'node_name': i.spec.node_name
'ip': item.status.pod_ip,
'node_name': item.spec.node_name # Node where the iperf server pod is running
})
logging.info(f"Discovered {len(servers)} iperf3 server pods.")
return servers
except config.ConfigException as e:
logging.error(f"Kubernetes config error: {e}. Is the exporter running in a cluster with RBAC permissions?")
return []
except Exception as e:
logging.error(f"Error discovering iperf servers: {e}")
return [] # Return empty list on error to avoid crashing the loop
return [] # Return empty list on error to avoid crashing the main loop
def run_iperf_test(server_ip, server_port, protocol, source_node, dest_node):
def run_iperf_test(server_ip, server_port, protocol, source_node_name, dest_node_name):
"""
Runs a single iperf3 test and updates Prometheus metrics.
Runs a single iperf3 test against a specified server and publishes metrics.
Args:
server_ip (str): The IP address of the iperf3 server.
server_port (int): The port number of the iperf3 server.
protocol (str): The protocol to use ('tcp' or 'udp').
source_node_name (str): The name of the source node (where this exporter is running).
dest_node_name (str): The name of the destination node (where the server is running).
The test duration is controlled by the IPERF_TEST_DURATION environment variable
(default: 5 seconds).
"""
logging.info(f"Running iperf3 test from {source_node} to {dest_node} ({server_ip}:{server_port}) using {protocol.upper()}")
logging.info(f"Running iperf3 {protocol.upper()} test from {source_node_name} to {dest_node_name} ({server_ip}:{server_port})")
client = iperf3.Client()
client.server_hostname = server_ip
client.port = server_port
client.protocol = protocol
# Duration of the test (seconds)
client.duration = int(os.getenv('IPERF_TEST_DURATION', 5))
# Output results as JSON for easy parsing
client.json_output = True
iperf_client = iperf3.Client()
iperf_client.server_hostname = server_ip
iperf_client.port = server_port
iperf_client.protocol = protocol
iperf_client.duration = int(os.getenv('IPERF_TEST_DURATION', 5)) # Test duration in seconds
iperf_client.json_output = True # Enables easy parsing of results
result = client.run()
# Parse results and update metrics
parse_and_publish_metrics(result, source_node, dest_node, protocol)
def parse_and_publish_metrics(result, source_node, dest_node, protocol):
"""
Parses the iperf3 result and updates Prometheus gauges.
Handles both successful and failed tests.
"""
labels = {'source_node': source_node, 'destination_node': dest_node, 'protocol': protocol}
if result and result.error:
logging.error(f"Test from {source_node} to {dest_node} failed: {result.error}")
try:
result = iperf_client.run()
parse_and_publish_metrics(result, source_node_name, dest_node_name, protocol)
except Exception as e:
# Catch unexpected errors during client.run() or parsing
logging.error(f"Exception during iperf3 test or metric parsing for {dest_node_name}: {e}")
labels = {'source_node': source_node_name, 'destination_node': dest_node_name, 'protocol': protocol}
IPERF_TEST_SUCCESS.labels(**labels).set(0)
# Set metrics to 0 on failure
try:
IPERF_BANDWIDTH_MBPS.labels(**labels).set(0)
IPERF_JITTER_MS.labels(**labels).set(0)
IPERF_PACKETS_TOTAL.labels(**labels).set(0)
IPERF_LOST_PACKETS.labels(**labels).set(0)
except KeyError:
# Labels might not be registered yet if this is the first failure
pass
logging.debug(f"KeyError setting failure metrics for {labels} after client.run() exception.")
def parse_and_publish_metrics(result, source_node, dest_node, protocol):
"""
Parses the iperf3 test result and updates Prometheus gauges.
Args:
result (iperf3.TestResult): The result object from the iperf3 client.
source_node (str): Name of the source node.
dest_node (str): Name of the destination node.
protocol (str): Protocol used for the test ('tcp' or 'udp').
"""
labels = {'source_node': source_node, 'destination_node': dest_node, 'protocol': protocol}
# Handle failed tests (e.g., server unreachable) or missing result object
if not result or result.error:
error_message = result.error if result and result.error else "No result object from iperf3 client"
logging.warning(f"Test from {source_node} to {dest_node} ({protocol.upper()}) failed: {error_message}")
IPERF_TEST_SUCCESS.labels(**labels).set(0)
# Set all relevant metrics to 0 on failure to clear stale values from previous successes
try:
IPERF_BANDWIDTH_MBPS.labels(**labels).set(0)
IPERF_JITTER_MS.labels(**labels).set(0) # Applicable for UDP, zeroed for TCP later
IPERF_PACKETS_TOTAL.labels(**labels).set(0) # Applicable for UDP, zeroed for TCP later
IPERF_LOST_PACKETS.labels(**labels).set(0) # Applicable for UDP, zeroed for TCP later
except KeyError:
# This can happen if labels were never registered due to continuous failures
logging.debug(f"KeyError when setting failure metrics for {labels}. Gauges might not be initialized.")
return
if not result:
logging.error(f"Test from {source_node} to {dest_node} failed to return a result object.")
IPERF_TEST_SUCCESS.labels(**labels).set(0)
try:
IPERF_BANDWIDTH_MBPS.labels(**labels).set(0)
IPERF_JITTER_MS.labels(**labels).set(0)
IPERF_PACKETS_TOTAL.labels(**labels).set(0)
IPERF_LOST_PACKETS.labels(**labels).set(0)
except KeyError:
pass
return
# If we reach here, the test itself was successful in execution
IPERF_TEST_SUCCESS.labels(**labels).set(1)
# The summary data is typically in result.json['end']['sum_sent'] or result.json['end']['sum_received']
# The iperf3-python client often exposes this directly as attributes like sent_Mbps or received_Mbps
# For TCP, we usually care about the received bandwidth on the client side (which is the exporter)
# For UDP, the client report contains jitter, lost packets, etc.
# Determine bandwidth:
# Order of preference: received_Mbps, sent_Mbps, Mbps, then JSON fallbacks.
# received_Mbps is often most relevant for TCP client perspective.
# sent_Mbps can be relevant for UDP or as a TCP fallback.
bandwidth_mbps = 0
if hasattr(result, 'received_Mbps') and result.received_Mbps is not None:
bandwidth_mbps = result.received_Mbps
elif hasattr(result, 'sent_Mbps') and result.sent_Mbps is not None:
# Fallback, though received_Mbps is usually more relevant for TCP client
bandwidth_mbps = result.sent_Mbps
# Add a check for the raw JSON output structure as a fallback
elif result.json and 'end' in result.json and 'sum_received' in result.json['end'] and result.json['end']['sum_received']['bits_per_second'] is not None:
bandwidth_mbps = result.json['end']['sum_received']['bits_per_second'] / 1000000
elif result.json and 'end' in result.json and 'sum_sent' in result.json['end'] and result.json['end']['sum_sent']['bits_per_second'] is not None:
bandwidth_mbps = result.json['end']['sum_sent']['bits_per_second'] / 1000000
elif hasattr(result, 'Mbps') and result.Mbps is not None: # General attribute from iperf3 library
bandwidth_mbps = result.Mbps
# Fallback to raw JSON if direct attributes are None or missing
elif result.json:
# Prefer received sum, then sent sum from the JSON output's 'end' summary
if 'end' in result.json and 'sum_received' in result.json['end'] and \
result.json['end']['sum_received'].get('bits_per_second') is not None:
bandwidth_mbps = result.json['end']['sum_received']['bits_per_second'] / 1000000.0
elif 'end' in result.json and 'sum_sent' in result.json['end'] and \
result.json['end']['sum_sent'].get('bits_per_second') is not None:
bandwidth_mbps = result.json['end']['sum_sent']['bits_per_second'] / 1000000.0
IPERF_BANDWIDTH_MBPS.labels(**labels).set(bandwidth_mbps)
# UDP specific metrics
if protocol == 'udp':
# iperf3-python exposes UDP results directly
IPERF_JITTER_MS.labels(**labels).set(result.jitter_ms if hasattr(result, 'jitter_ms') and result.jitter_ms is not None else 0)
IPERF_PACKETS_TOTAL.labels(**labels).set(result.packets if hasattr(result, 'packets') and result.packets is not None else 0)
IPERF_LOST_PACKETS.labels(**labels).set(result.lost_packets if hasattr(result, 'lost_packets') and result.lost_packets is not None else 0)
# These attributes are specific to UDP tests in iperf3
IPERF_JITTER_MS.labels(**labels).set(getattr(result, 'jitter_ms', 0) if result.jitter_ms is not None else 0)
IPERF_PACKETS_TOTAL.labels(**labels).set(getattr(result, 'packets', 0) if result.packets is not None else 0)
IPERF_LOST_PACKETS.labels(**labels).set(getattr(result, 'lost_packets', 0) if result.lost_packets is not None else 0)
else:
# Ensure UDP metrics are zeroed or absent for TCP tests
# For TCP tests, ensure UDP-specific metrics are set to 0
try:
IPERF_JITTER_MS.labels(**labels).set(0)
IPERF_PACKETS_TOTAL.labels(**labels).set(0)
IPERF_LOST_PACKETS.labels(**labels).set(0)
except KeyError:
# Can occur if labels not yet registered (e.g. first test is TCP)
logging.debug(f"KeyError for {labels} when zeroing UDP metrics for TCP test.")
pass
def main_loop():
"""
Main operational loop of the iperf3 exporter.
This loop periodically:
1. Fetches configuration from environment variables:
- IPERF_TEST_INTERVAL (default: 300s): Time between test cycles.
- IPERF_SERVER_PORT (default: 5201): Port for iperf3 servers.
- IPERF_TEST_PROTOCOL (default: 'tcp'): 'tcp' or 'udp'.
- SOURCE_NODE_NAME (critical): Name of the node this exporter runs on.
2. Discovers iperf3 server pods in the Kubernetes cluster.
3. Runs iperf3 tests against each discovered server (unless it's on the same node).
4. Sleeps for the configured test interval.
If SOURCE_NODE_NAME is not set, the script will log an error and exit.
"""
# Fetch operational configuration from environment variables
test_interval = int(os.getenv('IPERF_TEST_INTERVAL', 300))
server_port = int(os.getenv('IPERF_SERVER_PORT', 5201))
protocol = os.getenv('IPERF_TEST_PROTOCOL', 'tcp').lower() # Ensure lowercase
source_node_name = os.getenv('SOURCE_NODE_NAME')
# SOURCE_NODE_NAME is crucial for labeling metrics correctly.
if not source_node_name:
logging.error("CRITICAL: SOURCE_NODE_NAME environment variable not set. This is required. Exiting.")
sys.exit(1)
logging.info(
f"Exporter configured. Source Node: {source_node_name}, "
f"Test Interval: {test_interval}s, Server Port: {server_port}, Protocol: {protocol.upper()}"
)
while True:
logging.info("Starting new iperf test cycle...")
servers = discover_iperf_servers()
if not servers:
logging.warning("No iperf servers discovered in this cycle. Check K8s setup and RBAC permissions.")
else:
for server in servers:
dest_node_name = server.get('node_name', 'unknown_destination_node') # Default if key missing
server_ip = server.get('ip')
if not server_ip:
logging.warning(f"Discovered server entry missing an IP: {server}. Skipping.")
continue
# Avoid testing a node against itself
if dest_node_name == source_node_name:
logging.info(f"Skipping test to self: {source_node_name} to {server_ip} (on same node: {dest_node_name}).")
continue
run_iperf_test(server_ip, server_port, protocol, source_node_name, dest_node_name)
logging.info(f"Test cycle completed. Sleeping for {test_interval} seconds.")
time.sleep(test_interval)
if __name__ == '__main__':
# Initial logging (like log level) is configured globally at the start of the script.
# Fetch Prometheus exporter listen port from environment variable
listen_port = int(os.getenv('LISTEN_PORT', 9876))
try:
# Start the Prometheus HTTP server to expose metrics.
start_http_server(listen_port)
logging.info(f"Prometheus exporter listening on port {listen_port}")
except Exception as e:
logging.error(f"Failed to start Prometheus HTTP server on port {listen_port}: {e}")
sys.exit(1) # Exit if the metrics server cannot start
# Enter the main operational loop.
# main_loop() contains its own critical checks (e.g., SOURCE_NODE_NAME) and will exit if necessary.
main_loop()
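For completeness, a quick way to sanity-check the running exporter (a sketch; 9876 is the `LISTEN_PORT` default above, and the metric names come from the gauges defined in this file):

```sh
# From inside the exporter pod, or via kubectl port-forward
curl -s localhost:9876/metrics | grep '^iperf_'
```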