DaemonSet vs. Deployment for the OpenTelemetry Collector

Optimizing Telemetry Collection with DaemonSets and Deployments Across Two Kubernetes Clusters

Efficient telemetry collection is crucial for maintaining the health and performance of applications and infrastructure in modern cloud environments. This post walks through a telemetry setup spanning two Kubernetes clusters, Azure Cluster1 and AKS Cluster2, and shows where a DaemonSet and where a Deployment is the better fit for running the OpenTelemetry Collector.


1. Overview of the Setup

Azure Cluster1:

  • Applications: app1, app2, and app3 run with the OpenTelemetry Java agent, configured to send Metrics, Events, Logs, and Traces (MELT) directly to the central collector (Collector2) in AKS Cluster2.

  • Collector1: An OpenTelemetry Collector running as a DaemonSet in Azure Cluster1 to gather and forward infrastructure telemetry.

AKS Cluster2:

  • Collector2: A central OpenTelemetry Collector deployed in AKS Cluster2 to receive, aggregate, and process telemetry from several sources: application and infrastructure telemetry from Azure Cluster1, and infrastructure telemetry from AKS Cluster2.

2. Telemetry Collection Flow

Application Telemetry:

  • Direct to Central Collector: The OpenTelemetry agents in app1, app2, and app3 send their application telemetry directly to Collector2 in AKS Cluster2. This centralizes the application MELT data for easier aggregation and analysis.
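As a rough illustration of the direct-to-collector path, the sketch below builds the container environment that would point the OpenTelemetry Java agent in each app at Collector2. The environment variable names are the standard OpenTelemetry ones; the endpoint address and the agent jar path are hypothetical placeholders, not values from this setup.

```python
import json

# Hypothetical address at which Collector2's OTLP receiver is reachable
# from Azure Cluster1. Replace with the real endpoint.
COLLECTOR2_ENDPOINT = "http://collector2.example.internal:4317"


def agent_env(service_name: str, endpoint: str = COLLECTOR2_ENDPOINT) -> list[dict]:
    """Build container env entries that point the Java agent at Collector2."""
    return [
        # Attach the agent at JVM startup; the jar path is an assumption
        # about how the application image is laid out.
        {"name": "JAVA_TOOL_OPTIONS", "value": "-javaagent:/otel/opentelemetry-javaagent.jar"},
        {"name": "OTEL_SERVICE_NAME", "value": service_name},
        {"name": "OTEL_EXPORTER_OTLP_ENDPOINT", "value": endpoint},
        {"name": "OTEL_EXPORTER_OTLP_PROTOCOL", "value": "grpc"},
        # Export all three signals over OTLP.
        {"name": "OTEL_TRACES_EXPORTER", "value": "otlp"},
        {"name": "OTEL_METRICS_EXPORTER", "value": "otlp"},
        {"name": "OTEL_LOGS_EXPORTER", "value": "otlp"},
    ]


if __name__ == "__main__":
    for app in ("app1", "app2", "app3"):
        print(json.dumps({"name": app, "env": agent_env(app)}, indent=2))
```

Each printed `env` list can be pasted into the matching container spec. Only `OTEL_SERVICE_NAME` differs between the three apps, which is what lets Collector2 tell their telemetry apart.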

Infrastructure Telemetry:

  • From Collector1: Collector1 in Azure Cluster1 collects node-level metrics and logs from Azure Cluster1 and forwards this infrastructure telemetry to Collector2 in AKS Cluster2.

  • From DaemonSet: A DaemonSet of collectors in AKS Cluster2 gathers infrastructure telemetry from each node in that cluster and hands it to Collector2.
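The infrastructure flow above could be expressed as a collector configuration along the following lines. This is a minimal sketch, written as a Python dict that mirrors the collector's YAML layout. It assumes the contrib distribution of the collector (for the `hostmetrics` and `filelog` receivers), and both endpoint strings are hypothetical. TLS and authentication for the cross-cluster hop are left out.

```python
import json


def node_collector_config(gateway_endpoint: str) -> dict:
    """Config for a per-node collector: node metrics and pod logs out over OTLP."""
    return {
        "receivers": {
            # Node-level CPU, memory, filesystem, and network metrics.
            # Reading host-level stats from inside a container usually also
            # needs the host filesystem mounted; that is omitted here.
            "hostmetrics": {
                "collection_interval": "30s",
                "scrapers": {"cpu": {}, "memory": {}, "filesystem": {}, "network": {}},
            },
            # Container logs as written on the node.
            "filelog": {"include": ["/var/log/pods/*/*/*.log"]},
        },
        "processors": {"batch": {}},
        "exporters": {"otlp": {"endpoint": gateway_endpoint}},
        "service": {
            "pipelines": {
                "metrics": {"receivers": ["hostmetrics"], "processors": ["batch"], "exporters": ["otlp"]},
                "logs": {"receivers": ["filelog"], "processors": ["batch"], "exporters": ["otlp"]},
            }
        },
    }


def undeclared_components(config: dict) -> list[str]:
    """Return pipeline components that are referenced but never declared."""
    missing = []
    for pipeline_name, pipeline in config["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            for component in pipeline.get(section, []):
                if component not in config.get(section, {}):
                    missing.append(f"{pipeline_name}/{section}/{component}")
    return missing


if __name__ == "__main__":
    # Hypothetical endpoints: an external address for the cross-cluster hop,
    # and an in-cluster Service DNS name for the DaemonSet in AKS Cluster2.
    cluster1 = node_collector_config("collector2.example.internal:4317")
    cluster2 = node_collector_config("collector2.observability.svc.cluster.local:4317")
    assert undeclared_components(cluster1) == []
    print(json.dumps(cluster1, indent=2))
```

In this sketch the two clusters differ only in the exporter endpoint: the node collectors in Azure Cluster1 cross a cluster boundary, while those in AKS Cluster2 can reach Collector2 through an in-cluster Service.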


3. Configuration in Azure Cluster1

DaemonSet for Infrastructure Telemetry (Collector1):

  • Purpose: Collects node-level metrics and logs from each node in Azure Cluster1.

  • Configuration:

    • The DaemonSet ensures that a collector instance runs on every node in Azure Cluster1.

    • Each node-specific collector gathers metrics such as CPU usage and memory consumption, along with logs from the kubelet and containers.

    • The collected infrastructure data is forwarded to Collector2 in AKS Cluster2.
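A DaemonSet for the node-level collector might look roughly like the manifest generated below. It is a sketch under assumptions: the resource name, namespace, ConfigMap name, and image tag are hypothetical, and the output is JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def node_collector_daemonset(name: str, namespace: str, image: str) -> dict:
    """Build a DaemonSet manifest that runs one collector pod per node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # No replica count: the DaemonSet controller schedules exactly
            # one pod on every eligible node.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "otel-collector",
                        "image": image,
                        "args": ["--config=/conf/config.yaml"],
                        "volumeMounts": [
                            {"name": "config", "mountPath": "/conf"},
                            # Read-only view of pod logs on the host.
                            {"name": "varlogpods", "mountPath": "/var/log/pods", "readOnly": True},
                        ],
                    }],
                    "volumes": [
                        # Assumes a ConfigMap holding the collector config
                        # under the key config.yaml.
                        {"name": "config", "configMap": {"name": f"{name}-config"}},
                        {"name": "varlogpods", "hostPath": {"path": "/var/log/pods"}},
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = node_collector_daemonset(
        name="collector1",
        namespace="observability",
        # Pin a specific version in practice; "latest" is only a placeholder.
        image="otel/opentelemetry-collector-contrib:latest",
    )
    print(json.dumps(manifest, indent=2))
```

The hostPath mount is what makes the DaemonSet a natural fit here: each pod reads logs from the node it runs on, which a Deployment with an arbitrary replica count cannot guarantee.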

Advantages:

  • Comprehensive Node Coverage: Ensures that infrastructure data is collected from all nodes in Azure Cluster1.

  • Simplicity: Collector1 handles only infrastructure monitoring, because application telemetry goes straight from the agents to the central collector.


4. Configuration in AKS Cluster2

DaemonSet for Infrastructure Telemetry:

  • Purpose: Collects node-level metrics and logs from each node in AKS Cluster2.

  • Configuration:

    • The DaemonSet ensures that a collector instance runs on every node in AKS Cluster2.

    • Each instance gathers node-level metrics such as CPU and memory usage, along with node logs, and forwards them to the central Deployment.

Deployment for Centralized Aggregation:

  • Purpose: Acts as the central collector for application telemetry and aggregates data from multiple sources.

  • Configuration:

    • The Deployment scales horizontally, by adding replicas, to handle the influx of telemetry data from Collector1 and directly from the application agents in Azure Cluster1.

    • Responsibilities:

      • Aggregates application MELT from app1, app2, and app3.

      • Processes infrastructure telemetry received from Collector1 and from the DaemonSet running in AKS Cluster2.
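The central collector could be sketched as a Deployment fronted by a Service, as generated below. The names, namespace, replica count, and image tag are hypothetical. The `LoadBalancer` Service type is also an assumption, chosen only because senders in Azure Cluster1 need some way to reach Collector2 from outside the cluster; an ingress or private link would serve equally well.

```python
import json

OTLP_GRPC_PORT = 4317  # default OTLP/gRPC port
OTLP_HTTP_PORT = 4318  # default OTLP/HTTP port


def gateway_manifests(name: str, namespace: str, image: str, replicas: int) -> list[dict]:
    """Build a Deployment and a Service for the central collector."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Unlike a DaemonSet, capacity is set by replica count,
            # independent of how many nodes the cluster has.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "otel-collector",
                        "image": image,
                        "ports": [
                            {"name": "otlp-grpc", "containerPort": OTLP_GRPC_PORT},
                            {"name": "otlp-http", "containerPort": OTLP_HTTP_PORT},
                        ],
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "LoadBalancer",  # assumption: one way to expose Collector2
            "selector": labels,
            "ports": [
                {"name": "otlp-grpc", "port": OTLP_GRPC_PORT, "targetPort": OTLP_GRPC_PORT},
                {"name": "otlp-http", "port": OTLP_HTTP_PORT, "targetPort": OTLP_HTTP_PORT},
            ],
        },
    }
    return [deployment, service]


if __name__ == "__main__":
    for manifest in gateway_manifests(
        "collector2", "observability", "otel/opentelemetry-collector-contrib:latest", replicas=3
    ):
        print(json.dumps(manifest, indent=2))
```

The Service gives every sender a single stable address: the Java agents, Collector1, and the local DaemonSet all target it, and Kubernetes spreads their connections across the replicas.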

Advantages:

  • Centralized Processing: All telemetry from both clusters is processed in one place.

  • Scalability: Handles a large volume of telemetry data with horizontal scaling.
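One way to automate the horizontal scaling mentioned above is a HorizontalPodAutoscaler that targets the central Deployment. The sketch below generates such a manifest. The target name, namespace, replica bounds, and CPU threshold are all hypothetical, and CPU-based scaling only works if the collector container declares CPU resource requests.

```python
import json


def collector_hpa(target: str, namespace: str, min_replicas: int,
                  max_replicas: int, cpu_percent: int) -> dict:
    """Build an autoscaling/v2 HPA that scales a Deployment on average CPU."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": target, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": target},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Add replicas when average CPU use across pods exceeds
                    # this percentage of the requested CPU.
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    hpa = collector_hpa("collector2", "observability",
                        min_replicas=2, max_replicas=10, cpu_percent=70)
    print(json.dumps(hpa, indent=2))
```

This kind of autoscaling applies only to the Deployment. A DaemonSet's pod count follows the node count, so its capacity grows with the cluster rather than with telemetry volume.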


5. Summary

In this optimized telemetry setup:

  • Azure Cluster1:

    • DaemonSet: Used to collect node-specific infrastructure telemetry.

    • Deployment: Not needed in this cluster, as application telemetry is handled directly by Collector2.

  • AKS Cluster2:

    • DaemonSet: Used to collect node-specific infrastructure telemetry from AKS Cluster2.

    • Deployment: Used as the central collector to aggregate and process both application telemetry from Azure Cluster1 and infrastructure telemetry from both clusters.

This configuration gives full node coverage in both clusters through DaemonSets, and a single, horizontally scalable aggregation point for all telemetry through the Deployment in AKS Cluster2.