OpenTelemetry Tracing
Experimental Feature
OpenTelemetry tracing support is currently experimental. The configuration options, span attributes, and instrumentation details may change in future releases. We encourage you to try it out and provide feedback, but be aware that breaking changes may occur as we refine the implementation.
The argocd-agent supports distributed tracing with OpenTelemetry to help you follow resources as they flow from the principal to agents and to identify performance bottlenecks. Traces are exported with the OpenTelemetry OTLP (OpenTelemetry Protocol) exporter, so they can be sent to any OTLP-compatible backend, such as Jaeger, Grafana Tempo, Datadog, or New Relic.
Overview
Tracing provides visibility into:
- gRPC Communication: All gRPC calls between the principal and agents are automatically traced
- Resource Synchronization: Track when resources are created on the principal and synced to agents
- Event Processing: Monitor the processing of events in both the principal and the agent
- Kubernetes Operations: Observe Kubernetes API calls made by the system
Quick Start with Jaeger
The easiest way to get started is to use Jaeger for local development.
1. Start Jaeger
Using Docker:
```bash
# Ports: 16686 = Jaeger UI, 4317 = OTLP over gRPC, 4318 = OTLP over HTTP
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
```
2. Enable Tracing in Principal
Start the principal with tracing enabled:
```bash
argocd-agent principal \
  --otlp-address=localhost:4317 \
  --otlp-insecure=true \
  # ... other flags
```
Or using environment variables:
```bash
export ARGOCD_PRINCIPAL_OTLP_ADDRESS=localhost:4317
export ARGOCD_PRINCIPAL_OTLP_INSECURE=true
argocd-agent principal # ... other flags
```
3. Enable Tracing in Agent
Start the agent with tracing enabled:
```bash
argocd-agent agent \
  --otlp-address=localhost:4317 \
  --otlp-insecure=true \
  # ... other flags
```
Or using environment variables:
```bash
export ARGOCD_AGENT_OTLP_ADDRESS=localhost:4317
export ARGOCD_AGENT_OTLP_INSECURE=true
argocd-agent agent # ... other flags
```
4. View Traces
Open the Jaeger UI at http://localhost:16686, then:
- Select principal or agent in the Service drop-down
- Click "Find Traces" to see the traces
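To confirm from the command line that spans are arriving, you can ask Jaeger which services it has seen. The sketch below assumes Jaeger's query HTTP API endpoint `/api/services` (the JSON endpoint the Jaeger UI uses, which Jaeger does not guarantee to keep stable) and a response shaped like `{"data":["agent","principal"],...}`; the helper name `jaeger_has_service` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if SERVICE appears in the "data" array of a
# Jaeger /api/services JSON response read from stdin.
# Assumes a response like {"data":["agent","principal"],"total":2,...}.
jaeger_has_service() {
  local service=$1
  # Match the quoted name only when it is followed by "," or "]", i.e. when it
  # is an array element rather than a JSON key such as "data" or "total".
  grep -Eq "\"${service}\"[],]"
}

# Example (requires the Jaeger container from step 1):
# curl -s http://localhost:16686/api/services | jaeger_has_service principal && echo "principal is reporting spans"
```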
Configuration Options
The principal and agent accept the same tracing flags; only the environment variable prefix differs (`ARGOCD_PRINCIPAL_` for the principal, `ARGOCD_AGENT_` for the agent).
Principal Configuration
| Flag | Environment Variable | Default | Description |
|---|---|---|---|
| `--otlp-address` | `ARGOCD_PRINCIPAL_OTLP_ADDRESS` | `localhost:4317` | OTLP collector endpoint address |
| `--otlp-insecure` | `ARGOCD_PRINCIPAL_OTLP_INSECURE` | `false` | Use an insecure (non-TLS) connection to the OTLP endpoint |
Agent Configuration
| Flag | Environment Variable | Default | Description |
|---|---|---|---|
| `--otlp-address` | `ARGOCD_AGENT_OTLP_ADDRESS` | `localhost:4317` | OTLP collector endpoint address |
| `--otlp-insecure` | `ARGOCD_AGENT_OTLP_INSECURE` | `false` | Use an insecure (non-TLS) connection to the OTLP endpoint |
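Since the two tables differ only in the environment variable prefix, a small shell helper can print the matching variables for either component. This is a sketch under that assumption; `otlp_env` is a hypothetical name, not part of argocd-agent, and it only emits the two variables listed in the tables.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of argocd-agent): print export statements for
# the OTLP settings of one component.
# Usage: otlp_env principal|agent ADDRESS [INSECURE]
otlp_env() {
  local component=$1 address=$2 insecure=${3:-false} prefix
  case "$component" in
    principal) prefix=ARGOCD_PRINCIPAL ;;
    agent)     prefix=ARGOCD_AGENT ;;
    *) echo "unknown component: ${component} (expected principal or agent)" >&2; return 1 ;;
  esac
  printf 'export %s_OTLP_ADDRESS=%s\n' "$prefix" "$address"
  printf 'export %s_OTLP_INSECURE=%s\n' "$prefix" "$insecure"
}

# Example: load the agent settings into the current shell before starting it.
# eval "$(otlp_env agent localhost:4317 true)"
```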
Trace Attributes
The following custom attributes are added to spans to provide context:
Common Attributes
- `argocd.component.type`: Component from which the event originates
- `argocd.agent.name`: Name of the agent
- `argocd.agent.mode`: Mode of the agent (autonomous or managed)
Resource Attributes
- `k8s.resource.kind`: Kubernetes resource kind (Application, AppProject, Repository)
- `k8s.resource.name`: Resource name
- `k8s.resource.namespace`: Resource namespace
- `k8s.resource.uid`: Resource UID
Event Attributes
- `argocd.event.type`: Type of event (Create, SpecUpdate, Delete, etc.)
- `argocd.event.id`: Unique event identifier
- `argocd.operation.type`: Operation type (create, update, delete, get, list, etc.)
Debugging with Traces
Example 1: Why is my resource not syncing?
Look for traces related to your application's name (for example, by searching on the `k8s.resource.name` attribute). Check:
- Is the principal creating the event? (Look for event spans from the principal)
- Is the agent receiving the event? (Look for event spans from the agent)
- Is the event processing succeeding? (Check for error status in spans)
Example 2: What's causing slow syncs?
Sort traces by duration to find slow operations:
- Slow gRPC calls might indicate network issues
- Slow Kubernetes operations might indicate API server load
- Long event-processing spans might indicate problems processing the resource
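Both searches can also be run from a script. The sketch below builds a search URL for Jaeger's query HTTP API; the `/api/traces` endpoint and its `service`, `tags`, `minDuration` and `limit` parameters are assumptions based on the JSON API behind the Jaeger UI, which Jaeger treats as internal and may change. Tags are matched against the span attributes listed under Trace Attributes, and `error=true` is assumed to be how Jaeger marks spans with an OpenTelemetry error status. The helper names `urlencode` and `jaeger_traces_url` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for searching traces through Jaeger's query HTTP API.

# Percent-encode a string for a URL query parameter, keeping RFC 3986
# unreserved characters. Assumes ASCII input, which covers Kubernetes names.
urlencode() {
  local s=$1 out= c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case "$c" in
      [[:alnum:].~_-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build a trace-search URL.
# Usage: jaeger_traces_url BASE_URL SERVICE MIN_DURATION [KEY=VALUE ...]
# MIN_DURATION may be empty (""), or a Jaeger duration such as 500ms or 2s.
jaeger_traces_url() {
  local base=$1 service=$2 min_duration=$3 tags= kv url
  shift 3
  # Collect KEY=VALUE pairs into the JSON object Jaeger expects for "tags".
  for kv in "$@"; do
    tags+="${tags:+,}\"${kv%%=*}\":\"${kv#*=}\""
  done
  url="${base}/api/traces?service=$(urlencode "$service")&limit=20"
  if [[ -n $min_duration ]]; then
    url+="&minDuration=$(urlencode "$min_duration")"
  fi
  if [[ -n $tags ]]; then
    url+="&tags=$(urlencode "{${tags}}")"
  fi
  printf '%s\n' "$url"
}
```

For example, `curl -s "$(jaeger_traces_url http://localhost:16686 principal 500ms k8s.resource.name=guestbook)"` would search principal traces for a hypothetical application named `guestbook` with a 500 ms minimum duration; pass an empty MIN_DURATION and `error=true` to look for failed event processing instead.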
Security Considerations
- Use `--otlp-insecure=false` in production and configure proper TLS certificates
- Ensure your OTLP endpoint is properly secured and not exposed publicly
- Review trace data retention policies to comply with your data governance requirements, since spans carry resource names, namespaces, and agent names
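A pre-flight check in your deployment scripts can catch an insecure setting before it reaches production. The sketch below is a hypothetical helper (`check_otlp_tls` is not an argocd-agent feature) that only tolerates `--otlp-insecure=true` for `localhost` or `127.*` addresses; adapt that policy to your environment.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail when OTLP export would run without TLS
# to anything other than a loopback address.
# Usage: check_otlp_tls ADDRESS INSECURE   (ADDRESS as HOST:PORT)
check_otlp_tls() {
  local address=$1 insecure=$2
  local host=${address%:*}
  if [[ $insecure == true ]]; then
    case "$host" in
      localhost|127.*) ;;  # plaintext to a local collector is tolerated
      *)
        echo "refusing: --otlp-insecure=true would send traces to ${address} without TLS" >&2
        return 1
        ;;
    esac
  fi
}

# Example:
# check_otlp_tls "$ARGOCD_PRINCIPAL_OTLP_ADDRESS" "$ARGOCD_PRINCIPAL_OTLP_INSECURE" || exit 1
```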
Troubleshooting
Traces not appearing in the backend
- Verify that tracing is enabled: `--otlp-address` (or the corresponding environment variable) is set
- Check that the OTLP endpoint is correct and reachable
- Check the application logs for the tracing initialization message: `OpenTelemetry tracing initialized (address=localhost:4317)`
- Verify that your tracing backend is running and accepting OTLP data on the correct port
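The log and reachability checks can be scripted. This sketch assumes the address has the plain `HOST:PORT` form used on this page (no IPv6 brackets) and relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command. A successful TCP connection only shows that something is listening on the port, not that it accepts OTLP. The function names, and the namespace and deployment names in the commented examples, are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical troubleshooting helpers.

# Succeeds if the log text on stdin contains the tracing start-up message.
tracing_initialized() {
  grep -q 'OpenTelemetry tracing initialized'
}

# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
otlp_reachable() {
  local address=$1
  local host=${address%:*} port=${address##*:}
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Examples (namespace and deployment names are hypothetical):
# kubectl logs -n argocd deploy/argocd-agent-principal | tracing_initialized || echo "tracing not initialized"
# otlp_reachable localhost:4317 || echo "nothing is listening on localhost:4317"
```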