OpenTelemetry Distributed Tracing Implementation Guide

July 31st, 2025 | How To Guides, Resources, Getting Started

15 min read

Distributed tracing has become essential for understanding the performance and behavior of modern microservices architectures. As applications become more complex with multiple services communicating across different environments, traditional logging and metrics alone are insufficient for debugging performance issues and understanding request flows. OpenTelemetry provides a standardized approach to distributed tracing that works across different programming languages, frameworks, and observability backends.

This comprehensive guide will walk you through implementing distributed tracing with OpenTelemetry, from basic setup to advanced configuration and integration with Logit.io. We'll cover the core concepts, practical implementation steps, best practices, and real-world scenarios to help you build robust observability into your applications.

Contents

Understanding Distributed Tracing Fundamentals
- Core Concepts
- Benefits of Distributed Tracing
OpenTelemetry Architecture Overview
- OpenTelemetry Components
Setting Up OpenTelemetry in Your Environment
- Installing OpenTelemetry Collector
  - Docker Installation
  - Kubernetes Installation
- Configuring Logit.io Integration
  - Logit.io OpenTelemetry Endpoint
  - Authentication and Security
Instrumenting Applications with OpenTelemetry
Advanced OpenTelemetry Configuration
Integrating with Logit.io for Distributed Tracing
Best Practices for Distributed Tracing
Real-World Implementation Scenarios
- E-commerce Application Example
  - Service Architecture
  - Trace Flow Example
- Database and External API Tracing
  - Database Tracing
  - External API Tracing
Monitoring and Alerting with Traces
- Trace-Based Metrics
  - Service Performance Metrics
  - Business Metrics
- Alerting Strategies
  - Performance Alerts
  - Error Rate Alerts
Conclusion

Understanding Distributed Tracing Fundamentals

Before diving into implementation, it's crucial to understand the core concepts of distributed tracing and how OpenTelemetry standardizes these concepts across different platforms and languages.

Core Concepts

Distributed tracing follows several key concepts that help you understand request flow through your system:

Trace: A complete request journey through your distributed system, containing all spans from start to finish
Span: A unit of work within a trace, representing an operation like an HTTP request, database query, or function call
Span Context: Information that identifies a span and its relationship to other spans in the trace
Baggage: Key-value pairs that can be propagated across service boundaries for correlation
Sampling: The process of deciding which traces to collect and send to the observability backend

OpenTelemetry provides a vendor-neutral, language-agnostic specification for distributed tracing that ensures compatibility across different observability platforms and tools.
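To make the span context concept more concrete, here is a minimal, standard-library-only sketch of how the identifying fields can be carried in a W3C Trace Context `traceparent` header. The `SpanContext` class and helper names are illustrative assumptions, not the OpenTelemetry SDK's own types, and validation is deliberately simplified.

```python
import re
import secrets
from dataclasses import dataclass

# traceparent layout: version-traceid-spanid-flags, all lowercase hex
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str   # 32 hex characters, shared by every span in the trace
    span_id: str    # 16 hex characters, unique to one span
    sampled: bool   # whether the trace was selected for collection


def new_root_context(sampled: bool = True) -> SpanContext:
    """Start a new trace with fresh random identifiers."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), sampled)


def child_of(parent: SpanContext) -> SpanContext:
    """A child keeps the trace ID and sampling flag but gets its own span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def to_traceparent(ctx: SpanContext) -> str:
    """Serialize the context for an outgoing request header."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def from_traceparent(header: str) -> SpanContext:
    """Parse an incoming header; raises ValueError if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        raise ValueError(f"malformed traceparent: {header!r}")
    _version, trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace or span IDs are invalid")
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))
```

A downstream service would parse the incoming header with `from_traceparent`, create its own span with `child_of`, and forward `to_traceparent` of that child on any outgoing calls. In practice the OpenTelemetry SDK propagators do this for you.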

Benefits of Distributed Tracing

Implementing distributed tracing provides several key benefits:

Request flow visualization: See exactly how requests flow through your microservices
Performance bottleneck identification: Identify slow operations and bottlenecks in your system
Error correlation: Correlate errors across multiple services to understand root causes
Dependency mapping: Understand service dependencies and communication patterns
Capacity planning: Use trace data to understand resource usage and plan capacity
Debugging efficiency: Reduce time to resolution for complex issues

OpenTelemetry Architecture Overview

OpenTelemetry provides a comprehensive observability framework that includes distributed tracing, metrics, and logs. Understanding the architecture helps you make informed decisions about your implementation.

OpenTelemetry Components

The OpenTelemetry ecosystem consists of several key components:

1. OpenTelemetry SDK

The SDK provides the core functionality for instrumenting your applications:

API: Defines the interfaces for creating traces, spans, and metrics
SDK: Implements the API and provides configuration options
Instrumentation Libraries: Pre-built instrumentation for common frameworks and libraries

2. OpenTelemetry Collector

The Collector is a vendor-agnostic implementation for receiving, processing, and exporting telemetry data:

Receivers: Accept data from various sources (OTLP, Jaeger, Zipkin, etc.)
Processors: Transform, filter, and enrich telemetry data
Exporters: Send data to various backends (Logit.io, Jaeger, Zipkin, etc.)
Extensions: Provide additional functionality like health monitoring

3. OpenTelemetry Protocol (OTLP)

OTLP is the standard protocol for sending telemetry data between OpenTelemetry components:

Language agnostic: Works across all supported programming languages
Efficient: Optimized for high-throughput scenarios
Extensible: Supports custom attributes and metadata

Setting Up OpenTelemetry in Your Environment

Before implementing distributed tracing, you need to set up the OpenTelemetry infrastructure in your environment. This includes installing the OpenTelemetry Collector and configuring it to work with Logit.io.

Installing OpenTelemetry Collector

The OpenTelemetry Collector can be installed in several ways depending on your environment:

Docker Installation

For containerized environments, use the official Docker image:

# Pull the latest OpenTelemetry Collector image
docker pull otel/opentelemetry-collector:latest

# Create a configuration file
cat > otel-collector-config.yaml << EOF
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500
exporters:
  otlp:
    endpoint: https://your-logit-endpoint:4317
    headers:
      authorization: "Bearer your-logit-api-key"
    tls:
      insecure: false
service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter goes first so it can push back before data is batched
      processors: [memory_limiter, batch]
      exporters: [otlp]
EOF

# Run the collector
docker run -d --name otel-collector \
  -p 4317:4317 -p 4318:4318 \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml" \
  otel/opentelemetry-collector:latest \
  --config /etc/otel-collector-config.yaml

Kubernetes Installation

For Kubernetes environments, use the OpenTelemetry Operator:

# Install the OpenTelemetry Operator
# (the operator expects cert-manager to be installed in the cluster already)
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Create a collector configuration
cat > otel-collector.yaml << EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 1s
        send_batch_size: 1024
      memory_limiter:
        check_interval: 1s
        limit_mib: 1500
    exporters:
      otlp:
        endpoint: https://your-logit-endpoint:4317
        headers:
          authorization: "Bearer your-logit-api-key"
        tls:
          insecure: false
    service:
      pipelines:
        traces:
          receivers: [otlp]
          # memory_limiter goes first so it can push back before data is batched
          processors: [memory_limiter, batch]
          exporters: [otlp]
EOF

# Apply the collector configuration
kubectl apply -f otel-collector.yaml

Configuring Logit.io Integration

Logit.io provides native support for OpenTelemetry, making it easy to send traces from your applications:

Logit.io OpenTelemetry Endpoint

Configure your OpenTelemetry Collector to send data to Logit.io:

# Logit.io OpenTelemetry configuration
exporters:
  otlp:
    endpoint: https://your-logit-endpoint:4317
    headers:
      authorization: "Bearer your-logit-api-key"
    tls:
      insecure: false
    timeout: 30s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
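
To get a feel for what the `retry_on_failure` values above imply, the following sketch computes an approximate retry schedule. It assumes plain exponential backoff with a growth multiplier of 1.5 and no random jitter; both are simplifying assumptions for illustration, and the Collector's actual timing will differ.

```python
def backoff_schedule(initial_interval=5.0, max_interval=30.0,
                     max_elapsed_time=300.0, multiplier=1.5):
    """Return the waits (in seconds) between retries until the time budget runs out.

    The multiplier and the absence of jitter are assumptions made for this sketch.
    """
    delays = []
    elapsed = 0.0
    delay = initial_interval
    while elapsed + delay <= max_elapsed_time:
        delays.append(delay)
        elapsed += delay
        # Grow the wait, but never beyond max_interval
        delay = min(delay * multiplier, max_interval)
    return delays


if __name__ == "__main__":
    schedule = backoff_schedule()
    print("retries:", len(schedule))
    print("waits (s):", schedule)
    print("total wait (s):", sum(schedule))
```

Under these assumptions the waits grow from 5s up to the 30s cap and then stay there until the 300s budget is exhausted, after which the batch would be dropped.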

Authentication and Security

Secure your OpenTelemetry connection to Logit.io:

API Key: Use your Logit.io API key for authentication
TLS: Enable TLS for secure communication
Network Security: Ensure proper firewall rules and network access
Access Control: Configure appropriate access controls and permissions
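
One common way to act on the API key and TLS points is to keep credentials out of source code and config files. The sketch below reads them from environment variables and refuses to build settings for a non-TLS endpoint. The variable names `LOGIT_API_KEY` and `LOGIT_OTLP_ENDPOINT` are hypothetical, chosen only for this example.

```python
import os


def build_exporter_settings(env=None):
    """Build OTLP exporter settings from environment variables.

    LOGIT_API_KEY and LOGIT_OTLP_ENDPOINT are hypothetical names used for
    illustration; use whatever naming your deployment already follows.
    """
    env = os.environ if env is None else env
    api_key = env.get("LOGIT_API_KEY", "").strip()
    endpoint = env.get("LOGIT_OTLP_ENDPOINT", "").strip()

    if not api_key:
        raise RuntimeError("LOGIT_API_KEY is not set")
    if not endpoint.startswith("https://"):
        # Refuse plaintext endpoints so the bearer token is never sent unencrypted
        raise ValueError("endpoint must use https://")

    return {
        "endpoint": endpoint,
        "headers": {"authorization": f"Bearer {api_key}"},
    }
```

The returned dictionary mirrors the `endpoint` and `headers` keys of the exporter configuration, so the same values can be templated into a Collector config or passed to an SDK exporter.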

Instrumenting Applications with OpenTelemetry

Once your OpenTelemetry infrastructure is set up, you can begin instrumenting your applications. The process varies by programming language, but the core concepts remain the same.

Node.js Application Instrumentation

Node.js applications can be instrumented using the OpenTelemetry JavaScript SDK:

Basic Setup

// Install required packages
// npm install express @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
// Initialize OpenTelemetry
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces',
    headers: {},
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
// Your application code here
const express = require('express');
const app = express();
app.get('/api/users', async (req, res) => {
  // This will automatically create spans for HTTP requests
  const users = await fetchUsers();
  res.json(users);
});
app.listen(3000, () => {
  console.log('Server running on port 3000');
});

Custom Instrumentation

const { trace, SpanStatusCode } = require('@opentelemetry/api');

// Create custom spans
async function fetchUsers() {
  const tracer = trace.getTracer('user-service');
  return tracer.startActiveSpan('fetchUsers', async (span) => {
    try {
      // Add attributes to the span
      span.setAttribute('db.operation', 'SELECT');
      span.setAttribute('db.system', 'postgresql');

      // Simulate database query (db is your application's database client)
      const users = await db.query('SELECT * FROM users');

      // Add events to the span
      span.addEvent('users.fetched', {
        count: users.length,
        timestamp: Date.now()
      });

      return users;
    } catch (error) {
      // Record error in span
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

Python Application Instrumentation

Python applications can be instrumented using the OpenTelemetry Python SDK:

Basic Setup

# Install required packages:
# pip install flask opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-flask opentelemetry-exporter-otlp
import time

from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor

# Initialize OpenTelemetry
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(
        endpoint="http://localhost:4318/v1/traces"
    ))
)
tracer = trace.get_tracer(__name__)

# Instrument Flask application
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route('/api/users')
def get_users():
    # This will automatically create spans for HTTP requests
    return {'users': fetch_users()}

def fetch_users():
    with tracer.start_as_current_span('fetch_users') as span:
        span.set_attribute('db.operation', 'SELECT')
        span.set_attribute('db.system', 'postgresql')

        # Simulate database query
        users = [{'id': 1, 'name': 'John'}, {'id': 2, 'name': 'Jane'}]

        span.add_event('users.fetched', {
            'count': len(users),
            'timestamp': time.time()
        })

        return users

if __name__ == '__main__':
    app.run(debug=True)

Java Application Instrumentation

Java applications can be instrumented using the OpenTelemetry Java SDK:

Maven Dependencies

<dependencies>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-api</artifactId>
        <version>1.32.0</version>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-sdk</artifactId>
        <version>1.32.0</version>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-otlp</artifactId>
        <version>1.32.0</version>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry.instrumentation</groupId>
        <artifactId>opentelemetry-instrumentation-annotations</artifactId>
        <version>1.32.0</version>
    </dependency>
</dependencies>

Basic Setup

import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public class OpenTelemetrySetup {
    public static void main(String[] args) {
        // Initialize OpenTelemetry
        SdkTracerProvider sdkTracerProvider = SdkTracerProvider.builder()
            .addSpanProcessor(BatchSpanProcessor.builder(
                OtlpGrpcSpanExporter.builder()
                    .setEndpoint("http://localhost:4317")
                    .build()
            ).build())
            .build();

        OpenTelemetrySdk openTelemetry = OpenTelemetrySdk.builder()
            .setTracerProvider(sdkTracerProvider)
            .buildAndRegisterGlobal();

        // Get tracer
        Tracer tracer = openTelemetry.getTracer("my-service");

        // Create spans
        var span = tracer.spanBuilder("my-operation")
            .setAttribute("custom.attribute", "value")
            .startSpan();

        try (var scope = span.makeCurrent()) {
            // Your business logic here
            System.out.println("Executing operation...");
        } catch (Exception e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR);
            throw e;
        } finally {
            span.end();
            // Flush any batched spans before the JVM exits
            sdkTracerProvider.close();
        }
    }
}

Advanced OpenTelemetry Configuration

Once you have basic instrumentation working, you can enhance your setup with advanced configurations for better observability and performance.

Sampling Configuration

Sampling helps control the volume of trace data sent to your observability backend:

Trace Sampling

# OpenTelemetry Collector sampling configuration
processors:
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 10.0

  # tail_sampling is provided by the Collector contrib distribution.
  # A trace is kept when any one of the policies below matches.
  tail_sampling:
    policies:
      - name: error-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-policy
        type: latency
        latency:
          threshold_ms: 1000
      - name: default-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 5.0
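
The tail sampling policies above can be read as a decision made once a trace is complete: keep it if it contains an error, keep it if it is slow, otherwise keep a small percentage. The following standard-library sketch illustrates that logic only. `SpanRecord` and `matching_policy` are hypothetical names, the longest span stands in for the trace duration, and the trace ID is bucketed deterministically instead of drawing a random number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpanRecord:
    trace_id: str          # 32 hex characters
    duration_ms: float
    is_error: bool = False


def matching_policy(spans: List[SpanRecord],
                    latency_threshold_ms: float = 1000.0,
                    default_percentage: float = 5.0) -> Optional[str]:
    """Return the name of the first policy that keeps the trace, or None to drop it."""
    if not spans:
        return None
    if any(s.is_error for s in spans):
        return "error-policy"
    # Simplification: the longest span stands in for the whole trace's duration
    if max(s.duration_ms for s in spans) > latency_threshold_ms:
        return "slow-policy"
    # Deterministic stand-in for random sampling: bucket the trace ID into 0..9999
    bucket = int(spans[0].trace_id, 16) % 10_000
    if bucket < default_percentage * 100:
        return "default-policy"
    return None
```

Because the decision needs every span of a trace, tail sampling has to buffer spans until the trace finishes, which is why it runs in the Collector and not inside each application.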

Application-Level Sampling

// Node.js sampling configuration
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');
const sdk = new NodeSDK({
  sampler: new ParentBasedSampler({
    // Sample 10% of new (root) traces
    root: new TraceIdRatioBasedSampler(0.1),
    // Spans with a parent follow the parent's decision by default, which
    // keeps traces complete. Override remoteParentSampled,
    // remoteParentNotSampled, localParentSampled, or localParentNotSampled
    // only if you need different behavior.
  }),
  // ... other configuration
});
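
To illustrate what a parent-based, ratio-based sampler decides, here is a small standard-library sketch. It is modeled loosely on how trace-ID ratio samplers commonly work (compare the low 64 bits of the trace ID against a bound derived from the ratio); treat the exact arithmetic as an assumption and not as the SDK's guaranteed behavior.

```python
from typing import Optional

_MASK_64 = (1 << 64) - 1


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    return (trace_id & _MASK_64) < bound


def should_sample(trace_id: int, parent_sampled: Optional[bool],
                  ratio: float = 0.1) -> bool:
    """Follow the parent's decision when there is one; otherwise apply the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, ratio)
```

Deriving the decision from the trace ID means every service that sees the same trace reaches the same answer without coordination, and honoring the parent's flag avoids traces with missing spans.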

Span Attributes and Events

Adding meaningful attributes and events to your spans provides valuable context for debugging and analysis:

Standard Attributes

// Add standard OpenTelemetry attributes
span.setAttribute('http.method', 'GET');
span.setAttribute('http.url', 'https://api.example.com/users');
span.setAttribute('http.status_code', 200);
span.setAttribute('db.system', 'postgresql');
span.setAttribute('db.name', 'users_db');
span.setAttribute('db.operation', 'SELECT');
span.setAttribute('db.statement', 'SELECT * FROM users WHERE id = ?');
span.setAttribute('messaging.system', 'kafka');
span.setAttribute('messaging.destination', 'user-events');
span.setAttribute('messaging.operation', 'publish');

Custom Attributes

// Add custom business attributes
span.setAttribute('user.id', userId);
span.setAttribute('order.id', orderId);
span.setAttribute('payment.method', 'credit_card');
span.setAttribute('feature.flag', 'new_ui_enabled');
span.setAttribute('deployment.environment', 'production');
span.setAttribute('service.version', '1.2.3');
span.setAttribute('team.ownership', 'platform-team');

Span Events

// Add events to spans for important milestones
span.addEvent('user.authenticated', {
  'user.id': userId,
  'auth.method': 'jwt',
  'timestamp': Date.now()
});
span.addEvent('database.query.executed', {
  'query.type': 'SELECT',
  'table.name': 'users',
  'rows.affected': 1,
  'execution.time.ms': 45
});
span.addEvent('cache.miss', {
  'cache.key': 'user:123',
  'cache.type': 'redis'
});

Baggage and Context Propagation

Baggage allows you to propagate key-value pairs across service boundaries for correlation:

Setting and Using Baggage

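const { context, propagation, trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('user-service');

// Set baggage in a new context derived from the active one
const baseBaggage = propagation.getBaggage(context.active()) ?? propagation.createBaggage();
const ctx = propagation.setBaggage(
  context.active(),
  baseBaggage.setEntry('user.id', { value: '123' })
);

// Use the context in async operations
async function processUser(userId) {
  const currentBaggage = propagation.getBaggage(context.active()) ?? propagation.createBaggage();
  const userContext = propagation.setBaggage(
    context.active(),
    // generateRequestId is your application's own helper
    currentBaggage.setEntry('request.id', { value: generateRequestId() })
  );
  return context.with(userContext, async () => {
    // Baggage set above is readable here and is propagated on outgoing
    // requests; it is not copied onto span attributes automatically
    const span = tracer.startSpan('process-user');
    // ... processing logic
    span.end();
  });
}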
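
On the wire, baggage typically travels in a W3C `baggage` HTTP header as comma-separated `key=value` members with percent-encoded values. The sketch below shows that encoding using only the standard library. The helper names are illustrative, and the parsing is simplified: optional per-member properties after `;` are discarded and keys are not validated.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict) -> str:
    """Serialize key-value pairs into a baggage header value."""
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in entries.items()
    )


def decode_baggage(header: str) -> dict:
    """Parse a baggage header value back into a dictionary."""
    entries = {}
    for member in header.split(","):
        # Drop optional properties such as ";ttl=30" (simplification)
        key_value = member.split(";", 1)[0].strip()
        if "=" not in key_value:
            continue
        key, _, value = key_value.partition("=")
        entries[key.strip()] = unquote(value.strip())
    return entries
```

Because baggage is sent to every downstream service and often to third parties, it is worth keeping entries small and free of sensitive values.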

Integrating with Logit.io for Distributed Tracing

Logit.io provides comprehensive support for OpenTelemetry distributed tracing, allowing you to visualize and analyze trace data alongside your logs and metrics.

Logit.io Trace Visualization

Logit.io's trace visualization features help you understand request flows and identify performance issues:

Trace Timeline: Visualize the complete request journey through your services
Span Details: View detailed information about each span including attributes, events, and errors
Service Map: See the relationships between services and their communication patterns
Performance Analysis: Identify slow operations and bottlenecks in your system
Error Tracking: Correlate errors across multiple services to understand root causes

Trace-to-Log Correlation

Logit.io enables correlation between traces and logs for comprehensive debugging:

Adding Trace Context to Logs

// Node.js example with trace context in logs
const { trace } = require('@opentelemetry/api');
const winston = require('winston');

// Create a custom Winston transport that includes trace context
class OpenTelemetryTransport extends winston.Transport {
  constructor(opts) {
    super(opts);
  }

  log(info, callback) {
    const currentSpan = trace.getActiveSpan();
    if (currentSpan) {
      const spanContext = currentSpan.spanContext();
      info.traceId = spanContext.traceId;
      info.spanId = spanContext.spanId;
    }
    // Write the enriched record as JSON to stdout; replace this with
    // your Logit.io log shipper or transport
    console.log(JSON.stringify(info));
    callback();
  }
}

const logger = winston.createLogger({
  transports: [
    new OpenTelemetryTransport()
  ]
});

// Usage
app.get('/api/users', async (req, res) => {
  logger.info('Processing user request', {
    userId: req.query.id,
    requestId: req.headers['x-request-id']
  });
  // ... rest of the handler
});

Python Example

import logging

from flask import request
from opentelemetry import trace

# Configure logging to include trace context
class OpenTelemetryFormatter(logging.Formatter):
    def format(self, record):
        # get_current_span() returns a non-recording span with zeroed IDs
        # when no span is active, so both fields are always set
        span_context = trace.get_current_span().get_span_context()
        record.trace_id = format(span_context.trace_id, '032x')
        record.span_id = format(span_context.span_id, '016x')
        return super().format(record)

# Configure logger
logger = logging.getLogger(__name__)
handler = logging.StreamHandler()
handler.setFormatter(OpenTelemetryFormatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(trace_id)s - %(span_id)s - %(message)s'
))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Usage (app is the Flask application created during setup)
@app.route('/api/users')
def get_users():
    logger.info('Processing user request', extra={
        'user_id': request.args.get('id'),
        'request_id': request.headers.get('X-Request-ID')
    })
    # ... rest of the handler

Advanced Logit.io Trace Features

Logit.io provides advanced features for trace analysis and monitoring:

Trace Search and Filtering

Trace ID Search: Find specific traces by trace ID
Service Filtering: Filter traces by service name
Error Filtering: Find traces with errors or specific error types
Duration Filtering: Find slow traces or traces within specific time ranges
Attribute Filtering: Filter by custom attributes and baggage

Trace Analytics

Service Performance: Analyze performance metrics by service
Error Rates: Track error rates and patterns across services
Dependency Analysis: Understand service dependencies and communication patterns
Capacity Planning: Use trace data for capacity planning and optimization

Best Practices for Distributed Tracing

Following best practices ensures your distributed tracing implementation provides maximum value while minimizing overhead and complexity.

Instrumentation Best Practices

Effective instrumentation requires careful planning and consistent implementation:

Span Naming Conventions

// Good span names
span.updateName('HTTP GET /api/users');
span.updateName('Database SELECT users');
span.updateName('Cache GET user:123');
span.updateName('External API POST /payment');
// Avoid generic names
span.updateName('operation');
span.updateName('function');
span.updateName('process');
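
Descriptive names work best when they also stay low in cardinality, so that `/api/users/123` and `/api/users/456` group under one span name. The sketch below shows one possible way to derive such a name from a request, following the `HTTP <METHOD> <path>` pattern used above. The `http_span_name` helper and its `{id}` placeholder are assumptions for illustration; where your framework exposes the matched route template, that is usually the better source.

```python
import re

_NUMERIC = re.compile(r"^\d+$")
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def http_span_name(method: str, path: str) -> str:
    """Build a span name with identifiers replaced by a placeholder."""
    path = path.split("?", 1)[0]  # query strings do not belong in span names
    segments = []
    for segment in path.split("/"):
        if _NUMERIC.match(segment) or _UUID.match(segment):
            segments.append("{id}")
        else:
            segments.append(segment)
    return "HTTP " + method.upper() + " " + "/".join(segments)
```

The concrete identifier is not lost: it can still be recorded as a span attribute, where high cardinality is expected.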

Attribute Naming

// Use standard OpenTelemetry attribute names
span.setAttribute('http.method', 'GET');
span.setAttribute('http.url', '/api/users');
span.setAttribute('db.system', 'postgresql');
span.setAttribute('db.operation', 'SELECT');
// Use consistent naming for custom attributes
span.setAttribute('business.user.id', userId);
span.setAttribute('business.order.id', orderId);
span.setAttribute('business.payment.method', 'credit_card');

Performance Optimization

Optimize your tracing implementation for performance and cost:

Sampling Strategies

Head-based Sampling: Sample at the beginning of the trace to ensure complete traces
Tail-based Sampling: Sample based on trace characteristics like errors or duration
Adaptive Sampling: Adjust sampling rates based on system load and requirements
Service-specific Sampling: Use different sampling rates for different services

Resource Management

Batch Processing: Use batch processors to reduce network overhead
Memory Limits: Configure appropriate memory limits for the collector
Connection Pooling: Use connection pooling for database and external API calls
Async Processing: Use async/await patterns to avoid blocking operations
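
As an illustration of why batch processing reduces network overhead, here is a simplified, standard-library-only batcher that flushes when a size limit is reached or when the oldest buffered span has waited too long. `SpanBatcher` is a hypothetical class written for this example. Real batch processors flush from a background timer, whereas this sketch only checks the timeout when a new span arrives.

```python
import time


class SpanBatcher:
    """Buffer spans and hand them to `export` in batches."""

    def __init__(self, export, max_batch_size=1024, timeout_s=1.0,
                 clock=time.monotonic):
        self._export = export          # callable that receives a list of spans
        self._max_batch_size = max_batch_size
        self._timeout_s = timeout_s
        self._clock = clock
        self._buffer = []
        self._first_added_at = None

    def add(self, span):
        if not self._buffer:
            self._first_added_at = self._clock()
        self._buffer.append(span)
        waited = self._clock() - self._first_added_at
        if len(self._buffer) >= self._max_batch_size or waited >= self._timeout_s:
            self.flush()

    def flush(self):
        """Export whatever is buffered; call this on shutdown as well."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._export(batch)
```

With `max_batch_size=1024`, a thousand spans cost one export call instead of a thousand, at the price of up to `timeout_s` of added delay before a span becomes visible.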

Security and Privacy

Ensure your tracing implementation respects security and privacy requirements:

Data Sanitization

// Sanitize sensitive data before adding to spans
function sanitizeUserData(userData) {
  const sanitized = { ...userData };
  delete sanitized.password;
  delete sanitized.creditCard;
  delete sanitized.ssn;
  return sanitized;
}
// Use sanitized data in spans
span.setAttribute('user.data', JSON.stringify(sanitizeUserData(userData)));
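
The same idea can be sketched in Python. This variant differs from the JavaScript above in two ways that are choices of this example: it redacts values instead of deleting keys, so the shape of the data stays visible, and it walks nested dictionaries and lists. The key list is an assumption and should match your own data model.

```python
# Keys are compared lowercase with underscores removed, so "credit_card"
# and "creditCard" both match. Extend this set for your own data.
SENSITIVE_KEYS = frozenset({"password", "creditcard", "ssn"})


def sanitize(value):
    """Return a copy of `value` with sensitive fields redacted at any depth."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]"
            if str(key).lower().replace("_", "") in SENSITIVE_KEYS
            else sanitize(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value
```

The result can then be serialized, for example with `json.dumps(sanitize(user_data))`, before it is attached to a span attribute.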

Access Control

Authentication: Use proper authentication for Logit.io access
Authorization: Implement role-based access control for trace data
Data Retention: Configure appropriate retention policies for trace data
Audit Logging: Log access to sensitive trace data

Real-World Implementation Scenarios

Understanding real-world scenarios helps you implement distributed tracing effectively in your specific environment.

E-commerce Application Example

Consider an e-commerce application with multiple microservices:

Service Architecture

API Gateway: Handles incoming requests and routing
User Service: Manages user authentication and profiles
Product Service: Manages product catalog and inventory
Order Service: Handles order processing and management
Payment Service: Processes payments and transactions
Notification Service: Sends emails and notifications

Trace Flow Example

const { trace, SpanStatusCode } = require('@opentelemetry/api');

// API Gateway - Start the trace
app.post('/api/orders', async (req, res) => {
  const tracer = trace.getTracer('api-gateway');
  return tracer.startActiveSpan('POST /api/orders', async (span) => {
    try {
      span.setAttribute('http.method', 'POST');
      span.setAttribute('http.url', '/api/orders');
      span.setAttribute('business.user.id', req.user.id);

      // Call User Service
      const user = await callUserService(req.user.id);

      // Call Product Service
      const products = await callProductService(req.body.productIds);

      // Call Order Service
      const order = await callOrderService({
        userId: req.user.id,
        products: products,
        total: calculateTotal(products)
      });

      // Call Payment Service
      const payment = await callPaymentService({
        orderId: order.id,
        amount: order.total,
        method: req.body.paymentMethod
      });

      // Call Notification Service
      await callNotificationService({
        userId: req.user.id,
        type: 'order_confirmation',
        orderId: order.id
      });

      res.json({ orderId: order.id, status: 'confirmed' });
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      res.status(500).json({ error: 'Order creation failed' });
    } finally {
      span.end();
    }
  });
});

Database and External API Tracing

Trace database operations and external API calls for complete visibility:

Database Tracing

const { trace, SpanStatusCode } = require('@opentelemetry/api');

// Database operation tracing
async function getUserById(userId) {
  const tracer = trace.getTracer('user-service');
  return tracer.startActiveSpan('Database SELECT user', async (span) => {
    try {
      span.setAttribute('db.system', 'postgresql');
      span.setAttribute('db.operation', 'SELECT');
      span.setAttribute('db.table', 'users');
      span.setAttribute('db.statement', 'SELECT * FROM users WHERE id = $1');
      span.setAttribute('db.parameters', JSON.stringify([userId]));

      const startTime = Date.now();
      const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
      const duration = Date.now() - startTime;

      span.setAttribute('db.duration_ms', duration);
      span.setAttribute('db.rows_returned', user.rows.length);

      return user.rows[0];
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

External API Tracing

const { trace, SpanStatusCode } = require('@opentelemetry/api');

// External API tracing
async function callPaymentService(paymentData) {
  const tracer = trace.getTracer('payment-service');
  return tracer.startActiveSpan('External API POST /payment', async (span) => {
    try {
      span.setAttribute('http.method', 'POST');
      span.setAttribute('http.url', 'https://api.payment-gateway.com/payment');
      span.setAttribute('http.request_id', generateRequestId());
      span.setAttribute('business.payment.amount', paymentData.amount);
      span.setAttribute('business.payment.method', paymentData.method);

      const startTime = Date.now();
      const response = await fetch('https://api.payment-gateway.com/payment', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${process.env.PAYMENT_API_KEY}`
        },
        body: JSON.stringify(paymentData)
      });
      const duration = Date.now() - startTime;

      span.setAttribute('http.status_code', response.status);
      span.setAttribute('http.duration_ms', duration);

      if (!response.ok) {
        throw new Error(`Payment API error: ${response.status}`);
      }

      return await response.json();
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

Monitoring and Alerting with Traces

Use trace data to create meaningful monitoring and alerting for your distributed systems.

Trace-Based Metrics

Extract metrics from trace data for monitoring and alerting:

Service Performance Metrics

Response Time: Track average, p95, and p99 response times by service
Throughput: Monitor requests per second for each service
Error Rate: Track error rates and failure patterns
Availability: Monitor service availability and uptime
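
As a rough illustration of how such metrics can be derived from finished spans, the sketch below aggregates a list of span records per service. The record fields (`service`, `duration_ms`, `error`) and the nearest-rank percentile method are assumptions made for this example; an observability backend would normally compute these for you.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile; pct is a whole number such as 95 or 99."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def service_metrics(spans, window_seconds):
    """Summarize spans collected over `window_seconds`, grouped by service."""
    by_service = defaultdict(list)
    for span in spans:
        by_service[span["service"]].append(span)

    summary = {}
    for service, items in by_service.items():
        durations = [s["duration_ms"] for s in items]
        errors = sum(1 for s in items if s["error"])
        summary[service] = {
            "avg_ms": sum(durations) / len(durations),
            "p95_ms": percentile(durations, 95),
            "p99_ms": percentile(durations, 99),
            "throughput_rps": len(items) / window_seconds,
            "error_rate": errors / len(items),
        }
    return summary
```

Note that if traces are sampled, throughput and error rate computed this way describe only the sampled subset and need to be scaled or taken from unsampled metrics instead.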

Business Metrics

Transaction Success Rate: Track successful vs failed transactions
User Experience Metrics: Monitor page load times and user interactions
Business Process Metrics: Track order completion rates, payment success rates
Cost Metrics: Monitor costs associated with different operations

Alerting Strategies

Create alerts based on trace data to detect issues early:

Performance Alerts

# Example alert configuration for slow traces
- alert: SlowTraces
  expr: histogram_quantile(0.95, rate(trace_duration_seconds_bucket[5m])) > 2
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Slow traces detected"
    description: "95th percentile trace duration is {{ $value }}s"

Error Rate Alerts

# Example alert for high error rates
- alert: HighErrorRate
  expr: rate(trace_errors_total[5m]) / rate(trace_requests_total[5m]) > 0.05
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "High error rate detected"
    description: "Error rate is {{ $value | humanizePercentage }}"
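
The combination of a threshold and a `for` duration means the alert fires only when the condition holds across consecutive evaluations, which filters out short spikes. The sketch below models that behavior in plain Python. The function names and the `(errors, requests)` sample format are assumptions for illustration and do not reflect how any particular alerting engine is implemented.

```python
def error_ratio(errors, requests):
    """Share of failed requests; zero when there was no traffic."""
    return errors / requests if requests else 0.0


def is_firing(samples, threshold=0.05, consecutive=2):
    """True when the last `consecutive` evaluations all exceeded the threshold.

    `samples` is a list of (errors, requests) pairs, one per evaluation,
    ordered from oldest to newest.
    """
    if len(samples) < consecutive:
        return False
    recent = samples[-consecutive:]
    return all(error_ratio(e, r) > threshold for e, r in recent)
```

For example, a single evaluation at 10% errors followed by a healthy one would not fire, while two consecutive evaluations above 5% would.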

Conclusion

Implementing distributed tracing with OpenTelemetry provides powerful insights into your distributed systems, enabling you to understand request flows, identify performance bottlenecks, and debug complex issues effectively. By following this comprehensive guide, you can build a robust observability foundation that scales with your application.

The key to successful distributed tracing implementation is starting simple and gradually adding complexity as your needs grow. Focus on instrumenting critical paths first, then expand to cover more of your application. Remember that distributed tracing is most valuable when combined with logs and metrics to provide a complete picture of your system's behavior.

Logit.io's native OpenTelemetry support makes it easy to get started with distributed tracing and provides powerful visualization and analysis tools to help you make the most of your trace data. Whether you're just getting started with observability or looking to enhance your existing monitoring, OpenTelemetry and Logit.io provide a solid foundation for understanding and optimizing your distributed systems.

Ready to implement distributed tracing in your applications? Sign up for a free trial of Logit.io and start exploring the power of OpenTelemetry distributed tracing today.