Step-by-step OpenTelemetry Python SDK integration

When work crosses a ThreadPoolExecutor boundary, or an asyncio task is created before its intended parent span is current, the child span starts from a context without that parent and becomes a new root trace. Capture the context with otel_context.get_current() before the boundary and call otel_context.attach(captured_ctx) inside the new execution unit before creating any spans.

Context and when it matters

Python’s contextvars module propagates implicit state within a single thread and within a single asyncio task. The OpenTelemetry SDK stores the active span in a contextvars-backed Context object, which flows down synchronous call stacks without any explicit plumbing. The failure surface is specific. asyncio.create_task takes a snapshot of the current contextvars.Context at creation time, so the task sees only what was current at that instant: a task created before the parent span is entered, or while a span started with tracer.start_span() has not been made current, will not see that span. concurrent.futures.ThreadPoolExecutor is stricter: its worker threads have never seen the caller’s context, so they start with no active span unless you inject one. The result is orphaned spans in Jaeger or Grafana Tempo: parent_operation and its child_operation appear as disconnected roots, which makes end-to-end latency and MTTR analysis unreliable. This page targets exactly that failure mode: async and threaded Python services where implicit propagation guarantees break.
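The two boundary behaviours can be observed with the standard library alone, without the OpenTelemetry SDK. The sketch below is illustrative only: current_span_name is a hypothetical ContextVar standing in for the slot where the SDK keeps the active span, and the function names are made up for this example.

```python
import asyncio
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the context slot that holds the active span.
current_span_name = contextvars.ContextVar("current_span_name", default="<none>")


def read_span_name() -> str:
    """Return whatever the calling execution unit sees as the active span."""
    return current_span_name.get()


async def read_span_name_async() -> str:
    return current_span_name.get()


async def demo() -> dict:
    observed = {}

    # Task created BEFORE the value is set: its snapshot predates the "span".
    early_task = asyncio.create_task(read_span_name_async())

    current_span_name.set("parent_operation")

    # Task created AFTER the value is set: the snapshot includes it.
    late_task = asyncio.create_task(read_span_name_async())

    observed["task_created_before_set"] = await early_task
    observed["task_created_after_set"] = await late_task

    # Worker thread: run_in_executor does not copy the caller's context.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as pool:
        observed["worker_thread"] = await loop.run_in_executor(pool, read_span_name)

    return observed


observed = asyncio.run(demo())
print(observed)
```

Only the task created after the value was set sees it. The earlier task and the worker thread both fall back to the default, which is the stand-in equivalent of a span with no parent.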


Figure: Context propagation at an execution boundary, broken vs. correct. Left (broken, context not attached): parent_operation (trace_id aabbcc…) and child_operation (trace_id 112233…) end up as two disconnected root traces in Jaeger or Tempo. Right (correct, context explicitly attached): ctx = get_current() is captured before create_task, attach(ctx) runs inside the task, child_operation shares trace_id aabbcc…, parent_id links are preserved, and detach(token) runs in finally.

Core mechanism: minimal working implementation

The pattern has three steps: capture the context before the boundary, attach it inside the new task or thread, and detach it in finally. Here is the minimal working form:

import asyncio
from opentelemetry import trace, context as otel_context
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Initialize before any framework or middleware loads
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer(__name__)

async def child_task(ctx: otel_context.Context) -> None:
    token = otel_context.attach(ctx)          # re-establish parent context
    try:
        with tracer.start_as_current_span("child_operation") as span:
            print(span.get_span_context().trace_id)  # matches parent
    finally:
        otel_context.detach(token)            # mandatory cleanup

async def main() -> None:
    with tracer.start_as_current_span("parent_operation"):
        ctx = otel_context.get_current()      # capture BEFORE create_task
        await asyncio.create_task(child_task(ctx))

asyncio.run(main())

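In this exact example create_task is called inside the parent's with block, so the task's context snapshot already contains parent_operation and the attach is redundant but harmless. The pair becomes load-bearing when the task is created before the span is current, or when the same function runs on a worker thread. In those cases, without attach/detach, child_operation starts from an empty Context and the SDK assigns it a new random trace_id, producing the disconnected trace tree shown in the diagram above.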

Implementation detail: production-grade version with ThreadPoolExecutor

The following block handles both asyncio and ThreadPoolExecutor boundaries in a single service and adds BatchSpanProcessor tuning for high-concurrency workloads. The inline comments tie each step to the tracing concept it implements:

import asyncio
from concurrent.futures import ThreadPoolExecutor
from opentelemetry import trace, context as otel_context
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# -- SDK initialisation (must precede all middleware) --
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        ConsoleSpanExporter(),
        max_queue_size=4096,        # must exceed peak span rate × schedule_delay
        schedule_delay_millis=5000, # export interval in ms
        max_export_batch_size=512,  # upper bound per export call
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


def thread_worker(ctx: otel_context.Context, payload: str) -> str:
    """Run inside ThreadPoolExecutor — context does not cross thread boundary automatically."""
    token = otel_context.attach(ctx)   # bind parent trace context to this thread
    try:
        with tracer.start_as_current_span("thread_worker") as span:
            # span.parent now points to the root_operation span
            span.set_attribute("payload.size", len(payload))
            return f"processed:{payload}"
    finally:
        otel_context.detach(token)     # prevent context stack leak in thread-pool reuse


async def async_worker(ctx: otel_context.Context) -> None:
    """Run inside asyncio.create_task; the task only sees the context snapshot taken at creation."""
    token = otel_context.attach(ctx)   # re-establish parent span reference
    try:
        with tracer.start_as_current_span("async_worker") as span:
            span.set_attribute("worker.type", "asyncio")
    finally:
        otel_context.detach(token)


async def main() -> None:
    with tracer.start_as_current_span("root_operation"):
        # Capture AFTER start_as_current_span writes the span into contextvars
        ctx = otel_context.get_current()

        # Async boundary
        await asyncio.create_task(async_worker(ctx))

        # Thread boundary
        loop = asyncio.get_running_loop()
        with ThreadPoolExecutor(max_workers=4) as pool:
            result = await loop.run_in_executor(pool, thread_worker, ctx, "payload_data")
            print(result)


if __name__ == "__main__":
    asyncio.run(main())

Key detail: trace.set_span_in_context() for manually started spans. When you call tracer.start_span() instead of tracer.start_as_current_span(), the new span is not written to the active Context. Call trace.set_span_in_context(span) (the function lives in opentelemetry.trace, not in the context module), which returns a new Context carrying the span on top of the current one, and pass that returned context across the boundary. If you pass otel_context.get_current() instead, the child thread or task inherits a context that points to the span’s parent, not the span itself, and the trace tree is silently misaligned.

Diagnostics: reproducing and isolating context loss

Before applying fixes, confirm the failure mode is exactly boundary-induced context loss rather than a misconfigured exporter or sampling drop:

from opentelemetry import trace, context as otel_context

def log_context(label: str) -> None:
    sc = trace.get_current_span().get_span_context()
    print(f"[{label}] trace_id={sc.trace_id:032x} valid={sc.is_valid}")

Call log_context("pre-boundary") immediately before the boundary (create_task, run_in_executor, or submit) and log_context("post-boundary") at the top of the child function, before any attach. If the post-boundary line reports valid=False or a trace_id different from the pre-boundary value, the context loss happens at the boundary and not elsewhere. If both lines match, look at the exporter or sampler configuration instead. Setting OTEL_LOG_LEVEL=debug may surface additional SDK diagnostics, depending on the SDK version, but treat the helper's output as the primary evidence.

Decision criteria and verification

Use explicit attach/detach in every case matching any of these conditions:

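  • The work unit is spawned via asyncio.create_task, asyncio.ensure_future, or asyncio.gather and the intended parent span was not current when the task was created, for example because it was started with tracer.start_span or entered afterwards.
  • CPU-bound or blocking I/O is offloaded via loop.run_in_executor or a raw ThreadPoolExecutor.submit. asyncio.to_thread is the exception: it copies the caller's contextvars.Context into the worker thread, so the active span carries over without an explicit attach.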
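  • A gRPC server interceptor or FastAPI BackgroundTasks item hands work to another thread or task while handling an instrumented inbound request. A Celery task usually runs in a separate worker process, where an in-memory Context object cannot be passed; there the context has to be serialized with a propagator (inject on the sender, extract on the worker) instead.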
  • The service uses manual span creation (tracer.start_span) rather than the context-manager form (tracer.start_as_current_span).

Verify the fix by asserting trace_id equality and parent_span_id correctness in your test suite:

# Minimal pytest assertion pattern
assert child_span.context.trace_id == parent_span.context.trace_id
assert child_span.parent.span_id == parent_span.context.span_id

Common pitfalls

  • Capturing context too early or too late. Calling otel_context.get_current() before tracer.start_as_current_span(...) has been entered, or after its with block has exited, returns a context in which that span is not active. Always capture inside the with tracer.start_as_current_span(...) block.
  • Skipping detach() on error paths. If an exception is raised inside the work unit before detach() runs, the attached context stays in place. On a pooled worker thread that means later, unrelated jobs on the same thread inherit a stale parent span. Always use try/finally, never try/except alone.
  • Relying on asyncio.gather to preserve context. asyncio.gather wraps each coroutine in a task that snapshots the context at the moment gather is called, not when the intended parent span starts. If that span is not current at that moment, the sub-coroutines silently lose it. Passing the captured context as an explicit argument and calling attach inside each coroutine removes the dependence on timing.

Troubleshooting FAQ

Q: My spans have the right trace_id locally but a different one in production.

The SDK is likely initializing after a middleware layer that captured a NoOpTracerProvider reference. Call trace.set_tracer_provider(provider) at the top of your entry-point module, before any import that touches trace.get_tracer(...).

Q: Why does asyncio.create_task sometimes preserve context and sometimes not?

Python 3.7+ copies contextvars.Context at task creation time. If the active span has been written to contextvars before create_task is called — which start_as_current_span guarantees — the task inherits it automatically. The failure only occurs when spans are created manually or when create_task is called before entering the span’s context manager.

Q: My Jaeger UI shows the child span as a separate root trace — is this a Jaeger bug?

No. Jaeger displays whatever trace_id the SDK reports. A separate root trace in the UI is the definitive symptom of context loss at the SDK level: the child span’s trace_id was generated fresh rather than inherited, so Jaeger has no basis for linking them.

