Span Lifecycle and Parent-Child Relationships

Problem Framing

Services emit spans individually, but a distributed trace only becomes interpretable when those spans assemble into a causal hierarchy. The concrete failure that brings practitioners to this page is trace fragmentation: spans appear in the backend as disconnected entries with no parent, time ranges overlap impossibly, or an entire service’s work is missing from an otherwise complete trace. The root cause is almost always a lifecycle violation — a span ended before its children were created, a context never attached in an async handler, or an exporter that flushed mid-shutdown and dropped the final batch. Fixing this requires understanding exactly what state a span is in at every point in the execution path and what operations are legal in each state.


Prerequisites

Before working through this page, you should have:


Concept Deep-Dive: The Span State Machine

A span is not a static data object — it moves through a deterministic sequence of states from the moment the SDK allocates it to the moment the exporter acknowledges receipt. Each transition has exactly one valid trigger, and performing an operation on a span in the wrong state either silently no-ops or corrupts the exported payload.

Figure: Span state machine. CREATED (allocated in memory) → start() → ACTIVE (recording timing and attributes) → end() → ENDED (duration finalised) → BatchSpanProcessor → EXPORTED (serialised and transmitted). Calling record_exception() leaves the span in ACTIVE.

What each state allows:

State    | Legal operations                                             | Illegal (silently ignored)
CREATED  | None (span not yet started)                                  | set_attribute, add_event, end
ACTIVE   | set_attribute, add_event, record_exception, set_status, end | Post-end mutation
ENDED    | Read-only access to exported fields                          | Any mutation
EXPORTED | None (owned by exporter)                                     | Any SDK call

The most common bug is calling set_attribute or set_status after end(). The SDK does not raise an error for these calls (the Python SDK at most logs a warning), so traces arrive at the backend missing attributes that the code appears to set.
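
The state rules above can be sketched with a toy model. This is an illustration only, not the SDK's implementation: ToySpan, SpanState, and the ignored_calls list are hypothetical names introduced here, and the real SDK span exposes no public state field.

```python
from enum import Enum, auto


class SpanState(Enum):
    CREATED = auto()
    ACTIVE = auto()
    ENDED = auto()
    EXPORTED = auto()


class ToySpan:
    """Hypothetical model of the lifecycle; not an OpenTelemetry class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = SpanState.CREATED
        self.attributes: dict = {}
        self.ignored_calls: list = []

    def start(self) -> None:
        # CREATED -> ACTIVE is the only transition start() may trigger.
        if self.state is SpanState.CREATED:
            self.state = SpanState.ACTIVE

    def set_attribute(self, key: str, value: object) -> None:
        # Mutations are accepted only while ACTIVE. In any other state the
        # call is dropped without raising, which is what hides the bug.
        if self.state is not SpanState.ACTIVE:
            self.ignored_calls.append(f"set_attribute({key!r})")
            return
        self.attributes[key] = value

    def end(self) -> None:
        # ACTIVE -> ENDED freezes the attribute map.
        if self.state is SpanState.ACTIVE:
            self.state = SpanState.ENDED


span = ToySpan("process_transaction")
span.start()
span.set_attribute("tx.id", "tx-1")      # recorded: span is ACTIVE
span.end()
span.set_attribute("tx.result", "ok")    # dropped: span is ENDED

print(span.attributes)
print(span.ignored_calls)
```

The second set_attribute call returns normally, so nothing in the calling code signals that tx.result never reached the exported span.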

When a span is created, the SDK inspects the active execution context. If an existing span is attached to that context, the new span inherits its trace_id and records the existing span’s span_id as its own parent_span_id. This produces the directed acyclic graph that backends like Jaeger render as a waterfall.

The inheritance is automatic within a single process: start_as_current_span both creates the span and attaches it to the context, so any nested call that creates another span automatically becomes a child. Across service boundaries, the link must be re-established explicitly through W3C TraceContext propagation — the receiving service extracts the traceparent header and attaches it as the parent context before creating its own spans.
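
As a rough, stdlib-only sketch of the mechanism just described, the following models in-process inheritance with a contextvars variable and cross-service linking with a W3C traceparent header (version, trace_id, parent span_id, and flags joined by hyphens). ToySpanContext, start_span, inject, and extract are hypothetical helpers written for this illustration; they are not the OpenTelemetry API, and the sketch never restores the previous span when one ends.

```python
import contextvars
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToySpanContext:
    trace_id: str                    # 32 hex characters
    span_id: str                     # 16 hex characters
    parent_span_id: Optional[str]


# Holds the span that is "current" for this execution context.
_current = contextvars.ContextVar("current_span", default=None)


def start_span() -> ToySpanContext:
    """Create a span that inherits trace_id and parent from the current one."""
    parent = _current.get()
    ctx = ToySpanContext(
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_span_id=parent.span_id if parent else None,
    )
    _current.set(ctx)
    return ctx


def inject(ctx: ToySpanContext) -> dict:
    """Serialise the span into a traceparent header (flags 01 = sampled)."""
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-01"}


def extract(headers: dict) -> None:
    """Make the remote span the current parent in the receiving service."""
    _version, trace_id, span_id, _flags = headers["traceparent"].split("-")
    _current.set(ToySpanContext(trace_id, span_id, None))


def service_b(headers: dict) -> ToySpanContext:
    extract(headers)
    return start_span()


root = start_span()                  # Service A: no parent
child = start_span()                 # Service A: nested under root
headers = inject(child)              # travels with the HTTP request

# An empty Context stands in for a separate process with no inherited state.
grandchild = contextvars.Context().run(service_b, headers)

print(grandchild.parent_span_id == child.span_id)
print(grandchild.trace_id == root.trace_id)
```

Without the extract call, service_b would start a span with a fresh trace_id and no parent, which is exactly the fragmented trace described in the problem framing.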

Figure: Parent-child span hierarchy across two services. In Service A, the root span (span_id: A1, parent: none, trace_id: TRACE-XYZ) is the parent of a child span (span_id: A2, parent_span_id: A1). The HTTP call carries the traceparent header to Service B, where a grandchild span (span_id: B1, parent_span_id: A2) is created. All three spans share the same trace_id; the backend reconstructs the causal graph from span_id / parent_span_id pairs.

Step-by-Step Implementation

Step 1: Configure the TracerProvider with Batch Processing Constraints

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry import trace

# Tune max_queue_size and schedule_delay_millis for your peak RPS.
# A queue that fills faster than the exporter drains it drops spans silently.
exporter = OTLPSpanExporter(endpoint="https://collector.internal/v1/traces")
processor = BatchSpanProcessor(
    exporter,
    max_queue_size=4096,       # spans buffered before oldest are dropped
    schedule_delay_millis=5000, # flush interval
    max_export_batch_size=512  # spans per OTLP request
)
provider = TracerProvider()
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

Step 2: Define Explicit Span Boundaries

tracer = trace.get_tracer("payment-service", "1.4.0")

def process_transaction(tx_id: str):
    # start_as_current_span opens the span, attaches it to the context,
    # and calls span.end() automatically when the block exits — even on exception.
    with tracer.start_as_current_span("process_transaction") as span:
        span.set_attribute("tx.id", tx_id)
        span.set_attribute("tx.type", "credit")
        try:
            execute_ledger_update(tx_id)
            span.set_status(trace.StatusCode.OK)
        except Exception as e:
            # record_exception adds a structured event; set_status marks the span as failed.
            span.record_exception(e)
            span.set_status(trace.StatusCode.ERROR, str(e))
        # ← span.end() called here by context manager; do NOT set attributes after this line

Step 3: Extract Context from Inbound Requests

When a request arrives from another service, extract the incoming context before creating any spans. Without this step, the new span will have no parent_span_id and the trace will appear broken in the backend.

from opentelemetry.propagate import extract
from opentelemetry import context

def handle_inbound_request(headers: dict):
    # 1. Deserialise the traceparent (and tracestate) from the incoming carrier.
    remote_ctx = extract(headers)

    # 2. Attach to the current execution context.
    #    This makes the remote span the implicit parent for any spans created below.
    token = context.attach(remote_ctx)
    try:
        with tracer.start_as_current_span("validate_payload") as span:
            # span.parent is now the remote span from the upstream service
            span.set_attribute("validation.result", "passed")
            run_business_logic()
    finally:
        # 3. Detach ALWAYS in a finally block.
        #    Forgetting this leaks the remote context into the next request
        #    processed on the same thread.
        context.detach(token)

Step 4: Propagate Context Across Async Boundaries

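asyncio.create_task copies the current contextvars context when the task is created, so a task started while a span is current already inherits that span as its parent. Explicit capture is still needed when the worker runs somewhere the span is not current, such as a long-lived consumer task fed through a queue, or a function handed to loop.run_in_executor, which does not copy the context. The pattern below captures the context once and attaches it inside the worker, which is correct in all of these cases.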

import asyncio
from opentelemetry import context

async def async_worker(task_data: dict, parent_ctx: context.Context):
    # Restore the captured context inside the new coroutine.
    token = context.attach(parent_ctx)
    try:
        with tracer.start_as_current_span("process_async_task") as span:
            span.set_attribute("task.id", task_data["id"])
            await simulate_io()
            span.set_status(trace.StatusCode.OK)
    finally:
        context.detach(token)

async def submit_async_tasks():
    with tracer.start_as_current_span("batch_submission"):
        # Capture the context while batch_submission is the current span;
        # each worker attaches it so its span is parented to this one.
        current_ctx = context.get_current()
        tasks = [
            asyncio.create_task(async_worker({"id": i}, current_ctx))
            for i in range(5)
        ]
        await asyncio.gather(*tasks)

For message queue consumers, extract context from message headers the same way as HTTP — the pattern is identical whether the carrier is Kafka record headers, SQS message attributes, or gRPC metadata.
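
Returning to the asyncio case in this step: the Python SDK stores its context in contextvars, so the behaviour of contextvars across tasks can be checked with the standard library alone. The sketch below is a minimal check, not SDK code; current_span is a hypothetical variable standing in for the SDK's current-span slot.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the SDK's "current span" context slot.
current_span = contextvars.ContextVar("current_span", default="<none>")


async def child_task() -> str:
    # The task runs in a copy of the context taken when it was created.
    inherited = current_span.get()
    # This write lands in the task's own copy and does not leak back.
    current_span.set("process_async_task")
    return inherited


async def main() -> tuple:
    current_span.set("batch_submission")
    seen_by_child = await asyncio.create_task(child_task())
    seen_by_parent_after = current_span.get()
    return seen_by_child, seen_by_parent_after


seen_by_child, seen_by_parent_after = asyncio.run(main())
print(seen_by_child)          # value the task saw on entry
print(seen_by_parent_after)   # value in the parent after the task finished
```

Both values are batch_submission: the task sees the parent's value on entry, and the value it sets stays confined to its own copy of the context.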

Step 5: Register Graceful Shutdown

import atexit

# Blocks until all pending spans are exported or the timeout is reached.
# Without this, process termination drops spans still sitting in the queue.
atexit.register(provider.shutdown)

In Kubernetes, add a preStop lifecycle hook that calls the same shutdown logic so the pod does not terminate while spans are in-flight.
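
One detail worth checking here: Python does not run atexit callbacks when the process is killed by a signal it does not handle, and Kubernetes stops containers with SIGTERM. A minimal sketch of one way to bridge the two is to convert SIGTERM into a normal interpreter exit. FakeProvider and install_sigterm_exit are hypothetical names used for illustration; in a real service the provider would be the TracerProvider from Step 1.

```python
import atexit
import signal
import sys


class FakeProvider:
    """Hypothetical stand-in for the TracerProvider configured in Step 1."""

    def __init__(self) -> None:
        self.shutdown_calls = 0

    def shutdown(self) -> None:
        self.shutdown_calls += 1


def install_sigterm_exit(exit_code: int = 0):
    """Turn SIGTERM into SystemExit so that atexit callbacks get to run."""

    def _on_sigterm(signum, frame):
        # sys.exit raises SystemExit in the main thread; the interpreter
        # then unwinds normally and runs every registered atexit callback.
        sys.exit(exit_code)

    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, _on_sigterm)
    return _on_sigterm


provider = FakeProvider()
atexit.register(provider.shutdown)
handler = install_sigterm_exit()
```

With the handler installed, a SIGTERM from the kubelet ends the process through the same path as a normal exit, so the queued spans get a chance to flush within the grace period.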


Verification

After deploying your instrumented service, confirm the hierarchy is intact:

In Jaeger UI: search by service.name, open any trace, and verify the waterfall has a single root span with all child spans indented beneath it. No span should appear as a disconnected root unless it genuinely originated an independent trace.

In Tempo (via Grafana): fetch the trace by ID from the Tempo HTTP API and check that the parentSpanId field on each child span matches the spanId of the expected parent:

# Fetch a trace by ID and inspect parent linkage with jq
curl -s "http://tempo:3200/api/traces/<TRACE_ID>" \
  | jq '.batches[].scopeSpans[].spans[] | {name: .name, spanId: .spanId, parentSpanId: .parentSpanId}'

Expected output for a correctly instrumented two-service call:

{ "name": "process_transaction", "spanId": "A1...", "parentSpanId": "" }
{ "name": "validate_payload",    "spanId": "A2...", "parentSpanId": "A1..." }
{ "name": "db_query",            "spanId": "B1...", "parentSpanId": "A2..." }

A missing parentSpanId on a non-root span is a definitive sign of a context propagation failure at that boundary.
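
The same check can be automated. The sketch below assumes the span objects have the shape produced by the jq filter above (name, spanId, parentSpanId); find_roots and find_orphans are hypothetical helper names, and the IDs are made-up placeholders. It also flags a second failure mode: a parentSpanId that points at a span not present in the trace.

```python
def find_roots(spans: list) -> list:
    """Names of spans with no parent; a healthy trace has exactly one."""
    return [s["name"] for s in spans if not s.get("parentSpanId")]


def find_orphans(spans: list) -> list:
    """Names of spans whose parent ID does not match any span in the trace."""
    known_ids = {s["spanId"] for s in spans}
    return [
        s["name"]
        for s in spans
        if s.get("parentSpanId") and s["parentSpanId"] not in known_ids
    ]


# Placeholder IDs for illustration; real IDs are hex or base64 strings.
healthy = [
    {"name": "process_transaction", "spanId": "a1", "parentSpanId": ""},
    {"name": "validate_payload", "spanId": "a2", "parentSpanId": "a1"},
    {"name": "db_query", "spanId": "b1", "parentSpanId": "a2"},
]

# Same trace, but the downstream service never extracted the context
# for db_query and audit_log points at a span that was never exported.
broken = [
    {"name": "process_transaction", "spanId": "a1", "parentSpanId": ""},
    {"name": "validate_payload", "spanId": "a2", "parentSpanId": "a1"},
    {"name": "db_query", "spanId": "b1", "parentSpanId": ""},
    {"name": "audit_log", "spanId": "b2", "parentSpanId": "zz"},
]

print(find_roots(healthy), find_orphans(healthy))
print(find_roots(broken), find_orphans(broken))
```

More than one root, or any orphan, points at the service boundary where propagation should be inspected first.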


Edge Cases and Gotchas

  1. Setting attributes after end() — The SDK silently ignores mutation calls on an ended span. This most often surfaces in exception handlers that call span.set_status() in a finally block after the context manager has already exited. Move the set_status call inside the with block.

  2. Thread pool context loss — Python’s ThreadPoolExecutor does not copy contextvars from the submitting thread to the worker thread. Always capture context.get_current() before submitting and pass it as an explicit argument. The SDK’s context.attach() / context.detach() pair must run inside the worker, not in the submitter.

  3. Forgetting context.detach() — A context attached without a matching detach() persists for the lifetime of the thread or event loop iteration. In a thread-pooled HTTP server, this means every subsequent request on that thread inherits the stale remote context as its parent, producing incorrect trace hierarchies that are very hard to diagnose.

  4. Premature parent span closure — Calling span.end() on a parent before all child spans have called end() does not prevent child spans from being exported, but it makes the parent’s recorded duration shorter than the actual wall-clock time the operation took. Ensure the parent’s with block encompasses all child work, including async tasks that must be awaited.

  5. Clock skew across hosts — If a child service host’s clock runs behind the parent host’s, the child span’s startTimeUnixNano may be earlier than the parent span’s startTimeUnixNano; if it runs ahead, the child may appear to end after the parent’s endTimeUnixNano. Either case shows up in the Jaeger waterfall as a child bar that sits outside its parent. Synchronise NTP across all hosts; do not attempt to compensate in the SDK layer.

  6. Sampling breaking parent-child continuity — If a child service applies independent head-based sampling without first reading the upstream trace_flags, it may drop a child span from a trace that the parent already decided to keep. Configure a ParentBased sampler on child services so the root’s decision propagates through the entire trace.

  7. Unflushed exporter on container restart — Kubernetes sends SIGTERM and then SIGKILL once terminationGracePeriodSeconds elapses, so a short grace period kills the process before BatchSpanProcessor drains its queue. Register the shutdown hook and set terminationGracePeriodSeconds to at least schedule_delay_millis / 1000 + 5.
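
The thread pool item in the list above (item 2) can be reproduced with the standard library alone. This is a minimal sketch on a default CPython build; current_span is a hypothetical variable standing in for the SDK's context slot, and with the real SDK the worker would use context.attach() and context.detach() on the context passed in, as shown in Step 4.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the SDK's "current span" context slot.
current_span = contextvars.ContextVar("current_span", default="<none>")


def worker() -> str:
    # Runs in a pool thread, which has its own, initially empty context.
    return current_span.get()


def worker_with_ctx(ctx: contextvars.Context) -> str:
    # Run the same function inside the context captured by the submitter.
    return ctx.run(worker)


current_span.set("handle_request")

with ThreadPoolExecutor(max_workers=1) as pool:
    lost = pool.submit(worker).result()
    kept = pool.submit(worker_with_ctx, contextvars.copy_context()).result()

print(lost)   # the worker did not see the submitter's value
print(kept)   # the captured context carried the value across
```

Capturing with copy_context() at submit time means each task carries a snapshot of the submitter's state, so later changes in the submitting thread do not affect work already queued.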


Performance and Scale Notes

Queue sizing: BatchSpanProcessor.max_queue_size defaults to 2048. At 1000 RPS with an average of 5 spans per request and a 5-second flush interval, you need 1000 × 5 × 5 = 25,000 slots before the first flush. Size the queue to peak_rps × avg_spans_per_request × schedule_delay_seconds × 1.5 for headroom.
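
The sizing rule above is simple enough to keep next to the processor configuration as a helper. required_queue_size is a hypothetical function name; the formula and the 1.5 headroom factor are the ones stated in the paragraph.

```python
import math


def required_queue_size(
    peak_rps: float,
    avg_spans_per_request: float,
    schedule_delay_millis: int,
    headroom: float = 1.5,
) -> int:
    """peak_rps x avg_spans_per_request x schedule_delay_seconds x headroom."""
    spans_per_interval = (
        peak_rps * avg_spans_per_request * (schedule_delay_millis / 1000)
    )
    return math.ceil(spans_per_interval * headroom)


# The worked example from the text: 1000 RPS, 5 spans/request, 5 s interval.
baseline = required_queue_size(1000, 5, 5000, headroom=1.0)
with_headroom = required_queue_size(1000, 5, 5000)

print(baseline)        # 25000
print(with_headroom)   # 37500
```

Both results are far above the default of 2048, which is why a service at this load drops spans unless max_queue_size is raised or the flush interval is shortened.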

Attribute cardinality: Every unique combination of attribute values creates a distinct time series in metric-backed backends. Keep attributes like user.id off spans (use baggage metadata for request-scoped correlation instead) and prefer low-cardinality values such as http.status_code (200, 404, 500) rather than raw URL paths with embedded IDs.

Context attach/detach overhead: Each context.attach() allocates a contextvars.Token object. In tight loops processing thousands of messages per second, the allocation rate is measurable. Batch messages under a single parent span rather than creating a new span per message where semantic precision allows it.

Async task proliferation: Creating one span per asyncio.Task in a high-throughput producer adds up quickly. Measure span count per second against your exporter’s sustained throughput (typically 5,000–20,000 spans/sec for an OTLP HTTP exporter with default settings) and aggregate low-value spans into a single parent where appropriate.


Troubleshooting FAQ

Why do spans appear as orphans in Jaeger even though my service calls succeed?

Orphaned spans are almost always a context propagation failure. The child service either did not receive the traceparent header, or received it but used an incorrectly configured propagator that could not deserialise it. Check that the W3C TraceContext propagator is registered on both the sending and receiving SDK, and log the raw headers at the boundary to confirm they arrive intact.

How do I prevent context leakage between concurrent requests in a thread pool?

Always pair context.attach() with context.detach() inside a finally block. For thread pools, capture the context immediately before handing a task to the pool and pass it explicitly as a function argument. Never rely on a worker thread inheriting the context of the thread that submitted the task: contextvars values set in the submitting thread are not visible in pool worker threads unless you carry the context across yourself.

Spans are missing from traces under high load — what is happening?

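The most common cause is BatchSpanProcessor queue exhaustion. When the queue fills faster than the exporter drains it, the SDK drops spans without raising an error. Increase max_queue_size and reduce schedule_delay_millis, or add a second exporter endpoint to distribute load. If you export through an OpenTelemetry Collector, compare its otelcol_exporter_sent_spans and otelcol_exporter_send_failed_spans metrics (Prometheus may expose them with a _total suffix) to quantify export failures. These are Collector-side counters and do not include spans dropped inside the SDK queue.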

Why does my root span show a shorter duration than some of its children?

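Usually this is not a timing artefact: the parent ended before its children finished, for example because child work was started in a background task that the parent’s with block never awaited (see premature parent span closure above). Clock skew cannot shorten a span’s own duration, because each span’s start and end timestamps are taken on the same host, but it can shift a child from another service so that it appears to start before or end after its parent. Synchronise NTP across all hosts, and treat inter-service duration gaps as approximate in high-resolution latency analysis.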

Can I set attributes on a span after it has been ended?

No. Once span.end() is called (or the context manager exits), the span transitions to the ENDED state and its attribute map is frozen. Any set_attribute calls after this point are silently ignored by the SDK. Ensure all attributes, events, and status codes are written before the span boundary closes.

