@lmolkova
Last active July 28, 2025 23:07
Azure SDK for Java: observability nice-to-haves

Here are some observability-related items that we've considered useful for azure-core or clientcore in the past, but never got to.

  • [Azure Core only] Make slf4j dependency optional - backport Slf4JLoggerShim. Related to Azure/azure-sdk-for-java#38421

  • [Azure Core only] Merge tracing and metrics plugins into one - Azure/azure-sdk-for-java#41436 or backport plugin-free OTel support from clientcore

  • Allow adding arbitrary key-value pairs to the instrumentation context and stamping them on all nested logs and spans (similar to MDC):

    InstrumentationContext context = InstrumentationContext.fromMap(Map.of("correlation-id", "foo42"));
    client.clientCall(new RequestContext().setInstrumentationContext(context));
  • Support OTel as a logging implementation. Logging to AzMon looks like this today: clientLogger -> slf4j -> log4j/logback -> otel -> anywhere (e.g. azmon).

    It could be simplified to clientLogger -> otel -> anywhere. This way we can preserve structure all the way to otel, which does not work great with slf4j - Azure/azure-sdk-for-java#39991 (comment).

    Note: log configuration in otel is in development and does not include severity-based filtering yet. It's likely to land in 2025. It does not make sense to implement anything in the Azure SDK until then.

  • Auto-add the HTTP instrumentation policy even if the SDK does not add one. If an SDK wants to opt out of tracing, it can disable it via flags and/or by providing a custom no-op policy - Azure/azure-sdk-for-java#41100

  • Add error classification. This is a common problem in Storage with createIfNotExists and similar methods - Azure/azure-sdk-for-java#42452: users complain that we record 404/409/412 as errors on logs and spans. .NET has a ResponseClassifier API, which a client library can provide on a per-request basis and core can take into account when recording telemetry. It might be helpful with retries too.
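
To make the error classification item more concrete, here is a minimal sketch of what a per-request classifier could look like in Java. Everything in it is hypothetical: the `ResponseClassifier` interface only borrows its name from the .NET API, and `expecting` and `spanStatus` are invented for illustration. None of these types exist in azure-core or clientcore today.

```java
import java.util.Set;

/** Hypothetical sketch: none of these types exist in azure-core or clientcore. */
class ErrorClassificationSketch {

    /** Decides whether a response status should be recorded as an error in telemetry. */
    @FunctionalInterface
    interface ResponseClassifier {
        boolean isError(int statusCode);

        /** Assumed default rule: any 4xx or 5xx status counts as an error. */
        ResponseClassifier DEFAULT = statusCode -> statusCode >= 400;

        /** Returns a classifier that treats the given statuses as expected and defers to this one otherwise. */
        default ResponseClassifier expecting(Integer... expectedStatuses) {
            Set<Integer> expected = Set.of(expectedStatuses);
            return statusCode -> !expected.contains(statusCode) && this.isError(statusCode);
        }
    }

    /** Stand-in for the status that core would stamp on a span or log record. */
    static String spanStatus(int statusCode, ResponseClassifier classifier) {
        return classifier.isError(statusCode) ? "ERROR" : "UNSET";
    }

    public static void main(String[] args) {
        // A client library would build this once per operation, e.g. for createIfNotExists.
        ResponseClassifier createIfNotExists = ResponseClassifier.DEFAULT.expecting(409);

        System.out.println(spanStatus(409, ResponseClassifier.DEFAULT)); // ERROR
        System.out.println(spanStatus(409, createIfNotExists));          // UNSET
        System.out.println(spanStatus(500, createIfNotExists));          // ERROR
    }
}
```

In this shape the client library owns the knowledge of which statuses are expected for an operation, and core only asks the classifier when it records telemetry. How the classifier would travel with the request (for example on RequestContext) is left open here.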
