Distributed Full-Link Monitoring -- A First Try with opentracing

Preface

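My company has recently been looking into full-link monitoring. A request that enters through the service gateway passes through the business modules of many teams, so a problem in any one of them affects the result of the whole call chain (response time, response payload, exception handling and so on). We therefore need a mechanism that can monitor a request along its entire path. At first we planned to build directly on pinpoint's injection plugins, but we later found that, because of the sampling rate and related concerns, it could not be used in production, so we decided to build a standard of our own. We also wanted to adopt the opentracing standard, which is what this article covers.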

A Brief Introduction to opentracing

opentracing Overview

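For the details of opentracing, follow the related links in the preface to the opentracing website and its GitHub repositories; this section only gives a brief introduction. opentracing mainly consists of the following components: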

Span

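A Span represents one unit of work in a distributed call chain, for example the provider side of a dubbo call or the serving side of an HTTP call. Its boundary runs from the moment a request enters the service until the request leaves the current service again through some channel (HTTP, dubbo, etc.). A span usually records information about what happens inside that unit, for example: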

  1. Logs
  2. Tags
  3. Start/finish time

SpanContext

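A SpanContext is the context that belongs to a span; spans and span contexts are essentially in a one-to-one relationship. The context stores the information that has to cross boundaries, for example: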

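  1. spanId: the id of the current span
  2. traceId: the id of the trace this span belongs to (the unique id of this call chain)
  3. baggage: other information that can travel across multiple call units
    The SpanContext can be passed to the downstream of the call chain through some medium and used there for further processing (for example generating child span ids, or inheriting information for log output).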

Tracer

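Tracer is a general-purpose interface and effectively the hub of the opentracing standard. It has the following responsibilities:

  1. Build and start a span
  2. Extract a spanContext from, and inject one into, some medium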

Carrier

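A Carrier is the medium that carries a spanContext. For example, an HTTP call scenario would have an HttpCarrier, and a dubbo call scenario would have a corresponding DubboCarrier.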

Formatter

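This interface holds the scenario-specific logic for serializing and deserializing the context; for example, an HttpCarrier is usually paired with an HttpFormatter. The Tracer delegates injection and extraction to the Formatter.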
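To make the Carrier/Formatter split more concrete, here is a minimal sketch that uses a plain string map as the carrier (standing in for HTTP headers). All type names in it (SketchContext, SketchFormatter, HeaderMapFormatter) are hypothetical and are not part of the opentracing API; the key names "tcid" and "spid" are borrowed from the SpanContext listing later in this article. It only illustrates the idea of "serialize on the way out, deserialize on the way in".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout: this is NOT the opentracing-java API.
final class SketchContext {
    final String traceId;
    final String spanId;

    SketchContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

// The formatter owns the wire format for one kind of carrier.
interface SketchFormatter<C> {
    void inject(SketchContext context, C carrier);

    SketchContext extract(C carrier);
}

// A formatter for string key/value carriers such as HTTP headers.
final class HeaderMapFormatter implements SketchFormatter<Map<String, String>> {
    static final String TRACE_ID_KEY = "tcid";
    static final String SPAN_ID_KEY = "spid";

    @Override
    public void inject(SketchContext context, Map<String, String> carrier) {
        carrier.put(TRACE_ID_KEY, context.traceId);
        carrier.put(SPAN_ID_KEY, context.spanId);
    }

    @Override
    public SketchContext extract(Map<String, String> carrier) {
        String traceId = carrier.get(TRACE_ID_KEY);
        String spanId = carrier.get(SPAN_ID_KEY);
        // No trace information on the request: the caller should start a new root span.
        if (traceId == null || spanId == null) {
            return null;
        }
        return new SketchContext(traceId, spanId);
    }
}

final class PropagationSketch {
    public static void main(String[] args) {
        SketchFormatter<Map<String, String>> formatter = new HeaderMapFormatter();
        Map<String, String> headers = new HashMap<>();
        // Client side: write the context into the outgoing headers.
        formatter.inject(new SketchContext("trace-1", "0.1"), headers);
        // Server side: read the context back from the incoming headers.
        SketchContext received = formatter.extract(headers);
        System.out.println(received.traceId + " / " + received.spanId);
    }
}
```

A tracer built this way would keep one formatter per carrier type and pick it when inject or extract is called, so that adding dubbo support only means adding another formatter.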

ScopeManager

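This component was added after version 0.30. Through it you can obtain the Span that is currently active in the current thread, and you can activate a span that is not yet active. In some scenarios a single thread may create several spans, but at any moment only one span is active per thread; the other spans may be in one of the following states:

  1. Waiting for a child span to finish
  2. Waiting on some blocking method
  3. Created but not yet started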

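Besides the components above, a distributed full-link monitoring framework also needs a reporter component that prints or reports key trace information (such as span creation and finish). Only after this information has been processed can the trace be visualized and actually monitored.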
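The article does not define the reporter, so the following is only a sketch of what such a component could look like. The SpanReporter interface, the LineReporter class and the pipe-separated line format are all assumptions made for illustration; a real reporter would more likely write to a log file or send the events to a collector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reporter contract; the article does not define one.
interface SpanReporter {
    void onStart(String traceId, String spanId, String operationName, long startMillis);

    void onFinish(String traceId, String spanId, String operationName, long durationMillis);
}

// Formats every event as one pipe-separated line and hands it to a sink.
final class LineReporter implements SpanReporter {
    private final List<String> sink;

    LineReporter(List<String> sink) {
        this.sink = sink;
    }

    @Override
    public void onStart(String traceId, String spanId, String operationName, long startMillis) {
        sink.add("START|" + traceId + "|" + spanId + "|" + operationName + "|" + startMillis);
    }

    @Override
    public void onFinish(String traceId, String spanId, String operationName, long durationMillis) {
        sink.add("FINISH|" + traceId + "|" + spanId + "|" + operationName + "|" + durationMillis + "ms");
    }
}

final class ReporterSketch {
    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        SpanReporter reporter = new LineReporter(lines);
        reporter.onStart("trace-1", "0", "root", 1000L);
        reporter.onFinish("trace-1", "0", "root", 25L);
        lines.forEach(System.out::println);
    }
}
```

Keeping the reporter behind an interface means the span only has to call onStart in its constructor and onFinish in finish(), while the output target can be swapped without touching the span.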

A Simple Implementation Approach

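This article first walks through the key logic of a few core components (Span, SpanContext, Tracer and ScopeManager). It also borrows some ideas from sofa-tracer (for example the spanId and traceId generation rules; see sofa-tracer for the details). Our project is called StarAtlas (星圖), so all of our components use that prefix. Package names, author, date and similar comment information are omitted here.
Let's start with Span: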

import io.opentracing.Span;
import io.opentracing.SpanContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * StarAtlasSpan
 * <p>
 * the implementation of span
 *
 */
public class StarAtlasSpan implements Span {

    private StarAtlasTracer starAtlasTracer;

    private long startTime;

    private List<StarAtlasSpanReferenceRelationship> spanReferences;

    private String operationName;

    private StarAtlasSpanContext spanContext;

    private Logger logger = LoggerFactory.getLogger(this.getClass());


    public StarAtlasSpan(StarAtlasTracer starAtlasTracer, long startTime,
                         List<StarAtlasSpanReferenceRelationship> spanReferences,
                         String operationName, StarAtlasSpanContext spanContext,
                         Map<String, ?> tags) {
        AssertUtils.notNull(starAtlasTracer);
        AssertUtils.notNull(spanContext);
        this.starAtlasTracer = starAtlasTracer;
        this.startTime = startTime;
        this.spanReferences = spanReferences != null ? new ArrayList<StarAtlasSpanReferenceRelationship>(
                spanReferences) : null;
        this.operationName = operationName;
        this.spanContext = spanContext;
        //tags
        this.setTags(tags);

        // report extension, to be implemented
        //SpanExtensionFactory.logStartedSpan(this);
    }

    @Override
    public SpanContext context() {
        return this.spanContext;
    }

    @Override
    public Span setTag(String s, String s1) {
        return null;
    }

    @Override
    public Span setTag(String s, boolean b) {
        return null;
    }

    @Override
    public Span setTag(String s, Number number) {
        return null;
    }

    @Override
    public Span log(Map<String, ?> map) {
        return null;
    }

    @Override
    public Span log(long l, Map<String, ?> map) {
        return null;
    }

    @Override
    public Span log(String s) {
        return null;
    }

    @Override
    public Span log(long l, String s) {
        return null;
    }

    @Override
    public Span setBaggageItem(String s, String s1) {
        return null;
    }

    @Override
    public String getBaggageItem(String s) {
        return null;
    }

    @Override
    public Span setOperationName(String s) {
        return null;
    }

    @Override
    public void finish() {

    }

    @Override
    public void finish(long l) {

    }

    private void setTags(Map<String, ?> tags) {
        if (tags == null || tags.size() <= 0) {
            return;
        }
        for (Map.Entry<String, ?> entry : tags.entrySet()) {
            String key = entry.getKey();
            if (StringUtils.isBlank(key)) {
                continue;
            }
            Object value = entry.getValue();
            if (value == null) {
                continue;
            }
            if (value instanceof String) {
                //at initialization, tags can also be used to tell client spans from server spans
                this.setTag(key, (String) value);
            } else if (value instanceof Boolean) {
                this.setTag(key, (Boolean) value);
            } else if (value instanceof Number) {
                this.setTag(key, (Number) value);
            } else {
                logger.error("Span tags unsupported type [" + value.getClass() + "]");
            }
        }
    }
}

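This class is fairly simple: it creates a Span and populates it with some information, while most of the interface methods are still empty stubs and the report/logging call is commented out. The constructor takes a list of StarAtlasSpanReferenceRelationship; that class describes the relationship between this Span and other Spans and is used to maintain parent/child relationships when a Span is created.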
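Since the listing leaves setTag, setOperationName and finish as stubs, here is a rough sketch of how they might be filled in. SketchSpan is a hypothetical, dependency-free stand-in and not the StarAtlas implementation; storing tags in a LinkedHashMap, using milliseconds, and ignoring repeated finish calls are all assumptions of this sketch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the stubbed methods; not the StarAtlas implementation.
final class SketchSpan {
    private final Map<String, Object> tags = new LinkedHashMap<>();
    private final long startTime;
    private long endTime = -1; // -1 means "not finished yet"
    private String operationName;

    SketchSpan(String operationName, long startTime) {
        this.operationName = operationName;
        this.startTime = startTime;
    }

    // Setters return this so that calls can be chained.
    SketchSpan setTag(String key, String value) { return putTag(key, value); }

    SketchSpan setTag(String key, boolean value) { return putTag(key, value); }

    SketchSpan setTag(String key, Number value) { return putTag(key, value); }

    SketchSpan setOperationName(String operationName) {
        this.operationName = operationName;
        return this;
    }

    void finish() { finish(System.currentTimeMillis()); }

    void finish(long finishTime) {
        // Only the first finish call is recorded; later calls are ignored.
        if (this.endTime < 0) {
            this.endTime = finishTime;
        }
    }

    long durationMillis() { return endTime < 0 ? -1 : endTime - startTime; }

    String operationName() { return operationName; }

    Map<String, Object> tags() { return Collections.unmodifiableMap(tags); }

    private SketchSpan putTag(String key, Object value) {
        // Skip blank keys and null values, mirroring the checks in setTags above.
        if (key != null && !key.trim().isEmpty() && value != null) {
            tags.put(key, value);
        }
        return this;
    }
}
```

Returning the span itself from every setter matters because callers commonly chain these calls; a stub that returns null would make the second call in such a chain fail with a NullPointerException.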
Next, let's look at SpanContext:

import io.opentracing.SpanContext;

import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * StarAtlasSpanContext
 *
 * the span context implementation to store span information
 *
 */
public class StarAtlasSpanContext implements SpanContext {

    //spanId separator
    public static final String        RPC_ID_SEPARATOR       = ".";

    //======================== keys for the serialized data ========================

    private static final String       TRACE_ID_KET           = "tcid";

    private static final String       SPAN_ID_KET            = "spid";

    private static final String       PARENT_SPAN_ID_KET     = "pspid";

    private static final String       SAMPLE_KET             = "sample";

    private AtomicInteger childContextIndex = new AtomicInteger(0);

    private String spanId;

    private String traceId;

    private String parentId;

    /***
     * not sampled by default
     */
    private boolean isSampled = false;

    public StarAtlasSpanContext(String traceId, String spanId, String parentId) {
        //not sampled by default
        this(traceId, spanId, parentId, false);
    }

    public StarAtlasSpanContext(String traceId, String spanId, String parentId, boolean isSampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = StringUtils.isBlank(parentId) ? this.genParentSpanId(spanId) : parentId;
        this.isSampled = isSampled;
    }

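    @Override
    public Iterable<Map.Entry<String, String>> baggageItems() {
        // baggage is not implemented yet; return an empty view instead of null
        // so that callers can iterate over the result safely
        return java.util.Collections.<String, String>emptyMap().entrySet();
    }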

    /**
     * get the id of the next child context
     *
     * @return the next spanId
     */
    public String nextChildContextId() {
        return this.spanId + RPC_ID_SEPARATOR + childContextIndex.incrementAndGet();
    }

    public String getSpanId() {
        return spanId;
    }

    public void setSpanId(String spanId) {
        this.spanId = spanId;
    }

    public String getTraceId() {
        return traceId;
    }

    public void setTraceId(String traceId) {
        this.traceId = traceId;
    }

    public String getParentId() {
        return parentId;
    }

    public void setParentId(String parentId) {
        this.parentId = parentId;
    }

    public boolean isSampled() {
        return isSampled;
    }

    public void setSampled(boolean sampled) {
        isSampled = sampled;
    }

    private String genParentSpanId(String spanId) {
        return (StringUtils.isBlank(spanId) || spanId.lastIndexOf(RPC_ID_SEPARATOR) < 0) ? StringUtils.EMPTY_STRING
                : spanId.substring(0, spanId.lastIndexOf(RPC_ID_SEPARATOR));
    }
}

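Like Span, this class is mostly a holder for information such as the spanId and the traceId (baggage is not implemented yet in this listing). It also has two more distinctive functions: one derives the parent spanId of the current context, the other generates the id of the next child span.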
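The id scheme is easier to see with concrete values. The following standalone sketch copies only the id arithmetic from the listing above; the class name SpanIdSketch and its method names are made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Standalone copy of the id arithmetic only; class and method names are made up.
final class SpanIdSketch {
    static final String SEPARATOR = ".";

    private final String spanId;
    private final AtomicInteger childIndex = new AtomicInteger(0);

    SpanIdSketch(String spanId) {
        this.spanId = spanId;
    }

    // Each call hands out the next child id: "0" -> "0.1", then "0.2", ...
    String nextChildId() {
        return spanId + SEPARATOR + childIndex.incrementAndGet();
    }

    // Everything before the last separator is the parent id; a root id has none.
    static String parentOf(String spanId) {
        int cut = spanId == null ? -1 : spanId.lastIndexOf(SEPARATOR);
        return cut < 0 ? "" : spanId.substring(0, cut);
    }

    public static void main(String[] args) {
        SpanIdSketch root = new SpanIdSketch("0");
        String first = root.nextChildId();                          // "0.1"
        String second = root.nextChildId();                         // "0.2"
        String grandchild = new SpanIdSketch(first).nextChildId();  // "0.1.1"
        System.out.println(first + " " + second + " " + grandchild);
        // prints "[0.1] []": the parent of "0.1.1" is "0.1", the root has no parent
        System.out.println("[" + parentOf(grandchild) + "] [" + parentOf("0") + "]");
    }
}
```

Because the position in the call tree is encoded in the id itself, the parent id can always be recomputed from the spanId, which is why the constructor in the listing can fall back to genParentSpanId when no parentId is passed in.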
Next come Scope and ScopeManager:

import io.opentracing.Scope;
import io.opentracing.ScopeManager;
import io.opentracing.Span;

/**
 * StarAtlasScopeManager
 * <p>
 * the scope manager to store and manage the scope information within a thread
 *
 */
public class StarAtlasScopeManager implements ScopeManager {

    /**
     * the thread local store for the active scope
     */
    final ThreadLocal<StarAtlasScope> scopeThreadLocal = new ThreadLocal<>();

    /**
     * singleton method
     *
     * @return
     */
    public static StarAtlasScopeManager getInstance() {
        return StarAtlasScopeManagerSingletonHolder.INSTANCE;
    }

    private StarAtlasScopeManager() {

    }

    /**
     * the method to activate a span in the current thread
     *
     * @param span
     * @param finishOnClose
     * @return
     */
    @Override
    public Scope activate(Span span, boolean finishOnClose) {
        if (!checkCanActivate(span)) {
            throw new IllegalStateException("a span cannot be activated more than once");
        }
        return new StarAtlasScope(this, span, finishOnClose);
    }

    /**
     * the method to get the currently active scope (null if there is none)
     *
     * @return
     */
    @Override
    public Scope active() {
        return this.scopeThreadLocal.get();
    }

    /**
     * check whether the span can be activated
     * if the span already exists in the recover chain of the currently active scope
     * then we know that the span has been activated before.
     *
     * @param span
     * @return
     */
    private boolean checkCanActivate(Span span) {
        StarAtlasScope scope = (StarAtlasScope) this.active();
        while (scope != null) {
            if (scope.span() == span) {
                return false;
            }
            scope = scope.scopeToRecover;
        }
        return true;
    }

    private static class StarAtlasScopeManagerSingletonHolder {
        private static final StarAtlasScopeManager INSTANCE = new StarAtlasScopeManager();
    }
}

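The ScopeManager mainly uses a ThreadLocal to hold the current Span (wrapped in a Scope). It implements three methods:

  • activate: activates a span in the current thread and returns a scope that wraps the newly activated span
  • active: returns the scope that is currently active in this thread
  • checkCanActivate: a method of our own. When a span is activated and wrapped in a scope, the scope that was active in the thread before is stored in the new scope's scopeToRecover field (see the Scope code that follows). Starting from the currently active scope we can therefore follow scopeToRecover all the way back to the first scope, so when a span is about to be activated we can tell whether it is being activated a second time by checking whether it already appears on that chain.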
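The chain walk behind checkCanActivate can be sketched without any opentracing types. MiniScope and MiniScopeManager below are simplified, hypothetical stand-ins (spans are plain Objects), meant only to show how the scopeToRecover links are followed.

```java
// Minimal stand-ins; the real classes implement the opentracing interfaces.
final class MiniScope {
    final Object span;
    final MiniScope scopeToRecover; // the scope that was active before this one

    MiniScope(Object span, MiniScope scopeToRecover) {
        this.span = span;
        this.scopeToRecover = scopeToRecover;
    }
}

final class MiniScopeManager {
    private final ThreadLocal<MiniScope> current = new ThreadLocal<>();

    MiniScope activate(Object span) {
        // Walk back from the active scope; finding the span means it is already active.
        for (MiniScope s = current.get(); s != null; s = s.scopeToRecover) {
            if (s.span == span) {
                throw new IllegalStateException("a span cannot be activated more than once");
            }
        }
        MiniScope scope = new MiniScope(span, current.get());
        current.set(scope);
        return scope;
    }

    MiniScope active() {
        return current.get();
    }

    // Number of scopes on the chain, i.e. how deeply activations are nested.
    int depth() {
        int n = 0;
        for (MiniScope s = current.get(); s != null; s = s.scopeToRecover) {
            n++;
        }
        return n;
    }
}

final class ActivationSketch {
    public static void main(String[] args) {
        MiniScopeManager manager = new MiniScopeManager();
        Object root = new Object();
        Object child = new Object();
        manager.activate(root);
        manager.activate(child);
        try {
            manager.activate(root); // root is already on the chain behind child
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The walk is linear in the nesting depth, which stays small in practice because the chain only contains scopes that are still open in the current thread.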

The Scope code is as follows:

import io.opentracing.Scope;
import io.opentracing.Span;

/**
 * StarAtlasScope
 * <p>
 * StarAtlasScope is a wrapper class for a span.
 * It represents an active span in the current thread
 * and supports close() to deactivate that span.
 *
 */
public class StarAtlasScope implements Scope {

    /**
     * finish the span or not when we close the scope
     */
    private final boolean finishOnClose;

    /**
     * the wrapped span
     */
    private final Span span;

    /**
     * scope manager
     */
    private final StarAtlasScopeManager scopeManager;

    /**
     * the scope to recover on close
     */
    final StarAtlasScope scopeToRecover;

    StarAtlasScope(StarAtlasScopeManager scopeManager, Span span, boolean finishOnClose) {
        this.finishOnClose = finishOnClose;
        this.span = span;
        this.scopeManager = scopeManager;
        // store the previous scope to recover
        this.scopeToRecover = this.scopeManager.scopeThreadLocal.get();
        // push the current scope into thread local
        // may extract into a package level method in StarAtlasScopeManager
        this.scopeManager.scopeThreadLocal.set(this);
    }

    /**
     * calling close means the active period of this scope in the current thread has come to an end
     */
    @Override
    public void close() {
        // if the current active scope does not equal to this
        // the close operation can not continue
        if (scopeManager.active() != this) {
            throw new IllegalStateException("can not call scope close in an unexpected way");
        }
        if (finishOnClose) {
            span.finish();
        }
        // recover the scope
        this.scopeManager.scopeThreadLocal.set(this.scopeToRecover);
    }

    @Override
    public Span span() {
        return span;
    }

}

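The Scope implementation basically wraps a span and, on creation, remembers the previously active scope (which confirms what was said above). It supports two methods:

  • close: closes the current scope, finishes the wrapped span as well if finishOnClose is set, and restores the thread's active scope to the previous one.
  • span: returns the wrapped span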
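Because close has to run innermost first, a scope pairs naturally with try-with-resources. The sketch below shows that usage pattern with a small, hypothetical ClosingScope class that implements AutoCloseable; the span is reduced to a name and "finishing" it just records the name in a list, both of which are simplifications for the demo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified scope: the "span" is only a name.
final class ClosingScope implements AutoCloseable {
    private static final ThreadLocal<ClosingScope> ACTIVE = new ThreadLocal<>();
    // Records the names of finished spans; for the demo only, not thread-safe.
    static final List<String> FINISHED = new ArrayList<>();

    private final String spanName;
    private final boolean finishOnClose;
    private final ClosingScope scopeToRecover;

    private ClosingScope(String spanName, boolean finishOnClose) {
        this.spanName = spanName;
        this.finishOnClose = finishOnClose;
        this.scopeToRecover = ACTIVE.get(); // remember what was active before
        ACTIVE.set(this);
    }

    static ClosingScope activate(String spanName, boolean finishOnClose) {
        return new ClosingScope(spanName, finishOnClose);
    }

    static String activeSpanName() {
        ClosingScope scope = ACTIVE.get();
        return scope == null ? null : scope.spanName;
    }

    @Override
    public void close() {
        if (ACTIVE.get() != this) {
            throw new IllegalStateException("scopes must be closed innermost first");
        }
        if (finishOnClose) {
            FINISHED.add(spanName);
        }
        ACTIVE.set(scopeToRecover); // restore the previously active scope
    }
}

final class ScopeCloseSketch {
    public static void main(String[] args) {
        try (ClosingScope root = ClosingScope.activate("root", true)) {
            try (ClosingScope child = ClosingScope.activate("child", false)) {
                System.out.println(ClosingScope.activeSpanName()); // child
            }
            System.out.println(ClosingScope.activeSpanName());     // root
        }
        System.out.println(ClosingScope.activeSpanName());         // null
        System.out.println(ClosingScope.FINISHED);                 // [root]
    }
}
```

try-with-resources closes the inner resource before the outer one and also runs close when an exception escapes the block, so the thread-local state is restored even on failure paths.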

Finally, let's look at Tracer:

import io.opentracing.*;
import io.opentracing.propagation.Format;

import java.util.*;

/**
 * StarAtlasTracer
 * <p>
 * the tracer implementation that builds and activates spans
 */
public class StarAtlasTracer implements Tracer {

    /**
     * the key of the traceId
     */
    public static final String KEY_TRACEID = "SA-TRACEID";

    /**
     * the spanId at which a normal trace starts
     */
    public static final String  ROOT_SPAN_ID = "0";

    @Override
    public ScopeManager scopeManager() {
        return StarAtlasScopeManager.getInstance();
    }

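    @Override
    public Span activeSpan() {
        // active() returns null when no scope is active in this thread
        Scope scope = this.scopeManager().active();
        return scope == null ? null : scope.span();
    }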

    @Override
    public SpanBuilder buildSpan(String operationName) {
        return new StarAtlasSpanBuilder(operationName);
    }

    @Override
    public <C> void inject(SpanContext spanContext, Format<C> format, C c) {

    }

    @Override
    public <C> SpanContext extract(Format<C> format, C c) {
        return null;
    }

    /**
     * the implementation of span builder
     */
    private class StarAtlasSpanBuilder implements SpanBuilder {

        private String operationName = StringUtils.EMPTY_STRING;

        private long startTime = -1;

        private List<StarAtlasSpanReferenceRelationship> references = Collections.emptyList();

        private final Map<String, Object> tags          = new HashMap<String, Object>();

        private boolean ignoreActiveSpan = false;

        public StarAtlasSpanBuilder(String operationName){
            this.operationName = operationName;
        }

        @Override
        public SpanBuilder asChildOf(SpanContext parentContext) {
            return addReference(References.CHILD_OF, parentContext);
        }

        @Override
        public SpanBuilder asChildOf(Span parentSpan) {
            if(parentSpan == null){
                return this;
            }
            return asChildOf(parentSpan.context());
        }

        @Override
        public SpanBuilder addReference(String referenceType, SpanContext referencedContext) {
            if (referencedContext == null) {
                return this;
            }
            if (!(referencedContext instanceof StarAtlasSpanContext)) {
                return this;
            }
            if (!References.CHILD_OF.equals(referenceType)
                    && !References.FOLLOWS_FROM.equals(referenceType)) {
                return this;
            }
            if (references.isEmpty()) {
                // Optimization for 99% situations, when there is only one parent
                references = Collections.singletonList(new StarAtlasSpanReferenceRelationship(
                        (StarAtlasSpanContext) referencedContext, referenceType));
            } else {
                if (references.size() == 1) {
                    // copy into a mutable list; insertion order must be preserved
                    references = new ArrayList<StarAtlasSpanReferenceRelationship>(references);
                }
                references.add(new StarAtlasSpanReferenceRelationship(
                        (StarAtlasSpanContext) referencedContext, referenceType));
            }
            return this;
        }

        @Override
        public SpanBuilder ignoreActiveSpan() {
            throw new UnsupportedOperationException("ignoring the active span is not supported yet");
        }

        @Override
        public SpanBuilder withTag(String key, String value) {
            this.tags.put(key, value);
            return this;
        }

        @Override
        public SpanBuilder withTag(String key, boolean value) {
            this.tags.put(key, value);
            return this;
        }

        @Override
        public SpanBuilder withTag(String key, Number value) {
            this.tags.put(key, value);
            return this;
        }

        @Override
        public SpanBuilder withStartTimestamp(long startTime) {
            this.startTime = startTime;
            return this;
        }

        @Override
        public Scope startActive(boolean finishOnClose) {
            Span span = this.start();
            return StarAtlasTracer.this.scopeManager().activate(span, finishOnClose);
        }

        @Override
        public Span startManual() {
            // deprecated alias of start() in the opentracing API; delegate instead of returning null
            return this.start();
        }

        @Override
        public Span start() {
            StarAtlasSpanContext spanContext = null;
            if(this.references.size() > 0){
                // there is a parent context
                spanContext = createChildContext();
            }else if (!this.ignoreActiveSpan
                    && StarAtlasTracer.this.scopeManager().active() != null){
                // use the current span as default parent;
                Scope currentScope = StarAtlasTracer.this.scopeManager().active();
                this.asChildOf(currentScope.span());
                spanContext = createChildContext();
            }else {
                // it should be the root
                spanContext = createRootSpanContext();
            }
            long begin = this.startTime > 0 ? this.startTime : System.currentTimeMillis();
            StarAtlasSpan span = new StarAtlasSpan(StarAtlasTracer.this, begin,
                    this.references, this.operationName, spanContext, this.tags);
            return span;
        }

        private StarAtlasSpanContext createRootSpanContext(){
            String traceId = TraceIdGenerator.generate();
            return new StarAtlasSpanContext(traceId, ROOT_SPAN_ID, StringUtils.EMPTY_STRING);
        }

        private StarAtlasSpanContext createChildContext() {
            StarAtlasSpanContext preferredReference = preferredReference();

            StarAtlasSpanContext childContext = new StarAtlasSpanContext(
                    preferredReference.getTraceId(), preferredReference.nextChildContextId(),
                    preferredReference.getSpanId(), preferredReference.isSampled());
            return childContext;
        }

        /**
         * choose the preferred reference
         * @return
         */
        private StarAtlasSpanContext preferredReference() {
            StarAtlasSpanReferenceRelationship preferredReference = references.get(0);
            for (StarAtlasSpanReferenceRelationship reference : references) {
                // childOf takes precedence as a preferred parent
                String referencedType = reference.getReferenceType();
                if (References.CHILD_OF.equals(referencedType)
                        && !References.CHILD_OF.equals(preferredReference.getReferenceType())) {
                    preferredReference = reference;
                    break;
                }
            }
            return preferredReference.getSpanContext();
        }
    }
}

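This part borrows some implementation ideas from sofa-tracer. The main logic is an implementation of SpanBuilder that creates Spans, together with the interface for activating a span.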
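The tracer calls TraceIdGenerator.generate(), which is not shown in this article. As a placeholder, a generator along the following lines would be enough to run the listings. The class name TraceIdGeneratorSketch and the id layout (a random node tag, a millisecond timestamp and a rolling counter) are assumptions of this sketch and are not sofa-tracer's actual format.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the TraceIdGenerator the tracer calls; the layout is an assumption.
final class TraceIdGeneratorSketch {
    private static final AtomicInteger SEQUENCE = new AtomicInteger(0);
    // Random tag chosen once per JVM so that ids from different nodes rarely collide.
    private static final String NODE =
            String.format("%08x", ThreadLocalRandom.current().nextInt());

    private TraceIdGeneratorSketch() {
    }

    // node tag (8 hex chars) + millisecond timestamp (13 digits) + rolling counter (4 digits)
    static String generate() {
        int seq = Math.floorMod(SEQUENCE.getAndIncrement(), 10000);
        return NODE
                + String.format("%013d", System.currentTimeMillis())
                + String.format("%04d", seq);
    }

    public static void main(String[] args) {
        System.out.println(generate());
        System.out.println(generate());
    }
}
```

The counter guarantees that ids generated within the same millisecond on one JVM still differ, as long as fewer than 10000 traces start in that millisecond.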

Testing

With these pieces in place, we can write the following unit tests:

import io.opentracing.Scope;
import io.opentracing.Span;
import org.junit.Assert;
import org.junit.Test;

/**
 * StarAtlasTracerTest
 *
 */
public class StarAtlasTracerTest {
    /**
     * test generating only a root span
     */
    @Test
    public void generateRoot(){
        StarAtlasTracer starAtlasTracer = new StarAtlasTracer();
        Span root = starAtlasTracer.buildSpan("root").start();
        Assert.assertNotNull(root);
        StarAtlasSpanContext context = (StarAtlasSpanContext) root.context();
        Assert.assertEquals("0", context.getSpanId());
        Assert.assertEquals("", context.getParentId());
        Assert.assertFalse(StringUtils.isBlank(context.getTraceId()));
        Assert.assertNull(starAtlasTracer.scopeManager().active());
    }

    /**
     * test generating a root span and activating it
     */
    @Test
    public void generateRootAndActivate(){
        StarAtlasTracer starAtlasTracer = new StarAtlasTracer();
        Scope rootScope = starAtlasTracer.buildSpan("root").startActive(true);
        Assert.assertNotNull(rootScope);
        StarAtlasSpanContext context = (StarAtlasSpanContext) rootScope.span().context();
        Assert.assertEquals("0", context.getSpanId());
        Assert.assertEquals("", context.getParentId());
        Assert.assertNotNull(starAtlasTracer.scopeManager().active());
        Assert.assertEquals(rootScope, starAtlasTracer.scopeManager().active());
        rootScope.close();
        Assert.assertNull(starAtlasTracer.scopeManager().active());
    }

    /**
     * test generating a child span and activating it
     */
    @Test
    public void generateChildAndActivate(){
        StarAtlasTracer starAtlasTracer = new StarAtlasTracer();
        Scope rootScope = starAtlasTracer.buildSpan("root").startActive(true);
        StarAtlasSpanContext rootContext = (StarAtlasSpanContext) rootScope.span().context();
        Assert.assertNotNull(rootScope);
        Span child = starAtlasTracer.buildSpan("child").asChildOf(rootScope.span()).start();
        StarAtlasSpanContext context = (StarAtlasSpanContext)child.context();
        Assert.assertEquals("0.1", context.getSpanId());
        Assert.assertEquals(rootContext.getTraceId(), context.getTraceId());
        Assert.assertEquals(rootScope, starAtlasTracer.scopeManager().active());
        Scope childScope = starAtlasTracer.scopeManager().activate(child, true);
        Assert.assertEquals(childScope, starAtlasTracer.scopeManager().active());
        childScope.close();
        Assert.assertEquals(rootScope, starAtlasTracer.scopeManager().active());
        rootScope.close();
    }

    /**
     * test activating the same span twice
     */
    @Test
    public void testDuplicatedActivate(){
        StarAtlasTracer starAtlasTracer = new StarAtlasTracer();
        Span root = starAtlasTracer.buildSpan("root").start();
        Scope rootScope = starAtlasTracer.scopeManager().activate(root, true);
        Span child = starAtlasTracer.buildSpan("child").start();
        Scope childScope = starAtlasTracer.scopeManager().activate(child, true);
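        try{
            starAtlasTracer.scopeManager().activate(root, true);
            // reaching this line means the duplicated activation was not rejected
            Assert.fail("expected IllegalStateException for duplicated activation");
        } catch (IllegalStateException e){
            System.out.println(e.getMessage());
        } finally {
            // always restore the thread-local state for the other tests
            childScope.close();
            rootScope.close();
        }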
    }
}

The scenario for each test is described in its comment; interested readers can run them on their own.

Afterword

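This article covered the basic concepts of opentracing and provided a basic implementation together with tests. Time and energy permitting, follow-up articles may discuss how to integrate with dubbo, HTTP and similar scenarios. If you have questions, feel free to discuss them in the comments.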
