Link Monitoring


Previous approaches
    Full verification: logically, compare the result of select * from tcbuyer.order against the result of select * from tc.order.
    Pseudo-incremental verification: compare only the data from the previous hour.
    Single-stream incremental verification: event-based comparison. When the buyer database generates an order, MySQL produces a corresponding binlog entry; the single-stream incremental verification system uses that binlog as its trigger, parses its content, and queries the seller database in real time to check whether a matching record exists (see the sketch below).
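A minimal sketch of the single-stream incremental check idea follows. It assumes a stream of already-parsed binlog events and a seller-side lookup helper (on_buyer_binlog_event, seller_db.find_order); these names are placeholders for illustration, not the real reconciliation system.

# Sketch: a parsed buyer-DB binlog event triggers a real-time reverse lookup on the seller DB.
def on_buyer_binlog_event(event, seller_db):
    """event: dict parsed from a buyer-side binlog row, e.g. {"table": "order", "order_id": 123}."""
    if event.get("table") != "order":
        return None                                # only order rows trigger the check
    order_id = event["order_id"]
    seller_row = seller_db.find_order(order_id)    # placeholder DAO call against the seller DB
    if seller_row is None:
        # Inconsistency found: in a real system this would be reported/alerted.
        return {"order_id": order_id, "problem": "missing in seller DB"}
    return None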

AMG's check graph model: Check Graph
    Suppose the trading link has 4 business systems that need reconciliation: trading, inventory, funds, and payment. The corresponding events are the order-placement event, the inventory-deduction event, the red-envelope funds event, and the payment event. The reconciliation requirements are:

    check the trading event against the inventory event;

    check the trading event against the funds event;

    check the trading event against the payment event;

    check the funds event against the payment event.

    If any one of these 4 checks fails, the business system is considered abnormal and must be reported promptly.

    Clearly this is a graph model. For example, if event A is checked against event B, there is an edge connecting node A and node B. Taking events as nodes (Node) and the check between two events as an edge (Edge), we obtain a graph (Graph) model. For the scenario above, the constructed graph looks like this:


        Inventory <--check--> Trading <--check--> Funds
                                 |                  |
                               check              check
                                 |                  |
                                 +----> Payment <---+
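As a rough illustration of the Check Graph idea (not AMG's actual implementation), the sketch below builds the graph for the four checks above, with events as nodes and check functions as edges, and flags any edge whose check fails. The check functions here are placeholders.

# Sketch of the Check Graph: events are nodes, pairwise checks are edges.
def build_check_graph():
    # Each edge is (event_a, event_b, check_fn); check_fn returns True when the two events are consistent.
    always_ok = lambda a, b: True        # placeholder check; the real system plugs in its own comparison
    return [
        ("trade", "inventory", always_ok),
        ("trade", "funds",     always_ok),
        ("trade", "payment",   always_ok),
        ("funds", "payment",   always_ok),
    ]

def run_checks(graph, events):
    """events: dict mapping event name to its payload; returns the list of failed edges."""
    failures = []
    for a, b, check in graph:
        if not check(events.get(a), events.get(b)):
            failures.append((a, b))      # any failed edge means the business link is abnormal
    return failures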

To decouple business logic and protect the performance and availability of the main link, the group's systems are connected by a mix of synchronous and asynchronous calls and strong and weak dependencies. Once there is network jitter, a bug in a business system, or a failure in some subsystem, business data can become inconsistent. Take the most critical pair, the trading and inventory systems: if a user places an order but inventory is not deducted, overselling is likely; if a user cancels an order but inventory is not restored, items go undersold. That is data inconsistency between the trading and inventory systems.

from influxdb import InfluxDBClient

# A single point in InfluxDB's JSON format: measurement, tags, timestamp and fields.
json_body = [
    {
        "measurement": "cpu_load_short",
        "tags": {
            "host": "server01",
            "region": "us-west"
        },
        "time": "2009-11-10T23:00:00Z",
        "fields": {
            "value": 0.64
        }
    }
]

# Connect to the local InfluxDB instance and use the 'example' database.
client = InfluxDBClient('localhost', 8086, 'root', 'root', 'example')

# Create the database, write the point, then query it back.
client.create_database('example')
client.write_points(json_body)
result = client.query('select value from cpu_load_short;')

print("Result: {0}".format(result))



insert prism_trace_log,serverApp=camel,serviceName=index.api rt=50
(example point recorded at 2017-09-08 13:00:01; note that in InfluxDB line protocol an explicit trailing timestamp is an epoch value, not a quoted datetime)
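The same point could also be written through the Python client shown earlier. Treating serverApp and serviceName as tags and rt as a field is an assumption based on the insert line above, as is interpreting the timestamp as UTC.

# Writing the prism_trace_log point with influxdb-python (schema assumed from the insert above).
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', 8086, 'root', 'root', 'example')
client.write_points([
    {
        "measurement": "prism_trace_log",
        "tags": {"serverApp": "camel", "serviceName": "index.api"},
        "time": "2017-09-08T13:00:01Z",   # RFC3339; assumes the original timestamp is UTC
        "fields": {"rt": 50}
    }
])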

// The TRACE type does not output rpcId by default

==========================BaseModel
traceId
rpcId
timestamp
rpcType
hostIp
==========================RpcModel
clientApp
clientIp
clientSpan
serverApp
serverIp
serverSpan
opName // operation name, usually determined by the RPC type, e.g. LOCAL, SYNC, CALLBACK, FUTURE; for databases, e.g. QUERY, UPDATE, INSERT, DELETE
opType // operation type, usually determined by the RPC type, e.g. the serialization method or a read/write flag; for databases, R and W denote read and write
serviceName // interface name
methodName // method name
error // 0
result // 1,2,3,3,4,5
==========================
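To see how the two parts fit together, here is a plain Python sketch of a single trace record; the field names follow the lists above, while the types and comments are assumptions.

# Sketch of one trace record combining the BaseModel and RpcModel fields (types are assumptions).
from dataclasses import dataclass

@dataclass
class TraceRecord:
    # BaseModel
    traceId: str
    rpcId: str
    timestamp: int      # assumed epoch millis
    rpcType: int
    hostIp: str
    # RpcModel
    clientApp: str
    clientIp: str
    clientSpan: int     # client-side elapsed time
    serverApp: str
    serverIp: str
    serverSpan: int     # server-side elapsed time
    opName: str
    opType: str
    serviceName: str
    methodName: str
    error: int          # 0 when no error
    result: int         # result/error code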

HTTP total
select count(*),sum(error),avg(serverSpan) from prism_trace where rpcType=0 and serverApp = ?

HTTP by page
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=0 and serverApp = ? group by serviceName

RPC total
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=1 and serverApp = ?
RPC by service
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=1 and serverApp = ? group by serviceName

RPC service sources
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=1 and serverApp = ? and serviceName=? group by clientApp
RPC service destinations
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=1 and serverApp = ? and serviceName=? group by serverApp

RPC application sources
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=1 and serverApp = ? group by clientApp
RPC application destinations
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=1 and serverApp = ? group by serverApp

DB total
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=3 and serverApp = ?
DB by table
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=3 and serverApp = ? group by serviceName
DB table sources
select count(*),avg(serverSpan),max(serverSpan),sum(error) from prism_trace where rpcType=3 and serverApp = ? and serviceName=? group by clientApp
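If these aggregates live in InfluxDB (as the earlier example suggests), one of them could be run through the same Python client roughly as below; the measurement name and schema are assumptions, and note that in InfluxQL avg() is written mean(), aggregates take a field, and tag values are quoted strings.

# Running the "RPC total" aggregate with influxdb-python (measurement/schema assumed).
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', 8086, 'root', 'root', 'example')
query = ("select count(serverSpan), mean(serverSpan), max(serverSpan), sum(error) "
         "from prism_trace where rpcType='1' and serverApp='camel'")
result = client.query(query)
for point in result.get_points():
    print(point)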

Error types:
/**
 * Unknown
 */
UNKNOWN,
/**
 * Success
 */
OK,
/**
 * Business error
 */
BIZ_ERROR,
/**
 * RPC error
 */
RPC_ERROR,
/**
 * Timeout
 */
TIMEOUT,
/**
 * Soft error, typically used for resource not found, cache miss, failed lock acquisition,
 * or an update skipped due to a version mismatch; the exact meaning depends on the middleware
 */
SOFT_ERROR,
/**
 * Rate-limit error
 */
LIMIT_ERROR,

The model fields are as follows:

OpName: DB operation name, e.g. QUERY, UPDATE, and (added after TDDL v5) INSERT, DELETE
OpType: DB operation type, R or W for read and write operations
ServiceDim1: physical database name
ServiceDim2: tableName, e.g. JOIN:TABLE_A,TABLE_C,TABLE_B
ServiceDim3: logical SQL code
ServerName: (db@dbName), e.g. andor_mysql_group
ClientName: clientAppId
ServerDimKey: TDDL_opName@dbName:tableName
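Following the ServerDimKey format above, a small helper might compose the key as below; the function name and inputs are illustrative only.

# Illustrative helper composing the DB dimension key in the TDDL_opName@dbName:tableName format.
def build_server_dim_key(op_name, db_name, table_name):
    return "TDDL_{0}@{1}:{2}".format(op_name, db_name, table_name)

# e.g. build_server_dim_key("QUERY", "andor_mysql_group", "TABLE_A") -> "TDDL_QUERY@andor_mysql_group:TABLE_A"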

tlive,,mtop/get.do(),500
1. tlive,fun,CommentService,save,100
2. tlive,fun,CommentService,save, 90
fun,db,"table1",100
tlive,fun,MemberService,save,200
fun,db,"table2",100

/*
 * Numeric codes for RPC types
 */
// @formatter:off
public static final int RPC_TYPE_UNKNOWN =                   255;
public static final int RPC_TYPE_TRACE =                       0;
public static final int RPC_TYPE_HSF =                         1;
public static final int RPC_TYPE_HSF_SERVER =                  2;
public static final int RPC_TYPE_NOTIFY =                      3;
public static final int RPC_TYPE_TDDL =                        4;
public static final int RPC_TYPE_TAIR =                        5;
public static final int RPC_TYPE_SEARCH =                      6;
public static final int RPC_TYPE_MASTER =                     11;
public static final int RPC_TYPE_SLAVE =                      12;
public static final int RPC_TYPE_METAQ =                      13;
public static final int RPC_TYPE_DRDS =                       14;
public static final int RPC_TYPE_TFS =                        15;
public static final int RPC_TYPE_ALIPAY =                     16;
public static final int RPC_TYPE_HTTP_B =                     20;
public static final int RPC_TYPE_HTTP =                       25;
public static final int RPC_TYPE_SENTINEL =                   26;
public static final int RPC_TYPE_LOCAL =                      30;
public static final int RPC_TYPE_JINGWEI =                    32;
public static final int RPC_TYPE_ISEARCH =                    36;
public static final int RPC_TYPE_LOCAL_NG =                   40;
public static final int RPC_TYPE_CSB_SERVER =                 52;
public static final int RPC_TYPE_HTTP_SERVER =               251;
public static final int RPC_TYPE_METAQ_RCV =                 252;
public static final int RPC_TYPE_ACCESS =                    253;
public static final int RPC_TYPE_NOTIFY_RCV =                254;
// Custom-defined RPC types
public static final int RPC_TYPE_CUSTOM_TRACE =               90;
public static final int RPC_TYPE_CUSTOM_RPC_CLIENT =          91;
public static final int RPC_TYPE_CUSTOM_RPC_SERVER =          92;
public static final int RPC_TYPE_CUSTOM_MESSAGE_PUB =         93;
public static final int RPC_TYPE_CUSTOM_MESSAGE_SUB =         96;
public static final int RPC_TYPE_CUSTOM_DB =                  94;
public static final int RPC_TYPE_CUSTOM_CACHE =               95;
public static final int RPC_TYPE_CUSTOM_PROTOCOL_CLIENT =     97;
public static final int RPC_TYPE_CUSTOM_PROTOCOL_SERVER =     98;
// @formatter:on

Example call chain ->A->B->C and the trace records it produces:

client, server, type
-,a,0
a,b,1
a,b,2
b,c,1
b,c,2
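A toy generator, under the assumption that types 0/1/2 correspond to the TRACE/HSF/HSF_SERVER constants listed above, reproduces these rows from the chain a -> b -> c:

# Toy sketch: expanding a call chain into (client, server, type) trace rows.
# Assumes type 0 = TRACE (entry), 1 = HSF client span, 2 = HSF server span, per the constants above.
def chain_to_rows(chain):
    rows = [("-", chain[0], 0)]           # the entry request is logged as a TRACE record
    for client, server in zip(chain, chain[1:]):
        rows.append((client, server, 1))  # client-side span of the RPC
        rows.append((client, server, 2))  # server-side span of the RPC
    return rows

# chain_to_rows(["a", "b", "c"]) yields the five rows listed above.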

LOCAL_IP_ADDRESS = getLocalInetAddress();

IP_16 = getIP_16(LOCAL_IP_ADDRESS);
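getIP_16 presumably renders the local IPv4 address as 8 hex characters for use in the traceId; a rough Python equivalent (an assumption, not the actual implementation) could look like this:

# Rough equivalent of getIP_16: render an IPv4 address as 8 hex characters (assumed behaviour).
import socket

def get_local_inet_address():
    # A common (if imperfect) way to pick up a local address.
    return socket.gethostbyname(socket.gethostname())

def get_ip_16(ip):
    return "".join("{0:02x}".format(int(part)) for part in ip.split("."))

# e.g. get_ip_16("10.1.2.30") -> "0a01021e"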

1. Application overview, 2. Service details, 3. Application destinations, 4. Application sources

  1. Overview
    Figures
    select
    count(srSpan) as hitCount,mean(ssSpan) as rtAvg,sum(error) as errCount
    from prism_trace
    where (rpcType='0' or rpcType='2' or rpcType='3') and serverApp='cammel' and time>now()-1d and time<=now() group by serverIp,time(1d)

Table
select
count(srSpan) as hitCount,mean(ssSpan) as rtAvg,sum(error) as errCount
from prism_trace
where (rpcType='0' or rpcType='2' or rpcType='3') and serverApp='cammel' and time>now()-1d and time<=now() group by rpcType,serviceName,time(1d)

  2. Service details

Main chart
select
count(srSpan) as hitCount,mean(ssSpan) as rtAvg,sum(error) as errCount
from prism_trace
where serverApp='cammel' and serviceName='?' and time>now()-1d and time<=now() group by time(1d)

Destinations
select
count(srSpan) as hitCount,mean(ssSpan) as rtAvg,sum(error) as errCount
from prism_trace
where clientApp='camel' and rpcType='1' and clientService='/login.do' and time>now()-1d and time<=now() group by serverApp,serviceName,time(1d)

Sources
select
count(srSpan) as hitCount,mean(ssSpan) as rtAvg,sum(error) as errCount
from prism_trace
where serverApp='whale' and rpcType='1' and serviceName='MemberQueryService' and time>now()-1d and time<=now() group by clientApp,clientService,time(1d)

  3. Application destinations

select
count(srSpan) as hitCount,mean(ssSpan) as rtAvg,sum(error) as errCount
from prism_trace
where clientApp='camel' and rpcType='1' and time>now()-1d and time<=now() group by serverApp,serviceName,time(1d)

  4. Application sources

select
count(srSpan) as hitCount,mean(ssSpan) as rtAvg,sum(error) as errCount
from prism_trace
where serverApp='whale' and rpcType='1' and time>now()-1d and time<=now() group by clientApp,clientService,time(1d)
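To render a view like "application sources", the grouped query above can be executed and iterated per (clientApp, clientService) series; the client usage below is standard influxdb-python, while the measurement and tags are assumptions carried over from the queries above.

# Iterating a GROUP BY result per series with influxdb-python (schema assumed from the queries above).
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', 8086, 'root', 'root', 'example')
query = ("select count(srSpan) as hitCount, mean(ssSpan) as rtAvg, sum(error) as errCount "
         "from prism_trace where serverApp='whale' and rpcType='1' "
         "and time > now() - 1d and time <= now() group by clientApp,clientService,time(1d)")
result = client.query(query)
for (measurement, tags), points in result.items():
    for point in points:
        print(tags, point)   # one line per (clientApp, clientService) series and time bucket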
