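How BlockCanary Works
How to measure method execution time on the main thread
The simplest, most brute-force way to measure how long a method takes is to record the start time right before the method runs and, once it finishes, subtract that start time from the current time. But the main thread runs so many methods that you can't possibly do this for every one of them; that would be a disaster. Is there a single, central place where this can be done? Of course there is, otherwise this post would end right here...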
public static void loop() {
    final Looper me = myLooper();
    if (me == null) {
        throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
    }
    final MessageQueue queue = me.mQueue;
    // Make sure the identity of this thread is that of the local process,
    // and keep track of what that identity token actually is.
    Binder.clearCallingIdentity();
    final long ident = Binder.clearCallingIdentity();
    for (;;) {
        Message msg = queue.next(); // might block
        if (msg == null) {
            // No message indicates that the message queue is quitting.
            return;
        }
        // This must be in a local variable, in case a UI event sets the logger
        final Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " +
                    msg.callback + ": " + msg.what);
        }
        final long traceTag = me.mTraceTag;
        if (traceTag != 0 && Trace.isTagEnabled(traceTag)) {
            Trace.traceBegin(traceTag, msg.target.getTraceName(msg));
        }
        try {
            msg.target.dispatchMessage(msg);
        } finally {
            if (traceTag != 0) {
                Trace.traceEnd(traceTag);
            }
        }
        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }
        // Make sure that during the course of dispatching the
        // identity of the thread wasn't corrupted.
        final long newIdent = Binder.clearCallingIdentity();
        if (ident != newIdent) {
            Log.wtf(TAG, "Thread identity changed from 0x"
                    + Long.toHexString(ident) + " to 0x"
                    + Long.toHexString(newIdent) + " while dispatching to "
                    + msg.target.getClass().getName() + " "
                    + msg.callback + " what=" + msg.what);
        }
        msg.recycleUnchecked();
    }
}
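The loop() method above belongs to Looper. Our goal is to detect jank on the main thread: all UI updates happen on the main thread, so doing time-consuming work there can make the interface stutter. The main thread's Looper (the main looper) is created by the Android framework when the app starts. Each thread has one Looper, and each Looper has a MessageQueue that receives the messages sent by Handlers. loop() keeps pulling messages from the MessageQueue and dispatching them strictly in order, so the next message is not handled until the current one has finished. If one message is slow to process, the result is jank. Right before msg.target.dispatchMessage(msg) there is: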
if (logging != null) {
    logging.println(">>>>> Dispatching to " + msg.target + " " +
            msg.callback + ": " + msg.what);
}
And after dispatchMessage has returned, there is:
if (logging != null) {
    logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
}
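So we only need to measure the time between these two log lines to get the cost of dispatchMessage. Android provides Looper.getMainLooper().setMessageLogging(Printer printer) to set this logging object, so implementing a custom Printer and overriding println(String x) is enough to collect the timing. The principle really is simple, but you still have to spot the trick in the first place. BlockCanary is initialized like this: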
public class MyApplication extends Application {
    @Override
    public void onCreate() {
        super.onCreate();
        BlockCanary.install(this, new BlockCanaryContext()).start();
    }
}
Now look at the code inside start():
/**
 * Start monitoring.
 */
public void start() {
    if (!mMonitorStarted) {
        mMonitorStarted = true;
        Looper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor);
    }
}
monitor is an instance of LooperMonitor, a class that implements Printer and overrides println(String x):
@Override
public void println(String x) {
    if (mStopWhenDebugging && Debug.isDebuggerConnected()) {
        return;
    }
    if (!mPrintingStarted) {
        mStartTimestamp = System.currentTimeMillis();
        mStartThreadTimestamp = SystemClock.currentThreadTimeMillis();
        mPrintingStarted = true;
        startDump();
    } else {
        final long endTime = System.currentTimeMillis();
        mPrintingStarted = false;
        if (isBlock(endTime)) {
            notifyBlockEvent(endTime);
        }
        stopDump();
    }
}
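When the log line before dispatchMessage is printed, println runs with mPrintingStarted still false, so it records the current wall-clock time and the current thread time, then sets mPrintingStarted to true. When the second log line is printed after dispatchMessage returns, println runs again with mPrintingStarted now true. dispatchMessage has finished by then, so the elapsed time can be computed and the stack traces, CPU information, and so on can be gathered.
Collecting stack traces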
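To make the paired-println trick concrete, here is a minimal plain-Java sketch that runs without Android. Everything in it is an assumption made for illustration: the nested Printer interface only mimics the single method of android.util.Printer, dispatch() imitates one iteration of Looper.loop(), and TimingPrinter, its threshold, and the injected clock are hypothetical names, not BlockCanary's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Sketch: measure a dispatch from the two log lines printed around it. */
class DispatchTimerSketch {

    /** Hypothetical stand-in for android.util.Printer (same single method). */
    interface Printer {
        void println(String x);
    }

    /** Alternates between "start" and "end" on each call. */
    static class TimingPrinter implements Printer {
        private final long thresholdMillis;
        private final LongSupplier clockMillis; // injected so the sketch is testable
        private boolean started = false;
        private long startMillis;
        final List<Long> blockedMillis = new ArrayList<>();

        TimingPrinter(long thresholdMillis, LongSupplier clockMillis) {
            this.thresholdMillis = thresholdMillis;
            this.clockMillis = clockMillis;
        }

        @Override
        public void println(String x) {
            if (!started) {
                // First call (">>>>> Dispatching ..."): remember the start time.
                startMillis = clockMillis.getAsLong();
                started = true;
            } else {
                // Second call ("<<<<< Finished ..."): compute the elapsed time.
                long cost = clockMillis.getAsLong() - startMillis;
                started = false;
                if (cost > thresholdMillis) {
                    blockedMillis.add(cost);
                }
            }
        }
    }

    /** Imitates one loop iteration: log, run the message, log again. */
    static void dispatch(Printer logging, Runnable message) {
        logging.println(">>>>> Dispatching to " + message);
        message.run();
        logging.println("<<<<< Finished to " + message);
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        TimingPrinter printer = new TimingPrinter(100, System::currentTimeMillis);
        dispatch(printer, () -> { });               // fast message
        dispatch(printer, () -> sleepQuietly(300)); // slow message
        System.out.println("blocked dispatches: " + printer.blockedMillis);
    }
}
```

The printer never sees the message itself; it only relies on being called exactly twice per dispatch, which is the same assumption the mPrintingStarted flag encodes.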
private void startDump() {
    if (null != BlockCanaryInternals.getInstance().stackSampler) {
        BlockCanaryInternals.getInstance().stackSampler.start();
    }
    if (null != BlockCanaryInternals.getInstance().cpuSampler) {
        BlockCanaryInternals.getInstance().cpuSampler.start();
    }
}
LooperMonitor's println method calls this method, which starts collecting stack traces and CPU information at the same time. For stack traces, this is stackSampler.start():
abstract class AbstractSampler {
    private static final int DEFAULT_SAMPLE_INTERVAL = 300;
    protected AtomicBoolean mShouldSample = new AtomicBoolean(false);
    protected long mSampleInterval;
    private Runnable mRunnable = new Runnable() {
        @Override
        public void run() {
            doSample();
            if (mShouldSample.get()) {
                HandlerThreadFactory.getTimerThreadHandler()
                        .postDelayed(mRunnable, mSampleInterval);
            }
        }
    };
    public AbstractSampler(long sampleInterval) {
        if (0 == sampleInterval) {
            sampleInterval = DEFAULT_SAMPLE_INTERVAL;
        }
        mSampleInterval = sampleInterval;
    }
    public void start() {
        if (mShouldSample.get()) {
            return;
        }
        mShouldSample.set(true);
        HandlerThreadFactory.getTimerThreadHandler().removeCallbacks(mRunnable);
        HandlerThreadFactory.getTimerThreadHandler().postDelayed(mRunnable,
                BlockCanaryInternals.getInstance().getSampleDelay());
    }
    public void stop() {
        if (!mShouldSample.get()) {
            return;
        }
        mShouldSample.set(false);
        HandlerThreadFactory.getTimerThreadHandler().removeCallbacks(mRunnable);
    }
    abstract void doSample();
}
After start() is called, this Runnable is posted and runs once the initial delay has passed:
private Runnable mRunnable = new Runnable() {
    @Override
    public void run() {
        doSample();
        if (mShouldSample.get()) {
            HandlerThreadFactory.getTimerThreadHandler()
                    .postDelayed(mRunnable, mSampleInterval);
        }
    }
};
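start() is also guarded so that calling stackSampler.start() again while sampling is already in progress does nothing. In run() we can see that doSample() is called again every mSampleInterval for as long as sampling is active. Here that means StackSampler's doSample():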
@Override
protected void doSample() {
    StringBuilder stringBuilder = new StringBuilder();
    for (StackTraceElement stackTraceElement : mCurrentThread.getStackTrace()) {
        stringBuilder
                .append(stackTraceElement.toString())
                .append(BlockInfo.SEPARATOR);
    }
    synchronized (sStackMap) {
        if (sStackMap.size() == mMaxEntryCount && mMaxEntryCount > 0) {
            sStackMap.remove(sStackMap.keySet().iterator().next());
        }
        sStackMap.put(System.currentTimeMillis(), stringBuilder.toString());
    }
}
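mCurrentThread is the main thread. The first stack trace is captured 0.8 * mSampleInterval after the dispatch starts and is saved into sStackMap. This explanation treats mSampleInterval as the block threshold; in the code above, the initial delay is whatever getSampleDelay() returns, and later samples are spaced mSampleInterval apart.
The idea is this: a method that runs longer than mSampleInterval counts as a block. When it has already run for 0.8 * mSampleInterval and still hasn't finished, stack sampling begins. If the method then finishes at 0.9 * mSampleInterval, no block is reported. If it runs past mSampleInterval, the stack captured at 0.8 * mSampleInterval is taken as the stack responsible for the delay. As long as the method is still running, another stack trace is captured and saved every mSampleInterval.
The reason for sampling as early as 0.8 * mSampleInterval is accuracy: a method that has run that long without returning will very probably cause a block, so capturing the stack early makes it more likely that the captured stack really is the one to blame. The later samples, taken every mSampleInterval, are there to collect a more complete picture. Once the method finishes, sampling stops.
At most 100 stack traces are kept, each keyed by the timestamp at which it was captured. When a block is detected, the stacks saved in sStackMap have to be reported, and they can be looked up by timestamp: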
private void notifyBlockEvent(final long endTime) {
    final long startTime = mStartTimestamp;
    final long startThreadTime = mStartThreadTimestamp;
    final long endThreadTime = SystemClock.currentThreadTimeMillis();
    HandlerThreadFactory.getWriteLogThreadHandler().post(new Runnable() {
        @Override
        public void run() {
            mBlockListener.onBlockEvent(startTime, endTime, startThreadTime, endThreadTime);
        }
    });
}
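Returning to the sampling schedule described earlier, the arithmetic can be worked through in a few lines of plain Java. This is only a sketch under the assumptions stated in the text (first sample at 0.8 of the threshold, then one sample per interval while the dispatch is still running); the class and method names are hypothetical, and the time doSample() itself takes is ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: when would stack samples fire for a dispatch of a given length? */
class SampleScheduleSketch {

    /** Offsets in ms (from dispatch start) at which a sample would be taken. */
    static List<Long> sampleOffsets(long thresholdMillis, long intervalMillis,
                                    long durationMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<Long> offsets = new ArrayList<>();
        long first = thresholdMillis * 8 / 10; // 0.8 * threshold, as in the text
        for (long t = first; t < durationMillis; t += intervalMillis) {
            offsets.add(t);
        }
        return offsets;
    }

    /** A dispatch counts as a block only if it outlasts the threshold. */
    static boolean isBlock(long thresholdMillis, long durationMillis) {
        return durationMillis > thresholdMillis;
    }

    public static void main(String[] args) {
        // 3000 ms dispatch, 1000 ms threshold and interval.
        System.out.println(sampleOffsets(1000, 1000, 3000)); // [800, 1800, 2800]
        // 900 ms dispatch: one sample is taken, but no block is reported.
        System.out.println(sampleOffsets(1000, 1000, 900) + " block=" + isBlock(1000, 900));
    }
}
```

The second case in main is the "finishes at 0.9" situation from the text: a stack is captured at 800 ms, but since the dispatch ends before the threshold, that sample is never reported.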
Next, look at mBlockListener.onBlockEvent(startTime, endTime, startThreadTime, endThreadTime). During initialization, the BlockCanaryInternals constructor has already called setMonitor with a listener that implements onBlockEvent:
public BlockCanaryInternals() {
    stackSampler = new StackSampler(
            Looper.getMainLooper().getThread(),
            sContext.provideDumpInterval());
    cpuSampler = new CpuSampler(sContext.provideDumpInterval());
    setMonitor(new LooperMonitor(new LooperMonitor.BlockListener() {
        @Override
        public void onBlockEvent(long realTimeStart, long realTimeEnd,
                                 long threadTimeStart, long threadTimeEnd) {
            // Get recent thread-stack entries and cpu usage
            ArrayList<String> threadStackEntries = stackSampler
                    .getThreadStackEntries(realTimeStart, realTimeEnd);
            if (!threadStackEntries.isEmpty()) {
                BlockInfo blockInfo = BlockInfo.newInstance()
                        .setMainThreadTimeCost(realTimeStart, realTimeEnd, threadTimeStart, threadTimeEnd)
                        .setCpuBusyFlag(cpuSampler.isCpuBusy(realTimeStart, realTimeEnd))
                        .setRecentCpuRate(cpuSampler.getCpuRateInfo())
                        .setThreadStackEntries(threadStackEntries)
                        .flushString();
                LogWriter.save(blockInfo.toString());
                if (mInterceptorChain.size() != 0) {
                    for (BlockInterceptor interceptor : mInterceptorChain) {
                        interceptor.onBlock(getContext().provideContext(), blockInfo);
                    }
                }
            }
        }
    }, getContext().provideBlockThreshold(), getContext().stopWhenDebugging()));
    LogWriter.cleanObsolete();
}
Now look at how ArrayList<String> threadStackEntries = stackSampler.getThreadStackEntries(realTimeStart, realTimeEnd) retrieves the stack traces:
public ArrayList<String> getThreadStackEntries(long startTime, long endTime) {
    ArrayList<String> result = new ArrayList<>();
    synchronized (sStackMap) {
        for (Long entryTime : sStackMap.keySet()) {
            if (startTime < entryTime && entryTime < endTime) {
                result.add(BlockInfo.TIME_FORMATTER.format(entryTime)
                        + BlockInfo.SEPARATOR
                        + BlockInfo.SEPARATOR
                        + sStackMap.get(entryTime));
            }
        }
    }
    return result;
}
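This compares the start and end times against the keys of the saved stack traces, that is, the times at which they were captured. Only stacks captured between the start time and the end time are relevant. If a method ran for 3 seconds, every stack trace captured during those 3 seconds is returned. The results are then wrapped in a BlockInfo, passed on to the interceptors, and can also be written to a file. At this point, the stacks of the methods that caused the block have been collected.
Collecting CPU information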
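The storage and lookup described above (a bounded map keyed by capture time, filtered by the dispatch's time window) can be sketched in plain Java as follows. The class and method names are hypothetical, and the sketch assumes an insertion-ordered map such as LinkedHashMap so that the first key is always the oldest sample; the type of sStackMap is not shown in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: bounded, timestamp-keyed sample store with a time-window query. */
class StackWindowSketch {
    private final int maxEntries;
    // Insertion order is preserved, so the first key is the oldest sample.
    private final LinkedHashMap<Long, String> samples = new LinkedHashMap<>();

    StackWindowSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Stores one sample, evicting the oldest one when the store is full. */
    synchronized void put(long timestampMillis, String stack) {
        if (maxEntries > 0 && samples.size() == maxEntries) {
            samples.remove(samples.keySet().iterator().next());
        }
        samples.put(timestampMillis, stack);
    }

    /** Returns the samples captured strictly between start and end. */
    synchronized List<String> between(long startMillis, long endMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> entry : samples.entrySet()) {
            long time = entry.getKey();
            if (startMillis < time && time < endMillis) {
                result.add(time + ": " + entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StackWindowSketch store = new StackWindowSketch(2);
        store.put(100, "stack A"); // evicted when the third sample arrives
        store.put(200, "stack B");
        store.put(300, "stack C");
        System.out.println(store.between(150, 1000)); // [200: stack B, 300: stack C]
    }
}
```

Both bounds are exclusive, mirroring the startTime < entryTime && entryTime < endTime check in getThreadStackEntries.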
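- Sample the current CPU usage. If CPU usage is too high, the CPU may not keep up, so a method may be suspended partway through and have to wait for the CPU to schedule it again.
- Check whether the CPU is too busy to keep up. The reasoning is the same: when the CPU is busy, a method can be suspended midway and only continues after it is next scheduled.
- The current app's CPU usage
- User-mode and system-mode usage
- %ioWait: first of all, a rise in %iowait does not prove that more processes are waiting on I/O, nor that the total time spent waiting on I/O has increased;
For example, I/O that happens while the CPU is busy leaves %iowait unchanged no matter how much I/O there is; when the CPU becomes less busy, part of the I/O falls into CPU idle time and %iowait goes up.
As another example, low I/O concurrency gives a high %iowait, while high I/O concurrency may give a fairly low %iowait.
So %iowait is a very fuzzy metric. If you see %iowait rise, also check whether the I/O volume has clearly increased, whether metrics such as avserv/avwait/avque have clearly grown, and whether the application feels slower. If none of these is the case, there is nothing to worry about.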
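As a rough illustration of how such percentages are derived, the sketch below computes busy and iowait shares from two snapshots of cumulative CPU counters. It assumes input in the format of the aggregate cpu line of Linux's /proc/stat (user, nice, system, idle, iowait, irq, softirq, ...) and uses only the first seven fields; the class and method names are hypothetical, and whether CpuSampler computes its figures this way is not shown in this article.

```java
/** Sketch: CPU shares from two snapshots of cumulative time counters. */
class CpuRateSketch {
    // Fields used: user nice system idle iowait irq softirq (an assumption).
    private static final int FIELD_COUNT = 7;
    static final int IDLE = 3;
    static final int IOWAIT = 4;

    /** Parses the counters of an aggregate "cpu ..." line. */
    static long[] parse(String cpuLine) {
        String[] parts = cpuLine.trim().split("\\s+");
        if (parts.length < FIELD_COUNT + 1 || !parts[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + cpuLine);
        }
        long[] fields = new long[FIELD_COUNT];
        for (int i = 0; i < FIELD_COUNT; i++) {
            fields[i] = Long.parseLong(parts[i + 1]);
        }
        return fields;
    }

    static long total(long[] fields) {
        long sum = 0;
        for (long f : fields) {
            sum += f;
        }
        return sum;
    }

    /** Share of the interval spent in one field, as a percentage. */
    static double sharePercent(long[] before, long[] after, int field) {
        long totalDelta = total(after) - total(before);
        if (totalDelta <= 0) {
            return 0.0; // no time elapsed between the snapshots
        }
        return 100.0 * (after[field] - before[field]) / totalDelta;
    }

    /** Everything that is not idle time; one convention among several. */
    static double busyPercent(long[] before, long[] after) {
        long totalDelta = total(after) - total(before);
        if (totalDelta <= 0) {
            return 0.0;
        }
        return 100.0 - sharePercent(before, after, IDLE);
    }

    public static void main(String[] args) {
        long[] before = parse("cpu  100 0 50 800 50 0 0");
        long[] after = parse("cpu  160 0 80 1500 250 5 5");
        System.out.println("busy%   = " + busyPercent(before, after));
        System.out.println("iowait% = " + sharePercent(before, after, IOWAIT));
    }
}
```

Because every figure is a share of the same interval, the iowait share depends on how the rest of that interval was spent, which is one way to see why %iowait alone is such a fuzzy signal.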
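The core pieces of BlockCanary
- LooperMonitor measures how long each dispatch takes
- StackSampler collects stack traces
- CpuSampler collects CPU information
- Alerting on and displaying abnormal delays, and persisting the related information to files. These parts are fairly simple, so they are not covered in detail here.
- Overall, BlockCanary's execution flow looks like this, shown here with a diagram from BlockCanary's GitHub page: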