一.ANR是什么
Application Not Responding,字面意思就是應用無響應,稍加解釋就是用戶的一些操作無法從應用中獲取反饋。在實際的應用中應當去避免這種現(xiàn)象,雖然他暫時不會造成應用崩潰,但是卻極大的損壞了用戶體驗。
二.ANR的觸發(fā)原因
出現(xiàn)ANR之后一個直觀現(xiàn)象就是系統(tǒng)會展示出一個ANR對話框,大概是這樣。
谷歌文檔中對ANR產生的原因是這么描述的:
- Android系統(tǒng)中的應用被Activity Manager及Window Manager兩個系統(tǒng)服務監(jiān)控著,Android系統(tǒng)會在如下兩種情況展示出ANR的對話框
- 5秒內無法對輸入事件(按鍵及觸摸)做出響應
廣播接收器無法在10秒內結束運行
因此如何避免上面兩種情況的發(fā)生就是解決ANR問題的方向
三.ANR時系統(tǒng)做了什么
這里暫時不去從源碼上追究為什么出現(xiàn)上面兩種情況就會導致ANR,是不是只有這兩種情況才會導致ANR,這部分在后續(xù)有時間的時候再回來補充,我們先來看看ANR時系統(tǒng)做了什么然后從源碼分析為什么會做這些事情,知道他做了什么,對我們分析ANR是很有幫助的。
1.彈出一個丑陋無比的對話框
哪怕你的應用有多帥,比如這樣
我相信這個對話框大家都看過,就不做敘述了
2.將ANR信息輸出到traces.txt文件中
traces.txt文件是一個ANR記錄文件,用于開發(fā)人員調試,目錄位于/data/anr中,無需root權限即可通過pull命令獲取,下面的命令可以將traces.txt文件拷貝到當前目錄下
adb pull /data/anr .
這部分的內容可能以前沒接觸過ANR的同學已經(jīng)不太清楚了,第四部分會對如何分析這個文件做詳細描述。
3.將ANR信息輸出到Logcat中
這部分的信息如果不注意是很難發(fā)現(xiàn)的,截圖的內容只是其中輸出的一部分,簡要解釋一下這些內容的含義
1) ANR in xxx,PID: xxx,Reason:xxx
這里簡要敘述了出現(xiàn)ANR的應用以及進程號,比如上面的截圖,ANR發(fā)生在androitest應用中,對應的進程號是7650,后續(xù)還給出了出現(xiàn)ANR的原因,當然這種原因并不能幫我們直接發(fā)現(xiàn)問題根源
一般來說原因有兩類,分別對應于InputDispatcher.cpp文件的兩段代碼,InputDispatcher是派發(fā)事件過程中的一個中轉類,每次派發(fā)事件時他會進行如下判斷
a) 判斷是否有focused組件以及focusedApplication
這種一般是在應用啟動時觸發(fā),比如啟動時間過長在這過程中觸發(fā)了keyevent或者trackball moteionevent就會出現(xiàn)
這種情況下,對應的Reason類似于這樣
Reason: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
我們Monkey腳本中出現(xiàn)的ANR問題絕大部分都是屬于這種問題,實際上出現(xiàn)這種ANR之前我們應用一般發(fā)生了崩潰需要重啟,但是重啟進行的操作比較耗時,但是具體并不清楚耗時的地方。
b) 判斷前面的事件是否及時完成
這里的事件包含keyevent和touchevent,雖然它們對允許的延時要求不一樣,但最終都會執(zhí)行到如下代碼
這種情況下,對應的Reason類似于這樣
Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 10. Wait queue head age: 5591.3ms.)
出現(xiàn)這種問題意味著主線程正在執(zhí)行其他的事件但是比較耗時導致輸入事件無法及時處理。
2) Load: 12.43/5.25/1.97
這里輸出了CPU的負載情況
CPU負載是指某一時刻系統(tǒng)中運行隊列長度之和加上當前正在CPU上運行的進程數(shù),而CPU平均負載可以理解為一段時間內正在使用和等待使用CPU的活動進程的平均數(shù)量。在Linux中“活動進程”是指當前狀態(tài)為運行或不可中斷阻塞的進程。通常所說的負載其實就是指平均負載。
用一個從網(wǎng)上看到的很生動的例子來說明(不考慮CPU時間片的限制),把設備中的一個單核CPU比作一個電話亭,把進程比作正在使用和等待使用電話的人,假如有一個人正在打電話,有三個人在排隊等待,此刻電話亭的負載就是4。使用中會不斷的有人打完電話離開,也會不斷的有其他人排隊等待,為了得到一個有參考價值的負載值,可以規(guī)定每隔5秒記錄一下電話亭的負載,并將某一時刻之前的一分鐘、五分鐘、十五分鐘的的負載情況分別求平均值,最終就得到了三個時段的平均負載。
實際上我們通常關心的就是在某一時刻的前一分鐘、五分鐘、十五分鐘的CPU平均負載,例如以上日志中這三個值分別是12.43,5.25,1.97,說明前一分鐘內正在使用和等待使用CPU的活動進程平均有12.43個,依此類推。在大型服務器端應用中主要關注的是第五分鐘和第十五分鐘的兩個值,但是Android主要應用在便攜手持設備中,有特殊的軟硬件環(huán)境和應用場景,短時間內的系統(tǒng)的較高負載就有可能造成ANR,所以我認為一分鐘內的平均負載相對來說更具有參考價值。
CPU的負載和使用率沒有必然關系,有可能只有一個進程在使用CPU,但執(zhí)行的是復雜的操作;也有可能等待和正在使用CPU的進程很多,但每個進程執(zhí)行的都是簡單操作。
實際處理問題時偶爾會遇到由于平均負載高引起的ANR,典型的特征就是系統(tǒng)中應用進程數(shù)量多,CPU總使用率較高,但是每個進程的CPU使用率不高,當前應用進程主線程沒有異常阻塞,一分鐘內的CPU平均負載較高。
3) CPU usage xxx
CPU usage from 75634ms to 0ms ago:
25% 869/system_server: 19% user + 6.1% kernel / faults: 86246 minor
...
CPU usage from 601ms to 1132ms later with 99% awake
...
這里會分兩段打印出出現(xiàn)ANR前后CPU的使用率情況,每一段包含了所有進程的CPU使用率,如果說某個進程在ANR發(fā)生時CPU使用率出現(xiàn)很高的情況,那么就可以知道這個進程在做非常消耗CPU的事情,一般這種情況下這個進程就是ANR進程,而消耗CPU的這個事情往往就是導致ANR的根源。
4.源碼分析
最后來看看做出上述反應的源代碼,這部分代碼位于ActivityManagerService類中
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
ActivityRecord parent, boolean aboveSystem, final String annotation) {
ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
// mController是IActivityController接口的實例,是為Monkey測試程序預留的,默認為null
if (mController != null) {
try {
// 0 == continue, -1 = kill process immediately
int res = mController.appEarlyNotResponding(app.processName, app.pid, annotation);
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
}
} catch (RemoteException e) {
mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
// 更新CPU狀態(tài)信息
long anrTime = SystemClock.uptimeMillis();
if (MONITOR_CPU_USAGE) {
updateCpuStatsNow();
}
synchronized (this) {
// 某些特定情況下忽略本次ANR,比如系統(tǒng)關機,比如該進程已經(jīng)處于anr狀態(tài)或者crash狀態(tài)
if (mShuttingDown) {
Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation);
return;
} else if (app.notResponding) {
Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation);
return;
} else if (app.crashing) {
Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation);
return;
}
// 為了防止多次對相同app的anr執(zhí)行重復代碼,在此處標注記錄,屬于上面的特定情況種的一種
app.notResponding = true;
// 記錄ANR信息到Event Log中
EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
app.processName, app.info.flags, annotation);
// 添加當前app到firstpids列表中
firstPids.add(app.pid);
// 如果可能添加父進程到firstpids列表種
int parentPid = app.pid;
if (parent != null && parent.app != null && parent.app.pid > 0) parentPid = parent.app.pid;
if (parentPid != app.pid) firstPids.add(parentPid);
if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
// 添加所有進程到firstpids中
for (int i = mLruProcesses.size() - 1; i >= 0; i--) {
ProcessRecord r = mLruProcesses.get(i);
if (r != null && r.thread != null) {
int pid = r.pid;
if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
if (r.persistent) {
firstPids.add(pid);
} else {
lastPids.put(pid, Boolean.TRUE);
}
}
}
}
}
// 將ANR信息存在info變量中,后續(xù)打印到LOGCAT,這部分的信息會以ActivityManager為Tag打印出來,包含了ANR的進程,出現(xiàn)原因以及當時的CPU狀態(tài),這些對分析ANR是非常重要的信息
StringBuilder info = new StringBuilder();
info.setLength(0);
info.append("ANR in ").append(app.processName);
if (activity != null && activity.shortComponentName != null) {
info.append(" (").append(activity.shortComponentName).append(")");
}
info.append("\n");
info.append("PID: ").append(app.pid).append("\n");
if (annotation != null) {
info.append("Reason: ").append(annotation).append("\n");
}
if (parent != null && parent != activity) {
info.append("Parent: ").append(parent.shortComponentName).append("\n");
}
final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
// 將ANR信息輸出到traces文件
File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,
NATIVE_STACKS_OF_INTEREST);
String cpuInfo = null;
if (MONITOR_CPU_USAGE) {
updateCpuStatsNow();
synchronized (mProcessCpuTracker) {
cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);
}
info.append(processCpuTracker.printCurrentLoad());
info.append(cpuInfo);
}
info.append(processCpuTracker.printCurrentState(anrTime));
// 輸出到logcat的語句
Slog.e(TAG, info.toString());
// 如果traces文件未創(chuàng)建,則只記錄當前進程trace并且發(fā)送QUIT信號到進程
if (tracesFile == null) {
// There is no trace file, so dump (only) the alleged culprit's threads to the log
Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
}
// 將ANR信息處處到DropBox目錄下,也就是說除了traces文件還會有一個dropbox文件用于記錄ANR
addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
cpuInfo, tracesFile, null);
//...
synchronized (this) {
mBatteryStatsService.noteProcessAnr(app.processName, app.uid);
if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) {
app.kill("bg anr", true);
return;
}
// Set the app's notResponding state, and look up the errorReportReceiver
makeAppNotRespondingLocked(app,
activity != null ? activity.shortComponentName : null,
annotation != null ? "ANR " + annotation : "ANR",
info.toString());
// 發(fā)送SHOW_NOT_RESPONDING_MSG,準備顯示ANR對話框
Message msg = Message.obtain();
HashMap<String, Object> map = new HashMap<String, Object>();
msg.what = SHOW_NOT_RESPONDING_MSG;
msg.obj = map;
msg.arg1 = aboveSystem ? 1 : 0;
map.put("app", app);
if (activity != null) {
map.put("activity", activity);
}
mUiHandler.sendMessage(msg);
}
}
四.ANR的分析方法
由上面對ANR源碼的分析可以看出來,我們ANR可以生成traces.txt以及DropBox目錄下的ANR歷史紀錄,因此可以考慮閱讀該文件來分析,除此之外我們還有DDMS幫助我們分析ANR,這兩種方式實際上是大同小異的,只是應用的場景不同。在講ANR分析之前,我會先說Java應用的分析
1.Java線程調用分析方法
為什么要在講Android的ANR分析方法之前提到Java的分析方法呢,因為需要在解釋ANR之前稍微介紹一下線程狀態(tài)的概念,以便后面做敘述,同時也可以更方便的帶入分析的方法。JDK中有一個關鍵命令可以幫助我們分析和調試Java應用——jstack,命令的使用方法是
jstack {pid}
其中pid可以通過jps命令獲得,jps命令會列出當前系統(tǒng)中運行的所有Java虛擬機進程,比如這樣
wangchen15:~ wangchen$ jps
7249
7266 Test
7267 Jps
上面的命令可以發(fā)現(xiàn)系統(tǒng)中目前有7266和7267兩個Java虛擬機進程,此時如果我想知道當前Test進程的情況,就可以通過jstack命令來查看,jstack命令的輸出結果很簡單,它會打印出該進程中所有線程的狀態(tài)以及調用關系,甚至會給出一些簡單的分析結果
wangchen15:~ wangchen$ jstack 7266
2016-06-20 14:01:54
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.71-b01 mixed mode):
"Attach Listener" daemon prio=5 tid=0x00007fde7385d800 nid=0x3507 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"DestroyJavaVM" prio=5 tid=0x00007fde73873000 nid=0x1303 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Thread-1" prio=5 tid=0x00007fde73872800 nid=0x4a03 waiting for monitor entry [0x000000011cb30000]
java.lang.Thread.State: BLOCKED (on object monitor)
at Test.rightLeft(Test.java:48)
- waiting to lock <0x00000007d56540a0> (a Test$LeftObject)
- locked <0x00000007d5656180> (a Test$RightObject)
at Test$2.run(Test.java:68)
at java.lang.Thread.run(Thread.java:745)
"Thread-0" prio=5 tid=0x00007fde73871800 nid=0x4803 waiting for monitor entry [0x000000011ca2d000]
java.lang.Thread.State: BLOCKED (on object monitor)
at Test.leftRight(Test.java:34)
- waiting to lock <0x00000007d5656180> (a Test$RightObject)
- locked <0x00000007d56540a0> (a Test$LeftObject)
at Test$1.run(Test.java:60)
at java.lang.Thread.run(Thread.java:745)
"Service Thread" daemon prio=5 tid=0x00007fde73821000 nid=0x4403 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=5 tid=0x00007fde73035000 nid=0x4203 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=5 tid=0x00007fde7381e000 nid=0x4003 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=5 tid=0x00007fde7481d800 nid=0x300f runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=5 tid=0x00007fde73010000 nid=0x2d03 in Object.wait() [0x000000011aacb000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007d55047f8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000007d55047f8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" daemon prio=5 tid=0x00007fde7300f000 nid=0x2b03 in Object.wait() [0x000000011a9c8000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007d5504410> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000007d5504410> (a java.lang.ref.Reference$Lock)
"VM Thread" prio=5 tid=0x00007fde7300c800 nid=0x2903 runnable
"GC task thread#0 (ParallelGC)" prio=5 tid=0x00007fde74000800 nid=0x2103 runnable
"GC task thread#1 (ParallelGC)" prio=5 tid=0x00007fde7400c000 nid=0x2303 runnable
"GC task thread#2 (ParallelGC)" prio=5 tid=0x00007fde7400c800 nid=0x2503 runnable
"GC task thread#3 (ParallelGC)" prio=5 tid=0x00007fde7400d000 nid=0x2703 runnable
"VM Periodic Task Thread" prio=5 tid=0x00007fde7481e000 nid=0x4603 waiting on condition
JNI global references: 110
Found one Java-level deadlock:
=============================
"Thread-1":
waiting to lock monitor 0x00007fde73818ab8 (object 0x00000007d56540a0, a Test$LeftObject),
which is held by "Thread-0"
"Thread-0":
waiting to lock monitor 0x00007fde73819f58 (object 0x00000007d5656180, a Test$RightObject),
which is held by "Thread-1"
Java stack information for the threads listed above:
===================================================
"Thread-1":
at Test.rightLeft(Test.java:48)
- waiting to lock <0x00000007d56540a0> (a Test$LeftObject)
- locked <0x00000007d5656180> (a Test$RightObject)
at Test$2.run(Test.java:68)
at java.lang.Thread.run(Thread.java:745)
"Thread-0":
at Test.leftRight(Test.java:34)
- waiting to lock <0x00000007d5656180> (a Test$RightObject)
- locked <0x00000007d56540a0> (a Test$LeftObject)
at Test$1.run(Test.java:60)
at java.lang.Thread.run(Thread.java:745)
Found 1 deadlock.
1) Thread基礎信息
輸出種包含所有的線程,取其中的一條
"Thread-1" prio=5 tid=0x00007fde73872800 nid=0x4a03 waiting for monitor entry [0x000000011cb30000]
java.lang.Thread.State: BLOCKED (on object monitor)
at Test.rightLeft(Test.java:48)
- waiting to lock <0x00000007d56540a0> (a Test$LeftObject)
- locked <0x00000007d5656180> (a Test$RightObject)
at Test$2.run(Test.java:68)
at java.lang.Thread.run(Thread.java:745)
a) "Thread-1" prio=5 tid=0x00007fde73872800 nid=0x4a03 waiting for monitor entry [0x000000011cb30000]
首先描述了線程名是『Thread-1』,然后prio=5表示優(yōu)先級,tid表示的是線程id,nid表示native層的線程id,他們的值實際都是一個地址,后續(xù)給出了對于線程狀態(tài)的描述,waiting for monitor entry [0x000000011cb30000]這里表示該線程目前處于一個等待進入臨界區(qū)狀態(tài),該臨界區(qū)的地址是[0x000000011cb30000]
這里對線程的描述多種多樣,簡單解釋下上面出現(xiàn)的幾種狀態(tài)
- waiting on condition(等待某個事件出現(xiàn))
- waiting for monitor entry(等待進入臨界區(qū))
- runnable(正在運行)
- in Object.wait(處于等待狀態(tài))
b) java.lang.Thread.State: BLOCKED (on object monitor)
這段是描述線程狀態(tài),我們知道Java的6種線程狀態(tài)定義在Thread.java中
//Thread.java
public class Thread implements Runnable {
...
public enum State {
/**
* The thread has been created, but has never been started.
*/
NEW,
/**
* The thread may be run.
*/
RUNNABLE,
/**
* The thread is blocked and waiting for a lock.
*/
BLOCKED,
/**
* The thread is waiting.
*/
WAITING,
/**
* The thread is waiting for a specified amount of time.
*/
TIMED_WAITING,
/**
* The thread has been terminated.
*/
TERMINATED
}
...
}
由此我們知道Thread-1這個線程的狀態(tài)是BLOCKED,根據(jù)線程狀態(tài)的描述知道它在等待一個鎖
c) at xxx 調用棧
at Test.rightLeft(Test.java:48)
- waiting to lock <0x00000007d56540a0> (a Test$LeftObject)
- locked <0x00000007d5656180> (a Test$RightObject)
at Test$2.run(Test.java:68)
at java.lang.Thread.run(Thread.java:745)
這段線程的調用棧,可以看到線程在我們執(zhí)行jstack命令的時候運行到Test.java的48行,而在68行到48行之間,線程對一個Test$RightObject上了鎖,并且目前在等待Test$LeftObject鎖
2) jstack分析信息
Found one Java-level deadlock:
=============================
"Thread-1":
waiting to lock monitor 0x00007fde73818ab8 (object 0x00000007d56540a0, a Test$LeftObject),
which is held by "Thread-0"
"Thread-0":
waiting to lock monitor 0x00007fde73819f58 (object 0x00000007d5656180, a Test$RightObject),
which is held by "Thread-1"
Java stack information for the threads listed above:
===================================================
"Thread-1":
at Test.rightLeft(Test.java:48)
- waiting to lock <0x00000007d56540a0> (a Test$LeftObject)
- locked <0x00000007d5656180> (a Test$RightObject)
at Test$2.run(Test.java:68)
at java.lang.Thread.run(Thread.java:745)
"Thread-0":
at Test.leftRight(Test.java:34)
- waiting to lock <0x00000007d5656180> (a Test$RightObject)
- locked <0x00000007d56540a0> (a Test$LeftObject)
at Test$1.run(Test.java:60)
at java.lang.Thread.run(Thread.java:745)
說明中的信息很詳細,它認為我們的應用出現(xiàn)了一個Java層的死鎖,即Thread-1等待一個被Thread-0持有的鎖,Thread-0等待一個被Thread-1持有的鎖,實際上的確也是這樣,最后再來看看源代碼是不是這么回事
public class Test {
public static class LeftObject {
}
public static class RightObject {
}
private Object leftLock = new LeftObject();
private Object rightLock = new RightObject();
public void leftRight() {
synchronized (leftLock) {
try {
TimeUnit.SECONDS.sleep(3);
} catch (InterruptedException e) {
e.printStackTrace();
}
synchronized (rightLock) {
System.out.println("leftRight");
}
}
}
public void rightLeft() {
synchronized (rightLock) {
try {
TimeUnit.SECONDS.sleep(3);
} catch (InterruptedException e) {
e.printStackTrace();
}
synchronized (leftLock) {
System.out.println("leftRight");
}
}
}
public static void main(String[] args) {
final Test test = new Test();
new Thread(new Runnable() {
@Override
public void run() {
test.leftRight();
}
}).start();
new Thread(new Runnable() {
@Override
public void run() {
test.rightLeft();
}
}).start();
}
}
一目了然的故意死鎖行為。。。實際的死鎖可能比這個復雜,但是原理就是這樣了
2.DDMS分析ANR問題
有了上面的基礎,再來看看Android的ANR如何分析,Android的DDMS工具其實已經(jīng)給我們提供了一個類似于jstack命令的玩意,可以很好的讓我們調試的時候實時查看Android虛擬機的線程狀況。
1) 使用DDMS——Update Threads工具
使用DDMS的Update Threads工具可以分為如下幾步
- 選擇需要查看的進程
- 點擊Update Threads按鈕
- 在Threads視圖查看該進程的所有線程狀態(tài)
2) 閱讀Update Threads的輸出
Update Threads工具可以輸出當前進程的所有線程的狀態(tài),上半部分是線程列表,選中其中一條下半部分將展現(xiàn)出該線程當前的調用棧
a) 線程列表
上半部分種的線程列表分為好幾列,其中ID欄表示的序號,其中帶有『*』標志的是守護線程,Tid是線程號,Status表示線程狀態(tài),utime表示執(zhí)行用戶代碼的累計時間,stime表示執(zhí)行系統(tǒng)代碼的累計時間,Name表示的是線程名字。實際上utime還有stime他們具體的含義我也不是太清楚,不過這不影響我們分析問題,這里需要特別注意的是main線程啦,還有線程狀態(tài)。
b) main線程
main線程就是應用主線程啦,點擊上半部分線程列表選中main線程,我們可以發(fā)現(xiàn),絕大多數(shù)不操作應用的情況下,調用棧應該是如下樣式的
這是一個空閑等待狀態(tài),等待其他線程或者進程發(fā)送消息到主線程,再由主線程處理相應事件,具體可以看看老羅寫的一篇關于Looper和Handler的文章(http://blog.csdn.net/luoshengyang/article/details/6817933) 。 如果主線程在執(zhí)行過程中出現(xiàn)了問題,就會出現(xiàn)ANR,結合下面關于線程狀態(tài)的分析可以知道如果主線程的狀態(tài)是MONITOR一般肯定就是出現(xiàn)了ANR了。
c) 線程狀態(tài)
我們剛剛在分心Java線程狀態(tài)時明明只有6個狀態(tài),但是現(xiàn)在Android虛擬機給出的線程狀態(tài)超出了這6個的限制,這也是需要在源碼中尋找答案的,VMThread.java類中有這么一段代碼
/**
* Holds a mapping from native Thread statuses to Java one. Required for
* translating back the result of getStatus().
*/
static final Thread.State[] STATE_MAP = new Thread.State[] {
Thread.State.TERMINATED, // ZOMBIE
Thread.State.RUNNABLE, // RUNNING
Thread.State.TIMED_WAITING, // TIMED_WAIT
Thread.State.BLOCKED, // MONITOR
Thread.State.WAITING, // WAIT
Thread.State.NEW, // INITIALIZING
Thread.State.NEW, // STARTING
Thread.State.RUNNABLE, // NATIVE
Thread.State.WAITING, // VMWAIT
Thread.State.RUNNABLE // SUSPENDED
};
而且,native層的Thread.cpp中還有一段代碼
const char* dvmGetThreadStatusStr(ThreadStatus status)
{
switch (status) {
case THREAD_ZOMBIE: return "ZOMBIE";
case THREAD_RUNNING: return "RUNNABLE";
case THREAD_TIMED_WAIT: return "TIMED_WAIT";
case THREAD_MONITOR: return "MONITOR";
case THREAD_WAIT: return "WAIT";
case THREAD_INITIALIZING: return "INITIALIZING";
case THREAD_STARTING: return "STARTING";
case THREAD_NATIVE: return "NATIVE";
case THREAD_VMWAIT: return "VMWAIT";
case THREAD_SUSPENDED: return "SUSPENDED";
default: return "UNKNOWN";
}
}
由此我們可以看到Android虛擬機中有10種線程狀態(tài),對應于Java的6種線程狀態(tài),表格如下
Android線程狀態(tài)
Java線程狀態(tài)
ZOMBIE TERMINATED
RUNNABLE RUNNABLE
TIMED_WAIT TIMED_WAITING in Object.wait()
MONITOR BLOCKED(on a monitor)
WAIT WAITING
INITIALIZING NEW allocated not yet running
STARTING NEW started not yet on thread list
NATIVE RUNNABLE off in a JNI native method
VMWAIT WAITING waiting on a VM resource
SUSPENDED RUNNABLE suspended usually by GC or debugger
于是各自含義就清晰了。
3) 舉個梨子
package com.example.wangchen.androitest;
import android.os.Bundle;
import android.os.Environment;
import android.support.v7.app.AppCompatActivity;
import android.util.Log;
import android.view.View;
import android.widget.Button;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class MainActivity extends AppCompatActivity {
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
Button mBtn = (Button) findViewById(R.id.button);
mBtn.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View v) {
print();
}
});
}
public void print() {
BufferedReader bufferedReader = null;
String tmp = null;
try {
bufferedReader = new BufferedReader(new FileReader(new File(Environment.getExternalStorageDirectory() + "/test")));
while ((tmp = bufferedReader.readLine()) != null) {
Log.i("wangchen", tmp);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (bufferedReader != null) {
try {
bufferedReader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
簡單的一個Activity,點擊按鈕時將讀取文件內容并進行打印到logcat,本身沒有什么大問題,但是在該Activity的按鈕被點擊時卻出現(xiàn)了未響應的情況
通過DDMS,我們查看到當前未響應時主線程一直處于如下調用狀態(tài)
at android.util.Log.println_native(Native Method)
at android.util.Log.i(Log.java:173)
at com.example.wangchen.androitest.MainActivity.print(MainActivity.java:37)
at com.example.wangchen.androitest.MainActivity$1.onClick(MainActivity.java:26)
at android.view.View.performClick(View.java:4446)
at android.view.View$PerformClick.run(View.java:18480)
at android.os.Handler.handleCallback(Handler.java:733)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:136)
at android.app.ActivityThread.main(ActivityThread.java:5314)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:515)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:864)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:680)
at dalvik.system.NativeStart.main(Native Method)
由上面對主線程的分析可以知道,正常情況下主線程應當是處于空閑等待狀態(tài),如果長時間處于處理某一個任務時就會導致其他被發(fā)送到主線程的事件無法被及時處理,導致ANR,實際上這里的test文件有30M,完全打印是非常耗時的,導致ANR也就理所當然了,所以對于文件讀寫操作還是建議在非主線程操作。
3.traces文件分析ANR問題
我們在開發(fā)調試過程中遇到ANR問題大多是可以通過DDMS方法來分析問題原因的,但是所有的ANR問題不一定會在開發(fā)階段出現(xiàn),如果在測試或者發(fā)版之后出現(xiàn)了ANR問題,那么就需要通過traces文件來分析。根據(jù)之前的分析我們知道,traces文件位于/data/anr目錄下,即便是沒有root的手機也是可以通過adb命令將該文件pull出來,一個traces文件中包含了出現(xiàn)ANR時當前系統(tǒng)的所有活動進程的情況,其中每一個進程會包含所有線程的情況,因此文件的內容量往往比較大。但是一般造成ANR的進程會被記錄在頭一段,因此盡可能詳細的分析頭一段進程是解析traces文件的重要方法。
1) 文件內容解析
在traces文件中我們會看到很多段類似于如下文本的內容,其中每一段是一個進程,N段表示N個進程,共同描述了出現(xiàn)ANR時系統(tǒng)進程的狀況
----- pid 4280 at 2016-05-30 00:17:13 -----
Cmd line: com.quicinc.cne.CNEService
Build fingerprint: 'Xiaomi/virgo/virgo:6.0.1/MMB29M/6.3.21:user/release-keys'
ABI: 'arm'
Build type: optimized
Zygote loaded classes=4124 post zygote classes=18
Intern table: 51434 strong; 17 weak
JNI: CheckJNI is off; globals=286 (plus 277 weak)
Libraries: /system/lib/libandroid.so /system/lib/libcompiler_rt.so /system/lib/libjavacrypto.so /system/lib/libjnigraphics.so /system/lib/libmedia_jni.so /system/lib/libmiuinative.so /system/lib/libsechook.so /system/lib/libwebviewchromium_loader.so libjavacore.so (9)
Heap: 50% free, 16MB/33MB; 33690 objects
Dumping cumulative Gc timings
Total number of allocations 33690
Total bytes allocated 16MB
Total bytes freed 0B
Free memory 16MB
Free memory until GC 16MB
Free memory until OOME 111MB
Total memory 33MB
Max memory 128MB
Zygote space size 1624KB
Total mutator paused time: 0
Total time waiting for GC to complete: 0
Total GC count: 0
Total GC time: 0
Total blocking GC count: 0
Total blocking GC time: 0
suspend all histogram: Sum: 102us 99% C.I. 3us-25us Avg: 8.500us Max: 25us
DALVIK THREADS (10):
"Signal Catcher" daemon prio=5 tid=2 Runnable
| group="system" sCount=0 dsCount=0 obj=0x12c470a0 self=0xaeb8b000
| sysTid=4319 nice=0 cgrp=default sched=0/0 handle=0xb424f930
| state=R schedstat=( 111053493 34114006 33 ) utm=6 stm=5 core=0 HZ=100
| stack=0xb4153000-0xb4155000 stackSize=1014KB
| held mutexes= "mutator lock"(shared held)
native: #00 pc 00370e89 /system/lib/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, char const*, art::ArtMethod*, void*)+160)
native: #01 pc 003504f7 /system/lib/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+150)
native: #02 pc 0035a3fb /system/lib/libart.so (art::DumpCheckpoint::Run(art::Thread*)+442)
native: #03 pc 0035afb9 /system/lib/libart.so (art::ThreadList::RunCheckpoint(art::Closure*)+212)
native: #04 pc 0035b4e7 /system/lib/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+142)
native: #05 pc 0035bbf7 /system/lib/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+334)
native: #06 pc 00333d3f /system/lib/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+74)
native: #07 pc 0033b0a5 /system/lib/libart.so (art::SignalCatcher::HandleSigQuit()+928)
native: #08 pc 0033b989 /system/lib/libart.so (art::SignalCatcher::Run(void*)+340)
native: #09 pc 0003f54f /system/lib/libc.so (__pthread_start(void*)+30)
native: #10 pc 00019c2f /system/lib/libc.so (__start_thread+6)
(no managed stack frames)
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x7541b3c0 self=0xb4cf6500
| sysTid=4280 nice=-1 cgrp=default sched=0/0 handle=0xb6f5cb34
| state=S schedstat=( 52155108 81807757 159 ) utm=2 stm=3 core=0 HZ=100
| stack=0xbe121000-0xbe123000 stackSize=8MB
| held mutexes=
native: #00 pc 00040984 /system/lib/libc.so (__epoll_pwait+20)
native: #01 pc 00019f5b /system/lib/libc.so (epoll_pwait+26)
native: #02 pc 00019f69 /system/lib/libc.so (epoll_wait+6)
native: #03 pc 00012c57 /system/lib/libutils.so (android::Looper::pollInner(int)+102)
native: #04 pc 00012ed3 /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+130)
native: #05 pc 00082bed /system/lib/libandroid_runtime.so (android::NativeMessageQueue::pollOnce(_JNIEnv*, _jobject*, int)+22)
native: #06 pc 0000055d /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_os_MessageQueue_nativePollOnce__JI+96)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:323)
at android.os.Looper.loop(Looper.java:135)
at android.app.ActivityThread.main(ActivityThread.java:5435)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:735)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:625)
// ...
----- end 4280 -----
a) 進程頭部信息
----- pid 4280 at 2016-05-30 00:17:13 -----
Cmd line: com.quicinc.cne.CNEService
這里記錄了出現(xiàn)ANR時該進程的pid號以及當前時間,進程名是com.quicinc.cne.CNEService
b) 進程資源狀態(tài)信息
Build fingerprint: 'Xiaomi/virgo/virgo:6.0.1/MMB29M/6.3.21:user/release-keys'
ABI: 'arm'
Build type: optimized
Zygote loaded classes=4124 post zygote classes=18
Intern table: 51434 strong; 17 weak
JNI: CheckJNI is off; globals=286 (plus 277 weak)
Libraries: /system/lib/libandroid.so /system/lib/libcompiler_rt.so /system/lib/libjavacrypto.so /system/lib/libjnigraphics.so /system/lib/libmedia_jni.so /system/lib/libmiuinative.so /system/lib/libsechook.so /system/lib/libwebviewchromium_loader.so libjavacore.so (9)
Heap: 50% free, 16MB/33MB; 33690 objects
Dumping cumulative Gc timings
Total number of allocations 33690
Total bytes allocated 16MB
Total bytes freed 0B
Free memory 16MB
Free memory until GC 16MB
Free memory until OOME 111MB
Total memory 33MB
Max memory 128MB
Zygote space size 1624KB
Total mutator paused time: 0
Total time waiting for GC to complete: 0
Total GC count: 0
Total GC time: 0
Total blocking GC count: 0
Total blocking GC time: 0
這里打印了一大段關于硬件狀態(tài)的信息,雖然目前我還沒用到這里的信息,不過我覺得在某些時候這里的數(shù)據(jù)是會有作用的
c) 每條線程的信息
"main" prio=5 tid=1 Native // 輸出了線程名,優(yōu)先級,線程號,線程狀態(tài),帶有『deamon』字樣的線程表示守護線程,即DDMS中『*』線程
| group="main" sCount=1 dsCount=0 obj=0x7541b3c0 self=0xb4cf6500 // 輸出了線程組名,sCount被掛起次數(shù),dsCount被調試器掛起次數(shù),obj表示線程對象的地址,self表示線程本身的地址
| sysTid=4280 nice=-1 cgrp=default sched=0/0 handle=0xb6f5cb34 // sysTid是Linux下的內核線程id,nice是線程的調度優(yōu)先級,sched分別標志了線程的調度策略和優(yōu)先級,cgrp是調度屬組,handle是線程的處理函數(shù)地址。
| state=S schedstat=( 52155108 81807757 159 ) utm=2 stm=3 core=0 HZ=100 // state是調度狀態(tài);schedstat從 /proc/[pid]/task/[tid]/schedstat讀出,三個值分別表示線程在cpu上執(zhí)行的時間、線程的等待時間和線程執(zhí)行的時間片長度,有的android內核版本不支持這項信息,得到的三個值都是0;utm是線程用戶態(tài)下使用的時間值(單位是jiffies);stm是內核態(tài)下的調度時間值;core是最后執(zhí)行這個線程的cpu核的序號。
| stack=0xbe121000-0xbe123000 stackSize=8MB
| held mutexes=
native: #00 pc 00040984 /system/lib/libc.so (__epoll_pwait+20)
native: #01 pc 00019f5b /system/lib/libc.so (epoll_pwait+26)
native: #02 pc 00019f69 /system/lib/libc.so (epoll_wait+6)
native: #03 pc 00012c57 /system/lib/libutils.so (android::Looper::pollInner(int)+102)
native: #04 pc 00012ed3 /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+130)
native: #05 pc 00082bed /system/lib/libandroid_runtime.so (android::NativeMessageQueue::pollOnce(_JNIEnv*, _jobject*, int)+22)
native: #06 pc 0000055d /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_os_MessageQueue_nativePollOnce__JI+96)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:323)
at android.os.Looper.loop(Looper.java:135)
at android.app.ActivityThread.main(ActivityThread.java:5435)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:735)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:625)
該有的描述上面已經(jīng)用注釋的方式放入了,最后的部分是調用棧信息。
2) 再舉個梨子
----- pid 12838 at 2016-05-30 10:41:04 -----
Cmd line: 略
// 進程狀態(tài)信息省略
suspend all histogram: Sum: 1.456ms 99% C.I. 3us-508.799us Avg: 97.066us Max: 523us
DALVIK THREADS (19):
"Signal Catcher" daemon prio=5 tid=2 Runnable
| group="system" sCount=0 dsCount=0 obj=0x32c02100 self=0xb82f1d40
| sysTid=12843 nice=0 cgrp=bg_non_interactive sched=0/0 handle=0xb39ec930
| state=R schedstat=( 10914800 1156480 11 ) utm=0 stm=0 core=2 HZ=100
| stack=0xb38f0000-0xb38f2000 stackSize=1014KB
| held mutexes= "mutator lock"(shared held)
native: #00 pc 00371069 /system/lib/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiPKcPNS_9ArtMethodEPv+160)
native: #01 pc 003508c3 /system/lib/libart.so (_ZNK3art6Thread4DumpERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+150)
native: #02 pc 0035a5bb /system/lib/libart.so (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+442)
native: #03 pc 0035b179 /system/lib/libart.so (_ZN3art10ThreadList13RunCheckpointEPNS_7ClosureE+212)
native: #04 pc 0035b6a7 /system/lib/libart.so (_ZN3art10ThreadList4DumpERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+142)
native: #05 pc 0035bdb7 /system/lib/libart.so (_ZN3art10ThreadList14DumpForSigQuitERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+334)
native: #06 pc 00331179 /system/lib/libart.so (_ZN3art7Runtime14DumpForSigQuitERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+72)
native: #07 pc 0033b27d /system/lib/libart.so (_ZN3art13SignalCatcher13HandleSigQuitEv+928)
native: #08 pc 0033bb61 /system/lib/libart.so (_ZN3art13SignalCatcher3RunEPv+340)
native: #09 pc 00041737 /system/lib/libc.so (_ZL15__pthread_startPv+30)
native: #10 pc 00019433 /system/lib/libc.so (__start_thread+6)
(no managed stack frames)
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 obj=0x759002c0 self=0xb737fee8
| sysTid=12838 nice=-1 cgrp=bg_non_interactive sched=0/0 handle=0xb6f1eb38
| state=S schedstat=( 743081924 64813008 709 ) utm=50 stm=23 core=4 HZ=100
| stack=0xbe54e000-0xbe550000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/12838/stack)
native: #00 pc 00016aa4 /system/lib/libc.so (syscall+28)
native: #01 pc 000f739d /system/lib/libart.so (_ZN3art17ConditionVariable4WaitEPNS_6ThreadE+96)
native: #02 pc 002bcd8d /system/lib/libart.so (_ZN3art7Monitor4LockEPNS_6ThreadE+408)
native: #03 pc 002bed73 /system/lib/libart.so (_ZN3art7Monitor4WaitEPNS_6ThreadExibNS_11ThreadStateE+922)
native: #04 pc 002bfbaf /system/lib/libart.so (_ZN3art7Monitor4WaitEPNS_6ThreadEPNS_6mirror6ObjectExibNS_11ThreadStateE+142)
native: #05 pc 002d1403 /system/lib/libart.so (_ZN3artL11Object_waitEP7_JNIEnvP8_jobject+38)
native: #06 pc 0000036f /data/dalvik-cache/arm/system@framework@boot.oat (Java_java_lang_Object_wait__+74)
at java.lang.Object.wait!(Native method)
- waiting to lock <0x0520de84> (a java.lang.Object) held by thread 22
at com.xx(unavailable:-1)
- locked <0x00e3266d>
- locked <0x0520de84> (a java.lang.Object)
at com.xx.R(unavailable:-1)
at com.xx.ux(unavailable:-1)
// 其余棧略
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:684)
// 其他線程省略
"Thread-654" prio=5 tid=22 Blocked
| group="main" sCount=1 dsCount=0 obj=0x32c027c0 self=0xb83e9750
| sysTid=12891 nice=0 cgrp=bg_non_interactive sched=0/0 handle=0x9cf1c930
| state=S schedstat=( 50601200 1215760 62 ) utm=4 stm=0 core=7 HZ=100
| stack=0x9ce1a000-0x9ce1c000 stackSize=1038KB
| held mutexes=
at com.yy(unavailable:-1)
- waiting to lock <0x00e3266d> held by thread 1
at com.yy.MX(unavailable:-1)
at com.yy.run(unavailable:-1)
- locked <0x0520de84> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:833)
從traces文件種可以很明顯的看到我們的主線程處于Blcoked狀態(tài),詳細查看Blcoked的原因知道,它在等待一個被22號線程持有的對象鎖,于是我們查看tid=22的線程,可以看出這個線程的確鎖住了一個對象,該對象正是主線程正在等待上鎖的對象,那這個線程為何沒有釋放鎖呢,因為它在等一個被1號線程持有的對象鎖,因此死鎖問題導致了ANR現(xiàn)象。
五.總結
略