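If you find any mistakes in this article, corrections are welcome!
What is an ANR
In Android, if the main thread is blocked for too long and cannot respond to user input, the system raises an ANR (Application Not Responding). It usually shows up as an "application not responding" dialog that lets the user choose between force-closing the app and waiting.
Note: doing time-consuming work on the main thread does not by itself produce an ANR. The root cause of an ANR is that the application fails to respond to a user action within a certain time. An ANR is raised only when the main thread is blocked by a long operation and therefore cannot respond to the next event; if no event is waiting for a response, no ANR is raised. Put differently: time-consuming work on the main thread makes an ANR very likely.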
Types of ANR
KeyDispatch Timeout: a key or touch event receives no response within a given time. The timeout is 5 seconds and is defined in the ActivityManagerService class.
// How long we wait until we timeout on key dispatching.
static final int KEY_DISPATCHING_TIMEOUT = 5*1000;
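Broadcast Timeout: a BroadcastReceiver fails to finish processing within a given time. The timeout is 10 seconds for foreground broadcasts and 60 seconds for background broadcasts, and is defined in the ActivityManagerService class.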
// How long we allow a receiver to run before giving up on it.
static final int BROADCAST_FG_TIMEOUT = 10*1000;
static final int BROADCAST_BG_TIMEOUT = 60*1000;
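Service Timeout: a Service's lifecycle callbacks fail to finish within a given time. The timeout is 20 seconds for foreground services and 200 seconds for background services, and is defined in the ActiveServices class.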
// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;
// How long we wait for a service to finish executing.
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
ContentProvider Timeout: a ContentProvider is not published within a given time. The timeout is 10 seconds and is defined in the ActivityManagerService class.
// How long we wait for an attached process to publish its content providers
// before we decide it must be hung.
static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10*1000;
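A detailed treatment of each ANR type is beyond the scope of this article; please consult other references.
Common causes of ANR
- The app performs a long-running computation on the main thread.
- The app performs slow I/O on the main thread.
- The main thread is blocked waiting to acquire a lock.
- The main thread is deadlocked with another thread.
- The main thread makes a synchronous Binder call to another process, and that process takes a long time to return. (If we know a remote method is slow, we should avoid calling it on the main thread.)
All of these block the main thread for a long time, so it cannot respond to user input, which results in an ANR.
Troubleshooting an ANR
After an ANR occurs, Logcat prints the corresponding log, and a traces.txt file is written to the /data/anr/ directory. That file records more detailed information about the ANR, and we can export it for analysis. Below we reproduce an ANR in each of the five ways listed above and then analyze the resulting Logcat output and traces.txt file.
Test environment: Android Studio 3.6.1
Test device: HUAWEI MLA-AL10, Android version: 7.0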
1. Long-running computation on the main thread
// Sort a large array with bubble sort
private fun sortBigArray() {
    val currTime = System.currentTimeMillis()
    val random = IntArray(1000000)
    for (i in random.indices) {
        random[i] = (Math.random() * 10000000).toInt()
    }
    BubbleSort.sort(random)
    println("Elapsed: " + (System.currentTimeMillis() - currTime) + "ms")
    for (i in random.indices) {
        println(random[i].toString())
    }
}
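We tap a button to call sortBigArray(), which uses BubbleSort.sort() to sort a large array (one million elements), and then press the Back key a few times; the ANR then appears.
Let's first look at the Logcat output.
// info-level log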
2020-06-03 21:20:24.209 com.example.android.jetpackdemo I/art: Wrote stack traces to '/data/anr/traces.txt'
// error-level log
2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 15564
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 7.7 / 7.48 / 7.35
CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor
0.7% 2661/com.android.systemui: 0.6% user + 0.1% kernel / faults: 1648 minor 1 major
0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 21 minor
0.7% 24463/com.huawei.hwid.persistent: 0.6% user + 0% kernel / faults: 4650 minor 1 major
0.5% 4018/com.huawei.android.launcher: 0.4% user + 0% kernel / faults: 16025 minor 3 major
0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
0.4% 28932/com.huawei.appmarket: 0.3% user + 0% kernel / faults: 2526 minor
//...
2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
99% 15564/oid.jetpackdemo: 99% user + 0% kernel
7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel
1.8% 2862/Binder:2001_6: 0% user + 1.8% kernel
1.8% 3089/Binder:2001_7: 1.8% user + 0% kernel
5.3% 29428/adbd: 0% user + 5.3% kernel / faults: 480 minor
3.5% 29430/->transport: 0% user + 3.5% kernel
1.7% 29428/adbd: 0% user + 1.7% kernel
1.3% 53/rcuop/6: 0% user + 1.3% kernel
16% TOTAL: 14% user + 2.1% kernel + 0.2% irq + 0.2% softirq
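The log above shows that the stack traces were saved to the file /data/anr/traces.txt.
com.example.android.jetpackdemo I/art: Wrote stack traces to '/data/anr/traces.txt'
It also gives the package name of the process that hit the ANR, the class involved, the process ID, and the ANR type.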
2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 15564
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
The package name is com.example.android.jetpackdemo, the class is com.example.android.jetpackdemo.StartActivity, the process ID is 15564 (PID: 15564), and the ANR type is Input dispatching timed out.
CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor
//...
2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
99% 15564/oid.jetpackdemo: 99% user + 0% kernel
7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel
//...
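Note:
Before the ANR (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139), CPU usage was not high.
Around the time of the ANR (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389), the CPU usage of our process was very high, reaching 99%.
99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel
- 99%: CPU usage of the process (not memory usage).
- 15564/com.example.android.jetpackdemo: process ID and process name.
- 97% user + 1.8% kernel: how that usage splits between user mode and kernel mode.
The two CPU blocks show CPU usage before the ANR and around the time of the ANR, and they already give some hints. The CPU usage of our process is high, which means the process is busy. Note, however, that a busy process does not necessarily mean a busy main thread; the load may come from a background thread in the same process.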
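When there are many such CPU lines, it can help to pull the fields out programmatically. The following is a minimal sketch, not part of any Android tooling: the class name AnrCpuLine and its field names are hypothetical, and the regular expression is only an assumption based on the line format shown in the logs above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: one parsed "CPU usage" line from an ANR log. */
final class AnrCpuLine {
    // Assumed format, taken from the log excerpt above:
    // "99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor"
    private static final Pattern LINE = Pattern.compile(
            "^\\s*\\+?([0-9.]+)%\\s+(\\d+)/(.+?):\\s+([0-9.]+)%\\s+user\\s+\\+\\s+([0-9.]+)%\\s+kernel.*$");

    final double totalPercent;  // total CPU usage of the process or thread
    final int id;               // process ID (or thread ID on nested lines)
    final String name;          // process or thread name
    final double userPercent;   // share spent in user mode
    final double kernelPercent; // share spent in kernel mode

    private AnrCpuLine(double totalPercent, int id, String name,
                       double userPercent, double kernelPercent) {
        this.totalPercent = totalPercent;
        this.id = id;
        this.name = name;
        this.userPercent = userPercent;
        this.kernelPercent = kernelPercent;
    }

    /** Returns the parsed line, or null if the line does not look like a CPU usage line. */
    static AnrCpuLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new AnrCpuLine(
                Double.parseDouble(m.group(1)),
                Integer.parseInt(m.group(2)),
                m.group(3),
                Double.parseDouble(m.group(4)),
                Double.parseDouble(m.group(5)));
    }
}
```

For example, AnrCpuLine.parse(...) applied to the 99% line above yields id 15564 with 97% user and 1.8% kernel time; lines that do not match, such as the TOTAL line, return null.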
We now know which class the ANR occurred in, but how do we pinpoint the exact line of code? For that we need to analyze the traces.txt file saved when the ANR occurred.
Exporting the traces file
Use the adb command to pull the traces.txt file:
adb pull /data/anr/traces.txt traces_1.txt
/data/anr/traces.txt: 1 file pulled, 0 skipped. 28.5 MB/s (701726 bytes in 0.023s)
If you run into permission problems, export the file with the bugreport command instead; see Capture and read bug reports.
Excerpt from traces.txt
----- pid 15564 at 2020-06-03 21:20:24 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
ABI: 'arm64'
//...
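At the very top of traces.txt, the file first prints the process ID and package name of the process that hit the ANR. We can then search traces.txt for our process ID or package name.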
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes= "mutator lock"(shared held)
at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
at java.lang.reflect.Method.invoke!(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
//...
Let's first look at the thread-related part.
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes= "mutator lock"(shared held)
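Basic thread information:
- Thread name: main
- Thread priority: prio=5. The valid range is [1, 10]; see the Thread class:
// minimum priority
public final static int MIN_PRIORITY = 1;
// default priority
public final static int NORM_PRIORITY = 5;
// maximum priority
public final static int MAX_PRIORITY = 10;
- Thread ID: tid=1; 1 denotes the main thread
- Thread state: Runnable. The possible values are listed below; see the Thread.State enum:
NEW, // the thread has not started yet
RUNNABLE, // executing
BLOCKED, // waiting to acquire a lock
WAITING, // waiting for another thread to perform a particular action, such as calling Object.notify() or Object.notifyAll()
TIMED_WAITING, // waiting for up to a specified time
TERMINATED // finished executing
- Thread group name: group="main"
- Suspend count of the thread: sCount=0
- Suspend count caused by the debugger: dsCount=0
- Address of the thread's Java object: obj=0x77d21af8
- Address of the thread's own native object: self=0x7fa2ea2a00
Thread scheduling information:
- Kernel thread ID in Linux: sysTid=15564 (for the main thread this equals the process ID)
- Scheduling priority: nice=-10; for details see the article "淺析Linux線程調度" (a brief look at Linux thread scheduling)
- Scheduling group: cgrp=default
- Scheduling policy and priority: sched=0/0
- Native thread handle (the pthread handle, not a function address): handle=0x7fa6f4ba98
Thread stack information:
- Stack address range and size: stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB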
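held mutexes:
- I could not find an official document that explains held mutexes, and most explanations online only mention it in passing. As far as I can tell, it lists the internal locks of the ART runtime that the thread holds (here "mutator lock", held in shared mode), not the Java monitors used by synchronized. We can ignore it for now; doing so does not affect our troubleshooting.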
From this part of traces.txt we can see that the ANR ultimately comes from line 45 of BubbleSort.java.
at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
at java.lang.reflect.Method.invoke!(Native method)
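The header line of each thread block described above has a regular shape, so it can be parsed mechanically when scanning a large traces.txt. The sketch below is only an illustration: TraceThreadHeader is a hypothetical class, and the pattern (including the optional "daemon" keyword) is an assumption based on the header lines quoted in this article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: the first line of a thread block in traces.txt. */
final class TraceThreadHeader {
    // Assumed format: "main" prio=5 tid=1 Runnable
    // Daemon threads are assumed to carry an extra "daemon" keyword after the name.
    private static final Pattern HEADER = Pattern.compile(
            "^\"([^\"]+)\"(\\s+daemon)?\\s+prio=(\\d+)\\s+tid=(\\d+)\\s+(\\w+).*$");

    final String name;     // thread name, e.g. "main"
    final boolean daemon;  // true if the "daemon" keyword is present
    final int priority;    // Java priority, 1..10
    final int tid;         // thread ID inside the runtime
    final String state;    // e.g. Runnable, Blocked, Native

    private TraceThreadHeader(String name, boolean daemon, int priority, int tid, String state) {
        this.name = name;
        this.daemon = daemon;
        this.priority = priority;
        this.tid = tid;
        this.state = state;
    }

    /** Returns the parsed header, or null if the line is not a thread header. */
    static TraceThreadHeader parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new TraceThreadHeader(
                m.group(1),
                m.group(2) != null,
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)),
                m.group(5));
    }

    /** True if the thread is in a state that blocks progress on a Java monitor. */
    boolean isBlocked() {
        return "Blocked".equals(state);
    }
}
```

A scan over the file can then, for instance, report every thread whose header parses with isBlocked() returning true and print the stack frames that follow it.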
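2. Slow I/O on the main thread
/**
 * Copies a file. Note that read and write permissions are required.
 */
private fun doIo() {
    val prePath = Environment.getExternalStorageDirectory().path
    val file = File("${prePath}/test/View.java")
    if (file.exists()) {
        Log.d(TAG, "doIo: ${file.length()}")
        val reader = FileReader(file)
        val fileWriter = FileWriter("${prePath}/test/ViewCopy.java", true)
        for (index in 0 until 5) {
            var count: Int
            // Copy one character at a time, which is deliberately slow
            while (reader.read().also { count = it } != -1) {
                fileWriter.write(count)
            }
            try {
                // FileReader does not support reset(), so this throws and is only logged
                reader.reset()
            } catch (e: IOException) {
                Log.d(TAG, "doIo: error ${e.message}")
            }
        }
        // Release the file handles once the copy is done
        reader.close()
        fileWriter.close()
    }
}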
After calling doIo(), press the Back key several times to trigger the ANR.
Logcat output
2020-06-04 21:05:24.462 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 16295
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 7.49 / 7.45 / 7.24
CPU usage from 87491ms to 0ms ago (2020-06-04 21:03:53.035 to 2020-06-04 21:05:20.526):
7.8% 2001/system_server: 6.1% user + 1.7% kernel / faults: 34095 minor 3 major
4.2% 28932/com.huawei.appmarket: 3.7% user + 0.5% kernel / faults: 12314 minor 5 major
2.8% 2661/com.android.systemui: 2.2% user + 0.5% kernel / faults: 4222 minor 1 major
2% 412/msm-core:sampli: 0% user + 2% kernel
1.7% 24463/com.huawei.hwid.persistent: 1.5% user + 0.1% kernel / faults: 3317 minor 1 major
1.5% 607/surfaceflinger: 1% user + 0.5% kernel / faults: 24 minor
//...
2020-06-04 21:05:24.462 ? E/ActivityManager: CPU usage from 1696ms to 2226ms later (2020-06-04 21:05:22.222 to 2020-06-04 21:05:22.752):
84% 16295/com.example.android.jetpackdemo: 84% user + 0% kernel / faults: 562 minor 1 major
68% 16295/oid.jetpackdemo: 68% user + 0% kernel
12% 16317/RenderThread: 12% user + 0% kernel
1.8% 16307/HeapTaskDaemon: 1.8% user + 0% kernel
+0% 16461/DeferredSaveThr: 0% user + 0% kernel
9.1% 2001/system_server: 1.8% user + 7.3% kernel / faults: 7 minor
7.3% 2014/ActivityManager: 0% user + 7.3% kernel
3.6% 2536/Binder:2001_3: 3.6% user + 0% kernel
5.5% 607/surfaceflinger: 2.7% user + 2.7% kernel
1.3% 607/surfaceflinger: 1.3% user + 0% kernel
1.3% 658/Binder:607_1: 0% user + 1.3% kernel
4.3% 2661/com.android.systemui: 2.9% user + 1.4% kernel / faults: 26 minor
4.3% 3614/RenderThread: 2.9% user + 1.4% kernel
1.4% 2661/ndroid.systemui: 1.4% user + 0% kernel
1.3% 25/rcuop/2: 0% user + 1.3% kernel
1.3% 339/irq/171-tsens_i: 0% user + 1.3% kernel
1.5% 11851/mdss_fb0: 0% user + 1.5% kernel
1.6% 14246/kworker/u16:5: 0% user + 1.6% kernel
1.6% 16318/kworker/u16:4: 0% user + 1.6% kernel
15% TOTAL: 13% user + 1.8% kernel
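The log above again shows that when the ANR occurred, the CPU usage of our process com.example.android.jetpackdemo was fairly high, which means some thread in the process was busy. Next, let's look at the corresponding traces.txt file.
Excerpt from traces.txt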
----- pid 16295 at 2020-06-04 21:05:20 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
Search by process ID (pid 16295):
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=16295 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=R schedstat=( 16406184130 12254163 407 ) utm=1630 stm=10 core=6 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes= "mutator lock"(shared held)
native: #00 pc 0000000000478088 /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+220)
native: #01 pc 0000000000478084 /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+216)
native: #02 pc 000000000044c604 /system/lib64/libart.so (_ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+524)
native: #03 pc 0000000000463f60 /system/lib64/libart.so (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+820)
native: #04 pc 000000000044d510 /system/lib64/libart.so (_ZN3art6Thread21RunCheckpointFunctionEv+192)
native: #05 pc 00000000000ff870 /system/lib64/libart.so (_ZN3art27ScopedObjectAccessUncheckedD2Ev+576)
native: #06 pc 000000000010a764 /system/lib64/libart.so (_ZN3art8CheckJNI23GetPrimitiveArrayRegionEPKcNS_9Primitive4TypeEP7_JNIEnvP7_jarrayiiPv+1164)
native: #07 pc 0000000000022ee4 /system/lib64/libjavacore.so (???)
native: #08 pc 00000000004747a8 /data/dalvik-cache/arm64/system@framework@boot-core-libart.oat (Java_libcore_icu_NativeConverter_encode__J_3CI_3BI_3IZ+244)
at libcore.icu.NativeConverter.encode(Native method)
at java.nio.charset.CharsetEncoderICU.encodeLoop(CharsetEncoderICU.java:169)
at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:579)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:271)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
- locked <0x05b5279d> (a java.io.FileWriter)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:113)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)
at java.lang.reflect.Method.invoke!(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
Let's focus on this part:
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)
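From this part we can see that the ANR was ultimately caught inside line 194 of OutputStreamWriter.java, and that the place in our own code that is at fault is line 116 of StartActivity.kt.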
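One common way to avoid this kind of ANR is to run the copy on a worker thread and read in blocks instead of one character at a time. The following is a minimal plain-Java sketch under stated assumptions: BackgroundCopy and copyAsync are hypothetical names, UTF-8 is assumed for the file contents, and in a real app the result would still have to be delivered back to the main thread (for example with a Handler), which is not shown here.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper that copies a text file on a background thread. */
final class BackgroundCopy {
    // A single daemon worker thread dedicated to I/O, so the caller's thread never blocks on disk.
    private static final ExecutorService IO_EXECUTOR = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "io-worker");
        thread.setDaemon(true);
        return thread;
    });

    /** Starts the copy on the worker thread; the Future yields the number of chars copied. */
    static Future<Long> copyAsync(File source, File target) {
        return IO_EXECUTOR.submit(() -> {
            long copied = 0;
            char[] buffer = new char[8192];
            // try-with-resources closes both streams even if the copy fails
            try (Reader in = new BufferedReader(new InputStreamReader(
                         new FileInputStream(source), StandardCharsets.UTF_8));
                 Writer out = new BufferedWriter(new OutputStreamWriter(
                         new FileOutputStream(target), StandardCharsets.UTF_8))) {
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                    copied += read;
                }
            }
            return copied;
        });
    }
}
```

The caller should not call get() on the returned Future from the main thread, since that would block it again; the Future is returned here only to keep the sketch easy to check.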
3. The main thread is blocked waiting to acquire a lock
// The lock object
val lockedResource: Any = Any()
fun onClick(v: View) {
    when (v.id) {
        R.id.btnWaitLockedResource -> {
            LockTask().execute(arrayListOf<Int>())
            Log.d(TAG, "onClick: main thread sleeps briefly so that it does not take the lock first")
            Thread.sleep(200)
            Log.d(TAG, "onClick: main thread woke up, trying to acquire the lock")
            synchronized(lockedResource) {
                for (index in 0 until 10) {
                    Log.d(TAG, "onClick: main thread acquired the lock $index")
                }
            }
        }
    }
}
// LockTask: the background task that holds the lock
inner class LockTask : AsyncTask<MutableList<Int>, Int, Unit>() {
    override fun doInBackground(vararg params: MutableList<Int>) =
        synchronized(lockedResource) {
            val list = params[0]
            for (i in 0 until 1000000) {
                list.add((Math.random() * 10000000).toInt())
            }
            list.forEach {
                Log.d(TAG, "doInBackground: for each element is $it")
            }
        }
}
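After onClick() is called, the background thread acquires the lock first, and then the main thread tries to acquire it. We then press the Back key several times to trigger the ANR.
Logcat output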
2020-06-04 09:55:04.396 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 20008
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 8.27 / 7.73 / 7.37
CPU usage from 83152ms to 0ms ago (2020-06-04 09:53:36.842 to 2020-06-04 09:54:59.995) with 99% awake:
19% 508/logd: 15% user + 3.5% kernel / faults: 533 minor 1 major
5.5% 2001/system_server: 3.9% user + 1.5% kernel / faults: 10843 minor 7 major
4.9% 28932/com.huawei.appmarket: 4.3% user + 0.6% kernel / faults: 13003 minor 79 major
2.6% 2661/com.android.systemui: 2.2% user + 0.3% kernel / faults: 7158 minor 2 major
1.5% 607/surfaceflinger: 0.9% user + 0.6% kernel / faults: 190 minor 1 major
1.2% 24307/logcat: 0.7% user + 0.4% kernel
0.8% 11161/com.android.settings: 0.6% user + 0.1% kernel / faults: 9084 minor 20 major
0.6% 24305/logcat: 0.2% user + 0.3% kernel
0.4% 24301/fingerprint_log: 0% user + 0.4% kernel
0.3% 15363/kworker/u16:10: 0% user + 0.3% kernel
0.2% 6831/kworker/u16:5: 0% user + 0.2% kernel
0.2% 837/imonitor: 0% user + 0.1% kernel
//...
2020-06-04 09:55:04.396 ? E/ActivityManager: CPU usage from 2211ms to 2742ms later (2020-06-04 09:55:02.206 to 2020-06-04 09:55:02.737):
105% 20008/com.example.android.jetpackdemo: 92% user + 13% kernel / faults: 220 minor
99% 20096/AsyncTask #1: 86% user + 13% kernel
5.6% 20019/HeapTaskDaemon: 5.6% user + 0% kernel
103% 508/logd: 99% user + 3.7% kernel / faults: 8 minor
92% 24315/logd.reader.per: 92% user + 0% kernel
7.5% 511/logd.writer: 5.6% user + 1.8% kernel
3.7% 24314/logd.reader.per: 0% user + 3.7% kernel
1.8% 24313/logd.reader.per: 0% user + 1.8% kernel
11% 2661/com.android.systemui: 11% user + 0% kernel / faults: 52 minor
9.3% 3614/RenderThread: 7.5% user + 1.8% kernel
1.8% 2661/ndroid.systemui: 1.8% user + 0% kernel
9.3% 607/surfaceflinger: 9.3% user + 0% kernel
3.7% 607/surfaceflinger: 3.7% user + 0% kernel
1.8% 2614/Binder:607_4: 0% user + 1.8% kernel
5.6% 2001/system_server: 1.8% user + 3.7% kernel / faults: 2 minor
5.6% 2014/ActivityManager: 0% user + 5.6% kernel
3.3% 19794/adbd: 1.6% user + 1.6% kernel / faults: 147 minor
1.6% 19794/adbd: 0% user + 1.6% kernel
1.6% 19796/->transport: 0% user + 1.6% kernel
1.6% 19797/<-transport: 0% user + 1.6% kernel
3.4% 24307/logcat: 0% user + 3.4% kernel
1.3% 624/mm-pp-dpps: 0% user + 1.3% kernel
1.3% 717/ABA_THREAD: 1.3% user + 0% kernel
1.6% 18971/kworker/0:2: 0% user + 1.6% kernel
1.6% 18974/kworker/u16:0: 0% user + 1.6% kernel
1.6% 19095/mdss_fb0: 0% user + 1.6% kernel
1.7% 24301/fingerprint_log: 1.7% user + 0% kernel
1.7% 24305/logcat: 1.7% user + 0% kernel
31% TOTAL: 26% user + 4% kernel + 0.2% irq + 0.2% softirq
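The log above again shows that the CPU usage of our process is fairly high, which means some thread in the process is busy. Here the busy thread is the background thread AsyncTask #1 (99%), not the main thread. Next, let's look at the corresponding traces.txt file.
Excerpt from traces.txt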
----- pid 20008 at 2020-06-04 09:55:00 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
Search by process ID (pid 20008):
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=20008 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=S schedstat=( 278831875 7233747 156 ) utm=22 stm=5 core=0 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes=
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
- waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16
at java.lang.reflect.Method.invoke!(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
Key information
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
- waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16
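At line 61 of StartActivity, the main thread is waiting for the lock object <0x0f8c80b0>, which is an Object instance (a java.lang.Object) currently held by the thread with tid 16. Next we search traces.txt for this lock object <0x0f8c80b0>, as shown below: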
DALVIK THREADS (16):
"AsyncTask #1" prio=5 tid=16 Runnable
| group="main" sCount=0 dsCount=0 obj=0x12cd61f0 self=0x7f93187200
| sysTid=20096 nice=10 cgrp=bg_non_interactive sched=0/0 handle=0x7f84346450
| state=R schedstat=( 13814173056 6030204 1355 ) utm=1193 stm=188 core=3 HZ=100
| stack=0x7f84244000-0x7f84246000 stackSize=1037KB
| held mutexes= "mutator lock"(shared held)
at java.lang.Integer.stringSize(Integer.java:414)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:630)
at java.lang.StringBuilder.append(StringBuilder.java:220)
at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
- locked <0x0f8c80b0> (a java.lang.Object)
at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:99)
at android.os.AsyncTask$2.call(AsyncTask.java:316)
at java.util.concurrent.FutureTask.run(FutureTask.java:237)
at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:255)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
at java.lang.Thread.run(Thread.java:776)
Key information
at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
- locked <0x0f8c80b0> (a java.lang.Object)
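We can see that it is this AsyncTask thread that holds the lock object 0x0f8c80b0 (see the frame at line 107). The main thread therefore blocks because it cannot acquire the lock, which eventually causes the ANR.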
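If a thread that must stay responsive really has to share a lock with slow background work, one option is to bound how long it waits. The sketch below uses java.util.concurrent.locks.ReentrantLock.tryLock with a timeout in place of synchronized; it is a swapped-in technique rather than what the demo above does, and BoundedLockWait with its two methods is a hypothetical helper.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper: a lock whose waiters can give up after a timeout. */
final class BoundedLockWait {
    private final ReentrantLock lock = new ReentrantLock();

    /**
     * Runs the action under the lock if the lock becomes available within
     * timeoutMillis. Returns false without running the action otherwise,
     * so the caller can retry later or defer the work.
     */
    boolean runIfAvailable(Runnable action, long timeoutMillis) throws InterruptedException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Runs the action while holding the lock for its whole duration (the background side). */
    void runExclusively(Runnable action) {
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, the responsive thread would call runIfAvailable with a short timeout and skip or reschedule the work when it returns false, while the background task uses runExclusively. Keeping the critical section short is still the better fix where possible.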
4. The main thread is deadlocked with another thread
val resourceFirst = "resourceFirst"
val resourceSecond = "resourceSecond"
private fun mockDeadLock() {
    // Start a background thread
    thread(start = false) {
        synchronized(resourceSecond) {
            Log.d(TAG, "worker thread acquired lock resourceSecond")
            Thread.sleep(100)
            Log.d(TAG, "worker thread trying to acquire lock resourceFirst")
            synchronized(resourceFirst) {
                while (true) {
                    Log.d(TAG, "worker thread mockDeadLock")
                }
            }
        }
    }.start()
    // The main thread sleeps for 30 ms before acquiring its locks
    Thread.sleep(30)
    synchronized(resourceFirst) {
        Log.d(TAG, "main thread acquired lock resourceFirst")
        Log.d(TAG, "main thread trying to acquire lock resourceSecond")
        synchronized(resourceSecond) {
            Log.d(TAG, "main thread acquired lock resourceSecond")
            while (true) {
                Log.d(TAG, "main thread mockDeadLock")
            }
        }
    }
}
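The logic of the code above:
- The worker thread first acquires the lock resourceSecond and then sleeps for 100 ms, which gives the main thread time to acquire the lock resourceFirst.
- The main thread sleeps for 30 ms, acquires resourceFirst, and then tries to acquire resourceSecond. It cannot, because the worker thread already holds resourceSecond and does not release it.
- When its sleep ends, the worker thread tries to acquire resourceFirst. It cannot, because the main thread holds resourceFirst and does not release it.
- The result is a deadlock.
After calling mockDeadLock(), press the Back key several times to trigger the ANR.
Logcat output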
2020-06-04 15:07:41.246 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
PID: 13626
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 7.53 / 6.81 / 6.4
CPU usage from 177565ms to 0ms ago (2020-06-04 15:04:39.715 to 2020-06-04 15:07:37.281):
11% 2001/system_server: 7.1% user + 4.4% kernel / faults: 68219 minor 37 major
3.4% 2661/com.android.systemui: 2.8% user + 0.6% kernel / faults: 20555 minor 29 major
2% 508/logd: 0.9% user + 1.1% kernel / faults: 76 minor
1.8% 607/surfaceflinger: 1.1% user + 0.7% kernel / faults: 82 minor 1 major
0% 24463/com.huawei.hwid.persistent: 0% user + 0% kernel / faults: 7819 minor 24 major
0.9% 2823/com.huawei.systemmanager:service: 0.6% user + 0.2% kernel / faults: 13277 minor 12 major
//...
2020-06-04 15:07:41.246 ? E/ActivityManager: CPU usage from 1714ms to 2243ms later (2020-06-04 15:07:38.994 to 2020-06-04 15:07:39.523):
12% 2001/system_server: 9% user + 3.6% kernel / faults: 8 minor
10% 2014/ActivityManager: 5.4% user + 5.4% kernel
1.8% 2399/UEventObserver: 1.8% user + 0% kernel
1.5% 13652/kworker/u16:7: 0% user + 1.5% kernel
2.3% TOTAL: 1.1% user + 1.1% kernel
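The Logcat output above contains no CPU information for our process, which suggests that its CPU usage was very low. So let's move on to the traces.txt file.
Excerpt from traces.txt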
----- pid 13626 at 2020-06-04 15:07:37 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
Search by process ID (pid 13626):
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=13626 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=S schedstat=( 288564792 6939269 224 ) utm=23 stm=5 core=0 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes=
at com.example.android.jetpackdemo.StartActivity.mockDeadLock(StartActivity.kt:142)
- waiting to lock <0x0a43b5c8> (a java.lang.String) held by thread 17
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:70)
at java.lang.reflect.Method.invoke!(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
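The main thread's state is Blocked, which means it is waiting to acquire a lock. The lock it is waiting for, <0x0a43b5c8>, is a String object (a java.lang.String) held by the thread with tid 17. We then search for this lock object.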
"Thread-2" prio=5 tid=17 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12c89dc0 self=0x7f931cd000
| sysTid=13763 nice=0 cgrp=default sched=0/0 handle=0x7f84344450
| state=S schedstat=( 886406 280365 2 ) utm=0 stm=0 core=0 HZ=100
| stack=0x7f84242000-0x7f84244000 stackSize=1037KB
| held mutexes=
at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:127)
- waiting to lock <0x0ec26674> (a java.lang.String) held by thread 1
- locked <0x0a43b5c8> (a java.lang.String)
at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:21)
at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
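Thread-2 is also in the Blocked state, which means it is waiting to acquire a lock. The lock it is waiting for, <0x0ec26674>, is a String object (a java.lang.String) held by the thread with tid 1, that is, the main thread. At the same time, Thread-2 itself holds the lock object <0x0a43b5c8>.
In the end, the main thread and the worker thread Thread-2 are deadlocked, and the app stops responding.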
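The manual search above (follow "waiting to lock ... held by thread N" from one thread to the next until it loops back) can be automated. The sketch below is a hypothetical helper, not an existing tool: TraceDeadlockFinder and its regular expressions are assumptions based on the trace lines quoted in this section, and it assumes each thread waits on at most one lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds a lock-wait cycle in the text of a traces.txt file. */
final class TraceDeadlockFinder {
    // Thread header, e.g.: "main" prio=5 tid=1 Blocked
    private static final Pattern HEADER = Pattern.compile("^\"[^\"]+\".*\\btid=(\\d+)\\b.*$");
    // Wait line, e.g.: - waiting to lock <0x0a43b5c8> (a java.lang.String) held by thread 17
    private static final Pattern WAITING =
            Pattern.compile("waiting to lock <[^>]+>.*held by thread (\\d+)");

    /** Returns the tids that form a wait cycle, or an empty list if there is none. */
    static List<Integer> findCycle(String traces) {
        // waitsFor maps a waiting thread's tid to the tid of the thread holding the lock
        Map<Integer, Integer> waitsFor = new HashMap<>();
        int currentTid = -1;
        for (String raw : traces.split("\\R")) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                currentTid = Integer.parseInt(header.group(1));
                continue;
            }
            Matcher waiting = WAITING.matcher(line);
            if (currentTid != -1 && waiting.find()) {
                waitsFor.put(currentTid, Integer.parseInt(waiting.group(1)));
            }
        }
        // Follow the wait chain from each waiting thread until it ends or revisits a thread
        for (Integer start : waitsFor.keySet()) {
            List<Integer> path = new ArrayList<>();
            Set<Integer> seen = new HashSet<>();
            Integer tid = start;
            while (tid != null && seen.add(tid)) {
                path.add(tid);
                tid = waitsFor.get(tid);
            }
            if (tid != null && tid.equals(start)) {
                return path; // the chain came back to where it started: a deadlock
            }
        }
        return new ArrayList<>();
    }
}
```

Applied to the two thread blocks above, the chain is tid 1 waiting on tid 17 and tid 17 waiting on tid 1, so the returned list contains both threads. For the lock-wait case in section 3, where the holder is Runnable, the list is empty.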
5. The main thread makes a synchronous Binder call to another process that takes a long time to return
Our code reads two numbers from two EditText fields on the client, calls a server method through Binder to compute their sum, and returns the result to the client, which then shows it on screen. For the complete code, see AIDLDemo.
Client-side code (excerpt)
private IMyAidlInterface iMyAidlInterface;
private ServiceConnection conn = new ServiceConnection() {
    @Override
    public void onServiceConnected(ComponentName name, IBinder service) {
        // Obtain the Binder proxy object
        iMyAidlInterface = IMyAidlInterface.Stub.asInterface(service);
    }
    //...
};
public void onClick(View view) {
    switch (view.getId()) {
        case R.id.btn_count:
            mNum1 = Integer.parseInt(etNum1.getText().toString());
            mNum2 = Integer.parseInt(etNum2.getText().toString());
            try {
                // Synchronous Binder call on the main thread
                mTotal = iMyAidlInterface.add(mNum1, mNum2);
            } catch (RemoteException e) {
                e.printStackTrace();
                Log.e(TAG, "onClick: " + e.getMessage());
            }
            editShowResult.setText("mTotal=" + mTotal);
            break;
    }
}
Server-side code (excerpt)
public class IRemoteService extends Service {
    private static final String TAG = "IRemoteService";
    private IBinder iBinder = new IMyAidlInterface.Stub() {
        @Override
        public int add(int num1, int num2) throws RemoteException {
            Log.d(TAG, "remote method add: start sleep thread id =" + Thread.currentThread().getId()+"," +
                    "thread name = "+Thread.currentThread().getName());
            try {
                // Sleep for a while before doing the calculation
                Thread.sleep(120000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            Log.d(TAG, "remote method add: finish sleep return calculate result");
            return num1 + num2;
        }
    };
    public IRemoteService() {
    }
    @Override
    public IBinder onBind(Intent intent) {
        return iBinder;
    }
}
Note: the Binder server must be running before we start the Binder client and invoke the method.
Logcat output
2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: ANR in com.hm.aidlclient (com.hm.aidlclient/.BaseKnowledgeActivity)
PID: 18096
Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 2.)
Load: 7.55 / 7.26 / 6.87
CPU usage from 755516ms to 0ms ago (2020-06-04 15:37:07.545 to 2020-06-04 15:49:43.062) with 99% awake:
5.1% 2001/system_server: 3.5% user + 1.5% kernel / faults: 139606 minor 17 major
1.2% 508/logd: 0.5% user + 0.6% kernel / faults: 35 minor
0.9% 2661/com.android.systemui: 0.7% user + 0.1% kernel / faults: 13039 minor 4 major
0.8% 12442/adbd: 0.2% user + 0.6% kernel / faults: 23957 minor
0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 183 minor 2 major
0.6% 28932/com.huawei.appmarket: 0.5% user + 0.1% kernel / faults: 9311 minor 64 major
0.4% 24463/com.huawei.hwid.persistent: 0.3% user + 0% kernel / faults: 11607 minor 6 major
0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
0.3% 4128/com.google.android.gms: 0.2% user + 0% kernel / faults: 26970 minor 16 major
//...
2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: CPU usage from 1701ms to 2232ms later (2020-06-04 15:49:44.762 to 2020-06-04 15:49:45.293):
28% 2001/system_server: 21% user + 7.2% kernel / faults: 38 minor
16% 2010/HeapTaskDaemon: 16% user + 0% kernel
9% 2014/ActivityManager: 1.8% user + 7.2% kernel
1.8% 2001/system_server: 0% user + 1.8% kernel
1.8% 2540/NetdConnector: 1.8% user + 0% kernel
9% 607/surfaceflinger: 9% user + 0% kernel
5.4% 607/surfaceflinger: 5.4% user + 0% kernel
1.8% 658/Binder:607_1: 0% user + 1.8% kernel
1.8% 677/EventThread: 0% user + 1.8% kernel
7.1% 2661/com.android.systemui: 5.3% user + 1.7% kernel / faults: 38 minor
8.9% 3614/RenderThread: 7.1% user + 1.7% kernel
1.3% 508/logd: 1.3% user + 0% kernel
1.3% 624/mm-pp-dpps: 1.3% user + 0% kernel
2.7% 717/ABA_THREAD: 1.3% user + 1.3% kernel
1.5% 15978/mdss_fb0: 0% user + 1.5% kernel
1.6% 18228/logcat: 0% user + 1.6% kernel
8.3% TOTAL: 5.8% user + 2.5% kernel
The Logcat output contains nothing particularly useful this time, so let's move on to the traces.txt file.
Client-side information in traces.txt
----- pid 18096 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlclient
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
Search by process ID (pid 18096):
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=18096 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=S schedstat=( 464662186 22498334 359 ) utm=38 stm=8 core=0 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0x70/0x7c
kernel: binder_thread_read+0x4cc/0x13f0
kernel: binder_ioctl+0x53c/0xbcc
kernel: do_vfs_ioctl+0x570/0x5a8
kernel: SyS_ioctl+0x60/0x88
kernel: el0_svc_naked+0x24/0x28
native: #00 pc 000000000006ad6c /system/lib64/libc.so (__ioctl+4)
native: #01 pc 000000000001fa48 /system/lib64/libc.so (ioctl+144)
native: #02 pc 00000000000555a4 /system/lib64/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+260)
native: #03 pc 0000000000056388 /system/lib64/libbinder.so (_ZN7android14IPCThreadState15waitForResponseEPNS_6ParcelEPi+352)
native: #04 pc 000000000004b250 /system/lib64/libbinder.so (_ZN7android8BpBinder8transactEjRKNS_6ParcelEPS1_j+72)
native: #05 pc 0000000000103354 /system/lib64/libandroid_runtime.so (???)
native: #06 pc 0000000000b36238 /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:617)
at com.hm.aidlserver.IMyAidlInterface$Stub$Proxy.add(IMyAidlInterface.java:90)
at com.hm.aidlclient.BaseKnowledgeActivity.onClick(BaseKnowledgeActivity.java:109)
at com.hm.aidlclient.BaseKnowledgeActivity_ViewBinding$1.doClick(BaseKnowledgeActivity_ViewBinding.java:41)
at butterknife.internal.DebouncingOnClickListener.onClick(DebouncingOnClickListener.java:22)
at android.view.View.performClick(View.java:5646)
at android.view.View$PerformClick.run(View.java:22473)
at android.os.Handler.handleCallback(Handler.java:761)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:156)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
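Here we see that the Binder client's main thread is in the Native state. This state is reported by the runtime for a thread that is executing native code, and it corresponds to the Java thread state RUNNABLE. The full mapping can be found in VMThread.java. The stack shows that BinderProxy called transactNative(), a native method that ultimately invokes transact() on the server-side Binder object to perform the actual cross-process call, and the native frames show the thread sitting in IPCThreadState::waitForResponse, that is, waiting for the remote call to return. Beyond that, the client trace does not tell us why the call is slow, so let's look at the server side for clues.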
Server-side information in traces.txt
----- pid 17773 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlserver
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
Searching by process ID (pid 17773):
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
| sysTid=17773 nice=0 cgrp=default sched=0/0 handle=0x7fa6f4ba98
| state=S schedstat=( 213791882 16481247 206 ) utm=18 stm=3 core=1 HZ=100
| stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0x70/0x7c
kernel: SyS_epoll_wait+0x2d4/0x394
kernel: SyS_epoll_pwait+0xc4/0x150
kernel: el0_svc_naked+0x24/0x28
native: #00 pc 000000000006ac80 /system/lib64/libc.so (__epoll_pwait+8)
native: #01 pc 000000000001e21c /system/lib64/libc.so (epoll_pwait+64)
native: #02 pc 00000000000181d8 /vendor/lib64/libutils.so (_ZN7android6Looper9pollInnerEi+156)
native: #03 pc 000000000001808c /vendor/lib64/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+60)
native: #04 pc 00000000000f66dc /system/lib64/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+48)
native: #05 pc 0000000000b91ec0 /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_MessageQueue_nativePollOnce__JI+140)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:356)
at android.os.Looper.loop(Looper.java:138)
at android.app.ActivityThread.main(ActivityThread.java:6517)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
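The server's process ID is 17773. Its main thread offers no clues either: it is idle in MessageQueue.nativePollOnce(), waiting for the next message. Don't panic, though, because we have overlooked something: the server-side Binder object runs on a thread in the server's Binder thread pool, not on the server's main thread. So how do we find out which thread in the pool is handling the call? That is recorded in traces.txt as well.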
----- binder transactions -----
18096:18096(m.hm.aidlclient:m.hm.aidlclient) -> 17773:17788(m.hm.aidlserver:Binder:17773_2) code: 1
----- end binder transactions -----
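This means that the thread with kernel thread ID 18096 in process 18096 (the main thread, since its thread ID equals the process ID) started a cross-process call to the thread with kernel thread ID 17788 in process 17773. The thread with kernel thread ID 17788 is named Binder:17773_2, so we search traces.txt for Binder:17773_2. The search result is shown below: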
"Binder:17773_2" prio=5 tid=10 Sleeping
| group="main" sCount=1 dsCount=0 obj=0x32c064c0 self=0x7f9a624800
| sysTid=17788 nice=-10 cgrp=default sched=0/0 handle=0x7fa0fc3450
| state=S schedstat=( 3077762 6086666 14 ) utm=0 stm=0 core=6 HZ=100
| stack=0x7fa0ec9000-0x7fa0ecb000 stackSize=1005KB
| held mutexes=
at java.lang.Thread.sleep!(Native method)
- sleeping on <0x05eea4a7> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:379)
- locked <0x05eea4a7> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:321)
at com.hm.aidlserver.IRemoteService$1.add(IRemoteService.java:18)
at com.hm.aidlserver.IMyAidlInterface$Stub.onTransact(IMyAidlInterface.java:55)
at android.os.Binder.execTransact(Binder.java:565)
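Here we finally find the cause. The Binder:17773_2 thread is in the Sleeping state: line 18 of IRemoteService.java, inside the add() method of the server-side Binder object, calls Thread.sleep(), so the method does not return for a long time. The client's call therefore cannot finish, which ultimately causes the ANR.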
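The lookup performed above, from a line in the "binder transactions" section to the name of the server thread to search for, can be sketched in code. This is only a minimal sketch: the class name BinderTransactionLine and its fields are hypothetical, and the line format is inferred solely from the single sample in this article, so other Android versions or vendors may print it differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses one line of the "binder transactions" section of traces.txt. */
final class BinderTransactionLine {
    // Format assumed from the sample in this article:
    // fromPid:fromTid(fromProc:fromThread) -> toPid:toTid(toProc:toThread) code: N
    private static final Pattern LINE = Pattern.compile(
            "(\\d+):(\\d+)\\(([^:]+):(.+?)\\)\\s*->\\s*"
            + "(\\d+):(\\d+)\\(([^:]+):(.+?)\\)\\s*code:\\s*(\\d+)");

    final int fromPid;
    final int fromTid;
    final int toPid;
    final int toTid;
    final String toThreadName;
    final int code;

    private BinderTransactionLine(Matcher m) {
        fromPid = Integer.parseInt(m.group(1));
        fromTid = Integer.parseInt(m.group(2));
        toPid = Integer.parseInt(m.group(5));
        toTid = Integer.parseInt(m.group(6));
        toThreadName = m.group(8); // thread names may contain ':', e.g. "Binder:17773_2"
        code = Integer.parseInt(m.group(9));
    }

    /** Parses a line, or throws IllegalArgumentException if it does not fit the assumed format. */
    static BinderTransactionLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized binder transaction line: " + line);
        }
        return new BinderTransactionLine(m);
    }

    /** On Linux, a process's main thread has a thread ID equal to the process ID. */
    boolean callerIsMainThread() {
        return fromPid == fromTid;
    }

    public static void main(String[] args) {
        BinderTransactionLine t = parse(
                "18096:18096(m.hm.aidlclient:m.hm.aidlclient) -> "
                + "17773:17788(m.hm.aidlserver:Binder:17773_2) code: 1");
        System.out.println("caller is main thread: " + t.callerIsMainThread());
        System.out.println("search traces.txt for \"" + t.toThreadName
                + "\" (sysTid=" + t.toTid + ") in pid " + t.toPid);
    }
}
```

The parsed target thread ID matches the sysTid field in the thread's header block, which gives a second way to locate the thread if the name search fails.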
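Summary: this article walked through examples of several common causes of ANR and analyzed the corresponding logs and traces.txt files. In real-world scenarios, ANRs can have all sorts of stranger causes and be considerably harder to track down, so prevention matters most: during development, avoid blocking the main thread for long periods wherever possible.
References: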
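As an illustration of that advice for the Binder case, a slow remote call can be handed to a worker thread so the calling thread stays free. The following is a plain-Java sketch rather than the code of the app analyzed here: RemoteAdder and SlowAdder are hypothetical stand-ins for the AIDL proxy and the sleeping server, and in a real Android app the result would still have to be posted back to the main thread before touching any views.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class OffMainThreadCall {
    /** Hypothetical stand-in for the AIDL proxy; a real add() would be a blocking Binder call. */
    interface RemoteAdder {
        int add(int a, int b) throws Exception;
    }

    /** Simulates a server whose add() sleeps before answering, as in the trace above. */
    static final class SlowAdder implements RemoteAdder {
        private final long delayMs;

        SlowAdder(long delayMs) {
            this.delayMs = delayMs;
        }

        @Override
        public int add(int a, int b) throws InterruptedException {
            Thread.sleep(delayMs);
            return a + b;
        }
    }

    /** Runs the blocking call on the worker executor and returns immediately with a future. */
    static CompletableFuture<Integer> addAsync(
            RemoteAdder remote, int a, int b, ExecutorService worker) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return remote.add(a, b);
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, worker);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            CompletableFuture<Integer> result = addAsync(new SlowAdder(500), 1, 2, worker);
            long submitMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println("caller was free again after " + submitMs + " ms");
            // Waiting here is only for the demo; a UI thread would register a callback instead.
            System.out.println("result = " + result.get(5, TimeUnit.SECONDS));
        } finally {
            worker.shutdown();
        }
    }
}
```

The caller pays only the cost of submitting the task; the delay is absorbed by the worker thread, which is the property that keeps input events from timing out.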