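Common Causes of ANR on Android, with Worked Examples

Corrections are welcome if you spot any errors in this article.

What is an ANR

On Android, if the main thread is blocked for too long and cannot respond to user input, the system raises an ANR (Application Not Responding). The usual symptom is a dialog telling the user that the app is not responding and offering a choice between force-closing it and waiting.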

ANR_Dialog.png

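Note: doing time-consuming work on the main thread does not, by itself, produce an ANR. The root cause is that the app fails to respond to a user action within a certain time. An ANR is raised only when the main thread is blocked by slow work and therefore cannot handle the next event; if no event is waiting for a response, no ANR occurs. Put another way: time-consuming work on the main thread makes an ANR very likely.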

Types of ANR

KeyDispatch Timeout: a key or touch event is not handled within a given time. The timeout is 5 seconds and is defined in the ActivityManagerService class.

// How long we wait until we timeout on key dispatching.
static final int KEY_DISPATCHING_TIMEOUT = 5*1000;

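Broadcast Timeout: a BroadcastReceiver does not finish within a given time. The timeout is 10 seconds for foreground broadcasts and 60 seconds for background broadcasts; both are defined in the ActivityManagerService class.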

// How long we allow a receiver to run before giving up on it.
static final int BROADCAST_FG_TIMEOUT = 10*1000;
static final int BROADCAST_BG_TIMEOUT = 60*1000;

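Service Timeout: a Service lifecycle method does not finish within a given time. The timeout is 20 seconds for foreground services and 200 seconds for background services; both are defined in the ActiveServices class.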

// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;
// How long we wait for a service to finish executing.
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

ContentProvider Timeout: a ContentProvider is not published within a given time. The timeout is 10 seconds and is defined in the ActivityManagerService class.

// How long we wait for an attached process to publish its content providers
// before we decide it must be hung.
static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10*1000;

A detailed treatment of the ANR types is outside the scope of this article; please consult other references.

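Common causes of ANR

  1. The app performs a long-running computation on the main thread.
  2. The app performs time-consuming I/O on the main thread.
  3. The main thread is blocked waiting to acquire a lock.
  4. The main thread is deadlocked with another thread.
  5. The main thread makes a synchronous Binder call to another process, and that process takes a long time to return. (If a remote method is known to be slow, avoid calling it from the main thread.)

All of these block the main thread for a long time, so it cannot respond to user input, which leads to an ANR.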
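The idea that an ANR means "a pending event was not handled in time" can be modelled in a few lines of plain Java. The sketch below is a simplified model, not Android's actual implementation: a single-thread executor stands in for the main thread, and the hypothetical helper respondsWithin() posts a no-op event and waits for it with a timeout. All names here are illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ResponsivenessSketch {

    /** Returns true if the event loop handled a no-op event within timeoutMillis. */
    static boolean respondsWithin(ExecutorService eventLoop, long timeoutMillis)
            throws InterruptedException {
        Future<?> ping = eventLoop.submit(() -> { });
        try {
            ping.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            ping.cancel(false); // the event was still queued behind the slow task
            return false;
        } catch (ExecutionException e) {
            return true; // the event was handled, even though its handler failed
        }
    }

    /** Probes the loop once while idle and once while a task is blocking it. */
    static boolean[] demo() throws InterruptedException {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        try {
            boolean idle = respondsWithin(eventLoop, 1000);
            // Occupy the single "main" thread until we release it.
            eventLoop.submit(() -> {
                started.countDown();
                try {
                    finish.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();
            boolean busy = respondsWithin(eventLoop, 100);
            return new boolean[] {idle, busy};
        } finally {
            finish.countDown();
            eventLoop.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = demo();
        System.out.println("idle loop responds: " + result[0]);
        System.out.println("blocked loop responds: " + result[1]);
    }
}
```

In this model the blocked loop fails the probe only because an event is waiting behind the slow task, which matches the earlier note that slow work alone does not raise an ANR.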

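Investigating the cause of an ANR

After an ANR occurs, Logcat contains a corresponding log entry, and a traces.txt file is written to the /data/anr/ directory. That file records the ANR in more detail, and we can pull it off the device for analysis. Below we reproduce each of the five causes above in turn and then analyze the resulting Logcat output and traces.txt file.

Test environment: Android Studio 3.6.1
Test device: HUAWEI MLA-AL10, Android 7.0

1. Long-running computation on the main thread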

// Sort a large array with bubble sort
private fun sortBigArray() {
    val currTime = System.currentTimeMillis()
    val random = IntArray(1000000)
    for (i in random.indices) {
        random[i] = (Math.random() * 10000000).toInt()
    }
    BubbleSort.sort(random)
    println("Elapsed: " + (System.currentTimeMillis() - currTime) + " ms")
    for (i in random.indices) {
        println(random[i].toString())
    }
}

We tap a button that calls sortBigArray(), which uses BubbleSort.sort() to sort a large array (1,000,000 elements), and then press the Back key a few times. An ANR follows.
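The BubbleSort class itself is not shown in this article. Below is a minimal Java sketch of what its sort() method might look like; the implementation is an assumption, and only the class and method names come from the stack trace later in this section. In the worst case bubble sort makes about n(n-1)/2 comparisons, roughly 5×10^11 for 1,000,000 elements, which is why the main thread stays busy for so long.

```java
import java.util.Arrays;

final class BubbleSortSketch {

    /** Sorts the array in place in ascending order; O(n^2) comparisons in the worst case. */
    static void sort(int[] a) {
        for (int end = a.length - 1; end > 0; end--) {
            boolean swapped = false;
            for (int i = 0; i < end; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) {
                return; // already sorted, stop early
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 3};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```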

First, the Logcat output:

// info-level log
2020-06-03 21:20:24.209 com.example.android.jetpackdemo I/art: Wrote stack traces to '/data/anr/traces.txt'

// error-level log
2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 15564
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.7 / 7.48 / 7.35
    CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
      4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
      3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
      1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor
      0.7% 2661/com.android.systemui: 0.6% user + 0.1% kernel / faults: 1648 minor 1 major
      0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 21 minor
      0.7% 24463/com.huawei.hwid.persistent: 0.6% user + 0% kernel / faults: 4650 minor 1 major
      0.5% 4018/com.huawei.android.launcher: 0.4% user + 0% kernel / faults: 16025 minor 3 major
      0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
      0.4% 28932/com.huawei.appmarket: 0.3% user + 0% kernel / faults: 2526 minor
     //...
   2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
      99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
        99% 15564/oid.jetpackdemo: 99% user + 0% kernel
      7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
        5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
        1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel
        1.8% 2862/Binder:2001_6: 0% user + 1.8% kernel
        1.8% 3089/Binder:2001_7: 1.8% user + 0% kernel
      5.3% 29428/adbd: 0% user + 5.3% kernel / faults: 480 minor
        3.5% 29430/->transport: 0% user + 3.5% kernel
        1.7% 29428/adbd: 0% user + 1.7% kernel
      1.3% 53/rcuop/6: 0% user + 1.3% kernel
    16% TOTAL: 14% user + 2.1% kernel + 0.2% irq + 0.2% softirq

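The log above shows that the stack traces were saved to /data/anr/traces.txt:

2020-06-03 21:20:24.209 com.example.android.jetpackdemo I/art: Wrote stack traces to '/data/anr/traces.txt'

It also gives the package name of the process that hit the ANR, the component involved, the process id, and the ANR type: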

2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 15564
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)  

The package name is com.example.android.jetpackdemo, the class is com.example.android.jetpackdemo.StartActivity, the process id is 15564 (PID: 15564), and the ANR type is Input dispatching timed out.

 CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
      4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
      3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
      1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor

//...

2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
      99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
        99% 15564/oid.jetpackdemo: 99% user + 0% kernel
      7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
        5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
        1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel

//...

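Note:
Before the ANR (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139), CPU usage was low.

Around the time of the ANR (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389), our process's CPU usage was very high, at 99%.

99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel
  • 99%: CPU usage of the process (97% in user mode plus 1.8% in kernel mode).
  • 15564/com.example.android.jetpackdemo: process id and process name.

These two CPU blocks show the CPU usage before the ANR and around the time of the ANR. They already give us a hint: our process's CPU usage is high, so the process is busy. Note that a busy process does not necessarily mean a busy main thread; the load may come from a background thread in the process.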
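When there are many such lines, it can help to pull the numbers out programmatically. The following Java sketch parses one CPU usage line of the form shown above. The class name and the regular expression are my own assumptions based on the sample output in this article, not an official format specification. For the indented per-thread lines the id field is a thread id rather than a process id.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CpuLineSketch {
    // Example: "99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor"
    private static final Pattern LINE = Pattern.compile(
            "^\\s*\\+?([0-9.]+)% (\\d+)/(.+?): ([0-9.]+)% user \\+ ([0-9.]+)% kernel.*$");

    final double totalPercent;
    final int id;
    final String name;
    final double userPercent;
    final double kernelPercent;

    private CpuLineSketch(double total, int id, String name, double user, double kernel) {
        this.totalPercent = total;
        this.id = id;
        this.name = name;
        this.userPercent = user;
        this.kernelPercent = kernel;
    }

    /** Returns the parsed line, or null if the line is not a CPU usage entry. */
    static CpuLineSketch parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new CpuLineSketch(
                Double.parseDouble(m.group(1)),
                Integer.parseInt(m.group(2)),
                m.group(3),
                Double.parseDouble(m.group(4)),
                Double.parseDouble(m.group(5)));
    }

    public static void main(String[] args) {
        CpuLineSketch entry = parse(
                "      99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor");
        System.out.println(entry.id + " " + entry.name + " " + entry.totalPercent + "%");
    }
}
```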

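We now know which class the ANR occurred in, but how do we pin it down to a specific line of code? For that we need to analyze the traces.txt file saved when the ANR occurred.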

Exporting the traces file

Use adb to pull traces.txt:

adb pull /data/anr/traces.txt traces_1.txt 
/data/anr/traces.txt: 1 file pulled, 0 skipped. 28.5 MB/s (701726 bytes in 0.023s)

If you run into permission problems, export the traces with the bugreport command instead; see Capture and read bug reports.

Excerpt from traces.txt

----- pid 15564 at 2020-06-03 21:20:24 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
ABI: 'arm64'
//...

The top of traces.txt lists the process id and package name of the process that hit the ANR. We can then search traces.txt for our process id or package name.

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
  at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
//...

Let's first look at the thread-related information.

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)

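Basic thread information:

  • Thread name: main
  • Thread priority: prio=5. Priorities range over [1, 10]; see the Thread class:

// Minimum priority
public final static int MIN_PRIORITY = 1;
// Default priority
public final static int NORM_PRIORITY = 5;
// Maximum priority
public final static int MAX_PRIORITY = 10;

  • Thread id: tid=1; 1 is the main thread
  • Thread state: Runnable. The Java-level states are listed below; see the Thread.State enum. traces.txt prints the runtime's own state names, so values outside this enum, such as Native, also appear.

NEW, // not yet started
RUNNABLE, // executing
BLOCKED, // waiting to acquire a lock
WAITING, // waiting for another thread to perform a particular action, such as calling Object.notify() or Object.notifyAll()
TIMED_WAITING, // waiting for up to a specified time
TERMINATED // finished executing

  • Thread group name: group="main"
  • Suspend count of the thread: sCount=0
  • Suspend count caused by the debugger: dsCount=0
  • Address of the thread's Java object: obj=0x77d21af8
  • Address of the thread's own native object: self=0x7fa2ea2a00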

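Thread scheduling information:

  • Kernel thread id in Linux: sysTid=15564. For the main thread this is the same as the process id.
  • Scheduling priority (nice value): nice=-10; for details see the article 淺析Linux線程調度
  • Scheduling group: cgrp=default
  • Scheduling policy and priority: sched=0/0
  • Native thread handle: handle=0x7fa6f4ba98

Thread stack information:

  • Stack address range and size: stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB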

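held mutexes:

  • I could not find official documentation for held mutexes, and most explanations online mention it only in passing. It appears to list locks internal to the ART runtime that the thread holds, not Java monitors. It can be ignored here without affecting the investigation.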
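The fields described above can be extracted mechanically. The Java sketch below parses a thread header from traces.txt into a map. The class name and the regular expressions are assumptions derived from the sample dump in this article; other Android versions may print the header differently. The held mutexes line is deliberately skipped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ThreadHeaderSketch {
    // First line, e.g.: "main" prio=5 tid=1 Runnable
    private static final Pattern FIRST_LINE =
            Pattern.compile("^\"([^\"]*)\"(?: daemon)? prio=(\\d+) tid=(\\d+) (\\w+)$");
    // key=value pairs; a value is a quoted string, a parenthesised group, or a single token.
    private static final Pattern KEY_VALUE =
            Pattern.compile("(\\w+)=(\"[^\"]*\"|\\([^)]*\\)|\\S+)");

    /** Parses the header lines of one thread into field name -> value. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : lines) {
            String text = line.trim();
            Matcher first = FIRST_LINE.matcher(text);
            if (first.matches()) {
                fields.put("name", first.group(1));
                fields.put("prio", first.group(2));
                fields.put("tid", first.group(3));
                fields.put("state", first.group(4));
            } else if (text.startsWith("|")) {
                Matcher kv = KEY_VALUE.matcher(text);
                while (kv.find()) {
                    fields.put(kv.group(1), unquote(kv.group(2)));
                }
            }
        }
        return fields;
    }

    private static String unquote(String value) {
        boolean quoted = value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"");
        return quoted ? value.substring(1, value.length() - 1) : value;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse(Arrays.asList(
                "\"main\" prio=5 tid=1 Runnable",
                "  | group=\"main\" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00",
                "  | sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98"));
        System.out.println(fields);
    }
}
```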

The stack trace in traces.txt shows that the main thread was executing line 45 of BubbleSort.java when the ANR occurred:

 at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
 at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
 at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
 at java.lang.reflect.Method.invoke!(Native method)
anr_1.png

2. Time-consuming I/O on the main thread

/**
 * Copies a file character by character on the calling thread (deliberately slow).
 * Requires storage read and write permission.
 */
private fun doIo() {
    val prePath = Environment.getExternalStorageDirectory().path
    val file = File("${prePath}/test/View.java")
    if (file.exists()) {
        Log.d(TAG, "doIo: ${file.length()}")

        // Append five copies of the source file. FileReader does not support
        // reset(), so the reader is reopened for every pass. use {} closes both streams.
        FileWriter("${prePath}/test/ViewCopy.java", true).use { fileWriter ->
            repeat(5) {
                FileReader(file).use { reader ->
                    var ch: Int
                    while (reader.read().also { ch = it } != -1) {
                        fileWriter.write(ch)
                    }
                }
            }
        }
    }
}

After calling doIo(), press the Back key several times to trigger the ANR.
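For contrast, one way to keep this kind of I/O off the calling thread is to hand the copy to a worker. The sketch below is plain JVM Java and makes several assumptions: the class and method names are hypothetical, and it uses java.nio.file.Files, which on Android requires API level 26 or higher, so it would not run unchanged on the Android 7.0 test device.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BackgroundCopySketch {

    /** Starts the copy on the worker and returns at once; the future yields the copied size in bytes. */
    static Future<Long> copyAsync(ExecutorService io, Path source, Path target) {
        return io.submit(() -> {
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return Files.size(target);
        });
    }

    public static void main(String[] args)
            throws IOException, InterruptedException, ExecutionException {
        Path source = Files.createTempFile("anr-demo-src", ".txt");
        Path target = Files.createTempFile("anr-demo-dst", ".txt");
        Files.write(source, "hello".getBytes(StandardCharsets.UTF_8));

        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            Future<Long> pending = copyAsync(io, source, target);
            // The calling thread could keep handling events here.
            System.out.println("copied bytes: " + pending.get());
        } finally {
            io.shutdown();
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
        }
    }
}
```

In an Android app the result would normally be delivered back to the main thread through a callback rather than by calling get() there, since get() blocks the caller.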

Logcat output

2020-06-04 21:05:24.462 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 16295
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.49 / 7.45 / 7.24
    CPU usage from 87491ms to 0ms ago (2020-06-04 21:03:53.035 to 2020-06-04 21:05:20.526):
      7.8% 2001/system_server: 6.1% user + 1.7% kernel / faults: 34095 minor 3 major
      4.2% 28932/com.huawei.appmarket: 3.7% user + 0.5% kernel / faults: 12314 minor 5 major
      2.8% 2661/com.android.systemui: 2.2% user + 0.5% kernel / faults: 4222 minor 1 major
      2% 412/msm-core:sampli: 0% user + 2% kernel
      1.7% 24463/com.huawei.hwid.persistent: 1.5% user + 0.1% kernel / faults: 3317 minor 1 major
      1.5% 607/surfaceflinger: 1% user + 0.5% kernel / faults: 24 minor
     //...    
2020-06-04 21:05:24.462 ? E/ActivityManager: CPU usage from 1696ms to 2226ms later (2020-06-04 21:05:22.222 to 2020-06-04 21:05:22.752):
      84% 16295/com.example.android.jetpackdemo: 84% user + 0% kernel / faults: 562 minor 1 major
        68% 16295/oid.jetpackdemo: 68% user + 0% kernel
        12% 16317/RenderThread: 12% user + 0% kernel
        1.8% 16307/HeapTaskDaemon: 1.8% user + 0% kernel
       +0% 16461/DeferredSaveThr: 0% user + 0% kernel
      9.1% 2001/system_server: 1.8% user + 7.3% kernel / faults: 7 minor
        7.3% 2014/ActivityManager: 0% user + 7.3% kernel
        3.6% 2536/Binder:2001_3: 3.6% user + 0% kernel
      5.5% 607/surfaceflinger: 2.7% user + 2.7% kernel
        1.3% 607/surfaceflinger: 1.3% user + 0% kernel
        1.3% 658/Binder:607_1: 0% user + 1.3% kernel
      4.3% 2661/com.android.systemui: 2.9% user + 1.4% kernel / faults: 26 minor
        4.3% 3614/RenderThread: 2.9% user + 1.4% kernel
        1.4% 2661/ndroid.systemui: 1.4% user + 0% kernel
      1.3% 25/rcuop/2: 0% user + 1.3% kernel
      1.3% 339/irq/171-tsens_i: 0% user + 1.3% kernel
      1.5% 11851/mdss_fb0: 0% user + 1.5% kernel
      1.6% 14246/kworker/u16:5: 0% user + 1.6% kernel
      1.6% 16318/kworker/u16:4: 0% user + 1.6% kernel
    15% TOTAL: 13% user + 1.8% kernel

The log shows that when the ANR occurred, our process com.example.android.jetpackdemo had fairly high CPU usage, so some thread in the process was busy. Next, the corresponding traces.txt file.

Excerpt from traces.txt

----- pid 16295 at 2020-06-04 21:05:20 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

Search for process id 16295:

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=16295 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=R schedstat=( 16406184130 12254163 407 ) utm=1630 stm=10 core=6 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 0000000000478088  /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+220)
  native: #01 pc 0000000000478084  /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+216)
  native: #02 pc 000000000044c604  /system/lib64/libart.so (_ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+524)
  native: #03 pc 0000000000463f60  /system/lib64/libart.so (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+820)
  native: #04 pc 000000000044d510  /system/lib64/libart.so (_ZN3art6Thread21RunCheckpointFunctionEv+192)
  native: #05 pc 00000000000ff870  /system/lib64/libart.so (_ZN3art27ScopedObjectAccessUncheckedD2Ev+576)
  native: #06 pc 000000000010a764  /system/lib64/libart.so (_ZN3art8CheckJNI23GetPrimitiveArrayRegionEPKcNS_9Primitive4TypeEP7_JNIEnvP7_jarrayiiPv+1164)
  native: #07 pc 0000000000022ee4  /system/lib64/libjavacore.so (???)
  native: #08 pc 00000000004747a8  /data/dalvik-cache/arm64/system@framework@boot-core-libart.oat (Java_libcore_icu_NativeConverter_encode__J_3CI_3BI_3IZ+244)
  at libcore.icu.NativeConverter.encode(Native method)
  at java.nio.charset.CharsetEncoderICU.encodeLoop(CharsetEncoderICU.java:169)
  at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:579)
  at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:271)
  at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
  - locked <0x05b5279d> (a java.io.FileWriter)
  at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:113)
  at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
  at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

The key lines are:

at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)

The trace shows the main thread inside a write that passes through line 194 of OutputStreamWriter.java. The call in our own code is at line 116 of StartActivity.kt.

OutputStreamWriter_194.png

StartActivity_116.png

3. Main thread blocked waiting for a lock

// The lock object
val lockedResource: Any = Any()

fun onClick(v: View) {
    when (v.id) {
        R.id.btnWaitLockedResource -> {
            LockTask().execute(arrayListOf<Int>())
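            Log.d(TAG, "onClick: main thread sleeps briefly so it does not take the lock first")
            Thread.sleep(200)
            Log.d(TAG, "onClick: main thread woke up, trying to acquire the lock")
            synchronized(lockedResource) {
                for (index in 0 until 10) {
                    Log.d(TAG, "onClick: main thread acquired the lock $index")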
                }
            }
        }
    }
}


// LockTask background task
inner class LockTask : AsyncTask<MutableList<Int>, Int, Unit>() {
    override fun doInBackground(vararg params: MutableList<Int>) =
        synchronized(lockedResource) {
            val list = params[0]
            for (i in 0 until 1000000) {
                list.add((Math.random() * 10000000).toInt())
            }
            list.forEach {
                Log.d(TAG, "doInBackground: for each element is $it")
            }
        }
}

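Calling onClick() lets the background thread take the lock first; the main thread then tries to acquire the same lock. Press the Back key several times to trigger the ANR.

Logcat output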

2020-06-04 09:55:04.396 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 20008
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 8.27 / 7.73 / 7.37
    CPU usage from 83152ms to 0ms ago (2020-06-04 09:53:36.842 to 2020-06-04 09:54:59.995) with 99% awake:
      19% 508/logd: 15% user + 3.5% kernel / faults: 533 minor 1 major
      5.5% 2001/system_server: 3.9% user + 1.5% kernel / faults: 10843 minor 7 major
      4.9% 28932/com.huawei.appmarket: 4.3% user + 0.6% kernel / faults: 13003 minor 79 major
      2.6% 2661/com.android.systemui: 2.2% user + 0.3% kernel / faults: 7158 minor 2 major
      1.5% 607/surfaceflinger: 0.9% user + 0.6% kernel / faults: 190 minor 1 major
      1.2% 24307/logcat: 0.7% user + 0.4% kernel
      0.8% 11161/com.android.settings: 0.6% user + 0.1% kernel / faults: 9084 minor 20 major
      0.6% 24305/logcat: 0.2% user + 0.3% kernel
      0.4% 24301/fingerprint_log: 0% user + 0.4% kernel
      0.3% 15363/kworker/u16:10: 0% user + 0.3% kernel
      0.2% 6831/kworker/u16:5: 0% user + 0.2% kernel
      0.2% 837/imonitor: 0% user + 0.1% kernel
      //...
     2020-06-04 09:55:04.396 ? E/ActivityManager: CPU usage from 2211ms to 2742ms later (2020-06-04 09:55:02.206 to 2020-06-04 09:55:02.737):
      105% 20008/com.example.android.jetpackdemo: 92% user + 13% kernel / faults: 220 minor
        99% 20096/AsyncTask #1: 86% user + 13% kernel
        5.6% 20019/HeapTaskDaemon: 5.6% user + 0% kernel
      103% 508/logd: 99% user + 3.7% kernel / faults: 8 minor
        92% 24315/logd.reader.per: 92% user + 0% kernel
        7.5% 511/logd.writer: 5.6% user + 1.8% kernel
        3.7% 24314/logd.reader.per: 0% user + 3.7% kernel
        1.8% 24313/logd.reader.per: 0% user + 1.8% kernel
      11% 2661/com.android.systemui: 11% user + 0% kernel / faults: 52 minor
        9.3% 3614/RenderThread: 7.5% user + 1.8% kernel
        1.8% 2661/ndroid.systemui: 1.8% user + 0% kernel
      9.3% 607/surfaceflinger: 9.3% user + 0% kernel
        3.7% 607/surfaceflinger: 3.7% user + 0% kernel
        1.8% 2614/Binder:607_4: 0% user + 1.8% kernel
      5.6% 2001/system_server: 1.8% user + 3.7% kernel / faults: 2 minor
        5.6% 2014/ActivityManager: 0% user + 5.6% kernel
      3.3% 19794/adbd: 1.6% user + 1.6% kernel / faults: 147 minor
        1.6% 19794/adbd: 0% user + 1.6% kernel
        1.6% 19796/->transport: 0% user + 1.6% kernel
        1.6% 19797/<-transport: 0% user + 1.6% kernel
      3.4% 24307/logcat: 0% user + 3.4% kernel
      1.3% 624/mm-pp-dpps: 0% user + 1.3% kernel
        1.3% 717/ABA_THREAD: 1.3% user + 0% kernel
      1.6% 18971/kworker/0:2: 0% user + 1.6% kernel
      1.6% 18974/kworker/u16:0: 0% user + 1.6% kernel
      1.6% 19095/mdss_fb0: 0% user + 1.6% kernel
      1.7% 24301/fingerprint_log: 1.7% user + 0% kernel
      1.7% 24305/logcat: 1.7% user + 0% kernel
    31% TOTAL: 26% user + 4% kernel + 0.2% irq + 0.2% softirq

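Again our process shows high CPU usage, so some thread in it is busy; here the log attributes 99% to the thread AsyncTask #1. Next, the corresponding traces.txt file.

Excerpt from traces.txt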

----- pid 20008 at 2020-06-04 09:55:00 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

Search for process id 20008:

"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=20008 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 278831875 7233747 156 ) utm=22 stm=5 core=0 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
  - waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

Key lines

at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
- waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16

At line 61 of StartActivity.kt the main thread is waiting for the lock <0x0f8c80b0>, a java.lang.Object that is currently held by the thread with tid 16. Searching traces.txt for <0x0f8c80b0> turns up the following:

DALVIK THREADS (16):
"AsyncTask #1" prio=5 tid=16 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x12cd61f0 self=0x7f93187200
  | sysTid=20096 nice=10 cgrp=bg_non_interactive sched=0/0 handle=0x7f84346450
  | state=R schedstat=( 13814173056 6030204 1355 ) utm=1193 stm=188 core=3 HZ=100
  | stack=0x7f84244000-0x7f84246000 stackSize=1037KB
  | held mutexes= "mutator lock"(shared held)
  at java.lang.Integer.stringSize(Integer.java:414)
  at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:630)
  at java.lang.StringBuilder.append(StringBuilder.java:220)
  at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
  - locked <0x0f8c80b0> (a java.lang.Object)
  at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:99)
  at android.os.AsyncTask$2.call(AsyncTask.java:316)
  at java.util.concurrent.FutureTask.run(FutureTask.java:237)
  at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:255)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
  at java.lang.Thread.run(Thread.java:776)

Key lines

 at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
 - locked <0x0f8c80b0> (a java.lang.Object)

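The AsyncTask thread holds the lock <0x0f8c80b0> (the trace shows it at line 107), so the main thread blocks trying to acquire it, which eventually causes the ANR.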
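One possible way to avoid an unbounded wait like this is to use a lock that supports a timed acquire. The Java sketch below uses java.util.concurrent.locks.ReentrantLock.tryLock with a timeout instead of synchronized. It is an illustration under my own assumptions (the class, method names and timeouts are hypothetical), not a change the original demo makes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class TryLockSketch {
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the action under the lock; returns false if the lock stayed busy for the whole timeout. */
    boolean runIfAvailable(long timeoutMillis, Runnable action) throws InterruptedException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // give up instead of blocking indefinitely
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Tries once while a worker holds the lock and once after the worker released it. */
    static boolean[] demo() throws InterruptedException {
        TryLockSketch sketch = new TryLockSketch();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            sketch.lock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                sketch.lock.unlock();
            }
        });
        worker.start();
        held.await();
        boolean whileHeld = sketch.runIfAvailable(50, () -> { });
        release.countDown();
        worker.join();
        boolean afterRelease = sketch.runIfAvailable(50, () -> { });
        return new boolean[] {whileHeld, afterRelease};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = demo();
        System.out.println("acquired while worker holds the lock: " + result[0]);
        System.out.println("acquired after release: " + result[1]);
    }
}
```

Even a short timed wait still occupies the calling thread for that long, so on a main thread the timeout would have to stay small.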

AsyncTaskHeldLock.png

4. Deadlock between the main thread and another thread

val resourceFirst = "resourceFirst"
val resourceSecond = "resourceSecond"

private fun mockDeadLock() {
    // Start a background thread
    thread(start = false) {
        synchronized(resourceSecond) {
            Log.d(TAG, "worker thread acquired lock resourceSecond")
            Thread.sleep(100)
            Log.d(TAG, "worker thread trying to acquire lock resourceFirst")
            synchronized(resourceFirst) {
                while (true) {
                    Log.d(TAG, "worker thread mockDeadLock")
                }
            }
        }
    }.start()

    // The main thread sleeps for 30 ms before acquiring its first lock
    Thread.sleep(30)

    synchronized(resourceFirst) {
        Log.d(TAG, "main thread acquired lock resourceFirst")

        Log.d(TAG, "main thread trying to acquire lock resourceSecond")
        synchronized(resourceSecond) {
            Log.d(TAG, "main thread acquired lock resourceSecond")
            while (true) {
                Log.d(TAG, "main thread mockDeadLock")
            }
        }
    }
}

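The logic of this code:

  1. The worker thread acquires resourceSecond first, then sleeps for 100 ms so that the main thread has time to acquire resourceFirst.
  2. The main thread sleeps for 30 ms, acquires resourceFirst, and then tries to acquire resourceSecond. It cannot, because the worker thread holds resourceSecond and does not release it.
  3. When the worker thread wakes up it tries to acquire resourceFirst. It cannot, because the main thread holds resourceFirst and does not release it.
  4. The result is a deadlock.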
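The deadlock in the steps above depends on the two threads taking the locks in opposite orders. A common way to rule that out is to have every thread acquire the locks in the same fixed order. The Java sketch below illustrates that idea; the class, the lock objects and the counter are hypothetical and are not part of the demo app.

```java
final class LockOrderSketch {
    private static final Object FIRST = new Object();
    private static final Object SECOND = new Object();
    private static int counter = 0;

    /** Every caller takes FIRST before SECOND, so no cycle of waiting threads can form. */
    static void update() {
        synchronized (FIRST) {
            synchronized (SECOND) {
                counter++;
            }
        }
    }

    /** Runs update() from two threads and returns the final counter value. */
    static int runTwoThreads(int updatesPerThread) throws InterruptedException {
        counter = 0;
        Runnable task = () -> {
            for (int i = 0; i < updatesPerThread; i++) {
                update();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("final counter: " + runTwoThreads(1000));
    }
}
```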

After calling mockDeadLock(), press the Back key several times to trigger the ANR.

Logcat output

2020-06-04 15:07:41.246 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 13626
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.53 / 6.81 / 6.4
    CPU usage from 177565ms to 0ms ago (2020-06-04 15:04:39.715 to 2020-06-04 15:07:37.281):
      11% 2001/system_server: 7.1% user + 4.4% kernel / faults: 68219 minor 37 major
      3.4% 2661/com.android.systemui: 2.8% user + 0.6% kernel / faults: 20555 minor 29 major
      2% 508/logd: 0.9% user + 1.1% kernel / faults: 76 minor
      1.8% 607/surfaceflinger: 1.1% user + 0.7% kernel / faults: 82 minor 1 major
      0% 24463/com.huawei.hwid.persistent: 0% user + 0% kernel / faults: 7819 minor 24 major
      0.9% 2823/com.huawei.systemmanager:service: 0.6% user + 0.2% kernel / faults: 13277 minor 12 major
    //...      
2020-06-04 15:07:41.246 ? E/ActivityManager: CPU usage from 1714ms to 2243ms later (2020-06-04 15:07:38.994 to 2020-06-04 15:07:39.523):
      12% 2001/system_server: 9% user + 3.6% kernel / faults: 8 minor
        10% 2014/ActivityManager: 5.4% user + 5.4% kernel
        1.8% 2399/UEventObserver: 1.8% user + 0% kernel
      1.5% 13652/kworker/u16:7: 0% user + 1.5% kernel
    2.3% TOTAL: 1.1% user + 1.1% kernel

The Logcat output above contains no CPU entry for our process, which indicates that it was using very little CPU. Next, the traces.txt file.

Excerpt from traces.txt

----- pid 13626 at 2020-06-04 15:07:37 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

Search for process id 13626:


"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=13626 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 288564792 6939269 224 ) utm=23 stm=5 core=0 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  at com.example.android.jetpackdemo.StartActivity.mockDeadLock(StartActivity.kt:142)
  - waiting to lock <0x0a43b5c8> (a java.lang.String) held by thread 17
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:70)
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

The main thread is in the Blocked state, so it is waiting to acquire a lock. The lock it is waiting for, <0x0a43b5c8>, is a java.lang.String held by the thread with tid 17. Searching for that lock object gives:

"Thread-2" prio=5 tid=17 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x12c89dc0 self=0x7f931cd000
  | sysTid=13763 nice=0 cgrp=default sched=0/0 handle=0x7f84344450
  | state=S schedstat=( 886406 280365 2 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x7f84242000-0x7f84244000 stackSize=1037KB
  | held mutexes=
  at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:127)
  - waiting to lock <0x0ec26674> (a java.lang.String) held by thread 1
  - locked <0x0a43b5c8> (a java.lang.String)
  at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:21)
  at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

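Thread-2 is also Blocked. The lock it is waiting for, <0x0ec26674>, is a java.lang.String held by the thread with tid 1, which is the main thread. Thread-2 itself holds the lock <0x0a43b5c8>.

The main thread and the worker thread Thread-2 are therefore deadlocked, and the app stops responding.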
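The manual search above (follow "held by thread N" from one thread to the next until you return to the start) can be automated. The Java sketch below builds a wait-for graph from trace lines and looks for a cycle. The class name and the regular expressions are assumptions based on the dump format shown in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WaitGraphSketch {
    private static final Pattern HEADER =
            Pattern.compile("^\"[^\"]*\".*\\btid=(\\d+)\\b.*$");
    private static final Pattern WAITING =
            Pattern.compile("^\\s*- waiting to lock <0x[0-9a-f]+> .* held by thread (\\d+)\\s*$");

    /** Maps each waiting thread's tid to the tid of the thread holding the lock it wants. */
    static Map<Integer, Integer> waitsFor(List<String> traceLines) {
        Map<Integer, Integer> edges = new HashMap<>();
        int current = -1;
        for (String line : traceLines) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = Integer.parseInt(header.group(1));
                continue;
            }
            Matcher waiting = WAITING.matcher(line);
            if (current != -1 && waiting.matches()) {
                edges.put(current, Integer.parseInt(waiting.group(1)));
            }
        }
        return edges;
    }

    /** Follows the wait-for edges from start; returns the tids on a cycle, or an empty list. */
    static List<Integer> findCycle(Map<Integer, Integer> edges, int start) {
        List<Integer> path = new ArrayList<>();
        Integer tid = start;
        while (tid != null && !path.contains(tid)) {
            path.add(tid);
            tid = edges.get(tid);
        }
        if (tid == null) {
            return Collections.emptyList(); // the chain ends at a thread that is not waiting
        }
        return new ArrayList<>(path.subList(path.indexOf(tid), path.size()));
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
                "\"main\" prio=5 tid=1 Blocked",
                "  - waiting to lock <0x0a43b5c8> (a java.lang.String) held by thread 17",
                "\"Thread-2\" prio=5 tid=17 Blocked",
                "  - waiting to lock <0x0ec26674> (a java.lang.String) held by thread 1",
                "  - locked <0x0a43b5c8> (a java.lang.String)");
        System.out.println("deadlocked tids: " + findCycle(waitsFor(trace), 1));
    }
}
```

An empty result means the chain ends at a thread that is not itself waiting, as in the lock-wait example of section 3.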

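5. Main thread makes a synchronous Binder call to another process that takes a long time to return

The code reads two numbers from two EditText fields on the client, calls a method on the server through Binder to add them, and returns the sum to the client, which then displays it on screen. For the full code, see AIDLDemo.

Client code (excerpt)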

private IMyAidlInterface iMyAidlInterface;
    
private ServiceConnection conn = new ServiceConnection() {
    @Override
    public void onServiceConnected(ComponentName name, IBinder service) {
         // Obtain the Binder proxy object
         iMyAidlInterface = IMyAidlInterface.Stub.asInterface(service);
    }
    //...
};
    
public void onClick(View view) {
    switch (view.getId()) {
        case R.id.btn_count:
            mNum1 = Integer.parseInt(etNum1.getText().toString());
            mNum2 = Integer.parseInt(etNum2.getText().toString());
            try {
                // Synchronous Binder call on the main thread
                mTotal = iMyAidlInterface.add(mNum1, mNum2);
            } catch (RemoteException e) {
                e.printStackTrace();
                Log.e(TAG, "onClick: " + e.getMessage());
            }
            editShowResult.setText("mTotal=" + mTotal);
            break;
       }
}

Server code (excerpt)

public class IRemoteService extends Service {

    private static final String TAG = "IRemoteService";

    private IBinder iBinder = new IMyAidlInterface.Stub() {
        @Override
        public int add(int num1, int num2) throws RemoteException {
            Log.d(TAG, "remote method add: start sleep thread id =" + Thread.currentThread().getId()+"," +
                    "thread name = "+Thread.currentThread().getName());
            try {
                // Sleep for a while before computing the result
                Thread.sleep(120000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            Log.d(TAG, "remote method add: finish sleep return calculate result");
            return num1 + num2;
        }
    };

    public IRemoteService() {
    }

    @Override
    public IBinder onBind(Intent intent) {
        return iBinder;
    }
}

Note: start the Binder server first, then run the Binder client and invoke the method.
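For comparison, here is a rough sketch of making a slow remote call without blocking the caller. It is plain JVM Java under stated assumptions: the Adder interface is a hypothetical stand-in for the AIDL interface, the 200 ms sleep stands in for the slow server, and in a real Android app the callback would have to be posted back to the main thread before touching any views.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncRemoteCallSketch {

    /** Hypothetical stand-in for a blocking remote interface. */
    interface Adder {
        int add(int a, int b) throws Exception;
    }

    /** Runs the blocking call on the worker; the caller gets a future immediately. */
    static CompletableFuture<Integer> addAsync(Executor worker, Adder remote, int a, int b) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return remote.add(a, b);
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, worker);
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Adder slowRemote = (a, b) -> {
            Thread.sleep(200); // simulate a slow server
            return a + b;
        };
        CompletableFuture<Void> shown = addAsync(worker, slowRemote, 2, 3)
                .thenAccept(total -> System.out.println("mTotal=" + total));
        System.out.println("caller is not blocked while the call is in flight");
        shown.join(); // only so that the demo waits before exiting
        worker.shutdown();
    }
}
```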

Logcat output

2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: ANR in com.hm.aidlclient (com.hm.aidlclient/.BaseKnowledgeActivity)
    PID: 18096
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.55 / 7.26 / 6.87
    CPU usage from 755516ms to 0ms ago (2020-06-04 15:37:07.545 to 2020-06-04 15:49:43.062) with 99% awake:
      5.1% 2001/system_server: 3.5% user + 1.5% kernel / faults: 139606 minor 17 major
      1.2% 508/logd: 0.5% user + 0.6% kernel / faults: 35 minor
      0.9% 2661/com.android.systemui: 0.7% user + 0.1% kernel / faults: 13039 minor 4 major
      0.8% 12442/adbd: 0.2% user + 0.6% kernel / faults: 23957 minor
      0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 183 minor 2 major
      0.6% 28932/com.huawei.appmarket: 0.5% user + 0.1% kernel / faults: 9311 minor 64 major
      0.4% 24463/com.huawei.hwid.persistent: 0.3% user + 0% kernel / faults: 11607 minor 6 major
      0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
      0.3% 4128/com.google.android.gms: 0.2% user + 0% kernel / faults: 26970 minor 16 major
      //...
2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: CPU usage from 1701ms to 2232ms later (2020-06-04 15:49:44.762 to 2020-06-04 15:49:45.293):
      28% 2001/system_server: 21% user + 7.2% kernel / faults: 38 minor
        16% 2010/HeapTaskDaemon: 16% user + 0% kernel
        9% 2014/ActivityManager: 1.8% user + 7.2% kernel
        1.8% 2001/system_server: 0% user + 1.8% kernel
        1.8% 2540/NetdConnector: 1.8% user + 0% kernel
      9% 607/surfaceflinger: 9% user + 0% kernel
        5.4% 607/surfaceflinger: 5.4% user + 0% kernel
        1.8% 658/Binder:607_1: 0% user + 1.8% kernel
        1.8% 677/EventThread: 0% user + 1.8% kernel
      7.1% 2661/com.android.systemui: 5.3% user + 1.7% kernel / faults: 38 minor
        8.9% 3614/RenderThread: 7.1% user + 1.7% kernel
      1.3% 508/logd: 1.3% user + 0% kernel
      1.3% 624/mm-pp-dpps: 1.3% user + 0% kernel
        2.7% 717/ABA_THREAD: 1.3% user + 1.3% kernel
      1.5% 15978/mdss_fb0: 0% user + 1.5% kernel
      1.6% 18228/logcat: 0% user + 1.6% kernel
    8.3% TOTAL: 5.8% user + 2.5% kernel

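The Logcat output only tells us that input dispatching timed out; it does not show what was blocking the main thread. So let's move on to the traces.txt file.

Client-side information in traces.txt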

----- pid 18096 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlclient
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

Search by process ID (pid 18096):

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=18096 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 464662186 22498334 359 ) utm=38 stm=8 core=0 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x70/0x7c
  kernel: binder_thread_read+0x4cc/0x13f0
  kernel: binder_ioctl+0x53c/0xbcc
  kernel: do_vfs_ioctl+0x570/0x5a8
  kernel: SyS_ioctl+0x60/0x88
  kernel: el0_svc_naked+0x24/0x28
  native: #00 pc 000000000006ad6c  /system/lib64/libc.so (__ioctl+4)
  native: #01 pc 000000000001fa48  /system/lib64/libc.so (ioctl+144)
  native: #02 pc 00000000000555a4  /system/lib64/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+260)
  native: #03 pc 0000000000056388  /system/lib64/libbinder.so (_ZN7android14IPCThreadState15waitForResponseEPNS_6ParcelEPi+352)
  native: #04 pc 000000000004b250  /system/lib64/libbinder.so (_ZN7android8BpBinder8transactEjRKNS_6ParcelEPS1_j+72)
  native: #05 pc 0000000000103354  /system/lib64/libandroid_runtime.so (???)
  native: #06 pc 0000000000b36238  /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:617)
  at com.hm.aidlserver.IMyAidlInterface$Stub$Proxy.add(IMyAidlInterface.java:90)
  at com.hm.aidlclient.BaseKnowledgeActivity.onClick(BaseKnowledgeActivity.java:109)
  at com.hm.aidlclient.BaseKnowledgeActivity_ViewBinding$1.doClick(BaseKnowledgeActivity_ViewBinding.java:41)
  at butterknife.internal.DebouncingOnClickListener.onClick(DebouncingOnClickListener.java:22)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

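Here the main thread of the Binder client is in the Native state. This is a native thread state and corresponds to the RUNNABLE state of a Java thread; the full mapping can be found in VMThread.java (see the reference list). From the stack we can only see that BinderProxy called transactNative(), a native method that ends up invoking onTransact() on the server-side Binder object, which is where the actual cross-process call is carried out. Beyond that, the client trace contains nothing useful, so next we look at the server side for clues.

Server-side information in traces.txt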

----- pid 17773 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlserver
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

Search by process ID (pid 17773):

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=17773 nice=0 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 213791882 16481247 206 ) utm=18 stm=3 core=1 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x70/0x7c
  kernel: SyS_epoll_wait+0x2d4/0x394
  kernel: SyS_epoll_pwait+0xc4/0x150
  kernel: el0_svc_naked+0x24/0x28
  native: #00 pc 000000000006ac80  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 000000000001e21c  /system/lib64/libc.so (epoll_pwait+64)
  native: #02 pc 00000000000181d8  /vendor/lib64/libutils.so (_ZN7android6Looper9pollInnerEi+156)
  native: #03 pc 000000000001808c  /vendor/lib64/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+60)
  native: #04 pc 00000000000f66dc  /system/lib64/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+48)
  native: #05 pc 0000000000b91ec0  /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_MessageQueue_nativePollOnce__JI+140)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:356)
  at android.os.Looper.loop(Looper.java:138)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

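The server process is pid 17773. Its main thread is simply idle in the message loop (MessageQueue.nativePollOnce), so it offers no clue either. Don't panic: what we overlooked is that the server-side Binder object runs on a thread from the server's Binder thread pool, not on the main thread. How do we find out which pool thread handled the call? traces.txt records that as well: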

----- binder transactions -----
18096:18096(m.hm.aidlclient:m.hm.aidlclient) -> 17773:17788(m.hm.aidlserver:Binder:17773_2) code: 1

----- end binder transactions -----

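The entry above means that the thread with kernel thread ID 18096 in process 18096 (that is, the client's main thread) issued a cross-process call to the thread with kernel thread ID 17788 in process 17773. Thread 17788 is named Binder:17773_2, so we search traces.txt for Binder:17773_2. The result is: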

"Binder:17773_2" prio=5 tid=10 Sleeping
  | group="main" sCount=1 dsCount=0 obj=0x32c064c0 self=0x7f9a624800
  | sysTid=17788 nice=-10 cgrp=default sched=0/0 handle=0x7fa0fc3450
  | state=S schedstat=( 3077762 6086666 14 ) utm=0 stm=0 core=6 HZ=100
  | stack=0x7fa0ec9000-0x7fa0ecb000 stackSize=1005KB
  | held mutexes=
  at java.lang.Thread.sleep!(Native method)
  - sleeping on <0x05eea4a7> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:379)
  - locked <0x05eea4a7> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:321)
  at com.hm.aidlserver.IRemoteService$1.add(IRemoteService.java:18)
  at com.hm.aidlserver.IMyAidlInterface$Stub.onTransact(IMyAidlInterface.java:55)
  at android.os.Binder.execTransact(Binder.java:565)

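Here we finally find the cause. Binder:17773_2 is in the Sleeping state: the add() method of the server-side Binder object calls Thread.sleep() at line 18 of IRemoteService.java and does not return for a long time (120 seconds in the server code above). The client's call cannot complete until add() returns, so the client's main thread stays blocked, which results in the ANR.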
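The manual lookup above (read the "binder transactions" entry, then find the target thread's header) can be sketched in plain Java. This is only an illustration: BinderTraceLookup, target() and stateOf() are hypothetical names, and the two regular expressions assume the line formats look exactly like the excerpts from this particular Android 7.0 device. Other Android versions may format these lines differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the Android SDK or of the demo project. */
final class BinderTraceLookup {
    // Assumed format: "fromPid:fromTid(proc:thread) -> toPid:toTid(proc:thread) code: N"
    private static final Pattern TRANSACTION = Pattern.compile(
            "^(\\d+):(\\d+)\\(.*?\\)\\s*->\\s*(\\d+):(\\d+)\\(([^:]*):(.*)\\)\\s*code:\\s*\\d+$");

    // Assumed format: "name" [daemon] prio=N tid=N State
    private static final Pattern THREAD_HEADER = Pattern.compile(
            "^\"(.+)\"(?: daemon)? prio=\\d+ tid=\\d+ (\\w+)$");

    /** Returns {toPid, toTid, toThreadName} for one transaction line, or null if it does not match. */
    static String[] target(String transactionLine) {
        Matcher m = TRANSACTION.matcher(transactionLine.trim());
        return m.matches() ? new String[] {m.group(3), m.group(4), m.group(6)} : null;
    }

    /**
     * Returns the state (e.g. "Sleeping") of the named thread, or null if it is not found.
     * Pass only the section of traces.txt that belongs to the target pid, because
     * thread names such as "main" repeat in every process.
     */
    static String stateOf(String processTraces, String threadName) {
        for (String line : processTraces.split("\\R")) {
            Matcher m = THREAD_HEADER.matcher(line.trim());
            if (m.matches() && m.group(1).equals(threadName)) {
                return m.group(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String line = "18096:18096(m.hm.aidlclient:m.hm.aidlclient) -> "
                + "17773:17788(m.hm.aidlserver:Binder:17773_2) code: 1";
        String serverTraces = "\"main\" prio=5 tid=1 Native\n"
                + "  | group=\"main\" sCount=1 dsCount=0\n"
                + "\"Binder:17773_2\" prio=5 tid=10 Sleeping\n"
                + "  at java.lang.Thread.sleep!(Native method)\n";
        String[] t = target(line);
        // prints: Binder:17773_2 (sysTid 17788) is Sleeping
        System.out.println(t[2] + " (sysTid " + t[1] + ") is " + stateOf(serverTraces, t[2]));
    }
}
```

The helper only automates the two searches done by hand; reading the stack frames under the matched header is still what reveals the Thread.sleep() call.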

binder_anr.png

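Summary: this article walked through examples of several common causes of ANR and analyzed the corresponding logs and traces.txt files. In real-world scenarios ANRs can have all sorts of odd causes, and tracking them down can be far more complicated. The most important thing is therefore prevention: during development, avoid blocking the main thread for long periods wherever possible.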
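As one way of applying this advice to the Binder case, the sketch below moves a slow synchronous call onto a worker thread so that the caller returns immediately. It is plain Java and makes several assumptions: RemoteAdder is a hypothetical stand-in for the generated IMyAidlInterface proxy (whose add() throws RemoteException), and OffMainThreadCall and addAsync() are names invented for this illustration. It is not the demo project's code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the AIDL proxy used by the client. */
interface RemoteAdder {
    int add(int a, int b) throws Exception;
}

final class OffMainThreadCall {
    // A single daemon worker thread that performs the blocking calls.
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "binder-call-worker");
        t.setDaemon(true);
        return t;
    });

    /** Starts the (possibly slow) call on the worker thread and returns at once. */
    static CompletableFuture<Integer> addAsync(RemoteAdder remote, int a, int b) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        WORKER.execute(() -> {
            try {
                result.complete(remote.add(a, b));
            } catch (Exception e) {
                // Surface the failure to whoever consumes the future.
                result.completeExceptionally(e);
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Simulates a remote method that takes a while to return.
        RemoteAdder slow = (a, b) -> {
            Thread.sleep(200);
            return a + b;
        };
        long start = System.nanoTime();
        CompletableFuture<Integer> future = addAsync(slow, 1, 2);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("caller returned after " + elapsedMs + " ms");
        System.out.println("mTotal=" + future.get(5, TimeUnit.SECONDS));
    }
}
```

In an Android client the result would still have to be delivered back to the main thread before touching views such as editShowResult, for example with Activity.runOnUiThread(); that step is left out here to keep the sketch runnable outside Android.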

References:

  1. Capture and read bug reports
  2. Advanced Android: locating and resolving ANRs
  3. ANR
  4. Android performance optimization (7): do you really understand ANR?
  5. Android application ANR analysis
  6. VMThread.java
  7. Android Binder study notes (1)
  8. AIDLDemo