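Since Android 4.0, the system renders views with hardware acceleration enabled by default. My earlier post, "A beginner's guide to understanding Android hardware acceleration", described a simple model of hardware acceleration. It focused on the first half of the pipeline, though, and said little about how OpenGL and the GPU actually process the data. OpenGL's main jobs here are Surface composition and the rendering of graphics and images. This article briefly covers the second half of the model, which also helps in understanding View rendering and in making sense of the somewhat mysterious GPU rendering profile bars.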
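One concept needs to be clear first: OpenGL only defines a standard API and its calling rules. Each hardware platform has its own implementation (drivers and so on), and that code is generally closed source. This article is mainly based on Android's libagl (6.0), a software implementation of OpenGL shipped as a shared library in Android. It compares libagl with call stacks captured by Systrace on real devices (which use the hardware OpenGL implementation supplied by the GPU vendor) in order to infer how libhgl (hardware OpenGL) works. For an Android app, GPU-based hardware-accelerated drawing can be divided into the following stages: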
- Stage 1: the app builds the commands and data needed for OpenGL rendering on the UI thread.
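- Stage 2: the CPU uploads the data to the GPU (by sharing or copying it). PCs usually have dedicated video memory, whereas embedded devices such as ARM-based phones usually share memory between the GPU and the CPU.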
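- Stage 3: the GPU is told to render. On real devices the CPU generally does not block until the GPU finishes, since that would be inefficient; it returns as soon as the request is issued and moves on to other work. In principle it can block, and glFinish exists for exactly that (implementations differ between GPU vendors; the one in the Android source is a software implementation and is for reference only). The Fence mechanism helps synchronize the GPU and the CPU.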
- Stage 4: swapBuffers, then notify SurfaceFlinger to composite the layers.
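- Stage 5: SurfaceFlinger starts compositing the layers. If a previously submitted GPU rendering task has not finished, it waits for the GPU (via a Fence) before compositing. Composition also relies on the GPU, but that is a separate task.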
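To make stages 3 to 5 concrete, here is a small C++ sketch of the "submit without waiting, wait on a fence before compositing" idea using only the standard library. It is not Android code: `Fence`, `QueuedBuffer`, `submitToGpu` and `composite` are hypothetical names, a `std::shared_future` stands in for a sync fence, and a worker thread stands in for the GPU.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for an Android sync fence: a one-shot "work finished" signal.
using Fence = std::shared_future<void>;

// What the producer hands to the compositor in stage 4: the buffer plus its fence.
struct QueuedBuffer {
    std::vector<int>* pixels;  // stands in for the GraphicBuffer contents
    Fence acquireFence;        // signaled once the "GPU" has finished writing
};

// Stages 2-3: start asynchronous "GPU" work and return immediately with a fence.
// The caller must keep `buffer` alive and join `gpu` later.
QueuedBuffer submitToGpu(std::vector<int>& buffer, int color, std::thread& gpu) {
    auto done = std::make_shared<std::promise<void>>();
    Fence fence = done->get_future().share();
    gpu = std::thread([&buffer, color, done] {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));  // pretend to render
        for (int& px : buffer) px = color;
        done->set_value();  // signal the fence
    });
    return QueuedBuffer{&buffer, fence};
}

// Stage 5: the "compositor" waits on the fence before reading the buffer.
int composite(const QueuedBuffer& qb) {
    qb.acquireFence.wait();
    int sum = 0;
    for (int px : *qb.pixels) sum += px;
    return sum;
}
```

The producer never waits inside `submitToGpu`; the only wait is in `composite`, which mirrors SurfaceFlinger waiting on the fence before it reads the buffer.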
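Stage 1 mainly builds the DrawOp tree (which wraps the OpenGL rendering commands) and pre-processes and groups similar commands so that the GPU can handle them more efficiently. This stage is mostly CPU work: its early part runs on the UI thread and its later part on the RenderThread. Stage 2 runs mainly on the RenderThread: the CPU synchronizes (shares) the data with the GPU and then tells the GPU to render. Note that the CPU normally does not block until the GPU has finished rendering; it returns as soon as the request has been issued. It only blocks when the GPU is so busy that it cannot respond to the CPU's request in time. After returning, the CPU immediately submits the GraphicBuffer to SurfaceFlinger and asks it to composite, even though the GPU may not have finished rendering the image yet. This requires synchronization, which Android handles with the Fence mechanism: before compositing, SurfaceFlinger checks the Fence, and if the GPU has not finished, it waits. Once the GPU is done, SurfaceFlinger composites and submits the result for display, which completes the rendering and display of a frame. A simple diagram: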
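The construction and optimization of the DrawOp tree were covered earlier. This article mainly analyzes how the GPU carries out the OpenGL rendering, which happens mostly on the RenderThread: it uses the OpenGL API to hand rendering work to the GPU.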
Initializing the Android OpenGL environment
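When using OpenGL, you generally first obtain an OpenGL configuration and then build a rendering environment for it. You must create an OpenGL context, which can be seen as the embodiment of OpenGL: without a context there is no OpenGL environment. You also need a drawing canvas, the GL surface. In Android these are abstracted as EGLContext and EGLSurface, for example: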
private void initGL() {
mEgl = (EGL10) EGLContext.getEGL();
// get the display target
mEglDisplay = mEgl.eglGetDisplay(EGL10.EGL_DEFAULT_DISPLAY);
// choose a config
mEglConfig = chooseEglConfig();
...
// create the context
mEglContext = createContext(mEgl, mEglDisplay, mEglConfig);
...
// create the drawing surface
mEglSurface = mEgl.eglCreateWindowSurface(mEglDisplay, mEglConfig, mSurface, null);
}
How does the app side of Android set up an OpenGL environment for each window? When a window is added, setView of its ViewRootImpl is called:
public void setView(View view, WindowManager.LayoutParams attrs, View panelParentView) {
synchronized (this) {
...
enableHardwareAcceleration(attrs);
}
}
setView calls enableHardwareAcceleration, which sets up the hardware-accelerated OpenGL environment:
private void enableHardwareAcceleration(WindowManager.LayoutParams attrs) {
mAttachInfo.mHardwareAccelerated = false;
mAttachInfo.mHardwareAccelerationRequested = false;
...
final boolean hardwareAccelerated =
(attrs.flags & WindowManager.LayoutParams.FLAG_HARDWARE_ACCELERATED) != 0;
if (hardwareAccelerated) {
// bail out if hardware rendering is unavailable (it is available in almost all cases)
if (!HardwareRenderer.isAvailable()) {
return;
}
...
// create the hardware-accelerated rendering environment
mAttachInfo.mHardwareRenderer = HardwareRenderer.create(mContext, translucent);
if (mAttachInfo.mHardwareRenderer != null) {
mAttachInfo.mHardwareRenderer.setName(attrs.getTitle().toString());
mAttachInfo.mHardwareAccelerated =
mAttachInfo.mHardwareAccelerationRequested = true;
}
}
}
In Android, every displayed Window (Activity, Dialog, PopupWindow, and so on) corresponds to one ViewRootImpl object and one AttachInfo object. The HardwareRenderer created through
HardwareRenderer.create(mContext, translucent);
is stored in that ViewRootImpl's AttachInfo, so it is one-to-one with the Window. Once HardwareRenderer.create(mContext, translucent) has created the hardware-acceleration environment, drawing goes through
mAttachInfo.mHardwareRenderer.draw(mView, mAttachInfo, this);
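for the actual rendering. Back to how the app initializes the hardware-acceleration environment: put simply, it mainly creates the OpenGL context, the EGLSurface and the RenderThread (if it has not been started yet).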
static HardwareRenderer create(Context context, boolean translucent) {
HardwareRenderer renderer = null;
if (DisplayListCanvas.isAvailable()) {
renderer = new ThreadedRenderer(context, translucent);
}
return renderer;
}
ThreadedRenderer(Context context, boolean translucent) {
final TypedArray a = context.obtainStyledAttributes(null, R.styleable.Lighting, 0, 0);
...
// create the root RenderNode
long rootNodePtr = nCreateRootRenderNode();
mRootNode = RenderNode.adopt(rootNodePtr);
// create the native RenderProxy
mNativeProxy = nCreateProxy(translucent, rootNodePtr);
// initialize the AssetAtlas (not covered in this article)
ProcessInitializer.sInstance.init(context, mNativeProxy);
...
}
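As analyzed before, nCreateRootRenderNode creates a root RenderNode for the ViewRootImpl; by recursing from mRootNode, the UI thread can build the OpenGL drawing commands and data for the whole view tree. nCreateProxy creates a RenderProxy for the current window. RenderProxy is mainly used to submit OpenGL-related tasks to the RenderThread, such as initialization, drawing and updates: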
class ANDROID_API RenderProxy {
public:
ANDROID_API RenderProxy(bool translucent, RenderNode* rootNode, IContextFactory* contextFactory);
ANDROID_API virtual ~RenderProxy();
...
ANDROID_API bool initialize(const sp<ANativeWindow>& window);
...
ANDROID_API int syncAndDrawFrame();
...
ANDROID_API DeferredLayerUpdater* createTextureLayer();
ANDROID_API void buildLayer(RenderNode* node);
ANDROID_API bool copyLayerInto(DeferredLayerUpdater* layer, SkBitmap& bitmap);
...
ANDROID_API void fence();
...
void destroyContext();
void post(RenderTask* task);
void* postAndWait(MethodInvokeRenderTask* task);
...
};
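What does RenderProxy do when it is created? Mainly two things. First, if the RenderThread has not been started, it starts it. Second, it posts its first task to the RenderThread, which creates a CanvasContext for the current window. CanvasContext plays a role similar to an EGLContext: all drawing commands pass through it: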
RenderProxy::RenderProxy(bool translucent, RenderNode* rootRenderNode, IContextFactory* contextFactory)
: mRenderThread(RenderThread::getInstance())
, mContext(nullptr) {
// create the CanvasContext
SETUP_TASK(createContext);
args->translucent = translucent;
args->rootRenderNode = rootRenderNode;
args->thread = &mRenderThread;
args->contextFactory = contextFactory;
mContext = (CanvasContext*) postAndWait(task);
// initialize the DrawFrameTask
mDrawFrameTask.setContext(&mRenderThread, mContext);
}
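As the constructor shows, the OpenGL render thread is a singleton: a process has only one RenderThread. RenderProxy keeps a reference to that singleton in mRenderThread, and whenever it needs to submit a task it inserts a message into the RenderThread's queue through that reference. The RenderThread takes messages off the queue and executes them, for example issuing OpenGL commands to the GPU and telling the GPU to render. The thread is clearly visible in the CPU tool of the Android Profiler (a process that displays nothing has no such thread):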
Here is a quick look at how the RenderThread singleton is created and started:
RenderThread::RenderThread() : Thread(true), Singleton<RenderThread>()
, mNextWakeup(LLONG_MAX)
, mDisplayEventReceiver(nullptr)
, mVsyncRequested(false)
, mFrameCallbackTaskPending(false)
, mFrameCallbackTask(nullptr)
, mRenderState(nullptr)
, mEglManager(nullptr) {
Properties::load();
mFrameCallbackTask = new DispatchFrameCallbacks(this);
mLooper = new Looper(false);
run("RenderThread");
}
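RenderThread maintains a message queue and reads and executes messages in a loop. When the thread starts, before entering the loop, it creates the components OpenGL rendering needs, such as the EglManager, the RenderState and a VSync receiver (mainly for animations):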
bool RenderThread::threadLoop() {
// initialization
setpriority(PRIO_PROCESS, 0, PRIORITY_DISPLAY);
initThreadLocals();
int timeoutMillis = -1;
for (;;) {
// wait until the message queue has something to process
int result = mLooper->pollOnce(timeoutMillis);
nsecs_t nextWakeup;
// Process our queue, if we have anything
// take tasks off the queue and run them
while (RenderTask* task = nextTask(&nextWakeup)) {
task->run();
}
...
}
return false;
}
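The post / postAndWait / threadLoop pattern above can be reduced to a minimal standard-C++ sketch. `RenderThreadSketch` and its methods are hypothetical; a condition variable replaces the Looper, and a `std::packaged_task` plays the role of a task whose result postAndWait returns (for example the CanvasContext* produced by createContext).

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical simplification of RenderThread: one process-wide thread draining a task queue.
class RenderThreadSketch {
public:
    static RenderThreadSketch& getInstance() {
        static RenderThreadSketch instance;  // constructed once, thread-safe since C++11
        return instance;
    }

    // Fire-and-forget, in the spirit of RenderProxy::post().
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mQueue.push_back(std::move(task));
        }
        mCond.notify_one();
    }

    // Block the caller until the task has run on the render thread, like postAndWait().
    template <typename T>
    T postAndWait(std::function<T()> task) {
        auto packaged = std::make_shared<std::packaged_task<T()>>(std::move(task));
        std::future<T> result = packaged->get_future();
        post([packaged] { (*packaged)(); });
        return result.get();
    }

    std::thread::id threadId() const { return mThread.get_id(); }

    ~RenderThreadSketch() {
        post([this] { mQuit = true; });  // processed on the render thread itself
        mThread.join();
    }

private:
    RenderThreadSketch() : mThread([this] { threadLoop(); }) {}

    void threadLoop() {
        while (!mQuit) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mMutex);
                mCond.wait(lock, [this] { return !mQueue.empty(); });  // like pollOnce()
                task = std::move(mQueue.front());
                mQueue.pop_front();
            }
            task();  // like task->run()
        }
    }

    std::mutex mMutex;
    std::condition_variable mCond;
    std::deque<std::function<void()>> mQueue;
    bool mQuit = false;   // read and written only on the render thread
    std::thread mThread;  // declared last so everything above exists before the loop starts
};
```

Because the instance is a function-local static, every caller in the process gets the same thread, which matches the "one RenderThread per process" point above.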
Initialization mainly creates components the EGL context will need. So far these are all tools; almost nothing of substance for OpenGL has been built yet:
void RenderThread::initThreadLocals() {
sp<IBinder> dtoken(SurfaceComposerClient::getBuiltInDisplay(
ISurfaceComposer::eDisplayIdMain));
status_t status = SurfaceComposerClient::getDisplayInfo(dtoken, &mDisplayInfo);
nsecs_t frameIntervalNanos = static_cast<nsecs_t>(1000000000 / mDisplayInfo.fps);
mTimeLord.setFrameInterval(frameIntervalNanos);
// initialize the vsync receiver
initializeDisplayEventReceiver();
// the EGL manager
mEglManager = new EglManager(*this);
// the state machine
mRenderState = new RenderState(*this);
// debugging/analysis tool
mJankTracker = new JankTracker(frameIntervalNanos);
}
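The frame interval in initThreadLocals is simple arithmetic: at 60 Hz, 1000000000 / 60 is about 16,666,666 ns, roughly 16.7 ms per frame. The sketch below computes it in double precision and adds a hypothetical jank counter that is far simpler than JankTracker: it only counts frames that took longer than one interval.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using nsecs_t = int64_t;

// Frame budget derived from the refresh rate, mirroring 1000000000 / fps in initThreadLocals.
// Double precision is used here for clarity; the result is truncated toward zero.
nsecs_t frameIntervalNanos(double fps) {
    return static_cast<nsecs_t>(1000000000.0 / fps);
}

// Hypothetical helper: count frames whose duration exceeded one frame interval.
int countJankyFrames(const std::vector<nsecs_t>& frameDurations, nsecs_t interval) {
    int janky = 0;
    for (nsecs_t d : frameDurations) {
        if (d > interval) ++janky;
    }
    return janky;
}
```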
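Since Android 5.0, some animations can run entirely on the RenderThread. In that case the render thread has to receive Vsync: when the signal arrives, RenderThread::displayEventReceiverCallback is called, which computes the current animation state and finally calls doFrame to draw the current animation frame (not detailed here; the Vsync mechanism is worth a look if you have time).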
void RenderThread::initializeDisplayEventReceiver() {
mDisplayEventReceiver = new DisplayEventReceiver();
status_t status = mDisplayEventReceiver->initCheck();
mLooper->addFd(mDisplayEventReceiver->getFd(), 0,
Looper::EVENT_INPUT, RenderThread::displayEventReceiverCallback, this);
}
Next, RenderThread creates an EglManager and a RenderState. Like the DisplayEventReceiver above, both belong to the RenderThread, so they are also singletons within a process:
EglManager::EglManager(RenderThread& thread)
: mRenderThread(thread)
, mEglDisplay(EGL_NO_DISPLAY)
, mEglConfig(nullptr)
, mEglContext(EGL_NO_CONTEXT)
, mPBufferSurface(EGL_NO_SURFACE)
, mAllowPreserveBuffer(load_dirty_regions_property())
, mCurrentSurface(EGL_NO_SURFACE)
, mAtlasMap(nullptr)
, mAtlasMapSize(0) {
mCanSetPreserveBuffer = mAllowPreserveBuffer;
}
EglManager mainly manages the OpenGL context: creating EGLSurfaces, choosing which surface is current, swapBuffers, and so on. In other words, it is the housekeeper for EGL-level state:
class EglManager {
public:
// Returns true on success, false on failure
void initialize();
EGLSurface createSurface(EGLNativeWindowType window);
void destroySurface(EGLSurface surface);
bool isCurrent(EGLSurface surface) { return mCurrentSurface == surface; }
// Returns true if the current surface changed, false if it was already current
bool makeCurrent(EGLSurface surface, EGLint* errOut = nullptr);
void beginFrame(EGLSurface surface, EGLint* width, EGLint* height);
bool swapBuffers(EGLSurface surface, const SkRect& dirty, EGLint width, EGLint height);
// Returns true iff the surface is now preserving buffers.
bool setPreserveBuffer(EGLSurface surface, bool preserve);
void setTextureAtlas(const sp<GraphicBuffer>& buffer, int64_t* map, size_t mapSize);
void fence();
private:
friend class RenderThread;
EglManager(RenderThread& thread);
// EglContext is never destroyed, method is purposely not implemented
~EglManager();
void createPBufferSurface();
void loadConfig();
void createContext();
void initAtlas();
RenderThread& mRenderThread;
EGLDisplay mEglDisplay;
EGLConfig mEglConfig;
EGLContext mEglContext;
EGLSurface mPBufferSurface;
...
};
RenderState can be seen as the concrete embodiment of the OpenGL state machine: it is what actually maintains the OpenGL rendering state and issues the rendering commands.
RenderState::RenderState(renderthread::RenderThread& thread)
: mRenderThread(thread)
, mViewportWidth(0)
, mViewportHeight(0)
, mFramebuffer(0) {
mThreadId = pthread_self();
}
The first message RenderProxy posts when it is created is SETUP_TASK(createContext), which builds the CanvasContext. CanvasContext can be seen as a wrapper around the OpenGL context and surface:
CREATE_BRIDGE4(createContext, RenderThread* thread, bool translucent,
RenderNode* rootRenderNode, IContextFactory* contextFactory) {
return new CanvasContext(*args->thread, args->translucent,
args->rootRenderNode, args->contextFactory);
}
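As you can see, CanvasContext holds the RenderThread, the EglManager, the root RenderNode and so on. It can be regarded as the OpenGL context in Android and the entry point for the higher-level rendering API: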
CanvasContext::CanvasContext(RenderThread& thread, bool translucent,
RenderNode* rootRenderNode, IContextFactory* contextFactory)
: mRenderThread(thread)
, mEglManager(thread.eglManager())
, mOpaque(!translucent)
, mAnimationContext(contextFactory->createAnimationContext(mRenderThread.timeLord()))
, mRootRenderNode(rootRenderNode)
, mJankTracker(thread.timeLord().frameIntervalNanos())
, mProfiler(mFrames) {
mRenderThread.renderState().registerCanvasContext(this);
mProfiler.setDensity(mRenderThread.mainDisplayInfo().density);
}
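At this point initialization is only half done. The other half happens at draw time, in ThreadedRenderer's initialize; after all, if nothing needs to be drawn, there is no need to initialize the OpenGL environment and waste resources: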
private void performTraversals() {
...
if (mAttachInfo.mHardwareRenderer != null) {
try {
hwInitialized = mAttachInfo.mHardwareRenderer.initialize(mSurface);
The mSurface here is a Surface that WMS has already filled in. At the native level it corresponds to an ANativeWindow (essentially a native Surface). During RenderProxy's initialize, the EGLContext and EGLSurface are created. Note that this initialize task runs on the RenderThread: all OpenGL operations must happen on the RenderThread:
CREATE_BRIDGE2(initialize, CanvasContext* context, ANativeWindow* window) {
return (void*) args->context->initialize(args->window);
}
bool RenderProxy::initialize(const sp<ANativeWindow>& window) {
SETUP_TASK(initialize);
args->context = mContext;
args->window = window.get();
return (bool) postAndWait(task);
}
bool CanvasContext::initialize(ANativeWindow* window) {
setSurface(window);
if (mCanvas) return false;
mCanvas = new OpenGLRenderer(mRenderThread.renderState());
mCanvas->initProperties();
return true;
}
The ANativeWindow* window passed in here is the native Surface. During initialization, CanvasContext calls setSurface to create the EGLContext and the EGLSurface canvas and bind them, and it also creates an OpenGLRenderer for the current window. OpenGLRenderer processes the DrawOps built earlier and emits the corresponding OpenGL commands.
void CanvasContext::setSurface(ANativeWindow* window) {
mNativeWindow = window;
// create the EGLSurface canvas
if (window) {
mEglSurface = mEglManager.createSurface(window);
}
if (mEglSurface != EGL_NO_SURFACE) {
const bool preserveBuffer = (mSwapBehavior != kSwap_discardBuffer);
mBufferPreserved = mEglManager.setPreserveBuffer(mEglSurface, preserveBuffer);
mHaveNewSurface = true;
// bind the context (make it current)
makeCurrent();
}}
EGLSurface EglManager::createSurface(EGLNativeWindowType window) {
// create the EGLContext (if it does not exist yet)
initialize();
// create the EGLSurface
EGLSurface surface = eglCreateWindowSurface(mEglDisplay, mEglConfig, window, nullptr);
return surface;
}
void EglManager::initialize() {
if (hasEglContext()) return;
mEglDisplay = eglGetDisplay(EGL_DEFAULT_DISPLAY);
loadConfig();
createContext();
createPBufferSurface();
makeCurrent(mPBufferSurface);
mRenderThread.renderState().onGLContextCreated();
initAtlas();
}
void EglManager::createContext() {
EGLint attribs[] = { EGL_CONTEXT_CLIENT_VERSION, GLES_VERSION, EGL_NONE };
mEglContext = eglCreateContext(mEglDisplay, mEglConfig, EGL_NO_CONTEXT, attribs);
LOG_ALWAYS_FATAL_IF(mEglContext == EGL_NO_CONTEXT,
"Failed to create context, error = %s", egl_error_str());
}
After EglManager::initialize(), the EGLContext and the config are all in place, and eglCreateWindowSurface is then called to create the EGLSurface. This first goes through eglCreateWindowSurface in eglApi.cpp:
EGLSurface eglCreateWindowSurface( EGLDisplay dpy, EGLConfig config,
NativeWindowType window,
const EGLint *attrib_list) {
// connect the native window to the EGL API
int result = native_window_api_connect(window, NATIVE_WINDOW_API_EGL);
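// In the Android source this calls eglCreateWindowSurface in egl.cpp (the software implementation); this part should not differ much from real hardware.
// The EGLSurface keeps a reference to the Surface, and swapping buffers can notify the consumer.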
EGLSurface surface = cnx->egl.eglCreateWindowSurface(
iDpy, config, window, attrib_list);
... }
egl.cpp is the software-emulated GPU implementation, but its eglCreateWindowSurface logic does not differ much from real GPU platform code, since it is only abstract glue logic:
static EGLSurface createWindowSurface(EGLDisplay dpy, EGLConfig config,
NativeWindowType window, const EGLint* /*attrib_list*/)
{
...
egl_surface_t* surface;
// what is returned is actually an egl_window_surface_v2_t
surface = new egl_window_surface_v2_t(dpy, config, depthFormat,
static_cast<ANativeWindow*>(window));
...
return surface;
}
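The code above simply creates an egl_window_surface_v2_t, which wraps an ANativeWindow. Since EGLSurface is a void* pointer, an egl_window_surface_v2_t pointer can be assigned to it directly. This completes environment initialization: the rendering environment OpenGL needs is in place, and when a View needs to be shown or updated, ViewRootImpl's draw is called to update it. Note that one render thread has a single EGLContext by default but can have multiple EGLSurfaces, switching between them with eglMakeCurrent. In other words: one Window corresponds to one ViewRootImpl -> one AttachInfo -> one ThreadedRenderer -> one RenderProxy (root RenderNode) -> one CanvasContext (DrawFrameTask, the shared EglManager singleton, an EGLSurface) -> the RenderThread (shared singleton). An app normally keeps just this one OpenGL rendering thread, although you can of course create your own independent rendering thread and call the OpenGL API directly. A simple class diagram: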
Once all this is done, the OpenGL rendering environment is ready, or rather the RenderThread has its rendering environment configured. From here on, the UI thread simply sends rendering tasks to the render thread.
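The "one context, many surfaces" relationship can be illustrated with a mock that makes no real EGL calls. `MockEglManager` and `MockCanvasContext` are hypothetical; plain integers stand in for EGLSurface handles, initialize() mimics the hasEglContext() guard, and makeCurrent follows the comment on EglManager::makeCurrent (true if the current surface changed).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical mock, no real EGL: one context shared by every window on the render thread.
class MockEglManager {
public:
    // Lazily "create" the single context, like the hasEglContext() guard in initialize().
    void initialize() {
        if (mHasContext) return;
        mHasContext = true;
        ++mContextsCreated;
    }

    int createSurface(const std::string& window) {
        initialize();  // the context appears with the first surface, not earlier
        int id = mNextSurface++;
        mSurfaces[id] = window;
        return id;
    }

    // Returns true if the current surface changed, false if it was already current.
    bool makeCurrent(int surface) {
        if (mCurrent == surface) return false;
        mCurrent = surface;
        return true;
    }

    int contextsCreated() const { return mContextsCreated; }

private:
    bool mHasContext = false;
    int mContextsCreated = 0;
    int mNextSurface = 1;
    int mCurrent = 0;  // 0 plays the role of EGL_NO_SURFACE
    std::map<int, std::string> mSurfaces;
};

// One per window, like CanvasContext: shares the manager, owns its own surface.
struct MockCanvasContext {
    MockEglManager& egl;
    int surface;
    MockCanvasContext(MockEglManager& m, const std::string& window)
        : egl(m), surface(m.createSurface(window)) {}
    bool beginFrame() { return egl.makeCurrent(surface); }
};
```

Two windows share one context but draw to different surfaces, and drawing a different window first has to rebind the current surface.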
Android OpenGL GPU rendering
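As analyzed in the earlier beginner's guide to hardware acceleration, ViewRootImpl's draw is the entry point. It calls HardwareRenderer's draw, which first builds the DrawOp tree, then merges and optimizes the DrawOps, and then issues OpenGL commands to the GPU. Building the DrawOp tree happens on the UI thread; everything after that happens on the RenderThread: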
@Override
void draw(View view, AttachInfo attachInfo, HardwareDrawCallbacks callbacks) {
// build the DrawOp tree (UI thread)
updateRootDisplayList(view, callbacks);
// render: submit the task to the render thread
int syncResult = nSyncAndDrawFrame(mNativeProxy, frameInfo, frameInfo.length);
...
}
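As the code shows, updateRootDisplayList builds the DrawOp tree on the UI thread, and nSyncAndDrawFrame submits the rendering task to the render thread. The build process was analyzed before, and the merging done in nSyncAndDrawFrame was touched on briefly. Below we continue with how the OpenGL commands are issued to the GPU. There is a synchronization step here that may block the UI thread, so let's look at synchronization first.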
SyncAndDrawFrame synchronization
static int android_view_ThreadedRenderer_syncAndDrawFrame(JNIEnv* env, jobject clazz,
jlong proxyPtr, jlongArray frameInfo, jint frameInfoSize) {
RenderProxy* proxy = reinterpret_cast<RenderProxy*>(proxyPtr);
env->GetLongArrayRegion(frameInfo, 0, frameInfoSize, proxy->frameInfo());
return proxy->syncAndDrawFrame();
}
int DrawFrameTask::drawFrame() {
mSyncResult = kSync_OK;
mSyncQueued = systemTime(CLOCK_MONOTONIC);
postAndWait();
return mSyncResult;
}
void DrawFrameTask::postAndWait() {
AutoMutex _lock(mLock);
mRenderThread->queue(this);
// block until the render thread signals that syncing is done
mSignal.wait(mLock);
}
void DrawFrameTask::run() {
bool canUnblockUiThread;
bool canDrawThisFrame;
{
TreeInfo info(TreeInfo::MODE_FULL, mRenderThread->renderState());
// sync: bring the DrawOp tree, layers and image resources built on the Java side over to native
canUnblockUiThread = syncFrameState(info);
canDrawThisFrame = info.out.canDrawThisFrame;
}
// Grab a copy of everything we need
CanvasContext* context = mContext;
// if the sync succeeded, release the UI thread now
if (canUnblockUiThread) {
unblockUiThread();
}
// draw: issue OpenGL commands to the GPU
if (CC_LIKELY(canDrawThisFrame)) {
context->draw();
}
// if the UI thread is still blocked because the sync failed, wake it now
if (!canUnblockUiThread) {
unblockUiThread();
}
}
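The two unblockUiThread() call sites decide how long the UI thread stays parked. Here is a hypothetical standard-C++ model of that handshake: `FrameHandshake` and `runFrame` are illustrative names, a condition variable plays the role of mSignal, and the boolean argument plays the role of syncFrameState()'s result.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical model: the UI thread blocks in waitForUnblock() (compare postAndWait and
// mSignal.wait) until the render thread calls unblockUiThread().
class FrameHandshake {
public:
    void unblockUiThread() {
        std::lock_guard<std::mutex> lock(mLock);
        mUnblocked = true;
        mSignal.notify_all();
    }
    void waitForUnblock() {
        std::unique_lock<std::mutex> lock(mLock);
        mSignal.wait(lock, [this] { return mUnblocked; });
    }

private:
    std::mutex mLock;
    std::condition_variable mSignal;
    bool mUnblocked = false;
};

// Render-thread side of run(): appends the order of its own steps to `log`.
void runFrame(FrameHandshake& hs, bool canUnblockUiThread, std::vector<std::string>& log) {
    log.push_back("sync");
    if (canUnblockUiThread) {
        log.push_back("unblock");
        hs.unblockUiThread();  // the UI thread may continue while we draw
    }
    log.push_back("draw");
    if (!canUnblockUiThread) {
        log.push_back("unblock");
        hs.unblockUiThread();  // the UI thread waited for the whole draw
    }
}
```

When textures fit, the UI thread is released right after sync; when they do not, it is only released after draw, which is the longer block discussed below.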
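In short, the JNI call reaches RenderProxy::syncAndDrawFrame, which hands off to DrawFrameTask::drawFrame: the DrawFrameTask is queued on the RenderThread, and the UI thread blocks while the RenderThread synchronizes with it. If the sync succeeds, the UI thread is woken right away; otherwise it stays blocked until the RenderThread has finished issuing the OpenGL commands. Only after the sync does the RenderThread start the GPU rendering work. First, the sync itself: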
bool DrawFrameTask::syncFrameState(TreeInfo& info) {
int64_t vsync = mFrameInfo[static_cast<int>(FrameInfoIndex::Vsync)];
mRenderThread->timeLord().vsyncReceived(vsync);
mContext->makeCurrent();
Caches::getInstance().textureCache.resetMarkInUse(mContext);
// key point 1: TextureView handling, mostly involving textures
for (size_t i = 0; i < mLayers.size(); i++) {
// update the layer; this processes layer data and may involve a copy
mContext->processLayerUpdate(mLayers[i].get());
}
mLayers.clear();
// key point 2: sync the DrawOp tree
mContext->prepareTree(info, mFrameInfo, mSyncQueued);
...
// If prepareTextures is false, we ran out of texture cache space
return info.prepareTextures;
}
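When a TextureView in the Window has been updated (among the system APIs this seems to be the only such View, custom Views aside), the graphic buffer must be read from the TextureView's SurfaceTexture and wrapped and bound as an OpenGL texture for the GPU to draw with. This is not covered here; I may come back to it when analyzing TextureView. The second step syncs the DrawOp tree and related information built on the UI thread to the RenderThread. The DisplayListData built earlier when ViewRootImpl called into the Java layer has not yet been assigned to the RenderNode's mDisplayListData (the object that is eventually used). It has only been staged with setStagingDisplayList, because there may be several measure/layout passes in between and it may still change. The staging logic is: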
static void android_view_RenderNode_setDisplayListData(JNIEnv* env,
jobject clazz, jlong renderNodePtr, jlong newDataPtr) {
RenderNode* renderNode = reinterpret_cast<RenderNode*>(renderNodePtr);
DisplayListData* newData = reinterpret_cast<DisplayListData*>(newDataPtr);
renderNode->setStagingDisplayList(newData);
}
void RenderNode::setStagingDisplayList(DisplayListData* data) {
mNeedsDisplayListDataSync = true;
delete mStagingDisplayListData;
mStagingDisplayListData = data;
}
Syncing the View DrawOp tree
void CanvasContext::prepareTree(TreeInfo& info, int64_t* uiFrameInfo, int64_t syncQueued) {
mRenderThread.removeFrameCallback(this);
if (!wasSkipped(mCurrentFrameInfo)) {
mCurrentFrameInfo = &mFrames.next();
}
// copy the Java-side frame timing info to native; this is where the GPU profiling bars come from
mCurrentFrameInfo->importUiThreadInfo(uiFrameInfo);
mCurrentFrameInfo->set(FrameInfoIndex::SyncQueued) = syncQueued;
// a timing marker
mCurrentFrameInfo->markSyncStart();
info.damageAccumulator = &mDamageAccumulator;
info.renderer = mCanvas;
info.canvasContext = this;
mAnimationContext->startFrame(info.mode);
// mRootRenderNode recursively traverses all nodes
mRootRenderNode->prepareTree(info);
...
Through recursive traversal, mRootRenderNode visits every node:
void RenderNode::prepareTree(TreeInfo& info) {
bool functorsNeedLayer = Properties::debugOverdraw;
prepareTreeImpl(info, functorsNeedLayer);
}
void RenderNode::prepareTreeImpl(TreeInfo& info, bool functorsNeedLayer) {
info.damageAccumulator->pushTransform(this);
if (info.mode == TreeInfo::MODE_FULL) {
// sync properties
pushStagingPropertiesChanges(info);
}
// layer
prepareLayer(info, animatorDirtyMask);
// sync the DrawOp tree
if (info.mode == TreeInfo::MODE_FULL) {
pushStagingDisplayListChanges(info);
}
// recurse into child Views
prepareSubTree(info, childFunctorsNeedLayer, mDisplayListData);
// push
pushLayerUpdate(info);
info.damageAccumulator->popTransform();
}
By the time of this sync the data is final, so all that remains is to assign mStagingDisplayListData to mDisplayListData:
void RenderNode::pushStagingDisplayListChanges(TreeInfo& info) {
if (mNeedsDisplayListDataSync) {
mNeedsDisplayListDataSync = false;
...
mDisplayListData = mStagingDisplayListData;
mStagingDisplayListData = nullptr;
if (mDisplayListData) {
for (size_t i = 0; i < mDisplayListData->functors.size(); i++) {
(*mDisplayListData->functors[i])(DrawGlInfo::kModeSync, nullptr);
}
}
damageSelf(info);
}
}
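setStagingDisplayList and pushStagingDisplayListChanges form a staging (double-buffer) pattern: the UI thread may overwrite the staged data several times, and the render thread publishes only the latest version during sync, while the UI thread is blocked. A generic sketch, using the hypothetical class name `Staged` and `std::unique_ptr` instead of raw new/delete:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical generic version of RenderNode's staging logic.
template <typename T>
class Staged {
public:
    // UI thread: like setStagingDisplayList(); an older, never-synced value is freed here.
    void setStaging(std::unique_ptr<T> data) {
        mStaging = std::move(data);
        mNeedsSync = true;
    }

    // Render thread, while the UI thread is blocked: like pushStagingDisplayListChanges().
    // Returns true when new data was published (the real code calls damageSelf() then).
    bool push() {
        if (!mNeedsSync) return false;
        mNeedsSync = false;
        mCurrent = std::move(mStaging);  // mStaging is left empty
        return true;
    }

    const T* current() const { return mCurrent.get(); }

private:
    std::unique_ptr<T> mStaging;
    std::unique_ptr<T> mCurrent;
    bool mNeedsSync = false;
};
```

Only the value staged last before a sync ever reaches the render thread, which is why repeated measure/layout passes before a frame cost nothing extra on the render side.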
Then, by recursing into the child Views, the RenderNodes of all Views get synced:
void RenderNode::prepareSubTree(TreeInfo& info, bool functorsNeedLayer, DisplayListData* subtree) {
if (subtree) {
TextureCache& cache = Caches::getInstance().textureCache;
info.out.hasFunctors |= subtree->functors.size();
// wrap the bitmaps used by this RenderNode as textures
for (size_t i = 0; info.prepareTextures && i < subtree->bitmapResources.size(); i++) {
info.prepareTextures = cache.prefetchAndMarkInUse(
info.canvasContext, subtree->bitmapResources[i]);
}
// recurse into child Views
for (size_t i = 0; i < subtree->children().size(); i++) {
...
childNode->prepareTreeImpl(info, childFunctorsNeedLayer);
info.damageAccumulator->popTransform();
}
}
}
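When DrawFrameTask::syncFrameState returns true (the value is actually TreeInfo's prepareTextures, which mainly concerns Bitmap handling), the sync is complete and the UI thread can be woken immediately. If it returns false, the UI data has not been fully handed over to the GPU, and the UI thread has to wait. A comment in the source says "If prepareTextures is false, we ran out of texture cache space": the OpenGL textures an app process can cache are limited in size, and once the limit is exceeded, texture syncing fails. In the 6.0 code there is both a limit on the size of an individual Bitmap and a limit on the total cache size, as the code shows: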
Texture* TextureCache::getCachedTexture(const SkBitmap* bitmap, AtlasUsageType atlasUsageType) {
if (CC_LIKELY(mAssetAtlas != nullptr) && atlasUsageType == AtlasUsageType::Use) {
AssetAtlas::Entry* entry = mAssetAtlas->getEntry(bitmap);
if (CC_UNLIKELY(entry)) {
return entry->texture;
}
}
Texture* texture = mCache.get(bitmap->pixelRef()->getStableID());
// not found in the cache
if (!texture) {
// check the per-bitmap limit
if (!canMakeTextureFromBitmap(bitmap)) {
return nullptr;
}
const uint32_t size = bitmap->rowBytes() * bitmap->height();
bool canCache = size < mMaxSize;
// Don't even try to cache a bitmap that's bigger than the cache
// evict old, unused entries in LRU order; succeed if enough space is freed, otherwise fail
while (canCache && mSize + size > mMaxSize) {
Texture* oldest = mCache.peekOldestValue();
if (oldest && !oldest->isInUse) {
mCache.removeOldest();
} else {
canCache = false;
}
}
// if it can be cached, create a new Texture
if (canCache) {
texture = new Texture(Caches::getInstance());
texture->bitmapSize = size;
generateTexture(bitmap, texture, false);
mSize += size;
TEXTURE_LOGD("TextureCache::get: create texture(%p): name, size, mSize = %d, %d, %d",
bitmap, texture->id, size, mSize);
if (mDebugEnabled) {
ALOGD("Texture created, size = %d", size);
}
mCache.put(bitmap->pixelRef()->getStableID(), texture);
}
} else if (!texture->isInUse && bitmap->getGenerationID() != texture->generation) {
// Texture was in the cache but is dirty, re-upload
// TODO: Re-adjust the cache size if the bitmap's dimensions have changed
generateTexture(bitmap, texture, true);
}
return texture;
}
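The budget logic of getCachedTexture can be modeled outside hwui. This is a hypothetical, simplified sketch: the class and field names are invented, a `std::list` plus `std::unordered_map` implement the LRU order, a cache hit is assumed to refresh recency, and no GL texture is created. The test uses the 24 MB default (DEFAULT_TEXTURE_CACHE_SIZE) with full-screen 1080x1920 ARGB_8888 bitmaps, each 1080 * 4 * 1920 = 8,294,400 bytes, so only three fit at once; the 4096 maximum texture size is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>
#include <unordered_map>

// Hypothetical cache entry; inUse mirrors Texture::isInUse (needed by the frame being prepared).
struct CachedTexture {
    uint32_t size = 0;
    bool inUse = false;
};

class TextureCacheSketch {
public:
    TextureCacheSketch(uint32_t maxBytes, int maxTextureSize)
        : mMaxSize(maxBytes), mMaxTextureSize(maxTextureSize) {}

    // Returns nullptr when the bitmap cannot be cached (in hwui, prepareTextures becomes false).
    CachedTexture* get(uint64_t id, int width, int height, uint32_t rowBytes) {
        auto it = mIndex.find(id);
        if (it != mIndex.end()) {
            mLru.splice(mLru.end(), mLru, it->second.lruPos);  // mark as most recently used
            return &it->second.texture;
        }
        if (width > mMaxTextureSize || height > mMaxTextureSize) return nullptr;  // per-bitmap limit
        const uint32_t size = rowBytes * static_cast<uint32_t>(height);
        bool canCache = size < mMaxSize;
        // Evict least recently used textures, but give up at the first one still in use.
        while (canCache && mSize + size > mMaxSize) {
            auto oldest = mIndex.find(mLru.front());
            if (!oldest->second.texture.inUse) {
                mSize -= oldest->second.texture.size;
                mIndex.erase(oldest);
                mLru.pop_front();
            } else {
                canCache = false;
            }
        }
        if (!canCache) return nullptr;
        mLru.push_back(id);
        Entry& e = mIndex[id];
        e.texture.size = size;
        e.lruPos = std::prev(mLru.end());
        mSize += size;
        return &e.texture;
    }

    uint32_t bytesUsed() const { return mSize; }

private:
    struct Entry {
        CachedTexture texture;
        std::list<uint64_t>::iterator lruPos;
    };
    uint32_t mMaxSize;
    int mMaxTextureSize;
    uint32_t mSize = 0;
    std::list<uint64_t> mLru;  // front = least recently used
    std::unordered_map<uint64_t, Entry> mIndex;
};
```

The important detail is the `inUse` check: eviction stops at a texture the current frame still needs, which is exactly the case where the sync reports failure.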
First, the per-bitmap limit:
bool TextureCache::canMakeTextureFromBitmap(const SkBitmap* bitmap) {
if (bitmap->width() > mMaxTextureSize || bitmap->height() > mMaxTextureSize) {
ALOGW("Bitmap too large to be uploaded into a texture (%dx%d, max=%dx%d)",
bitmap->width(), bitmap->height(), mMaxTextureSize, mMaxTextureSize);
return false;
}
return true;
}
The per-bitmap limit, mMaxTextureSize, comes from the driver, queried with the GL_MAX_TEXTURE_SIZE token:
#define GL_MAX_TEXTURE_SIZE 0x0D33
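Note that 0x0D33 is only the value of the enum token passed to glGetIntegerv, not the size limit itself; the actual maximum is device-dependent (OpenGL ES 2.0 only guarantees at least 64). If a bitmap's width or height exceeds that maximum, the sync may fail. The second cause is exceeding the cap on the total size of cached textures: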
#define DEFAULT_TEXTURE_CACHE_SIZE 24.0f // in MB, i.e. 24 MB
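If there is enough space, a new Texture is created directly. If not, old textures that are no longer in use are evicted in LRU order; if that frees enough space, the new Texture is created, otherwise it counts as a failure. Although this is called a GPU cache, it actually lives in the same memory and is managed by the CPU; since I don't know GPUs very well, I am not sure whether this number is related to the GPU. When a texture does need to be created: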
void TextureCache::generateTexture(const SkBitmap* bitmap, Texture* texture, bool regenerate) {
SkAutoLockPixels alp(*bitmap);
// glGenTextures creates a new texture name
if (!regenerate) {
glGenTextures(1, &texture->id);
}
texture->generation = bitmap->getGenerationID();
texture->width = bitmap->width();
texture->height = bitmap->height();
// bind the texture
Caches::getInstance().textureState().bindTexture(texture->id);
switch (bitmap->colorType()) {
...
case kN32_SkColorType:
// 32-bit RGBA or BGRA; resize is always true the first time, since a new texture's size cannot match yet
uploadToTexture(resize, GL_RGBA, bitmap->rowBytesAsPixels(), bitmap->bytesPerPixel(),
texture->width, texture->height, GL_UNSIGNED_BYTE, bitmap->getPixels());
...
}
The code above creates a new texture and then uploads the bitmap's pixels into it. The upload code is:
void TextureCache::uploadToTexture(bool resize, GLenum format, GLsizei stride, GLsizei bpp,
GLsizei width, GLsizei height, GLenum type, const GLvoid * data) {
glPixelStorei(GL_UNPACK_ALIGNMENT, bpp);
const bool useStride = stride != width
&& Caches::getInstance().extensions().hasUnpackRowLength();
...
if (resize) {
glTexImage2D(GL_TEXTURE_2D, 0, format, width, height, 0, format, type, temp);
} else {
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, format, type, temp);
}
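The key call is glTexImage2D, which loads the image into the texture. glTexImage2D generally makes another copy of the pixels, after which the Bitmap could be released. At this point the texture upload has succeeded, which counts as a successful sync, and the UI thread no longer needs to block. So why must the CPU wait when the sync fails? My understanding is this: if the texture was cached normally, glTexImage2D has transferred and backed up the data, so the UI thread no longer needs to keep that Bitmap's data around; but if caching failed, no copy exists for the GPU, so the data must be kept until glTexImage2D makes one. Why not simply make the cache much larger? Probably as a trade-off between memory and performance. If it were very large, a lot of Bitmap data might be cached for the GPU at once while the GPU, perhaps too busy, is not using it yet, so that memory would be wasted; and in that situation the CPU is clearly running far ahead of the GPU, so it is reasonable to make it wait a little. Some explanations say it is to prevent the Bitmap from being modified; honestly, I don't quite understand that either. This is only my personal understanding, and corrections are welcome. Note that even when caching fails, the Bitmap is uploaded again when the OpenGL commands are issued, which is probably also why the UI thread blocks. The time spent in this phase is shown below: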
The RenderThread issues OpenGL rendering commands
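Once the sync is done, the DrawOp tree can be processed, converted into standard OpenGL API calls and submitted to OpenGL for rendering. Continuing with the second half of DrawFrameTask, which mainly calls CanvasContext's draw to walk the DrawOp tree: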
void CanvasContext::draw() {
EGLint width, height;
// begin drawing: make the EGLSurface current and obtain the memory it draws into
mEglManager.beginFrame(mEglSurface, &width, &height);
...
Rect outBounds;
// recursively issue OpenGL API calls through OpenGLRenderer
mCanvas->drawRenderNode(mRootRenderNode.get(), outBounds);
bool drew = mCanvas->finish();
// Even if we decided to cancel the frame, from the perspective of jank
// metrics the frame was swapped at this point
mCurrentFrameInfo->markSwapBuffers();
// submit the canvas (swap buffers)
if (drew) {
swapBuffers(dirty, width, height);
}
...
}
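- Step 1: mEglManager.beginFrame marks the current target and obtains drawing memory. A process may have several windows and therefore several EGLSurfaces, so we first have to specify which one to work on, that is, which canvas to draw on. As mentioned in the earlier beginner's guide, with hardware acceleration a buffer slot is reserved in SurfaceFlinger ahead of time, but the memory itself is not allocated until drawing actually happens. The memory obtained here is what the GPU writes into, and it is also the layer data later submitted to SurfaceFlinger for composition;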
- Step 2: recursively issue OpenGL commands and submit them to the GPU for drawing;
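- Step 3: swapBuffers submits the drawn data to SurfaceFlinger for composition (the GPU has very likely not finished rendering yet, but the RenderThread can be released early; the Fence mechanism guarantees synchronization). Implementations differ between GPUs, and vendors do not open-source this part; this article infers the implementation from the Android source (the software OpenGL implementation) and Systrace on real devices.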
Start with step 1: EglManager makes the context current on the EGLSurface, which completes the allocation of the GPU's drawing memory:
void EglManager::beginFrame(EGLSurface surface, EGLint* width, EGLint* height) {
makeCurrent(surface);
...
eglBeginFrame(mEglDisplay, surface);
}
makeCurrent requests a buffer from the BnGraphicBufferProducer; unless you wrote your own render thread, that request basically goes to SurfaceFlinger:
EGLBoolean eglMakeCurrent( EGLDisplay dpy, EGLSurface draw,
EGLSurface read, EGLContext ctx)
{
ogles_context_t* gl = (ogles_context_t*)ctx;
if (makeCurrent(gl) == 0) {
if (ctx) {
egl_context_t* c = egl_context_t::context(ctx);
egl_surface_t* d = (egl_surface_t*)draw;
egl_surface_t* r = (egl_surface_t*)read;
...
if (d) {
// this is where memory gets requested
if (d->connect() == EGL_FALSE) {
return EGL_FALSE;
}
d->ctx = ctx;
// bind
d->bindDrawSurface(gl);
}
...
return setError(EGL_BAD_ACCESS, EGL_FALSE);
}
On the first call, egl_surface_t's connect is invoked, which is actually connect of the egl_window_surface_v2_t created earlier, and that triggers the request for drawing memory:
EGLBoolean egl_window_surface_v2_t::connect()
{
// dequeue a buffer
int fenceFd = -1;
// call the native window's dequeueBuffer to obtain drawing memory and a Fence
if (nativeWindow->dequeueBuffer(nativeWindow, &buffer,
&fenceFd) != NO_ERROR) {
return setError(EGL_BAD_ALLOC, EGL_FALSE);
}
// wait for the buffer: wait until the dequeued memory is usable
sp<Fence> fence(new Fence(fenceFd));
...
return EGL_TRUE;
}
The nativeWindow above is actually a Surface:
int Surface::dequeueBuffer(android_native_buffer_t** buffer, int* fenceFd) {
...
FrameEventHistoryDelta frameTimestamps;
status_t result = mGraphicBufferProducer->dequeueBuffer(&buf, &fence, reqWidth, reqHeight,
reqFormat, reqUsage, &mBufferAge,
enableFrameTimestamps ? &frameTimestamps
: nullptr);
...
// if the slot's buffer needs (re)allocation, call requestBuffer to obtain it
if ((result & IGraphicBufferProducer::BUFFER_NEEDS_REALLOCATION) || gbuf == nullptr) {
// request the slot's buffer
result = mGraphicBufferProducer->requestBuffer(buf, &gbuf);
}
...
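The slot-then-buffer handshake can be modeled in a few lines. This is a hypothetical sketch (`BufferQueueSketch`, `Slot` and `DequeueResult` are invented names). One assumption to flag: it places the actual (re)allocation inside dequeue and treats requestBuffer as the producer fetching the slot's new buffer. The slot count and the "first free slot" policy are arbitrary, and the consumer side is collapsed away.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of dequeueBuffer / BUFFER_NEEDS_REALLOCATION / requestBuffer.
struct Slot {
    bool allocated = false;
    int width = 0;
    int height = 0;
    bool dequeued = false;
};

struct DequeueResult {
    int slot;                // -1 when no slot is free
    bool needsReallocation;  // stands in for BUFFER_NEEDS_REALLOCATION
};

class BufferQueueSketch {
public:
    DequeueResult dequeue(int w, int h) {
        for (int i = 0; i < kSlots; ++i) {
            Slot& s = mSlots[i];
            if (s.dequeued) continue;
            s.dequeued = true;
            const bool realloc = !s.allocated || s.width != w || s.height != h;
            if (realloc) {  // only now is real memory (re)allocated
                s.allocated = true;
                s.width = w;
                s.height = h;
                ++mAllocations;
            }
            return DequeueResult{i, realloc};
        }
        return DequeueResult{-1, false};
    }

    // After a reallocation, the producer fetches the slot's new buffer.
    const Slot& requestBuffer(int slot) const { return mSlots[slot]; }

    // Give the slot back; acquire/release on the consumer side are not modeled.
    void queue(int slot) { mSlots[slot].dequeued = false; }

    int allocations() const { return mAllocations; }

private:
    static constexpr int kSlots = 3;
    std::array<Slot, kSlots> mSlots{};
    int mAllocations = 0;
};
```

Steady-state frames reuse an existing slot without allocating; only the first frame, or a size change such as a rotation, pays for new memory.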
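In short, a buffer slot is requested first, and if that slot's memory needs to be (re)allocated, the new buffer is obtained for it. The memory obtained here is what the EGLSurface (Surface) draws into under hardware acceleration, and after that OpenGL can be told to render. This flow involves the Fence mechanism, which is essentially a producer/consumer coordination mechanism used mainly to synchronize the GPU and the CPU; we'll leave it aside for now and finish the flow first. CanvasContext's mCanvas is actually an OpenGLRenderer, so next comes OpenGLRenderer's drawRenderNode: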
void OpenGLRenderer::drawRenderNode(RenderNode* renderNode, Rect& dirty, int32_t replayFlags) {
// All the usual checks and setup operations (quickReject, setupDraw, etc.)
// will be performed by the display list itself
if (renderNode && renderNode->isRenderable()) {
// compute 3d ordering
// compute the Z order
renderNode->computeOrdering();
// if deferring (merging ops) is disabled, draw directly
if (CC_UNLIKELY(Properties::drawDeferDisabled)) {
startFrame();
ReplayStateStruct replayStruct(*this, dirty, replayFlags);
renderNode->replay(replayStruct, 0);
return;
}
...
DeferredDisplayList deferredList(mState.currentClipRect(), avoidOverdraw);
DeferStateStruct deferStruct(deferredList, *this, replayFlags);
// defer and merge ops
renderNode->defer(deferStruct, 0);
// process texture layers
flushLayers();
// set up the viewport
startFrame();
// flush: generate and submit the OpenGL commands
deferredList.flush(*this, dirty);
} ...
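Computing the Z order and merging DrawOps were covered briefly before and are skipped here. We only look at flushLayers and the final issuing of OpenGL commands (deferredList.flush, which simply walks every DrawOp and calls its own draw function). flushLayers mainly handles TextureView; for simplicity, assume no such view exists, so only flush matters: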
void DeferredDisplayList::flush(OpenGLRenderer& renderer, Rect& dirty) {
...
replayBatchList(mBatches, renderer, dirty);
...
}
static void replayBatchList(const Vector<Batch*>& batchList,
OpenGLRenderer& renderer, Rect& dirty) {
for (unsigned int i = 0; i < batchList.size(); i++) {
if (batchList[i]) {
batchList[i]->replay(renderer, dirty, i);
}
}
}
void DrawBatch::replay(OpenGLRenderer& renderer, Rect& dirty, int index) {
for (unsigned int i = 0; i < mOps.size(); i++) {
DrawOp* op = mOps[i].op;
const DeferredDisplayState* state = mOps[i].state;
renderer.restoreDisplayState(*state);
op->applyDraw(renderer, dirty); } }
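flush boils down to two nested loops: over batches, then over the ops in each batch. The sketch below is a deliberately simplified model that only groups *adjacent* ops with the same hypothetical `batchId`; hwui's DeferredDisplayList can also merge and reorder non-overlapping ops. `Op`, `Batch`, `deferOps` and `flush` are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical op: batchId is the grouping key (e.g. "bitmap", "text", "rect" ops).
struct Op {
    int batchId;
    std::string name;
};

struct Batch {
    int batchId;
    std::vector<Op> ops;
};

// Group consecutive ops that share a batchId into one batch.
std::vector<Batch> deferOps(const std::vector<Op>& ops) {
    std::vector<Batch> batches;
    for (const Op& op : ops) {
        if (batches.empty() || batches.back().batchId != op.batchId) {
            batches.push_back(Batch{op.batchId, {}});
        }
        batches.back().ops.push_back(op);
    }
    return batches;
}

// Like replayBatchList + DrawBatch::replay: outer loop over batches, inner loop over ops.
std::vector<std::string> flush(const std::vector<Batch>& batches) {
    std::vector<std::string> issued;
    for (const Batch& b : batches) {
        for (const Op& op : b.ops) {
            issued.push_back(op.name);  // stands in for restoreDisplayState + applyDraw
        }
    }
    return issued;
}
```

Draw order is preserved; the gain is that ops inside one batch can share GL state setup.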
It iterates over each merged Batch, then over each DrawOp in the Batch, calling its applyDraw. Take DrawPointsOp, which draws points, as an example:
class DrawPointsOp : public DrawLinesOp {
public:
DrawPointsOp(const float* points, int count, const SkPaint* paint)
: DrawLinesOp(points, count, paint) {}
virtual void applyDraw(OpenGLRenderer& renderer, Rect& dirty) override {
renderer.drawPoints(mPoints, mCount, mPaint);
}
...
This ends up in OpenGLRenderer's drawPoints:
void OpenGLRenderer::drawPoints(const float* points, int count, const SkPaint* paint) {
...
count &= ~0x1;
// build the VertexBuffer
VertexBuffer buffer;
PathTessellator::tessellatePoints(points, count, paint, *currentTransform(), buffer);
...
int displayFlags = paint->isAntiAlias() ? 0 : kVertexBuffer_Offset;
// draw using the vertex buffer and the paint
drawVertexBuffer(buffer, paint, displayFlags);
mDirty = true;
}
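`count &= ~0x1` clears the lowest bit, rounding the count down to an even number so that the array is read as complete (x, y) pairs (assuming, as that line suggests, that count is the number of floats). Below is a hypothetical tessellation sketch that turns each pair into a square made of two triangles (6 vertices). The real PathTessellator::tessellatePoints also handles round caps, transforms and anti-aliasing, and its vertex layout is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex {
    float x, y;
};

// Hypothetical simplification: each point becomes an axis-aligned square of side `width`.
std::vector<Vertex> tessellatePointsAsSquares(const float* points, int count, float width) {
    count &= ~0x1;  // drop a trailing unpaired float, as drawPoints does
    const float h = width / 2;
    std::vector<Vertex> out;
    out.reserve(static_cast<std::size_t>(count / 2) * 6);
    for (int i = 0; i < count; i += 2) {
        const float x = points[i], y = points[i + 1];
        const Vertex tl{x - h, y - h}, tr{x + h, y - h}, bl{x - h, y + h}, br{x + h, y + h};
        out.insert(out.end(), {tl, tr, bl, tr, br, bl});  // two triangles per point
    }
    return out;
}
```

The resulting vertex array is the kind of data drawVertexBuffer then hands to RenderState for the actual GL draw call.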
void OpenGLRenderer::drawVertexBuffer(float translateX, float translateY,
const VertexBuffer& vertexBuffer, const SkPaint* paint, int displayFlags) {
...
Glop glop;
GlopBuilder(mRenderState, mCaches, &glop)
.setRoundRectClipState(currentSnapshot()->roundRectClipState)
.setMeshVertexBuffer(vertexBuffer, shadowInterp)
.setFillPaint(*paint, currentSnapshot()->alpha)
...
.build();
renderGlop(glop);
}
void OpenGLRenderer::renderGlop(const Glop& glop, GlopRenderType type) {
...
mRenderState.render(glop);
...
}
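drawVertexBuffer follows a builder pattern: GlopBuilder fills in one Glop struct step by step, and renderGlop hands the finished struct to RenderState. Below is a minimal sketch of that shape, assuming a drastically reduced Glop with only a vertex count, a color and an alpha; the field names, the blending rule and the logged "calls" are hypothetical, not hwui's actual code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified "Glop" (GL operation): one plain struct holding
// everything a single draw needs, so the renderer can apply it in one place.
struct Glop {
    int vertexCount = 0;
    unsigned int color = 0;  // ARGB fill color
    float alpha = 1.0f;
    bool blend = false;
};

// Fluent builder in the spirit of GlopBuilder: each setter fills one part and returns *this.
class GlopBuilder {
public:
    explicit GlopBuilder(Glop* out) : mOut(out) {}
    GlopBuilder& setMeshVertexCount(int n) { mOut->vertexCount = n; return *this; }
    GlopBuilder& setFillColor(unsigned int argb, float alpha) {
        mOut->color = argb;
        mOut->alpha = alpha;
        return *this;
    }
    void build() {
        // Blending is needed if the color's alpha byte or the extra alpha is not opaque.
        mOut->blend = ((mOut->color >> 24) != 0xFF) || mOut->alpha < 1.0f;
    }
private:
    Glop* mOut;
};

// Stand-in for RenderState::render: turn the Glop into a list of GL-like calls.
std::vector<std::string> renderGlop(const Glop& glop) {
    std::vector<std::string> calls{"useProgram"};
    calls.push_back(glop.blend ? "glEnable(GL_BLEND)" : "glDisable(GL_BLEND)");
    calls.push_back("glDrawArrays(" + std::to_string(glop.vertexCount) + ")");
    return calls;
}
```

Separating "describe the draw" from "execute the draw" is what allows one central render function to manage GL state changes.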
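Vertices are a basic OpenGL concept. drawVertexBuffer builds a Glop and, through renderGlop, calls RenderState::render, which submits draw commands to the GPU. The commands are not executed immediately, because GL commands are buffered; only an explicit glFlush or glFinish forces them to be sent for execution right away (glFinish additionally waits until they complete). RenderState can be regarded as an abstraction of the OpenGL state machine. Its render function is implemented as follows: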
void RenderState::render(const Glop& glop) {
const Glop::Mesh& mesh = glop.mesh;
const Glop::Mesh::Vertices& vertices = mesh.vertices;
const Glop::Mesh::Indices& indices = mesh.indices;
const Glop::Fill& fill = glop.fill;
// ---------------------------------------------
// ---------- Program + uniform setup ----------
// ---------------------------------------------
mCaches->setProgram(fill.program);
if (fill.colorEnabled) {
fill.program->setColor(fill.color);
}
fill.program->set(glop.transform.ortho,
glop.transform.modelView,
glop.transform.meshTransform(),
glop.transform.transformFlags & TransformFlags::OffsetByFudgeFactor);
// Color filter uniforms
if (fill.filterMode == ProgramDescription::kColorBlend) {
const FloatColor& color = fill.filter.color;
glUniform4f(mCaches->program().getUniform("colorBlend"),
color.r, color.g, color.b, color.a);
}
...
// ---------- Mesh setup ----------
// vertices
const bool force = meshState().bindMeshBufferInternal(vertices.bufferObject)
|| (vertices.position != nullptr);
meshState().bindPositionVertexPointer(force, vertices.position, vertices.stride);
// indices
meshState().bindIndicesBufferInternal(indices.bufferObject);
...
// ------------------------------------
// ---------- GL state setup ----------
// ------------------------------------
blend().setFactors(glop.blend.src, glop.blend.dst);
// ------------------------------------
// ---------- Actual drawing ----------
// ------------------------------------
if (indices.bufferObject == meshState().getQuadListIBO()) {
// Since the indexed quad list is of limited length, we loop over
// the glDrawXXX method while updating the vertex pointer
GLsizei elementsCount = mesh.elementCount;
const GLbyte* vertexData = static_cast<const GLbyte*>(vertices.position);
while (elementsCount > 0) {
...
glDrawElements(mesh.primitiveMode, drawCount, GL_UNSIGNED_SHORT, nullptr);
elementsCount -= drawCount;
vertexData += (drawCount / 6) * 4 * vertices.stride;
}
}
...
}
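The quad-list branch in render draws long meshes in chunks because the shared quad index buffer has a limited length. The sketch below reproduces only the arithmetic of that loop. maxQuadsPerDraw is a hypothetical parameter standing in for the limit hidden behind the "..." above, and DrawCall simply records what each glDrawElements call would receive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One simulated glDrawElements call: how many indices, and the byte offset
// the vertex pointer was rebound to before the call.
struct DrawCall {
    int indexCount;
    std::size_t vertexByteOffset;
};

// Assumption: the shared quad index buffer covers at most `maxQuadsPerDraw` quads.
// Each quad uses 6 indices (two triangles) but only 4 vertices, which is why the
// vertex pointer advances by (drawCount / 6) * 4 * stride bytes per chunk.
std::vector<DrawCall> splitQuadDraws(int elementCount, int maxQuadsPerDraw, std::size_t stride) {
    std::vector<DrawCall> calls;
    std::size_t offset = 0;
    while (elementCount > 0) {
        const int drawCount = std::min(elementCount, maxQuadsPerDraw * 6);
        calls.push_back({drawCount, offset});
        elementCount -= drawCount;
        offset += static_cast<std::size_t>(drawCount / 6) * 4 * stride;
    }
    return calls;
}
```

For example, 10 quads (60 indices) with room for 4 quads per call and an 8-byte stride become three calls of 24, 24 and 12 indices at vertex offsets 0, 128 and 256.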
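As you can see, after step-by-step setup, transformation and preprocessing, everything ends up as glXXX calls, which generate the corresponding OpenGL commands and send them to the GPU to be drawn. There are two ways to handle what happens next. In the first, the CPU blocks until the GPU finishes drawing and only then submits the result to SurfaceFlinger for composition. In the second, the CPU returns immediately and submits the buffer to SurfaceFlinger right away; when SurfaceFlinger composites, if drawing is still unfinished, it has to block until the GPU completes. The software implementation uses the first approach, while hardware implementations generally use the second. Note that all the preparation before OpenGL drawing, including the memory handed to the GPU, is allocated by the CPU in the app's private memory space, whereas the memory the GPU actually draws into, the buffer that is submitted to SurfaceFlinger, is allocated from anonymous shared memory; the two are different. The time spent in this part is essentially the time the CPU takes to hand the commands over to the GPU, and it appears in the GPU rendering profile bars (the "mystery curve").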
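The two submission strategies can be sketched with standard C++ threads. Here std::async plays the GPU and a std::shared_future plays the role of a fence; this is an analogy under those assumptions, not Android's Fence class or any EGL API.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Simulated GPU job: "renders" a frame and returns a value once done.
// The sleep only makes the asynchrony visible.
int gpuRender(int frame) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return frame * 10;  // pretend this is the rendered buffer content
}

// Strategy 1 (like the software libagl): block until rendering finishes,
// then hand a finished buffer to the compositor.
int submitBlocking(int frame) {
    std::future<int> job = std::async(std::launch::async, gpuRender, frame);
    return job.get();  // the render thread waits here, glFinish-style
}

// Strategy 2 (typical of hardware drivers): return at once with a "fence".
std::shared_future<int> submitAsync(int frame) {
    return std::async(std::launch::async, gpuRender, frame).share();
}

// Compositor side (SurfaceFlinger's role): wait on the fence only when the pixels are needed.
int composite(const std::shared_future<int>& fence) {
    return fence.get();  // blocks only if the GPU work is still pending
}
```

In the second strategy the render thread can prepare the next frame between submitAsync and composite, which is exactly the CPU/GPU overlap the text describes.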
The Render thread submits the graphics buffer with swapBuffers (plus the Fence mechanism)
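On Android, GraphicBuffer synchronization relies mainly on the Fence mechanism, whose key feature is that it can synchronize the GPU, the CPU and the HWC (Hardware Composer). GPU processing is generally asynchronous: when an OpenGL API call returns, the command has not yet been executed by the GPU; it is cached in a local GL command buffer, and the GPU is only asked to execute it when that buffer fills up, so the CPU may have no idea when execution actually happens. The exception is when the CPU explicitly calls glFinish() to force a flush and block until those commands have completed. That, however, clearly reduces CPU/GPU parallelism; at the very least the render thread is stuck waiting. Asynchronous processing is comparatively more efficient: the CPU returns right after submitting the commands without waiting for the GPU, so the render thread is free to handle the next message. In that case, though, the buffer is handed to SurfaceFlinger for composition before rendering has finished, so SurfaceFlinger needs to know when the GPU has finished filling the GraphicBuffer. This is where the Fence mechanism comes in. Fence involves quite a lot of detail, so it is not analyzed in depth here and is only outlined.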
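The buffering behavior described above (commands accumulate locally, go out when the buffer is full or on an explicit flush, and glFinish additionally waits for completion) can be modeled in a few lines. CommandBuffer and its capacity are hypothetical; real drivers choose their own flush points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a client-side GL command buffer (not a real driver API).
// Calls are only recorded; they reach the "GPU" when the buffer fills up or when
// flush()/finish() is called explicitly.
class CommandBuffer {
public:
    explicit CommandBuffer(std::size_t capacity) : mCapacity(capacity) {}

    void issue(const std::string& cmd) {
        mPending.push_back(cmd);
        if (mPending.size() == mCapacity) flush();  // implicit flush when full
    }
    // glFlush-like: hand pending commands to the GPU queue without waiting.
    void flush() {
        mSubmitted.insert(mSubmitted.end(), mPending.begin(), mPending.end());
        mPending.clear();
    }
    // glFinish-like: flush, then (in a real driver) block until everything has executed.
    // In this single-threaded model "execution" completes immediately.
    void finish() {
        flush();
        mExecuted = mSubmitted.size();
    }
    std::size_t pending() const { return mPending.size(); }
    std::size_t submitted() const { return mSubmitted.size(); }
    std::size_t executed() const { return mExecuted; }

private:
    std::size_t mCapacity;
    std::vector<std::string> mPending, mSubmitted;
    std::size_t mExecuted = 0;
};
```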
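Once all the earlier commands have been issued, the CPU usually sends one last command telling the GPU that submission for the current frame is complete and it can start processing. The GPU generally has to send back an acknowledgement, but this does not mean the work has been executed, only that the notification was received. If the GPU is busy and cannot reply in time, the CPU has to block and wait; once the CPU receives the acknowledgement, it wakes the blocked Render thread, which goes on to handle the next message. This stage happens inside swapBuffers. Google's explanation is as follows: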
Once Android finishes submitting all its display list to the GPU, the system issues one final command to tell the graphics driver that it's done with the current frame. At this point, the driver can finally present the updated image to the screen.
It’s important to understand that the GPU executes work in parallel with the CPU. The Android system issues draw commands to the GPU, and then moves on to the next task. The GPU reads those draw commands from a queue and processes them.
In situations where the CPU issues commands faster than the GPU consumes them, the communications queue between the processors can become full. When this occurs, the CPU blocks, and waits until there is space in the queue to place the next command. This full-queue state arises often during the Swap Buffers stage, because at that point, a whole frame’s worth of commands have been submitted.
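The back-pressure in the quoted passage is the classic bounded producer/consumer queue. Below is a generic C++ sketch, assuming a single producer (the render thread) and a single consumer thread standing in for the GPU; BoundedQueue and runPipeline are illustrative names, not driver code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Bounded queue between a producer (CPU) and a consumer (GPU).
// push() blocks while the queue is full: the back-pressure described above.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : mCapacity(capacity) {}
    void push(T value) {
        std::unique_lock<std::mutex> lock(mMutex);
        mNotFull.wait(lock, [this] { return mItems.size() < mCapacity; });
        mItems.push_back(std::move(value));
        mNotEmpty.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lock(mMutex);
        mNotEmpty.wait(lock, [this] { return !mItems.empty(); });
        T value = std::move(mItems.front());
        mItems.pop_front();
        mNotFull.notify_one();
        return value;
    }

private:
    std::size_t mCapacity;
    std::deque<T> mItems;
    std::mutex mMutex;
    std::condition_variable mNotFull, mNotEmpty;
};

// Push `commands` through a queue of size `capacity`; the consumer thread plays
// the GPU and returns the commands in the order it executed them.
std::vector<int> runPipeline(int commands, std::size_t capacity) {
    BoundedQueue<int> queue(capacity);
    std::vector<int> executed;
    std::thread gpu([&] {
        for (int i = 0; i < commands; ++i) executed.push_back(queue.pop());
    });
    for (int i = 0; i < commands; ++i) queue.push(i);  // blocks whenever the queue is full
    gpu.join();
    return executed;
}
```

A small capacity makes the producer stall often, which is the situation in which swapBuffers shows up as "waiting for the GPU" even though it is only waiting for queue space.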
Looking only at the Android source, the software implementation libagl can be regarded as synchronous and does not need the Fence mechanism at all:
EGLBoolean egl_window_surface_v2_t::swapBuffers()
{
...
// This is essentially queueBuffer; note that the fence passed here is -1
nativeWindow->queueBuffer(nativeWindow, buffer, -1);
buffer = 0;
// dequeue a new buffer
int fenceFd = -1;
// What is this for? It still blocks and waits; is it waiting for the GPU to finish?
// Trade the queued buffer for a new one
if (nativeWindow->dequeueBuffer(nativeWindow, &buffer, &fenceFd) == NO_ERROR) {
sp<Fence> fence(new Fence(fenceFd));
// fence->wait
if (fence->wait(Fence::TIMEOUT_NEVER)) {
nativeWindow->cancelBuffer(nativeWindow, buffer, fenceFd);
return setError(EGL_BAD_ALLOC, EGL_FALSE);
}
...
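The queue/dequeue cycle in swapBuffers, and what a fence of -1 means, can be illustrated with a toy slot queue. This BufferQueue is a hypothetical simplification, not libgui's BufferQueue API: it returns std::nullopt where the real dequeueBuffer would block, and it always hands out -1 fences because, like libagl, nothing in it renders asynchronously.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical, much-simplified buffer queue (not the real libgui API).
// Slots cycle: free -> dequeued -> queued -> acquired -> released (free again).
// A fence fd of -1 means "no fence": the buffer is already safe to use.
class BufferQueue {
public:
    explicit BufferQueue(int slotCount) {
        for (int i = 0; i < slotCount; ++i) mFree.push_back(i);
    }
    // Producer side. Returns no slot when none is free; a real queue would block here.
    std::optional<int> dequeueBuffer(int* outFenceFd) {
        if (mFree.empty()) return std::nullopt;
        int slot = mFree.front();
        mFree.pop_front();
        *outFenceFd = -1;  // this model never leaves a release fence behind
        return slot;
    }
    void queueBuffer(int slot, int fenceFd) { mQueued.push_back({slot, fenceFd}); }
    // Consumer side (SurfaceFlinger's role).
    std::optional<int> acquireBuffer(int* outFenceFd) {
        if (mQueued.empty()) return std::nullopt;
        Item item = mQueued.front();
        mQueued.pop_front();
        *outFenceFd = item.fenceFd;  // -1: nothing to wait for before compositing
        return item.slot;
    }
    void releaseBuffer(int slot) { mFree.push_back(slot); }

private:
    struct Item { int slot; int fenceFd; };
    std::deque<int> mFree;
    std::deque<Item> mQueued;
};
```

With two slots, the producer can queue one buffer and dequeue the other; a third dequeue has to wait until the consumer releases a buffer.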
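As the code shows, the buffer is first submitted to SurfaceFlinger, and then a new buffer is requested for the next frame. Moreover, the fence passed to queueBuffer here is -1, which means that in swapBuffers the software OpenGL library does not need the Fence mechanism (it never has to synchronize the GPU and CPU at all). queueBuffer triggers a Layer callback that sends a message asking SurfaceFlinger to do its work. This is an asynchronous process and does not block; the callback entry point is Layer::onFrameAvailable: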
// Called back after the producer calls queueBuffer
void Layer::onFrameAvailable(const BufferItem& item) {
{
...
}
mFlinger->signalLayerUpdate();
}
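Why queueBuffer does not block can be seen with a tiny message-loop model: the callback only posts work, and the compositor runs it later on its own loop. MessageQueue, Layer and their fields are hypothetical stand-ins for SurfaceFlinger's message queue and Layer, kept single-threaded for clarity.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message loop standing in for SurfaceFlinger's main thread.
struct MessageQueue {
    std::deque<std::function<void()>> messages;
    void post(std::function<void()> msg) { messages.push_back(std::move(msg)); }
    void drain() {  // run all pending messages in order
        while (!messages.empty()) {
            auto msg = std::move(messages.front());
            messages.pop_front();
            msg();
        }
    }
};

// Stand-in for Layer: onFrameAvailable records the frame and posts a request
// (the role signalLayerUpdate plays); composition runs later, so the producer
// that called queueBuffer never waits for it.
struct Layer {
    MessageQueue* flinger;
    std::vector<int> queuedFrames;
    std::vector<std::string> log;
    void onFrameAvailable(int frame) {
        queuedFrames.push_back(frame);
        flinger->post([this, frame] { log.push_back("composite frame " + std::to_string(frame)); });
    }
};
```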
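dequeueBuffer, for its part, does not block either as long as the slot limit allows, so in principle it should not take much time. On the emulator (Genymotion, Android 6.0), however, swapBuffers seems to be quite expensive (the yellow segment of the GPU rendering bars is the swapBuffers time). I don't fully understand this: the emulator should be synchronous, so swapping buffers should neither implicitly send commands to the GPU for execution nor block and wait. Why does it take so long, then? I'm not sure whether this has something to do with Genymotion itself.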
Next, look at Genymotion's Systrace.
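As you can see, the function calls in the Systrace basically match those in egl.cpp, but why do queueBuffer and dequeueBuffer take so long? I don't really understand this and would appreciate any pointers. On real hardware, by contrast, Fences generally have to be handled, so egl_window_surface_v2_t::swapBuffers() is presumably implemented differently and must at least pass a valid Fence along: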
nativeWindow->queueBuffer(nativeWindow, buffer, fenceId); // fenceId must be a valid fence, no longer -1
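In other words, the fence passed to queueBuffer can no longer be -1, because a valid Fence is needed to synchronize the GPU and CPU. Now look at the Systrace from a real device (Nexus 5, Android 6.0).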
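The function calls on the real device differ considerably from the emulator, for example in dequeue and enqueue; the details probably depend on each vendor's implementation. Next, compare with a Nexus 6P on Android 8.0.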
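At first I thought swapBuffers would call glFinish() or glFlush somewhere, which might block and increase the time. The source code does not support this, however, since neither seems to be triggered directly during enqueue or dequeue at all, and even if it were, it would be asynchronous. Generally, after work has been issued to the driver, if double buffering is used, the buffer swap implicitly sends the commands off for execution. My guess is that each vendor implements this itself, but without access to the code it is hard to be sure; I would welcome pointers from anyone who works on ROMs. The time spent in this stage shows up in the GPU rendering bars, and the documentation describes it as time the CPU spends waiting for the GPU. My own understanding is that it is indeed waiting time, but not waiting for the GPU to finish rendering, only waiting for an ACK-like signal; otherwise there would be no CPU/GPU parallelism.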
Could dequeueBuffer block and add time? Probably not either. I will look further into the time spent in swapBuffers when I get the chance.
Summary
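- The UI thread builds the DrawOp tree for OpenGL
- The Render thread merges and optimizes the DrawOp tree and synchronizes the data
- The Render thread converts DrawOps into standard OpenGL commands and issues them to the GPU
- The Render thread notifies the GPU through swapBuffers (still to be investigated) and, at the same time, submits the canvas data to SurfaceFlinger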
Author: 看書的小蝸牛
Android Hardware Acceleration (Part 2): RenderThread and OpenGL GPU Rendering
For reference only; corrections are welcome.