TensorFlow-0: Installation with GPU Support and Verification

1c8b: Overview

This article is a detailed tutorial on installing TensorFlow and verifying that the installation works. It covers the GPU-supported install, the non-GPU install, and the verification steps, all on Ubuntu 16.04.

System overview:

  • Ubuntu 16.04, 64-bit
  • NVIDIA GTX 770M

Contents overview:

  • Install the CUDA Toolkit following NVIDIA's official documentation
  • Verify that CUDA installed successfully
  • Install TensorFlow (with GPU support) following the official TensorFlow documentation
  • Verify that TensorFlow installed successfully

The official documentation is quite long; this article is meant to save Ubuntu users some time!


1c8c: Recommended TensorFlow Beginner Resources

My first TensorFlow primer, which I recommend, is Morvan Zhou's (周莫煩) basic TensorFlow tutorial series on YouTube; you will need to get past the GFW to watch it.

The series has very good reviews, is very beginner-friendly, and currently runs to about 20 episodes; feel free to subscribe, and hopefully he keeps it updated.


1c8d: Installing TensorFlow

  1. First, make sure you can get past the GFW;

  2. Open the TensorFlow official Linux install page. There you will see the install options: TensorFlow without GPU support or with GPU support (TensorFlow is abbreviated TF below); here we choose the GPU-supported TF;

  3. Check that the system's hardware and software meet NVIDIA's requirements;

     The full pre-installation checklist is as follows (a combined check sketch appears right after this list):
    
  • Check that the system GPU is on the CUDA-supported list

Run lspci | grep -i nvidia to find the GPU model; if there is no output, run update-pciids first and then re-run the previous command;

Then consult the list of CUDA-supported GPUs to confirm the system GPU is supported;

  • Check that the current Linux release is on the CUDA-supported list

Run uname -m && cat /etc/*release to check the Linux version;
[Table omitted: Linux distributions supported by CUDA 8]
  • Check that gcc is installed

Run gcc --version to see whether gcc is installed; if it errors out, install the appropriate development tool packages first;

  • Check that the kernel headers and the necessary development packages are installed;

Install them with sudo apt-get install linux-headers-$(uname -r);

For other Linux distributions, see Section 2.4 of the CUDA installation guide for the install method;

  • If all of the above conditions are met, go to step 4; otherwise, go to step 8 and install the non-GPU TensorFlow
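
The checks above can be run in one pass. A minimal sketch (the commands are exactly the ones listed above; comparing the reported GPU against NVIDIA's CUDA-supported GPU list still has to be done by hand):

    # Pre-install checks, collected in one place
    lspci | grep -i nvidia                           # GPU model; if empty, run update-pciids and retry
    uname -m && cat /etc/*release                    # architecture and Linux release
    gcc --version                                    # an error here means the dev tools are missing
    sudo apt-get install linux-headers-$(uname -r)   # kernel headers
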
  4. The CUDA Toolkit is needed for GPU support; the TF docs point to NVIDIA's official documentation

  5. Download NVIDIA's latest CUDA Toolkit; at the bottom of the page, click, in order, Linux -> x86_64 -> Ubuntu -> 16.04 -> deb (local), then click Download to start the download;

  6. Once the download finishes, run the following in a terminal to install CUDA:

    sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
    sudo apt-get update
    sudo apt-get install cuda
    
  7. Install the CUDA dependency library (libcupti-dev):

    sudo apt-get install libcupti-dev
    
  8. Install TF via virtualenv, the officially recommended approach

     The full installation steps are as follows:
    
    • Install pip, virtualenv, and the Python development package

      sudo apt-get install python-pip python-dev python-virtualenv
      
    • Create the virtualenv environment

      virtualenv --system-site-packages <tensorflow> 
      
      The command above creates a virtual environment using the system's default Python version.
      To use a specific Python version, be sure to add the --python option:
      
      virtualenv --system-site-packages --python=<path-to-python-executable> <tensorflow>
      
    • Activate the virtual environment

       $ source ~/tensorflow/bin/activate # bash, sh, ksh, or zsh
       $ source ~/tensorflow/bin/activate.csh  # csh or tcsh
      
    • Install TF with the command matching your setup:

     (tensorflow)$ pip install --upgrade tensorflow      # for Python 2.7
     (tensorflow)$ pip3 install --upgrade tensorflow     # for Python 3.n
     (tensorflow)$ pip install --upgrade tensorflow-gpu  # for Python 2.7 and GPU
     (tensorflow)$ pip3 install --upgrade tensorflow-gpu # for Python 3.n and GPU
    
    • If the previous step fails, the pip in use may be older than 8.1; upgrade it with the following command:
      pip install --upgrade pip # upgrade pip, then retry the previous step
      
      If you hit a permission error, use
      
      sudo -H pip install --upgrade pip # upgrade pip, then retry the previous step
      
      

At this point, both CUDA and TF are installed.


1c8e: Verifying the CUDA and TF Installations

  1. Verify the CUDA installation

    1) Reboot the system so that the NVIDIA GPU loads the freshly installed driver. After the reboot, run

    cat /proc/driver/nvidia/version
    
    If it prints something like the following, the GPU driver loaded successfully:
        
    NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.26  Thu Dec  8 18:36:43 PST 2016
    GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 
    
    2) Configure the environment variables
    # cuda env
    export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export LPATH=/usr/lib/nvidia-375:$LPATH
    export LIBRARY_PATH=/usr/lib/nvidia-375:$LIBRARY_PATH
    export CUDA_HOME=/usr/local/cuda-8.0
    

    Be sure to set these environment variables; otherwise, building the samples later fails with errors such as:

    Makefile:346: recipe for target 'cudaDecodeGL' failed
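
    These exports only last for the current shell. As an optional convenience (my own suggestion, not part of the original steps), append them to ~/.bashrc so they persist across terminals, then confirm nvcc is reachable:

    # Optional: persist the CUDA environment variables (assumes bash)
    printf '%s\n' \
      '# cuda env' \
      'export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}' \
      'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' \
      'export LPATH=/usr/lib/nvidia-375:$LPATH' \
      'export LIBRARY_PATH=/usr/lib/nvidia-375:$LIBRARY_PATH' \
      'export CUDA_HOME=/usr/local/cuda-8.0' >> ~/.bashrc
    source ~/.bashrc
    nvcc -V   # should print the CUDA 8.0 release string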
    
    3) Install the CUDA sample programs
     cuda-install-samples-8.0.sh <dir>
    
    This script is already on the PATH, so it can be run directly; dir is a directory of your choice;
    
    If the command succeeds, it creates an NVIDIA_CUDA-8.0_Samples directory inside dir
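
    For example (the Downloads directory here matches the build log shown below; any writable directory works):

     cuda-install-samples-8.0.sh ~/Downloads
     ls ~/Downloads/NVIDIA_CUDA-8.0_Samples   # present if the script succeeded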
    
    4) Build the samples to verify the CUDA installation
    Before building, make sure the environment variables from step 2) are set correctly and that the driver version from step 1) is reported properly
    
    Enter the NVIDIA_CUDA-8.0_Samples directory and run
    
    make
    
    When the build succeeds, the output ends like this:
    
    /usr/local/cuda-8.0/bin/nvcc -ccbin g++ -I../../common/inc -I../common/UtilNPP -I../common/FreeImage/include  -m64    -gencode arch=compute_20,code=compute_20 -o jpegNPP.o -c jpegNPP.cpp
    nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
    /usr/local/cuda-8.0/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_20,code=compute_20 -o jpegNPP jpegNPP.o  -L../common/FreeImage/lib -L../common/FreeImage/lib/linux -L../common/FreeImage/lib/linux/x86_64 -lnppi -lnppc -lfreeimage
    nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
    mkdir -p ../../bin/x86_64/linux/release
    cp jpegNPP ../../bin/x86_64/linux/release
    make[1]: Leaving directory '/home/yuanzimiao/Downloads/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/jpegNPP'
    
    Finished building CUDA samples
    
    Note: the installed CUDA version can be checked with nvcc -V
    
    5) Run the sample programs
    Enter the bin directory (the built binaries are placed under bin/x86_64/linux/release, as the build log above shows)
    
    and run
    
    ./deviceQuery
    
    If CUDA is installed and configured correctly, the output looks like this:
    
     ./deviceQuery Starting...
    
      CUDA Device Query (Runtime API) version (CUDART static linking)
    
      Detected 1 CUDA Capable device(s)
    
      Device 0: "GeForce GTX 770M"
      CUDA Driver Version / Runtime Version          8.0 / 8.0
      CUDA Capability Major/Minor version number:    3.0
      Total amount of global memory:                 3017 MBytes (3163357184 bytes)
      ( 5) Multiprocessors, (192) CUDA Cores/MP:     960 CUDA Cores
      GPU Max Clock rate:                            797 MHz (0.80 GHz)
      Memory Clock rate:                             2004 Mhz
      Memory Bus Width:                              192-bit
      L2 Cache Size:                                 393216 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 770M
    
    Result = PASS
    
    Result = PASS means the check passed.
    
    Next, run:
    
    ./bandwidthTest
    
    to verify that the system and the CUDA components communicate correctly; a normal run produces output like:
    
    [CUDA Bandwidth Test] - Starting...
    Running on...
    
     Device 0: GeForce GTX 770M
     Quick Mode
    
     Host to Device Bandwidth, 1 Device(s)
     PINNED Memory Transfers
       Transfer Size (Bytes)    Bandwidth(MB/s)
       33554432         11534.4
    
     Device to Host Bandwidth, 1 Device(s)
     PINNED Memory Transfers
       Transfer Size (Bytes)    Bandwidth(MB/s)
       33554432         11768.0
    
     Device to Device Bandwidth, 1 Device(s)
     PINNED Memory Transfers
       Transfer Size (Bytes)    Bandwidth(MB/s)
       33554432         72735.8
    
    Result = PASS
    
    NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
    
    An output of Result = PASS means the check passed
    

Note: if errors occur while running the samples, SELinux may be enabled on the system or required NVIDIA files may be missing; see Section 6.2.2.3 of the official documentation

This completes the CUDA installation verification
  2. Verify the TensorFlow installation

    1) Activate the virtual environment, then run
    $ python
    >>> import tensorflow as tf
    >>> hello = tf.constant('Hello, TensorFlow!')
    >>> sess = tf.Session()
    >>> print(sess.run(hello))
    
    If it prints Hello, TensorFlow!, the TF installation is working.
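
    To additionally confirm that TF really places operations on the GPU, here is a minimal sketch (log_device_placement is standard TF 1.x API; the path assumes the ~/tensorflow virtualenv created earlier):

    # Optional: log which device each op is assigned to
    source ~/tensorflow/bin/activate
    python -c "import tensorflow as tf; sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)); print(sess.run(tf.constant('GPU check')))"
    # With a working GPU setup, the log shows ops mapped to the GPU device (gpu:0)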
    

Edit 1

With the GPU-supported TF, importing tensorflow produced an error in the log:

>>> import tensorflow as t
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64/
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64/

Googling showed this happens because the cuDNN package is not installed;

Go to NVIDIA's cuDNN download page;

Downloading cuDNN requires registering an NVIDIA account and answering a short survey; tick and write whatever you like, it does not matter;

Once registration is complete, you can download it;

After the download finishes, run

sudo tar -xvf cudnn-8.0-linux-x64-v5.1-rc.tgz -C /usr/local

This assumes /usr/local is the CUDA installation prefix (the archive unpacks its files into the cuda directory under that prefix)
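
A quick sanity check after extracting, as a sketch: on this kind of setup /usr/local/cuda is usually a symlink to /usr/local/cuda-8.0, so the files end up on the LD_LIBRARY_PATH configured earlier; adjust the paths if your layout differs.

ls /usr/local/cuda/include/cudnn.h     # cuDNN header
ls /usr/local/cuda/lib64/libcudnn*     # cuDNN libraries, including libcudnn.so.5
echo $LD_LIBRARY_PATH                  # must cover the directory holding libcudnn.so.5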

Then import again:

>>> import tensorflow as t
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally


W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 770M
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:01:00.0
Total memory: 2.95GiB
Free memory: 2.52GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 770M, pci bus id: 0000:01:00.0)

Those are simply warnings: they inform you that TensorFlow could be faster on this machine if it were built from source. Those instruction sets are not enabled by default in the published builds, presumably to stay compatible with as many CPUs as possible.

So all the libraries loaded successfully. The W-prefixed warnings above appear because TF was not compiled from source, so several CPU instruction-set optimizations (SSE, AVX, FMA) are not enabled in the binary. This only affects CPU computation speed; it does not affect computation on the GPU.
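
If the warning noise bothers you, a common workaround (an environment variable supported by TF 1.x; not something the original mentions) is to raise the C++ log level before starting Python:

export TF_CPP_MIN_LOG_LEVEL=2   # 1 hides INFO, 2 also hides WARNING messages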
