Deploying a Python 3 Environment on a CDH Cluster and Running PySpark Jobs

Anaconda and Python version correspondence table (see the official package lists):

https://docs.anaconda.com/anaconda/packages/oldpkglists/

  1. Download the Anaconda installer
wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
  2. Install Anaconda
bash Anaconda3-4.4.0-Linux-x86_64.sh

Hold down Enter to scroll through the license text.

root@bigdata-dev-43:/home/hd_user# bash Anaconda3-4.4.0-Linux-x86_64.sh 

Welcome to Anaconda3 4.4.0 (by Continuum Analytics, Inc.)

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>>                                                                                            # (press Enter)
===================================
Anaconda End User License Agreement
===================================
.......

Enter yes:

Copyright 2017, Continuum Analytics, Inc.
...                                                                                             # omitted
kerberos (krb5, non-Windows platforms)
A network authentication protocol designed to provide strong authentication
for client/server applications by using secret-key cryptography.

cryptography
A Python library which exposes cryptographic recipes and primitives.

Do you approve the license terms? [yes|no]
>>> yes                                                                                       # enter yes
Anaconda3 will now be installed into this location:
/root/anaconda3

Enter the install path /opt/cloudera/anaconda3.
If the installer reports "tar (child): bzip2: Cannot exec: No such file or directory", install bzip2 first: sudo yum -y install bzip2

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/anaconda3] >>> /opt/cloudera/anaconda3         # enter the install path /opt/cloudera/anaconda3
PREFIX=/opt/cloudera/anaconda3
installing: python-3.6.1-2 ...
installing: _license-1.1-py36_1 ...

Set Anaconda's PATH: to ensure submitted PySpark jobs use Python 3, enter no here and set the PATH system-wide yourself (see the next step).

installing: alabaster-0.7.10-py36_0 ...
...                                                                                 # omitted
installing: zlib-1.2.8-3 ...
installing: anaconda-4.4.0-np112py36_0 ...
installing: conda-4.3.21-py36_0 ...
installing: conda-env-2.6.0-0 ...
Python 3.6.1 :: Continuum Analytics, Inc.
creating default environment...
installation finished.
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /root/.bashrc ? [yes|no]
[no] >>> no                                                           # enter no

You may wish to edit your .bashrc or prepend the Anaconda3 install location:

$ export PATH=/opt/cloudera/anaconda3/bin:$PATH

Thank you for installing Anaconda3!

Share your notebooks and packages on Anaconda Cloud!
Sign up for free: https://anaconda.org
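For repeatable deployments, the interactive prompts above can also be skipped with the installer's batch mode (`-b` silently accepts the license, `-p` sets the install prefix). A sketch, assuming the same installer file and target path as the walkthrough:

```shell
# Non-interactive install sketch (assumes the installer was already downloaded
# with wget as shown above; paths match the interactive walkthrough).
INSTALLER=Anaconda3-4.4.0-Linux-x86_64.sh
PREFIX=/opt/cloudera/anaconda3
CMD="bash $INSTALLER -b -p $PREFIX"   # -b: batch (no prompts), -p: install prefix
echo "$CMD"
```

Run the printed command as root (or with sudo) on each node that needs the environment.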

  3. Set the Anaconda3 environment variable (note the escaped \$PATH, which references the existing PATH at login instead of freezing the current shell's value into /etc/profile)
[root@node00 ~]# echo "export PATH=/opt/cloudera/anaconda3/bin:\$PATH" >> /etc/profile
[root@node00 ~]# source /etc/profile
[root@node00 ~]# env |grep PATH
PATH=/opt/cloudera/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
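One quoting pitfall worth noting when appending the export line: inside double quotes, `$PATH` expands the moment `echo` runs, hardcoding that shell's current PATH into /etc/profile; escaping it writes the literal `$PATH` so the entry stays dynamic at every login. A minimal sketch of the difference:

```shell
# Double quotes expand $PATH immediately; the escaped form keeps it literal.
frozen="export PATH=/opt/cloudera/anaconda3/bin:$PATH"     # current PATH baked in
dynamic="export PATH=/opt/cloudera/anaconda3/bin:\$PATH"   # stays a reference
echo "$dynamic"
```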
  4. Verify the Python version
root@bigdata-dev-43:/home/hd_user# python
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:09:58) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

root@bigdata-dev-43:/home/hd_user# python -V
Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
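Beyond the interactive check, the interpreter version can also be verified from a script; a small sketch (plain Python, nothing CDH-specific):

```python
import sys

# Fail fast if the interpreter on PATH is not Python 3,
# mirroring the manual `python -V` check above.
assert sys.version_info.major == 3, "expected Python 3 on PATH"
print(sys.version.split()[0])  # e.g. 3.6.1
```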
  5. Configure Spark's Python environment in Cloudera Manager (add the following to Spark's spark-env.sh configuration in CM):
export PYSPARK_PYTHON=/opt/cloudera/anaconda3/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/cloudera/anaconda3/bin/python

Restart the affected services.

  6. Test with the pyspark shell
x = sc.parallelize([1,2,3])
y = x.flatMap(lambda x: (x, 100*x, x**2))
print(x.collect())
print(y.collect())
root@bigdata-dev-41:/home/charles# pyspark
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:09:58) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Python version 3.6.1 (default, May 11 2017 13:09:58)
SparkContext available as sc, HiveContext available as sqlContext.
>>> x = sc.parallelize([1,2,3])
>>> y = x.flatMap(lambda x: (x, 100*x, x**2))
>>> print(x.collect())
[1, 2, 3]                                                                       
>>> print(y.collect())
[1, 100, 1, 2, 200, 4, 3, 300, 9]                                               
>>> 
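`flatMap` applies the function to each element and concatenates the resulting sequences into one flat list, unlike `map`, which keeps one result per element. The pure-Python sketch below mirrors what the RDD computation above produces, with no Spark required:

```python
# Pure-Python equivalent of the flatMap example above.
data = [1, 2, 3]

def expand(x):
    return (x, 100 * x, x ** 2)

# map-style: one tuple per element; flatMap-style: tuples flattened together.
mapped = [expand(x) for x in data]
flattened = [v for x in data for v in expand(x)]

print(flattened)  # [1, 100, 1, 2, 200, 4, 3, 300, 9]
```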

Reposted from: https://www.pianshen.com/article/7807341338/
