Updated from time to time
A collection of assorted odd problems
Hive Metastore fails to start after installing with Ambari
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server ; /usr/hdp/current/hive-metastore/bin/schematool -initSchema -dbType mysql -userName hive -passWord [PROTECTED]' returned 1.
WARNING: Use "yarn jar" to launch YARN applications.
Metastore connection URL: jdbc:mysql://c6405.ambari.apache.org/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
*** schemaTool failed ***
Solution:
The MySQL login password configured in Hive does not match the connection password set for the hive user in MySQL. Change either the MySQL password or the Hive configuration so that the two match.
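A quick way to confirm the mismatch (a sketch; the host name is taken from the connection URL in the log above, and the password placeholder is an assumption you must fill in):
# Try logging in to MySQL with the password from Hive's javax.jdo.option.ConnectionPassword
mysql -u hive -h c6405.ambari.apache.org -p
# If the login fails, reset the MySQL password to match the Hive config (run as the MySQL root user):
# mysql -u root -p -e "SET PASSWORD FOR 'hive'@'%' = PASSWORD('<password-from-hive-config>');"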
Spark 2.0 on YARN
1. jersey NoClassDefFoundError
bin/spark-sql --driver-memory 10g --verbose --master yarn --packages com.databricks:spark-csv_2.10:1.3.0 --executor-memory 4g --num-executors 20 --executor-cores 2
16/05/09 13:15:21 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/09 13:15:21 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4041
16/05/09 13:15:21 INFO util.Utils: Successfully started service 'SparkUI' on port 4041.
16/05/09 13:15:21 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://bigaperf116.svl.ibm.com:4041
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
See this thread:
http://apache-spark-developers-list.1001551.n3.nabble.com/spark-2-0-issue-with-yarn-td17440.html
A temporary workaround:
Set yarn.timeline-service.enabled to false to turn off ATS (the YARN Application Timeline Service).
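If changing the cluster configuration is not an option, the same effect can be achieved per job by overriding the Hadoop property from the Spark side through the spark.hadoop.* passthrough (a sketch based on the command above):
# Disable the ATS client for this job only
bin/spark-sql --master yarn --conf spark.hadoop.yarn.timeline-service.enabled=false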
2. bad substitution
diagnostics: Application application_1441066518301_0013 failed 2 times due to AM Container for appattempt_1441066518301_0013_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://localhost:8088/cluster/app/application_1441066518301_0013Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e03_1441066518301_0013_02_000001
Exit code: 1
Exception message: /mnt/yarn/nm/local/usercache/stack/appcache/
application_1441066518301_0013/container_e03_1441066518301_0013_02_000001/
launch_container.sh: line 24:$PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:
/usr/hdp/current/hadoop-client/*::$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:
/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:
/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /mnt/yarn/nm/local/usercache/stack/appcache/application_1441066518301_0013/container_e03_1441066518301_0013_02_000001/launch_container.sh: line 24: $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Solution:
This problem is usually caused by installing components by hand, which leaves the ${hdp.version} variable unsubstituted.
Fix it by editing the MapReduce2 configuration property mapreduce.application.classpath and replacing ${hdp.version} with the version portion of the absolute HDP path, e.g. 2.4.0.0-169.
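For example, with HDP 2.4.0.0-169 the classpath entry from the error above would change as follows (hdp-select versions prints the exact version string to use):
before: /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar
after:  /usr/hdp/2.4.0.0-169/hadoop/lib/hadoop-lzo-0.6.0.2.4.0.0-169.jar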
Service startup fails on ulimit -c unlimited
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. -bash: line 0: ulimit: core file size: cannot modify limit: Operation not permitted
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-wy1.jcloud.local.out
Solution:
On CentOS 7.1, when HDFS's NameNode or DataNode is started by a non-root user, Ambari prepends ulimit -c unlimited to the start command. The command runs after su to the hdfs account, which lacks permission to raise the core-file-size limit, so the NameNode or DataNode fails to start. One way to handle this is to modify the Ambari agent code so that the HDFS startup path no longer runs ulimit -c unlimited.
The code to change:
Edit the file:
/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py
and in this line:
cmd = format("{ulimit_cmd} {hadoop_daemon} --config {hadoop_conf_dir} {action} {name}")
delete {ulimit_cmd}, then restart ambari-agent.
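The change in diff form (against the utils.py line quoted above):
- cmd = format("{ulimit_cmd} {hadoop_daemon} --config {hadoop_conf_dir} {action} {name}")
+ cmd = format("{hadoop_daemon} --config {hadoop_conf_dir} {action} {name}")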
Host registration fails
ERROR 2016-08-01 13:33:38,932 main.py:309 - Fatal exception occurred:
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 306, in
main(heartbeat_stop_callback)
File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 242, in main
stop_agent()
File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 189, in stop_agent
sys.exit(1)
SystemExit: 1
Solution:
Ambari defaults to the ASCII encoding. If your OS uses a Chinese locale, add the following at the top of /usr/lib/python2.6/site-packages/ambari_agent/main.py:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
then click Retry Failed and the registration will succeed.
How to delete an existing Ambari service
After adding a custom SAMPLE service, there is no way to delete it from the web UI on port 8080.
Solution:
- Stop the service
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo": {"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://localhost:8080/api/v1/clusters/hadoop/services/SAMPLE
Because the SAMPLE service does not actually do anything, it may restart itself after a short while, so be quick.
- Delete the service (run this immediately afterwards)
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://localhost:8080/api/v1/clusters/hadoop/services/SAMPLE
If the service was not stopped first, you will get:
{
"status" : 500,
"message" : "org.apache.ambari.server.controller.spi.SystemException: An internal system exception occurred: Cannot remove hadoop/SAMPLE. MYMASTER is in anon-removable state."
}
That is fine; just run the DELETE again.
- Verify
Reload the web UI on port 8080: the SAMPLE service has disappeared. A few more examples:
Remove a host component from a host:
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE 'localhost:8080/api/v1/clusters/blueCluster/hosts/elk2.jcloud.local/host_components/FLUME_HANDLER'
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE 'localhost:8080/api/v1/clusters/cluster/hosts/ochadoop10/host_components/NAMENODE'
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE 'localhost:8080/api/v1/clusters/hbcm_ocdp/hosts/hbom-if-58/host_components/YARN_CLIENT'
Install a component (POST to add it, then PUT to install it):
curl -u admin:admin -i -H "X-Requested-By:ambari" -X POST 'localhost:8080/api/v1/clusters/hbcm_ocdp/hosts/hbbdc-dn-09/host_components/PHOENIX_QUERY_SERVER'
curl -u admin:admin -i -H "X-Requested-By:ambari" -X PUT 'localhost:8080/api/v1/clusters/hbcm_ocdp/hosts/hbbdc-dn-09/host_components/PHOENIX_QUERY_SERVER' -d '{"HostRoles": {"state": "INSTALLED"}}'
How to reset the Ambari admin password
To be able to log in as the Ambari admin again, reset the admin password as follows:
- Stop Ambari server
- Log on to ambari server host shell
- Run 'psql -U ambari ambari'
- Enter password **** (this is the password Ambari uses to connect to its database; the default is bigdata, and it is stored, surprisingly, in plain text in /etc/ambari-server/conf/password.dat)
- In psql:
update ambari.users set
user_password='538916f8943ec225d97a9a86a2c6ec0818c1cd400e09e03b660fdaaec4af29ddbb6f2b1033b81b00'
where user_name='admin';
- Quit psql: Ctrl+D
- Run 'ambari-server restart'
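To confirm that the reset worked, log in through the REST API (a sketch, assuming Ambari listens on localhost:8080):
curl -u admin:admin http://localhost:8080/api/v1/clusters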
User [dr.who] is not authorized to view the logs for application
After enabling access control on the Hadoop cluster, the job log UI becomes inaccessible with the error: User [dr.who] is not authorized to view the logs for application
Reason:
The Resource Manager UI's default user, dr.who, lacks the required permissions.
Solution:
If the cluster is managed by Ambari, go to HDFS > Configs > Custom core-site > Add Property:
hadoop.http.staticuser.user=yarn
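The same property can also be set from the shell with configs.sh, described next (a sketch, assuming the cluster is named hdp_cluster as in the example below):
/var/lib/ambari-server/resources/scripts/configs.sh set localhost hdp_cluster core-site hadoop.http.staticuser.user yarn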
Changing configuration with the backend script:
Get a configuration value:
/var/lib/ambari-server/resources/scripts/configs.sh get localhost hdp_cluster hive-site|grep hive.server2.authenticatio
"hive.server2.authentication" : "NONE",
"hive.server2.authentication.spnego.keytab" : "HTTP/_HOST@EXAMPLE.COM",
"hive.server2.authentication.spnego.principal" : "/etc/security/keytabs/spnego.service.keytab",
Set a configuration value:
/var/lib/ambari-server/resources/scripts/configs.sh set localhost hdp_cluster hive-site hive.server2.authentication LDAP
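Config changes made this way still require a service restart to take effect. One way to do that from the shell, reusing the curl pattern from the delete-service section (a sketch, assuming the hdp_cluster cluster name):
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop HIVE"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://localhost:8080/api/v1/clusters/hdp_cluster/services/HIVE
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Start HIVE"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' http://localhost:8080/api/v1/clusters/hdp_cluster/services/HIVE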
ambari-sudo.sh /usr/bin/hdp-select error
ambari-sudo.sh /usr/bin/hdp-select set all `ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.4.0.0-169 | tail -1`'] {'only_if': 'ls -d /usr/hdp/2.4.0.0-169*
Solution:
- What happens when you run "hdp-select versions" from the command line, as root? Does it return your current 2.4 version number? If not, inspect your /usr/hdp and make sure you have only "current" and the directories named after your versions (2.4 and older ones if you did an upgrade) there. If you have any other file there, delete it, and retry, first "hdp-select versions" and then ATS.
- go to /usr/bin/
vi hdp-select
def printVersions():
......
......
- if f not in [".", "..", "current", "share", "lost+found"]:
+ if f not in [".", "..", "current", "share", "lost+found","hadoop"]:
......
- Symlink conflict: delete the redundant symlinks and retry.
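A quick check for stray entries (a sketch; the version number is the one from the error above):
ls /usr/hdp          # should list only the version directories (e.g. 2.4.0.0-169) and "current"
hdp-select versions  # should print 2.4.0.0-169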
HiveMetaStore or HiveServer2 fails to come up
SYMPTOM: HiveServer2 fails to come up, and an error similar to the following is reported in the hiveserver2.log file:
2015-11-18 20:47:19,965 WARN [main]: server.HiveServer2 (HiveServer2.java:startHiveServer2(442)) - Error starting HiveServer2 on attempt 4, will retry in 60 seconds
org.apache.hive.service.ServiceException: Failed to Start HiveServer2
at org.apache.hive.service.CompositeService.start(CompositeService.java:80)
at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:366)
at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:412)
at org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:78)
at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:654)
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:527)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hive.service.ServiceException: Unable to connect to MetaStore!
at org.apache.hive.service.cli.CLIService.start(CLIService.java:154)
at org.apache.hive.service.CompositeService.start(CompositeService.java:70) ... 11 more
Caused by: MetaException(message:Got exception: org.apache.hadoop.hive.metastore.api.MetaException javax.jdo.JDOException: Exception thrown when executing query
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:230)
at org.apache.hadoop.hive.metastore.ObjectStore.getDatabases(ObjectStore.java:701)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
at com.sun.proxy.$Proxy7.getDatabases(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_databases(HiveMetaStore.java:1158)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
HiveMetaStore fails to come up
2017-02-27 14:45:05,361 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5908)) - Starting hive metastore on port 9083
2017-02-27 14:45:05,472 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(590)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2017-02-27 14:45:05,497 INFO [main]: metastore.ObjectStore (ObjectStore.java:initialize(294)) - ObjectStore, initialize called
2017-02-27 14:45:06,193 ERROR [main]: DataNucleus.Datastore (Log4JLogger.java:error(115)) - Error : An error occurred trying to instantiate an instance of the adapter "org.datanucleus.store.rdbms.adapter.SQLAnywhereAdapter" for this JDBC driver : Class "org.datanucleus.store.rdbms.adapter.SQLAnywhereAdapter" was not found in the CLASSPATH. Please check your specification and your CLASSPATH.
Class "org.datanucleus.store.rdbms.adapter.SQLAnywhereAdapter" was not found in the CLASSPATH. Please check your specification and your CLASSPATH.
org.datanucleus.exceptions.ClassNotResolvedException: Class "org.datanucleus.store.rdbms.adapter.SQLAnywhereAdapter" was not found in the CLASSPATH. Please check your specification and your CLASSPATH.
at org.datanucleus.ClassLoaderResolverImpl.classForName(ClassLoaderResolverImpl.java:216)
at org.datanucleus.ClassLoaderResolverImpl.classForName(ClassLoaderResolverImpl.java:368)
at org.datanucleus.ClassLoaderResolverImpl.classForName(ClassLoaderResolverImpl.java:391)
at org.datanucleus.store.rdbms.adapter.DatastoreAdapterFactory.getAdapterClass(DatastoreAdapterFactory.java:226)
at org.datanucleus.store.rdbms.adapter.DatastoreAdapterFactory.getNewDatastoreAdapter(DatastoreAdapterFactory.java:144)
at org.datanucleus.store.rdbms.adapter.DatastoreAdapterFactory.getDatastoreAdapter(DatastoreAdapterFactory.java:92)
at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:309)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConst
ROOT CAUSE
AMBARI-12947 , BUG-44352
From Ambari 2.1 up to Ambari 2.1.2, it is mandatory to initialize datanucleus.rdbms.datastoreAdapterClassName in the Hive configs. It is required only if the SQL Anywhere database is used, and there is no option in Ambari to delete the parameter.
RESOLUTION
Upgrade to Ambari 2.1.2.
WORKAROUND
Remove the Hive configuration parameter 'datanucleus.rdbms.datastoreAdapterClassName' from hive-site using configs.sh. For example:
- Dump the hive-site parameters to a file:
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin get Ambari_Hostname Ambari_ClusterName hive-site > /tmp/hive-site.txt
This dumps all of Ambari's Hive configuration parameters to /tmp/hive-site.txt.
- Edit the /tmp/hive-site.txt template file created above, remove 'datanucleus.rdbms.datastoreAdapterClassName', and also remove the lines before the 'properties' tag.
- Set the hive-site parameters from /tmp/hive-site.txt:
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set Ambari_Hostname Ambari_ClusterName hive-site /tmp/hive-site.txt
- Start the Hive services.
Source: Hortonworks Support article 000003468 (Cluster_Administration, Linux), created 2015-11-25; applies to versions 2.1.0 and 2.3.0.
https://issues.apache.org/jira/browse/AMBARI-13114