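This post documents a manual installation of a Cloudera Hive cluster. For the environment setup and the Hadoop installation, see the post "Manually Installing Cloudera Hadoop CDH" (手动安装Cloudera Hadoop CDH), which this post builds on. The Hadoop component and JDK versions are as follows: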

  1. hadoop-2.0.0-cdh4.6.0
  2. hbase-0.94.15-cdh4.6.0
  3. hive-0.10.0-cdh4.6.0
  4. jdk1.6.0_38

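The Hadoop components can be downloaded from Cloudera.
The cluster is planned with 3 nodes. The IP address, hostname, and deployed components of each node are as follows: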

  1. 192.168.0.1 desktop1 NameNode, Hive, ResourceManager, impala
  2. 192.168.0.2 desktop2 SSNameNode, DataNode, HBase, NodeManager, impala
  3. 192.168.0.3 desktop3 DataNode, HBase, NodeManager, impala

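Installing Hive

Hive is installed on desktop1. Note: by default Hive stores its metadata in a Derby database; here it is replaced with PostgreSQL. The PostgreSQL setup is described below, and the PostgreSQL JDBC jar must be copied into Hive's lib directory.
Upload hive-0.10.0-cdh4.6.0.tar to /opt on desktop1 and extract it.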
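
The extraction step could be scripted roughly as follows. This is a minimal sketch that assumes the tarball has already been copied to /opt on desktop1 (for example with scp); the helper name extract_hive is hypothetical and not part of Hive or CDH.

```shell
# Minimal sketch: unpack a Hive tarball into a target directory.
# extract_hive is a hypothetical helper name, not part of Hive or CDH.
extract_hive() {
    tarball="$1"      # e.g. /opt/hive-0.10.0-cdh4.6.0.tar
    target_dir="$2"   # e.g. /opt
    # Refuse to continue if the tarball is missing.
    [ -f "$tarball" ] || { echo "missing tarball: $tarball" >&2; return 1; }
    # -x extracts, -f names the archive, -C switches to the target directory first.
    tar -xf "$tarball" -C "$target_dir"
}

# Usage on desktop1 (as a user that may write to /opt):
#   extract_hive /opt/hive-0.10.0-cdh4.6.0.tar /opt
```

The existence check is only there so that a failed upload shows up as a clear message rather than a tar error.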

Installing PostgreSQL

Creating the database

Create a database named metastore and a user named hiveuser whose password is redhat.

    bash# sudo -u postgres psql
    postgres=# CREATE USER hiveuser WITH PASSWORD 'redhat';
    postgres=# CREATE DATABASE metastore owner=hiveuser;
    postgres=# GRANT ALL privileges ON DATABASE metastore TO hiveuser;
    postgres=# \q
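
The same statements can also be run without an interactive session. The sketch below only generates the SQL text, which can then be piped to psql; the helper name metastore_sql is hypothetical, and the values passed in the usage line are the ones used in this post.

```shell
# Sketch: emit the SQL from the interactive session so it can be piped to psql.
# metastore_sql is a hypothetical helper; arguments are user, password, database.
metastore_sql() {
    user="$1"; password="$2"; database="$3"
    cat <<EOF
CREATE USER ${user} WITH PASSWORD '${password}';
CREATE DATABASE ${database} OWNER ${user};
GRANT ALL PRIVILEGES ON DATABASE ${database} TO ${user};
EOF
}

# Usage (as a user allowed to sudo to the postgres account):
#   metastore_sql hiveuser redhat metastore | sudo -u postgres psql -v ON_ERROR_STOP=1
```

Setting ON_ERROR_STOP makes psql stop at the first failing statement instead of continuing with the rest.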

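Initializing the database

Connect to the metastore database as hiveuser and load the schema script that ships with Hive:

    bash$ psql -U hiveuser -d metastore
    metastore=> \i /opt/hive-0.10.0-cdh4.6.0/scripts/metastore/upgrade/postgres/hive-schema-0.10.0.postgres.sql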
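
As an alternative to the interactive \i command, the schema script can be loaded in one non-interactive call. This is a sketch under the same assumptions as above (user hiveuser, database metastore); load_hive_schema is a hypothetical helper name.

```shell
# Sketch: load the Hive metastore schema without an interactive psql session.
# load_hive_schema is a hypothetical helper name.
load_hive_schema() {
    schema_file="$1"
    # Fail early with a clear message if the script is not where we expect it.
    [ -r "$schema_file" ] || { echo "schema file not readable: $schema_file" >&2; return 1; }
    # -f runs the script file; ON_ERROR_STOP stops at the first failing statement.
    psql -U hiveuser -d metastore -v ON_ERROR_STOP=1 -f "$schema_file"
}

# Usage:
#   load_hive_schema /opt/hive-0.10.0-cdh4.6.0/scripts/metastore/upgrade/postgres/hive-schema-0.10.0.postgres.sql
```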

Edit the PostgreSQL client authentication file (/opt/PostgreSQL/9.1/data/pg_hba.conf) to change the access permissions:

    host all all 0.0.0.0/0 md5
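
The rule above accepts password logins from any address. If that is wider than needed, a narrower rule could be added instead. The sketch below appends a rule only when it is not already in the file; add_hba_rule is a hypothetical helper, and the rule in the usage line is an assumption that limits access to the metastore database, the hiveuser role, and the 192.168.0.x subnet used by this cluster.

```shell
# Sketch: append a pg_hba.conf rule only if it is not already present.
# add_hba_rule is a hypothetical helper; the rule in the usage line is an assumption.
add_hba_rule() {
    hba_file="$1"
    rule="$2"
    # -x matches the whole line, -F treats the rule as a fixed string.
    grep -qxF -- "$rule" "$hba_file" 2>/dev/null || printf '%s\n' "$rule" >> "$hba_file"
}

# Usage (as the owner of the PostgreSQL data directory):
#   add_hba_rule /opt/PostgreSQL/9.1/data/pg_hba.conf 'host metastore hiveuser 192.168.0.0/24 md5'
```

PostgreSQL uses the first matching line in pg_hba.conf, so an appended rule only applies to connections that no earlier line matches.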

Edit postgresql.conf:

    standard_conforming_strings = off
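
This setting can be applied with a small script rather than by hand. The following is a sketch only: the helper name is hypothetical, and it assumes the parameter appears in postgresql.conf at most as plain or commented-out `standard_conforming_strings = ...` lines.

```shell
# Sketch: force standard_conforming_strings = off in a postgresql.conf file.
# set_standard_conforming_strings_off is a hypothetical helper name.
set_standard_conforming_strings_off() {
    conf_file="$1"
    [ -f "$conf_file" ] || { echo "no such file: $conf_file" >&2; return 1; }
    tmp_file="${conf_file}.tmp.$$"
    # Keep every line except existing (possibly commented-out) settings of this parameter.
    grep -v '^[[:space:]]*#\{0,1\}[[:space:]]*standard_conforming_strings[[:space:]]*=' \
        "$conf_file" > "$tmp_file" || true
    # Append the value this post uses.
    printf '%s\n' 'standard_conforming_strings = off' >> "$tmp_file"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp_file" > "$conf_file" && rm -f "$tmp_file"
}

# Usage (as the postgres user):
#   set_standard_conforming_strings_off /opt/PostgreSQL/9.1/data/postgresql.conf
```

The change takes effect once PostgreSQL re-reads its configuration.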

Restart PostgreSQL:

    su -c '/opt/PostgreSQL/9.1/bin/pg_ctl -D /opt/PostgreSQL/9.1/data restart' postgres

Copy the PostgreSQL JDBC driver into the /opt/hive-0.10.0-cdh4.6.0/lib directory.

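Editing the configuration files

hive-site.xml

Be sure to change the PostgreSQL password in the configuration below. Also note hive.aux.jars.path: when Hive is integrated with HBase, some HBase jars are loaded from that path.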
The contents of hive-site.xml are as follows:

    <configuration>
      <!-- Metastore connection (PostgreSQL) -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:postgresql://127.0.0.1/metastore</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.postgresql.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hiveuser</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>redhat</value>
      </property>
      <!-- Job submission -->
      <property>
        <name>mapred.job.tracker</name>
        <value>desktop1:8031</value>
      </property>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <!-- Jars for the HBase integration; kept on one line so no whitespace ends up in the paths -->
      <property>
        <name>hive.aux.jars.path</name>
        <value>file:///opt/hive-0.10.0-cdh4.6.0/lib/zookeeper-3.4.5-cdh4.6.0.jar,file:///opt/hive-0.10.0-cdh4.6.0/lib/hive-hbase-handler-0.10.0-cdh4.6.0.jar,file:///opt/hive-0.10.0-cdh4.6.0/lib/hbase-0.94.15-cdh4.6.0.jar,file:///opt/hive-0.10.0-cdh4.6.0/lib/guava-11.0.2.jar</value>
      </property>
      <!-- Data, scratch and query log locations -->
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/opt/data/warehouse-${user.name}</value>
      </property>
      <property>
        <name>hive.exec.scratchdir</name>
        <value>/opt/data/hive-${user.name}</value>
      </property>
      <property>
        <name>hive.querylog.location</name>
        <value>/opt/data/querylog-${user.name}</value>
      </property>
      <!-- Concurrency support backed by ZooKeeper -->
      <property>
        <name>hive.support.concurrency</name>
        <value>true</value>
      </property>
      <property>
        <name>hive.zookeeper.quorum</name>
        <value>desktop1,desktop2,desktop3</value>
      </property>
      <!-- Hive Web Interface (HWI) -->
      <property>
        <name>hive.hwi.listen.host</name>
        <value>desktop1</value>
      </property>
      <property>
        <name>hive.hwi.listen.port</name>
        <value>9999</value>
      </property>
      <property>
        <name>hive.hwi.war.file</name>
        <value>lib/hive-hwi-0.10.0-cdh4.6.0.war</value>
      </property>
    </configuration>
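
Because a wrong jar name in hive.aux.jars.path tends to surface only later as a class-loading error, it may help to check the listed files up front. The sketch below prints every file:// path in a hive-site.xml that does not exist on disk; missing_aux_jars is a hypothetical helper, and the conf/hive-site.xml location in the usage line is an assumption about where the file was placed.

```shell
# Sketch: list every file:// jar referenced in a hive-site.xml that is missing on disk.
# missing_aux_jars is a hypothetical helper name.
missing_aux_jars() {
    site_file="$1"
    # Pull out each file:// URL (up to the next comma, '<' or whitespace) and strip the scheme.
    grep -o 'file://[^,<[:space:]]*' "$site_file" | sed 's|^file://||' |
    while IFS= read -r jar; do
        [ -f "$jar" ] || printf '%s\n' "$jar"
    done
}

# Usage (prints nothing when all jars are present):
#   missing_aux_jars /opt/hive-0.10.0-cdh4.6.0/conf/hive-site.xml
```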

Environment variables

See the environment variable settings in "Manually Installing Cloudera Hadoop CDH".
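
For orientation, the Hive-related part of those settings might look like the sketch below. HIVE_HOME follows the install path used in this post; the HADOOP_HOME and HBASE_HOME paths are assumptions (sibling directories under /opt named after the versions listed at the top), so the referenced post remains the authoritative source.

```shell
# Sketch of Hive-related environment variables, e.g. for /etc/profile or ~/.bashrc.
# HADOOP_HOME and HBASE_HOME are assumed paths; adjust them to the real install locations.
export HADOOP_HOME=/opt/hadoop-2.0.0-cdh4.6.0
export HBASE_HOME=/opt/hbase-0.94.15-cdh4.6.0
export HIVE_HOME=/opt/hive-0.10.0-cdh4.6.0
# Directory where Hive looks for hive-site.xml.
export HIVE_CONF_DIR="$HIVE_HOME/conf"
# Make the hive and hadoop launchers available on the command line.
export PATH="$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH"
```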

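Starting Hive

After Hive starts, some SQL statements may fail with errors. For ways to resolve them, see the post "Hive Installation and Configuration" (Hive安装与配置).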

Integrating Hive with HBase

Configure hive.aux.jars.path in hive-site.xml, and set the Hadoop and MapReduce environment variables in the shell environment.
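
Once the jars and environment are in place, an HBase-backed table can be declared from Hive through the HBase storage handler. The sketch below writes such a statement to a file for use with hive -f. The table name hive_userinfo matches the query shown in the exceptions section, but the columns, the column family info, and the HBase table name userinfo are made up for illustration.

```shell
# Sketch: write a HiveQL file that declares an HBase-backed Hive table.
# write_hbase_table_ddl is a hypothetical helper; columns and HBase names are illustrative.
write_hbase_table_ddl() {
    out_file="$1"
    cat > "$out_file" <<'EOF'
-- Hypothetical example: the row key plus one column in column family "info".
CREATE TABLE hive_userinfo (id STRING, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name")
TBLPROPERTIES ("hbase.table.name" = "userinfo");
EOF
}

# Usage:
#   write_hbase_table_ddl /tmp/hive_userinfo.hql
#   hive -f /tmp/hive_userinfo.hql
```

In the column mapping, :key binds the first Hive column to the HBase row key and info:name binds the second to a qualifier in the info family.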

Exception notes

Exception 1:

    FAILED: Error in metadata: MetaException(message:org.apache.hadoop.hbase.ZooKeeperConnectionException: An error is preventing HBase from connecting to ZooKeeper

Cause: the Hadoop configuration files contain no ZooKeeper (zk) settings.

Exception 2:

    FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.hive.metastore.api.MetaException javax.jdo.JDODataStoreException: Error executing JDOQL query "SELECT "THIS"."TBL_NAME" AS NUCORDER0 FROM "TBLS" "THIS" LEFT OUTER JOIN "DBS" "THIS_DATABASE_NAME" ON "THIS"."DB_ID" = "THIS_DATABASE_NAME"."DB_ID" WHERE "THIS_DATABASE_NAME"."NAME" = ? AND (LOWER("THIS"."TBL_NAME") LIKE ? ESCAPE '\\' ) ORDER BY NUCORDER0 " : ERROR: invalid escape string HINT: Escape string must be empty or one character..

Reference: https://issues.apache.org/jira/browse/HIVE-3994. With standard_conforming_strings turned on (the PostgreSQL 9.1 default), '\\' is read as two characters, which is why this post turns the setting off in postgresql.conf.

Exception 3: the following statement produces no response:

    select count(*) from hive_userinfo

Exception 4:

    zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(966)) - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)

Cause: ZooKeeper is not configured in Hive.
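
One plausible reading of this log is that the HBase client inside Hive does not know the ZooKeeper quorum and falls back to localhost:2181. The hive-site.xml in this post sets hive.zookeeper.quorum but not hbase.zookeeper.quorum, which is the property the HBase client reads. The sketch below checks a configuration file for a property by name; has_property is a hypothetical helper, and the file location in the usage lines is an assumption.

```shell
# Sketch: report whether a Hadoop-style XML configuration file names a given property.
# has_property is a hypothetical helper name.
has_property() {
    site_file="$1"
    property="$2"
    # -F matches the <name> element as a fixed string, -q suppresses output.
    grep -qF "<name>${property}</name>" "$site_file"
}

# Usage:
#   has_property "$HIVE_HOME/conf/hive-site.xml" hbase.zookeeper.quorum \
#       || echo "hbase.zookeeper.quorum is not set in hive-site.xml"
#   # The quorum can also be passed for a single session:
#   hive -hiveconf hbase.zookeeper.quorum=desktop1,desktop2,desktop3
```

The same check may be worth running for Exception 1, since both messages point at a ZooKeeper connection that cannot be established.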

Exception 5, reported by HBase:

    WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Cause: the Cloudera Hadoop lib directory does not contain Hadoop's native library files.

Exception 6:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster

Cause: the classpath is not configured correctly. Check the environment variables and YARN's classpath.
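
A quick way to start that check is to look for classpath entries that point nowhere. The sketch below prints the entries of a colon-separated classpath that do not exist on disk; missing_classpath_entries is a hypothetical helper, and it does not verify that the MRAppMaster class itself is inside any jar.

```shell
# Sketch: print the entries of a colon-separated classpath that do not exist on disk.
# missing_classpath_entries is a hypothetical helper name.
missing_classpath_entries() {
    printf '%s\n' "$1" | tr ':' '\n' |
    while IFS= read -r entry; do
        # Skip empty entries produced by leading, trailing or doubled colons.
        [ -n "$entry" ] || continue
        # Entries such as /path/lib/* are wildcards: check the directory part instead.
        case "$entry" in
            */\*) path="${entry%/\*}" ;;
            *)    path="$entry" ;;
        esac
        [ -e "$path" ] || printf '%s\n' "$entry"
    done
}

# Usage (hadoop classpath prints the classpath the hadoop launcher builds):
#   missing_classpath_entries "$(hadoop classpath)"
```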

References