hive

  • Hive is a SQL parsing engine: it translates SQL statements into MapReduce jobs, which then run on the Hadoop platform.
  • Tables in Hive are purely logical; Hive holds only the table definition, i.e. the table's metadata. The data itself is just Hadoop directories/files, so metadata and data storage are kept separate.
  • Hive does not store data itself; it relies entirely on HDFS and MapReduce.
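For example, a metadata-only statement touches just the metastore, while an aggregation is compiled into a MapReduce job (the table name here is hypothetical):

```sql
-- Creating a table only writes metadata to the metastore;
-- no MapReduce job is launched.
CREATE TABLE t_demo (id INT, name STRING);

-- An aggregation like COUNT is compiled into a MapReduce job
-- that runs on the Hadoop cluster.
SELECT COUNT(*) FROM t_demo;
```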

hive server

Start the service

  hive --service hiveserver2 --help    # show help
  hive --service hiveserver2           # start HiveServer2
  netstat -anp | grep 10000            # or: ss -lnp | grep 10000
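In practice HiveServer2 is usually started in the background so it survives the shell session; a minimal sketch (the log path is an assumption):

```shell
# Start HiveServer2 in the background, logging to a file (path is an assumption)
nohup hive --service hiveserver2 > /var/log/hive/hiveserver2.out 2>&1 &

# Confirm the Thrift port (10000) and web UI port (10002) are listening
ss -lnp | grep -E '10000|10002'
```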

Configuration

/etc/hive/conf/hive-site.xml

A remote metastore configuration is used:

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hd1.mingchao.com:9083</value>
  </property>
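This assumes a metastore service is already listening on hd1.mingchao.com:9083; if it is not, it can be started on that host (a sketch):

```shell
# On hd1.mingchao.com: start the Hive metastore service (default port 9083)
hive --service metastore &
```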

Add the following to configure the host and port that HiveServer2 binds to:

  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hd2.mingchao.com</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>

HiveServer2 also starts a web UI, on port 10002 by default; configure a reverse proxy to expose it:

  cat hive.mingchao.com.conf
  server {
      listen 80;
      server_name hive.mingchao.com;
      location / {
          proxy_pass http://127.0.0.1:10002;
      }
  }
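After placing the file in nginx's configuration directory, validate the syntax and reload:

```shell
nginx -t          # check the configuration syntax
nginx -s reload   # reload nginx to pick up the new server block
```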

Connect to Hive with beeline

  beeline -u jdbc:hive2://192.168.8.207:10000

Or interactively:

  beeline
  !connect jdbc:hive2://192.168.8.207:10000
  !quit
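beeline can also pass a username and run a single statement non-interactively; a sketch (the user name is an assumption):

```shell
# -n sets the user, -e runs one statement and exits
beeline -u jdbc:hive2://192.168.8.207:10000 -n hive -e 'show databases;'
```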

Configure a Hive ODBC connection

Download the driver from the official website.

/etc/odbc.ini

  [ODBC Data Sources]
  Cloudera Impala DSN = Cloudera Impala ODBC Driver
  Cloudera Hive DSN = Cloudera Hive ODBC Driver

  [Cloudera Impala DSN]
  Driver = /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
  HOST = 192.168.8.207
  PORT = 21050
  AuthMech = 0

  [Cloudera Hive DSN]
  Driver = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
  HOST = 192.168.8.207
  PORT = 10000
  HiveServerType = 2

  [ODBC Drivers]
  Impala = Installed
  Hive = Installed
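To sanity-check how the driver manager will read such a DSN, the "key = value" lines can be pulled apart with standard tools; a minimal sketch using an embedded copy of the `[Cloudera Hive DSN]` section:

```shell
# Write an embedded copy of the DSN section to a temp file for the demo
cat > /tmp/odbc_demo.ini <<'EOF'
[Cloudera Hive DSN]
Driver = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HOST = 192.168.8.207
PORT = 10000
HiveServerType = 2
EOF

# Split each "key = value" line on '=' (trimming spaces) and print the host
awk -F' *= *' '/^HOST/{print $2}' /tmp/odbc_demo.ini
```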

/etc/odbcinst.ini

  [Impala]
  Description = ODBC for Impala
  Driver = /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so

  [Hive]
  Description = ODBC for Hive
  Driver = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so

  [ODBC Drivers]
  Impala = Installed
  Hive = Installed
Test the DSN with isql:

  isql "Cloudera Hive DSN"
  SQL> select count(1) FROM t_log_login;
  +---------------------+
  | EXPR_1              |
  +---------------------+
  | 122149              |
  +---------------------+
On the server side, the count query is compiled into a MapReduce job; the job log looks like:

  Query ID = root_20181203235454_c671a6b2-16a8-4678-bd20-1948a52ef51f
  Total jobs = 1
  Launching Job 1 out of 1
  Number of reduce tasks determined at compile time: 1
  In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
  In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
  In order to set a constant number of reducers:
    set mapreduce.job.reduces=<number>
  Starting Job = job_1540986756395_0618, Tracking URL = http://hd1.mingchao.com:8088/proxy/application_1540986756395_0618/
  Kill Command = /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/bin/hadoop job -kill job_1540986756395_0618
  Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
  2018-12-03 23:55:26,779 Stage-1 map = 0%, reduce = 0%
  2018-12-03 23:55:48,254 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 11.11 sec
  2018-12-03 23:55:51,562 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 11.94 sec
  2018-12-03 23:56:03,778 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 22.75 sec
  2018-12-03 23:56:05,982 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 29.57 sec
  2018-12-03 23:56:07,078 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 29.73 sec
  2018-12-03 23:56:11,353 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 41.54 sec

References

Hive原理详解 (Hive principles explained)
