hive 相关

作者: yonggang_sun | 来源:发表于2016-05-18 17:39 被阅读124次

hive相关
hive 相关
hive相关
hive相关基础
大数据学习教程
docker 构建hadoop 环境
Hive优化实践1-数据倾斜及join无关的优化
hive相关知识
Hive相关优化
Hive SQL 相关

hive 相关

搭建hadoop和hive，mysql的环境，过程截图

1.hadoop install

sunyonggang@gg01:~/hadoop-2.6.0$ ./sbin/start-dfs.sh
Starting namenodes on [gg01]
gg01: starting namenode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-namenode-gg01.out
ggg03: starting datanode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-datanode-ggg03.out
ggg02: starting datanode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-datanode-ggg02.out
Starting secondary namenodes [gg01]
gg01: starting secondarynamenode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-secondarynamenode-gg01.out
sunyonggang@gg01:~/hadoop-2.6.0$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/sunyonggang/hadoop-2.6.0/logs/yarn-sunyonggang-resourcemanager-gg01.out
ggg02: starting nodemanager, logging to /home/sunyonggang/hadoop-2.6.0/logs/yarn-sunyonggang-nodemanager-ggg02.out
ggg03: starting nodemanager, logging to /home/sunyonggang/hadoop-2.6.0/logs/yarn-sunyonggang-nodemanager-ggg03.out
sunyonggang@gg01:~/hadoop-2.6.0$ jps
1915 NameNode
2118 SecondaryNameNode
2260 ResourceManager
2514 Jps

2.mysql install

sunyonggang@gg01:~$ sudo service mysql status
mysql start/running, process 7483
sunyonggang@gg01:~$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 43
Server version: 5.5.49-0ubuntu0.14.04.1 (Ubuntu)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)

3.hive install(ps: 这边安装的时候需要讲my.cof: #bind-address = 127.0.0.1)

sunyonggang@gg01:/etc/mysql$ hive
16/04/26 19:14:49 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead

Logging initialized using configuration in jar:file:/home/sunyonggang/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.487 seconds, Fetched: 1 row(s)
hive>

利用上节课的数据，建表并导入数据

hive> create external table sg(time char(14),
    >         cookie char(32),
    >         keyword varchar(256),
    >         rank int,
    >         click int,
    >         ref varchar(512)
    >     ) row format delimited fields terminated by '\t' location '/hdfs/user/data';
OK
Time taken: 0.05 seconds
hive> select * from sg limit 5;
OK
20111230000005  57375476989eea12893c0c3811607bcf    奇艺高清    1   1   http://www.qiyi.com/
20111230000005  66c5bb7774e31d0a22278249b26bc83a    凡人修仙传   3   1   http://www.booksky.org/BookDetail.aspx?BookID=1050804&Level=1
20111230000007  b97920521c78de70ac38e3713f524b50    本本联盟    1   1   http://www.bblianmeng.com/
20111230000008  6961d0c97fe93701fc9c0d861d096cd9    华南师范大学图书馆   1   1   http://lib.scnu.edu.cn/
20111230000008  f2f5a21c764aebde1e8afcc2871e086f    在线代理    2   1   http://proxyie.cn/
Time taken: 0.239 seconds, Fetched: 5 row(s)

最热门的查询词排行 top10

按词分组，按出现的次数倒序排列
sql语句与结果

hive> select count(*) as times, keyword from sg group by keyword order by times desc limit 10;
Total MapReduce CPU Time Spent: 1 minutes 25 seconds 60 msec
OK
77627   百度
36564   baidu
29598   人体艺术
23306   4399小游戏
20847   优酷
20677   qq空间
19205   新亮剑
17842   馆陶县县长闫宁的父亲
16612   公安卖萌
15212   百度一下 你就知道

用户查询排行 top10

按照cookie分组，按出现次数排序，选取前10
sql语句与结果

hive> select count(*) as times, cookie from sg group by cookie order by times desc limit 10;  
Total MapReduce CPU Time Spent: 1 minutes 36 seconds 110 msec
OK
20385   ac65768b987c20b3b25cd35612f61892
11653   9faa09e57c277063e6eb70d178df8529
11528   02a8557754445a9b1b22a37b40d6db38
2571    cc7063efc64510c20bcdd604e12a3b26
2355    b64b0ec03efd0ca9cef7642c4921658b
1292    7a28a70fe4aaff6c35f8517613fb5c67
1277    b1e371de5729cdda9270b7ad09484c4f
1241    f656e28e7c3e10c2b733e6b68385d5a2
1181    7eab8caf9708d68e6964220e2f89e80d
1120    c72ce1164bcd263ba1f69292abdfdf7c
Time taken: 187.017 seconds, Fetched: 10 row(s)

搜索结果排名第1，但是点击次序排在第2的数据

rank = 1 && click = 2
sql语句与结果

hive> select count(*) from sg where rank=1 and click=2; 
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1461738251729_0004, Tracking URL = http://gg01:8088/proxy/application_1461738251729_0004/
Kill Command = /home/sunyonggang/hadoop-2.6.0/bin/hadoop job  -kill job_1461738251729_0004
Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1
2016-04-27 14:46:23,653 Stage-1 map = 0%,  reduce = 0%
2016-04-27 14:47:07,452 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 16.51 sec
2016-04-27 14:47:11,596 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 19.3 sec
2016-04-27 14:47:12,622 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 20.03 sec
2016-04-27 14:47:13,677 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 20.66 sec
2016-04-27 14:47:18,928 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 23.04 sec
2016-04-27 14:47:19,964 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 23.69 sec
2016-04-27 14:47:23,044 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 24.86 sec
MapReduce Total cumulative CPU time: 24 seconds 860 msec
Ended Job = job_1461738251729_0004
MapReduce Jobs Launched: 
Job 0: Map: 5  Reduce: 1   Cumulative CPU: 24.86 sec   HDFS Read: 1147077971 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 24 seconds 860 msec
OK
200885
Time taken: 248.801 seconds, Fetched: 1 row(s)

网友评论

本文标题：hive 相关

本文链接：https://www.haomeiwen.com/subject/phpwrttx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

hive 相关

hive 相关

搭建hadoop和hive，mysql的环境，过程截图

利用上节课的数据，建表并导入数据

最热门的查询词排行 top10

用户查询排行 top10

搜索结果排名第1，但是点击次序排在第2的数据

相关文章

hive相关

hive 相关

hive相关

hive相关基础

大数据学习教程

docker 构建hadoop 环境

Hive优化实践1-数据倾斜及join无关的优化

hive相关知识

Hive相关优化

Hive SQL 相关

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读