美文网首页
hive 相关

hive 相关

作者: yonggang_sun | 来源:发表于2016-05-18 17:39 被阅读124次

hive 相关

搭建hadoop和hive,mysql的环境,过程截图

1.hadoop install

sunyonggang@gg01:~/hadoop-2.6.0$ ./sbin/start-dfs.sh
Starting namenodes on [gg01]
gg01: starting namenode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-namenode-gg01.out
ggg03: starting datanode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-datanode-ggg03.out
ggg02: starting datanode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-datanode-ggg02.out
Starting secondary namenodes [gg01]
gg01: starting secondarynamenode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-secondarynamenode-gg01.out
sunyonggang@gg01:~/hadoop-2.6.0$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/sunyonggang/hadoop-2.6.0/logs/yarn-sunyonggang-resourcemanager-gg01.out
ggg02: starting nodemanager, logging to /home/sunyonggang/hadoop-2.6.0/logs/yarn-sunyonggang-nodemanager-ggg02.out
ggg03: starting nodemanager, logging to /home/sunyonggang/hadoop-2.6.0/logs/yarn-sunyonggang-nodemanager-ggg03.out
sunyonggang@gg01:~/hadoop-2.6.0$ jps
1915 NameNode
2118 SecondaryNameNode
2260 ResourceManager
2514 Jps

2.mysql install

sunyonggang@gg01:~$ sudo service mysql status
mysql start/running, process 7483
sunyonggang@gg01:~$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 43
Server version: 5.5.49-0ubuntu0.14.04.1 (Ubuntu)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)

3.hive install(ps: 这边安装的时候需要讲my.cof: #bind-address = 127.0.0.1)

sunyonggang@gg01:/etc/mysql$ hive
16/04/26 19:14:49 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead

Logging initialized using configuration in jar:file:/home/sunyonggang/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.487 seconds, Fetched: 1 row(s)
hive>

利用上节课的数据,建表并导入数据

hive> create external table sg(time char(14),
    >         cookie char(32),
    >         keyword varchar(256),
    >         rank int,
    >         click int,
    >         ref varchar(512)
    >     ) row format delimited fields terminated by '\t' location '/hdfs/user/data';
OK
Time taken: 0.05 seconds
hive> select * from sg limit 5;
OK
20111230000005  57375476989eea12893c0c3811607bcf    奇艺高清    1   1   http://www.qiyi.com/
20111230000005  66c5bb7774e31d0a22278249b26bc83a    凡人修仙传   3   1   http://www.booksky.org/BookDetail.aspx?BookID=1050804&Level=1
20111230000007  b97920521c78de70ac38e3713f524b50    本本联盟    1   1   http://www.bblianmeng.com/
20111230000008  6961d0c97fe93701fc9c0d861d096cd9    华南师范大学图书馆   1   1   http://lib.scnu.edu.cn/
20111230000008  f2f5a21c764aebde1e8afcc2871e086f    在线代理    2   1   http://proxyie.cn/
Time taken: 0.239 seconds, Fetched: 5 row(s)

最热门的查询词排行 top10

  1. 按词分组,按出现的次数倒序排列
  2. sql语句与结果
hive> select count(*) as times, keyword from sg group by keyword order by times desc limit 10;
Total MapReduce CPU Time Spent: 1 minutes 25 seconds 60 msec
OK
77627   百度
36564   baidu
29598   人体艺术
23306   4399小游戏
20847   优酷
20677   qq空间
19205   新亮剑
17842   馆陶县县长闫宁的父亲
16612   公安卖萌
15212   百度一下 你就知道

用户查询排行 top10

  1. 按照cookie分组,按出现次数排序,选取前10
  2. sql语句与结果
hive> select count(*) as times, cookie from sg group by cookie order by times desc limit 10;  
Total MapReduce CPU Time Spent: 1 minutes 36 seconds 110 msec
OK
20385   ac65768b987c20b3b25cd35612f61892
11653   9faa09e57c277063e6eb70d178df8529
11528   02a8557754445a9b1b22a37b40d6db38
2571    cc7063efc64510c20bcdd604e12a3b26
2355    b64b0ec03efd0ca9cef7642c4921658b
1292    7a28a70fe4aaff6c35f8517613fb5c67
1277    b1e371de5729cdda9270b7ad09484c4f
1241    f656e28e7c3e10c2b733e6b68385d5a2
1181    7eab8caf9708d68e6964220e2f89e80d
1120    c72ce1164bcd263ba1f69292abdfdf7c
Time taken: 187.017 seconds, Fetched: 10 row(s)

搜索结果排名第1,但是点击次序排在第2的数据

  1. rank = 1 && click = 2
  2. sql语句与结果
hive> select count(*) from sg where rank=1 and click=2; 
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1461738251729_0004, Tracking URL = http://gg01:8088/proxy/application_1461738251729_0004/
Kill Command = /home/sunyonggang/hadoop-2.6.0/bin/hadoop job  -kill job_1461738251729_0004
Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1
2016-04-27 14:46:23,653 Stage-1 map = 0%,  reduce = 0%
2016-04-27 14:47:07,452 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 16.51 sec
2016-04-27 14:47:11,596 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 19.3 sec
2016-04-27 14:47:12,622 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 20.03 sec
2016-04-27 14:47:13,677 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 20.66 sec
2016-04-27 14:47:18,928 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 23.04 sec
2016-04-27 14:47:19,964 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 23.69 sec
2016-04-27 14:47:23,044 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 24.86 sec
MapReduce Total cumulative CPU time: 24 seconds 860 msec
Ended Job = job_1461738251729_0004
MapReduce Jobs Launched: 
Job 0: Map: 5  Reduce: 1   Cumulative CPU: 24.86 sec   HDFS Read: 1147077971 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 24 seconds 860 msec
OK
200885
Time taken: 248.801 seconds, Fetched: 1 row(s)

相关文章

  • hive相关

    HQL优化 1. HQL优化 一、避免or操作的不当。如果where子句中有多个条件,并且其中某个条件没有索引,使...

  • hive 相关

    hive 相关 搭建hadoop和hive,mysql的环境,过程截图 1.hadoop install 2.m...

  • hive相关

    Hive也是数据库的一种,hivesql和mysql使用起来并没有什么不同,数据插入、表关联查询都是一样的,二者都...

  • hive相关基础

    hive相关基础 1、进入到hive命令下;(linux下直接输入hive即可) hive 2、查看工作分区下的h...

  • 大数据学习教程

    Hadoop生态 Hadoop相关内容 Spark Spark系列教程 Hive Hive快速入门 Elastic...

  • docker 构建hadoop 环境

    技能库 docker 相关 linux 相关 shell 相关 hadoop hive 准备文件 jdk 安装包:...

  • Hive优化实践1-数据倾斜及join无关的优化

    Hive SQL的各种优化方法基本 都和数据倾斜密切相关。 Hive的优化分为join相关的优化和join无关的优...

  • hive相关知识

    hive概念 hive的兼容性 hive日志 启动hive hive中除了保存真正的数据以外还要额外保存用来描述库...

  • Hive相关优化

    map优化 优化并发个数 减少map数,合并小文件set mapred.max.split.size=100000...

  • Hive SQL 相关

    1.Hive 连接查询JOIN2.hive 导入数据的方式3.hadoop streaming内存超限解决方案4....

网友评论

      本文标题:hive 相关

      本文链接:https://www.haomeiwen.com/subject/phpwrttx.html