美文网首页
hive的练习

hive的练习

作者: 薛落花随泪绽放 | 来源:发表于2017-10-20 21:32 被阅读10次
#cd
#vi relations
Zhangsan Wangwu 01:01:01
Zhangsan Zhaoliu 00:11:21
Zhangsan Yuqi 00:19:01
Zhangsan Jingba 00:21:01
Zhangsan Wuxi 01:31:17
Wangwu Zhaoliu 00:51:01
Wangwu Zhaoliu 01:11:19
Wangwu Yuqi 00:00:21
Wangwu Yuqi 00:37:21
Yuqi Zhaoliu 00:23:01
Yuqi Wuxi 00:18:00
Jingba Wangwu 00:01:01
Jingba Wangwu 00:00:06
Jingba Wangwu 00:02:03
Jingba Wangwu 00:02:54
Wangwu Yuqi 01:00:19
#hadoop fs -put relations /
#vi test_mapper.py
import sys
for line in sys.stdin:
        line = line.strip()
        fromstr,tostr,time =line.split('\t')
        hours,minutes,secondes = time.split(':')
        newtime = int(hours)*60*60+int(minutes)*60+int(secondes)
        if cmp(fromstr,tostr) == -1:
          #print(cmp(fromstr,tostr))
          fromstr,tostr = tostr,fromstr
        print ' ' .join([fromstr+tostr,str(newtime)])

#vi test.hive
CREATE TABLE relations_new (
fromtostr STRING,
   duration INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' ';

add FILE test_mapper.py;

INSERT OVERWRITE TABLE relations_new
SELECT
  TRANSFORM (fromstr,tostr,duration)
  USING 'python test_mapper.py'
  AS (fromtostr,duration)
FROM relations;
#hive
create table relations (
fromstr string,
tostr string,
duration string)
row format delimited fields terminated by ' ';

load data inpath '/relations' overwrite into table relations;
exit;
#hive -f test.hive
#hive

select 
    fromtostr,
    duration,
    rank() over(order by duration desc) rank,
dense_rank() over(order by duration) dense_rank,
row_number() over(order by duration) row_number
from relations_new;

OK
WangwuJingba    6   1   1   1
YuqiWangwu  21  2   2   2
WangwuJingba    61  3   3   3
WangwuJingba    123 4   4   4
WangwuJingba    174 5   5   5
ZhaoliuZhangsan 681 6   6   6
YuqiWuxi    1080    7   7   7
YuqiWangwu  1109    8   8   8
ZhangsanYuqi    1141    9   9   9
ZhangsanJingba  1261    10  10  10
ZhaoliuYuqi 1381    11  11  11
YuqiWangwu  2241    12  12  12
ZhaoliuWangwu   3061    13  13  13
YuqiWangwu  3619    14  14  14
ZhangsanWangwu  3661    15  15  15
ZhaoliuWangwu   4279    16  16  16
ZhangsanWangwu  4713    17  17  17
ZhangsanWuxi    5477    18  18  18
Time taken: 27.345 seconds, Fetched: 18 row(s)



相关文章

  • 2018-08-05--08-11

    08-05配置1、sql语句练习。根据月乔的文档&sql优化,根据文档练习2、hive语句1)hive,sql连接...

  • hive学习(三):练习题——collect_set及array

    前言: 以sql为基础,利用题目进行hive的语句练习,逐步体会sql与hive的不同之处。 题目用到hive的集...

  • hive学习(二):练习题——求访问次数

    前言: 以sql为基础,利用题目进行hive的语句练习,逐步体会sql与hive的不同之处。本次练习题来源:htt...

  • hive的练习

  • Hive练习

    数据: 建表语句 1、输出的日期格式不一样,需进行格式化 2、计算每个用户的小计 计算累加列,开窗函数根据用户id...

  • hive学习(四):面试题——场景输出(row_number)

    前言: 以sql为基础,利用题目进行hive的语句练习,逐步体会sql与hive的不同之处。本次题目用到row_n...

  • Hive SQL练习

    查询全体学生的学号与姓名 查询选修了课程的学生姓名 ----hive的group by 和集合函数 查询学生的总人...

  • Hive实践练习

    创建表 查看刚刚创建的表 造测试数据 加载测试数据 由于是内部表上图红色路径中的元数据在drop表的同时也会被清除...

  • Hive练习(一)

    练习所使用的数据是之前创建的users表和train表中的 caculate the event with thw...

  • Hive练习(三)

    使用beeline连接到hive 创建users表和train表,为了方便, 创建表的hql语句单独写在一个脚本文...

网友评论

      本文标题:hive的练习

      本文链接:https://www.haomeiwen.com/subject/cscbuxtx.html