COMP9313_WEEK1_1_Course Introduction

Author: Eric_Hunter | Published: 2018-03-03 18:44

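An introduction to COMP9313 (Big Data Management)

Lecturer: Dr. Cao Xin (曹欣)

Email: search for it yourself

Background: bachelor's and master's degrees from the computer science school at Zhejiang University; PhD from the computer science school at Nanyang Technological University.

Number of papers: 22

Research interests: Data Management (in particular, on geo-textual data), Databases, Information Retrieval, and Data Mining

Current research: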

1) Filtering geo-textual data stream, e.g., geo-tagged tweets (SIGMOD13, ICDE15)

2) Keyword-aware route planning (PVLDB12, IJCAI15)

3) Efficient processing of spatial keyword queries (PVLDB10, SIGMOD11, PVLDB14, SIGMOD15, TODS15, PVLDB16, and an invited paper in ER12)

4) Mining significant semantic locations from user generated GPS data (PVLDB10)

5) Link structure analysis (PVLDB10, SIGMOD17)

Past research:

1) Using categorization information to improve question search in community-based question answering services (CIKM09, WWW10, TOIS12)

2) Indoor distance-aware query processing (ICDE12)

3) Streaming graph clustering (ICDE16)

Tutor's email: search for it yourself

Aims:

This course aims to introduce you to the concepts behind Big Data, the core technologies used in managing large-scale data sets, and a range of technologies for developing solutions to large-scale data analytics problems.

This course is intended for students who want to understand modern large-scale data analytics systems. It covers a wide range of topics and technologies, and will prepare students to be able to build such systems as well as use them efficiently and effectively to address challenges in big data management.

Lectures:

Lectures focus on frontier technologies for big data management and their typical applications

The lecturer will try to run them in a more interactive mode and provide more examples

A few lectures may run in a more practical manner (e.g., like a lab/demo) to cover the applied aspects

Lecture length varies slightly depending on the progress (of that lecture)

Textbooks:

1) Hadoop: The Definitive Guide. Tom White. 4th Edition - O'Reilly Media

2) Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman. 2nd edition - Cambridge University Press

3) Data-Intensive Text Processing with MapReduce. Jimmy Lin and Chris Dyer. University of Maryland, College Park.

4) Learning Spark. Matei Zaharia, Holden Karau, Andy Konwinski, Patrick Wendell. O'Reilly Media

References:

1) Apache MapReduce Tutorial

2) Apache Spark Quick Start

Topics covered:

1) Topic 1. Big data management tools

Apache Hadoop

MapReduce

YARN/HDFS/HBase/Hive/Pig (briefly introduced)

Spark

AWS platform

Mahout [tentative]

2) Topic 2. Typical big data applications

Finding similar items

Graph data processing

Data stream mining

Recommender Systems

Prerequisites:

1) have experience with and good knowledge of algorithm design (equivalent to COMP9024)

2) have a solid background in database systems (equivalent to COMP9311)

3) have solid programming skills in Java

4) be familiar with working on a Unix-style operating system

5) have basic knowledge of linear algebra (e.g., vector spaces, matrix multiplication), probability theory and statistics, and graph theory

Expected learning outcomes:

1) elaborate the important characteristics of Big Data

2) develop an appropriate storage structure for a Big Data repository

3) utilize the map/reduce paradigm to manipulate Big Data

4) utilize the Spark platform to manipulate Big Data

5) develop efficient solutions for analytical problems involving Big Data
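To give a feel for the map/reduce paradigm named in the outcomes above, here is a minimal word-count sketch in plain Python. It is not the Hadoop API and does not come from the course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in a single process, whereas a real framework would distribute the three phases across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper (hypothetical name): emit a (word, 1) pair for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, the step a framework does for you."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data management", "big data analytics"]
    print(word_count(sample))
```

The point of splitting the work this way is that the mapper and reducer only ever see one line or one key at a time, which is what lets a framework run many copies of them in parallel.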

Assignments and marking scheme:

4 projects:

1 warm-up programming project on Hadoop MapReduce

1 harder project on Hadoop MapReduce

1 project on Spark

1 project on AWS (MapReduce/Spark)

Because the CSE computers run Linux:

Use Linux/command line (virtual machine image will be provided)

Projects marked on Linux servers

You need to be able to upload, run, and test your program under Linux

Submission:

Use Give to submit (either command line or web page)

Use Classrun to check your submission, marks, etc. See https://wiki.cse.unsw.edu.au/give/Classrun

(Note: late submissions incur a 10% penalty for the first day and a 30% penalty after that.)

Final Exam:

1) Double pass: final >= 40%

2) Final written exam (100 pts)

Course schedule:

[Figure: Schedule]

Labs (11 in total):

5 labs on MapReduce; 3 labs on Spark; 1 lab on high-level MapReduce tools; 1 lab on AWS; 1 lab on a big data machine learning platform [tentative]

Environment setup (using a virtual machine):

1) Pure Xubuntu 14.04: http://www.cse.unsw.edu.au/~z3515164/Raw_Xubuntu.zip

2) Xubuntu 14.04 with pre-installed Hadoop and Eclipse plugin: http://mirror.cse.unsw.edu.au/pub/cs9313/Xubuntu.zip

Installation steps:

(1) Download the zip file, uncompress it, and rename the file "xubuntu-disk.vmdk" to "xubuntu-disk2.vmdk"

(2) Open VirtualBox, File->Import Appliance

(3) Browse to the image folder and select the "*.ovf" file

(4) The image will be imported to your computer, which may take about 10 minutes

(5) comp9313 is used as both username and password. The Hadoop installation path is the same as in the virtual machine on the lab computers.
