Install Flume
Download the latest release from the official site
Extract it: tar -xzvf apache-flume-1.9.0-bin.tar.gz
Set the environment variables
vi /etc/profile.d/flume.sh
source /etc/profile
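The contents of flume.sh are not shown above; a minimal sketch, assuming Flume was extracted to /usr/tools/apache-flume-1.9.0-bin (adjust the path to match your install):

```shell
# /etc/profile.d/flume.sh -- install path is an assumption; adjust as needed
export FLUME_HOME=/usr/tools/apache-flume-1.9.0-bin
export PATH=$PATH:$FLUME_HOME/bin
```

After `source /etc/profile`, `flume-ng` is on the PATH in every new shell.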
mv flume-env.sh.template flume-env.sh
Configure JAVA_HOME and FLUME_CLASSPATH in conf/flume-env.sh:
export JAVA_HOME=/usr/tools/jdk1.8.0_181
FLUME_CLASSPATH="/usr/tools/hadoop-3.1.2/share/hadoop/hdfs/*"
Note: FLUME_CLASSPATH should contain everything Flume needs on its classpath at startup, including the jars under lib/ in the installation directory, the conf directory, and any custom plugins (placed under plugins.d).
Configure the source, channel, and sink
Create a new configuration file hdfs_sink.conf in the conf directory:
LogAgent.sources = apache
LogAgent.channels = fileChannel
LogAgent.sinks = HDFS
#sources config
#spooldir watches the configured directory for new files; each new file is parsed and written to the channel, then renamed with the suffix .COMPLETED
LogAgent.sources.apache.type = spooldir
LogAgent.sources.apache.spoolDir = /tmp/logs
LogAgent.sources.apache.channels = fileChannel
LogAgent.sources.apache.fileHeader = false
#sinks config
LogAgent.sinks.HDFS.channel = fileChannel
LogAgent.sinks.HDFS.type = hdfs
LogAgent.sinks.HDFS.hdfs.path = hdfs://ubuntu:9000/data/logs/%Y-%m-%d/%H
LogAgent.sinks.HDFS.hdfs.fileType = DataStream
LogAgent.sinks.HDFS.hdfs.writeFormat = Text
LogAgent.sinks.HDFS.hdfs.filePrefix = flumeHdfs
LogAgent.sinks.HDFS.hdfs.batchSize = 1000
# roll files by size (10 KB); rollCount = 0 disables event-count-based rolling
LogAgent.sinks.HDFS.hdfs.rollSize = 10240
LogAgent.sinks.HDFS.hdfs.rollCount = 0
# rollInterval is in seconds: start a new file every second
LogAgent.sinks.HDFS.hdfs.rollInterval = 1
LogAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
#channels config
# note: despite the name, this is an in-memory channel; buffered events are lost if the agent stops
LogAgent.channels.fileChannel.type = memory
LogAgent.channels.fileChannel.capacity = 10000
# transactionCapacity must be >= the sink's batchSize (1000 here), or the sink's take will fail
LogAgent.channels.fileChannel.transactionCapacity = 1000
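The channel above is named fileChannel but its type is memory, so events buffered in it do not survive an agent crash. If durability matters, an actual file channel would look roughly like this (the checkpoint/data paths are placeholders):

```
LogAgent.channels.fileChannel.type = file
LogAgent.channels.fileChannel.checkpointDir = /usr/tools/flume-data/checkpoint
LogAgent.channels.fileChannel.dataDirs = /usr/tools/flume-data/data
```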
Start the agent
Create the monitored directory: mkdir -p /tmp/logs
Then, from the Flume installation directory, run:
bin/flume-ng agent --conf-file conf/hdfs_sink.conf -c conf/ --name LogAgent -Dflume.root.logger=DEBUG,console
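This runs the agent in the foreground with DEBUG logging on the console, which is what you want for a first run. For long-running use you would typically background it and log to a file instead; a sketch, assuming the default log4j.properties (which defines a LOGFILE appender writing under logs/):

```shell
# run the agent in the background, logging to logs/flume.log instead of the console
nohup bin/flume-ng agent --conf-file conf/hdfs_sink.conf -c conf/ --name LogAgent \
  -Dflume.root.logger=INFO,LOGFILE >/dev/null 2>&1 &
```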
Verify
- Open another terminal
- Create a new test.log file in the monitored directory /tmp/logs
vi test.log
Save the file, then check that:
a. test.log has been renamed to test.log.COMPLETED
b. a file has appeared in HDFS at a path like: hdfs://ubuntu:9000/data/logs/2019-06-17/18/flumeHdfs.1560777723320
c. its contents match what was written:
hdfs dfs -cat /data/logs/2019-06-17/18/flumeHdfs.1560777723320
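The dated directory in step (b) comes from the %Y-%m-%d/%H escapes in hdfs.path, filled in from the event timestamp (taken from the local clock, since useLocalTimeStamp = true), and the numeric suffix of the file name is that timestamp in epoch milliseconds. The mapping can be checked with date:

```shell
# hdfs.path escapes expand strftime-style from the event timestamp;
# the suffix 1560777723320 is epoch millis, truncated here to seconds
date -u -d @1560777723 +%Y-%m-%d/%H   # → 2019-06-17/13 (UTC; the agent's local clock gave hour 18)
```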