Downloading Flume
Download links:
http://mirrors.hust.edu.cn/apache/
http://flume.apache.org/download.html
Installing Flume
Flume depends on Hadoop and ZooKeeper only for their jar files; starting Flume does not require the Hadoop and ZooKeeper services to be running.
Upload and extract
```bash
[hadoop@hadoop001 app]$ tar -xzvf flume-ng-1.6.0-cdh5.7.0.tar.gz
[hadoop@hadoop001 app]$ mv apache-flume-1.6.0-cdh5.7.0 flume
```
Edit the configuration file and add JAVA_HOME:
```bash
[hadoop@hadoop001 app]$ cd flume
[hadoop@hadoop001 flume]$ cp conf/flume-env.sh.template conf/flume-env.sh
[hadoop@hadoop001 flume]$ vim conf/flume-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_45
```
- Add the environment variables
```bash
[hadoop@hadoop001 bin]$ vim ~/.bash_profile
export FLUME_HOME=/home/hadoop/app/flume
export PATH=$FLUME_HOME/bin:$PATH
[hadoop@hadoop001 bin]$ source ~/.bash_profile
[hadoop@hadoop001 bin]$ which flume-ng
/home/hadoop/app/flume/bin/flume-ng
```
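As a quick sanity check (a sketch, assuming the layout above where Flume was renamed to /home/hadoop/app/flume), the new entry should be first in PATH, which is why `which flume-ng` resolves to Flume's bin directory:

```shell
# Sanity check for the environment setup above.
# Assumes Flume lives at /home/hadoop/app/flume as in the earlier steps.
export FLUME_HOME=/home/hadoop/app/flume
export PATH=$FLUME_HOME/bin:$PATH
# The first PATH entry should now be Flume's bin directory:
echo "${PATH%%:*}"   # prints "/home/hadoop/app/flume/bin"
```

Running `flume-ng version` is another quick way to confirm the binary works.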
Agent configuration example
- Using Flume essentially comes down to configuring the Source, Channel, and Sink.
- Agent = Source + Channel + Sink; in practice, an agent is defined by a Flume configuration file.
- A single configuration file can define multiple agents.
- Event: the smallest unit of data transfer in Flume. One Event is one record, made up of a header and a body: the header stores metadata as key/value pairs, and the body stores the payload as a byte array.
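To make the Event structure concrete, here is a sketch of one event in the JSON shape accepted by Flume's HTTP source (the field names `headers` and `body` are Flume's; the specific keys and values are invented for illustration):

```shell
# One Flume event, illustrated as JSON: "headers" carries key/value metadata,
# "body" carries the actual record (a byte array on the wire).
cat <<'EOF'
[{
  "headers": {"host": "hadoop001", "timestamp": "1562345678901"},
  "body": "192.168.1.10 - - one log record goes here"
}]
EOF
```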
Flume configuration file
```bash
[hadoop@hadoop001 conf]$ vim /home/hadoop/flume/conf/example.conf
```

```bash
# example.conf: A single-node Flume configuration
#
# This agent watches a file for new data and forwards what it collects to avro.
# Note: running a Flume agent is mostly a matter of configuring the source,
# channel, and sink. Below, a1 is the agent name; the source is called r1,
# the channel c1, and the sink k1.

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: watch the file for newly appended data (exec source)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/uplooking/data/data-clean/data-access.log

# Sink: deliver the data downstream via avro
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = uplooking03
a1.sinks.k1.port = 44444

# Channel: buffer the data in files on disk; this option is more durable
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/uplooking/data/flume/checkpoint
a1.channels.c1.dataDirs = /home/uplooking/data/flume/data

# Bind source r1 and sink k1 together through channel c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
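For a first local test without a downstream avro server, the standard Flume quickstart wiring (netcat source, memory channel, logger sink) can be used instead; the component names follow the same pattern:

```bash
# netcat source: listens on a TCP port; each line received becomes one event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# logger sink: writes events to the agent's log (useful for debugging)
a1.sinks.k1.type = logger

# memory channel: fast in-memory buffer (less durable than the file channel)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

After starting this agent, connecting with `telnet localhost 44444` and typing a line should produce a logged event.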
After configuring, start the Flume agent to begin monitoring the log file:
```bash
$ flume-ng agent --conf conf -n a1 -f ./flume/conf/example.conf >/dev/null 2>&1 &
```
Note: the value of the -n option must match the agent name a1 defined in example.conf.
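The exec source in example.conf simply runs `tail -F`, so its behavior can be checked without Flume at all (a sketch using a throwaway file):

```shell
# What the exec source consumes: tail follows lines appended to the file.
# mktemp creates a throwaway file just for this demonstration.
log=$(mktemp)
echo "first record"  >> "$log"
echo "second record" >> "$log"
# The newest line is what a tailing source would have seen last:
tail -n 1 "$log"     # prints "second record"
rm -f "$log"
```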