Apache Kafka Part 1: Webserver --> Flume (Kafka Channel) --> HDFS (Same Server)
Apache Kafka comes into the picture because traditional messaging systems don't scale up to handle big data in real time.
It was developed by LinkedIn engineers.
Apache Kafka is a distributed messaging framework that meets the demands of big data by scaling on commodity hardware.
It is best suited for real-time use cases.
Let's look at an example where we need to extract data from a webserver and put it into HDFS.
1) If your webserver resides on the same Hadoop cluster:
Webserver --> Flume (Kafka Channel) --> HDFS
# Name the source, channel, and sink (source1, channel1, sink1)
# for the agent, in this case 'logagent'
logagent.sources = source1
logagent.channels = channel1
logagent.sinks = sink1
# spooldir source configuration
logagent.sources.source1.type = spooldir
# directory that the spooldir source watches for new log files
logagent.sources.source1.spoolDir = /log/C_12345
#Bind the source to the channel
logagent.sources.source1.channels = channel1
# HDFS Sink configuration
logagent.sinks.sink1.type = hdfs
logagent.sinks.sink1.hdfs.path = hdfs://<hadoop Cluster IP>/flume
# write output as a plain data stream instead of a SequenceFile
logagent.sinks.sink1.hdfs.fileType = DataStream
logagent.sinks.sink1.hdfs.useLocalTimeStamp = true
# roll to a new HDFS file every 600 seconds
logagent.sinks.sink1.hdfs.rollInterval = 600
#Bind the sink to the channel
logagent.sinks.sink1.channel = channel1
# Kafka Channel configuration
logagent.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
logagent.channels.channel1.capacity = 10000
logagent.channels.channel1.transactionCapacity = 1000
# Kafka brokers that back the channel
logagent.channels.channel1.brokerList = kafkaf-2:9092,kafkaf-3:9092
# Kafka topic used to stage events between the source and the sink
logagent.channels.channel1.topic = channel1
logagent.channels.channel1.zookeeperConnect = kafkaf-1:2181
logagent.channels.channel1.groupId = flume2
Save this as logagent.conf in the conf directory of your Flume installation. In my case the path is /hadoop/inst/apache-flume-1.6.0-bin/conf/logagent.conf.
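The Kafka channel stages events in the topic configured above (channel1), so that topic should exist on the brokers before the agent starts, unless your Kafka cluster has automatic topic creation enabled. A minimal sketch, assuming the broker/ZooKeeper hostnames from the config above and that the Kafka scripts are on your PATH (the partition and replication values are only illustrative):
# create the channel topic on the Kafka cluster
kafka-topics.sh --create --zookeeper kafkaf-1:2181 --topic channel1 --partitions 1 --replication-factor 1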
Log in to your Hadoop cluster, go to the Flume installation directory (/hadoop/inst/apache-flume-1.6.0-bin in my case), and run the command below:
bin/flume-ng agent --conf /hadoop/inst/apache-flume-1.6.0-bin/conf/ -f /hadoop/inst/apache-flume-1.6.0-bin/conf/logagent.conf -Dflume.root.logger=INFO,console -n logagent
--conf /hadoop/inst/apache-flume-1.6.0-bin/conf/
syntax : --conf <path of the Flume conf folder>
-f /hadoop/inst/apache-flume-1.6.0-bin/conf/logagent.conf
syntax : -f <path of the logagent.conf file>
-Dflume.root.logger=INFO,console
This prints all log messages to the console.
-n logagent
syntax : -n <name of the agent> // Note : the agent name, not the conf file name
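To verify the pipeline end to end, drop a file into the spooling directory and check that Flume wrote it to HDFS. A quick sketch, assuming a hypothetical test file /tmp/sample.log and the paths from the config above:
# copy a test file into the directory the spooldir source watches
cp /tmp/sample.log /log/C_12345/
# once the sink rolls the file (rollInterval = 600 seconds), list what was written to HDFS
hdfs dfs -ls /flume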
In the next blog we will see how to fetch records from an external server and put them into HDFS.