Tuesday, 25 April 2017

Apache Kafka Part 1: Webserver --> Flume (Kafka Channel) --> HDFS (Same Server)

Apache Kafka comes into the picture because traditional messaging systems don't scale to handle big data in real time.


It was developed by LinkedIn engineers.


Apache Kafka is a distributed messaging framework that meets the demands of big data by scaling on commodity hardware.


It is best suited for real-time use cases.
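Before wiring Kafka into Flume, it helps to see its messaging model on its own: producers publish messages to a topic, and consumers read them back. Below is a minimal sketch using the console tools that ship with Kafka, assuming a broker on localhost:9092, ZooKeeper on localhost:2181, and a hypothetical topic named test.

# create the test topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

# publish a few messages (type them one per line, then Ctrl+C)
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

# read back everything published to the topic so far
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning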


Let's look at an example where we need to extract data from a webserver and put it into HDFS.


1) If your webserver resides on the same Hadoop cluster.


Webserver --> Flume (Kafka Channel) --> HDFS

# The agent is named 'logagent'; its source, channel, and sink
# are named source1, channel1, and sink1
logagent.sources  = source1
logagent.channels = channel1
logagent.sinks    = sink1


# spooldir source configuration
# Flume watches this directory and ingests any file dropped into it
logagent.sources.source1.type     = spooldir
logagent.sources.source1.spoolDir = /log/C_12345


# Bind the source to the channel
logagent.sources.source1.channels = channel1

# HDFS sink configuration
logagent.sinks.sink1.type                   = hdfs
logagent.sinks.sink1.hdfs.path              = hdfs://<hadoop Cluster IP>/flume
logagent.sinks.sink1.hdfs.fileType          = DataStream
logagent.sinks.sink1.hdfs.useLocalTimeStamp = true
logagent.sinks.sink1.hdfs.rollInterval      = 600


# Bind the sink to the channel
logagent.sinks.sink1.channel = channel1


# Kafka channel configuration
logagent.channels.channel1.type                = org.apache.flume.channel.kafka.KafkaChannel
logagent.channels.channel1.capacity            = 10000
logagent.channels.channel1.transactionCapacity = 1000
logagent.channels.channel1.brokerList          = kafkaf-2:9092,kafkaf-3:9092
logagent.channels.channel1.topic               = channel1
logagent.channels.channel1.zookeeperConnect    = kafkaf-1:2181
logagent.channels.channel1.groupId             = flume2


Save this as logagent.conf in the Flume conf directory. In my case the path is /hadoop/inst/apache-flume-1.6.0-bin/conf/logagent.conf
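The Kafka channel reads and writes the topic named channel1 on the brokers listed above. If automatic topic creation is disabled on your brokers, create the topic before starting the agent. This is only a sketch; the partition count and replication factor here are assumptions, so adjust them for your cluster.

# run from the Kafka installation directory on a broker node
bin/kafka-topics.sh --create --zookeeper kafkaf-1:2181 --replication-factor 2 --partitions 1 --topic channel1

# confirm the topic exists
bin/kafka-topics.sh --list --zookeeper kafkaf-1:2181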

Log in to your Hadoop cluster, go to the Flume bin directory, and run the command below:


./flume-ng agent --conf /hadoop/inst/apache-flume-1.6.0-bin/conf/ -f /hadoop/inst/apache-flume-1.6.0-bin/conf/logagent.conf -Dflume.root.logger=INFO,console -n logagent

--conf /hadoop/inst/apache-flume-1.6.0-bin/conf/
syntax: --conf <path of the Flume conf folder>



-f /hadoop/inst/apache-flume-1.6.0-bin/conf/logagent.conf
syntax: -f <path of the logagent.conf file>



-Dflume.root.logger=INFO,console
This sets the root log level to INFO and prints all log messages to the console.


-n logagent
syntax: -n <name of the agent>    // Note: the agent name, not the conf file name
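Once the agent is running, you can test the pipeline end to end: drop a log file into the spooling directory, wait for the sink to roll (rollInterval is 600 seconds above), and list what the HDFS sink has written. The file name access.log below is just a placeholder; Flume renames a fully ingested file with a .COMPLETED suffix by default.

# copy a sample log into the directory watched by source1
cp access.log /log/C_12345/

# after ingestion the file is marked as done
ls /log/C_12345/            # access.log.COMPLETED

# check the files written by the HDFS sink
hdfs dfs -ls /flume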



In the next blog, we will see how to fetch records from an external server and put them into HDFS.

