To start your journey towards Big Data / Hadoop, you have to follow these steps:
1. Brush up on Core Java and SQL.
2. Now you are ready to swim in the world of Big Data.
Big Data = the combination of Big + Data. We already have RDBMS to handle data, but now digital data is coming in from everywhere around the world, and that is what makes the data big. This is Big Data.
To store and process this data we have a powerful Java framework called Hadoop.
Who created it - Doug Cutting
Where - Yahoo
When - 2006
How did it become open source? Although it started as a Yahoo project, Yahoo later contributed Hadoop to the Apache Software Foundation, and it is now a top-level open source project at Apache.
                Hadoop
                  |
                  |
        +---------+---------+
        |                   |
     Storage            Processing
      (HDFS)            (MapReduce)
1. Storage: To store Big Data, Hadoop uses HDFS, i.e. the Hadoop Distributed File System.
2. Processing: The MapReduce framework, written in Java, processes large volumes of data in parallel.
We will look at both in detail:
1. Storage: Refer to this link: Hadoop Storage
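2. Processing: The classic way to get a feel for MapReduce is a word-count job. The sketch below is a minimal, illustrative version written against the standard Hadoop MapReduce Java API; the class names are just examples and the input/output HDFS paths are simply command-line arguments, not anything defined in this blog.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: split each input line into words and emit (word, 1)
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output are HDFS paths passed on the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would package this into a jar and submit it with the hadoop jar command, passing the input and output paths: every mapper emits (word, 1) pairs in parallel and the reducer sums them up per word.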
Hadoop Ecosystem Tools:
1. Data Analytics Tool: Hive
Apache Hive is an open source data warehouse infrastructure built on top of Apache Hadoop that provides data summarization, querying, and analysis.
This tool was initially developed by Facebook, which later contributed it to Apache.
This tool is used for structured data.
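To see how Hive is queried, here is a minimal sketch that talks to HiveServer2 from Java over JDBC. The host and port (localhost:10000) and the employees table are assumptions made only for illustration; any HiveQL query can be sent the same way.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver (requires the hive-jdbc dependency on the classpath)
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Assumed connection details: HiveServer2 running locally on the default port
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = con.createStatement()) {

            // HiveQL looks very much like SQL; 'employees' is a hypothetical table
            ResultSet rs = stmt.executeQuery(
                "SELECT department, COUNT(*) AS emp_count "
                + "FROM employees GROUP BY department");

            while (rs.next()) {
                System.out.println(rs.getString("department") + " : " + rs.getLong("emp_count"));
            }
        }
    }
}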
2. Data Transformation Tool: Pig
This tool works with structured as well as semi-structured data.
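Pig Latin scripts are usually run from the Grunt shell, but they can also be embedded in Java through PigServer. The rough sketch below runs Pig in local mode and assumes a hypothetical tab-separated input file /data/users.txt; the relation names and paths are made up for illustration.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigTransformExample {
    public static void main(String[] args) throws Exception {
        // Run Pig in local mode for illustration; use ExecType.MAPREDUCE on a cluster
        PigServer pig = new PigServer(ExecType.LOCAL);

        // '/data/users.txt' is a hypothetical tab-separated file: name, age, city
        pig.registerQuery("users = LOAD '/data/users.txt' AS (name:chararray, age:int, city:chararray);");
        pig.registerQuery("adults = FILTER users BY age >= 18;");
        pig.registerQuery("by_city = GROUP adults BY city;");
        pig.registerQuery("counts = FOREACH by_city GENERATE group AS city, COUNT(adults) AS total;");

        // Writing the result triggers execution; the output path is also hypothetical
        pig.store("counts", "/data/adults_by_city");
    }
}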
3. Data Ingestion Tool: Sqoop
It is an open source product from Apache.
The name Sqoop comes from SQ + OOP = SQL to Hadoop.
This tool is used to transfer data from a relational database to Hadoop-supported storage systems and vice versa.
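In day-to-day work Sqoop is driven from the command line. The sketch below simply launches a sqoop import from Java so that the typical flags are visible; the JDBC connection string, credentials, table name, and target directory are assumptions for illustration.

import java.util.Arrays;
import java.util.List;

public class SqoopImportExample {
    public static void main(String[] args) throws Exception {
        // Equivalent to running on the shell:
        //   sqoop import --connect jdbc:postgresql://localhost:5432/sales \
        //                --username dbuser --password dbpass \
        //                --table orders --target-dir /user/hadoop/orders -m 4
        List<String> command = Arrays.asList(
                "sqoop", "import",
                "--connect", "jdbc:postgresql://localhost:5432/sales", // assumed PostgreSQL database
                "--username", "dbuser",
                "--password", "dbpass",
                "--table", "orders",                    // assumed source table
                "--target-dir", "/user/hadoop/orders",  // HDFS directory to write to
                "-m", "4");                             // number of parallel map tasks

        Process process = new ProcessBuilder(command)
                .inheritIO()   // stream Sqoop's console output to this process
                .start();
        int exitCode = process.waitFor();
        System.out.println("Sqoop import finished with exit code " + exitCode);
    }
}

The -m flag controls how many map tasks Sqoop runs in parallel for the import.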
Interesting facts about Sqoop: it is not used only with the open source Hadoop framework, it is also used by industry giants like the ones below:
1. Informatica provides a Sqoop-based connector.
2. Pentaho provides an open source Sqoop-based connector.
3. Microsoft uses a Sqoop-based connector to help transfer data from Microsoft SQL Server databases to Hadoop.
...and many more.
Refer to this link: SQOOP
Also refer to this link to refresh your PostgreSQL knowledge. It will take you through all the SELECT queries needed to fetch records from an RDBMS (i.e. PostgreSQL) before transferring the data into the Hadoop Distributed File System, Hive, or a NoSQL database like HBase.
4. Data Ingestion Tool: Flume (Coming Soon)
This is also a data ingestion tool, but it is used to transfer semi-structured data from any web server to HDFS/Hive/HBase.
Example: Apache log files stored on a remote web server can be transferred to HDFS using Flume.
Apache Kafka:
First, let's go through how we can use Kafka channels in Flume as a reliable and highly available channel for any source/sink combination.
In this blog you will learn how to transfer data from a web server to HDFS.
5. NoSQL Databases (Coming Soon)