Installing and Configuring Hadoop in Fully Distributed Mode

Thursday, July 25, 2013

Before setting up a multi-node cluster, you first need to learn and understand how to set up a pseudo-distributed cluster. You can go through my earlier blog post by clicking here to learn that process.

Now that you know how to set up a pseudo-distributed Hadoop cluster, you can follow the steps below to build a multi-node cluster. I will describe the process of setting up a two-node cluster with a "master" and a "slave" machine. The master is where the name node and job tracker run; it is the single point of interaction (and of failure) in the cluster. The slave runs a data node and a task tracker and acts as directed by the HDFS and MapReduce masters. You can use this same process to scale up to as many nodes as you want.

1. Pick the second computer that you want to make the "slave" in the cluster. We will call it "slave". Find out its IP address by issuing the ifconfig command in a terminal. If you are setting up your cluster on a Wi-Fi LAN and your IP address is assigned by DHCP and changes every time you log in, consider setting up a static IP address, as was done for the master node in my previous tutorial. Click here to go to the post on setting up a static IP address.

2. Perform steps 1 through 6 of the pseudo-distributed cluster set-up post on the slave machine. The hostname of this machine should be set to "slave".

3. Define the slave machine in the master's /etc/hosts file, and vice versa.
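For example, assuming the master's address is 192.168.1.100 and the slave's is 192.168.1.101 (substitute your actual IP addresses), the /etc/hosts file on both machines would contain entries like:

```
192.168.1.100    master
192.168.1.101    slave
```

With these entries in place, each machine can reach the other by hostname rather than by raw IP address.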

4. scp the slave machine's id_rsa.pub key (generated in step 5) to the master machine.
On Slave: $ scp ~/.ssh/id_rsa.pub hduser@master:/home/hduser/
You will be prompted for the password of the master's hduser account at this point.
Now concatenate the slave's id_rsa.pub to the master's authorized_keys file, then remove the copied key file.
On Master: $ cat $HOME/id_rsa.pub >> $HOME/.ssh/authorized_keys
On Master: $ rm $HOME/id_rsa.pub
Now scp the authorized_keys file from the master back to the slave.
On Master: $ scp ~/.ssh/authorized_keys hduser@slave:/home/hduser/.ssh/
You will be prompted for the password of the slave's hduser account at this point.
After this step, passwordless SSH is set up between your master and slave machines for the hduser account. You can test it with the following commands:
On Slave (logged in as hduser): $ ssh master
On Master (logged in as hduser): $ ssh slave
SSH login to the other machine should no longer ask for a password, and the two systems can talk to each other.

5. Now on the master machine, where you had set up your pseudo cluster, make the following changes:
  • Add "slave" to the hadoop/conf/slaves file. Let master continue to be listed there, so that a data node and a task tracker run on the master machine too. The slave nodes must be listed in the slaves file, one per line.
  • Change the dfs.replication property in the hdfs-site.xml file from 1 to 2, so that two copies of each block are stored redundantly in the cluster.
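The replication change amounts to a fragment like the following inside the configuration element of conf/hdfs-site.xml (property name as in Hadoop 1.x):

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

With two nodes in the cluster, a replication factor of 2 means every block lives on both machines, so the loss of one data node does not lose data.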

6. Now scp the entire /home/hduser/bigdata/hadoop/ directory from the master to the slave machine.
On Master: $ scp -r bigdata/hadoop/* hduser@slave:/home/hduser/bigdata/hadoop/
Note that the absolute path of the Hadoop home must be the same on both machines. Also, the directory structure below should be present on the slave just as on the master:

- bigdata (our main bigdata projects related directory)
      - hadoop (Hadoop App Folder. This one is scp 'ed from the master.)
      - hadoopdata (data directory)
         - name (dfs.name.dir points to this directory)
         - data (dfs.data.dir points to this directory)
         - tmp  (hadoop.tmp.dir points to this directory)
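As a sketch, the data directory skeleton above can be recreated on the slave with mkdir -p before the scp (assuming the same /home/hduser/bigdata base path as on the master; the hadoop directory itself comes over via scp):

```shell
# Create the Hadoop data directory layout under the base path.
# BASE defaults to $HOME/bigdata; override it if your layout differs.
BASE="${BASE:-$HOME/bigdata}"
mkdir -p "$BASE/hadoop"            # Hadoop app folder (filled by scp from master)
mkdir -p "$BASE/hadoopdata/name"   # dfs.name.dir
mkdir -p "$BASE/hadoopdata/data"   # dfs.data.dir
mkdir -p "$BASE/hadoopdata/tmp"    # hadoop.tmp.dir
```

mkdir -p is safe to re-run: it creates any missing parent directories and does nothing if the directories already exist.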

7. Clear the masters and slaves files on the slave machine.

8. Now clear the $HOME/bigdata/hadoopdata/data directory on both machines:
$ rm -rf $HOME/bigdata/hadoopdata/data/*

9. Format the namenode on master machine:
On Master: $ hadoop namenode -format

Your cluster is now ready to run Hadoop in fully distributed mode. Run start-all.sh on your master machine. It will start the name node, data node, job tracker, secondary name node, and task tracker daemons on the master, and start a data node and a task tracker on the slave machine. You can check the URL master:50070 for name node administration and master:50030 for MapReduce administration. After you are done with your Hadoop work, don't forget to run stop-all.sh on the master node to stop the daemons running on the cluster.
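You can also confirm the daemons came up by inspecting the output of jps. Here is a small sketch (the helper name check_daemons is my own; the daemon names are those of Hadoop 1.x):

```shell
# check_daemons takes the output of `jps` and reports any missing
# Hadoop 1.x daemon, printing a confirmation if all five are present.
check_daemons() {
  for d in NameNode DataNode JobTracker SecondaryNameNode TaskTracker; do
    echo "$1" | grep -q "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all daemons running"
}
# Usage on the master: check_daemons "$(jps)"
```

On the slave, jps should show only DataNode and TaskTracker, so run the full check on the master only.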