How JustAnswer Works:
  • Ask an Expert
    Experts are full of valuable knowledge and are ready to help with any question. Credentials confirmed by a Fortune 500 verification firm.
  • Get a Professional Answer
    Via email, text message, or notification as you wait on our site. Ask follow up questions if you need to.
  • 100% Satisfaction Guarantee
    Rate the answer you receive.
Ask Steve Herrod Your Own Question
Steve Herrod
Steve Herrod, Computer Support Specialist
Category: Software
Satisfied Customers: 160
Experience:  Familiar with a wide variety of software and experienced in user training/support
Type Your Software Question Here...
Steve Herrod is online now

Hadoop contains some parallelization technology and a

Customer Question

Hello, Hadoop contains some parallelization technology and a protocol of communication between the master and the slave nodes. Is it any different from MPI or it is only a kind of copy?
Submitted: 1 year ago.
Category: Software
Expert:  JamesI replied 1 year ago.

Hadoop supports fault tolerance (node failure, corrupt data transports etc), which is not the case of MPI.

Most implementations I have seen of Hadoop have also used MPI, since the two technologies specialise in different tasks, and it comes down to the calculations and data manipulation you are performing.

If you have resilient and high performance hardware, MPI can be used, however where there is a risk of failure\latency Hadoop helps reduce some of the effects.

That said Hadoop is aimed at non-iterative algorithms where nodes require little data exchange to proceed (non-iterative and independent), where as MPI is aimed at iterative algorithms where nodes require data exchange to proceed (iterative and dependent).

I found a good explanation of the two frameworks (which explain it better than me)

  1. MPI is a message passing paradigm of parallelism. Here, you have a root machine which spawns programs on all the machines in its MPI WORLD. All the threads in the system are independent and hence the only way of communication between them is through messages over network. The network bandwidth and throughput is one of the most crucial factor in MPI implementation's performance. Idea : If there is just one thread per machine and you have many cores on it, you can use OpenMP shared memory paradigm for solving subsets of your problem on one machine.
  2. Hadoop is used for solving large problems on commodity hardware using Map Reduce paradigm. Hence, you do not have to worry about distributing data or managing corner cases. Hadoop also provides a file system HDFS for storing data on compute nodes.
Customer: replied 1 year ago.
Thanks your informative answer. Please clarify and I will rate you : "The parallelization algorithms of MPI are not integrated in Hadoop MapReduce and thus Hadoop parallelization is something original".