
    WHAT IS APACHE HADOOP?

    We have already seen what Big Data is. There are many technologies and processes for working with Big Data, and Hadoop is one of the most popular.

    Hadoop is not a single product, but a collection of components.

    Hadoop's core components are:

    1. HDFS, which stands for Hadoop Distributed File System,
      • is Hadoop’s way of distributing data across the nodes of a cluster.
      • HDFS replicates each block of data on several nodes, with a default replication factor of 3, so the data stays available even when a node fails. This improves availability and fault tolerance and makes Hadoop more reliable.
      • HDFS also allows nodes to be added to the cluster without affecting the existing nodes, which makes Hadoop horizontally scalable.
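    2. MapReduce,
      • is Hadoop’s way of moving the processing to the nodes where the data is stored, instead of moving the data to the processing.
      • MapReduce is a programming model, together with an implementation that runs it as a parallel, distributed algorithm on a cluster, designed for processing and generating large data sets.
      • A MapReduce job runs in two main steps: the map step splits the input into pieces and processes each piece in parallel, emitting intermediate key-value pairs, and the reduce step combines the intermediate values that share a key into the final result.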
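    3. YARN (Yet Another Resource Negotiator),
      • is the newer resource-management architecture introduced in Hadoop 2 (early documentation called it MapReduce 2.0, or MRv2).
      • YARN separates the resource management and job scheduling/monitoring responsibilities of the original MapReduce into separate daemons: a global ResourceManager and a per-application ApplicationMaster.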
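    The replication idea in item 1 can be illustrated with a small, simplified Python sketch. It splits a file into fixed-size blocks (128 MB, the default HDFS block size since Hadoop 2) and assigns each block to three distinct nodes, matching the default replication factor. The node names, the round-robin placement and the helper functions are hypothetical; real HDFS uses a rack-aware placement policy, and the NameNode keeps track of where each replica lives.

```python
# Simplified sketch of HDFS-style block splitting and replication.
# Hypothetical helpers; real HDFS uses rack-aware placement chosen by the NameNode.
from typing import Dict, List, Set

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (dfs.blocksize) since Hadoop 2
REPLICATION = 3                 # default replication factor (dfs.replication)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the sizes of the blocks a file of file_size bytes is split into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (an assumption)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    print([size // (1024 * 1024) for size in blocks])  # [128, 128, 44]
    placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
    print(placement[0])  # ['node1', 'node2', 'node3']
    print(readable_after_failure(placement, {"node1", "node2"}))  # True
```

    With four nodes and a replication factor of 3, any two nodes can fail and every block is still readable, which is the reliability benefit that replication provides.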
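    The two MapReduce steps from item 2 can be sketched in plain Python with the classic word-count example. This is a single-process simulation of the programming model, not Hadoop's Java API: the function names are hypothetical, and the explicit shuffle function stands in for the grouping by key that the framework performs between the map and reduce phases.

```python
# Single-process simulation of the MapReduce programming model (word count).
# Function names are illustrative; Hadoop itself runs these phases across a cluster.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each mapper would process a different input split in parallel.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Reducers would also run in parallel, each handling a subset of the keys.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(docs))  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

    On a real cluster the same logic would be written as Mapper and Reducer classes in Hadoop's Java API (or as standalone scripts run through Hadoop Streaming), and the framework would run many map and reduce tasks in parallel, preferably on the nodes that already hold the data.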

    Hadoop is also affordable: it is open source and runs on inexpensive commodity hardware, from a single machine to a cluster of many.

    Additional technologies such as Pig, Hive, HBase, Storm, Spark and Shark work alongside the core components (HDFS, MapReduce and YARN) and help process Big Data efficiently.

    Many also consider Spark a core technology, because it can run independently of MapReduce and can process data in memory.
