Hadoop 3.0 Architecture

Hadoop, the most popular open-source distributed framework, has arrived with a new 3.x release. It brings promising features and enhancements, and here we demystify the Hadoop 3.0 architecture in detail. The differences between Hadoop 3.0 and Hadoop 2.0 have already been discussed widely, but seeing how those changes fit into the Hadoop 3.0 architecture gives you better insight and makes you a better-informed developer.
Let's see how the architecture evolved from Hadoop's initial release in 2006 through Hadoop 2.x to Hadoop 3.0. Hadoop 2.x brought a much improved architecture with YARN, and its building blocks are more flexible.
As data volumes grow and enterprises build Enterprise Data Lake (EDL) solutions, optimizing the cost of storage has become a key concern. The underlying programming language, Java, has also moved forward to version 8 (1.8) with many enhanced features, so adopting it is a must for the Hadoop community. The other changes that shape the new Hadoop 3.0 architecture are YARN improvements, task-level native optimization, automatically derived heap sizes, scheduling enhancements, new default ports, and client-side classpath isolation.

Hadoop 3.0 Architecture for HDFS

The current HDFS 2.x implementation has a 200% storage overhead: with the default replication factor of 3, each data block is copied to two other DataNodes. This is a simple, scalable and robust architecture, but it costs a lot of extra space.
HDFS 2.0 Implementation Architecture
HDFS 3.0 addresses this overhead with erasure coding.
HDFS 3.0 Architecture
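
To make the storage comparison concrete, here is a minimal arithmetic sketch. It assumes 3-way replication for HDFS 2.x and, purely as an illustration, a Reed-Solomon layout with 6 data units and 3 parity units for erasure coding. The function names are hypothetical helpers and are not part of any Hadoop API.

```python
def replication_overhead(replicas: int) -> float:
    """Extra storage, as a fraction of the original data, for n-way replication."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return float(replicas - 1)


def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Extra storage, as a fraction of the original data, for a (data, parity) layout."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need at least one data unit and no negative parity")
    return parity_units / data_units


def raw_storage_tb(logical_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold logical_tb of data at the given overhead."""
    return logical_tb * (1.0 + overhead)


if __name__ == "__main__":
    schemes = (
        ("3x replication", replication_overhead(3)),
        # Assumed layout: 6 data units + 3 parity units.
        ("Erasure coding 6+3", erasure_coding_overhead(6, 3)),
    )
    for label, overhead in schemes:
        raw = raw_storage_tb(100, overhead)
        print(f"{label}: {overhead:.0%} overhead, {raw:.0f} TB raw for 100 TB of data")
```

Under these assumptions, replication needs 300 TB of raw capacity for 100 TB of data, while the erasure-coded layout needs 150 TB.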

HDFS 3.0 Architectural Decision HDFS-7185

Hadoop 3.0 Downstream Compatibility

The following version compatibility matrix lists the versions of different Apache projects together with their compile, unit test and basic functional testing status. The testing was done as part of the Hadoop 3.0 Beta 1 release in October 2017.
Apache Project   Version   Compiles   Unit Testing Status   Basic Functional Testing
HBase            2.0.0
Spark            2.0
Hive             2.1.0
Oozie            5.0
Pig              0.16
Solr             6.x
Kafka            0.10

More on Hadoop 3.0 Related Topics

#   Other Articles (Link)
1   All the newly added features and enhancements in Hadoop 3.0 (Hadoop 3.0 features and enhancement)
2   Detailed comparison between Hadoop 3.0 vs Hadoop 2.0 and what benefit it brings to the developer (Hadoop 3.0 vs Hadoop 2.0)
3   Hadoop 3.0 Installation (Hadoop 3.0 Installation)
4   Hadoop 3.0 Release Date (Hadoop 3.0 Release Date)
5   Hadoop 3.0 Security Book (Hadoop 3.0 Security by Ben and Joey)
6   Demystify The Hadoop 3.0 Architecture and its components (Hadoop 3.0 Architecture)
7   Hadoop 3.0 & Hortonworks Support for it in HDP 3.0 Release (Hadoop 3.0 Hortonworks)

Hadoop 3.0 vs Hadoop 2.0

Hadoop 3.0 vs Hadoop 2.0: Hadoop 3.0.0 GA (General Availability) was released on 13-Dec-2017. Everybody wants to know what it brings to the table for developers, administrators and enterprise IT. There are eight key focus areas where Hadoop 3.0 improves on Hadoop 2.0. They indicate that Hadoop 3.0 is easier for developers, more cost-effective for enterprises and more manageable for administrators.

8 Key Comparison Factors: Hadoop 3.0 vs Hadoop 2.0

The comparison table below shows what Hadoop 3.0 promises. It brings a lot of cool features that make life easier for big data engineers.
Hadoop 3 vs Hadoop 2 side by side

The key new features and enhancements in Hadoop 3.0 are as follows:

  • Java 8 (JDK 1.8) as runtime for Hadoop 3.0
  • Erasure coding to reduce storage cost
  • YARN Timeline Service v.2 (YARN-2928)
  • New Default Ports for Several Services
  • Intra-DataNode Balancer
  • Shell Script Rewrite (HADOOP-9902)
  • Shaded Client Jars
  • Support for Opportunistic Containers
  • MapReduce Task-Level Native Optimization
  • Support for More than 2 NameNodes
  • Support for Filesystem Connector
  • Reworked Daemon and Task Heap Management
  • Improved Fault-tolerance with Quorum Journal Manager
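
As an illustration of the "New Default Ports for Several Services" item, the sketch below maps some Hadoop 2.x HDFS default ports to their Hadoop 3.0 counterparts. The port numbers are quoted from memory of the Hadoop 3.0 release notes and the list is partial, so treat them as assumptions and verify them against the documentation for your release. The dictionary and function names are hypothetical and not part of Hadoop.

```python
# Partial, assumed mapping of HDFS default ports: Hadoop 2.x -> Hadoop 3.0.
# Verify every value against the release notes of the version you run.
HDFS_DEFAULT_PORT_CHANGES = {
    50070: 9870,  # NameNode HTTP UI
    50470: 9871,  # NameNode HTTPS UI
    50090: 9868,  # Secondary NameNode HTTP
    50091: 9869,  # Secondary NameNode HTTPS
    50010: 9866,  # DataNode data transfer
    50020: 9867,  # DataNode IPC
    50075: 9864,  # DataNode HTTP
    50475: 9865,  # DataNode HTTPS
}


def hadoop3_default_port(hadoop2_port: int) -> int:
    """Return the assumed Hadoop 3.0 default for a Hadoop 2.x default port.

    Ports that are not in the mapping are returned unchanged.
    """
    return HDFS_DEFAULT_PORT_CHANGES.get(hadoop2_port, hadoop2_port)


if __name__ == "__main__":
    for old_port in (50070, 50075, 8088):
        print(f"{old_port} -> {hadoop3_default_port(old_port)}")
```

A lookup like this can help when auditing firewall rules or monitoring URLs during an upgrade, since anything still pointing at an old default port would need to be reviewed.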




