Hadoop Version 3.0 - What's New?
Last Updated :
04 Aug, 2025
Hadoop is a Java-based framework for distributed storage and processing of large datasets. Introduced in 2006 by Doug Cutting and Mike Cafarella for the Nutch project, it soon became central to Big Data technologies. By 2008, it outperformed supercomputers in sorting terabytes of data. With Hadoop 2.x enabling scalability and Hadoop 3.x improving fault tolerance, efficiency, and flexibility, it continues to power modern data-intensive industries.
Key New Features in Hadoop 3.0
1. JDK 8 is the Minimum Java Version Supported by Hadoop 3.x
Oracle ended public updates for JDK 7 in 2015, so Hadoop 3 requires JDK 8 or later to compile and run all Hadoop files. Java versions below 8 are no longer supported by Hadoop 3.
2. Erasure Coding is Supported
Erasure coding in Hadoop 3 provides fault tolerance by reconstructing lost data from parity information, similar to RAID technology. Hadoop 2 relied on replication alone, which by default keeps three full copies of every block; erasure coding needs roughly half that raw storage while offering comparable reliability. This reduces disk usage and storage costs in Hadoop clusters built on commodity hardware.
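The "nearly half the storage" claim can be checked with simple arithmetic. The sketch below assumes 3x replication and a Reed-Solomon layout with 6 data units and 3 parity units; both numbers are illustrative assumptions, not values stated in this article, and the function names are hypothetical.

```python
def storage_overhead_replication(replicas: int) -> float:
    """Extra raw storage beyond the data itself, as a fraction (2.0 means 200%)."""
    return float(replicas - 1)


def storage_overhead_erasure(data_units: int, parity_units: int) -> float:
    """Extra raw storage for an erasure-coded layout: parity relative to data."""
    return parity_units / data_units


def raw_units_needed(logical_units: float, overhead: float) -> float:
    """Raw storage consumed for a given amount of logical data."""
    return logical_units * (1 + overhead)


# Assumed schemes: 3 replicas versus 6 data + 3 parity units.
replication = storage_overhead_replication(3)
erasure = storage_overhead_erasure(6, 3)

print("replication overhead:", replication)
print("erasure coding overhead:", erasure)
print("raw units for 600 units of data (replication):", raw_units_needed(600, replication))
print("raw units for 600 units of data (erasure coding):", raw_units_needed(600, erasure))
```

Under these assumptions, replication consumes three raw units per unit of data while the erasure-coded layout consumes one and a half, which is where the "half the storage" figure comes from.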
3. More Than Two NameNodes Supported
Hadoop 3.x extends fault tolerance by supporting multiple standby NameNodes instead of just one, as in Hadoop 2.x. Edit-log replication is managed through a quorum of three or more JournalNodes, making the cluster more resilient. For example, configuring three NameNodes with five JournalNodes allows the system to tolerate the failure of two NameNodes, ensuring higher availability for big data applications.
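The failure counts in the three-NameNode, five-JournalNode example follow from majority-quorum arithmetic. The sketch below is only an illustration of that reasoning; the function names are hypothetical and it assumes that JournalNodes need a strict majority alive while at least one NameNode must survive.

```python
def tolerated_journalnode_failures(journal_nodes: int) -> int:
    """A majority quorum of N nodes stays writable while more than N/2 are alive."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return (journal_nodes - 1) // 2


def tolerated_namenode_failures(name_nodes: int) -> int:
    """One NameNode must remain to act as the active node (assumption)."""
    if name_nodes < 1:
        raise ValueError("need at least one NameNode")
    return name_nodes - 1


for jn in (3, 5, 7):
    print(jn, "JournalNodes tolerate", tolerated_journalnode_failures(jn), "failure(s)")

print("3 NameNodes tolerate", tolerated_namenode_failures(3), "failure(s)")
```

This also shows why JournalNodes are usually deployed in odd numbers: going from five to six nodes does not raise the number of failures the quorum can absorb.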
4. Shell Script Rewriting
Hadoop provides shell-style commands that interact directly with HDFS and the other file systems Hadoop supports, such as WebHDFS, Local FS, and S3 FS. Much of Hadoop's functionality is controlled through these shell scripts. In Hadoop 3.x the shell scripts have been rewritten, which fixes many long-standing bugs and adds some new features.
5. Timeline Service v.2 for YARN
The YARN Timeline Service stores and retrieves application information, both current and historical. Timeline Service v.2 was introduced to improve the reliability and scalability of Hadoop, and it enhances usability through flows and aggregation. Timeline Service v.1 is limited to a single instance of the reader/writer and storage architecture, which cannot be scaled further.
Timeline Service v.2 uses a distributed writer architecture in which data read and write operations are separated. Distributed collectors are provided, one for every YARN (Yet Another Resource Negotiator) application. Timeline Service v.2 uses HBase for storage, which can scale to a massive size while providing good response times for read and write operations.
The information that Timeline Service v.2 stores falls into two main types:
A. Generic information about completed applications
- user information
- queue name
- count of attempts made per application
- information about the containers that ran for each application attempt
B. Per-framework information about running and completed applications
- counts of Map and Reduce tasks
- counters
- information published by the developer to the Timeline Server with the help of the Timeline client

6. Filesystem Connector Support
Hadoop 3.x adds support for Azure Data Lake and Aliyun Object Storage System as alternative Hadoop-compatible filesystems.
7. Default Ports of Multiple Services Have Been Changed
In previous versions of Hadoop, the default ports of several services fell within the Linux ephemeral port range (32768-61000). With this configuration, a service could fail to bind to its port at startup because another application was already using it. To overcome this problem, Hadoop 3.x has moved the conflicting ports out of the Linux ephemeral port range and assigned new ports, as shown below.
// Newly assigned ports (old -> new)
NameNode ports: 50470 -> 9871, 50070 -> 9870, 8020 -> 9820
DataNode ports: 50020 -> 9867, 50010 -> 9866, 50475 -> 9865, 50075 -> 9864
Secondary NameNode ports: 50091 -> 9869, 50090 -> 9868
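When migrating scripts or firewall rules, the mapping above can be kept as a small lookup table. The following is a minimal sketch that encodes only the ports listed in this article; the helper names are hypothetical, and the ephemeral range is taken from the text rather than read from the operating system.

```python
# Linux ephemeral range as quoted in the article (inclusive bounds).
EPHEMERAL_RANGE = range(32768, 61001)

# Old default port -> new default port, grouped by service.
PORT_CHANGES = {
    "NameNode": {50470: 9871, 50070: 9870, 8020: 9820},
    "DataNode": {50020: 9867, 50010: 9866, 50475: 9865, 50075: 9864},
    "Secondary NameNode": {50091: 9869, 50090: 9868},
}


def in_ephemeral_range(port: int) -> bool:
    """True if the port could collide with an OS-assigned ephemeral port."""
    return port in EPHEMERAL_RANGE


def translate_port(old_port: int) -> int:
    """Return the new default for a known old port, or the port unchanged."""
    for mapping in PORT_CHANGES.values():
        if old_port in mapping:
            return mapping[old_port]
    return old_port


for service, mapping in PORT_CHANGES.items():
    for old, new in mapping.items():
        print(service, old, "->", new, "| old port in ephemeral range:", in_ephemeral_range(old))
```

None of the new ports fall inside the ephemeral range, which is the point of the change.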
8. Intra-Datanode Balancer
DataNodes provide the storage in a Hadoop cluster, and a single DataNode typically manages multiple disks. During normal write operations these disks fill evenly. Adding or replacing a disk, however, can cause significant skew within a DataNode. The existing HDFS balancer cannot handle this, because it concerns itself with inter-DataNode skew, not intra-DataNode skew. The new intra-DataNode balancing feature handles this situation and is invoked through the hdfs diskbalancer CLI.
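To make "skew within a DataNode" concrete, the sketch below compares each disk's fill ratio with the node-wide ideal ratio. It is a simplified illustration of the idea under assumed inputs, not the disk balancer's actual implementation, and the function names are hypothetical.

```python
from typing import List, Tuple

Volume = Tuple[float, float]  # (used, capacity) in the same unit, e.g. GB


def volume_densities(volumes: List[Volume]) -> List[float]:
    """Ideal fill ratio minus each volume's fill ratio.

    Positive values mean the volume is under-used, negative values over-used.
    """
    total_used = sum(used for used, _ in volumes)
    total_capacity = sum(capacity for _, capacity in volumes)
    ideal_ratio = total_used / total_capacity
    return [ideal_ratio - used / capacity for used, capacity in volumes]


def node_skew(volumes: List[Volume]) -> float:
    """Sum of absolute deviations; 0 means the node is perfectly balanced."""
    return sum(abs(d) for d in volume_densities(volumes))


balanced = [(500, 1000), (500, 1000)]
after_adding_disk = [(800, 1000), (800, 1000), (0, 1000)]  # new empty disk

print("balanced node skew:", node_skew(balanced))
print("skew after adding an empty disk:", node_skew(after_adding_disk))
```

A freshly added empty disk pushes the skew well above zero even though the cluster-level balancer sees the node's total usage as unchanged.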
9. Shaded Client Jars
Hadoop 3.x introduces the hadoop-client-api and hadoop-client-runtime artifacts, which shade Hadoop's dependencies into a single jar. hadoop-client-api has compile-time scope, while hadoop-client-runtime has runtime scope and contains the relocated third-party dependencies from hadoop-client. Developers can therefore bundle all the dependencies in a single jar and test it for version conflicts. This avoids leaking Hadoop's dependencies onto the application's classpath.
10. Task Heap and Daemon Management
Hadoop 3.x adds new ways to configure the heap size of Hadoop daemons, and auto-tuning based on the memory size of the host is now possible. HADOOP_HEAPSIZE is deprecated; in its place, developers can use the HADOOP_HEAPSIZE_MAX and HADOOP_HEAPSIZE_MIN variables. The internal variable JAVA_HEAP_MAX has also been removed. Default heap sizes have been removed as well, which lets the JVM (Java Virtual Machine) auto-tune the heap. To restore the older default, set HADOOP_HEAPSIZE_MAX in the hadoop-env.sh file.