This paper presents RFHOC, an approach that automatically tunes Hadoop configuration parameters to optimize application performance. It builds performance models with a random-forest approach and then uses a genetic algorithm to search the space of configuration settings. Compared with a recent cost-based optimization method, RFHOC achieves an average speedup of 2.11 times and a maximum speedup of 7.4 times. Its performance benefits also grow as the input data set gets larger.
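The search half of this pipeline can be illustrated with a minimal sketch of a genetic algorithm that queries a performance model instead of running jobs. This is not the paper's implementation. The parameter ranges, the GA settings (population size, elite count, mutation rate), and the `predicted_runtime` function are all assumptions made for illustration; in particular, `predicted_runtime` is a synthetic stand-in for a trained random-forest model, not a model fitted to real Hadoop measurements.

```python
import random

# Illustrative search space: (low, high) bounds per Hadoop parameter.
# The parameter names are real Hadoop settings; the ranges are assumptions.
SPACE = {
    "mapreduce.task.io.sort.mb": (50, 800),
    "mapreduce.job.reduces": (1, 64),
    "mapreduce.map.sort.spill.percent": (0.5, 0.95),
}


def predicted_runtime(cfg):
    """Hypothetical stand-in for a trained performance model.

    A real tuner would call the prediction function of a model fitted
    on measured runs. This synthetic bowl-shaped function only gives
    the search something to minimize (its floor is 60.0).
    """
    return (
        ((cfg["mapreduce.task.io.sort.mb"] - 400) / 100) ** 2
        + ((cfg["mapreduce.job.reduces"] - 32) / 8) ** 2
        + ((cfg["mapreduce.map.sort.spill.percent"] - 0.8) * 10) ** 2
        + 60.0
    )


def sample(rng, lo, hi):
    # Integer bounds yield integer values; float bounds yield floats.
    return rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)


def random_config(rng):
    return {name: sample(rng, lo, hi) for name, (lo, hi) in SPACE.items()}


def crossover(rng, a, b):
    # Uniform crossover: each parameter is inherited from either parent.
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SPACE}


def mutate(rng, cfg, rate=0.2):
    # With probability `rate`, resample a parameter within its bounds.
    return {
        name: (sample(rng, *SPACE[name]) if rng.random() < rate else value)
        for name, value in cfg.items()
    }


def genetic_search(model, generations=40, pop_size=30, elite=5, seed=0):
    """Return the configuration with the lowest predicted runtime found."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=model)
        parents = population[:elite]  # elitism: the best configs survive as-is
        children = [
            mutate(rng, crossover(rng, *rng.sample(parents, 2)))
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return min(population, key=model)


best = genetic_search(predicted_runtime)
print(best, predicted_runtime(best))
```

Because each candidate is scored by the model rather than by an actual job run, the search can evaluate many configurations cheaply. Keeping the elite configurations unchanged between generations means the best predicted runtime never gets worse as the search proceeds.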