Researchers Optimize Datasize-Aware High-Dimensional Configuration Auto-Tuning for In-Memory Cluster Computing
In-memory cluster computing (IMC) has become a popular paradigm for big data analytics because it runs 10 to 100 times faster than on-disk cluster computing (ODC). However, both the performance and the optimal configuration of an IMC program are sensitive to the size of the input dataset. Moreover, an IMC framework typically has more than 40 performance-critical configuration parameters. This combination of data sensitivity and high-dimensional configuration makes optimizing the performance of IMC programs extremely difficult.
A research group led by Prof. YU Zhibin and Dr. BEI Zhendong from the Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), has made great progress in optimizing data-sensitive, high-dimensional configurations for IMC programs.
In their work, the researchers built a hierarchical machine-learning-based performance model that expresses performance as a function of the configuration parameters and the size of the input dataset. Experimental results show that this model is much more accurate than traditional machine learning and statistical reasoning models such as random forest and response surface. Based on the performance model, the researchers employ a genetic algorithm to search for the configuration that gives an IMC program its best performance. As a result, the proposed approach speeds up IMC programs run with default configurations by a factor of 30.4x on average and up to 89x.
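The search step described above can be illustrated with a minimal sketch: a genetic algorithm that queries a performance model instead of running the program. Everything in it is an assumption made for illustration: the three parameter names and their ranges, the default values, and the `predicted_runtime` formula are invented stand-ins and are not the researchers' model or search space, which covers more than 40 parameters.

```python
import random

# Hypothetical search space: made-up parameters with (low, high) integer bounds.
SPACE = {
    "executor_memory_gb": (1, 16),
    "executor_cores": (1, 8),
    "shuffle_partitions": (8, 512),
}
# Hypothetical default configuration used as the baseline.
DEFAULT = {"executor_memory_gb": 1, "executor_cores": 1, "shuffle_partitions": 200}


def predicted_runtime(config, datasize_gb):
    """Toy stand-in for a learned performance model (lower is better)."""
    mem = config["executor_memory_gb"]
    cores = config["executor_cores"]
    parts = config["shuffle_partitions"]
    compute = 10.0 * datasize_gb / cores               # more cores, less time
    spill = 5.0 * max(0.0, datasize_gb / 4.0 - mem)    # penalty for too little memory
    skew = abs(parts - 16 * datasize_gb) / 50.0        # best partition count grows with data
    return compute + spill + skew


def random_config(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(config, rng, rate=0.2):
    # Resample each parameter within its bounds with probability `rate`.
    child = dict(config)
    for k, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            child[k] = rng.randint(lo, hi)
    return child


def genetic_search(datasize_gb, generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)

    def fitness(config):
        return predicted_runtime(config, datasize_gb)

    # Seed the population with the default so the result is never worse than it.
    population = [dict(DEFAULT)] + [random_config(rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 3]  # keep the best third unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return min(population, key=fitness)
```

Because the dataset size is an input to the model, calling `genetic_search(8.0)` and `genetic_search(64.0)` will typically return different configurations, which is the datasize-aware behavior the article describes. In a real setting the toy formula would be replaced by a model trained on measured runs.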
The paper, entitled “Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing”, was published at the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2018).
This work is supported by the National Key Research and Development Program of China, the National Natural Science Foundation of China (NSFC), and the Outstanding Technical Talent Program of CAS. Additional support is provided by the Major Scientific and Technological Project of Guangdong Province, the Shenzhen Technology Research Project, and the Key Technique Research on Haiyun Data System of NICT.
Prof. YU Zhibin presenting at ASPLOS 2018 (Image by Prof. YU Zhibin)