FACADE Academic Article

abstract

  • The past decade has witnessed increasing demand for data-driven business intelligence, which has led to the proliferation of data-intensive applications. A managed object-oriented programming language such as Java is often the developer's choice for implementing such applications, due to its quick development cycle and rich community resources. While the use of such languages makes programming easier, their automated memory management comes at a cost. When the managed runtime meets Big Data, this cost is significantly magnified and becomes a scalability-prohibiting bottleneck. This paper presents a novel compiler framework, called Facade, that can generate highly efficient data manipulation code by automatically transforming the data path of an existing Big Data application. The key treatment is that in the generated code, the number of runtime heap objects created for data types in each thread is (almost) statically bounded, leading to significantly reduced memory management cost and improved scalability. We have implemented Facade and used it to transform 7 common applications on 3 real-world, already well-optimized Big Data frameworks: GraphChi, Hyracks, and GPS. Our experimental results are very positive: the generated programs have (1) achieved a 3%–48% execution time reduction and up to an 88× GC reduction; (2) consumed up to 50% less memory; and (3) scaled to much larger datasets.

published proceedings

  • ACM SIGPLAN Notices

author list (cited authors)

  • Nguyen, K., Wang, K., Bu, Y., Fang, L., Hu, J., & Xu, G.

citation count

  • 5

complete list of authors

  • Nguyen, Khanh; Wang, Kai; Bu, Yingyi; Fang, Lu; Hu, Jianfei; Xu, Guoqing

publication date

  • May 2015