Collaborative Research: CNS Core: Medium: Learning to Cache and Caching to Learn in High Performance Caching Systems Grant

abstract

  • Caching is fundamental to cloud computing and content distribution, and is important to the vast number of applications and services they support. A crucial performance metric of a caching algorithm is its ability to learn a changing popularity distribution quickly and accurately. However, there is a serious disconnect between empirical studies, which use real-world traces that account for popularity changes, and analytical performance results, which assume a fixed popularity. A basic goal of this project is to develop a methodology, based on online learning and reinforcement learning, for designing caching algorithms with provable performance guarantees. This enables the systematic design of caching algorithms that can be tailored to a variety of application contexts. These algorithms target high-performance caching networks that support large-scale cloud applications and services. Emulating high-performance caching systems that use the online learning algorithms developed, and evaluating those algorithms empirically, supports this goal and provides a real-world context for the methodology. The results will also enhance the performance of content distribution platforms. At the same time, the project develops fundamental theory in machine learning, specifically in online learning.
This project aims to make optimal use of the locally available memory and computing resources of caches while ensuring provably good performance through fast and accurate learning of content popularity. This requires combining several mathematical tools to analyze online learning algorithms, as well as strong systems development skills to make the algorithms a reality. The project addresses these key challenges in two main themes. The first theme focuses on the systematic design of distributed online learning in networks of caches, using collaborative filtering for distributed identification of popular content and multi-agent reinforcement learning for joint learning and content placement. The second theme focuses on building high-performance caching systems using the algorithms developed in the first theme, and on quantifying the impact of those algorithms on real-world applications such as Hipster Shop, an open-source e-commerce website, and Spark data-analytics job pipelines. The immediate impact of this project is in creating high-performance caching schemes that apply to cloud computing and content distribution networks. This project also advances the fundamental theory of online learning. The project includes an education plan focusing on machine learning and caching, and outreach in the form of summer camps and seminars for high school students.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

date/time interval

  • 2020 - 2023