SPX: Collaborative Research: NG4S: A Next-generation Geo-distributed Scalable Stateful Stream Processing System
Our society increasingly relies on applications that process streaming data across geo-distributed sites, such as making business decisions from marketing data, identifying spam campaigns in social network streams, and analyzing genome datasets in different labs and countries to track the sources of potential epidemics. State-of-the-art solutions for these needs are centered around stateless stream processing. This project advances stream processing to enable next-generation streaming applications to store and update state along with computation, thereby processing live data streams from massive, geo-distributed datasets in a timely fashion. Existing systems are mainly designed for stateless stream processing in intra-datacenter settings and do not scale well for stream applications that maintain large distributed state. This project breaks the traditional abstractions of a centralized architecture and hashtable-based stateless operators, redefining them with a new decentralized architecture and new memory-efficient stateful operators, which enable novel approaches to improving overall system performance and scalability.
This project builds a next-generation geo-distributed scalable stateful stream processing system that will significantly improve the scalability of stream processing systems. The work includes three primary research directions. (1) At the architecture level, a new decentralized "many masters/many workers" architecture will be proposed, which provides each master with maximum independence. (2) At the operator level, a new in-memory data structure will be designed and implemented to store application state while minimizing memory overhead, so as to handle "big data" requirements. (3) A new shard-based parallel recovery mechanism will be proposed to handle failures and stragglers in a scalable way.
All three parts of the project will be prototyped and implemented on a widely adopted stream processing system (Apache Storm).
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.