SHF: Small: Software/Hardware Acceleration Architectures for Low-Tail-Latency QoS Provisioning Based Data Centers

abstract

  • The era of data science is underway, with an explosion of data from social media, environmental monitoring, e-health, national defense, and advances in science and engineering driving a fast-growing information-technology sector. As a foundational pillar of big data, data centers play a crucial role in efficiently collecting, storing, retrieving, classifying, and processing large datasets. These tremendous volumes of data, together with rapid advances in modern computing techniques, have propelled the ongoing boom of machine learning (ML) within artificial intelligence (AI). While ML aims to automatically learn useful properties from data for accurate and timely stochastic decision making, there is an increasing need for this decision making to occur in real time. Thus, one of the most important challenges in an AI-based interactive data center is how to efficiently process computation-intensive and time-sensitive multimedia data (e.g., video, audio) and provide AI-based decision-making services. However, because of limited computing and storage capabilities, random uncertainty in the availability of software/hardware resources, and statistical-multiplexing-based switching in data centers, deterministic delay-bounded requirements for high-volume real-time services in AI-based interactive data centers are often infeasible. Thus, the PI proposes to extend and apply the statistical delay-bounded quality-of-service (QoS) provisioning theory as an alternative solution for supporting real-time decision-making services, where the goal is to guarantee a bounded delay with a small violation probability, thereby significantly reducing the processing delays currently found in AI-based interactive data centers. Meeting these demands requires developing various software/hardware accelerators that guarantee diverse delay-bounded QoS requirements.
The objective of this research is to systematically investigate fundamental and challenging issues in how to extend, apply, and implement the statistical delay-bounded QoS provisioning theory to support real-time, interactive, decision-making services over AI-based interactive data centers. While statistical delay-bounded QoS provisioning has been shown to be a powerful technique and a useful performance metric for supporting time-sensitive multimedia transmissions over mobile computing networks, how to efficiently extend and implement this technique and performance metric to statistically upper-bound the tail latency, i.e., the worst-case latency that dictates delay-bounded QoS performance, in AI-based interactive data-center services has been neither well understood nor thoroughly studied. To overcome these challenges, this project employs various emerging computer software/hardware technologies and proposes to develop a set of AI-based hybrid software/hardware acceleration architectures, algorithms, and schemes that support low-tail-latency QoS provisioning for multi-core AI-based interactive data-center services, while reducing the computational workloads and complexity introduced by parallel and distributed data centers. The proposed framework is based on developing novel acceleration architectures spanning both software and hardware designs and optimizations, significantly boosting computing efficiency by minimizing instruction and data movement and processing across processors and memories. Leveraging the unique features and techniques of the statistical delay-bounded QoS provisioning theory and of AI-based computing accelerators, a number of QoS-enabling engines constitute the main foundation of this project.
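The statistical QoS goal described above, guaranteeing a delay bound D_max with a small violation probability ε rather than a deterministic worst case, can be illustrated with a minimal sketch. The sketch below is purely hypothetical and is not the project's actual model: it simulates waiting times in a single-server M/M/1 queue via the Lindley recursion and checks the empirical tail probability P(W > D_max) against a tolerance ε; all parameter values (arrival rate 0.8, service rate 1.0, D_max = 20, ε = 0.02) are illustrative assumptions.

```python
import random

def simulate_waiting_times(arrival_rate, service_rate, n=100_000, seed=1):
    """Simulate per-customer waiting times in an M/M/1 queue using the
    Lindley recursion: W_{k+1} = max(0, W_k + S_k - A_{k+1}), where S_k is
    the k-th service time and A_{k+1} the gap to the next arrival."""
    rng = random.Random(seed)  # fixed seed for a reproducible illustration
    w, waits = 0.0, []
    for _ in range(n):
        waits.append(w)
        s = rng.expovariate(service_rate)   # exponential service time
        a = rng.expovariate(arrival_rate)   # exponential inter-arrival gap
        w = max(0.0, w + s - a)
    return waits

def violation_probability(waits, d_max):
    """Empirical delay-bound violation probability P(W > d_max)."""
    return sum(1 for w in waits if w > d_max) / len(waits)

# Statistical delay-bounded QoS check: is P(W > D_max) <= eps?
waits = simulate_waiting_times(arrival_rate=0.8, service_rate=1.0)
p = violation_probability(waits, d_max=20.0)
print(f"P(W > 20) = {p:.4f}, QoS met: {p <= 0.02}")
```

For this M/M/1 setting the theoretical tail is P(W > t) = ρ·e^{-(μ-λ)t} ≈ 0.015 at t = 20, so the bound is met statistically even though no deterministic delay bound exists; tightening ε or D_max is what drives the need for the acceleration architectures the project proposes.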

date/time interval

  • 2020 - 2023