CIF: Small: Error Correction with Natural Redundancy (grant)

abstract

  • Part 1: Nontechnical description of the project. This project studies the fundamental problem of removing errors from data by exploiting the internal structure of the data itself. It shows that the vast amount of data stored in current data-storage systems possesses very rich structure, and that fully exploiting this structure for error correction can significantly improve the reliability of data-storage systems. The project studies several fundamental aspects of this technology: how to discover and characterize the highly complex structures of various types of data; how to use them to correct errors in data efficiently and thereby improve the reliability of data-storage systems; how to combine the technology with existing error-correction techniques, which are based on adding external redundancy to data; and how to implement the technology in practical data-storage systems. The project addresses a critical issue for modern society: how to ensure that data can be stored reliably at large scale and over long periods of time. The new technology has the potential to substantially improve the dependability of the information infrastructure, which frequently accesses vast amounts of data for scientific and industrial computing. The project is interdisciplinary in nature, combining multiple scientific fields including information theory, machine learning, big-data analysis, and algorithm design, and it aims to educate students and contribute to workforce development for next-generation storage systems. The project pairs rigorous theoretical analysis with significant practical applications in order to foster collaboration between academia and industry and to create new scientific advances through combined efforts.
Part 2: Technical description of the project. This project studies how to use the inherent redundancy in big data for error correction. Examples of such data include natural-language text, images, and databases.
The inherent redundancy is integrated with error-correcting codes (ECCs) for effective error correction, with the objective of raising data reliability in storage systems to the next level. To achieve this goal, new techniques will be developed to discover various types of inherent redundancy in both compressed and uncompressed data. New approaches will be explored to combine inherent-redundancy decoders and ECC decoders for effective error correction. The fundamental limits of error correction using inherent redundancy will be studied in terms of both capacity and computational complexity. This project combines error correction with machine learning and is interdisciplinary in nature. It will expand current knowledge of error correction in several ways. First, it uses techniques from natural language processing and deep learning to discover new types of redundancy in big data that are suitable for error correction and that extend beyond current knowledge in joint source-channel coding. This includes redundancy-discovery techniques for data already compressed by various compression algorithms. Second, it explores decoding algorithms for ECCs that exploit not only the regular redundancy imposed by the ECC but also the irregular inherent redundancy of the data. It extends existing error-correction schemes to characterize the fundamental limits of inherent redundancy for error correction, in terms of both capacity and computational complexity. Third, by integrating theoretical study with practical systems, it can lay a foundation for next-generation systems that store and transmit big data. Modern society relies increasingly heavily on digital data. With the explosive amount of data generated each day, it is essential to make advances in error correction that keep pace with this growth. This project aims to improve data reliability significantly, and advances in this direction can greatly benefit the daily work and life of modern society.
Being interdisciplinary between coding theory and machine learning, this project can foster collaboration between the information theory and computer science communities. The project combines rigorous theoretical analysis with significant practical applications in order to foster collaboration between academia and industry and to create new scientific advances through combined efforts. The proposed research will be integrated with engineering education by developing new courses for graduate and undergraduate students and by involving under-represented, domestic, and international students in advanced research. The results will be actively publicized at national and international conferences and in journals.
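The following minimal Python sketch makes the idea of combining an inherent-redundancy decoder with an ECC decoder concrete. It is a toy illustration under stated assumptions and is not the project's method. Natural redundancy is modeled as a prior probability over messages: here, a hypothetical source that only ever emits four 4-bit messages. The ECC is a textbook Hamming(7,4) code. The channel is a binary symmetric channel with an assumed crossover probability of 0.1. The names `map_decode`, `uniform` and `natural` are hypothetical. The sketch shows that a decoder maximizing prior times channel likelihood (MAP decoding) can correct an error pattern that the ECC alone cannot.

```python
import itertools
import math

# Systematic generator matrix of a Hamming(7,4) code: codeword = 4 message bits + 3 parity bits.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg):
    """Encode a 4-bit message (tuple of 0/1) into a 7-bit codeword over GF(2)."""
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))

def hamming_distance(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def map_decode(received, prior, p):
    """Return the message maximizing log P(msg) + log P(received | encode(msg))
    on a binary symmetric channel with crossover probability p (0 < p < 0.5).

    A uniform prior reduces this to ordinary minimum-distance (ECC-only) decoding;
    a skewed prior stands in (hypothetically) for the natural redundancy of the data.
    """
    best, best_score = None, -math.inf
    n = len(received)
    for msg in itertools.product((0, 1), repeat=4):
        if prior.get(msg, 0.0) <= 0.0:
            continue  # the source never produces this message, so skip it
        d = hamming_distance(encode(msg), received)
        score = math.log(prior[msg]) + d * math.log(p) + (n - d) * math.log(1 - p)
        if score > best_score:
            best, best_score = msg, score
    return best

if __name__ == "__main__":
    uniform = {m: 1 / 16 for m in itertools.product((0, 1), repeat=4)}
    # Assumed "natural redundancy": the source only ever emits these four messages.
    natural = {m: 0.25 for m in [(0, 0, 0, 0), (1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)]}

    sent = encode((1, 1, 1, 1))                # (1, 1, 1, 1, 1, 1, 1)
    received = (0, 0) + sent[2:]               # two bit errors, more than Hamming(7,4) corrects
    print(map_decode(received, uniform, 0.1))  # (0, 0, 0, 1): ECC-only decoding miscorrects
    print(map_decode(received, natural, 0.1))  # (1, 1, 1, 1): the prior recovers the message
```

A Hamming(7,4) code guarantees correction of only a single bit error. With two errors, plain minimum-distance decoding lands on the wrong codeword. Once the search is weighted by the assumed source prior, the transmitted message becomes the most likely candidate again. Real data would require a learned prior, for example from a language model, which this sketch does not attempt.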

date/time interval

  • 2017 - 2020