Byoun, Jae Guen (2018-08). Data Traffic Reduction by Exploiting Data Criticality With A Compressor in GPU. Master's Thesis.

abstract

  • Graphics Processing Units (GPUs) have been widely adopted for various general-purpose applications due to their massive degree of parallelism. The demand for large-scale GPUs that process enormous volumes of data at high throughput has been rising rapidly. However, the performance of massively parallel workloads usually suffers from multiple constraints, such as limited memory bandwidth, high memory latency, and power/energy cost. In addition, designing a bandwidth-efficient network for large-scale GPUs is challenging. In this research, we focus on mitigating network bottlenecks by effectively reducing the size of the packets transferred through the interconnection network, thereby improving overall system performance. We first investigate the unused fraction of each L1 data cache block across a variety of benchmark suites to expose inefficient cache usage. Then, categorizing memory access patterns into several types, we introduce the essential micro-architectural enhancements needed to filter unnecessary words out of packets along the reply path. A compression scheme well suited to packet compression, Dual Pattern Compression (DPC), is exploited to further reduce the size of reply packets. We demonstrate that our scheme effectively improves system performance: combined with DPC, our approach yields a 39% IPC improvement over the baseline across heterogeneous computing and text processing benchmarks. Compared with DPC alone, we achieve a 5% IPC improvement across all benchmark suites and a 20% IPC increase for workloads favorable to this scheme.
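As a rough illustration of the filtering idea the abstract describes, the sketch below (an editorial example, not code from the thesis) models a reply packet that carries only the words of a cache block that were actually accessed, together with a presence mask so the receiver can re-expand the block. The 128-byte block size, 4-byte word granularity, and function names are assumptions made for illustration.

```python
# Minimal sketch of word-level reply-packet filtering (illustrative only).
# Assumption: a 128-byte L1 block is tracked at 4-byte word granularity,
# and the reply carries a presence mask plus only the used words.

BLOCK_BYTES = 128   # assumed L1 data cache block size
WORD_BYTES = 4      # assumed word granularity
WORDS_PER_BLOCK = BLOCK_BYTES // WORD_BYTES  # 32 words per block


def build_reply(block_words, used_mask):
    """Filter the block down to the words marked as used.

    Returns (mask, payload); the mask lets the receiver re-expand
    the payload back into a full block.
    """
    payload = [w for w, used in zip(block_words, used_mask) if used]
    return used_mask, payload


def expand_reply(used_mask, payload):
    """Reconstruct a full block, zero-filling the filtered-out words."""
    words = iter(payload)
    return [next(words) if used else 0 for used in used_mask]


if __name__ == "__main__":
    block = list(range(WORDS_PER_BLOCK))                 # dummy 32-word block
    mask = [i % 4 == 0 for i in range(WORDS_PER_BLOCK)]  # strided access: 8 of 32 words used
    m, payload = build_reply(block, mask)
    print(f"reply carries {len(payload)}/{WORDS_PER_BLOCK} words")  # -> 8/32
    restored = expand_reply(m, payload)
    assert restored[1] == 0   # unused word was zero-filled
    assert restored[4] == 4   # used word survived the round trip
```

In the thesis, the filtered payload is additionally compressed with DPC before traversing the interconnect; that compression stage is not modeled in this sketch.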

ETD Chair

  • Kim, Eun, Associate Professor - Term Appointment

publication date

  • August 2018