Azeez, Babatunde (2005-12). Reliable low latency I/O in torus-based interconnection networks. Master's Thesis.

abstract

  • In today's high-performance computing environments, I/O remains the main bottleneck in
    achieving the optimal performance expected of ever-improving processor and
    memory technologies. Interconnection networks therefore combine processing units,
    system I/O and high-speed switch network fabric into a new paradigm of I/O-based
    network. It decouples the system into computational and I/O interconnections each
    allowing "any-to-any" communications among processors and I/O devices unlike the
    shared model in bus architecture. The computational interconnection, a network of
    processing units (compute-nodes), is used for inter-processor communication in carrying
    out computation tasks, while the I/O interconnection manages the transfer of I/O requests
    between the compute-nodes and the I/O or storage media through some dedicated I/O
    processing units (I/O-nodes). Considering the special functions performed by the
    I/O-nodes, their placement and reliability become important issues in improving the overall
    performance of the interconnection system.
    This thesis focuses on the design and topological placement of I/O-nodes in torus-based
    interconnection networks, with the aim of reducing I/O communication latency between
    compute-nodes and I/O-nodes even in the presence of faulty I/O-nodes. We propose an
    efficient and scalable relaxed quasi-perfect placement scheme using a Lee distance
    error-correcting code such that compute-nodes are at distance t, or at most distance t+1,
    from an I/O-node for a given t. This scheme provides a better, optimal alternative to
    quasi-perfect placement when a perfect placement cannot be found for a particular
    torus. Furthermore, when I/O-nodes become faulty, the placement scheme is also
    used in determining other alternative I/O-nodes for rerouting I/O traffic from affected
    compute-nodes with minimal slowdown. To guarantee the quality of service
    required of inter-processor communication, a scheduling algorithm was developed at the
    router level that prioritizes message forwarding by message type (inter-process or I/O),
    with inter-process messages given higher priority.
    Our simulation results show that relaxed quasi-perfect placement outperforms both
    quasi-perfect placement and the conventional I/O placement (where I/O-nodes are
    concentrated at the base of the torus interconnection), with little degradation in
    inter-process communication performance. The fault-tolerant redirection scheme also
    causes only a minimal slowdown, especially when the number of faulty I/O-nodes is less
    than half of the initially available I/O-nodes.
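
The distance-t condition in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the thesis: it assumes a 5x5 torus, t = 1, and the textbook perfect Lee code {(x, y) : x + 2y ≡ 0 (mod 5)} as the I/O-node placement. The names `lee_distance` and `nearest_io_distances` are hypothetical helpers introduced for this example.

```python
from itertools import product


def lee_distance(a, b, dims):
    """Lee distance between nodes a and b on a torus with the given dimension sizes.

    Along each dimension of size k, the distance is the shorter way around the ring.
    """
    return sum(min(abs(x - y), k - abs(x - y)) for x, y, k in zip(a, b, dims))


def nearest_io_distances(dims, io_nodes):
    """Map every node of the torus to its Lee distance from the closest I/O-node."""
    return {
        node: min(lee_distance(node, io, dims) for io in io_nodes)
        for node in product(*(range(k) for k in dims))
    }


# Assumed example: 5x5 torus with I/O-nodes on the perfect Lee code x + 2y = 0 (mod 5).
dims = (5, 5)
io_nodes = [(x, y) for x, y in product(range(5), repeat=2) if (x + 2 * y) % 5 == 0]

# Worst-case hop count from any node to its nearest I/O-node.
max_dist = max(nearest_io_distances(dims, io_nodes).values())

# Simulate one faulty I/O-node: affected nodes fall back to the nearest survivor.
surviving = io_nodes[1:]
max_dist_after_fault = max(nearest_io_distances(dims, surviving).values())

print(max_dist, max_dist_after_fault)
```

With this assumed placement, 5 of the 25 nodes act as I/O-nodes and every node is within one hop of one of them. When no such perfect placement exists for a given torus size, the same check can be used to confirm that a candidate placement keeps every compute-node within t or t+1 hops, and the second call shows how far traffic must be redirected after an I/O-node fails.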

ETD Chair

  • Kim, Eun  Associate Professor - Term Appoint

publication date

  • December 2005