Azeez, Babatunde (2005-12). Reliable low latency I/O in torus-based interconnection networks. Master's Thesis.
In today's high performance computing environment I/O remains the main bottleneck in
achieving the optimal performance expected of the ever improving processor and
memory technologies. Interconnection networks therefore combines processing units,
system I/O and high speed switch network fabric into a new paradigm of I/O based
network. It decouples the system into computational and I/O interconnections each
allowing "any-to-any" communications among processors and I/O devices unlike the
shared model in bus architecture. The computational interconnection, a network of
processing units (compute-nodes), is used for inter-processor communication in carrying
out computation tasks, while the I/O interconnection manages the transfer of I/O requests
between the compute-nodes and the I/O or storage media through some dedicated I/O
processing units (I /O-nodes). Considering the special functions performed by the I/O
nodes, their placement and reliability become important issues in improving the overall
performance of the interconnection system.
This thesis focuses on design and topological placement of I/O-nodes in torus based
interconnection networks, with the aim of reducing I/O communication latency between
compute-nodes and I/O-nodes even in the presence of faulty I/O-nodes. We propose an
efficient and scalable relaxed quasi-perfect placement scheme using Lee distance error
correction code such that compute-nodes are at distance-t or at most distance-t+1 from an
I/O-node for a given t. This scheme provides a better and optimal alternative placement
than quasi perfect placement when perfect placement cannot be found for a particular
torus. Furthermore, in the occurrence of faulty I/O-nodes, the placement scheme is also
used in determining other alternative I/O-nodes for rerouting I/O traffic from affected
compute-nodes with minimal slowdown. In order to guarantee the quality of service
required of inter-processor communication, a scheduling algorithm was developed at the router level to prioritize message forwarding according to inter-process and I/O messages
with the former given higher priority.
Our simulation results show that relaxed quasi-perfect outperforms quasi-perfect and the
conventional I/O placement (where I/O nodes are concentrated at the base of the torus
interconnection) with little degradation in inter-process communication performance.
Also the fault tolerant redirection scheme provides a minimal slowdown, especially when
the number of faulty I/O nodes is less than half of the initial available I/O nodes.
Kim, Eun Associate Professor - Term Appoint