Drug design on the Cell BroadBand Engine (Conference Paper)


  • We evaluate Fourier Transform Docking (FTDock) [1], a well-known protein docking application in bioinformatics, on a blade with two 3.2 GHz Cell Broadband Engine (BE) processors [2]. FTDock approximates the protein docking problem by geometric complementarity and uses 3D FFTs to reduce the complexity of the algorithm. FTDock achieves a significant speedup when its most time-consuming functions are offloaded to the SPEs and vectorized. Figure 1 shows how performance evolves as two FTDock functions (CM and SC) are offloaded and vectorized on 1 SPU. The figure shows the total execution time of FTDock when CM and SC run on the PPU (bar 1), CM is offloaded (bar 2), CM is also vectorized (bar 3), SC is offloaded (bar 4), and SC is also vectorized (bar 5). Parallelizing the functions that are not offloaded on the dual-thread PPE, using OpenMP for instance, increases PPE pipeline utilization, system throughput, and the scalability of the application. (Graph Presented) We have also observed that increasing the reuse of 3D FFT data within the SPE, and thereby reducing the number of DMA transfers to main memory, significantly improves 3D FFT performance. Otherwise, memory bandwidth (communication) becomes a bottleneck for the application. Figure 2 shows the total execution time of a 3D FFT of a 128³ cube for different algorithm parameters and numbers of SPEs. The 3D FFT implementations represented by the 128-A bars make the best reuse of data by joining different steps on 1 SPE, which reduces the number of DMA transfers. (Graph Presented) Finally, we implement and evaluate an MPI version of FTDock. On the one hand, its performance improves as the number of SPEs used per MPI task increases (1 task per Cell BE). On the other hand, MPI FTDock with two MPI tasks and n/2 SPEs per task outperforms one task with n SPEs, because the main memory accesses are distributed between two memories. Our Cell BE FTDock implementation with 1 task and 8 SPEs shows a 3x speedup over an MPI FTDock using 8 tasks on a multicore platform with two 1.5 GHz POWER5 chips, each chip dual-core and each core dual-threaded. This shows the potential of heterogeneous multiprocessors like the Cell BE processor for drug design applications (for more details, see [3]). © 2007 IEEE.
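
The role of the 3D FFT in the abstract can be illustrated with a small sketch. It assumes, as is usual for FFT-based docking, that candidate translations are scored by correlating two 3D grids, and it shows that a 3D FFT built from 1D FFTs along each axis gives the same scores as the direct sum. This is not the paper's SPE code: the function names, the nested-list grids `a` and `b`, and the tiny grid size are illustrative assumptions, and plain Python stands in for the vectorized Cell BE kernels.

```python
import cmath


def fft1d(x, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two, no scaling)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2], inverse)
    odd = fft1d(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(grid, inverse=False):
    """3D FFT of an n*n*n nested list as three passes of 1D FFTs."""
    n = len(grid)
    g = [[[complex(v) for v in row] for row in plane] for plane in grid]
    # Pass 1: along z (the contiguous, innermost axis).
    for x in range(n):
        for y in range(n):
            g[x][y] = fft1d(g[x][y], inverse)
    # Pass 2: along y (strided access).
    for x in range(n):
        for z in range(n):
            col = fft1d([g[x][y][z] for y in range(n)], inverse)
            for y in range(n):
                g[x][y][z] = col[y]
    # Pass 3: along x (strided access).
    for y in range(n):
        for z in range(n):
            col = fft1d([g[x][y][z] for x in range(n)], inverse)
            for x in range(n):
                g[x][y][z] = col[x]
    if inverse:
        scale = n ** 3
        g = [[[v / scale for v in row] for row in plane] for plane in g]
    return g


def correlate_fft(a, b):
    """Circular correlation c[s] = sum_p a[p] * b[p + s] via 3D FFTs."""
    n = len(a)
    fa, fb = fft3d(a), fft3d(b)
    prod = [[[fa[x][y][z].conjugate() * fb[x][y][z] for z in range(n)]
             for y in range(n)] for x in range(n)]
    return [[[v.real for v in row] for row in plane]
            for plane in fft3d(prod, inverse=True)]


def correlate_direct(a, b):
    """Same correlation by brute force: one full grid sum per shift."""
    n = len(a)
    r = range(n)
    return [[[sum(a[x][y][z] * b[(x + sx) % n][(y + sy) % n][(z + sz) % n]
                  for x in r for y in r for z in r)
              for sz in r] for sy in r] for sx in r]
```

For an n×n×n grid the direct version does one full grid sum for each of the n³ shifts, while the FFT version needs three 3D transforms and one pointwise product. Each of the three passes in `fft3d` sweeps the whole cube, so an implementation that streams the cube from main memory once per pass pays for that traffic three times. That is one way to read the abstract's observation that joining steps inside the SPE reduces DMA transfers.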

author list (cited authors)

  • Servat, H., González, C., Aguilar, X., Cabrera, D., & Jimenez, D.

publication date

  • December 2007