A high-speed, special-purpose architecture is developed to implement a stereo vision model for an autonomous vehicle application. The design criteria are high speed, low complexity, and good portability. The main computational tasks of the model are the directional filter and binocular fusion. The resulting highly parallel, pipelined architecture has the characteristics of both a systolic array and an enhanced 2-D mesh-connected computer; we refer to this hybrid architecture as SMESH (systolic mesh). Previous work in this area has been confined primarily to computational aspects, whereas SMESH addresses both computation and I/O. We use the systolic array approach to reduce computation time and a special broadcast scheme to speed up I/O communication. The characteristics of SMESH include global and local communication, pipelined operation, and concurrency between I/O transfer and computation. Although SMESH has the same topology as an enhanced mesh, its global bus is designed so that broadcasts can be carried out on the row and column buses in parallel. This allows the image to be loaded and unloaded in parallel with the computation. As far as computation is concerned, SMESH is a systolic array that performs one multiply-add every clock period along a Hamiltonian path. We propose new parallel algorithms on SMESH for the Laplacian of Gaussian directional filter (based on convolution in the spatial domain) and for the binocular fusion process (based on morphology). These algorithms achieve the optimal time complexity O(M^2) and require only O(1) local memory per processing element, where M x M is the size of the kernel. The SMESH architecture can be adapted to any generalized two-dimensional convolution, and the time complexity of the algorithm is independent of the kernel geometry.
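The abstract states the convolution cost (M^2 multiply-add cycles, O(1) storage per PE, independent of kernel geometry) without showing a dataflow. The Python sketch below is one plausible reading, not the paper's algorithm: we assume the coefficient kernel[i][j] is broadcast on the global bus while the image plane shifts one hop per clock, so that consecutive kernel offsets follow a serpentine (Hamiltonian) path and every PE needs only one data register and one accumulator. The names serpentine, shift, and mesh_correlate are hypothetical, and the cyclic (wraparound) shift with masking at the image border is a simulation convenience we assume, not a documented SMESH feature.

```python
# Hypothetical sketch: broadcast coefficients + one-hop image shifts on an N x N PE array.
# Edge handling (cyclic shift, then masking wrapped pixels) is an assumption of this sketch.

def serpentine(m):
    """Hamiltonian (boustrophedon) path over the m x m kernel offsets.
    Consecutive offsets differ by one unit step, so each move needs only
    nearest-neighbour communication between PEs."""
    path = []
    for i in range(m):
        cols = range(m) if i % 2 == 0 else range(m - 1, -1, -1)
        path.extend((i, j) for j in cols)
    return path


def shift(plane, di, dj):
    """Cyclic unit shift: afterwards PE (r, c) holds what PE (r+di, c+dj) held."""
    n = len(plane)
    return [[plane[(r + di) % n][(c + dj) % n] for c in range(n)] for r in range(n)]


def mesh_correlate(image, kernel):
    """Simulate the PE array computing, in correlation form (no kernel flip),
    out[r][c] = sum over (i, j) of kernel[i][j] * image[r+i][c+j], zero outside the image.
    Each step broadcasts one coefficient and every PE does one multiply-add.
    Per-PE state is one data register and one accumulator: O(1) memory."""
    n, m = len(image), len(kernel)
    data = [row[:] for row in image]   # invariant: data[r][c] == image[(r+i) % n][(c+j) % n]
    acc = [[0] * n for _ in range(n)]
    prev_i, prev_j = 0, 0
    steps = 0
    for i, j in serpentine(m):
        data = shift(data, i - prev_i, j - prev_j)  # one-hop neighbour transfer
        prev_i, prev_j = i, j
        w = kernel[i][j]                            # coefficient broadcast to all PEs
        for r in range(n):
            for c in range(n):
                if r + i < n and c + j < n:         # skip wrapped-around pixels
                    acc[r][c] += w * data[r][c]
        steps += 1                                  # one multiply-add clock per offset
    return acc, steps
```

The loop visits all M^2 offsets whether or not a coefficient is zero, so the step count depends only on M, consistent with the claim that the cost is independent of kernel geometry. For true convolution, pass the kernel flipped in both axes.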
Finally, we investigate the implementation of SMESH under the area and pin-out constraints of current VLSI technology. We examine the feasibility of a 16 x 16, 8-bit SMESH chip by laying out a 4-bit processing element (PE) with 16 local memory locations, and we suggest a method for system integration. Beyond processing speed, a key strength of this design is its modularity, which allows simple multiple-chip implementations.
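To make the multiple-chip modularity concrete, here is a minimal Python sketch, assuming a large N x N PE array is tiled from 16 x 16 chips connected only by nearest-neighbour mesh links. The helpers locate, crosses_chip, and boundary_links are hypothetical; they identify which links must leave a chip through package pins, which is where the pin-out constraint applies. The global row and column buses are not modeled.

```python
# Hypothetical sketch: tiling a large mesh from fixed-size chips (mesh links only; buses not modeled).

CHIP = 16  # PEs per chip side, matching the 16 x 16 chip discussed above


def locate(r, c, chip=CHIP):
    """Map a global PE coordinate to ((chip_row, chip_col), (local_row, local_col))."""
    cr, lr = divmod(r, chip)
    cc, lc = divmod(c, chip)
    return (cr, cc), (lr, lc)


def crosses_chip(r, c, dr, dc, chip=CHIP):
    """True if the link from PE (r, c) to PE (r+dr, c+dc) leaves the chip,
    i.e. it must be routed through package pins."""
    return locate(r, c, chip)[0] != locate(r + dr, c + dc, chip)[0]


def boundary_links(n, chip=CHIP):
    """Count east and south mesh links in an n x n array that cross chip boundaries."""
    count = 0
    for r in range(n):
        for c in range(n):
            if c + 1 < n and crosses_chip(r, c, 0, 1, chip):
                count += 1
            if r + 1 < n and crosses_chip(r, c, 1, 0, chip):
                count += 1
    return count
```

For example, a 64 x 64 array needs a 4 x 4 grid of chips, and boundary_links(64) counts 384 links that cross chip boundaries; an interior chip has 4 x 16 = 64 nearest-neighbour links leaving it.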