Recent Posts

PPMC: Hardware Scheduling and Memory Management support for Multi Hardware Accelerators.

Jan 4, 2013

Authors: Tassadaq Hussain, Miquel Pericas, Nacho Navarro, Eduard Ayguade.
FPL2012 | 22nd International Conference on Field Programmable Logic and Applications.

A generic multi-accelerator system comprises a microprocessor unit that schedules the accelerators along with the necessary data movements. With the processor acting as the control unit, such a system incurs multiple delays (memory and task management) that degrade overall performance. This degradation calls for an efficient memory manager and a high-speed scheduler that feed prearranged data to the appropriate accelerator. In this work we propose integrating an efficient scheduler and an intelligent memory manager into an existing core known as PPMC (Programmable Pattern based Memory Controller), so that data movement and computational tasks can be handled efficiently. The modified PPMC system improves performance by managing data movement and address generation in hardware and by scheduling accelerators without the intervention of a control processor or an operating system. The PPMC system is evaluated with six memory-intensive accelerators: Laplacian solver, FIR, FFT, Thresholding, Matrix Multiplication and 3D Stencil. The modified PPMC system is implemented and tested on a Xilinx ML505 evaluation FPGA board, and its performance is compared with a microprocessor-based system running the Xilkernel operating system. Results show that the modified PPMC-based multi-accelerator system consumes 50% fewer hardware resources and 32% less on-chip power, and achieves approximately a 27x speed-up compared to the MicroBlaze-based system.
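
The following is a minimal software model of scheduling accelerators with no control processor in the loop. It dispatches each task once its input data is ready and its accelerator is idle. This is a sketch under stated assumptions and does not describe the PPMC hardware. The Task fields, the cycle counts, the workload and the greedy in-list-order policy are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Task:
    name: str
    accelerator: str          # accelerator that executes this task (hypothetical binding)
    cycles: int               # assumed run time in cycles, data movement included
    deps: List[str] = field(default_factory=list)  # tasks whose output this task consumes

def schedule(tasks: List[Task]) -> Dict[str, Tuple[int, int]]:
    """Greedy, dependency-driven schedule: returns {task name: (start, finish)}.

    A task starts once all of its producers have finished and its accelerator is idle.
    Among tasks whose producers are already scheduled, list order sets priority.
    """
    finish: Dict[str, int] = {}
    accel_free: Dict[str, int] = {}   # accelerator name -> cycle at which it becomes idle
    pending = list(tasks)
    timeline: Dict[str, Tuple[int, int]] = {}
    while pending:
        # Take the first pending task whose producers have already been scheduled.
        task = next((t for t in pending if all(d in finish for d in t.deps)), None)
        if task is None:
            raise ValueError("unsatisfiable or cyclic dependencies")
        pending.remove(task)
        data_ready = max((finish[d] for d in task.deps), default=0)
        start = max(data_ready, accel_free.get(task.accelerator, 0))
        end = start + task.cycles
        finish[task.name] = end
        accel_free[task.accelerator] = end
        timeline[task.name] = (start, end)
    return timeline

# Hypothetical workload: an FIR stage feeding an FFT, plus independent work.
tasks = [
    Task("fir0", "FIR", 100),
    Task("fft0", "FFT", 150, deps=["fir0"]),
    Task("thr0", "Thresholding", 80),
    Task("fir1", "FIR", 100),
]
timeline = schedule(tasks)
for name, (start, end) in timeline.items():
    print(f"{name}: cycles {start}-{end}")
```

Independent accelerators overlap in this model (Thresholding runs alongside FIR), while the second FIR task waits for its accelerator to become free.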

Implementation of a Reverse Time Migration Kernel using the HCE High Level Synthesis Tool.

Jan 4, 2013

Authors: Tassadaq Hussain, Miquel Pericas, Nacho Navarro, Eduard Ayguade.
The 2011 International Conference on Field-Programmable Technology (FPT 2011), IIT Delhi, New Delhi, India (2011).

Abstract: Reconfigurable computers have started to appear in the HPC landscape, albeit at a slow pace. Adoption is still hindered by design methodologies and slow implementation cycles. Recently, methodologies based on High Level Synthesis (HLS) have begun to flourish, and the reconfigurable supercomputing community is slowly adopting these techniques. In this paper we take a geophysics application and implement it on an FPGA using an HLS tool called HCE. The application, Reverse Time Migration, is an important code for subsalt imaging. It is also highly demanding, both computationally and in its memory requirements. The complexity of this code makes it challenging to implement with an HLS methodology instead of HDL. We study the achieved performance and compare it with hand-written HDL and with software-based execution.
The resulting design, when implemented on the Altera Stratix IV EP4SGX230 and EP4SGX530 devices, achieves 11.2 and 22 GFLOPS, respectively. On these devices, the design achieves up to a 4.2x and 7.9x improvement, respectively, over a general-purpose processor core (Intel i7).
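
Reverse Time Migration forward-propagates a source wavefield and back-propagates the recorded data in reverse time. At each step it cross-correlates the two fields to build an image. Most of the work is a finite-difference stencil applied to every grid point at every time step. The sketch below shows that kind of stencil in 2D, together with a zero-lag cross-correlation imaging condition. It rests on several assumptions: second-order accuracy in time and space, constant density, zero-valued boundaries and a toy grid. The paper's actual stencil order, boundary handling and HCE source are not reproduced here.

```python
from typing import List

Grid = List[List[float]]

def wave_step(p_prev: Grid, p_curr: Grid, vel: Grid, dt: float, dx: float) -> Grid:
    """Advance the 2D constant-density acoustic wave equation by one time step.

    Explicit scheme, second order in time and space (an assumption for illustration):
        p_next = 2*p_curr - p_prev + (vel*dt/dx)**2 * laplacian(p_curr)
    The outermost ring of grid points is held at zero (simple Dirichlet boundary).
    """
    ny, nx = len(p_curr), len(p_curr[0])
    p_next = [[0.0] * nx for _ in range(ny)]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            # 5-point Laplacian stencil; the grid spacing is folded into c2 below
            lap = (p_curr[i - 1][j] + p_curr[i + 1][j]
                   + p_curr[i][j - 1] + p_curr[i][j + 1]
                   - 4.0 * p_curr[i][j])
            c2 = (vel[i][j] * dt / dx) ** 2
            p_next[i][j] = 2.0 * p_curr[i][j] - p_prev[i][j] + c2 * lap
    return p_next

def accumulate_image(image: Grid, src_field: Grid, rcv_field: Grid) -> None:
    """Zero-lag cross-correlation imaging condition: image += source * receiver."""
    for i, row in enumerate(image):
        for j in range(len(row)):
            row[j] += src_field[i][j] * rcv_field[i][j]

# Usage: a unit impulse in the middle of a 5x5 grid with uniform velocity.
n = 5
p_prev = [[0.0] * n for _ in range(n)]
p_curr = [[0.0] * n for _ in range(n)]
p_curr[2][2] = 1.0
vel = [[1.0] * n for _ in range(n)]
p_next = wave_step(p_prev, p_curr, vel, dt=0.1, dx=1.0)
print(p_next[2][2], p_next[2][1])  # centre point and one neighbour
```

Each output point reads five neighbouring values from the current field and one from the previous field. This memory-access intensity is what makes the kernel demanding on both compute and memory bandwidth.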

Reconfigurable Memory Controller with Programmable Pattern Support.

Jan 4, 2013

Authors: Tassadaq Hussain, Miquel Pericas, Nacho Navarro, Eduard Ayguade.
Published: 5th HiPEAC Workshop on Reconfigurable Computing, WRC 2011.

Heterogeneous architectures are increasingly popular due to their flexibility and high performance per watt. Reconfigurable systems-on-chip, one kind of heterogeneous architecture, offer high performance per watt through reconfigurable logic and flexibility through multiprocessor cores. However, to reach their performance goals, the accelerators must be supplied with enough data.
In this paper we describe a programmable, pattern-based memory controller (PMC) that aims to improve the performance of heterogeneous or reconfigurable SoC devices. The supported access patterns include scatter/gather and strided 1D, 2D and 3D patterns. PMC can prefetch complete patterns into scratchpads, which can then be accessed by either a microprocessor or an accelerator. As a result, the microprocessors and accelerators can focus on computation and are relieved of address calculation. PMC has been implemented and tested on an ML505 evaluation board using the MicroBlaze softcore as the platform's microprocessor.
While PMC adds some latency, it improves performance by offloading the processor and making better use of the available bandwidth. When executing a thresholding application in a PMC-based SoC environment, PMC provides a 1.5x speed-up with the processor and a 27x speed-up with a hardware accelerator.
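
The sketch below models the address generation that a pattern-based controller takes over from the processor. A strided descriptor (base, per-dimension counts and strides) expands into a 1D, 2D or 3D address sequence. A scatter/gather list is a sequence of (start, length) chunks. The prefetch step copies the whole pattern into a scratchpad. The descriptor layout, the word addressing and every name here are hypothetical and are not the PMC register format.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List, Sequence, Tuple

@dataclass
class StridedPattern:
    """Hypothetical descriptor for a strided 1D/2D/3D access pattern."""
    base: int                 # start address (word-addressed, an assumption)
    counts: Sequence[int]     # elements per dimension, outermost first
    strides: Sequence[int]    # address step per dimension, same order as counts

    def addresses(self) -> Iterator[int]:
        # itertools.product varies its last range fastest, so the last dimension is innermost.
        for idx in product(*(range(c) for c in self.counts)):
            yield self.base + sum(i * s for i, s in zip(idx, self.strides))

def gather_addresses(chunks: List[Tuple[int, int]]) -> Iterator[int]:
    """Scatter/gather pattern given as (start address, length) chunks."""
    for start, length in chunks:
        yield from range(start, start + length)

def prefetch(memory: List[int], addrs: Iterator[int]) -> List[int]:
    """Copy a whole pattern into a scratchpad so the consumer sees contiguous data."""
    return [memory[a] for a in addrs]

# Usage: an 8x8 row-major matrix stored at address 0 (each word holds its own address).
memory = list(range(64))
tile = StridedPattern(base=2 * 8 + 3, counts=(2, 3), strides=(8, 1))   # 2x3 tile at row 2, col 3
column = StridedPattern(base=5, counts=(8,), strides=(8,))             # column 5
scratchpad = prefetch(memory, tile.addresses())
print(scratchpad)
print(prefetch(memory, column.addresses()))
print(prefetch(memory, gather_addresses([(60, 2), (7, 1)])))
```

Once the descriptor is written, the consumer reads a dense scratchpad and performs no address arithmetic of its own. This is the offloading effect that the abstract credits for the speed-up.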

Streaming Scatter Gather DMA Controller for Hardware Accelerators

Jan 4, 2013

Authors: Tassadaq Hussain, Miquel Pericas, Nacho Navarro, Eduard Ayguade.
Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2010), Terrassa, July 2010.

Abstract:

Feeding data to hardware accelerators is an intricate process that affects the performance and efficiency of a system. In system-on-chip environments, hardware accelerators act as targets (slaves), and the movement of data to and from memory is controlled by the microprocessor unit (initiator/master). The processor plays an intermediary role: it reads data from the hardware accelerator and writes it to memory, and vice versa. This technique provides flexibility but hurts system performance. To get the maximum benefit from parallelism, HPC applications need memory controllers that have CPU-like intelligence and can synchronize with hardware accelerators. In this abstract we present a memory controller that provides Scatter/Gather DMA functionality. The controller makes full use of the hardware fabric by feeding data in a streaming format. Memory access patterns are defined by programmable descriptor blocks available in the controller. We measure gate count and speed by running the memory controller on a Xilinx Virtex-5 ML505 board and compare the results with an SoC designed in the Xilinx Base System Builder.

KEYWORDS: Master, Slave, FPGA, SoC, HPC Applications
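
The model below sketches descriptor-driven scatter/gather DMA. A table of descriptors, each holding a source address, a length and the index of the next descriptor, forms a chain. The engine walks the chain and streams the gathered words one at a time, so the processor never copies data itself. This is a hypothetical software illustration only: the Descriptor fields, word addressing and chain-termination convention are assumptions and do not describe the controller's actual descriptor blocks.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

@dataclass
class Descriptor:
    src: int                        # start address in memory (hypothetical word addressing)
    length: int                     # number of words to transfer
    next_idx: Optional[int] = None  # index of the next descriptor; None ends the chain

def sg_dma_stream(memory: List[int], table: List[Descriptor], first: int = 0) -> Iterator[int]:
    """Walk a chain of descriptors and stream the gathered words one at a time."""
    idx: Optional[int] = first
    visited = set()
    while idx is not None:
        if idx in visited:          # guard against a descriptor chain that loops forever
            raise ValueError("descriptor chain loops")
        visited.add(idx)
        d = table[idx]
        for addr in range(d.src, d.src + d.length):
            yield memory[addr]
        idx = d.next_idx

# Usage: gather three non-contiguous blocks, in chain order 0 -> 2 -> 1.
memory = list(range(100, 132))     # 32 words of hypothetical memory contents
table = [
    Descriptor(src=4, length=2, next_idx=2),
    Descriptor(src=0, length=1),            # end of chain
    Descriptor(src=10, length=3, next_idx=1),
]
total = 0
for word in sg_dma_stream(memory, table):   # a streaming accelerator consumes one word per step
    total += word
print(total)
```

Because the stream is a generator, the consumer starts working on the first word before the whole transfer is complete. This is the benefit that feeding data in a streaming format is meant to capture.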