Harris extraction and SIFT matching for correlation of two Tablets

Oct 30, 2014

Authors: A Ali, A Georges, Tassadaq Hussain, S Ali
Publisher: World Academy of Science, Engineering and Technology (WASET), 2011, 76 (76), pp. 102-106.

PPMC: A Programmable Pattern-Based Memory Controller

Jan 4, 2013

Authors: Hussain Tassadaq, Muhammad Shafiq, Miquel Pericas, Nacho Navarro, Eduard Ayguade.
ARC 2012, the 8th International Symposium on Applied Reconfigurable Computing (2012).

One of the main challenges in the design of hardware accelerators is efficient access to data in external memory. Improving and optimizing the memory controller that sits between the external memory and the accelerators is therefore critical. In this paper, we advance toward this goal by proposing PPMC, the Programmable Pattern-based Memory Controller. This controller supports scatter-gather and strided 1D, 2D and 3D accesses with programmable tiling. Compared to existing solutions, the proposed system provides better performance, simplifies the programming of access patterns and eases software integration by interfacing with high-level programming languages. In addition, the controller offers an interface for automating domain decomposition via tiling. We implemented and tested PPMC on a Xilinx ML505 evaluation board using a MicroBlaze soft core as the host processor. The evaluation uses six memory-intensive application kernels: Laplacian solver, FIR, FFT, Thresholding, Matrix Multiplication, and 3D-Stencil. The results show that the PPMC-enhanced system achieves speed-ups of at least 10x for 1D, 2D and 3D memory accesses compared to a non-PPMC setup.
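To make the idea of a programmable strided pattern with tiling more concrete, the following minimal Python sketch shows how a single 3D strided descriptor can be expanded into a sequence of memory addresses, and how a 2D array can be decomposed into per-tile descriptors. The descriptor layout (`StridedPattern`, `tiles_2d` and their fields) is a hypothetical illustration chosen for clarity; it is not the PPMC register interface, which the abstract does not describe.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class StridedPattern:
    """Hypothetical 3D strided-access descriptor (not the PPMC register layout)."""
    base: int                      # start address in bytes
    elem_bytes: int                # size of one element in bytes
    shape: Tuple[int, int, int]    # (planes, rows, cols) to fetch
    strides: Tuple[int, int, int]  # element distance between planes, rows, cols

    def addresses(self) -> Iterator[int]:
        """Yield every element's byte address, innermost (column) index fastest."""
        planes, rows, cols = self.shape
        sp, sr, sc = self.strides
        for p in range(planes):
            for r in range(rows):
                for c in range(cols):
                    yield self.base + (p * sp + r * sr + c * sc) * self.elem_bytes


def tiles_2d(base: int, elem_bytes: int, height: int, width: int,
             tile_h: int, tile_w: int) -> List[StridedPattern]:
    """Split a row-major height x width array into per-tile 2D patterns.

    Edge tiles are clipped when the array size is not a multiple of the tile size.
    """
    tiles = []
    for r0 in range(0, height, tile_h):
        for c0 in range(0, width, tile_w):
            h = min(tile_h, height - r0)
            w = min(tile_w, width - c0)
            start = base + (r0 * width + c0) * elem_bytes
            # A 2D pattern is a 3D pattern with a single plane.
            tiles.append(StridedPattern(start, elem_bytes, (1, h, w), (0, width, 1)))
    return tiles


if __name__ == "__main__":
    # 4x4 array of 4-byte words at 0x1000, split into four 2x2 tiles.
    for t in tiles_2d(0x1000, 4, 4, 4, 2, 2):
        print([hex(a) for a in t.addresses()])
```

In this sketch a 1D access is simply shape `(1, 1, n)`, and the first 2x2 tile of the example array covers addresses 0x1000, 0x1004, 0x1010 and 0x1014. Describing each tile with one descriptor lets the controller, rather than the processor, generate every address in the tile.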

PPMC: Hardware Scheduling and Memory Management Support for Multi Hardware Accelerators

Jan 4, 2013

Authors: Hussain Tassadaq, Miquel Pericas, Nacho Navarro, Eduard Ayguade.
FPL 2012, the 22nd International Conference on Field Programmable Logic and Applications.

A generic multi-accelerator system comprises a microprocessor unit that schedules the accelerators along with the necessary data movements. With the processor acting as the control unit, the system encounters multiple delays (memory and task management) that degrade overall performance. Avoiding this degradation requires an efficient memory manager and a high-speed scheduler that feed prearranged data to the appropriate accelerator. In this work we propose integrating an efficient scheduler and an intelligent memory manager into an existing core known as PPMC (Programmable Pattern based Memory Controller), so that data movement and computational tasks can be handled efficiently. The modified PPMC system improves performance by managing data movements and address generation in hardware and by scheduling accelerators without the intervention of a control processor or an operating system. The PPMC system is evaluated with six memory-intensive accelerators: Laplacian solver, FIR, FFT, Thresholding, Matrix Multiplication and 3D-Stencil. The modified PPMC system is implemented and tested on a Xilinx ML505 evaluation FPGA board, and its performance is compared with a microprocessor-based system integrated with the Xilkernel operating system. Results show that the modified PPMC-based multi-accelerator system consumes 50% fewer hardware resources and 32% less on-chip power, and achieves a speed-up of approximately 27x compared to the MicroBlaze-based system.

Implementation of a Reverse Time Migration Kernel using the HCE High Level Synthesis Tool

Jan 4, 2013

Authors: Tassadaq Hussain, Miquel Pericas, Nacho Navarro, Eduard Ayguade.
The 2011 International Conference on Field-Programmable Technology (FPT 2011), IIT Delhi, New Delhi, India (2011).

Abstract: Reconfigurable computers have started to appear in the HPC landscape, albeit at a slow pace. Adoption is still hindered by design methodologies and slow implementation cycles. Recently, methodologies based on High Level Synthesis (HLS) have begun to flourish, and the reconfigurable supercomputing community is slowly adopting these techniques. In this paper we take a geophysics application and implement it on an FPGA using an HLS tool called HCE. The application, Reverse Time Migration, is an important code for subsalt imaging. It is also highly demanding, both computationally and in its memory requirements. The complexity of this code makes it challenging to implement using an HLS methodology instead of HDL. We study the achieved performance and compare it with hand-written HDL and with software-based execution.
The resulting design, when implemented on the Altera Stratix IV EP4SGX230 and EP4SGX530 devices, achieves 11.2 and 22 GFLOPS respectively. On these devices, the design achieves up to 4.2x and 7.9x improvement, respectively, over a general-purpose processor core (Intel i7).

Reconfigurable Memory Controller with Programmable Pattern Support

Jan 4, 2013

Authors: Hussain Tassadaq, Miquel Pericas, Nacho Navarro, Eduard Ayguade.
Published: 5th HiPEAC Workshop on Reconfigurable Computing, WRC 2011.

Heterogeneous architectures are increasingly popular due to their flexibility and high performance per watt. Reconfigurable systems-on-chip, one kind of heterogeneous architecture, offer high performance per watt through reconfigurable logic and flexibility through multiprocessor cores. However, to achieve these performance goals, the accelerators must be supplied with enough data.
In this paper we describe a programmable, pattern-based memory controller (PMC) that aims at improving the performance of heterogeneous or reconfigurable SoC devices. PMC supports scatter-gather and strided 1D, 2D and 3D access patterns, and can prefetch complete patterns into scratchpads that can then be accessed either by a microprocessor or by an accelerator. As a result, the microprocessors and accelerators can focus on computation and are relieved of performing address calculations. PMC has been implemented and tested on an ML505 evaluation board using the MicroBlaze soft core as the platform's microprocessor.
While PMC adds some latency, it improves performance by offloading the processor and by making better use of the available bandwidth. When executing a thresholding application in a PMC-based SoC environment, PMC provides a 1.5x speed-up with the processor and a 27x speed-up with a hardware accelerator.

Streaming Scatter Gather DMA Controller for Hardware Accelerators

Jan 4, 2013

Authors: Hussain Tassadaq, Miquel Pericas, Nacho Navarro, Eduard Ayguade.
Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2010), Terrassa, July 2010.

Abstract:

Feeding data to hardware accelerators is an intricate process that affects the performance and efficiency of systems. In system-on-chip environments, hardware accelerators act as targets (slaves), and the movement of data to and from memory is controlled by the microprocessor, which acts as the initiator (master). The processor sits in the middle: it reads data from the hardware accelerator and writes it to memory, and vice versa. This technique provides flexibility but degrades system performance. To benefit fully from parallelism, HPC applications need memory controllers that have CPU-like intelligence and can synchronize with hardware accelerators. In this abstract we present a memory controller that provides scatter/gather DMA functionality. The controller makes the most of the hardware fabric by feeding data in a streaming format. Memory access patterns are defined by programmable descriptor blocks available in the controller. We measure gate count and speed by running the memory controller on a Xilinx Virtex-5 ML505 board and compare the results with an SoC designed in the Xilinx Base System Builder.

KEYWORDS: Master, Slave, FPGA, SoC, HPC Applications
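The descriptor-driven scatter/gather idea in this abstract can be illustrated with a minimal Python sketch: a chain of descriptors, each naming one contiguous chunk of memory, is walked in order so that scattered chunks leave as one contiguous stream (gather) or an incoming stream is written back into scattered chunks (scatter). The `Descriptor` structure and the `gather`/`scatter` functions are hypothetical illustrations; the abstract does not specify the controller's actual descriptor format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """Hypothetical scatter/gather descriptor naming one contiguous chunk."""
    addr: int                            # byte offset of the chunk in memory
    length: int                          # number of bytes in the chunk
    next: Optional["Descriptor"] = None  # next descriptor; None ends the chain


def gather(memory: bytes, head: Optional[Descriptor]) -> bytes:
    """Walk the chain and emit the chunks back to back as one stream."""
    out = bytearray()
    d = head
    while d is not None:
        out += memory[d.addr:d.addr + d.length]
        d = d.next
    return bytes(out)


def scatter(memory: bytearray, head: Optional[Descriptor], stream: bytes) -> None:
    """Write a contiguous input stream into the chunks named by the chain."""
    pos = 0
    d = head
    while d is not None:
        if pos + d.length > len(stream):
            raise ValueError("stream is shorter than the descriptor chain")
        memory[d.addr:d.addr + d.length] = stream[pos:pos + d.length]
        pos += d.length
        d = d.next


if __name__ == "__main__":
    mem = bytes(range(16))
    chain = Descriptor(8, 2, Descriptor(0, 3))  # bytes 8-9, then bytes 0-2
    print(list(gather(mem, chain)))
```

Because the access pattern lives in the descriptor chain rather than in processor code, the controller can stream data to an accelerator without the processor copying each chunk, which is the bottleneck the abstract describes.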
