

# **UNIVERSITI PUTRA MALAYSIA**

# FAST FOURIER TRANSFORM PROCESSOR IMPLEMENTATION FOR HIGH INPUTS ON FIELD PROGRAMMABLE GATES ARRAY

ZAID ALI ABBAS

FK 2018 121



# FAST FOURIER TRANSFORM PROCESSOR IMPLEMENTATION FOR HIGH INPUTS ON FIELD PROGRAMMABLE GATES ARRAY

By

ZAID ALI ABBAS

Thesis submitted to the School of Graduate Studies, Universiti Putra Malaysia, in fulfilment of the requirement for the Degree of Master of Science

August 2018

# COPYRIGHT

All material contained within the thesis, including without limitation text, logos, icons, photographs and all other artwork, is copyright material of Universiti Putra Malaysia unless otherwise stated. Use may be made of any material contained within the thesis for non-commercial purposes from the copyright holder. Commercial use of material may only be made with the express, prior, written permission of Universiti Putra Malaysia.

Copyright © Universiti Putra Malaysia



# DEDICATION

I would like to dedicate my thesis to my beloved parents and my precious brothers for their endless support and care



Abstract of a thesis presented to the Senate of Universiti Putra Malaysia in fulfillment of the requirement for the degree of Master of Science

### FAST FOURIER TRANSFORM PROCESSOR IMPLEMENTATION FOR HIGH INPUTS ON FIELD PROGRAMMABLE GATES ARRAY

By

### ZAID ALI ABBAS

August 2018

Chairman : Nasri bin Sulaiman, PhD Faculty : Engineering

In the past few years, fast Fourier transform (FFT) proved to be an efficient method to accomplish the discrete Fourier transform (DFT) with less number of operations. FFT has been vastly applied for many applications, such as image processing technique, network data transmission (XDSL, WiMAX, and WLAN), orthogonal frequency-division multiplexing (OFDM), digital signal processing (DSP) and numerous applications that require high input data (1024 and up) processing. Low power and low complexity are the main concerns in high input FFT. Therefore, this research aims to investigate the power consumption, hardware resources usage and speed for radix-(2, 4 and 8) FFT processor, using the same device and environment to investigate the performance of each. Memory-based architecture chosen to use for FFT processors, due to the reduction in the number of butterflies and rotators, as they are reused for different stages of the FFT, were implemented on Cyclone II Field Programmable Gate Arrays (FPGA). Verilog Hardware Description Language (Verilog HDL) and VHDL Languages are used to program the algorithms into the FPGA. FFT algorithms will be implemented for up to 4096 points to measure the high load processing capability. The results show that for the 4096 points FFT, the radix-4 is the best trade-off in term of speed, resources and power consumption, which requires only 36% of the power required by the 4069 points radix-8 FFT and 58% of the power required by the 4069 points radix-2 FFT. On another hand, for the hardware resources, the result shows that the 4096 points radix-4 FFT used 30% of hardware resources furthermore; radix-8 FFT uses approximately 45%, in the meanwhile radix-2 require 20% only. For speed, the results shows that a 4096 points radix-4 FFT is 70% faster than 4096 points radix-2 FFT and 62% slower than 4096 points radix-8 FFT. While the radix-2 may be preferred, when it comes to power saving because it only need to consume 28% less



than radix-4 and 41% less than radix-8. Radix-8 is better when speed is the most important factor; it is notably 80% faster than radix-2 and 60% than radix-4.



Abstrak tesis yang dikemukakan kepada Senat Universiti Putra Malaysia sebagai memenuhi keperluan untuk ijazah Master Sains

### IMPLEMENTASI PEMPROSES TRASFORMASI FOURIER PANTAS BAGI INPUT TINGGI ATAS TATASUSUNAN LOGIK BOLEH ATURCARA MEDAN

Oleh

#### ZAID ALI ABBAS

Ogos 2018

Pengerusi : Nasri bin Sulaiman, PhD Fakulti : Engineering

Dalam beberapa tahun yang lalu, transformasi Fourier pantas (FFT) terbukti menjadi kaedah yang cekap dalam menyelesaikan transformasi Fourier diskret (DFT) dengan jumlah operasi yang rendah. FFT banyak digunakan dalam aplikasi, seperti teknik pemprosesan imej, penghantaran data rangkaian (XDSL, WiMAX, dan WLAN), pembahagian frekuensi orthogonal multipleks (OFDM), pemprosesan isyarat digital (DSP) dan pelbagai aplikasi yang memerlukan pemprosesan data input tinggi (1024 dan lebih tinggi).

Penggunaan kuasa dan saiz yang rendah adalah kepentingan utama dalam FFT input yang tinggi. Oleh itu, penyelidikan ini bertujuan untuk mengkaji penggunaan kuasa, penggunaan sumber perkakasan dan kelajuan untuk radix-(2, 4 dan 8) pemproses FFT menggunakan peranti dan persekitaran yang sama untuk menyiasat prestasi masing-masing. Senibina berasaskan memori yang dipilih untuk digunakan untuk pemproses FFT, disebabkan oleh pengurangan jumlah "kupu-kupu" dan rotator, kerana mereka digunakan semula untuk pelbagai peringkat FFT, telah dilaksanakan atas tatasusunan logik boleh aturcara medan Cyclone II, Bahasa Penerangan Perkakasan Verilog (Verilog HDL) dan Bahasa VHDL digunakan untuk memprogram algoritma ke atas FPGA. Algoritma FFT akan dilaksanakan sehingga 4096 mata untuk mengukur keupayaan pemprosesan beban tinggi.

Keputusan menunjukkan bahawa untuk 4096 mata FFT, radix-4 adalah terbaik dari segi kelajuan, sumber dan penggunaan kuasa, yang memerlukan hanya 36% daripada kuasa yang diperlukan oleh 4096 mata radix-8 FFT dan 58% daripada kuasa yang diperlukan oleh 4096 mata radix-2 FFT. Selain itu, bagi sumber perkakasan, keputusan menunjukkan bahawa 4096 mata radix-4 FFT menggunakan sumber perkakasan 30% yang digunakan oleh 4096 mata radix-8 FFT manakala radix-2 hanya memerlukan 20%. Dari segi kelajuan, hasil menunjukkan 4096 mata radix-4 FFT 70% lebih cepat daripada 4096 mata radix-2 FFT dan 62% lebih lambat daripada 4096 mata radix-2 FFT dan 62% lebih lambat daripada 4096 mata radix-8 FFT. Sedangkan radix-2 mungkin lebih diterima, apabila ia berkaitan dengan penjimatan kuasa kerana hanya memerlukan 28% kurang daripada radix-8 dan 41% kurang daripada radix-4. Radix-8 lebih baik apabila kelajuan adalah yang paling penting, terutamanya 80% lebih cepat daripada radix-2 dan 60% daripada radix-4.

### ACKNOWLEDGEMENTS

Firstly, I would like to express my sincere gratitude to my supervisor, Dr. Nasri bin Sulaiman for the continuous support of my Master study and related research, for his patience, motivation, and immense knowledge. His guidance helped me in all the time of research and writing of this thesis. I could not have imagined having a better advisor and mentor for my Master study.

Besides my supervisor, I would like to thank my co-supervisor, Associate Professor Nurul Amziah Bt. Md. Yunus for her insightful comments and encouragement.

I thank my fellow research mates for the stimulating discussions, for the sleepless nights, we were working together before deadlines, and for all the good time we have had in the last three years.

My thanks also go to my friends and colleagues for their encouragement and motivation.

Last but not the least; I would like to thank my family, my parents and my brothers for supporting me spiritually throughout writing this thesis and my life in general

This thesis was submitted to the Senate of Universiti Putra Malaysia and has been accepted as fulfillment of the requirement for the degree of Master of Science. The members of Supervisory Committee were as follows:

# Nasri b. Sulaiman, PhD

Senior Lecturer Faculty of Engineering Universiti Putra Malaysia (Chairman)

### Nurul Amziah Bt. Md. Yunus, PhD

Associate Professor Faculty of Engineering Universiti Putra Malaysia (Member)

#### **ROBIAH BINTI YUNUS, PhD**

Professor and Dean School of Graduate Studies Universiti Putra Malaysia

Date:

## **Declaration by graduate student**

I hereby confirm that:

- this thesis is my original work;
- quotations, illustrations, and citations have been duly referenced;
- this thesis has not been submitted previously or concurrently for any other degree at any other institutions;
- intellectual property from the thesis and copyright of thesis are fullyowned by Universiti Putra Malaysia, as according to the Universiti Putra Malaysia (Research) Rules 2012;
- written permission must be obtained from supervisor and the office of Deputy Vice-Chancellor (Research and Innovation) before thesis is published (in the form of written, printed or in electronic form) including books, journals, modules, proceedings, popular writings, seminar papers, manuscripts, posters, reports, lecture notes, learning modules or any other materials as stated in the Universiti Putra Malaysia (Research) Rules 2012;
- There is no plagiarism or data falsification/fabrication in the thesis, and scholarly integrity is upheld as according to the Universiti Putra Malaysia (Graduate Studies) Rules 2003 (Revision 2012-2013) and the Universiti Putra Malaysia (Research) Rules 2012. The thesis has undergone plagiarism detection software.

| Signature: | Date: |  |
|------------|-------|--|
|            |       |  |

Name and Matric No.: Zaid Ali Abbas, GS35867

# **Declaration by Members of Supervisory Committee**

This is to confirm that:

- the research conducted and the writing of this thesis was under our supervision;
- supervision responsibilities as stated in the Universiti Putra Malaysia (Graduate Studies) Rules 2003 (Revision 2012-2013) are adhered to.

| Signature:<br>Name of Chairman<br>of Supervisory<br>Committee: | Dr. Nasri b. Sulaiman                             |
|----------------------------------------------------------------|---------------------------------------------------|
| Signature:<br>Name of Member<br>of Supervisory<br>Committee:   | Associate Professor Nurul<br>Amziah Bt. Md. Yunus |
|                                                                |                                                   |
|                                                                |                                                   |
|                                                                |                                                   |

# TABLE OF CONTENTS

|                                                          |                                                        |                                                        | Page                                              |
|----------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------|---------------------------------------------------|
| ABST<br>ABS7<br>ACKN<br>APPR<br>DECL<br>LIST (<br>LIST ( | RACT<br>RAK<br>IOWL<br>OVAL<br>ERAT<br>OF TA<br>OF FIC | EDGEMENTS<br>-<br>ION<br>BLES<br>GURES<br>BBREVIATIONS | i<br>iii<br>v<br>vi<br>viii<br>xiii<br>xiv<br>xvi |
| СНАР                                                     | TFR                                                    |                                                        |                                                   |
| 1                                                        | INTR                                                   | ODUCTION                                               | 1                                                 |
| -                                                        | 1.1                                                    | Background                                             | 1                                                 |
|                                                          | 1.2                                                    | Problem Statement                                      | 2                                                 |
|                                                          | 1.3                                                    | Objectives of the research                             | 3                                                 |
|                                                          | 1.4                                                    | Research scope                                         | 3                                                 |
|                                                          | 1.5                                                    | Thesis Outlines                                        | 5                                                 |
| 2                                                        | LITE                                                   | RATURE REVIEW                                          | 6                                                 |
|                                                          | 2.1                                                    | Introduction                                           | 6                                                 |
|                                                          | 2.2                                                    | General Description of the FFT role in any OFDM        |                                                   |
|                                                          |                                                        | system's                                               | 6                                                 |
|                                                          | 2.3                                                    | FFT Algorithms                                         | 8                                                 |
|                                                          |                                                        | 2.3.1 The Radix-2 FFT algorithm                        | 9                                                 |
|                                                          |                                                        | 2.3.2 The Radix-4 FFT algorithm                        | 11                                                |
|                                                          | 2.4                                                    | 2.3.3 The Radix-8 FFT algorithm                        | 13                                                |
|                                                          | 2.4                                                    | 2.4.1 Momony Based Architecture                        | 10                                                |
|                                                          |                                                        | 2.4.1 Memory                                           | 10                                                |
|                                                          |                                                        | 2.4.1.2 Dual Memory                                    | 20                                                |
|                                                          |                                                        | 2.4.1.3 Cached-Memory Architecture                     | 20                                                |
|                                                          |                                                        | 2.4.2 Pipelined Architecture                           | 21                                                |
|                                                          |                                                        | 2.4.3 Array Architecture                               | 23                                                |
|                                                          | 2.5                                                    | Field Programmable Gate Arrays (FPGA)                  | 24                                                |
|                                                          |                                                        | 2.5.1 Embedded Multipliers                             | 25                                                |
|                                                          |                                                        | 2.5.2 Embedded RAM                                     | 25                                                |
|                                                          | 2.6                                                    | The effect of power, speed and area overhead on FPGAs  | 26                                                |
| 3                                                        | METH                                                   | HODOLOGY                                               | 28                                                |
|                                                          | 3.1                                                    | Introduction                                           | 28                                                |
|                                                          | 3.2                                                    | FFT Design                                             | 30                                                |
|                                                          | 3.3                                                    | Design specification                                   | 30                                                |
|                                                          |                                                        | 3.3.1 Choosing FFT Radix algorithms                    | 30                                                |

|   |       | 3.3.2               | Choosing the FFT    | architectur | e       |             | 30 |
|---|-------|---------------------|---------------------|-------------|---------|-------------|----|
|   |       | 3.3.3               | Choosing suitable   | Altera FPG  | GA reso | urces to be | 05 |
|   | 0.4   | A I                 |                     |             |         |             | 35 |
|   | 3.4   | Algori              | Inms' Validation    | /alidation  |         |             | 35 |
|   |       | 3.4.1               | The Radix 2 FFT \   | alidation   |         |             | 35 |
|   |       | 3.4.2               | The Radix 4 FFT \   | alidation   |         |             | 31 |
|   | 2 E   | 3.4.3<br>Codin      |                     |             |         |             | 39 |
|   | 3.5   | Coain               | g and System Imple  | ementation  |         |             | 41 |
|   | 3.0   | Ineasu              | aring the Outputs   |             |         |             | 42 |
|   |       | 3.0.1               |                     | DN          |         |             | 42 |
|   |       | 3.0.2               |                     | ces         |         |             | 43 |
|   |       | 3.0.3               | Speed               |             |         |             | 44 |
| 4 | RES   | JLTS A              | ND DISCUSSION       |             |         |             | 45 |
|   | 4.1   | Introd              | uction              |             |         |             | 45 |
|   | 4.2   | The F               | FT algorithms powe  | er consump  | otion   |             | 45 |
|   |       | 4.2.1               | The 4096-point Ra   | dix-2 FFT   |         |             | 45 |
|   |       | 4.2.2               | The 4096-point Ra   | dix-4 FFT   |         |             | 46 |
|   |       | 4.2.3               | The 4096-point Ra   | dix-8 FFT   |         |             | 46 |
|   |       |                     | 4.2.3.1 Discussio   | n           |         |             | 47 |
|   | 4.3   | The F               | FT algorithms spee  | d results   |         |             | 47 |
|   |       | 4.3.1               | The 4096-point Ra   | dix-2 FFT   | speed r | esult       | 47 |
|   |       | 4.3.2               | The 4096-point Ra   | dix-4 FFT   | speed r | esult       | 48 |
|   |       | 4 <mark>.3.3</mark> | The 4096-point Ra   | dix-8 FFT   | speed r | esult       | 49 |
|   |       | <b>4.3.4</b>        | Discussion          |             |         |             | 49 |
|   | 4.4   | The F               | FT algorithms hard  | ware resou  | rces us | age         | 51 |
|   |       | 4.4.1               | The 4096-point      | Radix-2     | FFT     | hardware    |    |
|   |       |                     | resources usage     |             |         |             | 51 |
|   |       | 4.4.2               | The 4096-point      | Radix-4     | FFT     | hardware    |    |
|   |       |                     | resources usage     |             |         |             | 52 |
|   |       | 4.4.3               | The 4096-point      | Radix-8     | FFT     | hardware    |    |
|   |       |                     | resources usage     |             |         |             | 53 |
|   |       | 4.4.4               | The 4096-point      | FFT algo    | orithms | hardware    |    |
|   |       |                     | resources usage d   | liscussion  |         |             | 54 |
|   |       |                     | 4.4.4.1 Area over   | rhead       |         |             | 54 |
|   |       |                     | 4.4.4.2 Registers   |             |         |             | 55 |
|   |       |                     | 4.4.4.3 Memory      |             |         |             | 55 |
|   |       |                     | 4.4.4.4 Multipliers | S           |         |             | 56 |
|   | 4.5   | Bench               | marking and valida  | tion with p | revious | work        | 56 |
| 5 | SI IM |                     |                     |             | JWWE    |             |    |
| 5 | FOR   | FUTIR               | F STUDIES           |             |         |             | 59 |
|   | 5 1   | Concl               | usions              |             |         |             | 59 |
|   | 5.2   | Future              | e Work              |             |         |             | 59 |
|   |       |                     |                     |             |         |             |    |

5.2 Future Work

| REFERENCES         | 61 |
|--------------------|----|
| APPENDICES         | 69 |
| BIODATA OF STUDENT | 83 |



# LIST OF TABLES

| Table | )                                                                                                             | Page |
|-------|---------------------------------------------------------------------------------------------------------------|------|
| 2.1   | Various FFT size for OFDM applications (Konguvel & Kannan, 2018; Lu et al., 2009; Sarada & Vigneswaran, 2013) | 7    |
| 2.2   | Multiplication and addition equations comparison (Andraka, 1998)                                              | 15   |
| 2.3   | FFT Algorithms evolution and implementation                                                                   | 15   |
| 2.4   | C <mark>omparison hardware o</mark> f different memory based FFT<br>processor                                 | 20   |
| 2.5   | Summarize different pipeline architectures with different hardware requirements                               | 22   |
| 2.6   | Comparison utilization of different pipelined FFT processor                                                   | 22   |
| 2.7   | Compari <mark>son between memory-based and pipeline architecture</mark>                                       | 24   |
| 3.1   | Design specification                                                                                          | 30   |
| 3.2   | Power eqution information(Estimators)                                                                         | 43   |
| 4.1   | Overhead utilization comparison                                                                               | 57   |
| 4.2   | Performance comparison from previous work                                                                     | 58   |
|       |                                                                                                               |      |

 $\bigcirc$ 

# LIST OF FIGURES

| Figur | e                                               | Page |  |
|-------|-------------------------------------------------|------|--|
| 1.1   | FFT flow diagram                                | 4    |  |
| 2.1   | OFDM transmitter and receiver                   | 7    |  |
| 2.2   | Radix-2 DIF butterfly unit                      | 10   |  |
| 2.3   | Flow graph of 8-point Radix-2 FFT               | 10   |  |
| 2.4   | Radix-2 DIF butterfly unit                      | 11   |  |
| 2.5   | Flow graph of 16-point Radix-4 FFT              | 13   |  |
| 2.6   | Radix 2^3 butterfly                             | 14   |  |
| 2.7   | General diagram of a memory-based FFT processor | 19   |  |
| 2.8   | Single Memory FFT Architecture                  | 19   |  |
| 2.9   | Dual Memory FFT Architecture                    | 20   |  |
| 2.10  | Cache Memory FFT Architecture                   | 21   |  |
| 2.11  | Radix-4 (R4SDF) pipelined FFT Architecture      | 22   |  |
| 2.12  | Array FFT Architecture                          | 23   |  |
| 2.13  | Field-programmable gate arrays (FPGA) diagram   | 26   |  |
| 3.1   | Methodology Phases                              | 29   |  |
| 3.2   | Radix-2 Multiplier Implementation Results       | 31   |  |
| 3.3   | Radix-4 Multiplier Implementation Results       | 32   |  |
| 3.4   | Radix-8 Multiplier Implementation Results       | 35   |  |
| 3.5   | Pseudo Matlab code for radix-2 FFT              | 36   |  |
| 3.6   | Radix-2 FFT validation results                  | 37   |  |
| 3.7   | Pseudo Matlab code for radix-4 FFT              | 38   |  |
| 3.8   | Radix-4 FFT validation results                  | 39   |  |
| 3.9   | Pseudo Matlab code for radix-8 FFT              | 40   |  |

| 3.10 | Radix-8 FFT validation results                    | 41 |
|------|---------------------------------------------------|----|
| 3.11 | FFT System Illustration                           | 41 |
| 3.12 | Radix-2 FSM state transitions                     | 42 |
| 3.13 | Altera Quartus II results summary                 | 44 |
| 4.1  | 4096-point radix-2 FFT power consumption          | 45 |
| 4.2  | 4096-point radix-4 FFT power consumption          | 46 |
| 4.3  | 4096-point radix-8 FFT power consumption          | 46 |
| 4.4  | 4096-point FFT algorithms power consumption       | 47 |
| 4.5  | 4096-point radix-2 FFT speed and number of stages | 48 |
| 4.6  | 4096-point radix-4 FFT speed and number of stages | 48 |
| 4.7  | 4096-point radix-8 FFT speed and number of stages | 49 |
| 4.8  | 4096-point FFT algorithms speed                   | 50 |
| 4.9  | 4096-point Radix-2 FFT resources usage            | 51 |
| 4.10 | 4096-point Radix-4 FFT resources usage            | 52 |
| 4.11 | 4096-point Radix-8 FFT resources usage            | 53 |
| 4.12 | Area overhead usage for 4096-point FFT algorithms | 54 |
| 4.13 | Registers usage for 4096-point FFT algorithms     | 55 |
| 4.14 | Memory usage for 4096-point FFT algorithms        | 56 |
| 4.15 | Multipliers usage for 4096-point FFT algorithms   | 56 |
|      |                                                   |    |
|      |                                                   |    |
|      |                                                   |    |

# LIST OF ABBREVIATIONS

Application-Specific Integrated Circuit

| CPU  | Central Processing Unit                    |
|------|--------------------------------------------|
| DFT  | Discrete Fourier Transform                 |
| DIF  | Decimation In Frequency                    |
| DIT  | Decimation In Time                         |
| DSP  | Digital Signal Processing                  |
| FFT  | Fast Fourier Transform                     |
| FPGA | Field Programmable Gate Arrays             |
| FSM  | Finite State Machine                       |
| OFDM | Orthogonal Frequency Division Multiplexing |
|      |                                            |

ASIC

## **CHAPTER 1**

### INTRODUCTION

## 1.1 Background

Information was mainly delivered through the analog system in the past. However, advancement in digital signal processing has confirmed that many advantages regarding cost and performance are offered by technology over analog solutions. Discrete Fourier Transform (DFT) plays an important role in Orthogonal Frequency Division Multiplexing (OFDM) based communication systems in modern digital signal processing (DSP) and telecommunication(Chiueh & Tsai, 2008). The DFT is often used in linear filtering. A broad range of applications including video broadcasting, digital the quantum mechanism(lbrahim et al., audio. 2016). image reconstruction(Jain, 1989), asymmetric digital subscriber loop (ADSL) has linear filtering. Rather unaffordable computation is needed by even the finite DFT signal to be completed. Specifically, N2 complex multiplications are required for DFT direct calculation of an input signal of length N. according to Colley and Tukey, fast Fourier transform (FFT) will be calculated in O (logrN) operations. In this operations, N refers to the length of the transform and r shows FFT decomposing radix(Cooley & Tukey, 1965; Saenz et al., 2015). In the field of Digital Signal Processing, FFT is regarded as a popular algorithm and is operated widely in digital communication particularly OFDM systems(S. He & Torkelson, 1998b; Oppenheim et al., 1989). The original contribution of Cooley and Tukey received considered attention and a large number of researchers attempt to extend and enhance their original work.

In many telecommunication systems and digital signal processing, FFT has become a key component. In OFDM, higher orders FFTs are needed for the purpose of increasing transmission efficiency. According to measuring storage system, higher data start with kilobyte (KB) equal to 1,024 bytes. The desire of consumer for high speed untethered access to multimedia together with entertainment services in recent years, has inspired and lead to wideband wireless communication system's growth(Daniels & Heath Jr, 2007). In physical layer of this standard, high throughput FFT/IFFT (Inverse fast Fourier transform) is considered as one of the main components which is indispensable for frequency-domain equalization together with orthogonal frequency division multiplexing modulation(Chiueh & Tsai, 2008). Different applications including video broadcasting, OFDM systems, speech processing. WLAN and image processing need high throughput FFT(Mookherjee et al.).



On the other hand, it is greatly used in the DSP field because of FPGA technology. Its architecture is flexible. It can be configured based on the project need of the user to calculate definite algorithms skillfully(Ibrahim et al., 2016). It is the best choice for FFT processors due to its low cost together with high capacity and performance. Traditionally, DSP chips or application Specific integrated circuits were utilized for design solutions. Nowadays, a great number of scholars shift to Field-programmable Gate Arrays (FPGA).

FPGA has different advantages over other technologies. They have the processing power for handling high-speed DSP. The capability of FPGA in performing repetitive operations in parallel propose a performance benefit to the instruction driven, DSP chips' sequential processing. They are a proper alternative to ASIC (Application-specific integrated circuit ) due to their lowcost design cycles in comparison with ASIC which need large financial investments to be produced and updated. Moreover, unprecedented design flexibility is provided by them. Designers are provided with a platform in which they are able to assess their design decisions in terms of power, size, and throughput through their programmability(Wolf, 2004). Moreover, this feature enables them to get the most effective solution based on the system needs. The performance of FPGA is pushed upward because of advancement in the technology. Different improvements including implanted multipliers and RAM (Random-access memory) logic has lead to the simplification of the hardware implementation of DSP algorithm, which allows the digital information's transfer and transport.

#### 1.2 Problem Statement

Fast Fourier Transform has been studied for a number of years. It has been used in various orthogonal frequency division multiplexing systems including IEEE 802.11ad standard for 60-GHz communication system is ratified(C. Wang et al.), terrestrial digital video broadcasting(K. Chen & Li, 2008; Saleh et al., 2013) and large number of mission information from/to users in real time like UAV (unmanned aerial vehicle)(Porcello, 2013). The outstanding features of these applications are the high inputs data.

For the seek of achieving efficient DFT, different FFT algorithms have been introduced. To achieve the required hardware resources and saving power in real-time applications, Different studies were conducted on the comparison of FFT algorithms, because the natural requirement of real-time applications is a hunger to high inputs data, however, there is no practical comparison between FFT algorithms in terms of high inputs.

To determine each algorithm (radix-2, radix-4, and radix-8) features under high inputs, and determine the best among these algorithms; this is done by measuring the system power, speed and hardware resource consumption.

In order to manage the parameters boundaries, a certain consideration should be counted such as the amount of data limitations. Furthermore, any amount of data above 1024 considered high input data. This kind of parameters limits the system performance especially when these parameters are measured such as speed, system power consumption, and hardware complexity.

## 1.3 Objectives of the research

Designing an efficient FFT processor for high inputs is the main objective of this research. The following steps are proposed to achieve this objective:

- 1- To design Radix-2, Radix-4, and Radix-8 memory based FFT algorithms on FPGA to examine the performance of high inputs data in terms of processor speed, power consumption, and hardware complexity.
- 2- To investigate the most efficient FFT algorithm using FPGA processor that offers the best performance in terms of speed, resources, and power consumption.
- 3- Identify the suitable systems required for the FFT algorithms and the suitable size for these systems.

## 1.4 Research scope

In this research, FFT processors will be designed based on memory-based architecture. However, the research will not propose new algorithms, rather compare the existing one, and implement them on FPGA. Any FFT can be perform in three stages. First stage decompose an N point time domain signal into N signals each containing a single point, second stage, find the spectrum of each of the N point signals. Finally, synthesize the N frequency spectra into single frequency spectrum. Figure shows the FFT flow diagram(Smith, 1997).



Figure 1.1 : FFT flow diagram(Smith, 1997)

Based on literature, most of the results for other researcher in the same filed, the power consumption range between (26) to (535) mWatts for the same number of points (4096). Meanwhile speed results range between (4093) to (150000) cycles. For hardware usage, it hard to specific certain range due to the vast of technologies been used to implement the algorithms from one researcher to others.

The FFT processors designed for up to 4096 points. The word lengths applied are 16-bit and 8-bit for input data. The twiddle elements used 16-bit because twiddle factor length must be less than or equal to input data lengths.

The speed of the processor is set to 50 MHZ, which is the maximum level of the Altera DE2C70 FPGA frequency. Since no external clocks are operated, FPGA PLLs are not used.

System's accuracy is measured through hardware implementation for real results. The power is precisely calculated from algorithm operations. The power's amount for operation is measured separately through applying Altera power function for accurate findings.

4

## 1.5 Thesis Outlines

This research consists of five chapters including:

Chapter 1: This chapter discusses the general background to the field of study following by discussion on problem statement and research objectives to solve the research problem. Finally, the scope of the study is discussed.

Chapter 2: This chapter reviews the existing literature on the field of study. FFT architecture together with FFT algorithm will be discussed in this chapter.

Chapter 3: the used methodology to address the research objectives of the study will be discussed in this chapter. It will begin with designing the FFT processor. Then, the FFT algorithms will be validated and implemented on the processor for various N-point values. Finally, the FFT output values are verified for deciding whether the system is working properly or not.

Chapter 4: discussion of the results and findings of the study will be presented in this chapter.

Chapter5: conclusion on the findings of the study together with some suggestion for future works will be discussed in this chapter.

#### REFERENCES

- Andraka, R. (1998). A survey of CORDIC algorithms for FPGA based computers. Paper presented at the Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays.
- Baas, B. M. (1999). A low-power, high-performance, 1024-point FFT processor. Solid-State Circuits, IEEE Journal of, 34(3), 380-387.
- Bergland, G. (1969). A radix-eight fast Fourier transform subroutine for realvalued series. *IEEE Transactions on audio and electroacoustics*, *17*(2), 138-144.
- Chang, Y.-N., & Parhi, K. K. (1999). *Efficient FFT implementation using digit*serial arithmetic. Paper presented at the Signal Processing Systems, 1999. SiPS 99. 1999 IEEE Workshop on.
- Chao, C., Qin, Z., Yingke, X., & Chengde, H. (2005). *Design of a high performance FFT processor based on FPGA*. Paper presented at the Proceedings of the 2005 Asia and South Pacific Design Automation Conference.
- Chen, J., Hu, J., Lee, S., & Sobelman, G. E. (2015). Hardware efficient mixed radix-25/16/9 FFT for LTE systems. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 23(2), 221-229.
- Chen, K., & Li, Y. (2008). A multi-radix FFT processor using pipeline in memory-based architecture (PIMA) for DVB-T/H systems. Paper presented at the Mixed Design of Integrated Circuits and Systems, 2008. MIXDES 2008. 15th International Conference on.
- Chen, W.-H., Smith, C., & Fralick, S. (1977). A fast computational algorithm for the discrete cosine transform. *IEEE Transactions on communications*, *25*(9), 1004-1009.
- Cheng, L. S., Miri, A., & Yeap, T. H. (2005). *Efficient FPGA implementation of FFT based multipliers.* Paper presented at the Electrical and Computer Engineering, 2005. Canadian Conference on.
- Chiueh, T.-D., & Tsai, P.-Y. (2008). OFDM baseband receiver design for wireless communications: John Wiley & Sons.
- Cochran, W. T., Cooley, J. W., Favin, D. L., Helms, H. D., Kaenel, R. A., Lang, W. W., . . . Welch, P. D. (1967). What is the fast Fourier transform? *Proceedings of the IEEE, 55*(10), 1664-1674.

- Cooley, J. W., & Tukey, J. W. (1965). An algorithm for the machine calculation of complex Fourier series. *Mathematics of computation*, *19*(90), 297-301.
- Daniels, R. C., & Heath Jr, R. W. (2007). 60 GHz wireless communications: emerging requirements and design recommendations. *Vehicular Technology Magazine, IEEE, 2*(3), 41-50.
- Despain, A. M. (1979). Very fast Fourier transform algorithms hardware for implementation. *IEEE Transactions on Computers*(5), 333-341.
- Duan, B., Wang, W., Li, X., Zhang, C., Zhang, P., & Sun, N. (2011). Floatingpoint mixed-radix FFT core generation for FPGA and comparison with GPU and CPU. Paper presented at the Field-Programmable Technology (FPT), 2011 International Conference on.
- Duhamel, P. (1986). Implementation of split-radix FFT algorithms for complex, real, and real-symmetric data. *Acoustics, Speech and Signal Processing, IEEE Transactions on, 34*(2), 285-295.
- Duhamel, P., & Hollmann, H. (1984). Split radix'FFT algorithm. *Electronics letters*, *20*(1), 14-16.
- Duhamel, P., & Vetterli, M. (1990). Fast Fourier transforms: a tutorial review and a state of the art. *Signal processing*, *19*(4), 259-299.
- Estimators, P. E. P. and Power Analyzer. URL: <u>https://www</u>. altera. com/support/support-resources/operation-and-testing/power/powpowerplay. html (visited on 04/09/2016).
- Ferreira, M. L., Barahimi, A., & Ferreira, J. C. (2016). Reconfigurable FPGAbased FFT processor for cognitive radio applications. Paper presented at the International Symposium on Applied Reconfigurable Computing.
- Flynn, M. J., & Luk, W. (2011). *Computer System Design: System-on-Chip*: John Wiley & Sons.
- Frigo, M. (1999). *A fast Fourier transform compiler.* Paper presented at the Acm sigplan notices.
- Frigo, M., & Johnson, S. G. (1998). FFTW: An adaptive software architecture for the FFT. Paper presented at the Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on.
- Garrido, M., Grajal, J., Sanchez, M., & Gustafsson, O. (2013). Pipelined radix-\$2^{k} \$ feedforward FFT architectures. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21*(1), 23-32.

- Garrido, M., Sanchez, M. A., Lopez-Vallejo, M. L., & Grajal, J. (2017). A 4096-Point Radix-4 Memory-Based FFT Using DSP Slices. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25*(1), 375-379. doi:10.1109/tvlsi.2016.2567784
- Goedecker, S. (1997). Fast Radix 2, 3, 4, and 5 Kernels for Fast Fourier Transformations on Computers with Overlapping Multiply--Add Instructions. *SIAM Journal on Scientific Computing, 18*(6), 1605-1611.
- Groginsky, H. L., & Works, G. A. (1970). A pipeline fast Fourier transform. *IEEE Transactions on Computers, 100*(11), 1015-1019.
- Guide, F. M. F. U. (2001). Altera Corporation. San Jose, CA.
- He, J., Ma, L., & Xu, X. (2010). *Design of a length-variable FFT processor.* Paper presented at the Multimedia Technology (ICMT), 2010 International Conference on.
- He, S., & Torkelson, M. (1998a). *Design and implementation of a 1024-point pipeline FFT processor*. Paper presented at the Custom Integrated Circuits Conference, 1998. Proceedings of the IEEE 1998.
- He, S., & Torkelson, M. (1998b). *Designing pipeline FFT processor for OFDM* (*de*) modulation. Paper presented at the Signals, Systems, and Electronics, 1998. ISSSE 98. 1998 URSI International Symposium on.
- He, S., & Torkelsson, M. (2000). Real-time pipeline fast fourier transform processors. In: Google Patents.
- Hemmert, K. S., & Underwood, K. D. (2005). An analysis of the doubleprecision floating-point FFT on FPGAs. Paper presented at the Field-Programmable Custom Computing Machines, 2005. FCCM 2005. 13th Annual IEEE Symposium on.
- Heo, K. L., Baek, J. H., Sunwoo, M. H., Jo, B. G., & Son, B. S. (2003). New in-place strategy for a mixed-radix FFT processor. Paper presented at the SOC Conference, 2003. Proceedings. IEEE International [Systems-on-Chip].
- Ibrahim, M., Kamal, M., Khan, O., & Ullah, K. (2016). *Analysis of radix-2 decimation in time algorithm for FPGA co-processors.* Paper presented at the Computing, Electronic and Electrical Engineering (ICE Cube), 2016 International Conference on.
- Jain, A. K. (1989). *Fundamentals of digital image processing*: Prentice-Hall, Inc.
- JAMES, W., COOLEY, P. A. L., & Peter, D. W. (1967). Historical notes on the fast Fourier transform. *Proceedings of the IEEE, 55*(10), 1675.

- Jo, B. G., & Sunwoo, M. H. (2005). New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy. *Circuits and Systems I: Regular Papers, IEEE Transactions on, 52*(5), 911-919.
- Johnson, L. (1992). Conflict free memory addressing for dedicated FFT hardware. *Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, 39*(5), 312-316.
- Kim, J., Lee, J., & Cho, K. (2016). Design of 256-point FFT processor for 100 Gb/s coherent optical OFDM system. Paper presented at the Consumer Electronics (ISCE), 2016 IEEE International Symposium on.
- Konguvel, E., & Kannan, M. (2018). A Survey on FFT/IFFT Processors for Next Generation Telecommunication Systems. *Journal of Circuits, Systems and Computers, 27*(03), 1830001. doi:10.1142/s0218126618300015
- Krishna, E. H., Sivani, K., & Reddy, K. A. (2015). *Hardware implementation of OFDM transceiver using FPGA*. Paper presented at the Computer and Computational Sciences (ICCCS), 2015 International Conference on.
- Lee, H.-Y., & Park, I.-C. (2007). Balanced binary-tree decomposition for area-efficient pipelined FFT processing. *Circuits and Systems I: Regular Papers, IEEE Transactions on, 54*(4), 889-900.
- Lin, C.-T., Yu, Y.-C., & Van, L.-D. (2008). Cost-effective triple-mode reconfigurable pipeline FFT/IFFT/2-D DCT processor. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 16(8), 1058-1071.
- Lu, Q., Wang, X. a., & Niu, J. (2009). *A low-power variable-length FFT processor base on Radix-2 4 algorithm.* Paper presented at the Microelectronics & Electronics, 2009. PrimeAsia 2009. Asia Pacific Conference on Postgraduate Research in.
- Mangaiyarkarasi, V., & Paul, C. K. C. (2014). *Performance analysis between Radix2, Radix4, Mixed Radix4-2 and Mixed Radix8-2 FFT.* Paper presented at the Current Trends in Engineering and Technology (ICCTET), 2014 2nd International Conference on.
- Mookherjee, S., DeBrunner, L., & DeBrunner, V. A High Throughput and Low Power Radix-4 FFT Architecture.
- Nag, S. K., & Verma, H. K. (2000). Method for parallel-efficient configuring an FPGA for large FFTS and other vector rotation computations. In: Google Patents.

- O'Brien, J., Mather, J., & Holland, B. (1989). *A 200 MIPS single-chip 1 k FFT processor.* Paper presented at the Solid-State Circuits Conference, 1989. Digest of Technical Papers. 36th ISSCC., 1989 IEEE International.
- Oppenheim, A. V., Schafer, R. W., & Buck, J. R. (1989). *Discrete-time signal* processing (Vol. 2): Prentice-hall Englewood Cliffs.
- Oruklu, E., Xiao, X., & Saniie, J. (2012). Reduced memory and low power architectures for CORDIC-based FFT processors. *Journal of Signal Processing Systems, 66*(2), 129-134.
- Polat, G., Ozturk, S., & Yakut, M. (2015). Design and Implementation of 256-Point Radix-4 100 Gbit/s FFT Algorithm into FPGA for High-Speed Applications. *ETRI Journal*, *37*(4), 667-676.
- Porcello, J. C. (2013). Designing and implementing Multibeam Smart Antennas for high bandwidth UAV communications using FPGAs. Paper presented at the Aerospace Conference, 2013 IEEE.
- Qian, Z., & Margala, M. (2016). Low-power split-radix FFT processors using radix-2 butterfly units. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24*(9), 3008-3012.
- Qian, Z., Nasiri, N., Segal, O., & Margala, M. (2014). *FPGA implementation of low-power split-radix FFT processors.* Paper presented at the Field Programmable Logic and Applications (FPL), 2014 24th International Conference on.
- Saenz, S. J., Cisneros, S. O., & Dominguez, J. R. (2015). FPGA design and implementation of radix-2 fast fourier transform algorithm with 16 and 32 points. Paper presented at the Power, Electronics and Computing (ROPEC), 2015 IEEE International Autumn Meeting on.
- Saleh, H. H., Mohammad, B. S., & Maalouf, M. (2013). A high-throughput, contention-free low-power, Radix-2 1k, 2k, 4k and 8k-point fast fourier transform engine using 28nm standard-cell process. Paper presented at the Design & Technology of Integrated Systems in Nanoscale Era (DTIS), 2013 8th International Conference on.
- Sanchez, M., Garrido, M., Vallejo, M. L., & Lopez-Barrio, C. (2006). *Automated design space exploration of FPGA-based FFT architectures based on area and power estimation.* Paper presented at the Field Programmable Technology, 2006. FPT 2006. IEEE International Conference on.
- Sarada, V., & Vigneswaran, T. (2013). Reconfigurable FFT processor–A broader perspective survey. *International Journal of Engineering and Technology*, *5*(2), 949-956.

- Shang, L., Kaviani, A. S., & Bathala, K. (2002). Dynamic power consumption in Virtex<sup>™</sup>-II FPGA family. Paper presented at the Proceedings of the 2002 ACM/SIGDA tenth international symposium on Fieldprogrammable gate arrays.
- Shirazi, N., Athanas, P. M., & Abbott, A. L. (1995). *Implementation of a 2-D fast Fourier transform on an FPGA-based custom computing machine.* Paper presented at the International Workshop on Field Programmable Logic and Applications.
- Shirazi, N., Walters, A., & Athanas, P. (1995). Quantitative analysis of floating point arithmetic on FPGA based custom computing machines. Paper presented at the FPGAs for Custom Computing Machines, 1995. Proceedings. IEEE Symposium on.
- Singleton, R. (1969). An algorithm for computing the mixed radix fast Fourier transform. *IEEE Transactions on audio and electroacoustics*, *17*(2), 93-103.
- Smith, S. W. (1997). The scientist and engineer's guide to digital signal processing.
- Sukhsawas, S., & Benkrid, K. (2004). A high-level implementation of a high performance pipeline FFT on Virtex-E FPGAs. Paper presented at the VLSI, 2004. Proceedings. IEEE Computer society Annual Symposium on.
- Sulaiman, N., & Arslan, T. (2005). A multi-objective genetic algorithm for onchip real-time optimisation of word length and power consumption in a pipelined FFT processor targeting a MC-CDMA receiver. Paper presented at the Evolvable Hardware, 2005. Proceedings. 2005 NASA/DoD Conference on.
- Sun, M., Tian, L., & Dai, D. (2012). *Radix-8 FFT processor design based on FPGA*. Paper presented at the Image and Signal Processing (CISP), 2012 5th International Congress on.
- Supriya, S., Rajeev, R., & Pawankumar, B. (2017). *Design and implementation of 4096 point FFT for satellite communication.* Paper presented at the Innovative Mechanisms for Industry Applications (ICIMIA), 2017 International Conference on.
- Takahashi, D. (2001). An extended split-radix FFT algorithm. *IEEE Signal Processing Letters, 8*(5), 145-147.
- Takeda, M., Ina, H., & Kobayashi, S. (1982). Fourier-transform method of fringe-pattern analysis for computer-based topography and interferometry. *JosA*, *72*(1), 156-160.

- Todman, T. J., Constantinides, G. A., Wilton, S. J., Mencer, O., Luk, W., & Cheung, P. Y. (2005). Reconfigurable computing: architectures and design methods. *IEE Proceedings-Computers and Digital Techniques*, 152(2), 193-207.
- Uzun, I. S., Amira, A., & Bouridane, A. (2005). *FPGA implementations of fast Fourier transforms for real-time signal and image processing.* Paper presented at the Vision, Image and Signal Processing, IEE Proceedings-.
- Voigt, S.-O., & Teufel, T. (2007). Dynamically Reconfigurable Dataflow Architecture for High-Performance Digital Signal Processing on Multi-FPGA Platforms. Paper presented at the Field Programmable Logic and Applications, 2007. FPL 2007. International Conference on.
- Wang, C.-C., Huang, J.-M., & Cheng, H.-C. (2005). A 2K/8K mode smallarea FFT processor for OFDM demodulation of DVB-T receivers. *IEEE Transactions on Consumer Electronics*, 51(1), 28-32.
- Wang, C., Yan, Y., & Fu, X. A High-Throughput Low-Complexity Radix-2'-2<sup>2</sup>-2<sup>3</sup> FFT/IFFT Processor With Parallel and Normal Input/Output Order for IEEE 802.11 ad Systems.
- Wey, C.-L., Tang, W.-C., & Lin, S.-Y. (2007). *Efficient VLSI implementation of memory-based FFT processors for DVB-T applications.* Paper presented at the VLSI, 2007. ISVLSI'07. IEEE Computer Society Annual Symposium on.
- White, S. A. (1990). Programmable windowing FFT device with reduced memory requirements. In: Google Patents.
- Widhe, T., Melander, J., & Wanhammar, L. (1997). Design of efficient radix-8 butterfly PEs for VLSI. Paper presented at the Circuits and Systems, 1997. ISCAS'97., Proceedings of 1997 IEEE International Symposium on.
- Wolf, W. (2004). FPGA-based system design: Pearson education.
- Xia, K.-F., Wu, B., Xiong, T., & Ye, T.-C. (2017). A Memory-Based FFT Processor Design With Generalized Efficient Conflict-Free Address Schemes. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 25(6), 1919-1929. doi:10.1109/tvlsi.2017.2666820
- Xia, K., Wu, B., Zhou, X., & Xiong, T. (2016). A generalized conflict-free address scheme for arbitrary 2k-point memory-based FFT processors. Paper presented at the Circuits and Systems (ISCAS), 2016 IEEE International Symposium on.

Xiao, X., Oruklu, E., & Saniie, J. (2009). *Fast memory addressing scheme for radix-4 FFT implementation.* Paper presented at the Electro/Information Technology, 2009. eit'09. IEEE International Conference on.

