

## **UNIVERSITI PUTRA MALAYSIA**

DESIGN AND IMPLEMENTATION OF REAL DATA FAST FOURIER TRANSFORM PROCESSOR ON FIELD PROGRAMMABLE GATES ARRAY

MOHAMMED KASSIM AHMED

FK 2015 39



## DESIGN AND IMPLEMENTATION OF REAL DATA FAST FOURIER TRANSFORM PROCESSOR ON FIELD PROGRAMMABLE GATES ARRAY



Thesis Submitted to the School of Graduate Studies, Universiti Putra Malaysia, in Fulfilment of the Requirements for the Degree of Master of Science

July 2015

All material contained within the thesis, including without limitation text, logos, icons, photographs and all other artwork, is copyright material of Universiti Putra Malaysia unless otherwise stated. Use may be made of any material contained within the thesis for non-commercial purposes from the copyright holder. Commercial use of material may only be made with the express, prior, written permission of Universiti Putra Malaysia.

Copyright © Universiti Putra Malaysia



## DEDICATIONS

To my parents,

The source of kindliness and love,

I wish I can be your pride.

To my brothers and sisters,

My continuous support and joy.

To my fiancée,

C.

Whose love is a source of inspiration and encouragement.

Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfilment of the requirement for the Degree of Master of Science

## DESIGN AND IMPLEMENTATION OF REAL DATA FAST FOURIER TRANSFORM PROCESSOR ON FIELD PROGRAMMABLE GATES ARRAY

By

#### MOHAMMED KASSIM AHMED

#### July 2015

# Chairman : Nasri bin Sulaiman, PhDFaculty : Engineering

The Fast Fourier Transform (FFT) is an efficient method to achieve the Discrete Fourier Transform (DFT) with less number of operations. FFT utilizes the symmetry property in the DFT to calculate the output. For input length of N=2m, FFT requires only  $Nlog_2N$  multiplications instead of the N<sup>2</sup> multiplications needed for the DFT.

Although FFT is mostly used to transform complex values into frequency domain, but still there are cases where the FFT inputs are only real values. In such cases, the transform will unnecessarily use higher power and area to calculate the output.

In order to calculate the DFT efficiently, various FFT algorithms and architectures have been presented. So far, different approaches of achieving FFT for real-valued inputs has been discussed and implemented. However, to the best of author's knowledge, no work has been found so far making direct experimental comparison between these approaches. Therefore, and for accurate comparison, this study aims to investigate the most common algorithms of RFFT on the same device and resources used. In this work, the memorybased FFT processor, based on radix-4 FFT algorithm is implemented on Cyclone II Field Programmable Gates Array (FPGA). In order to program the FPGA, Verilog Hardware Description Language (Verilog HDL) is used. The radix-4 FFT algorithm will be implemented for 64 and 256 points using both 8-bit and 16-bit input width. The results are compared and analyzed in terms of output accuracy and power consumption.

Two main RFFT approaches are discussed in this work. The Butterfly approach is mainly based on cancelling the unnecessary operations while maintaining the same FFT size. The Formula approach, on the other hand, uses a half size complex FFT to perform the full size RFFT. The results show that for 64-points, the Butterfly approach requires 82.95% of the power required by the complex FFT. Similarly, the Formula approach required 83.63% to perform the FFT which is nearly equal to the power required by the butterfly approach. For 256-points, in contrast, the results show that the Formula approach requires only 72.1% of the power required by the complex FFT. This is notably

fewer than the 76.62% required by the Butterfly approach. Additionally, the results show that the 8-bit input data width saves a lot of power with a slightly higher error rate compared to the 16-bit width. Therefore, unless high output accuracy is essential, the 8-bit can be efficiently adopted for FFT input.



Abstrak tesis yang dikemukakan kepada Senat Universiti Putra Malaysia sebagai memenuhi keperluan untuk Ijazah Master Sains

## REKA BENTUK DAN IMPLEMENTASI JELMAAN FOURIER PANTAS DATA NYATA ATAS TATASUSUNAN LOGIK BOLEH ATURCARA MEDAN

Oleh

## MOHAMMED KASSIM AHMED

#### Julai 2015

Pengerusi : Nasri bin Sulaiman, PhD Fakulti : Kejuruteraan

Jelmaaan Fourier pantas (FFT) adalah satu kaedah yang cekap untuk mengira jelmaan Fourier diskret (DFT) dengan jumlah operasi yang lebih kecil. FFT menggunakan ciri kesimetrian dalam FFT untuk mengira keluaran. Bagi masukan dengan panjang N = 2m, FFT hanya memerlukan  $Nlog_2N$  pendaraban-pendaraban manakala DFT memerlukan  $N^2$  pendaraban-pendaraban.

Walaupun sebahagian besar FFT diperlukan untuk menjelmakan nilai-nilai kompleks ke dalam domain frekuensi, tetapi masih terdapat kes-kes di mana masukan FFT adalah nilai-nilai nyata sahaja. Dalam kes-kes tersebut, penjelmaan berkenaan akan menggunakan lebih kuasa yang tidak perlu untuk mengira keluaran.

Dalam usaha untuk mengira DFT secara cekap, pelbagai algoritma dan senibina FFT telah dibentangkan. Setakat ini, pendekatan-pendekatan untuk mengira FFT bagi masukan-masukan nyata telah dibincangkan dan dilaksanakan. Walau bagaimapun, sepanjang pengetahuan terbaik penulis, tiada penyelidikan yang dijumpai setakat ini yang membuat pembandingan ujikaji secara langsung di antara pendekatan-pendekatan tersebut. Oleh yang demikian, untuk perbandingan yang tepat, kajian ini bertujuan untuk menyelidik algorithma-algorithma yang paling biasa bagi RFFT atas peranti dan sumber-sumber yang sama digunakan. Dalam kajian ini, pemproses FFT berasakan ingatan, berdasarkan radiks-4 dilaksanakan atas Cyclone II tatasusunan logic boleh aturcara medan (FPGA). Bagi tujuan memprogram FPGA, bahasa perihalan perkakasan verilog (verilog HDL) telah digunakan. Algorithma untuk radiks-4 FFT akan dilaksanakan bagi 64 dan 256 titik menggunakan kedua-dua masukan 8-bit dan 16-bit lebar. Keputusan-keputusan dianalisa dan dibanding dari segi ketepatan keluaran dan penggunaan kuasa.

 $\bigcirc$ 

Dua pendekatan RFFT yang utama dibincangkan dalam kajian ini. Pendekatan kupukupu adalah berdasarkan kepada keutamaan pembatalan operasi-operasi yang tidak perlu sementara mengekalkan saiz FFT. Pendekatan formula, menggunakan separuh saiz FFT kompleks untuk melaksanakan RFFT saiz penuh. Hasil kajian menunjukkan bahawa 64titik FFT, pendekatan kupu-kupu memerlukan 82.95% daripada kuasa yang diperlukan oleh FFT kompleks. Begitu juga, pendekatan formula memerlukan 83.63% untuk melaksanakan FFT, di mana ianya hampir sama dengan kuasa yang diperlukan oleh pendekatan kupu-kupu. Bagi 256-titik, keputusan-keputusan menunjukkan sebaliknya bahawa pendekatan formula hanya memerlukan 72.1% daripada kuasa yang diperlukan oleh FFT kompleks. Ianya jelas kurang daripada 76.62% yang diperlukan oleh pendekatan kupu-kupu. Tambahan pula, keputusan-keputusan menunjukkan masukan 8-bit lebar menjimatkan banyak kuasa dengan kadar ralat yang lebih tinggi sedikit berbanding dengan masukan 16-bit lebar. Oleh yang demikian, jika bukan ketepatan keluaran adalah lebih utama, masukan 8-bit lebar boleh diterima pakai untuk masukan FFT yang berkesan.



### ACKNOWLEDGEMENTS

I would like to show my gratefulness to my supervisor, Dr. Nasri bin sulaiman, for his support, patience, and encouragement through my studying period. It is not frequent that one gets a supervisor that always has the time for listening to the tiny issues and obstructions that unavoidably show up in the course of performing research. His technical and editorial advices were essential to the completion of this dissertation and he has taught me innumerable lessons and insights on the workings of academic research in general.

I also like to extend my thanks to my co-supervisor, Dr. Nurul Amziah Bt. Md. Yunus for her widely help, support, and precious comments during my research time.

My thanks also go to my friends and colleagues for their encouragement and motivation.

Finally, I would like to thank my family, especially my parents who supported me allover my life. Without their prayers and support, I would never be able to do what I've done. This thesis was submitted to the Senate of Universiti Putra Malaysia and has been accepted as fulfilment of the requirement for the degree of Master of Science. The members of the Supervisory Committee were as follows:

Nasri b. Sulaiman, PhD Senior Lecturer Faculty of Engineering Universiti Putra Malaysia (Chairman)

## Nurul Amziah Bt. Md. Yunus, PhD

Senior Lecturer Faculty of Engineering Universiti Putra Malaysia (Member)

## **BUJANG BIN KIM HUAT, PhD** Professor and Dean School of Graduate Studies Universiti Putra Malaysia

Date:

## TABLE OF CONTENTS

|                                                                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | Page                                                                                                     |
|----------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| ABSTRAG<br>ABSTRAK<br>ACKNOW<br>APPROV<br>DECLAR<br>LIST OF I<br>LIST OF A | CT<br>VLEDGEMENTS<br>AL<br>ATION<br>FIGURES<br>ABBREVIATIONS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | i<br>iii<br>v<br>vi<br>viii<br>xiii<br>xiii                                                              |
| CHAPTE                                                                     | R                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                          |
| 1 IN<br>1.1<br>1.2<br>1.3<br>1.4<br>1.5                                    | TRODUCTION<br>Problem Background<br>Problems Statement<br>Objectives of the Research<br>Research Scope<br>Organization of the Thesis                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 1<br>1<br>1<br>2<br>2<br>2                                                                               |
| 2 LI <sup>*</sup><br>2.1<br>2.2<br>2.3<br>2.4<br>2.5                       | <b>TERATURE REVIEW</b> IntroductionFFT Algorithms2.2.1The Radix-2 FFT Algorithm2.2.2The Radix-4 FFT Algorithm2.2.3RFFT AlgorithmsFFT Architecture2.3.1Memory Based Architecture2.3.2Pipelined Architecture2.3.3Array ArchitectureField Programmable Gates Array (FPGA)Power Consumption Significance                                                                                                                                                                                                                                                                                                                                                                     | 4<br>4<br>5<br>6<br>8<br>12<br>12<br>13<br>14<br>14<br>15                                                |
| <b>3</b> MI<br>3.1<br>3.2<br>3.3<br>3.4                                    | <ul> <li>ETHODOLOGY</li> <li>Introduction</li> <li>FFT Design</li> <li>3.2.1 Choosing a suitable Radix algorithm</li> <li>3.2.2 Choosing the FFT architecture</li> <li>3.2.3 Choosing suitable Altera FPGA resources to be used</li> <li>3.2.4 Choosing Input and output word length</li> <li>3.2.5 Designing the FFT algorithm for complex data</li> <li>3.2.6 Designing the FFT algorithm for real data</li> <li>Algorithms' Validation</li> <li>3.3.1 The Complex FFT Validation</li> <li>3.3.2 The Butterfly Approach Validation</li> <li>3.3.3 The Formula Approach Validation</li> <li>3.4.1 The Butterfly Approach</li> <li>3.4.2 The Formula Approach</li> </ul> | 16<br>16<br>17<br>17<br>17<br>18<br>42<br>21<br>21<br>23<br>23<br>23<br>26<br>28<br>30<br>30<br>30<br>32 |

|                      | 3.5    | Coding and System Implementation       | 33 |  |
|----------------------|--------|----------------------------------------|----|--|
|                      | 3.6    | Measuring the Outputs                  | 35 |  |
|                      |        | 3.6.1 Power Consumption                | 35 |  |
|                      |        | 3.6.2 Output Accuracy                  | 35 |  |
| 4                    | RESU   | ULTS AND DISCUSSION                    | 37 |  |
|                      | 4.1    | Introduction                           | 37 |  |
|                      | 4.2    | Accuracy – Power Consumption Trade-off | 37 |  |
|                      |        | 4.2.1 The 64-Point FFT                 | 37 |  |
|                      |        | 4.2.2 The 256-Point FFT                | 41 |  |
|                      |        | 4.2.3 Discussion                       | 46 |  |
|                      | 4.3    | Measuring the Outputs                  | 47 |  |
|                      |        | 4.3.1 The 64-Point FFT                 | 47 |  |
|                      |        | 4.3.2 The 256-Point FFT                | 47 |  |
|                      | 4.4    | Measuring the Outputs                  | 48 |  |
|                      |        | 4.4.1 The 64-Point FFT                 | 48 |  |
|                      |        | 4.4.2 The 256-Point FFT                | 49 |  |
|                      |        | 4.4.3 Discussion                       | 50 |  |
| 5                    | SUM    | MARY, CONCLUSION AND RECOMMENDATIONS   | 51 |  |
|                      | FOR    | FUTURE STUDIES                         |    |  |
|                      | 5.1    | Conclusions                            | 51 |  |
|                      | 5.2    | Future Work                            | 51 |  |
|                      |        |                                        |    |  |
| REFI                 | ERENC  | ES                                     | 52 |  |
| APPE                 | ENDIC  | ES                                     | 57 |  |
| BIODATA OF STUDENT 5 |        |                                        |    |  |
| PUBI                 | LICATI | ION                                    | 60 |  |
|                      |        |                                        |    |  |
|                      |        |                                        |    |  |

C

## LIST OF FIGURES

| Table |                                                                         | Page |
|-------|-------------------------------------------------------------------------|------|
| 2.1   | Radix-2 DIF butterfly unit                                              | 5    |
| 2.2   | Flow graph of 8-point Radix-2 FFT                                       | 6    |
| 2.3   | Radix-2 DIF butterfly unit                                              | 7    |
| 2.4   | Flow graph of 16-point Radix-4 FFT                                      | 8    |
| 2.5   | Emergence of imaginary data in DIF FFT                                  | 9    |
| 2.6   | Flow graph of the Formula approach of RFFT                              | 10   |
| 2.7   | Single Memory FFT Architecture                                          | 12   |
| 2.8   | Dual Memory FFT Architecture                                            | 13   |
| 2.9   | Cache Memory FFT Architecture                                           | 13   |
| 2.10  | Pipelined FFT Architecture                                              | 13   |
| 2.11  | Array FFT Architecture                                                  | 14   |
| 3.1   | Methodology Phases                                                      | 16   |
| 3.2   | Multiplier Implementation Results                                       | 18   |
| 3.3   | Memory Implementation Results                                           | 19   |
| 3.4   | FFT linearity validation results                                        | 20   |
| 3.5   | FFT Formula approach stages                                             | 22   |
| 3.6   | Combining two N/4 FFT to obtain one N/2-point FFT                       | 22   |
| 3.7   | Output of Radix-4 DIF 64-point FFT validation                           | 24   |
| 3.8   | Output of Radix-4 DIF 256-point FFT validation                          | 26   |
| 3.9   | Output of Butterfly approach algorithm validation                       | 28   |
| 3.10  | Output of Formula approach algorithm validation                         | 30   |
| 3.11  | FFT System Illustration                                                 | 34   |
| 3.12  | FSM Transitions                                                         | 34   |
| 4.1   | Input data of 64-point complex FFT                                      | 37   |
| 4.2   | Output timing diagrams for 64-point complex FFT                         | 38   |
| 4.3   | Output data of 64-point complex FFT                                     | 38   |
| 4.4   | Output error percentage of 64-point complex FFT                         | 39   |
| 4.5   | Accuracy for 64-point complex FFT                                       | 40   |
| 4.6   | Power consumption comparison for 8-bit and 16-bit, 64-point complex FFT | 41   |
| 4.7   | Input data of 256-point complex FFT                                     | 42   |
| 4.8   | Output timing diagrams for 256-point complex FFT                        | 42   |
| 4.9   | Output data of 256-point complex FFT                                    | 43   |
| 4.10  | Output error percentage of 256-point complex FFT                        | 66   |
| 4.11  | Accuracy for 256-point complex FFT                                      | 45   |
| 4.12  | Power consumption comparison for 8-bit and 16-bit, 256-point            | 46   |
|       | complex FFT                                                             |      |
| 4.13  | Output timing diagrams for 64-point RFFT                                | 47   |
| 4.14  | Output timing diagrams for 256-point RFFT                               | 47   |
| 4.15  | Power consumption of 64-point RFFT                                      | 48   |
| 4.16  | Power consumption of 256-point RFFT                                     | 49   |

6

## LIST OF ABBREVIATIONS

| ASIC<br>CPU<br>DET | Application-Specific Integrated Circuit<br>Central Processing Unit |
|--------------------|--------------------------------------------------------------------|
| DFI<br>DIF         | Discrete Fourier Transform<br>Decimation In Frequency              |
| DIT                | Decimation In Time                                                 |
| DSP                | Digital Signal Processing                                          |
| FFT                | Fast Fourier Transform                                             |
| FPGA               | Field Programmable Gates Array                                     |
| FSM                | Finite State Machine                                               |
| RFFT               | Real Fast Fourier Transform                                        |



 $\mathbf{G}$ 

## **CHAPTER 1**

## INTRODUCTION

## 1.1 Background

Nowadays, one of the most significant operations in digital signal processing (DSP) is the Discrete Fourier transform (DFT). Due to the effectiveness of the convolutional property, the DFT is frequently used in linear filtering which exists in a wide range of applications, for instance, quantum mechanics (Walker, 1996), noise reduction (Krasny and Oraintara, 1998), and image reconstruction (Jain, 1989). However, even the finite length DFT signal requires a relatively unaffordable computations in order to be completed. Particularly, for an input signal of length N, its DFT direct calculation will need N<sup>2</sup> complex multiplications. However, Cooley and Tukey proposed the fast Fourier transform (FFT), which will calculate the DFT effectively with only N\*log<sub>2</sub>N operations (Cooley and Tukey, 1965). It is one of the widely used algorithms in digital signal processing (He and Torkelson, 1998; Oppenheim et al., 1989). Since the FFT reemergence in 1965, a lot of researches and developments have been made to extend and enhance Cooley and Tukey's original contribution.

Along with that, there has been an increasing interest in the computation of FFT for real valued signals (RFFT). Since virtually most of the physical signals are real. The real valued signals are of major significance in real-time signal processing. It plays an important role in different fields such as communication systems, biomedical applications, sensors and radar signal processing. Having the conjugate symmetry will cause a large number of unneeded operations. This property could be used to reduce the dissipated power and the system.

On the other hand, due to the rapid growth of FPGA technology, it is widely used in the DSP field nowadays. It has a flexible architecture that can be reconfigured according to user's project requirements to proficiently calculate definite algorithms. Its high capacity and performance along with the low cost makes it the perfect choice for FFT processors.

## **1.2 Problem Statement**

In order to implement an efficient FFT processor on FPGA, the design must be simple yet satisfactory. Simpler systems will obviously utilize less hardware resources and consume less power. To achieve a simple and sufficient design, both FFT algorithm and architecture must be designed competently.

To improve the FFT algorithm for real inputs, several searches have been done. The most recent ones where done by (Ayinala and Parhi, 2013; Ayinala et al., 2013; Glittas and Lakshminarayanan, 2014; Salehi et al., 2013; Zode et al., 2014).

So far, many approaches were presented by researchers to achieve the FFT of real data. Among these approaches, there are two main approaches that were recently used by researchers. The first approach is based on eliminating the redundant calculations and calculating half of the output values since the other half of values are the complex conjugates of the first half (Chu and George, 1999). The second approach relies on calculating an N-point real-valued FFT using N/2-point complex FFT, with only few extra operations to retain the N length output. Therefore, it is necessary to implement both approaches on the FPGA and compare them with the results of the complex FFT processor. These comparison results can be used to decide the suitable approach which offers the most efficient processor.

Similarly, improvements on the FFT architecture are also necessary for an efficient processor. Additionally, choosing a suitable word-length for both data and twiddle factors is one of the most important steps in the FFT processor design. A trade-off between the accuracy and power must be done and decided by the designer.

## **1.3** Objectives of the research

The aim of this research is to design an efficient FFT processor for real-valued inputs. To achieve this aim, the following steps are intended:

- 1- To assess the RFFT algorithms and choose the most powerful one by implementing the complex FFT and the real FFTs on FPGA.
- 2- To identify a suitable trade-off between input word-length, output accuracy, and power dissipation by designing FFT for 8-bit and 16-bit input word length.

## 1.4 Research scope

This research aims to design RFFT processors of the most commonly used RFFT algorithms. No new algorithms are proposed, but the current algorithms are compared and implemented on FPGA. The FFT is designed based on Radix 4 algorithm for both 64 and 256 points. Two systems were designed using 8-bit and 16-bit input word-length. While 16-bit word length is used for twiddle factors.

The processor speed is set to the maximum of the Altera DE2C70 FPGA frequency which is 50 MHz. No external clocks are used, so the FPGA PLLs are not utilized. The Accuracy of the system is measured from the hardware implementation for real results. While the power is measured using Altera PowerPlay power analyzer tool.

#### 1.5 **Research scope**

This thesis is basically divided into five chapters:

Chapter 1: This chapter involves general background to the field of the research. Then it explains the problem statement and the research objectives to solve the research problem. Finally, the scope of the work is discussed in this chapter.

Chapter 2: This chapter presents a literature review to the previous researches related to the field of the study. It will discuss the FFT algorithm and FFT architecture. The RFFT is also discussed in details with some information about FPGAs.

Chapter 3: This chapter discusses the methodology used to reach the research objectives. It starts with designing the FFT processor. After that, the FFT algorithms are being validated and implemented on the processor for different N-point values. Finally, the FFT output values are verified to decide if the system is working properly.

Chapter 4: This chapter will discuss the results and findings of the research work.

Chapter 5: This chapter will conclude the research work and some future works are suggested.



#### REFERENCES

- Ayinala, M.; Lao, Y.; Parhi, K. K. An In-Place FFT Architecture for Real-Valued Signals. IEEE Trans.on Circuits and Systems 2013, 60, 652-656.
- Ayinala, M.; Parhi, K. K. FFT Architectures for Real-Valued Signals Based on Radixand Radix-Algorithms. *Circuits and Systems I: Regular Papers, IEEE Transactions on* 2013, 60, 2422-2430.
- Ayinala, M.; Parhi, K. K. In *In Parallel-pipelined radix-2 2 FFT architecture for real valued signals;* Signals, Systems and Computers (ASILOMAR), 2010 Conference Record of the Forty Fourth Asilomar Conference on; IEEE: 2010; , pp 1274-1278.
- Baas, B. M. A low-power, high-performance, 1024-point FFT processor. *Solid-State Circuits, IEEE Journal of* **1999**, *34*, 380-387.
- Bergland, G. D. Numerical Analysis: A fast fourier transform algorithm for real-valued series. *Commun ACM* **1968**, *11*, 703-710.
- Bruun, G. z-transform DFT filters and FFT's. Acoustics, Speech and Signal Processing, IEEE Transactions on 1978, 26, 56-63.
- Chang, Y.; Parhi, K. K. In *In Efficient FFT implementation using digit-serial arithmetic;* Signal Processing Systems, 1999. SiPS 99. 1999 IEEE Workshop on; IEEE: 1999; , pp 645-653.
- Chao, C.; Qin, Z.; Yingke, X.; Chengde, H. In *In Design of a high performance FFT* processor based on FPGA; Proceedings of the 2005 Asia and South Pacific Design Automation Conference; ACM: 2005; , pp 920-923.
- Cheng, L. S.; Miri, A.; Yeap, T. H. In In Efficient FPGA implementation of FFT based multipliers; Electrical and Computer Engineering, 2005. Canadian Conference on; IEEE: 2005; , pp 1300-1303.
- Cheng, M.; Chen, L.; Hung, Y.; Yang, C. M. In *In A real-time maximum-likelihood heart-rate estimator for wearable textile sensors;* Engineering in Medicine and Biology Society, 2008. EMBS 2008. 30th Annual International Conference of the IEEE; IEEE: 2008; , pp 254-257.
- Chi, H.; Lai, Z. In In A cost-effective memory-based real-valued FFT and Hermitian symmetric IFFT processor for DMT-based wire-line transmission systems; Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on; IEEE: 2005a; , pp 6006-6009.
- Chi, H.; Lai, Z. In In A cost-effective memory-based real-valued FFT and Hermitian symmetric IFFT processor for DMT-based wire-line transmission systems;

Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on; IEEE: 2005b; , pp 6006-6009.

- Chiueh, T.; Tsai, P. OFDM Baseband Receiver Design for Wireless Communications; John Wiley & Sons: 2008; .
- Chu, E.; George, A. Inside the FFT black box: serial and parallel fast Fourier transform algorithms; CRC Press: 1999; .
- Cooley, J. W.; Lewis, P. A.; Welch, P. D. The fast Fourier transform and its applications. *Education, IEEE Transactions on* **1969**, *12*, 27-34.
- Cooley, J. W.; Tukey, J. W. An algorithm for the machine calculation of complex Fourier series. *Mathematics of computation* **1965**, *19*, 297-301.
- Duhamel, P. Implementation of "split-radix" FFT algorithms for complex, real, and real-symmetric data. Acoustics, Speech and Signal Processing, IEEE Transactions on **1986**, 34, 285-295.
- Duhamel, P.; Vetterli, M. Fast Fourier transforms: a tutorial review and a state of the art. *Signal Process* **1990**, *19*, 259-299.
- Frigo, M.; Johnson, S. G. In *In FFTW: An adaptive software architecture for the FFT;* Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on; IEEE: 1998; Vol. 3, pp 1381-1384.
- Garrido, M.; Parhi, K. K.; Grajal, J. A pipelined FFT architecture for real-valued signals. *Circuits and Systems I: Regular Papers, IEEE Transactions on* **2009**, *56*, 2634-2643.
- Glittas, A. X.; Lakshminarayanan, G. In *In Pipelined FFT architectures for real-time signal processing and wireless communication applications;* VLSI Design and Test, 18th International Symposium on; IEEE: 2014; , pp 1-2.
- He, S.; Torkelson, M. In *In Designing pipeline FFT processor for OFDM (de) modulation;* Signals, Systems, and Electronics, 1998. ISSSE 98. 1998 URSI International Symposium on; IEEE: 1998; , pp 257-262.
- Heideman, M.; Burrus, C.; Johnson, H. In *In Prime factor FFT algorithms for real-valued series;* Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'84. IEEE: 1984; Vol. 9, pp 492-495.
- Hemmert, K. S.; Underwood, K. D. In *In An analysis of the double-precision floatingpoint FFT on FPGAs;* Field-Programmable Custom Computing Machines, 2005. FCCM 2005. 13th Annual IEEE Symposium on; IEEE: 2005; , pp 171-180.
- Ibraheem, O. W.; Khamiss, N. N. In *In Design and simulation of asymmetric digital* subscriber line (ADSL) modem; Information and Communication Technologies:

From Theory to Applications, 2008. ICTTA 2008. 3rd International Conference on; IEEE: 2008; , pp 1-6.

- Jain, A. K. Fundamentals of digital image processing; prentice-Hall Englewood Cliffs: 1989; Vol. 3.
- JAMES, W.; COOLEY, P. A. L.; Peter, D. W. Historical notes on the fast Fourier transform. *Proc IEEE* **1967**, *55*, 1675.
- Johnson, L. Conflict free memory addressing for dedicated FFT hardware. *Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on* **1992**, 39, 312-316.
- Krasny, L.; Oraintara, S. Noise reduction for speech signals using voice activity detector. *Ericsson, Inc., Research Triangle Park, NC, EUS/TR/X-98* **1998**, *1662*.
- Lee, H.; Park, I. Balanced binary-tree decomposition for area-efficient pipelined FFT processing. *Circuits and Systems I: Regular Papers, IEEE Transactions on* **2007**, *54*, 889-900.
- Lin, C.; Yu, Y.; Van, L. Cost-effective triple-mode reconfigurable pipeline FFT/IFFT/2-D DCT processor. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 2008, 16, 1058-1071.
- Lin, Y.; Lee, C. Design of an FFT/IFFT processor for MIMO OFDM systems. *Circuits and Systems I: Regular Papers, IEEE Transactions on* **2007**, *54*, 807-815.
- Murakami, H. Real-valued decimation-in-time and decimation-in-frequency algorithms. *Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on* **1994**, *41*, 808-816.
- Netoff, T.; Park, Y.; Parhi, K. In In Seizure prediction using cost-sensitive support vector machine; Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE; IEEE: 2009; , pp 3322-3325.
- Oppenheim, A. V.; Schafer, R. W.; Buck, J. R. *Discrete-time signal processing;* Prentice-hall Englewood Cliffs: 1989; Vol. 2.
- Salehi, S. A.; Amirfattahi, R.; Parhi, K. K. Pipelined Architectures for Real-Valued FFT and Hermitian-Symmetric IFFT With Real Datapaths. *Circuits and Systems II: Express Briefs, IEEE Transactions on* **2013**, *60*, 507-511.
- Sanchez, M.; Garrido, M.; Vallejo, M. L.; Lopez-Barrio, C. In *In Automated design* space exploration of FPGA-based FFT architectures based on area and power estimation; Field Programmable Technology, 2006. FPT 2006. IEEE International Conference on; IEEE: 2006; , pp 127-134.

- Sanchez, M. A.; Garrido, M.; Lopez-Vallejo, M.; Grajal, J. Implementing FFT-based digital channelized receivers on FPGA platforms. *Aerospace and Electronic Systems, IEEE Transactions on* 2008, 44, 1567-1585.
- Sekhar, B. R.; Prabhu, K. Radix-2 decimation-in-frequency algorithm for the computation of the real-valued FFT. *Signal Processing, IEEE Transactions on* 1999, 47, 1181-1184.
- Sorensen, H. V.; Jones, D.; Burrus, C. In *In Real-valued algorithms for the FFT;* Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'87. IEEE: 1987a; Vol. 12, pp 1831-1834.
- Sorensen, H. V.; Jones, D. L.; Heideman, M.; Burrus, C. S. Real-valued fast Fourier transform algorithms. Acoustics, Speech and Signal Processing, IEEE Transactions on 1987b, 35, 849-863.
- Storn, R. Efficient input reordering for the DCT based on a real-valued decimation-intime FFT. *Signal Processing Letters, IEEE* **1996**, *3*, 242-244.
- Sukhsawas, S.; Benkrid, K. In *In A high-level implementation of a high performance pipeline FFT on Virtex-E FPGAs;* VLSI, 2004. Proceedings. IEEE Computer society Annual Symposium on; IEEE: 2004; , pp 229-232.
- Sulaiman, N.; Arslan, T. In In A multi-objective genetic algorithm for on-chip real-time optimisation of word length and power consumption in a pipelined FFT processor targeting a MC-CDMA receiver; Evolvable Hardware, 2005. Proceedings. 2005 NASA/DoD Conference on; IEEE: 2005; , pp 154-159.
- Todman, T. J.; Constantinides, G. A.; Wilton, S. J.; Mencer, O.; Luk, W.; Cheung, P.
   Y. Reconfigurable computing: architectures and design methods. *IEE Proceedings-Computers and Digital Techniques* 2005, 152, 193-207.
- Uniyal, P. Transforming real-valued sequences: fast Fourier versus fast Hartley transform algorithms. *IEEE transactions on signal processing* **1994**, *42*, 3249-3254.
- Uzun, I. S.; Amira, A.; Bouridane, A. In *In FPGA implementations of fast Fourier transforms for real-time signal and image processing;* Vision, Image and Signal Processing, IEE Proceedings-; IET: 2005; Vol. 152, pp 283-296.
- Voigt, S.; Baesler, M.; Teufel, T. Dynamically reconfigurable dataflow architecture for high-performance digital signal processing. J. Syst. Archit. 2010, 56, 561-576.
- Walker, J. S. Fast fourier transforms; CRC press: 1996; Vol. 24.
- Wang, A.; Chandrakasan, A. P. In *In Energy-aware architectures for a real-valued FFT implementation;* Proceedings of the 2003 international symposium on Low power electronics and design; ACM: 2003; , pp 360-365.

- Widhe, T.; Melander, J.; Wanhammar, L. In *In Design of efficient radix-8 butterfly PEs for VLSI;* Circuits and Systems, 1997. ISCAS'97., Proceedings of 1997 IEEE International Symposium on; IEEE: 1997; Vol. 3, pp 2084-2087.
- Ziegler, H. A fast Fourier transform algorithm for symmetric real-valued series. *Audio* and Electroacoustics, IEEE Transactions on **1972**, 20, 353-356.
- Zode, P.; Thor, A.; Deshmukh, A. In *In Folded FFT architecture for real-valued* signals based on Radix-2 3 algorithm; Devices, Circuits and Systems (ICDCS), 2014 2nd International Conference on; IEEE: 2014; , pp 1-4.

