1 Introduction

Recent studies have shown that most of the assigned radio spectrum is underutilized. On the other hand, the increasing number of wireless multimedia applications leads to increasing spectrum scarcity. Cognitive Radio [1, 2] is proposed as a promising technology to address the paradox of spectrum scarcity and spectrum under-utilization. In Cognitive Radio, spectrum sensing locates the unused spectrum segments in a targeted spectrum pool. These segments will be used optimally without harmful interference to licensed users. This technology is called spectrum pooling [3]. In spectrum pooling, orthogonal frequency-division multiplexing (OFDM) is proposed as the baseband transmission scheme. Those subcarries which cause interference to licensed users should be nullified. Therefore, OFDM based Cognitive Radio has to be reconfigurable to be adaptive to the changing wireless channels. This reconfigurability has to be supported by a reconfigurable platform. Our research undertaken in the Adaptive Ad-hoc Freeband (AAF) project (www.freeband.nl) focuses on mapping the baseband algorithms of Cognitive Radio onto a reconfigurable platform.

2 Multiprocessor system-on-chip for cognitive radio

Cognitive Radio is seen as an evolution from the software-defined radio platform [1]. However, the traditional software-defined radio platform for digital processing is mainly based on General Purpose Processors and Digital Signal Processors which are inadequate for future high data rate wireless communications in terms of processing speed and energy efficiency. With the advance of the semiconductor technology, the future trend of wireless baseband processors is moving toward Multiprocessor System-on-Chips (MPSoCs) which integrate heterogeneous processing elements tailored for different processing tasks. MPSoCs offer high performance, reconfigurability and energy efficiency. Therefore, we proposed a tiled MPSoC (see Fig. 1) architecture to support Cognitive Radio [4]. These tiles can be various processing elements including General Purpose Processors, Field Programmable Gate Arrays, Application Specific Integrated Circuits and Domain Specific Reconfigurable Hardware (DSRH) modules which target the specific algorithm domains. The Montium [5] tile processor developed at the University of Twente is an example of a DSRH. It targets the digital signal processing algorithm domain and has the flexibility to adapt to different algorithms in an energy-efficient manner. Therefore, the Montium tiled processor is the key element in our proposed reconfigurable platform for Cognitive Radio. The tiles in the SoC are interconnected by a Network-on-Chip (NoC). Both the SoC and NoC are dynamically reconfigurable, which means that the programs (running on the reconfigurable processing elements) as well as the communication links between the processing elements are configured at run-time.

Figure 1
figure 1

Heterogeneous multiprocessor tiled SoC

3 TTL design methodology

MPSoCs offer many advantages as described in the previous section. However, it is a challenging task to map applications onto MPSoCs. First, the applications to be mapped on the MPSoC become more complex: they consist of more and more tasks and some of the tasks may change their behavior dynamically. Second, in order to map tasks to different components on an MPSoC, designers have to deal with the low-level interfaces for the inter-component communication and synchronization which become a bottleneck from a performance and an energy point of view. Further, opportunities for the reuse of hardware and software modules are limited and no method exists for exploring their trade-offs. Therefore, there is a gap between the application models used for specification and the optimized implementation of the application on an MPSoC. A task transaction level (TTL) interface approach [6] was proposed to help to close the gap by raising the abstraction level. We propose to use the TTL approach both for developing the Cognitive Radio application at the system level and as a platform interface for implementing the application onto the MPSoC architecture.

In the TTL approach, an application is modelled as a task graph. A task is an entity that performs computations. One task may communicate with other tasks via channels. Communications are invoked by calling TTL interface functions. Computation components can be plugged-in and replaced as functions, which allows exploring and validating design alternatives at a high level of abstraction. The TTL model of the application can be implemented on the targeted platform providing that the TTL shells are available for each processor. A design example on an OFDM receiver for HiperLAN/2 was given in [7]. The TTL approach is extended for modelling reconfigurable applications such as adaptive baseband processing in Cognitive Radio.

4 Design case: a low complexity fast Fourier transform for OFDM based cognitive radio

In this paper, we present a design case of a reconfigurable low complexity fast Fourier transform (FFT) algorithm for OFDM based Cognitive Radio to demonstrate the TTL approach. The reasons to choose this design case are: 1) The FFT is an essential and the most computational intensive task in OFDM baseband processing; 2) the algorithm is novel in the context of Cognitive Radio; 3) the algorithm is reconfigurable. We organize this section as follows. First, OFDM based Cognitive Radio is introduced. Then our proposed low-complexity FFT is explained and followed by the TTL implementation. Finally the results will be analyzed.

4.1 OFDM based cognitive radio

Theoretically, an OFDM-based Cognitive Radio system can optimally approach the Shannon capacity in the segmented spectrum by adaptive resource allocation on each subcarrier, which includes adaptive bit loading and adaptive power loading. In [4], we proposed an OFDM system with adaptive bit loading and power loading for Cognitive Radio. We could maximize the data rate of the system under a certain power constraint. It is formulated as follows:

$$ \begin{array}{lll} &{Max}~\ R=~\sum\limits_{k=1}^K \frac{F_{k}}{K}\log_{2}\left(1+\frac{h_{k}^2p_{k}}{N_{0}\frac{B}{K}}\right) \\ &{Subject\;to:} \sum\limits_{k=1}^Kp_{k}\leq P_{total} \\ &F_{k}\in\{0,1\}\ {for all}\ k \\ &p_{k}=0\ {for all}\ k\ {which satisfies}\ F_{k}=0\label{eq1} \end{array} $$
(1)

where R is the data rate; K is the number of the subcarriers; N 0 is the noise power density, B is the band of interest for Cognitive Radio, h k is the subcarrier gain and p k is the power allocated to the corresponding subcarrier. F k is the factor indicating the availability of subcarrier k to Cognitive Radio, where F k  = 1 means the kth carrier can be used by Cognitive Radio. The system power minimization can also be applied under the constraint of a constant data rate. We formulate it as follows:

$$ \begin{array}{lll} &{Min}\ \sum\limits_{k=1}^Kp_{k}=P_{total} \\ &{Subject\;to:}\ R=\sum\limits_{k=1}^K\frac{F_{k}}{K}\log_{2}\left(1+\frac{h_{k}^2p_{k}}{N_{0}\frac{B}{K}}\right) \\ &F_{k}\in\{0,1\}\ \text{for all}\ k \\ &p_{k}=0\ \text{for all}\ k\ \text{which satisfies}\ F_{k}=0\label{eq2} \end{array} $$
(2)

A functional diagram of the system is presented in Fig. 2. A bit allocation vector indicates how many bits are loaded on each subcarrier. The number of bits corresponds to the different modulation types used for each subcarrier. The bit allocation vector is determined by the spectrum occupancy information from spectrum sensing and the SNR of subchannels. The bit allocation vector is disseminated via a signaling channel, so that both transmitter and receiver have the same information. We assume the bit allocation vector does not change frequently for instance during several frames. The basic idea is to load more bits on good subcarriers and load zeros onto carriers which cause interference to the licensed user or lead to poor transmissions. The expectation is that there could be a large number of zero inputs/outputs for the IFFT/FFT when a large part of the spectrum is not available to Cognitive Radio or there are many bad channels. Therefore the normal radix-2 IFFT/FFT will be inefficient in this case due to the wasted operations on zeros.

Figure 2
figure 2

OFDM for cognitive radio

4.2 Sparse FFT

In mathematical terms, arrays or matrices where most of the elements have the same value (called the default value,usually 0) and only a few elements have a non-default value are sparse. It is beneficial and often necessary to take advantage of the sparse structure algorithmically to reduce the operations of the standard algorithms. Therefore, we propose a low complexity FFT algorithm as an option for OFDM based Cognitive Radio in [8]. We term this FFT algorithm sparse FFT. The algorithm is based on transform decomposition in [9], but has been tailored for our Cognitive Radio system. Transform decomposition can be seen as a modified Cooley–Tukey FFT where the DFT is decomposed into two smaller DFTs. The detailed mathematical derivation of the algorithm can be found in [9]; here we only show the computational structure in Fig. 3. We made some modifications to the original algorithm to facilitate efficient hardware implementations. We choose the total number of carrier N as a power-of-two integer and L is the number of non-zero outputs. N 1 is chosen as a nearest power-of-two integer larger than L. This choice of N 1 helps to exploit more regularities. Thus N 2 is also a power-of-two integer which satisfies N = N 1 N 2. The algorithm is decomposed into two major parts: the N 2 blocks of N 1-point DFTs which can be implemented as radix-2 FFTs and the multiplications with twiddle factors and the recombination of the multiplications. The reduction of computation comes from the second part where only L twiddle factors are multiplied with each \(X_{n_{2}}(\langle k \rangle_{N_{1}})\) (denoting modulo N 1) for n 2 = 1,2,...,N 2. According to the computational structure, we perform quantitative analysis on the computational complexity by counting the number of complex multiplications. So the number of multiplication for sparse FFT is \((N_{2}-1)*L + \frac{N}{2}\log_{2}{N_{1}}\), which is less than the number of multiplications for radix-2 FFT \(\big(\frac{N}{2}\log_{2}{N}\big)\) when L < N/2.

Figure 3
figure 3

Computational structure of transform decomposition

Therefore, we propose that the system will be reconfigured to a sparse FFT when there are a large number of zeros in the bit allocation vector.

4.3 The TTL implementation

We implemented the reconfigurable sparse FFT in the TTL environment to achieve the following goals: 1) to verify the sparse FFT at system level; 2) to obtain high level profile information in terms of computation workload and communication workload which help to make the implementation trade-offs on processors.

During the first step, we create the task graph (see Fig. 4) of the reconfigurable sparse FFT. The source task generates the input samples which are sent via the data channel to the FFT task. The destination task consumes the output samples from the FFT task. A configuration manager decides the type of FFT algorithm, depending on the number of non-zero values L in the bit allocation vector. If L < N/2, the configuration manager will generate the configuration data for a sparse FFT. Then it indicates the FFT task to perform sparse FFT and sends all the configuration data to the FFT task via the configuration channel. The configuration manager will go to standby until a new bit allocation vector arrives. Depending on L, the FFT task either performs radix-2 FFT or the sparse FFT.

Figure 4
figure 4

Task graph of reconfigurable sparse FFT

The TTL functions are called from the TTL C/C++ library to create tasks, define communication interfaces and generate the task graph. At the system level, the tasks are coded in C/C++. But in the platform implementation, the tasks can be implementations on a particular processor. Here we give a pseudo code example of the TTL implementation to show how the reconfiguration is done for the FFT task.

  • Task Task_FFT

    {initialization;

    while(true)

       {local variables;

       \\check the configuration updates

       tryAcquiredata(Task_FFT->config_inport)

       {\\update L

       ttl_read(Task_FFT->config_inport, L);

       \\read in configuration

       ttl_read(Task_FFT->config_inport,

                                SFFT_CONFIG_DATA);

       }

       \\read in data

       for(i=0; i<num_samples; i++)

       ttl_read(Task_FFT->data_inport,

                                proc_buffer[i]);

       \\sparse FFT or radix-2 FFT

       if (L<num_samples/2)

       {\\sparse_FFT processing

       call sparse_FFT;

       }

       else

       {\\radix-2 FFT

       call rad2_FFT;

       }

       \\write out results

       for(i=0; i<num_samples; i++)

       ttl_write(Task_FFT->data_outport,

                                 proc_buffer[i]);

       }

    }

The FFT task checks the updates from the configuration channel. If a new configuration is generated by the configuration manager, the FFT task will read in the configuration data from the configuration channel via the configuration input port. Then the FFT task reads in samples from the source task. After reading the samples and the configuration data, the sparse FFT or radix-2 FFT procedure will be executed depending on L. After the FFT processing, the results will be written out to the data channel via the output data port. Both synchronization and data transfer are done by the TTL read and write functions. The procedures for sparse FFT and radix-2 FFT are software implementations in C/C++, but they can also be replaced by equivalent hardware implementations; for example, configurations for the Montium processor.

4.4 Results

We applied the reconfigurable sparse FFT to an OFDM receiver based on the specification in [10] for the AAF system. The OFDM parameter set under consideration is shown in Table 1. The bandwidth for the OFDM system is 5.12 MHz and there are 512 subcarriers in one OFDM symbol. Thus, subcarrier spacing Δf is 10 kHz and the useful symbol part duration T u is 100 μs. Therefore, we need a 512-point FFT to process an OFDM symbol, where some subcarriers might be zero according to the bit allocation vector. In Fig. 5, we show an example of the sparse FFT reconfiguration for the given OFDM system. We denote all the zero output indexes of the FFT with 0 and non-zero indexes with 1. In Scenario 1, 420 of 512 indexes are non-zeros which means that most of the subcarriers can be used by Cognitive Radio. To avoid causing interference to a potential licensed user, Cognitive Radio has to switch off a certain number subcarriers and re-assign the transmitted information to the available subchannels. This corresponds to Scenario 2 where only 56 out of 512 indexes are non-zeros. From Scenario 1 to Scenario 2, the FFT task is reconfigured from a radix-2 FFT to a sparse FFT. The high level TTL implementation has been run on a Linux PC. The computation result verifies the functional correctness of the sparse FFT. The TTL run-time environment can generate high level profile information in terms of computation workload and communication workload. The computation workload is measured by counting the number of annotated instructions while the communication workload is measured by counting the number tokens (data units) that are travelling through the TTL channels. Table 2 shows the computation workload of the FFT task in two scenarios generated by TTL. The reduction of computation in Scenario 2 is due to the sparse FFT which takes advantage of sparse data. Considering that the worst case execution time for processing an OFDM symbol is 100 μs, we estimate the minimum required processing capacity for the S 1 and S 2 in Table 3. The profile information generated by the TTL run-time environment is independent of platforms. However, it can help to generate the platform dependent profile for specific implementations. By associating execution times with instructions, and by multiplying these execution times with the instruction counts, one can obtain a rough estimate of total execution time of a task on a certain processor. Considering the Montium for the FFT task, the Montium can execute one complex multiplication instruction in one clock cycle. From Table 3, we find that the Montium has to run at at least 23 MHz for Scenario 1 and 19 MHz for Scenario 2. In other words, the sparse FFT will save 16% processing capacity. Such a reconfiguration only takes place when the bit allocation vector has been updated. We expect the bit allocation vector not to change very often: at least it will be constant over several OFDM frames. Therefore the reconfiguration overhead is relatively small compared to the saving of computations. From the TTL profile information, we compare the computation workload of 512-point sparse FFT for various L with different zero distributions. Figure 6 shows that the computation workload increases with the number of non-zero L.

Table 1 OFDM parameters: sample frequency and symbol duration
Figure 5
figure 5

Example of reconfiguration

Table 2 Computation workload of the FFT task
Table 3 Minimum processing requirements
Figure 6
figure 6

Computation workload of sparse FFT for 512 samples

5 Future work

Based on the TTL model, the reconfigurable sparse FFT will be mapped onto the proposed MPSoC platform. Some software implementations are ready to be ported to the platform. For example, the C code for the configuration manager can be compiled for an ARM processor. Hardware implementations such as the Montium configuration for the sparse FFT will also be ported to the platform. Our ultimate goal is to develop the entire Cognitive Radio baseband application in the TTL framework. Based on the TTL model, the Cognitive Radio baseband application will be mapped onto the proposed MPSoC platform.

6 Conclusion

In this paper we have presented a system level design methodology for mapping Cognitive Radio onto an MPSoC platform. The design methodology is based on a TTL interface to partition the application into communicating tasks. By making the communication explicit, the computation (task) is implemented separately from the communication. We show a design case on a novel FFT for OFDM based Cognitive Radio. This sparse FFT takes advantage of sparse data and results in a computation complexity reduction. The reconfigurable sparse FFT has been implemented in the TTL framework. From the design case, we conclude that TTL offers the following advantages:

  • the functional correctness of the algorithm has been verified at a system level

  • the TTL programming model facilitates the application partitioning and the task based implementation on MPSoCs

  • the profile information given by the TTL run-time environment is used for the computation and communication complexity analysis at a system level; therefore design trade-offs can be made at an early design stage

  • TTL is suitable for modelling a reconfigurable application

  • the TTL code is portable to the reconfigurable platform.

In conclusion, TTL is a suitable design methodology for mapping Cognitive Radio on a reconfigurable MPSoC. It brings Cognitive Radio from a novel idea closer to a reality.