Search Results (1,920)

Search Parameters:
Keywords = FPGA

25 pages, 13951 KiB  
Article
1D-CNN-Transformer for Radar Emitter Identification and Implemented on FPGA
by Xiangang Gao, Bin Wu, Peng Li and Zehuan Jing
Remote Sens. 2024, 16(16), 2962; https://doi.org/10.3390/rs16162962 - 12 Aug 2024
Abstract
Deep learning has brought great development to radar emitter identification technology. In addition, specific emitter identification (SEI), as a branch of radar emitter identification, has also benefited from it. However, the complexity of most deep learning algorithms makes it difficult to adapt to the requirements of the low power consumption and high-performance processing of SEI on embedded devices, so this article proposes solutions from the aspects of software and hardware. From the software side, we design a Transformer variant network, the lightweight convolutional Transformer (LW-CT), which supports parameter sharing. Then, we cascade convolutional neural networks (CNNs) and the LW-CT to construct a one-dimensional-CNN-Transformer (1D-CNN-Transformer) lightweight neural network model that can capture the long-range dependencies of radar emitter signals while also extracting signal spatial-domain features. In terms of hardware, we design a low-power neural network accelerator based on an FPGA to complete the real-time recognition of radar emitter signals. The accelerator not only provides high-efficiency computing engines for the network, but also uses a reconfigurable buffer called “Ping-pong CBUF” and a two-level pipeline architecture for the convolution layer to alleviate the bottleneck caused by the off-chip storage access bandwidth. Experimental results show that the algorithm can achieve a high recognition performance of SEI with a low calculation overhead. In addition, the hardware acceleration platform not only meets the requirements of the radar emitter recognition system for low power consumption and high-performance processing, but also outperforms the accelerators in other papers in terms of the energy efficiency ratio of Transformer layer processing.
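
The abstract names a "Ping-pong CBUF" without describing it further. As a rough illustration of the general ping-pong (double) buffering idea only, the sketch below alternates two buffers so that one can be refilled while the other is consumed. The function name `ping_pong_process` and its arguments are hypothetical and are not taken from the paper; in hardware the load and compute steps would overlap in time, whereas this Python version runs them one after the other.

```python
def ping_pong_process(tiles, compute):
    """Process data tiles using two buffers that swap roles every step.

    Illustrative only: 'tiles' stands in for blocks fetched from off-chip
    memory and 'compute' for the processing engine. Both are assumptions.
    """
    buffers = [None, None]
    results = []
    source = iter(tiles)
    load_idx = 0
    # Prologue: the first buffer must be filled before compute can start.
    buffers[load_idx] = next(source, None)
    while buffers[load_idx] is not None:
        compute_idx = load_idx      # the buffer just filled is now consumed
        load_idx ^= 1               # the other buffer becomes the load target
        # In hardware the next two steps would run concurrently.
        buffers[load_idx] = next(source, None)
        results.append(compute(buffers[compute_idx]))
    return results


if __name__ == "__main__":
    print(ping_pong_process([[1, 2], [3, 4], [5]], sum))
```

The point of the arrangement is that the compute engine never waits for a full tile transfer after the first one, provided loading a tile takes no longer than processing one.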
Figures
Figure 1. Overall architecture of the accelerator.
Figure 2. Waveform of the LFM signal, which is normalized.
Figure 3. (a) The whole neural network architecture. (b) The structure of the ResD1D Block.
Figure 4. The structure of LW-CT.
Figure 5. The structure of Central Logic.
Figure 6. Instruction encoding format.
Figure 7. Two-stage pipeline architecture for convolution.
Figure 8. CONV1D calculation order.
Figure 9. The structure of the CONV1D module.
Figure 10. (a) The structure of the PE cluster, (b) the structure of PE, (c) the structure of MPM.
Figure 11. The method of our PE cluster convolution and the traditional convolution.
Figure 12. The structure of the MHSA module.
Figure 13. The structure of the Self-attention Processing Module.
Figure 14. The structure of the FC module.
Figure 15. The radar emitter signal waveform of six radar individuals. (a–f) The signal-to-noise ratio of each radar emitter signal is −6 dB.
Figure 16. The network classification performance of different models under −10 dB to 4 dB. The maximum number of channels in the convolutional layers of (a–d) are 48, 96, 192, and 384, respectively.
Figure 17. (a) Test accuracy with different channel numbers; (b) params and operations with different channel numbers.
Figure 18. Recognition performance of different models.
Figure 19. Details of the proposed FPGA implementation. Breakdowns of (a) DSP blocks, (b) block RAMs.
24 pages, 8201 KiB  
Article
Enhancing Sustainable Transportation Infrastructure Management: A High-Accuracy, FPGA-Based System for Emergency Vehicle Classification
by Pemila Mani, Pongiannan Rakkiya Goundar Komarasamy, Narayanamoorthi Rajamanickam, Mohammad Shorfuzzaman and Waleed Mohammed Abdelfattah
Sustainability 2024, 16(16), 6917; https://doi.org/10.3390/su16166917 (registering DOI) - 12 Aug 2024
Abstract
Traffic congestion is a prevalent problem in modern civilizations worldwide, affecting both large cities and smaller communities. Emergency vehicles tend to group tightly together in these crowded scenarios, often masking one another. For traffic surveillance systems tasked with maintaining order and executing laws, this poses serious difficulties. Recent developments in machine learning for image processing have significantly increased the accuracy and effectiveness of emergency vehicle classification (EVC) systems, especially when combined with specialized hardware accelerators. The widespread use of these technologies in safety and traffic management applications has led to more sustainable transportation infrastructure management. Vehicle classification has traditionally been carried out manually by specialists, which is a laborious and subjective procedure that depends largely on the expertise that is available. Furthermore, erroneous EVC might result in major problems with operation, highlighting the necessity for a more dependable, precise, and effective method of classifying vehicles. Although image processing for EVC involves a variety of machine learning techniques, the process is still labor intensive and time consuming because the techniques now in use frequently fail to appropriately capture each type of vehicle. In order to improve the sustainability of transportation infrastructure management, this article places a strong emphasis on the creation of a hardware system that is reliable and accurate for identifying emergency vehicles in intricate contexts. The ResNet50 model’s features are extracted by the suggested system utilizing a Field Programmable Gate Array (FPGA) and then optimized by a multi-objective genetic algorithm (MOGA). A CatBoost (CB) classifier is used to categorize automobiles based on these features. Surpassing the previous state-of-the-art accuracy of 98%, the ResNet50-MOP-CB network achieved a classification accuracy of 99.87% for four primary categories of emergency vehicles. In tests conducted on tablets, laptops, and smartphones, it demonstrated excellent accuracy, fast classification times, and robustness for real-world applications. On average, it took 0.9 nanoseconds for every image to be classified with a 96.65% accuracy rate.
(This article belongs to the Special Issue Sustainable Transportation Infrastructure Management)
Figures
Figure 1. General architecture of framework for vehicle classification using ML through image processing technique.
Figure 2. Sample vehicle images of emergency vehicle image dataset chosen from Google Images. The images show the various classes of fire engine, patrol, VIP/HNWI and ambulance (top to bottom).
Figure 3. Design paradigm of the proposed network for emergency vehicle classification.
Figure 4. (a) Hardware setup. (b) FPGA kit connected to both PC and laptop for the CNN model deployment. (c) Simulation result of the classification of the vehicles.
Figure 5. Simulation output using FPGA-based CNN model extraction of subnormal emergency vehicle features under low light condition: (a) captured image of ambulance, (b) extracted image, (c) captured image of police van, (d) extracted image.
Figure 6. Simulation outputs of subnormal emergency vehicle classification of training image using an FPGA-based CNN-ResNet50-MOGA-CB-MOGA under low light conditions.
Figure 7. Simulation outputs of subnormal emergency vehicle classification of testing image using an FPGA-based CNN-ResNet50-MOGA-CB-MOGA under low light conditions.
Figure 8. Assessment to determine the optimal network architecture for each CNN model.
Figure 9. Performance metrics by confusion matrix for the optimal network architecture of each CNN model in emergency vehicle classification.
Figure 10. Receiver operating characteristic analysis for the optimal CNN models in emergency vehicle classification.
Figure 11. Design paradigm of the proposed system for real world application.
25 pages, 329 KiB  
Review
The Role of FPGAs in Modern Option Pricing Techniques: A Survey
by Aidan O Mahony, Bernard Hanzon and Emanuel Popovici
Electronics 2024, 13(16), 3186; https://doi.org/10.3390/electronics13163186 - 12 Aug 2024
Abstract
In financial computation, Field Programmable Gate Arrays (FPGAs) have emerged as a transformative technology, particularly in the domain of option pricing. This study presents the impact of FPGAs on computational methods in finance, with an emphasis on option pricing. Our review examined 99 selected studies from an initial pool of 131, revealing how FPGAs substantially enhance both the speed and energy efficiency of various financial models, particularly Black–Scholes and Monte Carlo simulations. Notably, the performance gains, ranging from 270- to 5400-times faster than conventional CPU implementations, are highly dependent on the specific option pricing model employed. These findings illustrate FPGAs’ capability to efficiently process complex financial computations while consuming less energy. Despite these benefits, this paper highlights persistent challenges in FPGA design optimization and programming complexity. This study not only emphasises the potential of FPGAs to further innovate financial computing but also outlines the critical areas for future research to overcome existing barriers and fully leverage FPGA technology in future financial applications.
(This article belongs to the Section Circuit and Signal Processing)
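
For readers unfamiliar with the two model families named in the abstract, the following is a minimal CPU-side sketch of a European call priced with the Black–Scholes closed form and with a plain Monte Carlo estimator. It is not drawn from the survey; the function names, parameter values, path count, and seed are illustrative assumptions.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def monte_carlo_call(s0, k, r, sigma, t, n_paths=100_000, seed=0):
    """Monte Carlo estimate of the same call under geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        # Each path is independent, which is what makes the method easy to parallelise.
        terminal = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(terminal - k, 0.0)
    return math.exp(-r * t) * total / n_paths


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)  # illustrative inputs only
    print(black_scholes_call(*args), monte_carlo_call(*args))
```

The Monte Carlo loop performs the same small computation for every path with no dependence between paths, which is the kind of workload that maps naturally onto many parallel hardware pipelines.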
Figures
Figure 1. Bar chart of yearly counts of publications.
24 pages, 13367 KiB  
Article
Compact Walsh–Hadamard Transform-Driven S-Box Design for ASIC Implementations
by Omer Tariq, Muhammad Bilal Akram Dastagir and Dongsoo Han
Electronics 2024, 13(16), 3148; https://doi.org/10.3390/electronics13163148 - 9 Aug 2024
Viewed by 322
Abstract
With the exponential growth of the Internet of Things (IoT), ensuring robust end-to-end encryption is paramount. Current cryptographic accelerators often struggle with balancing security, area efficiency, and power consumption, which are critical for compact IoT devices and system-on-chips (SoCs). This work presents a novel approach to designing substitution boxes (S-boxes) for Advanced Encryption Standard (AES) encryption, leveraging dual quad-bit structures to enhance cryptographic security and hardware efficiency. By utilizing Algebraic Normal Forms (ANFs) and Walsh–Hadamard Transforms, the proposed Register Transfer Level (RTL) circuitry ensures optimal non-linearity, low differential uniformity, and bijectiveness, making it a robust and efficient solution for ASIC implementations. Implemented on 65 nm CMOS technology, our design undergoes rigorous statistical analysis to validate its security strength, followed by hardware implementation and functional verification on a ZedBoard. Leveraging Cadence EDA tools, the ASIC implementation achieves a central circuit area of approximately 199 μm². The design incurs a hardware cost of roughly 80 gate equivalents and exhibits a maximum path delay of 0.38 ns. Power dissipation is measured at approximately 28.622 μW with a supply voltage of 0.72 V. According to the ASIC implementation on the TSMC 65 nm process, the proposed design achieves the best area efficiency, approximately 66.46% better than state-of-the-art designs.
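
The three properties named in the abstract (non-linearity, differential uniformity, bijectiveness) have standard definitions that can be checked by brute force for small S-boxes. The sketch below does so for a 4-bit S-box by evaluating the Walsh spectrum directly rather than with a fast transform. The example table is the 4-bit S-box of the PRESENT cipher, chosen here only as a well-known test input; it is not the S-box proposed in the paper, and all function names are illustrative.

```python
def parity(v):
    """Return 1 if v has an odd number of set bits, else 0."""
    return bin(v).count("1") & 1


def is_bijective(sbox):
    """An S-box is bijective when every output value occurs exactly once."""
    return sorted(sbox) == list(range(len(sbox)))


def nonlinearity(sbox, n_bits):
    """2^(n-1) minus half the largest absolute Walsh coefficient."""
    size = 1 << n_bits
    max_abs = 0
    for b in range(1, size):        # every nonzero output mask
        for a in range(size):       # every input mask
            w = sum(1 - 2 * parity((a & x) ^ (b & sbox[x])) for x in range(size))
            max_abs = max(max_abs, abs(w))
    return (size >> 1) - max_abs // 2


def differential_uniformity(sbox, n_bits):
    """Largest count of inputs mapping a nonzero input difference to one output difference."""
    size = 1 << n_bits
    worst = 0
    for dx in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1
        worst = max(worst, max(counts))
    return worst


# Example input only: the PRESENT cipher S-box, not the design from the paper.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

if __name__ == "__main__":
    print(is_bijective(PRESENT_SBOX),
          nonlinearity(PRESENT_SBOX, 4),
          differential_uniformity(PRESENT_SBOX, 4))
```

Higher non-linearity and lower differential uniformity are the desirable directions; a linear map such as the identity sits at the opposite extreme on both measures.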
Figures
Figure 1. AES S-Box and Inverse S-Box.
Figure 2. RTL Diagram for 8-bit S-Box Using Dual Quad-Bit Forward and Backward Transformations.
Figure 3. Dual quad-bit s-box pair for forward operation is presented. Figure (a) shows the schematic of proposed s-box forward first pair and Figure (b) shows the schematic of proposed s-box forward second pair.
Figure 4. Here, dual quad-bit s-box pair for backward operation is presented. Figure (a) shows the schematic of proposed s-box backward first pair and Figure (b) shows the schematic of proposed s-box backward second pair.
Figure 5. The values of proposed S-Box Forward and Backward.
Figure 6. The values of dual quad-bit S-Box pairs.
Figure 7. Histogram Analysis of Original and Cipher Images of the Proposed S-Box Method and Related Work.
Figure 8. Our hardware setup for testing proposed forward and backward S-Box in AES Core on FPGA and its simulation with VIO.
Figure 9. Encryption Result on ZedBoard Zynq 7000 SoC Hardware.
Figure 10. Decryption Result on ZedBoard Zynq 7000 SoC Hardware.
Figure 11. Physical layout of the proposed S-Box AES Core using 65 nm technology.
16 pages, 3834 KiB  
Article
A Device-on-Chip Solution for Real-Time Diffuse Correlation Spectroscopy Using FPGA
by Christopher H. Moore, Ulas Sunar and Wei Lin
Biosensors 2024, 14(8), 384; https://doi.org/10.3390/bios14080384 - 8 Aug 2024
Viewed by 292
Abstract
Diffuse correlation spectroscopy (DCS) is a non-invasive technology for the evaluation of blood perfusion in deep tissue. However, it requires high computational resources for data analysis, which poses challenges in its implementation for real-time applications. To address the unmet need, we developed a novel device-on-chip solution that fully integrates all the necessary computational components needed for DCS. It takes the output of a photon detector and determines the blood flow index (BFI). It is implemented on a field-programmable gate array (FPGA) chip including a multi-tau correlator for the calculation of the temporal light intensity autocorrelation function and a DCS analyzer to perform the curve fitting operation that derives the BFI at a rate of 6000 BFIs/s. The FPGA DCS system was evaluated against a lab-standard DCS system for both phantom and cuff ischemia studies. The results indicate that the autocorrelation of the light correlation and BFI from both the FPGA DCS and the reference DCS matched well. Furthermore, the FPGA DCS system was able to achieve a measurement rate of 50 Hz and resolve pulsatile blood flow. This can significantly lower the cost and footprint of the computational components of DCS and pave the way for portable, real-time DCS systems.
(This article belongs to the Special Issue Advances in Biosensors Based on Reflectometry)
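
The quantity the correlator computes is the normalized temporal intensity autocorrelation, g2(τ) = ⟨I(t)·I(t+τ)⟩ / ⟨I⟩². The sketch below evaluates it at evenly spaced lags from a list of photon counts. This is a plain linear-lag estimator written for illustration, not the multi-tau scheme used in the paper (a multi-tau correlator uses progressively coarser time bins at longer lags to cover a wide delay range cheaply); the function name and sample data are assumptions.

```python
def g2(intensity, max_lag):
    """Normalized intensity autocorrelation at lags 1..max_lag (in samples)."""
    n = len(intensity)
    mean = sum(intensity) / n
    curve = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag  # number of sample pairs available at this lag
        acc = sum(intensity[i] * intensity[i + lag] for i in range(pairs))
        curve.append(acc / pairs / (mean * mean))
    return curve


if __name__ == "__main__":
    # Hypothetical photon counts per time bin, for illustration only.
    counts = [4, 5, 3, 6, 4, 5, 4, 3, 5, 6, 4, 4]
    print(g2(counts, 4))
```

A perfectly constant signal gives g2 = 1 at every lag, so any decay of the curve from values above 1 toward 1 reflects fluctuations in the detected light.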
Figures
Figure 1. Illustration of the DCS process. (a) Multimode laser enters the tissue, and the scattered light is detected by the APD shown as light intensity. (b) The autocorrelation of the scattered light. Faster decay in the correlation indicates high flow.
Figure 2. Block diagram of the submodules making up the DCS analyzer module. A central state machine controls the operation and flow of data between the other submodules.
Figure 3. Arty A7 FPGA development board and the custom interface board connected to a Pmod port.
Figure 4. Block diagram showing the major FPGA modules used for the full DCS processing system. The single-headed arrows represent unidirectional data flow while the double-headed arrows represent bidirectional data flow. A Microblaze soft microprocessor core is used to integrate the different components making up the system.
Figure 5. The experimental setup for arm ischemia. (a) The blood pressure cuff was placed around the bicep of the arm and the probe was placed on the forearm. (b) The experiment was repeated at the subject’s palm. (c) The DCS probe used for cuff ischemia experiments. The source (S) and detector (D) have a separation of 1 cm with the other prisms remaining unused for these experiments.
Figure 6. Correlations averaged over 30 s for solid phantom measurements for reference and FPGA correlators.
Figure 7. Correlations averaged over 30 s for liquid phantom measurements with varying levels of methyl cellulose (MC) for reference and FPGA correlators.
Figure 8. BFI from reference and FPGA in cuff ischemia experiment with both systems measuring at 1 Hz on the forearm with a source–detector separation of 1 cm.
Figure 9. BFI from the reference and FPGA in the cuff ischemia experiment with the reference measuring at 1 Hz and the FPGA at 50 Hz at the forearm with a source–detector separation of 1 cm. A 15-point moving average filter was applied to the FPGA data to remove high-frequency noise.
Figure 10. BFI from the reference and FPGA in the cuff ischemia experiment with the reference measuring at 1 Hz and the FPGA at 50 Hz at the palm with a source–detector separation of 1 cm. A moving average filter with a window size of 5 data points was applied to the FPGA data to remove high-frequency noise. (a) The full time series from the experiment. (b) Zoomed in view of 10 s of the time series data. The pulsatile flow and dicrotic notch are clearly apparent here. (c) The frequency spectrum from an FFT of the FPGA BFI data following the release of the cuff. The peak at 1.45 Hz corresponds to the heart rate of the subject.
19 pages, 1303 KiB  
Article
Natural Language Processing for Hardware Security: Case of Hardware Trojan Detection in FPGAs
by Jaya Dofe, Wafi Danesh, Vaishnavi More and Aaditya Chaudhari
Cryptography 2024, 8(3), 36; https://doi.org/10.3390/cryptography8030036 - 8 Aug 2024
Viewed by 359
Abstract
Field-programmable gate arrays (FPGAs) offer the inherent ability to reconfigure at runtime, making them ideal for applications such as data centers, cloud computing, and edge computing. This reconfiguration, often achieved through remote access, enables efficient resource utilization but also introduces critical security vulnerabilities. An adversary could exploit this access to insert a dormant hardware trojan (HT) into the configuration bitstream, bypassing conventional security and verification measures. To address this security threat, we propose a supervised learning approach using deep recurrent neural networks (RNNs) for HT detection within FPGA configuration bitstreams. We explore two RNN architectures: basic RNN and long short-term memory (LSTM) networks. Our proposed method analyzes bitstream patterns, to identify anomalies indicative of malicious modifications. We evaluated the effectiveness on ISCAS 85 benchmark circuits of varying sizes and topologies, implemented on a Xilinx Artix-7 FPGA. The experimental results revealed that the basic RNN model showed lower accuracy in identifying HT-compromised bitstreams for most circuits. In contrast, the LSTM model achieved a significantly higher average accuracy of 93.5%. These results demonstrate that the LSTM model is more successful for HT detection in FPGA bitstreams. This research paves the way for using RNN architectures for HT detection in FPGAs, eliminating the need for time-consuming and resource-intensive reverse engineering or performance-degrading bitstream conversions.
(This article belongs to the Special Issue Emerging Topics in Hardware Security)
Figures
Figure 1. Concept of a bitstream protocol stack for Xilinx 7 series FPGAs.
Figure 2. Xilinx configuration file formats. (Note: For (1), (2), (3) please refer to [22]).
Figure 3. Steps in 7-series FPGA configuration.
Figure 4. Type 1 packet header format for Xilinx 7-series FPGA.
Figure 5. Opcode for Type 1 packet header.
Figure 6. Type 2 packet header format for Xilinx 7-series FPGA.
Figure 7. Frame address register description.
Figure 8. Conventional RNN architecture.
Figure 9. Hidden layer for conventional RNN.
Figure 10. Cell for LSTM architecture.
Figure 11. The .bit file format.
Figure 12. Data preprocessing algorithm.
Figure 13. Training RNN models on preprocessed bitstreams.
Figure 14. Training and validation accuracy over training epochs for RNN with step size 16.
Figure 15. Training and validation accuracy over training epochs for LSTM with step size 16.
Figure 16. Step size and accuracy trends for LSTM.
Figure 17. Example of latched RO.
Figure 18. Training and validation accuracy vs. step size for c17 benchmark.
Figure 19. Comparison of c17 performance metrics for step sizes 8 and 16.
12 pages, 3131 KiB  
Article
Efficient Twiddle Factor Generators for NTT
by Nari Im, Heehun Yang, Yujin Eom, Seong-Cheon Park and Hoyoung Yoo
Electronics 2024, 13(16), 3128; https://doi.org/10.3390/electronics13163128 - 7 Aug 2024
Viewed by 227
Abstract
Fully Homomorphic Encryption (FHE) allows computations on encrypted data without decryption, providing strong security for sensitive information. However, computational and memory demands for FHE are significant challenges, particularly in the Number Theoretic Transform (NTT) phase. This paper presents three efficient Twiddle Factor Generators (TFGs) to address these challenges: the Half-Memory TFG, the On-the-fly Serial TFG, and the On-the-fly Parallel TFG. The Half-Memory TFG reduces memory usage by storing only half of the twiddle factors and calculating the rest as needed. The On-the-fly Serial TFG eliminates memory requirements by computing twiddle factors, while the On-the-fly Parallel TFG enhances computational speed through parallel processing. Implemented on the FPGA KCU105 board, these TFGs demonstrated significant improvements in hardware resource utilization and computational efficiency. The Half-Memory TFG effectively reduces memory footprint, the On-the-fly Serial TFG eliminates memory usage with acceptable computational overhead, and the On-the-fly Parallel TFG offers superior performance for high-throughput applications. These innovations make FHE more practical for real-world applications, contributing to the broader goal of enabling secure, privacy-preserving computations on encrypted data.
(This article belongs to the Section Circuit and Signal Processing)
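
One way to store only half of the twiddle factors relies on a standard property of a primitive N-th root of unity ω modulo a prime q: ω^(N/2) ≡ −1, so ω^(k+N/2) ≡ q − ω^k. The sketch below uses that identity with toy parameters (q = 17, N = 16, ω = 3). Whether the paper's Half-Memory TFG uses exactly this symmetry is an assumption, and the parameter values and names are illustrative only; practical FHE moduli are far larger.

```python
Q = 17      # toy prime modulus (assumption, for illustration)
N = 16      # transform size, matching the N = 16 used in the figure captions
OMEGA = 3   # a primitive 16th root of unity modulo 17

# Only the first N/2 powers are stored.
HALF_TABLE = [pow(OMEGA, k, Q) for k in range(N // 2)]


def twiddle(k):
    """Return OMEGA**k mod Q using the half-size table plus one negation."""
    k %= N
    if k < N // 2:
        return HALF_TABLE[k]
    # OMEGA**(N/2) == -1 (mod Q), so the upper half is the negated lower half.
    return (Q - HALF_TABLE[k - N // 2]) % Q


if __name__ == "__main__":
    print([twiddle(k) for k in range(N)])
```

The trade is one modular subtraction per upper-half lookup in exchange for halving the table, which is the memory-versus-logic balance the abstract describes.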
Figures
Figure 1. Primitive roots of unity for N = 16.
Figure 2. Block diagram of Processing Element (PE).
Figure 3. Diagram of NTT for N = 16.
Figure 4. Block diagram of conventional Memory-based TFG.
Figure 5. Symmetry properties of primitive roots of unity for N = 16.
Figure 6. Block diagram of proposed Half-Memory-based TFG.
Figure 7. Block diagram of proposed On-the-fly Serial (O-Serial) TFG.
Figure 8. Block diagram of proposed On-the-fly Parallel (O-Parallel) TFG.
18 pages, 22304 KiB  
Article
A High-Performance FPGA PRNG Based on Multiple Deep-Dynamic Transformations
by Shouliang Li, Zichen Lin, Yi Yang and Ruixuan Ning
Entropy 2024, 26(8), 671; https://doi.org/10.3390/e26080671 - 7 Aug 2024
Viewed by 246
Abstract
Pseudo-random number generators (PRNGs) are important cornerstones of many fields, such as statistical analysis and cryptography, and the need for PRNGs for information security (in fields such as blockchain, big data, and artificial intelligence) is becoming increasingly prominent, resulting in a steadily growing demand for high-speed, high-quality random number generators. To meet this demand, the multiple deep-dynamic transformation (MDDT) algorithm is innovatively developed. This algorithm is incorporated into the skewed tent map, endowing it with more complex dynamical properties. The improved one-dimensional discrete chaotic mapping method is effectively realized on a field-programmable gate array (FPGA), specifically the Xilinx xc7k325tffg900-2 model. The proposed pseudo-random number generator (PRNG) successfully passes all evaluations of the National Institute of Standards and Technology (NIST) SP800-22, diehard, and TestU01 test suites. Additional experimental results show that the PRNG, possessing high novelty performance, operates efficiently at a clock frequency of 150 MHz, achieving a maximum throughput of 14.4 Gbps. This performance not only surpasses that of most related studies but also makes it exceptionally suitable for embedded applications.
(This article belongs to the Section Multidisciplinary Applications)
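
The base system that the MDDT algorithm modifies is the skew tent map, x → x/p for x < p and x → (1 − x)/(1 − p) otherwise, with 0 < p < 1. The sketch below iterates that plain map only; the MDDT transformation itself is not described in the abstract and is not reproduced here. It also works out the output width implied by the reported figures under the assumption that throughput equals clock frequency times bits produced per cycle. Function names and the starting value are illustrative.

```python
def skew_tent(x, p):
    """One step of the skew tent map on [0, 1] with control parameter p."""
    return x / p if x < p else (1.0 - x) / (1.0 - p)


def orbit(x0, p, steps):
    """Iterate the map 'steps' times starting from x0 and return the visited values."""
    values = []
    x = x0
    for _ in range(steps):
        x = skew_tent(x, p)
        values.append(x)
    return values


# Reported figures from the abstract, in integer units to avoid rounding.
THROUGHPUT_BPS = 14_400_000_000   # 14.4 Gbps
CLOCK_HZ = 150_000_000            # 150 MHz
BITS_PER_CYCLE = THROUGHPUT_BPS // CLOCK_HZ  # assumes throughput = clock * bits per cycle

if __name__ == "__main__":
    print(orbit(0.3, 0.499, 5))   # p = 0.499 as in the skew tent figure caption
    print(BITS_PER_CYCLE)
```

Under that assumption the reported throughput corresponds to 96 output bits per clock cycle. Floating-point iteration is used here for readability; a hardware realisation would work in fixed point.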
Show Figures

Figure 1

Figure 1
<p>Lyapunov exponents of the MDDT system (3D), where the color of the figure represents the value of the angle.</p>
Full article ">Figure 2
<p>Bifurcation Diagram of the MDDT System (3D).</p>
Full article ">Figure 3
<p>(<b>a</b>) Chaotic trajectories of the MDDT system (<math display="inline"><semantics> <mrow> <mi>z</mi> <mo>=</mo> <mn>0.45</mn> <mo>,</mo> <mi>p</mi> <mo>=</mo> <mn>0.61</mn> </mrow> </semantics></math>); (<b>b</b>) chaotic trajectories of the skew tent system (<math display="inline"><semantics> <mrow> <mi>p</mi> <mo>=</mo> <mn>0.499</mn> </mrow> </semantics></math>).</p>
Full article ">Figure 4
<p>(<b>a</b>) Sample entropy: MDDT (<math display="inline"><semantics> <mrow> <mi>z</mi> <mo>=</mo> <mn>0.45</mn> </mrow> </semantics></math>) vs. skew tent map; (<b>b</b>) permutation entropy: MDDT (<math display="inline"><semantics> <mrow> <mi>z</mi> <mo>=</mo> <mn>0.45</mn> </mrow> </semantics></math>) vs. skew tent map.</p>
Full article ">Figure 5
<p>Schematic diagram of the data exchange of the random number part of the FPGA.</p>
Full article ">Figure 6
<p>Hardware design structure of the <span class="html-italic">X</span> update logic implemented on FPGA.</p>
Full article ">Figure 7
<p>Implementation of <span class="html-italic">P</span> update logic on FPGA.</p>
Full article ">Figure 8
<p>Calculation logic for <math display="inline"><semantics> <mrow> <mi>o</mi> <mi>u</mi> <mi>t</mi> <mn>1</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>o</mi> <mi>u</mi> <mi>t</mi> <mn>2</mn> </mrow> </semantics></math>.</p>
Full article ">Figure 9
<p>Hardware resource utilization for various random number generators [<a href="#B1-entropy-26-00671" class="html-bibr">1</a>,<a href="#B16-entropy-26-00671" class="html-bibr">16</a>,<a href="#B19-entropy-26-00671" class="html-bibr">19</a>,<a href="#B21-entropy-26-00671" class="html-bibr">21</a>,<a href="#B23-entropy-26-00671" class="html-bibr">23</a>,<a href="#B24-entropy-26-00671" class="html-bibr">24</a>,<a href="#B26-entropy-26-00671" class="html-bibr">26</a>,<a href="#B27-entropy-26-00671" class="html-bibr">27</a>,<a href="#B28-entropy-26-00671" class="html-bibr">28</a>].</p>
Full article ">Figure 10
<p>(<b>a</b>) <math display="inline"><semantics> <mrow> <mi>R</mi> <mi>a</mi> <mi>n</mi> <mi>d</mi> <mi>o</mi> <msub> <mi>m</mi> <mi>n</mi> </msub> <mo>−</mo> <mi>R</mi> <mi>a</mi> <mi>n</mi> <mi>d</mi> <mi>o</mi> <msub> <mi>m</mi> <mrow> <mi>n</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math> trajectory points. (<b>b</b>) Histogram of random bitstreams.</p>
19 pages, 6482 KiB  
Article
Field-Programmable Gate Array Architecture for the Discrete Orthonormal Stockwell Transform (DOST) Hardware Implementation
by Martin Valtierra-Rodriguez, Jose-Luis Contreras-Hernandez, David Granados-Lieberman, Jesus Rooney Rivera-Guillen, Juan Pablo Amezquita-Sanchez and David Camarena-Martinez
J. Low Power Electron. Appl. 2024, 14(3), 42; https://doi.org/10.3390/jlpea14030042 - 7 Aug 2024
Viewed by 376
Abstract
Time–frequency analysis is critical in studying linear and non-linear signals that exhibit variations across both time and frequency domains. Such analysis not only facilitates the identification of transient events and extraction of key features but also aids in displaying signal properties and pattern recognition. Recently, the Discrete Orthonormal Stockwell Transform (DOST) has become increasingly utilized in many specialized signal processing tasks. Given its growing importance, this work proposes a reconfigurable field-programmable gate array (FPGA) architecture designed to efficiently implement the DOST algorithm on cost-effective FPGA chips. An accompanying MATLAB app enables the automatic configuration of the DOST method for varying sizes (64, 128, 256, 512, and 1024 points). For the implementation, a Cyclone V series FPGA device from Intel Altera, featuring the 5CSEMA5F31C6N chip, is used. To provide a complete hardware solution, the proposed DOST core has been integrated into a hybrid ARM-HPS (Advanced RISC Machine–Hard Processor System) control unit, which allows the control of different peripherals, such as communication protocols and VGA-based displays. Results show that less than 5% of the chip’s resources are occupied, indicating a low-cost architecture that can be easily integrated into other FPGA structures or hardware systems for diverse applications. Moreover, the proposed FPGA-based implementation achieves a root mean squared error of 6.0155 × 10⁻³ when compared with results from floating-point processors, confirming its accuracy. Full article
Figure 1
<p>Algorithm for the DOST computation.</p>
Figure 2
<p>Flowchart to generate the DOST hardware solution.</p>
Figure 3
<p>Basic internal modules of ARM-FPGA hybrid architecture.</p>
Figure 4
<p>Top-level block diagram for the proposed hardware DOST processor.</p>
Figure 5
<p>Inverse and bias block.</p>
Figure 6
<p>Calculation of IFFT through FFT.</p>
Figure 7
<p>Butterfly operation.</p>
Figure 8
<p>Experimental setup: (<b>a</b>) DE1-SOC connections, (<b>b</b>) FPGA board, and (<b>c</b>) TFR results.</p>
Figure 9
<p>DOST coefficients of a chirp function. (<b>a</b>) Absolute DOST coefficients obtained from MATLAB. (<b>b</b>) Absolute DOST coefficients obtained from HDL simulation. (<b>c</b>) Absolute error of simulation results compared with MATLAB results.</p>
Figure 10
<p>Simulation of DOST implementation.</p>
Figure 11
<p>Results obtained from hardware implementation: (<b>a</b>) Input chirp signal, (<b>b</b>) absolute FFT coefficients, and (<b>c</b>) DOST TFR.</p>
Figure 12
<p>RTL diagram for the DOST processor.</p>
Figure 13
<p>Blocks for the RTL diagram: (<b>a</b>) Ctrl_FFT, (<b>b</b>) Ctrl_DOST, (<b>c</b>) Address_Generation, (<b>d</b>) Muxs, (<b>e</b>) Inverse_Bias, and (<b>f</b>) Conjugate.</p>
Figure 14
<p>Interface for generating VHDL files.</p>
Figure 15
<p>Results for a chirp input signal with (<b>a</b>) 64, (<b>b</b>) 128, (<b>c</b>) 256, (<b>d</b>) 512, and (<b>e</b>) 1024 points.</p>
15 pages, 26053 KiB  
Article
Module Tester for Positron Emission Tomography and Particle Physics
by David Baranyai, Stefan Oniga, Balazs Gyongyosi, Balazs Ujvari and Attia Mohamed
Electronics 2024, 13(15), 3066; https://doi.org/10.3390/electronics13153066 - 2 Aug 2024
Viewed by 354
Abstract
The combination of high-density, high-time-resolution inorganic scintillation crystals such as Lutetium Yttrium Oxyorthosilicate (LYSO), Yttrium Orthosilicate (YSO) and Bismuth Germanate (BGO) with Silicon Photomultiplier (SiPM) sensors is widely employed in medical imaging, particularly in Positron Emission Tomography (PET), as well as in modern particle physics detectors for precisely timing sub-detectors and calorimeters. During the assembly of each module, following individual component testing, the crystals and SiPMs are bonded together using optical glue and enclosed in a light-tight, temperature-controlled cooling box. After integration with the readout electronics, the bonding is initially tested. The final readout electronics typically comprise Application-Specific Integrated Circuits (ASICs) or low-power Analog-to-Digital Converters (ADCs) and amplifiers, designed not to heat up the temperature-sensitive SiPM sensors. However, these setups are not optimal for testing the optical bonding. Specific setups were developed to test the LYSO + SiPM modules that are already bonded but not enclosed in a box. Through large data collection, small deviations in bonding can be detected if the SiPMs and LYSOs have been thoroughly tested before our measurement. We used Monte Carlo simulations to study how parameters that are difficult to measure in the laboratory (LYSO absorption length, refractive index of the coating) affect the final result. Our setups for particle physics and PET applications are already in use by research institutes and industrial partners. Full article
(This article belongs to the Special Issue Sensor Based Big Data Analysis)
Figure 1
<p>SiPMs with different sizes and connectors.</p>
Figure 2
<p>DAQ card.</p>
Figure 3
<p>DAQ PC.</p>
Figure 4
<p>DAQ server.</p>
Figure 5
<p>FE.</p>
Figure 6
<p>SiPM tester user interface.</p>
Figure 7
<p>Measurements.</p>
Figure 8
<p>For 10 saved inputs, the two opposite SiPM signals are shown and the triggering condition is indicated by the orange line.</p>
Figure 9
<p>Measurement #1.</p>
Figure 10
<p>Sigma per mean from the measurements, 6–10%, in 16 positions was measured and fit the photopeak position.</p>
Figure 11
<p>Measurement #2.</p>
Figure 12
<p>Geant4 schematic view of LYSO crystal with double-ended SiPM readout.</p>
Figure 13
<p>ADC waveform for left and right SiPM with ADC amplitude threshold of &gt;600. The blue color represents the ADC waveform, while the red color represents linear fitting for the rising edge and exponential fitting for the falling edge in both left and right SiPM.</p>
Figure 14
<p>Slopes from linear fitting of left and right waveforms.</p>
Figure 15
<p>Slopes from exponential fitting of left and right waveforms.</p>
Figure 16
<p>Left and right single optical photon waveforms.</p>
Figure 17
<p>Simulated waveforms for left and right SiPMs, depicted in red, with the integral window [45, 75] highlighted in green.</p>
Figure 18
<p>ADC integral for the summation of left and right SiPM, shown in blue, with the Gaussian fitting of the photopeak depicted in red. From the fitting, the sigma per mean is approximately 10%.</p>
Figure 19
<p>Photopeak position dependence on reflectivity of coating.</p>
24 pages, 5669 KiB  
Article
Design of Multichannel Spectrum Intelligence Systems Using Approximate Discrete Fourier Transform Algorithm for Antenna Array-Based Spectrum Perception Applications
by Arjuna Madanayake, Keththura Lawrance, Bopage Umesha Kumarasiri, Sivakumar Sivasankar, Thushara Gunaratne, Chamira U. S. Edussooriya and Renato J. Cintra
Algorithms 2024, 17(8), 338; https://doi.org/10.3390/a17080338 - 1 Aug 2024
Viewed by 411
Abstract
The radio spectrum is a scarce and extremely valuable resource that demands careful real-time monitoring and dynamic resource allocation. Dynamic spectrum access (DSA) is a new paradigm for managing the radio spectrum, which requires AI/ML-driven algorithms for optimum performance under rapidly changing channel conditions and possible cyber-attacks in the electromagnetic domain. Fast sensing across multiple directions using array processors, with subsequent AI/ML-based algorithms for the sensing and perception of waveforms that are measured from the environment is critical for providing decision support in DSA. As part of directional and wideband spectrum perception, the ability to finely channelize wideband inputs using efficient Fourier analysis is much needed. However, a fine-grain fast Fourier transform (FFT) across a large number of directions is computationally intensive and leads to a high chip area and power consumption. We address this issue by exploiting the recently proposed approximate discrete Fourier transform (ADFT), which has its own sparse factorization for real-time implementation at a low complexity and power consumption. The ADFT is used to create a wideband multibeam RF digital beamformer and temporal spectrum-based attention unit that monitors 32 discrete directions across 32 sub-bands in real-time using a multiplierless algorithm with low computational complexity. The output of this spectral attention unit is applied as a decision variable to an intelligent receiver that adapts its center frequency and frequency resolution via FFT channelizers that are custom-built for real-time monitoring at high resolution. This two-step process allows the fine-grain FFT to be applied only to directions and bands of interest as determined by the ADFT-based low-complexity 2D spacetime attention unit. The fine-grain FFT provides a spectral signature that can find future use cases in neural network engines for achieving modulation recognition, IoT device identification, and RFI identification. Beamforming and spectral channelization algorithms, a digital computer architecture, and early prototypes using a 32-element fully digital multichannel receiver and field programmable gate array (FPGA)-based high-speed software-defined radio (SDR) are presented. Full article
Figure 1
<p>Simulated magnitude response of (<b>a</b>) ideal 32-point DFT and (<b>b</b>) low-complexity 32-point ADFT [<a href="#B97-algorithms-17-00338" class="html-bibr">97</a>,<a href="#B98-algorithms-17-00338" class="html-bibr">98</a>].</p>
Figure 2
<p>(<b>a</b>) The system architecture; (<b>b</b>) the experimental setup of the 32-beam array receiver, which operates in the 5.700 GHz to 5.800 GHz band, followed by the ROACH-2 FPGA system (Xilinx Virtex-6 sx475T FPGA) digital back-end.</p>
Figure 3
<p>ROACH-2-based DSP platform based on Xilinx Virtex-6 Sx475 FPGA, and 32-channel ADC card. We gratefully acknowledge Dr. Dan Werthimer at UC Berkeley and the CASPER community for their contributions to the ROACH-2 and CASPER tools.</p>
Figure 4
<p>ADFT architecture with spatial windowing and power normalizing units.</p>
Figure 5
<p>Simulated magnitude response of 32-point ADFT and its modifications under three different windowing techniques. Subplot (<b>a</b>) shows the baseline response without any windowing. Subplots (<b>b</b>–<b>d</b>) demonstrate the ADFT responses after applying the Butterworth, Hamming, and Hanning windows, respectively.</p>
Figure 6
<p>ADFT and FFT calibration, energy integration, and overall system block including BRAM registers.</p>
Figure 7
<p>Sixteen beams measured from the 5.7 GHz array (vertical axis is in decibel, and the horizontal axis is the azimuthal angle <math display="inline"><semantics> <mrow> <mo>[</mo> <mrow> <mo>−</mo> <mn>90</mn> </mrow> <mo>,</mo> <mn>90</mn> <mo>]</mo> </mrow> </semantics></math>). Each subfigure contains the measured array factor patterns of the beam using the real inputs of the ADFT core. The imaginary component of each ADFT input is set to zero here, thus resulting in symmetrical main lobes as expected.</p>
Figure 8
<p>Temporal PSD over the 1024 discrete frequency bins for the particular RF beam at multiple frequencies when the source is placed at 15°, 35°, and 50° broadside receiver angles.</p>
Figure 9
<p>Illustration of the overall setup showing the transmitting antennas, receiving antenna array and the multibeam spectral sensor. The 32-element antenna array receiver captures signals from different directions, with the transmitters located 20 m away from the receiver.</p>
13 pages, 2678 KiB  
Article
An FPGA-Based Data Acquisition System with Embedded Processing for Real-Time Gas Sensing Applications
by Godwin Enemali and Ryan M. Gibson
Appl. Sci. 2024, 14(15), 6738; https://doi.org/10.3390/app14156738 - 1 Aug 2024
Viewed by 388
Abstract
Real-time gas sensing based on wavelength modulation spectroscopy (WMS) has been widely adopted for several gas sensing applications. It is attractive for its accurate, non-invasive, and fast determination of critical gas parameters such as concentration, temperature, and pressure. To implement real-time gas sensing, data acquisition and processing must be implemented to accurately extract harmonics of interest from transmitted laser signals. In this work, we present an FPGA-based data acquisition architecture with embedded processing capable of achieving both real-time and accurate gas detection. By leveraging real-time processing on-chip, we minimised the data transfer bandwidth requirement, hence enabling better resolution of data transferred for high-level processing. The proposed architecture has a significantly lower bandwidth requirement compared to both the conventional offline processing architecture and the standard I-Q architecture. Specifically, it is capable of reducing data transfer overhead by 25% compared to the standard I-Q method, and it only requires a fraction of the bandwidth needed by the offline processing architecture. The feasibility of the proposed architecture is demonstrated on a commercial off-the-shelf SoC board, where measurement results show that the proposed architecture has better accuracy compared to the standard I-Q demodulation architecture for the same signal bandwidth. The proposed DAQ system has potential for more accurate and fast real-time gas sensing. Full article
(This article belongs to the Special Issue Current Updates of Programmable Logic Devices and Synthesis Methods)
Figure 1
<p>Overview of the WMS system for gas sensing.</p>
Figure 2
<p>Block diagram of real-time lock-in.</p>
Figure 3
<p>RTL schematic of implemented real-time demodulation.</p>
Figure 4
<p>Spectral absorption profile of water around 1391.</p>
Figure 5
<p>Transmitted signal with 56 dB additive white Gaussian noise.</p>
Figure 6
<p>DAQ set-up for WMS. The AWG contains signals generated from realistic absorption data from the HITRAN database. The generated signals are digitised, processed to extract harmonics of interest, and transmitted to a high-level processor.</p>
Figure 7
<p>Comparison of accuracy of the proposed DAQ system with conventional techniques. In (<b>a</b>), raw transmitted samples were demodulated offline on a high-level processor. In (<b>b</b>,<b>c</b>), the standard I-Q demodulated technique and the proposed DAQ system operated in real time for a fixed bandwidth.</p>
Figure 8
<p>Comparison of mean squared errors. The proposed DAQ method maintains a lower error value compared to the standard I-Q method.</p>
16 pages, 1503 KiB  
Article
FPGA Implementation of Pillar-Based Object Classification for Autonomous Mobile Robot
by Chaewoon Park, Seongjoo Lee and Yunho Jung
Electronics 2024, 13(15), 3035; https://doi.org/10.3390/electronics13153035 - 1 Aug 2024
Viewed by 316
Abstract
With the advancement in artificial intelligence technology, autonomous mobile robots have been utilized in various applications. In autonomous driving scenarios, object classification is essential for robot navigation. To perform this task, light detection and ranging (LiDAR) sensors, which can obtain depth and height information and have higher resolution than radio detection and ranging (radar) sensors, are preferred over camera sensors. The pillar-based method employs a pillar feature encoder (PFE) to encode 3D LiDAR point clouds into 2D images, enabling high-speed inference using 2D convolutional neural networks. The pillar-based method is employed to ensure real-time responsiveness of autonomous driving systems, yet research on accelerating the PFE remains limited even though the PFE consumes a significant amount of computation time within the system. Therefore, this paper proposes a PFE hardware accelerator and pillar-based object classification model for autonomous mobile robots. The proposed object classification model was trained and tested using 2971 datasets comprising eight classes, achieving a classification accuracy of 94.3%. The PFE hardware accelerator was implemented in a field-programmable gate array (FPGA) through a register-transfer level design, which achieved a 40 times speedup compared with the firmware for the ARM Cortex-A53 microprocessor unit; the object classification network was implemented in the FPGA using the FINN framework. By integrating the PFE and object classification network, we implemented a real-time pillar-based object classification acceleration system on an FPGA with a latency of 6.41 ms. Full article
(This article belongs to the Special Issue System-on-Chip (SoC) and Field-Programmable Gate Array (FPGA) Design)
Figure 1
<p>Overview of the proposed acceleration system.</p>
Figure 2
<p>Operation mechanism of DSCNN.</p>
Figure 3
<p>Examples of dataset.</p>
Figure 4
<p>Configuration of dataset classes: (<b>a</b>) building; (<b>b</b>) tree; (<b>c</b>) vehicle; (<b>d</b>) bicycle; (<b>e</b>) obstacle; (<b>f</b>) greenery; (<b>g</b>) person; (<b>h</b>) urban fixture.</p>
Figure 5
<p>Architecture of DSCNN-based object classification network: (<b>a</b>) Network 1; (<b>b</b>) Network 2; (<b>c</b>) Network 3; (<b>d</b>) Network 4; (<b>e</b>) Network 5.</p>
Figure 6
<p>Architecture of proposed acceleration system on FPGA.</p>
Figure 7
<p>Block diagram of PFE accelerator.</p>
Figure 8
<p>Operation of the PU: (<b>a</b>) tensor-level operation; (<b>b</b>) matrix-level operation; (<b>c</b>) hardware-level operation.</p>
25 pages, 10247 KiB  
Article
Development of Power-Delay Product Optimized ASIC-Based Computational Unit for Medical Image Compression
by Tanya Mendez, Tejasvi Parupudi, Vishnumurthy Kedlaya K and Subramanya G. Nayak
Technologies 2024, 12(8), 121; https://doi.org/10.3390/technologies12080121 - 29 Jul 2024
Viewed by 491
Abstract
The proliferation of battery-operated end-user electronic devices due to technological advancements, especially in medical image processing applications, demands low power consumption, high-speed operation, and efficient coding. The design of these devices is centered on the Application-Specific Integrated Circuits (ASIC), General Purpose Processors (GPP), and Field Programmable Gate Array (FPGA) frameworks. The need for low-power functional blocks arises from the growing demand for high-performance computational units that are part of high-speed processors operating at high clock frequencies. The operational speed of the processor is determined by the computational unit, which is the workhorse of high-speed processors. A novel approach to integrating Very Large-Scale Integration (VLSI) ASIC design and the concepts of low-power VLSI compatible with medical image compression was embraced in this research. The focus of this study was the design, development, and implementation of a Power Delay Product (PDP) optimized computational unit targeted for medical image compression using ASIC design flow. This stimulates the research community’s quest to develop an ideal architecture, with an emphasis on minimizing power consumption and enhancing device performance for medical image processing applications. The study uses area, delay, power, PDP, and Peak Signal-to-Noise Ratio (PSNR) as performance metrics. The research work takes inspiration from this and aims to enhance the efficiency of the computational unit through minor design modifications that significantly impact performance. This research proposes to explore the trade-off of high-performance adder and multiplier designs to design an ASIC-based computational unit using low-power techniques to enhance the efficiency in power and delay. The computational unit utilized for the digital image compression process was synthesized and implemented using gpdk 45 nm standard libraries with the Genus tool of Cadence. A reduced PDP of 46.87% was observed when the image compression was performed on a medical image, along with an improved PSNR of 5.89% for the reconstructed image. Full article
Figure 1
<p>VLSI ASIC Design Flow.</p>
Figure 2
<p>Proposed Computational Unit Block Diagram.</p>
Figure 3
<p>Modified Carry Select Adder (CSLA) using ET Adders.</p>
Figure 4
<p>Multiplier based on Divide and Conquer approach.</p>
Figure 5
<p>Block diagram of One Hot Block.</p>
Figure 6
<p>Two-part Multiplicand Block.</p>
Figure 7
<p>Divide and Conquer Block diagram.</p>
Figure 8
<p>4 number range Block diagram.</p>
Figure 9
<p>4 × 4 Multiplier incorporating Iterative Carry Save SBETA.</p>
Figure 10
<p>8 × 8 Multiplier incorporating Iterative Carry Save SBETA.</p>
Figure 11
<p>16 × 16 Multiplier incorporating Iterative Carry Save SBETA.</p>
Figure 12
<p>Encoding block of DCT Image Compression.</p>
Figure 13
<p>Bit Transitioning for Binary and Gray Codes.</p>
Figure 14
<p>An 8-point Loeffler’s DCT flow diagram incorporating the computational unit.</p>
Figure 15
<p>Zig-zag scan pattern.</p>
Figure 16
<p>Decoding block of IDCT Image Compression.</p>
Figure 17
<p>PDP-PSNR Plot of various DCT architectures and the proposed Computational units 1—[<a href="#B43-technologies-12-00121" class="html-bibr">43</a>], 2—[<a href="#B44-technologies-12-00121" class="html-bibr">44</a>], 3—[<a href="#B45-technologies-12-00121" class="html-bibr">45</a>], 4—[<a href="#B46-technologies-12-00121" class="html-bibr">46</a>], 5—[<a href="#B47-technologies-12-00121" class="html-bibr">47</a>], 6—[<a href="#B48-technologies-12-00121" class="html-bibr">48</a>].</p>
Figure 18
<p>Original and reconstructed CT scan images obtained using image compression.</p>
Figure 19
<p>Physical Layout Design of the computational unit used for image compression.</p>
22 pages, 7285 KiB  
Article
Design and Application of an Onboard Particle Identification Platform Based on Convolutional Neural Networks
by Chaoping Bai, Xin Zhang, Shenyi Zhang, Yueqiang Sun, Xianguo Zhang, Ziting Wang and Shuai Zhang
Appl. Sci. 2024, 14(15), 6628; https://doi.org/10.3390/app14156628 - 29 Jul 2024
Viewed by 369
Abstract
Space radiation particle detection plays a crucial role in scientific research and engineering practice, especially in particle species identification. Currently, commonly used in-orbit particle identification techniques include telescope methods, electrostatic analysis time of flight (ESA × TOF), time-of-flight energy (TOF × E), and pulse shape discrimination (PSD). However, these methods usually fail to utilize the full waveform information containing rich features, and their particle identification results may be affected by the random rise and fall of particle deposition and noise interference. In this study, a low-latency and lightweight onboard FPGA real-time particle identification platform based on full waveform information was developed utilizing the superior target classification, robustness, and generalization capabilities of convolutional neural networks (CNNs). The platform constructs diversified input datasets based on the physical features of waveforms and uses Optuna and PyTorch software architectures for model training. The hardware platform is responsible for the real-time inference of waveform data and the dynamic expansion of the dataset. The platform was utilized for deep learning training and the testing of the historical waveform data of neutron and gamma rays; inference on a single waveform takes 4.9 microseconds, with an accuracy rate of over 97%. The classification expectation FOM (figure-of-merit) value of this CNN model is 133, which is better than the traditional pulse shape discrimination (PSD) algorithm’s FOM value of 0.8. The development of this platform not only improves the accuracy and efficiency of space particle discrimination but also provides an advanced tool for future space environment monitoring, which is of great value for engineering applications. Full article
Figure 1
<p>Structural diagram of the convolutional neural network particle identification platform.</p>
Figure 2
<p>(<b>a</b>) Schematic diagram of waveform signals of different particles with the same energy; (<b>b</b>) neutron and gamma waveform schematics.</p>
Figure 3
<p>Dataset construction flowchart.</p>
Figure 4
<p>Forward inference architecture construction flowchart.</p>
Figure 5
<p>Block diagram of the data flow of the convolutional layer.</p>
Figure 6
<p>Pooling layer data flow block diagram.</p>
Figure 7
<p>Block diagram of the full connectivity layer’s data flow.</p>
Figure 8
<p>Three types of training dataset.</p>
Figure 9
<p>Three types of validation dataset.</p>
Figure 10
<p>Three types of test dataset.</p>
Figure 11
<p>CNN network architecture diagram.</p>
Figure 12
<p>Training and validation set computation results.</p>
Figure 13
<p>Physical diagram of CNN operation platform.</p>
Figure 14
<p>(<b>a</b>) Distribution of neutron classification expectations. (<b>b</b>) Gamma classification expectation distribution.</p>
Figure 15
<p>The pulse shape and time window of neutron and gamma in CLYC.</p>
Figure 16
<p>Neutron–gamma PSD frequency curve.</p>