2018 7th International Conference on Modern Circuits and Systems Technologies (MOCAST)
In this paper, the Logarithmic Number System (LNS) is adopted to implement Long Short-Term Memory (LSTM), the basic component of a type of deep learning network. Initially, piecewise approximations to the activation functions σ and tanh are proposed and evaluated in LNS. Subsequently, LNS multipliers and adders are implemented for wordlengths of 9, 10, and 11 bits. The circuits are implemented in a 90-nm 1.0 V CMOS standard-cell library and quantitative results are reported. The results demonstrate that LNS is a good candidate for data representation and processing in deep learning networks, as an area reduction of up to 36% is possible.
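The core appeal of LNS for deep learning workloads is that multiplication reduces to addition of the log-domain representations. A minimal sketch, assuming an illustrative sign/log encoding and quantization grid (not the paper's exact format or wordlengths):

```python
import math

def to_lns(x, frac_bits=6):
    # Represent |x| by its base-2 logarithm, quantized to a fixed-point grid
    # with frac_bits fractional bits; the sign is kept separately.
    assert x != 0, "zero needs a special encoding in LNS"
    sign = 1 if x > 0 else -1
    log2x = round(math.log2(abs(x)) * 2**frac_bits) / 2**frac_bits
    return sign, log2x

def lns_mul(a, b):
    # In LNS, a multiplication is just an addition of the log parts
    # (and an XOR of the sign bits) -- no multiplier array is needed.
    (sa, la), (sb, lb) = a, b
    return sa * sb, la + lb

def from_lns(v):
    sign, l = v
    return sign * 2**l

prod = from_lns(lns_mul(to_lns(3.0), to_lns(0.5)))  # close to 1.5, up to quantization error
```

The adder, by contrast, is the expensive LNS operation (see the LUT-based addition discussed in a later abstract), which is why the wordlength trade-off matters.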
2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), 2019
Deep-learning-based decoders have recently been introduced for use with short-length codes. They have been found to act as a soft-decision decoding method, achieving near-maximum-likelihood error-correcting capability. However, deep-learning decoding methods are hard to implement, as they normally require millions of operations for inference. For deep-learning decoding to become a competitive candidate for practical applications, research effort is required to reduce the computational complexity and storage requirements of the neural networks involved. In this work, a structured flow is presented that significantly compresses a trained syndrome-based neural network decoder by pruning up to 80% of the network's weights and quantizing them to an 8-bit fixed-point representation, with no loss in BER performance. The resulting compressed neural network can then be used for inference, either by designing specific hardware or by using a generic deep-learning hardware accelerator that exploits the compressed structure of the network. The deployment of the DL decoder in an embedded application is showcased using the Xilinx AI Edge platform. To accomplish this, a simple method to obtain a computationally equivalent convolutional layer from a fully-connected one is introduced. Implementation results are provided for the compressed DL decoder, regarding throughput rate and BER performance. To our knowledge, this is the first DL decoder reported in hardware.
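The compression flow combines weight pruning with fixed-point quantization. A minimal sketch of one layer's treatment, assuming magnitude-based pruning and symmetric 8-bit quantization (the paper's exact criteria and retraining steps are not reproduced here):

```python
import numpy as np

def compress_layer(w, prune_ratio=0.8, bits=8):
    # Magnitude pruning: zero out the smallest prune_ratio fraction of the
    # weights, then quantize the survivors to signed fixed point.
    threshold = np.quantile(np.abs(w), prune_ratio)
    pruned = np.where(np.abs(w) >= threshold, w, 0.0)
    scale = np.max(np.abs(pruned)) / (2**(bits - 1) - 1)
    q = np.round(pruned / scale).astype(np.int8)
    return q, scale  # dequantize as q * scale at inference time

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16)).astype(np.float32)  # stand-in for trained weights
q, scale = compress_layer(w)
sparsity = float(np.mean(q == 0))  # roughly 0.8
```

In practice the pruned network would be fine-tuned before quantization to recover any BER loss; the sketch only shows the data transformation.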
2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2018
Deep-learning technology proliferates across a wide spectrum of applications. Modern communications applications require innovative solutions that can deliver near-optimum performance in diverse and evolving environments. This paper proposes several substantial simplifications to deep-learning networks applied to the decoding of linear block codes. All proposed techniques reduce the computational and interconnection complexity required for inference in a deep-learning network over prior art. The proposed techniques build on inducing or exploiting sparsity in the trained network. Complexity savings of 60% to more than 80% are achieved without any practical degradation in decoding performance, quantified as coding gain.
2021 IEEE International Symposium on Circuits and Systems (ISCAS), 2021
In this paper, a simplified hardware implementation of a dot-product arithmetic operation with constant coefficients is presented. The proposed methodology exploits a combination of distributed arithmetic and common-subexpression-sharing techniques. An algorithm is introduced for systematically identifying the common partial sums. Subsequently, a hardware architecture is proposed, and the obtained circuits are synthesized in a 90-nm 1.0 V CMOS standard-cell library using Synopsys Design Compiler. Comparisons reveal significant reductions of 52% in area and 23% in power, respectively, at a 1.5 ns delay over a regular constant-coefficient dot-product multiplier.
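Distributed arithmetic replaces the multipliers in a constant-coefficient dot product with a lookup table indexed by one bit of each input per cycle, followed by shift-accumulation. A minimal bit-serial sketch for unsigned inputs, with hypothetical coefficients (the paper's subexpression-sharing step, which would further compress the table, is not shown):

```python
# Distributed arithmetic evaluates y = sum(c_i * x_i) for constant c_i by
# precomputing, for every possible slice of the inputs' j-th bits, the sum
# of the selected constants, then shift-accumulating over bit positions.
COEFFS = [3, 5, 7, 2]   # hypothetical constant coefficients
WIDTH = 8               # unsigned input wordlength (assumption)

# LUT indexed by a 4-bit pattern: one bit taken from each of the 4 inputs.
LUT = [sum(c for i, c in enumerate(COEFFS) if (pattern >> i) & 1)
       for pattern in range(2**len(COEFFS))]

def da_dot(xs):
    acc = 0
    for j in reversed(range(WIDTH)):      # from MSB down to LSB
        # Gather bit j of every input into one LUT address.
        pattern = sum(((x >> j) & 1) << i for i, x in enumerate(xs))
        acc = (acc << 1) + LUT[pattern]   # shift-accumulate
    return acc
```

The LUT grows as 2^N for N inputs, which is exactly where common-subexpression sharing across partial sums pays off in hardware.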
2021 10th International Conference on Modern Circuits and Systems Technologies (MOCAST), 2021
In this paper, a parallel neural network architecture is proposed, targeting efficient hardware implementation on low-resource devices. Following the introduction of the proposed technique, the novel concept is applied to two basic function-approximation examples, namely cos(x) and sin(x). Quantitative results are offered and discussed in terms of accuracy and hardware complexity. It is shown that the proposed technique achieves promising results for low-power, high-performance hardware implementations targeted at edge devices.
This paper introduces hardware architectures for encoding Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) codes. The proposed encoders are based on appropriate factorization and subsequent compression of the involved matrices by means of a novel technique that exploits features of recursively constructed QC-LDPC codes. The particular approach achieves linear encoding time complexity and requires a constant number of clock cycles to compute the parity bits for all constructed codes of various lengths that stem from a common base matrix. The proposed architectures are flexible, as they are parameterized and can support multiple code rates and codes of different lengths simply by appropriate initialization of memories and selection of data-bus widths. Implementation results show that the proposed encoding technique is more efficient for some LDPC codes than previously proposed solutions. Both serial and parallel architectures are proposed. Hardware instantiations of the proposed serial encoders demonstrate high throughput with low area complexity for codewords of many thousands of bits, achieving area reduction compared to prior art. Furthermore, parallelization is shown to efficiently support multi-Gbps solutions at the cost of a moderate area increase. The proposed encoders are shown to outperform the current state of the art in terms of throughput-to-area ratio and area-time complexity by 10 to 80 times for codes of comparable error-correction strength.
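The quasi-cyclic structure is what makes compact encoders possible: every block of the matrix is a circulant, so a matrix-vector product over GF(2) reduces to cyclic shifts and XORs. A toy sketch with hypothetical parameters (not a code from the paper, and the shift-direction convention is an assumption):

```python
import numpy as np

Z = 4                      # circulant (lifting) size
SHIFTS = [[1, 3], [0, 2]]  # cyclic shift of each Z x Z block of the matrix

def qc_mat_vec(shifts, v):
    # v is split into Z-bit chunks; each circulant block contributes a
    # cyclically shifted copy of its input chunk, accumulated with XOR.
    out = np.zeros((len(shifts), Z), dtype=np.uint8)
    chunks = v.reshape(-1, Z)
    for r, row in enumerate(shifts):
        for c, s in enumerate(row):
            out[r] ^= np.roll(chunks[c], -s)   # shift left by s (convention assumed)
    return out.reshape(-1)

v = np.array([1, 0, 0, 0, 0, 1, 0, 0], dtype=np.uint8)
y = qc_mat_vec(SHIFTS, v)
```

In hardware, each `np.roll` is just a barrel shifter or rewired bus, which is why the encoder cost scales with the base matrix rather than with the full parity-check matrix.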
This paper discusses techniques for low-power addition/subtraction in the logarithmic number system (LNS) and evaluates their impact on digital filter implementation. Initially, the impact of partitioning the look-up tables (LUTs) required for addition/subtraction on complexity, performance, and power dissipation is studied. Subsequently, techniques for the low-power implementation of an LNS multiply-accumulate (MAC) unit are investigated. The obtained LNS MACs are used for the design of digital filters. Synthesis of LNS-based digital filters using a 0.18 μm 1.8 V CMOS standard-cell library reveals that significant power-dissipation savings are possible at no performance penalty, compared to linear two's-complement equivalents.
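LNS addition is the operation that needs those look-up tables: adding two numbers given only their logarithms requires evaluating log2(1 + 2^d), which hardware typically tabulates. A minimal sketch of the underlying identity, for positive operands (the paper's LUT partitioning and quantization are not modeled):

```python
import math

def sb(d):
    # The "sum" function tabulated in LNS adder LUTs; d = small - big <= 0,
    # so sb(d) lies in (0, 1] and shrinks quickly as d decreases.
    return math.log2(1 + 2**d)

def lns_add(la, lb):
    # log2(2^la + 2^lb) = max + sb(min - max); both operands positive.
    big, small = max(la, lb), min(la, lb)
    return big + sb(small - big)

s = lns_add(math.log2(3.0), math.log2(5.0))  # log2(8) up to float rounding
```

Subtraction uses the analogous function log2(1 - 2^d), which is harder to approximate near d = 0 and dominates the LUT cost that the partitioning study targets.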
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 1997
In this paper, a systematic graph-based methodology for synthesizing VLSI RNS architectures using full adders as the basic building block is introduced. The design methodology derives array architectures starting from the algorithm level and ending with the bit-level design. Using the regular array processor as the target architectural style, the proposed procedure constructs the two-dimensional (2-D) dependence graph of the bit-level algorithm, which is formally described by sets of uniform recurrent equations. The main characteristic of the proposed architectures is that they can operate at very high throughput rates. The proposed architectures exhibit significantly reduced complexity over ROM-based ones.
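For context, the Residue Number System (RNS) represents an integer by its residues modulo a set of pairwise-coprime moduli, so addition and multiplication act independently, carry-free, on each channel — which is what makes regular full-adder arrays attractive. A minimal sketch with a hypothetical moduli set:

```python
MODULI = (3, 5, 7)  # hypothetical pairwise-coprime moduli; dynamic range 3*5*7 = 105

def to_rns(x):
    # Each channel holds x reduced by one modulus.
    return tuple(x % m for m in MODULI)

def rns_mul(a, b):
    # Channels multiply independently -- no carries propagate between them.
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

product = rns_mul(to_rns(8), to_rns(9))  # equals to_rns(72), since 72 < 105
```

Recovering the integer from its residues (via the Chinese Remainder Theorem) is the costly step; the paper's architectures target the per-modulus arithmetic itself.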
IEEE Transactions on Circuits and Systems I: Regular Papers, 2000
A graph-based technique is introduced for the design of a class of residue arithmetic multipliers, as well as a family of new high-radix digit adders. The proposed design technique derives simple high-radix modulo-r^n multipliers by optimally selecting, among the variety of introduced digit adders, the ones that compose a minimal-area multiplier. The proposed technique minimizes multiplier complexity by selecting ...
IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 2004
... Giorgos Dimitrakopoulos and Vassilis Paliouras, Member, IEEE ... transform (DFT) [7]. Elleithy and Bayoumi presented an architecture for modular multiplication, which consists mainly of modular adders and is suitable for medium and large moduli [8]. Stouraitis et al. ...
... The remainder of the paper is as follows: Section 2 discusses the basics of OFDM transmission and defines PAPR, the efficiency of a class-A power amplifier, and its relationship to PAPR reduction. Section 3 outlines the PTS approach. ...
... 2. BASIC NOTATION This section discusses the basics of OFDM transmission, defines PAPR and Crest Factor (CF), and outlines the PTS approach. Initially, the binary input data are mapped onto QAM symbols. An IFFT/FFT pair is used as a modulator/demodulator. ...