Speech

ham radio details

Uploaded by

pravin2275767
Speech:

Speech is our primary mode of communication. When you want to communicate
something important, you say it face-to-face. Everything important is
communicated in spoken form.

Speech is about communication. A characteristic trait of humans in
comparison to other animals is our refined ability to communicate. To work
efficiently as a group, we need to communicate. To learn from our mistakes,
we need to communicate. While hand waving and smoke signals can be used to
communicate, speech remains our best way to convey abstract thoughts. An
important difference between speech and images, however, is that where
pictures excel in the transmission of information, speech excels in
interaction.

Telecommunication:

Telecommunications was another milestone in human history. Though the
telegraph was an effective way of communicating, it required specialized
training. The invention of the telephone in 1849 was therefore a major
advance, because it was the first technology to provide instantaneous
telecommunication without specialized training.

The first wireless (radio) transmission of speech came 50 years later, in
1900, and radio quickly became an important broadcast medium. While
newspapers had an important role in broadcasting news, the radio was faster
and more accessible (it does not require the ability to read).

If technology can make communication with speech easier, it can be very
useful. For example, if telecommunication such as telephony,
teleconferencing, and voice-over-IP can be improved, people will be able to
use speech more efficiently.

We can use people's preference for speech communication to our advantage.
For example, interactions with devices and computers could be improved by
allowing spoken interaction with them. In particular, typing on a keyboard
and using other tactile interfaces are difficult for children, the elderly
and people with disabilities, whereas most people can speak. Similarly, user
interfaces based on visual information often rely on accessing information
and services through menus. Natural language can be more intuitive and
simpler to use; we could just say to the washing machine "Wash this small
number of dirty curtains." instead of searching for washing options in a
menu.

Devices and services which use speech and language are extremely
widespread. By now, a majority of people in the world have access to a
mobile phone and there are almost 8 billion active mobile-phone
subscriptions. If we can improve the technology used by those 8 billion
subscriptions, say by reducing energy consumption, then the impact of such
improvements would be enormous.

SPEECH ENHANCEMENT

When using speech technology in real environments, we are often faced with
less than perfect signal quality. For example, if you make a phone call in a
cafeteria, there are typically plenty of other people speaking in the
background, there could be music playing, and the room itself can have
reverberation. Such effects distort the desired speech signal such that, at
the receiving end, the desired speech sounds less pleasant, requires more
effort to understand or, in the worst case, becomes less intelligible.
Speech enhancement refers to methods which try to reduce such distortions,
to make speech sound more pleasant, reduce listening effort and improve
intelligibility.

The most prominent categories of speech enhancement are:

• Noise attenuation, where we try to extract the desired speech signal
  when it is distorted by background noise(s).
• Echo cancellation and feedback cancellation, which are used when the
  sound played from a loudspeaker is picked up by a microphone, distorting
  the desired signal.
• Dereverberation, which refers to methods that attenuate the effect of
  room acoustics on the desired signal.
• Source separation, where methods try to extract the sounds of single
  sources from a mixture. For example, in the classical cocktail-party
  problem, we would like to isolate single speakers when multiple people
  are talking at the same time.
• Beamforming, which refers to spatially selective methods where the
  objective is to isolate sounds coming from a particular direction by
  using information about the spatial separation of a set of microphones.
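
As an illustration of the last category, the simplest spatially selective method is a delay-and-sum beamformer. The sketch below is not taken from the proposal; it assumes that the arrival delay of the desired source at each microphone is already known as a whole number of samples, and the names `delay_and_sum`, `channels` and `delays` are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer for integer sample delays.

    channels: list of equally sampled microphone signals (lists of floats).
    delays:   delays[i] is the number of samples by which the desired
              source arrives late at microphone i (assumed known, >= 0).
    Returns the average of the time-aligned channels.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    # Only the part where every aligned channel still has samples is usable.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    if length <= 0:
        raise ValueError("delays are longer than the signals")
    output = []
    for n in range(length):
        # Skip each channel's delay so the desired source lines up,
        # then average across microphones.
        aligned = [ch[d + n] for ch, d in zip(channels, delays)]
        output.append(sum(aligned) / len(channels))
    return output
```

Sound from the chosen direction adds up coherently after alignment, while sound from other directions is misaligned and partly averages out. Practical beamformers also need to estimate the delays and handle fractional delays, which this sketch leaves out.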

The objective of speech enhancement, however, requires a bit more
consideration. In its most classical form, the objective is to extract a
clean speech signal from a distorted mixture, where the distortions can be
background and sensor noises, as well as room reverberation. Here the clean
reference signal is considered to be the signal which would be recorded with
a microphone close to the speaker and which does not contain said noises or
reverberation. Obtaining such a reference is challenging, since even a
microphone close to the speaker will usually pick up background noise and
the effect of reverberation. For the development of methods, it is therefore
often difficult to obtain data which accurately corresponds to a realistic
situation. In any case, a typical objective would be to improve the
signal-to-noise ratio (with or without perceptual weighting) as much as
possible.
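
The signal-to-noise ratio mentioned above can be computed directly when a clean reference is available. The following is a minimal sketch without perceptual weighting; it assumes time-aligned signals of equal length, and the function name `snr_db` is hypothetical.

```python
import math


def snr_db(clean, processed):
    """Signal-to-noise ratio in dB, treating (processed - clean) as noise.

    Both inputs are equally long, time-aligned sequences of samples.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    noise_energy = sum((p - s) ** 2 for s, p in zip(clean, processed))
    if noise_energy == 0:
        return math.inf  # the processed signal equals the reference
    if signal_energy == 0:
        return -math.inf  # the reference is silent, only noise remains
    return 10.0 * math.log10(signal_energy / noise_energy)
```

The benefit of an enhancement method is then the difference between `snr_db(clean, enhanced)` and `snr_db(clean, noisy)`. As noted above, this requires a clean reference, which is hard to obtain for real recordings.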

A more challenging scenario is when two or more persons are speaking in the
same acoustic environment. The second speaker can then be viewed as a
competing speaker (undesired source) or as a discussion partner (desired
source). Even when the two speakers are interacting with each other, they
will often speak on top of each other, although stereotypically we think of
dialogue as a back-and-forth exchange of non-overlapping arguments. If we
want to separate the two speakers, overlaps are difficult, because the
statistics of both speech signals will be rather similar, whereas noise
signals with distinct statistics are easier to attenuate.

Sometimes we do not want to remove all distortions entirely but just
attenuate their effect. Completely removing artefacts can make the signal
sound unnatural and, besides removing distortions, processing methods almost
always distort the desired signal as well. Therefore, to retain a
natural-sounding signal and to minimize distortion of the desired speech
signal, we often limit the extent to which distortions are removed.
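
One common way to limit the extent of removal is to put a floor on the attenuation gain. The sketch below is an illustration rather than the method used in the project: it assumes per-frequency-bin power values for the noisy signal and for a noise estimate, uses power spectral subtraction to obtain the gain, and the name `limited_gains` and the 12 dB default are hypothetical choices.

```python
import math


def limited_gains(noisy_power, noise_power, max_attenuation_db=12.0):
    """Spectral-subtraction gains per frequency bin, with a gain floor.

    noisy_power: power of the noisy signal in each bin.
    noise_power: estimated noise power in each bin (assumed given).
    No bin is attenuated by more than max_attenuation_db.
    """
    floor = 10.0 ** (-max_attenuation_db / 20.0)
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        # Estimated speech power; clipped at zero when the noise
        # estimate exceeds the observed power.
        speech_power = max(p_noisy - p_noise, 0.0)
        gain = math.sqrt(speech_power / p_noisy) if p_noisy > 0 else 0.0
        # The floor keeps some residual noise instead of muting the bin.
        gains.append(max(gain, floor))
    return gains
```

Each gain multiplies the magnitude of the corresponding bin of the noisy spectrum. A larger `max_attenuation_db` removes more noise at the cost of more audible processing artefacts.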

A further aspect of enhancement is intelligibility and pleasantness. As a
starting point, observe that the speech of some people is by nature
difficult to understand or otherwise just annoying (unpleasant). It is then
conceivable that we devise some processing which improves the speech signal
to better than the original. What "sounds better" is, however, a difficult
concept, since we do not have unambiguous measures for "how good it sounds"
and opinions between listeners will certainly diverge.

Intelligibility for human listeners is as complicated to measure as
pleasantness, but luckily, we can use speech recognition engines to obtain
objective measures. That is, if we give noisy and improved speech signals to
a speech recognizer, we can determine the recognition performance in both
cases to estimate the benefit obtained with our processing.
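
Recognition performance is commonly summarized as the word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes whitespace-separated transcripts; the function name `word_error_rate` is hypothetical, and the recognizer itself is outside the scope of the example.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / N,
    where N is the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Running the same recognizer on the noisy and on the enhanced recording and comparing the two WER values against the same reference transcript gives an objective estimate of the benefit of the processing.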

Stage  Description                                          Duration (months)
1      Literature survey                                    1
2      Database collection in HF/VHF/UHF modes in the       2
       lab environment
3      Database collection in HF/VHF/UHF modes in the       2
       moving vehicle environment
4      Database collection in HF/VHF/UHF modes in the       2
       factory noise environment
5      Study on various speech denoising techniques         1
6      Implementation and performance evaluation of         2
       various speech denoising techniques
7      Speech database and technical report submission      1
       to CAIR-DRDO

The detailed description of each stage of the project is as follows:

1. Literature survey: A detailed literature survey on speech data
   collection over radio channels using ham radio.
2. Database collection in HF/VHF/UHF modes in a lab environment: The
   voice samples of the speakers are collected in the lab/office
   environment with no or minimal disturbances/background noise. The
   voice samples are collected with different antenna positions
   (horizontal, vertical and angular) and with devices of different
   makes to capture the variations.
3. Database collection in HF/VHF/UHF modes in a moving vehicle
   environment: The voice samples of the speakers are collected in a
   moving vehicle environment to capture the disturbances/background
   noise. The voice samples are collected with different antenna
   positions (horizontal, vertical and angular) and with devices of
   different makes to capture the variations.
4. Database collection in HF/VHF/UHF modes in a factory environment:
   The voice samples of the speakers are collected in a factory
   environment where the sounds/harmonics of the running machines are
   captured as disturbances/background noise. The voice samples are
   collected with different antenna positions (horizontal, vertical and
   angular) and with devices of different makes to capture the
   variations.
5. Study on various speech denoising techniques: A detailed literature
   survey on state-of-the-art speech denoising techniques and denoising
   techniques based on deep learning.
6. Implementation and performance evaluation of various speech
   denoising techniques: Implementation and performance comparison of
   state-of-the-art speech denoising techniques and denoising
   techniques based on deep learning.
7. Speech database and technical report submission to CAIR-DRDO: The
   collected speech database, which satisfies all the criteria, will be
   handed over to CAIR-DRDO along with the detailed technical report.

Database Specifications:

Item                  Details
No. of speakers       250
Data type             Speech data
Sampling rate         8 kHz
Sampling format       1 channel, 16-bit resolution
Language              Indian English
Type of speech        Isolated words, digits and sentences
Acoustic environment  Office/moving vehicle/factory
Channel               Radio channel
Duration              Min 30 mins/language/speaker/channel
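
The specification above can be checked automatically for each recording, and it also fixes a lower bound on the corpus size. The following sketch assumes that recordings are stored as uncompressed PCM WAV files, which the proposal does not state; the names `EXPECTED`, `check_wav_spec` and `min_corpus_bytes` are hypothetical.

```python
import wave

# Values taken from the database specification table.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 8000}


def check_wav_spec(path):
    """Return (mismatches, duration_s) for one WAV file.

    mismatches maps each failing item to (expected, actual);
    it is empty when the file matches the specification.
    """
    with wave.open(path, "rb") as wav:
        actual = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }
        duration_s = wav.getnframes() / wav.getframerate()
    mismatches = {key: (EXPECTED[key], actual[key])
                  for key in EXPECTED if actual[key] != EXPECTED[key]}
    return mismatches, duration_s


def min_corpus_bytes(speakers=250, minutes_per_speaker=30,
                     sample_rate_hz=8000, bytes_per_sample=2, channels=1):
    """Raw PCM size of the minimum required audio, excluding file headers."""
    seconds = speakers * minutes_per_speaker * 60
    return seconds * sample_rate_hz * bytes_per_sample * channels
```

With the values from the table, `min_corpus_bytes()` gives 7,200,000,000 bytes, about 7.2 GB of raw audio for each combination of language and channel that must meet the 30-minute minimum.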

Proposed Technical Approach:

[Block diagram: speech data collection in HF/VHF/UHF modes in the lab
environment, the moving vehicle environment and the factory environment ->
speech denoising algorithms -> clean speech]

The proposed technical approach of the project titled “Evaluation of
Denoising Algorithms on Speech Corpus Created over radio channels” is shown
in the above block diagram. The first step is to collect the voice samples
of the speakers through the HF/VHF/UHF frequency modes in the lab, moving
vehicle and factory environments. The data collection will be done with
different antenna orientations (horizontal, vertical and angular) to capture
all the possible variations in the speech data. The next step is to apply
speech denoising techniques to remove the background disturbances from the
speech files. In this step, state-of-the-art techniques and deep learning
methods will be implemented for denoising the speech, and their performances
will be compared. The output of the denoising algorithms is clean speech,
free from background disturbances.
