Search Results (36)

Search Parameters:
Keywords = voice over IP

26 pages, 3237 KiB  
Article
QoS-Aware Power-Optimized Path Selection for Data Center Networks (Q-PoPS)
by Mohammed Nsaif, Gergely Kovásznai, Ali Malik and Ruairí de Fréin
Electronics 2024, 13(15), 2976; https://doi.org/10.3390/electronics13152976 - 28 Jul 2024
Viewed by 656
Abstract
Data centers consume significant amounts of energy, contributing indirectly to environmental pollution through greenhouse gas emissions during electricity generation. According to the Natural Resources Defense Council, information and communication technologies and networks account for roughly 10% of global energy consumption. Reducing power consumption in Data Center Networks (DCNs) is crucial, especially given that many data center components operate at full capacity even under low traffic conditions, resulting in high costs for both service providers and consumers. Current solutions often prioritize power optimization without considering Quality of Service (QoS). Services such as video streaming and Voice over IP (VoIP) are particularly sensitive to loss and delay, which must be kept below certain thresholds. This paper introduces a novel framework called QoS-Aware Power-Optimized Path Selection (Q-PoPS) for software-defined DCNs. The objective of Q-PoPS is to minimize DCN power consumption while providing an acceptable QoS that meets the requirements of DCN services. This paper describes the implementation of a prototype of the Q-PoPS framework that leverages the POX Software-Defined Networking (SDN) controller. The performance of the prototype is evaluated using the Mininet emulator. Our findings demonstrate the performance of the proposed Q-PoPS algorithm in three scenarios. Best case: enhancing real-time traffic protocol quality without increasing power consumption. Midrange case: replacing bottleneck links while preserving real-time traffic quality. Worst case: identifying new paths that may increase power consumption but maintain real-time traffic quality. This paper underscores the need for a holistic approach to DCN management, optimizing both power consumption and QoS for critical real-time applications. We present the Q-PoPS framework as evidence that such an approach is achievable.
(This article belongs to the Section Networks)
37 pages, 18482 KiB  
Article
Active Queue Management in L4S with Asynchronous Advantage Actor-Critic: A FreeBSD Networking Stack Perspective
by Deol Satish, Jonathan Kua and Shiva Raj Pokhrel
Future Internet 2024, 16(8), 265; https://doi.org/10.3390/fi16080265 - 25 Jul 2024
Cited by 1 | Viewed by 759
Abstract
Bufferbloat is one of the leading causes of high data transmission latency and jitter on the Internet, which severely impacts the performance of low-latency interactive applications such as online streaming, cloud-based gaming/applications, Internet of Things (IoT) applications, voice over IP (VoIP), real-time video conferencing, and so forth. There is currently a pressing need to develop Transmission Control Protocol (TCP) congestion control algorithms and bottleneck queue management schemes that can collaboratively control and reduce end-to-end latency, thus ensuring optimal quality of service (QoS) and quality of experience (QoE) for users. This paper introduces a novel solution by experimentally integrating the low latency, low loss, and scalable throughput (L4S) architecture (specified by the IETF in RFC 9330) into the FreeBSD framework with the asynchronous advantage actor-critic (A3C) reinforcement learning algorithm. The first phase involves incorporating a modified dual-queue coupled active queue management (AQM) system for L4S into the FreeBSD networking stack, enhancing queue management and mitigating latency and packet loss. The second phase employs A3C to adjust and fine-tune the system performance dynamically. Finally, we evaluate the proposed solution’s effectiveness through comprehensive experiments, comparing it with traditional AQM-based systems. This paper contributes to the advancement of machine learning (ML) for transport protocol research. The experimental implementation and results presented in this paper are made available through our GitHub repositories.
32 pages, 31472 KiB  
Article
Studying the Impact of Different TCP DoS Attacks on the Parameters of VoIP Streams
by Ivan Nedyalkov
Telecom 2024, 5(3), 556-587; https://doi.org/10.3390/telecom5030029 - 8 Jul 2024
Viewed by 710
Abstract
In today’s digital world, no one and nothing is safe from potential cyberattacks. There is also no 100% protection from such attacks. Therefore, it is advisable to study the effects of different cyberattacks on the performance of the specific devices under attack. In this work, a study was carried out to determine how individual TCP DoS attacks affect the parameters of VoIP (Voice over IP) voice and video streams. For the purpose of this work, a model of a simple IP network was created using the GNS3 IP network-modeling platform. The VoIP platform used was Asterisk Free PBX. Tools from Kali Linux were used to implement the individual TCP DoS attacks; IP-network-monitoring tools and round-trip-delay-measurement tools were also used. The proposed study is applicable to multiple VoIP platforms wherein voice and video traffic are passed/processed by the VoIP server. From the obtained results, it was found that Asterisk Free PBX is very well secured against TCP DoS attacks, which do not affect the platform performance or the parameters of the voice and video streams. The values of the observed parameters, such as jitter, packet loss, and round-trip delay, remain far below the maximum allowable values. We also observed a low load on the CPU and RAM of the system during the whole study.
24 pages, 943 KiB  
Article
Navigating Legal and Regulatory Frameworks to Achieve the Resilience and Sustainability of Indigenous Socioecological Systems
by Stephen Chitengi Sakapaji, Jorge García Molinos, Varvara Parilova, Tuyara Gavrilyeva and Natalia Yakovleva
Resources 2024, 13(4), 56; https://doi.org/10.3390/resources13040056 - 8 Apr 2024
Cited by 1 | Viewed by 1869
Abstract
The sustainability of Indigenous Socioecological Systems (ISES) largely depends on well-crafted policy regulations. In particular, Indigenous traditional food systems (ITFS) are an essential component of ISES that provide a variety of culturally accepted, healthy foods while also playing an important role in cultural, spiritual, and economic value to the Indigenous people (IP). Thus, sustainably managing these traditional natural resources must be a priority. As custodians of much of the world’s ecological system, IP have, for generations, exhibited sustainable lifestyles in governing these systems. However, Indigenous perspectives and voices have not been properly reflected in the ISES sustainability discourse, and few comparative case studies have addressed this issue. This study contributes to fill this research gap using a desktop research method based on the Political Ecological Theoretical Framework (PETF) to examine how existing regulatory policies may affect the resilience and sustainability of ISES-ITFS, especially in relation to growing environmental and climatic pressures. Two Indigenous communities, the Karen in Thailand and different Indigenous groups in the Republic of Sakha (Yakutia) in Russia, are examined as case studies. Our study provides crucial insight that should help the development of robust policy interventions that integrate Indigenous concerns into policies and regulations, emphasizing self-determination, cultural preservation, and land rights. The findings emphasize the necessity for comprehensive legal frameworks prioritizing Indigenous involvement and concerns in climate and sustainability policy implementations. The ultimate goal is to foster meaningful dialogues between policymakers and IP in navigating the climate and sustainability challenges of our time. Full article
Figure 1
Location of our case studies in (a) Thailand and (b) the Sakha Republic. The maps provide approximate distributions of (a) the Karen People in Thailand and (b) the main Indigenous minority Peoples of the North, Siberia, and the Far East in the Sakha Republic. Panel (a) also depicts the location of the Thung Yai Naresuan Wildlife Sanctuary where the Sanephong and Koh Sadueng Karen communities discussed in the text are located. White areas of the Sakha Republic in (b) are ethnically dominated by the Yakuts (Sakha People), a large Turkic ethnic group [33].
Figure 2
Conceptual diagram showing the Political Ecological Theoretical Framework model in the context of our research paper. Social, political, economic, and environmental factors defining the dimensions of the interaction between IP and other actors (national and regional regulatory bodies, industries, research institutions…) shape the formation and implementation of legal and regulatory frameworks that can impact (positively or negatively) the resilience and sustainability of the ISES and ITFS. In the context of this study, over and above the direct effects of these dimensions on ISES-ITFS (grey arrows), we focus on the legal and regulatory framework as an instrument of power channeling and articulating the effects of the different dimensions on the ISES-ITFS (blue arrows). The double head of the grey arrows symbolizes the possibility for IP to exert power on the legal and regulatory system through their actions and agency (e.g., litigation, political representation, public awareness) on all or some of these dimensions.
23 pages, 6802 KiB  
Article
Non-Face-to-Face P2P (Peer-to-Peer) Real-Time Token Payment Blockchain System
by Hyug-Jun Ko, Seong-Soo Han and Chang-Sung Jeong
Appl. Sci. 2023, 13(13), 7364; https://doi.org/10.3390/app13137364 - 21 Jun 2023
Cited by 1 | Viewed by 1838
Abstract
With the increase in intelligent voice phishing and the increasing reliance on open banking systems, there has been a rise in cases where individuals’ personal information has been exposed, resulting in significant financial losses for the victims. Non-face-to-face transactions in the financial sector face challenges such as customer identification, ensuring transaction integrity and preventing transaction rejection. Blockchain-based distributed ledgers have been proposed as a solution but their adoption is limited due to the difficulty of managing private keys and the burden of gas fees management. This paper proposes a non-face-to-face P2P real-time token payment system that minimizes the risk of key loss by storing private keys in a keystore file and database through a server-based key management module. The proposed system simplifies token creation and management through a server-based token management module and implements an automatic gas-charging function for smooth token transactions. Transaction integrity and non-repudiation are ensured through a transaction confirmation module that uses transaction IDs without exposing personal information. Furthermore, advanced security measures such as blocking foreign IP access and DDoS defense are implemented to securely protect user data. The proposed system aims to provide a convenient, secure and accessible online payment solution to the public by implementing a self-authentication function using a web application that is not limited to smartphones or application platforms. Full article
(This article belongs to the Special Issue Blockchain and Intelligent Networking for Smart Applications)
Figure 1
Generate a derived key.
Figure 2
Create a cipher text.
Figure 3
Create MAC from derived key and cipher text.
Figure 4
Decrypt cipher.
Figure 5
Symverse transaction processing flow [31].
Figure 6
Token payment blockchain platform.
Figure 7
Control flow of system.
Figure 8
Transaction creation process through private key signature.
Figure 9
Token management module.
Figure 10
Token transfer module.
Figure 11
DDoS block module.
Figure 12
Real-name authentication module.
Figure 13
Control flow diagram for a balance inquiry.
Figure 14
Control flow diagram for send token.
Figure 15
UI of P2P payment system: (a) interface of convert token; (b) history of convert status; (c) interface of transfer token; (d) history of transfer token.
Figure 16
Experiment environment.
Figure 17
TC1 active number of threads vs. response time: (a) without DDoS protection; (b) with DDoS protection. (Purple: User10, Red: User20, Yellow: User50.)
Figure 18
TC3 active number of threads vs. response time: (a) without DDoS protection; (b) with DDoS protection.
14 pages, 1848 KiB  
Article
NISQE: Non-Intrusive Speech Quality Evaluator Based on Natural Statistics of Mean Subtracted Contrast Normalized Coefficients of Spectrogram
by Shakeel Zafar, Imran Fareed Nizami, Mobeen Ur Rehman, Muhammad Majid and Jihyoung Ryu
Sensors 2023, 23(12), 5652; https://doi.org/10.3390/s23125652 - 16 Jun 2023
Viewed by 1258
Abstract
With the evolution in technology, communication based on the voice has gained importance in applications such as online conferencing, online meetings, voice-over internet protocol (VoIP), etc. Limiting factors such as environmental noise, encoding and decoding of the speech signal, and limitations of technology may degrade the quality of the speech signal. Therefore, there is a requirement for continuous quality assessment of the speech signal. Speech quality assessment (SQA) enables the system to automatically tune network parameters to improve speech quality. Furthermore, there are many speech transmitters and receivers that are used for voice processing including mobile devices and high-performance computers that can benefit from SQA. SQA plays a significant role in the evaluation of speech-processing systems. Non-intrusive speech quality assessment (NI-SQA) is a challenging task due to the unavailability of pristine speech signals in real-world scenarios. The success of NI-SQA techniques highly relies on the features used to assess speech quality. Various NI-SQA methods are available that extract features from speech signals in different domains, but they do not take into account the natural structure of the speech signals for assessment of speech quality. This work proposes a method for NI-SQA based on the natural structure of the speech signals that are approximated using the natural spectrogram statistical (NSS) properties derived from the speech signal spectrogram. The pristine version of the speech signal follows a structured natural pattern that is disrupted when distortion is introduced in the speech signal. The deviation of NSS properties between the pristine and distorted speech signals is utilized to predict speech quality. 
The proposed methodology shows better performance in comparison to state-of-the-art NI-SQA methods on the Centre for Speech Technology Voice Cloning Toolkit corpus (VCTK-Corpus) with a Spearman’s rank-ordered correlation constant (SRC) of 0.902, Pearson correlation constant (PCC) of 0.960, and root mean squared error (RMSE) of 0.206. Conversely, on the NOIZEUS-960 database, the proposed methodology shows an SRC of 0.958, PCC of 0.960, and RMSE of 0.114. Full article
(This article belongs to the Section Intelligent Sensors)
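The three agreement metrics quoted in the abstract above (SRC, PCC, and RMSE) can be computed as in the following minimal sketch. It uses only the Python standard library; the function names and the toy score lists are illustrative assumptions and are not taken from the authors' evaluation code.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical subjective scores and predicted scores for five utterances.
subjective = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.4, 3.8, 4.9]
print(spearman(subjective, predicted), pearson(subjective, predicted),
      rmse(subjective, predicted))
```

SRC rewards a predictor that orders the utterances correctly, PCC rewards a linear relationship, and RMSE measures the absolute error, which is why quality predictors are usually reported on all three.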
Figure 1
The proposed methodology for NI-SQA based on a speech signal spectrogram.
Figure 2
Spectrogram plot comparison of clean speech and distorted speech signal over different types of noises.
Figure 3
Gaussian curve statistics from standard deviation of the clean and distorted spectrograms in Figure 1 for different distortions.
Figure 4
Comparison of SRC score with different window length and overlapping samples in windows.
Figure 5
Box plot of PCC scores using optimum length of window 400 with 0 to 90 percent overlapping samples in windows.
Figure 6
Scatter plot of the proposed NI-SQA applied to (a) NOIZEUS-960 database, (b) VCTK-Corpus database.
16 pages, 1658 KiB  
Article
Detecting SPIT Attacks in VoIP Networks Using Convolutional Autoencoders: A Deep Learning Approach
by Waleed Nazih, Khaled Alnowaiser, Esraa Eldesouky and Osama Youssef Atallah
Appl. Sci. 2023, 13(12), 6974; https://doi.org/10.3390/app13126974 - 9 Jun 2023
Viewed by 1874
Abstract
Voice over Internet Protocol (VoIP) is a technology that enables voice communication to be transmitted over the Internet, transforming communication in both personal and business contexts by offering several benefits such as cost savings and integration with other communication systems. However, VoIP attacks are a growing concern for organizations that rely on this technology for communication. Spam over Internet Telephony (SPIT) is a type of VoIP attack that involves unwanted calls or messages, which can be both annoying and pose security risks to users. Detecting SPIT can be challenging since it is often delivered from anonymous VoIP accounts or spoofed phone numbers. This paper suggests an anomaly detection model that utilizes a deep convolutional autoencoder to identify SPIT attacks. The model is trained on a dataset of normal traffic and then encodes new traffic into a lower-dimensional latent representation. If the network traffic varies significantly from the encoded normal traffic, the model flags it as anomalous. Additionally, the model was tested on two datasets and achieved F1 scores of 99.32% and 99.56%. Furthermore, the proposed model was compared to several traditional anomaly detection approaches and it outperformed them on both datasets. Full article
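The flagging rule described in the abstract above (traffic whose reconstruction deviates strongly from normal traffic is treated as anomalous) can be sketched without the network itself. The sketch below assumes that reconstruction errors have already been produced by some autoencoder; the mean-plus-k-standard-deviations threshold, the error values, and all names are illustrative assumptions, not the paper's D1-DCAE configuration.

```python
import statistics

def fit_threshold(normal_errors, k=3.0):
    """Assumed rule: mean + k * population std of errors on normal traffic."""
    mean = statistics.fmean(normal_errors)
    std = statistics.pstdev(normal_errors)
    return mean + k * std

def flag_anomalies(errors, threshold):
    """True where the reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]

# Hypothetical reconstruction errors measured on normal (non-SPIT) traffic.
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010]
threshold = fit_threshold(normal_errors)

# Hypothetical errors for three new messages; the second one is far off.
flags = flag_anomalies([0.011, 0.250, 0.012], threshold)
print(threshold, flags)
```

In practice the threshold is a tuning parameter: a lower value catches more attacks at the cost of more false alarms, so it is typically swept over a range and chosen by a score such as F1.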
Figure 1
Autoencoder with one-layer encoder and one-layer decoder.
Figure 2
Stages of feature extraction: Tokening, Padding, and Embedding.
Figure 3
D1-DCAE Model: The encoder part consists of three blocks of convolutional layers and max-pooling layers. The decoder part consists of three blocks of transpose-convolutional layers and upsampling layers. At the end is a convolution layer with one feature map and a flattened layer.
Figure 4
F1 score of the D1-DCAE model over INRIA and RIT datasets for several reconstruction error thresholds.
Figure 5
AUC of the D1-DCAE model over INRIA and RIT datasets for several reconstruction error thresholds.
Figure 6
Confusion matrix of the D1-DCAE model over the INRIA and RIT datasets.
25 pages, 10128 KiB  
Article
Integration of Virtual Reality in the Control System of an Innovative Medical Robot for Single-Incision Laparoscopic Surgery
by Florin Covaciu, Nicolae Crisan, Calin Vaida, Iulia Andras, Alexandru Pusca, Bogdan Gherman, Corina Radu, Paul Tucan, Nadim Al Hajjar and Doina Pisla
Sensors 2023, 23(12), 5400; https://doi.org/10.3390/s23125400 - 7 Jun 2023
Cited by 6 | Viewed by 2231
Abstract
In recent years, there has been an expansion in the development of simulators that use virtual reality (VR) as a learning tool. In surgery where robots are used, VR serves as a revolutionary technology to help medical doctors train in using these robotic systems and accumulate knowledge without risk. This article presents a study in which VR is used to create a simulator designed for robotically assisted single-uniport surgery. The control of the surgical robotic system is achieved using voice commands for laparoscopic camera positioning and via a user interface developed using the Visual Studio program that connects a wristband equipped with sensors attached to the user’s hand for the manipulation of the active instruments. The software consists of the user interface and the VR application via the TCP/IP communication protocol. To study the evolution of the performance of this virtual system, 15 people were involved in the experimental evaluation of the VR simulator built for the robotic surgical system, having to complete a medically relevant task. The experimental data validated the initial solution, which will be further developed. Full article
Figure 1
The SILS master–slave training system.
Figure 2
The parallel robotic structure and operating table: 1: framework; 2: operating table; 3: kinematic chain 1; 4: kinematic chain 2; 5: kinematic chain 3; 6: instrument orientation module 1; 7: active instrument 1; 8: instrument orientation module 2; 9: active instrument 2; 10: endoscopic camera.
Figure 3
Kinematic diagram of the 6-DOF parallel robot type 3-R-$\underline{\mathrm{P}}$RR-$\underline{\mathrm{P}}$RS with triangular frame [41].
Figure 4
Graphical representation of the values of the term $F_{2\_3}$ for the two cases.
Figure 5
Workspace analysis of the 3-R-$\underline{\mathrm{P}}$RR-$\underline{\mathrm{P}}$RS parallel robot. (a) $\psi = 0^\circ,\ \theta = 0^\circ,\ \varphi = 0^\circ$; (b) $\psi = 0^\circ,\ \theta = 0^\circ,\ \varphi = -30^\circ$; (c) $\psi = 0^\circ,\ \theta = 0^\circ,\ \varphi = 30^\circ$; (d) $\psi = 0^\circ,\ \theta = 0^\circ,\ \varphi = -45^\circ$; (e) $\psi = 0^\circ,\ \theta = 0^\circ,\ \varphi = -60^\circ$; (f) $\psi = 0^\circ,\ \theta = 0^\circ,\ \varphi = -75^\circ$.
Figure 6
The total workspace of the 3-R-$\underline{\mathrm{P}}$RR-$\underline{\mathrm{P}}$RS parallel robot for $\varphi = -60^\circ$ versus $\varphi = 0^\circ$.
Figure 7
Interconnection of components.
Figure 8
Control device: 1: absolute orientation sensor, model IMU BNO055; 2: microcontroller ESP32; 3: power supply, 5 V DC.
Figure 9
Software architecture.
Figure 10
UML use case diagram.
Figure 11
UML class diagram.
Figure 12
User interface: manual control.
Figure 13
User interface: automatic control.
Figure 14
Block diagram of a fuzzy logic system.
Figure 15
Fuzzy sets corresponding to events related to sensors.
Figure 16
Laparoscope insertion.
Figure 17
Fuzzy sets corresponding to events related to organs collision.
Figure 18
Rotation of the upper limb. (a) Around the Oy axis. (b–f) Around the Ox axis.
Figure 19
Operating the robotic surgical system. (a) Insertion of instruments through a trocar; (b) points to be reached by instruments.
Figure 20
Performance results of users of the surgical robotic system.
11 pages, 2779 KiB  
Article
Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning
by Amira A. Mohamed, Amira Eltokhy and Abdelhalim A. Zekry
Appl. Sci. 2023, 13(7), 4261; https://doi.org/10.3390/app13074261 - 28 Mar 2023
Cited by 1 | Viewed by 1951
Abstract
Institutions have been adopting work/study-from-home programs since the pandemic began. They primarily use Voice over Internet Protocol (VoIP) software to hold online meetings. This research introduces a new method to enhance the VoIP call experience using deep learning. Two existing techniques, speaker separation and speaker identification (SSI), are integrated using deep learning methods that state-of-the-art research has shown to be effective, and the integration is applied to a VoIP system. The voice signal is fed to the speaker separation and identification system to be separated; then, the “main speaker voice” is identified and verified, as distinct from any other human or non-human voices around the main speaker. Only this main speaker voice is then sent over IP to continue the call. Current online call systems depend on noise cancellation and call quality enhancement, which do not address multiple human voices on the call: the filters used in the call process only remove noise and interference (de-noising speech) from the speech signal. The presented system is tested with up to four mixed human voices. It separates only the main speaker voice and processes it prior to transmission over the VoIP call. This paper illustrates the integration of the algorithms using a deep neural network (DNN), the advantages and challenges of voice signal processing, and the importance of computing power for real-time applications. Full article
(This article belongs to the Special Issue Audio and Acoustic Signal Processing)
Figure 1
Humans can focus on the target speech signal in a crowded and noisy environment [6].
Figure 2
Proposed model integrated with speaker separation and identification for real-time application.
Figure 3
Proposed model integrated with VoIP system.
Figure 4
Algorithm flow chart GMM speaker model.
Figure 5
State transition diagram [28].
Figure 6
Training curves of our model for various numbers of speakers N = 2, 3, 4.
Figure 7
SI-SNRi curves for various mixed numbers of speakers.
34 pages, 8701 KiB  
Article
Towards a Smart Environment: Optimization of WLAN Technologies to Enable Concurrent Smart Services
by Ali Mohd Ali, Mohammad R. Hassan, Ahmad al-Qerem, Ala Hamarsheh, Khalid Al-Qawasmi, Mohammad Aljaidi, Ahmed Abu-Khadrah, Omprakash Kaiwartya and Jaime Lloret
Sensors 2023, 23(5), 2432; https://doi.org/10.3390/s23052432 - 22 Feb 2023
Cited by 5 | Viewed by 2340
Abstract
In this research paper, the spatial distributions of five different services (Voice over Internet Protocol (VoIP), Video Conferencing (VC), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and Electronic Mail) are investigated using three different approaches: circular, random, and uniform. The amount of each service varies from one to another. In certain distinct settings, collectively referred to as mixed applications, a variety of services are activated and configured at predetermined percentages and run simultaneously. Furthermore, this paper establishes a new algorithm to assess both the real-time and best-effort services of the various IEEE 802.11 technologies, describing the best networking architecture as either a Basic Service Set (BSS), an Extended Service Set (ESS), or an Independent Basic Service Set (IBSS). The purpose of our research is to provide the user or client with an analysis that suggests a suitable technology and network configuration without wasting resources on unnecessary technologies or requiring a complete re-setup. In this context, this paper presents a network prioritization framework for enabling smart environments to determine an appropriate WLAN standard or a combination of standards that best supports a specific set of smart network applications in a specified environment. A network QoS modeling technique for smart services has been derived for assessing best-effort HTTP and FTP, and the real-time performance of VoIP and VC services enabled via IEEE 802.11 protocols, in order to discover a more optimal network architecture. A number of IEEE 802.11 technologies have been ranked by using the proposed network optimization technique with separate case studies for the circular, random, and uniform geographical distributions of smart services.
The performance of the proposed framework is validated using a realistic smart environment simulation setting, considering both real-time and best-effort services as case studies with a range of metrics related to smart environments. Full article
(This article belongs to the Special Issue AI for Smart Home Automation)
Figure 1
Flow diagram of the proposed algorithm in terms of both system environment and mathematical model.
Figure 2
Design of the three network architectures across three spatial distributions for service mixing: (a) Basic Service Set (BSS), (b) Extended Service Set (ESS), and (c) Independent Basic Service Set (IBSS).
Figure 3
QPM for jitter.
Figure 4
Jitter result of the scenario.
Figure 5
Throughput results of the scenario.
Figure 6
(a) Riverbed Rapid Configuration dialog box; (b) circular (ring) topology; (c) unconnected net (random) topology; (d) randomized mesh topology; (e) uniform topology (Riverbed, 2017).
Figure 7
The proposed 50% VC algorithm. (a) BSS and ESS; (b) only IBSS.
Figure 8
BSS and ESS performance optimization for the 50% VC algorithm.
Figure 9
IBSS performance optimization for the 50% VC algorithm.
Figure 10
Proposed best-effort algorithm. (a) BSS and ESS; (b) only IBSS.
Figure 11
BSS and ESS performance optimization for the best-effort algorithm.
Figure 12
IBSS performance optimization for best-effort applications.
Figure 13
Project scenarios for all three network configurations: (a) BSS, (b) ESS, and (c) IBSS.
Figure 14
Evaluating packet loss through the utilization of MATLAB.
18 pages, 864 KiB  
Article
A Novel Approach for Efficient Mitigation against the SIP-Based DRDoS Attack
by Ismail Melih Tas and Selcuk Baktir
Appl. Sci. 2023, 13(3), 1864; https://doi.org/10.3390/app13031864 - 31 Jan 2023
Cited by 6 | Viewed by 2051
Abstract
Voice over Internet Protocol (VoIP) and its underlying Session Initiation Protocol (SIP) are widely deployed technologies since they provide an efficient and fast means of both voice and data communication over a single network. However, in spite of their advantages, they also have their security threats due to the inherent vulnerabilities in the underlying Internet Protocol (IP) that can potentially be exploited by hackers. This study introduces a novel defense mechanism to effectively combat advanced attacks that exploit vulnerabilities identified in some less-known features of SIP. The SIP-DRDoS (SIP-based distributed reflection denial of service) attack, which can survive the existing security systems, is an advanced attack that can be performed on an SIP network through the multiplication of legitimate traffic. In this study, we propose a novel defense mechanism that consists of statistics, inspection, and action modules to mitigate the SIP-DRDoS attack. We implement the SIP-DRDoS attack by utilizing our SIP-based audit and attack software in our VoIP/SIP security lab environment that simulates an enterprise-grade SIP network. We then utilize our SIP-based defense tool to realize our novel defense mechanism against the SIP-DRDoS attack. Our experimental results prove that our defense approach can do a deep packet analysis for SIP traffic, detect SIP flood attacks, and mitigate them by dropping attack packets. While the SIP-DRDoS attack with around 1 Gbps of traffic dramatically escalates the CPU (central processing unit) usage of the SIP server by up to 74%, our defense mechanism effectively reduces it down to 17% within 6 min after the attack is initiated. Our approach represents a significant advancement over the existing defense mechanisms and demonstrates the potential to effectively protect VoIP systems against SIP-based DRDoS attacks. Full article
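The abstract above describes a defense built from statistics, inspection, and action modules that drops attack packets. As a rough illustration of that three-step shape only, the sketch below implements a generic per-source sliding-window rate limiter. It is not the authors' SIP-DD tool: the class name, the request budget, the window length, and the example address are all hypothetical.

```python
from collections import defaultdict, deque

class SipFloodGuard:
    """Toy per-source rate limiter: count, inspect, then drop or pass."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # source IP -> recent timestamps

    def handle(self, source_ip, timestamp):
        """Return "pass" or "drop" for one SIP request seen at `timestamp`."""
        recent = self.history[source_ip]
        # Statistics: forget requests that fell out of the sliding window.
        while recent and timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        recent.append(timestamp)
        # Inspection and action: drop once the source exceeds its budget.
        return "drop" if len(recent) > self.max_requests else "pass"

# Hypothetical budget: at most 3 requests per source per second.
guard = SipFloodGuard(max_requests=3, window_seconds=1.0)
decisions = [guard.handle("198.51.100.7", t) for t in (0.0, 0.1, 0.2, 0.3, 1.5)]
print(decisions)
```

A reflection attack multiplies legitimate-looking traffic, so a real deployment would need SIP-aware inspection (for example of message type and headers) rather than counting by source address alone; the sketch only shows where such logic would sit.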
Figure 1
SIP-DRDoS attack network diagram.
Figure 2
SIP DRDoS INVITE message sample.
Figure 3
Architecture of Mr.SIP, the attack tool utilized in this study.
Figure 4
Average SIP server CPU loads for registered, nonregistered, and random users (3 min after the SIP-DRDoS attack was initiated).
Figure 5
Proposed defense mechanism application workflow.
Figure 6
Placement of our proposed defense mechanism on the network.
Figure 7
Pseudocode for the proposed defense mechanism using SIP-DD.
Figure 8
SIP server CPU usage for registered, nonregistered, and random users after the SIP-DRDoS attack while our defense mechanism was active.
Figure 9
Average SIP server CPU loads for registered, nonregistered, and random users after the SIP-DRDoS attack and after the mitigation.
Figure 10
Average SIP server CPU load after the attack and after the mitigation for all types of users.
23 pages, 2429 KiB  
Article
Call Me Maybe: Using Dynamic Protocol Switching to Mitigate Denial-of-Service Attacks on VoIP Systems
by John Kafke and Thiago Viana
Network 2022, 2(4), 545-567; https://doi.org/10.3390/network2040032 - 18 Oct 2022
Cited by 2 | Viewed by 2187
Abstract
Voice over IP is quickly becoming the industry standard voice communication service. While using an IP-based method of communication has many advantages, it also comes with a new set of challenges; voice networks are now accessible to a multitude of internet-based attackers from anywhere in the world. Among the most prevalent threats to a VoIP network are Denial-of-Service attacks, which consume network bandwidth to congest or disable the communication service. This paper looks at the current state of research into the mitigation of these attacks against VoIP networks, to assess whether the mechanisms in place are sufficient. A new framework, titled “Call Me Maybe”, is proposed; it combines latency monitoring with dynamic protocol switching to mitigate DoS attacks against VoIP systems. Research conducted around routing VoIP over TCP rather than UDP is integrated into the proposed design, along with a latency monitoring mechanism to detect when the service is under attack. Data gathered from a Cisco Packet Tracer simulation was used to evaluate the effectiveness of the solution. The gathered results show a statistically significant improvement in the response times of voice traffic when using the “Call Me Maybe” framework in a network experiencing a DoS attack. The research and findings therefore aim to contribute to the enhancement of the security of VoIP and future IP-based voice communication systems.
Figure 1. Diagram showing the VoTCP mechanism used by Skype. Recreation from Yang, Lee and Ko (2008).
Figure 2. Diagram showing the retransmission mechanism used by TCP when a packet is lost or delayed. Recreation from Satoda, Nidei and Yoshida (2014). (a) No lost packets; (b) Packet loss occurred.
Figure 3. Diagram showing the application of multiple TCP streams to reduce packet loss and delay.
Figure 4. Diagram of the processes involved in the Call Me Maybe framework.
Figure 5. Topology of the Cisco Packet Tracer network.
Figure 6. Cisco Webex input room.
Figure 7. Cisco Webex output room.
Figure 8. Cisco Webex data collection room.
Figure 9. Graph displaying the mean response times of the UDP and TCP phones.
Figure 10. Average time taken to detect the DoS attack at each strength of attack.
26 pages, 2813 KiB  
Article
Adaptive QoS-Aware Multi-Metrics Gateway Selection Scheme for Heterogenous Vehicular Network
by Mahmoud Alawi, Raed Alsaqour, Maha Abdelhaq, Reem Alkanhel, Baraa Sharef, Elankovan Sundararajan and Mahamod Ismail
Systems 2022, 10(5), 142; https://doi.org/10.3390/systems10050142 - 7 Sep 2022
Cited by 2 | Viewed by 1877
Abstract
A heterogeneous vehicular network (HetVNET) is a promising network architecture that combines multiple network technologies such as IEEE 802.11p, dedicated short-range communication (DSRC), and third/fourth generation cellular networks (3G/4G). In this network area, vehicle users can use wireless fidelity access points (Wi-Fi APs) to offload 4G long-term evolution (4G-LTE) networks. However, when using Wi-Fi APs, the vehicles must organize themselves and select an appropriate mobile gateway (MGW) to communicate with the cellular infrastructure. Researchers are facing the problem of selecting the best MGW vehicle to aggregate vehicle traffic and reduce LTE load in HetVNETs when the Wi-Fi APs are unavailable for offloading. The selection process incurs extra network overhead and complexity due to the frequent formation of clusters in this highly dynamic environment. In this study, we propose a non-cluster adaptive QoS-aware gateway selection (AQAGS) scheme that autonomously picks a limited number of vehicles to act as LTE gateways based on the LTE network’s load status and the QoS requirements of vehicular ad hoc network (VANET) applications. The present AQAGS scheme focuses on highway scenarios. The proposed scheme was evaluated using the Simulation of Urban MObility (SUMO) and network simulator version 2 (NS2) simulators and benchmarked against clustered and non-clustered schemes. A comparison was made based on the end-to-end delay, throughput, control packet overhead (CPO), and packet delivery ratio (PDR) performance metrics over Voice over Internet Protocol (VoIP) and File Transfer Protocol (FTP) applications. Using VoIP, the AQAGS scheme achieved a 26.7% higher PDR compared with the other schemes.
(This article belongs to the Section Systems Engineering)
Figure 1. The system model.
Figure 2. The AQAGS scheme.
Figure 3. An overloaded gateway vehicle condition.
Figure 4. Distance between the IV and NV.
Figure 5. The link lifetime computation.
Figure 6. Evaluation of fixed vehicle applications (FTP vs. VoIP) vs. the number of vehicles: (a) the number of vehicles vs. the PDR; (b) the number of vehicles vs. the throughput.
Figure 7. Evaluation of fixed vehicle applications (FTP vs. VoIP) vs. the number of vehicles: (a) the number of vehicles vs. average delay; (b) the number of vehicles vs. the CPO.
Figure 8. Evaluation of mixed application types (FTP vs. VoIP) vs. the number of vehicles: (a) the number of vehicles vs. the PDR; (b) the number of vehicles vs. the throughput.
Figure 9. Evaluation of mixed application types (FTP vs. VoIP) vs. the number of vehicles: (a) the number of vehicles vs. the average delay; (b) the number of vehicles vs. the CPO.
Figure 10. Evaluation of mixed application types (FTP vs. VoIP) vs. vehicle speed: (a) the vehicle speed vs. the PDR; (b) the vehicle speed vs. the throughput.
Figure 11. Evaluation of mixed application types (FTP vs. VoIP) vs. vehicle speed: (a) the vehicle speed vs. the average delay; (b) the vehicle speed vs. the CPO.
21 pages, 624 KiB  
Article
A Reinforcement Learning Approach to Speech Coding
by Jerry Gibson and Hoontaek Oh
Information 2022, 13(7), 331; https://doi.org/10.3390/info13070331 - 11 Jul 2022
Cited by 2 | Viewed by 1822
Abstract
Speech coding is an essential technology for digital cellular communications, voice over IP, and video conferencing systems. For more than 25 years, the main approach to speech coding for these applications has been block-based analysis-by-synthesis linear predictive coding. An alternative approach that has been less successful is sample-by-sample tree coding of speech. We reformulate this latter approach as a multistage reinforcement learning problem with L-step lookahead that incorporates exploration and exploitation to adapt model parameters and to control the speech analysis/synthesis process on a sample-by-sample basis. The minimization of the spectrally shaped reconstruction error to finite depth manages complexity and serves as an effective stand-in for the overall subjective evaluation of reconstructed speech quality and intelligibility. Different control policies that attempt to persistently excite the system states and that encourage exploration are studied and evaluated. The resulting methods produce reconstructed speech quality competitive with the most popular speech codec used today. This new reinforcement learning formulation provides new insights and opens up new directions for system design and performance improvement.
Figure 1. Reinforcement learning for speech coding.
Figure 2. Lattice Predictor.
Figure 3. Perceptual spectral error shaping example.
Figure 4. A 4-2 multitree control segment.
Figure 5. The 40 sample 4-2 randomized control multitree sequence.
Figure 6. The 5-level control tree segment.
Figure 7. Control tree gain adaptation for 2-level tree segments.
Figure A1. Classical Differential Pulse Code Modulation (DPCM).
24 pages, 5210 KiB  
Article
Phonation Variation as a Function of Checked Syllables and Prosodic Boundaries
by Xin Gao and Jianjing Kuang
Languages 2022, 7(3), 171; https://doi.org/10.3390/languages7030171 - 5 Jul 2022
Cited by 2 | Viewed by 1981
Abstract
The phonation variation in Shanghainese is influenced by both phonemic phonation contrast and global prosodic context. This study investigated the phonetic realization of checked and unchecked syllables at four different prosodic positions (sandhi-medial, sandhi-final, phrase-final, and IP-final). By analyzing both acoustic and articulatory voice measures, we achieved a better understanding of the nature of checkedness contrast and prosodic boundaries: (1) Different phonetic correlates are associated with the two laryngeal functions: The checkedness contrast is mostly distinguished by the relative degree of glottal constriction, but the prosodic boundaries are mostly associated with periodicity and noise measures. (2) The checkedness contrast is well maintained in all prosodic contexts, suggesting that the controls for the local checkedness contrast are rather independent of global prosody.
(This article belongs to the Special Issue Exploring the Interaction between Phonation and Prosody)
Figure 1. Three types of creakiness: (A) Coda glottal stop: short silence followed by a strong glottal pulse at the end of the syllable. (B) Coda creak: irregular voicing towards the end of the syllable. (C) Broader creak: irregular voicing occurred earlier than the last third of the vowel portion.
Figure 2. Principal Component Analysis of the acoustic space. (a) Color-coded for targets’ phonemic type. (b) Color-coded for targets’ prosodic position. Concentration ellipse level = 0.95.
Figure 3. The loadings for PC1 and PC2 of all acoustic features. The most correlated cues for PC1 are A2*, H1*–A2*, H1*–A1*, A3*, and H1*–A3*; the most correlated cues for PC2 are HNR15, HNR25, HNR35, HNR05, and CPP.
Figure 4. The variation of PC1 influenced by phonemic type and prosodic position. Greater PC1 indicates a more constricted glottis. Significant p-values (p ≤ 0.05) are marked in red, which indicates that the PC1 difference between checked and unchecked syllables is significant in that prosodic position.
Figure 5. The variation of PC2 is influenced by phonemic type and prosodic position. Greater PC1 indicates higher periodicity during the vowel portion. The p-values at all prosodic positions are insignificant (p > 0.05, shown in blue); this indicates that the PC2 differences between checked and unchecked syllables are insignificant at all prosodic positions.
Figure 6. The variation of CQ influenced by phonemic type and prosodic position. Significant p-values (p ≤ 0.05) are marked in red, which indicates that the CQ difference between checked and unchecked syllables is significant in that prosodic position.
Figure 7. The variation of PIC influenced by phonemic type and prosodic position. Significant p-values (p ≤ 0.05) are marked in red, which indicates that the PIC difference between checked and unchecked syllables is significant in that prosodic position.
Figure 8. The variation of f0 influenced by phonemic type and prosodic position. Significant p-values (p ≤ 0.05) are marked in red, which indicates that the f0 difference between checked and unchecked syllables is significant in that prosodic position.
Figure 9. The variation of duration influenced by phonemic type and prosodic position. Significant p-values (p ≤ 0.05) are marked in red, which indicates that the duration difference between checked and unchecked syllables is significant in that prosodic position.
Figure 10. The distribution of tokens with three different types of creak (coded in non-gray colors) and tokens without visible creak (coded in gray) among checked and unchecked tones at various prosodic positions.