
Human–Artificial Intelligence (AI) Interaction: Latest Advances and Prospects

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 March 2025 | Viewed by 14175

Special Issue Editors


Guest Editor
Faculty of Information Technology, University of Jyväskylä, FI-40014 Jyväskylä, Finland
Interests: artificial intelligence; complex systems; computer supported cooperative work; human-AI interaction; hybrid intelligent systems; scientometrics; social computing; science and technology studies

Guest Editor
Postgraduate Program in Informatics (PPGI), Federal University of Rio de Janeiro, Rio de Janeiro 21941-916, Brazil
Interests: computer supported cooperative work; crowdsourcing; digital nomadism; human-computer interaction; social computing; social media

Guest Editor
INESC TEC, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
Interests: collaborative learning; computational thinking; computer supported cooperative work; human-computer interaction; optimization; reinforcement learning

Guest Editor
Faculty of Information Technology, University of Jyväskylä, FI-40014 Jyväskylä, Finland
Interests: artificial intelligence; data mining; deep learning; educational technology; learning analytics; machine learning; neural networks

Special Issue Information

Dear Colleagues,

Human–Artificial Intelligence (AI) interaction is poised to transform the world in the coming decades, reshaping everything from business operations to household applications. AI gives systems the ability to learn, adapt, and make decisions, bringing significant benefits to fields such as medicine, architecture, education, agriculture, and forensics. This transformative technology is redefining the way we interact with the world around us, ushering in a new era of human–AI partnership in which people use AI-infused systems, both implicitly and explicitly, to augment their experiences and achieve better outcomes, drawing on these systems' generative capacity and the contextualized meanings they take on in practical use.

This special issue aims to present the latest advances and perspectives in the area of human-AI interaction. Articles accepted for publication must address topics related to the design, development and evaluation of human-AI interactive systems. We invite both researchers and practitioners to contribute their high-quality original research, reviews, insights, and perspectives on these topics to this special issue.

Topics of interest include but are not limited to:

  • AI models: AI models used for human-AI interaction, such as conversational agents, recommendation systems, and assisted learning systems.
  • User interfaces: user interfaces for human-AI interaction systems, such as natural interfaces, graphical interfaces, and virtual reality-based interfaces.
  • Evaluation of human-AI interactive systems: fieldwork studies (e.g., ethnographically informed approaches to AI system design) and methods for evaluating human-AI interactive systems, such as usability assessment scales, accessibility compliance instruments, and impact assessment methodologies.
  • Challenges and opportunities of human-AI interaction in real-world settings: potential obstacles and possibilities to implementing human-AI systems in specific application domains, such as collaborative clinical work, digital well-being, misinformation, creativity work, and entertainment.

Dr. António Correia
Dr. Daniel Schneider
Prof. Dr. Benjamim Fonseca
Prof. Dr. Tommi Kärkkäinen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • foundation models
  • human-AI interaction
  • human-centered generative AI
  • hybrid intelligent systems
  • large language models
  • machine learning
  • mixed-initiative systems
  • user experience

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research


16 pages, 1760 KiB  
Article
Robot Control Platform for Multimodal Interactions with Humans Based on ChatGPT
by Jingtao Qu, Mateusz Jarosz and Bartlomiej Sniezynski
Appl. Sci. 2024, 14(17), 8011; https://doi.org/10.3390/app14178011 - 7 Sep 2024
Viewed by 2047
Abstract
This paper presents the architecture of a multimodal human–robot interaction control platform that leverages the advanced language capabilities of ChatGPT to facilitate more natural and engaging conversations between humans and robots. Implemented on the Pepper humanoid robot, the platform aims to enhance communication by providing a richer and more intuitive interface. The motivation behind this study is to enhance robot performance in human interaction through cutting-edge natural language processing technology, thereby improving public attitudes toward robots, fostering the development and application of robotic technology, and reducing the negative attitudes often associated with human–robot interactions. To validate the system, we conducted experiments measuring participants’ Negative Attitudes towards Robots Scale (NARS) and Robot Anxiety Scale (RAS) scores before and after interacting with the robot. Statistical analysis of the data revealed a significant improvement in the participants’ attitudes and a notable reduction in anxiety following the interaction, indicating that the system holds promise for fostering more positive human–robot relationships.
Figures:
  • Figure 1. Robot Control Platform for Multimodal Interactions with Humans based on ChatGPT.
  • Figure 2. Sequence of interactions in the proposed architecture, highlighting envisioned actions.
  • Figure 3. Application working on Pepper robot, user view of the robot during conversation.
  • Figure 4. Interaction flow used in experiments.
  • Figure 5. Results before and after experiment with NARS survey.
  • Figure 6. Results before and after experiment with NARS survey, grouped into three factors: S1, negative attitude towards interaction with robots; S2, negative attitude towards social influence of robots; and S3, negative attitude toward emotions in interaction with robots.
  • Figure 7. Results before and after experiment with RAS survey.
  • Figure 8. Results before and after experiment with RAS survey, grouped into three factors: S1, anxiety towards communication capability of robots; S2, anxiety towards behavioral characteristics of robots; S3, anxiety towards discourse with robots.
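The platform described above centres on routing transcribed user speech through ChatGPT and speaking the reply back on the robot. As a rough illustration of that turn-taking idea only, and not the authors' platform, the sketch below keeps a running conversation history and sends it to an OpenAI chat model; the listen() and speak() helpers are hypothetical stand-ins for the robot's speech recognition and text-to-speech services, and the model name is an assumption.

```python
# Illustrative sketch only -- not the platform described in the paper.
# Assumes the openai Python client (>=1.0) and hypothetical listen()/speak()
# helpers wrapping the robot's speech recognition and text-to-speech services.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def listen() -> str:
    """Hypothetical helper: record audio on the robot and return a transcript."""
    raise NotImplementedError

def speak(text: str) -> None:
    """Hypothetical helper: send text to the robot's text-to-speech engine."""
    raise NotImplementedError

def conversation_loop(system_prompt: str = "You are a friendly humanoid robot.") -> None:
    history = [{"role": "system", "content": system_prompt}]
    while True:
        user_text = listen()
        if not user_text:
            continue
        history.append({"role": "user", "content": user_text})
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name
            messages=history,
        ).choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        speak(reply)
```

Keeping the full history in the request is what lets the model refer back to earlier turns; a real deployment would also need to bound its length.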
27 pages, 6903 KiB  
Article
A Real-Time Detection of Pilot Workload Using Low-Interference Devices
by Yihan Liu, Yijing Gao, Lishengsa Yue, Hua Zhang, Jiahang Sun and Xuerui Wu
Appl. Sci. 2024, 14(15), 6521; https://doi.org/10.3390/app14156521 - 26 Jul 2024
Cited by 2 | Viewed by 1526
Abstract
Excessive pilot workload is one of the significant causes of flight accidents. The detection of flight workload can help optimize aircraft crew operation procedures, improve cockpit human–machine interface (HMI) design, and ultimately reduce the risk of flight accidents. However, traditional detection methods often employ invasive or patch-based devices that can interfere with the pilot’s control. In addition, they generally lack real-time capabilities, while the workload of pilots actually varies continuously. Moreover, most models do not take individual physiological differences into account, leading to poor performance for new pilots. To address these issues, this study developed a real-time pilot workload detection model based on low-interference devices, including telemetry eye trackers and a pressure-sensing seat cushion. Specifically, the Adaptive KNN-Ensemble Pilot Workload Detection (AKE-PWD) model is proposed, combining KNN in the outer layer, which identifies the physiological feature cluster, with the ensemble classifier corresponding to this cluster in the inner layer. The ensemble model employs random forest, gradient boosting trees, and FCN–Transformer as base learners. It utilizes soft voting for predictions, integrating the strengths of various networks and effectively extracting the sequential features from complex data. Results show that the model achieves a detection accuracy of 82.6% on the cross-pilot testing set, with a runtime of 0.1 s, surpassing most studies that use invasive or patch-based detection devices. Additionally, the model demonstrates high accuracy across different individuals, indicating good generalization. The results are expected to improve flight safety.
Figures:
  • Figure 1. The architecture of the entire model.
  • Figure 2. The experimental scenario.
  • Figure 3. The division of areas of interest.
  • Figure 4. The preprocessing of ECG data.
  • Figure 5. The preprocessing of EEG data.
  • Figure 6. The process of statistical analysis for a dataset composed of the indicators HR and E: (a) an E−HR scatter plot (only partial data are included to make the image clearer); (b) the PCA of the dataset; (c) clustering results along the main feature direction; (d) the distribution of HR in three types of workload labels; (e) the distribution of E in three types of workload labels.
  • Figure 7. The Calinski–Harabasz index of different numbers of clusters.
  • Figure 8. The results of ablation experiments.
  • Figure 9. The confusion matrix of the model results.
  • Figure 10. The receiver operating characteristic (ROC) curves.
  • Figure 11. The architecture of the FCN-DG model.
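The abstract describes a two-layer scheme: an outer KNN step assigns a pilot's physiological profile to a feature cluster, and an inner soft-voting ensemble trained for that cluster predicts the workload level. The sketch below is a simplified stand-in for that idea, not the authors' AKE-PWD code: K-means is assumed for forming the clusters, and an MLP replaces the FCN–Transformer base learner.

```python
# Simplified stand-in for the two-layer AKE-PWD idea, not the authors' code.
# An MLP replaces the FCN-Transformer base learner; K-means is assumed for
# forming the physiological clusters that the outer KNN later assigns.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier

def make_ensemble() -> VotingClassifier:
    # Soft voting averages the per-class probabilities of the base learners.
    return VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200)),
            ("gbt", GradientBoostingClassifier()),
            ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)),
        ],
        voting="soft",
    )

def fit_ake_pwd(X_physio, X_features, y, n_clusters=3):
    """Outer layer: assign each pilot to a physiological cluster.
    Inner layer: one soft-voting ensemble per cluster."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_physio)
    cluster_knn = KNeighborsClassifier(n_neighbors=5).fit(X_physio, clusters)
    ensembles = {}
    for c in range(n_clusters):
        mask = clusters == c
        ensembles[c] = make_ensemble().fit(X_features[mask], y[mask])
    return cluster_knn, ensembles

def predict_ake_pwd(cluster_knn, ensembles, x_physio, x_features):
    # Route the sample to its physiological cluster, then use that ensemble.
    c = int(cluster_knn.predict(np.asarray(x_physio).reshape(1, -1))[0])
    return ensembles[c].predict(np.asarray(x_features).reshape(1, -1))[0]
```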
21 pages, 5048 KiB  
Article
Open-Source Robotic Study Companion with Multimodal Human–Robot Interaction to Improve the Learning Experience of University Students
by Farnaz Baksh, Matevž Borjan Zorec and Karl Kruusamäe
Appl. Sci. 2024, 14(13), 5644; https://doi.org/10.3390/app14135644 - 28 Jun 2024
Cited by 3 | Viewed by 3233
Abstract
Remote, online learning provides opportunities for flexible, accessible, and personalised education, regardless of geographical boundaries. This study mode also promises to democratise education, making it more adaptable to individual learning styles. However, transitioning to this digital paradigm also brings challenges, including issues related to students’ mental health and motivation and communication barriers. Integrating social robots into this evolving educational landscape presents an effective approach to enhancing student support and engagement. In this article, we focus on the potential of social robots in higher education, identifying a significant gap in the educational technology landscape that could be filled by open-source learning robots tailored to university students’ needs. To bridge this gap, we introduce the Robotic Study Companion (RSC), a customisable, open-source social robot developed with cost-effective off-the-shelf parts. Designed to provide an interactive and multimodal learning experience, the RSC aims to enhance student engagement and success in their studies. This paper documents the development of the RSC, from establishing literature-based requirements to detailing the design process and build instructions. As an open development platform, the RSC offers a solution to current educational challenges and lays the groundwork for personalised, interactive, and affordable AI-enabled robotic companions.
Figures:
  • Figure 1. Fully assembled 3D-printed open-source Robotic Study Companion (RSC) (left) and a learner sitting at a desk with the RSC (right).
  • Figure 2. Computer-Aided Design (CAD) model (left) and fully assembled 3D-printed RSC (right). Existing desktop robot-companion solutions (discussed in Section 2.2) inspired the RSC’s design. As a result, the RSC’s small form-factor tabletop solution features curved edges, circular shapes, and outward shells that are neutral in colour [76]. It houses the speaker within the base and secures the remaining peripherals within its rectangular body.
  • Figure 3. Exploded assembly view of the RSC. Components are denoted by yellow highlights and blue arrows, while 3D-printed construction parts are indicated by orange highlights and red arrows. More information on the assembly can be found on the project’s GitHub page [36].
  • Figure 4. Process of design thinking and rapid prototyping development. The progression involves (I.) conducting a design study to investigate features of existing social robots, a step that is followed by (II.) brainstorming and concept sketching. Subsequently, (III.) component modelling is carried out in CAD-enabled layout exploration. The initial prototype design was an open-enclosure test assembly to secure all components (IV.). This test assembly helped document additional design insights. The RSC was ready for further development after identified issues had been addressed and additional features incorporated (V.).
  • Figure 5. Simplified system-communication diagram.
  • Figure 6. RSC electronics schematic.
  • Figure 7. Custom expansion board installed in the AIY Voice Bonnet extension header. Depicted on the brown protoboard to the right is a bidirectional logic level shifter (blue). To the left of the protoboard is a USB power-supply connector and two servo pin headers.
  • Figure 8. Interaction–state loop diagram. After the loop starts, a brief setup phase commences. Once it is ready for user input (question), the RSC listens, transcribes spoken words into text, and sends the question to OpenAI for processing. The RSC then ‘speaks’ the response from the API to the user.
  • Figure 9. HRI block diagram and library flow. Auditory interaction (green, right) incorporates preinstalled libraries: AIY (enabling Voice Bonnet I/O access), pyttsx3 (text-to-speech), and pyaudio (audio input/output). We installed OpenAI and SpeechRecognition to facilitate comprehensive auditory interaction. To accommodate physical (blue, bottom left) and visual (yellow, top left) interactions, we installed the CircuitPy and Adafruit-NeoPixel libraries. Both physical and visual interaction are governed by auditory interaction (NLP).
  • Figure 10. Class diagram illustrating all the peripherals of the RSC, including various objects, their attributes, and data operations.
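Figure 8 describes the RSC's interaction loop (listen, transcribe, query OpenAI, speak) and Figure 9 names the libraries involved (SpeechRecognition, pyttsx3, openai). The sketch below is a minimal reconstruction of that loop from those descriptions, not code from the project's repository; the system prompt and model name are assumptions.

```python
# Minimal sketch of the listen/transcribe/query/speak loop described in
# Figure 8, using the libraries named in Figure 9. Not the RSC codebase.
import pyttsx3
import speech_recognition as sr
from openai import OpenAI

recognizer = sr.Recognizer()
tts = pyttsx3.init()
client = OpenAI()

def listen_once() -> str:
    """Record one utterance from the default microphone and transcribe it."""
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)  # other recognizer backends exist

def speak(text: str) -> None:
    tts.say(text)
    tts.runAndWait()

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[
            {"role": "system", "content": "You are a supportive study companion."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    question = listen_once()
    speak(answer(question))
```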
21 pages, 2022 KiB  
Article
An Intelligent Human–Machine Interface Architecture for Long-Term Remote Robot Handling in Fusion Reactor Environments
by Tamara Benito and Antonio Barrientos
Appl. Sci. 2024, 14(11), 4814; https://doi.org/10.3390/app14114814 - 2 Jun 2024
Cited by 1 | Viewed by 1591
Abstract
This paper addresses the intricate challenge posed by remote handling (RH) operations in facilities with operational lifespans surpassing 30 years. The extended RH task horizon necessitates a forward-looking strategy to accommodate the continuous evolution of RH equipment. Confronted with diverse and evolving hardware interfaces, a critical requirement emerges for a flexible and adaptive software architecture based on changing situations and past experiences. The paper explores the inherent challenges associated with sustaining and upgrading RH equipment within an extended operational context. In response to this challenge, a groundbreaking, flexible, and maintainable human–machine interface (HMI) architecture named MAMIC is designed, guaranteeing seamless integration with a diverse range of RH equipment developed over the years. Embracing a modular and extensible design, the MAMIC architecture facilitates the effortless incorporation of new equipment without compromising system integrity. Moreover, by adopting this approach, nuclear facilities can proactively steer the evolution of RH equipment, guaranteeing sustained performance and compliance throughout the extended operational lifecycle. The proposed adaptive architecture provides a scalable and future-proof solution, addressing the dynamic landscape of remote handling technology for decades.
Figures:
  • Figure 1. MAMIC model generation and integration.
  • Figure 2. The four-person work cell tasks. (a) The responsible officer oversees operations. (b) The deputy manages cameras and support tools. (c) The “mover” operates casks, transporters, and cranes. (d) The “manipulator” controls the master arm and other manipulative devices. Reprinted with permission from Ref. [30]. 2024, D. Hamilton.
  • Figure 3. The ITER Maintenance System serves various areas within the ITER facility. (a) Full capability for its manual maintenance within the hot cell, NB cell, and test stand. (b) Full remote handling capability, including rescue operations, within the in-vessel, hot cell, and NB cell. (c) Complete remote handling capabilities for the recovery of casks within the lift, gallery, and hot cell. (d) Remote handling capabilities that can be seamlessly combined with local manual support within the NB cell, hot cell (part), port cell, and test stand area.
  • Figure 4. Software development, verification, and validation lifecycle.
  • Figure 5. ITER MAMIC architecture generation integrating (a) driving actors, such as external input devices, third-party frameworks, OMS, and CODAC workstations, (b) driven actors, such as VR, AR, RHDB, and RH robots, and (c) an easy automatic integration testing framework into the different bounded contexts using the port–adapter pair.
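The extensibility claimed for MAMIC rests on the port–adapter pairing referenced in Figure 5: the HMI core talks only to abstract ports, and each generation of RH equipment is integrated by writing a new adapter. The sketch below illustrates that general pattern in Python under assumed interfaces; it is not the MAMIC implementation, and the port methods and the legacy controller API are hypothetical.

```python
# Illustrative sketch of the port-adapter style referenced in Figure 5, not the
# MAMIC implementation. The core HMI logic depends only on abstract "ports";
# each new or legacy piece of RH equipment plugs in through its own adapter.
from abc import ABC, abstractmethod

class ManipulatorPort(ABC):
    """Driven-side port: the operations the HMI core needs from any manipulator."""
    @abstractmethod
    def move_joint(self, joint: str, angle_deg: float) -> None: ...
    @abstractmethod
    def read_status(self) -> dict: ...

class LegacyArmAdapter(ManipulatorPort):
    """Adapter wrapping a hypothetical legacy controller API."""
    def __init__(self, controller):
        self._ctrl = controller
    def move_joint(self, joint: str, angle_deg: float) -> None:
        self._ctrl.send_command(f"MOVE {joint} {angle_deg}")
    def read_status(self) -> dict:
        return {"raw": self._ctrl.query("STATUS")}

class HmiCore:
    """Core application logic; unaware of which adapter is plugged in."""
    def __init__(self, manipulator: ManipulatorPort):
        self.manipulator = manipulator
    def jog(self, joint: str, step: float) -> dict:
        self.manipulator.move_joint(joint, step)
        return self.manipulator.read_status()
```

Because the core only sees ManipulatorPort, replacing obsolete hardware means adding an adapter rather than modifying the HMI itself, which is the maintainability property the abstract emphasises.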
19 pages, 11964 KiB  
Article
Translating Words to Worlds: Zero-Shot Synthesis of 3D Terrain from Textual Descriptions Using Large Language Models
by Guangzi Zhang, Lizhe Chen, Yu Zhang, Yan Liu, Yuyao Ge and Xingquan Cai
Appl. Sci. 2024, 14(8), 3257; https://doi.org/10.3390/app14083257 - 12 Apr 2024
Viewed by 1394
Abstract
The current research on text-guided 3D synthesis predominantly utilizes complex diffusion models, posing significant challenges in tasks like terrain generation. This study ventures into the direct synthesis of text-to-3D terrain in a zero-shot fashion, circumventing the need for diffusion models. By exploiting the large language model’s inherent spatial awareness, we innovatively formulate a method to update existing 3D models through text, thereby enhancing their accuracy. Specifically, we introduce a Gaussian–Voronoi map data structure that converts simplistic map summaries into detailed terrain heightmaps. Employing a chain-of-thought behavior tree approach, which combines action chains and thought trees, the model is guided to analyze a variety of textual inputs and extract relevant terrain data, effectively bridging the gap between textual descriptions and 3D models. Furthermore, we develop a text–terrain re-editing technique utilizing multiagent reasoning, allowing for the dynamic update of the terrain’s representational structure. Our experimental results indicate that this method proficiently interprets the spatial information embedded in the text and generates controllable 3D terrains with superior visual quality.
Figures:
  • Figure 1. In summary, our method mainly consists of two aspects of design. The first aspect is the text-to-data process for capturing data from text, which is primarily achieved by the chain-of-thought behavior tree and multiagent update strategy. On the other hand, the data-to-terrain process is implemented mainly by the Gaussian–Voronoi map. This process completely abandons the diffusion model, allowing us to generate and edit our 3D models more accurately.
  • Figure 2. Significant differences exist between the cells of the original Voronoi diagram. To ensure that terrain generation is roughly equivalent across the map, we apply Lloyd’s relaxation to the Voronoi diagram. This maintains a basic uniformity in cell sizes.
  • Figure 3. The essence of Gaussian blur is the elimination of excessive high-frequency information contained in our heightmap. This occurs during the mapping from Gaussian to Voronoi, and is difficult to avoid at this stage, as our fundamental purpose is to extract as much high-frequency information as possible from the ultra-low-frequency information in Gaussian, thereby enhancing the richness and complexity of the random terrain.
  • Figure 4. Similar research exists in related fields, where our method shares certain similarities with the thought-of-tree strategy. Unlike ToT and other LLM tree-reasoning methods that require LLMs to evaluate the current state on their own and use thought generators to automatically generate solutions, which allows the large language model to offer more ideas for unknown problems, suitable for complex situations with uncertain text domain problems, our method directly fixes the LLM’s thought path without the need for additional solutions. This is because, in our scenario, the final problem to be solved has been fixed as "extracting terrain feature data from input text information", so we do not need LLMs to diverge too much but rather to gradually deduce and refine data based on the given solutions. Even in the final step of data generation, since our method can continue to edit the 3D terrain subsequently, there aren’t any exact data required for LLMs to provide the best solution in one step. Compared with ToT, our method offers better controllability and lower performance overhead, as the number of times text information is input to LLMs is significantly reduced.
  • Figure 5. Unlike conventional multiagent systems, our agents do not have long-term memory. Instead, each time the terrain is modified, their memory only contains data from the last modification of the terrain feature, as longer-term memory is meaningless for the next modification the agent will make, and the only possible use of such memory, to undo the last change, can be completely bypassed without an LLM.
  • Figure 6. We converted NERF and Gaussian results into mesh objects using methods from existing research [46,47]. This conversion facilitates a more effective comparison with our method in practical scenarios. It is observable that, compared with other methods based on diffusion models, our approach can leverage the language recognition capabilities of large language models to make accurate judgments about spatial information and generate precise three-dimensional terrains that align with the descriptions of location information. Additionally, our method employs heightmaps for construction, which significantly reduces the number of tiles required for the three-dimensional terrain, further enhancing the practical value of our approach.
  • Figure 7. This image illustrates another significant role of the multiagent update strategy besides optimizing existing maps: continuously updating the terrain to simulate changes in the landscape over time.
  • Figure 8. In the first scenario, we directly converted the recognition results of the chain-of-thought behavior tree into depth maps. In the second scenario, we mapped these results onto the Gaussian–Voronoi map before converting them into depth maps. Utilizing the Gaussian–Voronoi map significantly boosts terrain diversity and accurately mimics the randomness found in natural landscapes. This approach not only enriches the terrain’s visual appeal but also modifies elevation change rates, offering a more dynamic and realistic terrain modeling. Additionally, the Gaussian–Voronoi map also aids in altering the rate of elevation change across the terrain.
  • Figure 9. We attempted to input a piece of text containing both comparative and positional information into the LLM. The LLM, which applied the chain-of-thought behavior tree, effectively identified this type of information and generated data that matched the description. On the other hand, using only the chain-of-thought-based LLM, although it recognized the three types of information, it did not arrange the positions of the three mountains well during data generation, causing them to blend together. In contrast, the natural-language-processing-based LLM was unable to fully recognize these contents. This result further demonstrates the effectiveness of the chain-of-thought behavior tree and its relative accuracy and stability in processing data compared with the chain-of-thought.
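Figures 2 and 3 describe the data-to-terrain step: a Voronoi partition is evened out with Lloyd's relaxation, each cell receives a height value, and a Gaussian blur removes excess high-frequency detail. The sketch below is a rough reconstruction of that step only, not the authors' code; the grid size, cell count, blur radius, and random cell heights are placeholder assumptions standing in for the LLM-derived terrain features.

```python
# Rough reconstruction of the Gaussian-Voronoi heightmap idea (Figures 2-3),
# not the authors' implementation. Cell heights would come from the LLM-derived
# terrain features; here they are random placeholders.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import cKDTree

def gaussian_voronoi_heightmap(size=256, n_cells=80, iters=3, sigma=6.0, seed=0):
    rng = np.random.default_rng(seed)
    seeds = rng.uniform(0, size, (n_cells, 2))
    pixels = np.indices((size, size)).reshape(2, -1).T.astype(float)  # (row, col)

    # Lloyd relaxation: move each seed to the centroid of its Voronoi cell so
    # that cell sizes become roughly uniform across the map.
    for _ in range(iters):
        owner = cKDTree(seeds).query(pixels)[1]  # nearest seed per pixel
        for c in range(n_cells):
            members = pixels[owner == c]
            if len(members):
                seeds[c] = members.mean(axis=0)

    # Assign one height per relaxed cell, rasterize by nearest seed, then blur
    # to strip the excessive high-frequency steps between neighbouring cells.
    owner = cKDTree(seeds).query(pixels)[1]
    cell_height = rng.uniform(0.0, 1.0, n_cells)  # placeholder terrain data
    heightmap = cell_height[owner].reshape(size, size)
    return gaussian_filter(heightmap, sigma=sigma)
```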

Review


31 pages, 2972 KiB  
Review
A Review of Brain–Computer Interface-Based Language Decoding: From Signal Interpretation to Intelligent Communication
by Yingyi Qiu, Han Liu and Mengyuan Zhao
Appl. Sci. 2025, 15(1), 392; https://doi.org/10.3390/app15010392 - 3 Jan 2025
Viewed by 1930
Abstract
Brain–computer interface (BCI) technologies for language decoding have emerged as a transformative bridge between neuroscience and artificial intelligence (AI), enabling direct neural–computational communication. The current literature provides detailed insights into individual components of BCI systems, from neural encoding mechanisms to language decoding paradigms and clinical applications. However, a comprehensive perspective that captures the parallel evolution of cognitive understanding and technological advancement in BCI-based language decoding remains notably absent. Here, we propose the Interpretation–Communication–Interaction (ICI) architecture, a novel three-stage perspective that provides an analytical lens for examining BCI-based language decoding development. Our analysis reveals the field’s evolution from basic signal interpretation through dynamic communication to intelligent interaction, marked by three key transitions: from single-channel to multimodal processing, from traditional pattern recognition to deep learning architectures, and from generic systems to personalized platforms. This review establishes that BCI-based language decoding has achieved substantial improvements in regard to system accuracy, latency reduction, stability, and user adaptability. The proposed ICI architecture bridges the gap between cognitive neuroscience and computational methodologies, providing a unified perspective for understanding BCI evolution. These insights offer valuable guidance for future innovations in regard to neural language decoding technologies and their practical application in clinical and assistive contexts.
Figures:
  • Figure 1. The Interpretation–Communication–Interaction (ICI) architecture, showing the three-stage evolution of BCI-based language decoding, from basic signal interpretation through advanced dynamic communication to innovative intelligent interaction, with corresponding architectures, features, and applications at each stage.
  • Figure 2. (a) The DRC model of reading processing (adapted from [28]), showing lexical (vocabulary-based) and non-lexical (letter-to-sound conversion) pathways from print to speech output. (b) The multicomponent Working Memory Model (adapted from [35]), illustrating interactions between the central executive system and its specialized subsystems (visuo-spatial, episodic, and phonological components), with long-term memory processes.
  • Figure 3. (a) The classical Wernicke–Lichtheim–Geschwind language network model (adapted from [52]), showing the core components of language processing including Broca’s area (orange), Wernicke’s area (red), and their connection via the arcuate fasciculus. (b) Distributed connectivity pattern in the Perisylvian language network (adapted from [52]), illustrating information-specific pathways (green: syntactic, blue: semantic, red: phonological) between Broca’s subregions and posterior language areas. SPL/IPL: superior/inferior parietal lobule; AG: angular gyrus; pSTG: posterior superior temporal gyrus; pMTG: posterior middle temporal gyrus; pITG: posterior inferior temporal gyrus.
  • Figure 4. (a) Sequential processing stages of EEGNet (adapted from [63]), showing temporal filtering (Conv2D), frequency-specific spatial filtering (DepthwiseConv2D), and feature map integration (SeparableConv2D), followed by classification. (b) An integrated deep learning pipeline for neural signal classification (adapted from [67]), demonstrating the sequential stages of data augmentation, feature selection, feature extraction, and classification.
  • Figure 5. End-to-end architecture for decoding speech from brain signals using wav2vec 2.0 and contrastive learning (adapted from [14]); wav2vec: wave-to-vector; CLIP: contrastive language–image pre-training; Conv: convolution.
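The caption of Figure 4a lists the sequential EEGNet stages: temporal filtering with Conv2D, frequency-specific spatial filtering with DepthwiseConv2D, feature-map integration with SeparableConv2D, and a classification head. The Keras sketch below assembles those stages purely as an illustration; the layer sizes follow commonly published EEGNet defaults and are not taken from the review.

```python
# Minimal Keras sketch of the EEGNet-style stages named in Figure 4a
# (temporal Conv2D -> spatial DepthwiseConv2D -> SeparableConv2D -> classifier).
# Layer sizes follow commonly used EEGNet defaults, not values from the review.
from tensorflow.keras import layers, models

def eegnet_like(n_channels=64, n_samples=128, n_classes=4):
    inputs = layers.Input(shape=(n_channels, n_samples, 1))
    # Temporal filtering across time samples.
    x = layers.Conv2D(8, (1, 64), padding="same", use_bias=False)(inputs)
    x = layers.BatchNormalization()(x)
    # Frequency-specific spatial filtering across EEG channels.
    x = layers.DepthwiseConv2D((n_channels, 1), depth_multiplier=2, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("elu")(x)
    x = layers.AveragePooling2D((1, 4))(x)
    x = layers.Dropout(0.5)(x)
    # Feature-map integration.
    x = layers.SeparableConv2D(16, (1, 16), padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("elu")(x)
    x = layers.AveragePooling2D((1, 8))(x)
    x = layers.Dropout(0.5)(x)
    # Classification head.
    x = layers.Flatten()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```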