Search Results (62)

Search Parameters: Keywords = subtitles

19 pages, 303 KiB  
Article
A Study on the Characteristics of Sports Athletes’ YouTube Channels and User Reactions
by Bora Moon and Taeyeon Oh
Behav. Sci. 2024, 14(8), 700; https://doi.org/10.3390/bs14080700 - 12 Aug 2024
Viewed by 1273
Abstract
This study examined the content characteristics and user responses of athlete-run sports YouTube channels, providing empirical insights for content production strategies and contributing to the development of athlete-run sports YouTube channels. Content analysis was conducted on 3306 videos posted on 20 popular YouTube channels of South Korean athletes from 1 January 2020 to 31 December 2021. The formal characteristics analyzed included video length, the presence of foreign language subtitles, paid advertisements, and information sources. The content characteristics examined were the types of sports events, main content themes, and whether the content matched the athlete’s sport. Results revealed significant differences in content characteristics and user responses based on whether the athletes were active or retired. This study’s distinctive contribution lies in highlighting the evolving role of athletes as content creators and providing strategic implications for enhancing the competitiveness of athlete-run sports YouTube channels. Future research should consider a broader range of sports YouTubers and a wider variety of YouTube channels to gain comprehensive insights into the sports content ecosystem on this platform.
(This article belongs to the Special Issue Social Media as Interpersonal and Masspersonal)
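The group comparison described in the abstract above (content characteristics of active vs. retired athletes) lends itself to a simple contingency-table test. Below is a minimal, hypothetical sketch in Python; the theme labels and counts are invented for illustration and are not taken from the paper.

```python
# Hypothetical illustration: compare content-theme distributions between videos
# of active and retired athletes with a chi-square test of independence.
from scipy.stats import chi2_contingency

# columns: training, daily life, commentary, collaboration (invented themes)
counts = [
    [412, 388, 143, 97],   # active athletes (invented counts)
    [188, 501, 320, 257],  # retired athletes (invented counts)
]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.4g}")
if p < 0.05:
    print("Theme distribution differs between active and retired athletes.")
```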
28 pages, 6576 KiB  
Article
Assessment of Pepper Robot’s Speech Recognition System through the Lens of Machine Learning
by Akshara Pande and Deepti Mishra
Biomimetics 2024, 9(7), 391; https://doi.org/10.3390/biomimetics9070391 - 27 Jun 2024
Viewed by 1827
Abstract
Speech comprehension can be challenging due to multiple factors, causing inconvenience for both the speaker and the listener. In such situations, using a humanoid robot, Pepper, can be beneficial as it can display the corresponding text on its screen. However, prior to that, it is essential to carefully assess the accuracy of the audio recordings captured by Pepper. Therefore, in this study, an experiment was conducted with eight participants with the primary objective of examining Pepper’s speech recognition system with the help of audio features such as Mel-Frequency Cepstral Coefficients, spectral centroid, spectral flatness, the Zero-Crossing Rate, pitch, and energy. Furthermore, the K-means algorithm was employed to create clusters based on these features, with the aim of selecting the most suitable cluster with the help of the speech-to-text conversion tool Whisper. The best cluster was selected by identifying the cluster containing the most high-accuracy data points, after discarding data points with a word error rate (WER) above 0.3. The findings of this study suggest that a distance of up to one meter from the humanoid robot Pepper is suitable for capturing the best speech recordings, whereas age and gender do not influence the accuracy of recorded speech. The proposed system would be a significant asset in settings where subtitles are required to improve the comprehension of spoken statements.
(This article belongs to the Special Issue Intelligent Human-Robot Interaction: 2nd Edition)
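As a rough illustration of the pipeline the abstract describes (feature extraction, K-means clustering, Whisper transcription, and a WER cut-off of 0.3), here is a hedged Python sketch. The file names, reference transcripts, the choice of librosa and jiwer, and k = 3 are assumptions for illustration, not details taken from the paper.

```python
# Sketch: extract audio features, cluster recordings with K-means, transcribe
# with Whisper, and pick the cluster with the most recordings whose WER <= 0.3.
import numpy as np
import librosa
import whisper
from sklearn.cluster import KMeans
from jiwer import wer

def audio_features(path):
    """MFCCs plus spectral centroid, flatness, zero-crossing rate, and energy."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    flatness = librosa.feature.spectral_flatness(y=y).mean()
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    energy = float(np.mean(librosa.feature.rms(y=y)))
    return np.concatenate([mfcc, [centroid, flatness, zcr, energy]])

recordings = [f"pepper_rec_{i:02d}.wav" for i in range(18)]      # placeholder paths
references = ["can i have your emergency contact number"] * 18   # placeholder texts

X = np.vstack([audio_features(p) for p in recordings])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

asr = whisper.load_model("base")
errors = [wer(ref, asr.transcribe(p)["text"].lower().strip())
          for p, ref in zip(recordings, references)]

# Best cluster: the one with the most recordings at or below the WER threshold.
best = max(set(labels),
           key=lambda c: sum(1 for l, e in zip(labels, errors) if l == c and e <= 0.3))
print("best cluster:", best)
```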
Figure 1. A pipeline of overall work to evaluate the efficiency of Pepper’s speech recognition system.
Figure 2. Screenshot of top five rows (out of 18 records) of extracted audio features of a person.
Figure 3. Screenshot of top 5 rows (out of 60,336 records) with standardized 18 audio features.
Figure 4. Three clusters generated by K-means clustering. The color indicators for cluster labels are shown in the box.
Figure 5. Distribution of data records in each cluster. Cluster 0, Cluster 1, and Cluster 2 are indicated by orange, blue, and green colors, respectively.
Figure 6. Scatter matrix of all eighteen features in each cluster. Purple indicates Cluster 0, green indicates Cluster 1, and yellow indicates Cluster 2. Blue plots on the diagonal represent the data distribution for each feature.
Figure 7. Visualization of trends of twenty-two features in Cluster 0.
Figure 8. Visualization of trends of twenty-two features in Cluster 1.
Figure 9. Visualization of trends of twenty-two features in Cluster 2.
Figure 10. Visualization of MFCCs in each of the clusters.
Figure 11. Ten most important features in three clusters.
Figure 12. Screenshot of top 5 rows (out of 60,336 records) and 28 features, including person’s demographics, position, audio features, and evaluation metrics.
Figure 13. Screenshot of code snippet to select the best cluster.
Figure 14. Analysis of the distribution of (a) gender, (b) distances from the robot, (c) statements spoken, and (d) age for best records in Cluster 1.
25 pages, 4805 KiB  
Article
LightSub: Unobtrusive Subtitles with Reduced Information and Decreased Eye Movement
by Yuki Nishi, Yugo Nakamura, Shogo Fukushima and Yutaka Arakawa
Multimodal Technol. Interact. 2024, 8(6), 51; https://doi.org/10.3390/mti8060051 - 14 Jun 2024
Viewed by 841
Abstract
Subtitles play a crucial role in facilitating the understanding of visual content when watching films and television programs. In this study, we propose a method for presenting subtitles in a way that considers cognitive load when viewing video content in a non-native language. Subtitles are generally displayed at the bottom of the screen, which causes frequent eye focus switching between subtitles and video, increasing the cognitive load. In our proposed method, we focused on the position, display time, and amount of information contained in the subtitles to reduce the cognitive load and to avoid disturbing the viewer’s concentration. We conducted two experiments to investigate the effects of our proposed subtitle method on gaze distribution, comprehension, and cognitive load during English-language video viewing. Twelve non-native English-speaking subjects participated in the first experiment. The results show that participants’ gazes were more focused around the center of the screen when using our proposed subtitles compared to regular subtitles. Comprehension levels recorded using LightSub were similar, but slightly inferior to those recorded using regular subtitles. However, it was confirmed that most of the participants were viewing the video with a higher cognitive load using the proposed subtitle method. In the second experiment, we investigated subtitles considering connected speech form in English with 18 non-native English speakers. The results revealed that the proposed method, considering connected speech form, demonstrated an improvement in cognitive load during video viewing but it remained higher than that of regular subtitles.
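The core mechanism sketched in this abstract (show less information, closer to where the viewer is already looking) can be illustrated in a few lines of Python. This is only a speculative sketch: the CEFR-based word selection, the word-level timings, and the data layout below are assumptions, not the authors' implementation.

```python
# Sketch of reduced-information subtitles: keep only words above the viewer's
# CEFR level and emit one short, center-positioned cue per retained word.
from dataclasses import dataclass

CEFR_RANK = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}

@dataclass
class TimedWord:
    text: str     # spoken English word
    start: float  # seconds
    end: float
    cefr: str     # estimated difficulty of the word
    gloss: str    # translation shown to the viewer

def lightsub_cues(words, viewer_level="B1"):
    """Return one centered cue per word judged harder than the viewer's level."""
    threshold = CEFR_RANK[viewer_level]
    return [
        {"start": w.start, "end": w.end, "text": w.gloss, "position": "center"}
        for w in words
        if CEFR_RANK[w.cefr] > threshold
    ]

line = [
    TimedWord("can", 0.0, 0.2, "A1", "できますか"),
    TimedWord("emergency", 0.9, 1.5, "B2", "緊急"),
    TimedWord("contact", 1.5, 1.9, "A2", "連絡先"),
]
print(lightsub_cues(line))  # only "emergency" produces a cue
```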
Figure 1. Layout comparison between regular subtitles and the proposed LightSub subtitles. The regular subtitles display the sentence "Can I have your emergency contact number?" in Japanese, while the proposed LightSub subtitles display the word "emergency" in Japanese.
Figure 2. Experimental scene: the participant watches the video while gaze data are measured by a Tobii Pro Nano, a screen-based eye tracker.
Figure 3. Gaze distribution during video viewing.
Figure 4. The process and results for quantifying and comparing the spread of gaze distribution.
Figure 5. The results of the comprehension test. The comprehension test consists of a total of 10 questions, each with four multiple-choice options. The test is scored on a scale of one point per question, with a total of 10 points available for all correct answers and 0 points awarded for all incorrect answers. (a) Comprehension score for each video. (b) Comprehension score for each subtitle method.
Figure 6. Evaluation score of cognitive load for each subtitle presentation method. The evaluation score of cognitive load was obtained using the NASA-TLX, with participants rating each item on a 7-point scale (7: very high; 1: very low). (a) Mental demand: how much mental and perceptual activity was required. (b) Effort: how hard participants had to work to accomplish the tasks. (c) Frustration: the extent to which participants felt stressed and anxious.
Figure 7. Subjective video comprehension level.
Figure 8. How participants perceived the amount of information in subtitles. (a) CEFR condition: Did you perceive the displayed subtitles as insufficient? (b) Connected speech form and CEFR condition: Did you perceive the displayed subtitles as redundant?
Figure 9. Gaze distribution during video viewing.
Figure 10. The results of the comprehension test. The comprehension test consists of a total of 11 questions, each with 4 multiple-choice options. The test is scored on a scale of 1 point per question, with a total of 11 points available for all correct answers and 0 points awarded for all incorrect answers. (a) Comprehension score for each video. (b) Comprehension score for each subtitle method.
Figure 11. Evaluation scores of cognitive load for each subtitle presentation method. The evaluation score of cognitive load was obtained using the NASA-TLX, with participants rating each item on a 7-point scale (7: very high; 1: very low). (a) Mental demand: how much mental and perceptual activity was required. (b) Effort: how hard participants had to work to accomplish the tasks. (c) Frustration: the extent to which participants felt stressed and anxious.
Figure 12. System usability scale evaluation.
20 pages, 1000 KiB  
Article
The Subtitling of Swearing: A Pilot Reception Study
by Willian Moura
Languages 2024, 9(5), 184; https://doi.org/10.3390/languages9050184 - 17 May 2024
Viewed by 1289
Abstract
Reception studies in audiovisual translation seek to explore how translation choices affect the audience’s comprehension, emotional engagement, enjoyment, and overall viewing experience of audiovisual materials. This study focuses on the subtitling product and analyzes the acceptability of swear words translated through different stimuli: subtitles with softened, maintained, and intensified swearing, along with standard Netflix subtitles (control). Employing a multi-method approach, the study collected data through a survey, using questionnaires with a Likert scale and interviews, following the user-centered translation model to understand how participants receive and perceive swear words in subtitling. The results indicate that the control group showed the highest acceptability among the participants, while the group with softened swear words presented the lowest acceptability rate. The analysis shows that participants across all groups reported that discomfort does not arise from reading the swear word in the subtitle but from perceiving a deliberate change in its offensive load—usually softened. The findings demonstrate that this change can lead to a breach of the contract of illusion in subtitling, as participants are exposed to the original dialogue and the translated subtitle simultaneously. In conclusion, when perceived, the change in the offensive load can redirect the viewer’s focus from the video to the subtitles, negatively affecting the enjoyment of the audiovisual experience.
Figure 1. Swearing usage in daily life.
Figure 2. Discomfort watching films and series with a lot of swearing.
17 pages, 2957 KiB  
Article
Out-of-School Exposure to English in EFL Teenage Learners: Is It Related to Academic Performance?
by Linh Tran and Imma Miralpeix
Educ. Sci. 2024, 14(4), 393; https://doi.org/10.3390/educsci14040393 - 10 Apr 2024
Viewed by 2355
Abstract
Learning a Foreign Language (FL) beyond the classroom has become common practice thanks to advances in technology and the use of English as a Lingua Franca. This study explores the types and amount of out-of-school informal exposure to English that Spanish secondary school students typically receive in their daily lives. Informed by recent literature on the influence of extramural activities on FL proficiency, the second aim of this study is to investigate the potential relationship between out-of-school exposure and academic performance, as measured by English school grades. Data were obtained from a questionnaire answered by secondary school students aged 12–16 (N = 2015) regarding the different types and amounts of activities they perform in English outside school. Findings revealed that teenage learners were most frequently exposed to English through audiovisual input. Social media interaction, along with reading and writing (with or without digital support), were closely associated with their English marks. Other popular activities, such as listening to music or playing video games, were not found to be related to proficiency or even showed a negative correlation with it, while less popular activities, such as watching subtitled movies and series, could have greater potential for language learning. This study contributes to the understanding of informal practices in FL learning settings and provides insights that can help bridge interactive language practices and formal curriculum to create holistic learning experiences for language learners.
(This article belongs to the Section Language and Literacy Education)
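The relationship the study probes, frequency of out-of-school activities versus English school grades, is essentially a correlational analysis. The sketch below shows one plausible way to compute it in Python; the data frame, the activity columns, and the use of Spearman correlation are illustrative assumptions, not the authors' procedure.

```python
# Illustrative: relate self-reported activity frequency to English school marks.
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    "grade":            [6.5, 8.0, 9.0, 5.0, 7.5],  # English mark (0-10), fabricated
    "social_media":     [3, 5, 5, 1, 4],            # frequency scale 0-5, fabricated
    "gaming":           [5, 2, 1, 4, 3],
    "subtitled_movies": [1, 3, 4, 0, 2],
})

for activity in ["social_media", "gaming", "subtitled_movies"]:
    rho, p = spearmanr(df["grade"], df[activity])
    print(f"{activity}: rho={rho:.2f}, p={p:.3f}")
```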
Figure 1. Factor 1: Speaking/reading (online or offline) and writing (online).
Figure 2. Factor 2: Gaming.
Figure 3. Factor 3: Listening to music and watching short online videos.
Figure 4. Factor 4: Watching subtitled movies and series.
15 pages, 3855 KiB  
Article
Advanced Techniques for Geospatial Referencing in Online Media Repositories
by Dominik Warch, Patrick Stellbauer and Pascal Neis
Future Internet 2024, 16(3), 87; https://doi.org/10.3390/fi16030087 - 1 Mar 2024
Cited by 1 | Viewed by 1751
Abstract
In the digital transformation era, video media libraries’ untapped potential is immense, restricted primarily by their non-machine-readable nature and basic search functionalities limited to standard metadata. This study presents a novel multimodal methodology that utilizes advances in artificial intelligence, including neural networks, computer vision, and natural language processing, to extract and geocode geospatial references from videos. Leveraging the geospatial information from videos enables semantic searches, enhances search relevance, and allows for targeted advertising, particularly on mobile platforms. The methodology involves a comprehensive process, including data acquisition from ARD Mediathek, image and text analysis using advanced machine learning models, and audio and subtitle processing with state-of-the-art linguistic models. Despite challenges like model interpretability and the complexity of geospatial data extraction, this study’s findings indicate significant potential for advancing the precision of spatial data analysis within video content, promising to enrich media libraries with more navigable, contextually rich content. This advancement has implications for user engagement, targeted services, and broader urban planning and cultural heritage applications.
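One stage of the pipeline described here, finding place names in subtitle text and geocoding them, can be sketched briefly. The choice of spaCy's small English model and the Nominatim geocoder is an assumption for illustration; the paper's actual models, languages, and services may differ.

```python
# Sketch: named-entity recognition over subtitle text, then geocoding of
# place-like entities to coordinates.
import spacy
from geopy.geocoders import Nominatim

nlp = spacy.load("en_core_web_sm")
geocoder = Nominatim(user_agent="media-georeferencing-sketch")

subtitle_text = "The documentary continues in Dresden, near the Frauenkirche."

doc = nlp(subtitle_text)
for ent in doc.ents:
    if ent.label_ in {"GPE", "LOC", "FAC"}:     # place-like entity labels
        location = geocoder.geocode(ent.text)
        if location is not None:
            print(ent.text, location.latitude, location.longitude)
```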
Figure 1. Workflow diagram illustrating data acquisition from ARD Mediathek.
Figure 2. Workflow diagram illustrating the analysis of the visible image.
Figure 3. Workflow diagram illustrating the extraction of text from the visible image and performing NER and geocoding to retrieve coordinates.
Figure 4. Workflow diagram illustrating the analysis of the audio source and subtitles to retrieve coordinates.
Figure 5. Map excerpt of Dresden showing successfully identified location references in green (a) and unsuccessful (orange) location references and false positives (red) (b). The numbers represent the number of overlapping location references. Basemap: powered by Esri.
Figure 6. Misidentification by the landmark recognition model, interpreting a person in a white hood (a) as the Swedish F 15 Flygmuseum (b), showing challenges with AI interpretability and training data biases. Image sources: (a) ARD Mediathek; (b) Wikimedia Commons.
Figure 7. Examples in which OCR captured parts of the text (a) and where OCR was not able to recognize text due to large and partly concealed fonts (b). Image source: (a,b) ARD Mediathek.
8 pages, 1264 KiB  
Proceeding Paper
Enhancing Virtual Experiences: A Holistic Approach to Immersive Special Effects
by Georgios Tsaramirsis, Oussama H. Hamid, Amany Mohammed, Zamhar Ismail and Princy Randhawa
Eng. Proc. 2023, 59(1), 23; https://doi.org/10.3390/engproc2023059023 - 8 Dec 2023
Viewed by 743
Abstract
To create a more immersive experience, electronic content developers utilize hardware solutions that not only display images and produce sounds but also manipulate the viewer’s real environment. These devices can control visual effects like lighting variations and fog, emit scents, simulate liquid effects, and provide vibration or locomotion sensations, such as moving the viewer’s chair. The goal is to emulate additional sensations for the viewers and engender the belief that they are truly present within the virtual environment. These devices are typically found in specially designed cinemas referred to as xD cinemas, such as 4D, 5D, 9D, etc., where each effect is treated as an additional dimension, enhancing the overall experience. Currently, all of these effects are triggered by timers: the system determines which effect to play based on elapsed time. This approach is problematic, for it requires programming each device for each movie. In this research, we address this problem by introducing the idea of Special Effect Tags (SETs) that can be added to subtitle files. The SETs aim to serve as a standard that will allow the various devices to know when each artificial phenomenon should be triggered. They are generic and can support infinite artificial phenomena, also known as dimensions. This paper introduces the idea of a common special effect framework and a generic architecture of a special effects player that is independent of any specific hardware solutions.
(This article belongs to the Proceedings of Eng. Proc., 2023, RAiSE-2023)
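To make the SET idea concrete, here is a speculative sketch of a parser that reads effect cues embedded in subtitle entries and turns them into timed triggers. The {SET:NAME:intensity} syntax and the effect names are invented for illustration; the paper defines its own tag format.

```python
# Sketch: parse invented {SET:...} tags out of subtitle entries and order the
# resulting effect triggers by playback time.
import re

SET_PATTERN = re.compile(r"\{SET:([A-Z_]+)(?::([0-9.]+))?\}")

def parse_cues(timestamp, text):
    """Return (seconds, effect, intensity) for every SET tag in a subtitle line."""
    h, m, s = timestamp.split(":")
    seconds = int(h) * 3600 + int(m) * 60 + float(s.replace(",", "."))
    return [(seconds, name, float(value) if value else 1.0)
            for name, value in SET_PATTERN.findall(text)]

entries = [  # (start time, subtitle text with embedded SET tags) - illustrative
    ("00:01:12,500", "{SET:FOG:0.6}A thick mist rolls over the valley."),
    ("00:02:03,000", "{SET:SEAT_VIBRATION}The engines roar to life."),
]

cues = sorted(c for ts, txt in entries for c in parse_cues(ts, txt))
for seconds, effect, intensity in cues:
    # A real player would wait until `seconds` of playback, then signal the device.
    print(f"t={seconds:7.1f}s  trigger {effect} (intensity {intensity})")
```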
Figure 1. Required infrastructures.
Figure 2. A flowchart of the underlying algorithm of a generic special effect player.
16 pages, 1349 KiB  
Article
InnoDAT—An Innovative Project Based on Subtitling for the Deaf and Hard-of-Hearing for Learning Languages and Cultures
by Pilar Couto-Cantero, Noemi Fraga-Castrillón and Giuseppe Trovato
Languages 2023, 8(4), 235; https://doi.org/10.3390/languages8040235 - 16 Oct 2023
Cited by 1 | Viewed by 2069
Abstract
The InnoDAT project is framed within the TRADILEX Project, which is aimed at demonstrating the applicability of Audiovisual Translation (AVT) for teaching and learning languages. TRADILEX is an ongoing project presented at a state-funded competitive call and supported by the Spanish Government. This article presents InnoDAT, an innovative project based on the use of AVT for teaching and learning languages through Subtitling for the Deaf and Hard-of-Hearing (SDH). It was designed for learning Spanish as a Second Foreign Language in an Italian Higher Education context at a B2 CEFR level. The methodology used was developed by researchers of TRADILEX. Six tailor-made Learning Units (LUs), based on the SDH mode, were designed and implemented among participants (N = 97). Authentic materials and cultural matters were also used and adapted according to the B2 level. The results show a clear improvement in the process of teaching and learning languages, knowledge of the culture and traditions of the target language, and awareness of accessibility among the participants. The authors compare this innovative research with former research: the InnoDAT project validates the applicability of didactic audiovisual translation (DAT) as a means for learning languages and cultures within digital educational settings and shows how languages and cultures are intricately connected. Moreover, not only cultural issues but also accessibility were paramount in this research. Finally, motivation, autonomous and meaningful learning, communicative language competence, and digital competence were also nurtured by means of the InnoDAT project.
Figure 1. Mean scores of the improvement in communication skills.
Figure 2. Communication skills that participants felt improved the most.
Figure 3. Number of topics learned about Spanish culture.
Figure 4. Degree to which cultural content fosters respect for other cultures.
Figure 5. Degree to which the project enriched participants as people.
17 pages, 2282 KiB  
Article
A Short Video Classification Framework Based on Cross-Modal Fusion
by Nuo Pang, Songlin Guo, Ming Yan and Chien Aun Chan
Sensors 2023, 23(20), 8425; https://doi.org/10.3390/s23208425 - 12 Oct 2023
Cited by 5 | Viewed by 1753
Abstract
The explosive growth of online short videos has brought great challenges to the efficient management of video content classification, retrieval, and recommendation. Video features for video management can be extracted from video image frames by various algorithms, and they have been proven to be effective in the video classification of sensor systems. However, frame-by-frame processing of video image frames not only requires huge computing power, but also classification algorithms based on a single modality of video features cannot meet the accuracy requirements in specific scenarios. In response to these concerns, we introduce a short video categorization architecture centered around cross-modal fusion in visual sensor systems which jointly utilizes video features and text features to classify short videos, avoiding processing a large number of image frames during classification. Firstly, the image space is extended to three-dimensional space–time by a self-attention mechanism, and a series of patches are extracted from a single image frame. Each patch is linearly mapped into the embedding layer of the Timesformer network and augmented with positional information to extract video features. Second, the text features of subtitles are extracted through the bidirectional encoder representation from the Transformers (BERT) pre-training model. Finally, cross-modal fusion is performed based on the extracted video and text features, resulting in improved accuracy for short video classification tasks. The outcomes of our experiments showcase a substantial superiority of our introduced classification framework compared to alternative baseline video classification methodologies. This framework can be applied in sensor systems for potential video classification.
(This article belongs to the Special Issue Smart Mobile and Sensing Applications)
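The fusion step described above, combining clip-level video features with subtitle text features before classification, can be sketched in a few lines of PyTorch. The dimensions, the simple concatenation-plus-MLP head, and the class count are illustrative assumptions; the paper's actual architecture is more elaborate.

```python
# Sketch: concatenate a video embedding (e.g., from a Timesformer branch) with a
# subtitle text embedding (e.g., from BERT) and classify the fused vector.
import torch
import torch.nn as nn

class CrossModalFusionClassifier(nn.Module):
    def __init__(self, video_dim=768, text_dim=768, hidden_dim=512, num_classes=20):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(video_dim + text_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, video_emb, text_emb):
        # video_emb: (batch, video_dim), text_emb: (batch, text_dim)
        fused = torch.cat([video_emb, text_emb], dim=-1)
        return self.fusion(fused)

model = CrossModalFusionClassifier()
video_emb = torch.randn(4, 768)   # stand-in for Timesformer clip features
text_emb = torch.randn(4, 768)    # stand-in for BERT [CLS] subtitle features
logits = model(video_emb, text_emb)
print(logits.shape)               # torch.Size([4, 20])
```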
Figure 1. Different types of video classification architectures: (a) 3D-ConvNet, (b) Two-Stream, and (c) SlowFast.
Figure 2. Two-channel classification framework based on cross-modal fusion.
Figure 3. Video feature extraction process.
Figure 4. The Timesformer coding flow chart.
Figure 5. Word vector model.
Figure 6. Text feature extraction framework.
16 pages, 1828 KiB  
Article
Development of Professional Foreign Language Competence of Economics Students with MOOCs during the Pandemic
by Artyom Dmitrievich Zubkov
Educ. Sci. 2023, 13(10), 1010; https://doi.org/10.3390/educsci13101010 - 5 Oct 2023
Viewed by 1443
Abstract
Delving into the realm of massive open online courses (MOOCs), this investigation scrutinizes their role in enriching the professional language proficiency of economics undergraduates amidst a global coronavirus crisis. The research pivots on assessing the efficacy of MOOCs within higher learning curriculums while shedding light on the benefits and drawbacks of such an approach. The methodologies employed encompassed the analysis and interpretation of data derived from student polls and statistical analysis of yielded outcomes. The findings shed light on MOOCs as a potent instrument for honing language prowess and enriching the professional linguistic expertise of budding economists. A group of 34 students participated in the language experiment, and their responses were collated and interpreted as proposed guidelines for future reference. The participants notably appreciated the video-lecture format and exposure to native speakers’ elocution. However, the study did not shy away from spotlighting certain limitations such as complexities in platform navigation and the difficulties of watching videos without subtitles. To optimize MOOC utilization, a series of recommendations were drawn up. These included offering choices to students, simplifying platform navigation, extending support during challenges, and bolstering speaking abilities. This exploration holds valuable insights not just for educators and students but also for academic institutions at large, offering hands-on data regarding the perks and pitfalls of employing MOOCs to cultivate professional foreign language competence within the field of economic education.
Figure 1. Methodical model for the development of professional foreign language competence of economics students using MOOCs.
Figure 2. The results of diagnosing the level of professional foreign language competence.
Figure 3. “What did you like most about using MOOCs to learn a foreign language?” Answer statistics.
Figure 4. “Are there any aspects of using MOOCs that were difficult or not understood?” Answer statistics.
Figure 5. “What changes or improvements would you like to see in the proposed method of learning a foreign language?” Answer statistics.
Figure 6. “Do you feel that MOOCs have helped you improve your language skills such as vocabulary, listening, speaking, reading or writing?” Answer statistics.
Figure 7. “How has this approach contributed to your understanding of economic terms and concepts in a foreign language?” Answer statistics.
Figure 8. “Were the MOOC materials relevant and useful in the context of your economics education?” Answer statistics.
Figure 9. “Would you recommend the use of MOOCs to other students for learning a foreign language and why?” Answer statistics.
17 pages, 5406 KiB  
Article
Bi-LS-AttM: A Bidirectional LSTM and Attention Mechanism Model for Improving Image Captioning
by Tian Xie, Weiping Ding, Jinbao Zhang, Xusen Wan and Jiehua Wang
Appl. Sci. 2023, 13(13), 7916; https://doi.org/10.3390/app13137916 - 6 Jul 2023
Cited by 4 | Viewed by 3195
Abstract
The discipline of automatic image captioning represents an integration of two pivotal branches of artificial intelligence, namely computer vision (CV) and natural language processing (NLP). The principal functionality of this technology lies in transmuting the extracted visual features into semantic information of a higher order. The bidirectional long short-term memory (Bi-LSTM) has garnered wide acceptance in executing image captioning tasks. Of late, scholarly attention has been focused on modifying suitable models for innovative and precise subtitle captions, although tuning the parameters of the model does not invariably yield optimal outcomes. Given this, the current research proposes a model that effectively employs the bidirectional LSTM and attention mechanism (Bi-LS-AttM) for image captioning endeavors. This model exploits the contextual comprehension from both anterior and posterior aspects of the input data, synergistically with the attention mechanism, thereby augmenting the precision of visual language interpretation. The distinctiveness of this research is embodied in its incorporation of Bi-LSTM and the attention mechanism to engender sentences that are both structurally innovative and accurately reflective of the image content. To enhance temporal efficiency and accuracy, this study substitutes convolutional neural networks (CNNs) with fast region-based convolutional networks (Fast RCNNs). Additionally, it refines the process of generation and evaluation of common space, thus fostering improved efficiency. Our model was tested for its performance on Flickr30k and MSCOCO datasets (80 object categories). Comparative analyses of performance metrics reveal that our model, leveraging the Bi-LS-AttM, surpasses unidirectional and Bi-LSTM models. When applied to caption generation and image-sentence retrieval tasks, our model manifests time economies of approximately 36.5% and 26.3% vis-a-vis the Bi-LSTM model and the deep Bi-LSTM model, respectively.
(This article belongs to the Special Issue Recent Trends in Automatic Image Captioning Systems)
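For readers unfamiliar with the combination named in the abstract, a bidirectional LSTM encoder over region features plus an additive attention step feeding a caption decoder, the following PyTorch sketch shows the general shape of such a model. The dimensions, the single-step greedy decoder, and the attention formulation are illustrative simplifications, not the Bi-LS-AttM architecture itself.

```python
# Sketch: Bi-LSTM over detected-region features, additive attention, LSTM decoder.
import torch
import torch.nn as nn

class BiLSTMAttentionCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        # additive (Bahdanau-style) attention over the encoded regions
        self.attn_enc = nn.Linear(2 * hidden_dim, hidden_dim)
        self.attn_dec = nn.Linear(hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1)
        self.decoder = nn.LSTMCell(2 * hidden_dim, hidden_dim)
        self.vocab_proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, region_feats, steps=12):
        # region_feats: (batch, num_regions, feat_dim), e.g., Fast R-CNN region features
        enc, _ = self.encoder(region_feats)                 # (batch, regions, 2*hidden)
        h = enc.new_zeros(enc.size(0), self.hidden_dim)
        c = enc.new_zeros(enc.size(0), self.hidden_dim)
        logits = []
        for _ in range(steps):
            e = self.attn_score(torch.tanh(self.attn_enc(enc) + self.attn_dec(h).unsqueeze(1)))
            weights = e.softmax(dim=1)                      # attention over regions
            context = (weights * enc).sum(dim=1)            # attended visual context
            h, c = self.decoder(context, (h, c))
            logits.append(self.vocab_proj(h))
        return torch.stack(logits, dim=1)                   # (batch, steps, vocab_size)

model = BiLSTMAttentionCaptioner()
feats = torch.randn(2, 36, 2048)   # 36 detected regions per image (illustrative)
print(model(feats).shape)          # torch.Size([2, 12, 10000])
```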
Figure 1. Example captions generated by the model. (a) Caption generation (by the unidirectional model (upper) and by our model (lower)) on Flickr30K. (b) Caption generation (by the unidirectional model (upper) and by our model (lower)) on MSCOCO.
Figure 2. LSTM cell structure.
Figure 3. Bi-LSTM cell structure.
Figure 4. Proposed model architecture.
Figure 5. (a) Comparison of METEOR scores of three models on two benchmark datasets; (b) comparison of CIDEr scores of three models on two benchmark datasets.
Figure 6. (a) Comparison of captioning models on Flickr30K; (b) comparison of captioning models on MSCOCO.
Figure 7. Example of using our model for image retrieval and caption retrieval on the MSCOCO validation set. (a) Searching for three images using captions. (b) Searching for three captions using images.
Figure 8. Examples of image captioning for the baseline and our model on the datasets. The captions generated by the baseline model are above, while the captions generated by our model are below.
Figure 9. Examples of failed experiments: (a) feature extraction error, (b) image representation error, (c) caption logic error. The extracted features are marked on the image with red boxes, and blue fonts distinguish errors.
16 pages, 1134 KiB  
Article
Multilingualism and Multiculturalism in Family Guy: Challenges in Dubbing and Subtitling L3 Varieties of Spanish
by Mariazell Eugènia Bosch Fábregas
Languages 2023, 8(2), 143; https://doi.org/10.3390/languages8020143 - 30 May 2023
Viewed by 2250
Abstract
Multilingualism and multiculturalism are verbally and visually recurrent in the sitcom Family Guy (1999-in production) through a combination of a main language of communication (L1) and other languages (L3) in the source language (SL) or source text (ST). The use of L3 is tantamount to tokenism and stereotyping characters, especially those whose recurrence is incidental and part of jokes. This paper compares two versions of the episode “Road to Rhode Island” (American and Spanish DVDs) and addresses a scene to analyze the linguistic challenges and lexical choices in dubbing and subtitling L1 and L3 in two geographical varieties of Spanish: Latin American Spanish and Peninsular Spanish. In this regard, this study focuses on the role and function of L3 in translation, the techniques to represent L3 in translation, L1 and L3 translation techniques, and which techniques are used in translation. Overall, this paper explores how the Spanish DVD adds a new L3 in the target text (TT) to maintain its original function in subtitling and dubbing, and the differences in the American DVD: L3TT omission in subtitling and L3TT change of function and meaning in dubbing, which ultimately accentuates linguistic and cultural misrepresentation and stereotypes.
Figure 1. “Doblado” (screenshot). Source: Spanish DVD (2005).
Figure 2. Brian smiling (screenshot). Source: SP DVD (2005).
14 pages, 414 KiB  
Article
Translating Multilingualism in Mira Nair’s Monsoon Wedding
by Montse Corrius, Eva Espasa and Laura Santamaria
Languages 2023, 8(2), 129; https://doi.org/10.3390/languages8020129 - 17 May 2023
Cited by 1 | Viewed by 1978
Abstract
Linguistic diversity is present in many audiovisual productions and has given rise to fruitful research on translation of multilingualism and language variation. Monsoon Wedding (Mira Nair, 2001) is a prototypical film for translation analysis, since multilingualism is a recurrent feature, as the film dialogue combines English (L1) with Hindi and Punjabi (L3), which creates an effect of code-switching. This article analyses how the multilingualism and the cultural elements present in the source text (ST) have been transferred to the Spanish translated text (TT) La boda del monzón. The results show that in the Spanish dubbed and subtitled versions, few Indian cultural elements are left, and little language variation is preserved. Thus, L3 does not play a central role as it does in the source text. In the translation, only a few loan words from Hindi or Punjabi are kept, mainly from the domains of food and cooking, as well as terms of address and greetings, or words related to the wedding ceremony. The results also show that when L3 is not fully rendered in translation, otherness is still conveyed through image and music, thus (re)creating a different atmosphere for Spanish audiences.
23 pages, 676 KiB  
Article
The Rendering of Multilingual Occurrences in Netflix’s Italian Dub Streams: Evolving Trends and Norms on Streaming Platforms
by Sofia Savoldelli and Giselle Spiteri Miggiani
Languages 2023, 8(2), 113; https://doi.org/10.3390/languages8020113 - 20 Apr 2023
Cited by 1 | Viewed by 2966
Abstract
Given the vast scholarly attention paid to multilingualism on traditional media over the years, it seems timely to focus on streaming platforms. This paper sets out to identify potential norms for the rendering of multilingual occurrences in the localised content of Netflix series. It also seeks to explore whether streaming translation practices related to multilingualism differ from the consolidated norms and practices for TV and cinema content. The chosen data sample consists of the Italian dub streams of five TV Netflix-produced shows featuring multilingualism as a main characteristic. The strategies and techniques adopted in each series are singled out, quantified, and labelled according to a combination of taxonomies. These include dubbing, revoicing, subtitling, part-subtitling, diegetic interpreting, unchanged speech transfer, and no translation. A wider analysis is also carried out across all the data sample to draw patterns on a macro level. The findings reveal a strong tendency to mark and preserve multilingualism, in line with Netflix’s own policies and dubbing specifications. Transfer unchanged combined with subtitles emerges as the most recurrent strategy, while the dub-over strategy accounts for 13% of the multilingual occurrences in the data sample. Extensive neutralisation is therefore not encountered. That said, a certain degree of overlap between multilingual translation norms on Netflix and conventional Italian dubbing practices (which tend to neutralise) can still be observed.
17 pages, 399 KiB  
Article
Subtitling for the Deaf and Hard of Hearing, Audio Description and Audio Subtitling in Multilingual TV Shows
by Micòl Beseghi
Languages 2023, 8(2), 109; https://doi.org/10.3390/languages8020109 - 17 Apr 2023
Cited by 1 | Viewed by 3960
Abstract
Multilingualism in audiovisual productions has substantially increased in recent years as a reflection of today’s globalised world. While the number of publications looking at the phenomenon from the perspective of audiovisual translation (AVT)—especially interlingual subtitling and dubbing—has grown considerably in the last decade, there seems to be relatively little research on the rendering of multilingualism from the perspective of accessibility modes, namely subtitling for the deaf and hard of hearing (SDH) and audio description (AD). This article aims to investigate how multilingualism is rendered for deaf and hard-of-hearing as well as blind and partially sighted audiences, focusing on SDH and AD, as well as audio subtitling (AST). The study analyses a small corpus of TV shows available on Netflix and aims to highlight how multilingualism is made accessible both in SDH and AD. The products selected for the study had to satisfy three main criteria: they had to be a recent production, include the presence of an L1 (English) and one or more third languages, and offer both intralingual SDH (closed captions) and AD. The results show that, even within the context of a single streaming platform, the strategies applied to deal with multilingualism seem to vary quite significantly both in SDH and AD/AST, ranging from neutralisation to L3 visibility.