Big Data Cogn. Comput., Volume 6, Issue 2 (June 2022) – 36 articles

Cover Story (view full-size image): People nowadays use the Internet, and social media in particular, more frequently and for a wider variety of purposes. Yet even for cultural spaces already on the Internet, participating actively, growing an audience, and locating the right groups of people to share information with remain tedious tasks. The investment is mainly financial, usually large, and directed to advertisements. Still, there is room for research and investment in analytics, which can provide evidence about how information spreads and can identify groups of people interested in specific trending topics and influencers. The Internet demands participation, not just presence. In this work, we describe a procedure through which cultural institutions can benefit from the data analysis of Twitter’s trending topics. View this paper
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
19 pages, 626 KiB  
Article
Áika: A Distributed Edge System for AI Inference
by Joakim Aalstad Alslie, Aril Bernhard Ovesen, Tor-Arne Schmidt Nordmo, Håvard Dagenborg Johansen, Pål Halvorsen, Michael Alexander Riegler and Dag Johansen
Big Data Cogn. Comput. 2022, 6(2), 68; https://doi.org/10.3390/bdcc6020068 - 17 Jun 2022
Cited by 3 | Viewed by 3273
Abstract
Video monitoring and surveillance of commercial fisheries in the world’s oceans have been proposed by the governing bodies of several nations as a response to crimes such as overfishing. Traditional video monitoring systems may not be suitable due to limitations of the offshore fishing environment, including low bandwidth, unstable satellite network connections, and the need to preserve the privacy of crew members. In this paper, we present Áika, a robust system for executing distributed Artificial Intelligence (AI) applications on the edge. Áika provides engineers and researchers with several building blocks in the form of Agents, which enable the expression of computation pipelines and distributed applications with robustness and privacy guarantees. Agents are continuously monitored by dedicated monitoring nodes and provide applications with a distributed checkpointing and replication scheme. Áika is designed for monitoring and surveillance in privacy-sensitive and unstable offshore environments, where flexible access policies at the storage level can provide privacy guarantees for data transfer and access. Full article
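The agent-based pipeline pattern described in the abstract can be sketched with plain queues. This is a hypothetical illustration of a "worker agent" that dequeues items from a local queue, runs an analysis step, and enqueues results remotely; the names and the analysis function are made up for illustration and are not Áika's actual API.

```python
from queue import Queue

def left_worker_agent(local_queue: Queue, remote_queue: Queue, analyze) -> None:
    """Drain the local queue, analyze each item, forward results remotely."""
    while not local_queue.empty():
        item = local_queue.get()
        remote_queue.put(analyze(item))

local_q, remote_q = Queue(), Queue()
for frame_id in range(3):          # stand-in for incoming video frames
    local_q.put(frame_id)

# The analysis step is a placeholder; in Áika it would be an AI inference task.
left_worker_agent(local_q, remote_q, analyze=lambda x: x * 10)
results = [remote_q.get() for _ in range(remote_q.qsize())]
print(results)  # → [0, 10, 20]
```

In the paper's terminology, chaining several such agents (with remote queues between them) forms a computation pipeline; the monitoring and checkpointing machinery is omitted here.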
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)
Show Figures

Figure 1: Áika’s architecture. Arrows represent client/server communication. Red arrows represent communication that may only occur during recovery. The figure does not include communication between agents. All nodes in the cluster are connected to a distributed file system that enables file sharing across the nodes. This is practical for recovery.
Figure 2: The general process structure of Áika’s components. Each process is organized in a hierarchy of threads, where a main thread starts several child threads. Child threads are restarted by the main thread if they fail. Servers spawn multiple request handler threads to enable requests from multiple sources to be handled concurrently.
Figure 3: The cluster controller is replicated and connected in a chain.
Figure 4: The general structure of the agents.
Figure 5: Left worker agent. A worker agent that dequeues messages from a local queue before the analysis process, but enqueues the result remotely.
Figure 6: Right worker agent. A worker agent that dequeues messages from a remote queue before the analysis process, but enqueues the result on a local queue.
Figure 7: Double worker agent. A worker agent containing servers both before and after processing the item.
Figure 8: Initial worker agents. This type of agent is used to initiate one or several pipelines. This is achieved by having the agent continuously retrieve data from a source and then forward it to either a local (left) or a remote (right) queue.
Figure 9: Final worker agents. This type of agent is used to finalize one or several pipelines. The agent retrieves the end results from either a local or a remote queue, then handles the result in a customized manner.
Figure 10: Queue agent and server-less worker agent. The agent to the left contains a single queue without any analysis. The agent to the right does not have any servers and receives items by making requests through clients on both sides.
Figure 11: Results obtained from stress-testing the system with persistent queues over the course of 1200 s. The number of requests is measured and reset every 5 s. The plot shows the average number of requests processed per second over each 5 s interval. The moving average is computed with a window size n = 5.
Figure 12: Results obtained from stress-testing the system with in-memory queues over the course of 1200 s. The number of requests is measured and reset every 5 s. The plot shows the average number of requests processed per second over each 5 s interval. The moving average is computed with a window size n = 5.
Figure 13: Results obtained when measuring the number of requests received where worker agents in the pipeline sleep for one second after processing an item. Measurements are carried out with persistent queues (top) and in-memory queues (bottom). The number of requests received is measured and reset every 5 s. The plot shows the number of requests per second over each 5 s interval.
Figure 14: Results obtained when measuring the number of requests received during stress-testing over the course of 1200 s without (top) and with (bottom) the killer deployed. The number of requests received is measured and reset every 5 s. The plot shows the average number of requests per second over each 5 s interval.
Figure 15: Results obtained from counting the words in datasets with sizes of 100 MB and 300 MB. The standard deviation is shown as an error bar, and the corresponding value is displayed above each data point.
Figure 16: Results obtained from performing feature extraction with pre-trained VGG-16, DenseNet-121, and ResNet-50 models, where the models are either distributed among three workers (Distributed Feature Extraction) or put in a sequence on a single worker (Sequential Feature Extraction). The images were processed with a batch size of 500. The standard deviation is shown as an error bar and as a value at each data point.
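The throughput figures above (Figures 11 and 12) report a moving average with window size n = 5. A minimal sketch of that computation, with made-up requests-per-second samples:

```python
def moving_average(samples, n=5):
    """Simple moving average over a sliding window of size n."""
    return [sum(samples[i - n + 1:i + 1]) / n
            for i in range(n - 1, len(samples))]

# Hypothetical requests-per-second values from consecutive 5 s intervals.
rps = [100, 120, 110, 130, 140, 150, 160]
print(moving_average(rps, n=5))  # → [120.0, 130.0, 138.0]
```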
19 pages, 5202 KiB  
Article
Iris Liveness Detection Using Multiple Deep Convolution Networks
by Smita Khade, Shilpa Gite and Biswajeet Pradhan
Big Data Cogn. Comput. 2022, 6(2), 67; https://doi.org/10.3390/bdcc6020067 - 15 Jun 2022
Cited by 11 | Viewed by 3838
Abstract
In the past decade, comprehensive research has been carried out on promising biometric modalities based on humans’ physical features for person recognition. This work focuses on iris characteristics and traits for person identification and iris liveness detection. This study used five pre-trained networks, including VGG-16, InceptionV3, ResNet50, DenseNet121, and EfficientNetB7, to recognize iris liveness using transfer learning techniques. These models are compared using three state-of-the-art biometric databases: the LivDet-Iris 2015 dataset, the IIITD contact dataset, and the ND Iris3D 2020 dataset. Validation accuracy, loss, precision, recall, F1-score, APCER (attack presentation classification error rate), NPCER (normal presentation classification error rate), and ACER (average classification error rate) were used to evaluate the performance of all pre-trained models. According to the observational data, these models have a considerable ability to transfer their experience to the field of iris recognition and to recognize the nanostructures within the iris region. Using the ND Iris3D 2020 dataset, the EfficientNetB7 model achieved 99.97% identification accuracy. Experiments show that pre-trained models outperform other current iris biometrics variants. Full article
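The error metrics named in the abstract have standard definitions: APCER is the fraction of presentation-attack (fake) samples classified as live, NPCER is the fraction of genuine live samples classified as attacks, and ACER is their mean. A sketch with hypothetical predictions (the label names and data are illustrative, not taken from the paper):

```python
def apcer(pred_on_attacks):
    """Fraction of presentation-attack samples misclassified as live."""
    return sum(p == "live" for p in pred_on_attacks) / len(pred_on_attacks)

def npcer(pred_on_lives):
    """Fraction of genuine live samples misclassified as attacks."""
    return sum(p == "attack" for p in pred_on_lives) / len(pred_on_lives)

# Hypothetical classifier outputs for 4 attack samples and 5 live samples.
pred_on_attacks = ["attack", "live", "attack", "attack"]
pred_on_lives = ["live", "live", "attack", "live", "live"]

a, n = apcer(pred_on_attacks), npcer(pred_on_lives)
acer = (a + n) / 2  # average classification error rate
print(a, n, acer)   # → 0.25 0.2 0.225
```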
(This article belongs to the Special Issue Data, Structure, and Information in Artificial Intelligence)
Show Figures

Figure 1: Graphical representation of the results of a study on the efficiency of transfer learning models for detecting iris liveness.
Figure 2: The VGG-16 architecture, designed for iris liveness detection.
Figure 3: The InceptionV3 architecture, designed for iris liveness detection.
Figure 4: The ResNet50 architecture, designed for iris liveness detection.
Figure 5: The DenseNet121 architecture, designed for iris liveness detection.
Figure 6: The EfficientNetB7 architecture, designed for iris liveness detection.
Figure 7: Validation and training analyses over five epochs of the pre-trained VGG-16 model on various datasets: (a,b) model accuracy and loss on the Clarkson 2015 dataset; (c,d) model accuracy and loss on the IIITD dataset; (e,f) model accuracy and loss on the ND_Iris3D_2020 dataset.
Figure 8: Validation and training analyses over five epochs of the pre-trained Inception model on various datasets: (a,b) model accuracy and loss on the Clarkson 2015 dataset; (c,d) model accuracy and loss on the IIITD dataset; (e,f) model accuracy and loss on the ND_Iris3D_2020 dataset.
Figure 9: Validation and training analyses over five epochs of the pre-trained ResNet50 model on various datasets: (a,b) model accuracy and loss on the Clarkson 2015 dataset; (c,d) model accuracy and loss on the IIITD dataset; (e,f) model accuracy and loss on the ND_Iris3D_2020 dataset.
Figure 10: Validation and training analyses over five epochs of the pre-trained DenseNet121 model on various datasets: (a,b) model accuracy and loss on the Clarkson 2015 dataset; (c,d) model accuracy and loss on the IIITD dataset; (e,f) model accuracy and loss on the ND_Iris3D_2020 dataset.
Figure 11: Validation and training analyses over five epochs of the pre-trained EfficientNetB7 model on various datasets: (a,b) model accuracy and loss on the Clarkson 2015 dataset; (c,d) model accuracy and loss on the IIITD dataset; (e,f) model accuracy and loss on the ND_Iris3D_2020 dataset.
32 pages, 7749 KiB  
Article
CompositeView: A Network-Based Visualization Tool
by Stephen A. Allegri, Kevin McCoy and Cassie S. Mitchell
Big Data Cogn. Comput. 2022, 6(2), 66; https://doi.org/10.3390/bdcc6020066 - 14 Jun 2022
Cited by 4 | Viewed by 4870
Abstract
Large networks are quintessential to bioinformatics, knowledge graphs, social network analysis, and graph-based learning. CompositeView is a Python-based open-source application that improves interactive complex network visualization and extraction of actionable insight. CompositeView utilizes specifically formatted input data to calculate composite scores and display them using the Cytoscape component of Dash. Composite scores are defined representations of smaller sets of conceptually similar data that, when combined, generate a single score to reduce information overload. Visualized interactive results are user-refined via filtering elements such as node value and edge weight sliders and graph manipulation options (e.g., node color and layout spread). The primary difference between CompositeView and other network visualization tools is its ability to auto-calculate and auto-update composite scores as the user interactively filters or aggregates data. CompositeView was developed to visualize network relevance rankings, but it performs well with non-network data. Three disparate CompositeView use cases are shown: relevance rankings from SemNet 2.0, an open-source knowledge graph relationship ranking software for biomedical literature-based discovery; Human Development Index (HDI) data; and the Framingham cardiovascular study. CompositeView was stress tested to construct reference benchmarks that define breadth and size of data effectively visualized. Finally, CompositeView is compared to Excel, Tableau, Cytoscape, neo4j, NodeXL, and Gephi. Full article
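As a rough illustration of the composite-score idea (collapsing a small set of conceptually similar values into one number to reduce information overload), a weighted mean can serve. The metric names, weights, and aggregation below are assumptions made for illustration; CompositeView's actual calculation may differ.

```python
def composite_score(values, weights):
    """Weighted mean of several related metrics, collapsed into one score.

    Illustrative only; not CompositeView's actual aggregation formula.
    """
    total_weight = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / total_weight

# Hypothetical per-node metrics and user-chosen weights.
node_metrics = {"degree": 0.8, "relevance": 0.6, "recency": 0.4}
weights = {"degree": 1.0, "relevance": 2.0, "recency": 1.0}
print(composite_score(node_metrics, weights))  # → 0.6
```

In the application, such a score would be recomputed automatically whenever the user filters or aggregates the underlying data, which is the behavior the abstract highlights as CompositeView's distinguishing feature.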
(This article belongs to the Special Issue Graph-Based Data Mining and Social Network Analysis)
Show Figures

Figure 1: A high-level overview of CompositeView’s working cycle. CompositeView has placeholder data, which initialize the graph, and user data, which initialize the user interaction and application of CompositeView. The cycle begins with a user uploading data and interacting with the application to update the graph attributes and layout. Next, the Cytoscape elements are updated to run the graph. Finally, the graph rendering and display are visually updated for the user in the CompositeView application. The working cycle continues as the user updates the data or changes and applies CompositeView application features such as graph layout selection or filtering modes.
Figure 2: The three input data examples explained in Section 2.1.3, evolving from least to most complex.
Figure 3: A sample of tested graph layouts along with their CompositeView runtimes, all based on the same SemNet results data set (approximately 2472 source nodes).
Figure 4: The adjusted spring graph layout using the same SemNet test data from Figure 3 (runtime: 5.78 s).
Figure 5: The adjusted spring layout method, broken down into three logical steps. The data shown are placeholder data used in CompositeView. (a) Initial target nodes are simulated and positions are fixed. (b) Artificial edges are removed and source nodes are filled in around the shared target node centroids. (c) Source nodes are simulated with edge weights.
Figure 6: Value filtering as well as node and type filtering settings as displayed by CompositeView. (a) Value filtering sliders. (b) Node and edge filtering dropdowns.
Figure 7: The complete CompositeView application layout (with Graph Manipulation settings open).
Figure 8: The impact of source spread (k in the NetworkX spring layout). The data shown are placeholder data used in CompositeView. (a) Source spread value half of base. (b) Base source spread value, the same as Figure 5c. (c) Source spread value double of base.
Figure 9: Application startup, graph initialization, attribute loading, and graph update.
Figure 10: Runtime analysis of graph startup and update, broken down by the most important methods.
Figure 11: The SemNet sample data, both unfiltered (a) and filtered (b), based on criteria described in Section 2.3.1.
Figure 12: The HDI sample data, both unfiltered (a) and filtered (b), based on criteria described in Section 2.3.2.
Figure 13: The CVD sample data, both unfiltered (a) and filtered (b), based on criteria described in Section 2.3.3.
Figure 14: Visual comparison between Gephi and CompositeView using the same SemNet data set seen in Figures 3 and 4 (approximately 2472 source nodes). The red circles represent the main input features or nodes for which relationships are being visualized. In this example, visualized relationships for composite data are much easier to deduce with CompositeView than with Gephi.
15 pages, 712 KiB  
Article
Analysis and Prediction of User Sentiment on COVID-19 Pandemic Using Tweets
by Nilufa Yeasmin, Nosin Ibna Mahbub, Mrinal Kanti Baowaly, Bikash Chandra Singh, Zulfikar Alom, Zeyar Aung and Mohammad Abdul Azim
Big Data Cogn. Comput. 2022, 6(2), 65; https://doi.org/10.3390/bdcc6020065 - 10 Jun 2022
Cited by 18 | Viewed by 4031
Abstract
The novel coronavirus disease (COVID-19) has dramatically affected people’s daily lives worldwide. More specifically, since there is still insufficient access to vaccines and no straightforward, reliable treatment for COVID-19, every country has taken appropriate precautions (such as physical separation, masking, and lockdown) to combat this extremely infectious disease. As a result, people spend much time on online social networking platforms (e.g., Facebook, Reddit, LinkedIn, and Twitter) and express their feelings and thoughts regarding COVID-19. Twitter is a popular social networking platform that enables anyone to post short messages known as tweets. This research used Twitter datasets to explore user sentiment from the COVID-19 perspective. We used a dataset of COVID-19 Twitter posts from nine states in the United States over fifteen days (from 1 April 2020 to 15 April 2020) to analyze user sentiment. We focus on exploiting machine learning (ML) and deep learning (DL) approaches to classify user sentiments regarding COVID-19. First, we labeled the dataset into three groups based on the sentiment values, namely positive, negative, and neutral, to train some popular ML algorithms and DL models to predict the user concern label on COVID-19. Additionally, we compared the traditional bag-of-words and term frequency-inverse document frequency (TF-IDF) representations of text as numeric vectors for the ML techniques. Furthermore, we contrasted the encoding methodology and various word embedding schemes, such as word-to-vector (Word2Vec) and global vectors for word representation (GloVe), with three sets of dimensions (100, 200, and 300), for representing text as numeric vectors for the DL approaches. Finally, we compared COVID-19 infection cases and COVID-19-related tweets during the COVID-19 pandemic. Full article
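The bag-of-words versus TF-IDF comparison mentioned in the abstract can be sketched in a few lines using the common tf × log(N/df) weighting (library implementations such as scikit-learn's differ in smoothing and normalization details; the toy documents below are made up):

```python
import math

def bow_and_tfidf(docs):
    """Return vocabulary, bag-of-words counts, and TF-IDF weights per document,
    using tf = raw term count and idf = log(N / df)."""
    vocab = sorted({w for d in docs for w in d.split()})
    counts = [[d.split().count(w) for w in vocab] for d in docs]
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    n = len(docs)
    tfidf = [[row[j] * math.log(n / df[j]) for j in range(len(vocab))]
             for row in counts]
    return vocab, counts, tfidf

docs = ["covid lockdown", "covid vaccine", "vaccine vaccine"]
vocab, counts, tfidf = bow_and_tfidf(docs)
print(vocab)   # → ['covid', 'lockdown', 'vaccine']
print(counts)  # → [[1, 1, 0], [1, 0, 1], [0, 0, 2]]
```

Note how 'lockdown', which appears in only one document, receives a higher IDF weight than 'covid', which appears in two; this down-weighting of ubiquitous terms is the point of TF-IDF over plain counts.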
Show Figures

Figure 1: Proposed methodology and workflow of our work.
Figure 2: User sentiments.
Figure 3: Accuracy and F1-score obtained using ML models on the testing dataset.
Figure 4: Accuracy and F1-score obtained using DL models on the testing dataset.
Figure 5: Comparison between (a) the number of COVID-19-related tweets in each state and (b) the number of COVID-19 cases in each state.
20 pages, 3478 KiB  
Article
Decision-Making Using Big Data Relevant to Sustainable Development Goals (SDGs)
by Saman Fattahi, Sharifu Ura and Md. Noor-E-Alam
Big Data Cogn. Comput. 2022, 6(2), 64; https://doi.org/10.3390/bdcc6020064 - 5 Jun 2022
Cited by 5 | Viewed by 3773
Abstract
Policymakers, practitioners, and researchers around the globe have been acting in a coordinated manner, yet remaining independent, to achieve the seventeen Sustainable Development Goals (SDGs) defined by the United Nations. Remarkably, SDG-centric activities have manifested a huge information silo known as big data. In most cases, a relevant subset of big data is visualized using several two-dimensional plots. These plots are then used to decide a course of action for achieving the relevant SDGs, and the whole process remains rather informal. Consequently, the question of how to make a formal decision using big data-generated two-dimensional plots is a critical one. This article fills this gap by presenting a novel decision-making approach (method and tool). The approach formally makes decisions where the decision-relevant information consists of two-dimensional plots rather than numerical data. The efficacy of the proposed approach is demonstrated by conducting two case studies relevant to SDG 12 (responsible consumption and production). The first case study tests whether the proposed decision-making approach produces reliable results. In this case study, datasets of wooden and polymeric materials regarding two eco-indicators (CO2 footprint and water usage) are represented using two two-dimensional plots. The plots show that wooden and polymeric materials are indifferent in water usage, whereas wooden materials are better than polymeric materials in terms of CO2 footprint. The proposed decision-making approach correctly captures this fact and correctly ranks the materials. For the other case study, three materials (mild steel, aluminum alloys, and magnesium alloys) are ranked using six criteria (strength, modulus of elasticity, cost, density, CO2 footprint, and water usage) and their relative weights. The datasets relevant to the six criteria are made available using three two-dimensional plots. The plots show the relative positions of mild steel, aluminum alloys, and magnesium alloys. The proposed decision-making approach correctly captures the decision-relevant information of these three plots and correctly ranks the materials. Thus, the outcomes of this article can help those who wish to develop pragmatic decision support systems leveraging the capacity of big data in fulfilling SDGs. Full article
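The multi-criteria ranking in the second case study can be illustrated with a simple weighted-sum scheme over normalized scores. The weights and scores below are invented for illustration (and cover only three of the six criteria); they are not the article's data, and the article's actual method works on fuzzy numbers induced from plots rather than point scores.

```python
def rank_alternatives(scores, weights):
    """Rank alternatives by the weighted sum of per-criterion scores in [0, 1].

    Illustrative weighted-sum sketch, not the article's fuzzy-number method.
    """
    totals = {alt: sum(weights[c] * s[c] for c in weights)
              for alt, s in scores.items()}
    return sorted(totals, key=totals.get, reverse=True), totals

weights = {"strength": 0.3, "cost": 0.4, "co2": 0.3}
scores = {
    "mild steel":       {"strength": 0.9, "cost": 0.9, "co2": 0.5},
    "aluminum alloys":  {"strength": 0.7, "cost": 0.5, "co2": 0.4},
    "magnesium alloys": {"strength": 0.6, "cost": 0.3, "co2": 0.3},
}
ranking, totals = rank_alternatives(scores, weights)
print(ranking)  # → ['mild steel', 'aluminum alloys', 'magnesium alloys']
```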
Show Figures

Figure 1: Visualization of relevant datasets from big data of engineering materials.
Figure 2: Proposed decision-making framework.
Figure 3: Compliance calculation for an interval.
Figure 4: Calculating the decision score.
Figure 5: The user interface of the decision tool.
Figure 6: An example of extracting intervals from an alternative.
Figure 7: Big data regarding the sustainability of two families of materials: (a) wooden materials and (b) polymers [36].
Figure 8: User interface of the decision tool for extracting ranges representing the water usage of wooden materials.
Figure 9: Extracted ranges representing the water usage of wooden materials.
Figure 10: Extracting decision-relevant information from the plot of strength versus Young’s modulus.
Figure 11: Extracting decision-relevant information from the plot of cost versus density.
Figure 12: Extracting decision-relevant information from the plot of water usage versus CO2 footprint.
Figure 13: Comparing three alternatives using their induced fuzzy numbers.
Figure 14: Relative positions of the arbitrary alternatives on the D2 versus D1 plot.
17 pages, 1055 KiB  
Article
Social Media Analytics as a Tool for Cultural Spaces—The Case of Twitter Trending Topics
by Vassilis Poulopoulos and Manolis Wallace
Big Data Cogn. Comput. 2022, 6(2), 63; https://doi.org/10.3390/bdcc6020063 - 2 Jun 2022
Cited by 1 | Viewed by 3853
Abstract
We are entering an era in which online personalities and personas will grow faster and faster. People tend to use the Internet, and social media especially, more frequently and for a wider variety of purposes. In parallel, a number of cultural spaces have already decided to invest in marketing and message spreading through the web and the media. Growing their audience, or locating the appropriate group of people to share their information with, remains a tedious task within the chaotic environment of the Internet. The investment is mainly financial, usually large, and directed to advertisements. Still, there is much space for research and investment in analytics that can provide evidence about how word spreads and can identify groups of people interested in specific information, trending topics, and influencers. In this paper, we present part of a national project that aims to analyze Twitter’s trending topics. The main scope of the analysis is to provide a basic ordering of the topics based on their “importance”. Based on this, we clarify how cultural institutions can benefit from such an analysis in order to strengthen their online presence. Full article
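As a hedged illustration of ordering trending topics by "importance", one could combine tweet volume with engagement signals such as retweets and favorites (cf. the "trend power" of Figure 4). The scoring function, its weights, and the topic data below are assumptions for illustration, not the authors' metric.

```python
def importance(topic):
    """Toy importance score: tweet volume plus weighted engagement signals.

    The weighting is an assumption, not the metric from the article.
    """
    return topic["tweets"] + 2 * topic["retweets"] + topic["favorites"]

# Hypothetical trending topics with made-up engagement counts.
topics = [
    {"name": "#MuseumWeek", "tweets": 500, "retweets": 300, "favorites": 400},
    {"name": "#OperaNight", "tweets": 800, "retweets": 100, "favorites": 200},
    {"name": "#ArtBasel",   "tweets": 200, "retweets": 50,  "favorites": 100},
]
ordered = sorted(topics, key=importance, reverse=True)
print([t["name"] for t in ordered])  # → ['#MuseumWeek', '#OperaNight', '#ArtBasel']
```

A cultural institution could use such an ordering to decide which trending conversations are worth joining, which is the use case the abstract describes.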
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
Show Figures

Figure 1: System architecture.
Figure 2: Sample visualization of trending topics.
Figure 3: Single-term evolution in time.
Figure 4: Trend power compared to retweets and favorites over time.
Figure 5: Sample results from JSON data execution.
Figure 6: JSON data to charts.
22 pages, 3553 KiB  
Article
Synthesizing a Talking Child Avatar to Train Interviewers Working with Maltreated Children
by Pegah Salehi, Syed Zohaib Hassan, Myrthe Lammerse, Saeed Shafiee Sabet, Ingvild Riiser, Ragnhild Klingenberg Røed, Miriam S. Johnson, Vajira Thambawita, Steven A. Hicks, Martine Powell, Michael E. Lamb, Gunn Astrid Baugerud, Pål Halvorsen and Michael A. Riegler
Big Data Cogn. Comput. 2022, 6(2), 62; https://doi.org/10.3390/bdcc6020062 - 1 Jun 2022
Cited by 15 | Viewed by 5649
Abstract
When responding to allegations of child sexual, physical, and psychological abuse, Child Protection Service (CPS) workers and police personnel need to elicit detailed and accurate accounts of the abuse to assist in decision-making and prosecution. Current research emphasizes the importance of the interviewer’s ability to follow empirically based guidelines. In doing so, it is essential to implement economical and scientific training courses for interviewers. Due to recent advances in artificial intelligence, we propose to generate a realistic and interactive child avatar, aiming to mimic a child. Our ongoing research involves the integration and interaction of different components with each other, including how to handle the language, auditory, emotional, and visual components of the avatar. This paper presents three subjective studies that investigate and compare various state-of-the-art methods for implementing multiple aspects of the child avatar. The first user study evaluates the whole system and shows that the system is well received by the expert and highlights the importance of its realism. The second user study investigates the emotional component and how it can be integrated with video and audio, and the third user study investigates realism in the auditory and visual components of the avatar created by different methods. The insights and feedback from these studies have contributed to the refined and improved architecture of the child avatar system which we present here. Full article
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)
Figures:
Figure 1: A comprehensive category of face manipulation techniques.
Figure 2: System architecture. Green blocks denote the interactive parts, yellow blocks are language-related, blue blocks are audio-related, and pink blocks are the parts of the system related to visualization.
Figure 3: A comparison between natural and synthetic voices in animated Unity-based and GAN-based avatars.
Figure 4: Excerpt from the user study with window size 5 where both models are in agreement with the human opinion.
Figure 5: Excerpt from the user study with window size 3 for which both models were not in agreement with the human raters.
Figure 6: Excerpt from the user study with window size 5 for which both GPT-3 and the human raters agreed that this should be classified as fear, while the BART model classified it as anger.
Figure 7: Given an arbitrary source face image generated by StyleGAN [59,101] and a driving video, ICface [74] generates the talking head of a child.
Figure 8: Illustration of a talking-head video generated using two methods, PC-AVS [75] and MakeItTalk [65]. The input is an image generated using StyleGAN and audio generated using IBM Watson. First two rows: PC-AVS; last two rows: MakeItTalk.
Figure 9: Bar plot (95% confidence interval) comparing MakeItTalk [65] and PC-AVS [75].
Figure 10: Bar plot (95% confidence interval) showing results of the user study evaluating the two best female and male characters created with both the GAN-based and game engine-based approaches.
21 pages, 3407 KiB  
Article
A Novel Method of Exploring the Uncanny Valley in Avatar Gender(Sex) and Realism Using Electromyography
by Jacqueline D. Bailey and Karen L. Blackmore
Big Data Cogn. Comput. 2022, 6(2), 61; https://doi.org/10.3390/bdcc6020061 - 30 May 2022
Cited by 1 | Viewed by 5112
Abstract
Despite the variety of applications that use avatars (virtual humans), how end-users perceive avatars is not fully understood, and accurately measuring these perceptions remains a challenge. To measure end-user responses to avatars more accurately, this pilot study uses a novel methodology that aims to examine and categorize end-user facial electromyography (f-EMG) responses. These responses (n = 92) can be categorized as pleasant, unpleasant, and neutral using control images sourced from the International Affective Picture System (IAPS). This methodology can also account for variability between participant responses to avatars. The approach taken here can assist in comparisons of avatars, such as gender(sex)-based differences. To examine these gender(sex) differences, participant responses to an avatar can be categorized as pleasant, unpleasant, neutral, or a combination of these. Although other factors such as age may unconsciously affect participant responses, age was not directly considered in this work. This method may allow avatar developers to better understand how end-users objectively perceive an avatar. The recommendation of this methodology is to aim for an avatar that returns a pleasant, neutral, or pleasant-neutral response, unless an unpleasant response is intended. This methodology demonstrates a novel and useful way forward to address some of the known variability issues found in f-EMG responses, and in responses to avatar realism and uncanniness, that can be used to examine gender(sex) perceptions. Full article
(This article belongs to the Special Issue Cognitive and Physiological Assessments in Human-Computer Interaction)
Figures:
Figure 1: Graphical representation of the valence-affect model [26].
Figure 2: Placement of electrodes on the orbicularis oculi (adapted from Blumenthal et al. [17]).
Figure 3: Example f-EMG response.
Figure 4: Different participants viewing the same normatively rated visual imagery can produce highly variable f-EMG responses. Slide number: 3015; description: accident; category: unpleasant; valence: M = 1.52, SD = 0.95. Note: the blue line is a raw response from a participant, the red line is the startle noise, and the green lines mark the response window.
Figure 5: Data collection procedure.
Figure 6: A sample of the avatars used in the experiments.
Figure 7: Responses to the control imagery by participants' self-reported biological sex.
Figure 8: Mean peak f-EMG classified responses to avatars with a happy expression, grouped by realism level.
Figure 9: Mean peak f-EMG classified responses to avatars with a sad expression, grouped by realism level.
Figure 10: Mean valid responses by participant biological sex for avatars with happy expressions, grouped by realism level.
Figure 11: Mean valid responses by participant biological sex for avatars with sad expressions, grouped by realism level.
Figure 12: Mean normalized peak f-EMG responses for the realism levels by both avatar and participant gender(sex).
21 pages, 4883 KiB  
Article
Earthquake Insurance in California, USA: What Does Community-Generated Big Data Reveal to Us?
by Fabrizio Terenzio Gizzi and Maria Rosaria Potenza
Big Data Cogn. Comput. 2022, 6(2), 60; https://doi.org/10.3390/bdcc6020060 - 20 May 2022
Cited by 5 | Viewed by 6247
Abstract
California has a high seismic hazard, as many historical and recent earthquakes remind us. To deal with potential future damaging earthquakes, a voluntary insurance system for residential properties is in force in the state. However, the insurance penetration rate is quite low. Bearing this in mind, the aim of this article is to ascertain whether Big Data can provide policymakers and stakeholders with useful information in view of future action plans on earthquake coverage. Therefore, we extracted and analyzed the online search interest in earthquake insurance over time (2004–2021) through Google Trends (GT), a website that explores the popularity of top search queries in Google Search across various regions and languages. We found that (1) the triggering of online searches stems primarily from the occurrence of earthquakes in California and neighboring areas as well as oversea regions, thus suggesting that the interest of users was guided by both direct and vicarious earthquake experiences. However, other natural hazards also come to people’s notice; (2) the length of the higher level of online attention spans from one day to one week, depending on the magnitude of the earthquakes, the place where they occur, the temporal proximity of other natural hazards, and so on; (3) users interested in earthquake insurance are also attentive to knowing the features of the policies, among which are first the price of coverage, and then their worth and practical benefits; (4) online interest in the time span analyzed fits fairly well with the real insurance policy underwritings recorded over the years. Based on the research outcomes, we can propose the establishment of an observatory to monitor the online behavior that is suitable for supporting well-timed and geographically targeted information and communication action plans. Full article
(This article belongs to the Special Issue Big Data and Internet of Things)
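The abstract's observation that the heightened online attention lasts from one day to one week after an event can be detected programmatically. The sketch below is an illustration, not the authors' code: it assumes a daily list of Google Trends interest values (0–100), and the rolling-baseline window, spike factor, and floor are illustrative choices.

```python
def attention_spans(interest, baseline_window=28, factor=3.0, floor=5):
    """Find runs of days where search interest spikes above a rolling baseline.

    interest: list of daily search-interest values (0-100).
    Returns (start_index, length) pairs, one per contiguous spike run.
    """
    spans, run_start = [], None
    for i, value in enumerate(interest):
        history = interest[max(0, i - baseline_window):i]
        baseline = sum(history) / len(history) if history else 0.0
        spiking = value >= max(floor, factor * baseline)
        if spiking and run_start is None:
            run_start = i                      # spike run begins
        elif not spiking and run_start is not None:
            spans.append((run_start, i - run_start))
            run_start = None                   # spike run ends
    if run_start is not None:                  # run extends to the series end
        spans.append((run_start, len(interest) - run_start))
    return spans

# Quiet background with a short burst after a notional earthquake.
series = [2, 1, 2, 1, 2, 1, 2, 80, 60, 30, 2, 1, 2]
print(attention_spans(series))
```

The length of each returned run is a rough proxy for the one-day-to-one-week attention spans the article describes.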
Figures:
Figure 1: Number of documents per year related to Google Trends according to Scopus (title-abstract-keywords search performed by the authors of this article on 16 February 2022).
Figure 2: Comparison between the GT EQI GeoMap for the USA in 2004–2021 (upper panel, modified from the original source), the seismic hazard map of the conterminous USA (central panel, modified from the original source) [46], and the earthquakes that occurred in the conterminous USA and surrounding areas in 2004–2021 (M ≥ 5.0) [45] (lower panel). In the GeoMap, darker shades indicate the states where EQI searches were more frequent; the states with the highest online search were CA = 100; UT = 70; WA = 66; AK = 60; OR = 53; OK = 41; MO = 25; NV = 25; KS = 22. The seismic hazard map (central panel) shows the intensity (MMI scale) of potential earthquake ground shaking that has a 2% probability of occurring in 50 years.
Figure 3: GeoMap of interest in earthquake insurance in California metropolitan areas over the entire time span (2004–2021; GeoMap modified from the original source).
Figure 4: GT daily search interest (2004–2021) for earthquake insurance (upper panel) and earthquake (lower panel).
Figure 5: Earthquakes and other natural hazards causing a direct rise of GT search interest in earthquake insurance in California. Upper panel: worldwide overview; the circles show the earthquakes, the star identifies the Hurricane Harvey landfall in Texas, and the dashed box identifies the area enlarged in the lower panel. Lower panel: California overview, with the earthquakes plotted on the California population density map based on United States Census Bureau data for 2010 (https://www.worldofmaps.net/, accessed on 20 February 2022).
Figure 6: Yearly number of residential earthquake policies from 2002 to 2020, elaborated by the authors of this article from the CEA data [49,50]. For the period before 2009, the data refer only to the CEA subscriptions; data for 2003 and 2004 are not available in the sources consulted.
Figure 7: Search results (screenshot) for the GT "flood insurance" topic in California over a five-year span (2017–2021). Due to the time window length, search volumes are plotted weekly. The highest peak (100) was recorded in the week of 27 August–2 September 2017, during or in the aftermath of the Hurricane Harvey landfall. As with the online attention to EQI and EQ, the flood insurance searches reasonably started at the beginning of the week, in close temporal relationship with the occurrence of flooding.
Figure 8: GT related queries (rising option) for 2016–2021 (third period). Queries having the same meaning (cost of coverage) are shown in red.
Figure 9: Possible operative flow for internet users interested in earthquake insurance.
14 pages, 577 KiB  
Article
The Predictive Power of a Twitter User’s Profile on Cryptocurrency Popularity
by Maria Trigka, Andreas Kanavos, Elias Dritsas, Gerasimos Vonitsanos and Phivos Mylonas
Big Data Cogn. Comput. 2022, 6(2), 59; https://doi.org/10.3390/bdcc6020059 - 20 May 2022
Cited by 6 | Viewed by 3830
Abstract
Microblogging has become an extremely popular communication tool among Internet users worldwide. Millions of users daily share a huge amount of information related to various aspects of their lives, which makes the respective sites a very important source of data for analysis. Bitcoin (BTC) is a decentralized cryptographic currency and is equivalent to most recurrently known currencies in the way that it is influenced by socially developed conclusions, regardless of whether those conclusions are considered valid. This work aims to assess the importance of Twitter users’ profiles in predicting a cryptocurrency’s popularity. More specifically, our analysis focused on the user influence, captured by different Twitter features (such as the number of followers, retweets, lists) and tweet sentiment scores as the main components of measuring popularity. Moreover, the Spearman, Pearson, and Kendall Correlation Coefficients are applied as post-hoc procedures to support hypotheses about the correlation between a user influence and the aforementioned features. Tweets sentiment scoring (as positive or negative) was performed with the aid of Valence Aware Dictionary and Sentiment Reasoner (VADER) for a number of tweets fetched within a concrete time period. Finally, the Granger causality test was employed to evaluate the statistical significance of various features time series in popularity prediction to identify the most influential variable for predicting future values of the cryptocurrency popularity. Full article
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
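The three correlation coefficients named in the abstract can be illustrated with small pure-Python implementations; this is a sketch for intuition, not the authors' pipeline, and a real analysis would typically call scipy.stats instead. The follower/like numbers below are invented toy data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r: linear association between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def _ranks(values):
    """1-based ranks with ties replaced by their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(_ranks(x), _ranks(y))

def kendall(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    sign = lambda v: (v > 0) - (v < 0)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)

followers = [120, 450, 300, 900, 150]   # toy user-influence feature
likes = [10, 40, 25, 95, 12]            # toy popularity measure
print(pearson(followers, likes), spearman(followers, likes), kendall(followers, likes))
```

Because the toy series are perfectly monotone in each other, the rank-based coefficients come out at 1.0 while Pearson's r stays slightly below it.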
Figures:
Figure 1: Sentiment scores of posts per cryptocurrency.
20 pages, 9323 KiB  
Article
COVID-19 Tweets Classification Based on a Hybrid Word Embedding Method
by Yosra Didi, Ahlam Walha and Ali Wali
Big Data Cogn. Comput. 2022, 6(2), 58; https://doi.org/10.3390/bdcc6020058 - 18 May 2022
Cited by 20 | Viewed by 4948
Abstract
In March 2020, the World Health Organisation declared COVID-19 a new pandemic. This deadly virus spread and affected many countries in the world. During the outbreak, social media platforms such as Twitter contributed valuable and massive amounts of data to better assess health-related decision making. Therefore, we propose that users’ sentiments could be analysed with the application of effective supervised machine learning approaches to predict disease prevalence and provide early warnings. The collected tweets were preprocessed and categorised into negative, positive, and neutral. In the second phase, different features were extracted from the posts by applying several widely used techniques, such as TF-IDF, Word2Vec, GloVe, and FastText, to build feature datasets. The novelty of this study lies in its hybrid feature extraction, where we combined syntactic features (TF-IDF) with semantic features (FastText and GloVe) to represent posts accurately, which helps to improve the classification process. Experimental results show that FastText combined with TF-IDF performed best with SVM. SVM outperformed the other models, reaching 88.72% accuracy, followed by XGBoost with an 85.29% accuracy score. This study shows that the hybrid methods proved their capability of extracting features from the tweets and increasing the performance of classification. Full article
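As a rough illustration of the hybrid feature idea (a syntactic TF-IDF vector concatenated with an averaged semantic embedding), the sketch below uses a tiny two-dimensional embedding table standing in for FastText/GloVe vectors; it is a minimal sketch, not the authors' implementation.

```python
from collections import Counter
from math import log

def tfidf_vectors(docs):
    """TF-IDF with smoothed IDF over a list of tokenised documents."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))     # document frequency
    n = len(docs)
    idf = {t: log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] / len(d) * idf[t] for t in vocab])
    return vocab, vectors

def hybrid_features(doc, tfidf_vec, embeddings, dim):
    """Concatenate syntactic (TF-IDF) and semantic (mean embedding) features."""
    vecs = [embeddings[t] for t in doc if t in embeddings]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)] if vecs else [0.0] * dim
    return tfidf_vec + mean

# Toy 2-d embeddings standing in for pretrained FastText/GloVe vectors.
emb = {"virus": [0.9, 0.1], "mask": [0.8, 0.2], "happy": [0.1, 0.9]}
docs = [["virus", "mask"], ["happy", "mask"]]
vocab, tfidf = tfidf_vectors(docs)
features = [hybrid_features(d, v, emb, 2) for d, v in zip(docs, tfidf)]
print(len(features[0]))  # vocab size + embedding dimension
```

The resulting feature vectors (one per tweet) would then be fed to a classifier such as SVM or XGBoost.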
Figures:
Figure 1: Framework of the proposed model.
Figure 2: Word cloud of the dataset.
Figure 3: The most frequent word number.
Figure 4: The distribution of positive, neutral, and negative sentiments.
Figure 5: The distribution of (a) negative, (b) neutral, and (c) positive sentiments.
Figure 6: ROC curves of traditional classifiers using the (a) TF-IDF, (b) GloVe, and (c) FastText word embedding techniques.
Figure 7: Accuracy of traditional classifiers using the TF-IDF, GloVe, and FastText word embedding techniques.
Figure 8: ROC curves of traditional classifiers using the hybrid (a) TF-IDF with GloVe and (b) TF-IDF with FastText word embedding techniques.
Figure 9: Accuracy of traditional classifiers for the hybrid methods: TF-IDF with FastText and TF-IDF with GloVe.
18 pages, 1494 KiB  
Article
Sentiment Analysis of Emirati Dialect
by Arwa A. Al Shamsi and Sherief Abdallah
Big Data Cogn. Comput. 2022, 6(2), 57; https://doi.org/10.3390/bdcc6020057 - 17 May 2022
Cited by 14 | Viewed by 4738
Abstract
Recently, extensive studies and research in the Arabic Natural Language Processing (ANLP) field have been conducted for text classification and sentiment analysis. Moreover, the number of studies that target Arabic dialects has also increased. In this research paper, we constructed the first manually annotated dataset of the Emirati dialect for the Instagram platform. The constructed dataset consisted of more than 70,000 comments, mostly written in the Emirati dialect. We annotated the comments in the dataset based on text polarity, dividing them into positive, negative, and neutral categories; the number of annotated comments was 70,000. Moreover, the dataset was also annotated for dialect type, categorized into the Emirati dialect, other Arabic dialects, and MSA. Preprocessing and TF-IDF feature extraction approaches were applied to the constructed Emirati dataset to prepare it for the sentiment analysis experiment and improve its classification performance. The sentiment analysis experiment was carried out on both balanced and unbalanced datasets using several machine learning classifiers. The evaluation metrics of the sentiment analysis experiments were accuracy, recall, precision, and f-measure. The results report that the best accuracy was 80.80%, achieved when the ensemble model was applied for sentiment classification of the unbalanced dataset. Full article
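The balanced datasets mentioned above can be produced by random oversampling, i.e., duplicating minority-class comments until every class matches the majority count. The sketch below is a generic illustration of that resampling step, not the authors' code; the comment strings and labels are invented.

```python
import random
from collections import Counter

def oversample(samples, labels, seed=42):
    """Balance a labelled dataset by randomly duplicating minority-class items."""
    rng = random.Random(seed)           # seeded for reproducibility
    by_label = {}
    for s, l in zip(samples, labels):
        by_label.setdefault(l, []).append(s)
    target = max(len(v) for v in by_label.values())   # majority-class size
    out_samples, out_labels = [], []
    for label, items in sorted(by_label.items()):
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extra:
            out_samples.append(s)
            out_labels.append(label)
    return out_samples, out_labels

comments = ["c1", "c2", "c3", "c4", "c5"]           # toy comments
polarity = ["pos", "pos", "pos", "neg", "neu"]      # unbalanced labels
X, y = oversample(comments, polarity)
print(sorted(Counter(y).items()))
```

Undersampling, the other strategy the abstract mentions, is the mirror image: minority counts are kept and majority-class items are randomly discarded instead.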
Figures:
Figure 1: Platforms used for dataset construction.
Figure 2: A comparison of the sentiment score values of Instagram comments and words in terms of the number of neutral, positive, and negative comments.
Figure 3: Sentiment analysis experiment model.
Figure 4: Classification results for sentiment analysis of the unbalanced dataset.
Figure 5: Classification results for sentiment analysis of the balanced dataset (undersampling).
Figure 6: Classification results for sentiment analysis of the balanced dataset (oversampling).
33 pages, 5213 KiB  
Article
A Better Mechanistic Understanding of Big Data through an Order Search Using Causal Bayesian Networks
by Changwon Yoo, Efrain Gonzalez, Zhenghua Gong and Deodutta Roy
Big Data Cogn. Comput. 2022, 6(2), 56; https://doi.org/10.3390/bdcc6020056 - 17 May 2022
Cited by 3 | Viewed by 2653
Abstract
Every year, biomedical data is increasing at an alarming rate and is being collected from many different sources, such as hospitals (clinical Big Data), laboratories (genomic and proteomic Big Data), and the internet (online Big Data). This article presents and evaluates a practical causal discovery algorithm that uses modern statistical, machine learning, and informatics approaches that have been used in the learning of causal relationships from biomedical Big Data, which in turn integrates clinical, omics (genomic and proteomic), and environmental aspects. The learning of causal relationships from data using graphical models does not address the hidden (unknown or not measured) mechanisms that are inherent to most measurements and analyses. Also, many algorithms lack a practical usage since they do not incorporate current mechanistic knowledge. This paper proposes a practical causal discovery algorithm using causal Bayesian networks to gain a better understanding of the underlying mechanistic process that generated the data. The algorithm utilizes model averaging techniques such as searching through a relative order (e.g., if gene A is regulating gene B, then we can say that gene A is of a higher order than gene B) and incorporates relevant prior mechanistic knowledge to guide the Markov chain Monte Carlo search through the order. The algorithm was evaluated by testing its performance on datasets generated from the ALARM causal Bayesian network. Out of the 37 variables in the ALARM causal Bayesian network, two sets of nine were chosen and the observations for those variables were provided to the algorithm. The performance of the algorithm was evaluated by comparing its prediction with the generating causal mechanism. The 28 variables that were not in use are referred to as hidden variables and they allowed for the evaluation of the algorithm’s ability to predict hidden confounded causal relationships. The algorithm’s predicted performance was also compared with other causal discovery algorithms. The results show that incorporating order information provides a better mechanistic understanding even when hidden confounded causes are present. The prior mechanistic knowledge incorporated in the Markov chain Monte Carlo search led to the better discovery of causal relationships when hidden variables were involved in generating the simulated data. Full article
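The notion of searching through a relative order can be made concrete: once an order over the variables is fixed, a structure consistent with it may only draw edges from earlier (higher-order) variables to later ones, so every subset of the forward pairs is a valid acyclic structure. The sketch below enumerates those structures for a small order; it is an illustration of the search space, not the paper's MCMC search.

```python
from itertools import combinations, product

def dags_for_order(order):
    """Enumerate all DAG edge sets consistent with a causal order.

    For an order such as <X1, X2, X3>, an edge may only point from an
    earlier variable to a later one, so every subset of the forward
    (cause, effect) pairs yields an acyclic structure.
    """
    pairs = list(combinations(order, 2))      # all forward pairs
    for mask in product([False, True], repeat=len(pairs)):
        yield [p for p, keep in zip(pairs, mask) if keep]

structures = list(dags_for_order(["X1", "X2", "X3"]))
print(len(structures))  # 2^(3 choose 2) = 8 structures
```

For n variables there are 2^(n(n-1)/2) structures per order, which is why model averaging samples orders via MCMC rather than enumerating structures directly at realistic sizes.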
Figures:
Figure 1: A causal Bayesian network example.
Figure 2: The three structures included in the order <X1, X2, X3>.
Figure 3: Two sets of nine variables; all grayed-out variables are hidden and not selected. (a) Close 9 variables (C9); (b) Sparse 9 variables (S9).
Figure 4: Four pairwise causal relationships. H represents a shaded variable that is present in the ALARM network but not introduced in the datasets using C9 and S9. (a) Not confounded and not causally related, denoted Ø_XY; (b) not confounded and causally related, denoted Ø_X→Y; (c) confounded and not causally related, denoted H_XY; (d) confounded and causally related, denoted H_X→Y.
Figure 5: Generating structures for the Sparse 9 (a) and Close 9 (b) variables.
Figure 6: The highest-scored global BDe structure for (a) D50S9 (14.23%), (b) D50C9 (15.71%), (c) D1KS9 (>99%), and (d) D1KC9 (4.03%), with the BDe percentage score in parentheses.
Figure 7: Consensus structure without the order weight for (a) D50S9, (b) D50C9, (c) D1KS9, and (d) D1KC9. Arc thickness reflects the pairwise causal relationship probability, shown as a percentage label (the reverse causal relationship probability in parentheses); >99 and ~0 denote probabilities greater than 0.9999 and less than 0.0001, respectively.
Figure 8: Consensus structure with the order weight for (a) D50S9, (b) D50C9, (c) D1KS9, and (d) D1KC9. Arc thickness and labels as in Figure 7.
19 pages, 2274 KiB  
Article
Virtual Reality Adaptation Using Electrodermal Activity to Support the User Experience
by Francesco Chiossi, Robin Welsch, Steeven Villa, Lewis Chuang and Sven Mayer
Big Data Cogn. Comput. 2022, 6(2), 55; https://doi.org/10.3390/bdcc6020055 - 13 May 2022
Cited by 19 | Viewed by 4895
Abstract
Virtual reality is increasingly used for tasks such as work and education. Thus, rendering scenarios that do not interfere with such goals or deplete the user experience is becoming progressively more relevant. We present a physiologically adaptive system that optimizes the virtual environment based on physiological arousal, i.e., electrodermal activity. We investigated the usability of the adaptive system in a simulated social virtual reality scenario. Participants completed an n-back task (primary) and a visual detection task (secondary). Here, we adapted the visual complexity of the secondary task, in the form of its number of non-player characters, while participants worked to accomplish the primary task. We show that an adaptive virtual reality can improve users’ comfort by adapting the task complexity to physiological arousal. Our findings suggest that physiologically adaptive virtual reality systems can improve users’ experience in a wide range of scenarios. Full article
(This article belongs to the Special Issue Cognitive and Physiological Assessments in Human-Computer Interaction)
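A physiological adaptation loop of this kind can be sketched as a simple threshold controller: z-score the EDA signal over fixed windows and step the number of NPCs up or down accordingly. This is a minimal sketch, not the study's implementation; the window size, thresholds, and starting complexity are illustrative, and a live system would z-score against a calibration baseline rather than the whole trace.

```python
from statistics import mean, stdev

def adapt_stream(eda_trace, n_start=5, window=20, low=-0.5, high=0.5):
    """Step task complexity (number of NPCs) up or down from windowed EDA.

    Every `window` samples, the window mean is z-scored against the whole
    trace; low arousal increases visual complexity, high arousal decreases it.
    Returns the complexity level after each window (starting level first).
    """
    mu, sd = mean(eda_trace), stdev(eda_trace)
    npcs, history = n_start, [n_start]
    for start in range(0, len(eda_trace) - window + 1, window):
        z = (mean(eda_trace[start:start + window]) - mu) / sd
        if z < low:
            npcs += 1                     # under-aroused: add NPCs
        elif z > high:
            npcs = max(0, npcs - 1)       # over-aroused: remove NPCs
        history.append(npcs)
    return history

calm, aroused = [0.1] * 20, [0.9] * 20    # toy EDA samples
print(adapt_stream(calm + aroused + calm))
```

On the toy trace, the controller raises complexity during the calm stretches and lowers it during the aroused one, mirroring the increase/decrease decisions shown in the paper's adaptation plot.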
Figures:
Figure 1: Game-view capture of a single trial of the VR n-back (n = 1) and visual detection tasks. Participants were required to place a sphere into the corresponding bucket: if the sphere matched the color of the sphere one step before, it went into the right bucket; otherwise it went into the left bucket. The visual detection task required participants to monitor whether visitors of a museum possessed a ticket to enter the building; to signal a missing ticket after detection, the participant had to select the NPC (see Supplementary Materials).
Figure 2: Individual predicted standardized mean EDA from the optimal Stream for the non-adaptive condition (crosses) with individual regression lines, as well as the actual mean EDA (points) at local maxima of adaptation.
Figure 3: Adaptation across time for one participant. The pink line indicates Stream; the green line indicates the z-scored mean EDA signal used for adaptation. Grey areas indicate whether the algorithm chose to increase (light grey) or decrease (dark grey) the Stream in a 20 s time window.
Figure 4: The relative difference for (a) the raw NASA-TLX score difference, (b) standardized mean EDA, and (c) averaged SCL scores.
Figure 5: The relative difference for overall task accuracies in the n-back and visual detection tasks.
Figure 6: Standardized mean EDA at local maxima of adaptation as a function of raw NASA-TLX for the adaptive condition; there is a significant negative correlation between EDA and workload, r(13) = −0.62, p = 0.013.
Figure 7: The relative difference for (a) usability questions measured on a 5-point Likert scale and (b) GEQ subscales (Competence, Positive Affection, and Immersion). * indicates measurements significantly different from the no-adaptation baseline. Outliers, shown as bold dots, were defined as data points more than 2 SDs on the log scale from their participant mean.
11 pages, 1357 KiB  
Article
A New Comparative Study of Dimensionality Reduction Methods in Large-Scale Image Retrieval
by Mohammed Amin Belarbi, Saïd Mahmoudi, Ghalem Belalem, Sidi Ahmed Mahmoudi and Aurélie Cools
Big Data Cogn. Comput. 2022, 6(2), 54; https://doi.org/10.3390/bdcc6020054 - 13 May 2022
Cited by 1 | Viewed by 3176
Abstract
Indexing images by content is one of the most used computer vision methods, where various techniques are used to extract visual characteristics from images. The deluge of data surrounding us, due to the heavy use of social media and diverse media acquisition systems, has created a major challenge for classical multimedia processing systems. This problem is referred to as the ‘curse of dimensionality’. In the literature, several methods have been used to decrease the high dimension of features, including principal component analysis (PCA) and locality-sensitive hashing (LSH). Some methods, such as the VA-File or binary trees, can be used to accelerate the search phase. In this paper, we propose an efficient approach that exploits three particular methods: PCA and LSH for dimensionality reduction, and the VA-File method to accelerate the search phase. This combined approach is fast and can be used for high-dimensional features. Indeed, our method consists of three phases: (1) indexing images with the SIFT and SURF algorithms, (2) compressing the data using LSH and PCA, and (3) finally launching the image retrieval process, which is accelerated by using a VA-File approach. Full article
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)
Show Figures
Figure 1
<p>The general architecture of our proposed approach.</p>
Figure 2
<p>Geometric representation of VA-File.</p>
Figure 3
<p>The search phase with VA-File.</p>
Figure 4
<p>Computation and search time within the Wang database.</p>
Figure 5
<p>Computation and search time within the ImageNet database.</p>
Figure 6
<p>Recall/precision within the ImageNet database.</p>
3 pages, 186 KiB  
Editorial
Knowledge Modelling and Learning through Cognitive Networks
by Massimo Stella and Yoed N. Kenett
Big Data Cogn. Comput. 2022, 6(2), 53; https://doi.org/10.3390/bdcc6020053 - 13 May 2022
Cited by 1 | Viewed by 2697
Abstract
Knowledge modelling is a growing field at the fringe of computer science, psychology and network science [...] Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
21 pages, 4585 KiB  
Article
Cognitive Networks Extract Insights on COVID-19 Vaccines from English and Italian Popular Tweets: Anticipation, Logistics, Conspiracy and Loss of Trust
by Massimo Stella, Michael S. Vitevitch and Federico Botta
Big Data Cogn. Comput. 2022, 6(2), 52; https://doi.org/10.3390/bdcc6020052 - 12 May 2022
Cited by 12 | Viewed by 4574
Abstract
Monitoring social discourse about COVID-19 vaccines is key to understanding how large populations perceive vaccination campaigns. This work reconstructs how popular and trending posts semantically and emotionally framed COVID-19 vaccines on Twitter. We achieve this by merging natural language processing, cognitive network science [...] Read more.
Monitoring social discourse about COVID-19 vaccines is key to understanding how large populations perceive vaccination campaigns. This work reconstructs how popular and trending posts semantically and emotionally framed COVID-19 vaccines on Twitter. We achieve this by merging natural language processing, cognitive network science and AI-based image analysis. We focus on 4765 unique popular tweets in English or Italian about COVID-19 vaccines between December 2020 and March 2021. One popular English tweet contained in our data set was liked around 495,000 times, highlighting how popular tweets could cognitively affect large parts of the population. We investigate both text and multimedia content in tweets and build a cognitive network of syntactic/semantic associations in messages, including emotional cues and pictures. This network representation indicates how online users linked ideas in social discourse and framed vaccines along specific semantic/emotional content. The English semantic frame of “vaccine” was highly polarised between trust/anticipation (towards the vaccine as a scientific asset saving lives) and anger/sadness (mentioning critical issues with dose administering). Semantic associations between “vaccine”, “hoax” and conspiratorial jargon indicated the persistence of conspiracy theories about vaccines in extremely popular English posts. Interestingly, these were absent in Italian messages. Popular tweets with images of people wearing face masks used language that lacked the trust and joy found in tweets showing people with no masks. This difference indicates a negative effect attributed to face-covering in social discourse. Behavioural analysis revealed a tendency for users to share content eliciting joy, sadness and disgust and to like sad messages less. Both patterns indicate an interplay between emotions and content diffusion beyond sentiment. After its suspension in mid-March 2021, “AstraZeneca” was associated with trustful language driven by experts. After the deaths of a small number of vaccinated people in mid-March, popular Italian tweets framed “vaccine” by crucially replacing earlier levels of trust with deep sadness. Our results stress how cognitive networks and innovative multimedia processing open new ways for reconstructing online perceptions about vaccines and trust. Full article
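The network construction at the heart of this approach can be illustrated with a toy example. In this sketch, invented mini-tweets and plain word co-occurrence stand in for the syntactic/semantic parsing used in textual forma mentis networks; it shows how a concept's semantic frame emerges as its network neighbourhood.

```python
import itertools
from collections import Counter, defaultdict

# Invented mini-tweets standing in for the popular-tweet corpus.
tweets = [
    "vaccine saves lives trust science",
    "vaccine dose delay anger sadness",
    "vaccine hoax conspiracy",
]

# Count word co-occurrences within each tweet (a crude proxy for
# syntactic/semantic association extraction).
pairs = Counter()
for t in tweets:
    for u, v in itertools.combinations(sorted(set(t.split())), 2):
        pairs[(u, v)] += 1

# Build an undirected adjacency structure from the co-occurrence pairs.
adj = defaultdict(set)
for u, v in pairs:
    adj[u].add(v)
    adj[v].add(u)

# The semantic frame of "vaccine" is simply its network neighbourhood.
frame = sorted(adj["vaccine"])
print(frame)
# ['anger', 'conspiracy', 'delay', 'dose', 'hoax', 'lives',
#  'sadness', 'saves', 'science', 'trust']
```

In the actual study, each neighbour would additionally carry valence and emotion labels, so that the frame can be profiled against random expectation.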
Show Figures
Figure 1
<p>(<b>A</b>) Infographics about how textual forma mentis networks can give structure to the pictures and language posted by online users on social media. Semantic frames around specific ideas/concepts are reconstructed as network neighbourhoods. Word valence and emotional data make it possible to check how concepts were framed by users in posts mentioning (or not) pictures showing specific elements (e.g., people wearing a face mask). A flowchart with the different steps of network construction is outlined too. (<b>B</b>) Example tweet being processed.</p>
Figure 2
<p>Multi-language analysis of the emotional profiles of highly/less retweeted (<b>left</b>) or liked (<b>right</b>) tweets in English (<b>top</b>) and in Italian (<b>bottom</b>). Petals indicate <span class="html-italic">z</span>-scores and are higher than 1.96 when falling outside of the semi-transparent circle. Asterisks highlight emotions <span class="html-italic">z</span> &gt; 1.96.</p>
Figure 3
<p>Emotional analysis and word clouds of concepts in the semantic frame of “vaccine” (in English) and “vaccino” (in Italian). The circumplex model indicates how the neighbours of vaccine/vaccino populate a 2D arousal/valence space. The emotion flower indicates an excess of emotions detected in the semantic frame compared to random expectation. The sector chart reports the raw fraction of words eliciting a certain emotion. The word cloud reports the top 10% concepts with the highest degree of centrality which are associated with vaccine. The words are distributed according to the emotions they elicit. Asterisks highlight emotions <span class="html-italic">z</span> &gt; 1.96.</p>
Figure 4
<p>TFMNs capturing conceptual associations in social discourse around “pandemic”, “dose”, “worker” and “hoax” (<b>top</b>) and around “health” and “distribute” (<b>bottom</b>). Positive (negative) concepts are cyan (red). Neutral concepts are in blue. Associations between positive (negative) concepts are highlighted in cyan. Purple links connect concepts of opposite valence. Green links indicate overlap in meaning. The emotional flowers indicate how rich the reported neighbourhoods are in terms of emotional jargon. Petals falling outside of the inner circle indicate a richness that differs from random expectation at <span class="html-italic">α</span> = 0.05. Each ring outside of the circle corresponds to one unit of <span class="html-italic">z</span>-score. Asterisks highlight emotions <span class="html-italic">z</span> &gt; 1.96.</p>
Figure 5
<p>Emotional flowers and valenced semantic frames for “vaccine” in those tweets, including pictures with: (1) no people (<b>left</b>), (2) people wearing no face masks and (3) people wearing face masks. On the top part of the panel, there are example pictures that were taken from Pixabay to demonstrate how the implemented Python library works. Bottom: Semantic frames reporting only negative and neutral words associated with “vaccine”. Asterisks highlight emotions <span class="html-italic">z</span> &gt; 1.96.</p>
Figure 6
<p>Emotional flowers for “vaccine” and “astrazeneca” in popular tweets gathered after the suspension of the AstraZeneca vaccine in several EU countries in mid-March 2021. These results should be compared with the emotional profiles reported in <a href="#BDCC-06-00052-f002" class="html-fig">Figure 2</a> and relative to the months before the suspension. Asterisks highlight emotions <span class="html-italic">z &gt;</span> 1.96.</p>
Figure A1
<p>(<b>Left</b>): Elbow plot showing how the within-cluster sum of squares varies as the number of clusters increases, when clustering the images based on their dominant hue values. As the plot indicates, two clusters seem to be the optimal choice. (<b>Right</b>): Histogram of the dominant hue values detected in the images (note: only those not containing any text were analysed in this scenario). As the visual inspection suggests, we observe two clusters centred on the red and blue tones of hue.</p>
Figure A2
<p>(<b>Top</b>): Word cloud of the most frequent words in tweets with pictures with predominant blue or red. (<b>Bottom</b>): Emotional flowers and circumplex model for the emotions of the language used in tweets with pictures of different predominant colours. Asterisks highlight emotions <span class="html-italic">z</span> &gt; 1.96.</p>
24 pages, 7573 KiB  
Article
Robust Multi-Mode Synchronization of Chaotic Fractional Order Systems in the Presence of Disturbance, Time Delay and Uncertainty with Application in Secure Communications
by Ali Akbar Kekha Javan, Assef Zare, Roohallah Alizadehsani and Saeed Balochian
Big Data Cogn. Comput. 2022, 6(2), 51; https://doi.org/10.3390/bdcc6020051 - 8 May 2022
Cited by 5 | Viewed by 2631
Abstract
This paper investigates the robust adaptive synchronization of multi-mode fractional-order chaotic systems (MMFOCS). To that end, synchronization was performed with unknown parameters, unknown time delays, the presence of disturbance, and uncertainty with an unknown boundary. The convergence of the synchronization error to zero [...] Read more.
This paper investigates the robust adaptive synchronization of multi-mode fractional-order chaotic systems (MMFOCS). To that end, synchronization was performed with unknown parameters, unknown time delays, the presence of disturbance, and uncertainty with an unknown boundary. The convergence of the synchronization error to zero was guaranteed using the Lyapunov function. Additionally, the control rules were extracted as explicit continuous functions. An image encryption approach was proposed based on maps with time-dependent coding for secure communication. The simulations indicated the effectiveness of the proposed design regarding the suitability of the parameters, the convergence of errors, and robustness. Subsequently, the presented method was applied to fractional-order Chen systems, and different benchmark images were encrypted using chaotic masking. The results indicated the desirable performance of the proposed method in encrypting the benchmark images. Full article
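The chaotic-masking idea used for the secure-communication application can be sketched in a few lines. This is a hedged illustration only: a logistic map stands in for the synchronized fractional-order Chen system, and the keystream is XOR-ed with the pixel stream; a receiver that has synchronized to the same chaotic trajectory regenerates the keystream and decrypts.

```python
def chaos_bytes(n, x0=0.4, r=3.99):
    """Quantise n iterates of a logistic map (a simple stand-in for the
    fractional-order chaotic system) into a byte keystream."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1 - x)          # logistic map iteration
        out.append(int(x * 256) % 256)
    return bytes(out)

pixels = bytes(range(16))            # toy "image" data
key = chaos_bytes(len(pixels))       # keystream from the chaotic trajectory

# Masking (encryption) and unmasking (decryption) are the same XOR,
# provided both ends are synchronized to the same initial state x0.
cipher = bytes(p ^ k for p, k in zip(pixels, key))
plain = bytes(c ^ k for c, k in zip(cipher, key))

assert plain == pixels               # synchronized receiver recovers the image
print(cipher.hex())
```

The robustness results in the paper matter precisely because the receiver's copy of the chaotic state must track the transmitter's despite disturbance, delay, and uncertainty.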
Show Figures
Figure 1
<p>Transmission multi-mode synchronization.</p>
Figure 2
<p>Circular multi-mode synchronization.</p>
Figure 3
<p>Block diagram of chaotic masking with multi-state synchronization.</p>
Figure 4
<p>Phase diagrams for master and slave systems. (<b>a</b>) Master system, (<b>b</b>) slave system 1, (<b>c</b>) slave system 2.</p>
Figure 5
<p>Estimation of parameters and time delay errors in multi-mode synchronization obtained under disturbance, time delay and uncertainty ((<b>c</b>) shows time delay errors and (<b>a</b>,<b>b</b>,<b>d</b>) show parameter estimation errors under uncertainty).</p>
Figure 6
<p>Curves of synchronization errors obtained under disturbance and uncertainty (subfigures (<b>a</b>,<b>b</b>) show dynamic state errors and subfigures (<b>c</b>,<b>d</b>) show control efforts).</p>
Figure 7
<p>Error curves obtained for estimating the uncertain boundaries (<b>right</b>) and disturbances (<b>left</b>).</p>
Figure 8
<p>Benchmark images encrypted using synchronization of the fractional-order Chen systems (<span class="html-italic">q</span> = 0.97).</p>
Figure 9
<p>Histograms of various benchmark images encrypted using synchronization of the fractional-order chaotic system (<span class="html-italic">q</span> = 0.97).</p>
32 pages, 5511 KiB  
Article
Gender Stereotypes in Hollywood Movies and Their Evolution over Time: Insights from Network Analysis
by Arjun M. Kumar, Jasmine Y. Q. Goh, Tiffany H. H. Tan and Cynthia S. Q. Siew
Big Data Cogn. Comput. 2022, 6(2), 50; https://doi.org/10.3390/bdcc6020050 - 6 May 2022
Cited by 5 | Viewed by 50166
Abstract
The present analysis of more than 180,000 sentences from movie plots across the period from 1940 to 2019 emphasizes how gender stereotypes are expressed through the cultural products of society. By applying a network analysis to the word co-occurrence networks of movie plots [...] Read more.
The present analysis of more than 180,000 sentences from movie plots across the period from 1940 to 2019 emphasizes how gender stereotypes are expressed through the cultural products of society. By applying a network analysis to the word co-occurrence networks of movie plots and using a novel method of identifying story tropes, we demonstrate that gender stereotypes exist in Hollywood movies. An analysis of specific paths in the network and of the words reflecting various domains shows the dynamic changes in some of these stereotypical associations. Our results suggest that gender stereotypes are complex and dynamic in nature. Specifically, whereas male characters appear to be associated with a diversity of themes in movies, female characters seem predominantly associated with the theme of romance. Although associations of female characters to physical beauty and marriage are declining over time, associations of female characters to sexual relationships and weddings are increasing. Our results demonstrate how the application of cognitive network science methods can enable a more nuanced investigation of gender stereotypes in textual data. Full article
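The "significance" of paths and associations in studies like this is typically scored with a log-likelihood ratio over co-occurrence counts. As a hedged illustration, here is Dunning's G² statistic on an invented 2×2 contingency table (the counts and the word are made up; the paper's exact scoring may differ):

```python
from math import log

def g2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 contingency table:
    k11 = word near gender A, k12 = word near gender B,
    k21 = other words near A, k22 = other words near B."""
    total = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22    # row totals
    c1, c2 = k11 + k21, k12 + k22    # column totals

    def term(obs, row, col):
        exp = row * col / total      # expected count under independence
        return obs * log(obs / exp) if obs > 0 else 0.0

    return 2 * (term(k11, r1, c1) + term(k12, r1, c2)
                + term(k21, r2, c1) + term(k22, r2, c2))

# Invented counts: 'marry' co-occurs 40 times with female characters
# vs 10 times with male characters, out of 1000 tokens per gender.
score = g2(40, 10, 960, 990)
print(round(score, 2))  # ≈ 19.74, well above the chi-square 3.84 cutoff
```

Large G² values flag associations (here, a marriage-related word skewing female) that are unlikely under independence, which is how the "most significant" tropes and paths are ranked.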
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
Show Figures
Figure 1
<p>Details of the network construction. (<b>A</b>) Network with only primary nodes. Blue nodes represent unique associations with male characters, orange nodes represent unique associations with female characters, and purple nodes represent the common associations of both male and female characters. (<b>B</b>) Network with primary and secondary nodes. Secondary nodes are depicted in grey.</p>
Figure 2
<p>A representation of the ten most significant paths in the network for each gender among the whole network. Paths were selected after filtering out paths that crossed through the other gender vertex.</p>
Figure 3
<p>Communities identified in the co-occurrence networks. The male network (<b>a</b>) has 5355 vertices (words) and 1,979,829 edges (pairwise combinations of words within the data sample) and the female network (<b>b</b>) has 7393 vertices and 2,403,651 edges. We detected communities in the network using the Louvain algorithm. Five communities emerged in each of the networks, and the top ten vertices in terms of degree are shown.</p>
Figure 4
<p>Most significant story tropes associated with male and female characters described by the path ‘character–primary vertex–secondary vertex’. The thickness of the line represents the significance of the log-likelihood ratio. Blue nodes and lines represent unique associations with male characters; orange nodes represent unique associations with female characters. Purple nodes and lines represent the common associations of both male and female characters.</p>
Figure 5
<p>Needle plot representing the edge weights of the twenty most significant paths associated with male characters (<b>top</b>) and female characters (<b>bottom</b>).</p>
Figure 6
<p>A comparison of significant tropes in the 1940s (<b>A</b>) and 2010s (<b>B</b>). Some key differences between the tropes from these two decades include the disappearance of the marriage trope in the female characters’ network between the 1940s and the 2010s, and the new addition of a trope related to sexual relationships. For male characters, while there were no crime-related tropes in the 1940s, there was one in the 2010s.</p>
Figure 7
<p>Most significant primary noun associations with males and females. Nouns found to strongly co-occur with female characters included ‘daughter’ and ‘mother’; for male characters, they included ‘friend’ and ‘father’. Nouns such as ‘boyfriend’, ‘wife’, and ‘girlfriend’ co-occurred strongly with both female and male characters.</p>
Figure 8
<p>Needle plot representing edge weights of the twenty most significant primary noun associations of male characters (<b>top</b>) and female characters (<b>bottom</b>).</p>
Figure 9
<p>Most significant primary verb associations with males and females. Verbs found to strongly co-occur with female characters included ‘marry’ and ‘married’; for male characters, they included ‘kill’ and ‘arrives’. Verbs such as ‘named’ and ‘meets’ co-occurred strongly with both female and male characters.</p>
Figure 10
<p>Needle plot representing edge weights of the twenty most significant primary verb associations of male characters (<b>top</b>) and female characters (<b>bottom</b>).</p>
Figure 11
<p>Most significant primary adjective associations with males and females. Adjectives found to strongly co-occur with female characters included ‘pregnant’ and ‘beautiful’; for male characters, they included ‘former’ and ‘best’, among others. Adjectives such as ‘young’ and ‘married’ co-occurred strongly with both female and male characters.</p>
Figure 12
<p>Needle plot representing edge weights of the twenty most significant primary adjective associations of male characters (<b>top</b>) and female characters (<b>bottom</b>).</p>
19 pages, 3173 KiB  
Article
A Comparative Study of MongoDB and Document-Based MySQL for Big Data Application Data Management
by Cornelia A. Győrödi, Diana V. Dumşe-Burescu, Doina R. Zmaranda and Robert Ş. Győrödi
Big Data Cogn. Comput. 2022, 6(2), 49; https://doi.org/10.3390/bdcc6020049 - 5 May 2022
Cited by 11 | Viewed by 12546
Abstract
In the context of the heavy demands of Big Data, software developers have also begun to consider NoSQL data storage solutions. One of the important criteria when choosing a NoSQL database for an application is its performance in terms of speed of data [...] Read more.
In the context of the heavy demands of Big Data, software developers have also begun to consider NoSQL data storage solutions. One of the important criteria when choosing a NoSQL database for an application is its performance in terms of speed of data accessing and processing, including response times to the most important CRUD operations (CREATE, READ, UPDATE, DELETE). In this paper, the behavior of two of the major document-based NoSQL databases, MongoDB and document-based MySQL, was analyzed in terms of the complexity and performance of CRUD operations, especially in query operations. The main objective of the paper is to make a comparative analysis of the impact that each specific database has on application performance when realizing CRUD requests. To perform this analysis, a case-study application was developed using the two document-based databases, MongoDB and MySQL, which aims to model and streamline the activity of service providers that handle large amounts of data. The results obtained demonstrate the performance of both databases for different volumes of data; based on these, a detailed analysis and several conclusions are presented to support the choice of an appropriate solution for a big-data application. Full article
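The benchmarking pattern behind such a comparison is straightforward to sketch: time each CRUD operation over a growing document volume. This hedged example uses an in-memory dictionary as a stand-in collection so it runs anywhere; a real run would issue the equivalent driver calls (e.g. pymongo's `insert_many`/`find`/`update_many`/`delete_many`) against each database, and all document fields here are invented.

```python
import time

def timed(op, fn, *args):
    """Run fn(*args), print its wall-clock time, and return its result."""
    t0 = time.perf_counter()
    result = fn(*args)
    print(f"{op}: {(time.perf_counter() - t0) * 1000:.2f} ms")
    return result

collection = {}  # in-memory stand-in for a document collection

def insert(docs):
    collection.update({d["_id"]: d for d in docs})

def select(pred):
    return [d for d in collection.values() if pred(d)]

def update(pred, change):
    for d in select(pred):
        d.update(change)

def delete(pred):
    for d in select(pred):
        del collection[d["_id"]]

docs = [{"_id": i, "service": "repair", "price": i % 50} for i in range(10_000)]
timed("INSERT", insert, docs)
hits = timed("SELECT", select, lambda d: d["price"] > 40)
timed("UPDATE", update, lambda d: d["price"] > 40, {"discount": True})
timed("DELETE", delete, lambda d: d["price"] > 40)
print(len(hits), len(collection))  # 1800 8200
```

Repeating this harness per database and per data volume yields execution-time curves like those in the paper's figures.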
Show Figures
Figure 1
<p>Application flow.</p>
Figure 2
<p>Application’s document structure: (<b>a</b>) User document; (<b>b</b>) Appointment document; (<b>c</b>) Service document; (<b>d</b>) Customer document.</p>
Figure 3
<p>Execution times for the insert operation.</p>
Figure 4
<p>Execution times for the update operation.</p>
Figure 5
<p>Execution times for the simple select operation.</p>
Figure 6
<p>Execution times for select using a single join operation.</p>
Figure 7
<p>Execution times for the select with the two joins operation.</p>
Figure 8
<p>Execution times for select with multiple joins.</p>
Figure 9
<p>Execution times for soft delete operation.</p>
Figure 10
<p>Execution times for hard delete operation.</p>
19 pages, 2456 KiB  
Article
A New Ontology-Based Method for Arabic Sentiment Analysis
by Safaa M. Khabour, Qasem A. Al-Radaideh and Dheya Mustafa
Big Data Cogn. Comput. 2022, 6(2), 48; https://doi.org/10.3390/bdcc6020048 - 29 Apr 2022
Cited by 10 | Viewed by 4391
Abstract
Arabic sentiment analysis is a process that aims to extract the subjective opinions of different users about different subjects since these opinions and sentiments are used to recognize their perspectives and judgments in a particular domain. Few research studies addressed semantic-oriented approaches for [...] Read more.
Arabic sentiment analysis is a process that aims to extract the subjective opinions of different users about different subjects, since these opinions and sentiments are used to recognize their perspectives and judgments in a particular domain. Few research studies have addressed semantic-oriented approaches for Arabic sentiment analysis based on domain ontologies and feature importance. In this paper, we built a semantic orientation approach for calculating overall polarity from Arabic subjective texts based on a built domain ontology and an available sentiment lexicon. We used the ontology concepts to extract and weight the semantic domain features by considering their levels in the ontology tree and their frequencies in the dataset, then computed the overall polarity of a given textual review based on the importance of each domain feature. For evaluation, an Arabic dataset from the hotel domain was selected to build the domain ontology and to test the proposed approach. The overall accuracy and F-measure reached 79.20% and 78.75%, respectively. Results showed that the approach outperformed other semantic orientation approaches, making it an appealing approach for Arabic sentiment analysis. Full article
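The feature-weighted polarity idea can be made concrete with a small sketch. Everything below is illustrative: the weighting formula (ontology depth times relative frequency) is an assumption standing in for the paper's exact scheme, and the hotel-domain features, frequencies, and per-feature sentiment scores are invented.

```python
# Hypothetical hotel-domain ontology levels (root = 1, deeper = more specific)
ontology_level = {"hotel": 1, "room": 2, "staff": 2, "bed": 3}
# Hypothetical corpus frequencies of each feature
frequency = {"hotel": 120, "room": 80, "staff": 60, "bed": 20}

def weight(f):
    # Assumed weighting: deeper and more frequent features matter more.
    return ontology_level[f] * frequency[f] / sum(frequency.values())

# Per-feature sentiment scores in [-1, 1], as a lexicon might assign them
# for a review like "room was dirty, staff were friendly, bed was hard".
review_scores = {"room": -0.8, "staff": 0.6, "bed": -0.4}

# Overall polarity = importance-weighted average of feature sentiments.
num = sum(weight(f) * s for f, s in review_scores.items())
den = sum(weight(f) for f in review_scores)
polarity = num / den
print("negative" if polarity < 0 else "positive", round(polarity, 3))
# negative -0.235
```

The weighted average lets a strongly negative score on an important feature (the room) dominate a positive score on a less weighty one, which is the intuition behind ontology-based feature importance.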
Show Figures
Figure 1
<p>Overall approach design.</p>
Figure 2
<p>Graphical model representation of LDA.</p>
Figure 3
<p>The tools and libraries used for semantic orientation evaluation.</p>
Figure 4
<p>Performance evaluation of ontology for the four implemented schemes.</p>
Figure 5
<p>Accuracy of different state-of-the-art approaches for ASA.</p>
28 pages, 772 KiB  
Article
Incentive Mechanisms for Smart Grid: State of the Art, Challenges, Open Issues, Future Directions
by Sweta Bhattacharya, Rajeswari Chengoden, Gautam Srivastava, Mamoun Alazab, Abdul Rehman Javed, Nancy Victor, Praveen Kumar Reddy Maddikunta and Thippa Reddy Gadekallu
Big Data Cogn. Comput. 2022, 6(2), 47; https://doi.org/10.3390/bdcc6020047 - 27 Apr 2022
Cited by 37 | Viewed by 6655
Abstract
Smart grids (SG) are electricity grids that communicate with each other, provide reliable information, and enable administrators to operate energy supplies across the country, ensuring optimized reliability and efficiency. The smart grid contains sensors that measure and transmit data to adjust the flow [...] Read more.
Smart grids (SG) are electricity grids that communicate with each other, provide reliable information, and enable administrators to operate energy supplies across the country, ensuring optimized reliability and efficiency. The smart grid contains sensors that measure and transmit data to adjust the flow of electricity automatically based on supply/demand, and thus, responding to problems becomes quicker and easier. This also plays a crucial role in controlling carbon emissions, by avoiding energy losses during peak load hours and ensuring optimal energy management. The scope of big data analytics in smart grids is huge, as they collect information from raw data and derive intelligent information from the same. However, these benefits of the smart grid are dependent on the active and voluntary participation of the consumers in real-time. Consumers need to be motivated and conscious to avail themselves of the achievable benefits. Incentivizing the appropriate actor is an absolute necessity to encourage prosumers to generate renewable energy sources (RES) and motivate industries to establish plants that support sustainable and green-energy-based processes or products. The current study emphasizes similar aspects and presents a comprehensive survey of the state-of-the-art contributions pertinent to incentive mechanisms in smart grids, which can be used in smart grids to optimize the power distribution during peak times and also reduce carbon emissions. The various technologies, such as game theory, blockchain, and artificial intelligence, used in implementing incentive mechanisms in smart grids are discussed, followed by different incentive projects being implemented across the globe. The lessons learnt, challenges faced in such implementations, and open issues such as data quality, privacy, security, and pricing related to incentive mechanisms in SG are identified to guide the future scope of research in this sector. Full article
Show Figures
Figure 1
<p>Actors involved in SG environment.</p>
Figure 2
<p>Blockchain model for providing incentives in SG.</p>
Figure 3
<p>FL Model for providing incentives in SG.</p>
23 pages, 1539 KiB  
Article
A Non-Uniform Continuous Cellular Automata for Analyzing and Predicting the Spreading Patterns of COVID-19
by Puspa Eosina, Aniati Murni Arymurthy and Adila Alfa Krisnadhi
Big Data Cogn. Comput. 2022, 6(2), 46; https://doi.org/10.3390/bdcc6020046 - 24 Apr 2022
Cited by 3 | Viewed by 3843
Abstract
During the COVID-19 outbreak, modeling the spread of infectious diseases became a challenging research topic due to its rapid spread and high mortality rate. The main objective of a standard epidemiological model is to estimate the number of infected, suspected, and recovered from [...] Read more.
During the COVID-19 outbreak, modeling the spread of infectious diseases became a challenging research topic due to its rapid spread and high mortality rate. The main objective of a standard epidemiological model is to estimate the number of infected, suspected, and recovered from the illness by mathematical modeling. This model does not capture how the disease transmits between neighboring regions through interaction. A more general framework such as Cellular Automata (CA) is required to accommodate a more complex spatial interaction within the epidemiological model. The critical issue of modeling the spread of diseases is how to reduce the prediction error. This research aims to formulate the influence of the interaction of a neighborhood on the spreading pattern of COVID-19 using a neighborhood frame model in a Cellular Automata (CA) approach and to obtain a predictive model for the COVID-19 spread with reduced error. We propose a non-uniform continuous CA (N-CCA) as our contribution to demonstrate the influence of interactions on the spread of COVID-19. The model has succeeded in demonstrating the influence of the interaction between regions on the COVID-19 spread, as represented by the coefficients obtained. These coefficients result from multiple regression models. The coefficient obtained represents the population’s behavior interacting with its neighborhood in a cell and influences the number of cases that occur the next day. The evaluation of the N-CCA model is conducted by the root mean square error (RMSE) of the difference between predicted and real cases per cell in each region. This study demonstrates that this approach improves the prediction accuracy for 14 days into the future using data points from the past 42 days, compared to a baseline model. Full article
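The core mechanics of a neighborhood-coupled CA update and its RMSE evaluation can be sketched briefly. This is an illustrative stand-in, not the N-CCA model itself: the grid, the Moore-neighborhood coupling, and the coefficients `alpha`/`beta` are invented, whereas the paper fits its interaction coefficients per cell by multiple regression.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy grid of daily new cases per cell (regions laid out in cellular space).
cases = rng.poisson(5, size=(8, 8)).astype(float)

def moore_sum(grid):
    """Sum of the 8 Moore neighbours of every cell (toroidal boundary)."""
    total = np.zeros_like(grid)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di or dj:
                total += np.roll(np.roll(grid, di, axis=0), dj, axis=1)
    return total

def step(grid, alpha=0.8, beta=0.02):
    # Next-day cases = own dynamics plus a neighborhood-interaction term;
    # alpha and beta are invented here, fitted by regression in the paper.
    return alpha * grid + beta * moore_sum(grid)

pred = step(cases)
actual = cases  # placeholder for the next day's observed grid
rmse = float(np.sqrt(np.mean((pred - actual) ** 2)))
print(pred.shape, round(rmse, 3))
```

Iterating `step` 14 times gives a two-week forecast per cell, and comparing each step's grid to the observed grid yields the per-region RMSE curves reported in the figures.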
Show Figures
Figure 1
<p>The workflow.</p>
Figure 2
<p>Neighborhoods. (<b>a</b>) A von Neumann neighborhood with a radius of 1. (<b>b</b>) A Moore neighborhood with a radius of 1.</p>
Figure 3
<p>The state diagram for the SIRD model.</p>
Figure 4
<p>The state transition diagram of the N-CCA model.</p>
Figure 5
<p>The configuration of regions of China in cellular space.</p>
Figure 6
<p>The coordinate cells of China in cellular space.</p>
Figure 7
<p>The severity level definition for visualization.</p>
Figure 8
<p>The COVID-19 spreading pattern in China for 8 weeks.</p>
Figure 9
<p>The model fit to the real case data per cell in China.</p>
Figure 10
<p>The pattern prediction result for two weeks (<b>a</b>) and the real cases (<b>b</b>) for the next 14 days (9th week and 10th week).</p>
Figure 11
<p>The average error per cell of the model fit for each region.</p>
Figure 12
<p>The error of the model fit for Hubei.</p>
Figure 13
<p>The average predicted cases per cell in China.</p>
Figure 14
<p>The average predicted cases per cell for Hubei.</p>
Figure 15
<p>The trend of prediction error below 10 cases per cell until the 14th prediction.</p>
Figure 16
<p>The trend of prediction error below 40 cases per cell until the 14th prediction.</p>
Figure 17
<p>The trend of prediction error to about 160 cases per cell until the 14th prediction.</p>
29 pages, 5206 KiB  
Article
Virtual Reality-Based Stimuli for Immersive Car Clinics: A Performance Evaluation Model
by Alexandre Costa Henriques, Thiago Barros Murari, Jennifer Callans, Alexandre Maguino Pinheiro Silva, Antonio Lopes Apolinario, Jr. and Ingrid Winkler
Big Data Cogn. Comput. 2022, 6(2), 45; https://doi.org/10.3390/bdcc6020045 - 20 Apr 2022
Cited by 2 | Viewed by 4159
Abstract
This study proposes a model to evaluate the performance of virtual reality-based stimuli for immersive car clinics. The model considered Attribute Importance, Stimuli Efficacy and Stimuli Cost factors and the method was divided into three stages: we defined the importance of fourteen attributes [...] Read more.
This study proposes a model to evaluate the performance of virtual reality-based stimuli for immersive car clinics. The model considered Attribute Importance, Stimuli Efficacy and Stimuli Cost factors, and the method was divided into three stages: we defined the importance of fourteen attributes relevant to a car clinic based on the perceptions of Marketing and Design experts; then we defined the efficacy of five virtual stimuli based on the perceptions of Product Development and Virtual Reality experts; and we used a cost factor to calculate the efficiency of the five virtual stimuli relative to the physical one. The Marketing and Design experts identified a new attribute, Scope; eleven of the fifteen attributes were rated as Important or Very Important, while four were removed from the model as irrelevant. According to our performance evaluation model, virtual stimuli have the same efficacy as physical stimuli. However, when cost is considered, virtual stimuli outperform physical stimuli, particularly virtual stimuli with glasses. We conclude that virtual stimuli have the potential to reduce the cost and time required to develop new stimuli in car clinics, though concerns remain regarding hardware, software, and other definitions. Full article
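The efficacy-versus-efficiency distinction in the model can be shown with a tiny numeric sketch. All numbers below are invented for illustration: efficacy is taken as the importance-weighted average of attribute scores, and efficiency divides efficacy by a relative cost factor; the paper's exact attribute set, scales, and cost data differ.

```python
# Hypothetical attribute importances (subset of the model's attributes).
importance = {"Visual Quality": 5, "Depth Perception": 5, "Interaction": 5}

# Hypothetical per-stimulus attribute scores and relative cost factors
# (Physical = reference cost 1.0; "Visual + Glasses" assumed far cheaper).
stimuli = {
    "Physical": {
        "scores": {"Visual Quality": 5, "Depth Perception": 5, "Interaction": 5},
        "cost": 1.00,
    },
    "Visual + Glasses": {
        "scores": {"Visual Quality": 4, "Depth Perception": 5, "Interaction": 4},
        "cost": 0.20,
    },
}

def efficacy(scores):
    """Importance-weighted average of attribute scores."""
    w = sum(importance.values())
    return sum(importance[a] * s for a, s in scores.items()) / w

for name, s in stimuli.items():
    eff = efficacy(s["scores"])
    print(f"{name}: efficacy={eff:.2f}, efficiency={eff / s['cost']:.2f}")
```

With these invented numbers the virtual stimulus scores slightly lower on raw efficacy yet far higher on cost-adjusted efficiency, which mirrors the study's qualitative conclusion.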
(This article belongs to the Special Issue Virtual Reality, Augmented Reality, and Human-Computer Interaction)
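The efficiency calculation described in the abstract (a cost factor applied to stimulus efficacy) can be sketched as follows. This is a minimal illustration, not the authors' model: the normalization against the physical stimulus and all numeric values are assumptions.

```python
# Hypothetical efficiency score: efficacy delivered per unit cost,
# normalized so that the physical stimulus scores 1.0. The paper's
# exact formula is not reproduced here; the values are illustrative.
def efficiency(efficacy, cost, efficacy_physical, cost_physical):
    relative_efficacy = efficacy / efficacy_physical
    relative_cost = cost / cost_physical
    return relative_efficacy / relative_cost

# Equal efficacy at one-fifth of the cost -> five times the efficiency.
physical = efficiency(80, 100, efficacy_physical=80, cost_physical=100)
virtual_glasses = efficiency(80, 20, efficacy_physical=80, cost_physical=100)
```

Under this assumed normalization, a virtual stimulus with the same efficacy as the physical one but a fraction of its cost scores proportionally higher, mirroring the paper's conclusion that virtual stimuli outperform physical ones once cost is considered.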
Show Figures

Figure 1
<p>Performance Evaluation Model Schematics, where the stimulus efficiency is based on stimulus cost and efficacy.</p>
Full article ">Figure 2
<p>Professional experience of the experts in the Marketing and Design Group regarding (<b>a</b>) the number of clinics they participated in and (<b>b</b>) their years of experience in the industry.</p>
Full article ">Figure 3
<p>Boxplot with median and average of all attributes for the Marketing and Design Group. Six attributes have their median classified as Very Important: Visual-Spatial, Data Security, Visual Quality, Depth Perception, Interaction and Manipulation, and Scope. The black circle (•) represents the median and asterisks (*) represent outliers. Identical outlier points are shown as multiple asterisks symmetrically offset in the graph. The blue rectangle is the interquartile range, and the rectangle inside it is the median confidence interval box at a 95% confidence level.</p>
Full article ">Figure 4
<p>Attribute importance based on clustering, median, median confidence interval, variance, coefficient of variation, and boxplot interpretation. The black circle (•) represents the median and asterisks (*) represent outliers. Identical outlier points are shown as multiple asterisks symmetrically offset in the graph.</p>
Full article ">Figure 5
<p>Marketing and Design Group—stimuli efficacy score for all attributes.</p>
Full article ">Figure 6
<p>Product Development and VR Group Profile.</p>
Full article ">Figure 7
<p>Interaction and Manipulation Attribute. The Physical and Hybrid stimuli have the same median for the Interaction and Manipulation attribute, followed by the Vis + Gl and Vis + V stimuli, which have statistically the same median, and finally the Visual and Vis + Ac stimuli, which share a lower median. The black circle (•) represents the median, the plus within a circle (⊕) the mean, and asterisks (*) outliers. Identical outlier points are shown as multiple asterisks symmetrically offset in the graph.</p>
Full article ">Figure 8
<p>Visual-Spatial Attribute. All the stimuli have the same median. The black circle (•) represents the median, the plus within a circle (⊕) the mean, and asterisks (*) outliers. Identical outlier points are shown as multiple asterisks symmetrically offset in the graph.</p>
Full article ">Figure 9
<p>Visual Quality Attribute. The median is statistically similar for all the virtual stimuli, indicating that every virtual stimulus performs equally, delivering the same level of Visual Quality as the Physical stimulus. The black circle (•) represents the median and the plus within a circle (⊕) the mean.</p>
Full article ">Figure 10
<p>Intuitiveness Attribute. Except for the Hybrid stimulus, all the stimuli perform equivalently. The black circle (•) represents the median and the plus within a circle (⊕) the mean.</p>
Full article ">Figure 11
<p>Security Attribute. Regarding Data Security, the better performance of the virtual stimuli relative to the physical one may be related to the difficulty of handling the Physical stimulus throughout the car clinic process, from construction up to the interviews. The black circle (•) represents the median, the plus within a circle (⊕) the mean, and asterisks (*) outliers. Identical outlier points are shown as multiple asterisks symmetrically offset in the graph.</p>
Full article ">Figure 12
<p>Comfort Attribute. The Hybrid stimulus has a median at the same level as the Physical stimulus; every purely virtual stimulus performs worse than the Hybrid stimulus. The black circle (•) represents the median, the plus within a circle (⊕) the mean, and asterisks (*) outliers. Identical outlier points are shown as multiple asterisks symmetrically offset in the graph.</p>
Full article ">Figure 13
<p>Depth Perception Attribute. The Vis + Gl and Vis + V stimuli performed slightly better than the others, but the medians of all virtual stimuli are statistically the same as that of the Physical stimulus. The black circle (•) represents the median, the plus within a circle (⊕) the mean, and asterisks (*) outliers. Identical outlier points are shown as multiple asterisks symmetrically offset in the graph.</p>
Full article ">Figure 14
<p>Movement Attribute. All the virtual stimuli performed similarly except for the Hybrid stimulus, whose performance is slightly better than the others and close to that of the Physical stimulus. The black circle (•) represents the median, the plus within a circle (⊕) the mean, and asterisks (*) outliers. Identical outlier points are shown as multiple asterisks symmetrically offset in the graph.</p>
Full article ">Figure 15
<p>Color and Texture Attribute. All virtual stimuli presented the same median as the physical prototype for the Color and Texture attribute. The variation range is high for all virtual stimuli; this may indicate that the experts interviewed used virtual models or equipment with different levels of quality. The black circle (•) represents the median and the plus within a circle (⊕) the mean.</p>
Full article ">Figure 16
<p>Flexibility Attribute. The virtual stimuli have a higher median than the Physical stimulus for the Flexibility attribute. The black circle (•) represents the median, the plus within a circle (⊕) the mean, and asterisks (*) outliers. Identical outlier points are shown as multiple asterisks symmetrically offset in the graph.</p>
Full article ">Figure 17
<p>All virtual stimuli presented the same median as the physical prototype for the Scope attribute. The variation range is high for all virtual stimuli. The black circle (•) represents the median and the plus within a circle (⊕) the mean.</p>
Full article ">Figure 18
<p>Summary of attributes for each stimulus, with the sum of the attribute values at the bottom, for the general stimuli efficacy comparison of the Product Development and VR group. No data available (NDA) for the Hybrid stimulus.</p>
Full article ">Figure 19
<p>Summary of attributes for each stimulus, with the sum of the attribute values at the bottom, for the general stimuli efficacy comparison of the Marketing and Design Group.</p>
Full article ">Figure 20
<p>Input-Output Concept.</p>
Full article ">Figure 21
<p>Performance Evaluation Model Outcome.</p>
Full article ">Figure 22
<p>Performance Evaluation Model Outcome—Efficiency Factor.</p>
Full article ">Figure 23
<p>Performance Evaluation Model Outcome—Spider Chart.</p>
Full article ">Figure 24
<p>Evaluation Performance Model Schematic Results.</p>
Full article ">
40 pages, 14654 KiB  
Review
Deep Learning Approaches for Video Compression: A Bibliometric Analysis
by Ranjeet Vasant Bidwe, Sashikala Mishra, Shruti Patil, Kailash Shaw, Deepali Rahul Vora, Ketan Kotecha and Bhushan Zope
Big Data Cogn. Comput. 2022, 6(2), 44; https://doi.org/10.3390/bdcc6020044 - 19 Apr 2022
Cited by 39 | Viewed by 7903
Abstract
All data, whatever their kind, require physical storage. There has been an explosion in the volume of images, videos, and other similar data types circulated over the internet. Internet users expect intelligible data, even under the [...] Read more.
All data, whatever their kind, require physical storage. There has been an explosion in the volume of images, videos, and other similar data types circulated over the internet. Internet users expect intelligible data, even under the pressure of multiple resource constraints such as bandwidth bottlenecks and noisy channels. Therefore, data compression is becoming a fundamental problem in wider engineering communities. There has been related work on data compression using neural networks: various machine learning approaches are currently applied in data compression techniques and tested to obtain better lossy and lossless compression results. A wide variety of efficient research is already available for image compression. However, this is not the case for video compression, even though, because of the explosion of big data and the heavy use of cameras globally, around 82% of the data generated involve videos. Proposed approaches have used Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and various variants of Autoencoders (AEs). All newly proposed methods aim to increase performance (reducing bitrate by up to 50% at the same quality and complexity). This paper presents a bibliometric analysis and literature survey of the Deep Learning (DL) methods used in video compression in recent years. Scopus and Web of Science are well-known research databases, and the results retrieved from them are used for this analytical study. Two types of analysis are performed on the extracted documents: quantitative and qualitative. In the quantitative analysis, records are analyzed based on their citations, keywords, source of publication, and country of publication. The qualitative analysis provides information on DL-based approaches for video compression, as well as their advantages, disadvantages, and challenges. Full article
Show Figures

Figure 1
<p>Types of compression.</p>
Full article ">Figure 2
<p>Applications of video compression.</p>
Full article ">Figure 3
<p>Organization of paper.</p>
Full article ">Figure 4
<p>Search Strategy.</p>
Full article ">Figure 5
<p>Comparative analysis of publications per year.</p>
Full article ">Figure 6
<p>Alluvial diagram showing a correlation between authors, years, and source titles of top 20 cited documents.</p>
Full article ">Figure 7
<p>Top keywords used in Scopus.</p>
Full article ">Figure 8
<p>Category of publication.</p>
Full article ">Figure 9
<p>Publishing country: Scopus.</p>
Full article ">Figure 10
<p>Publication country: WoS.</p>
Full article ">Figure 11
<p>Publishers in Scopus.</p>
Full article ">Figure 12
<p>Publishers in WoS.</p>
Full article ">Figure 13
<p>Co-occurrence analysis (author keywords).</p>
Full article ">Figure 14
<p>Citation analysis of documents.</p>
Full article ">Figure 15
<p>Citation analysis of documents.</p>
Full article ">Figure 16
<p>Citation analysis by author.</p>
Full article ">Figure 17
<p>Bibliographic analysis of documents.</p>
Full article ">Figure 18
<p>Title of the publication and citations network visualization.</p>
Full article ">Figure 19
<p>Timeline of video compression algorithms.</p>
Full article ">Figure 20
<p>Traditional approach used by video codecs.</p>
Full article ">Figure 21
<p>Video compression: issues and advantages of DNN approach.</p>
Full article ">Figure 22
<p>Timeline for DNN based video compression.</p>
Full article ">Figure 23
<p>Video compression technologies.</p>
Full article ">Figure 24
<p>Performance metrics for video compression.</p>
Full article ">Figure 25
<p>Datasets used in video compression with a year of introduction.</p>
Full article ">Figure 26
<p>Challenges in video compression.</p>
Full article ">
25 pages, 664 KiB  
Article
New Efficient Approach to Solve Big Data Systems Using Parallel Gauss–Seidel Algorithms
by Shih Yu Chang, Hsiao-Chun Wu and Yifan Wang
Big Data Cogn. Comput. 2022, 6(2), 43; https://doi.org/10.3390/bdcc6020043 - 19 Apr 2022
Viewed by 2844
Abstract
In order to perform big-data analytics, regression involving large matrices is often necessary. In particular, large scale regression problems are encountered when one wishes to extract semantic patterns for knowledge discovery and data mining. When a large matrix can be processed in its [...] Read more.
In order to perform big-data analytics, regression involving large matrices is often necessary. In particular, large scale regression problems are encountered when one wishes to extract semantic patterns for knowledge discovery and data mining. When a large matrix can be processed in its factorized form, advantages arise in terms of computation, implementation, and data-compression. In this work, we propose two new parallel iterative algorithms as extensions of the Gauss–Seidel algorithm (GSA) to solve regression problems involving many variables. The convergence study in terms of error-bounds of the proposed iterative algorithms is also performed, and the required computation resources, namely time- and memory-complexities, are evaluated to benchmark the efficiency of the proposed new algorithms. Finally, the numerical results from both Monte Carlo simulations and real-world datasets are presented to demonstrate the striking effectiveness of our proposed new methods. Full article
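The classical Gauss–Seidel iteration that the proposed parallel algorithms extend can be sketched as follows; this is a minimal serial version for a small diagonally dominant system, not the authors' parallel divide-and-iterate implementation.

```python
# Serial Gauss-Seidel iteration for A x = b. Each sweep updates x[i]
# using the freshest available values of x (the defining trait of
# Gauss-Seidel, as opposed to Jacobi iteration).
def gauss_seidel(A, b, tol=1e-10, max_iter=1000):
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        max_delta = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / A[i][i]
            max_delta = max(max_delta, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_delta < tol:  # converged
            break
    return x

# Diagonally dominant 2x2 system with exact solution x = [1, 2].
x = gauss_seidel([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0])
```

The parallel variants in the paper distribute the inner products of each update across processors (cyclic or block distribution, Figure 2), which is where the time-complexity gains come from.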
Show Figures

Figure 1
<p>Illustration of the proposed new divide-and-iterate approach.</p>
Full article ">Figure 2
<p>Illustration of the cyclic and block distributions for <span class="html-italic">p</span> = 4.</p>
Full article ">Figure 3
<p>Illustration of an inner-product computation on the parallel platform using cyclic distribution (<math display="inline"><semantics> <mrow> <mi>p</mi> <mo>=</mo> <mn>4</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>12</mn> </mrow> </semantics></math>).</p>
Full article ">Figure 4
<p>The effect of <math display="inline"><semantics> <msub> <mi>ϱ</mi> <mi mathvariant="bold">W</mi> </msub> </semantics></math> on the convergence of a random consistent system.</p>
Full article ">Figure 5
<p>The effect of <math display="inline"><semantics> <msub> <mi>ϱ</mi> <mi mathvariant="bold">H</mi> </msub> </semantics></math> on the convergence of a random inconsistent system.</p>
Full article ">Figure 6
<p>Error-convergence comparison for the wine data and the bike-rental data.</p>
Full article ">Figure 7
<p>Time-complexity versus <span class="html-italic">n</span> for an arbitrary consistent system (<math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>100</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>1.25</mn> <mspace width="1.70709pt"/> <mi>n</mi> </mrow> </semantics></math>).</p>
Full article ">Figure 8
<p>Time-complexity versus the number of processors <span class="html-italic">p</span> and the dimension <span class="html-italic">k</span> subject to <math display="inline"><semantics> <mrow> <mi>ϵ</mi> <mo>=</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>5</mn> </mrow> </msup> </mrow> </semantics></math> for an arbitrary consistent system (<math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>1.25</mn> <mspace width="1.70709pt"/> <mi>n</mi> </mrow> </semantics></math>).</p>
Full article ">Figure 9
<p>Time-complexity versus <span class="html-italic">n</span> for an arbitrary inconsistent system (<math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>100</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>1.25</mn> <mspace width="1.70709pt"/> <mi>n</mi> </mrow> </semantics></math>). The curves denoted by “ZF” illustrate the theoretical time-complexity error-bounds for solving the original system involving the matrix <math display="inline"><semantics> <mi mathvariant="bold">V</mi> </semantics></math> without factorization (theoretical results from [<a href="#B46-BDCC-06-00043" class="html-bibr">46</a>]).</p>
Full article ">Figure 10
<p>Time-complexity versus the number of processors <span class="html-italic">p</span> and the dimension <span class="html-italic">k</span> subject to <math display="inline"><semantics> <mrow> <mi>ϵ</mi> <mo>=</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>5</mn> </mrow> </msup> </mrow> </semantics></math> for an inconsistent system (<math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>1.25</mn> <mspace width="1.70709pt"/> <mi>n</mi> </mrow> </semantics></math>).</p>
Full article ">Figure 11
<p>Time-complexity versus <span class="html-italic">n</span> for <math display="inline"><semantics> <mi mathvariant="bold">V</mi> </semantics></math> with different spectral radii subject to <math display="inline"><semantics> <mi>ϵ</mi> </semantics></math>=<math display="inline"><semantics> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>10</mn> </mrow> </msup> </semantics></math> for an arbitrary inconsistent system (<math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>1.25</mn> <mspace width="1.70709pt"/> <mi>n</mi> </mrow> </semantics></math>) such that <math display="inline"><semantics> <mrow> <mi>η</mi> <mo>(</mo> <msup> <mi mathvariant="bold">V</mi> <mo>∗</mo> </msup> <mi mathvariant="bold">V</mi> <mo>)</mo> </mrow> </semantics></math> = <math display="inline"><semantics> <mrow> <mn>0.9</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mn>0.5</mn> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <mn>0.1</mn> </mrow> </semantics></math>.</p>
Full article ">Figure 12
<p>The memory-complexity versus <span class="html-italic">n</span> for a consistent system (<math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>1.25</mn> <mspace width="1.70709pt"/> <mi>n</mi> </mrow> </semantics></math>).</p>
Full article ">Figure 13
<p>The memory-complexity versus <span class="html-italic">n</span> for an inconsistent system (<math display="inline"><semantics> <mrow> <mi>m</mi> <mo>=</mo> <mn>1.25</mn> <mspace width="1.70709pt"/> <mi>n</mi> </mrow> </semantics></math>).</p>
Full article ">
18 pages, 1307 KiB  
Article
An Emergency Event Detection Ensemble Model Based on Big Data
by Khalid Alfalqi and Martine Bellaiche
Big Data Cogn. Comput. 2022, 6(2), 42; https://doi.org/10.3390/bdcc6020042 - 16 Apr 2022
Cited by 5 | Viewed by 4074
Abstract
Emergency events arise when a serious, unexpected, and often dangerous threat affects normal life. Hence, knowing what is occurring during and after emergency events is critical to mitigate the effect of the incident on human life, on the environment and our infrastructures, as [...] Read more.
Emergency events arise when a serious, unexpected, and often dangerous threat affects normal life. Hence, knowing what is occurring during and after emergency events is critical to mitigate the effect of the incident on human life, on the environment and our infrastructures, and on the inherent financial consequences. Social network utilization in emergency event detection models can play an important role, as information is shared and users’ statuses are updated once an emergency event occurs. Moreover, big data has proved its significance as a tool to assist in and alleviate emergency events by processing an enormous amount of data over a short time interval. This paper shows that it is necessary to have an appropriate emergency event detection ensemble model (EEDEM) to respond quickly once such unfortunate events occur. Furthermore, it integrates Snapchat maps to propose a novel method to pinpoint the exact location of an emergency event. Merging social networks and big data can accelerate the emergency event detection system: social network data, such as those from Twitter and Snapchat, allow us to manage, monitor, analyze and detect emergency events. The main objective of this paper is to propose a novel and efficient big data-based EEDEM that pinpoints the exact location of emergency events by employing data collected from social networks, such as “Twitter” and “Snapchat”, while integrating big data (BD) and machine learning (ML). Furthermore, this paper evaluates the performance of five ML base models and the proposed ensemble approach for detecting emergency events. Results show that the proposed ensemble approach achieved a very high accuracy of 99.87%, which outperforms the other base models. Moreover, the best base models yield high accuracy: 99.72% and 99.70% for LSTM and decision tree, respectively, with acceptable training times. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
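One common way to combine base classifiers such as those evaluated above is majority voting; the sketch below illustrates that idea. The actual combination rule of the proposed EEDEM may differ, and the labels and predictions shown are invented for illustration.

```python
from collections import Counter

# Hypothetical majority-vote combiner: each base model (e.g. LSTM,
# decision tree, ...) emits one label per sample; the ensemble
# reports the most common label across models for each sample.
def ensemble_vote(base_predictions):
    """base_predictions: list of per-model label lists, one label per sample."""
    n_samples = len(base_predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in base_predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three of five models flag sample 0 as an emergency ("event").
preds = [
    ["event", "normal"],
    ["event", "normal"],
    ["event", "event"],
    ["normal", "normal"],
    ["normal", "normal"],
]
print(ensemble_vote(preds))  # ['event', 'normal']
```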
Show Figures

Figure 1
<p>Overview of the big data layers.</p>
Full article ">Figure 2
<p>Data collection model.</p>
Full article ">Figure 3
<p>The 5-fold cross-validation.</p>
Full article ">Figure 4
<p>The proposed steps of the emergency event detection ensemble model.</p>
Full article ">Figure 5
<p>Word cloud of tweets.</p>
Full article ">Figure 6
<p>Explosion location of Beirut Port.</p>
Full article ">Figure 7
<p>The performance evaluation of the Snapchat classification model.</p>
Full article ">Figure 8
<p>The performance evaluation of each model separately.</p>
Full article ">Figure 9
<p>Processing time of model classification based on window size.</p>
Full article ">Figure 10
<p>The impacts of the selected keywords.</p>
Full article ">
9 pages, 473 KiB  
Article
Revisiting Gradient Boosting-Based Approaches for Learning Imbalanced Data: A Case of Anomaly Detection on Power Grids
by Maya Hilda Lestari Louk and Bayu Adhi Tama
Big Data Cogn. Comput. 2022, 6(2), 41; https://doi.org/10.3390/bdcc6020041 - 16 Apr 2022
Cited by 8 | Viewed by 4050
Abstract
Gradient boosting ensembles have been used in the cyber-security area for many years; nonetheless, their efficacy and accuracy for intrusion detection systems (IDSs) remain questionable, particularly when dealing with problems involving imbalanced data. This article fills the void in the existing body of [...] Read more.
Gradient boosting ensembles have been used in the cyber-security area for many years; nonetheless, their efficacy and accuracy for intrusion detection systems (IDSs) remain questionable, particularly when dealing with problems involving imbalanced data. This article fills the void in the existing body of knowledge by evaluating the performance of gradient boosting-based ensembles, including the gradient boosting machine (GBM), extreme gradient boosting (XGBoost), LightGBM, and CatBoost. This paper assesses their performance on various imbalanced data sets using the Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUC), and F1 metrics. The article discusses an example of anomaly detection in an industrial control network and, more specifically, threat detection in a cyber-physical smart power grid. The test results indicate that CatBoost surpassed its competitors regardless of the imbalance ratio of the data sets. Moreover, LightGBM showed a much lower performance value and more variability across the data sets. Full article
(This article belongs to the Special Issue Cyber Security in Big Data Era)
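The Matthews correlation coefficient used above is computed directly from the confusion matrix; the sketch below shows the standard formula, with invented counts for an imbalanced detection task.

```python
import math

# Matthews correlation coefficient from binary confusion counts. MCC is
# robust to class imbalance, which is why it is used alongside AUC and
# F1 for the imbalanced power-grid data sets. The counts are illustrative.
def mcc(tp, tn, fp, fn):
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# A detector that is broadly accurate but misses some rare attacks.
score = mcc(tp=90, tn=950, fp=50, fn=10)
```

A value of +1 indicates perfect prediction, 0 random prediction, and -1 total disagreement, so MCC summarizes all four confusion-matrix cells in one number.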
Show Figures

Figure 1
<p>Average performance of all algorithms across various power system data sets.</p>
Full article ">Figure 2
<p>The skewness and spread of algorithms’ performance over two distinct scenarios.</p>
Full article ">Figure 3
<p>Hierarchical clusters (shown in three distinct colors) of algorithms and imbalanced data sets in terms of (<b>a</b>) MCC, (<b>b</b>) AUC, and (<b>c</b>) F1 metrics. The color in each cell represents the corresponding performance value (light yellow: low; dark red: high).</p>
Full article ">Figure 4
<p>Hierarchical clusters (shown in three distinct colors) of algorithms and balanced data sets in terms of (<b>a</b>) MCC, (<b>b</b>) AUC, and (<b>c</b>) F1 metrics. The color in each cell represents the corresponding performance value (light yellow: low; dark red: high).</p>
Full article ">
13 pages, 1797 KiB  
Article
Breast and Lung Anticancer Peptides Classification Using N-Grams and Ensemble Learning Techniques
by Ayad Rodhan Abbas, Bashar Saadoon Mahdi and Osamah Younus Fadhil
Big Data Cogn. Comput. 2022, 6(2), 40; https://doi.org/10.3390/bdcc6020040 - 12 Apr 2022
Cited by 2 | Viewed by 3586
Abstract
Anticancer peptides (ACPs) are short protein sequences; they perform functions like some hormones and enzymes inside the body. The role of any protein or peptide is related to its structure and the sequence of amino acids that make it up. There are 20 [...] Read more.
Anticancer peptides (ACPs) are short protein sequences that perform functions like some hormones and enzymes inside the body. The role of any protein or peptide is related to its structure and the sequence of amino acids that make it up. There are 20 types of amino acids in humans, and each has particular characteristics according to its chemical structure. Current machine and deep learning models have been used to classify ACPs; however, these models have neglected Amino Acid Repeats (AARs), which play an essential role in the function and structure of peptides. Therefore, this paper pursues a promising route to novel anticancer peptides by extracting AARs based on N-Grams and k-mers from two peptide datasets. These datasets, which target breast and lung cancer cells, were assembled and curated manually from the Cancer Peptide and Protein Database (CancerPPD). Each dataset consists of peptide sequences together with their synthesis information and anticancer activity on breast and lung cancer cell lines. Five different feature selection methods were used to improve classification performance and reduce experimental costs. ACPs were then classified using four classifiers, namely AdaBoost, Random Forest Tree (RFT), Multi-class Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP), and these classifiers were evaluated with five well-known metrics. Experimental results showed that breast and lung ACP classification reached accuracies of 89.25% and 92.56%, respectively, and AUCs of 95.35% and 96.92%. The proposed classifiers performed roughly equally in AUC, accuracy, precision, F-measure, and recall, except for Multi-class SVM-based feature selection, which showed superior performance. As a result, this paper significantly improved the predictive performance, effectively distinguishing ACPs as virtual inactive, experimental inactive, moderately active, and very active. Full article
(This article belongs to the Topic Machine and Deep Learning)
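The N-gram/k-mer feature extraction described above can be sketched as follows; the peptide string and the value of k are illustrative, not taken from CancerPPD.

```python
# Overlapping k-mer (N-gram) extraction over an amino-acid sequence,
# the feature-building step used to capture Amino Acid Repeats (AARs).
def kmers(sequence, k):
    """Return all overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def kmer_counts(sequence, k):
    """Count how often each k-mer occurs, a simple AAR-style feature."""
    counts = {}
    for gram in kmers(sequence, k):
        counts[gram] = counts.get(gram, 0) + 1
    return counts

peptide = "GLFDIIKKIAESF"  # illustrative peptide sequence
bigrams = kmer_counts(peptide, 2)
```

A sequence of length L yields L - k + 1 overlapping k-mers; repeated k-mers (e.g. the "II" repeat here) become the high-count features that distinguish repeat-rich peptides.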
Show Figures

Figure 1
<p>Encoded N-Grams extraction using k-mers.</p>
Full article ">Figure 2
<p>The architecture of the proposed model.</p>
Full article ">Figure 3
<p>Performance evaluation of the breast ACPs classification with different N-Grams.</p>
Full article ">Figure 4
<p>Performance evaluation of the lung ACPs classification with different N-Grams.</p>
Full article ">Figure 5
<p>Performance of five feature selection methods on the breast ACPs using only 35 features.</p>
Full article ">Figure 6
<p>Performance of five feature selection methods on the lung ACPs using only 31 features.</p>
Full article ">Figure 7
<p>Performance of four classifiers using the breast ACPs.</p>
Full article ">Figure 8
<p>Performance of four classifiers using the lung anticancer peptides.</p>
Full article ">
24 pages, 3689 KiB  
Article
PCB Component Detection Using Computer Vision for Hardware Assurance
by Wenwei Zhao, Suprith Reddy Gurudu, Shayan Taheri, Shajib Ghosh, Mukhil Azhagan Mallaiyan Sathiaseelan and Navid Asadizanjani
Big Data Cogn. Comput. 2022, 6(2), 39; https://doi.org/10.3390/bdcc6020039 - 8 Apr 2022
Cited by 17 | Viewed by 6916
Abstract
Printed circuit board (PCB) assurance in the optical domain is a crucial field of study. Though there are many existing PCB assurance methods using image processing, computer vision (CV), and machine learning (ML), the PCB field is complex and increasingly evolving, so new [...] Read more.
Printed circuit board (PCB) assurance in the optical domain is a crucial field of study. Though there are many existing PCB assurance methods using image processing, computer vision (CV), and machine learning (ML), the PCB field is complex and increasingly evolving, so new techniques are required to overcome the emerging problems. Existing ML-based methods outperform traditional CV methods; however, they often require more data, have low explainability, and can be difficult to adapt when a new technology arises. To overcome these challenges, CV methods can be used in tandem with ML methods. In particular, human-interpretable CV algorithms such as those that extract color, shape, and texture features increase PCB assurance explainability. This allows for incorporation of prior knowledge, which effectively reduces the number of trainable ML parameters and, thus, the amount of data needed to achieve high accuracy when training or retraining an ML model. Hence, this study explores the benefits and limitations of a variety of common computer vision-based features for the task of PCB component detection. The study results indicate that color features demonstrate promising performance for PCB component detection. The purpose of this paper is to facilitate collaboration between the hardware assurance, computer vision, and machine learning communities. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
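As a concrete illustration of the human-interpretable color features discussed above, the sketch below converts RGB pixels into HSV space (one of the color spaces the study examines) and builds a coarse hue histogram. The pixel values and bin count are illustrative, not taken from the paper's pipeline.

```python
import colorsys

# Build a coarse hue histogram from RGB pixels (values in 0-255): a
# minimal example of an explainable color feature for PCB component
# detection. Bin count and pixel values are illustrative assumptions.
def hue_histogram(pixels, bins=6):
    hist = [0] * bins
    for r, g, b in pixels:
        # colorsys expects floats in [0, 1] and returns hue in [0, 1).
        h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[min(int(h * bins), bins - 1)] += 1
    return hist

# Two red-ish pixels, one green-ish, one blue-ish.
pixels = [(200, 10, 10), (180, 30, 20), (10, 200, 10), (10, 10, 200)]
hist = hue_histogram(pixels)
```

Because such a histogram is directly interpretable (each bin is a hue range), it can encode prior knowledge about component colors and so reduce the amount of training data an ML model needs, as the abstract argues.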
Show Figures

Figure 1
<p>The framework for bill of materials extraction for PCB assurance as proposed in [<a href="#B3-BDCC-06-00039" class="html-bibr">3</a>].</p>
Full article ">Figure 2
<p>The processing workflow.</p>
Full article ">Figure 3
<p>(<b>a</b>) The original PCB image; (<b>b</b>) the corresponding bbox labels; (<b>c</b>) the bbox ground truth heatmap for this PCB image; and (<b>d</b>) the heatmap overlay on the PCB image.</p>
Full article ">Figure 4
<p>(<b>a</b>) The original PCB image; (<b>b</b>) R channel of the image; (<b>c</b>) G channel of the image; (<b>d</b>) B channel of the image.</p>
Full article ">Figure 5
<p>(<b>a</b>) The original PCB image; (<b>b</b>) H channel of the image; (<b>c</b>) S channel of the image; (<b>d</b>) V channel of the image.</p>
Full article ">Figure 6
<p>(<b>a</b>) The original PCB image; (<b>b</b>) L channel of the image; (<b>c</b>) <b>A</b> channel of the image; (<b>d</b>) <b>B</b> channel of the image.</p>
Full article ">Figure 7
<p>Determinant of Hessian–Blobs feature images with different label mask k-sizes. (<b>a</b>) Original image patch, and (<b>b</b>–<b>f</b>) respective experimental results for image mask sizes from 25 down to 5. These six images are different images.</p>
Full article ">Figure 8
<p>Corner Subpixel feature images with different label mask k-size. (<b>a</b>) Original image patch, and (<b>b</b>–<b>f</b>) respective experimental results from 25 to 5 image mask size. These six images are different images.</p>
Full article ">Figure 9
<p>(<b>a</b>) A part of the original PCB image; (<b>b</b>) filtered image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>0</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>; (<b>c</b>) filtered image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>30</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>; (<b>d</b>) filtered image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>60</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>; (<b>e</b>) filtered image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>90</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>; (<b>f</b>) filtered image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>120</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>; and (<b>g</b>) filtered image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>150</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>.</p>
Full article ">Figure 10
<p>(<b>a</b>) A part of the original PCB image; (<b>b</b>) ASM image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>0</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>, (<b>c</b>) contrast image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>0</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>; (<b>d</b>) dissimilarity image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>0</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>; (<b>e</b>) energy image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>0</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>, (<b>f</b>) entropy image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>0</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>; and (<b>g</b>) homogeneity image when <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <msup> <mn>0</mn> <mo>∘</mo> </msup> </mrow> </semantics></math>. These six images are different images.</p>
Full article ">Figure 11
<p>(<b>a</b>) A part of the original PCB image; (<b>b</b>) The output image after RLBP operators and ULBP operators.</p>
Full article ">Figure 12
<p>The boxplot for different ksizes. It indicates that ksize 25 has the highest median and the smallest spread, so ksize 25 is the best among the five ksizes.</p>
Full article ">Figure 13
<p>The boxplot for different feature types in different images. The distribution of the color feature in the box plot shows that it is the most effective feature among the three types of features.</p>
Full article ">Figure 14
<p>The boxplot for the five most important feature types in different images. The top five important features all come from color features, which also shows that the color feature is the most important among the three types of features.</p>
Full article ">