Search Results (17)

Search Parameters:
Keywords = Hadoop Ecosystem

21 pages, 10483 KiB  
Article
Evading Cyber-Attacks on Hadoop Ecosystem: A Novel Machine Learning-Based Security-Centric Approach towards Big Data Cloud
by Neeraj A. Sharma, Kunal Kumar, Tanzim Khorshed, A B M Shawkat Ali, Haris M. Khalid, S. M. Muyeen and Linju Jose
Information 2024, 15(9), 558; https://doi.org/10.3390/info15090558 - 10 Sep 2024
Viewed by 289
Abstract
The growing industry and its complex and large information sets require Big Data (BD) technology and its open-source frameworks (Apache Hadoop) to (1) collect, (2) analyze, and (3) process the information. This information usually ranges in size from gigabytes to petabytes of data. However, processing this data involves web consoles and communication channels which are prone to intrusion from hackers. To resolve this issue, a novel machine learning (ML)-based security-centric approach has been proposed to evade cyber-attacks on the Hadoop ecosystem while considering the complexity of Big Data in Cloud (BDC). An Apache Hadoop-based management interface, "Ambari", was implemented to address the variation and distinguish between attacks and activities. The experimental results show that the proposed scheme effectively (1) blocked the interface communication and retrieved the performance measurement data from (2) the Ambari-based virtual machine (VM) and (3) the BDC hypervisor. Moreover, the proposed architecture reduced false alarms while still detecting cyber-attacks.
(This article belongs to the Special Issue Cybersecurity, Cybercrimes, and Smart Emerging Technologies)
Figure 1: BD gaps and loopholes. Here, MapReduce is the big data analysis model that processes data sets with a parallel algorithm on computer clusters, and HDFS is the Hadoop Distributed File System.
Figure 2: Graphical abstract of BDC and security vulnerabilities.
Figure 3: BDC—ingredients and basis. In this figure, SaaS, PaaS, and IaaS stand for software as a service, platform as a service, and infrastructure as a service, respectively.
Figure 4: Hadoop Ecosystem—an infrastructure. Here, HDFS stands for Hadoop Distributed File System.
Figure 5: Experimental design.
Figure 6: Ambari-based web interface pre-attack.
Figure 7: Ambari-based web interface during an attack.
Figure 8: Attack performed on VM port 8080 with Java LOIC.
Figure 9: Hadoop VM performance graph—generated attack using Java LOIC [28].
Figure 10: Hadoop VM attack—running RTDoS (Rixer) on default HTTP port 80.
Figure 11: Hadoop VM during RTDoS attack (Rixer)—CPU performance and trends [24].
Figure 12: Graphical presentation—an ML-driven workflow.
Figure 13: Percentage-based comparative analysis. From left to right, the comparison covers references [77,78,79,80,81] and the proposed PART algorithm, respectively.
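The detection idea behind this abstract (flagging a DoS-style attack when VM performance metrics deviate sharply from their recent baseline) can be sketched in a few lines. This is an illustrative stand-in, not the paper's PART classifier: the window size, the threshold `k`, and the single CPU-utilisation feature are all assumptions made for the sketch.

```python
import statistics

def detect_attack(samples, window=5, k=3.0):
    """Flag samples whose CPU utilisation deviates more than k standard
    deviations from the rolling baseline of the previous `window` samples.
    Returns one boolean per sample; early samples lack history and pass."""
    flags = []
    for i, value in enumerate(samples):
        baseline = samples[max(0, i - window):i]
        if len(baseline) < window:
            flags.append(False)  # not enough history to judge yet
            continue
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline) or 1e-9  # avoid division issues
        flags.append(abs(value - mean) > k * stdev)
    return flags
```

A flat CPU trace followed by a LOIC-style spike would trip only the final sample.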
16 pages, 3541 KiB  
Article
Development of a Low-Cost Distributed Computing Pipeline for High-Throughput Cotton Phenotyping
by Vaishnavi Thesma, Glen C. Rains and Javad Mohammadpour Velni
Sensors 2024, 24(3), 970; https://doi.org/10.3390/s24030970 - 2 Feb 2024
Cited by 3 | Viewed by 1093
Abstract
In this paper, we present the development of a low-cost distributed computing pipeline for cotton plant phenotyping using Raspberry Pi, Hadoop, and deep learning. Specifically, we use a cluster of several Raspberry Pis in a primary-replica distributed architecture using the Apache Hadoop ecosystem and a pre-trained Tiny-YOLOv4 model for cotton bloom detection from our past work. We feed cotton image data collected from a research field in Tifton, GA, into our cluster’s distributed file system for robust file access and distributed, parallel processing. We then submit job requests to our cluster from our client to process cotton image data in a distributed and parallel fashion, from pre-processing to bloom detection and spatio-temporal map creation. Additionally, we present a comparison of our four-node cluster performance with centralized, one-, two-, and three-node clusters. This work is the first to develop a distributed computing pipeline for high-throughput cotton phenotyping in field-based agriculture.
(This article belongs to the Special Issue Sensor and AI Technologies in Intelligent Agriculture)
Figure 1: Top-down view of cotton field plot layout in Tifton, GA.
Figure 2: Front view of the rover deployed to collect video streams of cotton plants in Tifton, GA.
Figure 3: An example of ZED2 stereo camera cotton image data collected on 26 August from our research cotton farm in Tifton, GA. The image frame contains both left and right views, and several open blooms are apparent.
Figure 4: Our proposed Hadoop cluster consisting of a client, a primary node, and three replica nodes.
Figure 5: HDFS architecture for our proposed four-node cluster.
Figure 6: YARN architecture for our proposed four-node cluster.
Figure 7: An overview of our distributed computing cluster setup with four nodes.
Figure 8: A closeup of our distributed computing cluster setup with four nodes.
Figure 9: Figure 3 split into the left and right image frames using MapReduce and our distributed cluster.
Figure 10: Summary of our proposed workflow in this paper.
Figure 11: An illustrative example of our spatio-temporal maps.
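The pipeline's two core moves can be mimicked in miniature: spread image files across nodes (loosely how HDFS distributes blocks for parallel work) and apply a map step that splits each side-by-side stereo frame into left and right halves, as Figure 9 shows. The node names and the rows-of-pixels frame representation are assumptions for illustration, not the cluster's actual layout.

```python
def assign_blocks(files, nodes):
    """Round-robin placement of image files across cluster nodes,
    loosely mimicking how HDFS spreads blocks for parallel processing."""
    placement = {n: [] for n in nodes}
    for i, f in enumerate(files):
        placement[nodes[i % len(nodes)]].append(f)
    return placement

def split_stereo(frame):
    """Map step: split a side-by-side stereo frame (a list of pixel rows)
    into its left and right halves, as done for the ZED2 imagery."""
    w = len(frame[0]) // 2
    left = [row[:w] for row in frame]
    right = [row[w:] for row in frame]
    return left, right
```

Each replica node would then run `split_stereo` (and, downstream, bloom detection) only on the files placed with it.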
34 pages, 10875 KiB  
Article
EverAnalyzer: A Self-Adjustable Big Data Management Platform Exploiting the Hadoop Ecosystem
by Panagiotis Karamolegkos, Argyro Mavrogiorgou, Athanasios Kiourtis and Dimosthenis Kyriazis
Information 2023, 14(2), 93; https://doi.org/10.3390/info14020093 - 3 Feb 2023
Cited by 4 | Viewed by 2061
Abstract
Big Data is a phenomenon that affects today’s world, with new data being generated every second. Today’s enterprises face major challenges from the increasingly diverse data, as well as from indexing, searching, and analyzing such enormous amounts of data. In this context, several frameworks and libraries for processing and analyzing Big Data exist. Among these frameworks, Hadoop MapReduce, Mahout, Spark, and MLlib appear to be the most popular, although it is unclear which of them is the best fit, and the best performer, for various data processing and analysis scenarios. This paper proposes EverAnalyzer, a self-adjustable Big Data management platform built to fill this gap by exploiting all of these frameworks. The platform is able to collect data both in a streaming and in a batch manner, utilizing the metadata obtained from its users’ processing and analytical processes applied to the collected data. Based on this metadata, the platform recommends the optimum framework for the data processing/analytical activities that the users aim to execute. To verify the platform’s efficiency, numerous experiments were carried out using 30 diverse datasets related to various diseases. The results revealed that EverAnalyzer correctly suggested the optimum framework in 80% of the cases, indicating that the platform made the best selections in the majority of the experiments.
Figure 1: Big Data lifecycle.
Figure 2: EverAnalyzer high-level architecture.
Figure 3: EverAnalyzer low-level architecture.
Figure 4: (a) MapReduce proposition flow; (b) Spark proposition flow.
Figure 5: Analytics proposition flow.
Figure 6: Use case diagram of EverAnalyzer users.
Figure 7: Use case diagram of EverAnalyzer Objective #1.
Figure 8: Use case diagram of EverAnalyzer Objective #2.
Figure 9: Use case diagram of EverAnalyzer Objective #3.
Figure 10: Use case diagram of EverAnalyzer Objective #4.
Figure 11: Use case diagram of EverAnalyzer Objectives #5 and #6.
Figure 12: Use case diagram of EverAnalyzer Objective #7.
Figure 13: (a) Sign-in interface; (b) sign-up interface.
Figure 14: Homepage interface.
Figure 15: Collection interface.
Figure 16: (a) Collected datasets; (b) pre-processing form.
Figure 17: Processing interface—pre-processing datasets.
Figure 18: (a) Processing form; (b) processing proposal.
Figure 19: Analytics interface—pre-processing datasets.
Figure 20: (a) Analytics form; (b) analytics proposal.
Figure 21: (a) Visualization lists; (b) visualizable results.
Figure 22: (a) Visualization of processing task; (b) visualization of analysis task.
Figure 23: (a) Management lists; (b) management results.
Figure 24: (a) Viewing pre-processing results; (b) viewing processing results.
Figure 25: EverAnalyzer correct suggestion streaks.
Figure 26: (a) Worst—best execution with best speed; (b) worst—best execution with EverAnalyzer.
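EverAnalyzer's central step, recommending MapReduce/Mahout versus Spark/MLlib from job metadata, can be approximated with a toy heuristic. The metadata keys and the "fits in memory or is iterative, prefer Spark" rule below are assumptions for illustration; the platform's actual metadata-driven recommender is more elaborate.

```python
def recommend_framework(meta):
    """Toy recommender: Spark/MLlib favours iterative, in-memory jobs;
    MapReduce/Mahout favours one-pass jobs on data larger than memory.
    `meta` keys (task, size_gb, cluster_ram_gb, iterative) are assumed."""
    iterative = meta.get("iterative", False)
    fits_in_memory = meta.get("size_gb", 0) <= meta.get("cluster_ram_gb", 0)
    if meta.get("task") == "analytics":
        return "MLlib" if iterative or fits_in_memory else "Mahout"
    return "Spark" if iterative or fits_in_memory else "MapReduce"
```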
28 pages, 4528 KiB  
Article
A Framework for Attribute-Based Access Control in Processing Big Data with Multiple Sensitivities
by Anne M. Tall and Cliff C. Zou
Appl. Sci. 2023, 13(2), 1183; https://doi.org/10.3390/app13021183 - 16 Jan 2023
Cited by 7 | Viewed by 4698
Abstract
There is an increasing demand for processing large volumes of unstructured data for a wide variety of applications. However, protection measures for these big data sets are still in their infancy, which could lead to significant security and privacy issues. Attribute-based access control (ABAC) provides a dynamic and flexible solution that is effective for mediating access. We analyzed and implemented a prototype application of ABAC to large dataset processing in Amazon Web Services, using open-source versions of Apache Hadoop, Ranger, and Atlas. The Hadoop ecosystem is one of the most popular frameworks for large dataset processing and storage and is adopted by major cloud service providers. We conducted a rigorous analysis of cybersecurity in implementing ABAC policies in Hadoop, including developing a synthetic dataset of information at multiple sensitivity levels that realistically represents healthcare and connected social media data. We then developed Apache Spark programs that extract, connect, and transform data in a manner representative of a realistic use case. Our result is a framework for securing big data. Applying this framework ensures that serious cybersecurity concerns are addressed. We provide details of our analysis and experimentation code in a GitHub repository for further research by the community.
(This article belongs to the Section Computing and Artificial Intelligence)
Figure 1: HDFS, YARN, and the Hadoop client component interfaces. Solid lines indicate data and job exchanges during program execution; dashed lines indicate FSImage information exchange.
Figure 2: Modified NIST SP 800-162 ABAC trust chain with BDP ecosystem attributes.
Figure 3: A standard architecture includes multiple points that contribute to the AC process.
Figure 4: A multi-tenant, multi-level "wellness program" use case for analyzing BDP security.
Figure 5: Attributes assigned and managed for AC policy decisions in the healthcare use case. Shades of red indicate sensitive data, shades of green indicate public data that may contain PII, and yellow indicates data from which sensitive information has been removed.
Figure 6: Apache Ranger and Atlas interfaces to the HDFS NameNode and YARN ResourceManager. Directory information exchanges are indicated by black dashed lines, Ranger policy information exchanges by solid black lines, Atlas attribute tag information exchanges by dashed red lines, and synchronizations between the directory, Ranger, and Atlas by double-headed arrows.
Figure 7: The sequence of attribute information exchange in job execution.
Figure 8: Example Atlas entity for HDFS folders with attributes.
Figure 9: AWS EC2 instances for the security analysis of Apache Hadoop.
Figure 10: Lifecycle of data processing for security experiments.
Figure 11: Implementation of ABAC using HDFS, LDAP, YARN, Ranger, and Atlas.
Figure 12: Configuration of data processing and propagation of attribute classification in Atlas.
Figure 13: Elapsed times for executions of the PySpark program under different configurations.
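The ABAC mediation that the paper implements with Ranger and Atlas reduces, at decision time, to comparing subject, resource, and policy attributes. A minimal sketch follows; the attribute names (clearance, sensitivity, tenant) and the three-level ordering are made-up stand-ins for the Atlas tags and Ranger policies, not the paper's actual schema.

```python
def abac_decide(subject, resource, policy):
    """Permit access only if the subject's clearance dominates the
    resource's sensitivity and, when the policy requires it, the
    subject and resource belong to the same tenant."""
    levels = ["public", "deidentified", "sensitive"]  # assumed ordering
    if levels.index(subject["clearance"]) < levels.index(resource["sensitivity"]):
        return "DENY"
    if policy.get("tenant") and subject["tenant"] != resource["tenant"]:
        return "DENY"
    return "PERMIT"
```

In the real deployment this decision point sits between YARN job submission and HDFS access, with attributes synchronized from LDAP and Atlas.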
20 pages, 616 KiB  
Article
The Time Machine in Columnar NoSQL Databases: The Case of Apache HBase
by Chia-Ping Tsai, Che-Wei Chang, Hung-Chang Hsiao and Haiying Shen
Future Internet 2022, 14(3), 92; https://doi.org/10.3390/fi14030092 - 15 Mar 2022
Cited by 3 | Viewed by 2848
Abstract
Not Only SQL (NoSQL) is a critical technology that is scalable and provides flexible schemas, thereby complementing existing relational database technologies. Although NoSQL is flourishing, present solutions lack the features required by enterprises for critical missions. In this paper, we explore solutions to the data recovery issue in NoSQL. Data recovery for any database table entails restoring the table to a prior state or replaying (insert/update) operations over the table given a time period in the past. Recovery of NoSQL database tables enables applications such as failure recovery, analysis for historical data, debugging, and auditing. Particularly, our study focuses on columnar NoSQL databases. We propose and evaluate two solutions to address the data recovery problem in columnar NoSQL and implement our solutions based on Apache HBase, a popular NoSQL database in the Hadoop ecosystem widely adopted across industries. Our implementations are extensively benchmarked with an industrial NoSQL benchmark under real environments.
(This article belongs to the Section Network Virtualization and Edge/Fog Computing)
Figure 1: System model [5].
Figure 2: Overall system architecture.
Figure 3: (a) The mapper-based architecture and (b) the MapReduce-based architecture (gray areas are the major components involved in the recovery process).
Figure 4: Latency of the recovery process (the table restored in the source cluster is 30 GB in size).
Figure 5: Mean delay of read operations.
Figure 6: Overheads: (a) the delay for write operations and (b) the storage space required.
Figure 7: Latency of the recovery process, where the table restored in the source cluster ranges from 3 to 300 GB in size.
Figure 8: Effects of varying the cluster size (the ratio of a 20-node source/shadow/destination cluster to a 10-node one).
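The "time machine" problem has a compact core: HBase stores multiple timestamped versions per cell, so restoring a table to time `t` means keeping, per cell, the newest version whose timestamp is not after `t`. The sketch below shows only that core over an in-memory list of versions; the paper's actual mapper- and MapReduce-based implementations over HBase clusters are not reproduced.

```python
def restore_as_of(cells, t):
    """Rebuild a (row, column) -> value view as it existed at time t by
    keeping, per cell, the newest version with timestamp <= t.
    `cells` is an iterable of (row, column, timestamp, value) versions."""
    snapshot = {}
    for (row, col, ts, value) in cells:
        if ts > t:
            continue  # version written after the restore point
        key = (row, col)
        if key not in snapshot or ts > snapshot[key][0]:
            snapshot[key] = (ts, value)
    return {k: v for k, (_, v) in snapshot.items()}
```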
24 pages, 1008 KiB  
Article
SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink
by Oscar Ceballos, Carlos Alberto Ramírez Restrepo, María Constanza Pabón, Andres M. Castillo and Oscar Corcho
Appl. Sci. 2021, 11(15), 7033; https://doi.org/10.3390/app11157033 - 30 Jul 2021
Cited by 4 | Viewed by 2610
Abstract
Existing SPARQL query engines and triple stores are continuously improved to handle more massive datasets. Several approaches have been developed in this context proposing the storage and querying of RDF data in a distributed fashion, mainly using the MapReduce Programming Model and Hadoop-based ecosystems. New trends in Big Data technologies have also emerged (e.g., Apache Spark, Apache Flink); they use distributed in-memory processing and promise to deliver higher data processing performance. In this paper, we present a formal interpretation of some PACT transformations implemented in the Apache Flink DataSet API. We use this formalization to provide a mapping to translate a SPARQL query to a Flink program. The mapping was implemented in a prototype used to determine the correctness and performance of the solution. The source code of the project is available on GitHub under the MIT license.
(This article belongs to the Section Computing and Artificial Intelligence)
Figure 1: SPARQL2Flink conceptual architecture.
Figure 2: Execution times of nine queries after running the first scalability test. The x-axis shows the number of nodes on clusters C1, C3, and C5. The y-axis shows the time in seconds, which includes the dataset loading time (dlt), the query execution time (qet), and the time taken to create the file with query results. The number of triples in each dataset is shown at the top.
Figure 3: Execution times of nine queries after running the second scalability test. The x-axis shows the number of nodes on clusters C2, C4, and C5. The y-axis shows the time in seconds, which includes the dataset loading time (dlt), the query execution time (qet), and the time taken to create the file with query results. The number of dataset triples is shown at the top.
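The translation the paper formalizes starts from the simplest unit, evaluating a single SPARQL triple pattern, which amounts to a filter plus variable binding over the triple set. A hedged in-memory sketch (triples as tuples, variables as strings prefixed with `?`); the actual Flink DataSet API translation with PACT Filter/Map/Join transformations is not shown:

```python
def match_pattern(triples, pattern):
    """Evaluate one SPARQL triple pattern over an in-memory triple set:
    variable terms (starting with '?') bind to the triple's value,
    constant terms must match exactly. Returns a list of bindings."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value  # bind variable
            elif term != value:
                binding = None  # constant mismatch: discard triple
                break
        if binding is not None:
            results.append(binding)
    return results
```

A basic graph pattern is then a join of such binding sets on their shared variables, which is where Flink's distributed join transformation enters.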
15 pages, 7168 KiB  
Article
Design and Implementation of Edge-Fog-Cloud System through HD Map Generation from LiDAR Data of Autonomous Vehicles
by Junwon Lee, Kieun Lee, Aelee Yoo and Changjoo Moon
Electronics 2020, 9(12), 2084; https://doi.org/10.3390/electronics9122084 - 7 Dec 2020
Cited by 22 | Viewed by 3588
Abstract
Self-driving cars, autonomous vehicles (AVs), and connected cars combine the Internet of Things (IoT) and automobile technologies, thus contributing to the development of society. However, processing the big data generated by AVs is a challenge due to overloading issues. Additionally, near real-time/real-time IoT services play a significant role in vehicle safety. Therefore, the architecture of an IoT system that collects and processes data, and provides services for vehicle driving, is an important consideration. In this study, we propose a fog computing server model that generates a high-definition (HD) map using light detection and ranging (LiDAR) data generated from an AV. The driving vehicle edge node transmits the LiDAR point cloud information to the fog server through a wireless network. The fog server generates an HD map by applying the Normal Distribution Transform-Simultaneous Localization and Mapping (NDT-SLAM) algorithm to the point clouds transmitted from the multiple edge nodes. Subsequently, the coordinate information of the HD map generated in the sensor frame is converted to the coordinate information of the global frame and transmitted to the cloud server. Then, the cloud server creates an HD map by integrating the collected point clouds using coordinate information.
(This article belongs to the Special Issue IoT Sensor Network Application)
Figure 1: Cloud server vs. fog server. (a) Concept of a cloud server; (b) concept of a fog server.
Figure 2: Edge-fog-cloud architecture.
Figure 3: Concept of robot operating system (ROS) message system communication.
Figure 4: Kafka message system communication.
Figure 5: Kafka message system vs. ROS message system.
Figure 6: Hadoop cluster.
Figure 7: Driving route plan.
Figure 8: Edge-fog-cloud system.
Figure 9: Schematic diagram of edge-fog-cloud system functionality.
Figure 10: Sensor data security concept diagram of an autonomous vehicle (AV).
Figure 11: AV sensor data encryption and decryption example.
Figure 12: High-definition (HD) map generated for each route.
Figure 13: Integrated large-scale HD map.
Figure 14: HD map stored in a distributed fashion across DataNodes.
Figure 15: Changes in processing time according to the number of vehicles. (a) Processing time according to the number of vehicles; (b) change in processing time according to the number of vehicles.
Figure 16: Changes in virtual memory usage and processing speed according to the number of vehicles. (a) Virtual memory usage according to the number of vehicles; (b) processing speed according to the number of vehicles.
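The sensor-frame-to-global-frame step the abstract describes is a rigid-body transform: rotate each LiDAR point by the vehicle's heading, then translate by its position, so the cloud server can merge per-vehicle maps in one frame. A 2-D sketch follows; the `(x, y, yaw)` pose format is an assumption, and the paper works with full 3-D NDT-SLAM poses.

```python
import math

def to_global(points, pose):
    """Transform points from the sensor frame into the global frame
    given a 2-D vehicle pose (x, y, heading in radians)."""
    x0, y0, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    # rotate by yaw, then translate by the vehicle position
    return [(x0 + c * x - s * y, y0 + s * x + c * y) for (x, y) in points]
```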
16 pages, 5166 KiB  
Article
Implementation of a Sensor Big Data Processing System for Autonomous Vehicles in the C-ITS Environment
by Aelee Yoo, Sooyeon Shin, Junwon Lee and Changjoo Moon
Appl. Sci. 2020, 10(21), 7858; https://doi.org/10.3390/app10217858 - 5 Nov 2020
Cited by 11 | Viewed by 9792
Abstract
To provide a service that guarantees driver comfort and safety, a platform utilizing connected car big data is required. This study first aims to design and develop such a platform to improve the function of providing vehicle and road condition information of the previously defined central Local Dynamic Map (LDM). Our platform extends the range of connected car big data collection from OBU (On Board Unit) and CAN to camera, LiDAR, and GPS sensors. By using data of vehicles being driven, the range of roads available for analysis can be expanded, and the road condition determination method can be diversified. Herein, the system was designed and implemented based on the Hadoop ecosystem, i.e., Hadoop, Spark, and Kafka, to collect and store connected car big data. We propose a direction of the cooperative intelligent transport system (C-ITS) development by showing a plan to utilize the platform in the C-ITS environment.
(This article belongs to the Special Issue Internet of Things (IoT))
Figure 1: Sensor big data processing system for autonomous vehicles in the cooperative intelligent transport system environment.
Figure 2: Concept of the LDM.
Figure 3: Kafka system architecture.
Figure 4: (a) Existing C-ITS environment diagram; (b) C-ITS environment with the proposed platform.
Figure 5: Overview of the proposed C-ITS environment.
Figure 6: Architecture of the vehicle system in the proposed C-ITS environment.
Figure 7: Architecture of the platform in the proposed C-ITS environment.
Figure 8: Schema of the RDBMS.
Figure 9: (a) Server; (b) test vehicle.
Figure 10: Implementation of the proposed C-ITS environment.
Figure 11: Autonomous vehicle data collection process.
Figure 12: Message from ROS to the database.
Figure 13: Zeppelin-based data visualization.
Figure 14: (a) Location information transmitted when Spark detects abnormal data; (b) web-based visualization of the messages delivered to the central LDM.
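Before Kafka can carry sensor records from the vehicle to Spark, each record must be serialized into a message. The JSON schema below is an assumption for illustration, not the paper's actual message format; a real producer would hand the resulting string to something like kafka-python's `KafkaProducer.send`.

```python
import json
import time

def make_message(vehicle_id, sensor, payload):
    """Build the JSON record a producer would publish to a Kafka topic
    before Spark consumes it (illustrative schema, not the paper's)."""
    return json.dumps({
        "vehicle_id": vehicle_id,
        "sensor": sensor,        # e.g. "gps", "lidar", "can"
        "ts": time.time(),       # collection timestamp
        "payload": payload,      # sensor-specific fields
    })
```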
20 pages, 893 KiB  
Article
A Hadoop-Based Platform for Patient Classification and Disease Diagnosis in Healthcare Applications
by Hassan Harb, Hussein Mroue, Ali Mansour, Abbass Nasser and Eduardo Motta Cruz
Sensors 2020, 20(7), 1931; https://doi.org/10.3390/s20071931 - 30 Mar 2020
Cited by 27 | Viewed by 6393
Abstract
Nowadays, the increasing number of patients, accompanied by the emergence of new symptoms and diseases, makes health monitoring and assessment a complicated task for medical staff and hospitals. Indeed, the processing of big and heterogeneous data collected by biomedical sensors, along with the need for patient classification and disease diagnosis, has become a major challenge for several health-based sensing applications. Thus, the combination of remote sensing devices and big data technologies has proven to be an efficient and low-cost solution for healthcare applications. In this paper, we propose a robust big data analytics platform for real-time patient monitoring and decision making to help both hospitals and medical staff. The proposed platform relies on big data technologies and data analysis techniques and consists of four layers: real-time patient monitoring, real-time decision and data storage, patient classification and disease diagnosis, and data retrieval and visualization. To evaluate the performance of our platform, we implemented it on the Hadoop ecosystem and applied the proposed algorithms to real health data. The obtained results show the effectiveness of our platform in efficiently performing patient classification and disease diagnosis in healthcare applications.
(This article belongs to the Special Issue Sensor and Systems Evaluation for Telemedicine and eHealth)
Figure 1: Architecture of our platform.
Figure 2: National Early Warning Score (NEWS) [30].
Figure 3: NEWS Clinical Response (NEWS-CR) [30].
Figure 4: Variation of raw record data during 4 h of patient monitoring.
Figure 5: Distribution of patients over clusters.
Figure 6: Illustrative example of the distribution of patients' IDs over clusters.
Figure 7: Number of iterations when applying SKmeans and traditional Kmeans.
Figure 8: Execution time when applying SKmeans and Kmeans.
Figure 9: Clustering accuracy of SKmeans and Kmeans.
Figure 10: Variation of the number of rules as a function of μ and ρ.
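The NEWS computation referenced in Figure 2 is a banded table lookup per vital sign followed by a sum. The sketch below scores four vitals with thresholds following the published NEWS tables, but treat it as an illustrative subset (consciousness, temperature, and supplemental oxygen are omitted) rather than a clinical implementation.

```python
def band(value, bands):
    """Return the score of the first band whose upper bound contains value;
    values above every listed band score the maximum of 3."""
    for upper, score in bands:
        if value <= upper:
            return score
    return 3

def news_score(resp, spo2, sbp, pulse):
    """Simplified National Early Warning Score over four vitals:
    respiration rate, oxygen saturation (%), systolic BP, pulse."""
    total = band(resp, [(8, 3), (11, 1), (20, 0), (24, 2)])
    total += band(spo2, [(91, 3), (93, 2), (95, 1), (10**9, 0)])
    total += band(sbp, [(90, 3), (100, 2), (110, 1), (219, 0)])
    total += band(pulse, [(40, 3), (50, 1), (90, 0), (110, 1), (130, 2)])
    return total
```

On the platform, each incoming sensor record would be scored this way in the real-time decision layer before being stored for the clustering stage.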
24 pages, 552 KiB  
Article
Two-Step Classification with SVD Preprocessing of Distributed Massive Datasets in Apache Spark
by Athanasios Alexopoulos, Georgios Drakopoulos, Andreas Kanavos, Phivos Mylonas and Gerasimos Vonitsanos
Algorithms 2020, 13(3), 71; https://doi.org/10.3390/a13030071 - 24 Mar 2020
Cited by 14 | Viewed by 4585
Abstract
At the dawn of the 10V or big data era, there are a considerable number of sources, such as smartphones, IoT devices, social media, smart city sensors, and the health care system, all of which constitute but a small portion of the data lakes feeding the entire big data ecosystem. This 10V data growth poses two primary challenges, namely storing and processing. Concerning the latter, new frameworks have been developed, including distributed platforms such as the Hadoop ecosystem. Classification is a major machine learning task typically executed on distributed platforms, and as a consequence many algorithmic techniques have been developed tailored for these platforms. This article relies in two ways on classifiers implemented in MLlib, the main machine learning library for the Hadoop ecosystem. First, a vast number of classifiers are applied to two datasets, namely Higgs and PAMAP. Second, a two-step classification is performed ab ovo on the same datasets. Specifically, the singular value decomposition of the data matrix first determines a set of transformed attributes, which in turn drive the classifiers of MLlib. The twofold purpose of the proposed architecture is to reduce complexity while maintaining a similar, if not better, level of the metrics of accuracy, recall, and F1. The intuition behind this approach stems from the engineering principle of breaking down complex problems into simpler and more manageable tasks. The experiments, based on the same Spark cluster, indicate that the proposed architecture outperforms the individual classifiers with respect to both complexity and the abovementioned metrics.
(This article belongs to the Special Issue Mining Humanistic Data 2019)
Show Figures

Figure 1: Knowledge discovery pipeline.
Figure 2: Proposed system architecture.
Figure 3: Apache Spark API stack.
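The two-step scheme this abstract describes (an SVD feature transform feeding a downstream classifier) can be sketched in a few lines. This is a minimal NumPy sketch, not the authors' Spark MLlib pipeline: the nearest-centroid classifier, the toy blob data, and all names are illustrative stand-ins chosen only to show how the transformed attributes drive a simpler model.

```python
import numpy as np

def svd_reduce(X, k):
    """Step 1: project rows of X onto the top-k right singular vectors."""
    Xc = X - X.mean(axis=0)                       # center the data matrix
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # transformed attributes

def nearest_centroid_fit(X, y):
    """Step 2 (stand-in classifier): one centroid per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(X, classes, centroids):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]

# Toy data: two blobs in 5-D, only two informative directions.
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 0.5, (50, 5)); X0[:, 0] += 3
X1 = rng.normal(0.0, 0.5, (50, 5)); X1[:, 1] += 3
X = np.vstack([X0, X1]); y = np.array([0] * 50 + [1] * 50)

Z = svd_reduce(X, k=2)                            # complexity drops from 5 to 2 attributes
classes, cents = nearest_centroid_fit(Z, y)
acc = (nearest_centroid_predict(Z, classes, cents) == y).mean()
print(acc)
```

The point of the sketch is the claimed trade-off: the classifier runs on k attributes instead of the full dimensionality, while separable structure survives the projection.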
30 pages, 2154 KiB  
Review
Big Data and Business Analytics: Trends, Platforms, Success Factors and Applications
by Ifeyinwa Angela Ajah and Henry Friday Nweke
Big Data Cogn. Comput. 2019, 3(2), 32; https://doi.org/10.3390/bdcc3020032 - 10 Jun 2019
Cited by 100 | Viewed by 46346
Abstract
Big data and business analytics are trends that are positively impacting the business world. Past research shows that data generated in the modern world is huge and growing exponentially. This includes the structured and unstructured data that flood organizations daily. Unstructured data constitute the majority of the world’s digital data, including text files, web and social media posts, emails, images, audio, movies, etc. Unstructured data cannot be managed in a traditional relational database management system (RDBMS). Therefore, data proliferation requires a rethinking of techniques for capturing, storing, and processing the data. This is the role big data has come to play. This paper, therefore, is aimed at increasing the attention of organizations and researchers to various applications and benefits of big data technology. The paper reviews and discusses the recent trends, opportunities and pitfalls of big data and how it has enabled organizations to create successful business strategies and remain competitive, based on available literature. Furthermore, the review presents the various applications of big data and business analytics, the data sources generated in these applications, and their key characteristics. Finally, the review not only outlines the challenges for successful implementation of big data projects but also highlights the current open research directions of big data analytics that require further consideration. The reviewed areas of big data suggest that good management and manipulation of large data sets using the techniques and tools of big data can deliver actionable insights that create business value. Full article
Show Figures

Figure 1: Structure of the review paper.
Figure 2: Gartner’s Vector model.
Figure 3: Business analytics process.
Figure 4: Functional view of Hadoop.
Figure 5: The primary components of a Hadoop cluster.
Figure 6: Overview of big data and business analytics in Hadoop.
Figure 7: Common tools used in a Hadoop cluster.
18 pages, 7491 KiB  
Article
Improvement of Kafka Streaming Using Partition and Multi-Threading in Big Data Environment
by Bunrong Leang, Sokchomrern Ean, Ga-Ae Ryu and Kwan-Hee Yoo
Sensors 2019, 19(1), 134; https://doi.org/10.3390/s19010134 - 2 Jan 2019
Cited by 14 | Viewed by 8208
Abstract
The large amount of programmable logic controller (PLC) sensing data has rapidly increased in the manufacturing environment. Therefore, a large data store is necessary for Big Data platforms. In this paper, we propose a Hadoop ecosystem for the support of many features in the manufacturing industry. In this ecosystem, Apache Hadoop and HBase are used as Big Data storage and handle large-scale data. In addition, Apache Kafka is used as a data streaming pipeline, offering many configurations and properties, such as Kafka offsets and partitions, that help build a well-designed, reliable system and support program scaling. Moreover, Apache Spark works closely with the Kafka consumers to create real-time processing and analysis of the data. Meanwhile, data security is applied in the data transmission phase between the Kafka producers and consumers. Public-key cryptography, with a public and a private key, is used as the security method: the public key is located in the Kafka producer, and the private key is stored in the Kafka consumer. The integration of these technologies enhances the performance and accuracy of data storing, processing, and securing in the manufacturing environment. Full article
Show Figures

Figure 1: System Architecture and Flow.
Figure 2: Hadoop and HBase Cluster.
Figure 3: Design of the Hadoop Ecosystem.
Figure 4: Kafka Offset Configurations and Properties.
Figure 5: Partitioning in the Kafka Consumer.
Figure 6: Multi-threading in the Kafka Consumer Program.
Figure 7: Recommended number of partitions and threads.
Figure 8: Secured Kafka messages using public/private key cryptography.
Figure 9: Six programs for grabbing sensing data.
Figure 10: Kafka Producers with partitions and threads (gathering the sensing data from programmable logic controllers (PLCs)).
Figure 11: Hadoop Cluster Installation and Information.
Figure 12: Testing performance of PLCs and partitions in Kafka.
Figure 13: Using a Spark Cluster to improve processing time.
Figure 14: Using multi-threading in Kafka Consumers to reduce time consumption.
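The partition-plus-thread layout this abstract describes (keyed messages routed to partitions, one consumer thread per partition) can be modelled with the standard library alone. This is a toy sketch, not Kafka: queues stand in for broker partitions, the byte-sum hash is an illustrative stand-in for Kafka's murmur2 partitioner, and all names (`plc-0`, `produce`, `consume`) are invented for the example.

```python
import queue
import threading

NUM_PARTITIONS = 3

# Each queue stands in for one Kafka partition.
partitions = [queue.Queue() for _ in range(NUM_PARTITIONS)]
results = queue.Queue()

def partition_key(sensor_id: str) -> int:
    """Route a PLC sensor id to a fixed partition so per-sensor ordering
    is preserved (the effect of keyed messages in Kafka; the real
    partitioner uses murmur2, not this byte sum)."""
    return sum(sensor_id.encode()) % NUM_PARTITIONS

def produce(sensor_id: str, value: float) -> None:
    partitions[partition_key(sensor_id)].put((sensor_id, value))

def consume(part: queue.Queue) -> None:
    # One worker thread per partition, the scaling unit the paper varies.
    while True:
        msg = part.get()
        if msg is None:      # sentinel: shut this worker down
            break
        results.put(msg)

workers = [threading.Thread(target=consume, args=(p,)) for p in partitions]
for w in workers:
    w.start()

for i in range(12):          # 12 sensing messages from 4 simulated PLCs
    produce(f"plc-{i % 4}", float(i))
for p in partitions:
    p.put(None)
for w in workers:
    w.join()

consumed = [results.get() for _ in range(12)]
print(len(consumed))  # 12
```

Because each partition is drained by exactly one thread, messages from a given sensor are processed in order, while unrelated sensors are handled in parallel, which is the throughput gain the experiments measure.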
34 pages, 19994 KiB  
Article
A Magnetoencephalographic/Encephalographic (MEG/EEG) Brain-Computer Interface Driver for Interactive iOS Mobile Videogame Applications Utilizing the Hadoop Ecosystem, MongoDB, and Cassandra NoSQL Databases
by Wilbert McClay
Diseases 2018, 6(4), 89; https://doi.org/10.3390/diseases6040089 - 28 Sep 2018
Cited by 6 | Viewed by 7389
Abstract
In Phase I, we collected data on five subjects, yielding over 90% positive performance in magnetoencephalographic (MEG) mid- and post-movement activity. In addition, a driver was developed that substituted the actions of the Brain Computer Interface (BCI) for mouse button presses for real-time use in visual simulations. The process was interfaced to a flight visualization demonstration or to the iOS Mobile Warfighter videogame application: utilizing left or right brainwave thought movement, the user experiences the aircraft turning in the chosen direction. The BCI's data analytics of a subject's MEG brain waves and flight visualization videogame performance analytics were stored and analyzed using the Hadoop Ecosystem as a quick-retrieval data warehouse. The Phase II portion of the project involves the Emotiv electroencephalographic (EEG) wireless Brain-Computer Interfaces (BCIs), which allow people to establish a novel communication channel between the human brain and a machine, in this case an iOS mobile application. The EEG BCI utilizes advanced and novel machine learning algorithms, as well as the Spark Directed Acyclic Graph (DAG), the Cassandra NoSQL database environment, and the competitor NoSQL MongoDB database for housing BCI analytics of subjects' responses and users' intent, illustrated for both MEG/EEG brainwave signal acquisition. The wireless EEG signals acquired from OpenVibe and the Emotiv EPOC headset can be connected via Bluetooth to an iPhone utilizing a thin-client architecture. NoSQL databases were chosen because of their schema-less architecture and MapReduce computational paradigm for housing a user's brain signals from each referencing sensor.
Thus, in the near future, if multiple users are playing over an online network connection and an MEG/EEG sensor fails, or if the connection between the smartphone and the web server is lost due to low battery power or failed data transmission, it will not nullify the document-oriented (MongoDB) or column-oriented (Cassandra) NoSQL databases. Additionally, NoSQL databases have fast querying and indexing methodologies, which are well suited for online game analytics and technology. In Phase II, we collected data on five MEG subjects, yielding over 90% positive performance on iOS Mobile Applications written in Objective-C and C++. However, on EEG signals from three subjects with the Emotiv wireless headsets and (n < 10) subjects from the OpenVibe EEG database, the Variational Bayesian Factor Analysis (VBFA) algorithm yielded below 60% performance; we are currently pursuing an extension of VBFA to the time-frequency domain, referred to as VBFA-TF, to enhance EEG performance in the near future. The novel usage of the NoSQL databases Cassandra and MongoDB was the primary enhancement of the BCI Phase II MEG/EEG brain signal data acquisition, queries, and rapid analytics, with MapReduce and Spark DAG demonstrating future implications for next-generation biometric MEG/EEG NoSQL databases. Full article
(This article belongs to the Section Neuro-psychiatric Disorders)
Show Figures

Figure 1: Phase II, MongoDB MEG Brain Computer Interface Database(s).
Figure 2: Phase II, magnetoencephalography brain-computer interface(s) (MEG BCI) with Apple iOS Mobile Applications stored in MongoDB and Cassandra.
Figure 3: Yongwook Chae, “EYE-BRAIN INTERFACE (ERI) SYSTEM AND METHOD FOR CONTROLLING SAME”, US2018/0196511.
Figure 4: University of California, San Francisco (UCSF) MEG Scanner with Superconducting Quantum Interference Device (SQUID) detectors.
Figure 5: Phase I, “A Real-Time Magnetoencephalography Brain-Computer Interface Using Interactive 3D-Visualization and the Hadoop Ecosystem”, Journal of Brain Sciences, 2015.
Figure 6: Phase I, “A Real-Time Magnetoencephalography Brain-Computer Interface Using Interactive 3D-Visualization and the Hadoop Ecosystem”, flowchart process of BCI analytics in the Hadoop Ecosystem.
Figure 7: Phase I, “A Real-Time Magnetoencephalography Brain-Computer Interface Using Interactive 3D-Visualization and the Hadoop Ecosystem”, Pig analysis for MEG Subject performance on Warfighter.
Figure 8: (a) Phase II, MongoDB Magnetoencephalography Brain-Computer Interface Database. (b) Phase II, Variational Bayesian Factor Analysis (VBFA) Machine Learning Algorithm. (c) Phase II, MEG Subject Brain Wave Data and VBFAgeneratorCTF training matrices in MongoDB database(s). (d) Phase II, C code testVBFA function on MEG Subject Brainwave Data.
Figure 9: Phase II, MongoDB Magnetoencephalography Brain-Computer Interface Database storage of MEG Subject Variational Bayesian Factor Analysis training matrices and MEG Subject Performance and Metadata.
Figure 10: MEG Brainwave data acquisition in MongoDB with a 12-byte BSON timestamp ObjectID for Epoch Trial performance for an MEG Subject.
Figure 11: (a) MEG Brainwave data acquisition in MongoDB with a 12-byte BSON timestamp ObjectID representing the Subject's Training Matrices acquired during VBFA machine learning algorithm training on MEG brainwaves. (b) MEG Brainwave data acquisition in MongoDB with a 12-byte BSON timestamp ObjectID, with Subject Brainwaves controlling flight in the Warfighter simulation. (c) Nazzy Ironman Subject MEG Brain Computer Interface to Warfighter Flight Simulator iOS Mobile Applications yielding over 90% performance on MEG Subject brain signal data. (d) Nazzy Ironman Subject MEG Brain Computer Interface to Warfighter Flight Simulator iOS Mobile Applications stored in MongoDB databases yielding over 90% performance on Subject Data, demonstrated in Figure 9, Figure 10 and Figure 11.
Figure 12: (a) NAZZY IronMan with Frozen Videogame & iOS Warfighter Mobile Game for the Brain Computer Interface Project with Emotiv/OpenVibe wireless electroencephalography (EEG) brain signal data, using machine learning algorithms to classify brain signals in iOS videogame applications, with EEG brain signal data storage in the NoSQL database MongoDB. (b) NAZZY IronMan with Frozen Project with Emotiv wireless EEG brain signal data, using machine learning algorithms to classify brain signals in the iOS Frozen videogame, with EEG brain signal data storage in the NoSQL database MongoDB.
Figure 13: (a) Emotiv EPOC Headset, Features, and Brain Computer Interface applications. (b) Utilization of Matlab FIR (Finite Impulse Response) & IIR (Infinite Impulse Response) Bandpass and Lowpass Filters on Wireless EEG Signals.
Figure 14: Nazzy IronMan Brain Computer Interface Cloud Provider Facility with Cassandra NoSQL database(s).
Figure 15: Nazzy IronMan Brain Computer Interface Cassandra Cloud Security Architecture Strategy.
Figure 16: Emotiv and OpenVibe EEG Sensor Array stored in a Cassandra NoSQL database.
Figure 17: OpenVibe EEG Sensor Array stored in a Cassandra NoSQL KEYSPACE (database) with Simple_Strategy and Replication Factor = 1.
Figure 18: OpenVibe EEG Sensor Array stored in a Cassandra NoSQL KEYSPACE (database) with Simple_Strategy and Replication Factor = 1, displaying the primary key and all attributes for the keyspace eeg_motor_imagery_openvibe and table eeg_1_signal Cassandra statistics.
Figure 19: OpenVibe EEG Sensor Array stored in a Cassandra NoSQL KEYSPACE (database) with Simple_Strategy, table eeg_1_signal, importing 317,825 rows of EEG brain signal data.
Figure 20: OpenVibe EEG Sensor Array stored in a Cassandra NoSQL KEYSPACE (database) with Simple_Strategy, Stimulation table eeg_signal_1_stimulation_table, importing EEG brain signal data (e.g., time, identifier, duration).
Figure 21: MongoDB Brain Computer Interface Cloud Security Restraints.
Figure 22: Java Tokenization of the OpenVibe EEG Sensor Array inputted into a MongoDB Collection utilizing db.openVibeSignal.find() queries.
Figure 23: Usage of the NoSQL database MongoDB for Wireless EEG Signal Storage and Retrieval with MongoDB BSON Timestamp with EEG Signal Electrode Array.
Figure 24: Java Program for the Emotiv and OpenVibe EEG Sensor Array Channel inserting a document into a MongoDB Collection using the Java class BasicDBObject.
Figure 25: OpenVibe EEG Sensor Array Java Program for Brainwave Signal Stimulation Codes for time, stimulation code, and duration.
Figure 26: Wireless EEG Java Stimulation Code Dictionary to input EEG signal patterns in MongoDB.
Figure 27: Stimulation Codes have to match the acquired EEG signal patterns in MongoDB.
Figure 28: MapReduce in MongoDB for Signal Processing and EEG data analytics.
Figure 29: (a) iOS Mobile Application of the Warfighter Videogame using OpenGL ES 2.0 (Khronos Group, Beaverton, Oregon, USA, https://www.khronos.org/about/) and GLKit with the UITapGestureRecognizer class to fire a projectile. (b) iOS Mobile Application of the Warfighter Videogame using OpenGL ES 2.0 and GLKit with aerial targets using the addTarget Method. (c) Display of the iOS Mobile Application of the Warfighter Videogame using OpenGL ES 2.0 and GLKit with aerial targets using the addTarget Method (close-up).
Figure 30: iOS Mobile Application of the Warfighter Videogame using OpenGL ES 2.0 and GLKit to evade or chase aerial targets.
Figure 31: (a) iOS Mobile Application of the Warfighter Videogame using OpenGL ES 2.0 and GLKit to evade or chase aerial targets. (b) The same application interfaced to MEG Subject Brain Signal Data with over 90% classification performance. (c) Nazzy IronMan with the Apple iOS Frozen Videogame Application, which can be interfaced to MEG Subject Brain Signal Data with over 90% classification performance.
Figure 32: iOS Mobile Application of the Warfighter Videogame using OpenGL ES 2.0 and GLKit for online users' game analytics and dynamic biometrics.
Figure 33: Nazzy Ironman MEG/EEG Virtual LAN (VLAN) Base Unit for Security Authentication.
Figure 34: MEG/EEG Cryptographic Key Authentication utilizing MEG/EEG brainwaves with Cassandra and MongoDB NoSQL databases.
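The article leans on MongoDB's 12-byte BSON ObjectID, whose leading bytes encode the creation time, to timestamp and retrieve each epoch trial of brain-signal data. A minimal sketch of that layout (4-byte big-endian Unix timestamp, 5 random bytes, 3-byte counter, which is MongoDB's documented ObjectId structure) can be written with the standard library; the function names here are invented for the example, and a real deployment would use a MongoDB driver's ObjectId type.

```python
import os
import struct
import time
from itertools import count

_counter = count(int.from_bytes(os.urandom(3), "big"))  # randomly seeded counter
_machine = os.urandom(5)  # per-process random value, fixed for the run

def new_object_id(ts=None) -> bytes:
    """Build a 12-byte ObjectId-like value: 4-byte big-endian Unix
    timestamp + 5 random bytes + 3-byte incrementing counter."""
    if ts is None:
        ts = int(time.time())
    return struct.pack(">I", ts) + _machine + (next(_counter) % 2**24).to_bytes(3, "big")

def object_id_timestamp(oid: bytes) -> int:
    """Recover the embedded creation time, which is what lets a query
    sort or range-scan epoch trials by arrival time of the sample."""
    return struct.unpack(">I", oid[:4])[0]

oid = new_object_id(ts=1_537_000_000)
print(len(oid), object_id_timestamp(oid))  # 12 1537000000
```

Because the timestamp leads the id, ids generated in time order also sort in byte order, so indexing on the id alone gives the fast time-ordered queries the game analytics rely on.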
20 pages, 966 KiB  
Article
Hadoop Oriented Smart Cities Architecture
by Vlad Diaconita, Ana-Ramona Bologa and Razvan Bologa
Sensors 2018, 18(4), 1181; https://doi.org/10.3390/s18041181 - 12 Apr 2018
Cited by 19 | Viewed by 7619
Abstract
A smart city implies a consistent use of technology for the benefit of the community. As the city develops over time, components and subsystems such as smart grids, smart water management, smart traffic and transportation systems, smart waste management systems, smart security systems, or e-governance are added. These components ingest and generate a multitude of structured, semi-structured or unstructured data that may be processed using a variety of algorithms in batches, micro-batches or in real time. The ICT architecture must be able to handle the increased storage and processing needs. When vertical scaling is no longer a viable solution, Hadoop can offer efficient linear horizontal scaling, solving storage, processing, and data analysis problems in many ways. This enables architects and developers to choose a stack according to their needs and skill levels. In this paper, we propose a Hadoop-based architectural stack that can provide the ICT backbone for efficiently managing a smart city. On the one hand, Hadoop, together with Spark and the plethora of NoSQL databases and accompanying Apache projects, is a mature ecosystem. This is one of the reasons why it is an attractive option for a Smart City architecture. On the other hand, it is also very dynamic; things can change very quickly, and many new frameworks, products and options continue to emerge as others decline. To construct an optimized, modern architecture, we discuss and compare various products and engines based on a process that takes into consideration how the products perform and scale, as well as the reusability of the code, innovations, features, and support and interest in online communities. Full article
(This article belongs to the Section Sensor Networks)
Show Figures

Figure 1: Hadoop architecture for smart cities.
Figure 2: The structure of the first data set.
2806 KiB  
Article
GeoSpark SQL: An Effective Framework Enabling Spatial Queries on Spark
by Zhou Huang, Yiran Chen, Lin Wan and Xia Peng
ISPRS Int. J. Geo-Inf. 2017, 6(9), 285; https://doi.org/10.3390/ijgi6090285 - 8 Sep 2017
Cited by 27 | Viewed by 8118
Abstract
In the era of big data, Internet-based geospatial information services such as various LBS apps are deployed everywhere, followed by an increasing number of queries against the massive spatial data. As a result, traditional relational spatial databases (e.g., PostgreSQL with PostGIS and Oracle Spatial) cannot adapt well to the needs of large-scale spatial query processing. Spark is an emerging, outstanding distributed computing framework in the Hadoop ecosystem. This paper aims to address the increasingly large-scale spatial query-processing requirement in the era of big data, and proposes an effective framework, GeoSpark SQL, which enables spatial queries on Spark. On the one hand, GeoSpark SQL provides a convenient SQL interface; on the other hand, it achieves both efficient storage management and high-performance parallel computing through integrating Hive and Spark. In this study, the following key issues are discussed and addressed: (1) storage management methods under the GeoSpark SQL framework, (2) the spatial operator implementation approach in the Spark environment, and (3) spatial query optimization methods under Spark. Experimental evaluation is also performed, and the results show that GeoSpark SQL is able to achieve real-time query processing. It should be noted that Spark is not a panacea: the traditional spatial database PostGIS/PostgreSQL performs better than GeoSpark SQL in some query scenarios, especially for spatial queries with high selectivity, such as the point query and the window query. In general, GeoSpark SQL performs better when dealing with compute-intensive spatial queries such as the kNN query and the spatial join query. Full article
Show Figures

Figure 1: Running framework of Spark SQL.
Figure 2: GeoSpark SQL framework.
Figure 3: Simulated point dataset of northwest Pacific typhoon routes for 5000 years.
Figure 4: Land use dataset of Zhenlong Town.
Figure 5: Performance comparison graph of attribute queries (in milliseconds).
Figure 6: Performance comparison graph of kNN queries (in milliseconds).
Figure 7: Performance comparison graph of point queries (in milliseconds).
Figure 8: Performance comparison graph of window queries (in milliseconds).
Figure 9: Performance comparison graph of range queries (in milliseconds).
Figure 10: Performance comparison graph of directional queries (in milliseconds).
Figure 11: Performance comparison graph of topological queries (in milliseconds).
Figure 12: Performance comparison graph of spatial join queries (in seconds).
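The abstract's contrast between high-selectivity queries (point, window) and compute-intensive queries (kNN, spatial join) comes down to how much work each predicate does per point. The two query kinds can be sketched in plain Python; this is an illustrative single-machine sketch with invented function names and random data, not GeoSpark SQL's distributed implementation.

```python
import heapq
import random

def window_query(points, xmin, ymin, xmax, ymax):
    """Window (rectangle) query: a cheap bounds check per point and a
    small result set, the high-selectivity case where the paper finds
    PostGIS/PostgreSQL competitive."""
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]

def knn_query(points, q, k):
    """kNN query: a distance computation for every point, the
    compute-intensive case where Spark's parallelism pays off."""
    return heapq.nsmallest(
        k, points, key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    )

random.seed(42)
pts = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(10_000)]

hits = window_query(pts, 10, 10, 12, 12)   # touches few result rows
nn = knn_query(pts, (50.0, 50.0), 5)       # touches every point
print(len(hits), len(nn))
```

In SQL terms these correspond to filters like `ST_Within(geom, envelope)` versus an `ORDER BY ST_Distance(...) LIMIT k`; the per-row cost and result cardinality, not the syntax, decide which engine wins.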