
The Role of Big Data Analytics for the Internet of Things (IoT)

Wagdi A. Ashry
Geographic Information System Specialist
CAPMAS
Wagdy.ashry@gmail.com

Inas Fouad Noseir
Assistant Prof., New Cairo Institute of Management and Information Systems
New Cairo, Egypt
nosier_inas2005@yahoo.com
Abstract
Big Data and the Internet of Things (IoT) are two evolving technology topics of recent years, and they are often considered two sides of the same coin. The main idea behind the IoT is that almost every object or device will have an identity and physical attributes and will be linked to other objects, forming machine-to-machine (M2M) communication without human intervention.
This paper presents the important role of Big Data, whose massive volumes cannot be processed using traditional computing techniques because of their nature. The mountain of data that the Internet of Things produces is not merely data; it has become a complete subject involving various tools, techniques and frameworks, and it would be useless without the analytic power of Big Data. Together, Big Data and the Internet of Things have become an essential solution affecting our lives in the transformation to smart living.
Therefore, this paper outlines Big Data and its relevance to the Internet of Things and data science; the essential role of Hadoop, an open source framework with various components and structures, in big data processing; the related privacy and security challenges; how big data problems in the IoT are reviewed from a reliability engineering perspective; and how both are integrated to support new supply chain management methods using radio frequency identification (RFID) and global positioning system (GPS) technologies, which companies are quickly adopting as various inventory management benefits are realized.
Keywords: Big Data, Internet of Things, Analytics, Reliability.

1. Introduction.
With more than 50 percent of world population living in cities and nearly 70 percent
of world population projected to live in cities by 2050, it is expected that cities will
face various challenges from sustainability and energy use to safety and effective
service delivery. Advances in the effective integration of networked information
systems, sensing and communication devices, data sources, decision making, and
physical infrastructure are creating new opportunities to foster economic development
and make local governments more open, responsive, and efficient. More and more
cities are beginning to harness the power of sensors, citizen engagement through
smartphones, cloud computing, high-speed networks, and data analytics. The
aggregation of data from a large number of IoT ecosystems can lead to large data sets
for analytic purposes. Consider, for example, the ensemble data from 300 million
automobile IoT ecosystems, or 300 million household IoT ecosystems, or the
composition of both. Furthermore, IoT applications could connect (deliberately or
accidentally) to one or more big data systems outside the ecosystem, thus creating an
aggregate big data system orders of magnitude larger than any of the constituents.
In this sense, every IoT system, even a small, local IoT ecosystem, is a potential big
data system.

2. Big Data Perspective.


The explosion of data collection, sharing, and analytics known as "big data" is a rapidly sprawling phenomenon that promises to have far-reaching impacts on business, economics, policing, security, science, education, policy, governance, health care, public health, and much more. Within a relatively short time, the infrastructures and methods of big data have produced significant intellectual and organizational changes for many academic disciplines, government bodies, philanthropic and nonprofit organizations, and private enterprises. Given its newly expansive reach, big data has still-unfolding consequences for how various stakeholders access and wield power, and therefore how social and political goods are distributed. New industrial, educational, and governmental investments in data science and artificial intelligence highlight how big data is collapsing historically segmented technical approaches.
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Big data is not merely data; rather, it has become a complete subject involving various tools, techniques and frameworks, and it can be both structured and unstructured. Big data has several key properties, among them Volume, Velocity, Veracity, Variety, and Value (Atzori et al., 2010; Daniel E., 2013).
What security and privacy policies and technologies are adequate to fulfill the current top Big Data privacy and security demands? These challenges may be organized into four Big Data aspects: infrastructure security (e.g. secure distributed computations using MapReduce), data privacy (e.g. data mining that preserves privacy/granular access), data management (e.g. secure data provenance and storage), and integrity and reactive security (e.g. real-time monitoring of anomalies and attacks) (Peter & Robert, 2014).
3. Internet of Things (IoT) Perspective
The Internet of Things has an essential role in our daily life, creating an effective impact now and in the near future by giving machines the ability to communicate with each other, forming the concept of M2M, or machine-to-machine, communication. Many definitions of IoT have been traced over the previous years.
The Internet of Things (IoT) is considered the network of physical objects or "things" embedded with electronics, software, sensors, and network connectivity, which enables these objects to collect and exchange data. It is also known as a dynamic global network infrastructure with self-configuring capabilities based on standard and interoperable communication protocols, where physical and virtual things have identities, physical attributes and virtual personalities, use intelligent interfaces, and are seamlessly integrated into the information network (Zanella et al., 2014).
These IoT devices are within our bodies, on our bodies, observing our activities, monitoring and reporting on our appliances, houses, and buildings, our cars and environment, and many facets of our cities, planet, oceans, and space. They are starting to play a role in our health, fitness and wellbeing, our comfort and entertainment, our financial activities, and many other facets of life, as many things become connected to each other through the internet and intelligent in capturing data using sensors, such as location identifiers as in the Global Positioning System (GPS) and individual identification devices as in Radio Frequency Identification (RFID) (Ovidiu & Peter, 2013).
3.1. Internet of Things (IoT) components
The term Internet of Things generally refers to scenarios where network connectivity and computing capability extend to objects, sensors and everyday items not normally considered computers, allowing these devices to generate, exchange and consume data with minimal human intervention.

The components of the “Internet of Things” are:

1) Physical Objects (e.g. things like vehicles)
2) Sensors (e.g. GPS, air bag)
3) Actuators (e.g. brake controller)
4) Virtual Objects (e.g. electronic tickets)
5) People (e.g. humans can control the environment via mobile apps)
6) Services (e.g. cloud services, which can be used to process big data)
7) Platforms (a type of middleware used to connect IoT components to IoT data analytics)
8) Networks (e.g. wireless and wireline technologies, standards, and protocols) (Irena, 2017)
4. Big Data and its relevance to IoT and Data Science.
There is a deep interplay among the IoT, Big Data and Data Science concepts and their effective role in various fields and firms in our daily life. Both Big Data and Data Science offer the potential to produce value from data. Data Science looks to create models that capture the underlying patterns of complex systems and turn those models into working applications that move data from raw to relevant. Big Data looks to collect and manage large amounts of varied data to serve large-scale web applications and vast sensor networks. With time, Big Data approaches can work in concert with Data Science: the increased variety of data extracted can help make new discoveries or improve an existing model's ability to predict or classify. Big Data and IoT have a deep relationship. The mountain of data that the IoT produces would be useless without the analytic power of Big Data; conversely, without the IoT, Big Data would not have the raw materials from which to fashion the solutions that are expected of it. They are two sides of the same coin, and it makes sense to see them as natural partners, because you cannot operate complex machines or devices efficiently without predictive analytics, and you need Big Data for the analytics.
5. Big Data – How to meet IoT.
The integration between the IoT and Big Data plays an effective role in many aspects of our lives. The IoT produces enormous amounts of data that are essential to Big Data analytics; when these data are effectively and efficiently captured, processed, and analyzed, companies are able to gain a more complete understanding of their business, customers, products, competitors, etc., which can lead to efficiency improvements, increased sales, lower costs, better customer service, and/or improved products and services. The term "big data" originally referred to using larger volumes of data for the visualization of scientific data; this integration led to the appearance of the notion of the "Internet of Signs". The Internet of Signs indicates that the data generated on the internet from a broad range of sources, including devices in the Internet of Things, information from social media (e.g. blogs) and other internet sources (often associated with "Big Data"), provide "signs", such as the "sentiment" toward some issue. Those "signs" generated from information associated with the internet provide an "Internet of Signs", which can be helpful in providing potential information about events and situations (Daniel E., 2013).
Big data and the Internet of Things (IoT) have the power to drive the implementation of the smart city. Big data and the IoT are going to work with other software and hardware to bring the vision of the smart city to fruition. Cities identified as future smart cities can fundamentally change our lives at many levels, with less pollution, less garbage, fewer parking problems and more energy savings. They could also help in the following ways:
 The traffic will be measured and regulated with the help of RFID tags on the cars. The RFID tags will send geo-location data to a central monitoring unit that will identify the congested areas (see the sketch after this list).
 Also, the citizens will always know, via their smartphones and mobile devices, the exact status of public transportation and its availability.
 Children playing in the parks will wear bracelets with sensors, which will allow them to be tracked in case they go missing.
 Big data can help reduce emissions and bring down pollution. Sensors fitted in the roads will measure the total traffic at different times of day and the total emissions. The data can be sent to a central unit which will coordinate with the traffic police (K.R. Kundhavai et al., 2016).
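
As referenced in the first bullet, the following minimal Python sketch shows how a central monitoring unit might flag congested areas from RFID geo-location reports. The reading format, area names, and threshold are invented for illustration and are not part of any cited system:

from collections import Counter

# Hypothetical RFID readings: (tag_id, area) pairs reported to a
# central monitoring unit over some time window.
readings = [
    ("CAR-001", "Downtown"), ("CAR-002", "Downtown"),
    ("CAR-003", "Downtown"), ("CAR-004", "Airport Rd"),
    ("CAR-005", "Downtown"), ("CAR-006", "Airport Rd"),
]

CONGESTION_THRESHOLD = 3  # assumed cutoff for flagging an area

# Count vehicles seen per area and flag congested areas.
vehicles_per_area = Counter(area for _, area in readings)
congested = [a for a, n in vehicles_per_area.items() if n >= CONGESTION_THRESHOLD]
print(congested)  # ['Downtown']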

6. Big Data Analytics.
Mike Butler (2015) wrote that Big Data analytics refers to data scientists, analysts and statisticians using powerful tools and techniques to discover trends and patterns in huge, unstructured and dissimilar data sets, and making these easily and quickly accessible to business leaders, managers and other key stakeholders. These insights are used to inform and develop business strategies and plans. There are four types of data analytics:
6.1. Descriptive analytics:
This type of analytics seeks to answer "what happened". It reviews data and uses many traditional research approaches; generally, classical or Bayesian statistical methods are used to learn about the data set. An example would be the average amount of money in a bank account on a monthly basis.
6.2. Diagnostic analytics:
Diagnostic analytics seeks to answer "why did something happen". An example is analyzing why a customer who visits a location purchases a DVD player made by one company when there are many alternatives next to it on the shelf and even more online.
6.3. Predictive analytics:
Predictive analytics seeks to answer "what will happen in the future". This branch of analytics is focused on predicting what you may do. An example would be a sales forecast for your business.
6.4. Prescriptive analytics:
Prescriptive analytics seeks to determine "what should be done". A stock portfolio optimization model is an example of this branch: it predicts what might happen and also tells us how we should allocate our portfolio.
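
As a toy illustration of the first and third types above, the following Python sketch computes a monthly average balance (descriptive) and a naive next-month forecast (predictive). The balance figures are invented, and the moving average is only a stand-in for a real predictive model:

import statistics

# Descriptive analytics: "what happened" - average monthly balance.
monthly_balances = [1200.0, 1350.5, 980.0, 1100.0, 1425.75]  # invented data
print("Average balance:", statistics.mean(monthly_balances))

# Predictive analytics (naive): "what will happen" - forecast next month
# as the mean of the last three observations (a simple moving average).
forecast = statistics.mean(monthly_balances[-3:])
print("Next-month forecast:", round(forecast, 2))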
7. Hadoop and Big Data Processing.
The Hadoop Ecosystem is a framework of various complex and evolving tools and components that offer distinct advantages in solving big data problems. Some of the elements may be very dissimilar from each other in terms of their architecture; however, what keeps them all together under a single roof is that they all derive their functionalities from the scalability and power of Hadoop. The Hadoop Ecosystem is divided into four layers: 1) Data Storage, 2) Data Processing, 3) Data Access, and 4) Data Management. The figure below illustrates how the diverse Hadoop elements are involved at the various layers of data processing (Hadoop, 2017).
7.1. Data Storage Layer
Data Storage is the layer where the data is stored in a distributed file system; it consists of HDFS and HBase column-oriented database storage. HBase is a scalable, distributed database that supports structured data storage for large tables (J. Yates Monteith, John D. McGregor, and John E. Ingram, 2013).
1) HDFS
HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of data, with high-throughput access to application data on commodity machines, providing very high aggregate bandwidth across the cluster. When data is pushed to HDFS, it is automatically split into multiple blocks, and the data is stored and replicated, ensuring high availability and fault tolerance.
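
As a sketch of pushing data into HDFS from a client, assuming a cluster whose WebHDFS endpoint is reachable and the third-party Python hdfs package (HdfsCLI) is installed; the host, user, and paths below are placeholders:

from hdfs import InsecureClient  # third-party HdfsCLI package

# Connect to the NameNode's WebHDFS endpoint (placeholder address).
client = InsecureClient("http://namenode:9870", user="hadoop")

# Write a small file; HDFS itself splits large files into blocks and
# replicates them across DataNodes for fault tolerance.
client.write("/data/events/sample.txt", data=b"sensor,42\n", overwrite=True)

# Read it back to confirm the round trip.
with client.read("/data/events/sample.txt") as reader:
    print(reader.read())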
2) HBase
HBase is a scalable, distributed database that supports structured data storage for large tables. Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. HBase facilitates reading and writing Big Data randomly and efficiently in real time. It stores data in tables with rows and columns, as in RDBMSs. HBase tables have one key feature called "versioning", which helps in keeping track of the changes made in a cell and allows the retrieval of previous versions of the cell contents, if required (HBase, 2017).
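
A sketch of the row/column model and the versioning feature described above, assuming the third-party happybase Python client and a reachable HBase Thrift server; the host, table, and column names are illustrative:

import happybase  # third-party HBase client (requires HBase Thrift server)

connection = happybase.Connection("hbase-host")  # placeholder host
table = connection.table("sensor_data")          # assumed existing table

# Two writes to the same cell; HBase keeps both as timestamped versions
# (assuming the column family is configured to retain multiple versions).
table.put(b"device-1", {b"cf:temp": b"21.5"})
table.put(b"device-1", {b"cf:temp": b"22.1"})

# Retrieve up to 2 stored versions of the cell, newest first.
for value in table.cells(b"device-1", b"cf:temp", versions=2):
    print(value)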
7.2. Data Processing Layer
Scheduling, resource management and cluster management are handled here. YARN (job scheduling and cluster resource management) and MapReduce are located in this layer.
1) MapReduce
MapReduce (Jeffrey & Sanjay, 2008) is a software framework for distributed processing of large data sets that serves as the compute layer of Hadoop. It processes vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. The "Reduce" function aggregates the results of the "Map" function to determine the "answer" to the query. Typically, both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them and re-executing any failed tasks.
Although the Hadoop framework is implemented in Java, MapReduce (Dean & Ghemawat, 2010) applications need not be written in Java. Hadoop Streaming is a utility which allows users to create and run jobs with any executable (e.g. shell utilities) as the mapper and/or the reducer. Hadoop Pipes is a SWIG-compatible C++ API for implementing MapReduce applications.
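
To make the Streaming idea concrete, here is a minimal word-count mapper and reducer pair in Python. Hadoop Streaming pipes input lines to the mapper on stdin, sorts the map output by key, and pipes it to the reducer; the script names are placeholders:

# mapper.py - emit (word, 1) for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

# reducer.py - sum counts per word; input arrives sorted by key
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word == current_word:
        count += int(value)
    else:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")

A job could then be launched with the streaming jar, for example: hadoop jar hadoop-streaming.jar -input /in -output /out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py (the jar path depends on the installation).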
2) YARN
Varsha B. (2016) reported that YARN (Yet Another Resource Negotiator) forms an integral part of Hadoop 2.0. YARN is a great enabler for dynamic resource utilization on the Hadoop framework, as users can run various Hadoop applications without having to worry about increasing workloads. The inclusion of YARN in Hadoop 2 also brings scalability to data processing applications. YARN is a core Hadoop service that supports two major functions:
a) Global resource management (ResourceManager): the YARN ResourceManager is responsible for almost all cluster-level tasks, performing them in integration with the NodeManager and the Application Master.
b) Per-application management (ApplicationMaster).
YARN's benefits include efficient resource utilization, high scalability, support for languages beyond Java, novel programming models and services, and agility.
7.3. Data Access Layer
The Data Access layer is where requests from the Management layer are sent to the Data Processing layer. Several projects have been set up for this layer, among them: Hive, a data warehouse infrastructure that provides data summarization and ad hoc querying; Pig, a high-level data-flow language and execution framework for parallel computation; Mahout, a scalable machine learning and data mining library; and Avro, a data serialization system (Deepika et al., 2015).
1) Hive
Hive (Hive, 2017) is a Hadoop-based data-warehousing-like framework originally developed by Facebook; later the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. It resides on top of Hadoop to summarize Big Data and makes querying and analyzing easy. It allows users to write queries in a SQL-like language called HiveQL, which are then converted into MapReduce jobs. This allows SQL programmers with no MapReduce experience to use the warehouse and makes it easier to integrate with business intelligence and visualization tools.
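
A sketch of issuing HiveQL from Python, assuming the third-party PyHive package and a HiveServer2 instance on its usual port; the host, username, and table are placeholders:

from pyhive import hive  # third-party PyHive package

# Connect to HiveServer2 (placeholder host; 10000 is the usual port).
conn = hive.Connection(host="hive-host", port=10000, username="analyst")
cursor = conn.cursor()

# A SQL-like HiveQL query; Hive compiles it into MapReduce jobs.
cursor.execute("SELECT device_id, AVG(temp) FROM readings GROUP BY device_id")
for device_id, avg_temp in cursor.fetchall():
    print(device_id, avg_temp)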
2) Apache Pig
Apache Pig (Pig, 2017) is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject).
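
As a minimal illustration of driving Pig from Python, the sketch below writes a small Pig Latin program and runs it in Pig's local mode via the pig command-line tool (assumed to be on PATH); the file names and fields are invented:

import subprocess

# A tiny Pig Latin script: load, filter, store. Field names are invented.
pig_script = """
readings = LOAD 'readings.csv' USING PigStorage(',') AS (device:chararray, temp:double);
hot = FILTER readings BY temp > 30.0;
STORE hot INTO 'hot_readings';
"""

with open("job.pig", "w") as f:
    f.write(pig_script)

# '-x local' runs Pig locally instead of on a Hadoop cluster.
subprocess.run(["pig", "-x", "local", "job.pig"], check=True)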
3) Apache Mahout
Apache Mahout (Mahout, 2017) is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on the areas of collaborative filtering, clustering and classification, and implements them using the MapReduce model.
4) Avro
Avro (Avro, 2017) is a data serialization system that allows for encoding the schema of Hadoop files. It is adept at parsing data and performing remote procedure calls. It was developed by Doug Cutting, the father of Hadoop. Since Hadoop writable classes lack language portability, Avro has become quite helpful, as it deals with data formats that can be processed by multiple languages. Avro is a preferred tool to serialize data in Hadoop. Avro uses JSON format to declare data structures. Presently, it supports languages such as Java, C, C++, C#, Python, and Ruby.
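
A sketch of Avro's JSON schema declaration and a serialization round trip in Python, using the third-party fastavro package (the official avro package offers similar functionality); the record fields are illustrative:

from io import BytesIO
from fastavro import writer, reader, parse_schema  # third-party package

# Avro declares data structures in JSON, as noted above.
schema = parse_schema({
    "type": "record",
    "name": "Reading",
    "fields": [
        {"name": "device", "type": "string"},
        {"name": "temp", "type": "double"},
    ],
})

records = [{"device": "d1", "temp": 21.5}, {"device": "d2", "temp": 19.0}]

buf = BytesIO()
writer(buf, schema, records)   # serialize with the embedded schema
buf.seek(0)
for record in reader(buf):     # deserialize; schema is read from the file
    print(record)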
5) Apache Sqoop
Sqoop is a connectivity tool for moving data from non-Hadoop data stores, such as relational databases and data warehouses, into Hadoop (J. Yates Monteith et al., 2013). It allows users to specify the target location inside Hadoop and instruct Sqoop to move data from Oracle, Teradata or other relational databases to the target. Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export from the Hadoop file system to relational databases.
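
Sqoop is driven from the command line; as a sketch, the following Python wrapper launches an import of a MySQL table into HDFS. The connection string, credentials, table, and target directory are all placeholders:

import subprocess

# Import a relational table into HDFS; all values below are placeholders.
subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://db-host/sales",
    "--username", "etl_user",
    "--password", "secret",          # in practice prefer --password-file
    "--table", "customers",
    "--target-dir", "/data/customers",
], check=True)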
7.4. Data Management Layer
This is the layer that faces the user; users access the system through it. It has components such as Chukwa, a data collection system for managing large distributed systems, and ZooKeeper, a high-performance coordination service for distributed applications.
1) Oozie
Oozie is a workflow processing system that lets users define a series of jobs written in multiple languages, such as MapReduce, Pig and Hive, and then intelligently link them to one another. Oozie allows users to specify, for example, that a particular query is only to be initiated after specified previous jobs on which it relies for data are completed. Oozie is a scalable, reliable and extensible system. An Oozie workflow is a collection of actions (i.e. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control-dependency DAG (Directed Acyclic Graph), specifying a sequence in which the actions execute. This graph is specified in hPDL (an XML Process Definition Language).
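
Oozie workflows themselves are written in hPDL XML, but the core idea, actions arranged in a control-dependency DAG and started only when their dependencies complete, can be sketched in a few lines of Python. The action names and dependencies below are invented:

from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Invented workflow: an import must finish before the Pig and Hive jobs,
# and both of those must finish before the export - a control-dependency DAG.
dag = {
    "pig_clean":  {"sqoop_import"},
    "hive_query": {"sqoop_import"},
    "export":     {"pig_clean", "hive_query"},
}

# A topological sort yields one valid execution order, mirroring how a
# workflow engine only starts an action once its dependencies are done.
for action in TopologicalSorter(dag).static_order():
    print("run:", action)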
2) Apache Chukwa
Chukwa (Chukwa, 2014) aims to provide a flexible and powerful platform for distributed data collection and rapid data processing. It is an open source data collection system for monitoring large distributed systems and is built on top of the Hadoop Distributed File System (HDFS) and the Map/Reduce framework, inheriting Hadoop's scalability and robustness. Chukwa also includes a flexible and powerful
toolkit for displaying, monitoring and analyzing results to make the best use of the
collected data. In order to maintain this flexibility, Chukwa is structured as a pipeline
of collection and processing stages, with clean and narrow interfaces between stages.
3) Apache Flume
Flume (Flume, 2017), is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of log data. It has a simple and
flexible architecture based on streaming data flows. It is robust and fault tolerant with
tunable reliability mechanisms and many failover and recovery mechanisms. It uses a
simple extensible data model that allows for online analytic application.
4) Apache Zookeeper
Apache ZooKeeper (Zookeeper, 2017) is a coordination service for distributed applications that enables synchronization across a cluster. ZooKeeper in Hadoop can be viewed as a centralized repository where distributed applications can put data and get data out of it. It is used to keep the distributed system functioning together as a single unit, using its synchronization, serialization and coordination goals. For simplicity's sake, ZooKeeper can be thought of as a file system where we have znodes that store data, instead of files or directories storing data. ZooKeeper is also a Hadoop admin tool used for managing the jobs in the cluster.
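
A sketch of the znode idea using the third-party kazoo Python client, assuming a ZooKeeper server is reachable at the placeholder address below; the paths and value are illustrative:

from kazoo.client import KazooClient  # third-party kazoo package

zk = KazooClient(hosts="zk-host:2181")  # placeholder host:port
zk.start()

# znodes form a file-system-like tree, but each node stores data itself.
zk.ensure_path("/app/config")
zk.create("/app/config/batch_size", b"128")

data, stat = zk.get("/app/config/batch_size")
print(data, stat.version)

zk.stop()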
7.5. Why is Hadoop important?
a) Ability to store and process huge amounts of any kind of data, quickly. With
data volumes and varieties constantly increasing, especially from social media and
the Internet of Things (IoT), that's a key consideration.
b) Computing power. Hadoop's distributed computing model processes big data fast.
The more computing nodes you use, the more processing power you have.
c) Fault tolerance. Data and application processing are protected against hardware
failure. If a node goes down, jobs are automatically redirected to other nodes to
make sure the distributed computing does not fail. Multiple copies of all data are
stored automatically.
d) Flexibility. Unlike traditional relational databases, you don't have to preprocess
data before storing it. You can store as much data as you want and decide how to
use it later. That includes unstructured data like text, images and videos.
e) Low cost. The open-source framework is free and uses commodity hardware to
store large quantities of data.
f) Scalability. You can easily grow your system to handle more data simply by
adding nodes. Little administration is required.

8. Customer Knowledge Management


Customer relationship management (CRM) and knowledge management (KM) are leading strategies for value creation for businesses in the new economy. Customer knowledge management (CKM) results from the merging of KM and CRM, where the knowledge management process is applied to customer knowledge and customer knowledge is applied to customer relationship management operations (Bueren, Schierholz, Kolbe, & Brenner, 2004). With the emergence of big data as the latest phase in the evolution of technology in business, CKM strategies need to be adjusted to meet the new challenges, changing from an internal organizational focus to new external channels such as social media and machine communications. This paper explores the concept of big data customer knowledge management. It presents an architecture that integrates CRM operations and KM processes with big data technologies that include NoSQL databases, the Hadoop Distributed File System, MapReduce, and platforms for social media and machine-to-machine communications (Gebert, Geib, Kolbe, & Riempp, 2002; Gebert, Geib, Kolbe, & Brenner, 2003).
8.1. Classification of Customer Knowledge
There are three types of customer knowledge (Gebert et al., 2002):
1) Knowledge from customers.
Knowledge from customers is knowledge created through the customers' experience with a firm and residing in customers. Such experiences can be derived from customers' interactions with the firm in the CRM operations of marketing, sales, and service, or from using a firm's products or services.
2) Knowledge about customers.
Knowledge about customers may include characteristics of customer behavior, demographics and previous purchasing patterns. It can be knowledge accumulated from customer interactions captured through various touch points in CRM operations, and from external sources such as data mining firms, credit bureaus and public records. Knowledge about customers can also be generated via the knowledge creation processes in the KM cycle for knowledge from customers.
3) Knowledge for customers.
Knowledge for customers is knowledge created to satisfy customer needs. It may include knowledge about a firm's products and services. It is created as a result of identifying knowledge deficits in the knowledge management cycle for knowledge from and about customers.
8.2. Big Data for CRM and Customer Knowledge
Tam (2013) reported that the proliferation of customer interactions via social networking services such as Facebook, Google+, LinkedIn, and Twitter has created new sources of customer knowledge and new paradigms in CRM operations and analytics. Social CRM is an emerging CRM model that connects social media with CRM processes. Social media has become a new channel where the CRM operations of marketing, sales and service can be conducted via social campaigns and social engagement. It has become an important source of knowledge from customers, where they express their experiences and opinions about a firm's products and services. It is also a source of knowledge about customers, which may contain demographic, psychographic, behavioral, and personal information. Collecting and analyzing real-time sentiments from social media can provide market perceptions of a firm's products
and services. Knowledge about customers from social media can be used for market segmentation and target marketing. Knowledge for customers can be disseminated via social customer engagement such as social media support forums, Facebook pages and Twitter streams (André, Bernstein, & Luther, 2012).
Another source of customer big data is machine-to-machine (M2M) communication, which includes data generated from sensors, smart meters, and scanning equipment (IEE, 2013). M2M communication facilitates the collection of knowledge from and about customers and the dissemination of knowledge for customers.
Spatial data is data that describes geometric objects via coordinates and topologies that identify geographical locations, boundaries and features on the surface of the earth. Spatial data from many devices, mobile or stationary, via wired and wireless networks, contributes to the high volume and velocity characteristics of big data. Knowledge from and about customers associated with spatial data may include knowledge of customer travel locations and times and associated activities. A customer standing in front of a shelf in a store may trigger a firm to promote certain products and provide comparative shopping in real time. GPS trackers can be used to follow individuals, pets, or vehicles with smartphones (Tully, 2013).
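
As a small example of working with the spatial data described above, the following Python sketch uses the haversine formula to decide whether a GPS-tracked customer is within a trigger radius of a store shelf; the coordinates and radius are invented:

from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))  # 6371000 m = mean Earth radius

store = (30.0444, 31.2357)      # invented store coordinates
customer = (30.0445, 31.2358)   # invented GPS fix from a smartphone

# Fire a real-time promotion when the customer is within 20 m.
if haversine_m(*store, *customer) <= 20.0:
    print("trigger in-store promotion")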
8.3. The Changing Face of CRM Operations in the Era of Big Data:
Social media data inherently possess a combination of the big data characteristics of volume, velocity and variety. The traditional CRM-CKM cycle includes the acquisition of customer knowledge from the CRM operations of marketing, sales and service, and the utilization of knowledge created through the CKM process in CRM operations. New paradigms are emerging in CRM operations in the face of big data. Jones (2012) described listening, engaging and capturing as characteristics of social media strategy. Social media provides a platform for a firm to sense and respond. Listening through listening platforms allows a firm to capture and understand the interests of customers. A firm can respond to customer interests and pursue opportunities by engaging customers via social platforms. Social media becomes another channel for CKM, where knowledge is captured, created, disseminated, shared and utilized. New ways of marketing emerge leveraging "The Wisdom of Crowds," which is knowledge created from the interaction of large groups of people (Joseph Chan, 2014). A form of crowd knowledge acquisition is crowdsourcing, where a firm solicits and captures ideas from a large group of people, for example from online communities utilizing social media platforms, to solve business problems.
8.4. An Architecture for Big Data Customer Knowledge Management:
An architecture for big data customer knowledge management, which integrates big data, customer relationship management and knowledge management, is illustrated in the figure below. There are three dimensions to the architecture:
1. The process of big data capture, storage and analytics.
2. The knowledge management cycle of knowledge acquisition, creation, distribution and sharing, representation and storage, and utilization.
3. The CRM operations of marketing, sales, and service.
 Data analytics can go through two paths: the traditional business intelligence (BI) path through data warehousing, and the big data analytics path through big data platforms utilizing technologies such as NoSQL databases and Hadoop.
 Knowledge derived from these analytical platforms provides the input to the knowledge management cycle.

Figure: Architecture for Big Data Customer Knowledge Management.

 The figure shows the stages of implementation of the CKM paradigm, where data from customers can be captured from various sources, including high-volume (terabytes, exabytes) transactional data, high-velocity data such as data in motion, and high-variety (structured, unstructured, multimedia) data. CRM transactions in marketing, sales, and service are the traditional sources.
a) High-variety knowledge from and about customers can be collected via call center text and audio transcripts, customer Web contents, video feeds such as YouTube, images from photos published and shared by customers, and other sources including log events from devices, websites, databases and smartphones.
b) Social CRM, M2M and geospatial platforms increasingly contribute to big data for customer knowledge.
 The next stage is the storage platform: structured transactional data can go through the traditional data warehousing and business analytics platforms (such as OLAP and data mining). The platforms that handle big data include Hadoop and real-time NoSQL databases.
a) Hadoop consists of the Hadoop Distributed File System (HDFS), which provides the data storage (Apache, 2013).

 In the third stage, analytical results from MapReduce provide an input to the knowledge management cycle for the creation of actionable customer knowledge (Apache, 2013).
There is a bidirectional relationship between data warehousing and Hadoop. The data warehouse can be a data source for complex Hadoop jobs, simultaneously leveraging the massively parallel capabilities of the two systems (Paulraj Ponniah, 2010), while MapReduce output can be integrated with the data warehouse for further analytic processing.
In addition to traditional knowledge sources from CRM operations, the CKM process captures knowledge from customers using listening platforms in social CRM, and by logging M2M and geospatial data from remote sensors. Knowledge creation via the SECI cycle and the dissemination and sharing of knowledge can also take an external dimension via social media, leveraging the wisdom of the crowds. Knowledge for customers can be disseminated via social customer engagements.

REFERENCES
 André, P., Bernstein, M. S., & Luther, K. (2012). "Who gives a tweet? Evaluating microblog content value." In Proceedings of CSCW '12 (pp. 471-474). Seattle, WA.
 Apache. (2013). "Welcome to Apache™ Hadoop®!" http://hadoop.apache.org/ [Viewed 7/3/2017].
 Atzori et al. (2010). "The Internet of Things: A survey." Computer Networks, Elsevier. www.elsevier.com/locate/comn.
 Avro. (2017). Apache Software Foundation project home page. http://avro.apache.org [Viewed 26/3/2017].
 Bueren, Schierholz, Kolbe, & Brenner. (2004). "Customer knowledge management: Improving performance of customer relationship management with knowledge management." In Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04). doi: 10.1109/HICSS.2004.1265416.
 Chukwa. (2014). Apache Software Foundation project home page. https://chukwa.apache.org [Viewed 26/3/2017].
 Daniel E. O'Leary. (2013). "'Big Data', the 'Internet of Things' and the 'Internet of Signs'." Intelligent Systems in Accounting, Finance and Management, 20, 53-65. University of Southern California, Los Angeles, CA, USA.
 Deepika P, Anantha Raman G R. (2015). "A Study of Hadoop Related Tools and Techniques." IJARCSSE, Volume 5, Issue 9.
 Flume. (2017). Apache Software Foundation project home page. https://flume.apache.org [Viewed 26/3/2017].
 Gebert, Geib, Kolbe, & Riempp. (2002). "Towards customer knowledge management: Integrating customer relationship management and knowledge management concepts." In The Second International Conference on Electronic Business (pp. 296-298). Taipei, Taiwan.
 Gebert, Geib, Kolbe, & Brenner. (2003). "Knowledge-enabled customer relationship management: Integrating customer relationship management and knowledge management concepts." Journal of Knowledge Management, 7(5), 107-123.
 Hadoop. (2017). Apache Software Foundation project home page. http://hadoop.apache.org.
 HBase. (2017). Apache Software Foundation project home page. http://hbase.apache.org [Viewed 27/3/2017].
 Hive. (2017). Apache Software Foundation project home page. http://hive.apache.org [Viewed 26/3/2017].
 IEE. (2013). "Utility-scale smart meter deployments: A foundation for expanded grid benefits." http://www.edisonfoundation.net/iee/Documents/IEE SmartMeterUpdate_0813.pdf [Viewed 7/3/2017].
 Irena Bojanova. (2017). "What are the components of IoT?" https://www.computer.org/web/sensing-iot/content?g=53926943&type=article&urlTitle=what-are-the-components-of-iot- [Viewed 31/3/2017].
 J. Dean and S. Ghemawat. (2010). "MapReduce: A Flexible Data Processing Tool." CACM, 53(1), 72-77.
 J. Yates Monteith, John D. McGregor, and John E. Ingram. (2013). "Hadoop and its evolving ecosystem." IWSECO@ICSOB, Citeseer.
 Jeffrey Dean, Sanjay Ghemawat. (2008). "MapReduce: simplified data processing on large clusters." Communications of the ACM, 51(1).
 Jens Dittrich, Jorge-Arnulfo Quiané-Ruiz. (2014). "Efficient Big Data Processing in Hadoop MapReduce." Information Systems Group, Saarland University. http://infosys.cs.uni-saarland.de.
 Jones, T. B. (2012). "Oracle social CRM featuring Buzzient." http://www.slideshare.net/tbjbuzzient/oracle-social-crm-featuring-buzzient [Viewed 7/3/2017].
 Joseph O. Chan. (2014). "Big Data Customer Knowledge Management." Communications of the IIMA, Vol. 14, Iss. 3, Article 5.
 K.R. Kundhavai et al. (2016). International Journal of Computer Science and Mobile Computing, Vol. 5, Issue 1. www.ijcsmc.com.
 Mahout. (2017). Apache Software Foundation project home page. http://mahout.apache.org [Viewed 26/3/2017].
 Mike Butler. (2015). "Getting Started with Analytics: Types of Analytics." https://www.linkedin.com/pulse/getting-started-analytics-types-mike-butler [Viewed 26/3/2017].
 Ovidiu Vermesan and Peter Friess. (2013). "Internet of Things: Converging Technologies for Smart Environments and Integrated Ecosystems." River Publishers, PO Box 1657, Algade 42, 9000 Aalborg, Denmark.
 Paulraj Ponniah. (2010). "Data Warehousing Fundamentals for IT Professionals" (2nd Edition). John Wiley & Sons, Inc., Hoboken, New Jersey.
 Peter Lake and Robert Drake. (2014). "Information Systems Management in the Big Data Era." Advanced Information and Knowledge Processing, Springer International Publishing AG, Switzerland.
 Pig. (2017). Apache Software Foundation project home page. http://pig.apache.org [Viewed 26/3/2017].
 Tam, D. (2013). "Facebook by the numbers: 1.06 billion monthly active users." CNET. https://www.cnet.com/news/facebook-by-the-numbers-1-06-billion-monthly-active-users/ [Viewed 7/3/2017].
 Tully, M. (2013). "The rise of the [geospatial] machines part 3: New opportunities in the coming unmanned aerial system (UAS) age." Sensors & Systems. https://aerialservicesinc.com/2013/07/the-rise-of-the-geospatial-machines-the-future-with-unmanned-aerial-systems-uas-by-ceo-mike-tully-featured-by-sensors-systems/ [Viewed 26/3/2017].
 Varsha B. Bobade. (2016). "Survey Paper on Big Data and Hadoop." IRJET, Volume 03, Issue 01.
 Zanella et al. (2014). "Internet of Things for Smart Cities." IEEE Internet of Things Journal, Vol. 1, No. 1. Smart Santander [Online]: http://www.smartsantander.eu/.
 Zookeeper. (2017). Apache Software Foundation project home page. https://zookeeper.apache.org [Viewed 26/3/2017].
