Advances in Science, Technology and Engineering Systems Journal (ASTESJ), Vol. 2, No. 4, 189-196 (2017)
www.astesj.com
ISSN: 2415-6698
Big Data Analytics for Healthcare Organization, BDA Process, Benefits and Challenges of BDA: A Review
Siva Sankara Reddy Donthi Reddy*,1, Udaya Kumar Ramanadham2
1 Department of Computer Science & Engineering, BIHER, Bharath University, Chennai, Tamil Nadu, India
2 Department of Information Technology, BIHER, Bharath University, Chennai, Tamil Nadu, India
*Corresponding Author: Udaya Kumar Ramanadham, Professor, Department of Information Technology, BIHER, Bharath University, Chennai, Tamil Nadu, India, Contact No: (+91) 9789994242, E-Mail: rsukumar2007@gmail.com

1. Introduction

In the digital world, data are generated in large sets from various sources. The rapid transition from conventional to digital technologies has contributed to the growth of big data, which has enabled breakthroughs in many fields through the collection of large datasets. Big Data is generated every day by diverse industries such as business, finance, manufacturing, healthcare, education, and research and development. In general, the term refers to collections of large and complex datasets that are difficult to store and process using traditional database management tools or data processing applications. There is therefore a need to develop and use the effective, innovative tools and technologies offered by Big Data. Data can be structured, unstructured or semi-structured. The varieties of data include text, audio, video, log files, sensor data etc., in petabytes and beyond. Because the data is so large and comes from various sources in different forms, it is characterized by the 5 V's: Volume, Variety, Velocity, Veracity and Value [1]. Volume represents the size of the data, i.e. how large the data is; it can be expressed in terabytes and petabytes. Variety represents data that appears in different forms. Velocity represents the motion of the data and the analysis of streaming data. Veracity represents the availability and accountability of data of various sizes. Value represents the high quality of data. Big Data is of great help to healthcare around the world [9]. Healthcare organizations have generated a large amount of data to date, on the scale of petabytes or exabytes. According to [3], with such rapid growth of data, U.S. healthcare alone will soon reach the zettabyte (10^21 bytes) scale. The main goal of the healthcare industry is to analyse this large volume of data for unknown and useful facts, patterns, associations and trends with the help of machine learning algorithms, which can yield innovative techniques for the treatment of various diseases. The aim is to provide high-quality healthcare at lower cost to all, which would benefit the entire world. Big Data sources are shown in Figure 1.

2. Characteristics of Big Data:

The 5 V's of Big Data relevant to Healthcare are:
i) Volume: As described earlier, the healthcare industry produces a variety of data at a high growth rate. According to an EMC report and the research firm IDC, healthcare data grows by 48 per cent annually. In 2013, healthcare data amounted to 153 exabytes, and it may increase to 2,314 exabytes by 2020. [1-2]
ii) Variety: In the past, healthcare organizations generated clinical data of patients with similar symptoms, then stored and analysed it to derive the most effective course of treatment for the admitted patient. Now the healthcare industry is focusing on complete healthcare, providing effective treatment through analysis of a patient's data from various other sources as well. This is what variety refers to. Generally, the varied healthcare data falls into one of three categories: structured, semi-structured and unstructured. The following data is generally collected: clinical data from Clinical Decision Support Systems (CDSS) (physician's notes, genomic data, behavioural data, data in Electronic Health Records (EHR) and Electronic Medical Records (EMR)), machine-generated sensor data, data from wearable devices, medical image data (from CT scans, MRI, X-rays etc.), medical claim related data, hospital administrative data, national health register data, RFID-based data for identifying the expiry dates of medicines and surgical instruments [3-6], and social media data such as Twitter data, Facebook data, web pages, blogs and various articles. [7]
[...] for anesthesia, bedside heart monitors, etc.) can mean the difference between life and death. [8-9]
v) Value: This refers to the quality of data. Data from EMRs and EHRs is normally recognized as high-value data, but it is very difficult to certify the value of data from social media. Effective analytical methods are therefore needed so that high-value data leads to better-quality, effective healthcare solutions and innovations.

Figure 2 depicts the 5 V's of Big Data in Healthcare.
Figure 1: Big Data Sources
(Figure labels: Support, Storage, Cloud, Tools, Database, Statistics, Mobile, Analyze, Information Processing, NoSQL, Terabytes.)
Figure 2: The 5 V's of Big Data in Healthcare
(Figure labels: Volume: terabytes, records/archives, transactions, tables, files. Velocity: batch, real time, processes, streams. Variety: structured, unstructured, multi-factor, probabilistic. Veracity: trustworthiness, authenticity, origin, reputation, availability, accountability. Value: statistical, events, correlations, hypothetical.)
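The Volume figures quoted in Section 2 (153 exabytes in 2013, 48 per cent annual growth, 2,314 exabytes by 2020) can be cross-checked with a short compound-growth calculation. The sketch below is only an illustration: it assumes a fixed rate compounded once per year over the seven years from 2013 to 2020, which the source does not state explicitly, and the function names are hypothetical.

```python
def project_volume(start_eb, annual_growth, years):
    """Volume after `years` of compound growth at a fixed annual rate."""
    return start_eb * (1.0 + annual_growth) ** years


def implied_annual_growth(start_eb, end_eb, years):
    """Fixed annual rate that turns start_eb into end_eb over `years`."""
    return (end_eb / start_eb) ** (1.0 / years) - 1.0


# Figures quoted in the text (exabytes); yearly compounding is an assumption.
START_YEAR, END_YEAR = 2013, 2020
START_EB, END_EB = 153.0, 2314.0
QUOTED_GROWTH = 0.48

years = END_YEAR - START_YEAR
projected_2020 = project_volume(START_EB, QUOTED_GROWTH, years)
implied_rate = implied_annual_growth(START_EB, END_EB, years)

print(f"Projected 2020 volume at 48%/year: {projected_2020:.0f} EB")
print(f"Rate implied by 153 EB -> 2,314 EB: {implied_rate:.1%}")
```

Under these assumptions the quoted rate gives roughly 2,380 exabytes for 2020, and the two endpoints imply a rate of about 47.4 per cent, so the three numbers in the text are mutually consistent to within rounding.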
(Figure: Big Data Analytics process stages: Acquisition, Pre-processing, Integration, Analysis, Interpretation. Model application (phase 2): input data, some linked to individuals; apply algorithm (classification, regression, segmentation, association, sequence); output tailored results; refine models.)
ii) Cleaning of Data: Healthcare data is generally flawed; for example, many patients do not share their data completely, such as data about their dietary habits, weight and lifestyle. In such cases the empty fields need to be filled appropriately. As another example, gender can take at most one of two values, i.e. male or female; if any other value or no value is present, such entries need to be updated and handled accordingly. Data from sensors, prescriptions, medical images and social media needs to be expressed in a structured and suitable form for effective analysis. [2]
iii) Integration of Data: The BDA process makes use of data accumulated across various platforms. This data may vary in its metadata (the number of fields, type, and format). All of it must be grouped correctly and consistently into a dataset that can be used effectively for data analysis. This is a very challenging task, considering the volume and variety of big data.
iv) Querying, Analysis and Interpretation of Data: After cleaning and integration, the next step is to query the data. A query can be simple, such as "What is the mortality rate in a particular area?", or complex, such as "How many patients with diabetes are likely to develop heart-related problems in the next 6 years?" Based on the complexity of the query, the data analyst can choose an appropriate platform and analytic tool.

Figure 3 depicts the functional architecture of the Big Data Analytics process steps.

5.1 Technology Support for Big Data Analytics in Health Care:

A large number of open source and proprietary platforms and tools are available in the market:
• Platforms: Hadoop, MapReduce, Storm, GridGain.
• Big Data databases: Cassandra, HBase, MongoDB, CouchDB, OrientDB, Terrastore, Hive etc.
• Data mining tools: RapidMiner, Mahout, Orange, Weka, Rattle, KEEL etc.
• File systems: HDFS and Gluster.
• Programming languages: Pig/PigLatin, R, and ECL.
• Big Data search tools: Lucene, Solr etc.
• Data aggregation and transfer tools: Sqoop, Flume, and Chukwa.
• Other tools: Oozie, ZooKeeper, Avro, and Terracotta.
Some open source platforms are also available, such as Lumify and IKANOW [11].

The criteria for platform evaluation may vary between organizations. Generally, ease of use, availability, the capability to handle voluminous data, support for visualization, quality assurance, cost and security are among the variables used to decide on the platform and tool. Some of the platforms and tools are listed in Table 1.
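Steps ii) to iv) of the BDA process (cleaning, integration and querying) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record fields (patient_id, gender, weight_kg, diet, area, deceased), the "not reported" placeholder, the "unknown" gender label and the sample records are all hypothetical, chosen only to mirror the examples in the text (empty lifestyle fields, invalid gender values, and the mortality-rate query).

```python
from collections import defaultdict

ALLOWED_GENDERS = {"male", "female"}  # the two values named in the text
MISSING = "not reported"              # hypothetical placeholder for empty fields


def clean_record(record, fields_to_fill):
    """Step ii: fill empty fields and normalise the gender value."""
    cleaned = dict(record)
    for field in fields_to_fill:
        if cleaned.get(field) in (None, ""):
            cleaned[field] = MISSING
    gender = str(cleaned.get("gender") or "").strip().lower()
    # Any other value, or no value, is flagged rather than guessed.
    cleaned["gender"] = gender if gender in ALLOWED_GENDERS else "unknown"
    return cleaned


def integrate(*sources):
    """Step iii: group records from several sources into one dataset by patient_id."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            # Later sources overwrite earlier ones on conflicting fields.
            merged[record["patient_id"]].update(record)
    return list(merged.values())


def mortality_rate(dataset, area):
    """Step iv: a simple query, the share of deceased patients in one area."""
    in_area = [r for r in dataset if r.get("area") == area]
    if not in_area:
        return None  # no patients recorded for this area
    return sum(1 for r in in_area if r.get("deceased")) / len(in_area)


# Hypothetical sample data from two platforms with different fields.
ehr_records = [
    {"patient_id": 1, "gender": "Female", "weight_kg": 61, "diet": ""},
    {"patient_id": 2, "gender": "", "weight_kg": None, "diet": "vegetarian"},
    {"patient_id": 3, "gender": "male", "weight_kg": 80, "diet": "mixed"},
]
registry_records = [
    {"patient_id": 1, "area": "A", "deceased": False},
    {"patient_id": 2, "area": "A", "deceased": True},
    {"patient_id": 3, "area": "B", "deceased": False},
]

cleaned = [clean_record(r, ["weight_kg", "diet"]) for r in ehr_records]
dataset = integrate(cleaned, registry_records)
rate_a = mortality_rate(dataset, "A")
print(f"Mortality rate in area A: {rate_a}")
```

Flagging invalid values as "unknown" instead of imputing them is one possible design choice; it keeps the record in the dataset while making the gap visible to later analysis. A complex query such as the diabetes example would need a predictive model rather than a simple filter.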
Table 1 Platforms & Tools for Big Data Analytics in Healthcare
Platform/Tool    Description
Oozie            Oozie, an open source project, streamlines the workflow and coordination among the tasks.
Lucene           The Lucene project is used widely for text analytics/searches and has been incorporated into several open source projects. Its scope includes full text indexing and library search for use within a Java application.
Avro             Avro facilitates data serialization services. Versioning and version control are additional useful features.
Mahout           Mahout is yet another Apache project whose goal is to generate free applications of distributed and scalable machine learning algorithms that support big data analytics on the Hadoop platform.

6. Big Data Analytics Benefits in Healthcare

The massive amount of data gives researchers in the healthcare field the opportunity to use tools and techniques to uncover hidden answers. When Big Data Analytics tools and techniques are applied effectively to large data sets, the following benefits result:
i) Individuals/Patients: Generally, when treatment is given to a patient, the historical data of a set of similar patients can be considered: their symptoms, the drugs used, and the outcomes/responses of different patients. With the help of BDA, a specific treatment can be given to a patient based on his or her genomic data, location, weather, lifestyle, medical history, response to certain medicines, allergies, family history etc. Once genome data is fully explored and a relation is established between DNA and a particular disease, a specific line of treatment can be constructed for every individual. Patients will benefit in the following ways:
• Correct and effective treatment can be applied.
• Health-related issues will be better understood.
• Preventive steps can be taken in time.
• Continuous health monitoring at the patient's location using wearable wireless devices.
• Specialized treatment can be designed for each patient.
• Life expectancy and quality of life can be estimated in advance.
ii) Hospitals: By applying effective BDA techniques to the available data, hospitals can obtain the following benefits:
• Prediction of patients' length of stay and readmissions.
• New healthcare plans can be developed to prevent hospitalization.
• Various questions regarding disease treatment can be answered by analyzing the data using BDA tools and techniques.
• Hospital management can take and manage administrative decisions in a better way.
iii) Insurance Companies: The government spends a large amount on reimbursing patients' medical claims. Using BDA, possible fraud related to medical claims can be analyzed, identified, predicted and minimized. [3]
iv) Pharmaceutical Companies: By using BDA techniques effectively, R&D can help pharmaceutical companies produce, in a shorter period, the drugs that are most effective for treating a specific disease.
v) Government: BDA can help improve public health surveillance and speed up the response to disease outbreaks. The government can use demographic data, historical data on disease outbreaks, weather data, and social media data matching disease keywords such as cholera, flu etc. BDA can analyze this massive data to predict epidemics by finding correlations between the weather and the likely occurrence of disease, so that preventive measures can be taken. [3]

7. Big Data Analytics - Challenges:

The advantages of big data for healthcare are many, but there are a number of challenges, which can be broken down as follows.
i) Unstructured Data and Provenance of Data: The BDA process can collect data from different sources. Most of the data is unstructured, such as medical prescriptions, blogs, tweets, status updates and comments. It is necessary to generate the right metadata for this unstructured data and transform it into a structured format. Image and video data should be structured for semantic content and search. Throughout the data analysis process, the provenance of data should be carried along with its metadata, so that the processing steps are easy to trace when an error occurs [3]. Intelligent processing techniques are needed to deal with the data input from sensors and wearable devices; these will help to filter/derive the meaningful data, which can then be stored on permanent storage, saving space.
ii) Missing or Incomplete Data: Some patients may hide personal information about their lifestyle when filling in forms or during oral interviews with doctors. Some fields may be empty when the data is stored in digital format, and some fields may contain wrong values. If analysis is performed on empty or wrong fields, it may or may not run, and in both cases it produces wrong results. If records with empty fields are left out, the analysis does not cover the cumulative data; if fields with wrong values are included, the analysis is incorrect and unreliable. These issues need to be addressed.
iii) Quality of Data: When data from social media is considered, we need to determine whether it is valid. Determining the validity and quality of data is a great challenge.
iv) Technical Challenges: There are several technical challenges.
• Data aggregation across different database management systems is also a great challenge in BDA. By defining certain standard database design practices meant for a
specific domain such as healthcare or the financial sector, aggregation can be made easier [3]. More technological standards and protocols are required for different database management systems to integrate seamlessly.
• Traditional algorithms must be scaled up to handle the large volume of data in data mining or analysis. Processor speed has reached a point beyond which it is hard to increase, so the trend is moving towards multi-core processors and parallel processing. In such a scenario, we need statistical algorithms that can be parallelized; otherwise computing performance will degrade when they handle complex, high-volume data. [9] Interactive response time is another major problem, apart from scaling complex query processing techniques to terabytes. [3]
• An analysis is more useful if a non-technical person is able to understand and interpret it. The large volume and variety of data make it hard to represent visually in an understandable and easy way. A user should be able to repeat the analysis with different sets of assumptions, data sets and parameters. This will help the user to better understand the analysis process and verify whether the system works as required.
• A careful evaluation process is needed to choose the best platform and tool, as the market is flooded with them.
v) Data Security: Data security is another major challenge as more and more data is digitized. Most people are not willing to share their personal data for fear of a security breach. If data security can be assured, the problem can be managed. There should be strict government policies and norms for what data can and cannot be shared. Apart from this, strong hardware-level and software-level security precautions and measures should be implemented to prevent hacking and malicious code.
vi) Lack of Experts: There is a shortage of qualified and experienced data scientists in the world. It is therefore necessary to build expertise in the field of data science to turn the promises of big data into reality.

8. Innovative Ideas and Solutions:

The following are some possible innovative ideas and solutions of Big Data in the healthcare industry.
• Clinical Decision Support: BDA technologies predict outcomes or recommend alternative treatments to clinicians and patients at the point of care by understanding, analyzing, categorizing and learning from the data.
• Personalized Care: By predicting and analyzing disease symptoms in advance, personalized care can be provided in real time (e.g., genomic DNA sequencing for cancer care) to highlight best-practice treatments for patients. These solutions may offer early detection and diagnosis before a patient develops disease symptoms.
• Public and Population Health: BDA solutions can help in searching and identifying patient populations via social media data to predict flu outbreaks based on consumers' search, social content and query activity. BDA solutions can also help clinicians and epidemiologists perform analyses across patient populations and care venues to help identify disease trends.
• Clinical Operations: BDA can produce accurate solutions for clinical operations without long waits, enabling fast decisions.
• Policy, Financial and Administrative: BDA supports decision makers in integrating and analyzing data related to key performance indicators on policy and financial aspects.

9. Conclusion and Future Work

Big Data Analytics in healthcare is evolving into a promising field for deriving new insights from huge data sets and improving results while reducing costs. Its potential is great; however, there are still challenges to overcome. Big Data Analytics has the potential to transform the way healthcare providers work, moving them from traditional methods to more suitable tools and technologies for gaining insight from their clinical and other data repositories and making constructive decisions. In the future we will see rapid, widespread implementation and use of big data analytics across healthcare organizations and the healthcare industry. To that end, the challenges must be discussed and measures to overcome them identified. As big data analytics becomes more important, more attention will be required to issues such as guaranteeing privacy, safeguarding security, establishing standards and governance, and continually improving the tools and technologies. Big data analytics and its applications in healthcare are at an initial stage of development, but rapid advances in Big Data platforms and tools can accelerate their maturation.

Conflict of Interest

The authors declare no conflict of interest.

Acknowledgment

I would like to thank all the people who helped me prepare this paper. I would also like to thank my guide, who helped me and gave proper suggestions. I would like to thank all the websites and journal papers that I referred to in creating this review paper.
The authors would like to thank all reviewers and Prof. Passerini Kazmerski, Editor, for his valuable comments on the manuscript.

References

[1] Jasleen Kaur Bains, "Big Data Analytics in Healthcare - Its Benefits, Phases and Challenges", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 6, Issue 4, April 2016. Available online at: www.ijarcsse.com
[2] Wullianallur Raghupathi and Viju Raghupathi, "Big data analytics in healthcare: promise and potential", Health Information Science and Systems 2014, 2:3. Available: http://www.hissjournal.com/content/2/1/3
[3] Vivek Wadhwa, "The rise of big data brings tremendous possibilities and frightening perils", April 2014. Available: http://www.washingtonpost.com/blogs/innovations/wp/2014/04/18/therise-of-big-data-brings-remendous-possibilities-and-frightening-perils/
[4] D. Agrawal et al., "Challenges and Opportunities with Big Data", Big Data White Paper, Computing Research Association, Feb. 2012. Available: http://cra.org/ccc/docs/init/bigdatawhitepaper.pdf
[5] R. Nambiar, R. Bhardwaj, A. Sethi, R. Vargheese, "A look at challenges and opportunities of Big Data
analytics in healthcare", IEEE Conference, 2013. Available: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6691753&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6691753
[6] Ahmed E. Youssef, "A Framework for Secure Healthcare Systems Based on Big Data Analytics in Mobile Cloud Computing Environments", The International Journal of Ambient Systems and Applications, 06-2014. Available: http://airccse.org/journal/ijasa/papers/2214asa01.pdf
[7] J. Archenaa, E.A. Mary Anita, "A Survey of Big Data Analytics in Healthcare and Government", Procedia Computer Science, Elsevier, Volume 50, 2015, Pages 408-413, Big Data, Cloud and Computing Challenges. Available: http://www.sciencedirect.com/science/article/pii/S1877050915005220
[8] Matthew Herland, Taghi M Khoshgoftaar and Randall Wald, "A review of data mining using big data in health informatics", Journal of Big Data 2014, Springer, 1:2. Available: http://www.journalofbigdata.com/content/1/1/2
[9] MH Kuo, T Sahama, AW Kushniruk, EM Borycki, DK Grunwell, "Health big data analytics: current perspectives, challenges and potential solutions", International Journal of Big Data Intelligence, Vol. 1, Issue 1, pp. 114-126.
[10] Bernard Marr, "How Big Data Is Changing Healthcare". Available: http://www.forbes.com/sites/bernardmarr/2015/04/21/how-big-data-is-changing-healthcare/
[11] "Improve Healthcare Win $3,000,000". Available: http://www.heritagehealthprize.com/c/hhp
[12] Cynthia Harvey, "50 Top Open Source Tools for Big Data". Available: http://www.datamation.com/data-center/50-top-open-source-tools-for-big-data-1.html
[13] "Big Data Provides True Picture of Diabetic Population". Available: http://www.sas.com/en_us/news/sascom/2014q1/nz-ministry-of-health.html
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_YT_YT_USEN&htmlfid=YTC03753USEN&attachment=YTC03753USEN.PDF
[14] Health Analytics. Available: http://datascience.columbia.edu/health-analytics
[15] http://www.ibm.com/smarterplanet/us/en/ibmwatson/assets/pdfs/WellPoint_Case_Study_IMC14792.pdf
[16] Linda L. Briggs, "BigData means better care at Seattle's Children Hospital". Available: http://tdwi.org/articles/2013/08/13/big-data-analytics-smarter-care.aspx
[17] Kiyana Zolfaghar, Naren Meadem, Ankur Teredesai, Senjuti Basu Roy, Si-Chi Chin, "Big Data Solutions for Predicting Risk-of-Readmission for Congestive Heart Failure Patients", 2013 IEEE International Conference on Big Data, 978-1-4799-1293-3/13. http://dx.doi.org/10.1109/bigdata.2013.6691760
[18] Joseph M. Woodside, "Virtual Health Management", 2014 11th International Conference on Information Technology: New Generations, 978-1-4799-3187-3/14. http://dx.doi.org/10.1109/itng.2014.124