Blockchain-Based Data-Preserving AI Learning
Environment Model for AI Cybersecurity Systems in
IoT Service Environments
Jinsu Kim 1 and Namje Park 1,2, *
1 Department of Convergence Information Security, Graduate School, Jeju National University, 61 Iljudong-ro,
Jeju-si 63294, Korea; kimjinsu@jejunu.ac.kr
2 Department of Computer Education, Teachers College, Jeju National University, 61 Iljudong-ro,
Jeju-si 63294, Korea
* Correspondence: namjepark@jejunu.ac.kr
Received: 6 May 2020; Accepted: 7 July 2020; Published: 8 July 2020
Abstract: Artificial intelligence (AI) remains limited to passive cognition, and its operating process is not transparent; the technology therefore depends heavily on its learning data. Since raw data for AI learning are processed and inspected manually to assure the high quality needed for sophisticated AI learning, human errors are inevitable, and damaged or incomplete data, as well as deviations from the original data, may lead to unexpected outputs from AI models trained on the processed data.
In this context, this research examines cases where AI learning data were inaccurate, in terms of
cybersecurity, and the need for learning data management before machine learning through analysis
of cybersecurity attack techniques, and we propose the direction of establishing a data-preserving
AI system, which is a blockchain-based learning data environment model to verify the integrity of
learning data. The data-preserving AI learning environment model is expected to prevent cyberattacks
and data deterioration that may occur when data are provided and utilized in an open network for
the processing and collection of raw data.
1. Introduction
With machine learning and deep learning technologies, artificial intelligence (AI) has been
developed at a fast pace to the extent that it can be used commercially, and it has been leading
innovation in various fields, including the medical, finance, robotics, and culture sectors. Google has been conducting research to cure incurable diseases through the AI Calico project, using genetic data and genealogy to extend the average human life span. Besides this, Korea has adopted an AI doctor to help control and diagnose diseases. As such, AI technology has been realized not only at a corporate level, but at a
national level.
The Ministry of Science and information and communications technologies (ICT) mentioned
that obtaining sufficient quality data in the relevant field is crucial for the development of AI
algorithms, and they published a dataset establishment plan in many quarters through AI learning data establishment challenges [10]. To increase AI’s capability to identify hazardous materials, building
AI capacity for disease diagnosis, detecting abnormal behaviors from a community, and collecting
data from many fields including industry, distribution, medical, history, and culture, the government
has been promoting “multi-modal” video data establishment to support the development of AI with
integrated cognitive ability and with translation, situation and movement cognition, object and risk
element identification, and disease diagnosis data.
The existing AI has not been able to provide sufficient evidence of the results when presenting
information on cognition, decisions, and prediction; therefore, explainable AI is required to overcome
the limitations of AI, which is restricted to the passive recognition area. The EU has increased the demand for explainable AI algorithms through the General Data Protection Regulation (GDPR), and in 2017, the Defense Advanced Research Projects Agency (DARPA) promoted the development of explainable AI algorithms through its XAI (explainable AI) project. Deep learning has a problem in that the transparency of its operating process is not guaranteed because of the black-box nature of artificial neural networks. To solve this reliability issue, relevant policy and technology are required. In particular, to adopt AI in everyday life, such as in medical diagnosis and autonomous driving, algorithm verification should be reinforced and data should be used accurately, given the uncertainty of judgments about the results of the AI’s actions. It is necessary to develop technology for the AI system itself and to minimize errors, while adopting a structure in which malicious attacks can be defended against.
In this context, this research examines the need to manage learning data before machine learning
by analyzing cases where inaccurate AI learning data were used and cybersecurity attacking methods
in terms of cybersecurity, to improve the reliability of AI. We also intend to propose the direction of
establishing a data-preserving AI system, which is a blockchain-based learning data environment
model for the verification of learning data integrity.
2. Related Research
2.1. AI Cyberthreats
This section describes the need for cybersecurity of AI by examining forgeries or errors in AI
learning data. First of all, accidents that have occurred in AI learning data are as follows [1].
AI chatbots’ malicious learning: In 2016, Microsoft presented the AI chatting robot Tay, but the service was shut down within 16 h because of intentional messages of racial and sexual discrimination injected into its learning. Microsoft’s subsequent AI chat robot Zo showed similar problems despite precautions having been taken [2].
Adversarial patch-based attacks: Google Research Group announced the Adversarial Patch,
which can make the image recognition AI algorithm malfunction. The adversarial patch, a round patch printed with abstract images, caused the image-recognition AI algorithm to malfunction when the printed patch was attached to an object [3].
Error in predicting recidivism rates in Broward County, Florida, USA: The COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) predictions of recidivism for about 7000 people arrested over the two years 2013 and 2014 were racially biased: African Americans who did not re-offend were incorrectly predicted to re-offend at a rate of 44.9%, about twice the 23.5% rate for whites. Conversely, whites who did re-offend were incorrectly predicted not to re-offend at a rate of 47.7%, compared with 28% for African Americans [4].
Forgery of medical records through deep learning: The AI research group of Ben-Gurion University
of the Negev in Israel conducted an experiment where malware generated through deep learning
technology was used to manipulate patients’ 3D scan images, deceiving all three doctors. At that
time, the problem that 3D CT (Computed Tomography) images, X-ray, and MRI were distributed
without a security system was raised [5].
Deepfake using AI: In late 2017, fake videos appeared on the U.S. social news website Reddit
through Deepfake, which changes faces using a swapping algorithm, and it was difficult to winnow
the truth from falsehood. At that time, it was claimed that responses to the leakage of AI learning data
from AI technology platforms should be prepared [6].
The above cases reveal that AI processing myriad data is vulnerable to inappropriate or incomplete data, and when malignant data are applied, incorrect outputs can be derived regardless of the AI’s functions and performance. There is a security risk that an attacker may exploit AI, given its efficiency and wide dissemination, by using such data or by carrying out attacks that disturb AI learning.
Appl. Sci. 2020, 10, 4718 3 of 12
3. Proposed Method
3.1. Characteristics of AI Learning Data
Various types of AI technology, including deep learning, are composed of analysis and learning algorithms, computing systems, and data learning directly related to the sophistication of algorithms. For AI learning, data examples are needed, and to develop an AI model with a specific feature, an appropriate dataset should be established for learning. The learning flow of AI machine learning is as shown in Figure 1, and generally, data to be applied to the AI model follow the process below.
Figure 1. Artificial intelligence (AI) machine learning flow.
Data Collection: This is a stage for collecting non-structured data, including images, texts, and voice. In this stage, the first preprocessing is carried out, which is extracting data fit for the purposes and functions of the AI to be developed.
Data Preprocessing: In this stage, the collected data are converted in order to enter them in the machine learning model, including filling or deleting missing values, selecting or deleting data properties, combining existing data properties, and converting raw data into a designated type as needed.
Data Analysis: In this stage, data are analyzed to be applied to AI, including exploring standardized data patterns, data mapping, extracting data based on exploration and inference, and data learning using some of the data.
Raw data in the course of data collection have some noise, have no consistency, and are often repeated, so they are not suitable for application to AI algorithms. To ensure high quality, reliability, accuracy, and performance, a stage for improving the quality of data is necessary; in this stage, data analysis and organization is conducted with professionalism and insight, including modification of data errors, elimination of overlapping data, deletion of inconsistent data, and coordination of data
conflicts. Preprocessing of data accounts for 80% of the entire process [10–14]. Obtaining a sufficient
amount of high-quality data is crucial for the development of AI, and quality assurance of AI learning
data is required.
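As a concrete illustration of the preprocessing stage described above, the sketch below removes duplicate records, fills missing values, and converts a field to a designated type. This is a minimal, standard-library-only example; the record layout and the field names (`label`, `value`) are our own illustration, not from the paper.

```python
def preprocess(records):
    """Illustrative preprocessing sketch (hypothetical field names):
    drop exact duplicates, fill missing numeric values with the mean,
    and coerce the label field to a designated type (str)."""
    # Remove duplicate records while preserving order.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))

    # Fill missing "value" entries with the mean of the present ones.
    present = [r["value"] for r in unique if r["value"] is not None]
    mean = sum(present) / len(present)
    for r in unique:
        if r["value"] is None:
            r["value"] = mean
        r["label"] = str(r["label"])  # convert raw label to the designated type
    return unique

raw = [
    {"label": 1, "value": 2.0},
    {"label": 1, "value": 2.0},   # duplicate to be removed
    {"label": 2, "value": None},  # missing value to be filled
    {"label": 3, "value": 4.0},
]
clean = preprocess(raw)
```

In practice this stage would be carried out with a data-analysis library and domain-specific rules; the sketch only shows the three operations named in the text.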
Table 1 shows attack techniques applicable to AI that exploit its vulnerability to malignant data. Data integrity is thus required for AI learning [15–20].
In particular, raw data need manual processing and inspection to secure data quality for
sophisticated AI learning; therefore, human errors are inevitable. Damaged and incomplete data,
as well as differences from raw data, may lead to unexpected outputs when the AI learns the processed
data. In other words, in the course of establishing an AI model, the processed learning data need to be
monitored to verify the AI’s different outputs, and at this time, the integrity of the raw data should be
guaranteed [21–24].
There are various services of collecting and processing data to be applied to AI, but the reliability
of information is still an issue. The source of copious unreliable data uploads for malicious purposes
needs to be analyzed and identified, and a traceable learning data collection environment is required.
In general, a central server exists to control open data, so if the intermediary server stops or is tampered with, many security threats can occur, including degradation of information reliability and
availability. In addition, AI learning data present a type of big data in the course of collection, so the
stability of a big data server is required [25–29].
To prevent data forgery and assure data integrity, in this research we propose a blockchain-based
learning data management method.
3.3. AI Learning Environment Model
When an AI learning model learns using raw data as they are or as processed for learning, data forgery caused by a third party’s malicious attack must be prevented. Particularly, when collecting raw data from a number of data providers through an open network and using them as learning data, personal information must be protected through encryption of the raw data [30].
The proposed learning environment model for data-preserving AI based on blockchain satisfies the integrity requirements of raw and processed data for AI learning using the structure of blockchain, and it ensures that the AI learns with data that have not been tampered with before the AI learning data are provided [31,32].
Figure 2 shows the structure proposed in this research, where raw data received from at least one data provider and the hashcode of the raw data are stored in blockchain, the stored raw data are provided to the AI learning model, and then the hashcode of the data used for AI learning is compared with the hashcode of the raw data stored in blockchain to verify data integrity. Raw data for AI learning enable the fundamental prevention of raw data forgery and accurate tracking of the data provider in an open network environment.
The blockchain stores raw data by encrypting the block data of the raw data received from the data provider, along with the hashcode, in a block composed of a block header and block data, or it separately forms block data storing the hashcode of the raw data to ensure convenience at the time of processing raw data according to the characteristics of the AI model.
The application of blockchain in this research ensures data integrity in that AI learning data cannot be forged, owing to the immutability of the chain. Furthermore, it provides safety against malicious attacks and incapacitation of the server, such as DDoS (Distributed Denial of Service), and prevents manipulation by an insider. In addition, it is free from data leakage thanks to block encryption of the learning data and enables tracking of the data behind AI outputs.
3.4. AI Verification Environment Model
In the AI learning environment, data should be verifiable through tracking of raw data and processed data. Figure 3 shows the configuration to verify the integrity of AI learning data. The verification node of the system, connected with the AI server over wired and wireless networks, conducts verification of the data through connection with the blockchain server. The verification node receives data from the data provision node and stores them in blockchain, and upon the AI server’s request for data, the data stored in blockchain are provided to the AI server. This process forms a verification environment for AI learning. The data provider layer in the verification environment model transmits raw data and their hashcode to the data layer. The data layer, responsible for data management and collection, encrypts the data and hashcode received from the provider layer and stores them in blockchain. The AI machine learning layer receives the data and encrypted hashcode from the data layer, teaches the AI, and derives results through learning patterns for model management.
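The block structure described above, a block header plus block data holding the (encrypted) raw data together with its hashcode, can be sketched as follows. This is a minimal illustration under our own assumptions: SHA-256 as the hash function, JSON for canonical serialization, and byte reversal as a placeholder for real encryption; the paper does not fix these choices.

```python
import hashlib
import json

def hashcode(data: bytes) -> str:
    # SHA-256 digest used as the data's hashcode (our assumption; the
    # paper does not name a specific hash function).
    return hashlib.sha256(data).hexdigest()

def make_block(prev_hash: str, enc_raw: bytes, raw_hash: str) -> dict:
    """Block composed of a block header and block data; the block data
    store the (encrypted) raw data together with its hashcode."""
    block_data = {"enc_raw": enc_raw.hex(), "raw_hash": raw_hash}
    header = {
        "prev_hash": prev_hash,
        # Header commits to the block data via a hash of its serialization.
        "data_hash": hashcode(json.dumps(block_data, sort_keys=True).encode()),
    }
    return {"header": header, "data": block_data}

raw = b"sensor reading 42"
# raw[::-1] is a stand-in for real encryption of the raw data.
block = make_block("0" * 64, enc_raw=raw[::-1], raw_hash=hashcode(raw))

# Later, the hashcode of the data the AI actually learned from is compared
# with the hashcode kept in the block to verify integrity.
assert hashcode(raw) == block["data"]["raw_hash"]
```

The alternative mentioned in the text, storing only the hashcode in separate block data, corresponds to building the block with `enc_raw` omitted and keeping the raw data off-chain.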
Figure 3. Proposed block configuration.
The verification node encrypts and stores data received from the data provision node in blockchain with an encryption key and decryption key. Alternatively, the encryption and decryption keys created in the verification node can be received by the data provision node, enabling the data provider to encrypt the data with the received encryption key at the time of data provision. This allows data encryption using the same encryption key for a number of data users in an open network, with the benefit of managing only a single decryption key.
Table 2 shows the components of the AI verification environment model, which consists of storage, provider, and verification.
Table 2. Construct of verification.
Scope | Description
Storage | Save the data hashcode with the data storage. Store hashcodes together in one blockchain or save each one separately.
Provider | When data are requested by the AI server, they are transmitted to the data server and stored in blockchain. Data are provided after decrypting the encrypted data in data encryption mode.
Verification | Integrity verification using the hashcode of data stored in blockchain. Compare the hashcode of data received from the AI server to the hashcode of data stored in blockchain.
Figure 4 shows the overall structure of the AI environment for verification of the proposed model.
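The single-shared-key arrangement described above, where many providers encrypt with the same key while the verification node manages only one decryption key, can be sketched with a toy symmetric cipher. The SHA-256-derived keystream below is illustrative only; a real deployment would use an authenticated cipher such as AES-GCM, and the key names are our own.

```python
import hashlib
from itertools import count

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream expanded from the key with SHA-256 (illustration only;
    # not a secure cipher construction).
    out = b""
    for i in count():
        if len(out) >= n:
            break
        out += hashlib.sha256(key + i.to_bytes(4, "big")).digest()
    return out[:n]

def encrypt(key: bytes, msg: bytes) -> bytes:
    # XOR with the keystream; XOR is its own inverse, so the same
    # routine also decrypts.
    return bytes(a ^ b for a, b in zip(msg, keystream(key, len(msg))))

decrypt = encrypt  # symmetric: one decryption key serves every provider

shared_key = b"distributed-by-verification-node"
# Two independent providers encrypt with the same shared key...
c1 = encrypt(shared_key, b"provider A raw data")
c2 = encrypt(shared_key, b"provider B raw data")
# ...and the verification node manages only a single decryption key.
assert decrypt(shared_key, c1) == b"provider A raw data"
assert decrypt(shared_key, c2) == b"provider B raw data"
```

The design choice the text highlights is visible here: adding a new data provider requires distributing the existing key, not creating and tracking a new key pair per provider.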
Upon request for the verification of data and its hashcode being received from the AI server,
the encrypted hashcode of the corresponding data is transmitted to the provision module from the
block chain ledger in the data storage module. In the provision module, the encrypted hash value
is decrypted and transmitted to the verification module. In the verification module, the decrypted
hashcode and the hashcode of the data received from AI are compared to verify whether it is forged.
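The decrypt-and-compare flow of the verification module just described can be sketched as below. The `enc`/`dec` stand-ins and the use of SHA-256 for the hashcode are our own simplifications, not the paper's concrete choices; a real system would decrypt with a proper cipher.

```python
import hashlib

def hashcode(data: bytes) -> str:
    # SHA-256 digest as the data's hashcode (our assumption).
    return hashlib.sha256(data).hexdigest()

# Illustrative stand-ins for the encryption used by the data storage
# module (placeholder transform, not a real cipher).
def enc(h: str) -> str:
    return h[::-1]

def dec(c: str) -> str:
    return c[::-1]

def verify(stored_encrypted_hash: str, data_from_ai: bytes) -> bool:
    """Verification module: decrypt the hashcode kept in the blockchain
    ledger and compare it with the hashcode of the data the AI used."""
    return dec(stored_encrypted_hash) == hashcode(data_from_ai)

raw = b"training sample"
ledger_entry = enc(hashcode(raw))      # stored at collection time
assert verify(ledger_entry, raw)       # untampered data passes
assert not verify(ledger_entry, b"training sample!")  # forgery detected
```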
Figure 4. AI environment structure for verification.
Figure 5. Process of verification for AI data.
A typical AI learning model is often configured on a single server, so it is difficult to ensure the integrity of the collected data, and the data collected and recorded on the server are learned by the learning model. At this point, the integrity of data collected for use in learning cannot be ensured in the server record before they are processed for learning. In this case, the AI learning model assumes that no forgery has been carried out in the learning environment, and in the event of forgery, it is difficult to determine when or why it occurred. In response to these problems, the linked-list architecture, in which the blocks of data are generated one by one, increases the difficulty of falsifying the data, and it further enhances the integrity of the data by applying a distributed ledger structure in which the block data are shared by network participants, so that data recognized by more than half of them are accepted as the ledger. This can provide more stable learning data in the AI learning environment. Figure 6 illustrates the difference between the general AI learning environment model and the proposed AI learning environment model.
Figure 6. Comparing common AI learning models with the proposed AI learning model.
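The linked-list property described above, each block committing to the previous one so that falsifying an earlier learning record breaks every later link, can be demonstrated in a few lines. SHA-256 and JSON serialization are our assumptions; sharing among participants and majority consensus are not modeled here.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Hash of the block's canonical serialization.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, payload: str) -> None:
    # Each new block records the hash of the block before it.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "payload": payload})

def chain_valid(chain: list) -> bool:
    """Because every block commits to its predecessor, changing any
    earlier block invalidates every later link."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
for p in ("sample-1", "sample-2", "sample-3"):
    append_block(chain, p)
assert chain_valid(chain)

chain[0]["payload"] = "tampered"   # falsify an early learning record
assert not chain_valid(chain)      # detectable by every honest participant
```

In the distributed-ledger setting the text describes, each participant would run this validity check against its own copy, and a tampered copy would simply be rejected by the majority.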
The model proposed in this paper is based on a distributed and open-source system through a peer-to-peer (P2P) environment that enables data recording by various participants on the network without applying a separate trust organization by instead applying a blockchain. Scalability can be guaranteed, and the contents of each block can be shared by all participants in the transaction records to provide transparency of raw data for learning. Also, since information about data is shared with all users on the network, it is difficult to manipulate data by consensus algorithm, thereby enhancing security, and unlike a normal network, failure on one node has no absolute effect on the system. Based on the above characteristics, it is possible to highlight the difference from the general system.
Dispersibility: The generally applied network has a single server and aims to provide a server-dependent service. In this way, error in the server can be solved at a single point, and since the service provider has authority for the server, it is free to modify. However, the integrity of the data cannot be guaranteed, and the data of a system for which integrity is not guaranteed cannot be trusted. In this regard, a blockchain that applies a distributed network to a system where data integrity is prioritized over ownership of the system can provide improved services when compared to a general network.
Scalability: In the case of the general network service, the axis, connection, and expansion of
the service can be performed by the provider that provides the service. This may be more effective
in services that do not need to be modified externally. Scalability through the use of open source
in a distributed network can be proposed by various participants, and it is possible to provide an
environment in which flexible services can be created, but a method to limit services by untrusted
proposers is required.
Transparency: Data of general network services depend only on the data recorded on the server
and are not shared with participants. This method can be applied as a more effective method for
personal information. However, in the case of blockchain, since the recorded data are shared by all
participants, the shared data can be transparently provided to all participants; it is thus possible to
guarantee transparency of the data, but it is necessary to study security measures to protect information.
Security: General network services tend to rely on a single server, which presents a problem in that integrity can be compromised through forgery and alteration of the server. Against such a problem, the strategy of the learning environment that applies blockchain is that the data are owned by the majority of users who share them, and some errors can be corrected because many copies of the data exist. However, there is a possibility that the collected data may not have been provided in a reliable environment, and research on a mechanism for reliably collecting the data is required.
Stability: In the case of a network service that relies on a single server, the service may be stopped
due to a fatal error of the central server, which degrades system availability. In this regard, the learning
environment using blockchain has a minimal impact on service due to fatal error of a single node in
the distributed network and can thus provide stable services. Table 3 shows a comparative analysis of
the learning model in the existing general learning environment and the learning environment model
using blockchain.
In general, the learning environment of artificial intelligence can be attacked externally, and attacks on the learning environment largely comprise adversarial attacks, poisoning attacks, and evasion attacks. In the case of an adversarial attack, incorrect data are injected into the learning environment; this attack changes the appearance of the data being learned so that the AI does not properly understand the original target. Such an attack has been applied to Tesla's automatic control system: the Keen Security Lab at Tencent in China painted three dots on the road, and a Tesla electric vehicle driven with its self-driving system mis-recognized the lane and began driving in the opposite lane [35]. The adversarial attack, which allows an attacker to compromise the data collected in the learning environment so that the artificial intelligence reaches a different conclusion from the learned data, is difficult to resolve in the proposed mechanism.
A poisoning attack is a prime example of an attack in which an attacker intentionally steers the learning from the beginning of the learning process; Microsoft's chatbot Tay was quickly corrupted by learning malicious comments from some malicious users, resulting in problems such as abusive language, sexism, racism, and excessive use of provocative political remarks. The poisoning attack targets the learning model itself, which shows that filters are required for early learning data, and it is difficult to resolve under the proposed mechanism because it is not an attack that undermines the integrity of the entered learning data.
In an evasion attack, an attacker tampers with learning data in a way that is not easily distinguishable to the eyes of a typical person, so the tampered data are perceived as valid learning data and modulate the result of learning. One study produced hostile example images and conducted evasion attacks, and found that, from a typical human perspective, the attacks were successful with a 93% probability [36]. Such an attack can be prevented in the proposed mechanism, which prevents the forgery of entered data through tampering with the learning data.
The proposed mechanism may, through the above examples, enhance the integrity of the entered
data, but it is difficult to ensure the integrity of the data before they are entered, and further research is
required on ways to enhance the integrity of the data before input.
4. Conclusions
To resolve the problem of uncertainty of determination in AI results, it is necessary to develop
technology embedded in the system, for example, for the strengthened verification of the algorithm
and the use of accurate data, while minimizing errors and defending against malicious attacks.
In this context, in this research we examined the need for the management of learning data before
conducting machine learning, by analyzing cases of the use of inaccurate AI learning data and
cyberattack techniques in terms of cybersecurity, ultimately to improve the reliability of AI. We also
proposed a data-preserving AI system configuration, which is a blockchain-based learning data
environment model to verify the integrity of learning data. The framework proposed in this research
collects and stores learning data, the basis for AI machine learning, in a blockchain structure; ensures data confidentiality through encryption and guarantees data integrity; and also secures the reliability
of the AI model through monitoring and verification of AI learning data. Overall, it is expected to
contribute to preventing threats such as cyberattacks and data deterioration that can occur at the time
of data processing or data provision and utilization in open networks for the collection of raw data.
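The hash-chained storage summarized above can be sketched in a few lines of Python. This is a simplified illustration of the general blockchain idea, not the paper's concrete implementation; the fixed timestamp and the `Block` fields are assumptions made for reproducibility:

```python
import hashlib
import json

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Block:
    def __init__(self, data_digest: str, prev_hash: str):
        self.data_digest = data_digest  # digest of the stored learning data
        self.prev_hash = prev_hash      # hash of the previous block
        self.timestamp = 0              # fixed here for reproducibility
        self.hash = sha256(json.dumps(
            [self.data_digest, self.prev_hash, self.timestamp]).encode())

def append_block(chain: list, data: bytes) -> None:
    prev = chain[-1].hash if chain else "0" * 64
    chain.append(Block(sha256(data), prev))

def verify_chain(chain: list) -> bool:
    """Re-derive every block hash and link; any tampering breaks the check."""
    prev = "0" * 64
    for b in chain:
        expected = sha256(json.dumps([b.data_digest, b.prev_hash, 0]).encode())
        if b.prev_hash != prev or b.hash != expected:
            return False
        prev = b.hash
    return True

chain = []
for sample in (b"raw-sample-1", b"raw-sample-2"):
    append_block(chain, sample)
print(verify_chain(chain))              # True for the untouched chain
chain[0].data_digest = sha256(b"forged")
print(verify_chain(chain))              # False once a block is modified
```

Because each block hash covers the previous block's hash, modifying any stored digest invalidates every subsequent link, which is what makes retroactive tampering with collected learning data detectable.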
The proposed AI learning environment model treats data recorded in the blockchain environment
as normal learning data. However, the collected data may contain not only data from legitimate
users but also data gathered by third parties. Future research is required on how to collect reliable
data in unreliable network environments.
Acknowledgments: This work was supported by Institute for Information & Communications Technology
Promotion (IITP) grant funded by the Korea government (MSIT) [2019-0-00203, The Development of Predictive
Visual Security Technology for Preemptive Threat Response]. This work was also supported by the Ministry of
Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2019S1A5C2A04083374).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Moon, Y. The Malicious Use of Artificial Intelligence Forecasting, Prevention, and Mitigation; NIA Special Report,
2018-12; National Information Society Agency (NIA): Seoul, Korea, August 2018.
2. Fuchs, D.J. The Dangers of Human-Like Bias in Machine-Learning Algorithms. J. Mo. ST’s Peer Peer 2018, 2, 1.
3. Brown, T.B.; Mané, D.; Roy, A.; Abadi, M.; Gilmer, J. Adversarial patch. In Proceedings of the 31st Conference
on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
4. Dressel, J.; Farid, H. The accuracy, fairness, and limits of predicting recidivism. Sci. Adv. 2018, 4, eaao5580.
[CrossRef] [PubMed]
5. Mirsky, Y.; Mahler, T.; Shelef, I.; Elovici, Y. CT-GAN: Malicious tampering of 3D medical imagery using deep
learning. In Proceedings of the 28th USENIX Security Symposium, Santa Clara, CA, USA, 14–16 August
2019; pp. 461–478.
6. Güera, D.; Delp, E.J. Deepfake video detection using Recurrent neural networks. In Proceedings of the
15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland,
New Zealand, 27–30 November 2018.
7. Kim, Y.; Woo, J.; Lee, J.; Shin, J.S. High-quality data collection for machine learning using block chain. J. Korea
Inst. Inf. Commun. Eng. 2019, 23, 13–19.
8. Aum, S. Artificial Intelligence Learning Data Productivity Improvement System based on Label Type Data
Management Using Block Chain, and Method Thereof. KR Patent 1020180153330, 3 November 2018.
9. Aum, S. Automatic Inspection System for Label Type Data Based on Artificial Intelligence Learning to
Improve Data Productivity, and Method Thereof. KR Patent 1020180153327, 3 November 2018.
10. Azaria, A.; Ekblaw, A.; Vieira, T.; Lippman, A. MedRec: Using blockchain for medical data access and
permission management. In Proceedings of the Conference of Open and Big Data, Vienna, Austria, 22–24
August 2016; pp. 25–30.
11. Woo, Y.; Lee, S.; Choi, W.; Ahn, C.; Baek, O. Trend of Utilization of Machine Learning Technology for Digital
Healthcare Data Analysis. Electron. Telecommun. Trends 2019, 34, 98–110.
12. Park, N.; Kim, B.; Kim, J. A Mechanism of Masking Identification Information regarding Moving Objects
Recorded on Visual Surveillance Systems by Differentially Implementing Access Permission. Electronics
2019, 8, 735. [CrossRef]
13. Kim, J.; Park, N.; Kim, G.; Jin, S. CCTV Video Processing Metadata Security Scheme Using Character Order
Preserving–Transformation in the Emerging Multimedia. Electronics 2019, 8, 412. [CrossRef]
14. Park, N.; Sung, Y.; Jeong, Y.; Shin, S.; Kim, C. The analysis of the appropriateness of information education
curriculum standard model for elementary school in Korea. In Proceedings of the International Conference
on Computer and Information Science, Singapore, 6–8 June 2018; Springer: Berlin, Germany, 2018; pp. 1–15.
15. Lee, D.; Park, N.; Kim, G.; Jin, S. De-identification of metering data for smart grid personal security in
intelligent CCTV-based P2P cloud computing environment. J. Peer Peer Netw. Appl. 2018, 11, 1299–1308.
[CrossRef]
16. Lee, D.; Park, N. Electronic identity information hiding methods using a secret sharing scheme in
multimedia-centric internet of things environment. J. Pers. Ubiquitous Comput. 2018, 22, 3–10.
17. Lee, D.; Park, N. Geocasting-based synchronization of Almanac on the maritime cloud for distributed smart
surveillance. J. Supercomput. 2017, 73, 1103–1118. [CrossRef]
18. Park, N.; Bang, H. Mobile middleware platform for secure vessel traffic system in IoT service environment.
J. Secur. Commun. Netw. 2014, 9, 500–512. [CrossRef]
19. Park, N.; Kang, N. Mutual Authentication Scheme in Secure Internet of Things Technology for Comfortable
Lifestyle. J. Sens. 2015, 16, 1–16. [CrossRef]
20. Park, N.; Kwak, J.; Kim, S.; Won, D.; Kim, H. WIPI Mobile Platform with Secure Service for Mobile RFID
Network Environment. J. AWNTA 2006, 741–748. [CrossRef]
Appl. Sci. 2020, 10, 4718 12 of 12
21. Park, N.; Hu, H.; Jin, Q. Security and Privacy Mechanisms for Sensor Middleware and Application in Internet
of Things (IoT). J. Distrib. Sens. Netw. 2016. [CrossRef]
22. Se, J. Business Value of Blockchain and Applications of Artificial Intelligence. J. AJMAHS 2018, 8, 779–789.
23. Ku, J.H. A Study on Adaptive Learning Model for Performance Improvement of Stream Analytics. J. Converg.
Inf. Technol. 2018, 8, 201–206.
24. Choi, J. A study on the standardization strategy for building of learning data set for machine learning
applications. J. Digit. Converg. 2018, 16, 205–212.
25. Frost, R.; Paul, D.; Li, F. AI pro: Data processing framework for AI models. In Proceedings of the IEEE 35th
International Conference on Data Engineering (ICDE), Macau SAR, China, 8–11 April 2019; pp. 1980–1983.
26. Ouaddah, A.; Abou Elkalam, A.; Ait Ouahman, A. FairAccess: A new Blockchain-based access control framework
for the Internet of Things. J. Secur. Commun. Netw. 2017, 9, 5943–5964. [CrossRef]
27. Lee, J.; Kang, S.; Kim, S. Study on the AI Speaker Security Evaluations and Countermeasure. J. Korea Inst. Inf.
Secur. Cryptol. 2018, 28, 1523–1537.
28. Kim, J.; Kim, S.; Park, N. Face Information Conversion Mechanism to Prevent Privacy Infringement. J. KIIT
2019, 17, 115. [CrossRef]
29. Kim, J.; Park, N. Intelligent Video Surveillance Incubating Security Mechanism in Open Cloud Environments.
J. KIIT 2019, 17, 105–116. [CrossRef]
30. Park, N.; Kim, M. Implementation of load management application system using smart grid privacy policy
in energy management service environment. Clust. Comput. 2014, 17, 653–664. [CrossRef]
31. Lee, D.; Park, N. A Proposal of SH-Tree Based Data Synchronization Method for Secure Maritime Cloud.
J. Korea Inst. Inf. Secur. Cryptol. 2016, 26, 929–940. [CrossRef]
32. Lee, D.; Park, N. A Secure Almanac Synchronization Method for Open IoT Maritime Cloud Environment.
J. Korean Inst. Inf. Technol. 2017, 15, 79–90. [CrossRef]
33. Singh, S.; Sharma, P.K.; Yoon, B.; Shojafar, M.; Cho, G.H.; Ra, I.-H. Convergence of Blockchain and Artificial
Intelligence in IoT Network for the Sustainable Smart City. Sustain. Cities Soc. 2020, 1, 102364. [CrossRef]
34. Singh, S.K.; Rathore, S.; Park, J.H. BlockIoTIntelligence: A Blockchain-enabled Intelligent IoT Architecture
with Artificial Intelligence. Future Gener. Comput. Syst. 2020, 110, 721–743. [CrossRef]
35. Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; Song, D. Robust
physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
36. Qian, Y.; Ma, D.; Wang, B.; Pan, J.; Wang, J.; Chen, J.; Zhou, W.; Lei, J. Spot Evasion Attacks: Adversarial
Examples for License Plate Recognition Systems with Convolutional Neural Networks. Comput. Secur. 2020,
95, 101826. [CrossRef]
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).