Journal of Physics: Conference Series
PAPER • OPEN ACCESS
Malware Detection: Issues and Challenges
To cite this article: Muchammad Naseer et al 2021 J. Phys.: Conf. Ser. 1807 012011
ICSINTESA 2019 IOP Publishing
Journal of Physics: Conference Series 1807 (2021) 012011 doi:10.1088/1742-6596/1807/1/012011
Malware Detection: Issues and Challenges
Muchammad Naseer1; Jack Febrian Rusdi1; Nuruddeen Musa Shanono2; Sazilah Salam3; Zulkiflee Bin Muslim3; Nur Azman Abu3; Iwan Abadi4.
1 Informatics, Sekolah Tinggi Teknologi Bandung, Bandung, Indonesia.
2 Kano University of Science and Technology, Wudil, Kano, Nigeria.
3 Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia.
4 Informatics Engineering, Universitas Langlangbuana, Bandung, Indonesia.
Corresponding author: inijack@gmail.com; jack@sttbandung.ac.id
Abstract. Malware is a severe threat to computer security. Many studies have been conducted
to improve the capability of detection techniques. However, there is a lack of analysis of the
current trends in intrusion detection systems (IDS). This paper extracts and analyzes the latest
detection techniques proposed in various studies. It also emphasizes the current challenges of
malware deployment reported in recent studies. Finally, the similarities and differences between
the detection techniques are presented, and the issues and problems related to the detection
techniques are highlighted. The outcome of this paper can be used to identify the topics currently
addressed in malware research.
1. Introduction
In today's world of computing, malware is a grave threat to computer security [1]. In general,
malware activity is closely tied to the openness of today's information and networks [2], [3],
including network management and settings [4], [5]. Malicious activity continues to grow
exponentially and is becoming very sophisticated [6], [7]. It exposes computer systems to the
possibility of being attacked or harmed through the Internet or data communication [8]. The Internet
and data communication, both through existing technology and through possible future developments
[5], provide opportunities for malware to grow.
The term malware merges the two words “malicious” and “software.” Malware is a program that is
installed on a system without the owner's knowledge in order to steal sensitive information, access
private data, or harm the system by altering the functionality of legitimate applications, which
slows down the system [9]–[11].
Malware needs to be identified and removed from an infected system to avoid the leakage of
sensitive data or any other malicious activity on the computer. Malware is classified into viruses,
worms, Trojans, spyware, adware, and rootkits [12], [13].
The remainder of the paper is organized as follows. Section 2 gives an overview of malware.
Section 2.1 presents the classification of malware detection techniques. Section 2.2 discusses the
malware detection analysis. Section 3 discusses the issues and challenges. Lastly, Section 4
concludes the study.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
2. Malware overview
Malware (short for malicious software) is usually considered to be software that aims to disrupt the
regular activities of a computerized system by gathering sensitive information or gaining
unauthorized access to computer systems, and that mainly irritates users [13].
Malware is divided into viruses, worms, and Trojans [13], [14].
A computer virus is code that replicates by inserting itself into other programs. An important
caveat is that a virus needs an existing host program in order to cause harm [9]. Unlike a virus, a
computer worm replicates itself by executing its code independently of any other program [11]. In
general, viruses attempt to spread through programs or files on a single computer system [15], while
worms spread through network connections to infect as many computer systems connected to the
network as possible [9], a spread strengthened by people's high reliance on technology today [16].
A Trojan horse is malware embedded by its designer in an application or system. The application or
system appears to perform some useful function but is in fact performing an unauthorized action,
such as capturing the user's information through keystrokes and sending it to a malicious host
[9], [17].
Even though many security solutions are available on the current market, such as antivirus
software, SSL certificate encryption, and firewall protection, these solutions only provide
temporary protection because of their defensive mode. Hence, a defensive solution must be updated
consistently to ensure that it continues to protect information against malicious activities or
software.
Recently, through universities as higher education and research institutions [18], [19], various
studies have proposed new techniques and approaches to solve these problems, such as limited
storage, which has resulted in a large body of literature in the field. In this paper, we review
the existing literature in the field to highlight the existing research issues and challenges. The
outcome of our work shows that there is a need to reduce the signature repository and increase the
accuracy score.
Table 1: Malware Detection Techniques Classification
| Detection technique | Definition | Benefit | Limitation |
|---|---|---|---|
| Signature-based | The most widely used antivirus method. A signature is a sequence of bytes that can be used to identify malware. [8], [12], [20], [21] | - Straightforward and relatively fast. - Successful against most common types of malware. | - Requires an up-to-date signature database, as malware not present in the database will not be recognized. - Relatively simple obfuscation techniques can be used to evade this method. - The repository must be updated as often as possible as new threats are found. |
| Behavior-based | Focuses on the activities performed by the malware during execution. [6], [12], [22] | Systematic behavioral analysis of the suspected malware. | - Both benign and malware samples are examined during the training stage. - Classification takes place only during the execution phase. |
| Statistical-based | Properties derived from program features, as in Hidden Markov Models (HMMs), are used to classify metamorphic malware. [23] | Has served as a benchmark in a variety of other studies. | Visible only when using HMMs as the basis for the malware detection schemes. |
| Heuristic techniques | Primarily use machine learning and data mining techniques to learn the behavior of the running program. [12], [13] | - Detects polymorphic and unknown malware. - Fewer false positives than other scanners. | - A large set of generated rules is needed to build the classifier. |
| Anomaly-based | Typically operates in two stages, a training (learning) stage and a detection (monitoring) stage. [7], [10], [24]–[26] | - Increasing the rule set helps reduce false-positive alarms. - A novel attack for which a signature does not exist can be recognized. | - Can be tricked by a well-delivered attack. - A high false-positive ratio. |
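The two stages of the anomaly-based row in Table 1 can be illustrated with a minimal sketch. It assumes a single numeric feature (a hypothetical count of system calls per second) and a simple mean and standard deviation profile with a threshold `k`; none of these choices comes from the surveyed studies, which use far richer models.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Profile = Tuple[float, float]  # (mean, standard deviation) of benign behavior


def train(benign_values: List[float]) -> Profile:
    """Training (learning) stage: summarize benign observations."""
    return mean(benign_values), pstdev(benign_values)


def is_anomalous(value: float, profile: Profile, k: float = 3.0) -> bool:
    """Detection (monitoring) stage: flag values far from the benign profile."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma


# Hypothetical benign baseline, e.g. system calls per second.
baseline = [10, 12, 11, 9, 10, 11, 12, 10]
profile = train(baseline)

print(is_anomalous(11, profile))  # ordinary activity
print(is_anomalous(40, profile))  # far outside the learned profile
```

No signature is needed to flag the second observation, which is the benefit listed in Table 1. The same sketch also hints at the limitations: an attack that stays close to the baseline is not flagged, and unusual but benign activity is.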
2.1. Malware Detection Techniques Classification
Malware detection techniques can be classified from several aspects: signature-based,
behavior-based, statistical-based, heuristic, and anomaly-based. This classification is shown in
Table 1.
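As an illustration of the signature-based technique, the following minimal sketch scans a byte string for known byte sequences. The signature names and byte patterns are invented for this example and correspond to no real malware; production scanners use much larger repositories and faster matching algorithms.

```python
from typing import Dict, List

# Hypothetical signature repository: name -> byte sequence (illustrative only).
SIGNATURES: Dict[str, bytes] = {
    "Example.A": bytes.fromhex("deadbeef4d5a"),
    "Example.B": b"\x90\x90\xeb\xfe",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all signatures whose bytes occur in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


clean = b"ordinary file contents"
infected = b"header" + bytes.fromhex("deadbeef4d5a") + b"payload"
# Trivial obfuscation: XOR every byte with a constant key.
obfuscated = bytes(b ^ 0x5A for b in infected)

print(scan(clean))
print(scan(infected))
print(scan(obfuscated))
```

The obfuscated sample carries the same payload but no longer contains the stored byte sequence, so it is missed. This is the evasion limitation listed for signature-based detection in Table 1.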
Table 2: Malware Detection Analysis
| Author | Analysis | Design | Implementation | Testing |
|---|---|---|---|---|
| Huda et al. (2018) [27] | Information is sent to the sandbox to produce a behavioral log file. | Data is tested to produce the extracted run-time features, the training file, and the test data. | Framework for feature selection, estimation of model parameters for the wrapper, and the most significant run-time behavior. | The wrapper detection engine verifies whether the result is harmful or not. |
| Narudin et al. (2016) [28] | The first stage is data collection, which captures the network traffic of normal and malicious applications and transmits it to the next phase. | Feature selection and extraction: the TCP packets are filtered; then features selected from among various network features are extracted, labeled, and stored in a database to be applied in the next phase. | (combined with Design) | The machine learning classifier is the final phase, in which the information in the database trains the machine learning classifier to produce a detection model. |
| Noor et al. (2018) [29] | A malware sample is executed in a sandbox, which is based on the Cuckoo malware analysis engine. | AEMS employs an in-the-box monitoring mechanism in the form of a dynamic link library deployed as a kernel driver. | (combined with Design) | The results are generated in the form of an execution profile describing the malware behavior, categorized into network activity, file system activity, and system calls. |
| Talha et al. (2015) [30] | The APK Auditor client issues an analysis request, showing whether the application can be trusted or not. | Applications are stored in the APK Auditor signature database server together with the results of the analysis. | The APK Auditor central server manages the analysis process and works as a link between the signature database and the Android client while analyzing requested applications. | (combined with Implementation) |
| Ambusaidi et al. (2016) [31] | Data collection, where sequences of network packets are collected. | Data preprocessing, where training and test data are preprocessed and essential features that can distinguish one class from the others are selected. | Classifier training, where the model for classification is trained using LS-SVM. | Attack recognition, where the trained classifier is used to detect intrusions on the test data. |
| Ali Mirza et al. (2018) [32] | Prepares the file by eliminating any apparent obfuscation; the file is then thoroughly analyzed and all the possible features are extracted. | Obfuscated parts are eliminated from the extracted features, and the features are arranged in a format that is understandable by the classification module. | (combined with Design) | The analysis result of the module output is stored in the analysis repository. |
| Tong & Yan (2017) [33] | Data is gathered about the runtime system calls of a set of known malware and benign apps using a dynamic method in order to traverse most app features. | A malicious and a normal pattern set are built by extracting the patterns from the collected data. | Patterns that differ between malicious and benign apps are inserted into the malicious pattern set, and patterns that differ between normal and malicious apps are inserted into the normal pattern set. | Runtime system calling data is collected about both individual system calls and sequential system calls with different depths, and then the target patterns are extracted. |
2.2. Malware Detection Analysis
2.2.1. Malware Detection Construction. Several researchers have analyzed how malware detection is
constructed, as shown in Table 2. The criteria for reviewing this construction are Design,
Implementation, and Testing.
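The pattern-set construction summarized for Tong & Yan [33] in Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the traces are hypothetical, patterns are taken to be contiguous system-call sequences up to a depth `max_depth`, and the final decision rule (comparing match counts) is an assumption made only for this example.

```python
from typing import List, Set, Tuple

Pattern = Tuple[str, ...]


def patterns(trace: List[str], max_depth: int = 3) -> Set[Pattern]:
    """All contiguous system-call sequences of length 1..max_depth in a trace."""
    return {
        tuple(trace[i:i + depth])
        for depth in range(1, max_depth + 1)
        for i in range(len(trace) - depth + 1)
    }


def build_pattern_sets(malware_traces, benign_traces, max_depth: int = 3):
    """Keep only patterns that differ between the malicious and benign apps."""
    mal = set().union(*(patterns(t, max_depth) for t in malware_traces))
    ben = set().union(*(patterns(t, max_depth) for t in benign_traces))
    return mal - ben, ben - mal  # (malicious pattern set, normal pattern set)


def classify(trace, malicious_set, normal_set, max_depth: int = 3) -> str:
    """Assumed decision rule: whichever pattern set matches more often wins."""
    found = patterns(trace, max_depth)
    hits_mal = len(found & malicious_set)
    hits_norm = len(found & normal_set)
    return "malicious" if hits_mal > hits_norm else "benign"


# Hypothetical system-call traces.
malware_traces = [["open", "read", "connect", "send"]]
benign_traces = [["open", "read", "write", "close"]]
malicious_set, normal_set = build_pattern_sets(malware_traces, benign_traces)

print(classify(["read", "connect", "send"], malicious_set, normal_set))
print(classify(["open", "read", "write"], malicious_set, normal_set))
```

Patterns shared by both groups, such as `("open", "read")`, appear in neither set, so only distinguishing behavior influences the decision.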
2.2.2. Malware Detection Characteristics. Malware detection approaches have several characteristics.
Based on previous research, these characteristics are reviewed in terms of novelty, advantages, and
disadvantages, as shown in Table 3.
Table 3: Malware Detection Characteristics
| Author | Novelty | Advantages | Disadvantages |
|---|---|---|---|
| Huda et al. (2018) [27] | - A hybrid multi-filter-wrapper based framework was proposed that overcomes the limitations of current detection systems. - It integrates the knowledge (from the intrinsic characteristics of run-time behavior) obtained by more than one filter into the wrapper. | - Improves the detection accuracies by taking advantage of the filter and the wrapper. - The proposed framework finds the most significant run-time characteristics of malware. | All hybrids in the framework reduce the computational complexity from an exponential to a polynomial type as a function of the cardinality of the run-time feature sets of malware. |
| Narudin et al. (2016) [28] | This study presented an evaluation using machine learning classifiers to detect mobile malware effectively by selecting the appropriate network features for inspection by the classifiers, as well as to determine the ideal classifier based on true-positive rate (TPR) values. | - Proves the effectiveness and efficiency of machine learning in a real mobile malware operational environment. - Proves that machine learning classifiers can detect the latest malware. | Evaluation is limited to a certain amount of malware, and few of the approaches consider feature selection in the classification process to increase the result accuracy. |
| Noor et al. (2018) [29] | The Analysis Evasion Malware Sandbox (AEMS) system, which is capable of detecting the presence of an analysis evasion technique within the malware and can force it to exhibit its correct functionality inside the sandbox. | - Effective in detecting the variations in malware behavior within the sandbox, and the proof-of-concept countermeasures implemented by AEMS are effective against a large proportion of common malware. - A novel technique for the detection of evasive malware behavior is presented. | - AEMS faces constraints in offsetting the effects of evasion based on timing differences and through the identification of the parent process. - It is less scalable for handling the large number of malware samples generated every day. |
| Talha et al. (2015) [30] | - Provides a new approach to assessing the potential maliciousness of Android applications by calculating a statistical score through the requested permissions. - Application analysis is wholly carried out on a central server, and the results are retrieved by a web service. | - APK Auditor is a learning-based, extensible, and lightweight system that provides a new approach for Android malware detection. - Helps digital investigators, and likely Android users, to check whether or not applications are malicious. | - Because it is a learning-based mechanism, each application analysis affects the malware detection process positively and moreover updates the signature database. - The number of FP and FN detections is high. |
| Ambusaidi et al. (2016) [31] | - A new filter-based feature selection method, in which theoretical analysis of mutual information is introduced to evaluate the dependence between features and output classes. - An intrusion detection system (IDS), named Least Square Support Vector Machine based IDS (LSSVM-IDS), is built. | - The proposed detection system has achieved promising performance in detecting intrusions over computer networks. - A flexible method for the problem of feature selection, FMIFS, is developed. - The proposed feature selection algorithm is computationally efficient when it is applied to the LSSVM-IDS. | - The proposed feature selection algorithms can only rank features in terms of their relevance, but they cannot reveal the best number of features that are needed to train a classifier. - The proposed feature selection algorithm could be further enhanced by optimizing the search strategy. |
| Ali Mirza et al. (2018) [32] | - One of the contributions of this paper is energy efficiency, which is one of the weakest areas of many antiviruses. - The classification methodology proposed in this research proves the initial hypothesis of enhanced accuracy in malware identification. | - An approach that applies to multiple security threats and can not only identify known threats but also predict unknown threats. - Capable of managing a large number of requests coming from multiple individual clients and enterprise networks. | - Unable to compare the system's overall performance results with any previously available study. - They did not test the cloud-based architecture against a large number of clients or a big network. |
| Tong & Yan (2017) [33] | Proposed a novel hybrid approach for mobile malware detection which outperforms other mobile app detection methods, with better detection accuracy rates for different types of malware. | - The approach is efficient and accurate for malware detection. - The proposed approach can detect different types of malware with higher accuracy than existing methods. - The proposed approach is simple. The data processing is based on simple algorithms with low computation cost. | - They need to continually gather new malware and benign apps in order to maintain detection accuracy, because new types of malware continue to emerge. - Due to limited computing and storage resources, it is not suitable to perform large-scale data processing on the mobile phone. |
The seven recent malware detection techniques above were selected and analyzed. We then extracted
and compared the relevant information, and finally drew a conclusion.
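Several entries in Table 3 are stated in terms of the true-positive rate (TPR) and the number of false positives (FP) and false negatives (FN). As a reminder of how these quantities relate, the sketch below computes the standard rates from a confusion matrix. The counts are hypothetical and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # malware correctly flagged
    fp: int  # benign samples wrongly flagged
    tn: int  # benign samples correctly passed
    fn: int  # malware missed

    @property
    def tpr(self) -> float:
        """True-positive rate: share of malware that is detected."""
        return self.tp / (self.tp + self.fn)

    @property
    def fpr(self) -> float:
        """False-positive rate: share of benign samples that are flagged."""
        return self.fp / (self.fp + self.tn)

    @property
    def accuracy(self) -> float:
        """Share of all samples that are classified correctly."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Hypothetical evaluation on 100 malware and 100 benign samples.
result = Confusion(tp=90, fp=5, tn=95, fn=10)
print(result.tpr, result.fpr, result.accuracy)
```

Reporting TPR together with the false-positive rate matters because a detector can reach a high TPR simply by flagging almost everything.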
3. Issues and Challenges
In this section, we present some of the issues and challenges identified in the field of study, for
example:
| Stages | Issues |
|---|---|
| Data Collection | - Few of the approaches consider feature selection in the classification process to increase result accuracy. - Continued emergence of malware. |
| Analysis | - Some of the evaluations are limited to a certain amount of malware, as in the anomaly-based approach. - Some approaches could not reveal the best number of features that are needed to train a classifier. |
| Response | - Scalability in handling a large number of malware samples. - The high number of false positives and false negatives. - Need to continually gather new malware and benign apps. - Limited computing and storage resources. |
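The feature-selection issue listed under the Analysis stage can be made concrete with a small sketch that ranks discrete features by their mutual information with the class label, I(X;Y) = sum over (x, y) of p(x,y) log2(p(x,y) / (p(x) p(y))). This is a plain filter-style ranking written for illustration, not the FMIFS algorithm of [31]; the feature names and data are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_features(features: Dict[str, List[int]], labels: List[int]) -> List[str]:
    """Order feature names from most to least informative about the label."""
    return sorted(
        features,
        key=lambda name: mutual_information(features[name], labels),
        reverse=True,
    )


# Hypothetical binary features for eight samples (label 1 = malware).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "has_icon": [1, 0, 1, 0, 1, 0, 1, 0],
    "reads_contacts": [1, 1, 1, 0, 1, 0, 0, 0],
    "uses_network": [1, 1, 1, 1, 0, 0, 0, 0],
}
print(rank_features(features, labels))
```

The ranking orders the features by relevance but does not say how many of them to keep, which is the limitation noted above.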
4. Conclusion
Malware has rapidly become a significant security threat for the computing community and is one of
the reasons for most of the current security problems on the Internet. Although a considerable
amount of research effort has gone into malware detection, malicious code remains a major threat on
the Internet today. Recently, various malware detection techniques and approaches have been
proposed to tackle these problems. Unfortunately, these techniques and approaches have
shortcomings that prevent them from eliminating the problem.
This paper extracts the shortcomings of the latest detection techniques for further analysis. The
outcome of our work shows that there is a need to reduce the signature repository so that it fits in
lightweight devices such as IoT sensors. Our future work is to deal with this problem.
Acknowledgments
This research was conducted by the Pervasive Computing & Educational Technology Research Group,
C-ACT, Universiti Teknikal Malaysia Melaka (UTeM). We thank Sekolah Tinggi Teknologi Bandung, which
provided research materials.
References
[1] A. Qamar, A. Karim, and V. Chang, “Mobile malware attacks: Review, taxonomy & future directions,” Future
Generation Computer Systems, vol. 97, pp. 887–909, Aug. (2019), doi: 10.1016/J.FUTURE.2019.03.007.
[2] J. Febrian, “Menjelajah Dunia dengan Google,” Penerbit Informatika, (2008).
[3] J. Febrian, “Google & Yahoo Secrets!,” Penerbit Informatika, (2007).
[4] M. R. K. Ariffin, M. A. Asbullah, and N. A. Abu, “Security Features of an Asymmetric Cryptosystem based on the
Diophantine Equation Hard Problem,” Mar. (2011).
[5] J. F. Rusdi, S. Salam, N. A. Abu, S. Sahib, M. Naseer, and A. A. Abdullah, “Drone Tracking Modelling Ontology
for Tourist Behavior,” Journal of Physics: Conference Series, vol. 1201, no. 1, p. 012032, May (2019), doi:
10.1088/1742-6596/1201/1/012032.
[6] A. Souri and R. Hosseini, “A state-of-the-art survey of malware detection approaches using data mining
techniques,” Human-centric Computing and Information Sciences, vol. 8, no. 1, p. 3, Dec. (2018), doi:
10.1186/s13673-018-0125-x.
[7] J. Jabez and B. Muthukumar, “Intrusion Detection System (IDS): Anomaly Detection using Outlier Detection
Approach,” Procedia Computer Science, vol. 48, pp. 338–346, (2015), doi: 10.1016/j.procs.2015.04.191.
[8] D. Gavrilut, M. Cimpoesu, D. Anton, and L. Ciortuz, “Malware detection using machine learning,” in 2009
International Multiconference on Computer Science and Information Technology, 2009, pp. 735–741.
[9] N. Idika and A. P. Mathur, “A Survey of Malware Detection Techniques,” SERC Technical Reports, no. October, p.
48, (2007).
[10] ONT209, “Malware Detection Techniques Description | MalwareTips Community,” Malwaretips, 2013. [Online].
Available: https://malwaretips.com/threads/malware-detection-techniques-description.14028/. [Accessed: 31-Aug-
2019].
[11] M. A. Jerlin and K. Marimuthu, “A New Malware Detection System Using Machine Learning Techniques for API
Call Sequences,” Journal of Applied Security Research, vol. 13, no. 1, pp. 45–62, Jan. (2018), doi:
10.1080/19361610.2018.1387734.
[12] S. Alqurashi and O. Batarfi, “A Comparison of Malware Detection Techniques Based on Hidden Markov Model,”
Journal of Information Security, vol. 7, pp. 215–223, (2016), doi: 10.4236/jis.2016.73017.
[13] Z. Bazrafshan, H. Hashemi, S. M. H. Fard, and A. Hamzeh, “A survey on heuristic malware detection techniques,”
in The 5th Conference on Information and Knowledge Technology, 2013, pp. 113–120.
[14] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, “Semantics-Aware Malware Detection,” in 2005
IEEE Symposium on Security and Privacy (S&P’05), pp. 32–46.
[15] M. I. Abdullah Almarshad, M. M. Z E Mohammed, and A.-S. Khan Pathan, “Detecting Zero-day Polymorphic
Worms with Jaccard Similarity Algorithm,” (2016).
[16] J. F. Rusdi, S. Salam, N. A. Abu, B. Sunaryo, R. Taufiq, L. S. Muchlis, T. Septiana, K. Hamdi, Arianto, B. Ilman,
Desfitriady, F. R. Kodong, and A. V. Vitianingsih, “Dataset Smartphone Usage of International Tourist Behavior,”
Data in Brief, p. 104610, Oct. (2019), doi: 10.1016/j.dib.2019.104610.
[17] B. Amro, “Malware Detection Techniques for Mobile Devices,” International Journal of Mobile Network
Communications & Telematics, vol. 7, no. 4/5/6, pp. 01–10, Dec. (2017), doi: 10.5121/ijmnct.2017.7601.
[18] J. Febrian, “Buku Saku Tentang Pendidikan Tinggi di Indonesia,” Penerbit Informatika, (2000).
[19] J. F. Rusdi, S. Salam, N. A. Abu, T. G. Baktina, R. G. Hadiningrat, B. Sunaryo, A. Rusmartiana, W. Nashihuddin, P.
Fannya, F. Laurenty, N. Shanono, and R. Hardi, “ICT Research in Indonesia,” SciTech Framework, vol. 1, pp. 1–23,
(2019).
[20] P. Pongle and G. Chavan, “A survey: Attacks on RPL and 6LoWPAN in IoT,” in 2015 International Conference on
Pervasive Computing (ICPC), 2015, pp. 1–6.
[21] S. G. Kene and D. P. Theng, “A review on intrusion detection techniques for cloud computing and security
challenges,” in 2015 2nd International Conference on Electronics and Communication Systems (ICECS), 2015, pp.
227–232.
[22] P. D. Sawle and A. B. Gadicha, “Analysis of Malware Detection Techniques in Android,” (2014).
[23] G. A. N. Mohamed and N. B. Ithnin, “Survey on Representation Techniques for Malware Detection System,”
American Journal of Applied Sciences, vol. 14, no. 11, pp. 1049–1069, Nov. (2017), doi:
10.3844/ajassp.2017.1049.1069.
[24] V. Jyothsna and V. V. R. Prasad, “A Review of Anomaly based Intrusion Detection Systems,” International Journal
of Computer Applications, vol. 28, no. 7, (2011).
[25] A. Sari, “A Review of Anomaly Detection Systems in Cloud Networks and Survey of Cloud Security Measures in
Cloud Storage Applications,” Journal of Information Security, vol. 06, no. 02, pp. 142–154, Mar. (2015), doi:
10.4236/jis.2015.62015.
[26] N. M. Zamry, A. Zainal, and M. A. Rassam, “Unsupervised Anomaly Detection for Unlabelled Wireless Sensor
Networks Data,” (2018).
[27] S. Huda, R. Islam, J. Abawajy, J. Yearwood, M. M. Hassan, and G. Fortino, “A hybrid-multi filter-wrapper
framework to identify run-time behaviour for fast malware detection,” Future Generation Computer Systems, vol.
83, pp. 193–207, Jun. (2018), doi: 10.1016/J.FUTURE.2017.12.037.
[28] F. A. Narudin, A. Feizollah, N. B. Anuar, and A. Gani, “Evaluation of machine learning classifiers for mobile
malware detection,” Soft Computing, vol. 20, no. 1, pp. 343–357, Jan. (2016), doi: 10.1007/s00500-014-1511-6.
[29] M. Noor, H. Abbas, and W. Bin Shahid, “Countering cyber threats for industrial applications: An automated
approach for malware evasion detection and analysis,” Journal of Network and Computer Applications, vol. 103, pp.
249–261, Feb. (2018), doi: 10.1016/J.JNCA.2017.10.004.
[30] K. A. Talha, D. I. Alper, and C. Aydin, “APK Auditor: Permission-based Android malware detection system,”
Digital Investigation, vol. 13, pp. 1–14, Jun. (2015), doi: 10.1016/J.DIIN.2015.01.001.
[31] M. A. Ambusaidi, X. He, P. Nanda, and Z. Tan, “Building an Intrusion Detection System Using a Filter-Based
Feature Selection Algorithm,” IEEE Transactions on Computers, vol. 65, no. 10, pp. 2986–2998, Oct. (2016), doi:
10.1109/TC.2016.2519914.
[32] Q. K. Ali Mirza, I. Awan, and M. Younas, “CloudIntell: An intelligent malware detection system,” Future
Generation Computer Systems, vol. 86, pp. 1042–1053, Sep. (2018), doi: 10.1016/J.FUTURE.2017.07.016.
[33] F. Tong and Z. Yan, “A hybrid approach of mobile malware detection in Android,” Journal of Parallel and
Distributed Computing, vol. 103, pp. 22–31, May (2017), doi: 10.1016/J.JPDC.2016.10.012.