ICCAKM conf paper
Abstract— Web applications provide the front end through which web users and service providers can easily obtain on-demand access to web services over IP. Hence, the web repeatedly attracts attackers, who target the majority of web users from a remote end by exploiting its identity. Day by day, attackers exploit new web vulnerabilities at every stage of the web environment, including the client side, the server side, and the communication side. The literature shows that newly emerging attack vectors need to be identified and that an easily updatable detection framework is required. Therefore, this paper first explores the variants of frame jacking vulnerabilities and their severity. Second, it proposes a framework to identify these variants. The proposed framework is then analysed on different attack vectors generated and identified from standard open-source vulnerable projects. The log files generated at various stages of these vulnerable projects are scrutinized as a live dataset to test the accuracy of the developed framework, which helps train the proposed system for newly emerging attack vectors. Further, for a deeper study, the same framework has also been analysed on an existing available dataset, where it fits accurately. Validation of the framework shows that LogitBoost is more accurate on both datasets than the other classification techniques, Naïve Bayes and J48.

Keywords—Framejacking, Clickjacking, Keyboard Strokejacking, Likejacking, Cursorjacking, Web Page Redirect

I. INTRODUCTION
Web technologies and web services are among the most widely used computing resources today. A web technology represents the concept of placing computing services on the internet [1], while web services represent the computing resources shared on the internet so that web users can access them. A web application works as the graphical user interface: it is built with web technologies and provides the interface through which web users access web services easily. Because it is reachable over IP, a web user in a remote area can access computing services such as software, hardware, platform, and infrastructure.

Access over a network also has a flip side, because security loopholes can exist at every end: at the end-user side, at the server side, on the communication side, at the development side, and so on. A loophole at any of these steps is treated as a vulnerability if it can be exploited by attackers [2]. It is therefore necessary to secure these web technologies and services for web users through several kinds of security techniques. The group of methods, standards, and technologies that strengthens the reliability, integrity, and assurance of web content at every step of the web architecture is called web security [3]. Researchers have proposed numerous web security approaches for detecting the ever-increasing number of web sites that spread malware via drive-by downloads. Here, a literature survey has been carried out to explore the existing work and identify its limitations.

The rest of this paper is organized as follows. Section II describes the related work on framejacking vulnerabilities in web applications. Section III presents various kinds of web attacks. Framejacking and website malware are discussed in Section IV. The proposed work is given in Section V. Performance evaluation of our technique is given in Section VI. Section VII concludes this research work.

II. LITERATURE REVIEW
Wang, Z. et al. [4] presented VulAware, a distributed framework that detects remote vulnerabilities automatically. This framework is faster and more robust than state-of-the-art techniques, and because of its distributed deployment it can handle the failure of any individual node. One of its weaknesses is that it can only detect already known vulnerabilities present in its database; a future direction is the automatic discovery of web application vulnerabilities. Fang, Y. et al. [5] devised the DarkHunter framework, which classifies the varied fingerprints of automated tools. Its objective is to use a deep learning algorithm to find and analyse the payload features of different tools. The authors also used Docker to create a distributed scanner data collection architecture, and the framework was evaluated on actual data. A Convolutional Neural Network was used, and experiments confirmed DarkHunter's precision rate of 94.6% with a recall rate of 95.0%.

Moustafa, N. et al. [6] introduced a framework to design a threat intelligence technique for web attacks, which involves four steps: 1) collecting web attack data by crawling websites and combining the network traffic to represent this information as feature vectors, 2) utilizing
the ARM (Association Rule Mining) algorithm to dynamically extract vital features, 3) simulating web attack data using these extracted features, and 4) using an anomaly detection methodology to create a new Outlier Gaussian Mixture (OGM) technique for detecting known and zero-day attacks. Two popular datasets, UNSW-NB15 and Web Attack, were then used to evaluate the scheme. Evaluations of false alarm rates and detection rates on the original and the simulated web data showed that the proposed technique performs better than four other machine learning mechanisms. Clincy, V. et al. [7] discussed the crucial role of a Web Application Firewall (WAF) and outlined a summary of traffic filtering models based on positive and negative policy-based attack detection. The authors concluded that the default configuration of a web server may not eliminate vulnerabilities, and that this should be the focus of the firewall. Future research would concentrate on comparing WAFs and their default security configurations, and on the most suitable mitigation techniques.

Appelt, D. et al. [8] created ML-Driven, a technique to automatically analyze web application vulnerabilities to SQL injection attacks, and used the ModSecurity WAF for further analysis. The technique is based on machine learning: it learns new attack patterns by checking whether previously generated attacks were blocked or bypassed by the WAF. The results show that ML-Driven is useful for generating SQL injection attacks that may bypass a web application firewall. Betarte, G. et al. [9] proposed an integrated technique to improve the detection capability and accuracy of the ModSecurity Web Application Firewall using n-gram analysis and one-class classification. Its results are better than those of ModSecurity set up with the OWASP Core Rule Set. As future research, the authors proposed developing automated tools to train the system on new datasets.

Jain, T. et al. [10] proposed integrating two approaches to facilitate the detection and mitigation of a variety of vulnerabilities. The first is discovery through web application vulnerability scanners combined in the same framework; the second is customization of configuration rules in the ModSecurity WAF to mitigate the detected vulnerabilities. Kononov, D. et al. [11] proposed a security access control model for web applications. It is a path-based security model that extends the original RBAC model and is based on the analysis of prohibited data flows from high to low security levels. The authors outlined a step-by-step procedure for applying the proposed model to real-world web application scenarios.

Z. Guojun et al. [12] designed and developed an intelligent and dynamic web crawler that stores data extraction rules in a database and loads them dynamically according to the target detection needs. It uses the TF-IDF method to calculate rule relevance. The authors proposed this web crawler to detect public vulnerabilities and create awareness of their attack vectors. Anis, A., et al. [13] proposed a system that could assist developers in applying appropriate security processes at the client side through the use of top secure coding practices. Another aim of this initiative is to ensure client-side code integrity at runtime. In this system, web applications would be secured against Cross Site Scripting, SQL injection, and resource alteration attacks. The policy would comprise a content security policy, the principle of least privilege, subresource integrity, and input-output sanitization. The attack prevention rate of this work showed an average increase of only 3.3 percent, and code tampering attacks were reported by the integrity verification module.

Liu, X. et al. [14] developed OwlEye, a hybrid attack detection sensor designed to protect against web-layer code injection attacks such as XSS and SQL injection. It uses a Hidden Markov Model (HMM) based bi-directory scoring architecture. The HMM-based model makes it dynamic, saving time with fewer errors, and both malicious and benign traffic are used in model training. At present, this research mainly concentrates on detecting query strings; in future, an additional section is proposed to detect headers, improving the outcomes by detecting zero-day attacks. Thomé, J. et al. [15] proposed a technique to detect injection vulnerabilities in the server-side code of Java web applications. It integrates security slicing with hybrid constraint solving, which combines automata-based solving with meta-heuristic search. Static analysis was used to retrieve minimal program slices relevant to the web program's security and to generate attack scenarios. Hybrid constraint solving was then applied to assess the attack conditions and detect vulnerabilities. Based on the proposed work, the authors implemented the JOACO tool. Future work would focus on extending vulnerability detection to popular Java web frameworks such as "Spring" and on using dynamic symbolic execution for further enhancements.

Stasinopoulos, A. et al. [16] developed an open-source tool, COMMand Injection eXploiter (Commix), that automatically detects command injection flaws in web applications. It can automatically generate attack vectors and detect and exploit vulnerabilities. It handles a wide range of exploitation scenarios, such as system and user enumeration, custom headers, different authentication mechanisms, attack vectors produced by programming languages, and Tor networking. It has also found many zero-day command injection vulnerabilities, and it comes pre-installed in several security-oriented operating systems such as Kali Linux. D. Mitropoulos et al. [17] introduced a model to investigate various defence strategies against code injection attacks. This model pointed out the major flaws that allow these attacks to prevail and thus offered a common perception for further study of the available and known defences. Various factors, namely performance, accuracy, security, deployment, and availability, were considered to classify and study the defences. The results showed that many defence mechanisms were flawed and had been poorly tested. The findings concluded that the requisite know-how of how these strategies work could easily be exploited by attackers.

Hassan, M. M. et al. [18] developed SAISAN, an automated LFI (Local File Inclusion) vulnerability detection model, and also implemented a tool based on it. The authors' experimental results indicate that the tool, tested on 265 real-world applications, identified 113 vulnerabilities and was 88% accurate compared with the outcome of manual penetration testing. The major issues found in this study are insecure application design and careless coding practices, especially in information retrieval techniques. Future work would improve accuracy, add the $_POST method for detecting LFI vulnerabilities in web applications, and add features to detect many other web application vulnerabilities.

IV. FRAMEJACKING: WEBSITE MALWARES
Website malware can be defined as snippets or small pieces of code that are injected, tampered with, or distributed over the web in order to intentionally cause harm or subvert the intended function of a web system. Millions of malicious URLs are used as distribution channels to propagate malware all over the web. One category of website malware is framejacking.

Framejacking overlays multiple transparent or opaque frames to trick a user into clicking a button or link on another page. Variants of framejacking are clickjacking, likejacking, keyboard "strokejacking", cursorjacking, and web page redirect, as shown in Figure 2.

Fig. 2. Possible Variants of Frame Jacking Attack

C. Keyboard "strokejacking"
In keyboard strokejacking, the attacker emulates a focused input field while the targeted input field actually receives the user's focus. Because the keyboard focus moves to the target element, the user is misled into typing vital data into the target element, as shown in Figure 3.

[Figure: the attacker's page shows a "Typing Game" ("Type whatever screen shows to you", followed by the string Xfpog95403poigr06=2kfpx); a hidden iframe within the attacker's page holds a "Bank Transfer" form with Bank Account and Amount (USD) fields and a Transfer button.]
Fig. 3. Keyboard Strokejacking
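The overlay in Figure 3 depends on a frame that stays on top of the page and keeps keyboard focus while remaining invisible. The following is a minimal sketch, not the log-based framework proposed in Section V. It uses Python's standard-library html.parser module to flag <iframe> elements whose inline CSS opacity falls below a threshold. The class and function names, the 0.1 threshold and the example URLs are hypothetical. Frames made transparent through external stylesheets or scripts are not detected.

```python
from html.parser import HTMLParser


class TransparentFrameFinder(HTMLParser):
    """Collect the src of <iframe> tags whose inline CSS opacity is below a threshold.

    Hypothetical, illustrative heuristic: it inspects only the inline `style`
    attribute, so frames hidden via external stylesheets or JavaScript are missed.
    """

    def __init__(self, threshold=0.1):
        super().__init__()
        self.threshold = threshold
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "iframe":
            return
        attributes = dict(attrs)
        style = attributes.get("style") or ""
        for declaration in style.split(";"):
            name, _, value = declaration.partition(":")
            if name.strip().lower() != "opacity":
                continue
            value = value.strip()
            try:
                # CSS accepts opacity as a number (0.0 to 1.0) or a percentage.
                opacity = float(value[:-1]) / 100 if value.endswith("%") else float(value)
            except ValueError:
                continue  # keywords such as "inherit" are not evaluated
            if opacity < self.threshold:
                # A nearly transparent frame is invisible but still receives
                # clicks and keyboard focus: the overlay pattern of Fig. 3.
                self.suspicious.append(attributes.get("src"))
            break


def find_transparent_iframes(html, threshold=0.1):
    """Return the src values of iframes that look like hidden overlays."""
    finder = TransparentFrameFinder(threshold)
    finder.feed(html)
    finder.close()
    return finder.suspicious


if __name__ == "__main__":
    # Hypothetical attacker page, loosely modelled on Fig. 3.
    page = """
    <h1>Typing Game</h1>
    <input id="game" autofocus>
    <iframe src="https://bank.example/transfer"
            style="position:absolute; top:0; opacity:0; z-index:2"></iframe>
    <iframe src="https://ads.example/banner" style="opacity: 1"></iframe>
    """
    print(find_transparent_iframes(page))  # ['https://bank.example/transfer']
```

Checking only inline styles keeps the sketch short. A practical scanner would also need the computed styles after scripts have run, because an attacker can set the opacity from JavaScript.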
In this attack, the typing cursor is visible in one text field, but when the user starts typing, the keystrokes are entered into a fake text field. That field may appear at the cursor position instantly, may be hidden, or may work in the background. Such attacks are referred to as keystroke-jacking.

D. Cursorjacking
Another type is cursorjacking, in which the attacker simulates multiple cursors in parallel with the visible original cursor. Using the CSS cursor property or JavaScript, the attacker displays a replica cursor icon alongside the original one to fool users, as shown in Figure 4.

[Figure: CSS declaration cursor: none]
Fig. 4. Cursorjacking: Fake Cursor

E. Web Page Redirect
The end user intends to click a 'click here' hyperlink to visit the intended website. However, the attacker has tampered with this button and split it into two separate hyperlinks, as shown in Figure 5.

Fig. 5. Snapshot of Redirection Attack

If the end user clicks the part-1 button, it takes him to the original website, whereas clicking the part-2 button unintentionally redirects him to a fake website chosen by the attacker.

All the discussed attack vectors exploit web users or server machines through vulnerabilities in the technology stack, such as the development language, or through the forgiving nature of human beings. A server machine, however, records every executed process as evidence in the registry, log files, history files, etc. This evidence can therefore be a source for attack detection.

V. PROPOSED WORK
Web technologies and services form a highly programmable environment that provides a front end for web users to work from a remote end. This environment facilitates dynamic updates and deployment on a diverse range of new technologies and services. The core components of web communication are the client-side browser and the application environment. These web applications and environments can be exploited by leveraging web vulnerabilities in both the code and the environment.

The proposed framework consists of six phases, shown in Figure 6: (1) formation of the Log File Repository, (2) Pre-processing of the Log File, (3) Fine-grained Log Feature Selection from the coarse-grained feature set, (4) Feature Modelling of the selected features, (5) Classification, and (6) Abnormal Behaviour Analysis based on the extracted rules.

[Figure: framework blocks labelled Repository of Log File, Log File Preprocessing, and Fine-grained Log Feature Selection]

VI. EXHAUSTIVE PERFORMANCE EVALUATION
A comprehensive comparative analysis of results has been carried out to validate the developed framework. To this end, both a Live Dataset (LD) and a Standard Dataset (SD) are used to train and test the proposed system.

First, to generate the Live Dataset (LD), vulnerable web application projects are configured on an Apache web server. The vulnerabilities configured in these projects are then exploited in a client-server environment. These experimental exploitations generate different log files for benign and malicious requests and replies, and the generated log files are extracted for vulnerability analysis as the Live Dataset (LD).

Second, existing standard datasets (SD) such as Honeynet and secrepo have been explored. They contain approximately six lakh (600,000) log entries with a file size of approximately 1.3 GB, from which the first 78665 log entries are selected and preprocessed for further analysis.

For cross-validation, the 1st to 4th subsets are initially used for training according to the proposed algebraic methods, and the 5th subset, which contains 15733 instances of a log file, is used for testing. The 5th subset has 1000 malware instances and 14733 benign instances. After validation, the class labels returned are 914 malware and 14401 benign.

The results of the proposed work have been analysed and evaluated using measures such as accuracy, False Positive Rate (FPR), and False Negative Rate (FNR), which are convenient for comparative analysis with other systems. The overall results, in percentage, are compared and presented in Table 1.

TABLE 1. COMPARISON TABLE OF OBTAINED RESULTS IN PERCENTAGE
Result (in percentage); SD = Standard Dataset, LD = Live Dataset

Parameters   Naïve Bayes     J48             LogitBoost
             SD      LD      SD      LD      SD      LD
Accuracy     94.59   93.74   97.51   96.89   98.04   97.85
TPR          94.64   94.12   97.52   96.74   99.42   97.64
FPR          8.91    9.91    5.13    5.33    5.09    5
Precision    94.62   95.32   97.46   97.68   97.78   98.22
Recall       94.66   93.22   96.84   96.21   99.42   97.64

The precision for the standard and live datasets is compared in Figure 7.

[Figure: "Precision Graph", precision in percentage (y-axis) for the Standard Dataset and Live Dataset across the classifier techniques Naïve Bayes, J48, and LogitBoost (x-axis)]
Fig. 7. Precision Curve to Evaluate the Proposed Framework

The recall percentage for the standard and the prepared (live) datasets is compared in Figure 8.

Fig. 8. Recall Curve to Evaluate the Proposed Framework

Further, the F1-score has been calculated for a more accurate comparison of the results of the proposed framework. The F1-score represents the balance between precision and recall; it is their harmonic mean, F1 = 2 × Precision × Recall / (Precision + Recall). The F1-score of the proposed framework on both datasets, LD and SD, is shown in Figure 9.

[Figure: "Comparison through F1-Score", F1-score on the y-axis, with a Standard Dataset series]

Here, the F1-score of the LogitBoost classifier on LD and SD is 0.979291412 and 0.985931805, respectively. These are the highest values, that is, the closest to 1, compared with the Naïve Bayes and J48 classifiers, which shows that LogitBoost is more stable than the other classifiers compared.

The overall comparison of results in the table and graphs shows that the LogitBoost classifier provides better results for the developed framework than the J48 and Naïve Bayes classifiers. It also shows that all parameters of the proposed framework with LogitBoost, namely TPR, FPR, precision, recall, and accuracy, are better than those obtained with the other machine learning classifiers compared, i.e. J48 and Naïve Bayes.

VII. CONCLUSION
Web applications provide the front end for web users and facilitate access to web services. They use client-side and server-side technology stacks to communicate through IP. On the flip side, they also open a door for attackers to launch attacks on the victim's system. To demonstrate the severity of the security threats that web application attacks may pose, web attack scenarios have been developed and implemented.

The proposed framework detects web vulnerabilities including framejacking, clickjacking, keyboard "strokejacking", likejacking, cursorjacking, web page redirect, and zero-day vulnerabilities. A few of the main benefits of the developed detection engine are as follows:

A static code scanner that does not interact with humans cannot protect users from vulnerabilities such as code injection, whereas the detection engine can protect novice end users from newly encountered vulnerabilities and automatically report them to an expert team for future enhancements.

Machine learning methods add extra capability to the detection engine for web application vulnerability analysis and security.

The engine could be transformed from a software product working at the application layer into a hardware product.

The detection engine bounds the system to work within security limits and reports miscellaneous activities to the experts.

A few improvements to the current work on web application analysis may be addressed in the future. First, ontology-based analysis of web application security could focus on a set of concepts and the relations between these concepts. Second, preserving the privacy of end-user data is an important issue in analysing web application attacks. Hence, a comprehensive strategy should be developed to preserve the privacy of end users by