
DEA-RNN: A Hybrid Deep Learning Approach

for Cyberbullying Detection in Twitter


Social Media Platform

A PROJECT REPORT

submitted in partial fulfillment of the requirements for the award of the degree

of

BACHELOR OF TECHNOLOGY

in

INFORMATION TECHNOLOGY

MAY 2023

BONAFIDE CERTIFICATE

TABLE OF CONTENTS

CHAPTER No  TITLE  PAGE No

LIST OF FIGURES  vi

LIST OF ABBREVIATIONS  vii

1 INTRODUCTION

1.1 About the Project

2 SYSTEM ANALYSIS

2.1 Existing System

2.2 Proposed System

3 REQUIREMENTS SPECIFICATION

3.1 Introduction

3.2 Hardware and Software Specification

3.3 Technologies Used

3.3.1 Python

3.3.1.1 Introduction to Python

3.3.1.2 Working of Python

3.3.2 Machine Learning

4.5 Design and Implementation Constraints

4.6 Other Nonfunctional Requirements

5 SYSTEM DESIGN

5.1 Architecture Diagram

6 SYSTEM DESIGN – DETAILED

6.1 Modules

6.2 Module Explanation

7 CODING AND TESTING

7.1 Coding

7.2 Coding Standards

7.3 Test Procedure

7.4 Test Data and Output

SOURCE CODE

SNAPSHOTS

REFERENCES

LIST OF FIGURES

5.1 Architecture Diagram

5.2 Sequence Diagram

5.3 Use Case Diagram

5.4 Activity Diagram

5.5 Collaboration Diagram

LIST OF ABBREVIATIONS
CNN Convolutional Neural Network

LCNN Lookup based Convolutional Neural Network

RNN Recurrent Neural Network

DEA Dolphin Echolocation Algorithm

Bi-LSTM Bi-directional Long Short Term Memory

SVM Support Vector Machine

MNB Multinomial Naive Bayes

RF Random Forest

CB Cyberbullying

ML Machine Learning

DEX Dalvik Executables

TCP Transmission Control Protocol

IP Internet Protocol

HTTP Hyper Text Transfer Protocol

ADT Android Development Tool
ABSTRACT

Cyberbullying (CB) has become increasingly prevalent on social media platforms.
With the popularity and widespread use of social media by individuals of all ages,
it is vital to make social media platforms safer from cyberbullying. This paper
presents a hybrid deep learning model, called DEA-RNN, to detect CB on the Twitter
social media network. The proposed DEA-RNN model combines Elman-type
Recurrent Neural Networks (RNN) with an optimized Dolphin Echolocation
Algorithm (DEA) for fine-tuning the Elman RNN's parameters and reducing
training time. We evaluated DEA-RNN thoroughly using a dataset of 10,000
tweets and compared its performance to that of state-of-the-art algorithms such as
Bi-directional Long Short Term Memory (Bi-LSTM), RNN, SVM, Multinomial
Naive Bayes (MNB), and Random Forests (RF). The experimental results show that
DEA-RNN was superior in all scenarios: it outperformed the existing approaches
considered in detecting CB on the Twitter platform. DEA-RNN was most efficient in
scenario 3, where it achieved an average of 90.45% accuracy, 89.52% precision,
88.98% recall, 89.25% F1-score, and 90.94% specificity.
CHAPTER 1

INTRODUCTION

Aim:

Cyberbullying (CB) has become increasingly prevalent on social media platforms.
With the popularity and widespread use of social media by individuals of all ages, it is vital to
make social media platforms safer from cyberbullying. This project presents a hybrid deep
learning model, called DEA-RNN, to detect CB on the Twitter social media network.

Synopsis:

The proposed DEA-RNN model combines Elman-type Recurrent Neural Networks (RNN)
with an optimized Dolphin Echolocation Algorithm (DEA) for fine-tuning the Elman RNN's
parameters and reducing training time. We evaluated DEA-RNN thoroughly using a dataset of
10,000 tweets and compared its performance to that of state-of-the-art algorithms such as
Bi-directional Long Short Term Memory (Bi-LSTM), RNN, SVM, Multinomial Naive Bayes
(MNB), and Random Forests (RF). The experimental results show that DEA-RNN was
superior in all scenarios: it outperformed the existing approaches considered in detecting CB
on the Twitter platform. DEA-RNN was most efficient in scenario 3, where it achieved an
average of 90.45% accuracy, 89.52% precision, 88.98% recall, 89.25% F1-score, and 90.94%
specificity.
CHAPTER 2

SYSTEM ANALYSIS

2.1 EXISTING SYSTEM

Cyberbullying detection on the Twitter platform has largely been pursued
through tweet classification and, to a certain extent, through topic modeling approaches. Text
classification based on supervised machine learning (ML) models is commonly used for
classifying tweets into bullying and non-bullying tweets. Supervised classifiers perform poorly
when the class labels are fixed and not relevant to new events. Such a classifier may also be
suitable only for a pre-determined collection of events, and cannot successfully handle tweets
whose topics change dynamically. Topic modeling approaches have long been utilized as the
medium to extract the vital topics from a set of data to form the patterns or classes in the
complete dataset. Although the concept is similar, general unsupervised topic models are not
efficient for short texts, and hence specialized unsupervised short-text topic models were
employed. These models effectively identify the trending topics in tweets and extract them for
further processing, and they help in leveraging bidirectional processing to extract meaningful
topics. However, these unsupervised models require extensive training to obtain sufficient
prior knowledge, which is not available in all cases. Considering these limitations, an efficient
tweet classification approach must be developed to bridge the gap between the classifier and
the topic model so that adaptability is significantly improved.

Problem Definition:

Supervised classifiers perform poorly when the class labels are fixed and not relevant to
new events. They may also be suitable only for a pre-determined collection of events, and
cannot successfully handle tweets whose topics change dynamically.

2.2 PROPOSED SYSTEM
In this model, we propose a hybrid deep learning-based approach, called DEA-RNN,
which automatically detects bullying in tweets. The DEA-RNN approach combines
Elman-type Recurrent Neural Networks (RNN) with an improved Dolphin Echolocation
Algorithm (DEA) for fine-tuning the Elman RNN's parameters. DEA-RNN can handle the
dynamic nature of short texts and can work with topic models for the effective extraction of
trending topics. DEA-RNN outperformed the existing approaches considered in detecting
cyberbullying on the Twitter platform in all scenarios and on various evaluation metrics.

Advantages:

 We propose DEA-RNN by combining the Elman-type RNN and the improved DEA for
optimal classification of tweets.
 A new Twitter dataset is collected based on cyberbullying keywords for evaluating the
performance of DEA-RNN and the existing methods.
 The efficiency of DEA-RNN in recognizing and classifying cyberbullying tweets is
assessed using Twitter datasets. The thorough experimental results reveal that DEA-RNN
outperforms other competing models in terms of recall, precision, accuracy, F1-score, and
specificity.

CHAPTER 3

REQUIREMENT SPECIFICATIONS

3.1 INTRODUCTION

Social media networks such as Facebook, Twitter, Flickr, and Instagram have
become the preferred online platforms for interaction and socialization among
people of all ages. While these platforms enable people to communicate and
interact in previously unthinkable ways, they have also led to malevolent activities
such as cyberbullying. Cyberbullying is a type of psychological abuse with a
significant impact on society. Cyberbullying events have been increasing mostly
among young people who spend most of their time navigating between different
social media platforms. In particular, social media networks such as Twitter and
Facebook are prone to CB because of their popularity and the anonymity that the
Internet provides to abusers. In India, for example, 14 percent of all harassment
occurs on Facebook and Twitter, with 37 percent of these incidents involving
youngsters [1]. Moreover, cyberbullying might lead to serious mental issues and
adverse mental health effects. Many suicides are attributed to the anxiety, depression,
stress, and social and emotional difficulties caused by cyberbullying events [2]-[4].
This motivates the need for an approach to identify cyberbullying in social media
messages (e.g., posts, tweets, and comments). In this project, we mainly focus on
the problem of cyberbullying detection on the Twitter platform. As cyberbullying
is becoming a prevalent problem on Twitter, detecting cyberbullying events in
tweets and provisioning preventive measures are the primary tasks in battling
cyberbullying threats [5]. Therefore, there is a greater need to increase research
on social network-based CB in order to gain greater insight and to aid the
development of effective tools and approaches to combat the cyberbullying
problem [6]. Manually monitoring and controlling cyberbullying on the Twitter
platform is virtually impossible [7]. Furthermore, mining social media messages
for cyberbullying detection is quite difficult. For example, Twitter messages are
often brief, full of slang, and may include emojis and GIFs, which makes it
impossible to deduce individuals' intentions and meanings purely from social
media messages. Moreover, bullying can be difficult to detect if the bully uses
strategies like sarcasm or passive-aggressiveness to conceal it.

3.2 HARDWARE AND SOFTWARE SPECIFICATION

3.2.1 HARDWARE REQUIREMENTS

 Hard Disk : 500 GB and above

 RAM : 4 GB and above
 Processor : Intel i3 and above
 Webcam : 1

3.2.2 SOFTWARE REQUIREMENTS

 Operating System : Windows 10 (64-bit)

 Software : Python
 Tools : Anaconda

3.3 TECHNOLOGIES USED


 Python

 Machine Learning

3.3.1 Python

Python is a widely used general-purpose, high-level programming language. It was

initially designed by Guido van Rossum, first released in 1991, and is developed by the Python

Software Foundation. It was designed with an emphasis on code readability, and its syntax

allows programmers to express concepts in fewer lines of code.

Python is a programming language that lets you work quickly and integrate systems more

efficiently.

It is used for:

 web development (server-side),
 software development,
 mathematics,
 system scripting.

What can Python do?

 Python can be used on a server to create web applications.


 Python can be used alongside software to create workflows.
 Python can connect to database systems. It can also read and modify files.
 Python can be used to handle big data and perform complex mathematics.
 Python can be used for rapid prototyping, or for production-ready software development.

Why Python?

 Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
 Python has a simple syntax similar to the English language.
 Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
 Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
 Python can be treated in a procedural way, an object-oriented way or a functional way.

Good to know

 The most recent major version of Python is Python 3, which we shall be using in this
project. Python 2 is no longer maintained: an official End of Life date of January 1,
2020 was established for it, after which it stopped receiving updates, including
security updates.
 Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases until
December 2008. At that time, the development team decided to release version
3.0, which contained a few relatively small but significant changes that were not
backward compatible with the 2.x versions. Python 2 and 3 are very similar, and some
features of Python 3 have been backported to Python 2, but in general they remain not
quite compatible.
 Python is maintained by a core development team. Guido van Rossum, the language's
creator, was given the title of BDFL (Benevolent Dictator For Life) by the Python
community, though he stepped down from that role in 2018. The name Python, by the
way, derives not from the snake, but from the British comedy troupe Monty Python's
Flying Circus, of which Guido is a fan. It is common to find references to Monty
Python sketches and movies scattered throughout the Python documentation.
 It is possible to write Python in an Integrated Development Environment, such as
Thonny, PyCharm, NetBeans or Eclipse, which are particularly useful when managing
larger collections of Python files.

Python Syntax compared to other programming languages

 Python was designed for readability, and has some similarities to the English language
with influence from mathematics.
 Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
 Python relies on indentation, using whitespace, to define scope; such as the scope of
loops, functions and classes. Other programming languages often use curly-brackets for
this purpose.
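These rules can be seen in a small illustrative function, where indentation alone delimits the function body, the loop, and the conditional, and each statement ends at the newline:

```python
# Indentation defines scope: the function body, the loop body and the
# conditional are delimited by whitespace, and newlines end statements.
def count_long_words(words, min_length=5):
    count = 0
    for word in words:
        if len(word) >= min_length:
            count += 1
    return count

print(count_long_words(["tweet", "cyberbullying", "ml"]))  # prints 2
```

Note that no braces or semicolons appear anywhere: un-indenting is what closes the `if`, the `for`, and the function.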

Python is Interpreted

 Many languages are compiled, meaning the source code you create needs to be translated
into machine code, the language of your computer’s processor, before it can be run.
Programs written in an interpreted language are passed straight to an interpreter that runs
them directly.

 This makes for a quicker development cycle because you just type in your code and run
it, without the intermediate compilation step.
 One potential downside to interpreted languages is execution speed. Programs that are
compiled into the native language of the computer processor tend to run more quickly
than interpreted programs. For some applications that are particularly computationally
intensive, like graphics processing or intense number crunching, this can be limiting.
 In practice, however, for most programs, the difference in execution speed is measured in
milliseconds, or seconds at most, and not appreciably noticeable to a human user. The
expediency of coding in an interpreted language is typically worth it for most
applications.
 For all its syntactical simplicity, Python supports most constructs that would be expected
in a very high-level language, including complex dynamic data types, structured and
functional programming, and object-oriented programming.
 Additionally, a very extensive library of classes and functions is available that provides
capability well beyond what is built into the language, such as database manipulation or
GUI programming.
 Python accomplishes what many programming languages don’t: the language itself is
simply designed, but it is very versatile in terms of what you can accomplish with it.

3.3.2 Machine Learning

Introduction:

Machine learning (ML) is the scientific study of algorithms and statistical


models that computer systems use to perform a specific task without using explicit
instructions, relying on patterns and inference instead. It is seen as a subset
of artificial intelligence. Machine learning algorithms build a mathematical
model based on sample data, known as "training data", in order to make
predictions or decisions without being explicitly programmed to perform the task.
Machine learning algorithms are used in a wide variety of applications, such
as email filtering and computer vision, where it is difficult or infeasible to develop
a conventional algorithm for effectively performing the task.

Machine learning is closely related to computational statistics, which focuses on


making predictions using computers. The study of mathematical
optimization delivers methods, theory and application domains to the field of
machine learning. Data mining is a field of study within machine learning, and
focuses on exploratory data analysis through learning. In its application across
business problems, machine learning is also referred to as predictive analytics.

Machine learning tasks:

Machine learning tasks are classified into several broad categories. In supervised
learning, the algorithm builds a mathematical model from a set of data that
contains both the inputs and the desired outputs. For example, if the task were
determining whether an image contained a certain object, the training data for a
supervised learning algorithm would include images with and without that object
(the input), and each image would have a label (the output) designating whether it
contained the object. In special cases, the input may be only partially available, or
restricted to special feedback. Semi-supervised learning algorithms develop mathematical
models from incomplete training data, where a portion of the sample inputs doesn't have labels.

Classification algorithms and regression algorithms are types of supervised


learning. Classification algorithms are used when the outputs are restricted to
a limited set of values. For a classification algorithm that filters emails, the input
would be an incoming email, and the output would be the name of the folder in
which to file the email. For an algorithm that identifies spam emails, the output
would be the prediction of either "spam" or "not spam", represented by
the Boolean values true and false. Regression algorithms are named for their
continuous outputs, meaning they may have any value within a range. Examples of
a continuous value are the temperature, length, or price of an object.
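The distinction can be sketched with scikit-learn on made-up toy data (the single feature and its values are illustrative assumptions, not the report's dataset):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[0.1], [0.2], [0.8], [0.9]]   # one toy input feature per example

# Classification: outputs restricted to a limited set of values (0 or 1).
y_class = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[0.85]]))       # predicts the discrete class 1

# Regression: outputs are continuous, like a temperature or a price.
y_reg = [10.0, 12.0, 30.0, 32.0]
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[0.5]]))        # predicts a continuous value (about 21)
```

The same inputs drive both models; only the type of output (a class label versus a number on a continuous scale) differs.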

In unsupervised learning, the algorithm builds a mathematical model from a set of


data that contains only inputs and no desired output labels. Unsupervised learning
algorithms are used to find structure in the data, like grouping or clustering of data
points. Unsupervised learning can discover patterns in the data, and can group the
inputs into categories, as in feature learning. Dimensionality reduction is the
process of reducing the number of "features", or inputs, in a set of data.

Active learning algorithms access the desired outputs (training labels) for a limited
set of inputs based on a budget and optimize the choice of inputs for which it will
acquire training labels. When used interactively, these can be presented to a human
user for labeling. Reinforcement learning algorithms are given feedback in the
form of positive or negative reinforcement in a dynamic environment and are used
in autonomous vehicles or in learning to play a game against a human opponent.
Other specialized algorithms in machine learning include topic modeling, where
the computer program is given a set of natural language documents and finds other
documents that cover similar topics. Machine learning algorithms can be used to
find the unobservable probability density function in density
estimation problems. Meta learning algorithms learn their own inductive bias based
on previous experience. In developmental robotics, robot learning algorithms
generate their own sequences of learning experiences, also known as a curriculum,
to cumulatively acquire new skills through self-guided exploration and social
interaction with humans. These robots use guidance mechanisms such as active
learning, maturation, motor synergies, and imitation.

Types of learning algorithms:

The types of machine learning algorithms differ in their approach, the type of data
they input and output, and the type of task or problem that they are intended to
solve.

Supervised learning:
Supervised learning algorithms build a mathematical model of a set of data that
contains both the inputs and the desired outputs. The data is known as training
data, and consists of a set of training examples. Each training example has one or
more inputs and the desired output, also known as a supervisory signal. In the
mathematical model, each training example is represented by an array or vector,
sometimes called a feature vector, and the training data is represented by a matrix.
Through iterative optimization of an objective function, supervised learning
algorithms learn a function that can be used to predict the output associated with
new inputs. An optimal function will allow the algorithm to correctly determine the
output for inputs that were not a part of the training data. An algorithm that
improves the accuracy of its outputs or predictions over time is said to have
learned to perform that task.
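The representation described above, feature vectors stacked into a matrix with one supervisory signal per example, looks like this in a small NumPy sketch (the two features and the labels are made up for illustration):

```python
import numpy as np

# Rows are training examples (feature vectors), columns are features.
X = np.array([
    [3, 120],   # hypothetical features: [offensive_word_count, tweet_length]
    [0,  80],
    [5, 140],
])
# The supervisory signal: one desired output per training example.
y = np.array([1, 0, 1])   # 1 = bullying, 0 = not bullying

print(X.shape)   # (3, 2): 3 training examples, 2 features each
print(X[0])      # the feature vector of the first training example
```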

Supervised learning algorithms include classification and regression. Classification


algorithms are used when the outputs are restricted to a limited set of values, and
regression algorithms are used when the outputs may have any numerical value
within a range. Similarity learning is an area of supervised machine learning
closely related to regression and classification, but the goal is to learn from
examples using a similarity function that measures how similar or related two
objects are. It has applications in ranking, recommendation systems, visual identity
tracking, face verification, and speaker verification.

In the case of semi-supervised learning algorithms, some of the training examples


are missing training labels, but they can nevertheless be used to improve the
quality of a model. In weakly supervised learning, the training labels are noisy,
limited, or imprecise; however, these labels are often cheaper to obtain, resulting in
larger effective training sets.

Unsupervised learning:
Unsupervised learning algorithms take a set of data that contains only inputs, and
find structure in the data, like grouping or clustering of data points. The
algorithms, therefore, learn from test data that has not been labeled, classified or
categorized. Instead of responding to feedback, unsupervised learning algorithms
identify commonalities in the data and react based on the presence or absence of
such commonalities in each new piece of data. A central application of
unsupervised learning is in the field of density estimation in statistics, though
unsupervised learning encompasses other domains involving summarizing and
explaining data features.

Cluster analysis is the assignment of a set of observations into subsets


(called clusters) so that observations within the same cluster are similar according
to one or more predesignated criteria, while observations drawn from different
clusters are dissimilar. Different clustering techniques make different assumptions
on the structure of the data, often defined by some similarity metric and evaluated,
for example, by internal compactness, or the similarity between members of the
same cluster, and separation, the difference between clusters. Other methods are
based on estimated density and graph connectivity.
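A minimal clustering sketch, using scikit-learn's k-means on made-up 2-D points, shows grouping by internal compactness:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two visually obvious groups of unlabeled points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.8, 8.2]])

# k-means assigns each point to one of two clusters by minimizing the
# distance to the cluster centre (internal compactness).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # points in the same group share a cluster label
```

No labels were supplied: the algorithm recovered the two groups purely from the structure of the inputs.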

Semi-supervised learning:

Semi-supervised learning falls between unsupervised learning (without any labeled


training data) and supervised learning (with completely labeled training data).
Many machine-learning researchers have found that unlabeled data, when used in
conjunction with a small amount of labeled data, can produce a considerable
improvement in learning accuracy.
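A small sketch with scikit-learn's label propagation (the toy 1-D data is an assumption; -1 marks the unlabeled examples) shows two labeled points being enough to label their unlabeled neighbours:

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0], [1.1], [0.9], [8.0], [8.1], [7.9]])
y = np.array([0, -1, -1, 1, -1, -1])   # only two examples carry labels

# Label propagation spreads the known labels to nearby unlabeled points.
model = LabelPropagation().fit(X, y)
print(model.transduction_)   # labels inferred for every example
```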

CHAPTER 4

4.5 Design and Implementation Constraints

4.5.1 Constraints in Analysis


 Constraints as Informal Text

 Constraints as Operational Restrictions

 Constraints Integrated in Existing Model Concepts

 Constraints as a Separate Concept

 Constraints Implied by the Model Structure

4.5.2 Constraints in Design

 Determination of the Involved Classes

 Determination of the Involved Objects

 Determination of the Involved Actions

 Determination of the Require Clauses

 Global actions and Constraint Realization

4.5.3 Constraints in Implementation

A hierarchical structuring of relations may result in more classes and a more


complicated structure to implement. Therefore it is advisable to transform the hierarchical
relation structure to a simpler structure such as a classical flat one. It is rather
straightforward to transform the developed hierarchical model into a bipartite, flat model,
consisting of classes on the one hand and flat relations on the other. Flat relations are
preferred at the design level for reasons of simplicity and implementation ease. There is
no identity or functionality associated with a flat relation. A flat relation corresponds with
the relation concept of entity-relationship modeling and many object oriented methods.

4.6 Other Nonfunctional Requirements

4.6.1 Performance Requirements

The application controls and communicates with the following main components.

 Client Tier: an embedded browser in charge of navigation and of accessing the web
service.

 Server Tier: the server side contains the main parts of the functionality of the proposed
architecture. The components at this tier are the following: Web Server, Security Module,
Server-Side Capturing Engine, Preprocessing Engine, Database System, Verification Engine,
and Output Module.

4.6.2 Safety Requirements

1. The software may be safety-critical. If so, there are issues associated with its integrity level

2. The software may not be safety-critical although it forms part of a safety-critical system. For

example, software may simply log transactions.

3. If a system must be of a high integrity level and if the software is shown to be of that

integrity level, then the hardware must be at least of the same integrity level.

4. There is little point in producing 'perfect' code in some language if hardware and system

software (in widest sense) are not reliable.

5. If a computer system is to run software of a high integrity level then that system should not at

the same time accommodate software of a lower integrity level.

6. Systems with different requirements for safety levels must be separated.

7. Otherwise, the highest level of integrity required must be applied to all systems in the same

environment.

CHAPTER 5

5.1 Architecture Diagram:

(Figure: Data Collection, Data Cleaning & Preprocessing, Feature Extraction & Feature
Selection, Algorithm Implementation, Model Creation, Performance Evaluation, Prediction)

Fig: 5.1 Architecture Diagram
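The flow of Fig. 5.1 can be sketched end-to-end with scikit-learn; the TF-IDF features, the linear SVM standing in for the report's DEA-RNN model, and the four toy tweets are all illustrative assumptions:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy stand-in for the data collection step (the report collects real tweets).
tweets = ["you are great", "you are awful and stupid",
          "nice work", "shut up loser"]
labels = [0, 1, 0, 1]   # 1 = bullying, 0 = not bullying

pipe = Pipeline([
    ("features", TfidfVectorizer()),   # feature extraction & selection
    ("model", LinearSVC()),            # algorithm implementation
])
pipe.fit(tweets, labels)               # model creation
print(pipe.predict(["you are awful"])) # prediction
```

Each pipeline stage corresponds to one box in the diagram; swapping the final estimator is how a different algorithm would be evaluated in the same flow.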

5.2 Sequence Diagram:

A sequence diagram is a kind of interaction diagram that shows how processes operate
with one another and in what order. It is a construct of a Message Sequence Chart. Sequence
diagrams are sometimes called event diagrams, event scenarios, or timing diagrams.

(Figure: sequence diagram between User, WhatsApp and Supervisor: data is extracted and
split into training and testing data, ML techniques are applied and validated, and the
cyberbullying detection result is returned to the user)
5.3 Use Case Diagram:

Unified Modeling Language (UML) is a standardized general-purpose modeling


language in the field of software engineering. The standard is managed and was created by the
Object Management Group. UML includes a set of graphic notation techniques to create visual
models of software intensive systems. This language is used to specify, visualize, modify,
construct and document the artifacts of an object oriented software intensive system under
development.

5.3.1. USECASE DIAGRAM


A Use case Diagram is used to present a graphical overview of the functionality provided
by a system in terms of actors, their goals and any dependencies between those use cases.

Use case diagram consists of two parts:

Use case: A use case describes a sequence of actions that provides something of measurable
value to an actor, and is drawn as a horizontal ellipse.

Actor: An actor is a person, organization or external system that plays a role in one or more
interactions with the system.

(Figure: use case diagram with actor WhatsApp and use cases Get Dataset, Extracted
Dataset, Apply Machine Learning Techniques, Validation, and Cyberbullying Detection)
5.4 Activity Diagram:

Activity diagram is a graphical representation of workflows of stepwise activities and


actions with support for choice, iteration and concurrency. An activity diagram shows the overall
flow of control.

The most important shape types:

 Rounded rectangles represent activities.


 Diamonds represent decisions.
 Bars represent the start or end of concurrent activities.
 A black circle represents the start of the workflow.
 An encircled circle represents the end of the workflow.

(Figure: activity diagram: Dataset, Data Extraction, Machine Learning, Model Creation,
User (WhatsApp), Result)
5.5 Collaboration Diagram:
UML Collaboration Diagrams illustrate the relationship and interaction between

software objects. They require use cases, system operation contracts and domain model to

already exist. The collaboration diagram illustrates messages being sent between classes and

objects.

(Figure: collaboration diagram: the dataset is collected from the user and extracted, the
training dataset is passed to machine learning, and the cyberbullying detection output is
shown to the user through WhatsApp)
CHAPTER 6

6.1 MODULES

 Dataset Collection
 Data Cleaning & Preprocessing
 Algorithm Implementation
 Prediction

6.2 MODULE EXPLANATION:

6.2.1 Dataset Collection:

Collect the dataset from Kaggle.com.

Data Engineering:

The data cleansing and pre-processing phase contains three sub-phases. This process is
performed on the raw tweet dataset to form the finalized data described previously. In the first
sub-phase, noise removal processes such as URL removal, hashtag/mention removal,
punctuation/symbol removal, and emoticon transformation are performed. In the second
sub-phase, out-of-vocabulary cleansing steps such as spell checking, acronym expansion,
slang modification, and elongated-word normalization (repeated-character removal) are
performed. In the final sub-phase, tweet transformations such as lower-case conversion,
stemming, word segmentation (tokenization), and stop-word filtering are conducted. These
sub-phases are performed to enhance the tweets and improve feature extraction and
classification accuracy.
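A few of these sub-phases can be sketched in plain Python; the report performs them with the NLTK package, so the regular expressions and the tiny stop-word list below are illustrative assumptions, not the project's actual pipeline:

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "are", "you", "to"}  # tiny toy list

def preprocess(tweet):
    tweet = re.sub(r"https?://\S+", " ", tweet)        # URL removal
    tweet = re.sub(r"[@#]\w+", " ", tweet)             # hashtag/mention removal
    tweet = re.sub(r"[^\w\s]", " ", tweet)             # punctuation/symbol removal
    tweet = re.sub(r"(\w)\1{2,}", r"\1\1", tweet)      # elongated chars: "soooo" -> "soo"
    tokens = tweet.lower().split()                     # lower-case + tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word filtering

print(preprocess("You are soooo MEAN!!! @user #bully http://t.co/x"))
# -> ['soo', 'mean']
```

The order matters: URLs and mentions are stripped before punctuation removal, so their slashes and @-signs do not leak into the token stream.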

Algorithm:

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be
used for either classification or regression challenges. However, it is mostly used in
classification problems. In this algorithm, we plot each data item as a point in n-dimensional
space (where n is the number of features) with the value of each feature being the value of a
particular coordinate. Then, we perform classification by finding the hyper-plane that
differentiates the two classes well.
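A minimal sketch of this idea, using scikit-learn's SVC on made-up 2-D points (so n = 2 features and the hyper-plane is a line):

```python
from sklearn.svm import SVC

# Each data item is a point in 2-dimensional space.
X = [[1, 1], [2, 1], [1, 2],    # class 0
     [6, 6], [7, 6], [6, 7]]    # class 1
y = [0, 0, 0, 1, 1, 1]

# A linear kernel finds the separating hyper-plane with the widest margin.
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[2, 2], [6.5, 6.5]]))
```

New points are classified by which side of the learned hyper-plane they fall on.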

Logistic regression is a Machine Learning classification algorithm that is used to predict


the probability of certain classes based on some dependent variables. In short, the logistic
regression model computes a sum of the input features (in most cases, there is a bias term), and
calculates the logistic of the result.
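The computation just described, a weighted sum of the inputs plus a bias term passed through the logistic function, is a few lines of plain Python; the weights below are hypothetical, chosen only for illustration:

```python
import math

def predict_proba(features, weights, bias):
    # Weighted sum of the input features plus the bias term...
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # ...passed through the logistic (sigmoid) function.
    return 1.0 / (1.0 + math.exp(-z))

# Made-up weights for two input features.
p = predict_proba([2.0, 0.5], weights=[1.2, -0.4], bias=-1.0)
print(round(p, 3))   # -> 0.769, the probability of the positive class
```

Training a logistic regression model amounts to finding the weights and bias that make these probabilities match the training labels.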

Prediction:

The implementation and the experiment configurations use some required
libraries. The experimental evaluations are carried out on a personal system. The
preprocessing steps are performed using the NLTK Python package. The input dataset is
divided into training and testing datasets. For the evaluation, it is split under three different
scenarios: 60:40%, 70:30%, and 90:10%. The evaluation metrics are chosen to display the
best performance of the tweet classification of each method. The prediction results of
cyberbullying are validated based on the various input dataset scenarios (60:40%, 70:30%,
and 90:10%), and the performance evaluation is carried out in terms of the aforesaid metrics.
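The three split scenarios can be sketched with scikit-learn; the generated toy data and the logistic-regression model standing in for the report's DEA-RNN are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the 10,000-tweet dataset.
X, y = make_classification(n_samples=500, random_state=0)

# The report's three scenarios: 60:40, 70:30 and 90:10 (train:test).
for test_size in (0.40, 0.30, 0.10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    print(f"{round((1 - test_size) * 100)}:{round(test_size * 100)} -> "
          f"accuracy={accuracy_score(y_te, y_pred):.3f}, "
          f"F1={f1_score(y_te, y_pred):.3f}")
```

Each iteration trains on one portion of the data and reports the metrics on the held-out remainder, mirroring how the report compares the scenarios.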

CHAPTER 7
CODING AND TESTING

7.1 CODING

Once the design aspect of the system is finalized, the system enters the coding and

testing phase. The coding phase brings the actual system into action by converting the design of

the system into code in a given programming language. Therefore, a good coding style has to

be followed so that whenever changes are required they can be easily incorporated into the system.

7.2 CODING STANDARDS

Coding standards are guidelines to programming that focus on the physical structure and

appearance of the program. They make the code easier to read, understand and maintain. This

phase of the system actually implements the blueprint developed during the design phase. The

coding specification should be such that any programmer is able to understand the

code and can bring about changes whenever felt necessary. Some of the standards needed to

achieve the above-mentioned objectives are as follows:

Program should be simple, clear and easy to understand.

Naming conventions

Value conventions

Script and comment procedure

Message box format

Exception and error handling

7.2.1 NAMING CONVENTIONS

Naming conventions of classes, data members, member functions, procedures etc., should be

self-descriptive. One should be able to get the meaning and scope of a variable from its name.
The conventions are adopted for easy understanding of the intended message by the user, so it is

customary to follow them. These conventions are as follows:

Class names

Class names are problem-domain equivalents; they begin with a capital letter and use

mixed case.

Member Function and Data Member name

Member function and data member names begin with a lowercase letter, with the first

letter of each subsequent word in uppercase and the rest of the letters in lowercase.
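Under these conventions, a hypothetical class from this project's domain might be named as follows (the class and its members are illustrative, not taken from the actual source code):

```python
# Class name: problem-domain term, capital letter, mixed case
class TweetClassifier:

    def __init__(self):
        # Data member: lowercase first word, subsequent words capitalized
        self.modelName = "svm"

    # Member function: same convention as data members
    def predictLabel(self, tweet):
        # Placeholder body; a real classifier would vectorize the tweet and predict
        return "normal" if tweet else "unknown"

clf = TweetClassifier()
print(clf.modelName, clf.predictLabel("hello"))
# -> svm normal
```

Note that this lowerCamelCase style follows the report's stated convention; idiomatic Python more commonly uses snake_case for members.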

7.2.2 VALUE CONVENTIONS

Value conventions ensure that variables hold proper values at any point of time. This involves

the following:

 Proper default values for the variables.

 Proper validation of values in the field.

 Proper documentation of flag values.

7.2.3 SCRIPT WRITING AND COMMENTING STANDARD

Script writing is an art in which indentation is of utmost importance. Conditional and looping

statements are to be properly aligned to facilitate easy understanding. Comments are included to

minimize the number of surprises that could occur when going through the code.

7.2.4 MESSAGE BOX FORMAT

When something has to be prompted to the user, he must be able to understand it properly.

To achieve this, a specific format has been adopted for displaying messages to the user. The

formats are as follows:

 X – User has performed illegal operation.

 ! – Information to the user.

7.3 TEST PROCEDURE

SYSTEM TESTING
Testing is performed to identify errors and is used for quality assurance. Testing is

an integral part of the entire development and maintenance process. The goal of testing

during this phase is to verify that the specification has been accurately and completely
incorporated into the design, as well as to ensure the correctness of the design itself. If a
logic fault in the design is detected before coding commences, the cost of fixing it is far lower
than it would be later. Detection of design faults can be achieved by means of inspections as
well as walkthroughs.

Testing is one of the important steps in the software development phase. Testing checks for

errors; as a whole, testing of the project involves the following test cases:

 Static analysis is used to investigate the structural properties of the source code.

 Dynamic testing is used to investigate the behavior of the source code by executing the

program on the test data.

7.4 TEST DATA AND OUTPUT

7.4.1 UNIT TESTING

Unit testing is conducted to verify the functional performance of each modular

component of the software. Unit testing focuses on the smallest unit of the software design,

i.e., the module. White-box testing techniques were heavily employed for unit testing.
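A minimal sketch of such a unit test with Python's unittest module, applied to one small unit (the helper function below is hypothetical, not taken from the project's source):

```python
import unittest

def normalize_spaces(text):
    # Smallest unit under test: collapse runs of whitespace into single spaces
    return " ".join(text.split())

class TestNormalizeSpaces(unittest.TestCase):
    def test_collapses_runs(self):
        self.assertEqual(normalize_spaces("a   b\t c"), "a b c")

    def test_empty_input(self):
        self.assertEqual(normalize_spaces(""), "")

if __name__ == "__main__":
    # exit=False so the test run does not terminate the interpreter
    unittest.main(argv=["unit"], exit=False)
```

Each module of the system can be exercised in isolation this way before integration.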

7.4.2 FUNCTIONAL TESTS

Functional test cases involved exercising the code with nominal input values for

which the expected results are known, as well as boundary values and special values, such as

logically related inputs, files of identical elements, and empty files.

There are three types of tests in functional testing:

 Performance Test

 Stress Test

 Structure Test

7.4.3 PERFORMANCE TEST

It determines the amount of execution time spent in various parts of the unit, program

throughput, response time, and device utilization by the program unit.

7.4.4 STRESS TEST


Stress tests are those tests designed to intentionally break the unit. A great deal can be

learned about the strengths and limitations of a program by examining the manner in which a

program unit breaks.

7.4.5 STRUCTURED TEST

Structure tests are concerned with exercising the internal logic of a program and

traversing particular execution paths. A white-box test strategy was employed

to ensure that the test cases could guarantee that all independent paths within a module have

been exercised at least once.

 Exercise all logical decisions on their true or false sides.

 Execute all loops at their boundaries and within their operational bounds.

 Exercise internal data structures to assure their validity.

 Checking attributes for their correctness.

 Handling end of file condition, I/O errors, buffer problems and textual errors in

output information
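These criteria can be demonstrated on one small function by exercising both sides of its decision and its loop at and within its boundaries (the function is hypothetical, chosen only to show the technique):

```python
def clip(values, limit):
    # Unit under test: one loop and one decision
    out = []
    for v in values:           # loop: exercised empty, once, and many times
        if v > limit:          # decision: exercised on both its true and false sides
            out.append(limit)
        else:
            out.append(v)
    return out

# Execute the loop at its boundaries and within its operational bounds
assert clip([], 5) == []                 # zero iterations
assert clip([9], 5) == [5]               # one iteration, true branch
assert clip([1, 7, 5], 5) == [1, 5, 5]   # many iterations, both branches
print("all paths exercised")
```

The three assertions together cover every independent path through the unit at least once.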

7.4.6 INTEGRATION TESTING

Integration testing is a systematic technique for constructing the program structure

while at the same time conducting tests to uncover errors associated with interfacing; i.e.,

integration testing is the complete testing of the set of modules which makes up the product. The

objective is to take unit-tested modules and build a program structure. The tester should identify

critical modules, and critical modules should be tested as early as possible. One approach is to

wait until all the units have passed testing, and then combine and test them together; this

approach evolved from unstructured testing of small programs. Another strategy is to construct

the product in increments of tested units: a small set of modules is integrated together and

tested, to which another module is added and tested in combination, and so on. The advantage of

this approach is that interface discrepancies can be easily found and corrected.

The major error that was faced during the project was a linking error: when all the

modules were combined, the links were not set properly with all the support files. We then

checked the interconnections and the links. Errors are localized to the new module and its

intercommunications. The product development can be staged, and modules integrated as they

complete unit testing. Testing is completed when the last module is integrated and tested.

7.5 TESTING TECHNIQUES / TESTING STRATEGIES

7.5.1 TESTING

Testing is a process of executing a program with the intent of finding an error. A good

test case is one that has a high probability of finding an as-yet-undiscovered error; a successful

test is one that uncovers such an error. System testing is the stage of

implementation that is aimed at ensuring that the system works accurately and efficiently as

expected before live operation commences. It verifies that the whole set of programs hangs

together. System testing consists of several key activities and steps for program, string and

system testing, and is important in adopting a successful new system. This is the last

chance to detect and correct errors before the system is installed for user acceptance testing.

The software testing process commences once the program is created and the

documentation and related data structures are designed. Software testing is essential for

correcting errors; otherwise the program or the project is not said to be complete. Software

testing is the critical element of software quality assurance and represents the ultimate review

of specification, design and coding. Any engineering product can be tested in one of the two ways:

7.5.1.1 WHITE BOX TESTING

This testing is also called glass box testing. In this testing, knowing the internal

workings of a product, tests can be conducted to ensure that ``all gears mesh''; that is, that the

internal operation performs according to specification and all internal components have been

adequately exercised. It is a test case design method that uses the control structure of the

procedural design to derive test cases. Basis path testing is a white box testing technique.

Basis path testing:

 Flow graph notation

 Cyclomatic complexity

 Deriving test cases

 Graph matrices

7.5.1.2 BLACK BOX TESTING

In this testing, knowing the specified functions that a product has been designed to

perform, tests can be conducted that demonstrate each function is fully operational while at the

same time searching for errors in each function. It fundamentally focuses on the functional

requirements of the software, without regard to the internal operation of the product.

The steps involved in black box test case design are:

 Graph based testing methods

 Equivalence partitioning

 Boundary value analysis

 Comparison testing
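Boundary value analysis, for example, derives test cases purely from the specification's boundaries (the function and its 0-100 specification below are hypothetical):

```python
def grade(score):
    # Black-box view: specified only by its interface, 0-100 -> pass/fail at 50
    if not 0 <= score <= 100:
        raise ValueError("score out of range")
    return "pass" if score >= 50 else "fail"

# Boundary value analysis: test at and around each specification boundary
for s, expected in [(0, "fail"), (49, "fail"), (50, "pass"), (100, "pass")]:
    assert grade(s) == expected
print("boundaries hold")
```

Errors cluster at boundaries, so testing 0, 49, 50 and 100 is far more likely to expose a fault than testing arbitrary mid-range values.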

7.5.2 SOFTWARE TESTING STRATEGIES:

A software testing strategy provides a road map for the software developer. Testing is a

set of activities that can be planned in advance and conducted systematically. For this reason, a

template for software testing, a set of steps into which specific test case design

methods can be placed, should be defined. A strategy should have the following characteristics:

 Testing begins at the module level and works ``outward'' toward the integration of

the entire computer based system.

 Different testing techniques are appropriate at different points in time.

 The developer of the software and an independent test group conduct testing.

 Testing and debugging are different activities, but debugging must be

accommodated in any testing strategy.

7.5.2.1 INTEGRATION TESTING:

Integration testing is a systematic technique for constructing the program

structure while at the same time conducting tests to uncover errors associated with interfacing.

Individual modules, which are highly prone to interface errors, should not be assumed to work

instantly when put together. The problem, of course, is ``putting them together'': interfacing.

Data may be lost across an interface; one module can have an inadvertent adverse effect on

another; sub-functions, when combined, may not produce the desired major function; individually

acceptable imprecision may be magnified to unacceptable levels; and global data structures can

present problems.

7.5.2.2 PROGRAM TESTING:

The logical and syntax errors are pointed out by program testing. A

syntax error is an error in a program statement that violates one or more rules of the language

in which it is written. An improperly defined field dimension or omitted keywords are common

syntax errors. These errors are shown through error messages generated by the compiler. A logic

error, on the other hand, deals with incorrect data fields, out-of-range items and invalid

combinations. Since the compiler will not detect logic errors, the programmer must examine

the output. Condition testing exercises the logical conditions contained in a module. The possible

types of elements in a condition include a Boolean operator, a Boolean variable, a pair of Boolean

parentheses, a relational operator or an arithmetic expression. The condition testing method focuses

on testing each condition in the program; the purpose of condition testing is to detect not only

errors in the conditions of a program but also other errors in the program.

7.5.2.3 SECURITY TESTING:

Security testing attempts to verify that the protection mechanisms built into a system will, in

fact, protect it from improper penetration. The system security must be tested for invulnerability

from frontal attack and must also be tested for invulnerability from rear attack. During security

testing, the tester plays the role of an individual who desires to penetrate the system.

7.5.2.4 VALIDATION TESTING

At the culmination of integration testing, the software is completely assembled as a

package, interfacing errors have been uncovered and corrected, and a final series of software

tests, validation testing, begins. Validation testing can be defined in many ways, but a simple

definition is that validation succeeds when the software functions in a manner that is reasonably

expected by the customer. Software validation is achieved through a series of black box tests that

demonstrate conformity with requirements. After a validation test has been conducted, one of two

conditions exists:

* The function or performance characteristics conform to specifications and are accepted.

* A deviation from specification is uncovered and a deficiency list is created.

Deviations or errors discovered at this step in the project were corrected prior to completion

of the project with the help of the user, by negotiating to establish a method for resolving

deficiencies. Thus the proposed system under consideration has been tested by using validation

testing and found to be working satisfactorily. Though there were deficiencies in the system, they

were not catastrophic.

7.5.2.5 USER ACCEPTANCE TESTING

User acceptance of the system is a key factor for the success of any system. The system

under consideration was tested for user acceptance by constantly keeping in touch with prospective

system users at the time of developing it and making changes whenever required. This was done

with regard to the following points:

 Input screen design.

 Output screen design.
Source Code

#!/usr/bin/env python

# coding: utf-8

# In[60]:

import numpy as np

# In[61]:

import pandas as pd

# In[62]:

import matplotlib.pyplot as plt

# In[63]:

import seaborn as sns

# In[64]:

import warnings

warnings.filterwarnings('ignore')

# In[65]:

data = pd.read_csv(r'data/labeled_data.csv')

# In[66]:

data.head()

# In[67]:

data.shape

# In[68]:

data.info()

# In[69]:

data.describe().T

# In[70]:

data.isnull().sum()
# In[71]:

data.filter(items=['class', 'tweet'])

# In[72]:

sns.set(rc={'figure.figsize':(10,6)})

sns.countplot(x = 'class',data = data)

# In[75]:

print(data)

# # Data Engineering

# 1 ) Case conversion

# 2 ) Removing special characters

# 3 ) Removing shorthands

# 4 ) Removing stopwords

# 5 ) Removing links

# 6 ) Removing accents

# 7 ) Normalize spaces
# In[76]:

import re

import unidecode

from autocorrect import Speller

from wordcloud import WordCloud, STOPWORDS

from textblob import TextBlob

from nltk.corpus import stopwords

# In[77]:

def case_convert():
    data.tweet = [i.lower() for i in data.tweet.values]

def remove_specials():
    data.tweet = [re.sub(r"[^a-zA-Z]", " ", text) for text in data.tweet.values]

def remove_shorthands():
    CONTRACTION_MAP = {

"ain't": "is not",

"aren't": "are not",

"can't": "cannot",

"can't've": "cannot have",

"'cause": "because",

"could've": "could have",

"couldn't": "could not",

"couldn't've": "could not have",

"didn't": "did not",

"doesn't": "does not",

"don't": "do not",

"hadn't": "had not",

"hadn't've": "had not have",


"hasn't": "has not",

"haven't": "have not",

"he'd": "he would",

"he'd've": "he would have",

"he'll": "he will",

"he'll've": "he he will have",

"he's": "he is",

"how'd": "how did",

"how'd'y": "how do you",

"how'll": "how will",

"how's": "how is",

"i'd": "i would",

"i'd've": "i would have",

"i'll": "i will",

"i'll've": "i will have",

"i'm": "i am",

"i've": "i have",

"isn't": "is not",

"it'd": "it would",


"it'd've": "it would have",

"it'll": "it will",

"it'll've": "it will have",

"it's": "it is",

"let's": "let us",

"ma'am": "madam",

"mayn't": "may not",

"might've": "might have",

"mightn't": "might not",

"mightn't've": "might not have",

"must've": "must have",

"mustn't": "must not",

"mustn't've": "must not have",

"needn't": "need not",

"needn't've": "need not have",

"o'clock": "of the clock",

"oughtn't": "ought not",

"oughtn't've": "ought not have",

"shan't": "shall not",


"sha'n't": "shall not",

"shan't've": "shall not have",

"she'd": "she would",

"she'd've": "she would have",

"she'll": "she will",

"she'll've": "she will have",

"she's": "she is",

"should've": "should have",

"shouldn't": "should not",

"shouldn't've": "should not have",

"so've": "so have",

"so's": "so as",

"that'd": "that would",

"that'd've": "that would have",

"that's": "that is",

"there'd": "there would",

"there'd've": "there would have",

"there's": "there is",

"they'd": "they would",


"they'd've": "they would have",

"they'll": "they will",

"they'll've": "they will have",

"they're": "they are",

"they've": "they have",

"to've": "to have",

"wasn't": "was not",

"we'd": "we would",

"we'd've": "we would have",

"we'll": "we will",

"we'll've": "we will have",

"we're": "we are",

"we've": "we have",

"weren't": "were not",

"what'll": "what will",

"what'll've": "what will have",

"what're": "what are",

"what's": "what is",

"what've": "what have",


"when's": "when is",

"when've": "when have",

"where'd": "where did",

"where's": "where is",

"where've": "where have",

"who'll": "who will",

"who'll've": "who will have",

"who's": "who is",

"who've": "who have",

"why's": "why is",

"why've": "why have",

"will've": "will have",

"won't": "will not",

"won't've": "will not have",

"would've": "would have",

"wouldn't": "would not",

"wouldn't've": "would not have",

"y'all": "you all",

"y'all'd": "you all would",


"y'all'd've": "you all would have",

"y'all're": "you all are",

"y'all've": "you all have",

"you'd": "you would",

"you'd've": "you would have",

"you'll": "you will",

"you'll've": "you will have",

"you're": "you are",

"you've": "you have"
}

    texts = []

    for text in data.tweet.values:
        string = ""
        for word in text.split(" "):
            if word.strip() in list(CONTRACTION_MAP.keys()):
                string = string + " " + CONTRACTION_MAP[word]
            else:
                string = string + " " + word
        texts.append(string.strip())

    data.tweet = texts

def remove_stopwords():
    texts = []
    stopwords_list = stopwords.words('english')
    for item in data.tweet.values:
        string = ""
        for word in item.split(" "):
            if word.strip() in stopwords_list:
                continue
            else:
                string = string + " " + word
        texts.append(string)
    data.tweet = texts

def remove_links():
    texts = []
    for text in data.tweet.values:
        remove_https = re.sub(r'http\S+', '', text)
        remove_com = re.sub(r"\ [A-Za-z]*\.com", " ", remove_https)
        texts.append(remove_com)
    data.tweet = texts

def remove_accents():
    data.tweet = [unidecode.unidecode(text) for text in data.tweet.values]

def normalize_spaces():
    data.tweet = [re.sub(r"\s+", " ", text) for text in data.tweet.values]

case_convert()

remove_links()

# remove_shorthands()

remove_accents()

remove_specials()

remove_stopwords()

normalize_spaces()

print(data)
# # Word Cloud Visualization

# In[82]:

data_hate = data[data['class']==0]

data_offen = data[data['class']==1]

data_neither = data[data['class']==2]

# # Abusive words

# In[83]:

text = " ".join(i for i in data_hate.tweet)

stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(text)

plt.figure( figsize=(15,10))

plt.imshow(wordcloud, interpolation='bilinear')

plt.axis("off")

plt.show()

# # offensive language

# In[84]:

text = " ".join(i for i in data_offen.tweet)

stopwords = set(STOPWORDS)

wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(text)

plt.figure( figsize=(15,10))

plt.imshow(wordcloud, interpolation='bilinear')

plt.axis("off")

plt.show()

# # normal

# In[85]:

text = " ".join(i for i in data_neither.tweet)

stopwords = set(STOPWORDS)

wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(text)

plt.figure( figsize=(15,10))

plt.imshow(wordcloud, interpolation='bilinear')

plt.axis("off")

plt.show()

# In[86]:

X=data["tweet"]

Y=data["class"]

# In[87]:

import pickle

from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()

X = cv.fit_transform(X)

# In[88]:

pickle.dump(cv, open("models/count_vectorizer.pickle", "wb"))

# In[89]:

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20, random_state = 42, stratify = data['class'])

# In[90]:

print("X_train shape:", X_train.shape)

print("X_test shape:", X_test.shape)

print("Y_train shape:", Y_train.shape)

print("Y_test shape:", Y_test.shape)

# In[91]:

# Logistic Regression Algorithm

from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(random_state = 42)

logreg.fit(X_train, Y_train)

# # Support Vector Classifier Algorithm

# In[93]:

# Support Vector Classifier Algorithm

from sklearn.svm import SVC

svc = SVC(kernel = 'linear', random_state = 42)

svc.fit(X_train, Y_train)

# In[97]:

Y_pred_logreg = logreg.predict(X_test)

Y_pred_svc = svc.predict(X_test)

# In[98]:

from sklearn.metrics import accuracy_score

accuracy_logreg = accuracy_score(Y_test, Y_pred_logreg)

accuracy_svc = accuracy_score(Y_test, Y_pred_svc)

# In[99]:

print("Logistic Regression: " + str(accuracy_logreg * 100))

print("Support Vector Classifier: " + str(accuracy_svc * 100))

# In[100]:

a=accuracy_logreg * 100

c=accuracy_svc * 100

# In[113]:

# Classification report

from sklearn.metrics import classification_report

print(classification_report(Y_test, Y_pred_svc))

# # Save the model as a pickle

# In[117]:

import joblib

# Save the model as a pickle in a file

joblib.dump(svc, 'models/svm_model.pkl')

# Load the model from the file

svm_from_joblib = joblib.load('models/svm_model.pkl')

# Use the loaded model to make predictions

svm_from_joblib.predict(X_test)

# # Abusive contents

# In[118]:

message = ["Jackies a retard #blondeproblems At least I can make a grilled cheese!"]

vect = cv.transform(message).toarray()

my_prediction = logreg.predict(vect)

print(my_prediction)

# In[119]:

message=["cant you see these hoes wont change"]

vect = cv.transform(message).toarray()

my_prediction = logreg.predict(vect)

print(my_prediction)

# # Normal

# In[120]:

message = ["Peel up peel up bring it back up rewind back where I'm from they move Shaq from the line"]

vect = cv.transform(message).toarray()

my_prediction = logreg.predict(vect)

print(my_prediction)

# # # Social_media - whatsapp chat

# # Sender

# In[ ]:

from datetime import datetime

import pywhatkit as pwt

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

import smtplib
import time

import joblib

import json

import warnings

warnings.filterwarnings('ignore')

import pandas as pd

import pickle

from twilio.rest import Client

account_sid = 'AC935e8128c0f849fe32f9fea8267a58d4'

auth_token = '4e326d3a6983bc0f22cc3ec26f0eb57e'

client = Client(account_sid, auth_token)

now = datetime.now()

dt_string = now.strftime("%d/%m/%Y %H:%M:%S")

date=now.strftime("%d/%m/%Y")

time_now=now.strftime("%H:%M:%S")

hour= now.strftime("%H")
minute = now.strftime("%M")

second = now.strftime("%S")

hour=round(int(hour))

minute=round(int(minute))

social_minute=int(minute)+1

message = ""

body =[message]

to_contact = "+91"

pwt.sendwhatmsg(to_contact,message,int(hour),int(social_minute))

cvf=pickle.load(open("models/count_vectorizer.pickle", 'rb'))

vect = cvf.transform(body).toarray()

svm_from_joblib = joblib.load('models/svm_model.pkl')

my_prediction = svm_from_joblib.predict(vect)

if my_prediction == [0]:
    mess = "abusive"
    list_of_history = {"Date": str(date), "Time": str(time_now), "Contact": str(to_contact), "Message": str(message), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body = result,
        to = "")

elif my_prediction == [1]:
    mess = "abusive"
    list_of_history = {"Date": str(date), "Time": str(time_now), "Contact": str(to_contact), "Message": str(message), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body = result,
        to = "")

elif my_prediction == [2]:
    mess = "normal"

# # Receiver

# In[133]:

from selenium import webdriver

webdriver = webdriver.Chrome("/home/hwuser/Downloads/chromedriver/chromedriver")

#webdriver.get("https://web.whatsapp.com")

#!pip install webdriver-manager

from selenium import webdriver

from webdriver_manager.chrome import ChromeDriverManager

import time

import datetime as dt

import json

import os

import requests

import shutil

import pickle

from PIL import Image

from selenium import webdriver

from selenium.webdriver.common.by import By

from selenium.webdriver.support import expected_conditions as EC


from selenium.webdriver.common.keys import Keys

from selenium.webdriver.support.ui import WebDriverWait

from selenium.common.exceptions import NoSuchElementException, ElementNotVisibleException

from selenium.webdriver.common.alert import Alert

from selenium.webdriver.common.action_chains import ActionChains

from selenium.common.exceptions import TimeoutException

from selenium.webdriver.chrome.options import Options

from urllib.parse import urlencode

from bs4 import BeautifulSoup

class WhatsAppElements:
    search = (By.CSS_SELECTOR, "#side > div.uwk68 > div > div > div._16C8p > div > div._13NKt.copyable-text.selectable-text")

class WhatsApp:
    browser = None
    timeout = 10

    def __init__(self, wait, screenshot=None, session=None):
        self.browser = webdriver.Chrome(ChromeDriverManager().install())
        self.browser.get("https://web.whatsapp.com/")
        WebDriverWait(self.browser, wait).until(
            EC.presence_of_element_located(WhatsAppElements.search))

    def goto_main(self):
        try:
            self.browser.refresh()
            Alert(self.browser).accept()
        except Exception as e:
            print(e)
        WebDriverWait(self.browser, self.timeout).until(
            EC.presence_of_element_located(WhatsAppElements.search))

    def unread_usernames(self, scrolls=100):
        self.goto_main()
        initial = 10
        usernames = []
        for i in range(0, scrolls):
            self.browser.execute_script("document.getElementById('pane-side').scrollTop={}".format(initial))
            soup = BeautifulSoup(self.browser.page_source, "html.parser")
            for i in soup.find_all("div", class_="_3Bc7H _20c87"):
                if i.find("div", class_="_3OvU8"):
                    username = i.find("div", class_="zoWT4").text
                    usernames.append(username)
            initial -= 10
        usernames = list(set(usernames))
        return usernames

    def get_last_message_for(self, name):
        messages = list()
        search = self.browser.find_element(*WhatsAppElements.search)
        search.send_keys(name + Keys.ENTER)
        time.sleep(3)
        soup = BeautifulSoup(self.browser.page_source, "html.parser")
        for i in soup.find_all("div", class_="message-in"):
            message = i.find("span", class_="selectable-text")
            if message:
                message2 = message.find("span")
                if message2:
                    messages.append(message2.text)
        messages = list(filter(None, messages))
        return messages

# In[134]:

whatsapp = WhatsApp(100, session="mysession")

user_names = whatsapp.unread_usernames(scrolls=100)
print(user_names)

# In[135]:

for name in user_names:
    messages = whatsapp.get_last_message_for(name)
    messages_len = len(messages)
    if messages_len > 0:
        latest_msg = messages[messages_len - 1]
    print(messages)

# In[ ]:

from datetime import datetime

import pywhatkit as pwt

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

import smtplib

import time

import joblib

import json

import warnings

warnings.filterwarnings('ignore')

import pandas as pd

import pickle

from twilio.rest import Client

now = datetime.now()

dt_string = now.strftime("%d/%m/%Y %H:%M:%S")

account_sid = 'AC935e8128c0f849fe32f9fea8267a58d4'

auth_token = '4e326d3a6983bc0f22cc3ec26f0eb57e'

client = Client(account_sid, auth_token)

date=now.strftime("%d/%m/%Y")

time_now=now.strftime("%H:%M:%S")

hour= now.strftime("%H")

minute = now.strftime("%M")

second = now.strftime("%S")

cvf=pickle.load(open("models/count_vectorizer.pickle", 'rb'))

vect = cvf.transform([latest_msg]).toarray()

svm_from_joblib = joblib.load('models/svm_model.pkl')

my_prediction = svm_from_joblib.predict(vect)

to_contact = "+91"

if my_prediction == [0]:
    mess = "abusive"
    list_of_history = {"Date": str(date), "Time": str(time_now), "Contact": str(to_contact), "Message": str(latest_msg), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body = result,
        to = "")

elif my_prediction == [1]:
    mess = "abusive"
    list_of_history = {"Date": str(date), "Time": str(time_now), "Contact": str(to_contact), "Message": str(latest_msg), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body = result,
        to = "")

elif my_prediction == [2]:
    mess = "normal"
    list_of_history = {"Date": str(date), "Time": str(time_now), "Contact": str(to_contact), "Message": str(latest_msg), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body = result,
        to = "")

# In[2]:

get_ipython().system('pip install twilio')

Screenshots:

Conclusion:
This paper developed an efficient tweet classification model to enhance the
effectiveness of topic models for the detection of cyberbullying events. DEA-RNN was
developed by combining DEA optimization with an Elman-type RNN for efficient
parameter tuning. DEA-RNN achieved optimal results compared to the other existing
methods in all the scenarios, with respect to metrics such as accuracy, recall, F-measure,
precision, and specificity.

Future Enhancements:
The current study was limited exclusively to the Twitter dataset; other social media
platforms (SMPs) such as Instagram, Flickr, YouTube, Facebook, etc., should be investigated in
order to detect the trend of cyberbullying. The possibility of utilizing multiple data sources
for cyberbullying detection will then be investigated. Furthermore, we performed the
analysis only on the content of tweets; we could not perform the analysis in relation to the
users' behavior. This will be addressed in future work.

REFERENCES

[1] F. Mishna, M. Khoury-Kassabri, T. Gadalla, and J. Daciuk, ``Risk factors for
involvement in cyber bullying: Victims, bullies and bully-victims,'' Children
Youth Services Rev., vol. 34, no. 1, pp. 63-70, Jan. 2012, doi:
10.1016/j.childyouth.2011.08.032.
[2] K. Miller, ``Cyberbullying and its consequences: How cyberbullying is
contorting the minds of victims and bullies alike, and the law's limited available
redress,'' Southern California Interdiscipl. Law J., vol. 26, no. 2, p. 379, 2016.

[3] A. M. Vivolo-Kantor, B. N. Martell, K. M. Holland, and R. Westby, ``A
systematic review and content analysis of bullying and cyber-bullying
measurement strategies,'' Aggression Violent Behav., vol. 19, no. 4, pp. 423-434,
Jul. 2014, doi: 10.1016/j.avb.2014.06.008.

[4] H. Sampasa-Kanyinga, P. Roumeliotis, and H. Xu, ``Associations between


cyberbullying and school bullying victimization and suicidal ideation, plans and
attempts among Canadian schoolchildren,'' PLoS ONE, vol. 9, no. 7, Jul. 2014,
Art. no. e102145, doi: 10.1371/journal.pone.0102145.

[5] M. Dadvar, D. Trieschnigg, R. Ordelman, and F. de Jong, ``Improving
cyberbullying detection with user context,'' in Proc. Eur. Conf. Inf. Retr., in
Lecture Notes in Computer Science: Including Subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 7814, 2013, pp.
693-696.

[6] A. S. Srinath, H. Johnson, G. G. Dagher, and M. Long, ``BullyNet: Unmasking
cyberbullies on social networks,'' IEEE Trans. Computat. Social Syst., vol. 8, no. 2,
pp. 332-344, Apr. 2021, doi: 10.1109/TCSS.2021.3049232.

91
[7] A. Agarwal, A. S. Chivukula, M. H. Bhuyan, T. Jan, B. Narayan, and M.
Prasad, ``Identi_cation and classi_cation of cyberbullying posts: A recurrent neural
network approach using under-sampling and class weighting,'' in Neural
Information Processing (Communications in Computer and Information Science),
vol. 1333, H. Yang, K. Pasupa, A. C.-S. Leung, J. T. Kwok, J. H. Chan, and I.
King, Eds. Cham, Switzerland: Springer, 2020, pp. 113_120.

[8] Z. L. Chia, M. Ptaszynski, F. Masui, G. Leliwa, and M. Wroczynski, ``Machine


learning and feature engineering-based study into sarcasmand irony classi_cation
with application to cyberbullying detection,'' Inf. Process. Manage., vol. 58, no. 4,
Jul. 2021, Art. no. 102600, doi: 10.1016/j.ipm.2021.102600.

[9] N. Yuvaraj, K. Srihari, G. Dhiman, K. Somasundaram, A. Sharma, S.


Rajeskannan, M. Soni, G. S. Gaba, M. A. AlZain, and M. Masud, ``Nature-
inspired-based approach for automated cyberbullying classi_cation on multimedia
social networking,'' Math. Problems Eng., vol. 2021, pp. 1_12, Feb. 2021, doi:
10.1155/2021/6644652.

[10] B. A. Talpur and D. O'Sullivan, ``Multi-class imbalance in text classi cation:


A feature engineering approach to detect cyberbullying in Twitter,'' Informatics,
vol. 7, no. 4, p. 52, Nov. 2020, doi: 10.3390/informatics7040052.

[11] A. Muneer and S. M. Fati, ``A comparative analysis of machine learning


techniques for cyberbullying detection on Twitter,'' Futur. Internet, vol. 12, no. 11,
pp. 1_21, 2020, doi: 10.3390/_12110187.

92
[12] R. R. Dalvi, S. B. Chavan, and A. Halbe, ``Detecting a Twitter cyberbullying
using machine learning,'' Ann. Romanian Soc. Cell Biol., vol. 25, no. 4, pp.
16307_16315, 2021.

[13] R. Zhao, A. Zhou, and K. Mao, ``Automatic detection of cyberbullying on


social networks based on bullying features,'' in Proc. 17th Int. Conf. Dis- trib.
Comput. Netw., Jan. 2016, pp. 1_6, doi: 10.1145/2833312.2849567.

[14] L. Cheng, J. Li, Y. N. Silva, D. L. Hall, and H. Liu, ``XBully: Cyberbullying


detection within a multi-modal context,'' in Proc. 12th ACM Int. Conf. Web Search
Data Mining, Jan. 2019, pp. 339_347, doi: 10.1145/3289600.3291037.

[15] K. Reynolds, A. Kontostathis, and L. Edwards, ``Using machine learning


to detect cyberbullying,'' in Proc. 10th Int. Conf. Mach. Learn. Appl. Workshops
(ICMLA), vol. 2, Dec. 2011, pp. 241_244, doi: 10.1109/ICMLA.2011.152.

93