DEA-RNN: A Hybrid Deep Learning Approach
A PROJECT REPORT
of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
MAY 2023
BONAFIDE CERTIFICATE
TABLE OF CONTENTS
LIST OF FIGURES vi
1 INTRODUCTION
2 SYSTEM ANALYSIS
3 REQUIREMENTS SPECIFICATION
3.1 Introduction
3.3.1 Python
4.5 Design and Implementation Constraints
5 SYSTEM DESIGN
6.1 Modules
7.1 Coding
SOURCE CODE
SNAP SHOTS
REFERENCES
LIST OF FIGURES
5.3 Activity Diagram
LIST OF ABBREVIATIONS
CNN Convolutional Neural Network
IP Internet Protocol
ABSTRACT
CHAPTER 1
INTRODUCTION
Aim:
Synopsis:
The proposed DEA-RNN model combines Elman-type Recurrent Neural Networks (RNN)
with an optimized Dolphin Echolocation Algorithm (DEA) for fine-tuning the Elman RNN's
parameters and reducing training time. We evaluated DEA-RNN thoroughly utilizing a dataset of
10000 tweets and compared its performance to those of state-of-the-art algorithms such as Bi-directional
Long Short-Term Memory (Bi-LSTM), RNN, SVM, Multinomial Naive Bayes
(MNB), and Random Forests (RF). The experimental results show that DEA-RNN was found to be
superior in all the scenarios. It outperformed the considered existing approaches in detecting CB
on the Twitter platform. DEA-RNN was more efficient, achieving an average
of 90.45% accuracy, 89.52% precision, 88.98% recall, 89.25% F1-score, and 90.94% specificity.
CHAPTER 2
SYSTEM ANALYSIS
2.2 PROPOSED SYSTEM
In this model, we propose a hybrid deep learning-based approach, called DEA-RNN,
which automatically detects bullying from tweets. The DEA-RNN approach
combines Recurrent Neural Networks (RNN) with an improved Dolphin Echolocation Algorithm
(DEA) for fine-tuning the Elman RNN's parameters. DEA-RNN can handle the dynamic nature of
short texts and can cope with the topic models for the effective extraction of trending
topics. DEA-RNN outperformed the considered existing approaches in detecting cyberbullying on
the Twitter platform in all scenarios and with various evaluation metrics.
Advantage:
Propose DEA-RNN by combining the Elman-type RNN and the improved DEA for optimal
classification of tweets;
A new Twitter dataset is collected based on cyberbullying keywords for evaluating the
performance of DEA-RNN and the existing methods; and
The efficiency of DEA-RNN in recognizing and classifying cyberbullying tweets is assessed
using Twitter datasets. The thorough experimental results reveal that DEA-RNN outperforms
other competing models in terms of recall, precision, accuracy, F1-score, and specificity.
CHAPTER 3
REQUIREMENT SPECIFICATIONS
3.1 INTRODUCTION
Social media networks such as Facebook, Twitter, Flickr, and Instagram have
become the preferred online platforms for interaction and socialization among
people of all ages. While these platforms enable people to communicate and
interact in previously unthinkable ways, they have also led to malevolent activities
such as cyber-bullying. Cyberbullying is a type of psychological abuse with a
significant impact on society. Cyber-bullying events have been increasing mostly
among young people spending most of their time navigating between different
social media platforms. Particularly, social media networks such as Twitter and
Facebook are prone to CB because of their popularity and the anonymity that the
Internet provides to abusers. In India, for example, 14 percent of all harassment
occurs on Facebook and Twitter, with 37 percent of these incidents involving
youngsters [1]. Moreover, cyberbullying might lead to serious mental issues and
adverse mental health effects. Most suicides are due to the anxiety, depression,
stress, and social and emotional difficulties from cyber-bullying events [2]–[4].
This motivates the need for an approach to identify cyberbullying in social media
messages (e.g., posts, tweets, and comments). In this article, we mainly focus on
the problem of cyberbullying detection on the Twitter platform. As cyberbullying
is becoming a prevalent problem in Twitter, the detection of cyberbullying events
from tweets and provisioning preventive measures are the primary tasks in battling
cyberbullying threats [5]. Therefore, there is a greater need to increase the research
on social networks-based CB in order to get greater insights and aid in the
development of effective tools and approaches to effectively combat cyberbullying
problem [6]. Manually monitoring and controlling cyberbullying on Twitter
platform is virtually impossible [7]. Furthermore, mining social media messages
for cyberbullying detection is quite difficult. For example, Twitter messages are
often brief, full of slang, and may include emojis, and gifs, which makes it
impossible to deduce individuals' intentions and meanings purely from social
media messages. Moreover, bullying can be difficult to detect if the bully uses
strategies like sarcasm or passive-aggressiveness to conceal it.
3.2.1 HARDWARE REQUIREMENTS
Machine Learning
3.3.1 Python
Python was initially designed by Guido van Rossum in 1991 and is developed by the Python
Software Foundation. It was mainly developed with an emphasis on code readability, and its
syntax allows programmers to express concepts in fewer lines of code. Python is a programming
language that lets you work quickly and integrate systems more efficiently.
It is used for:
web development (server-side),
software development,
mathematics,
System scripting.
Why Python?
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language.
Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
Python can be treated in a procedural way, an object-orientated way or a functional way.
Good to know
The most recent major version of Python is Python 3, which we shall be using in this
project. However, Python 2, although not being updated with anything other than
security updates, is still quite popular.
Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases until
December 2008. At that time, the development team made the decision to release version
3.0, which contained a few relatively small but significant changes that were not
backward compatible with the 2.x versions. Python 2 and 3 are very similar, and some
features of Python 3 have been backported to Python 2. But in general, they remain not
quite compatible.
Both Python 2 and 3 have continued to be maintained and developed, with periodic
release updates for both. As of this writing, the most recent versions available are 2.7.15
and 3.6.5. However, an official End Of Life date of January 1, 2020 has been established
for Python 2, after which time it will no longer be maintained.
Python is still maintained by a core development team at the Institute, and Guido is still
in charge, having been given the title of BDFL (Benevolent Dictator For Life) by the
Python community. The name Python, by the way, derives not from the snake, but from
the British comedy troupe Monty Python’s Flying Circus, of which Guido was, and
presumably still is, a fan. It is common to find references to Monty Python sketches and
movies scattered throughout the Python documentation.
It is possible to write Python in an Integrated Development Environment, such as
Thonny, Pycharm, Netbeans or Eclipse which are particularly useful when managing
larger collections of Python files.
Python was designed for readability, and has some similarities to the English language
with influence from mathematics.
Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
Python relies on indentation, using whitespace, to define scope; such as the scope of
loops, functions and classes. Other programming languages often use curly-brackets for
this purpose.
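The indentation rule described above can be seen in a minimal sketch (an illustrative snippet, not part of the project's code; the function name and threshold are arbitrary):

```python
# Indentation, not curly brackets, defines scope in Python.
def classify(score):
    if score > 0.5:        # this indented block belongs to the if
        label = "bullying"
    else:                  # and this one to the else
        label = "normal"
    return label           # back at function scope

print(classify(0.9))  # bullying
```

Removing or misaligning the indentation of any of these lines is a syntax error, which is how Python enforces consistent structure.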
Python is Interpreted
Many languages are compiled, meaning the source code you create needs to be translated
into machine code, the language of your computer’s processor, before it can be run.
Programs written in an interpreted language are passed straight to an interpreter that runs
them directly.
This makes for a quicker development cycle because you just type in your code and run
it, without the intermediate compilation step.
One potential downside to interpreted languages is execution speed. Programs that are
compiled into the native language of the computer processor tend to run more quickly
than interpreted programs. For some applications that are particularly computationally
intensive, like graphics processing or intense number crunching, this can be limiting.
In practice, however, for most programs, the difference in execution speed is measured in
milliseconds, or seconds at most, and not appreciably noticeable to a human user. The
expediency of coding in an interpreted language is typically worth it for most
applications.
For all its syntactical simplicity, Python supports most constructs that would be expected
in a very high-level language, including complex dynamic data types, structured and
functional programming, and object-oriented programming.
Additionally, a very extensive library of classes and functions is available that provides
capability well beyond what is built into the language, such as database manipulation or
GUI programming.
Python accomplishes what many programming languages don’t: the language itself is
simply designed, but it is very versatile in terms of what you can accomplish with it.
Machine learning
Introduction:
Machine learning algorithms build a mathematical model based on sample data, known as
"training data", in order to make
predictions or decisions without being explicitly programmed to perform the task.
Machine learning algorithms are used in a wide variety of applications, such
as email filtering and computer vision, where it is difficult or infeasible to develop
a conventional algorithm for effectively performing the task.
Machine learning tasks are classified into several broad categories. In supervised
learning, the algorithm builds a mathematical model from a set of data that
contains both the inputs and the desired outputs. For example, if the task were
determining whether an image contained a certain object, the training data for a
supervised learning algorithm would include images with and without that object
(the input), and each image would have a label (the output) designating whether it
contained the object. In special cases, the input may be only partially available, or
restricted to special feedback. Semi-supervised learning algorithms develop mathematical
models from incomplete training data, where a portion of the sample inputs don't have labels.
For a classification algorithm that filters emails, the input would be an incoming email,
and the output would be the name of the folder in
which to file the email. For an algorithm that identifies spam emails, the output
would be the prediction of either "spam" or "not spam", represented by
the Boolean values true and false. Regression algorithms are named for their
continuous outputs, meaning they may have any value within a range. Examples of
a continuous value are the temperature, length, or price of an object.
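The continuous-output idea can be sketched with a tiny regression model (an illustrative toy example using scikit-learn, not part of the report's pipeline; the sizes and prices are made up and follow an exact linear rule):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: predict a continuous price from a single size feature (price = 3 * size).
X = np.array([[50], [60], [80], [100]])
y = np.array([150.0, 180.0, 240.0, 300.0])

model = LinearRegression().fit(X, y)
pred = model.predict([[70]])[0]  # an input not seen during training
print(round(pred, 1))
```

Unlike a classifier, whose output is one of a fixed set of labels, the regression model can output any value in a range.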
Active learning algorithms access the desired outputs (training labels) for a limited
set of inputs based on a budget and optimize the choice of inputs for which it will
acquire training labels. When used interactively, these can be presented to a human
user for labeling. Reinforcement learning algorithms are given feedback in the
form of positive or negative reinforcement in a dynamic environment and are used
in autonomous vehicles or in learning to play a game against a human opponent.
Other specialized algorithms in machine learning include topic modeling, where
the computer program is given a set of natural language documents and finds other
documents that cover similar topics. Machine learning algorithms can be used to
find the unobservable probability density function in density
estimation problems. Meta learning algorithms learn their own inductive bias based
on previous experience. In developmental robotics, robot learning algorithms
generate their own sequences of learning experiences, also known as a curriculum,
to cumulatively acquire new skills through self-guided exploration and social
interaction with humans. These robots use guidance mechanisms such as active
learning, maturation, motor synergies, and imitation.
The types of machine learning algorithms differ in their approach, the type of data
they input and output, and the type of task or problem that they are intended to
solve.
Supervised learning:
Supervised learning algorithms build a mathematical model of a set of data that
contains both the inputs and the desired outputs. The data is known as training
data, and consists of a set of training examples. Each training example has one or
more inputs and the desired output, also known as a supervisory signal. In the
mathematical model, each training example is represented by an array or vector,
sometimes called a feature vector, and the training data is represented by a matrix.
Through iterative optimization of an objective function, supervised learning
algorithms learn a function that can be used to predict the output associated with
new inputs. An optimal function will allow the algorithm to correctly determine the
output for inputs that were not a part of the training data. An algorithm that
improves the accuracy of its outputs or predictions over time is said to have
learned to perform that task.
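A toy supervised-learning example illustrating the paragraph above (k-nearest-neighbours is chosen here purely for brevity; it is not the report's method, and the feature vectors and labels are invented):

```python
from sklearn.neighbors import KNeighborsClassifier

# Each training example: a feature vector (input) plus a label (supervisory signal).
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = ["neg", "neg", "pos", "pos"]  # label depends on the first feature

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Predict the output for an input that was not part of the training data.
print(clf.predict([[1, 0.9]])[0])  # pos
```

The learned function generalizes from the labeled examples to the unseen input, which is exactly the property the text describes.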
Similarity learning is an area of supervised machine learning in which the goal is to learn from
examples using a similarity function that measures how similar or related two
objects are. It has applications in ranking, recommendation systems, visual identity
tracking, face verification, and speaker verification.
Unsupervised learning:
Unsupervised learning algorithms take a set of data that contains only inputs, and
find structure in the data, like grouping or clustering of data points. The
algorithms, therefore, learn from test data that has not been labeled, classified or
categorized. Instead of responding to feedback, unsupervised learning algorithms
identify commonalities in the data and react based on the presence or absence of
such commonalities in each new piece of data. A central application of
unsupervised learning is in the field of density estimation in statistics, though
unsupervised learning encompasses other domains involving summarizing and
explaining data features.
Clustering methods are evaluated by internal compactness, the similarity between members of the
same cluster, and separation, the difference between clusters. Other methods are
based on estimated density and graph connectivity.
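The clustering idea above can be sketched with k-means (an illustrative toy example, not the report's method; the points are invented and form two obvious groups):

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled inputs only; the algorithm must find the grouping itself.
X = np.array([[0.0, 0.1], [0.2, 0.0],    # one compact group near the origin
              [9.0, 9.1], [9.2, 9.0]])   # another compact group far away

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```

The two nearby points end up in one cluster and the two distant points in the other, without any labels being supplied: good compactness within clusters and good separation between them.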
Semi-supervised learning:
CHAPTER 4
4.6 Other Nonfunctional Requirements
The application at this side controls and communicates with the following three main general
components.
Client Tier: an embedded browser in charge of the navigation and accessing the web service;
Server Tier: The server side contains the main parts of the functionality of the proposed
architecture. The components at this tier are the following.
1. The software may be safety-critical. If so, there are issues associated with its integrity level
2. The software may not be safety-critical although it forms part of a safety-critical system. For
3. If a system must be of a high integrity level and if the software is shown to be of that
integrity level, then the hardware must be at least of the same integrity level.
4. There is little point in producing 'perfect' code in some language if hardware and system
5. If a computer system is to run software of a high integrity level then that system should not at
7. Otherwise, the highest level of integrity required must be applied to all systems in the same
environment.
CHAPTER 5
Data Cleaning & Preprocessing
Data Collection
Algorithms Implementation
Model Creation
Performance Evaluation
Prediction
Fig: 5.1
A sequence diagram is a kind of interaction diagram that shows how processes operate
with one another and in what order. It is a construct of a Message Sequence Chart. Sequence
diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.
[Sequence diagram: User, WhatsApp and Supervisor lanes — create training/testing data, apply ML techniques, validate, and predict the value for cyberbullying detection.]
5.3 Use Case Diagram:
Use case: A use case describes a sequence of actions that provide something of measurable
value to an actor and is drawn as a horizontal ellipse.
Actor: An actor is a person, organization or external system that plays a role in one or more
interaction with the system.
[Use case diagram: get dataset → extracted dataset → validation → cyberbullying detection.]
5.4 Activity Diagram:
[Activity diagram: dataset → data extraction → machine learning → model creation → user WhatsApp → result.]
5.5 Collaboration Diagram:
UML Collaboration Diagrams illustrate the relationship and interaction between
software objects. They require use cases, system operation contracts and domain model to
already exist. The collaboration diagram illustrates messages being sent between classes and
objects.
[Collaboration diagram: the dataset collected from the user is extracted, and the training dataset is passed to machine learning.]
CHAPTER 6
6.1 MODULES
Dataset Collection
Data Cleaning & Preprocessing
Algorithm Implementation
Prediction
Data Engineering:
The data cleansing and pre-processing phase contains three sub-phases. This process is
performed on the raw tweet dataset to form the finalized data as described in the previous
dataset. In the first sub-phase, noise removal processes such as URL removal, hashtag/mention
removal, punctuation/symbol removal, and emoticon transformation are performed. In the
second sub-phase, out-of-vocabulary cleansing such as spell checking, acronym expansion,
slang modification, and elongated (repeated) character removal is performed. In the final
sub-phase, tweet transformations such as lower-case conversion, stemming, word segmentation
(tokenization), and stop-word filtering are conducted. These sub-phases are performed to
enhance the tweets and improve feature extraction and classification accuracy.
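A minimal sketch of the noise-removal sub-phase described above (illustrative only: the helper name and exact regular expressions are assumptions, not the report's actual implementation):

```python
import re

def clean_tweet(tweet):
    """Toy version of the noise-removal and transformation sub-phases."""
    tweet = re.sub(r'http\S+', '', tweet)      # URL removal
    tweet = re.sub(r'[@#]\w+', '', tweet)      # hashtag/mention removal
    tweet = re.sub(r'[^\w\s]', '', tweet)      # punctuation/symbol removal
    tweet = tweet.lower()                      # lower-case conversion
    return re.sub(r'\s+', ' ', tweet).strip()  # normalize spaces

print(clean_tweet("Check this! https://t.co/x #bully @user"))  # check this
```

A real pipeline would add the remaining sub-phases (spell checking, acronym expansion, stemming, tokenization, stop-word filtering) in the same style, as the project's source code does later.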
Algorithm:
Support Vector Machine (SVM) is a supervised machine learning algorithm which can be
used for either classification or regression challenges. However, it is mostly used in
classification problems. In this algorithm, we plot each data item as a point in n-dimensional
space (where n is the number of features) with the value of each feature being the value of a
particular coordinate. Then, we perform classification by finding the hyperplane that
differentiates the two classes well.
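The hyperplane idea can be sketched with scikit-learn's SVC (an illustrative toy example with invented 2-dimensional points, not the report's actual training code):

```python
from sklearn.svm import SVC

# Each data item is a point in n-dimensional space (here n = 2 features);
# the classifier finds a hyperplane separating the two classes.
X = [[0, 0], [0, 1], [3, 3], [3, 4]]
y = [0, 0, 1, 1]

clf = SVC(kernel='linear').fit(X, y)
print(clf.predict([[0, 0.5], [3, 3.5]]))  # one point from each side of the hyperplane
```

Points falling on either side of the learned hyperplane receive the corresponding class, which is the behaviour the paragraph describes.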
Prediction:
CHAPTER 7
CODING AND TESTING
7.1 CODING
Once the design aspect of the system is finalized, the system enters the coding and
testing phase. The coding phase brings the actual system into action by converting the design of
the system into code in a given programming language. Therefore, a good coding style has to
be adopted so that whenever changes are required they can easily be incorporated into the system.
Coding standards are guidelines to programming that focuses on the physical structure and
appearance of the program. They make the code easier to read, understand and maintain. This
phase of the system actually implements the blueprint developed during the design phase. The
coding specification should be in such a way that any programmer must be able to understand the
code and can bring about changes whenever felt necessary. Some of the standards to be followed are:
Naming conventions
Value conventions
Naming conventions of classes, data member, member functions, procedures etc., should be
self-descriptive. One should even get the meaning and scope of the variable by its name. The
conventions are adopted for easy understanding of the intended message by the user.
Class names
Class names are problem-domain equivalents, begin with a capital letter, and use mixed case.
Member function and data member names begin with a lowercase letter, with the first
letter of each subsequent word in uppercase and the rest in lowercase.
Value conventions ensure proper values for variables at any point of time. This involves the
following:
Script writing is an art in which indentation is of utmost importance. Conditional and looping
statements are to be properly aligned to facilitate easy understanding. Comments are included to
minimize the number of surprises that could occur when going through the code.
When something has to be prompted to the user, he must be able to understand it properly.
To achieve this, a specific format has been adopted in displaying messages to the user. They are
as follows:
SYSTEM TESTING
Testing is performed to identify errors. It is used for quality assurance. Testing is
an integral part of the entire development and maintenance process. The goal of the testing
during this phase is to verify that the specification has been accurately and completely
incorporated into the design, as well as to ensure the correctness of the design itself. For
example, any logic fault in the design must be detected before coding commences; otherwise
the cost of fixing the fault later will be considerably higher. Detection of design faults can be
achieved through reviews and inspections.
Testing is one of the important steps in the software development phase. Testing checks for
the errors; as a whole, testing of the project involves the following test cases:
Static analysis is used to investigate the structural properties of the Source code.
Dynamic testing is used to investigate the behavior of the source code by executing the
component of the software. Unit testing focuses on the smallest unit of the software design (i.e.),
the module. The white-box testing techniques were heavily employed for unit testing.
Functional test cases involve exercising the code with nominal input values for
which the expected results are known, as well as boundary values and special values. The unit
tests conducted include:
Performance Test
Stress Test
Structure Test
It determines the amount of execution time spent in various parts of the unit, program
throughput, and response time and device utilization by the program unit.
Much can be learned about the strengths and limitations of a program by examining the
manner in which it fails under stress.
Structure Tests are concerned with exercising the internal logic of a program and
traversing particular execution paths. The White-Box test strategy was employed
to ensure that the test cases guarantee that all independent paths within a module have
been exercised at least once, and that all loops execute at their boundaries and within their
operational bounds.
Handling end of file condition, I/O errors, buffer problems and textual errors in
output information
Integration testing is a systematic technique for constructing the program structure
while at the same time conducting tests to uncover errors associated with interfacing; i.e.,
integration testing is the complete testing of the set of modules which makes up the product.
The objective is to take unit-tested modules and build a program structure. The tester should
identify critical modules, and critical modules should be tested as early as possible. One
approach is to wait until all the units have passed testing, and then combine and test them
together. This approach evolved from unstructured testing of small programs. Another strategy
is to construct the product in
increments of tested units. A small set of modules are integrated together and tested, to which
another module is added and tested in combination, and so on. The advantage of this approach
is that errors can be localized as each new module is integrated.
The major error that was faced during the project is linking error. When all the
modules are combined the link is not set properly with all support files. Then we checked out for
interconnection and the links. Errors are localized to the new module and its
intercommunications. The product development can be staged, and modules integrated in as they
complete unit testing. Testing is completed when the last module is integrated and tested.
7.5.1 TESTING
Testing is a process of executing a program with the intent of finding an error. A good
test case is one that has a high probability of finding an as-yet-undiscovered error. A successful
test is one that uncovers an as-yet-undiscovered error. System testing is the stage of
implementation, which is aimed at ensuring that the system works accurately and efficiently as
expected before live operation commences. It verifies that the whole set of programs hang
together. System testing consists of several key activities and steps for running the
program, string testing, and system testing, and is important in adopting a successful new
system. This is the last
chance to detect and correct errors before the system is installed for user acceptance testing.
The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors. Otherwise the program or the project is not said to be complete. Software
testing is the critical element of software quality assurance and represents the ultimate review
of specification, design and coding. Testing is the process of executing the program with the
intent of finding the error. A good test case design is one that has a probability of finding a
yet-undiscovered error. A successful test is one that uncovers a yet-undiscovered error.
This testing is also called glass box testing. In this testing, by knowing the
specific functions that a product has been designed to perform, tests can be conducted that
demonstrate each function is fully operational while at the same time searching for errors in
each function. It is a test case design method that uses the control structure of the procedural
design to derive test cases, for example based on:
Cyclomatic complexity
White box testing is conducted to ensure that "all gears mesh", that is, the internal operation
performs according to specification and all internal components have been adequately exercised.
The steps involved in black box test case design are:
Equivalence partitioning
Comparison testing
A software testing strategy provides a road map for the software developer. Testing is a
set of activities that can be planned in advance and conducted systematically. For this reason,
a template for software testing, a set of steps into which we can place specific test case design
techniques, should be defined for the software process.
Testing begins at the module level and works "outward" toward the integration of the entire
computer-based system.
The developer of the software and an independent test group conducts testing.
Integration testing constructs the program structure while at the same time conducting tests
to uncover errors associated with interfacing. Individual modules, which are highly prone to
interface errors, should not be assumed to work instantly when we put them together. The
problem, of course, is "putting them together": interfacing. Data may be lost across an
interface; one module's sub-functions, when combined, may
not produce the desired major function; individually acceptable imprecision may be magnified
to unacceptable levels.
The logical and syntax errors have been pointed out by program testing. A
syntax error is an error in a program statement that violates one or more rules of the language
in which it is written. An improperly defined field dimension or an omitted keyword are common
syntax errors. These errors are shown through error messages generated by the computer. A logic
error, on the other hand, deals with incorrect data fields, out-of-range items and invalid
combinations. Since the compiler will not detect logical errors, the programmer must examine
the output. Condition testing exercises the logical conditions contained in a module. The possible
types of elements in a condition include a Boolean operator, a Boolean variable, a pair of
Boolean parentheses, a relational operator, or an arithmetic expression. By testing each condition
in the program, condition testing detects not only errors in the conditions of a program but also
other errors in the program.
Security testing attempts to verify the protection mechanisms built in to a system well, in
fact, protect it from improper penetration. The system's security must be tested for
invulnerability from frontal attack, and must also be tested for invulnerability from rear attack.
7.5.2.4 VALIDATION TESTING
At the culmination of integration testing, the software is completely assembled as a
package. Interfacing errors have been uncovered and corrected, and a final series of software
tests, validation testing, begins. Validation testing can be defined in many ways, but a simple definition
is that validation succeeds when the software functions in manner that is reasonably expected by
the customer. Software validation is achieved through a series of black box tests that
demonstrate conformity with requirement. After validation test has been conducted, one of two
conditions exists.
Deviation or errors discovered at this step in this project is corrected prior to completion
of the project with the help of the user by negotiating to establish a method for resolving
deficiencies. Thus the proposed system under consideration has been tested by using validation
testing and found to be working satisfactorily. Though there were deficiencies in the system,
they were resolved.
User acceptance of the system is key factor for the success of any system. The system
under consideration is tested for user acceptance by constantly keeping in touch with prospective
system and user at the time of developing and making changes whenever required. This is done
with regard to the output screen design.
Source Code
#!/usr/bin/env python
# coding: utf-8
# In[60]:
import numpy as np
# In[61]:
import pandas as pd
# In[62]:
import matplotlib.pyplot as plt
# In[63]:
import seaborn as sns  # missing in the extracted code; sns is used below
# In[64]:
import warnings
warnings.filterwarnings('ignore')
# In[65]:
data = pd.read_csv(r'data/labeled_data.csv')
# In[66]:
data.head()
# In[67]:
data.shape
# In[68]:
data.info()
# In[69]:
data.describe().T
# In[70]:
data.isnull().sum()
# In[71]:
data.filter(items=['class', 'tweet'])
# In[72]:
sns.set(rc={'figure.figsize':(10,6)})
# In[75]:
print(data)
# # Data Engineering
# 1 ) Case conversion
# 2 ) Removing special characters
# 3 ) Removing shorthands
# 4 ) Removing stopwords
# 5 ) Removing links
# 6 ) Removing accents
# 7 ) Normalize spaces
# In[76]:
import re
import unidecode
from nltk.corpus import stopwords  # missing in the extracted code; used by remove_stopwords below
# In[77]:
def case_convert():
    # Reconstructed: the body was lost in extraction; lower-cases every tweet.
    data.tweet = [tweet.lower() for tweet in data.tweet]

def remove_shorthands():
    # Only a few entries of the original CONTRACTION_MAP survived extraction.
    CONTRACTION_MAP = {
        "can't": "cannot",
        "'cause": "because",
        "ma'am": "madam",
    }
    texts = []
    for tweet in data.tweet:
        string = ""
        for word in tweet.split():
            if word.strip() in list(CONTRACTION_MAP.keys()):
                string += " " + CONTRACTION_MAP[word.strip()]
            else:
                string += " " + word
        texts.append(string.strip())
    data.tweet = texts

def remove_stopwords():
    texts = []
    stopwords_list = stopwords.words('english')
    for tweet in data.tweet:
        string = ""
        for word in tweet.split():
            if word.strip() in stopwords_list:
                continue
            else:
                string += " " + word
        texts.append(string.strip())
    data.tweet = texts

def remove_links():
    texts = []
    for tweet in data.tweet:
        remove_https = re.sub(r'http\S+', '', tweet)
        remove_com = re.sub(r'\S+\.com\S*', '', remove_https)
        texts.append(remove_com)
    data.tweet = texts

def remove_specials():
    # Reconstructed: strips characters other than letters, digits and spaces.
    data.tweet = [re.sub(r'[^a-zA-Z0-9\s]', ' ', tweet) for tweet in data.tweet]

def remove_accents():
    # Reconstructed: transliterates accented characters to plain ASCII.
    data.tweet = [unidecode.unidecode(tweet) for tweet in data.tweet]

def normalize_spaces():
    # Reconstructed: collapses runs of whitespace into single spaces.
    data.tweet = [re.sub(r'\s+', ' ', tweet).strip() for tweet in data.tweet]

case_convert()
remove_links()
# remove_shorthands()
remove_accents()
remove_specials()
remove_stopwords()
normalize_spaces()
print(data)
# # Word Cloud Visualization
# In[82]:
data_hate = data[data['class']==0]
data_offen = data[data['class']==1]
data_neither = data[data['class']==2]
# # Abusive words
# In[83]:
from wordcloud import WordCloud, STOPWORDS  # missing import in the extracted code

stopwords = set(STOPWORDS)
text = " ".join(data_hate.tweet)  # reconstructed: the definition of `text` was lost in extraction
wordcloud = WordCloud(stopwords=stopwords,
background_color="white").generate(text)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
# # offensive language
# In[84]:
stopwords = set(STOPWORDS)
text = " ".join(data_offen.tweet)  # reconstructed: the definition of `text` was lost in extraction
wordcloud = WordCloud(stopwords=stopwords,
background_color="white").generate(text)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
# # normal
# In[85]:
stopwords = set(STOPWORDS)
text = " ".join(data_neither.tweet)  # reconstructed: the definition of `text` was lost in extraction
wordcloud = WordCloud(stopwords=stopwords,
background_color="white").generate(text)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
# In[86]:
X=data["tweet"]
Y=data["class"]
# In[87]:
import pickle
from sklearn.feature_extraction.text import CountVectorizer  # missing import in the extracted code

cv = CountVectorizer()
X = cv.fit_transform(X)
# In[88]:
# In[89]:

# Reconstructed: the train/test split cell was lost in extraction; parameters assumed.
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
# In[90]:
print("X_train shape:", X_train.shape)
# In[91]:

# Logistic Regression Algorithm (the model construction lines were lost in extraction)
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)
# In[93]:
# Support Vector Classifier Algorithm (the model construction lines were lost in extraction)
from sklearn.svm import SVC
svc = SVC()
svc.fit(X_train, Y_train)
# In[97]:
Y_pred_logreg = logreg.predict(X_test)
Y_pred_svc = svc.predict(X_test)
# In[98]:
from sklearn.metrics import accuracy_score
# In[99]:

# Reconstructed: the accuracy computations were lost in extraction.
accuracy_logreg = accuracy_score(Y_test, Y_pred_logreg)
accuracy_svc = accuracy_score(Y_test, Y_pred_svc)

# In[100]:

a = accuracy_logreg * 100
c = accuracy_svc * 100
69
# In[113]:
# Classification report
from sklearn.metrics import classification_report
print(classification_report(Y_test, Y_pred_svc))
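To make the `classification_report` numbers concrete, per-class precision and recall can be computed by hand on a tiny label set. The labels below are hypothetical and unrelated to the report's test split; `precision_recall` is a helper name introduced only here.

```python
# Hand-computed accuracy, precision, and recall on hypothetical labels,
# mirroring what classification_report tabulates per class.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(cls):
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t != cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != cls and t == cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(accuracy)             # 4 of 6 correct
print(precision_recall(1))  # class 1: precision 2/3, recall 1.0
```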
# In[117]:
import joblib
joblib.dump(svc, 'models/svm_model.pkl')
# Load the model from the file
svm_from_joblib = joblib.load('models/svm_model.pkl')
svm_from_joblib.predict(X_test)
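The `joblib.dump`/`joblib.load` round trip above follows the same idea as the standard `pickle` module. A minimal sketch, with a plain dict standing in for the trained SVC:

```python
import io
import pickle

# Round-trip a stand-in "model" through a byte stream, mirroring the
# joblib.dump / joblib.load persistence used for the real SVC above.
model = {"kind": "svc", "classes": [0, 1, 2]}   # hypothetical stand-in object
buf = io.BytesIO()
pickle.dump(model, buf)        # joblib.dump analogue
buf.seek(0)
restored = pickle.load(buf)    # joblib.load analogue
print(restored == model)
```

joblib is preferred for sklearn estimators because it handles large NumPy arrays more efficiently, but the serialize-then-restore contract is the same.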
# # Abusive contents
# In[118]:
vect = cv.transform(message).toarray()
my_prediction = logreg.predict(vect)
print(my_prediction)
# In[119]:
vect = cv.transform(message).toarray()
my_prediction = logreg.predict(vect)
print(my_prediction)
# # Normal
# In[120]:
message = ["Peel up peel up bring it back up rewind back where I'm from they move Shaq from the line"]
vect = cv.transform(message).toarray()
my_prediction = logreg.predict(vect)
print(my_prediction)
# # Sender
# In[ ]:
import pandas as pd
import numpy as np
import smtplib
import time
import joblib
import json
import pickle
import warnings
warnings.filterwarnings('ignore')
from datetime import datetime
import pywhatkit as pwt
from twilio.rest import Client  # Twilio client import assumed; not shown in the listing
account_sid = 'AC935e8128c0f849fe32f9fea8267a58d4'
auth_token = '4e326d3a6983bc0f22cc3ec26f0eb57e'
client = Client(account_sid, auth_token)
now = datetime.now()
date = now.strftime("%d/%m/%Y")
time_now = now.strftime("%H:%M:%S")
hour = now.strftime("%H")
minute = now.strftime("%M")
second = now.strftime("%S")
hour=round(int(hour))
minute=round(int(minute))
social_minute=int(minute)+1
message = ""
body =[message]
to_contact = "+91"
pwt.sendwhatmsg(to_contact,message,int(hour),int(social_minute))
cvf=pickle.load(open("models/count_vectorizer.pickle", 'rb'))
vect = cvf.transform(body).toarray()
svm_from_joblib = joblib.load('models/svm_model.pkl')
my_prediction = svm_from_joblib.predict(vect)
if my_prediction == [0]:
    mess = "abusive"
    list_of_history = {"Date": str(date), "Time": str(time_now),
                       "Contact": str(to_contact),
                       "Message": str(message), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body=result,
        to="")
elif my_prediction == [1]:  # class-1 branch; condition assumed from the class labels
    mess = "abusive"
    list_of_history = {"Date": str(date), "Time": str(time_now),
                       "Contact": str(to_contact),
                       "Message": str(message), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body=result,
        to="")
else:
    mess = "normal"
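The history record that both the sender and receiver scripts serialize with `json.dumps` can be sketched standalone. The fixed timestamp and placeholder contact below are hypothetical, used only so the example is reproducible:

```python
import json
from datetime import datetime

# Build the history record the sender pushes through Twilio; the
# timestamp, contact, and message here are illustrative placeholders.
now = datetime(2023, 5, 1, 10, 30, 0)  # fixed timestamp for the example
record = {
    "Date": now.strftime("%d/%m/%Y"),
    "Time": now.strftime("%H:%M:%S"),
    "Contact": "+910000000000",
    "Message": "example message",
    "Prediction": "normal",
}
result = json.dumps(record)
print(result)
```

Serializing to JSON first keeps the SMS body a single flat string while preserving the field structure for any downstream parsing.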
# # Receiver
# In[133]:
# Rebinding the `webdriver` module name to a driver instance here would
# shadow the selenium import used by the WhatsApp class below, so this
# manual launch is kept commented out.
# webdriver = webdriver.Chrome("/home/hwuser/Downloads/chromedriver/chromedriver")
# webdriver.get("https://web.whatsapp.com")
import time
import datetime as dt
import json
import os
import requests
import shutil
import pickle
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.alert import Alert
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

class WhatsAppElements:
    search = (By.CSS_SELECTOR, "#side > div.uwk68 > div > div > "
              "div._16C8p > div > div._13NKt.copyable-text.selectable-text")
class WhatsApp:
    browser = None
    timeout = 10

    def __init__(self, wait=600):  # constructor signature assumed; not shown in the listing
        self.browser = webdriver.Chrome(ChromeDriverManager().install())
        self.browser.get("https://web.whatsapp.com/")
        WebDriverWait(self.browser, wait).until(
            EC.presence_of_element_located(WhatsAppElements.search))
    def goto_main(self):
        try:
            self.browser.refresh()
            Alert(self.browser).accept()
        except Exception as e:
            print(e)
        WebDriverWait(self.browser, self.timeout).until(
            EC.presence_of_element_located(WhatsAppElements.search))
    def unread_usernames(self, scrolls=100):  # method signature assumed; not shown in the listing
        self.goto_main()
        initial = 10
        usernames = []
        for i in range(0, scrolls):
            self.browser.execute_script(
                "document.getElementById('pane-side').scrollTop={}".format(initial))
            # the chat-element parsing that yields `i` and `username`
            # below is missing from the report's listing
            if i.find("div", class_="_3OvU8"):
                usernames.append(username)
            initial -= 10
        usernames = list(set(usernames))
        return usernames
    def get_last_message_for(self, name):  # method signature assumed; not shown in the listing
        messages = list()
        search = self.browser.find_element(*WhatsAppElements.search)
        search.send_keys(name + Keys.ENTER)
        time.sleep(3)
        # the message-element lookup that yields `message` is missing from the listing
        if message:
            message2 = message.find("span")
            if message2:
                messages.append(message2.text)
        return messages
# In[134]:
whatsapp = WhatsApp()  # instantiation assumed; not shown in the listing
user_names = whatsapp.unread_usernames(scrolls=100)
print(user_names)
# In[135]:
name = user_names[0]  # chat selection assumed; not shown in the listing
messages = whatsapp.get_last_message_for(name)
messages_len = len(messages)
latest_msg = messages[messages_len-1]  # used by the receiver cell below
print(messages)
# In[ ]:
import pandas as pd
import numpy as np
import smtplib
import time
import joblib
import json
import pickle
import warnings
warnings.filterwarnings('ignore')
from datetime import datetime
from twilio.rest import Client  # Twilio client import assumed; not shown in the listing
account_sid = 'AC935e8128c0f849fe32f9fea8267a58d4'
auth_token = '4e326d3a6983bc0f22cc3ec26f0eb57e'
client = Client(account_sid, auth_token)
now = datetime.now()
date = now.strftime("%d/%m/%Y")
time_now = now.strftime("%H:%M:%S")
hour = now.strftime("%H")
minute = now.strftime("%M")
second = now.strftime("%S")
cvf=pickle.load(open("models/count_vectorizer.pickle", 'rb'))
vect = cvf.transform([latest_msg]).toarray()
svm_from_joblib = joblib.load('models/svm_model.pkl')
my_prediction = svm_from_joblib.predict(vect)
to_contact = "+91"
if my_prediction == [0]:
    mess = "abusive"
    list_of_history = {"Date": str(date), "Time": str(time_now),
                       "Contact": str(to_contact),
                       "Message": str(latest_msg), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body=result,
        to="")
elif my_prediction == [1]:  # class-1 branch; condition assumed from the class labels
    mess = "abusive"
    list_of_history = {"Date": str(date), "Time": str(time_now),
                       "Contact": str(to_contact),
                       "Message": str(latest_msg), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body=result,
        to="")
else:
    mess = "normal"
    list_of_history = {"Date": str(date), "Time": str(time_now),
                       "Contact": str(to_contact),
                       "Message": str(latest_msg), "Prediction": mess}
    result = json.dumps(list_of_history)
    message = client.messages.create(
        from_='+19402673979',
        body=result,
        to="")
Screenshots:
Conclusion:
This project developed an efficient tweet classification model to enhance the
effectiveness of topic models for the detection of cyberbullying events. DEA-RNN was
developed by combining the DEA optimization and the Elman-type RNN for efficient
parameter tuning. DEA-RNN achieved the best results among the compared existing
methods in all scenarios, across metrics such as accuracy, recall, F-measure, precision,
and specificity.
Future Enhancements:
The current study was limited to the Twitter dataset exclusively; other social media
platforms (SMPs) such as Instagram, Flickr, YouTube, and Facebook should be investigated
in order to detect trends in cyberbullying. The possibility of utilizing multiple data sources
for cyberbullying detection will also be investigated in the future. Furthermore, we performed
the analysis only on the content of tweets; we could not analyse the tweets in relation to the
users' behavior. This will be addressed in future work.
REFERENCES
[7] A. Agarwal, A. S. Chivukula, M. H. Bhuyan, T. Jan, B. Narayan, and M.
Prasad, "Identification and classification of cyberbullying posts: A recurrent neural
network approach using under-sampling and class weighting," in Neural
Information Processing (Communications in Computer and Information Science),
vol. 1333, H. Yang, K. Pasupa, A. C.-S. Leung, J. T. Kwok, J. H. Chan, and I.
King, Eds. Cham, Switzerland: Springer, 2020, pp. 113–120.
[12] R. R. Dalvi, S. B. Chavan, and A. Halbe, "Detecting a Twitter cyberbullying
using machine learning," Ann. Romanian Soc. Cell Biol., vol. 25, no. 4, pp.
16307–16315, 2021.
93