CRIME TYPE AND OCCURRENCE PREDICTION USING
MACHINE LEARNING
ABSTRACT
Crime has become a conspicuous source of trouble for individuals and for society. A rising crime
rate unsettles the stability of a country. To analyze criminal activities and respond to them ahead
of time, it is necessary to understand crime
patterns. This study carries out such a crime pattern analysis using crime data obtained from Kaggle,
an open data source, which is in turn used to predict the most recently occurring crimes. The main aim of
this project is to estimate which type of crime contributes the most, along with the time period and location
where it happened. A machine learning algorithm, Naïve Bayes, is applied in this work
to classify the various crime patterns, and the accuracy achieved was comparatively high
when compared to earlier works. The rise in urbanization and digitalization has led to an increase
in crime rates and the complexity of crime patterns. Traditional methods of crime analysis and
prevention are becoming less effective in handling this growing challenge. This study explores the
application of machine learning techniques to predict crime types and their occurrences. By leveraging
historical crime data, socio-economic factors, and spatial-temporal information, we develop predictive
models that can identify potential crime hotspots and the likelihood of different crime types occurring in
specific areas. The research aims to enhance law enforcement agencies' ability to allocate resources
efficiently, prevent crimes proactively, and improve overall public safety. Our findings demonstrate that
machine learning models, particularly ensemble methods and neural networks, provide significant
improvements in prediction accuracy compared to traditional statistical approaches.
INTRODUCTION
Crime has become a major threat, and it is considered to be growing in
intensity. An action is said to be a crime when it violates the laws of the government
and is highly offensive. Crime pattern analysis requires study of the different aspects of
criminology as well as the identification of patterns. The Government has to spend a lot of time and effort
applying technology to control some of these criminal activities. Hence, machine learning
techniques applied to crime records are needed to predict crime types and patterns. Such techniques use
existing crime data to predict the crime type and its occurrence based on location and time.
Researchers have carried out many studies that help in analyzing crime patterns along with their
relations in a specific location. Analyzing hotspots has become an easier way of classifying
crime patterns, which helps officials to resolve cases faster. This approach uses a dataset
obtained from Kaggle, an open data source, based on various factors along with the time and place at which crimes
occur over a certain period. We applied a classification algorithm that helps in locating the
type of crime and the hotspots of criminal activity at a given time and day. The
proposed approach applies machine learning algorithms to find matching criminal patterns, together
with their category, from the given temporal and spatial data. Crime remains a pervasive
issue in societies worldwide, posing a significant threat to public safety and well-being. As urban
populations continue to grow, so does the complexity of crime patterns, making traditional crime
analysis methods less effective. Law enforcement agencies face the daunting task of predicting and
preventing criminal activities in an ever-evolving landscape.
In recent years, the advent of machine learning has provided new avenues for addressing complex
problems across various domains, including crime prediction. Machine learning algorithms can
process vast amounts of data, identify patterns, and make predictions with a level of accuracy
unattainable by conventional methods. This study aims to harness the power of machine learning to
predict crime types and their occurrences, thereby aiding law enforcement agencies in crime
prevention and resource allocation. By analyzing historical crime data along with socio-economic and
spatial-temporal factors, we can develop models that forecast crime trends and identify potential
hotspots. These predictions enable proactive measures, such as increased patrolling in high-risk areas
and timely interventions, ultimately leading to a reduction in crime rates and an increase in
community safety.
The primary objectives of this research are to:
1. Evaluate the effectiveness of various machine learning algorithms in predicting crime types
and occurrences.
2. Identify key features that significantly influence crime patterns.
3. Develop a predictive framework that can be implemented by law enforcement agencies for
real-time crime forecasting.
SYSTEM ANALYSIS
EXISTING SYSTEM
In prior work, the dataset obtained from the open source is first pre-processed to remove
duplicated values and features. A decision tree is used to find crime patterns and
also to extract features from the large amount of data. It provides a primary structure for
the further classification process. Features are then extracted from the classified crime patterns using a deep neural
network. Based on the predictions, performance is calculated for both the training and test data. The
crime prediction helps in forecasting future criminal activities of any type and helps
officials to resolve them at the earliest.
DISADVANTAGES:
1. The pre-existing works achieve low accuracy since the classifier uses categorical values, which
produce a biased outcome in favor of nominal attributes with a greater number of values.
2. The classification techniques are not suited to regions with inadequate data or to real-valued
attributes.
3. The parameter value of the classifier must be tuned, and hence an optimal value has to be assigned.
PROPOSED SYSTEM
The data obtained is first pre-processed using filter and wrapper machine learning techniques
in order to remove irrelevant and repeated data values. This also reduces the dimensionality, so
the data is cleaned. The data then undergoes a splitting process, in which it is
divided into training and test data sets. The model is trained on the training set and
evaluated on the test set. This is followed by mapping: the crime type, year, month, time, date, and place are
mapped to integers to make classification easier.
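The duplicate removal, integer mapping, and splitting steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the column names, the sample records, and the helper names (`deduplicate`, `encode_column`, `train_test_split`) are assumptions made for the example, and the filter and wrapper feature-selection step is not shown.

```python
import random

def deduplicate(records):
    """Drop exact duplicate rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def encode_column(records, column):
    """Map each distinct value in a column to an integer code (first seen = 0)."""
    codes = {}
    for row in records:
        codes.setdefault(row[column], len(codes))
    return codes

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (training, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records; the real Kaggle column names may differ.
records = [
    {"crime_type": "theft", "month": "Jan", "place": "Downtown"},
    {"crime_type": "assault", "month": "Feb", "place": "Harbor"},
    {"crime_type": "theft", "month": "Jan", "place": "Downtown"},  # duplicate row
    {"crime_type": "robbery", "month": "Jan", "place": "Harbor"},
    {"crime_type": "theft", "month": "Mar", "place": "Uptown"},
]

unique_records = deduplicate(records)
columns = ["crime_type", "month", "place"]
encoders = {col: encode_column(unique_records, col) for col in columns}
encoded = [[encoders[col][row[col]] for col in columns] for row in unique_records]
train_rows, test_rows = train_test_split(encoded, test_fraction=0.25, seed=0)
```

Keeping the `encoders` dictionaries around makes it possible to translate predicted integer codes back into readable crime types later.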
The independent effect between the attributes is analyzed initially by using Naïve Bayes.
Bernoulli Naïve Bayes is used for classifying the independent features extracted. The crime
features are labelled, which allows the occurrence of crime at a particular time and
location to be analyzed. Finally, the crime that occurs the most, along with its spatial and temporal information,
is obtained. The performance of the prediction model is measured by calculating the accuracy rate.
The prediction model is written in Python and run on Colab, an
online notebook environment for data analysis and machine learning models.
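The Bernoulli Naïve Bayes classification and the accuracy calculation described above can be illustrated with a small from-scratch sketch. It is not the project's implementation: it assumes the integer-mapped attributes have already been converted to binary (0/1) indicators, and the feature meanings, the toy data, and the class name `BernoulliNaiveBayes` are hypothetical.

```python
import math
from collections import defaultdict

class BernoulliNaiveBayes:
    """Minimal Bernoulli Naive Bayes over binary (0/1) feature vectors."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength; must be > 0

    def fit(self, X, y):
        n_features = len(X[0])
        class_counts = defaultdict(int)
        feature_counts = defaultdict(lambda: [0] * n_features)
        for row, label in zip(X, y):
            class_counts[label] += 1
            for j, value in enumerate(row):
                feature_counts[label][j] += value
        total = len(y)
        self.classes_ = sorted(class_counts)
        self.log_prior_ = {c: math.log(class_counts[c] / total) for c in self.classes_}
        # Smoothed estimate of P(feature_j = 1 | class)
        self.prob_ = {
            c: [(feature_counts[c][j] + self.alpha) / (class_counts[c] + 2 * self.alpha)
                for j in range(n_features)]
            for c in self.classes_
        }
        return self

    def predict_one(self, row):
        best_class, best_score = None, -math.inf
        for c in self.classes_:
            # Log prior plus the sum of per-feature Bernoulli log-likelihoods
            score = self.log_prior_[c]
            for value, p in zip(row, self.prob_[c]):
                score += math.log(p if value else 1.0 - p)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, X):
        return [self.predict_one(row) for row in X]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical binary features: [is_night, is_weekend, is_downtown]
X_train = [[1, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 0, 0]]
y_train = ["theft", "theft", "theft", "assault", "assault", "assault"]
X_test = [[1, 0, 1], [0, 1, 0]]
y_test = ["theft", "assault"]

model = BernoulliNaiveBayes().fit(X_train, y_train)
predictions = model.predict(X_test)
test_accuracy = accuracy(y_test, predictions)
```

Working in log space avoids numeric underflow when many features are multiplied together, and the smoothing term keeps every estimated probability strictly between 0 and 1.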
ADVANTAGES
1. The proposed algorithm is well suited to crime pattern detection since most of the featured
attributes depend on time and location.
2. It also overcomes the problem of analyzing the independent effect of the attributes.
3. Initialization of an optimal value is not required, since the algorithm handles real-valued and nominal
values and also covers regions with insufficient information.
4. The accuracy is relatively high when compared to other machine learning prediction
models.
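The independence assumption behind these points can be stated compactly. The following is the standard Bernoulli Naïve Bayes decision rule, given here as background rather than as a formula taken from this project; the symbols are introduced for illustration: $c$ is a crime type, $x_j \in \{0, 1\}$ is the $j$-th binary feature, and $p_{cj}$ is the estimated probability that feature $j$ equals 1 for crime type $c$.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{j=1}^{n} p_{cj}^{\,x_j} \, (1 - p_{cj})^{\,1 - x_j},
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{j=1}^{n} p_{cj}^{\,x_j} \, (1 - p_{cj})^{\,1 - x_j}
```

Each feature contributes its own factor, which is why the attributes are treated as having independent effects given the crime type.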
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS
➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Keyboard - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA
SOFTWARE REQUIREMENTS
Coding Language : Python.
Operating system : Windows 7 Ultimate.
Front-End : Python.
Back-End : Django-ORM
Designing : HTML, CSS, JavaScript.
Database : MySQL (WAMP Server).
SYSTEM ARCHITECTURE
[Figure: System architecture diagram. The labels recoverable from the figure are listed below.]
Service Provider: Login; Train and Test Data Sets; View Trained and Tested Accuracy in Bar Chart; View Trained and Tested Accuracy Results; View Predicted Crime Type Details; Find Crime Type Ratio on Data Sets; Download Trained Data Sets; View Crime Type Ratio Results; View All Remote Users.
Admin: Accepting all user information; View user data details; Authorize the Admin; Process all user queries; Store and retrievals; Registering the User.
WEB Database
Remote User (Tweet Server): Register and Login; Post Crime Data Sets; Predict Crime Type; View Your Profile.
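The "Find Crime Type Ratio on Data Sets" function listed for the Service Provider is not specified further in this document. One plausible reading, sketched below under that assumption, is the share of each crime type in the data set, together with the most frequent crime type per location. The function names, field names, and sample records are hypothetical.

```python
from collections import Counter

def crime_type_ratio(labels):
    """Return the fraction of records belonging to each crime type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {crime: count / total for crime, count in counts.items()}

def most_common_crime_by(records, key):
    """For each value of `key` (e.g. a place), return its most frequent crime type."""
    grouped = {}
    for row in records:
        grouped.setdefault(row[key], Counter())[row["crime_type"]] += 1
    return {group: counter.most_common(1)[0][0] for group, counter in grouped.items()}

# Hypothetical predicted or recorded crimes
records = [
    {"crime_type": "theft", "place": "Downtown"},
    {"crime_type": "theft", "place": "Downtown"},
    {"crime_type": "assault", "place": "Downtown"},
    {"crime_type": "robbery", "place": "Harbor"},
]

ratios = crime_type_ratio(row["crime_type"] for row in records)
hotspot_crimes = most_common_crime_by(records, "place")
```

The same grouping could be applied to a time field (for example, month or hour) to relate the most frequent crime type to temporal information.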
MODULES:
Remote User
Service Provider