Practical Work

The document outlines an approach to applying numerical linear algebra concepts to detect plagiarism in computer science through an algorithm using cosine similarity. It details the steps of identifying a practical problem, understanding relevant concepts, implementing an algorithm, and analyzing simulation results. The algorithm preprocesses documents, computes a similarity matrix, and flags potentially plagiarized documents based on a defined threshold.

Uploaded by

ekpehope19

GROUP 16

NUMERICAL LINEAR ALGEBRA


QUESTION
Apply the concepts of numerical linear algebra to solve a practical problem in computer science. Implement the solution
with an algorithm and analyse the simulation results.
APPROACH STYLE
01 IDENTIFICATION OF PRACTICAL PROBLEM IN COMPUTER SCIENCE
First, we need to find a practical problem where numerical linear algebra can
be applied. This could be anything from image processing, machine learning
models, data compression, or solving systems of equations that arise in
various computer simulations.

02 UNDERSTAND NUMERICAL LINEAR ALGEBRA CONCEPTS:


Numerical linear algebra involves the study of how matrix operations can be
used to create efficient and accurate computer algorithms for questions in
continuous mathematics [1]. It includes understanding vectors, matrices,
matrix operations, eigenvalues, and eigenvectors [2].

03 IMPLEMENT AN ALGORITHM:

Once we had our problem and understood the necessary linear algebra
concepts, we implemented an algorithm. This involves writing a program that
uses SVD.

04 SIMULATE AND ANALYZE RESULTS:


After implementing the algorithm, we run simulations to test its effectiveness.
We use a dataset of documents, run our feature extraction and
classification, and then analyze the results to see how well our algorithm
performed.
1.0 GLOBAL PLAGIARISM SURVEY

Country          Data_01   Data_02
United States    36%       7%
Colombia         36%       0%
China            70%       0%
Australia        15%       0%

United States: A survey conducted in the United States revealed that 36% of undergraduates admit to paraphrasing/copying a few sentences without citation, while 7% admit to submitting work done by someone else.

Colombia: A survey in Colombia found that 36% of students admitted to plagiarizing, highlighting a significant issue with academic integrity in the country's educational institutions.

China: In China, a study conducted at a leading university revealed that 70% of students admitted to cheating in exams or assignments. This high prevalence of academic dishonesty has raised concerns about the integrity of education in the country.

Australia: Research in Australia indicates that approximately 15% of students have purchased assignments from online sources.
1.1 IDENTIFICATION OF PRACTICAL PROBLEM IN COMPUTER SCIENCE
You see, as computer scientists and tech enthusiasts, we're always working on
exciting projects, creating innovative solutions, and sharing our ideas with the
world. But there's a problem that's been popping up more and more often, and it's
something we need to address: plagiarism.
Plagiarism is like a sneaky ghost that haunts the world of computer science. It's
when someone takes the hard work, ideas, or creations of others and tries to pass
them off as their own. And unfortunately, it's becoming a bit of a problem in our
community.

Now, you might be wondering why this is such a big deal. Well, let me tell you. In
computer science, our ideas and creations are like building blocks. Each new discovery,
innovation, or program builds upon what came before it. But when someone plagiarizes,
they're not only being dishonest, but they're also hindering progress and undermining
the hard work of others.
So, as members of the computer science society, it's up to us to recognize this problem
and take action to prevent it. We need to promote integrity, honesty, and originality in
everything we do. And that's why today, we're going to dive deeper into the issue of
plagiarism in computer science, explore its consequences, and discuss how we can work
together to combat it.

Are you ready to tackle this challenge with me? Let's get started! 💻🔍
2.0 UNDERSTAND NUMERICAL LINEAR ALGEBRA CONCEPTS:
Preprocessing:
•Tokenize the text documents into words or phrases.
•Convert the documents into numerical representations, such as TF-IDF
vectors or word embeddings.
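
The two preprocessing bullets can be sketched in plain Python. This is a minimal illustration rather than the code used later in the presentation: the helper names `tokenize` and `tfidf_vectors` are hypothetical, the tokenization rule is an assumption, and the weighting assumes the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which is what scikit-learn's TfidfVectorizer documents as its default.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed rule: lowercase, keep word tokens of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_vectors(documents):
    # Hypothetical helper: returns (vocabulary, L2-normalized TF-IDF rows).
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in token_lists for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard empty docs
        vectors.append([x / norm for x in row])
    return vocabulary, vectors


vocab, vecs = tfidf_vectors(["the cat sat", "the dog sat down"])
print(vocab)  # ['cat', 'dog', 'down', 'sat', 'the']
```

Terms that occur in every document (here "the" and "sat") receive the smallest weight, so rarer words dominate each vector.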

Constructing Similarity Matrix:


Use cosine similarity to compute the similarity between each pair of
documents. Cosine similarity measures the cosine of the angle
between two vectors and is commonly used in text similarity tasks.
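
For two vectors a and b, the measure is cos θ = (a · b) / (‖a‖ ‖b‖). The sketch below computes it directly; the function name `cosine` is illustrative, and returning 0.0 for an all-zero vector is a convention assumed here, not something the slides specify.

```python
import math


def cosine(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty documents
    return dot / (norm_a * norm_b)


parallel = cosine([1, 2, 0], [2, 4, 0])  # same direction, different length
orthogonal = cosine([1, 0], [0, 1])      # no shared components
print(parallel, orthogonal)
```

Because the score depends only on direction, a document and a copy that repeats it twice score as identical, while documents with no shared terms score 0.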

Identifying Copied Work:


•Analyze the similarity matrix to identify pairs of documents with
high similarity scores. This indicates potential instances of copied
work.
•Define a threshold above which documents are considered
plagiarized. Documents with similarity scores above this threshold
are flagged as potentially plagiarized.
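
The thresholding step might look like the following sketch. The helper name `flag_pairs` and the matrix values are made up for illustration; only the upper triangle (i < j) is scanned because the matrix is symmetric and each document trivially matches itself.

```python
def flag_pairs(similarity_matrix, threshold=0.8):
    # Hypothetical helper: collect (i, j, score) for pairs above the threshold.
    n = len(similarity_matrix)
    flagged = []
    for i in range(n):
        for j in range(i + 1, n):
            score = similarity_matrix[i][j]
            if score > threshold:
                flagged.append((i, j, score))
    return flagged


# Made-up similarity scores for three documents.
example = [
    [1.00, 0.92, 0.10],
    [0.92, 1.00, 0.35],
    [0.10, 0.35, 1.00],
]
print(flag_pairs(example))  # [(0, 1, 0.92)]
```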

Limitations:

•Numerical methods alone may not capture synonyms or
paraphrased content.
•Setting the threshold is subjective and depends on the desired level
of strictness.
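
Both limitations can be seen in a small experiment. The sketch below uses raw word counts instead of TF-IDF to keep it short (a simplification assumed for this example), and the sentences and the name `count_cosine` are invented for illustration.

```python
import math
import re
from collections import Counter


def count_cosine(text_a, text_b):
    # Cosine similarity on raw word counts (simplified stand-in for TF-IDF).
    a = Counter(re.findall(r"\w+", text_a.lower()))
    b = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


original = "The student copied the essay"
paraphrase = "A pupil duplicated that composition"  # same meaning, new words
near_copy = "The student copied an essay"           # one word changed

paraphrase_score = count_cosine(original, paraphrase)
near_copy_score = count_cosine(original, near_copy)
print(paraphrase_score)  # 0.0, since no words are shared
print(round(near_copy_score, 2))

# The same near copy is flagged or missed depending on the threshold chosen.
for threshold in (0.8, 0.9):
    print(threshold, near_copy_score > threshold)
```

The paraphrase goes undetected because the vectors share no terms, and the near copy falls between the two thresholds, so the verdict depends entirely on which one is chosen.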
3.0 ALGORITHM:
PLAGIARISM DETECTION USING COSINE SIMILARITY

Input:
- List of documents (texts)
- Similarity threshold (default 0.8)

Output:
- A printed message for each pair of documents whose similarity score is greater than the threshold.

Steps:

1. PREPROCESS THE DOCUMENTS:
- Convert documents into TF-IDF vectors using TfidfVectorizer.
- Compute TF-IDF matrix representing the documents.

2. DETECT PLAGIARISM:
- Compute cosine similarity matrix between all pairs of documents using cosine_similarity function.
- For each pair of documents (i, j) where i < j:

3. COSINE SIMILARITY
- Calculate cosine similarity score between document i and document j.

4. SIMILARITY SCORE
- If the similarity score is greater than the threshold:
- Print "Documents i and j are potentially plagiarized with a similarity score of score".

5. EXAMPLE USAGE:
- Provide a list of example documents.
- Call the detect_plagiarism function with the list of documents.
Preprocessing Documents:
•The preprocess_documents function takes a list of documents as input.
•It initializes a TfidfVectorizer object to convert the documents into TF-IDF vectors.
•The fit_transform method of the vectorizer computes the TF-IDF vectors for the given documents and returns a matrix representation.

Detecting Plagiarism:
•The detect_plagiarism function takes the list of documents and a threshold as input, and calls preprocess_documents to obtain the TF-IDF matrix.
•It computes the cosine similarity matrix between all pairs of documents using the cosine_similarity function from sklearn.metrics.pairwise.
•The similarity score between each pair of documents is compared against a threshold (default value is 0.8).
•If the similarity score between a pair of documents is greater than the threshold, it indicates potential plagiarism.
•The function then prints out the (1-based) indices of the potentially plagiarized documents along with their similarity scores.

Example Usage:
•An example list of documents is provided.
•The detect_plagiarism function is called with this list of documents.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity


    def preprocess_documents(documents):
        # Convert raw text into a sparse TF-IDF matrix (one row per document).
        vectorizer = TfidfVectorizer()
        return vectorizer.fit_transform(documents)


    def detect_plagiarism(documents, threshold=0.8):
        tfidf_matrix = preprocess_documents(documents)
        # Square, symmetric matrix of pairwise cosine similarities.
        similarity_matrix = cosine_similarity(tfidf_matrix)
        n = len(documents)
        for i in range(n):
            # Start at i + 1 so each pair is checked once and self-matches are skipped.
            for j in range(i + 1, n):
                similarity_score = similarity_matrix[i, j]
                if similarity_score > threshold:
                    print(
                        f"Documents {i+1} and {j+1} are potentially plagiarized "
                        f"with a similarity score of {similarity_score:.2f}"
                    )


    # Example usage:
    documents = [
        "This is the first document.",
        "This document is the second document.",
        "And this is the third one.",
        "Is this the first document?",
    ]

    detect_plagiarism(documents)

3.1
4.0 SIMULATE AND ANALYZE RESULTS:

1. TAKE FIRST INPUT
2. TAKE THE NEXT INPUTS (CAN BE MORE THAN ONE)
3. COMPUTE TF-IDF MATRIX REPRESENTING THE DOCUMENTS.
4. DETECT PLAGIARISM
- Compute cosine similarity matrix between all pairs of documents using cosine_similarity function.
- For each pair of documents (i, j) where i < j:
- Calculate cosine similarity score between document i and document j.
5. RUN CHECK WITH ALL THE INPUTS TAKEN
6. IF THE SIMILARITY SCORE IS GREATER THAN THE THRESHOLD
7. PRINTS POTENTIALLY PLAGIARIZED DOCUMENTS AND THEIR SIMILARITY SCORE
4.1 SIMULATE AND ANALYZE RESULTS:

INPUTS
"This is the first document."
"This document is the second document."
"And this is the third one."
"Is this the first document?"
PREPROCESSING DOCUMENTS:
• The preprocess_documents function takes a list of documents as input.
• It initializes a TfidfVectorizer object to convert the documents into TF-IDF
vectors.
• The fit_transform method of the vectorizer computes the TF-IDF vectors
for the given documents and returns a matrix representation.
THRESHOLDING AND ANALYSIS:
•Set a threshold for cosine similarity (e.g., 0.8). Documents with a
similarity above the threshold are flagged for further inspection.
•If the similarity score between a pair of documents is greater than
the threshold, it indicates potential plagiarism.
RESULTS
• The function then prints out the indices of the potentially
plagiarized documents along with their similarity scores.
• Remember, this is just an initial detection system. Further
human review is crucial to confirm plagiarism.
OUTPUTS
Documents 1 and 4 are potentially plagiarized with a similarity score of 1.00
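
One way to see why this pair scores exactly 1.00: a bag-of-words representation ignores word order and punctuation, so the two sentences reduce to the same word counts and therefore the same vector. The check below is a small sketch; the tokenization rule and the name `word_counts` are assumptions for illustration.

```python
import re
from collections import Counter

doc_1 = "This is the first document."
doc_4 = "Is this the first document?"


def word_counts(text):
    # Lowercase and count word tokens; order and punctuation are discarded.
    return Counter(re.findall(r"\b\w\w+\b", text.lower()))


same_bag = word_counts(doc_1) == word_counts(doc_4)
print(same_bag)  # True
```

A statement turned into a question is flagged as a perfect match, which is one reason the flagged pairs still need human review.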
GROUP 16
Thank You
END OF PRESENTATION
