Practical Work
3.0 IMPLEMENT AN ALGORITHM:
Once we had defined our problem and understood the necessary linear algebra
concepts, we implemented an algorithm. This involved writing a program that
uses SVD
          Colombia   Australia
Data_01   36%        15%
Data_02   0%         0%
Now, you might be wondering why this is such a big deal. Well, let me tell you. In
computer science, our ideas and creations are like building blocks. Each new discovery,
innovation, or program builds upon what came before it. But when someone plagiarizes,
they're not only being dishonest, but they're also hindering progress and undermining
the hard work of others.
So, as members of the computer science society, it's up to us to recognize this problem
and take action to prevent it. We need to promote integrity, honesty, and originality in
everything we do. And that's why today, we're going to dive deeper into the issue of
plagiarism in computer science, explore its consequences, and discuss how we can work
together to combat it.
Are you ready to tackle this challenge with me? Let's get started! 💻🔍
2.0 UNDERSTAND NUMERICAL LINEAR ALGEBRA CONCEPTS:
Preprocessing:
• Tokenize the text documents into words or phrases.
• Convert the documents into numerical representations, such as
TF-IDF vectors or word embeddings.
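The two preprocessing steps above can be sketched in plain Python. This is only an illustration, not the program used in this project: the helper names tokenize and tf_idf_vectors are hypothetical, and the weighting (raw term count times log(N / df)) is one common TF-IDF variant among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_vectors(documents):
    """Return (vocabulary, vectors), with one TF-IDF vector per document.

    Assumption: TF is the raw term count and IDF is log(N / df).
    Libraries such as scikit-learn use a smoothed, normalized variant.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(documents)
    # Document frequency: how many documents contain each word.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            [counts[word] * math.log(n_docs / doc_freq[word]) for word in vocabulary]
        )
    return vocabulary, vectors
```

With this variant, a word that appears in every document gets weight 0, so only the words that distinguish documents contribute to the vectors.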
Limitations:
detect_plagiarism(documents)
3.1
4.0 SIMULATE AND ANALYZE RESULTS:
DETECT PLAGIARISM
INPUTS
"This is the first document."
"This document is the second document."
"And this is the third one."
"Is this the first document?"
PREPROCESSING DOCUMENTS:
• The preprocess_documents function takes a list of documents as input.
• It initializes a TfidfVectorizer object to convert the documents into
TF-IDF vectors.
• The fit_transform method of the vectorizer computes the TF-IDF vectors
for the given documents and returns a matrix representation.
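A minimal sketch of the function described above, assuming scikit-learn is installed and that TfidfVectorizer is used with its default settings (the slides do not say which options were chosen):

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def preprocess_documents(documents):
    """Convert a list of text documents into a TF-IDF matrix."""
    vectorizer = TfidfVectorizer()  # default options are an assumption
    # fit_transform learns the vocabulary and IDF weights, then returns
    # a sparse matrix with one row per document and one column per term.
    tfidf_matrix = vectorizer.fit_transform(documents)
    return tfidf_matrix


documents = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]
tfidf_matrix = preprocess_documents(documents)
print(tfidf_matrix.shape)  # (4, 9): 4 documents, 9 distinct terms
```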
THRESHOLDING AND ANALYSIS:
• Set a threshold for cosine similarity (e.g., 0.8). Document pairs with a
similarity above the threshold are flagged for further inspection.
• A similarity score above the threshold indicates potential plagiarism,
not proof of it.
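The thresholding step can be sketched as follows. The function names cosine_similarity and flag_similar_pairs are hypothetical helpers written for this illustration, and the default threshold of 0.8 is simply the example value mentioned above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def flag_similar_pairs(vectors, threshold=0.8):
    """Return (i, j, score) for every pair whose similarity exceeds threshold."""
    flagged = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine_similarity(vectors[i], vectors[j])
            if score > threshold:
                flagged.append((i, j, score))
    return flagged
```

Comparing only pairs with i < j avoids checking a document against itself and avoids reporting each pair twice.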
RESULTS
• The detect_plagiarism function then prints out the indices of the
potentially plagiarized documents along with their similarity scores.
• Remember, this is just an initial detection system. Further
human review is crucial to confirm plagiarism.
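Putting the steps together, detect_plagiarism could look roughly like the sketch below. It assumes scikit-learn with default TfidfVectorizer settings, a threshold of 0.8, and 1-based document numbers in the printed message; none of these details are confirmed by the slides beyond the function name and the example threshold.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def detect_plagiarism(documents, threshold=0.8):
    """Print and return document pairs whose cosine similarity exceeds threshold."""
    tfidf_matrix = TfidfVectorizer().fit_transform(documents)
    # Dense (n_docs, n_docs) array of pairwise cosine similarities.
    similarity = cosine_similarity(tfidf_matrix)
    flagged = []
    for i in range(len(documents)):
        for j in range(i + 1, len(documents)):
            score = float(similarity[i, j])
            if score > threshold:
                # 1-based document numbers are an assumption for readability.
                flagged.append((i + 1, j + 1, score))
                print(
                    f"Documents {i + 1} and {j + 1} are potentially plagiarized "
                    f"with a similarity score of {score:.2f}"
                )
    return flagged


documents = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]
flagged = detect_plagiarism(documents)
```

Documents 1 and 4 contain exactly the same words in a different order, and TF-IDF ignores word order, so their vectors are identical. This also shows why human review matters: a statement and a question receive the maximum score.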
OUTPUTS
Documents 1 and 4 are potentially plagiarized with
a similarity score of 1.00
GROUP 16
Thank You
END OF PRESENTATION