This project offers advanced techniques in text preprocessing, word embeddings, and text classification. Explore methods like Word2Vec and GloVe, and master Multinomial Naive Bayes for accurate predictions. Dive into the world of text clustering and conquer challenges like unbalanced data.
Classified human- and machine-generated text using (1) a single-score threshold classifier and (2) a neural network classifier, both based on perplexities and probability scores generated from n-grams. Best results are 77% for the single-score classifier and 80% for the ANN classifier.
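A single-score threshold classifier of the kind described above can be sketched as follows. This is a minimal illustration, not the repository's code: the bigram model with add-one smoothing, the helper names (`train_bigram`, `perplexity`, `fit_threshold`), and the rule "perplexity at or below the threshold means label 1" are all assumptions made for the example.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def perplexity(text, unigrams, bigrams):
    """Bigram perplexity of `text` with add-one smoothing (assumed scheme)."""
    tokens = ["<s>"] + text.lower().split()
    vocab_size = len(unigrams) + 1  # +1 reserves mass for unseen tokens
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    n = max(len(tokens) - 1, 1)
    return math.exp(-log_prob / n)

def fit_threshold(scores, labels):
    """Pick the threshold t maximizing accuracy of the rule: score <= t -> label 1."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        correct = sum((s <= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

In use, one would compute `perplexity` for each labeled training text, pass the scores and labels to `fit_threshold`, and then classify a new text by comparing its perplexity with the returned threshold. Which class gets the lower perplexity depends on the reference corpus, so the direction of the rule should be checked on held-out data.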
FEMDA: Robust classification with Flexible Discriminant Analysis in heterogeneous data. Flexible EM-Inspired Discriminant Analysis is a robust supervised classification algorithm that performs well on noisy and contaminated datasets.
We created a topic modeling pipeline to evaluate different topic modeling algorithms, including their performance on short and long text, preprocessed and not preprocessed datasets, and with different embedding models. Finally, we summarized the results and suggested how to choose algorithms based on the task.
Project work as part of the E0-334 Deep Learning for Natural Language Processing course at IISc, Bengaluru. We proposed a graph-based model for text classification.
This repository contains code for our project work as part of the E0-334 Deep Learning for Natural Language Processing course at IISc, Bengaluru. We proposed a graph-based model for text classification.
Assignment 2 – Dimensionality reduction and text classification: converted news text into a machine-readable representation, reduced the dimensionality of that representation, and trained classifiers to decide which of 20 newsgroups a sample belongs to.
Naive Bayes classifier and Boolean retrieval, written from scratch, on the 20 Newsgroups dataset. Extremely lightweight and produces decent results. Classification using word embeddings is currently in progress.
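For readers unfamiliar with the approach, a from-scratch Multinomial Naive Bayes classifier can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the repository's implementation: the class name, whitespace tokenization, and Laplace smoothing parameter `alpha` are assumptions, and the toy documents stand in for the 20 Newsgroups data.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Minimal Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed default)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in doc.lower().split() if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy usage with made-up documents and labels.
docs = ["the engine and the brakes", "engine oil change",
        "the goalie saved the puck", "hockey puck and goalie"]
labels = ["autos", "autos", "hockey", "hockey"]
model = MultinomialNB().fit(docs, labels)
```

Working in log space avoids numeric underflow when documents are long, and the smoothing term keeps a single unseen word-class pair from driving a class probability to zero.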