Specialization: Natural Language Processing, Prompt Engineering, Natural Language Understanding, Human-Centered Artificial Intelligence Design, Ethics in NLP
Preferred Programming Languages: Python 3, Java, C
Domains of Active Exploration: Computational Linguistics, Educational Technology, Computational Social Science
Some key projects from the past include:
Topic Classification for Garage Reviews - An ML Task [^1] A project for automatically tagging customer reviews, combining unsupervised topic modelling (LDA, GSDMM, CTM) with supervised multi-class, multi-label classification (Naive Bayes, Logistic Regression, SVM).
Spotify Similarity-Based Recommender System [^2] A project for a machine learning and data mining course that explores the Spotify dataset, compares clustering methods, and identifies the best similarity measure for recommending new songs.
Grouping and Recommending Literature Based on Content [^3] An NLP project that uses LDA to group books into themes based on their topics and to recommend similar books to the user. The plot of each book in the test data is obtained by web scraping Wikipedia.
RESEARCH: Efficient Autocomplete Algorithms [^4] An exploratory NLP project comparing autocomplete approaches, including FastText embeddings and non-linear data structures, with the aim of promoting sustainable, energy-efficient machine learning in code without compromising result quality.
RESEARCH: Automatic Evaluation of Short-Answer Responses using Clustering and Summarization [^5] Snippets of Jupyter notebooks exploring clustering and summarization methods on student exam responses from introductory programmatic reasoning classes.
RESEARCH: Psycholinguistics Experiment as Part of LLM Idiom Interpretation Study [^6] Complete project repository containing the experiment setup, data, and analysis scripts for a psycholinguistic study on how human participants interpret unfamiliar idioms.
Pipeline: A preprocessing pipeline for general-purpose tokenization, stopword removal, POS-tagging, and n-gram generation [^6] A collection of functions that can be conveniently accessed through a single pipeline method. Work in progress: converting it into an encapsulated version.
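As a rough illustration of the pipeline-method idea, here is a minimal stdlib-only sketch. The function names (`tokenize`, `remove_stopwords`, `ngrams`, `pipeline`) and the tiny stopword list are hypothetical and are not taken from the actual repository. POS-tagging is left as an optional, pluggable step, since it needs a trained tagger.

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = frozenset({"a", "an", "and", "the", "is", "of", "to", "in"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens: Iterable[str], stopwords: frozenset = STOPWORDS) -> list[str]:
    """Drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams as tuples (empty if there are fewer than n tokens)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pipeline(
    text: str,
    n: int = 2,
    tagger: Optional[Callable[[list[str]], list[tuple[str, str]]]] = None,
) -> dict:
    """Chain tokenization, stopword removal and n-gram generation.

    POS-tagging runs only if a tagger callable (tokens -> [(token, tag), ...]) is supplied.
    """
    tokens = remove_stopwords(tokenize(text))
    result = {"tokens": tokens, "ngrams": ngrams(tokens, n)}
    if tagger is not None:
        result["pos"] = tagger(tokens)
    return result


if __name__ == "__main__":
    print(pipeline("The garage fixed the brakes and the clutch quickly."))
```

Passing a callable such as NLTK's `nltk.pos_tag` as `tagger` would add POS tags, assuming NLTK and its tagger model are installed. Keeping each step as a separate function makes the steps usable on their own, while `pipeline` offers the single convenient entry point.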