MST Mini Project Statements

The document outlines ten machine learning project statements for various companies, including tasks such as predicting house prices, classifying student performance, segmenting customers, and predicting protein solubility. Each project requires data pre-processing, model development, and performance evaluation, with specific datasets and objectives provided. Additionally, guidelines for dataset sourcing, feature selection, and evaluation parameters are included, along with a rubric for assessment.

Uploaded by

J C

MST Project Statements (UCS321)

1. Develop a machine learning-based regression model for Larsen & Toubro Realty to predict house
prices using features such as location, size, number of rooms, property age, and amenities. The
workflow must include data pre-processing, exploratory data analysis, model building, performance
evaluation, and optimization for improved accuracy. The final model should provide accurate price
predictions and highlight the most influential factors affecting property value, aiding better decision-
making for buyers, sellers, and investors.

2. Pearson VUE, a global leader in computer-based testing, seeks to classify students into categories
such as High Performer, Average Performer, and Needs Improvement. Using a dataset containing
students' Mid-Semester Test (MST) scores, Quiz results, Attendance records, and Assignment
performance, you are required to develop a Python-based machine learning classification model. The
project should include:

• Pre-processing the dataset (handling missing values, normalization)
• Training and evaluating suitable classification algorithms
• Reporting model performance
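The pre-processing step named in Project 2 (handling missing values, normalization) could be sketched as follows. Mean imputation and min-max scaling are only one possible choice, and the scores are hypothetical.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:  # constant column: avoid dividing by zero
        return [0.0 for _ in column]
    return [(v - low) / (high - low) for v in column]


# Hypothetical MST scores (out of 30) with one missing entry.
mst_scores = [12, None, 24, 18, 30]
filled = impute_mean(mst_scores)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```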

3. Big Bazaar aims to segment its customers into distinct groups to optimize promotional strategies
and improve sales. Using customer purchase history (annual income, spending score, visit
frequency, etc.), develop a Python-based clustering model to identify customer segments. The project
should include data cleaning and pre-processing, performance parameters, etc.
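One way to approach the clustering in Project 3 is K-Means. The sketch below implements Lloyd's algorithm with the standard library only; the customer values are hypothetical, the choice of two clusters is an assumption, and in practice the features would be scaled first so that no single column dominates the distance.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centroids, labels


# Hypothetical customers: (annual income in thousands, spending score).
customers = [(15, 80), (16, 75), (18, 85), (80, 20), (85, 15), (90, 25)]
centroids, labels = k_means(customers, k=2)
print(centroids)
print(labels)
```

Running the function for several values of `k` and comparing the within-cluster distances is a common way to justify the number of segments.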

4. Pfizer Inc. is focusing on enhancing the success rate of therapeutic protein development. One key
challenge in biotechnology is predicting whether a protein will be soluble or insoluble during
expression in E. coli. Using a dataset containing amino acid composition, molecular weight,
isoelectric point (pI), hydrophobicity index, and other physicochemical properties, develop
a supervised machine learning classification model to categorize proteins as Soluble or Insoluble.

5. HDFC Bank Ltd. aims to improve its loan processing efficiency by predicting whether a loan
application should be approved or rejected based on applicant details. Using a dataset containing
information such as applicant income, loan amount, credit history, employment status, property area,
and other financial indicators, develop a supervised machine learning classification model to
categorize applications as Approved or Rejected.
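For two-class problems such as Projects 4 and 5, performance is often reported with accuracy, precision, recall, and F1. The following is a small sketch of those calculations; the label lists are hypothetical stand-ins for a held-out test set and a model's predictions.

```python
def binary_metrics(actual, predicted, positive="Approved"):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical held-out labels and model outputs.
actual = ["Approved", "Approved", "Rejected", "Approved", "Rejected", "Rejected"]
predicted = ["Approved", "Rejected", "Rejected", "Approved", "Approved", "Rejected"]
scores = binary_metrics(actual, predicted)
print(scores)
```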

6. General Electric (GE) Power Systems aims to improve the operational reliability of its electric
motors by predicting winding temperatures under diverse working conditions. Using a dataset
containing parameters such as ambient temperature, motor speed, load torque, supply voltage, and
current, students will develop a supervised machine learning regression model to estimate motor
temperature.
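The inputs listed for Project 6 are measured in very different units, so many regression models benefit from standardizing them first. A minimal sketch, assuming z-score scaling and invented readings, with the scaling statistics taken from the training rows only so that no information leaks from the test rows:

```python
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return means, stds


def standardize(rows, means, stds):
    """Apply z = (x - mean) / std column by column."""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical readings: ambient temperature (deg C), speed (rpm), torque (N m).
train = [[20.0, 1000.0, 5.0], [25.0, 1500.0, 10.0], [30.0, 2000.0, 15.0]]
test = [[27.0, 1800.0, 12.0]]

means, stds = fit_standardizer(train)     # statistics come from training rows only
train_z = standardize(train, means, stds)
test_z = standardize(test, means, stds)   # reuse them unchanged on the test rows
print(train_z)
print(test_z)
```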

7. Siemens Energy is exploring efficient material selection for manufacturing components in turbines,
engines, and heavy machinery. Given a dataset containing mechanical properties such as tensile
strength, yield strength, hardness, density, thermal conductivity, and elasticity, students will
apply unsupervised learning techniques (e.g., K-Means clustering) to group materials with similar
characteristics. The aim is to help engineers quickly identify suitable materials for specific applications
based on property clusters, reducing selection time and improving performance. The project should
include:

• Data cleaning and handling of missing values
• Visualizing clusters using PCA or t-SNE

8. BASF SE, a global leader in chemical manufacturing, is seeking to enhance environmental
monitoring capabilities by predicting the Air Quality Index (AQI) in industrial and urban areas. Using
historical air quality datasets containing parameters such as PM2.5, PM10, NO₂, SO₂, CO, O₃,
temperature, and humidity, students will develop a supervised regression model to predict future AQI
values. The goal is to assist environmental engineers and regulatory bodies in taking proactive
measures to control emissions and safeguard public health.
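Because Project 8 asks for future AQI values, one possible framing is to predict each reading from the readings just before it. The sketch below builds such lagged features, keeps the train/test split in time order, and scores a simple persistence baseline that any trained model should beat. The AQI series, the lag count, and the 70/30 split are all assumptions for illustration.

```python
def make_lagged(series, n_lags):
    """Pair each value with the n_lags values that immediately precede it."""
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])
        targets.append(series[t])
    return features, targets


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily AQI readings, oldest first.
aqi = [110, 118, 125, 121, 130, 142, 138, 135, 129, 133]

X, y = make_lagged(aqi, n_lags=3)
split = int(len(X) * 0.7)                 # time-ordered split: no shuffling
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# Persistence baseline: the next AQI is predicted to equal the latest reading.
baseline = [window[-1] for window in X_test]
baseline_mae = mean_absolute_error(y_test, baseline)
print("baseline MAE:", baseline_mae)
```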

9. Customer churn is a critical challenge in the telecom industry, where customers discontinue their
services and move to competitors. Reducing churn can significantly improve revenue and customer
loyalty. Build a classification-based machine learning model to predict whether a customer is likely to
churn based on their demographic details, usage patterns, billing information, and service feedback.
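Logistic regression is one common starting point for a churn classifier like the one in Project 9. The following sketch trains it with plain batch gradient descent; the two features are hypothetical and assumed to be already standardized, and the learning rate and epoch count are illustrative rather than tuned.

```python
import math


def predict_proba(row, weights, bias):
    """Sigmoid of the linear score: estimated probability of churn."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, learning_rate=0.5, epochs=500):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(rows, labels):
            error = predict_proba(row, weights, bias) - label
            for j in range(d):
                grad_w[j] += error * row[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical standardized features: (monthly charges, support calls); 1 = churned.
rows = [[-1.0, -0.8], [-0.9, -1.1], [-1.1, -1.0], [1.0, 0.9], [0.9, 1.1], [1.1, 0.9]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(rows, labels)
predicted = [1 if predict_proba(r, weights, bias) >= 0.5 else 0 for r in rows]
accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
print("training accuracy:", accuracy)
```

Training accuracy on six cleanly separated rows says little about real data; a held-out split and metrics such as recall on the churn class are more informative.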

10. Analyze energy consumption data from Siemens Energy using PCA for dimensionality reduction
and K-Means clustering to segment consumers into distinct usage patterns. The goal is to identify key
factors affecting consumption and propose targeted optimization strategies.
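The PCA step in Project 10 (also an option for the cluster visualization in Project 7) can be sketched without external libraries by running power iteration on the covariance matrix. The consumer values are hypothetical and assumed to be already standardized; a library implementation would normally be used in practice, and the clustering and plotting steps are left out here.

```python
import math
import random
from statistics import mean


def covariance_matrix(centred):
    """Sample covariance matrix of rows that are already mean-centred."""
    n, d = len(centred), len(centred[0])
    return [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def top_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration on a symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in matrix]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    means = [mean(col) for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = covariance_matrix(centred)
    lam1, pc1 = top_eigenpair(cov)
    # Deflation: subtract pc1's contribution so the next run finds pc2.
    d = len(cov)
    deflated = [
        [cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)] for i in range(d)
    ]
    _, pc2 = top_eigenpair(deflated)
    return [
        (sum(x * a for x, a in zip(row, pc1)), sum(x * a for x, a in zip(row, pc2)))
        for row in centred
    ]


# Hypothetical, already standardized consumption features for five consumers.
consumers = [
    [-1.2, -1.0, 0.3],
    [-0.6, -0.7, -0.4],
    [0.0, 0.1, 0.5],
    [0.7, 0.6, -0.6],
    [1.1, 1.0, 0.2],
]
coords = pca_2d(consumers)
for pc1_score, pc2_score in coords:
    print(f"{pc1_score:+.3f} {pc2_score:+.3f}")
```

The two score columns can be passed to a clustering routine or plotted directly as a 2-D scatter.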

Instructions:

• Dataset can be self-generated or obtained from standard platforms (e.g., Kaggle, GitHub).
• Number of features can be selected as per project scope and relevance.
• Performance evaluation parameters can be chosen freely.
• Flow diagram must be given with Pre-processing and visualization steps clearly documented.
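Where a dataset is self-generated, a fixed random seed keeps it reproducible for the report and the viva. The sketch below builds synthetic student records in the spirit of Project 2 and writes them as CSV; the column names, value ranges, and labelling thresholds are all hypothetical.

```python
import csv
import io
import random


def generate_students(n_rows, seed=42):
    """Create n_rows of synthetic student records with a rule-based label."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    rows = []
    for student_id in range(1, n_rows + 1):
        mst = rng.randint(0, 30)
        quiz = rng.randint(0, 10)
        attendance = rng.randint(50, 100)
        total = mst + quiz  # hypothetical scoring rule out of 40
        if total >= 30:
            label = "High Performer"
        elif total >= 18:
            label = "Average Performer"
        else:
            label = "Needs Improvement"
        rows.append([student_id, mst, quiz, attendance, label])
    return rows


header = ["student_id", "mst", "quiz", "attendance", "label"]
rows = generate_students(100)

buffer = io.StringIO()  # swap for open("students.csv", "w", newline="") to save a file
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue().splitlines()[:3])
```

A label derived from a fixed rule is trivially learnable, so some noise or an independent labelling process would make the modelling task more realistic.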

A. Rubrics (40 marks) (Group of 4-5 students)

• Problem Understanding & Objective Clarity – 5 marks
• Data Collection & Pre-processing – 10 marks
• Model Development & Implementation – 12 marks
• Performance Evaluation and Interpretation of Results – 8 marks
• Innovation / Creativity in Approach – 5 marks

B. Presentation/Viva (20 marks)

A + B = 60 marks
