Project Problem Statement - AI ML

The document outlines guidelines for completing projects using YOLOv5 for object detection in self-driving cars, ChatGPT for building AI assistants, and EchoLang for multilingual voice interaction. Each project includes a detailed description, learning outcomes, key features, and step-by-step instructions for implementation. All project submissions must be uploaded as ZIP files.


Guidelines

●​ You are required to complete at least one project. However, you are welcome to submit more than one if you wish.
●​ All project submissions must be uploaded as a ZIP file.

YOLOv5-Powered Self-Driving Cars: Enhancing Perception with Deep Learning


Project Description

Project Overview

In this hands-on project, you will build a smart object detection system using YOLOv5, one of the most advanced real-time
object detection frameworks, to power the perception module of self-driving cars. The system will be capable of detecting
pedestrians, vehicles, road signs, and obstacles in both images and live camera feeds—ensuring smarter, safer navigation
even on edge devices like Jetson Nano or Raspberry Pi.

What You’ll Learn

●​ Fundamentals of object detection with YOLOv5


●​ Running pre-trained YOLOv5 models on static images and real-time webcam/video feeds
●​ Annotating data and training YOLOv5 on a custom dataset to detect user-defined objects

Key Features

●​ Real-time object detection using YOLOv5s/m/l/x


●​ Support for custom class training (e.g., construction cones, traffic lights)
●​ Integration with OpenCV for live camera streaming

Project Outcomes
By the end of this project, you will be able to:

●​ Use YOLOv5 for object detection on both general and custom classes
●	Train your own models to detect unique objects relevant to self-driving applications

What is YOLO?

YOLO (You Only Look Once) is a family of real-time object detection models known for their speed and accuracy. Unlike
traditional methods that process images in multiple stages, YOLO analyzes the entire image in a single pass, making it ideal
for applications like autonomous driving, robotics, and surveillance.

YOLO divides the image into a grid, with each cell predicting bounding boxes, confidence scores, and class probabilities.
This end-to-end approach enables fast and precise detection of objects like cars, people, or traffic signs.

Since its debut, YOLO has evolved from YOLOv1 to YOLOv11, with each version improving performance, robustness,
and ease of use. Notably, YOLOv5, developed by Ultralytics, became popular for its PyTorch implementation and
simplicity.

In short, YOLO revolutionized computer vision by enabling machines to understand and locate objects in real time.
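
Because YOLOv5 is implemented in PyTorch, a pre-trained model can also be pulled straight from torch.hub for a quick sanity check before touching the repository scripts. The snippet below is a minimal sketch, assuming torch is installed and an internet connection is available for the first download; the image path is a placeholder, and the detect.py workflow described later remains the method used in this project.

    import torch

    # Download and cache the small pre-trained YOLOv5 variant (COCO classes).
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

    # Run a single image through the model; a local path or an image URL both work.
    results = model('path/to/test_image.jpg')

    # Print detected classes with confidences and save an annotated copy under runs/detect/.
    results.print()
    results.save()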

YOLOv5 Model Variants


YOLOv5, developed by Ultralytics, is one of the most popular YOLO versions, thanks to its ease of use and PyTorch
implementation. It comes in four model sizes, each offering a trade-off between speed, accuracy, and resource usage:
●	YOLOv5s (Small): fastest, lower accuracy; real-time use on edge devices like Raspberry Pi or mobile
●	YOLOv5m (Medium): balanced speed, good accuracy; drones, basic surveillance, general-purpose tasks
●	YOLOv5l (Large): slower, higher accuracy; industrial inspection, higher-accuracy surveillance
●	YOLOv5x (X-Large): slowest, best accuracy; research, cloud-based systems, high-accuracy requirements
Choose the variant based on your device capabilities and accuracy vs. speed needs.

Problem Statement: YOLOv5 Real-Time Object Detection

Real-time object detection is essential for time-critical applications such as autonomous driving, surveillance, and robotics,
where rapid and accurate object recognition is vital for effective decision-making.
This project aims to address the challenge of achieving efficient and accurate real-time object detection on
resource-constrained systems by leveraging the YOLOv5s model.
The goal is to demonstrate how YOLOv5s can deliver reliable detection performance while maintaining high speed and low
computational cost, making it suitable for mobile and embedded platforms.

Steps Involved: YOLOv5 Real-Time Object Detection

●​ Installing YOLOv5
o​ Prerequisites:
▪​ Install Python (>= 3.7) and pip.
▪​ Ensure git is available.​
o​ git clone https://github.com/ultralytics/yolov5.git
o​ cd yolov5
o​ pip install -r requirements.txt

●​ Running a Test Detection on a Static Image.


o​ Use a sample image to verify that YOLOv5 is working:
▪​ python3 detect.py --source data/images/bus.jpg --weights yolov5s.pt --conf 0.25
o​ What this does:
▪​ Loads a test image.
▪​ Uses the yolov5s model.
▪​ Applies object detection with a 25% confidence threshold.
▪​ Saves annotated output in runs/detect/exp/.

●​ Real-Time Object Detection Using Webcam


o​ Detect objects from a live webcam feed:
▪​ python3 detect.py --source 0 --weights yolov5s.pt --conf 0.25
o​ Notes:
▪​ --source 0: Use default webcam.
▪​ Change source to 1, 2, etc., for other connected webcams.
▪	Results display live and are saved automatically (an optional OpenCV-based sketch of the same loop follows this step).
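
The detect.py command above already handles webcam streaming end to end. Because the Key Features list also mentions OpenCV integration, here is a minimal optional sketch of what that loop could look like in plain Python; it assumes torch and opencv-python are installed and that the torch.hub model can be downloaded, and it is an illustration rather than a required part of the submission.

    import cv2
    import torch

    # Load the pre-trained YOLOv5s model through torch.hub.
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

    cap = cv2.VideoCapture(0)  # default webcam, equivalent to --source 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV delivers BGR frames; convert to RGB before inference.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = model(rgb)
        # render() draws boxes and labels onto the frame and returns the annotated images.
        annotated = results.render()[0]
        cv2.imshow('YOLOv5 webcam', cv2.cvtColor(annotated, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()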

●​ Running Detection on a Real-World Video


o​ Select a dynamic street scene (e.g., from Pexels.com).
o​ Replace the source with your video file path:
▪​ python3 detect.py --source path/to/video.mp4 --weights yolov5s.pt --conf 0.25
o​ This step simulates perception in a self-driving environment.

●	Submission Requirements
○	The project must be submitted as a ZIP file.

Problem Statement: YOLOv5 Custom Object Detection

In many real-world applications, standard object detection models fall short when it comes to identifying domain-specific or
uncommon objects not present in general-purpose datasets.
This project aims to address the challenge of detecting custom objects by training the YOLOv5s model on a user-defined
dataset.
You will explore the complete pipeline of preparing a custom dataset, training YOLOv5s, and deploying the model to
accurately detect specialized objects.

Steps Involved: Training YOLOv5 for Custom Object Detection

●​ Understand the Need for Custom Training


o​ Pre-trained YOLO models (e.g., on COCO) don’t detect custom objects like autorickshaws, helmets, or rare
plant species.
o​ Custom training allows you to detect unique objects not available in the default model's label set.
●​ Familiarize Yourself with YOLOv5 Directory Structure
o​ detect.py​ For running inference on images/videos
o​ train.py​ For training on custom datasets
o​ val.py​For validating model performance
o​ models/​ YOLO model architecture definitions
o​ data/​ Dataset config files (YAML) and sample data
o​ runs/​ Output folder for saved results, weights, and logs
●​ Prepare Your Custom Dataset
o​ Collect Images
▪	Gather at least 1,500 images per class for good performance.
o	Label the Data
▪	Use tools like LabelImg or Roboflow.
▪	Format (a hypothetical sample label line is shown after this list):
o	<class_id> <x_center> <y_center> <width> <height>
o	(all values are normalized to the image dimensions)

o​ Organize the Directory Structure


▪	/custom_dataset/
	    /images/
	        /train/
	        /val/
	    /labels/
	        /train/
	        /val/
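
For reference, a single annotation file under /labels/train/ contains one line per labeled object. A hypothetical example for an image with one object of class 0 (the first entry in your class list), roughly centred and covering about a fifth of the frame, would be:

    0 0.512 0.433 0.210 0.185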

●​ Create a Dataset Config File (YAML)


o​ Example: custom.yaml
train: /path/to/custom_dataset/images/train
val: /path/to/custom_dataset/images/val
nc: 3
names: ['car', 'bike', 'autorickshaw']

▪​ nc: Number of classes


▪​ names: Class names as a list

●​ Train the Model


o​ Run the training script:
▪	python3 train.py --img 640 --batch 16 --epochs 50 --data custom.yaml --weights yolov5s.pt --name my_custom_model
o​ Explanation:
▪​ --img: Input image resolution (e.g., 640)
▪​ --batch: Batch size (tune based on GPU memory)
▪	--epochs: Number of training epochs (full passes over the training set)
▪​ --data: Path to the YAML file
▪​ --weights: Pre-trained YOLOv5 weights to fine-tune (s/m/l/x)
▪​ --name: Name of the training run
o​ If you're on a low-resource machine, use Google Colab to leverage free GPU compute.

●​ Validate Model Performance


o​ After training, evaluate using:
▪​ python3 val.py --data custom.yaml --weights runs/train/my_custom_model/weights/best.pt
▪​ This provides precision, recall, mAP, and other metrics on your validation set.
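
As a quick reference for reading those metrics: precision = TP / (TP + FP) is the fraction of predicted boxes that are correct, recall = TP / (TP + FN) is the fraction of ground-truth objects that were found, and mAP (mean Average Precision) averages the per-class average precision, typically reported at an IoU threshold of 0.5 (mAP@0.5) and averaged over thresholds from 0.5 to 0.95 (mAP@0.5:0.95).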

●​ Run Detection with Custom Trained Weights


o​ Use the custom model to detect objects:
o​ python3 detect.py --weights runs/train/my_custom_model/weights/best.pt --source path/to/test/images
o​ You can use image folders, video files, or webcam streams as the --source.

●​ Demo: Autorickshaw Detection Example


o​ Created a dataset with labeled autorickshaws.
o​ Modified custom.yaml with class name: autorickshaw.
o​ Trained using train.py and validated results.
o​ Ran detect.py on sample traffic images to confirm detections.​

●	Submission Requirements
o	The project must be submitted as a ZIP file.
Talk to Machines: Build AI Assistants with ChatGPT

Project Description

Project Overview
The rise of powerful language models like ChatGPT has unlocked new possibilities for building intelligent, conversational
applications. These systems are transforming the way humans interact with machines—enabling fluid, context-aware
communication in natural language.
In this hands-on project, learners will build a real-time conversational AI interface using Python and the ChatGPT API. The
focus is on integrating OpenAI’s language model into a custom software solution, enabling intelligent dialogues through
prompt engineering and response handling.
What You’ll Learn
●​ How large language models like ChatGPT generate natural language responses
●​ Programmatic integration of the ChatGPT API using Python
Key Features
●​ Real-time chatbot interaction using ChatGPT
●​ OpenAI API setup and authentication
Project Outcomes
By the end of this project, learners will be able to:
●​ Explain how ChatGPT interprets and generates context-aware responses
●​ Set up and securely authenticate with the OpenAI API
●​ Build a functional, intelligent chatbot interface in Python
●​ Apply conversational AI to use cases like customer support, virtual assistants, and educational bots
Problem Statement: Build AI Assistants with ChatGPT
As conversational AI becomes increasingly integral to customer support, education, and productivity tools, there is a
growing need for developers to understand how to integrate advanced language models like ChatGPT into real-world
applications. However, building such systems requires knowledge of API communication, context handling, and prompt
engineering—skills that are often lacking in traditional programming curricula.
This project addresses the challenge of creating an intelligent, real-time chatbot by guiding learners through connecting to the ChatGPT API and constructing structured prompts, all within a Python-based implementation.
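
One concrete way to picture the structured prompts mentioned above is to wrap the user's raw question in a small instruction template before it is sent as the prompt field. The helper below is purely illustrative; its wording and name are assumptions, not something prescribed by the project.

    def build_prompt(user_question):
        # Hypothetical template: an instruction, the user's turn, and a cue for the model's turn.
        return (
            "You are a concise, friendly customer-support assistant.\n\n"
            f"User: {user_question}\n"
            "Assistant:"
        )

    # Example: this string would be placed in data["prompt"] before sending the POST request.
    print(build_prompt("How do I reset my password?"))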

Steps Involved: Build AI Assistants with ChatGPT


●​ Import Required Library
o​ import requests
o​ Use requests to communicate with the OpenAI API — your bridge to the model.
●​ Setup Your API Key
o​ api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
o​ The API key is your secure access token. Keep it secret!
●​ Configure Endpoint and Headers
o	url = "https://api.openai.com/v1/completions"
	headers = {
	    "Content-Type": "application/json",
	    "Authorization": f"Bearer {api_key}"
	}
o​ The headers tell the server you're sending JSON and include your access credentials.
●​ Create the Data Payload
o	data = {
	    "model": "gpt-3.5-turbo-instruct",
	    "prompt": "Who invented ChatGPT?",
	    "temperature": 0,
	    "max_tokens": 100
	}
o​ You specify the model, the question, how creative the response should be, and how long it can be.
●​ Send the POST Request
o​ response = requests.post(url, headers=headers, json=data)
o​ The request is sent — now wait for a response.
●​ Process the Response
o	if response.status_code == 200:
	    print("Response:", response.json()["choices"][0]["text"].strip())
	else:
	    print("Error:", response.status_code, response.text)
o​ If successful, you see the model’s answer. If not, you get a helpful error.
●​ Add Continuous Input Loop (Interactive Bot)
o	while True:
	    user_input = input("You: ")
	    if user_input.lower() in ['exit', 'quit']:
	        break
	    data["prompt"] = user_input
	    response = requests.post(url, headers=headers, json=data)
	    if response.status_code == 200:
	        print("Bot:", response.json()["choices"][0]["text"].strip())
	    else:
	        print("Error:", response.status_code, response.text)
●	Submission Requirements
○	The project must be submitted as a ZIP file.

EchoLang — Turning Text into Speech Across Languages

Project Description

Project Overview
Multilingual voice interaction is becoming increasingly essential for applications in accessibility, virtual assistants,
education, and global communication. In this project, EchoLang, learners will build a real-time multilingual voice system
using Python. The system will take English text as input, translate it into a target language (e.g., Hindi, French, or German)
using Hugging Face's Transformers library, and then generate natural-sounding speech output using gTTS (Google
Text-to-Speech).
This project highlights the power of chaining AI models to solve complex tasks—combining machine translation with
neural text-to-speech for seamless, cross-language audio generation. Whether you're building tools for visually impaired
users or smart assistants, this project offers a hands-on introduction to creating intelligent, speech-driven systems.
What You’ll Learn
●​ How to use Hugging Face Transformers for real-time language translation.
●​ How to convert text to audio using gTTS, a lightweight cloud-based TTS library.
●​ How to chain multiple AI models to create a functional pipeline.
●​ How to build multilingual applications for accessibility and voice-based interfaces.
Key Features
●​ Language Translation: Translate English input into multiple languages like Hindi, French, and German.
●​ Text-to-Speech Conversion: Generate lifelike speech from translated text using Google's neural TTS engine.
●​ Real-Time Integration: Create a fully integrated system that processes and speaks out translated text instantly.
●​ AI Model Chaining: Demonstrate how to chain NLP and speech models into a seamless application pipeline.
●​ Simple & Lightweight: No need for GPU—runs smoothly on basic hardware using cloud APIs and pre-trained
models.
Project Outcomes
By the end of this project, learners will:
●​ Build a real-time multilingual speech generator.
●​ Understand the fundamentals of machine translation using transformer models.
●​ Integrate gTTS into applications to produce human-like audio from text.
●​ Gain experience chaining AI models for practical tasks.
●​ Be prepared to develop intelligent voice-based interfaces for assistive tech, chatbots, or global applications.

Problem Statement: EchoLang — Turning Text into Speech Across Languages


Design a Python-based tool that takes English text input, translates it into a selected language (Hindi, Spanish, French, or
German), and then speaks the translated text using text-to-speech. The goal is to help users understand English text in their
own language both visually and audibly.
Steps Involved: EchoLang — Turning Text into Speech Across Languages
●​ Install Required Libraries
o​ Before you start, you need to install the following libraries:
o​ gTTS: Google Text-to-Speech library for converting text to audio.
o​ transformers, torch, sentencepiece: Used for text translation via Hugging Face.
o​ mpg123: A lightweight utility for playing MP3 files from the command line (Linux).
o​ pip install gTTS transformers torch sentencepiece
o​ sudo apt install mpg123
●​ Import Required Libraries
o​ from transformers import pipeline
from gtts import gTTS
import os
●​ Define Supported Languages
o	supported_langs = {
	    "hi": "Hindi",
	    "es": "Spanish",
	    "fr": "French",
	    "de": "German"
	}
●​ Show Available Language Options
o	for code, name in supported_langs.items():
	    print(f"{code} → {name}")

●​ Prompt the user to select one of the supported target language codes.
o​ translation_lang_code = input("Enter target language code (hi/es/fr/de): ").strip().lower()

●​ Load the Translation Pipeline


o​ model_name = f"Helsinki-NLP/opus-mt-en-{translation_lang_code}"
translator = pipeline(f"translation_en_to_{translation_lang_code}", model=model_name)
●​ Translate Input Text
o	text_en = input("Enter the English text to translate: ").strip()
	result = translator(text_en)
	translated_text = result[0]['translation_text']
●​ Convert Translated Text to Speech
o​ tts = gTTS(text=translated_text, lang=translation_lang_code)
tts.save("spoken.mp3")
●​ Play the Audio
o​ os.system("mpg123 spoken.mp3")
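
For convenience, the steps above can be assembled into one short script. This is a minimal sketch under the same assumptions as the steps (the Helsinki-NLP/opus-mt-en-* models can be downloaded and mpg123 is installed for playback); it is illustrative rather than a required structure.

    import os
    from transformers import pipeline
    from gtts import gTTS

    supported_langs = {"hi": "Hindi", "es": "Spanish", "fr": "French", "de": "German"}

    lang = input("Enter target language code (hi/es/fr/de): ").strip().lower()
    if lang not in supported_langs:
        raise SystemExit(f"Unsupported language code: {lang}")

    text_en = input("Enter the English text to translate: ").strip()

    # Chain model 1 (translation) into model 2 (text-to-speech).
    translator = pipeline(f"translation_en_to_{lang}", model=f"Helsinki-NLP/opus-mt-en-{lang}")
    translated_text = translator(text_en)[0]["translation_text"]
    print(f"{supported_langs[lang]}: {translated_text}")

    tts = gTTS(text=translated_text, lang=lang)
    tts.save("spoken.mp3")
    os.system("mpg123 spoken.mp3")  # any MP3 player works if mpg123 is not available
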
●	Submission Requirements
o	The project must be submitted as a ZIP file.
