🚀 An AI Powered Vision and Navigation Assistance System
A modular, real-time deep learning system that evolves from basic scene understanding to depth-aware intelligent navigation. It integrates object detection, depth estimation, vision-language models, and audio-visual feedback to help users perceive, interpret, and interact with their environment.
📄 1. Abstract / Invention Overview
This invention presents a multi-phase, AI-driven system designed for comprehensive visual environment understanding and intelligent navigation assistance. It combines real-time object detection, scene summarization, depth estimation, and navigation planning to deliver both descriptive and actionable feedback.
Key Capabilities:
- Detect and describe what is seen in an image or video
- Answer questions about a visual scene
- Estimate spatial layout from 2D input
- Plan a navigable path and guide users via instructions
🎯 2. Problem Statement & Background
Existing visual assistive tools offer limited, disconnected features. Most fail to:
- Integrate spatial understanding with scene interpretation
- Offer navigation guidance using real-time environmental data
❗ This project addresses these gaps by:
- Combining visual recognition with spatial modeling
- Supporting both passive understanding (captions, Q&A) and active assistance (pathfinding, guidance)
- Building a modular and extensible system for a variety of applications
🔧 3. System Overview
The system is built in two evolutionary phases, each enhancing capabilities:
✅ Phase 1 (Earlier Phase) – Foundational Visual Intelligence
| Feature | Description |
| --- | --- |
| 🟨 Object Detection | YOLOv8n identifies objects with bounding boxes and confidence scores |
| 🖼️ Scene Description | Salesforce BLIP generates captions for images |
| 🎞️ Video Summarization | BLIP processes keyframes and generates a coherent summary |
| ❓ Visual Question Answering | BLIP VQA answers natural language questions about a visual input |
| 🖱️ Interactive GUI | Tkinter-based GUI for loading/capturing media and showing outputs |
✅ Phase 2 (Current Phase) – Advanced Spatial Awareness & Navigation
| Feature | Description |
| --- | --- |
| 🔍 Upgraded Object Detection | YOLOv8l for higher accuracy and better feature extraction |
| 🌊 Depth Estimation | MiDaS DPT_Large estimates distance to scene elements from a single image |
| 🗺️ 2D Environment Grid | Converts camera + depth input into a grid of free space and obstacles |
| 🧭 Pathfinding | A* algorithm computes the shortest path to the chosen object (see the sketch below) |
| 🔊 Audio Feedback | Uses pyttsx3 for offline TTS instructions |
| 🖼️ SmolVLM Scene Summary | SmolVLM2 generates high-quality image/video descriptions and instructions |
| 📟 Navigation GUI | Displays the real-time feed, depth map, grid with planned path, and feedback |
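The grid construction and pathfinding steps can be sketched as follows. This is a minimal illustration, not the project's exact implementation: the grid size, the depth threshold, and the `build_grid` / `astar` helper names are assumptions made for this example. MiDaS predicts relative inverse depth (larger values mean closer surfaces), which is why cells with large mean values are treated as nearby obstacles here.

```python
import heapq
import numpy as np

def build_grid(depth_map, grid_size=(40, 40), near_threshold=0.6):
    """Downsample a MiDaS inverse-depth map into an occupancy grid.
    Cells whose normalized mean exceeds `near_threshold` are marked
    as obstacles (1); everything else is free space (0)."""
    norm = (depth_map - depth_map.min()) / (depth_map.max() - depth_map.min() + 1e-8)
    h, w = norm.shape
    gh, gw = grid_size
    grid = np.zeros(grid_size, dtype=np.uint8)
    for gy in range(gh):
        for gx in range(gw):
            cell = norm[gy * h // gh:(gy + 1) * h // gh,
                        gx * w // gw:(gx + 1) * w // gw]
            if cell.size and cell.mean() > near_threshold:
                grid[gy, gx] = 1  # nearby obstacle
    return grid

def astar(grid, start, goal):
    """A* search on a 4-connected grid with a Manhattan-distance heuristic."""
    def heuristic(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    open_set = [(heuristic(start, goal), 0, start, None)]
    came_from, g_score = {}, {start: 0}
    while open_set:
        _, g, current, parent = heapq.heappop(open_set)
        if current in came_from:
            continue  # already expanded with a better cost
        came_from[current] = parent
        if current == goal:
            # Reconstruct the path by walking back through parents
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = current[0] + dy, current[1] + dx
            if 0 <= ny < grid.shape[0] and 0 <= nx < grid.shape[1] and grid[ny, nx] == 0:
                tentative = g + 1
                if tentative < g_score.get((ny, nx), float("inf")):
                    g_score[(ny, nx)] = tentative
                    heapq.heappush(open_set, (tentative + heuristic((ny, nx), goal),
                                              tentative, (ny, nx), current))
    return None  # no path found
```

A path returned by `astar` can then be converted into step-by-step directions and passed to the audio feedback module.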
📂 4. Codebase Breakdown
```
.
├── smolvlm_module.py   # Scene/video captioning, navigation via SmolVLM
├── yolo_module.py      # YOLOv8 object detection with bbox + confidence
├── midas_module.py     # Monocular depth estimation (MiDaS)
└── qna_module.py       # Context-aware QnA with Sentence Transformers
```
🔍 yolo_module.py
- Uses `yolov8l.pt` from Ultralytics
- Outputs object label, bounding box, and confidence

```python
detector = YOLODetector()
detections = detector.detect("img.jpg")
```
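For reference, a minimal `YOLODetector` wrapper might look like the sketch below, built on the standard Ultralytics API. The dictionary layout of each detection is an assumption for illustration; the actual internals of `yolo_module.py` may differ.

```python
from ultralytics import YOLO

class YOLODetector:
    def __init__(self, weights="yolov8l.pt"):
        # Load pretrained YOLOv8-large weights (downloaded on first use)
        self.model = YOLO(weights)

    def detect(self, image_path, conf=0.25):
        results = self.model(image_path, conf=conf)
        detections = []
        for result in results:
            for box in result.boxes:
                detections.append({
                    "label": self.model.names[int(box.cls)],
                    "bbox": box.xyxy[0].tolist(),   # [x1, y1, x2, y2]
                    "confidence": float(box.conf),
                })
        return detections
```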
🧠 smolvlm_module.py
- Describes images/videos using SmolVLM2
- Plans natural language navigation using labeled detections and positions

```python
describe_image("img.jpg")
describe_video("video.mp4")
plan_navigation(detections, target_index=0)
```
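As an illustration of how detections can be turned into a spoken instruction, the toy heuristic below maps the target's bounding-box centre to a left/centre/right direction. The function is an assumption made for this example; `plan_navigation` in `smolvlm_module.py` may combine this kind of geometry with SmolVLM2-generated text.

```python
def direction_for_target(detections, target_index, frame_width):
    """Toy heuristic: derive a coarse direction from the target's bounding box."""
    target = detections[target_index]
    x1, y1, x2, y2 = target["bbox"]
    center_x = (x1 + x2) / 2
    if center_x < frame_width / 3:
        side = "to your left"
    elif center_x > 2 * frame_width / 3:
        side = "to your right"
    else:
        side = "straight ahead"
    return f"The {target['label']} is {side}."
```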
🌊 midas_module.py
- Estimates depth using MiDaS `DPT_Large`
- Returns a numeric depth map and a grayscale visualization

```python
depth_model = MiDaSDepth()
depth_map, depth_img = depth_model.estimate_depth(pil_image)
```
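A minimal sketch of how `MiDaSDepth` might wrap the model, following the standard torch.hub loading pattern published by Intel ISL; the actual class may cache the model or post-process the output differently.

```python
import numpy as np
import torch

class MiDaSDepth:
    def __init__(self, model_type="DPT_Large"):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = torch.hub.load("intel-isl/MiDaS", model_type).to(self.device).eval()
        transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
        self.transform = transforms.dpt_transform  # transform matching DPT models

    def estimate_depth(self, pil_image):
        img = np.array(pil_image.convert("RGB"))
        batch = self.transform(img).to(self.device)
        with torch.no_grad():
            prediction = self.model(batch)
            prediction = torch.nn.functional.interpolate(
                prediction.unsqueeze(1), size=img.shape[:2],
                mode="bicubic", align_corners=False).squeeze()
        depth_map = prediction.cpu().numpy()
        # Normalize to 0-255 for a grayscale visualization
        depth_img = ((depth_map - depth_map.min()) /
                     (depth_map.max() - depth_map.min() + 1e-8) * 255).astype(np.uint8)
        return depth_map, depth_img
```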
❓ qna_module.py
- Uses SentenceTransformer (MiniLM) embeddings for contextual QnA
- Accepts any scene description and answers relevant questions about it

```python
qna = QnASystem()
qna.update_context("A man is standing near a car...")
qna.answer_question("What is the man doing?")
```
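A rough sketch of how `QnASystem` could be built on top of sentence-transformers: embed the context sentences once, then return the sentence most similar to the question. The class layout is assumed for illustration.

```python
from sentence_transformers import SentenceTransformer, util

class QnASystem:
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.sentences, self.embeddings = [], None

    def update_context(self, description):
        # Split the scene description into sentences and embed them once
        self.sentences = [s.strip() for s in description.split(".") if s.strip()]
        self.embeddings = self.model.encode(self.sentences, convert_to_tensor=True)

    def answer_question(self, question):
        if not self.sentences:
            return "No context available."
        q_emb = self.model.encode(question, convert_to_tensor=True)
        scores = util.cos_sim(q_emb, self.embeddings)[0]
        return self.sentences[int(scores.argmax())]
```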
🛠️ 5. Technologies & Methodologies
| Category | Tools / Models |
| --- | --- |
| 👁️ Object Detection | YOLOv8n (Phase 1), YOLOv8l (Phase 2) |
| 🖼️ Vision Language Models | Salesforce BLIP (captioning, VQA), SmolVLM2 (image/video + navigation) |
| 🌊 Depth Estimation | MiDaS DPT_Large (from Intel ISL) |
| 🔎 Pathfinding | A* search algorithm |
| 🛠️ Programming Stack | Python, PyTorch, Hugging Face, OpenCV, Tkinter, PIL, NumPy |
| 🔊 TTS | pyttsx3 for offline audio output |
| ❓ QnA Embeddings | SentenceTransformers (all-MiniLM-L6-v2) |
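For the audio feedback listed above, pyttsx3's offline engine can be used roughly as follows; the voice-rate value is only an example.

```python
import pyttsx3

engine = pyttsx3.init()          # uses the platform's offline TTS backend
engine.setProperty("rate", 160)  # speaking speed in words per minute (example value)
engine.say("Obstacle ahead. Move two steps to your left.")
engine.runAndWait()              # block until speech finishes
```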
💡 6. Potential Applications
- 🧑‍🦯 Assistive navigation for visually impaired individuals
- 🚁 Autonomous drones with spatial awareness
- 🧠 Vision-language research in robotics
- 🧭 Smart surveillance with scene understanding
- 🎓 Educational tools using QnA on visual data
- 📱 Augmented reality systems for enhanced perception
🏁 Key Achievements
- ✔️ Full-stack AI integration: detection + VLMs + navigation
- ✔️ Real-time depth-aware navigation with pathfinding
- ✔️ GUI-based interface for interaction and control
- ✔️ Modular, expandable architecture for future upgrades
📥 Installation
```bash
pip install torch torchvision torchaudio
pip install ultralytics
pip install transformers
pip install sentence-transformers
pip install opencv-python
pip install decord
pip install pyttsx3
```
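After installing, a quick sanity check like the one below confirms that the core dependencies import correctly and whether a GPU is visible; it is only a convenience snippet, not part of the project code.

```python
import torch, cv2, transformers, sentence_transformers, pyttsx3
from ultralytics import YOLO

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("OpenCV:", cv2.__version__)
print("Transformers:", transformers.__version__)
```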