Seeing with Sound: Object detection and localization with YOLOv8 and audio feedback for blind individuals
Object detection is a challenging Computer Vision (CV) application, particularly for assisting blind individuals. With the rapid advancement of Deep Learning (DL), algorithms such as Convolutional Neural Networks (CNNs) have significantly improved video analysis and image understanding for this purpose. Blind individuals face substantial challenges when navigating indoor and outdoor environments, underscoring the pressing need for assistive technologies.
In this project, a system has been developed to address this need, integrating the You Only Look Once (YOLO) object detection algorithm with audio guidance to aid blind users. The solution utilizes YOLOv8's State-Of-The-Art (SOTA) deep convolutional neural network architecture to detect objects in the user's environment, providing spatial information and object counts through audio feedback. The system, equipped with the Google Text-To-Speech (gTTS) engine, converts this spatial information into verbal instructions, acting as a virtual assistant. This context-aware feedback, available in multiple languages, has been optimized for live webcam streams, images, and video files. The system has shown promising results in enhancing the autonomy and quality of life of blind users, a significant step towards addressing the challenges they face in daily environments.
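The core feedback pipeline described above, mapping detections to spoken spatial descriptions with object counts, can be sketched as follows. This is a minimal illustration under stated assumptions: the detection tuples are presumed to come from a YOLOv8 model (e.g. Ultralytics' `YOLO("yolov8n.pt")`), the left/centre/right split of the frame into thirds is one plausible way to derive the spatial wording, and the commented-out gTTS call shows how the sentence could be voiced; none of these details are specified in the abstract.

```python
from collections import Counter

def horizontal_region(x1: float, x2: float, frame_width: float) -> str:
    """Classify a bounding box as left / centre / right by its centre point.

    Assumption: the frame is split into equal thirds; the paper's actual
    spatial-description rule may differ.
    """
    centre = (x1 + x2) / 2
    if centre < frame_width / 3:
        return "left"
    if centre < 2 * frame_width / 3:
        return "centre"
    return "right"

def describe_detections(detections, frame_width: float) -> str:
    """Turn (label, x1, y1, x2, y2) detections into one spoken-style sentence.

    `detections` would typically be extracted from YOLOv8 results
    (hypothetical upstream step, not shown here).
    """
    counts = Counter()
    phrases = []
    for label, x1, y1, x2, y2 in detections:
        counts[label] += 1
        phrases.append(f"{label} on the {horizontal_region(x1, x2, frame_width)}")
    summary = ", ".join(
        f"{n} {label}{'s' if n > 1 else ''}" for label, n in counts.items()
    )
    return f"Detected {summary}. " + ". ".join(phrases) + "."

sentence = describe_detections(
    [("person", 0, 0, 100, 200), ("car", 500, 0, 640, 200)], frame_width=640
)
# The sentence could then be converted to speech, e.g. with gTTS
# (assumed dependency, matching the engine named in the abstract):
#   from gtts import gTTS
#   gTTS(text=sentence, lang="en").save("feedback.mp3")
```

In a real system this function would run per frame on the webcam stream, with the resulting audio played back to the user; rate-limiting repeated announcements would be a natural refinement.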