Exploring The Potential of YouTube Transcription
Dr. Dharampal Singh1, Mr. Radha Krishna Jana2, Md Azal Ashraf3, Tushar Yadav4, Manas Malik5
1Head of Department, Department of CSE, JIS University, Kolkata, India
2Assistant Professor, Department of CSE, JIS University, Kolkata, India
3B.Tech Student, Department of CSE, JIS University, Kolkata, India
4B.Tech Student, Department of CSE, JIS University, Kolkata, India
5B.Tech Student, Department of CSE, JIS University, Kolkata, India
Abstract
In today's digital landscape, video content, especially on platforms like YouTube, has become an integral part of our lives.
However, ensuring accessibility and comprehension for all viewers remains a challenge. Our YouTube Transcript Web App
offers a versatile solution to this problem. This web-based tool harnesses cutting-edge transcription technology to
generate precise text transcripts for YouTube videos. By providing synchronised transcripts alongside the videos, our web
app aims to achieve the following. Enhance Accessibility: Individuals with hearing impairments can access video content
through readable text, promoting inclusivity. Facilitate Searchability: Users can search for specific keywords or phrases
within video transcripts, saving valuable time when seeking information. Improve Language Understanding: Non-native
speakers and language learners benefit from a written reference for spoken content, which aids comprehension. Enhance
Content Engagement: Researchers, educators, and content creators can use the transcripts for curating educational
materials, content analysis, and creating supplementary resources. With an emphasis on user-friendliness, our YouTube
Transcript Web App is accessible to a wide audience. Whether you're a healthcare professional seeking medical insights,
a student enriching your educational experience, or anyone looking to bridge the gap between video content and
text-based accessibility, our web app strives to democratise online information access, making digital content more
inclusive and understandable for everyone.
Keywords: YouTube, Text Summarization, AssemblyAI, Streamlit, Pytube
1. Introduction
The number of Indian YouTube users in 2023 was approximately 467 million, and it has been increasing rapidly every year.
According to a Google study, the average Indian user watching YouTube on mobile spends over 48 hours a month on it.
It is difficult to spend time on videos that may run longer than expected, and the effort is wasted if a video turns out
not to contain the information we are looking for. Searching for videos that contain that information is frustrating and
time-consuming. Python has various packages that help with this problem. YouTube content is easier to work with
through the APIs exposed by Python libraries: using such a library, we can directly access the transcript of a YouTube
video and read it to find the information we need. The purpose of this project is to use the AssemblyAI API and
Streamlit to transcribe YouTube videos, thereby providing a meaningful summary of the video and saving viewers' time.
YouTube includes short films, music videos, feature films, documentaries, audio recordings, corporate-sponsored movie
trailers, live streams, vlogs, and much other content from popular YouTubers. Our main concern is to transcribe the
YouTube video and save the transcript to a file for later use. These files let users understand the content much faster
than watching the videos. They can also be used for analysis, note-taking, research, and more.
2. Literature Review
It has become challenging to commit time to watching such videos, which may run longer than we anticipate, and our
efforts are in vain if we are unable to learn anything useful from them. It is frustrating and time-consuming to search
for videos that contain the information we require. For example, there are many videos online in which the speaker talks
at length about a specific topic, but it is difficult to find the actual message the speaker wishes to convey unless we
watch the entire video. Often the video content is irrelevant, and watching it wastes the viewer's time.
In 2021, “Natural Language Processing (NLP) based Text Summarization - A Survey” was published by Ishitva Awasthi,
Kuntal Gupta, Prajbot Singh Bhojal, Anand, and Piyush Kumar. The techniques covered are extractive and abstractive
methods for summarising texts. The advantage is that the significance of each sentence is calculated from linguistic and
statistical characteristics. The disadvantage is that each type of summarization technique is useful in different
situations, so one cannot say which technique is more promising [1].
“Review of automatic text summarization techniques & methods” was written by Adhika Pramita, Supriadi Rustad, Abdul
Shukur, and Affandy, and published in 2020. It uses text summarization and systematic review techniques. The drawback
is that the fuzzy-based approach is weak on semantic problems, and many gaps in extractive methods remain to be
closed [2].
“Study on Abstractive Text Summarization Techniques” was written by Parth Rajesh Dedhia, Hardik Pradeep, and Meghana
Naik, and published in 2020. It uses Seq2Seq, encoder-decoder, and pointer mechanisms. The drawback is that the current
model does not work when multiple documents are passed to it [3].
“Abstractive Summarization of video sequences” was written by Aniqa Dilawari and Muhammad Usman Ghani Khan. It uses
multi-line video description and an RCNN deep neural network model. The drawback is that it focuses only on the
conciseness of the summary; memory efficiency and time constraints are not considered [4].
3. Experimental Set Up and Methodology
The programming language chosen to build the YouTube Transcript Web App is Python, which offers a large variety of
libraries that make the developer's work easier. A YouTube video falls into one of three transcript categories: a
manually generated transcript, an automatically generated transcript, or no transcript at all.
Firstly, we use the Pytube library to load the YouTube URL and play the video in the web app. In this step, we extract
only the audio of the video and download it. Secondly, the web app uploads the audio file, read from its full local
path, to AssemblyAI using the API key and the request address. The file is stored on the AssemblyAI server, which reads
it. Thirdly, the web app requests a transcript of the YouTube video using the API key, a JSON request that refers to the
uploaded audio file, and the API address. The response from the AssemblyAI API is saved in the backend of the web app.
Fourthly, we retrieve the transcript of the YouTube video using the transcript id stored in the backend of the web app.
The output is saved in two formats: the plain transcript text in a file with the .txt extension, and the caption text
with timestamps in a file with the .srt extension. Lastly, the user can use either file format as needed. The .srt file
can be used by content creators, or by linguists studying human language behaviour, and it can serve as data for machine
learning models that improve Natural Language Processing (NLP). The .txt file lets the user simply read the content of
the YouTube video, and it can also be used for Natural Language Processing (NLP).
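The difference between the two output formats can be illustrated with a minimal sketch. It assumes the transcript is available as timed segments of the form (start in milliseconds, end in milliseconds, text); the function names and the sample segments are hypothetical and are not taken from the app, which obtains both files from the transcription service rather than building them by hand. The timestamped caption format shown is SubRip, usually saved with the .srt extension.

```python
def format_timestamp(ms: int) -> str:
    """Render milliseconds as a SubRip timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_plain_text(segments) -> str:
    """Plain transcript: the segment texts joined into one string."""
    return " ".join(text for _, _, text in segments)


def to_srt(segments) -> str:
    """Caption transcript: numbered cues, each with a timing line and its text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical sample data, for illustration only.
segments = [
    (0, 2500, "Hello and welcome."),
    (2500, 6120, "Today we discuss transcripts."),
]
```

Calling to_plain_text(segments) gives the content of the text file, while to_srt(segments) gives the caption file, in which every cue keeps its position in the video.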
3.1. Libraries and Technology:
3.1.1. ASSEMBLYAI: AssemblyAI provides AI models to transcribe and analyse audio and speech data through a scalable
web API. Its other features include customizable models, content moderation, sentiment analysis, PII redaction, key
phrase identification, and speaker diarization. The model that we used while building the YouTube Transcript Web App
is known as LeMUR. LeMUR is a framework that applies powerful Large Language Models (LLMs) to the audio files
submitted through our web app, for tasks that build on transcription, such as summarization.
Fig-1
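The two requests sent to AssemblyAI (uploading the audio and asking for a transcript) can be sketched with the Python standard library. This is an illustration under assumptions, not the app's code: it assumes the v2 REST endpoints /upload and /transcript with the API key in the authorization header, the content types are our choice, and the helper names are hypothetical. The requests are only built here; send() is the one function that needs network access.

```python
import json
import urllib.request

# Assumed root of the AssemblyAI REST API (v2).
API_BASE = "https://api.assemblyai.com/v2"


def build_upload_request(api_key: str, audio_bytes: bytes) -> urllib.request.Request:
    """Prepare the request that uploads the raw audio bytes."""
    return urllib.request.Request(
        f"{API_BASE}/upload",
        data=audio_bytes,
        headers={"authorization": api_key, "content-type": "application/octet-stream"},
        method="POST",
    )


def build_transcript_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Prepare the JSON request that asks for a transcript of an uploaded file."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=body,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In this sketch the reply to the upload request is expected to carry the URL of the stored audio, which is then passed to build_transcript_request; the reply to the second request carries the transcript id used in later steps.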
3.1.2. PYTUBE: Pytube is a lightweight, Pythonic, dependency-free library for downloading YouTube videos. Its features
include a command-line interface, the ability to capture thumbnail URLs, and caption track support. The pytube library
is used in situations where a YouTube video, or its caption track, has to be downloaded from a Python script, and the
app uses this capability. Pytube also makes pipelining easy, allowing you to specify callback functions for different
download events, such as on progress or on completion.
Fig-2
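A minimal sketch of an audio-only download with pytube callbacks is given below. The function names and the progress message are ours, and the pytube import is placed inside download_audio so that the helpers can be read and tried without the library installed. The sketch assumes pytube's callback conventions: the progress callback receives the stream, the downloaded chunk, and the number of bytes remaining, and the completion callback receives the stream and the saved file path.

```python
def percent_done(total_bytes: int, bytes_remaining: int) -> float:
    """Share of the file downloaded so far, as a percentage."""
    if total_bytes <= 0:
        return 0.0
    return round(100.0 * (total_bytes - bytes_remaining) / total_bytes, 1)


def on_progress(stream, chunk, bytes_remaining):
    """Progress callback: report how much of the stream has arrived."""
    print(f"Downloaded {percent_done(stream.filesize, bytes_remaining)}%")


def on_complete(stream, file_path):
    """Completion callback: report where the audio was saved."""
    print(f"Audio saved to {file_path}")


def download_audio(url: str, output_dir: str = ".") -> str:
    """Download only the audio track of a YouTube video and return its path."""
    from pytube import YouTube  # third-party library, imported lazily

    video = YouTube(
        url,
        on_progress_callback=on_progress,
        on_complete_callback=on_complete,
    )
    audio_stream = video.streams.filter(only_audio=True).first()
    if audio_stream is None:
        raise ValueError("no audio-only stream available for this video")
    return audio_stream.download(output_path=output_dir)
```

Keeping the percentage calculation separate from the callback makes it easy to check without contacting YouTube.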
3.1.3. STREAMLIT: Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom
web apps for machine learning and data science. The Streamlit package includes the ability to display and style data,
draw charts and maps, add interactive widgets, customise app layouts, cache computation, and define themes. Building
the web app on Streamlit's core lets the developer use features such as app layouts, interactive widgets, and themes.
These features make the developer's work easier when managing the app and designing its UI/UX.
Fig-3
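The following sketch shows how a Streamlit page with a URL field and a button might be laid out. It is an assumption about the layout rather than the app's actual source: extract_video_id and render_app are hypothetical names, only the common watch and youtu.be link forms are recognised, and the streamlit import sits inside render_app so that the URL check can be used on its own.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str):
    """Return the video id from a YouTube link, or None if it is not recognised."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def render_app():
    """Draw the page: a sidebar with the URL field and a Go button."""
    import streamlit as st  # third-party library, imported lazily

    st.title("YouTube Transcript Web App")
    url = st.sidebar.text_input("YouTube URL")
    if st.sidebar.button("Go"):
        if extract_video_id(url) is None:
            st.sidebar.error("Please paste a valid YouTube link.")
        else:
            st.info("Waiting for the URL to process...")
            # The download and transcription steps would be called here.
```

To try the page, call render_app() at the end of a script and start it with `streamlit run` followed by the script name.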
3.1.4. Experimental Pipeline:
In this section, we discuss the algorithm and flowchart of the YouTube Transcript Web App.
Fig-4
The working of the YouTube Transcript Web App is shown above. The user pastes the URL of a YouTube video of their
choice into the given field of the Web App. The Web App then separates the audio from the YouTube video and sends a
request to the AssemblyAI server using the API key and API address. The AssemblyAI model transcribes the audio and
generates a unique id for the transcript, then sends the result back to the Web App. The Web App retrieves the
transcript text with the help of that id and shows the output on the Web App screen. The user can also download a zip
file for further use.
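Retrieving the transcript by its unique id usually means asking the service repeatedly until the job has finished. The sketch below shows one way to write that loop. It assumes that the status document contains a "status" field with the values "completed" or "error" and a "text" field holding the transcript; fetch_status is a hypothetical callable that performs the actual request for a given id, so the loop itself can be tried without network access.

```python
import time


def wait_for_transcript(transcript_id, fetch_status, poll_seconds=3.0,
                        max_polls=200, sleep=time.sleep):
    """Poll until the transcript is ready and return its text.

    fetch_status(transcript_id) must return the decoded status document
    as a dictionary (an assumption of this sketch).
    """
    for _ in range(max_polls):
        result = fetch_status(transcript_id)
        status = result.get("status")
        if status == "completed":
            return result.get("text", "")
        if status == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        sleep(poll_seconds)  # job is still queued or processing
    raise TimeoutError(f"transcript {transcript_id} not ready after {max_polls} polls")
```

Passing the sleep function as a parameter keeps the waiting behaviour configurable, and the poll limit prevents the app from waiting forever on a job that never finishes.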
Fig-5
The flowchart of the YouTube Transcript Web App is shown above, describing the algorithm in more detail. It shows how
the data flows through the various functions of the Web App and describes how each section of the app works.
The user visits the YouTube Transcript Web App and pastes the URL. The Web App downloads the YouTube video at the
backend and separates the audio from it. Then, using the API key and API address, the audio file is uploaded from its
file location to the AssemblyAI servers. The LeMUR machine learning model converts the speech in the audio file to
text. This information, along with the unique id, is sent back to the Web App. Using the unique id, the Web App
retrieves the text from the .srt file and shows it on the output screen. It also saves the text in a .txt file so that
the user can easily download it and use it in their work.
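Reading the text back out of a timestamped caption file can be sketched as follows. The sketch assumes the usual SubRip layout, in which each cue consists of a number, a timing line containing "-->", and one or more lines of text, with cues separated by blank lines. The function names are hypothetical, and search_captions is included only to show how a keyword can be located together with its position in the video.

```python
def parse_srt(srt: str):
    """Split caption text into (timing line, caption text) pairs."""
    cues = []
    for block in srt.replace("\r\n", "\n").strip().split("\n\n"):
        parts = [part.strip() for part in block.strip().split("\n")]
        # parts[0] is the cue number, parts[1] the timing line, the rest is text.
        if len(parts) >= 3 and "-->" in parts[1]:
            cues.append((parts[1], " ".join(p for p in parts[2:] if p)))
    return cues


def srt_to_text(srt: str) -> str:
    """Plain transcript text, with numbering and timestamps removed."""
    return " ".join(text for _, text in parse_srt(srt))


def search_captions(srt: str, keyword: str):
    """Cues whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [(timing, text) for timing, text in parse_srt(srt) if needle in text.lower()]
```

The timing line returned by search_captions tells the user where in the video the keyword is spoken, so the relevant part can be found without watching the whole video.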
4. Results and Outcome
The following describes the complete working of the YouTube Transcript Web App along with screenshots of the process.
On visiting the Web App, the user sees a field on the left side for pasting the YouTube URL, with the GO button below
it. The user pastes the YouTube link and presses the GO button to start transcribing the video. The conversion of the
YouTube video into transcript text then begins, as shown in the screenshot. The process includes these steps: (a)
Waiting for the URL to process, (b) Sending the request to AssemblyAI, (c) Transcription is in process, (d)
Transcription is completed.
Fig-6
Once the Web App retrieves the transcript text of the given YouTube video link, it is shown on the output screen, and
the Download Zip button is given below it to download the file. The user can read the transcript directly in the Web
App or download the zip file to save it on the local computer for future use.
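Packing both transcript files into one zip archive for download can be sketched with the standard library. The function name and the file names inside the archive are our assumptions; the archive is built in memory so that no temporary file has to be written on the server.

```python
import io
import zipfile


def build_transcript_zip(plain_text: str, caption_text: str,
                         base_name: str = "transcript") -> bytes:
    """Return the bytes of a zip archive holding the .txt and .srt files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{base_name}.txt", plain_text)
        archive.writestr(f"{base_name}.srt", caption_text)
    return buffer.getvalue()
```

In a Streamlit page, the returned bytes could be handed to st.download_button together with a file name such as transcript.zip and the MIME type application/zip.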
5. Discussion
YouTube transcripts, created using automatic speech recognition (ASR), hold the power to reshape healthcare. They
boost accessibility, empower patients, assist research, enhance education, and overcome language barriers. Yet,
challenges related to privacy and misinformation must be carefully managed. The adoption of YouTube transcripts
promises a revolution in healthcare information availability and distribution.
| Related Work | Technology Used | Model Architecture |
|---|---|---|
| YouTube Transcript Summarizer using Flask | Python, Flask, Visual Studio, Ffmpeg | This project will provide us the chance to put cutting-edge NLP techniques for Abstractive and Extractive text summarization into practice while also implementing an intriguing notion that is ideal for intermediates, as well as a reviving side endeavor for experts. |
| YouTube Transcript Chrome extension | Python, Flask, Pipeline, JSON, .NET Framework, NLP, API, Tensorflow | This system will be used in different required ways of a user, as not only YouTube video summarization but also videos from websites, video conferences from different regions with diverse language-based summarization to understand the content in their own language. |
| YouTube Transcript Web App | Python, AssemblyAI, Streamlit, PyTube | This system will be used on the web; the user can paste the link of a YouTube video in the application and the system will generate the transcript in the web app only. The user can download the transcript in zip format as per their convenience. |
6. Conclusion
This project saves users valuable time and resources by letting them get the gist of a video without watching the whole
of it. It also helps users identify unusual and unhealthy content so that it does not disturb their viewing experience.
The project provides a good user interface for obtaining the transcript text of a YouTube video from the YouTube
Transcript Web App, and the text file is obtained without using any untrusted third-party applications. A user
interface is implemented to let users interact with the app and to display the summarized text, but some missing links
must still be addressed. The use of the Python programming language and the API allows the Web App to get the
transcripts or subtitles for a given YouTube video in file format.
7. References
1. Vybhavi, A.N.S.S., Saroja, L.V., Duvvuru, J. and Bayana, J., 2022, March. Video transcript summarizer.
In 2022 International mobile and embedded technology conference (MECON) (pp. 461-465). IEEE.
2. Kumari, P.V., Keshava, M.C., Narendra, C., Akanksha, P. and Sravani, K., 2022. Youtube Transcript
Summarizer Using Flask and Nlp. Journal of Positive School Psychology, 6(8), pp.1204-1209.
3. Bhandare, M.K., Chigare, A.A., Patil, U.U. and Sangle, S.B., YOUTUBE TRANSCRIPT SUMMARIZER.
4. Biswas, S. and Patel, A.K., 2022. YouTube Transcript Summarizer To Summarize the content of
YouTube.
5. Sheth, H., Vishwakarma, K., Shaikh, S., Sonawane, P. and Hirlekar, V., 2023. YouTube Video Transcript
Summarizer using NLP. Grenze International Journal of Engineering & Technology (GIJET), 9(2).
AUTHORS PROFILE
Author 1 earned his B.Tech, M.Tech, and Ph.D. from the University of Kalyani and has been working as HOD in the
Department of CSE at JIS University since 2022.
Author 2 earned his B.Tech and M.Tech from Jadavpur University and is currently working as Assistant Professor in the
Department of CSE at JIS University.
Author 3 is currently pursuing a B.Tech at JIS University and will graduate in 2024.
Author 4 is currently pursuing a B.Tech at JIS University and will graduate in 2024.
Author 5 is currently pursuing a B.Tech at JIS University and will graduate in 2024.