2023 5th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N)

2023 5th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N) | 979-8-3503-3086-1/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICAC3N60023.2023.10541713

YOUTUBE TRANSCRIPT SYNTHESIS


Ankur Kumar, Priya Yadav, N. Partheeban
Computer Science and Engineering, Galgotias University, Greater Noida, India
ankur009@gmail.com, Priyayadav4650@gmail.com, n.partheeban@galgotiasuniversity.edu.in

Abstract: YouTube video transcription converts the spoken words of a video into written text for a variety of reasons, including providing captions for viewers with hearing impairments, enhancing SEO by providing more textual content, and making the video accessible to a wider audience. Automated speech recognition software is commonly used for transcription, but because it tends to produce inaccurate results, many content creators and businesses prefer human transcription. The finished transcription can be used to create captions or subtitles that improve content accessibility and comprehension, or it can be repurposed for other platforms, such as blog posts and social media updates. In conclusion, YouTube video transcription is a valuable instrument for content creators and businesses seeking to improve the accessibility, reach, and engagement of their video content through the provision of accurate transcriptions and captions.

Keywords: API, SEO, YouTube Video Transcription, contextual text

I. Introduction

Due to advancements in real-time video technology and inexpensive storage media, digital video has become an integral part of education, entertainment, and commerce. Consequently, there is a great need for systems that can organize and search video data based on its content. These systems should not only have search capabilities, but also generate concise and user-friendly data representations that allow users to efficiently navigate the entire database or search results. These representations give users quick insight into the content of the video under review while preserving its underlying message.

Designing effective representations for video browsing presents distinctive algorithmic and technical challenges. Video is a sequential, information-dense medium that incorporates audio and motion, conveying the long-term logical relationships between shots and sequences. Video data management is inherently more complex than image database management. For example, images can be represented as thumbnails, allowing users to rapidly evaluate their relevance.

However, the same kind of evaluation is time-consuming for video sequences that contain over 100,000 frames per hour and are composed of numerous shots. Additionally, audio and dialogue, which frequently carry a significant portion of the information (for example, in a video of a person speaking), must be included in the representation. As dialogue is our primary focus, we developed the YouTube Video Transcript Summarizer system.

By utilizing Chrome extensions, the user interface can be made more functional. By adding a "summarize" icon, users of Google Chrome can directly access a summary of the current YouTube video. The summary is generated with the Hugging Face Transformers library, which generates summaries from text effectively. The YouTube API and Python libraries facilitate access to video content and associated metadata, such as transcripts.

By utilizing text summarization techniques such as Hugging Face transformers, we can generate and display concise summaries of video content. This can be especially helpful for viewers who lack the opportunity to watch the entire video or who have trouble understanding the spoken content.

Using a transformer model such as T5 to summarize YouTube video transcripts can produce a useful and pertinent summary of the video's content. As YouTube contains a wide variety of videos, such as short films, music videos, documentaries, and vlogs, manual descriptions and thumbnails may not always provide enough information for potential viewers.

Utilizing automated summarization techniques can save consumers time and provide a concise overview of the content on YouTube, where more than one billion hours of video are viewed daily. T5 is an encoder-decoder model pre-trained on unsupervised and supervised tasks that can be fine-tuned for summarization.

The purpose of this study is to summarize the original text by applying automated summarization to YouTube video transcriptions using the TF-IDF method to extract keywords and generate a concise text summary. Although automated summaries may not be as coherent or intelligent as those created by humans, readers are still able to comprehend the most important information presented. The structure of the paper includes a section on techniques overview, followed by the research methodology and experimental results. The study concludes with a summary of the most important aspects.

II. Related Works

ROUGE is a set of measures commonly used to evaluate the quality of machine-generated summaries by comparing them to summaries written by humans, according to [1]. The measures are based on the recall of overlapping units, such as n-grams, word sequences, and word pairings, between the generated summary and the reference summaries.

In [2], a technique for automatically summarizing Arabic texts based on Rhetorical Structure Theory (RST) is described. Using a tree-like structure, the method identifies the rhetorical relationships between various sections of the text. Based on the type of rhetorical relations present, the system then selects sentences for the final summary.
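
To make the recall-based overlap behind ROUGE [1] concrete, the following is a minimal sketch of an n-gram recall score. It is an illustration only, not the official ROUGE package: the function names `ngrams` and `rouge_n_recall` are hypothetical, tokenization is a plain whitespace split, and only a single reference summary is considered.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of the reference n-grams that also occur in the candidate.

    Assumption: lowercased whitespace tokenization and one reference
    summary; the real ROUGE toolkit offers more options.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match so a repeated n-gram is not counted more often
    # than it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For the candidate "the cat sat on the mat" and the reference "the cat is on the mat", five of the six reference unigrams are matched, so the unigram recall is 5/6, while three of the five reference bigrams are matched.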
ISBN: 978-X-XXXX-XXXX-X/23/$31.00 ©2023 IEEE 1

Authorized licensed use limited to: Zhejiang University. Downloaded on January 19,2025 at 19:38:11 UTC from IEEE Xplore. Restrictions apply.
The article [3] introduces Kaldi, an open-source speech recognition research toolkit. Kaldi is constructed with finite-state transducers and the OpenFst library, and it includes comprehensive documentation and routines for constructing complete recognition systems. The core library of Kaldi, written in C++, supports arbitrary phonetic-context sizes and acoustic modeling with subspace Gaussian mixture models (SGMM) or standard Gaussian mixture models, as well as linear and affine transformations.

The paper [5] concentrates on Arabic document clustering, which is crucial for traditional Information Retrieval (IR) systems due to the growing number of online Arabic documents. The task entails clustering comparable documents using various similarity/distance metrics. However, document length and noise can affect the efficacy of clustering.

Automatic text summarization is introduced in [6] as a method for reducing the volume of text documents while preserving the most important information. After stemming the words, the authors propose a Persian text summarizer system that employs a combination of graph-based and TF-IDF methods to assign sentence weights.

The purpose of [7] is to compare the quality of summaries generated by various automatic text summarization methods and those produced by humans. Two series of experiments were conducted: one with extractive summaries generated automatically using Fuzzy and Vector techniques, and the other with summaries produced manually by English instructors.

According to Ajmal and Haroon [8], the increased usability of documents has necessitated extensive research in the field of automated text summarization. A summary is a condensed version of one or more texts that includes only the most essential information from the original text(s) and is typically no longer than half the length of the original text(s), if not substantially shorter. The primary purpose of a summary is to convey concisely the central ideas of a text.

According to [9], there has been a recent explosion of text data from numerous sources. This volume of text contains invaluable knowledge and information that must be skillfully extracted in order to be useful. The review explains the primary methods for automatic text summarization, examining the various summarization procedures and discussing their advantages and disadvantages.

The article [10] discusses the use of automated captioning services for research purposes, specifically audio transcription. It provides a proof-of-concept analysis by contrasting three instances of automated transcription with manual transcription techniques. The article also provides a literature review of automated captioning and voice recognition transcription tools. The authors describe the processes and tools utilized for producing automated captions and transcripts. Using software that checks for originality, the percentage of similarity between the automated and manual transcripts is determined.

According to [12], online purchasing has increased steadily due to the increased use of smartphones and the internet by individuals of all ages. However, it can be difficult to determine which products are authentic and which to choose from the plethora of identically priced options. Users rely on reviews to make well-informed choices. However, some users may find it difficult and time-consuming to read extensive reviews.

In [11], researchers developed an automatic text and video summarization system using natural language processing techniques. The system utilized the term frequency-inverse document frequency (TF-IDF) technique to extract significant keywords from the text and condense lengthy videos into a few lines of text. Students and researchers who lack the time to extract valuable information from lengthy videos will benefit from the proposed system.

The paper [13] proposed a Persian text summarizer system that employs a combination of graph-based and TF-IDF methods to score sentences after word stemming. Sentence selection for the summary is performed by SA-GA, a composite algorithm that combines the Genetic Algorithm and Simulated Annealing.

In [14], the development of an image captioning model to aid the blind and visually impaired in outdoor navigation is discussed. In this model, the CNN and the attention layer serve as the encoder, while the LSTM serves as the decoder. The encoder uses ResNet101 and ResNet152 to derive image features, and the attention layer uses the Bahdanau attention mechanism.

During the past decade, automated captioning systems have gained widespread use in technology, per [15]. Up until now, the focus of these services has been on the technical aspects and on applications such as assisting students with special needs and teaching students of a second language. Their use for research, specifically the transcription of audio files, has been the subject of only a few limited studies.

III. Proposed System

The objective of this project was to create a system that could obtain transcripts/subtitles for a given YouTube video ID using a Python API, perform text summarization using Hugging Face transformers, build a Flask backend REST API to expose the summarization service to the client, and create a Chrome extension that would use the backend API to display summarized text to the user.

FIG 1. Architectural Design of System

As shown in Figure 1, the user would first open a YouTube video and click the "summarize" button on the Chrome extension, which would create an HTTP request
to the backend. The request would include the YouTube video ID taken from the URL. The response would be a transcript of the video in JSON format. After obtaining the transcripts in text format, the system would perform transcript summarization using Hugging Face transformers. Finally, the summarized transcript would be displayed on the extension for the user to read.

Overall, the project was successful in creating a functional system that could obtain YouTube transcripts and perform summarization, with the final result displayed to the user through the Chrome extension.

A. Back End

Flask is a prominent Python web framework that enables developers to build web applications and APIs. RESTful APIs are an API type that adheres to a set of design principles and constraints that facilitate client-server communication.

Follow these steps to construct a Flask RESTful API with dependencies such as youtube_transcript_api and transformers:

a. Create a new Python virtual environment using venv or virtualenv to isolate this project's dependencies.
b. Activate the virtual environment with the source command.
c. Using the pip package manager, install Flask and all required dependencies.
d. Create a new file with the name app.py and import the required modules and packages, including Flask and any dependencies.
e. Define your API endpoints using the @app.route() decorator and associated functions.
f. Run the Flask application using flask run.

Once your Flask RESTful API is operational, you can use it to manage client requests and responses. For instance, the youtube_transcript_api library could be used to retrieve YouTube video transcripts, while the transformers library could be used to perform natural language processing tasks on the text.

B. Attain Transcript

Several Python APIs are available for retrieving transcripts and translations for YouTube videos. The youtube_transcript_api library is one example of such an API.

To use the youtube_transcript_api library in a Flask application, create a function in your app.py file that accepts a YouTube video ID as an input. This function can then use the youtube_transcript_api library to retrieve the video's transcript and extract the transcript text from the response.

C. Perform Text Summarizer

There are two primary methods for summarizing a text: extractive and abstractive.

In extractive summarization, the model extracts the most significant sentences or phrases from the source text and outputs them as the summary. This method entails selecting and rearranging existing sentences from the text without revising or paraphrasing the content. Extractive summarization can be performed using techniques such as clustering, graph-based methods, or machine learning algorithms.

The advantage of extractive summarization is that it preserves the original wording and structure of the text, which can be useful in fields such as legal and technical writing where precise terminology is essential. However, this can occasionally result in disjointed or difficult-to-read summaries.

D. API Rest Point

To define the API route, we use the Flask framework to generate a route with a URI and the GET HTTP method. The route has the form http://[hostname]/api/summarize?youtube_url=#{url}. Using query parameters, we derive the YouTube video ID from the URL, and then we generate the transcript by invoking the transcript generation function. The transcript is then passed to the transcript summarizer function to produce a summary. Then, we return the summarized transcript with an OK HTTP status code and handle any applicable HTTP exceptions. This endpoint can be accessed by sending a GET request to the API with the YouTube video URL as the query parameter. The API will then return a condensed version of the video's transcript.

E. Chrome Extension

Chrome extensions are software programmes that can modify and improve the browsing experience by introducing new functionality or altering existing behavior. They are created using web technologies such as HTML, CSS, and JavaScript and are packaged into a file that can be installed on the Chrome web browser. You can create a Chrome extension by following these steps:

a. Create a new directory for your extension and include the required files, including an HTML file for the user interface, a JavaScript file for any functionality, and a CSS file for formatting.
b. Create a manifest.json file describing the extension's functionality. This file contains the extension's name, version, and icons, as well as the necessary permissions for the extension to function.
c. In Chrome, navigate to chrome://extensions and toggle the switch in the upper right corner to enable developer mode.
d. Click the "Load unpacked" button and navigate to the directory containing the files for your extension.
e. Your extension is now installed in Chrome and is accessible through the Chrome toolbar and context menus.
Every time you modify your extension's files, you must refresh the extension in Chrome by selecting "reload" on the chrome://extensions page.

F. User Interface / Extension Popup

We include the popup.css file for styling and the popup.js file for user interaction in the popup.html file. We add a button element with the name "Summarise" that generates a click event when it is clicked, which is detected by an event listener. Additionally, we add a div element where the summarized text will be displayed when received from the REST API call on the backend. In the popup.css file, we provide HTML elements with appropriate CSS formatting to enhance the user experience.

G. Display Summarized Text

To enable the Chrome extension to communicate with the backend server, some missing connections must be added. In this phase, the code in popup.js, contentScript.js, and the manifest file will be modified. First, in popup.js, we'll attach an event listener to the Summarise button and use chrome.runtime.sendMessage to send an action message to contentScript.js. We will also add an event listener to monitor for the message results from contentScript.js and use JavaScript to programmatically display the summary in the div element. In contentScript.js, we will add an event listener that will listen for that action message and extract the current tab's URL. Then, we'll send a GET HTTP request using the XMLHttpRequest Web API to the backend to receive the response containing the summarized text.

III. Future Scope

Individuals with hearing impairments and students would benefit the most from the transcript summarization of YouTube videos. It is difficult for hearing-impaired individuals to comprehend videos without transcripts or subtitles. It would be useful for them if summaries were generated even for videos for which transcripts are not readily available. Students would be better able to select lecture/tutorial videos based on their preferences if YouTube video transcripts were summarized. Additionally, the concept of Transcript Summarisation can be applied to other streaming services.

IV. Tools and technology

It is possible to summarize YouTube video transcripts using a variety of techniques and technologies. Here are some examples:

Libraries for Natural Language Processing (NLP): NLP libraries such as NLTK, spaCy, and TextBlob may be utilized to derive relevant information from the transcript.

ASR (automatic speech recognition) software: Using ASR technologies such as Google Cloud Speech-to-Text and Amazon Transcribe, the video's audio can be converted to text.

Text summarization tools: To summarize the transcript, one can use utilities such as Gensim, Sumy, and PyTeaser.

Using the YouTube API, you can retrieve the video's transcript and metadata.

Python is a popular programming language for text summarization, and it can be used to construct customized summarization techniques.

Machine-Learning Algorithms.

V. Methodology

Several components of the transcript summarization procedure utilize the spaCy library. spaCy is intended for natural language comprehension and data extraction. It is capable of separating text into words and punctuation, assigning word roots, serialization, and text classification. Any text can be converted into a Doc object and its properties can be inferred using spaCy.

Sentence Tokenization: This portion of the method utilizes the Punkt Sentence Tokenizer from the NLTK tokenize module to determine the start and end of sentences in a given text. The method also describes the various forms of available word tokenization, such as white space tokenization, dictionary-based tokenization, rule-based tokenization, regular expression tokenization, Penn Treebank tokenization, spaCy tokenization, Moses tokenization, and subword tokenization.

Word Tokenization: This portion of the method entails dividing a string sequence containing words, phrases, and symbols into numerous tokens. NLTK's word_tokenize() is a wrapper function that calls tokenize() on an instance of a Treebank-style word tokenizer class. The process of separating a large text sample into individual words is known as word tokenization.

Summarization: This portion of the method calculates the frequency of each word in the text data and stores the results in a dictionary alongside the text data.

Text Tokenization: In this step, the text data is tokenized into sentences and words to facilitate further analysis.

Sentence Selection: This portion of the method involves selecting, based on frequency, the sentences to be included in the final summary. The sentences containing the most frequent words are prioritized and included in the final summary.

Grammar and Spelling Check: In this step, the text is checked for grammar and spelling using LanguageTool, an open-source programme that can also be used as OpenOffice's spell checker. The programme is accessible via a command-line interface (CLI) or from Python code.

VI. Conclusion

The purpose of this paper was to investigate the viability of using readily accessible web-based tools for automated captioning to generate transcripts of audio and video recordings. Based on our proof-of-concept, we've determined that this is indeed possible and produces a satisfactory first transcript draft. Even with conservative
estimates, we can save a significant amount of time by
obtaining two-thirds of the transcript without editing in just
a few minutes of uploading, a few hours of waiting (which
can be used for other duties), and a minute of downloading.
For high-quality audio in optimal conditions, such as one-
on-one interviews, the auto-captioning accuracy can
exceed 90%. It is essential to note, however, that even with
such high rates of accuracy, we are not suggesting that
auto-captioning eliminates the need for manual
transcription; rather, it can facilitate the transcription
process.

VII. Result and Discussion

The programme is written in Python and its graphical interface is provided by the Tkinter module. It uses NLTK for text processing, as well as the math and PyPDF2 modules. The programme prompts the user to choose a YouTube video to summarize, retrieves the transcription, and saves the file as a PDF. The preprocessing phase involves tokenization, removal of stop words, and stemming. The algorithm then employs the statistical term frequency-inverse document frequency (TF-IDF) approach to characterize the document and selects key words and phrases based on these characteristics. The summary is displayed in the interface. The efficacy of the programme is evaluated using ROUGE 2.0, which employs a synonym dictionary to capture semantic overlap. Using the CNN/DailyMail dataset, the results of the programme are compared to other traditional methods and baselines, and it is determined that the programme substantially outperforms them with a higher score.
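
The TF-IDF based selection described above can be illustrated with a small extractive sketch. It is not the programme's actual implementation: each sentence is treated as a "document", a regular expression stands in for NLTK tokenization, stemming is omitted, and the names `STOP_WORDS`, `split_sentences`, `tokenize`, and `tfidf_summary` as well as the tiny stop-word list are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a very small stop-word list; NLTK ships a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in",
              "on", "it", "that", "this", "for", "with", "as", "by", "was"}


def split_sentences(text):
    """Split text after sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence, keep word-like tokens, drop stop words."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def tfidf_summary(text, num_sentences=2):
    """Return the highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum of (term frequency) * (inverse document frequency).
        scores.append(sum((count / len(tokens)) * math.log(n / df[term])
                          for term, count in tf.items()))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

With this scoring, a sentence made up of terms that appear in few other sentences ranks above sentences that repeat common terms, which is the behaviour TF-IDF is meant to capture.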

REFERENCES

[1] Lin, C.Y., 2004, July. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out (pp. 74-81).
[2] Maâloul, M.H., Keskes, I., Belguith, L.H. and Blache, P., 2010. Automatic Summarization of Arabic Texts based on RST Technique. In ICEIS (2) (pp. 434-437).
[3] Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P. and Silovsky, J., 2011. The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding (No. CONF). IEEE Signal Processing Society.
[4] Zhang, J.J. and Fung, P., 2012. Active learning with semi-automatic annotation for extractive speech summarization. ACM Transactions on Speech and Language Processing (TSLP), 8(4), pp.1-25.
[5] Froud, H., Lachkar, A. and Ouatik, S.A., 2013. Arabic text summarization based on latent semantic analysis to enhance Arabic documents clustering. arXiv preprint arXiv:1302.1612.
[6] Mahdipour, E. and Bagheri, M., 2014. Automatic Persian text summarizer using simulated annealing and genetic algorithm. International Journal of Intelligent Information Systems, Special Issue: Research and Practices in Information Systems and Technologies in Developing Countries, 3(6-1), pp.84-90.
[7] Kiyoumarsi, F., 2015. Evaluation of automatic text summarization based on human summaries. Procedia-Social and Behavioral Sciences, 192, pp.83-91.
[8] Ajmal, E.B. and Haroon, R.P., 2016. Maximal marginal relevance based malayalam text summarization with successive thresholds. International Journal on Cybernetics & Informatics, 5(2), pp.349-56.
[9] Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E.D., Gutierrez, J.B. and Kochut, K., 2017. Text summarization techniques: a brief survey. arXiv preprint arXiv:1707.02268.
[10] Bokhove, C. and Downey, C., 2018. Automated generation of 'good enough' transcripts as a first step to transcription of audio-recorded data. Methodological innovations, 11(2), p.2059799118790743.
[11] Albeer, R.A., Al-Shahad, H.F., Aleqabie, H.J. and Al-shakarchy, N.D., 2022. Automatic summarization of YouTube video transcription text using term frequency-inverse document frequency. Indonesian Journal of Electrical Engineering and Computer Science, 26(3), pp.1512-1519.
[12] Boorugu, R. and Ramesh, G., 2020, July. A survey on NLP based text summarization for summarizing product reviews. In 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA) (pp. 352-356). IEEE.
[13] Prudhvi, K., Bharath Chowdary, A., Subba Rami Reddy, P. and Lakshmi Prasanna, P., 2020. Text summarization using natural language processing. In Intelligent System Design: Proceedings of Intelligent System Design: INDIA 2019 (pp. 535-547). Singapore: Springer Singapore.
[14] Rahimi, S.R., Mozhdehi, A.T. and Abdolahi, M., 2017, December. An overview on extractive text summarization. In 2017 IEEE 4th international conference on knowledge-based engineering and innovation (KBEI) (pp. 0054-0062). IEEE.
[15] Andhale, N. and Bewoor, L.A., 2016, August. An overview of text summarization techniques. In 2016 international conference on computing communication control and automation (ICCUBEA) (pp. 1-7). IEEE.