ITBA20100024A1 - "SYSTEM FOR MONITORING, RESEARCH, REVIEW, INTEGRATION AND AUDIO / VIDEO / TEXT EDITING BASED ON TRANSLATION OF SPEAKING CONTENT IN MULTIMEDIA SEQUENCES BY SPEECH RECOGNITION AND SPEECH TO TEXT SYSTEMS".
- Publication number
- ITBA20100024A1
- Authority
- IT
- Italy
- Prior art keywords
- fact
- speech
- video
- text
- audio
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/41—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Library & Information Science (AREA)
- Machine Translation (AREA)
Description
Description of the Patent for Industrial Invention entitled:
"System for monitoring, searching, reviewing, integrating and editing audio/video/text based on the transcription of speech contained in multimedia sequences by means of speech recognition and speech-to-text indexing systems"
The present invention belongs to the field of search systems implemented by computer means and, more particularly, to methods for monitoring, searching and examining the content of multimedia sequences.
The invention concerns a structured set of advanced technologies, such as automatic speech transcription, text indexing methodologies and search engines, together with web technologies and services for accessing and managing the search for information within audio-video content from various sources such as TV, radio, the web, and digital or digitized archives. Its aim is to make multimedia content searchable and to integrate it with the rest of the content on the web.
The main known technique used today for searching large quantities of multimedia content is based on categorizing that content by means of comments associated with the files. The association can be made either by qualified personnel who perform the categorization, or through comments entered directly by those who use the system as contributors, as happens, for example, with YouTube®.
The resulting workload is therefore proportional to the quantity of videos reviewed.
Furthermore, with the current technique the search result depends on the quality and descriptive accuracy of the content entered to categorize the files. The purpose of the present invention is to provide a highly effective system for searching within audio-video documents, capable of archiving large volumes of recordings while maintaining an accurate index of their contents.
A further purpose is to provide a system suitable for generating structured indexes, so that the document most relevant to the topic chosen by the requester can be found quickly and precisely.
For the aforementioned purposes, the present invention provides for the integration of both an automatic speech recognizer (ASR) based on a SILVCSR (Speaker-Independent Large-Vocabulary Continuous Speech Recognition) engine and a latest-generation indexer and search engine, allowing the system to search directly on the speech contained in the documents.
These and further advantages are achieved by the system for monitoring, searching, reviewing, integrating and editing audio/video/text based on the transcription of speech contained in multimedia sequences by means of speech recognition and speech-to-text indexing systems, according to the present invention, described with the aid of the attached drawing sheet, which shows the following figure:
Fig. 1: a block diagram of the system.
As shown schematically in figure 1, the system is structured to receive multimedia documents 1 as input, to produce a transcription 2 of the speech in the audio-video content (containing references to the exact millisecond at which each individual word is spoken) and to submit the whole to the indexing process 4, 5 and 6. The web interface 7 then provides the end user with the means for simple and immediate consultation, as well as a set of additional resources.
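As an illustration of how word-level timestamps can support search directly on the spoken content, the following is a minimal sketch of an inverted index that maps each recognized word to the documents and millisecond offsets at which it is spoken. The names `Word`, `build_index` and `search`, the document identifiers and the sample words are all hypothetical and are not taken from the patent, which does not specify the index structure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str      # the recognized word (hypothetical sample data below)
    start_ms: int  # offset from the start of the media, in milliseconds


def build_index(transcripts):
    """Map each lower-cased word to the (document id, start_ms) pairs where it is spoken."""
    index = defaultdict(list)
    for doc_id, words in transcripts.items():
        for word in words:
            index[word.text.lower()].append((doc_id, word.start_ms))
    return index


def search(index, term):
    """Return every (document id, start_ms) hit for a single term, case-insensitively."""
    return index.get(term.lower(), [])


# Hypothetical transcripts keyed by an assumed document identifier.
transcripts = {
    "news-001": [Word("Good", 0), Word("evening", 420), Word("from", 910), Word("Rome", 1180)],
    "radio-007": [Word("Traffic", 0), Word("in", 530), Word("Rome", 700)],
}
index = build_index(transcripts)
hits = search(index, "rome")
```

Each hit carries the offset needed to start playback at the point where the word is spoken, which is what distinguishes this kind of index from one built only on manually entered comments.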
An automatic feedback system 8 and 9, controlled by the user, is also provided; it allows dynamic and continuous fine-tuning of the service offered and of the structure of the system itself.
Blocks 3a and 3b reflect the need to store data of various origins and encodings in a uniform format, as required for the efficiency of the system, and to extract still images from any video file so that they can serve as precise references to significant sentences and/or relevant segments of the original audio file.
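One way to picture the link between segments and still images is sketched below: the timed words are grouped into segments, and each segment is assigned the timestamp at which a still image would be extracted. The pause-based splitting rule, the 700 ms threshold, the choice of the segment start as the thumbnail time, and the function names are all assumptions made for illustration; the patent only states that thumbnails follow the division into sentences/segments obtained from the transcription.

```python
def split_into_segments(words, max_gap_ms=700):
    """Group (text, start_ms, end_ms) words into segments, breaking at long pauses.

    The pause threshold is an assumed heuristic, not a value from the patent.
    """
    segments = []
    current = []
    for word in words:
        # Start a new segment when the silence since the previous word is too long.
        if current and word[1] - current[-1][2] > max_gap_ms:
            segments.append(current)
            current = []
        current.append(word)
    if current:
        segments.append(current)
    return segments


def thumbnail_plan(segments):
    """For each segment, return its text and the timestamp of the still image to extract."""
    return [
        {"text": " ".join(w[0] for w in seg), "thumbnail_ms": seg[0][1]}
        for seg in segments
    ]


# Hypothetical timed words: (text, start_ms, end_ms).
words = [
    ("the", 0, 150), ("council", 180, 600), ("met", 620, 800),
    ("today", 2000, 2400), ("in", 2420, 2500), ("Bari", 2520, 2900),
]
plan = thumbnail_plan(split_into_segments(words))
```

The resulting plan could then be handed to whatever video tool performs the actual frame extraction; that step is deliberately left out of the sketch.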
Block 4, on the other hand, represents the possible intervention of a human operator to correct transcription errors, using a dedicated software tool and guidelines defined according to efficiency criteria and continuously updated on the basis of feedback.
As claimed, the system according to the invention has the following basic characteristics:
- intake 2 of the material by the system and its routing to the automatic transcription system and, optionally and subsequently, to a revision system staffed by specialized personnel 4;
- production and storage of the re-encoded multimedia files 3a and, for video only, of thumbnails 3b extracted on the basis of the division into sentences/segments obtained from the automatic transcription and from the optional revision 4;
- processing and indexing of the text for the search engine, and making it available for consultation 6;
- finally, a web interface that handles the ways in which the data and the additional services are made available to the end user 7.
The operator-assisted correction and transcription service, carried out with specialized correction software that integrates and completes the raw automatic transcriptions, helps to produce accurate transcriptions, in particular by verifying the correct transcription of proper names of people, places and organizations, and to handle any publication errors 7.
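The operator correction step can be sketched as an overlay on the raw automatic transcript: the operator's fixes replace the text of individual words while the original timestamps are left untouched, so the index still points at the right moment in the media. The function name `apply_corrections`, the position-keyed correction format and the sample words are hypothetical; the patent does not describe how the correction software stores its changes.

```python
def apply_corrections(raw_words, corrections):
    """Return a revised transcript.

    raw_words:   list of (text, start_ms) pairs from the automatic transcription.
    corrections: dict mapping a word position to its corrected text
                 (an assumed format, chosen only for this sketch).
    """
    revised = []
    for position, (text, start_ms) in enumerate(raw_words):
        # Keep the timestamp; replace the text only where the operator intervened.
        revised.append((corrections.get(position, text), start_ms))
    return revised


# Hypothetical raw output in which a proper name was recognized in lower case.
raw = [("mayor", 0), ("de", 400), ("caro", 520), ("spoke", 900)]
corrections = {1: "De", 2: "Caro"}
revised = apply_corrections(raw, corrections)
```

Keeping the raw transcript and the corrections separate also leaves the raw output available for comparison, which may be useful when the guidelines are updated on the basis of feedback.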
An example of application is the specialized monitoring of television, radio and web audio-video content.
It is therefore a precise and powerful application, capable of saving considerable time and resources, particularly where the volume of audio-video content to be processed is vast and varied.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
ITBA2010A000024A IT1400352B1 (en) | 2010-06-03 | 2010-06-03 | "SYSTEM FOR MONITORING, RESEARCH, REVIEW, INTEGRATION AND AUDIO / VIDEO / TEXT EDITING BASED ON TRANSLATION OF SPEAKING CONTENT IN MULTIMEDIA SEQUENCES BY SPEECH RECOGNITION AND SPEECH TO TEXT SYSTEMS". |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
ITBA2010A000024A IT1400352B1 (en) | 2010-06-03 | 2010-06-03 | "SYSTEM FOR MONITORING, RESEARCH, REVIEW, INTEGRATION AND AUDIO / VIDEO / TEXT EDITING BASED ON TRANSLATION OF SPEAKING CONTENT IN MULTIMEDIA SEQUENCES BY SPEECH RECOGNITION AND SPEECH TO TEXT SYSTEMS". |
Publications (2)
Publication Number | Publication Date |
---|---|
ITBA20100024A1 (en) | 2011-12-04 |
IT1400352B1 (en) | 2013-05-31 |
Family
ID=43242889
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
ITBA2010A000024A IT1400352B1 (en) | 2010-06-03 | 2010-06-03 | "SYSTEM FOR MONITORING, RESEARCH, REVIEW, INTEGRATION AND AUDIO / VIDEO / TEXT EDITING BASED ON TRANSLATION OF SPEAKING CONTENT IN MULTIMEDIA SEQUENCES BY SPEECH RECOGNITION AND SPEECH TO TEXT SYSTEMS". |
Country Status (1)
Country | Link |
---|---|
IT (1) | IT1400352B1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080133763A1 (en) * | 2006-11-30 | 2008-06-05 | Bryan Clark | Method and system for mastering music played among a plurality of users |
GB2451938A (en) * | 2007-08-07 | 2009-02-18 | Aurix Ltd | Methods and apparatus for searching of spoken audio data |
- 2010-06-03: IT application ITBA2010A000024A, patent IT1400352B1, active
Non-Patent Citations (2)
Title |
---|
ANONYMUS: "Hebrew Speech Recognition", IBM RESEARCH, 27 November 2004 (2004-11-27), pages 1 - 2, XP002617155, Retrieved from the Internet <URL:http://web.archive.org/web/20041127192908/http://www.research.ibm.com/haifa/projects/multimedia/audio_video/speech.html> [retrieved on 20110118] * |
BASU S ET AL: "Audio-visual large vocabulary continuous speech recognition in the broadcast domain", MULTIMEDIA SIGNAL PROCESSING, 1999 IEEE 3RD WORKSHOP ON COPENHAGEN, DENMARK 13-15 SEPT. 1999, PISCATAWAY, NJ, USA,IEEE, US, 13 September 1999 (1999-09-13), pages 475 - 481, XP010351739, ISBN: 978-0-7803-5610-8 * |
Also Published As
Publication number | Publication date |
---|---|
IT1400352B1 (en) | 2013-05-31 |