ITBA20100024A1

ITBA20100024A1 - "SYSTEM FOR MONITORING, RESEARCH, REVIEW, INTEGRATION AND AUDIO / VIDEO / TEXT EDITING BASED ON TRANSLATION OF SPEAKING CONTENT IN MULTIMEDIA SEQUENCES BY SPEECH RECOGNITION AND SPEECH TO TEXT SYSTEMS".

Info

Publication number: ITBA20100024A1
Application number: IT000024A
Authority: IT
Inventors: Enrico Giannotti
Original assignee: Cedat 85 S R L
Priority date: 2010-06-03
Filing date: 2010-06-03
Publication date: 2011-12-04
Also published as: IT1400352B1

Description

Description of the Patent for Industrial Invention entitled:

“System for monitoring, search, review, integration and audio/video/text editing based on the transcription of speech contained in multimedia sequences using speech recognition and speech-to-text indexing systems”

The present invention belongs to the field of search systems implemented by computer means and, more particularly, to methods for monitoring, searching and examining the content of multimedia sequences.

The invention concerns a structured set of advanced technologies, such as automatic speech transcription, text indexing methodologies and search engines, together with web technologies and services for accessing and managing the search for information within audio-video content from various sources such as TV, radio, web, and digital or digitized archives. It is aimed at making multimedia content searchable and integrable with the rest of the content on the web.

The main known technique used today for searching large quantities of multimedia content is based on categorizing that content through comments associated with the files. The association can be made either by qualified personnel who perform the categorization, or through comments entered directly by users who contribute content to the system, as happens, for example, with YouTube®.

The resulting workload is therefore proportional to the number of videos reviewed.

Furthermore, the search results produced by the current technique depend on the quality and descriptive accuracy of the content entered to categorize the files. The purpose of the present invention is to provide a highly effective system for searching within audio-video documents, capable of archiving large volumes of recordings while maintaining an accurate index of their contents.

A further purpose is to provide a system suitable for generating structured indexes, so that the document most pertinent to the topic chosen by the requester can be found quickly and precisely.

For these purposes, the present invention provides for the integration of both an automatic speech recognizer (ASR) based on an SILVCSR (Speaker Independent Large-Vocabulary Continuous Speech Recognition) engine and a latest-generation indexer and search engine, allowing the system to search directly on the “speech” contained in the documents.

These and further advantages are achieved by the system for monitoring, search, review, integration and audio/video/text editing based on the transcription of speech contained in multimedia sequences using speech recognition and speech-to-text indexing systems, according to the present invention, described with the help of the attached drawing sheet, which illustrates the following figure:

fig. 1 a block diagram of the system.

As shown in figure 1, the system is structured so as to receive multimedia documents 1 as input, produce a transcription 2 of the speech in the audio-video content, containing references to the exact millisecond at which each individual word is spoken, and submit everything to the indexing process 4, 5 and 6. The web interface 7 then provides the end user with the means for simple and immediate consultation, as well as a set of additional resources.
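As an illustration of how a transcription carrying per-word millisecond references can feed a search index, the following is a minimal sketch and not the patented implementation. The names `TimedWord`, `build_index` and `search`, and the sample data, are hypothetical and do not appear in the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class TimedWord(NamedTuple):
    text: str      # word as produced by the recognizer (assumed input format)
    start_ms: int  # offset from the start of the recording, in milliseconds


def build_index(transcripts: dict[str, list[TimedWord]]) -> dict[str, list[tuple[str, int]]]:
    """Map each lowercased word to the (document id, start_ms) pairs where it is spoken."""
    index = defaultdict(list)
    for doc_id, words in transcripts.items():
        for word in words:
            index[word.text.lower()].append((doc_id, word.start_ms))
    return dict(index)


def search(index: dict[str, list[tuple[str, int]]], term: str) -> list[tuple[str, int]]:
    """Return every (document id, start_ms) position at which the term is spoken."""
    return index.get(term.lower(), [])


# Hypothetical sample transcripts for two recordings.
transcripts = {
    "news-01": [TimedWord("Budget", 1200), TimedWord("vote", 1750), TimedWord("today", 2300)],
    "radio-07": [TimedWord("the", 0), TimedWord("budget", 480)],
}
index = build_index(transcripts)
hits = search(index, "budget")
```

Each hit identifies both the recording and the playback offset, which is what lets a search result point at the moment a word is spoken rather than at the file as a whole.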

In addition, an automatic feedback system 8 and 9, controlled by the user, is provided, which allows dynamic and continuous fine-tuning of the service offered and of the structure of the system itself.

Blocks 3a and 3b highlight the need to store data, of varying origin and encoding, in a uniform format, as required for the efficiency of the system, and to extract still images from any video file for precise references to significant sentences and/or relevant segments of the original audio file.
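The patent does not say how sentences or segments are delimited, nor which frame is chosen as the still image. The sketch below assumes, purely for illustration, that a pause longer than a threshold starts a new segment and that the still image is taken at the first word of each segment. `split_segments`, `thumbnail_times` and `max_gap_ms` are hypothetical names.

```python
from typing import NamedTuple


class TimedWord(NamedTuple):
    text: str
    start_ms: int  # when the word starts, in milliseconds
    end_ms: int    # when the word ends, in milliseconds


def split_segments(words: list[TimedWord], max_gap_ms: int = 700) -> list[list[TimedWord]]:
    """Group consecutive words; a silence longer than max_gap_ms starts a new segment."""
    segments: list[list[TimedWord]] = []
    for word in words:
        if segments and word.start_ms - segments[-1][-1].end_ms <= max_gap_ms:
            segments[-1].append(word)
        else:
            segments.append([word])
    return segments


def thumbnail_times(segments: list[list[TimedWord]]) -> list[int]:
    """Timestamp (ms) at which a still frame could be grabbed for each segment."""
    return [segment[0].start_ms for segment in segments]


# Hypothetical timed transcript with a pause of about one second in the middle.
words = [
    TimedWord("good", 0, 300),
    TimedWord("evening", 350, 900),
    TimedWord("first", 2000, 2400),
    TimedWord("story", 2450, 2900),
]
segments = split_segments(words)
frames = thumbnail_times(segments)
```

The resulting timestamps could then be handed to whatever video tool performs the actual frame extraction, which is outside the scope of this sketch.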

Block 4, on the other hand, highlights the possibility of intervention by a human operator to correct transcription errors, using a dedicated software tool and guidelines defined according to efficiency criteria and continuously updated on the basis of feedback.

As claimed, the system that is the object of the invention has the following basic characteristics:

- intake 2 of the material by the system and its routing to the automatic transcription system and, optionally and subsequently, to a revision system that makes use of specialized personnel 4;

- production and storage of the re-encoded multimedia files 3a, or of thumbnails 3b (in the case of video only), extracted on the basis of the subdivision into sentences/segments obtained from the automatic transcription and from the possible revision 4;

- processing and indexing of the text for the search engine and making it available for consultation 6;

- finally, a web interface handles the ways in which the data are made available to the end user, as well as the additional services 7.
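The order of the stages listed above (automatic transcription, optional human revision, indexing) can be sketched as follows. This is an assumption-laden illustration: `process`, `fake_transcribe` and `fake_review` are hypothetical stand-ins, and the stub recognizer simply returns a fixed string containing a deliberate error.

```python
from typing import Callable, Optional


def process(
    doc_id: str,
    audio: bytes,
    transcribe: Callable[[bytes], str],
    index: dict[str, list[str]],
    review: Optional[Callable[[str], str]] = None,
) -> str:
    """Run one document through transcription, optional revision, then indexing."""
    text = transcribe(audio)        # automatic transcription
    if review is not None:
        text = review(text)         # optional correction by an operator
    index[doc_id] = text.lower().split()  # naive tokenization, for illustration only
    return text


def fake_transcribe(audio: bytes) -> str:
    # Stand-in for a real recognizer; the misspelling is intentional.
    return "the minster spoke in Bari"


def fake_review(text: str) -> str:
    # Stand-in for the operator's correction tool.
    return text.replace("minster", "minister")


index: dict[str, list[str]] = {}
raw = process("clip-1", b"", fake_transcribe, index)
fixed = process("clip-2", b"", fake_transcribe, index, review=fake_review)
```

Making the revision step an optional callable mirrors the text, where human revision happens only "optionally and subsequently" to the automatic transcription.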

The operator-assisted correction and transcription service, which uses specialized correction software to integrate and complete the raw automatic transcriptions, helps to produce perfect transcriptions, that is, to verify the correct transcription of proper names of people, places and organizations, and to handle any publication errors 7.

An example of an application is the specialized monitoring of television, radio and web audio-video content.
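For the monitoring use case, claim 3 describes an alert sent when a keyword of interest is spoken by a broadcaster. A minimal sketch of the matching step, under the assumption that each user registers a set of lowercase keywords, might look like this. `Alert`, `find_alerts` and the sample addresses are hypothetical, and actual delivery by e-mail or SMS is left out.

```python
from typing import NamedTuple


class Alert(NamedTuple):
    user: str      # recipient of the alert
    keyword: str   # keyword that matched
    source: str    # broadcaster or channel being monitored
    start_ms: int  # when the keyword is spoken


def find_alerts(
    watchlists: dict[str, set[str]],
    source: str,
    timed_words: list[tuple[str, int]],
) -> list[Alert]:
    """Return one alert per (user, spoken keyword) match in a new transcript."""
    alerts = []
    for word, start_ms in timed_words:
        key = word.lower()
        for user, keywords in watchlists.items():
            if key in keywords:
                alerts.append(Alert(user, key, source, start_ms))
    return alerts


# Hypothetical watch lists and a freshly transcribed fragment.
watchlists = {
    "alice@example.com": {"elections"},
    "bob@example.com": {"budget", "elections"},
}
timed_words = [("The", 0), ("Budget", 400), ("and", 900), ("elections", 1100)]
alerts = find_alerts(watchlists, "radio-1", timed_words)
```

Because every alert carries the millisecond offset, the notification can link straight to the relevant point of the clip.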

It is therefore a precise and powerful application, capable of saving time and resources, especially where the volume of audio-video content to be processed is vast and complex.

Claims

1. “System for monitoring, search, review, integration and audio/video/text editing based on the transcription of speech contained in multimedia sequences using speech recognition and speech-to-text indexing systems”, characterized by the integration of both an automatic speech recognizer (ASR) based on an SILVCSR (Speaker Independent Large-Vocabulary Continuous Speech Recognition) engine and a latest-generation indexer and search engine, allowing the system to search directly on the “speech” contained in the documents.

2. System for the acquisition of audio-video material from different sources, characterized by a sequence of instructions stored in an optical magnetic disk-type storage medium consisting of:
- intake 2 of the material by the system and its routing to the automatic transcription system and, optionally and subsequently, to a revision system that makes use of specialized personnel 4;
- production and storage of the re-encoded multimedia files 3a, or of thumbnails 3b (in the case of video only), extracted on the basis of the subdivision into sentences/segments obtained from the automatic transcription and from the possible revision 4;
- processing and indexing of the text for the search engine and making it available for consultation 6;
- finally, a web interface that handles both the ways in which the data are made available to the end user and the additional services 7.

3. System according to claims 1 and 2, characterized in that whenever a keyword or a topic of interest is covered by a television or radio broadcaster, an alert is sent automatically and promptly via e-mail or SMS to the computer or telephone of the service user.

4. System according to the preceding claims, characterized in that the video clip containing the part of the film, news report, interview or dossier broadcast by the monitored television, radio or web broadcasters, corresponding to the keyword or topic of interest indicated, is made immediately accessible to the user of the service.

5. System according to the preceding claims, characterized in that the video clip is published on a web interface, thus allowing it to be downloaded.

6. System according to the preceding claims, characterized in that the clip is supplied complete with its full transcription in an interactive and printable version, in which pointing and clicking on the text track redirects immediately to the corresponding point.

7. System according to the preceding claims, characterized in that searching the archive is permitted, assisted by the system itself and carried out by means of a latest-generation search engine; the search can be performed according to traditional criteria or by cross-referencing information such as date, broadcaster and topic.

8. System according to the preceding claims, characterized in that each extracted clip is classified on the basis of predefined categories that can be configured according to criteria established by the user of the service, and in that the system itself, through the search engine and working on the indexed texts, expands the category index, thus generating an automatic classification of the resources examined. Moreover, thanks to the ability of the search engine to give structure to originally unstructured content, the system can use filters created ad hoc according to the customer's needs, effectively narrowing the search to the relevant audio/video content.

9. System according to the preceding claims, characterized in that statistical data are generated on the basis of the occurrences of words and contents; the data thus structured can be consulted in the form most suitable for the search to be conducted.
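The occurrence statistics of claim 9, combined with the date and broadcaster criteria of claim 7, might be computed along the following lines. This is a minimal sketch under stated assumptions: `Clip`, `occurrences` and the sample records are hypothetical, and whitespace tokenization stands in for whatever text processing the real indexer applies.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple, Optional


class Clip(NamedTuple):
    broadcaster: str
    aired: date
    transcript: str


def occurrences(
    clips: list[Clip],
    term: str,
    broadcaster: Optional[str] = None,
    since: Optional[date] = None,
) -> Counter:
    """Count how often a term is spoken, per broadcaster, with optional filters."""
    counts: Counter = Counter()
    for clip in clips:
        if broadcaster is not None and clip.broadcaster != broadcaster:
            continue
        if since is not None and clip.aired < since:
            continue
        n = clip.transcript.lower().split().count(term.lower())
        if n:
            counts[clip.broadcaster] += n
    return counts


# Hypothetical archive entries.
clips = [
    Clip("TV-A", date(2010, 5, 1), "budget vote and budget debate"),
    Clip("Radio-B", date(2010, 6, 2), "the budget passed"),
    Clip("TV-A", date(2010, 6, 3), "weather report"),
]
all_time = occurrences(clips, "budget")
recent = occurrences(clips, "budget", since=date(2010, 6, 1))
```

Returning a `Counter` keyed by broadcaster keeps the result easy to sort, sum or chart, in line with the claim's statement that the data can be consulted in whatever form suits the search.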