US20110112832A1 - Auto-transcription by cross-referencing synchronized media resources
- Publication number: US20110112832A1 (application US 12/894,557)
- Authority: United States (US)
- Legal status: Abandoned
Classifications
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/36—Monitoring, i.e. supervising the progress of recording or reproducing
Definitions
- the disclosure generally relates to the field of processing media archive resources, and more specifically, to auto-transcription enhancements based on cross-referencing of synchronized media resources.
- the production of audio and video has resulted in many different formats and standards in which to store and/or transmit the audio and video media.
- the media industry has further developed to encompass other unique types of media production such as teleconferencing, web conferencing, video conferencing, podcasts, other proprietary forms of innovative collaborative conferencing, various forms of collaborative learning systems, and the like. When recorded, for later playback or for archival purposes, all of these forms of media are digitized and archived on some form of storage medium.
- An operation performed on media resources is speech recognition for processing audio information to generate information that can be represented in textual format.
- speech recognition engines transcribe spoken utterances from audio files or audio streams into readable text.
- Optical character recognition can fail if the image being processed is blurry, and one character can be mistaken for another (or for multiple characters), for example, the letter ‘l’ being mistaken for the number ‘1.’
- Another operation that is performed with images/videos is image recognition. For example, image recognition based on videos can be performed to identify information available in the video.
- FIG. 1 is an embodiment of the components of the system illustrating their interactions.
- FIG. 2 is a block diagram that illustrates one embodiment of a universal media aggregator system and related programming modules and services including the universal media convertor and the universal media format.
- FIG. 3 is a diagram that illustrates one embodiment of modules within the universal media convertor and the interactions between these modules.
- FIG. 4 is a flowchart illustrating an embodiment of operations of a media archive extractor for a universal media convertor module.
- FIG. 5 is a flowchart illustrating an embodiment for processing of the universal media convertor interpreter.
- FIG. 6 is a flowchart illustrating an embodiment of operations of a universal media convertor module interpreter and associated process interactions.
- FIG. 7 is a flowchart illustrating one embodiment for creation of universal media format and related data storage format.
- FIG. 8 is a diagram that illustrates one embodiment for access of universal media format resources via an application programming interface (API).
- FIG. 9 illustrates one embodiment of components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller).
- a disclosed system and framework processes the contents of media archives. Detailed descriptions are provided for the new and inventive ways to detect and make use of the media archive contents in addition to the new and useful ways in which the representation of the media resources is constructed and presented for ease of programmatic interfacing.
- the processing of media archive resources is unified. For example, a system accepts input from multiple media archive sources and then detects and interprets the contents from the media archive. The resulting processed media resources are represented in a flexible and extensible data format. The process of detecting and interpreting the media resources preserves all of the synchronous aspects as captured in the original media archive. All of the processed media resources that have been converted to the new flexible and extensible data format are then aggregated into an encompassing unifying system which provides for ease of access to the converted media resources via common application programming interfaces.
- the systems and methods disclosed are best understood by referring to the drawings and flowcharts that are included in the figures to accompany the following textual description.
- a media resource in a given format is converted to another format, for example, an audio resource is converted to text format via transcription.
- the conversion may be performed via manual transcription or auto-transcription. If the input provided in a certain format does not adhere to strict rules or format of specification, the conversion may not succeed in all cases. For example, transcription of an audio input may not succeed due to the accent of the speaker or if the speaker talks fast in certain portions of the input. Typically several portions of the audio may be transcribed successfully but some words or phrases (also referred to as terms) may fail.
- the automatic transcription system may provide a confidence score such that a low confidence in the transcribed result indicates potential failure in transcribing.
- the output term may be determined to be meaningless, for example, if it does not appear in a dictionary used to validate the words.
- Other media resources that are synchronized with the media resource are used for determining the terms that are incorrectly transcribed (or converted). For example, if a POWERPOINT presentation is synchronized with the audio file, the portions of the presentation are analyzed to identify the correct transcription. More specifically, a portion of the slide presentation that is synchronized with the portion of the audio file being transcribed is analyzed. For example, a particular slide that was being described by a speaker in the audio resource is analyzed to determine possible terms corresponding to the terms in the audio input.
- the inaccurate transcription of the term performed by the auto-transcription process can be used to aid the search of the correct term.
- the resemblance between the incorrect transcription of the term and the potential candidates can be analyzed, based on distance between the terms (measured by number of characters that mismatch between the terms) or based on phonetic similarities between the terms.
- Matching between terms can be performed by determining a match score for each candidate term against the error term obtained from the auto-transcription output and comparing those scores. The ability to reduce the search for the terms to a small portion of the other synchronized media resources increases the chances of identifying the term accurately.
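- As an illustration only, the following minimal Python sketch (the function name, candidate list, and similarity threshold are assumptions, not part of the disclosure) scores candidate terms from the synchronized portion of another media resource against a low-confidence transcription output using character-level similarity; a phonetic metric could be substituted or combined.

```python
from difflib import SequenceMatcher

def best_candidate(error_term, candidate_terms, threshold=0.6):
    """Pick the candidate term from the synchronized resource portion that most
    resembles a low-confidence auto-transcription output.

    error_term      -- term produced by auto-transcription with a low confidence score
    candidate_terms -- terms extracted from the synchronized portion (e.g., one slide)
    threshold       -- minimum similarity (0..1) required to accept a replacement
    """
    best, best_score = None, 0.0
    for candidate in candidate_terms:
        # Character-level similarity; a phonetic metric could be substituted or combined.
        score = SequenceMatcher(None, error_term.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return (best, best_score) if best_score >= threshold else (None, best_score)

# Example: the transcription produced "potomak" with low confidence, and the
# synchronized slide contains the candidate terms below.
print(best_candidate("potomak", ["Potomac", "router", "broadband"]))
```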
- Identifying terms can be particularly difficult if the terms refer to proper names since the proper names may not be available in a dictionary.
- other synchronized media resources are likely to contain the particular name.
- a speaker may refer to a particular software release by a code name.
- the code name may be based on an obscure term or a proper name not found in typical dictionaries.
- the term may be based on an acronym that does not accurately correspond to the way the term is pronounced.
- the term “SCSI” is pronounced as “scuzzy,” which may be difficult for an auto-transcription system to transcribe. Restricting the search to the synchronized portions of other media resources helps narrow down the correct term faster and with greater accuracy.
- Optical character recognition can be performed on an image associated with a media resource, for example, an image belonging to a video associated with the media archive.
- Certain terms obtained from OCR may be incorrect. For example, the word “flower” may be identified as “f1ower,” with the letter ‘l’ misread as the digit ‘1.’ Some of these errors are difficult to fix if they refer to proper names that can only be recognized based on the context. These errors can be detected by comparing the terms generated by OCR with a dictionary of terms. The incorrect terms can be corrected by identifying corresponding terms in another media resource that is synchronized with the media resource being converted via OCR. A subset of the other media resource that is synchronized with the portion of the media resource being converted is analyzed to narrow down the search of the terms.
- Another example of conversion is object recognition in images or facial recognition.
- information associated with various participants in a media resource is analyzed to determine a face visible in an image.
- the candidates for objects in the synchronized portion of other media resources are analyzed to determine the correct object being identified.
- other identified objects are analyzed to eliminate objects that have been recognized as being different from the object being currently identified.
- Potential candidate objects are compared with characteristics of the portion of image being identified. For example, even though the information available in the image may not be sufficient for identifying the object accurately, the information may be sufficient to eliminate potential candidates thereby reducing the search space.
- Matching of the features of the objects and the image is performed to determine the best match from the candidate objects that were not eliminated. If the best match is within a threshold comparison distance of the image features, it is identified as the object.
- a confidence score is provided along with the object to indicate a measure of accuracy with which the object is recognized.
- an audio or text (e.g., text obtained by transcribing the audio or a script of a chat session) is analyzed to determine the person in the image. For example, if an image of a person is present in the video, there is a reasonable likelihood that the person is mentioned in the audio associated with that particular portion of the image or the person is one of the participants associated with the media resources, for example, the person can be a contributor in the media resource or the person can be a topic of discussion.
- Embodiments of a disclosed system detect errors in a media resource by comparing information found in the media resource with expected information. For example, a slide number for each slide may be expected to increment by one for each slide. If two consecutive slides are found in which the slide numbers have a gap, slides may be missing between the consecutive slides. Similarly, a title may be expected at the top of each slide. If a slide is found with a missing title, the slide is considered a candidate for correction.
- Information available in the media archive is used to determine corrections to errors found in a media resource.
- Information available in media resources other than the media resource in which error occurs can be used to correct the error. Correlating portions of the various media resources allows focusing on specific portions of other media resources for identifying corrections to errors. For example, a slide presentation on web may be correlated with a transcript of an online session. Portions of the transcript are correlated with individual slides of the presentation. To identify errors that occur in a particular slide, a portion of the transcript correlated with the particular slide is analyzed to identify corrections of the error.
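- The checks described above might be sketched as follows (the slide dictionary fields and function name are hypothetical, for illustration only): slides with missing titles or gaps in numbering are flagged as candidates for correction.

```python
def find_slide_errors(slides):
    """slides: list of dicts such as {"number": 3, "title": "Roadmap"} in
    presentation order.  Returns (index, description) tuples flagging
    deviations from the expected information: consecutive numbering and a
    non-empty title on every slide."""
    errors = []
    for i, slide in enumerate(slides):
        if not slide.get("title"):
            errors.append((i, "missing title"))
        if i > 0 and slide["number"] != slides[i - 1]["number"] + 1:
            errors.append((i, "gap in slide numbering"))
    return errors

slides = [{"number": 1, "title": "Intro"},
          {"number": 2, "title": ""},
          {"number": 4, "title": "Results"}]
print(find_slide_errors(slides))  # [(1, 'missing title'), (2, 'gap in slide numbering')]
```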
- a system and framework for processing media archive resources comprises the following components: the universal media converter (UMC), the universal media format (UMF), and the universal media aggregator (UMA).
- the UMC accepts media archives as input, detects and interprets the contents of the archive, and processes the contents of the media archive.
- the UMC performs a “self healing” function when anomalies in the original media archive are detected.
- the UMC also produces a representation of the media archive resources in the UMF. The synchronization between all media archive resources is preserved and/or enhanced during the UMC processing and subsequent creation of the UMF.
- a media archive can comprise a collection of digitized components from a recorded media event that includes media/multimedia presentation, teleconference, recorded video conference, recorded video via camcorders (e.g., FLIP video), podcast, other forms of broadcast media (e.g., television (TV), closed circuit television (CCTV)), etc.
- the collection of digitized components may be in compressed or uncompressed form and may be in a standardized format or a proprietary format.
- the media archive may be digitally stored in a single file, or may be represented in a data stream transmitted over a network, or the individual components may be loosely coupled (e.g., interact with each other over network) and reside in different storage locations and then aggregated into a single cohesive media archive at processing time.
- Examples of digitized components may be a combination of any or all, but not limited to audio, video, MICROSOFT POWERPOINT presentation, screen sharing, chat window sharing, webcam, question and answer (Q&A) window, textual transcript, transcripted words with timings for the transcripted spoken words, background music (e.g., alternate audio tracks), etc.
- the media archive resources are digitized components that are part of the media archive.
- the media resources may be associated with data captured via a webcam, audio recording devices, video recording devices, etc. Examples of media resources include, screen sharing resources, POWERPOINT slides, transcripts, questions and answers, user notes, list of attendees, etc.
- the media resource may reside in a media file, a data stream or other means of storing/archiving computer information or data.
- the information stored may be associated with one or more events, for example, conferences, meetings, conversations, screen sharing sessions, online chat sessions, interactions on online forums and the like.
- the UMF is a representation of content from a media archive as well as any extended or related resources.
- the format is flexible as new resources can be added to the contents of the UMF and existing resources can be modified.
- the UMF is extendable and supports proprietary extensions. The UMF facilitates the ease of access to the components from a media archive.
- the UMA is the encompassing system that supports and controls processing of the requests for media archive extractions, media archive conversions, UMF generation, playback of recorded conferences/presentations/meetings, and the like.
- the UMF beneficially interfaces with a proliferation of proprietary media archive formats and serves as a simplified integrating layer to the complexities of the various media archive formats.
- the UMC beneficially determines media archive formats, autonomously corrects errors in media archive resources, and controls the scheduling of the processing steps required to extract resources from the media archive and synchronously represent these interrelated resources when creating the UMF.
- the UMA provides a common means of interacting with a selected UMF. Enhanced synchronized search capability and synchronous playback of all detected and newly added media resources is possible since UMC preserves and/or enhances the synchronization of the media resources.
- FIG. 1 illustrates the interactions of the components of the system used to process media archives.
- the systems components are the universal media converter (‘UMC’), the universal media format (‘UMF’), and the universal media aggregator (‘UMA’) introduced previously.
- the UMC accepts (or receives) input from different media sources 101 - 104 .
- the depicted, and other new and emerging, types of media sources are possible because of a convergence of available technologies: voice, audio, video, data, and various other forms of internet-related collaborative technologies such as email, chat, and the like.
- telepresence media sources 101 include CISCO TELEPRESENCE, HP HALO, and telepresence product offerings from TANDBERG.
- the UMC 105 is adaptable to support new forms of other media sources 104 that are available in the industry or can emerge in the future.
- the UMC 105 detects and interprets the contents of the various media sources 101 , 102 , 103 , and 104 .
- the resulting output from the UMC 105 interrogation, detection, and interpretation of the media sources 101 , 102 , 103 , and 104 is a unifying media resource, namely the UMF 106 .
- the UMF 106 is a representation of the contents from a media source 101 - 104 and is also both flexible and extensible.
- the UMF is flexible in that selected contents from the original media source may be included or excluded in the resulting UMF 106 and selected content from the original media resource may be transformed to a different compatible format in the UMF.
- the UMF 106 is extensible in that additional content may be added to the original UMF and company proprietary extensions may be added in this manner.
- the flexibility of the UMF 106 permits the storing of other forms of data in addition to media resource related content.
- the functions of both the UMC 105 and the UMF 106 are encapsulated in the unifying system and framework UMA 107 .
- the UMA 107 architecture supports processing requests for UMC 105 media archive extractions, media archive conversions, UMF 106 generation, playback of UMF 106 recorded conferences/presentations/meetings, and so on.
- the UMA 107 provides other related services and functions to support the processing and playback of media archives. Examples of UMA 107 services range from search related services to reporting services, as well as other services required in software architected solutions such as the UMA 107 that are known to people skilled in the art. Additional details of the UMC 105 , the UMF 106 , and the UMA 107 are provided in the detailed descriptions that follow.
- each of the components UMC 105 , UMF 106 , and UMA 107 can run as a separate programming module on a separate distributed computing device.
- the separate computing devices can interact with each other over computer networks.
- the components UMC 105 , UMF 106 , and UMA 107 can be executed as separate processes executed on one or more computing devices.
- the various components UMC 105 , UMF 106 , and UMA 107 can be executed using a cloud computing environment.
- one or more components UMC 105 , UMF 106 , and UMA 107 can be executed on a cloud computer whereas the remaining components are executed on a local computing device.
- the components UMC 105 , UMF 106 , and UMA 107 can be invoked via the internet as a web service in a service oriented architecture (SOA) or as a software as a service (SaaS) model.
- FIG. 2 depicts the modules that, when combined, form the unifying system and framework to process media archives, namely the UMA 107 . It is understood that other components may be present in the configuration. Note that the UMC 105 is depicted as residing in the UMA 107 services framework as UMC extraction/conversion services 218 .
- the UMF 106 is depicted in the UMA 107 services framework as UMF universal media format 219 .
- the portal presentation services 201 of the UMA 107 services framework contains all of the software and related methods and services to playback a recorded media archive, as shown in the media archive playback viewer 202 .
- the media archive playback viewer 202 supports both the playback of UMF 106 , 219 as well as the playback of other recorded media formats.
- the UMA 107 also comprises middle tier server side software services 203 .
- the viewer API 204 provides the presentation services 201 access to server side services 203 .
- Viewer components 205 are used in the rendering of graphical user interfaces used by the software in the presentation services layer 201 .
- Servlets 206 and related session management services 207 are also utilized by the presentation layer 201 .
- the UMA framework 107 also provides access to external users via a web services 212 interface.
- a list of exemplary, but not totally inclusive, web services are depicted in the diagram as portal data access 208 , blogs, comments, and question and answer (Q&A) 209 , image manipulation 210 , and custom presentation services, e.g., MICROSOFT POWERPOINT (PPT) services 211 .
- the UMA 107 contains a messaging services 213 layer that provides the infrastructure for inter-process communications and event notification messaging.
- Transcription services 214 provides the processing and services to provide the “written” transcripts for all of the spoken words that occur during a recorded presentation, conference, or collaborative meeting, etc.
- Production service 215 contains all of the software and methods to “produce” all aspects of a video presentation and/or video conference.
- Speech services 217 is the software and methods used to detect speech, speech patterns, speech characteristics, etc. that occur during a video conference, web conference, or collaborative meeting, etc.
- the UMC extraction/conversion service 218 , UMF universal media format 219 , and the UMF content API 220 will be each subsequently covered in separate detail.
- the diagram in FIG. 3 illustrates one embodiment of a processing flow of the major software components of the UMC media extractor/converter 218 .
- the processing starts with media archive input 300 .
- see media sources 101 , 102 , 103 , and 104 for representative examples of media archive input.
- the media archive input can be in the form of web input, desktop application input, file input, data stream input, and similar well known input forms.
- the request bus 302 is a queue that contains the media archive input requests.
- the job listener 304 monitors items on the request bus 302 queue and pulls items from the queue that are jobs to process media archives and passes these jobs to the delegator 306 .
- the delegator 306 is configured to determine if the input media archive is of a known data type 308 and then to delegate accordingly to either the data inquisitor 310 for unknown resource types or to optimized extractor and dependency builder 312 for known resource types.
- Data correction module 314 provides fix-ups for detected deficiencies in presentation slides and provides for synchronous corrections to prevent jitter during the playback phase of a recorded presentation. Further details on the UMC extractor 312 and the associated automated data correction 314 are provided in FIG. 4 and accompanying description.
- the function of the data inquisitor 310 is to interrogate the contents of the media archive and determine if the UMC 105 is configured to support the types of media resources that are contained in the media archive 300 . If the data inquisitor 310 detects supported media resources (e.g., moving picture expert group's (MPEG) MPEG-1 Audio Layer 3 (MP3), MPEG-4 (MP4), WINDOWS media video (WMV), audio video interleave (AVI), etc.), it creates the corresponding objects to handle the extraction for use by the UMC extractor 312 , updates the delegator 306 with information for the known media resource type, and then passes the request 300 to the UMC extractor 312 for processing. Errors are logged and the processing of the request is terminated when the data inquisitor determines that the UMC extractor 312 is unable to process the contents of the media archive 300 .
- the UMC extractor 312 is configured as described herein.
- the UMC extractor 312 creates an index for each of the media resources that are contained in the media archive.
- the index contains information identifying the type of resource and the start and end locations of the given resource within the contents of the media archive 300 .
- the UMC extractor 312 uses a new and inventive process to determine content in the media archive that is in an unknown (proprietary) format and to create identifiable media resources from this data for subsequent storage in the UMF 106 , 324 .
- This secondary processing phase of the UMC extractor 312 utilizes the data correction module 314 to “fix up” the data that is discovered in the supplemental extra data in a media archive.
- the interpreter 318 is configured to process decisions based on the information that is supplied from the data inquisitor 310 and the extractor 312 .
- the interpreter 318 monitors the queue of the data object bus 316 and pulls objects off the queue if it is determined that the interpreter can process the data in the object via one of the associated media resource data processors 322 .
- the interpreter 318 is configured to perform complex tasks that include scheduling, processing, and performing corrections of media resources, which are further described in FIGS. 5 and 6.
- when the interpreter 318 determines that all of the tasks to process the media archive request are completed, the interpreter 318 then queues the resulting output to the processor object bus 320 .
- the data collator and UMF creation process 324 monitors the queue of the process object bus 320 and retrieves the completed media archive requests.
- the completed data associated with the media archive request is then transformed into the UMF 106 via the methods contained in the data collator and UMF creation process 324 .
- once the UMF 106 is created, these resources are made available to the portal assets/resources 328 for use in applications running in the presentation services layer 201 .
- the created UMF 106 can also be transformed to other archive formats 330 or other requested file formats 332 via the UMF translator 326 . Further details on the UMF 106 creation and the UMF 106 storage description are described with respect to FIG. 7 and the accompanying text. Further details on accessing the contents of the UMF 106 are contained in FIG. 8 with accompanying textual description.
- Processing steps 400 through 406 illustrate how the UMC extractor 312 extracts well known media resource types from the media archive.
- Examples of well known media resources include, but are not limited to, the following formats: WAV, AVI, MP3, MP4, MPEG, PPT, etc.
- the extractor performs operations on the media archive raw data 400 .
- the process of extracting the well known media resources from the media archive 400 starts with reading a data block from the media archive 400 , and, if not at the end of the input data 404 , proceeds to the interrogation step 405 to determine if a well known media resource type is found in the data.
- the data block is examined by comparing the patterns in the data block to typical patterns in various formats of media resources.
- the comparison can be accomplished via any well known means (e.g., performing a Boolean AND, OR, or exclusive-OR operation on the data) or via other well known tools such as regular expression comparison tools (for example, REGEX).
- an index is built 406 for the found media resource comprising a data structure that identifies the position of the media resource within the media archive 400 .
- the index for each detected media resource includes both the starting and ending location of the media resource within the media archive 400 .
- This process of reading data blocks 402 and interrogation of the data block via an introspection process continues until all of the data in the media archive 400 is sequentially processed and the end of the archive check 404 evaluates to true. If the end of the archive data is reached, the supplemental, non-standard, data is extracted 408 from the media archive 400 .
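- A simplified Python sketch of this block-scanning loop follows; the magic-byte signatures and block size shown are illustrative assumptions, and a real extractor would recognize far more formats and handle signatures that span block boundaries.

```python
# Illustrative magic-byte signatures only; a real extractor recognizes many more
# formats and applies richer pattern tests (including regular expressions).
SIGNATURES = {
    b"RIFF": "WAV/AVI container",
    b"\xff\xfb": "MP3 frame",
    b"\xd0\xcf\x11\xe0": "legacy PPT/OLE document",
}

def build_resource_index(archive_bytes, block_size=4096):
    """Scan the archive block by block and record where known media resource
    types begin; a simplification of steps 402-406 (signatures that straddle a
    block boundary are ignored here)."""
    index = []
    for offset in range(0, len(archive_bytes), block_size):
        block = archive_bytes[offset:offset + block_size]
        for magic, rtype in SIGNATURES.items():
            pos = block.find(magic)
            if pos != -1:
                index.append({"type": rtype, "start": offset + pos, "end": None})
    index.sort(key=lambda e: e["start"])
    # Sketch assumption: each resource runs until the next recognized resource starts.
    for i, entry in enumerate(index):
        entry["end"] = index[i + 1]["start"] if i + 1 < len(index) else len(archive_bytes)
    return index
```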
- the process of finding meaningful media resource data from the supplemental data in a media archive requires multiple passes through the data, where each pass through the data adds new information based on the information collected in previous passes. It is not required to pass through the entire contents of the media archive 400 in steps 410 through step 414 . Searching through the supplemental data is optimized since all of the index locations in the media archive 400 identified in step 406 can be bypassed.
- the UMC optimized extractor code 312 searches for repeating patterns in the data.
- An example of a repeating pattern identified in the first pass is a pattern like x02F0 that would be repeating throughout this supplemental data block.
- the types of patterns searched in the first pass include clear text ASCII keywords that repeat throughout the supplemental data block, e.g., “Page”, “Slide”, “Event”, etc.
- a sequential scan of all of the supplemental data is performed in the first pass to identify patterns.
- a matrix is developed, where the matrix includes: the pattern identifier, number of occurrences, and the locations (e.g., binary offset from beginning of file) of the occurrences in the supplemental data.
- the second pass 412 through the supplemental data is searching for “regular” incrementing or decrementing patterns within a close proximity to the repeating patterns that were detected in the first pass 410 through the supplemental data.
- the identification of regular patterns is performed similar to the sequential scan mechanism described above for identifying patterns.
- the close proximity is a configurable parameter, e.g., 128 bytes, but in practice this value could be larger or smaller.
- regularly occurring patterns are associated with “human driven” events, e.g., they may correspond to MICROSOFT POWERPOINT slide numbers or the progression of slide flips that occurred within the recorded conference, meeting, or presentation.
- a video associated with a presentation can be analyzed to identify a slide flip by identifying a change in a portion of the image associated with the slide projection.
- the slide flip identified in the video is a human driven event that can be utilized as a regular pattern.
- an audio or a transcription of the audio associated with a presentation can be analyzed for regular events for example, the speaker speaking “next slide please.”
- the third pass 414 through the supplemental media archive data now searches for “non-regular” incrementing patterns in the locations in close proximity to the previously detected data patterns.
- Non-regular patterns do not occur on set intervals (the interval may be time code based, or based on the amount of data stored between pattern occurrences).
- the time code is a timing value that is detected in the media archive in close proximity to the media resource, and the associated detected regular pattern (if found).
- the time code timing value has the characteristic of always incrementing, in increments larger than one or two (e.g., millisecond values), and may be bounded by the total length of time of the detected audio file from the archive.
- Set interval indicates the human events type of interval, e.g., a slide flip or a slide number where the detected intervals are small integers.
- the integer values for these “set intervals” may be stored in, and detected in, a byte, integer, long, etc.
- the non-regular patterns that are identified are the patterns that have the property of proceeding in an incrementing manner. The recognition of non-regular patterns is performed using steps similar to those described for repeating or regular patterns with the exception of searching for the properties associated with the non-regular data patterns.
- Regular incrementing patterns are values that fall within a range, e.g., where the range is the number of pages in a POWERPOINT presentation. Further sanity checks on the possible meaning for a detected regular pattern can be applied, e.g., there are not likely to be thousands of slides within a one hour POWERPOINT presentation.
- the non-regular numbers may be time values in milliseconds granularity and therefore very large numbers outside of the range of the number of slides from the example POWERPOINT presentation. Therefore, using these types of inferences, human generated events (e.g., POWERPOINT slide flips) are distinguished from computer generated events such as recording time stamps in milliseconds.
- non-regular numbers may appear to be random in nature, but will notably be progressing ever larger.
- This type of detected non-regular, repeating, seemingly random, continually incrementing, patterns are determined to be timing values that are associated with the “human driven” events that were detected in the contextual pattern matching that occurred in the second pass 412 .
- These detected timing values from the third pass 414 can be further validated to ensure that the timing occurs within the time range of the audio track from the media resource that was detected in process step 406 .
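- The three passes over the supplemental data might be summarized with the following simplified sketch (it works on ASCII keywords and decoded decimal integers rather than the raw binary heuristics of an actual extractor, and the window and value bounds stand in for configurable parameters):

```python
import re
from collections import defaultdict

def pass_one(data):
    """Pass 1: find repeating clear-text keywords and record their locations."""
    occurrences = defaultdict(list)
    for m in re.finditer(rb"Page|Slide|Event", data):
        occurrences[m.group(0)].append(m.start())
    return {k: v for k, v in occurrences.items() if len(v) > 1}  # keep repeats only

def nearby_integers(data, location, window=128):
    """Decode ASCII integers found within `window` bytes of a keyword location."""
    return [int(m.group(0)) for m in re.finditer(rb"\d+", data[location:location + window])]

def pass_two(data, keyword_locations, max_value=500):
    """Pass 2: small 'regular' incrementing values near the keywords
    (e.g., slide numbers driven by human events such as slide flips)."""
    return [v for loc in keyword_locations
              for v in nearby_integers(data, loc) if v <= max_value]

def pass_three(data, keyword_locations, min_value=1000):
    """Pass 3: large values near the same locations (e.g., millisecond time
    codes), kept only while they progress ever larger, as timing values do."""
    kept, running_max = [], -1
    for loc in keyword_locations:
        for v in nearby_integers(data, loc):
            if v >= min_value and v > running_max:
                kept.append(v)
                running_max = v
    return kept

# Usage sketch: the keyword locations found in pass 1 feed passes 2 and 3.
# repeats = pass_one(data)
# locations = sorted(loc for locs in repeats.values() for loc in locs)
# slide_numbers, timings = pass_two(data, locations), pass_three(data, locations)
```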
- each pass has a smaller amount of data to analyze compared to previous passes.
- a media archive may contain a media resource representing the audio and video from an on-line (internet/intranet based or other form of public/private network based) meeting.
- assume the meeting has a number of attendees/participants/panelists/subject matter experts and that during the course of the meeting one presenter may “pass the ‘mic’” (e.g., a virtual microphone) between the speakers.
- the number of speakers for the virtual meeting has been identified and/or detected in the processing of the media archive.
- the first pass identifies the “pass the ‘mic’” type of event
- the second pass detects the integral speaker id's (e.g., the number within the range of speakers)
- the third pass detects the timings for the speaker change event transitions.
- a specific pass may identify information in one media resource of the media archive but not in another media resource of the media archive.
- file F 1 may represent a POWERPOINT presentation with “SLIDE 1 ,” “SLIDE 2 ,” annotations.
- File F 2 is an audio file that does not have the “SLIDE 1 ,” “SLIDE 2 ,” annotations but has timing information based on the third pass.
- the information in the two files can be correlated via the timing information detected in the third pass.
- the timings associated with the file F 1 are correlated with the file F 2 by advancing to the appropriate time interval of the file F 2 (the audio file).
- a matrix structure is built to hold the detected information from each pass. If a particular pass did not return any information, the matrix does not store any values for the corresponding positions. For example, if the second pass is performed and does not detect any regular pattern, the matrix structure does not store any values in the appropriate positions.
- the third pass may detect information and store the corresponding information but the second pass may not detect any information.
- the matrix may contain information detected in the first pass and the second pass or information detected in the first pass and the third pass, or information detected in all three passes.
- the first pass is always performed but the second and third pass may be optional.
- the second pass may be considered optional if the associated data is absent.
- the third pass may be relevant depending on the type of associated media resource/event that is detected/discovered in the first pass.
- the timing values associated with the media files may be estimated by using interpolation. For example, if the timing information available in the media resource is too far apart and the regular incrementing pattern occurs relatively more frequently, the timing information corresponding to the regular incrementing patterns can be estimated (by interpolation) and added to the metadata. In this situation the non-regular incrementing patterns may be detected anywhere in the file and estimated for the locations corresponding to the regular incrementing patterns.
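- The estimation by interpolation can be as simple as linear interpolation between the nearest known time codes, as in this sketch with hypothetical inputs:

```python
def interpolate_timing(event_position, known_timings):
    """Estimate a time code for an event found at `event_position` (e.g., a byte
    offset or slide index) from sparse (position, time_ms) pairs by linear
    interpolation between the nearest known points."""
    known_timings = sorted(known_timings)
    for (p1, t1), (p2, t2) in zip(known_timings, known_timings[1:]):
        if p1 <= event_position <= p2:
            return t1 + (t2 - t1) * (event_position - p1) / (p2 - p1)
    return None  # outside the known range; no estimate made

# Example: time codes are known at positions 0 and 100; estimate the event at 25.
print(interpolate_timing(25, [(0, 0), (100, 60000)]))  # 15000.0 (milliseconds)
```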
- the result is that a new media resource with associated events and timings has been discovered in the supplemental data within a media archive.
- the new resource types are identified and made known to the system and are used for future identification 405 of data.
- These newly discovered media resources are able to be synchronously connected to all of the other media resources (audio, video, transcripts, etc.) in the media archive 400 .
- the newly discovered media is a POWERPOINT presentation
- all of the slide flips in the exact order in which they were presented, can now be synchronized with all of the other media resources (audio, video, transcripts) during the playback of the media event (conference, presentation, meeting).
- the auto transcription process keeps the timings for each detected spoken utterance (word) and there is a correlation between the timing of the spoken word (that was auto transcribed) and the associated timing of the spoken word within the audio file.
- the timings are at a fragment granularity (e.g., a sentence) instead of for each spoken word.
- the synchronization information is stored in the UMF.
- this synchronization information is stored in the UMF in the events 726 block, e.g., the case where the event is a synchronization timing event. Details of how the data is represented in the UMF are provided below.
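- Purely for illustration (the actual UMF events-block layout is defined by the UMF itself, not by this sketch), synchronization timing events might be modeled as records that can be queried by playback window:

```python
# Illustrative only; the actual UMF events-block layout is defined by the UMF itself.
sync_events = [
    {"type": "slide_flip", "resource": "slides.ppt", "slide": 4, "time_ms": 903250},
    {"type": "word", "resource": "transcript", "text": "Potomac", "time_ms": 903410},
]

def events_between(events, start_ms, end_ms):
    """Return the synchronization events that fall inside a playback window."""
    return [e for e in events if start_ms <= e["time_ms"] < end_ms]

print(events_between(sync_events, 900000, 910000))
```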
- the information from the three passes is stored in the UMF, and correlations are performed on the fly with respect to the requested search criteria.
- the following examples illustrate cross referencing of media files based on synchronization information for the media resources.
- an audio file is monitored, and speech recognition is used, to determine when the speaker says something like, “Now viewing the following screen,” indicating that the speaker is now transitioning to a screen sharing view.
- a “smoothing factor” is applied to slowly/gradually transition the view from the speaker to the view of the computer display screen. This illustrates the interaction between the audio and video resources.
- the benefit here is that special effects can be introduced that were not originally intended in the original media resources (e.g., fade in/out) in order to improve the play back qualities of an originally recorded media presentation.
- the UMC identifies errors in media resources for correction and determines potential corrections for the errors.
- Typical media archives have expected pieces of information in specific media resources.
- a slide presentation can be expected to have a slide title in every slide.
- the expected information in a media resource may correspond to specific patterns that are expected.
- the slide numbers are expected to increment by one on consecutive slides.
- the expected information may be inferred based on the context. For example, if the slides in a presentation are numbered, a slide number can be used to infer the number expected on the next slide.
- the terms used in the media resource may be expected to be found in a dictionary of terms.
- if a transcript shows the speaker saying “Next slide please,” the corresponding synchronized portion of a slide presentation is expected to show a change to the next slide.
- An error in a media resource comprises deviations in information from expected information in the media resource.
- the terms of a media resource may be expected to occur in a dictionary and a typo resulting in a term that is not found in the dictionary is a deviation from the expected information.
- each slide may be expected to have a title.
- a slide with a missing title contains a deviation from the expected information.
- An error can be localized to a specific portion of a media resource, for example, the error may occur in a particular slide of a presentation.
- Synchronization of the various media resources from a media archive allows media resources to be correlated. Corrections to the errors found in a media resource are determined by analyzing other media resources of the archive. Portions of other media resources of the media archive that are correlated with the specific portion containing the error are identified and analyzed. For example, a slide presentation on the web may be correlated with a session transcript. Errors found in a particular slide are corrected by analyzing portions of the transcript corresponding to the particular slide with errors. For example, a name of a company that is misspelled in a slide is corrected based on the spelling of the name in the transcript.
- expected information in media resources corresponds to patterns of events that correspond to correct synchronized flow of events. Deviations from these patterns representing correct flow of event are identified as errors in media resources. Portions of the media resources are corrected by editing the media resources. Portions of other synchronized media resources in the media archive are correspondingly edited to keep the media archive consistent.
- Another example illustrates how errors in multiple media resources of a media archive can be corrected based on information indicating an error inferred from one of the media resources.
- the audio resource for a speaker makes reference, at a specific point in time during a presentation, to a product code named “Potomac” which is supposed to represent a 4G mobile broadband router.
- the speaker transcript is synchronously corrected and the following other media resources of the media archive are also synchronously corrected to make them all consistent: screen sharing resource, power point slide including speaker notes, user notes/comments, and optional Q&A
- the errors are statically detected from a stored archive.
- the media resources from the stored archive are read and errors detected as well as corrections to the errors identified.
- the corrections to the errors may be automatically applied to the media resource and the corrected media resource stored in the media archive or as a corrected copy of the media archive.
- corrections are dynamically applied to the synchronized media resources during playback. As the playback of the various portions of the synchronized media resources is performed, the various portions are analyzed to detect errors and further analyzed to determine corrections to the errors which can be applied. As a result, the user views the corrected versions of the media resources.
- the error correction process is applied during a live presentation.
- a subject matter expert can monitor the audio of a live presentation.
- the subject matter expert is acting in the role similar to that of a proof reader. Any changes/corrections made by the subject matter expert (see example on correcting product code names in [0060] above) are propagated to the other media resources in real time: text transcription, power point slide (including speaker notes), user notes/comments, etc.
- the self healing process 416 can analyze the timings between the slide flip events and the associated slide numbers. Then, to “self correct” possible jitter in the playback of the presentation, certain events may be filtered out of the media resource. For example, if the presenter mistakenly skipped ahead two or more slides, the self correcting code can detect these multiple advances in slide flips followed immediately, almost instantaneously, by a return to a previous slide number. In this case, the self correcting code can remove, i.e., filter out, the mistaken advances in the progression of the slide presentation and thereby reduce jitter by proceeding only to the intended slide.
- deviations from an expected pattern are detected to identify errors that need correction.
- Rules may be associated with the potential deviations to distinguish errors from normal behavior. For example, if the time interval during which a presenter advances by skipping slides and returns to the previous slide is greater than a threshold value, the behavior may be considered a part of the presentation. On the other hand, if the time difference is below the threshold value, the behavior is considered an error.
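- A sketch of this jitter rule follows; the event representation and the threshold value are assumptions, with the threshold standing in for the configurable parameter described above.

```python
def filter_jitter(flip_events, threshold_ms=3000):
    """flip_events: list of (time_ms, slide_number) in playback order.
    Remove accidental skips: an advance of two or more slides that is followed,
    within `threshold_ms`, by a return to an earlier slide is treated as a
    presenter mistake and filtered out; slower excursions are kept as intended."""
    cleaned, i = [], 0
    while i < len(flip_events):
        time_ms, slide = flip_events[i]
        prev_slide = cleaned[-1][1] if cleaned else 0
        nxt = flip_events[i + 1] if i + 1 < len(flip_events) else None
        if (nxt and slide - prev_slide >= 2 and nxt[1] < slide
                and nxt[0] - time_ms < threshold_ms):
            i += 1          # drop the mistaken advance; the corrective flip is kept next
            continue
        cleaned.append((time_ms, slide))
        i += 1
    return cleaned

# Example: the presenter jumps from slide 2 to 5 and returns to 3 one second later.
print(filter_jitter([(0, 1), (10000, 2), (20000, 5), (21000, 3), (40000, 4)]))
```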
- self healing 416 on the newly detected events can occur as well.
- the self correcting code 416 can generate a table of contents by examining the contents of the POWERPOINT slides and using the slide title as the entry in the table of contents.
- further self correcting in 416 occurs for the case where a slide is missing a title.
- the self correcting code examines the slide for other text near the top of the slide and/or looks for other text on the slide that is in bold, underlined, or a larger point size font, and uses that text for the table of contents entry. It should be clear that variations of this self correcting methodology can be applied to any data source and can be used to derive topics from other digitized media sources.
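- The title-fallback logic for table-of-contents generation might look like the following sketch (the slide and text-run structures are hypothetical stand-ins for the extracted presentation data):

```python
def toc_entry(slide):
    """Prefer the slide title; otherwise fall back to the most prominent text
    (largest point size, then bold) nearest the top of the slide."""
    if slide.get("title"):
        return slide["title"]
    texts = slide.get("texts", [])  # each item: {"text", "top", "size", "bold"}
    if not texts:
        return "(untitled)"
    prominent = sorted(texts, key=lambda t: (-t.get("size", 0),
                                             not t.get("bold", False),
                                             t.get("top", 0)))
    return prominent[0]["text"]

def build_toc(slides):
    return [(i + 1, toc_entry(s)) for i, s in enumerate(slides)]
```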
- the usefulness and value of a presentation can be extended by adding User Notes/Comments synchronously as a media resource to the media archive, after the occurrence of the actual live presentation.
- These comments/corrections made by subject matter technical experts, are made “after the fact” (i.e. after the original recording of the presentation) and this supplemental material is synchronously inserted into the original media archive.
- These user comments may include links to new white papers or technical reports and thus enhance the value and usefulness of the original presentation. Accordingly, synchronously added new material can either correct or embellish the content of the original recorded presentation.
- Examples of self correction applied to various data sources include, but are not limited to, video/image optical character recognition (OCR), video/image context recognition, audio phrase detection by context, and transcript analysis. The following examples illustrate self correction applied to these data sources.
- Image OCR for identifying text: in this example assume that there is an unknown (or misspelled) word in a transcript.
- the synchronization capability of all media resources allows analysis of the corresponding video image for the same point in time.
- the image analysis is used for inferring various kinds of information from the images, for example, text recognized using OCR techniques.
- the image analysis may indicate that there is recognizable text within the image, e.g. the word “Altus” is identified via OCR.
- Use of OCR in images allows material to be used for error correction that is outside the media file containing the error.
- Image context recognition: in this example, assume that there is an unknown (or misspelled) word in a transcript. Because of the synchronization capability of all media resources, the corresponding video image for the same point in time is analyzed. The image is analyzed and a “toaster” is identified. A decision is then made to determine whether the unidentified (or misspelled) word in the transcript is the word “toaster.” Accordingly, object recognition techniques can be used to provide information for correction of errors. The object recognition may be applied to media information available in files distinct from the file containing the erroneous information being corrected.
- Transcript analysis to correct missing slide titles and table of contents entries: in this example, assume that the textual transcript is correct, but some of the MICROSOFT POWERPOINT slide titles and corresponding table of contents entries are missing. In this case, the contents of the textual transcript, for the same time codes as the missing slide titles, can be analyzed, and a title and corresponding table of contents entry may be formulated from the synchronous contents of the textual transcript.
- Another use case assists transcription services.
- the transcriptionist is playing back audio at varying speeds in order to create the textual transcription.
- the synchronized PPT can be displayed in a separate window during the transcription process. If the transcriptionist cannot determine from the audio the word to transcribe to text, the synchronized PPT display will highlight likely possibilities, and the transcriptionist can then drag and drop the desired word from one synchronized media resource (i.e., the PPT) to another (i.e., the textual transcription). The selected word is then also added to the speech recognition dictionaries for future transcription accuracy. It is also possible to correct other files with the same problem.
- An automatic transcription mechanism can use the synchronized information while deciphering audio information to assist with recognition of words that are difficult to recognize without additional information.
- Some problem areas solved via auto-correction/self healing are also examples of cross referencing different synchronized media resources to auto-correct problems (or to assist in the correction of problems).
- These problem areas include jitter correction (e.g., slide flips and other synchronous transitions between media resources) and programmatic special effects that use the synchronized time codes and detected audio to improve viewing quality during playback (e.g., fade in/out, etc.).
- Media archives contain other types of supplemental information including, but not limited to, online chat session archives, attendees email addresses, screen sharing sessions, etc. and the newly disclosed processing techniques can be used to detect, synchronize, and extract these other types of media resources contained within a media archive and represent them in the UMF 106 .
- topic generation or categorization techniques based on text analysis of the content of the slide can be used to generate a topic.
- the topic can be generated based on information available in alternate media synchronized with the presentation. For example, a transcription of the audio data associated with the presentation can be analyzed to find the text representing the information described by the speaker during the same time period that the slide was shown.
- the information in the audio data can be used for self healing of the slide, for example, the title of the slide can be generated from the audio data. Accordingly, information from one media resource can be used for self healing of information in another synchronized media resource.
- the examples presented for self-healing also illustrate cross-referencing between media files of different formats in a media file archive to auto-correct problems (or to assist in the correction of problems). The following examples further illustrate cross referencing of media files and self correction.
- Example 1: An audio file is monitored, and speech recognition is used, to determine when the speaker says something like, “Now viewing the following screen,” indicating that the speaker is now transitioning to a screen sharing view.
- a “smoothing factor” is applied to slowly/gradually transition the view from the speaker to the view of the computer display screen. This illustrates the interaction between the audio and video resources.
- the benefit here is that special effects can be introduced that were not originally intended in the original media resources (e.g., fade in/out) in order to improve the play back qualities of an originally recorded media presentation.
- Example 2: The audio is monitored for utterances like, “Now going to slide 5 ,” indicating that the speaker is going to flip to the slide number that he verbally stated.
- the slide number from the synchronized slide (e.g., MICROSOFT POWERPOINT) resource is compared with the audio phrase that was captured in the synchronized audio resource and if there was a mistake detected, then the transcript can be corrected to reflect what actually transpired during the recording of the presentation.
- the transcript might have notation like the following: “Now moving to slide four [note: auto-corrected to four] . . . ”
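- A sketch of the cross-check in Example 2 follows; the regular expression, number-word table, and annotation format are illustrative assumptions, not the disclosed implementation.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

def spoken_slide_number(utterance):
    """Extract a slide number from phrases such as 'Now going to slide 5'."""
    m = re.search(r"slide\s+(\d+|\w+)", utterance.lower())
    if not m:
        return None
    token = m.group(1)
    return int(token) if token.isdigit() else NUMBER_WORDS.get(token)

def check_against_slide_event(utterance, actual_slide):
    """Compare the spoken slide number with the slide number recorded in the
    synchronized slide-flip event and annotate the transcript on a mismatch."""
    spoken = spoken_slide_number(utterance)
    if spoken is not None and spoken != actual_slide:
        return f"{utterance} [note: auto-corrected to {actual_slide}]"
    return utterance

print(check_against_slide_event("Now going to slide 5", 4))
```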
- a job listener 500 receives a request to process a media archive or other assets from a configured queue. This request is passed to a scheduler component 502 which retrieves the rulesets for processing the requested job from a data store (configuration file or database) 508 .
- the scheduler 502 creates a job state tracker object 506 , which will contain the state of the individual tasks that are required to complete processing of the job, and checks the system resource monitor daemon 504 for the available system resources (amount of free memory, central processing unit (CPU) utilization, number of free CPU cores (a CPU core is a dedicated one-way processor within an n-way processor/processor group if on an SMP (Symmetric Multi-Processor) machine), as well as general input/output (I/O) state) to determine the number of parallel threads that the system is capable of running.
- the scheduler 502 builds a hierarchy of tasks in the form of task request objects 520 based on their interdependencies (e.g., the output of one task may be the input of another), estimated runtime, and amount of resources consumed, etc.
- the scheduler 502 notes the number of and type of tasks required to process the request in the state tracker object 506 , sorting by order of processing and grouping by individual thread, and assigning each task a unique ID for tracking. Once this registration is complete, the scheduler begins passing the individual task requests into the processor object bus 514 via a message passing interface known as the process requestor 512 .
- the process requestor 512 is responsible for passing individual task requests into the bus, in the form of task request objects 520 , and listening for responses from completed tasks.
- the individual task requests 520 take the form of object messages, each extending a parent task request object class, overriding data sections as needed so that each is identifiable as a specific object type, as well as having an embedded ID 522 as assigned by the scheduler 502 earlier.
- Each individual processor/data manipulator 516 listens to the bus, inspecting objects to see if they can manipulate the object, and pulling objects off the bus if they can. Each processor/data manipulator 516 runs in its own thread. When a processor/data manipulator 516 completes processing or encounters an error, it stores the created data (or pointers to the data) 526 , as well as any notes 528 or errors 530 encountered in the task request object, updates the status 526 , and returns the task request object back to the bus 514 where it is retrieved by the process requestor 512 and returned to the scheduler 502 for inspection.
- the scheduler 502 checks the error handling ruleset 510 in order to determine the next course of action, e.g., whether to spawn the next task, or stop processing.
- the task request 520 is stored in the state tracker object 506 , and, if needed, additional processing is performed.
- the additional processing and error handling is described in detail in FIG. 6 . This process of job request and response retrieval is looped until all specified tasks in the state tracker 506 are completed, or a critical error is reached.
- a request to finalize the job is passed via the bus 514 in the form of the job state tracker object 506 .
- the UMF collator 518 listens for state tracker objects 506 , retrieves them, and converts the object to a UMF object (See FIG. 7 , 750 ). After the UMF object creation has been performed, a serialization request 532 is sent to the UMF serializer 534 , whose operations are detailed in FIG. 7 .
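- The task request objects and the scheduler's dispatch loop might be sketched as follows (class names and fields are illustrative; the real design uses a message bus, unique IDs assigned by the scheduler, and multiple threads, which this single-threaded stand-in omits):

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class TaskRequest:
    task_type: str                    # e.g. "wav_to_mp3" or "ppt_to_images"
    inputs: dict
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "pending"           # pending -> completed | error
    notes: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    data: dict = field(default_factory=dict)   # created data, or pointers to it

class StateTracker:
    """Tracks which tasks are pending and which have completed for one job."""
    def __init__(self, tasks):
        self.pending = list(tasks)
        self.completed = []

def run_job(tasks, processors):
    """processors: mapping of task_type to a callable taking and returning a
    TaskRequest.  A single-threaded stand-in for the bus-and-threads design."""
    state = StateTracker(tasks)
    while state.pending:
        task = state.pending.pop(0)
        task = processors[task.task_type](task)   # the processor handles the task
        task.status = "error" if task.errors else "completed"
        state.completed.append(task)
    return state
```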
- FIG. 6 illustrates one embodiment for operations of the UMC interpreter 318 and its associated process interactions with other components of the UMC.
- a job request arrives 500 with a specified input and output and the interpreter 318 ( FIG. 3 ) retrieves 600 the ruleset 508 based on the specified input and output for the job (e.g., the input for a requested job could be a web conference archive, and the output could be a video, with a separate audio file of the conference).
- the scheduler 502 determines 602 the ideal number of threads for the job based on the amount of available system resources as reported by the system resource monitor 504 .
- the scheduler 502 builds 604 a schedule of specified tasks for distribution amongst the number of threads determined earlier.
- the scheduler 502 begins assigning 606 tasks by means of the process requestor 512 , and then awaits 608 return messages from the individual tasks.
- the processors that are assigned these tasks are modules or programs which are required to convert or build the assets needed to complete a job request (e.g., a WAV to MP3 converter or a PPT slide to image converter).
- the scheduler 502 performs a check 612 of the return object 520 for any errors. If no errors are found 613 b , the task is added to the completed list 628 contained in the job state tracker 506 . Then the scheduler checks for the next scheduled task 630 . If a task is found pending 631 b , it is assigned 632 , and the scheduler 502 waits for the next completion message 610 . If no tasks are found to be remaining 631 a , the scheduler 502 assembles 634 the data from the state tracking object 506 into a UMF data object 750 and sends a serialization request 532 to the UMF serializer 534 (detailed operations for UMF serialization are in FIG. 7 ).
- if an error is found, the scheduler checks 614 the severity of the error and determines how to handle it based on the error handling ruleset 510 . If the error is found 615 a to be critical, e.g., processing cannot continue due to file corruption or lack of an appropriate processor, the progress is logged 616 , and the processing of the job halts. A message may be sent to a monitoring service or human supervisor in the event of a critical failure.
- if the error is found to be merely a note, the notes are logged 626 to the state tracking object 506 , and the task is added 628 to the completed list. These notes may be useful during later quality assurance phases to alert a user that special attention may be needed when reviewing the final produced assets, or they may be useful during later analysis of the process to alleviate bottlenecks in performance of the system. If the error is found 615 b to be a warning, e.g., the task could not be completed as requested, but an alternative may exist, or the desired output may not have been perfectly generated, the error is logged 618 to the state tracking object 506 and the scheduler 502 checks 620 for an alternative processor.
- if no alternative processor exists, the scheduler 502 logs progress and stops processing 616 , in the same way as described for a critical error 615 a . If an alternative processor does exist 621 b , then the selected alternative is logged 622 and a new task is generated and assigned to the alternative processor 624 . The scheduler 502 then returns to waiting for the next job completion message 608 .
- the process of selecting the best processor based on what is encountered during processing of individual tasks is an efficient way of executing self-healing techniques, in which the interpreter 318 ensures that the best processor for the job is selected, and in the event that the best processor cannot be used, the next best is tried, and so on, thereby guaranteeing that the best possible output is generated, as well as the completion of the job request in all but the worst case scenarios.
- An example of this would be the case where a particular slide from a PPT being processed contained corrupted data, and this data prevented the reading or writing of a PPT containing the corrupted slide; in that case, the corrupted slide would be isolated and the data corruption removed.
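- The error-handling branch described above can be summarized by the following illustrative Python sketch; the severity names, ruleset contents, and processor names are assumptions for illustration and do not reflect the actual error handling ruleset 510 .

    # Illustrative mapping of error severity to an action, loosely following the
    # critical / warning / note branches described above.
    ERROR_RULESET = {
        "critical": "halt",
        "warning": "try_alternative",
        "note": "log_and_continue",
    }

    # Illustrative table of next-best processors, tried in order.
    ALTERNATIVE_PROCESSORS = {
        "fast_ppt_reader": ["tolerant_ppt_reader"],
    }

    def handle_task_result(task, severity, log):
        action = ERROR_RULESET.get(severity, "log_and_continue")
        if action == "halt":
            log.append(("halted", task["processor"]))
            return None                                   # processing of the job stops
        if action == "try_alternative":
            for alternative in ALTERNATIVE_PROCESSORS.get(task["processor"], []):
                log.append(("retry_with", alternative))
                return dict(task, processor=alternative)  # new task for the alternative
            log.append(("halted_no_alternative", task["processor"]))
            return None
        log.append(("completed_with_notes", task["processor"]))
        return task

    log = []
    retried = handle_task_result({"processor": "fast_ppt_reader"}, "warning", log)
    print(retried, log)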
- FIG. 7 describes in detail both the creation of a UMF 106 and also the structure of the UMF data format.
- a serialization request arrives 514 in the form of a UMF data object 750 .
- the UMF serializer ( FIG. 5 , 534 ) first builds an index of the object 750 , noting size of each object contained in the request, and the position (order and address) that each object will take in the created file or data stream. The recommended order of the data blocks is illustrated (left to right) in the data block 752 .
- the UMF serializer 534 ( FIG. 5 ) then opens a file stream and writes a file header 714 to the beginning of the stream 702 .
- the header's 714 contents begin with the bytes (in hex) 0x554d467665722e (“UMFver.”), and are followed by the UMF version number (e.g., “1.0.0.1”).
- This versioning allows the UMF to be intrinsically extensible so that different versions of the file format may at future times specify different requirements. If the container used for storing the UMF does not support binary data representations, and instead uses a wrapper around plain-text, the corresponding ASCII or UTF-8 values would be used instead.
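- A minimal Python sketch of writing such a header follows; it assumes a binary container and treats the byte layout as a simplified illustration rather than a definitive specification of the UMF.

    import io

    UMF_MAGIC = bytes.fromhex("554d467665722e")   # ASCII bytes for "UMFver."

    def write_header(stream, version="1.0.0.1"):
        # Write the magic bytes followed by the version string.
        stream.write(UMF_MAGIC)
        stream.write(version.encode("ascii"))

    buffer = io.BytesIO()
    write_header(buffer)
    print(buffer.getvalue())   # b'UMFver.1.0.0.1'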
- if an index 716 is to be written 704 , the index created in 700 is written to the stream.
- an index greatly speeds retrieval from a large UMF
- the presence of an index requires its maintenance for it to be useful, and if a particular UMF will be undergoing lots of future manipulations and edits it is possible to store the UMF without this index to save write and modification time overhead.
- an index block can be added at a later point when the need for an index is identified.
- Requisite null values are then serialized to the checksums block 718 as a placeholder for the checksums that will be computed later in the process 712 .
- Several types of data integrity checks may be alternatively used for checksum 718 and the size of the placeholder is dependent upon the chosen data integrity algorithm.
- After reserving the checksum block 718 , the serializer 534 ( FIG. 5 ) generates and writes a unique identifier (ID or id) 707 to the unique ID block 720 .
- the unique ID may be registered in a data store of some format through well understood processes of cataloging unique ids, and the process may check with this data store after generation to ensure the uniqueness of the id.
- each index point stored in the index previously generated 700 is serialized in the order specified by the index.
- Each of the individual components ( 720 - 734 ) of the data block 752 is written to the file if found, with its type 762 and name 764 written at the beginning of the relevant block; these serve as the individual identifier for the media resource that is stored in the UMF.
- This is followed by the security level of the segment 766 . This security level is a standard integer from 1-10, the specified level indicating that access should only be allowed by a client possessing an equal or lower security level. If there are any other arbitrary access limitations 768 , these are serialized next. These access limitations 768 may be, but are not limited to: a specific geographic locale, or a specified department/group.
- pointers or URLs to any modules that are required for access or manipulation of the data block are written 770 .
- the specified modules 770 would be useful if a custom encryption algorithm were used to store the data segment 772 , or in the event that the data 772 could not be accessed without a specific form of digital rights management (DRM).
- the data itself or a pointer to the data is written 772 .
- Concerning the data segments 760 the following may be noted: 1) there is no arbitrary restriction on the number of instances of each data type stored in the UMF, or the number of data blocks, e.g., there could be more than one audio block 728 , or multiple sets of resources 732 .
- the raw data need not be contained in the file itself 772 ; a pointer to another UMF in the format of “umf://uniqueid/segmentname[@location]” (@location is optional as a system could possess a preconfigured store for UMFs) or a file or network path (in URL format) would be sufficient. This allows, amongst other possibilities, for potential external storage of UMF data, or creation of a UMF where direct access to the data is not possible.
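- The pointer form described above can be parsed as in the following illustrative Python sketch; the regular expression and the returned field names are assumptions for illustration.

    import re

    # Matches pointers of the form umf://uniqueid/segmentname[@location].
    UMF_POINTER = re.compile(r"^umf://(?P<uid>[^/]+)/(?P<segment>[^@]+)(?:@(?P<location>.+))?$")

    def parse_umf_pointer(pointer):
        match = UMF_POINTER.match(pointer)
        if not match:
            raise ValueError("not a UMF pointer: %r" % pointer)
        return match.groupdict()   # location is None when the optional part is absent

    print(parse_umf_pointer("umf://a1b2c3/audio_segment"))
    print(parse_umf_pointer("umf://a1b2c3/audio_segment@archive.example.com"))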
- a non-inclusive list of potential data to be stored in the UMF is as follows: (1) Any metadata concerning the media file 722 , e.g., the author 722 a , any attendees 722 b that were present if the media file originated from a meeting or conference, the date and time of the original recording 722 c , the location 722 d of the recording (geographic and/or virtual), and any other misc. data 722 e .
- (2) any metadata extracted from the original job request and processing, e.g., the original job requestor 724 a , the date and time of the request 724 b , any errors encountered while processing 724 c , any notes made by the processor 724 d , the amount of time that processing the request took 724 e , any external billing ID(s) 724 f , and any other miscellaneous data 724 g .
- (3) any events 726 associated with the data contained in the UMF, e.g., a change in presenter if a presentation was being recorded, a change in context of the data (e.g., video and audio combined to audio only at a certain time code), or something as simple as a change in page if a portable document format (e.g., ADOBE PDF) or MICROSOFT POWERPOINT document was being presented.
- (4) any segments of audio 728 processed as a part of the job, e.g., WAV data, MP3, etc.; (5) any video segments 730 .
- (6) any additional resources 732 related to the data, e.g., POWERPOINT files, WORD documents, images, etc.
- (7) any code modules 734 related to or required for access or manipulation of the UMF; these could be, but are not limited to, DRM plugins, usage trackers, security information or decryption packages if the data contained in the UMF were encrypted, executable players for various platforms, dynamic code generators for various platforms related to playback or access of the data, etc.
- the ability to include code in the UMF format allows a recipient to process a media archive without having to download additional logic. Custom-designed media archives and new media sources can be easily developed, incorporated in the UMF, and provided to consumer systems for processing.
- FIG. 8 provides the detailed flow of accessing the UMF 106 , 219 via an API 220 that is provided via the UMA framework 107 .
- a request 800 is made via the UMF content API 220 to retrieve data that is contained in a UMF archive.
- the API checks the requested UMF to ensure it contains the requested data 802 .
- This request must contain the UMF unique ID 720 and the requested data type 762 , and may contain the name of the requested data type 764 .
- the request should also contain some form of authentication data in any of the variety of well understood authentication mechanisms (lightweight directory access protocol (LDAP), username/password pair, etc.) so as to identify the system or user initiating the request.
- the system checks for any access restrictions on the requested data segment 806 . These are stored in the relevant data block segment 766 , 768 , 770 . If the requestor does not possess the required credentials to meet the specified restrictions 808 b an error message is returned 812 stating that the request has been denied due to lack of sufficient privileges.
- the API checks a data conversion ruleset 816 , to determine if the requested data can be generated 810 and returned to the client.
- the data conversion ruleset 816 comprises mappings of dependencies for the creation of a given data type (e.g., a WAV or other audio file is required to generate an mp3), as well as suggested alternatives if a conversion cannot be performed.
- the ruleset 816 also provides the location (if web service) of executable code (for example, binary code) or name of the module to be used for the conversion. If the ruleset 816 does not define a means for generation of the requested data type 818 a , an error is returned 820 that states that the requested data could not be generated. If possible, a suggested alternative data type is returned that is either present in the system or can be generated as specified in the data conversion ruleset 816 .
- the API checks the restrictions of the parent data type that will be used to generate the data 822 (thus the generated data inherits the permissions/restrictions of the data it is being generated from). If the data is being generated from two or more data types (e.g., generating a video from a set of POWERPOINT slides and an audio file) the more restrictive restrictions are used. If the requestor does not meet the restrictions 823 b an error message is returned 830 that states that the request has been denied due to insufficient privileges. If the requestor meets the access restrictions 823 a , the API requests from the conversion utility specified in the data conversion ruleset 816 that the conversion be performed. Then the API checks the results of the conversion 825 .
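- The retrieval-or-generation flow described above is summarized in the following illustrative Python sketch; the security-level convention (an integer from 1-10, where access requires an equal or lower client level) follows the description of segment 766 , while the ruleset structure, data types, and function names are assumptions for illustration.

    # Illustrative data conversion ruleset: what each generated type requires and
    # which converter would be invoked.
    CONVERSION_RULESET = {
        "mp3": {"requires": ["wav"], "converter": "wav_to_mp3"},
        "video": {"requires": ["slides", "wav"], "converter": "slides_and_audio_to_video"},
    }

    def retrieve(umf, requested_type, client_level):
        if requested_type in umf:                    # data already present in the UMF
            block = umf[requested_type]
            if client_level <= block["security_level"]:
                return block["data"]
            return "error: request denied due to insufficient privileges"
        rule = CONVERSION_RULESET.get(requested_type)
        if not rule or not all(parent in umf for parent in rule["requires"]):
            return "error: requested data could not be generated"
        # Generated data inherits the restrictions of its parent data types;
        # with several parents, the most restrictive (lowest) level is used.
        inherited_level = min(umf[parent]["security_level"] for parent in rule["requires"])
        if client_level > inherited_level:
            return "error: request denied due to insufficient privileges"
        return "generated via " + rule["converter"]

    umf = {"wav": {"security_level": 5, "data": b"..."},
           "slides": {"security_level": 3, "data": b"..."}}
    print(retrieve(umf, "mp3", client_level=4))      # allowed: inherits level 5
    print(retrieve(umf, "video", client_level=4))    # denied: inherits level 3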
- the methods and systems disclosed, namely the UMC 105 , provide a significant improvement to the ways in which media archive files are processed.
- the methods preserve the synchronous attributes from the original media resources and provide a flexible and extensible storage mechanism, namely the UMF 106 .
- the UMF 106 both represents media resources and also provides the capability to store executable code that can perform additional processing on the same UMF 106 media resources. All new additions and/or modifications to the UMF 106 are handled in a synchronous context in relation to all of the other media resources.
- the unifying system and framework, namely the UMA 107 provides a comprehensive set of services and functions to perform processing operations on media archives, produce media resources, and to provide the services to playback the contents of media archives in a synchronized manner.
- the text in the scrolling transcript window is synchronized with the audio and video of the presenter, as well as synchronized with the POWERPOINT slides, chat window, screen sharing window, or any other displayable media resource.
- the UMA 107 provides other useful services that take advantage of the synchronous processing of the UMC 105 .
- the UMA search services 217 allow a “synchronous search” down to the spoken word or phrase in a presentation. This is noted as a “synchronous search” because when the search criteria are found, all of the other corresponding media resources are synchronized together in the playback view of the presentation.
- the resulting search for a word or a phrase spoken in the presentation then presents a view to the user with the text in the scrolling transcript (corresponding to the search criteria), which is instantaneously synchronized with the audio and video of the presenter, as well as with the POWERPOINT slides, chat window, or any other displayable resource.
- This use case relates to auto transcription and related technologies. Although the automatic production of transcripts is possible via numerous commercially available speech recognition systems, the transcriptions produced may not be accurate.
- the transcription service 214 of the UMA 107 is used to improve the accuracy of the automatically generated transcripts. This case provides the real time detection of errors in the auto transcription process and a synchronous correction of the errors via correction event notifications transmitted in the UMA 107 via the UMA messaging services 213 .
- automated speech recognition (ASR) technologies are used to accept simultaneous input from the presenters and, after analysis of the auto-generated text, interface with a new transcription module in the UMA 107 by transmitting event messages when new technical jargon, acronyms, or other technical slang is detected.
- the transcription analysis module makes an attempt to correct the transcript based on real time input and other external sources.
- the external sources can interface with other programming modules that are building a specialized knowledge base of the technical jargon and acronyms.
- Human assist is an external input to the new transcription module that can be configured to override the computer generated selection for the word and/or phrase suggested for use in the transcript.
- the human corrections to the transcription module can be made by text or voice (taking advantage of the speech recognition services 217 in the UMA 107 ).
- Real time self learning is utilized by continuously updating the speech recognition calibration with new conditions.
- self learning and improved accuracy is achieved by continuously feeding audio tracks through the speech calibration engine and monitoring and correcting the results.
- data from an alternate synchronized media resource can be used for correction of errors.
- text from slides or notes associated with slides in a POWERPOINT presentation (or presentation in another format) can be used to correct transcription errors in the audio.
- the proximity of the text in the slides or notes associated with the slides as determined by synchronizing the various media resources narrows down the text to be searched for potential correction of errors and improves the accuracy of the error correction.
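- As an illustration only, the following Python sketch picks the closest candidate for a low-confidence transcribed term from the text of the slide synchronized with that portion of the audio; the use of difflib similarity is a simplified stand-in for the character-distance or phonetic matching described above, and the slide text and code name are hypothetical.

    import difflib

    def correct_term(asr_term, synchronized_slide_text, cutoff=0.6):
        # Search only the words of the slide that is synchronized with this
        # portion of the audio, and return the closest match if one is found.
        candidates = set(synchronized_slide_text.split())
        matches = difflib.get_close_matches(asr_term, candidates, n=1, cutoff=cutoff)
        return matches[0] if matches else asr_term

    slide_text = "Release roadmap for project Kestrel using SCSI storage arrays"
    print(correct_term("kestral", slide_text))   # -> 'Kestrel'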
- Other examples described above also illustrate how cross referencing can be used for correction of errors in one media based on information available in another media.
- the UMA 107 can be utilized to determine the identity of a speaker.
- a combination of the available digitized media resources from the UMF 106 is used to develop a speaker's biometric profile. Once this biometric is established, it can be subsequently used to identify a speaker via digitized audio and video sources.
- the speech services 217 from the UMA 107 are used to detect phonetic characteristics and attributes from the digitized audio input. This is used to develop a speaker's detected phonetic pattern of speech.
- other techniques of speaker recognition can be used to recognize attributes associated with a speaker, for example, face recognition using images found in media files, etc.
- the video images available via the UMF 106 are utilized to capture the visual characteristics and attributes.
- the combination of the speaker's detected phonetic pattern of speech and the captured visual characteristics and attributes are used to form a new biometric.
- This new biometric can be subsequently used to assist with the auto transcription process as custom dictionaries and corrective dictionaries can be applied and/or prioritized based on the speaker's detected phonetics derived from the biometric.
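- For illustration, the following Python sketch combines phonetic and visual attributes into a simple speaker profile and uses the identified speaker to prioritize custom dictionaries; the feature values, distance measure, and dictionary names are assumptions and not part of the disclosed biometric.

    import math

    # Hypothetical speaker profiles: combined phonetic/visual feature vectors and
    # the dictionaries to prioritize for each speaker.
    PROFILES = {
        "alice": {"features": (0.82, 0.31, 0.55), "dictionaries": ["storage_jargon", "general"]},
        "bob":   {"features": (0.15, 0.77, 0.40), "dictionaries": ["medical_terms", "general"]},
    }

    def identify_speaker(observed_features, profiles=PROFILES):
        def distance(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        return min(profiles, key=lambda name: distance(observed_features, profiles[name]["features"]))

    speaker = identify_speaker((0.80, 0.30, 0.52))
    print(speaker, PROFILES[speaker]["dictionaries"])   # dictionaries applied first for this speaker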
- various attributes collected from media archives can be associated with the speaker for recognizing the speaker. Synchronizing media file resources allows association of attributes found from various sources with the speaker. For example, topics collected from slides in the presentation can be used to identify special topics of interest for the speaker.
- a speaker may be associated with a set of audience members based on information identifying the audience, for example, email addresses of participants in an online presentation.
- Historical data can be collected for the speaker to identify typical topics that the speaker is known to present.
- An error in speaker recognition through conventional means can be corrected if it is found that there is significant mismatch of other parameters. For example, a speaker historically associated with computer science related topics is unlikely to present very different topics, for example, medical sciences.
- the additional attributes associated with the speaker can be used for speaker recognition by assisting conventional biometric techniques or else to raise a warning if a large disparity is detected between a speaker recognized by conventional means and the characteristics associated with the speaker via cross referencing of media files.
- the mechanism to use biometric recognition to assist with auto transcription can also be applied to any media asset that contains audio, for example a video file that includes audio.
- the UMA 107 alerts other interested users when it is detected that a speaker of interest is giving a presentation and provides notifications when other users are viewing a recorded presentation for the speaker of interest.
- Speaker identification may be assisted by metadata, e.g., a speaker identifier (id). A speaker may only be shown in a conference as “speaker number one,” and metadata coupled with a biometric id of the speaker may be used to clarify who is actually speaking as “speaker number one.”
- Many web conference sessions include a chat session for use by the meeting attendees for an instantaneous questions and answers service. Entries in the chat session are identified by the user's email address. Both the contents of the chat session and the emails for the conference attendees are extracted from the media archive via the UMC media extractor 312 and stored in the UMF 106 . Then users can search the UMF 106 for all comments and questions in the chat window by a user's email address.
- email notification can be sent to all attendees of the meeting when the final production and processing of the media resources is complete and available for playback.
- the email notification contains the link to the presentation(s) that are available.
- notification emails are sent with information (and links) about other available presentations matching the client's interests (as specified in keywords or by related technologies). Again, this is utilizing the emails of the attendees that are stored in the UMF 106 .
- the notification email may contain text similar to the following: “ . . . you might also be interested in these presentations . . . ” This use case is based on the email information that is contained in the media archive file for the meeting/conference and tracking interests (by keywords) for each client.
- presentations viewed by the members of the meetings are tracked. Then something like the following could also be included in the notification email: “You might also be interested in viewing these presentations that were viewed by other members of the meeting . . . ”
- Another embodiment allows sending of reminder notification emails to people attending a meeting. The emails that are contained in the UMF 106 for a session/meeting are used to track whether the meeting attendees have downloaded/viewed the presentation. If some of the attendees have not viewed the presentation within a specified period of time (e.g., one to two weeks), then a reminder email notification is sent to the attendees of the meeting notifying them that the additional materials for the meeting are now available for viewing (and providing the link to the presentation).
- This sort of “track back” function can be useful if certain individuals must view the presentation for certification purposes, or for other corporate requirements.
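- A minimal Python sketch of this track-back check follows; the grace period, data structures, and email addresses are assumptions for illustration, with the attendee emails standing in for the addresses stored in the UMF 106 .

    from datetime import datetime, timedelta

    def attendees_needing_reminder(attendee_emails, viewed_by, published, now, grace_days=14):
        # Return attendees who have not viewed the presentation once the grace
        # period (e.g., one to two weeks) has elapsed.
        if now - published < timedelta(days=grace_days):
            return []
        return [email for email in attendee_emails if email not in viewed_by]

    attendees = ["a@example.com", "b@example.com", "c@example.com"]
    viewed = {"a@example.com"}
    print(attendees_needing_reminder(attendees, viewed,
                                     published=datetime(2010, 9, 1),
                                     now=datetime(2010, 9, 20)))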
- a UMA 107 service may send a notification email to all attendees that supplemental information has been added to the contents of the original meeting and with the encouragement that they may wish to view/playback the new combined contents of the session or just to view/playback the sections that have been augmented.
- the UMF 106 can contain information other than pure media resource content.
- the UMF can be used to store the usage count for the number of times that the presentation has been viewed.
- a UMA 107 service can email the account administrator when it is detected that the views/playbacks for a given presentation are approaching the maximum number of views. Then means can be provided for the customer to make payments, if so desired, to increase the number of allowable views for the presentation.
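- For illustration only, a Python sketch of such a view-count check is shown below; the warning threshold and the returned action strings are assumptions.

    def check_usage(view_count, max_views, warn_fraction=0.9):
        # Compare the usage counter stored in the UMF against the allowed maximum.
        if view_count >= max_views:
            return "block playback and notify the account administrator"
        if view_count >= warn_fraction * max_views:
            return "email the account administrator: views approaching the maximum"
        return "allow playback"

    print(check_usage(view_count=95, max_views=100))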
- An embodiment provides an interactive interface to manually correct problems that were detected and reported during the processing of a media archive using the UMF 106 .
- the UMF 106 can include executable code. It was also noted in the description for this disclosure that there is an error reporting mechanism and that errors are categorized by severity levels. This use case builds up an executable script based on the types of errors that were detected in the processing of a media archive and stores this real-time generated script in the UMF 106 . The generated script provides a user interface with a list of the detected errors and a suggested/recommended action to take.
- the script contains the code to automatically perform a task at the control of the user. For example, consider the case of a PPT slide that contains a series of URLs, and further consider that a URL has been incorrectly entered into the contents of the PPT slide. Also consider that this error is detected in the processing of the media archive.
- the automated script is generated to characterize the problem and to provide code to guide the user in the steps to resolve the problem.
- the script from the UMF 106 may present a user interface allowing the user to take action to correct the error.
- an error can be reported “Error: A problem was detected resolving a URL” and options presented to the user to take action “Select: fix problem or ignore problem.” If the user chooses to ignore the problem, the error is cleared from the UMF 106 and no further action is required for this error. If the user selects to fix the problem, another set of prompts is displayed to the user, e.g.: “Select button to Validate URL.” The user is informed of the results of the action. If the URL worked, then no further action is necessary; any updates are stored in the UMF 106 and the error is cleared. If the URL failed, the user is allowed to make corrections and re-validate. Other examples provide scripts for errors related to other types of problems, e.g., rule set violations, other problems with PPT slides, etc.
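- The fix-or-ignore flow above could be scripted as in the following illustrative Python sketch; the prompt text and the use of urllib for the “Validate URL” step are assumptions, not the generated script stored in the UMF 106 .

    import urllib.request

    def handle_url_error(url, choose):
        # 'choose' stands in for the user interface prompt presented by the script.
        action = choose("Error: A problem was detected resolving a URL. Fix or ignore?")
        if action == "ignore":
            return "error cleared from the UMF"
        try:
            urllib.request.urlopen(url, timeout=5)      # the "Validate URL" step
            return "URL validated; error cleared from the UMF"
        except Exception as exc:
            return "validation failed (%s); prompt the user for a corrected URL" % exc

    print(handle_url_error("http://example.com", choose=lambda prompt: "ignore"))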
- FIG. 9 is a block diagram illustrating components of an example machine configured to read instructions from a machine-readable medium and execute them through one or more processors (or one or more controllers).
- the machine illustrated in FIG. 9 can be used to execute one or more components UMC 105 , UMF 106 , and UMA 107 .
- FIG. 9 shows a diagrammatic representation of a machine in the example form of a computer system 900 within which instructions 924 (e.g., software) cause the machine to perform any one or more of the methodologies discussed herein when those instructions are executed.
- the machine operates as a standalone device or may be connected (e.g., networked) to other machines.
- the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
- the processes described herein may be embodied as functional instructions, e.g., 924 , that are stored in a storage unit 916 within a machine-readable storage medium 922 and/or a main memory 904 . Further, these instructions are executable by the processor 902 .
- the functional elements described with FIGS. 1 and 2 also may be embodied as instructions that are stored in the storage unit 916 and/or the main memory 904 . Moreover, when these instructions are executed by the processor 902 , they cause the processor to perform operations in the particular manner in which the functionality is configured by the instructions.
- the machine may be a server computer, a client computer, a personal computer (PC), a desktop computer, a laptop computer, a notebook, a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, an IPAD, an IPHONE, a customized (application specific) embedded mobile computing device, a web appliance, a network router, switch or bridge, a server blade, or may reside in a specialized pluggable electronic card that is capable of insertion into the chassis of a computing device, or any machine capable of executing instructions 924 (sequential or otherwise) that specify actions to be taken by that machine.
- the machine may be integrated with other commercially available (or special purpose) Audio/Video playback devices, or integrated with other commercially available (or special purpose) Networking and/or Storage and/or Network Attached Storage and/or Media Processing equipment (e.g., CISCO MXE, etc.), or integrated as a set of Object Oriented (or procedural) statically or dynamically linked programming libraries that interface with other software applications.
- the term “machine” shall also be taken to include any collection of machines that individually or jointly execute instructions 924 to perform any one or more of the methodologies discussed herein.
- the example computer system 900 includes a processor 902 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 904 , and a static memory 906 , which are configured to communicate with each other via a bus 908 .
- the computer system 900 may further include graphics display unit 910 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)).
- the computer system 900 may also include an alphanumeric input device 912 (e.g., a keyboard), a cursor control device 914 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 916 , a signal generation device 918 (e.g., a speaker), and a network interface device 920 , which also are configured to communicate via the bus 908 .
- the storage unit 916 includes a machine-readable medium 922 on which is stored instructions 924 (e.g., software) embodying any one or more of the methodologies or functions described herein.
- the instructions 924 (e.g., software) may also reside, completely or at least partially, within the main memory 904 or within the processor 902 (e.g., within a processor's cache memory) during execution thereof by the computer system 900 , the main memory 904 and the processor 902 also constituting machine-readable media.
- the instructions 924 (e.g., software) may be transmitted or received over a network 926 via the network interface device 920 .
- While the machine-readable medium 922 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 924 ).
- the term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 924 ) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein.
- the term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
- Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules.
- a hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner.
- in example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations described herein.
- a hardware module may be implemented mechanically or electronically.
- a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations.
- a hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
- the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
- “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
- Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
- processors may be temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions.
- the modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
- the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of locations.
- the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
- the performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines.
- the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
- any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment.
- the appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- “Coupled” and “connected,” along with their derivatives, may be used to describe some embodiments. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
- the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion.
- a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
- “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
Abstract
A media archive comprising a plurality of media resources associated with events that occurred during a time interval is processed to synchronize the media resources. Sequences of patterns are identified in each media resource of the media archive. Elements of the sequences associated with different media resources are correlated such that a set of correlated elements is associated with the same event that occurred in the given time interval. The synchronization information of the processed media resources is represented in a flexible and extensible data format. The synchronization information is used for correction of errors occurring in the media resources of a media archive and for enhancing processes identifying information in media resources, for example by transcription of audio resources or by optical character recognition of images.
Description
- This application claims the benefit of U.S. Provisional Application No. 61/259,029, filed Nov. 6, 2009, and U.S. Provisional Application No. 61/264,595, filed Nov. 25, 2009, each of which is incorporated by reference in its entirety.
- 1. Field of Art
- The disclosure generally relates to the field of processing media archive resources, and more specifically, to auto-transcription enhancements based on cross-referencing of synchronized media resources.
- 2. Description of the Field of Art
- The production of audio and video has resulted in many different formats and standards in which to store and/or transmit the audio and video media. The media industry has further developed to encompass other unique types of media production such as teleconferencing, web conferencing, video conferencing, podcasts, other proprietary forms of innovative collaborative conferencing, various forms of collaborative learning systems, and the like. When recorded, for later playback or for archival purposes, all of these forms of media are digitized and archived on some form of storage medium.
- Several types of operations can be performed on media resources that process the information available in the media resource. An operation performed on media resources is speech recognition for processing audio information to generate information that can be represented in textual format. For example, speech recognition engines transcribe spoken utterances from audio files or audio streams into readable text.
- Another operation performed on media resources is optical character recognition (OCR) that processes images or video files/resources to generate textual information. Optical character recognition can fail if the image being processed is blurry and one character can be mistaken for another, for example, the letter ‘l’ being mistaken for the number ‘1.’ Another operation that is performed with images/videos is image recognition. For example, image recognition based on videos can be performed to identify information available in the video.
- Often information present in an audio resource may be difficult to transcribe. This can happen due to noise present in the audio or due to bad quality of the audio signal. As a result, either incorrect information may be transcribed or the speech recognition engine may fail to generate output. Therefore manual intervention may be required for improving accuracy of auto-transcription. Involving manual intervention increases the cost of the auto-transcription process. Besides, if the noise level is high, manual intervention also may not help improve the accuracy of generated information. Similarly, an image or video signal may be of low quality or corrupted, resulting in incorrect recognition of optical characters or objects or faces present in the image/video.
- The disclosed embodiments have other advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.
- FIG. 1 is an embodiment of the components of the system illustrating their interactions.
- FIG. 2 is a block diagram that illustrates one embodiment of a universal media aggregator system and related programming modules and services including the universal media convertor and the universal media format.
- FIG. 3 is a diagram that illustrates one embodiment of modules within the universal media convertor and the interactions between these modules.
- FIG. 4 is a flowchart illustrating an embodiment of operations of a media archive extractor for a universal media convertor module.
- FIG. 5 is a flowchart illustrating an embodiment for processing of the universal media convertor interpreter.
- FIG. 6 is a flowchart illustrating an embodiment of operations of a universal media convertor module interpreter and associated process interactions.
- FIG. 7 is a flowchart illustrating one embodiment for creation of universal media format and related data storage format.
- FIG. 8 is a diagram that illustrates one embodiment for access of universal media format resources via an application programming interface (API).
- FIG. 9 illustrates one embodiment of components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller).
- The Figures (FIGS.) and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
- A disclosed system and framework processes the contents of media archives. Detailed descriptions are provided for the new and inventive ways to detect and make use of the media archive contents in addition to the new and useful ways in which the representation of the media resources is constructed and presented for ease of programmatic interfacing. The processing of media archive resources is unified. For example, a system accepts input from multiple media archive sources and then detects and interprets the contents from the media archive. The resulting processed media resources are represented in a flexible and extensible data format. The process of detecting and interpreting the media resources preserves all of the synchronous aspects as captured in the original media archive. All of the processed media resources that have been converted to the new flexible and extensible data format are then aggregated into an encompassing unifying system which provides for ease of access to the converted media resources via common application programming interfaces. The systems and methods disclosed are best understood by referring to the drawings and flowcharts that are included in the figures to accompany the following textual description.
- Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
- In some embodiments, a media resource in a given format is converted to another format, for example, an audio resource is converted to text format via transcription. The conversion may be performed via manual transcription or auto-transcription. If the input provided in a certain format does not adhere to strict rules or format of specification, the conversion may not succeed in all cases. For example, transcription of an audio input may not succeed due to the accent of the speaker or if the speaker talks fast in certain portions of the input. Typically several portions of the audio may be transcribed successfully but some words or phrases (also referred to as terms) may fail. The automatic transcription system may provide a confidence score such that a low confidence in the transcribed result indicates potential failure in transcribing. Alternatively, the output term may be determined to be meaningless, for example, if it does not appear in a dictionary used to validate the words.
- Other media resources that are synchronized with the media resource are used for determining the terms that are incorrectly transcribed (or converted). For example, if a POWERPOINT presentation is synchronized with the audio file, the portions of the presentation are analyzed to identify the correct transcription. More specifically, a portion of the slide presentation that is synchronized with the portion of the audio file being transcribed is analyzed. For example, a particular slide that was being described by a speaker in the audio resource is analyzed to determine possible terms corresponding to the terms in the audio input.
- In an embodiment, the inaccurate transcription of the term performed by the auto-transcription process can be used to aid the search for the correct term. For example, the resemblance between the incorrect transcription of the term and the potential candidates can be analyzed, based on the distance between the terms (measured by the number of characters that mismatch between the terms) or based on phonetic similarities between the terms. Matching between terms can be performed by determining a match score and comparing the match scores of different candidate terms against the error term obtained from the auto-transcription output. The ability to reduce the search of the terms to a small portion of other synchronized media resources increases the chances of identifying the term accurately. For example, if the term was searched through the entire space of possible correct terms, there may be several potential candidate terms, whereas if the search space was restricted to a small portion of other media resources the number of potential candidate terms is smaller. Identifying terms can be particularly difficult if the terms refer to proper names since the proper names may not be available in a dictionary. However, other synchronized media resources are likely to contain the particular name. For example, a speaker may refer to a particular software release by a code name. The code name may be based on an obscure term or a proper name not found in typical dictionaries. In some cases, the term may be based on an acronym that does not accurately correspond to the way the term is pronounced. For example, the term “SCSI” is pronounced as “scuzzy,” which may be difficult for an auto-transcription system to transcribe. Restricting the search to the synchronized portion of another media resource helps narrow down the correct term faster and with greater accuracy.
- Other examples of conversion from one format to another include optical character recognition (OCR). Optical character recognition can be performed on an image associated with a media resource, for example, an image belonging to a video associated with the media archive. Certain terms obtained from OCR may be incorrect. For example, the word “flower” may be identified as “f1ower,” with the letter “l” misread as the numeral “1.” Some of these errors are difficult to fix if they refer to proper names that can only be recognized based on the context. These errors can be detected by comparing the terms generated by OCR with a dictionary of terms. The incorrect terms can be corrected by identifying corresponding terms in another media resource that is synchronized with the media resource being converted via OCR. A subset of the other media resource that is synchronized with the portion of the media resource being converted is analyzed to narrow down the search of the terms.
- Another example of conversion is object recognition in image or facial recognition. In case of facial recognition, information associated with various participants in a media resource is analyzed to determine a face visible in an image. The candidates for objects in the synchronized portion of other media resources are analyzed to determine the correct object being identified. For example, other identified objects are analyzed to eliminate objects that have been recognized as being different from the object being currently identified. Potential candidate objects are compared with characteristics of the portion of image being identified. For example, even though the information available in the image may not be sufficient for identifying the object accurately, the information may be sufficient to eliminate potential candidates thereby reducing the search space. Matching of the features of the objects and the image is performed to determine best match from the candidate objects that were not eliminated. If the best match is within a threshold comparison distance with the image features, the match is identified as the object. In an embodiment, a confidence score is provided along with the object to indicate a measure of accuracy with which the object is recognized.
- In an embodiment, an audio or text (e.g., text obtained by transcribing the audio or a script of a chat session) is analyzed to determine the person in the image. For example, if an image of a person is present in the video, there is a reasonable likelihood that the person is mentioned in the audio associated with that particular portion of the image or the person is one of the participants associated with the media resources, for example, the person can be a contributor in the media resource or the person can be a topic of discussion.
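- The candidate-elimination step described above is illustrated by the following Python sketch; the feature vectors, the distance measure, and the threshold are assumptions, and the confidence score is a simplified stand-in for the measure of accuracy mentioned above.

    def recognize(observed, candidates, already_identified, threshold=0.25):
        # Remove candidates that were already recognized elsewhere, then pick the
        # closest remaining candidate if it falls within the threshold distance.
        def distance(a, b):
            return max(abs(x - y) for x, y in zip(a, b))
        remaining = {name: features for name, features in candidates.items()
                     if name not in already_identified}
        if not remaining:
            return None, 0.0
        best = min(remaining, key=lambda name: distance(observed, remaining[name]))
        best_distance = distance(observed, remaining[best])
        if best_distance > threshold:
            return None, 0.0
        return best, 1.0 - best_distance    # identified name and a simple confidence score

    candidates = {"presenter": (0.7, 0.2), "attendee_1": (0.1, 0.9)}
    print(recognize((0.68, 0.25), candidates, already_identified={"attendee_1"}))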
- Embodiments of a disclosed system (and method) detect errors in a media resource by comparing information found in the media resource with expected information. For example, a slide number for each slide may be expected to increment by one for each slide. If two consecutive slides are found in which the slide numbers have a gap, slides may be missing between the consecutive slides. Similarly, a title may be expected at the top of each slide. If a slide is found with a missing title, the slide is considered a candidate for correction.
- Information available in the media archive is used to determine corrections to errors found in a media resource. Information available in media resources other than the media resource in which error occurs can be used to correct the error. Correlating portions of the various media resources allows focusing on specific portions of other media resources for identifying corrections to errors. For example, a slide presentation on web may be correlated with a transcript of an online session. Portions of the transcript are correlated with individual slides of the presentation. To identify errors that occur in a particular slide, a portion of the transcript correlated with the particular slide is analyzed to identify corrections of the error.
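- For illustration, the slide checks described above can be sketched in Python as follows; the slide representation and the error messages are assumptions.

    def find_slide_errors(slides):
        # Flag gaps in slide numbering and slides with a missing title as
        # candidates for correction, as described above.
        errors = []
        numbers = [slide["number"] for slide in slides]
        for previous, current in zip(numbers, numbers[1:]):
            if current - previous > 1:
                errors.append("possible missing slides between %d and %d" % (previous, current))
        for slide in slides:
            if not slide.get("title"):
                errors.append("slide %d has no title" % slide["number"])
        return errors

    slides = [{"number": 1, "title": "Agenda"},
              {"number": 2, "title": ""},
              {"number": 4, "title": "Results"}]
    print(find_slide_errors(slides))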
- In one embodiment, a system and framework for processing media archive resources comprises the following components: the universal media converter (UMC), the universal media format (UMF), and the universal media aggregator (UMA).
- The UMC accepts media archives as input, detects and interprets the contents of the archive, and processes the contents of the media archive. The UMC performs a “self healing” function when anomalies in the original media archive are detected. The UMC also produces a representation of the media archive resources in the UMF. The synchronization between all media archive resources is preserved and/or enhanced during the UMC processing and subsequent creation of the UMF.
- A media archive can comprise a collection of digitized components from a recorded media event that includes media/multimedia presentation, teleconference, recorded video conference, recorded video via camcorders (e.g., FLIP video), podcast, other forms of broadcast media (e.g., television (TV), closed circuit television (CCTV)), etc. The collection of digitized components may be in compressed or uncompressed form and may be in a standardized format or a proprietary format. The media archive may be digitally stored in a single file, or may be represented in a data stream transmitted over a network, or the individual components may be loosely coupled (e.g., interact with each other over network) and reside in different storage locations and then aggregated into a single cohesive media archive at processing time. Examples of digitized components may be a combination of any or all, but not limited to audio, video, MICROSOFT POWERPOINT presentation, screen sharing, chat window sharing, webcam, question and answer (Q&A) window, textual transcript, transcripted words with timings for the transcripted spoken words, background music (e.g., alternate audio tracks), etc. The media archive resources (or media resources) are digitized components that are part of the media archive. The media resources may be associated with data captured via a webcam, audio recording devices, video recording devices, etc. Examples of media resources include, screen sharing resources, POWERPOINT slides, transcripts, questions and answers, user notes, list of attendees, etc. The media resource may reside in a media file, a data stream or other means of storing/archiving computer information or data. The information stored may be associated with one or more events, for example, conferences, meetings, conversations, screen sharing sessions, online chat sessions, interactions on online forums and the like.
- The UMF is a representation of content from a media archive as well as any extended or related resources. The format is flexible as new resources can be added to the contents of the UMF and existing resources can be modified. The UMF is extendable and supports proprietary extensions. The UMF facilitates the ease of access to the components from a media archive.
- The UMA is the encompassing system that supports and controls processing of the requests for media archive extractions, media archive conversions, UMF generation, playback of recorded conferences/presentations/meetings, and the like.
- The UMF beneficially interfaces with a proliferation of proprietary media archive formats and serves as a simplified integrating layer to the complexities of the various media archive formats. The UMC beneficially determines media archive formats, autonomously corrects errors in media archive resources, and controls the scheduling of the processing steps required to extract resources from the media archive and synchronously represent these interrelated resources when creating the UMF. Once the UMF data and structures are created, they are aggregated in the UMA. The UMA provides a common means of interacting with a selected UMF. Enhanced synchronized search capability and synchronous playback of all detected and newly added media resources is possible since UMC preserves and/or enhances the synchronization of the media resources. The function of each of these above mentioned system components is described in further detail in the following sections.
- Turning now to FIG. 1, it illustrates the interactions of the components of the system used to process media archives. In particular, in one embodiment the system components are the universal media converter ('UMC'), the universal media format ('UMF'), and the universal media aggregator ('UMA') introduced previously.
- As shown in FIG. 1, the UMC accepts (or receives) input from different media sources 101-104. The depicted, and other new and emerging, types of media sources are possible because of a convergence of available technologies, such as voice, audio, video, data, and various other forms of internet related collaborative technologies such as email, chat, and the like.
- Representative examples of the telepresence media sources 101 include CISCO TELEPRESENCE, HP HALO, and telepresence product offerings from TANDBERG. Likewise, there are other well known industry solutions for video conferencing 102 and web conferencing 103. The UMC 105 is adaptable to support new forms of other media sources 104 that are available in the industry or can emerge in the future.
- The UMC 105 detects and interprets the contents of the various media sources 101-104. The results of the UMC 105 interrogation, detection, and interpretation of the media sources 101-104 are represented in the UMF 106.
- The UMF 106 is a representation of the contents from a media source 101-104 and is also both flexible and extensible. The UMF is flexible in that selected contents from the original media source may be included or excluded in the resulting UMF 106 and selected content from the original media resource may be transformed to a different compatible format in the UMF. The UMF 106 is extensible in that additional content may be added to the original UMF and company proprietary extensions may be added in this manner. The flexibility of the UMF 106 permits the storing of other forms of data in addition to media resource related content. - The functions of both the
UMC 105 and theUMF 106 are encapsulated in the unifying system andframework UMA 107. TheUMA 107 architecture supports processing requests forUMC 105 media archive extractions, media archive conversions,UMF 106 generation, playback ofUMF 106 recorded conferences/presentations/meetings, and so on. TheUMA 107 provides other related services and functions to support the processing and playback of media archives. Examples ofUMA 107 services range from search related services to reporting services, as well as other services required in software architected solutions such as theUMA 107 that are known to people skilled in the art. Additional details of theUMC 105, theUMF 106, and theUMA 107 follow in the following further detailed descriptions. - In one embodiment, each of the
components UMC 105, UMF 106, and UMA 107 can run as a separate programming module on a separate distributed computing device. In an embodiment, the separate computing devices can interact with each other using computer networks. Alternatively, the components UMC 105, UMF 106, and UMA 107 can be executed as separate processes on one or more computing devices. In another embodiment, the various components UMC 105, UMF 106, and UMA 107 can be executed using a cloud computing environment. In one embodiment, one or more of the components UMC 105, UMF 106, and UMA 107 can be executed on a cloud computer whereas the remaining components are executed on a local computing device. In one embodiment, the components UMC 105, UMF 106, and UMA 107 can be invoked via the internet as a web service in a service oriented architecture (SOA) or as a software as a service (SaaS) model. -
FIG. 2 depicts the modules that when combined together, form the unifying system and framework to process media archives, namely theUMA 107. It is understood that other components may be present in the configuration. Note that theUMC 105 is depicted as residing in theUMA 107 services framework as UMC extraction/conversion services 218. - Note also that the
UMF 106 is depicted in the UMA 107 services framework as UMF universal media format 219. The portal presentation services 201 of the UMA 107 services framework contains all of the software and related methods and services to play back a recorded media archive, as shown in the media archive playback viewer 202. The media archive playback viewer 202 supports the playback of UMF 106 recorded conferences, presentations, and meetings. The UMA 107 also comprises middle tier server side 203 software services. The viewer API 204 provides the presentation services 201 access to the server side services 203. Viewer components 205 are used in the rendering of graphical user interfaces used by the software in the presentation services layer 201. Servlets 206 and related session management services 207 are also utilized by the presentation layer 201. - The
UMA framework 107 also provides access to external users via aweb services 212 interface. A list of exemplary, but not totally inclusive, web services are depicted in the diagram asportal data access 208, blogs, comments, and question and answer (Q&A) 209,image manipulation 210, and custom presentation services, e.g., MICROSOFT POWERPOINT (PPT) services 211. TheUMA 107 contains amessaging services 213 layer that provides the infrastructure for inter-process communications and event notification messaging.Transcription services 214 provides the processing and services to provide the “written” transcripts for all of the spoken words that occur during a recorded presentation, conference, or collaborative meeting, etc. thus enablingsearch services 216 to provide the extremely unique capability to search down to the very utterance of a spoken word and/or phrase.Production service 215 contains all of the software and methods to “produce” all aspects of a video presentation and/or video conference.Speech services 217 is the software and methods used to detect speech, speech patterns, speech characteristics, etc. that occur during a video conference, web conference, or collaborative meeting, etc. The UMC extraction/conversion service 218, UMFuniversal media format 219, and theUMF content API 220 will be each subsequently covered in separate detail. - The diagram in
FIG. 3 illustrates one embodiment of a processing flow of the major software components of the UMC media extractor/converter 218. The processing starts with media archive input 300 (refer to FIG. 1 media sources 101-104). A job listener 304 monitors items on the request bus 302 queue, pulls items from the queue that are jobs to process media archives, and passes these jobs to the delegator 306. The delegator 306 is configured to determine if the input media archive is of a known data type 308 and then to delegate accordingly, either to the data inquisitor 310 for unknown resource types or to the optimized extractor and dependency builder 312 for known resource types. The data correction module 314 provides fix-ups for detected deficiencies in presentation slides and provides for synchronous corrections to prevent jitter during the playback phase of a recorded presentation. Further details on the UMC extractor 312 and the associated automated data correction 314 are provided in FIG. 4 and the accompanying description.
- The function of the data inquisitor 310 is to interrogate the contents of the media archive and determine if the UMC 105 is configured to support the types of media resources that are contained in the media archive 300. If the data inquisitor 310 detects supported media resources (e.g., moving picture expert group's (MPEG) MPEG-1 Audio Layer 3 (MP3), MPEG-4 (MP4), WINDOWS media video (WMV), audio video interleave (AVI), etc.), it creates corresponding objects to handle the extraction for use by the UMC extractor 312, updates the delegator 306 with information for the known media resource type, and then passes the request 300 to the UMC extractor 312 for processing. Errors are logged and the processing of the request is terminated when the data inquisitor determines that the UMC extractor 312 is unable to process the contents of the media archive 300.
- The UMC extractor 312 is configured as described herein. The UMC extractor 312 creates an index for each of the media resources that are contained in the media archive. The index contains information identifying the type of resource and the start and end locations of the given resource within the contents of the media archive 300. The UMC extractor 312 uses a new and inventive process to determine content in the media archive that is in an unknown (proprietary) format and to create identifiable media resources from this data for subsequent storage in the UMF 106. The UMC extractor 312 utilizes the data correction module 314 to "fix up" the data that is discovered in the supplemental extra data in a media archive.
- Additional details for the supplemental media archive data are documented in FIG. 4 and the associated textual functional description. A dependency list is built for the discovered media resources. This ordered list of media resources and the index information for all extracted media resources are then passed to the data object bus 316 via a queuing mechanism. The interpreter 318 is configured to process decisions based on the information that is supplied from the data inquisitor 310 and the extractor 312. The interpreter 318 monitors the queue of the data object bus 316 and pulls objects off the queue if it is determined that the interpreter can process the data in the object via one of the associated media resource data processors 322. The interpreter 318 is configured to perform complex tasks that include scheduling, processing, and performing corrections of media resources, which are further described in FIGS. 5 and 6 and their associated descriptions. Once the interpreter 318 determines that all of the tasks to process the media archive request are completed, the interpreter 318 then queues the resulting output to the processor object bus 320. The data collator and UMF creation process 324 monitors the queue of the process object bus 320 and retrieves the completed media archive requests.
- The completed data associated with the media archive request is then transformed into the UMF 106 via the methods contained in the data collator and UMF creation process 324. Once the UMF 106 is created, then these resources are made available to the portal assets/resources 328 for use in applications running in the presentation services layer 201. The created UMF 106 can also be transformed to other archive formats 330 or other requested file formats 332 via the UMF translator 326. Further details on the UMF 106 creation and the UMF 106 storage description are described with respect to FIG. 7 and the accompanying text. Further details on accessing the contents of the UMF 106 are contained in FIG. 8 with the accompanying textual description.
- Turning now to FIG. 4, it describes one embodiment of processing steps that occur in the optimized extractor and dependency builder 312. Processing steps 400 through 406 illustrate how the UMC extractor 312 extracts well known media resource types from the media archive. Examples of well known media resources include, but are not limited to, the following formats: WAV, AVI, MP3, MP4, MPEG, PPT, etc. The extractor performs operations on the media archive raw data 400. The process of extracting the well known media resources from the media archive 400 starts with reading a data block from the media archive 400 and, if not at the end of input data 404, proceeds to the interrogation step 405 to determine if a well known media resource type is found in the data. The data block is examined by comparing the patterns in the data block to typical patterns in various formats of media resources. The comparison can be accomplished via any well known means (e.g., performing a boolean AND, OR, or exclusive OR operation on the data) or via other well known tools such as regular expression comparison tools (for example, REGEX). If the examination of the data block 405 determines that the format of the data in the data block 405 is of a known media resource type, then an index is built 406 for the found media resource comprising a data structure that identifies the position of the media resource within the media archive 400. The index for each detected media resource includes both the starting and ending location of the media resource within the media archive 400.
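- A simplified sketch of this scan-and-index step follows. The magic byte signatures used here are common published file signatures chosen only for illustration, and treating each hit as extending to the next hit is an assumption; the disclosed extractor determines the actual start and end locations of each resource.

    # Illustrative sketch: scan archive bytes for well known format signatures and build an index
    # of {"type", "start", "end"} entries for each detected media resource.
    SIGNATURES = {
        b"RIFF": "wav_or_avi",       # WAV/AVI containers
        b"ID3": "mp3",               # MP3 with ID3 tag
        b"ftyp": "mp4",              # MP4 file type box
        b"\xd0\xcf\x11\xe0": "ppt",  # legacy MS Office container
    }

    def build_resource_index(archive_bytes):
        hits = []
        for signature, resource_type in SIGNATURES.items():
            offset = archive_bytes.find(signature)
            while offset != -1:
                hits.append((offset, resource_type))
                offset = archive_bytes.find(signature, offset + 1)
        hits.sort()
        index = []
        for i, (start, resource_type) in enumerate(hits):
            end = hits[i + 1][0] if i + 1 < len(hits) else len(archive_bytes)
            index.append({"type": resource_type, "start": start, "end": end})
        return index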
- This process of reading data blocks 402 and interrogation of the data block via an introspection process continues until all of the data in the media archive 400 is sequentially processed and the end of the archive check 404 evaluates to true. If the end of the archive data is reached, the supplemental, non-standard, data is extracted 408 from the media archive 400. The process of finding meaningful media resource data from the supplemental data in a media archive requires multiple passes through the data, where each pass through the data adds new information based on the information collected in previous passes. It is not required to pass through the entire contents of the media archive 400 in steps 410 through step 414. Searching through the supplemental data is optimized since all of the index locations in the media archive 400 identified in step 406 can be bypassed.
- In the first pass 410 through the supplemental data, the UMC optimized extractor code 312 searches for repeating patterns in the data. An example of a repeating pattern identified in the first pass is a pattern like x02F0 that repeats throughout this supplemental data block. In another embodiment, the types of patterns searched for in the first pass include clear text ASCII keywords that repeat throughout the supplemental data block, e.g., "Page", "Slide", "Event", etc. In an embodiment, a sequential scan of all of the supplemental data is performed in the first pass to identify patterns. From the results of this pattern identification, a matrix is developed, where the matrix includes: the pattern identifier, the number of occurrences, and the locations (e.g., binary offset from the beginning of the file) of the occurrences in the supplemental data. Once these initial patterns are detected, the second pass 412 through the supplemental data searches for "regular" incrementing or decrementing patterns within a close proximity to the repeating patterns that were detected in the first pass 410 through the supplemental data. The identification of regular patterns is performed similar to the sequential scan mechanism described above for identifying patterns. In an embodiment, the close proximity is a configurable parameter, e.g., 128 bytes, but in practice this value could be larger or smaller. - These regular occurring patterns are associated with "human driven" events, e.g., they may correspond to MICROSOFT POWERPOINT slide numbers or the progression of slide flips that occurred within the recorded conference, meeting, or presentation. In an embodiment, a video associated with a presentation can be analyzed to identify a slide flip by identifying a change in a portion of the image associated with the slide projection. The slide flip identified in the video is a human driven event that can be utilized as a regular pattern. In another embodiment, an audio recording or a transcription of the audio associated with a presentation can be analyzed for regular events, for example, the speaker saying "next slide please."
- The third pass 414 through the supplemental media archive data now searches for "non-regular" incrementing patterns in the locations in close proximity to the previously detected data patterns. Non-regular patterns do not occur on set intervals (the interval may be time code based, or based on the amount of data stored between pattern occurrences). The time code is a timing value that is detected in the media archive in close proximity to the media resource and the associated detected regular pattern (if found). The time code timing value has the characteristic of always incrementing in larger increments (e.g., in milliseconds), for example, greater than one or two, and may be bounded by the total length of time of the detected audio file from the archive. A set interval indicates the human event type of interval, e.g., a slide flip or a slide number, where the detected intervals are small integers. The integer values for these "set intervals" may be stored in, and detected in, a byte, integer, long, etc. The non-regular patterns that are identified are the patterns that have the property of proceeding in an incrementing manner. The recognition of non-regular patterns is performed using steps similar to those described for repeating or regular patterns, with the exception of searching for the properties associated with the non-regular data patterns. - Regular incrementing patterns are values that fall within a range, e.g., where the range is the number of pages in a POWERPOINT presentation. Further sanity checks on the possible meaning of a detected regular pattern can be applied, e.g., there are not likely to be thousands of slides within a one hour POWERPOINT presentation. The non-regular numbers may be time values at milliseconds granularity and therefore very large numbers outside of the range of the number of slides from the example POWERPOINT presentation. Therefore, using these types of inferences, human generated events (e.g., POWERPOINT slide flips) are distinguished from computer generated events such as recording time stamps in milliseconds. These non-regular numbers may appear to be random in nature, but will notably be progressing ever larger. These detected non-regular, repeating, seemingly random, continually incrementing patterns are determined to be timing values that are associated with the "human driven" events that were detected in the contextual pattern matching that occurred in the
second pass 412. These detected timing values from the third pass 414 can be further validated to ensure that the timing occurs within the time range of the audio track from the media resource that was detected in process step 406. In the overall flow of processing described above, each pass has a smaller amount of data to analyze compared to previous passes. - In another embodiment, a media archive may contain a media resource representing the audio and video from an on-line (internet/intranet based or other form of public/private network based) meeting. Consider further that the meeting has a number of attendees/participants/panelists/subject matter experts and that during the course of the meeting one presenter may "pass the 'mic'" (e.g., a virtual microphone) between the speakers. Also, consider that the number of speakers for the virtual meeting have been identified and/or detected in the processing of the media archive. In this case, the first pass identifies the "pass the 'mic'" type of event, the second pass detects the integral speaker IDs (e.g., the number within the range of speakers), and the third pass detects the timings for the speaker change event transitions.
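- The three passes can be pictured with the highly simplified sketch below. Real supplemental data is binary; the proximity window, keyword set, and the use of decimal text for event numbers and timings are assumptions made only to keep the illustration short.

    # Illustrative sketch: pass 1 finds repeating keyword patterns, pass 2 looks for small
    # "regular" values (e.g., slide numbers) near them, and pass 3 looks for large, ever
    # increasing values (e.g., millisecond timings) near the same locations.
    import re

    def three_pass_scan(data, keywords=(b"Slide", b"Page", b"Event"), window=128):
        pass1 = {kw: [m.start() for m in re.finditer(re.escape(kw), data)] for kw in keywords}
        pass1 = {kw: offsets for kw, offsets in pass1.items() if len(offsets) > 1}

        def small_ints_near(offset):   # pass 2 helper
            chunk = data[offset:offset + window]
            return [int(m.group()) for m in re.finditer(rb"\d{1,3}", chunk)]

        def large_ints_near(offset):   # pass 3 helper
            chunk = data[offset:offset + window]
            return [int(m.group()) for m in re.finditer(rb"\d{4,}", chunk)]

        events = []
        for keyword, offsets in pass1.items():
            for offset in offsets:
                events.append({
                    "keyword": keyword.decode(),
                    "offset": offset,
                    "event_values": small_ints_near(offset),
                    "timing_values": large_ints_near(offset),
                })
        return events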
- In some embodiments, a specific pass may identify information in one media resource of the media archive but not in another media resource of the media archive. For example, assume there are two files F1 and F2 in the media archive. It is possible that the first and second passes detect information in file F1 but only the third pass detects information in file F2. For example, file F1 may represent a POWERPOINT presentation with "
SLIDE 1,” “SLIDE 2,” annotations. File F2 is an audio file that does not have the “SLIDE 1,” “SLIDE 2,” annotations but has timing information based on the third pass. In this case, the information in the two files can be correlated via the timing information detected in the third pass. During playback, the timings associated with the file F1 are correlated with the file F2 by advancing to the appropriate time interval of the file F2 (the audio file). - In one embodiment, a matrix structure is built to hold the detected information from each pass. If a particular pass did not return any information, the matrix does not store any values for the corresponding positions. For example, if the second pass is performed and does not detect any regular pattern, the matrix structure does not store any values in the appropriate positions. In some embodiments, the third pass may detect information and store the corresponding information but the second pass may not detect any information. Typically, the matrix may contain information detected in the first pass and the second pass or information detected in the first pass and the third pass, or information detected in all three passes. In one embodiment, the first pass is always performed but the second and third pass may be optional. For example, the second pass may be considered optional if the associated data is absent. The third pass may be relevant depending on the type of associated media resource/event that is detected/discovered in the first pass.
- In some embodiments, the timing values associated with the media files may be estimated by using interpolation. For example, if the timing information available in the media resource is too far apart and the regular incrementing pattern occurs relatively more frequently, the timing information corresponding to the regular incrementing patterns can be estimated (by interpolation) and added to the metadata. In this situation the non-regular incrementing patterns may be detected anywhere in the file and estimated for the locations corresponding to the regular incrementing patterns.
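- A hypothetical sketch of such an interpolation is given below; it assumes sparse (event index, time) anchors from the third pass and estimates timings for the remaining regular events linearly.

    # Illustrative sketch: linearly interpolate timings for regular events (e.g., slide numbers)
    # from sparse timing anchors.
    def interpolate_timings(anchors, event_indices):
        # anchors: sorted list of (event_index, time_ms); event_indices: indices needing a timing
        estimated = {}
        for index in event_indices:
            earlier = [a for a in anchors if a[0] <= index]
            later = [a for a in anchors if a[0] >= index]
            if earlier and later and later[0][0] != earlier[-1][0]:
                (i0, t0), (i1, t1) = earlier[-1], later[0]
                estimated[index] = t0 + (t1 - t0) * (index - i0) / (i1 - i0)
            elif earlier:
                estimated[index] = earlier[-1][1]
            elif later:
                estimated[index] = later[0][1]
        return estimated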
- Once the requisite passes have completed, the result is that a new media resource with associated events and timings has been discovered in the supplemental data within a media archive. The new resource types are identified and made known to the system and are used for
future identification 405 of data. These newly discovered media resources are able to be synchronously connected to all of the other media resources (audio, video, transcripts, etc.) in the media archive 400. For example, if the newly discovered media is a POWERPOINT presentation, then all of the slide flips, in the exact order in which they were presented, can now be synchronized with all of the other media resources (audio, video, transcripts) during the playback of the media event (conference, presentation, meeting). - In cases where an audio file is converted to a textual representation via auto-transcription, the auto transcription process keeps the timings for each detected spoken utterance (word), and there is a correlation between the timing of the spoken word (that was auto transcribed) and the associated timing of the spoken word within the audio file. The same is true for manually generated transcripts, except that the timings are at a fragment granularity (e.g., a sentence) instead of for each spoken word.
- The synchronization information is stored in the UMF. In an embodiment, this synchronization information is stored in the UMF in the
events 726 block, e.g., the case where the event is a synchronization timing event. Details of how the data is represented in the UMF are provided below. In one embodiment, the information from the three passes is stored in the UMF, and correlations are performed on the fly with respect to the requested search criteria. - The following examples illustrate cross referencing of media files based on synchronization information for the media resources. In one embodiment, an audio file is monitored, and speech recognition is used, to determine when the speaker says something like, "Now viewing the following screen," indicating that the speaker is transitioning to a screen sharing view. When this type of pending transition is detected in the audio, a "smoothing factor" is applied to slowly/gradually transition the view from the speaker to the view of the computer display screen. This illustrates the interaction between the audio and video resources. The benefit here is that special effects can be introduced that were not originally intended in the original media resources (e.g., fade in/out) in order to improve the playback qualities of an originally recorded media presentation.
- The UMC identifies errors in media resources for correction and determines potential corrections for the errors. Typical media archives have expected pieces of information in specific media resources. For example, a slide presentation can be expected to have a slide title on every slide. The expected information in a media resource may correspond to specific patterns that are expected. For example, the slide numbers are expected to increment by one in consecutive slides. The expected information may be inferred based on the context. For example, if the slides are numbered in a presentation, a slide number can be used to infer the number written on the next slide. As another example of expected information, the terms used in the media resource may be expected to be found in a dictionary of terms. Similarly, if a transcript shows the speaker saying "Next slide please," the corresponding synchronized portion of the slide presentation is expected to have a change to the next slide.
- An error in a media resource comprises deviations in information from expected information in the media resource. The terms of a media resource may be expected to occur in a dictionary and a typo resulting in a term that is not found in the dictionary is a deviation from the expected information. Similarly, in a slide presentation, each slide may be expected to have a title. A slide with a missing title contains a deviation from the expected information. An error can be localized to a specific portion of a media resource, for example, the error may occur in a particular slide of a presentation.
- Synchronization of the various media resources from a media archive allows media resources to be correlated. Corrections to the errors found in a media resource are determined by analyzing other media resources of the archive. Portions of other media resources of the media archive that are correlated with the specific portion containing the error are identified and analyzed. For example, a slide presentation on the web may be correlated with a session transcript. Errors found in a particular slide are corrected by analyzing portions of the transcript corresponding to the particular slide with errors. For example, a name of a company that is misspelled in a slide is corrected based on the spelling of the name in the transcript.
- In some embodiments, expected information in media resources corresponds to patterns of events that represent a correct synchronized flow of events. Deviations from these patterns representing the correct flow of events are identified as errors in media resources. Portions of the media resources are corrected by editing the media resources. Portions of other synchronized media resources in the media archive are correspondingly edited to keep the media archive consistent.
- As an example, a speaker states that he is moving to slide number 10; however, the POWERPOINT slide at this synchronized point in time is actually slide number 9. If the speaker continues speaking with slide 9 being shown to the audience, the speaker's statement that he is moving to slide 10 is inferred to be an error. In this case the textual transcript for the speaker is corrected to state the actual slide number that is in view at the synchronized point in time.
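- One way to picture this correction is sketched below; the regular expression, function name, and annotation format are assumptions for illustration, not the disclosed implementation.

    # Illustrative sketch: reconcile slide numbers spoken in a transcript fragment with the slide
    # actually shown at the synchronized point in time.
    import re

    SLIDE_REF = re.compile(r"slide (number )?(\d+)", re.IGNORECASE)

    def correct_transcript_fragment(fragment_text, slide_shown_number):
        def patch(match):
            stated = int(match.group(2))
            if stated != slide_shown_number:
                return f"slide {slide_shown_number} [note: auto-corrected from {stated}]"
            return match.group(0)
        return SLIDE_REF.sub(patch, fragment_text)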
- Another example illustrates how errors in multiple media resources of a media archive can be corrected based on information indicating an error inferred from one of the media resources. In this example, the audio resource for a speaker makes reference, at a specific point in time during a presentation, to a product code named "Potomac," which is supposed to represent a 4G mobile broadband router. After review, it is discovered that the speaker made an error and was actually making a reference to the code name for the 3G mobile broadband router, and the correct code name for the 4G router was "Chesapeake." In this case, the speaker transcript is synchronously corrected and the following other media resources of the media archive are also synchronously corrected to make them all consistent: the screen sharing resource, the POWERPOINT slide including speaker notes, user notes/comments, and the optional Q&A.
- In one embodiment, the errors are statically detected from a stored archive. The media resources from the stored archive are read and errors detected as well as corrections to the errors identified. The corrections to the errors may be automatically applied to the media resource and the corrected media resource stored in the media archive or as a corrected copy of the media archive. In another embodiment, corrections are dynamically applied to the synchronized media resources during playback. As the playback of the various portions of the synchronized media resources is performed, the various portions are analyzed to detect errors and further analyzed to determine corrections to the errors which can be applied. As a result, the user views the corrected versions of the media resources.
- In an embodiment, the error correction process is applied during a live presentation. For example, in real time, a subject matter expert can monitor the audio of a live presentation. In this case the subject matter expert is acting in a role similar to that of a proofreader. Any changes/corrections made by the subject matter expert (see the example on correcting product code names in [0060] above) are propagated to the other media resources in real time: text transcription, POWERPOINT slide (including speaker notes), user notes/comments, etc.
- Further examples illustrating the self healing possibilities of the discovered data are presented. Consider that the event data and timings are associated with a POWERPOINT presentation. The
self healing process 416 can analyze the timings between the slide flip events and the associated slide numbers. Then, to "self correct" possible jitter in the playback of the presentation, certain events may be filtered out of the media resource. For example, if the presenter mistakenly skipped ahead two or more slides, the self correcting code can detect these multiple advances in slide flips followed almost instantaneously by a return to a previous slide number. In this case, the self correcting code can remove, i.e., filter out, the mistaken advances in the progression of the slide presentation and thereby reduce jitter by proceeding only to the intended slide. In general, deviations from an expected pattern are detected to identify errors that need correction. Rules may be associated with the potential deviations to distinguish errors from normal behavior. For example, if the time interval during which a presenter advances by skipping slides and returns to the previous slide is greater than a threshold value, the behavior may be considered a part of the presentation. On the other hand, if the time difference is below the threshold value, the behavior is considered an error.
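- A simple sketch of such a jitter filter is shown below; the threshold value and the event representation are assumptions for illustration.

    # Illustrative sketch: drop slide flip events that are undone almost immediately (within a
    # configurable threshold), keeping only the intended progression.
    def remove_jitter(flip_events, threshold_ms=2000):
        # flip_events: chronological list of (time_ms, slide_number)
        cleaned = []
        for i, (time_ms, slide) in enumerate(flip_events):
            undone = any(
                later_time - time_ms < threshold_ms and later_slide < slide
                for later_time, later_slide in flip_events[i + 1:]
            )
            if not undone:
                cleaned.append((time_ms, slide))
        return cleaned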
- Other types of self healing 416 on the newly detected events can occur as well. Consider the same case of the POWERPOINT presentation and assume that numerous slides in the presentation are missing table of contents entries. In this case the self correcting code 416 can generate a table of contents by examining the contents of the POWERPOINT slides and using the slide title as the entry in the table of contents. Further self correction in 416 occurs for the case where a slide is missing a title. In this case the self correcting code examines the slide for other text near the top of the slide and/or looks for other text on the slide that is in bold, underlined, or a larger point size font, and uses that text for the table of contents entry. It should be clear that variations of this self correcting methodology can be applied to any data source and can be used to derive topics from other digitized media sources.
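- The table of contents repair can be sketched as follows; the slide and text-run fields are assumptions for illustration (a real implementation would read them from the presentation file).

    # Illustrative sketch: build a table of contents from slide titles, falling back to other
    # prominent text (bold, underlined, or large font) when a title is missing.
    def build_table_of_contents(slides):
        toc = []
        for slide in slides:
            entry = (slide.get("title") or "").strip()
            if not entry:
                runs = slide.get("runs", [])  # e.g., {"text", "bold", "underline", "size", "top"}
                prominent = [r for r in runs
                             if r.get("bold") or r.get("underline") or r.get("size", 0) >= 24]
                candidates = prominent or sorted(runs, key=lambda r: r.get("top", 0))[:1]
                entry = candidates[0]["text"].strip() if candidates else "(untitled)"
            toc.append((slide.get("number"), entry))
        return toc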
- Examples of self correction applied to various data sources include, but are not limited to, video/image optical character recognition (OCR), video/image context recognition, audio phrase detection by context, and via transcript analysis. Following examples illustrate self correction applied to various data sources.
- Using Image OCR for identifying text: in this example assume that there is an unknown (or misspelled) word in a transcript. The synchronization capability of all media resources allows analysis of the corresponding video image for the same point in time. The image analysis is used for inferring various kinds of information from the images, for example, text recognized using OCR techniques. For example, the image analysis may indicate that there is recognizable text within the image, e.g. the word “Altus” is identified via OCR. A decision made to determine if the unidentified (or misspelled) word in the transcript is the word “Altus”. Use of OCR in images allows material to be used for error correction that is outside the media file containing the error.
- Using image context recognition: in this example assume that there is an unknown (or misspelled) word in a transcript. Because of the synchronization capability of all media resources, the corresponding video image for the same point in time is analyzed. Then the image is analyzed and a “toaster” is identified. A decision is made to determine if the unidentified (or misspelled) word in the transcript is the word “toaster”. Accordingly, object recognition techniques can be used to provide information for correction of errors. The object recognition may be applied to media information available in files distinct from the file containing erroneous information being corrected.
- Using image analysis to correct problem with bad audio: In this example assume that there is a section of the audio is bad (i.e. insufficient recording quality) and it was impossible to produce a textual transcript based on the playback of the audio. Because of the synchronous properties of all media resources, it is conceivable that the video, for the same time codes as the missing transcript, can be analyzed and then “lip reading” of the speaker in the video is analyzed to produce the words for the missing transcripts. Lip reading of a speaker in a video can be performed using software or manually.
- Transcript Analysis to correct missing slide titles and table of contents entries: In this example assume that the textual transcript is correct, but some of the MICROSOFT POWERPOINT slide titles and corresponding table of contents entries are missing entries. In this case, the contents of the textual transcripts, for the same time codes as the missing slide titles, can be analyzed and a title and corresponding table of contents entry may be formulated from the synchronous contents of the textual transcript.
- Another use case assists transcription services. Typically the transcriptionist is playing back audio at varying speeds in order to create the textual transcription. With the synchronized media resources, the synchronized PPT can be displayed in a separate window during the transcription process. If the transcriptionist cannot determine the word from the audio to transcribe to text, the synchronized PPT will highlight the display likely possibilities and then the transcriptionist can drag and drop the desired word from one synchronized media resource (i.e. the PPT) to another (i.e. the textual transcription). The selected word is then also added to the speech recognition dictionaries for future transcription accuracy. Possible to also correct other files with the similar problem. An automatic transcription mechanism can use the synchronized information while deciphering audio information to assist with recognition of words that are difficult to recognize without additional information.
- The above examples are also examples of the cross referencing of different synchronized media resources to auto-correct problems (or to assist in the correction of problems). Some problem areas solved via auto-correction self healing: jitter correction (re: slide flips and other synchronous transitions between media resources), programmatic special effects by using the synchronized time codes and detected audio to improve viewing quality during play back (e.g. fade in/out, etc.) Media archives contain other types of supplemental information including, but not limited to, online chat session archives, attendees email addresses, screen sharing sessions, etc. and the newly disclosed processing techniques can be used to detect, synchronize, and extract these other types of media resources contained within a media archive and represent them in the
UMF 106. - In some embodiments, topic generation or categorization techniques based on text analysis of the content of the slide can be used to generate a topic. In alternative embodiment, the topic can be generated based on information available in alternate media synchronized with the presentation. For example, a transcription of the audio data associated with the presentation can be analyzed to find the text representing the information described by the speaker during the same time period that the slide was shown. The information in the audio data can be used for self healing of the slide, for example, the title of the slide can be generated from the audio data. Accordingly, information from one media resource can be used for self healing of information in another synchronized media resource. The examples presented for self-healing also illustrate cross-referencing between media files of different format in a media file archive to auto-correct problems (or to assist in the correction of problems). The following examples, further illustrate cross referencing of media files and self correction.
- An audio file is monitored, and speech recognition used, to determine when the speaker says something like, “Now viewing the following screen” indicating that the speaker is now transitioning to a screen sharing view. When the audio detects this type of pending transition, then a “smoothing factor” is applied to slowly/gradually transition the view from the speaker to the view of the computer display screen. Therefore, the interaction between the audio and video resources. The benefit here is that special effects can be introduced that were not originally intended in the original media resources (e.g. fade in/out) in order to improve the play back qualities of an originally recorded media presentation. Example 2: The audio is monitored for utterances like, “Now going to slide 5”, indicating that the speaker is going to flip to the slide number that he verbally stated. But consider that the speaker actually made a mistake and that he really intended to advance to slide number 4 (even though he said 5). In this case, the original transcript would also be incorrect since the transcript is a textual representation of the spoken words. In this example, the slide number from the synchronized slide (e.g., MICROSOFT POWERPOINT) resource is compared with the audio phrase that was captured in the synchronized audio resource and if there was a mistake detected, then the transcript can be corrected to reflect what actually transpired during the recording of the presentation. E.g. the transcript might have notation like the following: “Now moving to slide four [note: auto-corrected to four] . . . ” In this example the cross referencing of synchronized resources of audio, slides (e.g., MICROSOFT POWERPOINT), and transcripts to solve the problem.
- Referring next to
FIG. 5 , it provides functional description of one embodiment of theUMC interpreter 318. Ajob listener 500 receives a request to process a media archive or other assets from a configured queue. This request is passed to ascheduler component 502 which retrieves the rulesets for processing the requested job from a data store (configuration file or database) 508. It initializes a jobstate tracker object 506 which will contain the state of the individual tasks which are required to complete processing of the job, and checks the systemresource monitor daemon 504 for the available system resources (amount of free memory, central processing unit (CPU) utilization, and number of free CPU cores (a CPU core is a dedicated one-way processor within an n-way processor/processor group if on an SMP (Symmetric Multi-Processor) machine), as well as general input/output (I/O) state) to determine the number of parallel threads that the system is capable of running Once the number of parallel threads has been determined, thescheduler 502 builds a hierarchy of tasks in the form of task request objects 520 based on their interdependencies (e.g., the output of one task may be the input of another), estimated runtime, and amount of resources consumed, etc. - The
scheduler 502 notes the number of and type of tasks required to process the request in thestate tracker object 506, sorting by order of processing and grouping by individual thread, and assigning each task a unique ID for tracking. Once this registration is complete, the scheduler begins passing the individual task requests into theprocessor object bus 514 via a message passing interface known as theprocess requestor 512. The process requestor 512 is responsible for passing individual task requests into the bus, in the form of task request objects 520, and listening for responses from completed tasks. The individual task requests 520 take the form of object messages, each extending a parent task request object class, overriding data sections as needed so that each is identifiable as a specific object type, as well as having an embeddedID 522 as assigned by thescheduler 502 earlier. - Each individual processor/
data manipulator 516 listens to the bus, inspecting objects to see if they can manipulate the object, and pulling objects off the bus if they can. Each processor/data manipulator 516 runs in its own thread. When a processor/data manipulator 516 completes processing or encounters an error, it stores the created data (or pointers to the data) 526, as well as anynotes 528 orerrors 530 encountered in the task request object, updates thestatus 526, and returns the task request object back to thebus 514 where it is retrieved by theprocess requestor 512 and returned to thescheduler 502 for inspection. - If any errors are reported, the
scheduler 502 checks theerror handling ruleset 510 in order to determine the next course of action, e.g., whether to spawn the next task, or stop processing. Thetask request 520 is stored in thestate tracker object 506, and, if needed, additional processing is performed. The additional processing and error handling is described in detail inFIG. 6 . This process of job request and response retrieval is looped until all specified tasks in thestate tracker 506 are completed, or a critical error is reached. Once processing for the job has completed, a request to finalize the job is passed via thebus 514 in the form of the jobstate tracker object 506. A special listener, theUMF collator 518 listens for state tracker objects 506, retrieves them, and converts the object to a UMF object (SeeFIG. 7 , 750). After the UMF object creation has been performed, aserialization request 532 is sent to theUMF serializer 534, whose operations are detailed inFIG. 7 . - Turning now to
FIG. 6 , it illustrates one embodiment for operations of theUMC interpreter 318 and its associated process interactions with other components of the UMC. A job request arrives 500 with a specified input and output and the interpreter 318 (FIG. 3 ) retrieves 600 theruleset 508 based on the specified input and output for the job (e.g., the input for a requested job could be a web conference archive, and the output could be a video, with a separate audio file of the conference). Next thescheduler 502 determines 602 the ideal number of threads for the job based on the amount of available system resources as reported by thesystem resource monitor 504. Then thescheduler 502 builds 604 a schedule of specified tasks for distribution amongst the number of threads determined earlier. Taken into account are the interdependencies of the tasks (e.g., is the output of one task the input for another), the length of time estimated for each task, and other relevant data (e.g., will one task consume more memory than another leading to a slowdown in the process, or a potential out of memory condition). Once this schedule has been built, thescheduler 502 begins assigning 606 tasks by means of the process requestor 512, and then awaits 608 return messages from the individual tasks. The processors that are assigned these tasks are modules or programs which are required to convert or build the assets needed to complete a job request (e.g., a WAV to MP3 converter or a PPT slide to image converter). - Once a task completion message has been received 610, the
scheduler 502 performs acheck 612 of thereturn object 520 for any errors. If no errors are found 613 b, the task is added to the completedlist 628 contained in thejob state tracker 506. Then the scheduler checks for the next scheduledtask 630. If a task is found pending 631 b, it is assigned 632, and thescheduler 502 waits for thenext completion message 610. If no tasks are found to be remaining 631 a thescheduler 502 assembles 634 the data from thestate tracking object 506 into aUMF data object 750 and sends aserialization request 532 to the UMF serializer 534 (detailed operations for UMF serialization are inFIG. 7 ). If an error is encountered 613 a during theerror check 612, the scheduler checks 614 the severity of the error, and determines how to handle it based on theerror handling ruleset 510. If the error is found 615 a to be critical, e.g., processing cannot continue due to file corruption or lack of appropriate processor, the progress is logged 616, and the processing of the job halts. A message may be sent to a monitoring service or human supervisor in the event of a critical failure. - If the error is simply a set of information or
messages concerning processing 615 c, but the processor produced the desired output, the notes are logged 626 to thestate tracking object 506, and the task is added 628 to the completed list. These notes may be useful during later quality assurance phases to alert a user that special attention may be needed when reviewing the final produced assets, or they may be useful during later analysis of the process to alleviate bottlenecks in performance of the system. If the error is found 615 b to be a warning, e.g., the task could not be completed as requested, but an alternative may exist, or the desired output may not have been perfectly generated, the error is logged 618 to thestate tracking object 506 and thescheduler 502checks 620 for an alternative processor. If no alternative processor exists 621 a, thescheduler 502 logs progress, and stopsprocessing 616, in the same way as described for acritical error 615 a. If an alternative processor does exist 621 b, then the selected alternative is logged 622 and a new task is generated and assigned to thealternative processor 624. Thescheduler 502 then returns to waiting for the nextjob completion method 608. - The process of selecting the best processor based on what is encountered during processing of individual tasks is an efficient way of executing self-healing techniques, in which the
interpreter 318 ensures that the best processor for the job is selected, and in the event that the best processor cannot be used, the next best is tried, and so on, thereby guaranteeing that the best possible output is generated, as well as the completion of the job request in all but the worst case scenarios. An example of this would be the case where a particular slide from a PPT that was being processed contained corrupted data in it, and this data prevented the reading or writing of a PPT containing the corrupted slide, the corrupted slide would be isolated and the data corruption removed. The original state of the corrupted slide would be caught by the normal processor when it tried to read or write the file, and this would prevent it from continuing. When this error was returned, the alternate processor would be selected and would permit handling of the file. The reason for not selecting the alternate processor as the primary processor is often for performance reasons, as in most data cases additional error checking need not be performed on each segment of a file, and to do so would slow the processing time considerably. Selecting fallback processors on failure allows these edge cases to still be handled, while maintaining high performance throughout the majority of the process. -
- FIG. 7 describes in detail both the creation of a UMF 106 and also the structure of the UMF data format. A serialization request arrives 514 in the form of a UMF data object 750. The UMF serializer (FIG. 5, 534) first builds an index of the object 750, noting the size of each object contained in the request, and the position (order and address) that each object will take in the created file or data stream. The recommended order of the data blocks is illustrated (left to right) in the data block 752. The UMF serializer 534 (FIG. 5) then opens a file stream and writes a file header 714 to the beginning of the stream 702. The header's 714 contents begin with the bytes (in hex) 0x554d467665722e ("UMFver."), and are followed by the UMF version number (e.g., "1.0.0.1"). This versioning allows the UMF to be intrinsically extensible so that different versions of the file format may at future times specify different requirements. If the container used for storing the UMF does not support binary data representations, and instead uses a wrapper around plain-text, the corresponding ASCII or UTF-8 values would be used instead. In XML, an example of the alternate storage of the header 714 information would be <UMF version="1.0.0.1">, as the top level element for the XML instance document. If the configuration of the serializer (FIG. 5, 534), or the request, specifies that an index 716 is to be written 704, the index created in 700 is written to the stream. Index points are identified as "::IndexPosition:DataType:ObjectName:Address" and are padded by "_" if needed. If serialized to XML, the element is defined as <index position="" dataType="" objectName="" address=""/>. These index addresses will be verified and, if necessary, corrected later 710. - While an index greatly speeds retrieval from a large UMF, the presence of an index requires its maintenance for it to be useful, and if a particular UMF will be undergoing lots of future manipulations and edits it is possible to store the UMF without this index to save write and modification time overhead. Further, due to the extensible nature of the UMF format, an index block can be added at a later point when the need for an index is identified. Requisite null values are then serialized to the checksums block 718 as a placeholder for the checksums that will be computed later in the
process 712. Several types of data integrity checks may be alternatively used for the checksum 718, and the size of the placeholder is dependent upon the chosen data integrity algorithm. After reserving the checksum block 718, the serializer 534 (FIG. 5) generates and writes a unique identifier (ID or id) 707 to the unique ID block 720. The unique ID may be registered in a data store of some format through well understood processes of cataloging unique IDs, and the process may check with this data store after generation to ensure the uniqueness of the ID.
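- The serialization prologue described above can be sketched as follows; the checksum placeholder size and the use of a UUID for the unique identifier are assumptions for illustration, not the disclosed layout, and the optional index block 716 is omitted.

    # Illustrative sketch: write the "UMFver." header and version, reserve a checksum placeholder,
    # and write a unique identifier block.
    import uuid

    CHECKSUM_PLACEHOLDER_BYTES = 32  # assumed size; depends on the chosen integrity algorithm

    def write_umf_prologue(stream, version="1.0.0.1"):
        stream.write(b"\x55\x4d\x46\x76\x65\x72\x2e")        # "UMFver."
        stream.write(version.encode("ascii"))
        stream.write(b"\x00" * CHECKSUM_PLACEHOLDER_BYTES)   # checksum block, filled in later
        stream.write(uuid.uuid4().bytes)                     # unique ID block

    # Usage sketch:
    #   with open("example.umf", "wb") as f:
    #       write_umf_prologue(f)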
unique ID generation 707, the process continues by iterating over each index point stored in the index previously generated 700 and serializing eachsegment 700 in the order specified by the index. Each of the individual components (720-734) of the data block 752 is written to the file if found, with theirtype 762 andname 764 serving at the beginning of the relevant block. This serves as the individual identifier for the media resource that is stored in the UMF. This is followed by the security level of thesegment 766. This security level is a standard integer from 1-10, the specified level indicating that access should only be allowed by a client possessing an equal or lower security level. If there are any otherarbitrary access limitations 768, these are serialized next. Theseaccess limitations 768 may be, but are not limited to: a specific geographic locale, or a specified department/group. - After the
access limitations 768, pointers or URLs to any modules that are required for access or manipulation of the data block are written 770. The specified modules 770 would be useful if a custom encryption algorithm were used to store the data segment 772, or in the event that the data 772 could not be accessed without a specific form of digital rights management (DRM). Finally the data itself, or a pointer to the data, is written 772. Concerning the data segments 760, the following may be noted: 1) there is no arbitrary restriction on the number of instances of each data type stored in the UMF, or the number of data blocks, e.g., there could be more than one audio block 728, or multiple sets of resources 732; 2) the raw data need not be contained in the file itself 772; a pointer to another UMF in the format of "umf://uniqueid/segmentname[@location]" (@location is optional as a system could possess a preconfigured store for UMFs) or a file or network path (in URL format) would be sufficient. This allows, amongst other possibilities, for potential external storage of UMF data, or creation of a UMF where direct access to the data is not possible.
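- One data block segment, as described above, might be modeled as in the sketch below; the field names mirror the description, but the class itself is an assumption and not the disclosed binary layout.

    # Illustrative sketch of a single UMF data block segment.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class UMFSegment:
        segment_type: str                  # e.g., "audio", "video", "resource" (762)
        name: str                          # segment name (764)
        security_level: int = 10           # 1-10; access only for clients with an equal or lower level (766)
        access_limitations: List[str] = field(default_factory=list)  # e.g., locale or group limits (768)
        required_modules: List[str] = field(default_factory=list)    # pointers/URLs to access modules (770)
        data: Optional[bytes] = None       # the raw data itself (772) ...
        data_pointer: Optional[str] = None  # ... or a pointer such as "umf://uniqueid/segmentname"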
media file 722, e.g., theauthor 722 a, anyattendees 722 b that were present if the media file originated from a meeting or conference, the date and time of theoriginal recording 722 c, thelocation 722 d of the recording (geographic and/or virtual), and any other misc.data 722 e. (2) Any metadata extracted from the original job request and processing, e.g., the original job requestor 724 a, the date and time of therequest 724 b, and errors encountered while processing 724 c, and notes made by theprocessor 724 d, the amount of time that processing the request took 724 e, any external billing ID(s) 724 e, and any othermiscellaneous data 724 g. (3) Anyevents 726 associated with the data contained in the UMF, e.g., a change in presenter if a presentation was being recorded, a change in context of the data, e.g., video and audio combined to audio only at a certain time code, or something as simple as a change in page if a portable document format (e.g., ADOBE PDF) or MICROSOFT POWERPOINT was being presented. (4) any segments ofaudio 728 processed as a part of the job, e.g., WAV data, MP3, etc., 5) anyvideo segments 730. (5) Anyadditional resources 732 related to the data. e.g., POWERPOINT files, WORD documents, images, etc. (6) anycode modules 734 related to or required for access or manipulation of UMF. Examples of thesemodules 734 could be, but are not limited to, DRM plugins, usage trackers, security information or decryption packages if the data contained in the UMF were encrypted, executable players for various platforms, dynamic code generators for various platforms related to playback or access of the data, etc. The ability to include code in the UMF format allows a recipient to process a media archive without having to download additional logic. Custom designed media archives and new media sources can be easily developed and incorporated in UMF and provided to consumer systems for being processed. -
- FIG. 8 provides the detailed flow of accessing the UMF API 220 that is provided via the UMA framework 107. A request 800 is made via the UMF content API 220 to retrieve data that is contained in a UMF archive. The API checks the requested UMF to ensure it contains the requested data 802. This request must contain the UMF unique ID 720 and the requested data type 762, and may contain the name of the requested data type 764. The request should also contain some form of authentication data using any of a variety of well understood authentication mechanisms (lightweight directory access protocol (LDAP), username/password pair, etc.) so as to identify the system or user initiating the request. If a pointer to the data exists 804 b, or the data itself exists 804 a, in the data storage section 772 of the requested data block of the UMF, the system then checks for any access restrictions on the requested data segment 806. These are stored in the relevant data block segment. If the requestor does not meet the specified restrictions 808 b, an error message is returned 812 stating that the request has been denied due to lack of sufficient privileges.
- If the requestor does meet the specified restrictions 808 a, then the data, or a pointer to the data, is returned 814. It is possible for the API to act as a proxy for data that is merely referenced and, if requested in a particular way (e.g., with a Boolean argument specifying whether or not the API should act as a proxy), to return the data itself, even if the data is merely referenced by pointer in the UMF. If the requested data does not exist 804 c, the API checks a data conversion ruleset 816 to determine if the requested data can be generated 810 and returned to the client. The data conversion ruleset 816 comprises mappings of dependencies for the creation of a given data type (e.g., a WAV or other audio file is required to generate an MP3), as well as suggested alternatives if a conversion cannot be performed. The ruleset 816 also provides the location (if a web service) of the executable code (for example, binary code), or the name of the module, to be used for the conversion. If the ruleset 816 does not define a means for generation of the requested data type 818 a, an error is returned 820 that states that the requested data could not be generated. If possible, a suggested alternative data type is returned that is either present in the system or can be generated as specified in the data conversion ruleset 816. - If the data can be generated 818 b, the API checks the restrictions of the parent data type that will be used to generate the data 822 (thus the generated data inherits the permissions/restrictions of the data it is being generated from). If the data is being generated from two or more data types (e.g., generating a video from a set of POWERPOINT slides and an audio file), the more restrictive restrictions are used. If the requestor does not meet the restrictions 823 b, an error message is returned 830 that states that the request has been denied due to insufficient privileges.
If the requestor meets the access restrictions 823 a, the API requests that the conversion utility specified in the data conversion ruleset 816 perform the conversion. The API then checks the results of the conversion 825. If the conversion failed (e.g., because of corrupted source data or incompatible codecs) 825 b, an error message is returned that specifies that an error was encountered while attempting to convert the data 832. If the conversion was successful 825 a, the converted data is added to the UMF 826 in the form of an additional data block segment 760, and the converted data is returned 828.
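- The decision flow of FIG. 8 can be summarized in code. The sketch below uses plain dictionaries and illustrative names (the segment lookup, the requestor security level, and the ruleset's convert callable are all assumptions); it is not the UMF content API 220 itself.

```python
# Minimal sketch of the FIG. 8 retrieval flow: find the requested data block,
# enforce its restrictions, and fall back to the data conversion ruleset 816
# when the requested type is absent.
def retrieve_segment(umf, requestor_level, data_type, conversion_ruleset):
    """umf: {"segments": {data_type: {"level": int, "data": ...}}}.
    A lower requestor_level means a more privileged client, matching the
    equal-or-lower security level rule of the 1-10 scheme described above."""
    segment = umf["segments"].get(data_type)                     # existence check 802/804
    if segment is not None:
        if requestor_level > segment["level"]:                   # restriction check 806/808
            raise PermissionError("request denied: insufficient privileges")      # 812
        return segment["data"]                                   # data or pointer 814
    rule = conversion_ruleset.get(data_type)                     # data conversion ruleset 816
    if rule is None:
        raise LookupError("requested data type cannot be generated")              # 820
    sources = [umf["segments"].get(t) for t in rule["sources"]]
    if any(s is None for s in sources):
        raise LookupError("missing source data; suggested alternative: %s"
                          % rule.get("alternative"))
    inherited_level = min(s["level"] for s in sources)           # most restrictive parent 822
    if requestor_level > inherited_level:
        raise PermissionError("request denied: insufficient privileges")          # 830
    converted = rule["convert"]([s["data"] for s in sources])    # run the conversion module
    if converted is None:
        raise RuntimeError("error encountered while converting the data")         # 832
    umf["segments"][data_type] = {"level": inherited_level,
                                  "data": converted}             # added data block segment 826
    return converted                                             # 828
```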
- In summary, the methods and systems disclosed provide a significant improvement to the ways in which media archive files are processed, namely the UMC 105. The methods preserve the synchronous attributes from the original media resources and provide a flexible and extensible storage mechanism, namely the UMF 106. The UMF 106 both represents media resources and provides the capability to store executable code that can perform additional processing on the same UMF 106 media resources. All new additions and/or modifications to the UMF 106 are handled in a synchronous context in relation to all of the other media resources. The unifying system and framework, namely the UMA 107, provides a comprehensive set of services and functions to perform processing operations on media archives, to produce media resources, and to play back the contents of media archives in a synchronized manner. - The following is an example description of the synchronous playback of a presentation: the text in the scrolling transcript window is synchronized with the audio and video of the presenter, as well as with the POWERPOINT slides, chat window, screen sharing window, or any other displayable media resource.
Likewise, the UMA 107 provides other useful services that take advantage of the synchronous processing of the UMC 105. Perhaps the most notable feature is the UMA search services 217, which allow a “synchronous search” down to the spoken word or phrase in a presentation. This is called a “synchronous search” because when the search criteria are found, all of the other corresponding media resources are synchronized together in the playback view of the presentation. For example, the resulting search for a word or a phrase spoken in the presentation presents a view to the user with the text in the scrolling transcript (corresponding to the search criteria), which is instantaneously synchronized with the audio and video of the presenter, as well as with the POWERPOINT slides, chat window, or any other displayable resource.
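- As an illustration of the synchronous search idea, the sketch below assumes a transcript held as time-stamped segments and resource objects that expose a seek method; both are assumptions, not the interface of the UMA search services 217.

```python
# Minimal sketch of a "synchronous search": find a spoken phrase in the
# transcript, take its time code, and seek every synchronized resource to it.
def synchronous_search(transcript, resources, phrase):
    """transcript: iterable of (start_seconds, text) segments kept in sync with
    the other resources; resources: objects exposing a seek(seconds) method."""
    for start, text in transcript:
        if phrase.lower() in text.lower():
            for resource in resources:        # audio, video, slides, chat window, ...
                resource.seek(start)          # align every view to the matched moment
            return start
    return None
```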
- There are numerous use cases that provide new and advantageous features and functions when coupled with the systems and methods for the UMC 105, UMF 106, and UMA 107. Some example use cases are provided herein.
- This use case relates to auto-transcription and related technologies. Although the automatic production of transcripts is possible via numerous commercially available speech recognition systems, the transcriptions produced may not be accurate. The transcription service 214 of the UMA 107 is used to improve the accuracy of the automatically generated transcripts. This case provides real time detection of errors in the auto-transcription process and synchronous correction of the errors via correction event notifications transmitted in the UMA 107 via the UMA messaging services 213. - Consider the example use case of the ORACLE OPENWORLD conference and its large number of presenters. Automated speech recognition (ASR) technologies are used to accept simultaneous input from the presenters, analyze the auto-generated text, and then interface with a new transcription module in the
UMA 107 by transmitting event messages when new technical jargon, acronyms, or other technical slang is detected. The transcription analysis module attempts to correct the transcript based on real time input and other external sources. The external sources can interface with other programming modules that are building a specialized knowledge base of the technical jargon and acronyms. Human assistance is an external input to the new transcription module that can be configured to override the computer generated selection for the word and/or phrase suggested for use in the transcript. The human corrections to the transcription module can be made by text or voice (taking advantage of the speech recognition services 217 in the UMA 107). Real time self learning is utilized by continuously updating the speech recognition calibration with new conditions. In some embodiments, self learning and improved accuracy are achieved by continuously feeding audio tracks through the speech calibration engine and monitoring and correcting the results. - In some embodiments, data from an alternate synchronized media resource can be used for correction of errors. For example, text from slides, or notes associated with slides, in a POWERPOINT presentation (or a presentation in another format) can be used to correct transcription errors in the audio. The proximity of the text in the slides or notes, as determined by synchronizing the various media resources, narrows down the text to be searched for potential correction of errors and improves the accuracy of the error correction. Other examples described above also illustrate how cross-referencing can be used for correction of errors in one media resource based on information available in another media resource.
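- A minimal sketch of this cross-referencing correction, assuming the slide text synchronized with the audio time code has already been retrieved, might use simple string similarity to propose a replacement. The function below is illustrative only and is not the transcription service 214.

```python
# Minimal sketch: correct a low-confidence ASR token by matching it against
# terms from the slide synchronized with the same time code.
import difflib


def correct_token(asr_token, slide_text, cutoff=0.6):
    """Return the slide term closest to the erroneous ASR output, or the
    original token if nothing in the synchronized slide text is close enough."""
    candidates = {word.lower(): word for word in slide_text.split()}
    matches = difflib.get_close_matches(asr_token.lower(), list(candidates),
                                        n=1, cutoff=cutoff)
    return candidates[matches[0]] if matches else asr_token


# Example: correct_token("oracal", "ORACLE OPENWORLD keynote agenda")
# returns "ORACLE" as the potential text representation.
```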
- The
UMA 107 can be utilized to determine the identity of a speaker. A combination of the available digitized media resources from the UMF 106 is used to develop a speaker's biometric profile. Once this biometric is established, it can subsequently be used to identify a speaker via digitized audio and video sources. In one embodiment, the speech services 217 from the UMA 107 are used to detect phonetic characteristics and attributes from the digitized audio input. This is used to develop a speaker's detected phonetic pattern of speech. In other embodiments, other techniques of speaker recognition can be used to recognize attributes associated with the speaker, for example, face recognition using images found in media files. The video images available via the UMF 106 are utilized to capture the visual characteristics and attributes. The combination of the speaker's detected phonetic pattern of speech and the captured visual characteristics and attributes is used to form a new biometric. This new biometric can subsequently be used to assist with the auto-transcription process, as custom dictionaries and corrective dictionaries can be applied and/or prioritized based on the speaker's detected phonetics derived from the biometric. In an embodiment, various attributes collected from media archives can be associated with the speaker for recognizing the speaker. Synchronizing media file resources allows association of attributes found from various sources with the speaker. For example, topics collected from slides in the presentation can be used to identify special topics of interest for the speaker. A speaker may be associated with a set of audience members based on information identifying the audience, for example, email addresses of participants in an online presentation. Historical data can be collected for the speaker to identify typical topics that the speaker is known to present. An error in speaker recognition through conventional means can be corrected if a significant mismatch of other parameters is found. For example, a speaker historically associated with computer science related topics is unlikely to present very different topics, for example, medical sciences. The additional attributes associated with the speaker can be used for speaker recognition by assisting conventional biometric techniques, or to raise a warning if a large disparity is detected between a speaker recognized by conventional means and the characteristics associated with the speaker via cross-referencing of media files.
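- The combined biometric and the disparity warning described above could be sketched as a simple score fusion. The weights, score dictionaries, and topic sets below are illustrative assumptions, not the UMA's speaker recognition implementation.

```python
# Minimal sketch: fuse an audio (phonetic) similarity score with a visual
# similarity score, then warn when the recognized speaker's known topics
# disagree with the topics extracted from the synchronized slides.
def identify_speaker(audio_score_by_speaker, video_score_by_speaker,
                     topics_in_slides, known_topics_by_speaker,
                     audio_weight=0.6, video_weight=0.4):
    combined = {
        name: audio_weight * audio_score_by_speaker.get(name, 0.0)
              + video_weight * video_score_by_speaker.get(name, 0.0)
        for name in set(audio_score_by_speaker) | set(video_score_by_speaker)
    }
    best = max(combined, key=combined.get)
    known = known_topics_by_speaker.get(best, set())
    warning = None
    if known and not (known & set(topics_in_slides)):
        warning = ("large disparity: %s is not associated with topics %s"
                   % (best, sorted(topics_in_slides)))
    return best, combined[best], warning
```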
- In an embodiment, the mechanism that uses biometric recognition to assist with auto-transcription can also be applied to any media asset that contains audio, for example a video file that includes audio. In some embodiments, the UMA 107 alerts other interested users when it is detected that a speaker of interest is giving a presentation and provides notifications when other users are viewing a recorded presentation by the speaker of interest. Speaker identification may also be assisted by metadata. For example, a speaker may be identified in a conference only as “speaker number one,” and metadata coupled with a biometric id of the speaker may be used to clarify who is actually speaking as “speaker number one.” - Use Cases of Email Addresses Extracted from a Media Archive
- Many web conference sessions include a chat session for use by the meeting attendees as an instantaneous question and answer service. Entries in the chat session are identified by the user's email address. Both the contents of the chat session and the email addresses of the conference attendees are extracted from the media archive via the
UMC media extractor 312 and stored in the UMF 106. Users can then search the UMF 106 for all comments and questions in the chat window by a user's email address. - Typically, some amount of time passes between a live conference session, or collaborative internet meeting, and the time that the media archive of the recorded session is made available to a wider audience via a portal. Since the emails of the attendees are stored in the
UMF 106, an email notification can be sent to all attendees of the meeting when the final production and processing of the media resources is complete and available for playback. The email notification contains the link to the presentation(s) that are available. - In an embodiment, notification emails are sent with information (and links) about other available presentations matching the client's interests (as specified by keywords or by related technologies). Again, this utilizes the emails of the attendees that are stored in the
UMF 106. For example, the notification email may contain text similar to the following: “ . . . you might also be interested in these presentations . . . ” This use case is based on the email information that is contained in the media archive file for the meeting/conference and on tracking interests (by keywords) for each client. In an embodiment, the presentations viewed by the members of the meetings are tracked. Then something like the following could also be included in the notification email: “You might also be interested in viewing these presentations that were viewed by other members of the meeting . . . ” - Another embodiment allows sending of reminder notification emails to people attending a meeting. The emails contained in the
UMF 106 for a session/meeting are used to track whether the meeting attendees have downloaded/viewed the presentation. If some of the attendees have not viewed the presentation within a specified period of time (e.g., one to two weeks), a reminder email notification is sent to those attendees notifying them that the additional materials for the meeting are now available for viewing (with a link to the presentation). This sort of “track back” function can be useful if certain individuals must view the presentation for certification purposes, or for other corporate requirements.
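- A minimal sketch of this track-back reminder, assuming the attendee emails and the viewed-by list have already been read from the UMF 106, is shown below; actual delivery would be handled by a mail service and is omitted.

```python
# Minimal sketch: compose reminder messages for attendees who have not viewed
# the presentation after the waiting period has elapsed.
from datetime import datetime, timedelta


def compose_reminders(attendee_emails, viewed_emails, published_at,
                      presentation_url, wait=timedelta(weeks=2), now=None):
    """Return (address, message) pairs for attendees still pending a view."""
    now = now or datetime.utcnow()
    if now - published_at < wait:
        return []                                   # too early to send reminders
    pending = set(attendee_emails) - set(viewed_emails)
    message = ("The additional materials for the meeting are now available "
               "for viewing: " + presentation_url)
    return [(address, message) for address in sorted(pending)]
```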
- Similar to the above, since the UMF 106 is extensible and new content may be added to the original contents, a UMA 107 service may send a notification email to all attendees indicating that supplemental information has been added to the contents of the original meeting, with the encouragement that they may wish to view/play back the new combined contents of the session, or just the sections that have been augmented. - Using the emails residing in the
UMF 106 as a service assistant: similar to the reminder use case above, if the user still has not viewed the presentation after the reminder (or set of reminders), then customer support can check with the customer to see if they require assistance or are having problems accessing the portal to view/play back the recorded conference/meeting session. - As disclosed herein, the
UMF 106 can contain information other than pure media resource content. For example, the UMF can be used to store the usage count for the number of times that the presentation has been viewed. A UMA 107 service can email the account administrator when it is detected that the views/playbacks for a given presentation are approaching the maximum number of views. Means can then be provided for the customer to make payments, if so desired, to increase the number of allowable views for the presentation.
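- A sketch of this bookkeeping, with illustrative field names and an assumed warning threshold, follows.

```python
# Minimal sketch: increment a usage count kept alongside the media and alert
# the account administrator when playbacks approach the allowed maximum.
def record_view(umf_metadata, notify_admin, warn_ratio=0.9):
    umf_metadata["view_count"] = umf_metadata.get("view_count", 0) + 1
    allowed = umf_metadata.get("max_views")
    if allowed and umf_metadata["view_count"] >= warn_ratio * allowed:
        notify_admin("Presentation '%s' has used %d of %d allowed views."
                     % (umf_metadata.get("title", "untitled"),
                        umf_metadata["view_count"], allowed))
    return umf_metadata["view_count"]
```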
- An embodiment provides an interactive interface to manually correct problems that were detected and reported during the processing of a media archive using the UMF 106. As disclosed herein, the UMF 106 can include executable code. It was also noted in this disclosure that there is an error reporting mechanism and that errors are categorized by severity levels. This use case builds an executable script based on the types of errors that were detected in the processing of a media archive and stores this real time generated script in the UMF 106. The generated script provides a user interface with a list of the detected errors and a suggested/recommended action to take. - Where possible, the script contains the code to automatically perform a task under the control of the user. For example, consider the case of a PPT slide that contains a series of URLs, where a URL has been incorrectly entered into the contents of the PPT slide, and this error is detected during the processing of the media archive. In this case, the automated script is generated to characterize the problem and to provide code to guide the user through the steps to resolve the problem. The script from the
UMF 106 may present a user interface allowing the user to take action to correct the error. For example, an error can be reported, “Error: A problem was detected resolving a URL,” and options presented to the user to take action, “Select: fix problem or ignore problem.” If the user chooses to ignore the problem, the error is cleared from the UMF 106 and no further action is required for this error. If the user selects to fix the problem, another set of prompts is displayed to the user, e.g.: “Select button to Validate URL.” The user is informed of the results of the action. If the URL worked, no further action is necessary; any updates are stored in the UMF 106 and the error is cleared. If the URL failed, the user is allowed to make corrections and re-validate. Other examples provide scripts for errors related to other types of problems, e.g., rule set violations, other problems with PPT slides, etc.
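- The kind of generated script described above could resemble the following sketch, which validates a URL and prompts the user to fix or ignore the problem; the prompts and the reachability check are illustrative assumptions, not the disclosed script generator.

```python
# Minimal sketch of a generated correction script for a bad URL in a slide.
import urllib.error
import urllib.request


def url_reachable(url, timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, ValueError):
        return False


def handle_url_error(url):
    print("Error: A problem was detected resolving a URL:", url)
    choice = input("Select: fix problem or ignore problem [fix/ignore]: ").strip()
    if choice == "ignore":
        return url                      # error cleared, no further action required
    while not url_reachable(url):
        print("Validation failed for:", url)
        url = input("Enter a corrected URL to re-validate: ").strip()
    print("URL validated; the correction can be stored back into the UMF.")
    return url
```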
- One embodiment of the disclosed systems and processes described herein is structured to operate with machines to provide such machines with particular functionality as disclosed herein. FIG. 9 is a block diagram illustrating components of an example machine configured to read instructions from a machine-readable medium and execute them through one or more processors (or one or more controllers). For example, the machine illustrated in FIG. 9 can be used to execute one or more of the components UMC 105, UMF 106, and UMA 107. Specifically, FIG. 9 shows a diagrammatic representation of a machine in the example form of a computer system 900 within which instructions 924 (e.g., software) cause the machine to perform any one or more of the methodologies discussed herein when those instructions are executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. - It is noted that the processes described herein, for example, with respect to
FIGS. 3, 4, 5, 6, 7 and 8, may be embodied as functional instructions, e.g., 924, that are stored in a storage unit 916 within a machine-readable storage medium 922 and/or a main memory 904. Further, these instructions are executable by the processor 902. In addition, the functional elements described with FIGS. 1 and 2 also may be embodied as instructions that are stored in the storage unit 916 and/or the main memory 904. Moreover, when these instructions are executed by the processor 902, they cause the processor to perform operations in the particular manner in which the functionality is configured by the instructions. - The machine may be a server computer, a client computer, a personal computer (PC), a desktop computer, a laptop computer, a notebook, a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, an IPAD, an IPHONE, a customized (application specific) embedded mobile computing device, a web appliance, a network router, switch or bridge, a server blade, or it may reside in a specialized pluggable electronic card that is capable of insertion into the chassis of a computing device, or it may be any machine capable of executing instructions 924 (sequential or otherwise) that specify actions to be taken by that machine. In an embodiment, the machine may be integrated with other commercially available (or special purpose) Audio/Video playback devices, or integrated with other commercially available (or special purpose) Networking and/or Storage and/or Network Attached Storage and/or Media Processing equipment (e.g., CISCO MXE, etc.), or integrated as a set of Object Oriented (or procedural) statically or dynamically linked programming libraries that interface with other software applications.
- Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute
instructions 924 to perform any one or more of the methodologies discussed herein. - The
example computer system 900 includes a processor 902 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 904, and a static memory 906, which are configured to communicate with each other via a bus 908. The computer system 900 may further include a graphics display unit 910 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)). The computer system 900 may also include an alphanumeric input device 912 (e.g., a keyboard), a cursor control device 914 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 916, a signal generation device 918 (e.g., a speaker), and a network interface device 920, which also are configured to communicate via the bus 908. - The
storage unit 916 includes a machine-readable medium 922 on which are stored instructions 924 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 924 (e.g., software) may also reside, completely or at least partially, within the main memory 904 or within the processor 902 (e.g., within a processor's cache memory) during execution thereof by the computer system 900, the main memory 904 and the processor 902 also constituting machine-readable media. The instructions 924 (e.g., software) may be transmitted or received over a network 926 via the network interface device 920. - While machine-
readable medium 922 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 924). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 924) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein. The term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media. - Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
- Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules. A hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein, for example, the processes illustrated and described with respect to
FIGS. 3, 4, 6, and 8. - In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
- Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
- Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
- The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
- Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of locations.
- The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
- The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
- Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
- Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.
- As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
- As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
- In addition, use of the “a” or “an” are employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one and the singular also includes the plural unless it is obvious that it is meant otherwise.
- Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a method for processing of media archive resources through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
Claims (20)
1. A computer implemented method of converting a media resource in a plurality of media resources to a text representation, the method comprising:
receiving an audio resource and a media resource associated with events that occurred during a time interval, wherein portions of the audio resource are correlated with portions of the media resource, the correlating comprising:
identifying a first sequence of pattern in the audio resource and a second sequence of pattern in the media resource; and
correlating elements of the first sequence with elements of the second sequence;
transcribing a first portion of the audio resource to a text representation;
identifying a second portion of the media resource correlated with the first portion of the audio resource; and
responsive to encountering an unidentifiable part of the first portion of the audio resource, determining a potential text representation for the unidentifiable part based on the second portion of the media resource.
2. The computer implemented method of claim 1 , further comprising:
automatically inserting the potential text representation into the text representation obtained by transcribing the first portion.
3. The computer implemented method of claim 1 , further comprising:
presenting the potential text representation for approval for incorporation into the text representation obtained by transcribing the first portion.
4. The computer implemented method of claim 1 , further comprising:
receiving information indicative of a failure to transcribe the unidentifiable part of the first portion of the audio.
5. The computer implemented method of claim 4 , wherein the information indicative of failure to transcribe comprises an error code encountered while transcribing the first portion.
6. The computer implemented method of claim 4 , wherein the information indicative of failure to transcribe comprises an erroneous text output obtained by transcribing the unidentifiable part of the first portion.
7. The computer implemented method of claim 6 , further comprising:
matching the erroneous text with text in the second portion of the media resource; and
determining a term with highest match score with the erroneous text as the potential text representation.
8. The computer implemented method of claim 1 , wherein the media resource comprises a script of a chat session.
9. The computer implemented method of claim 1 , wherein each pair of correlated elements is associated with an event that occurred in the time interval.
10. The computer implemented method of claim 1 , wherein the events that occurred during a time interval comprise at least one of a presentation, a screen sharing session, an online collaboration session, and a recorded media event.
11. The computer implemented method of claim 1 , wherein the events that occurred during a time interval comprise a slide presentation and each sequence of pattern corresponds to slide flips in the presentation.
12. The computer implemented method of claim 1 , wherein the audio resource and the media resource are obtained from a media archive.
13. A computer implemented method of converting a media resource in a plurality of media resources to a text representation, the method comprising:
receiving a first media resource and a second media resource associated with events that occurred during a time interval, wherein portions of the first media resource are correlated with portions of the second media resource, the correlating comprising:
identifying a first sequence of pattern in the first media resource and a second sequence of pattern in the second media resource; and
correlating elements of the first sequence with elements of the second sequence;
converting a first portion of the first media resource represented in a media format to a representation in text format;
identifying a second portion of the second media resource correlated with the first portion of the first media resource; and
responsive to encountering an unidentifiable part of the first portion of the first media resource, determining a potential text representation for the unidentifiable part based on the second portion of the second media resource.
14. The computer implemented method of claim 13 , further comprising:
automatically inserting the potential text representation into the text representation obtained by transcribing the first portion.
15. The computer implemented method of claim 13 , wherein the media format of the first media resource comprises an image and the converting comprises optical character recognition.
16. The computer implemented method of claim 13 , wherein the media format of the first media resource comprises an image and the converting comprises object recognition.
17. The computer implemented method of claim 13 , wherein the media format of the first media resource comprises an image and the converting comprises facial recognition.
18. A computer program product having a computer-readable storage medium storing computer-executable code for converting a media resource in a plurality of media resources to a text representation, the code comprising:
a universal media convertor module configured to:
receive a first media resource and a second media resource associated with events that occurred during a time interval, wherein portions of the first media resource are correlated with portions of the second media resource, the correlating comprising:
identifying a first sequence of pattern in the first media resource and a second sequence of pattern in the second media resource; and
correlating elements of the first sequence with elements of the second sequence;
convert a first portion of the first media resource represented in a media format to a representation in text format;
identify a second portion of the second media resource correlated with the first portion of the first media resource; and
responsive to encountering an unidentifiable part of the first portion of the first media resource, determine a potential text representation for the unidentifiable part based on the second portion of the second media resource.
19. The computer program product of claim 18 , wherein the universal media convertor module is further configured to:
automatically insert the potential text representation into the text representation obtained by transcribing the first portion.
20. The computer program product of claim 18 , wherein the media format of the first media resource comprises an image and the converting comprises object recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/894,557 US20110112832A1 (en) | 2009-11-06 | 2010-09-30 | Auto-transcription by cross-referencing synchronized media resources |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US25902909P | 2009-11-06 | 2009-11-06 | |
US26459509P | 2009-11-25 | 2009-11-25 | |
US12/894,557 US20110112832A1 (en) | 2009-11-06 | 2010-09-30 | Auto-transcription by cross-referencing synchronized media resources |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110112832A1 true US20110112832A1 (en) | 2011-05-12 |
Family
ID=43974240
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/775,064 Expired - Fee Related US8438131B2 (en) | 2009-11-06 | 2010-05-06 | Synchronization of media resources in a media archive |
US12/875,088 Abandoned US20110110647A1 (en) | 2009-11-06 | 2010-09-02 | Error correction for synchronized media resources |
US12/894,557 Abandoned US20110112832A1 (en) | 2009-11-06 | 2010-09-30 | Auto-transcription by cross-referencing synchronized media resources |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/775,064 Expired - Fee Related US8438131B2 (en) | 2009-11-06 | 2010-05-06 | Synchronization of media resources in a media archive |
US12/875,088 Abandoned US20110110647A1 (en) | 2009-11-06 | 2010-09-02 | Error correction for synchronized media resources |
Country Status (1)
Country | Link |
---|---|
US (3) | US8438131B2 (en) |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110276325A1 (en) * | 2010-05-05 | 2011-11-10 | Cisco Technology, Inc. | Training A Transcription System |
US20120101817A1 (en) * | 2010-10-20 | 2012-04-26 | At&T Intellectual Property I, L.P. | System and method for generating models for use in automatic speech recognition |
US20120151309A1 (en) * | 2010-12-14 | 2012-06-14 | International Business Machines Corporation | Template application error detection |
US20120151496A1 (en) * | 2010-12-14 | 2012-06-14 | Kabushiki Kaisha Square Enix (Also Trading As Square Enix Co., Ltd.) | Language for task-based parallel programing |
US20120245936A1 (en) * | 2011-03-25 | 2012-09-27 | Bryan Treglia | Device to Capture and Temporally Synchronize Aspects of a Conversation and Method and System Thereof |
US20140149599A1 (en) * | 2012-11-29 | 2014-05-29 | Ricoh Co., Ltd. | Unified Application Programming Interface for Communicating with Devices and Their Clouds |
US20140149771A1 (en) * | 2012-11-29 | 2014-05-29 | Ricoh Co., Ltd. | Smart Calendar for Scheduling and Controlling Collaboration Devices |
US20140149554A1 (en) * | 2012-11-29 | 2014-05-29 | Ricoh Co., Ltd. | Unified Server for Managing a Heterogeneous Mix of Devices |
US8972416B1 (en) * | 2012-11-29 | 2015-03-03 | Amazon Technologies, Inc. | Management of content items |
US20150286718A1 (en) * | 2014-04-04 | 2015-10-08 | Fujitsu Limited | Topic identification in lecture videos |
US20160150009A1 (en) * | 2014-11-25 | 2016-05-26 | Microsoft Technology Licensing, Llc | Actionable souvenir from real-time sharing |
US20160224554A1 (en) * | 2013-09-12 | 2016-08-04 | Beijing Zhigu Rui Tuo Tech Co., Ltd | Search methods, servers, and systems |
US20170061987A1 (en) * | 2015-08-28 | 2017-03-02 | Kabushiki Kaisha Toshiba | Electronic device and method |
US9886423B2 (en) | 2015-06-19 | 2018-02-06 | International Business Machines Corporation | Reconciliation of transcripts |
US10185711B1 (en) * | 2012-09-10 | 2019-01-22 | Google Llc | Speech recognition and summarization |
US20190129591A1 (en) * | 2017-10-26 | 2019-05-02 | International Business Machines Corporation | Dynamic system and method for content and topic based synchronization during presentations |
US10347300B2 (en) | 2017-03-01 | 2019-07-09 | International Business Machines Corporation | Correlation of recorded video presentations and associated slides |
US20190371298A1 (en) * | 2014-12-15 | 2019-12-05 | Baidu Usa Llc | Deep learning models for speech recognition |
US10580410B2 (en) * | 2018-04-27 | 2020-03-03 | Sorenson Ip Holdings, Llc | Transcription of communications |
CN111564157A (en) * | 2020-03-18 | 2020-08-21 | 浙江省北大信息技术高等研究院 | Conference record optimization method, device, equipment and storage medium |
US10754508B2 (en) | 2016-01-28 | 2020-08-25 | Microsoft Technology Licensing, Llc | Table of contents in a presentation program |
US10770077B2 (en) | 2015-09-14 | 2020-09-08 | Toshiba Client Solutions CO., LTD. | Electronic device and method |
WO2021051024A1 (en) * | 2019-09-11 | 2021-03-18 | Educational Vision Technologies, Inc. | Editable notetaking resource with optional overlay |
US20220148583A1 (en) * | 2020-11-12 | 2022-05-12 | International Business Machines Corporation | Intelligent media transcription |
US11367445B2 (en) * | 2020-02-05 | 2022-06-21 | Citrix Systems, Inc. | Virtualized speech in a distributed network environment |
WO2023154351A3 (en) * | 2022-02-08 | 2023-09-21 | My Job Matcher, Inc. D/B/A Job.Com | Apparatus and method for automated video record generation |
US11972759B2 (en) * | 2020-12-02 | 2024-04-30 | International Business Machines Corporation | Audio mistranscription mitigation |
US11997340B2 (en) | 2012-04-27 | 2024-05-28 | Comcast Cable Communications, Llc | Topical content searching |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9098507B2 (en) | 2009-12-03 | 2015-08-04 | At&T Intellectual Property I, L.P. | Dynamic content presentation |
US8640137B1 (en) * | 2010-08-30 | 2014-01-28 | Adobe Systems Incorporated | Methods and apparatus for resource management in cluster computing |
US8786667B2 (en) | 2011-04-26 | 2014-07-22 | Lifesize Communications, Inc. | Distributed recording of a videoconference in multiple formats |
US20120281970A1 (en) * | 2011-05-03 | 2012-11-08 | Garibaldi Jeffrey M | Medical video production and distribution system |
US20120324355A1 (en) * | 2011-06-20 | 2012-12-20 | Sumbola, Inc. | Synchronized reading in a web-based reading system |
JP5845764B2 (en) * | 2011-09-21 | 2016-01-20 | 富士ゼロックス株式会社 | Information processing apparatus and information processing program |
US9654821B2 (en) | 2011-12-30 | 2017-05-16 | Sonos, Inc. | Systems and methods for networked music playback |
US9674587B2 (en) | 2012-06-26 | 2017-06-06 | Sonos, Inc. | Systems and methods for networked music playback including remote add to queue |
GB2505072A (en) | 2012-07-06 | 2014-02-19 | Box Inc | Identifying users and collaborators as search results in a cloud-based system |
US10915492B2 (en) * | 2012-09-19 | 2021-02-09 | Box, Inc. | Cloud-based platform enabled with media content indexed for text-based searches and/or metadata extraction |
US9495364B2 (en) | 2012-10-04 | 2016-11-15 | Box, Inc. | Enhanced quick search features, low-barrier commenting/interactive features in a collaboration platform |
US9754000B2 (en) * | 2012-12-21 | 2017-09-05 | Sap Se | Integration scenario for master data with software-as-a-service system |
US9165009B1 (en) * | 2013-03-14 | 2015-10-20 | Emc Corporation | Lightweight appliance for content storage |
US9247363B2 (en) | 2013-04-16 | 2016-01-26 | Sonos, Inc. | Playback queue transfer in a media playback system |
US9501533B2 (en) | 2013-04-16 | 2016-11-22 | Sonos, Inc. | Private queue for a media playback system |
US9361371B2 (en) | 2013-04-16 | 2016-06-07 | Sonos, Inc. | Playlist update in a media playback system |
US9652460B1 (en) | 2013-05-10 | 2017-05-16 | FotoIN Mobile Corporation | Mobile media information capture and management methods and systems |
US10255650B2 (en) | 2013-05-24 | 2019-04-09 | Sony Interactive Entertainment Inc. | Graphics processing using dynamic resources |
US9722852B2 (en) * | 2013-05-24 | 2017-08-01 | Cisco Technology, Inc. | On-demand encapsulating of timed metadata in a network environment |
US9495722B2 (en) | 2013-05-24 | 2016-11-15 | Sony Interactive Entertainment Inc. | Developer controlled layout |
US9684484B2 (en) | 2013-05-29 | 2017-06-20 | Sonos, Inc. | Playback zone silent connect |
KR101775461B1 (en) * | 2013-08-20 | 2017-09-06 | 인텔 코포레이션 | Collaborative audio conversation attestation |
US9954909B2 (en) | 2013-08-27 | 2018-04-24 | Cisco Technology, Inc. | System and associated methodology for enhancing communication sessions between multiple users |
CN103646656B (en) * | 2013-11-29 | 2016-05-04 | 腾讯科技(成都)有限公司 | Sound effect treatment method, device, plugin manager and audio plug-in unit |
US9336228B2 (en) * | 2013-12-18 | 2016-05-10 | Verizon Patent And Licensing Inc. | Synchronization of program code between revision management applications utilizing different version-control architectures |
US10341397B2 (en) * | 2015-08-12 | 2019-07-02 | Fuji Xerox Co., Ltd. | Non-transitory computer readable medium, information processing apparatus, and information processing system for recording minutes information |
US10235209B2 (en) * | 2015-08-28 | 2019-03-19 | Vmware, Inc. | Hybrid task framework |
US20170099366A1 (en) * | 2015-10-01 | 2017-04-06 | Orion Labs | Intelligent routing between wearable group communication devices |
US10200387B2 (en) | 2015-11-30 | 2019-02-05 | International Business Machines Corporation | User state tracking and anomaly detection in software-as-a-service environments |
JP6642316B2 (en) * | 2016-07-15 | 2020-02-05 | コニカミノルタ株式会社 | Information processing system, electronic device, information processing device, information processing method, electronic device processing method, and program |
US10685169B2 (en) * | 2017-05-08 | 2020-06-16 | Zoho Corporation Private Limited | Messaging application with presentation window |
CN109949828B (en) * | 2017-12-20 | 2022-05-24 | 苏州君林智能科技有限公司 | Character checking method and device |
US10673913B2 (en) * | 2018-03-14 | 2020-06-02 | 8eo, Inc. | Content management across a multi-party conference system by parsing a first and second user engagement stream and transmitting the parsed first and second user engagement stream to a conference engine and a data engine from a first and second receiver |
US11321345B2 (en) * | 2018-03-30 | 2022-05-03 | Rightcrowd Software Pty Ltd. | Systems and methods of providing graphical relationships of disparate data object formats |
RU2703154C1 (en) * | 2018-12-06 | 2019-10-15 | Общество с ограниченной ответственностью "Ай Ти Ви групп" | System and method for synchronizing on time playback of data from different devices |
CN112581976B (en) * | 2019-09-29 | 2023-06-27 | 骅讯电子企业股份有限公司 | Singing scoring method and system based on streaming media |
CN111626049B (en) * | 2020-05-27 | 2022-12-16 | 深圳市雅阅科技有限公司 | Title correction method and device for multimedia information, electronic equipment and storage medium |
US11704151B2 (en) * | 2020-09-28 | 2023-07-18 | International Business Machines Corporation | Estimate and control execution time of a utility command |
CN112351303B (en) * | 2021-01-08 | 2021-03-26 | 全时云商务服务股份有限公司 | Video sharing method and system in network conference and readable storage medium |
JP7342918B2 (en) * | 2021-07-30 | 2023-09-12 | 株式会社リコー | Information processing device, text data editing method, communication system, program |
CN117632885B (en) * | 2024-01-25 | 2024-04-16 | 太平金融科技服务(上海)有限公司 | Resource synchronization method, device, equipment and medium in backtracking system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060190250A1 (en) * | 2001-04-26 | 2006-08-24 | Saindon Richard J | Systems and methods for automated audio transcription, translation, and transfer |
US7117231B2 (en) * | 2000-12-07 | 2006-10-03 | International Business Machines Corporation | Method and system for the automatic generation of multi-lingual synchronized sub-titles for audiovisual data |
US20070011012A1 (en) * | 2005-07-11 | 2007-01-11 | Steve Yurick | Method, system, and apparatus for facilitating captioning of multi-media content |
US7487086B2 (en) * | 2002-05-10 | 2009-02-03 | Nexidia Inc. | Transcript alignment |
US20100268534A1 (en) * | 2009-04-17 | 2010-10-21 | Microsoft Corporation | Transcription, archiving and threading of voice communications |
Family Cites Families (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5661665A (en) * | 1996-06-26 | 1997-08-26 | Microsoft Corporation | Multi-media synchronization |
US6094688A (en) * | 1997-01-08 | 2000-07-25 | Crossworlds Software, Inc. | Modular application collaboration including filtering at the source and proxy execution of compensating transactions to conserve server resources |
US7111009B1 (en) | 1997-03-14 | 2006-09-19 | Microsoft Corporation | Interactive playlist generation using annotations |
US6546405B2 (en) * | 1997-10-23 | 2003-04-08 | Microsoft Corporation | Annotating temporally-dimensioned multimedia content |
US6105055A (en) * | 1998-03-13 | 2000-08-15 | Siemens Corporate Research, Inc. | Method and apparatus for asynchronous multimedia collaboration |
US6321252B1 (en) * | 1998-07-17 | 2001-11-20 | International Business Machines Corporation | System and method for data streaming and synchronization in multimedia groupware applications |
AU5926099A (en) * | 1998-09-15 | 2000-04-03 | Microsoft Corporation | Annotation creation and notification via electronic mail |
US7035932B1 (en) * | 2000-10-27 | 2006-04-25 | Eric Morgan Dowling | Federated multiprotocol communication |
US7085842B2 (en) * | 2001-02-12 | 2006-08-01 | Open Text Corporation | Line navigation conferencing system |
US7617445B1 (en) * | 2001-03-16 | 2009-11-10 | Ftr Pty. Ltd. | Log note system for digitally recorded audio |
US7047296B1 (en) * | 2002-01-28 | 2006-05-16 | Witness Systems, Inc. | Method and system for selectively dedicating resources for recording data exchanged between entities attached to a network |
US7062712B2 (en) * | 2002-04-09 | 2006-06-13 | Fuji Xerox Co., Ltd. | Binding interactive multichannel digital document system |
AU2002309146A1 (en) * | 2002-06-14 | 2003-12-31 | Nokia Corporation | Enhanced error concealment for spatial audio |
US7613773B2 (en) * | 2002-12-31 | 2009-11-03 | Rensselaer Polytechnic Institute | Asynchronous network audio/visual collaboration system |
US7499046B1 (en) * | 2003-03-15 | 2009-03-03 | Oculus Info. Inc. | System and method for visualizing connected temporal and spatial information as an integrated visual representation on a user interface |
US20040260714A1 (en) * | 2003-06-20 | 2004-12-23 | Avijit Chatterjee | Universal annotation management system |
US20050125717A1 (en) * | 2003-10-29 | 2005-06-09 | Tsakhi Segal | System and method for off-line synchronized capturing and reviewing notes and presentations |
US20050165840A1 (en) * | 2004-01-28 | 2005-07-28 | Pratt Buell A. | Method and apparatus for improved access to a compacted motion picture asset archive |
US20070118794A1 (en) * | 2004-09-08 | 2007-05-24 | Josef Hollander | Shared annotation system and method |
US20060161621A1 (en) * | 2005-01-15 | 2006-07-20 | Outland Research, Llc | System, method and computer program product for collaboration and synchronization of media content on a plurality of media players |
US20080040151A1 (en) * | 2005-02-01 | 2008-02-14 | Moore James F | Uses of managed health care data |
US8856118B2 (en) * | 2005-10-31 | 2014-10-07 | Qwest Communications International Inc. | Creation and transmission of rich content media |
US20070160972A1 (en) * | 2006-01-11 | 2007-07-12 | Clark John J | System and methods for remote interactive sports instruction, analysis and collaboration |
WO2007141204A1 (en) * | 2006-06-02 | 2007-12-13 | Anoto Ab | System and method for recalling media |
FR2910211A1 (en) * | 2006-12-19 | 2008-06-20 | Canon Kk | METHODS AND DEVICES FOR RE-SYNCHRONIZING A DAMAGED VIDEO STREAM |
US20080193051A1 (en) * | 2007-02-12 | 2008-08-14 | Kabushiki Kaisha Toshiba | Image forming processing apparatus and method of processing image for the same |
US20080276159A1 (en) * | 2007-05-01 | 2008-11-06 | International Business Machines Corporation | Creating Annotated Recordings and Transcripts of Presentations Using a Mobile Device |
US20090132583A1 (en) * | 2007-11-16 | 2009-05-21 | Fuji Xerox Co., Ltd. | System and method for capturing, annotating, and linking media |
US20090138508A1 (en) * | 2007-11-28 | 2009-05-28 | Hebraic Heritage Christian School Of Theology, Inc | Network-based interactive media delivery system and methods |
US8340492B2 (en) * | 2007-12-17 | 2012-12-25 | General Instrument Corporation | Method and system for sharing annotations in a communication network |
US20090193345A1 (en) * | 2008-01-28 | 2009-07-30 | Apeer Inc. | Collaborative interface |
FR2931021A1 (en) * | 2008-05-09 | 2009-11-13 | Canon Kk | METHOD FOR SYNCHRONIZING A DATA STREAM TRANSMITTED ON A SYNCHRONOUS COMMUNICATION NETWORK, COMPUTER PROGRAM PRODUCT, STORAGE MEDIUM AND CORRESPONDING RECEIVER DEVICE. |
US8892553B2 (en) * | 2008-06-18 | 2014-11-18 | Microsoft Corporation | Auto-generation of events with annotation and indexing |
US8594290B2 (en) * | 2008-06-20 | 2013-11-26 | International Business Machines Corporation | Descriptive audio channel for use with multimedia conferencing |
US8655953B2 (en) * | 2008-07-18 | 2014-02-18 | Porto Technology, Llc | System and method for playback positioning of distributed media co-viewers |
US20100169906A1 (en) * | 2008-12-30 | 2010-07-01 | Microsoft Corporation | User-Annotated Video Markup |
US20100318520A1 (en) * | 2009-06-01 | 2010-12-16 | Telcordia Technologies, Inc. | System and method for processing commentary that is related to content |
US10423927B2 (en) * | 2009-08-07 | 2019-09-24 | Accenture Global Services Limited | Electronic process-enabled collaboration system |
US8707381B2 (en) * | 2009-09-22 | 2014-04-22 | Caption Colorado L.L.C. | Caption and/or metadata synchronization for replay of previously or simultaneously recorded live programs |
2010
- 2010-05-06 US US12/775,064 patent/US8438131B2/en not_active Expired - Fee Related
- 2010-09-02 US US12/875,088 patent/US20110110647A1/en not_active Abandoned
- 2010-09-30 US US12/894,557 patent/US20110112832A1/en not_active Abandoned
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7117231B2 (en) * | 2000-12-07 | 2006-10-03 | International Business Machines Corporation | Method and system for the automatic generation of multi-lingual synchronized sub-titles for audiovisual data |
US20060190250A1 (en) * | 2001-04-26 | 2006-08-24 | Saindon Richard J | Systems and methods for automated audio transcription, translation, and transfer |
US7487086B2 (en) * | 2002-05-10 | 2009-02-03 | Nexidia Inc. | Transcript alignment |
US20070011012A1 (en) * | 2005-07-11 | 2007-01-11 | Steve Yurick | Method, system, and apparatus for facilitating captioning of multi-media content |
US20100268534A1 (en) * | 2009-04-17 | 2010-10-21 | Microsoft Corporation | Transcription, archiving and threading of voice communications |
Cited By (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9009040B2 (en) * | 2010-05-05 | 2015-04-14 | Cisco Technology, Inc. | Training a transcription system |
US20110276325A1 (en) * | 2010-05-05 | 2011-11-10 | Cisco Technology, Inc. | Training A Transcription System |
US20120101817A1 (en) * | 2010-10-20 | 2012-04-26 | At&T Intellectual Property I, L.P. | System and method for generating models for use in automatic speech recognition |
US8571857B2 (en) * | 2010-10-20 | 2013-10-29 | At&T Intellectual Property I, L.P. | System and method for generating models for use in automatic speech recognition |
US20120151309A1 (en) * | 2010-12-14 | 2012-06-14 | International Business Machines Corporation | Template application error detection |
US20120151496A1 (en) * | 2010-12-14 | 2012-06-14 | Kabushiki Kaisha Square Enix (Also Trading As Square Enix Co., Ltd.) | Language for task-based parallel programming |
US9495348B2 (en) * | 2010-12-14 | 2016-11-15 | International Business Machines Corporation | Template application error detection |
US8793692B2 (en) * | 2010-12-14 | 2014-07-29 | Kabushiki Kaisha Square Enix | Language for task-based parallel programming |
US20120245936A1 (en) * | 2011-03-25 | 2012-09-27 | Bryan Treglia | Device to Capture and Temporally Synchronize Aspects of a Conversation and Method and System Thereof |
US11997340B2 (en) | 2012-04-27 | 2024-05-28 | Comcast Cable Communications, Llc | Topical content searching |
US10185711B1 (en) * | 2012-09-10 | 2019-01-22 | Google Llc | Speech recognition and summarization |
US10679005B2 (en) | 2012-09-10 | 2020-06-09 | Google Llc | Speech recognition and summarization |
US20140149599A1 (en) * | 2012-11-29 | 2014-05-29 | Ricoh Co., Ltd. | Unified Application Programming Interface for Communicating with Devices and Their Clouds |
US10348661B2 (en) * | 2012-11-29 | 2019-07-09 | Ricoh Company, Ltd. | Unified server for managing a heterogeneous mix of devices |
US20150172237A1 (en) * | 2012-11-29 | 2015-06-18 | Ricoh Co., Ltd. | Unified Application Programming Interface for Communicating with Devices and Their Clouds |
US8972416B1 (en) * | 2012-11-29 | 2015-03-03 | Amazon Technologies, Inc. | Management of content items |
US9954802B2 (en) * | 2012-11-29 | 2018-04-24 | Ricoh Company, Ltd. | Unified application programming interface for communicating with devices and their clouds |
US9363214B2 (en) * | 2012-11-29 | 2016-06-07 | Ricoh Company, Ltd. | Network appliance architecture for unified communication services |
EP2860944B1 (en) * | 2012-11-29 | 2018-08-29 | Ricoh Company, Ltd. | Network appliance architecture for unified communication services |
US9444774B2 (en) * | 2012-11-29 | 2016-09-13 | Ricoh Company, Ltd. | Smart calendar for scheduling and controlling collaboration devices |
US20140149554A1 (en) * | 2012-11-29 | 2014-05-29 | Ricoh Co., Ltd. | Unified Server for Managing a Heterogeneous Mix of Devices |
US20140149592A1 (en) * | 2012-11-29 | 2014-05-29 | Ricoh Co., Ltd. | Network Appliance Architecture for Unified Communication Services |
US20150149478A1 (en) * | 2012-11-29 | 2015-05-28 | Ricoh Co., Ltd. | Unified Server for Managing a Heterogeneous Mix of Devices |
US20140149771A1 (en) * | 2012-11-29 | 2014-05-29 | Ricoh Co., Ltd. | Smart Calendar for Scheduling and Controlling Collaboration Devices |
US20160224554A1 (en) * | 2013-09-12 | 2016-08-04 | Beijing Zhigu Rui Tuo Tech Co., Ltd | Search methods, servers, and systems |
US11080322B2 (en) * | 2013-09-12 | 2021-08-03 | Beijing Zhigu Rui Tuo Tech Co., Ltd | Search methods, servers, and systems |
US9892194B2 (en) * | 2014-04-04 | 2018-02-13 | Fujitsu Limited | Topic identification in lecture videos |
US20150286718A1 (en) * | 2014-04-04 | 2015-10-08 | Fujitsu Limited | Topic identification in lecture videos |
US20160150009A1 (en) * | 2014-11-25 | 2016-05-26 | Microsoft Technology Licensing, Llc | Actionable souvenir from real-time sharing |
US11562733B2 (en) * | 2014-12-15 | 2023-01-24 | Baidu Usa Llc | Deep learning models for speech recognition |
US20190371298A1 (en) * | 2014-12-15 | 2019-12-05 | Baidu Usa Llc | Deep learning models for speech recognition |
US9892095B2 (en) | 2015-06-19 | 2018-02-13 | International Business Machines Corporation | Reconciliation of transcripts |
US9886423B2 (en) | 2015-06-19 | 2018-02-06 | International Business Machines Corporation | Reconciliation of transcripts |
US10089061B2 (en) * | 2015-08-28 | 2018-10-02 | Kabushiki Kaisha Toshiba | Electronic device and method |
US20170061987A1 (en) * | 2015-08-28 | 2017-03-02 | Kabushiki Kaisha Toshiba | Electronic device and method |
US10770077B2 (en) | 2015-09-14 | 2020-09-08 | Toshiba Client Solutions CO., LTD. | Electronic device and method |
US10754508B2 (en) | 2016-01-28 | 2020-08-25 | Microsoft Technology Licensing, Llc | Table of contents in a presentation program |
US10665267B2 (en) | 2017-03-01 | 2020-05-26 | International Business Machines Corporation | Correlation of recorded video presentations and associated slides |
US10347300B2 (en) | 2017-03-01 | 2019-07-09 | International Business Machines Corporation | Correlation of recorded video presentations and associated slides |
US11132108B2 (en) * | 2017-10-26 | 2021-09-28 | International Business Machines Corporation | Dynamic system and method for content and topic based synchronization during presentations |
US20190129591A1 (en) * | 2017-10-26 | 2019-05-02 | International Business Machines Corporation | Dynamic system and method for content and topic based synchronization during presentations |
US10580410B2 (en) * | 2018-04-27 | 2020-03-03 | Sorenson Ip Holdings, Llc | Transcription of communications |
WO2021051024A1 (en) * | 2019-09-11 | 2021-03-18 | Educational Vision Technologies, Inc. | Editable notetaking resource with optional overlay |
US11367445B2 (en) * | 2020-02-05 | 2022-06-21 | Citrix Systems, Inc. | Virtualized speech in a distributed network environment |
CN111564157A (en) * | 2020-03-18 | 2020-08-21 | 浙江省北大信息技术高等研究院 | Conference record optimization method, device, equipment and storage medium |
US20220148583A1 (en) * | 2020-11-12 | 2022-05-12 | International Business Machines Corporation | Intelligent media transcription |
US12033619B2 (en) * | 2020-11-12 | 2024-07-09 | International Business Machines Corporation | Intelligent media transcription |
US11972759B2 (en) * | 2020-12-02 | 2024-04-30 | International Business Machines Corporation | Audio mistranscription mitigation |
WO2023154351A3 (en) * | 2022-02-08 | 2023-09-21 | My Job Matcher, Inc. D/B/A Job.Com | Apparatus and method for automated video record generation |
Also Published As
Publication number | Publication date |
---|---|
US20110110647A1 (en) | 2011-05-12 |
US20110113011A1 (en) | 2011-05-12 |
US8438131B2 (en) | 2013-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20110112832A1 (en) | Auto-transcription by cross-referencing synchronized media resources | |
US20240029025A1 (en) | Computer-based method and system of analyzing, editing and improving content | |
US12014737B2 (en) | Systems, methods, and computer-readable storage device for generating notes for a meeting based on participant actions and machine learning | |
US10629189B2 (en) | Automatic note taking within a virtual meeting | |
US9569428B2 (en) | Providing an electronic summary of source content | |
US10685668B2 (en) | System and method of improving communication in a speech communication system | |
US20150066935A1 (en) | Crowdsourcing and consolidating user notes taken in a virtual meeting | |
US10613825B2 (en) | Providing electronic text recommendations to a user based on what is discussed during a meeting | |
US20200403816A1 (en) | Utilizing volume-based speaker attribution to associate meeting attendees with digital meeting content | |
CN113574555A (en) | Intelligent summarization based on context analysis of auto-learning and user input | |
US11665010B2 (en) | Intelligent meeting recording using artificial intelligence algorithms | |
US10990828B2 (en) | Key frame extraction, recording, and navigation in collaborative video presentations | |
US11403596B2 (en) | Integrated framework for managing human interactions | |
US20240430117A1 (en) | Techniques for inferring context for an online meeting | |
CN113111658A (en) | Method, device, equipment and storage medium for checking information | |
US20160342639A1 (en) | Methods and systems for generating specialized indexes of recorded meetings | |
Aichroth et al. | Mico-media in context | |
US20240037316A1 (en) | Automatically summarizing event-related data using artificial intelligence techniques | |
CN113852835A (en) | Live audio processing method, device, electronic device and storage medium | |
US20250029612A1 (en) | Guiding transcript generation using detected section types as part of automatic speech recognition | |
US20240428011A1 (en) | System and method for natural language based command recognition | |
CN116052659A (en) | Information processing method and device in conference scene, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: ALTUS LEARNING SYSTEMS, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PROROCK, MICHAEL F.;PROROCK, THOMAS J.;SIGNING DATES FROM 20100927 TO 20100929;REEL/FRAME:025073/0130 |
| AS | Assignment | Owner name: ALTUS365, INC., CALIFORNIA; Free format text: CHANGE OF NAME;ASSIGNOR:ALTUS LEARNING SYSTEMS, INC.;REEL/FRAME:029380/0145; Effective date: 20110718 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |