CN116501919A - Prompting method, device, equipment and storage medium - Google Patents
- Publication number
- CN116501919A (application number CN202310552144.7A)
- Authority
- CN
- China
- Legal status (as listed; not a legal conclusion)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/74—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Human Computer Interaction (AREA)
- Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)
Abstract
The application discloses a prompting method, device, equipment and storage medium. The method includes: acquiring historical audio data of a target object; determining a speech rate text of the target object according to the historical audio data; acquiring real-time audio data of the target object; determining position information of the text corresponding to the real-time audio data within the speech rate text of the target object; and determining the content to be prompted according to the position information and the speech rate text of the target object, and displaying that content on a teleprompter display screen. With this technical solution, what a speaker or singer is currently saying or singing can be quickly located in pre-recorded audio and the content to be prompted determined from it. This makes teleprompter devices more convenient to use and accurately provides the speaker or singer with personalized prompt text and pacing for any content, based on the pre-recorded audio.
Description
Technical Field
The embodiments of the present application relate to the technical field of audio processing, and in particular to a prompting method, device, equipment and storage medium.
Background
Prompting is a common technique in everyday life. Singing karaoke, for example, requires lyric prompts, and speeches require prompts as well. A teleprompter for speeches is usually a display: a high-brightness display device shows the script and reflects it onto a special coated glass set at a 45-degree angle in front of the camera lens, so that the speaker can read the lines while still facing the camera. Because the teleprompter sits on the same axis as the speaker, the camera and the tripod, the speaker appears to face the audience at all times, which creates a sense of closeness and improves the quality of the speech. Every person has a distinctive speaking rate, yet for both speeches and singing, current teleprompters cannot provide a personalized prompting service.
Summary
In view of this, an embodiment of the present application provides a prompting method, including:
acquiring historical audio data of a target object;
determining a speech rate text of the target object according to the historical audio data;
acquiring real-time audio data of the target object;
determining position information of the text corresponding to the real-time audio data within the speech rate text of the target object; and
determining the content to be prompted according to the position information and the speech rate text of the target object, and displaying the content to be prompted on a teleprompter display screen.
An embodiment of the present application further provides a prompting device, which includes:
a first acquisition module, configured to acquire historical audio data of a target object;
a first determination module, configured to determine a speech rate text of the target object according to the historical audio data;
a second acquisition module, configured to acquire real-time audio data of the target object;
a second determination module, configured to determine position information of the text corresponding to the real-time audio data within the speech rate text of the target object; and
an acquisition and prompting module, configured to determine the content to be prompted according to the position information and the speech rate text of the target object, and to display that content on the teleprompter display screen.
An embodiment of the present application further provides an electronic device, which includes:
at least one processor; and
a memory communicatively connected to the at least one processor, wherein
the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the prompting method described in any embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, cause the processor to implement the prompting method described in any embodiment of the present application.
It should be understood that this section is not intended to identify key or essential features of the embodiments of the present application, nor to limit the scope of the present application. Other features of the present application will become readily understood from the following description.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting the scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
FIG. 1 is a flowchart of a prompting method in an embodiment of the present application;
FIG. 2 is a schematic diagram of a prompting method in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a prompting device in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device implementing the prompting method of an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion: for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units not expressly listed or inherent to such a process, method, product or device.
Embodiment 1
A teleprompter is a professional device that shows lines to a speaker or actor during a speech or performance. It is generally placed in front of the speaker or actor, but the prompt content must be entered manually, and during use a dedicated operator must control the display order by hand behind the scenes. Teleprompters are also bulky and hard to deploy, and on some important occasions, such as reporting to a superior, setting one up is inconvenient. A related existing technology is subtitling. The difference is that subtitling takes sound as the input and outputs the corresponding text in order to inform viewers, whereas prompting aims to help the speaker or singer master the content and rhythm of the speech or song, focusing on cueing the speaking rate and intonation through text fade-out and highlighted positions. This function is essential in speaking and singing scenarios.
Embodiments of the present application provide a prompting method, device, equipment and storage medium that quickly locate what a speaker or singer is currently saying or singing within pre-recorded audio and determine the content to be prompted from it. This makes teleprompter devices more convenient and can accurately provide the speaker or singer with personalized prompt text and pacing for any content.
FIG. 1 is a flowchart of a prompting method in an embodiment of the present application. This embodiment is applicable to prompting scenarios. The method may be performed by the prompting device in the embodiments of the present application, which may be implemented in software and/or hardware. As shown in FIG. 1, the method specifically includes the following steps:
S101. Acquire historical audio data of a target object.
In this embodiment, the target object may be a user such as a speaker or a singer, and there may be one or more target objects, which is not limited in this embodiment.
The historical audio data may be pre-recorded audio file data of the target object. For example, it may be audio recorded while a speaker recites content such as poetry, or while a singer performs songs, opera or other folk art pieces.
Specifically, in one embodiment, historical audio data of the target object speaking or singing may be recorded in advance with any recording device (for example, studio recording equipment or a terminal such as a smartphone). The recorded historical audio data may also be stored in an audio storage device, and the pre-recorded historical audio data can then be obtained from either the recording device or the audio storage device.
S102. Determine the speech rate text of the target object according to the historical audio data.
It should be noted that the speech rate text may be text extracted from the audio data that reflects the target object's speaking rate while speaking or singing.
Text can be extracted from audio data using any speech technology. In one embodiment, AI speech recognition (for example, a hybrid deep neural network and hidden Markov model, DNN-HMM, recognizer) may be applied to the historical audio data, while AI grammatical and semantic analysis transcribes the corresponding speech rate text. This makes the prompt content easier to extract and reduces manual preparation time.
Specifically, in one embodiment, after the pre-recorded historical audio data is obtained from the recording device or audio storage device, the corresponding speech rate text is transcribed through AI grammatical and semantic recognition and recorded in an lrc file (lrc is short for "lyric" and is used as the extension of lyrics files).
For example, suppose a segment of historical audio contains the two lines of poetry 白日依山尽，黄河入海流 ("The white sun sets behind the mountains; the Yellow River flows into the sea"). The speech rate text in the resulting lrc file may be formatted as follows:
[00:00.000]<1050>白<2300>日<800>依<500>山<2800>尽,
[00:07.450]<1500>黄<1200>河<1800>入<750>海<530>流。
Here, [00:00.000] may be the start time of the first line, 白日依山尽, and [00:07.450] the start time of the second line, 黄河入海流; the number before each character may be that character's duration in milliseconds. For example, <1050> means 白 lasts 1.050 seconds, <2300> means 日 lasts 2.300 seconds, and so on. The durations in the first line sum to 1050 + 2300 + 800 + 500 + 2800 = 7450 ms, which matches the second line's start time of 00:07.450.
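To make the format concrete, here is a minimal Python sketch of a parser for one line of this extended lrc format. It assumes the number before each character is its duration in milliseconds and that each character starts when the previous one ends, which the sample above is consistent with. The function name `parse_lrc_line` and the punctuation stripping are illustrative choices, not part of the application.

```python
import re
from typing import List, Tuple

# Line start time, e.g. "[00:07.450]" -> minutes, seconds, fraction
LINE_TS = re.compile(r"^\[(\d+):(\d+)\.(\d+)\]")
# One timed unit, e.g. "<1050>白" -> ("1050", "白")
CHAR_UNIT = re.compile(r"<(\d+)>([^<]+)")
# Punctuation that may be attached to a character, e.g. "尽," or "流。"
PUNCT = "，,。.！!？?；;"


def parse_lrc_line(line: str) -> List[Tuple[str, int, int]]:
    """Return (character, start_ms, duration_ms) for each timed unit of one line."""
    m = LINE_TS.match(line)
    if m is None:
        raise ValueError(f"missing [mm:ss.xxx] timestamp: {line!r}")
    minutes, seconds, frac = m.groups()
    # Treat the fractional part as milliseconds ("45" -> 450, "450" -> 450).
    start = (int(minutes) * 60 + int(seconds)) * 1000 + int(frac.ljust(3, "0")[:3])
    units = []
    for dur, text in CHAR_UNIT.findall(line[m.end():]):
        units.append((text.strip(PUNCT), start, int(dur)))
        start += int(dur)  # assumption: the next character starts when this one ends
    return units


for sample in ["[00:00.000]<1050>白<2300>日<800>依<500>山<2800>尽,",
               "[00:07.450]<1500>黄<1200>河<1800>入<750>海<530>流。"]:
    print(parse_lrc_line(sample))
```

For the first sample line, the last unit is ('尽', 4650, 2800), which ends at 7450 ms, exactly when the second line begins.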
In practice, the lrc file may instead express timing as time periods divided into intervals, and its content may also be a song, a speech script, a drama and so on.
When generating the lrc file, it is necessary not only to recognize the spoken content from the historical audio data but also to separate out the duration of each character, so that during the speech the prompt text can change color or be highlighted to show the user the progress of the speech.
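The per-character highlighting described above could work roughly as follows. This is a sketch under assumptions: the (character, start, duration) tuples are hard-coded from the first sample line, elapsed time is measured from the start of the recording, and square brackets stand in for the color change or highlight that a real display would apply. A binary search over start times keeps the lookup cheap for long scripts.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (character, start_ms, duration_ms) for the first sample line of the lrc example
UNITS: List[Tuple[str, int, int]] = [
    ("白", 0, 1050), ("日", 1050, 2300), ("依", 3350, 800),
    ("山", 4150, 500), ("尽", 4650, 2800),
]


def current_index(units: List[Tuple[str, int, int]], elapsed_ms: int) -> Optional[int]:
    """Index of the character being spoken at `elapsed_ms`, or None if none is."""
    starts = [start for _, start, _ in units]
    i = bisect_right(starts, elapsed_ms) - 1  # last character starting at or before elapsed_ms
    if i < 0:
        return None
    _, start, duration = units[i]
    return i if elapsed_ms < start + duration else None


def render(units: List[Tuple[str, int, int]], elapsed_ms: int) -> str:
    """Bracket the current character; a real display would recolor or highlight it."""
    chars = [c for c, _, _ in units]
    i = current_index(units, elapsed_ms)
    if i is not None:
        chars[i] = f"[{chars[i]}]"
    return "".join(chars)


print(render(UNITS, 3400))  # 白日[依]山尽
```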
S103. Acquire real-time audio data of the target object.
In this embodiment of the present application, real-time audio data is the counterpart of historical audio data: it may be audio data of the target object recorded in real time. For example, it may be audio captured while a speaker recites content such as poetry, or while a singer performs songs, opera or other folk art pieces.
Specifically, in one embodiment, real-time audio data may be captured while the target object is speaking or singing, for example by AR glasses worn by the target object or by another device with a microphone, and obtained from that device.
S104. Determine the position information of the text corresponding to the real-time audio data within the speech rate text of the target object.
In various embodiments, the text corresponding to the target object's real-time audio data may be compared with the target object's speech rate text to obtain the position of the former within the latter.
Specifically, in one embodiment, the recording device or audio storage device synchronizes the target object's speech rate text to the AR glasses worn by the target object, and the AR glasses compare the text corresponding to the real-time audio data with the speech rate text to find its position.
For example, if the target object's speech rate text is <1050>白<2300>日<800>依<500>山<2800>尽 and the text corresponding to the real-time audio data is 白日 ("white sun"), the position information may be the positions of the first two characters.
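A minimal sketch of this comparison, assuming the speech rate text has already been reduced to its characters (timing tags and punctuation removed) and that an exact substring match is sufficient; recognition errors are not handled here. The optional `from_index` parameter is a hypothetical design choice so that a phrase occurring more than once resolves to the next occurrence after the previously located position.

```python
from typing import Optional, Tuple


def locate(script: str, heard: str, from_index: int = 0) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of `heard` in `script`, or None.

    Searching forward from `from_index` keeps repeated phrases from jumping back
    to an earlier occurrence once the speaker has moved past it."""
    start = script.find(heard, from_index)
    if start == -1:
        return None
    return start, start + len(heard)


script = "白日依山尽黄河入海流"  # characters of the sample speech rate text
print(locate(script, "白日"))    # (0, 2): the first two characters
```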
S105. Determine the content to be prompted according to the position information and the speech rate text of the target object, and display it on the teleprompter display screen.
The content to be prompted is the content to be shown on the teleprompter display screen.
In one embodiment of the present application, the teleprompter display may be a wearable device worn by the target object, such as AR glasses.
AR (augmented reality) is a technology that calculates the position and angle of camera images in real time and adds corresponding images, videos and 3D models. It captures information about the real environment through a camera, superimposes virtual objects (such as images, scenes or system prompts) on that information, and presents the result to the user, so that the user perceives the virtual objects as existing in the real environment; that is, the real scene is "augmented".
Specifically, in one embodiment, a wearable device worn by the target object, such as AR glasses, determines the content to be prompted from the position information and the speech rate text. For example, the speech rate text may be 白日依山尽，黄河入海流 with timing (speech rate) information, and the position information may be that of the first two characters, 白日. The content to be prompted is then what follows 白日, for example 依, 依山尽 or 依山尽，黄河入海流; the number of prompt characters can be preset by the user and is not limited in this embodiment. Once determined, the content is shown on the display of the AR glasses, which accurately provides the speaker or singer with prompts for any content and makes the teleprompter device more convenient. Because the prompts are projected onto the inner screen of the AR glasses and cannot be seen from outside, the speaker's privacy is also protected.
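Selecting the prompt from the located position can be sketched as below. The parameter `count` stands for the user-preset number of prompt characters; treating punctuation as free (it is shown but not counted, and trimmed at the edges) is an assumption, chosen so that the three examples in the text (依, 依山尽, 依山尽，黄河入海流) correspond to counts of 1, 3 and 8.

```python
# Punctuation is passed through but does not count toward the user's character limit.
PUNCT = "，,。.！!？?；;"


def next_prompt(script: str, end: int, count: int) -> str:
    """Return the text after offset `end` holding up to `count` non-punctuation characters.

    Leading and trailing punctuation is trimmed from the result."""
    out, taken = [], 0
    for ch in script[end:]:
        if ch not in PUNCT:
            if taken == count:
                break
            taken += 1
        out.append(ch)
    return "".join(out).strip(PUNCT)


script = "白日依山尽，黄河入海流"
for n in (1, 3, 8):
    print(next_prompt(script, 2, n))  # offset 2 is just after 白日
```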
In practice, there may be several target objects, that is, several users speaking or singing at the same time. In one embodiment, the voices of all target objects may be recorded as one piece of audio data and shown on every target object's teleprompter display, so that each user can see everyone's speech or singing content at once (each user's text can be distinguished by a different color or other effect); a user who only wants to see their own part can choose to display only that part. In another embodiment, each target object's own speech or singing content is made into a separate piece of audio data and shown on that target object's own teleprompter display.
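For the shared-recording case, one way to produce each user's view is sketched below. The application does not say how lines are attributed to speakers, so the `speaker` tag on each line, the `PromptLine` type and the color palette are all hypothetical. Sorting speaker IDs gives every display the same color assignment.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical palette; the application only says users may be told apart by color or other effects.
PALETTE = ["yellow", "cyan", "magenta", "green"]


@dataclass
class PromptLine:
    speaker: str   # hypothetical speaker tag attached to each line of the merged audio
    start_ms: int
    text: str


def view_for(lines: List[PromptLine], viewer: str, only_own: bool = False) -> List[Tuple[str, str]]:
    """Return (text, color) pairs to show on `viewer`'s display, in time order."""
    speakers = sorted({line.speaker for line in lines})
    color: Dict[str, str] = {s: PALETTE[i % len(PALETTE)] for i, s in enumerate(speakers)}
    shown = [line for line in sorted(lines, key=lambda ln: ln.start_ms)
             if not only_own or line.speaker == viewer]
    return [(line.text, color[line.speaker]) for line in shown]


lines = [PromptLine("B", 7450, "黄河入海流"), PromptLine("A", 0, "白日依山尽")]
print(view_for(lines, "A"))                 # both lines: A in yellow, B in cyan
print(view_for(lines, "A", only_own=True))  # only A's line
```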
In the embodiments of the present application, historical audio data of a target object is acquired; the speech rate text of the target object is determined from it; real-time audio data of the target object is acquired; the position information of the text corresponding to the real-time audio data within the speech rate text is determined; and the content to be prompted is determined from the position information and the speech rate text and shown on the teleprompter display screen. With this technical solution, what a speaker or singer is currently saying or singing can be quickly located in pre-recorded audio and the content to be prompted determined from it, making teleprompter devices more convenient and accurately providing personalized prompt text and pacing for any content based on the pre-recorded audio.
In some embodiments, determining the content to be prompted according to the position information and the speech rate text of the target object includes:
acquiring the original text corresponding to the historical audio data.
It should be noted that the original text is the accurate text that the speaker or singer delivered when the historical audio data was recorded, free of wrong, extra or missing characters.
For example, the original text corresponding to the historical audio data may be the text of a speaker's script or a singer's lyrics.
The content to be prompted is then determined according to the position information, the original text and the speech rate text of the target object.
In one embodiment, speech, grammatical and semantic recognition of the historical and/or real-time audio data may be inaccurate, producing wrong, extra or missing characters. For example, the original text may be 白日依山尽，黄河入海流, but recognition may yield 百日依山尽，黄河入海流, with 百 ("hundred") in place of 白 ("white"). In this case, the most suitable position in the original text can be found from the recognition result of the real-time audio data in order to determine the content to be prompted.
Specifically, the content to be prompted is determined according to the position information of the text corresponding to the real-time audio data within the speech rate text, the original text, and the speech rate text of the target object.
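Finding "the most suitable position" despite recognition errors could be approximated with Python's standard `difflib.SequenceMatcher`, as in this sketch. This alignment technique is a swapped-in assumption, not necessarily what the application uses: the matched blocks between the original text and the recognized text give an estimate of where the speaker currently is, and `min_ratio` (a hypothetical threshold) rejects matches with too little agreement.

```python
from difflib import SequenceMatcher
from typing import Optional


def fuzzy_end(original: str, heard: str, min_ratio: float = 0.5) -> Optional[int]:
    """Estimate the offset in `original` just after the last character of `heard`,
    tolerating recognition errors such as wrong or missing characters."""
    if not heard:
        return None
    matcher = SequenceMatcher(None, original, heard, autojunk=False)
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    matched = sum(b.size for b in blocks)
    if not blocks or matched / len(heard) < min_ratio:
        return None  # too little agreement to trust a position
    last = blocks[-1]
    # Recognized characters after the last matched block are assumed to map one-to-one.
    trailing = len(heard) - (last.b + last.size)
    return min(last.a + last.size + trailing, len(original))


original = "白日依山尽，黄河入海流"
end = fuzzy_end(original, "百日")  # 百 misrecognized for 白
print(end, original[end:])          # prompt continues from 依
```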
In some embodiments, determining the speech rate text of the target object according to the historical audio data includes:
acquiring audio amplitude information and audio frequency information of the historical audio data.
Specifically, the amplitude and frequency information of the audio signal in the pre-recorded historical audio data is obtained.
The audio amplitude information and audio frequency information are recognized to obtain the text and speech rate information corresponding to the historical audio data.
In one embodiment, AI speech recognition (for example, a DNN-HMM recognizer) may be applied to the historical audio data, while AI grammatical and semantic analysis transcribes the corresponding text and speech rate information. This makes prompt content easier to extract and reduces manual preparation time. The technical solution is, however, not limited to AI speech, grammatical and semantic recognition.
In one embodiment, the pre-recorded historical audio data is obtained, and sentence breaks, word boundaries and their corresponding times are identified from changes in the amplitude and frequency of the audio signal and recorded in an lrc file. The duration of each sentence, and even of each character or word, is detected by analyzing how the amplitude of the human voice changes in the historical audio data.
The speech rate text of the target object is determined according to the text and speech rate information corresponding to the historical audio data.
Specifically, the speech rate text of the target object can be determined from the text, sentence breaks, word boundaries and corresponding times obtained from the historical audio data.
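The amplitude-based detection of durations could be approximated with short-time RMS energy, as in the following sketch. It is a simplified stand-in that uses amplitude only (not frequency); the frame size, energy threshold and minimum-gap values are hypothetical, and samples are assumed to be floats normalized to [-1, 1]. Pauses shorter than `min_gap_ms` are bridged so that brief gaps do not split a segment; lowering it lets segments approach word or character boundaries, while raising it yields sentence-level segments.

```python
import math
from typing import List, Tuple


def voiced_segments(samples: List[float], sample_rate: int, frame_ms: int = 10,
                    threshold: float = 0.02, min_gap_ms: int = 50) -> List[Tuple[int, int]]:
    """Split audio into voiced segments by short-time RMS energy.

    Returns (start_ms, duration_ms) pairs."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    voiced = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        voiced.append(rms >= threshold)

    segments: List[Tuple[int, int]] = []
    start = None
    for idx, is_voiced in enumerate(voiced + [False]):  # sentinel closes a trailing segment
        if is_voiced and start is None:
            start = idx
        elif not is_voiced and start is not None:
            s_ms, e_ms = start * frame_ms, idx * frame_ms
            if segments and s_ms - (segments[-1][0] + segments[-1][1]) < min_gap_ms:
                # Short pause: extend the previous segment instead of starting a new one.
                segments[-1] = (segments[-1][0], e_ms - segments[-1][0])
            else:
                segments.append((s_ms, e_ms - s_ms))
            start = None
    return segments


# Synthetic 1 kHz signal: silence, 200 ms voice, 30 ms pause, 100 ms voice, 200 ms pause, 150 ms voice.
tone = [0.5, -0.5]
samples = [0.0] * 100 + tone * 100 + [0.0] * 30 + tone * 50 + [0.0] * 200 + tone * 75
print(voiced_segments(samples, 1000))
```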
在一些实施例中,语速信息包括:每个文字对应的持续时间、每个词对应的持续时间、每句话对应的持续时间以及每句话对应的起始时间中的至少一种。In some embodiments, the speech rate information includes: at least one of duration corresponding to each character, duration corresponding to each word, duration corresponding to each sentence, and start time corresponding to each sentence.
Exemplarily, the interval between two sentences in the historical audio data may be determined from the start time and duration of each sentence. Specifically, a sentence ends at its start time plus its duration. The interval between that sentence and the next one is the next sentence's start time minus this end time. In one embodiment, the prompt for the interval between two sentences may be preset. One example is a progress bar made of several dots that gradually change color or size as time passes. In another embodiment, the interval may be prompted in any other way that indicates elapsed time, which is not limited in this embodiment.
The method proceeds in four steps:
1. Obtain the prompt mode for each inter-sentence interval.
2. Determine the prompt mode for each character in the historical audio data from that character's duration.
3. Determine the prompt mode for each sentence from the prompt modes of its characters.
4. Determine the prompt mode for the historical audio data as a whole from the prompt modes of the inter-sentence intervals and of the sentences.
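The interval arithmetic above can be sketched as follows. This minimal sketch assumes that each sentence is stored with its start time and duration in seconds. The `Sentence` class and `sentence_gaps` function are hypothetical names and do not come from the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    start: float     # seconds from the beginning of the recording
    duration: float  # seconds

    @property
    def end(self) -> float:
        # End time = start time + duration, as described in the text.
        return self.start + self.duration


def sentence_gaps(sentences: List[Sentence]) -> List[float]:
    """Interval between each sentence's end and the next sentence's start."""
    ordered = sorted(sentences, key=lambda s: s.start)
    return [round(nxt.start - cur.end, 3) for cur, nxt in zip(ordered, ordered[1:])]
```

Each returned gap is the period during which the interval prompt (for example the dot progress bar) would be shown.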
Exemplarily, in one embodiment, prelude music or silence may come before the first sentence of a speech or song. The start time of the historical audio data is obtained. A lead-in interval is then determined from the start time of the historical audio data and the start time of the speech rate text. The prompt for this interval may be preset, for example a progress bar made of several dots that gradually change color or size as time passes. It may also be any other prompt that indicates elapsed time, which is not limited in this embodiment. The prompt mode for the content to be prompted is determined from three things: the prompt for the lead-in interval, the prompt for each sentence in the historical audio data, and the prompt for each interval between two sentences.
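A minimal sketch of the lead-in interval and a dot-style progress prompt follows. The dot count, the filled and empty glyphs, and the rule that fills dots in proportion to elapsed time are all illustrative assumptions. The text only says that the dots gradually change color or size.

```python
def lead_in_interval(audio_start: float, first_sentence_start: float) -> float:
    """Time between the start of the recording and the first spoken sentence (never negative)."""
    return max(0.0, first_sentence_start - audio_start)


def progress_dots(elapsed: float, total: float, dots: int = 4) -> str:
    """Hypothetical dot prompt: filled dots show how much of the interval has passed."""
    if total <= 0:
        return "●" * dots
    fraction = min(max(elapsed / total, 0.0), 1.0)
    filled = int(fraction * dots)
    return "●" * filled + "○" * (dots - filled)
```

The same `progress_dots` helper could be reused for the inter-sentence gaps, since both are simply countdowns over a known duration.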
In some embodiments, determining the content to be prompted from the position information, the original text and the speech rate text of the target object includes:
Determining the text to be pronounced based on the position information and the original text.
It should be noted that the text to be pronounced is the text that the target object, such as a speaker or singer, is about to pronounce while speaking or singing.
In one embodiment, the original text may be "白日依山尽,黄河入海流" ("The white sun sets behind the mountains; the Yellow River flows into the sea"). The target object, such as a speaker or singer, has just pronounced "白日" ("white sun"), and its position information is the first two characters of the original text. The text to be pronounced is then the text that follows "白日", for example "依", "依山尽", or "依山尽,黄河入海流". The user may preset the number of characters to prompt, which is not limited in this embodiment.
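A minimal sketch of selecting the text to be pronounced follows. It assumes that the position information is reduced to a character index just past the last spoken character, and that the user-configured prompt length is a character count. `upcoming_text` is a hypothetical helper.

```python
from typing import Optional


def upcoming_text(original: str, position: int, count: Optional[int] = None) -> str:
    """Text after `position` (index just past the last spoken character).

    `count` is the hypothetical user-configured number of characters to show;
    None shows the rest of the original text.
    """
    if not 0 <= position <= len(original):
        raise ValueError("position outside the original text")
    rest = original[position:]
    return rest if count is None else rest[:count]
```

With the example from the text, `upcoming_text("白日依山尽,黄河入海流", 2, 3)` yields "依山尽".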
Determining the prompt mode of the text to be pronounced according to the speech rate text of the target object.
Specifically, once the text to be pronounced has been determined, its prompt mode can be determined according to the speech rate text of the target object.
In some embodiments, the prompt mode indicates the text to be pronounced with at least one of a cursor, highlighting, and a marquee.
In one embodiment, the text to be pronounced may be prompted with a cursor. In another embodiment, it may be prompted with scrolling highlighting. It may also be prompted with a marquee, a change in stroke weight, and so on, which is not limited in this embodiment.
In some embodiments, determining the prompt mode of the text to be pronounced according to the speech rate text of the target object includes:
Determining, according to the speech rate text of the target object, the duration of each character in the text to be pronounced and the interval between adjacent characters.
In one embodiment, the speech rate text of the target object is obtained when the historical audio data is recognized. The speech rate text contains the duration of each character and the interval between adjacent characters. These values can therefore be read directly from it for the text to be pronounced.
Determining the prompt mode of the text to be pronounced according to the duration of each character and the interval between adjacent characters.
In one embodiment, the duration of each character and the interval between adjacent characters determine how the text to be pronounced is prompted on the teleprompter display. In practice, characters in the historical audio data may have different durations, so each character of the text to be pronounced may also be prompted differently. Take scrolling highlighting as an example: each character may stay highlighted for a different length of time, such as a highlight duration equal to that character's recorded duration. Finally, the prompt mode for the text to be pronounced as a whole is determined from the prompt modes of its individual characters.
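The per-character highlighting described above can be sketched as a simple timeline. The sketch assumes that each character's highlight lasts exactly its recorded duration and is followed by the recorded gap before the next character, which matches the example in the text. The function names and the `Cue` tuple are hypothetical.

```python
from typing import List, Optional, Tuple

Cue = Tuple[str, float, float]  # (character, highlight on, highlight off) in seconds


def highlight_schedule(chars: str, durations: List[float], gaps: List[float],
                       start: float = 0.0) -> List[Cue]:
    """Each character stays highlighted for its recorded duration, then waits the recorded gap."""
    if len(durations) != len(chars) or len(gaps) != max(len(chars) - 1, 0):
        raise ValueError("need one duration per character and one gap per adjacent pair")
    cues: List[Cue] = []
    t = start
    for i, ch in enumerate(chars):
        end = t + durations[i]
        cues.append((ch, round(t, 3), round(end, 3)))
        t = end + (gaps[i] if i < len(gaps) else 0.0)
    return cues


def active_char(cues: List[Cue], now: float) -> Optional[str]:
    """Character to highlight at time `now`, or None during a gap."""
    for ch, on, off in cues:
        if on <= now < off:
            return ch
    return None
```

During a gap, `active_char` returns None. A renderer could use that moment to show the interval prompt instead.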
In some embodiments, after the content to be prompted is displayed on the teleprompter display, the method further includes:
Receiving a control instruction for the currently displayed content.
In this embodiment, the control instruction may be issued by the user to adjust the content being shown on the display of the wearable device. In one embodiment, the control instruction adjusts the playback speed of the displayed content. In another embodiment, it adjusts the playback progress of the displayed content.
The currently displayed content is whatever the display of the wearable device worn by the target object is showing when the control instruction is received.
Specifically, the user may issue a control instruction directly on the wearable device. In one embodiment, the user presses a control button on the wearable device or a virtual button on its display. In another embodiment, the user issues the instruction through an operation such as a recognized gesture. The user may also issue control instructions from a terminal device such as a smartphone. In that case, the instruction may reach the wearable device worn by the target object over one or more widely used connectivity technologies, such as Wi-Fi, Bluetooth, or a cloud service, which is not limited in this embodiment.
Updating the currently displayed content according to the control instruction, the currently displayed content and the speech rate text of the target object.
Specifically, the wearable device first receives a control instruction for the currently displayed content. It then updates the content on the teleprompter display according to the control instruction, the content currently being displayed, and the speech rate text of the target object. In one embodiment, the playback speed is updated. In another embodiment, the playback progress is updated.
In one embodiment, updating the currently displayed content according to the control instruction, the currently displayed content and the speech rate text of the target object includes:
Determining that the control instruction is a playback speed adjustment instruction.
It should be noted that a playback speed adjustment instruction is issued by the user to adjust the playback speed of the content being shown on the display of the wearable device.
Specifically, it is detected whether the control instruction issued by the user is a playback speed adjustment instruction.
Obtaining the speed adjustment parameter corresponding to the playback speed adjustment instruction and the current playback speed.
In this embodiment, the speed adjustment parameter is a multiplier applied to the playback speed of the content currently shown on the display of the wearable device, for example 0.5x, 0.75x, 1.5x or 2x. The speed adjustment parameters may be preset, and the user selects the one to apply.
The current playback speed is the speed at which the display of the wearable device worn by the target object is playing its content when the playback speed adjustment instruction is received.
Specifically, suppose the received control instruction is a playback speed adjustment instruction. The method then obtains the speed adjustment parameter that the user selected and the current playback speed of the displayed content.
Determining the target playback speed according to the speed adjustment parameter and the current playback speed.
The target playback speed is the adjusted playback speed of the content shown on the display of the wearable device worn by the target object.
Specifically, the product of the current playback speed and the speed adjustment parameter may be used as the target playback speed. For example, if the current playback speed is 1x and the speed adjustment parameter is 0.5x, the target playback speed is 1 × 0.5 = 0.5x.
Updating the currently displayed content according to the target playback speed, the currently displayed content and the speech rate text of the target object.
Specifically, the display of the wearable device worn by the target object is updated to show the text to be pronounced, played at the target playback speed. That text is prompted using its corresponding prompt mode.
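A minimal sketch of the speed adjustment follows. The target speed is the current speed multiplied by the selected parameter. The remaining cue times are then stretched or compressed to match. Two details are assumptions: the preset list mirrors the example values in the text, and the cue times are taken as offsets from the moment the command arrives, measured at the recorded (1x) pace.

```python
from typing import List, Tuple

Cue = Tuple[str, float, float]  # (character, highlight on, highlight off) in seconds

PRESET_FACTORS = (0.5, 0.75, 1.5, 2.0)  # example multipliers from the text; assumed configurable


def target_speed(current: float, factor: float) -> float:
    """Target playback speed = current playback speed x speed adjustment parameter."""
    if factor not in PRESET_FACTORS:
        raise ValueError(f"unsupported speed factor: {factor}")
    return current * factor


def rescale_cues(cues: List[Cue], speed: float) -> List[Cue]:
    """Convert 1x cue offsets into offsets at `speed` (0.5x doubles every interval)."""
    return [(ch, round(on / speed, 3), round(off / speed, 3)) for ch, on, off in cues]
```

Because the parameter multiplies the current speed, repeated commands compound. Pressing 2x twice from 1x, for instance, gives 4x.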
By setting the speed adjustment parameter of the playback speed adjustment instruction, the user can speed up or slow down the prompts to suit the situation. This conveniently and accurately gives a speaker or singer prompts for any content.
In another embodiment, updating the currently displayed content according to the control instruction, the currently displayed content and the speech rate text of the target object includes:
Determining that the control instruction is a playback progress adjustment instruction.
It should be noted that a playback progress adjustment instruction is issued by the user to adjust the playback progress of the content being shown on the display of the wearable device. For example, the instruction may be "play previous sentence" or "play next sentence". Such progress options may be preset, and the user selects the desired one.
Specifically, it is detected whether the control instruction issued by the user is a playback progress adjustment instruction.
If the control instruction is a playback progress adjustment instruction, the text to be prompted is determined according to the playback progress adjustment instruction, the currently displayed content and the speech rate text of the target object.
The text to be prompted is contained in the speech rate text of the target object.
It should be explained that the text to be prompted is the text shown on the display of the wearable device worn by the target object after the update.
Specifically, suppose the received control instruction is a playback progress adjustment instruction. The content on the display of the wearable device is then updated to the text to be prompted, according to the instruction and the speech rate text of the target object. For example, if the display currently shows "白日依山尽" and the user selects "play next sentence", the text to be prompted on the display is "黄河入海流".
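The previous/next sentence navigation can be sketched as a cursor over the sentences of the speech rate text. The command strings and the `SentenceCursor` class are hypothetical. The text does not specify what happens at the first or last sentence, so this sketch assumes the cursor simply stays in place there.

```python
from typing import List


class SentenceCursor:
    """Tracks which sentence of the speech rate text is currently displayed."""

    def __init__(self, sentences: List[str]) -> None:
        if not sentences:
            raise ValueError("speech rate text has no sentences")
        self.sentences = list(sentences)
        self.index = 0

    def current(self) -> str:
        return self.sentences[self.index]

    def apply(self, command: str) -> str:
        """Apply a progress command and return the text to be prompted."""
        if command == "next":
            self.index = min(self.index + 1, len(self.sentences) - 1)  # stay on the last sentence
        elif command == "previous":
            self.index = max(self.index - 1, 0)                        # stay on the first sentence
        else:
            raise ValueError(f"unknown progress command: {command}")
        return self.current()
```

After a jump, the per-character cues for the newly selected sentence would be rebuilt from its recorded durations.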
Updating the currently displayed content on the display of the wearable device worn by the target object to the text to be prompted.
With the playback progress adjustment instruction, the user can go back to the previous sentence or skip a sentence as needed. This conveniently and accurately gives a speaker or singer prompts for any content.
As an exemplary description of the embodiments of the present application, FIG. 2 is a schematic diagram of a prompting method in an embodiment of the present application. As shown in FIG. 2, the display of the AR glasses worn by the target object shows "白日依山尽,黄河入海流". The speaker has already delivered "白日依", and the next character to be spoken is "山" ("mountain"). Characters already spoken appear in a bolder weight and characters not yet spoken in a lighter weight, which gives the user an accurate, intuitive cue. Below the text, the display provides five buttons: 0.5x slower, previous sentence, continue/pause, next sentence, and 0.5x faster. With these, the user can control playback speed and progress, and play or pause, as needed. With this technical solution, the user can speed up or slow down the prompts, return to the previous sentence, or skip a sentence. The AR glasses can thus accurately give a speaker or singer prompts for any content, which makes the teleprompter more convenient to use. The prompt content is also projected onto the inner screen of the AR glasses and cannot be seen from outside, which protects the speaker's privacy.
In the technical solution of this embodiment, audio recorded in advance or in real time is analyzed. Changes in the amplitude and frequency of the human voice reveal the duration of each sentence, and even of each character or word, while grammar and semantic recognition transcribe the corresponding text. After recognition, the text and timing information are assembled into an lrc file. This makes preparing prompt content easier and reduces manual production time. During a formal speech or performance, the user can sync the lrc file to the AR glasses from a terminal device such as a smartphone, which avoids the bulky, hard-to-deploy hardware of a conventional teleprompter. The AR glasses use the timing information in the lrc file to display the text in sync on their screen. This keeps the prompt content from leaking or being noticed by others. During display, the text is automatically faded or highlighted according to speed and order, which makes it easier for the user to adjust the pace and intonation of the speech or song. Throughout prompting, the user can control prompt speed and progress from a mobile phone or the AR glasses.
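A minimal sketch of assembling recognized text and timing into an lrc file, and reading it back on the glasses, follows. It covers only standard line-level LRC timestamps of the form `[mm:ss.xx]text`. The text does not say how the per-character durations are stored in the file, so that part is left out. The function names are hypothetical.

```python
import re
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Format seconds as an LRC timestamp [mm:ss.xx] (hundredths of a second)."""
    centis = int(round(seconds * 100))
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def to_lrc(lines: List[Tuple[float, str]]) -> str:
    """Assemble (start time, sentence) pairs into LRC text, one timestamped line each."""
    return "\n".join(f"{format_timestamp(t)}{text}" for t, text in lines)


LRC_LINE = re.compile(r"\[(\d+):(\d{2})\.(\d{2})\](.*)")


def parse_lrc(content: str) -> List[Tuple[float, str]]:
    """Read timestamped lines back into (start time, sentence) pairs, skipping other lines."""
    result = []
    for raw in content.splitlines():
        m = LRC_LINE.match(raw.strip())
        if m:
            mins, secs, centis, text = m.groups()
            result.append((int(mins) * 60 + int(secs) + int(centis) / 100, text))
    return result
```

Round-tripping through `to_lrc` and `parse_lrc` preserves the sentence start times to the hundredth of a second.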
Embodiment 2
FIG. 3 is a schematic structural diagram of a prompting device in an embodiment of the present application. This embodiment is applicable to teleprompting scenarios. The device may be implemented in software and/or hardware and may be integrated into any device that provides a prompting function. As shown in FIG. 3, the prompting device includes a first acquisition module 201, a first determination module 202, a second acquisition module 203, a second determination module 204, and an acquisition and prompting module 205.
The first acquisition module 201 is configured to acquire historical audio data of the target object;
The first determination module 202 is configured to determine the speech rate text of the target object according to the historical audio data;
The second acquisition module 203 is configured to acquire real-time audio data of the target object;
The second determination module 204 is configured to determine position information of the text corresponding to the real-time audio data within the speech rate text of the target object;
The acquisition and prompting module 205 is configured to determine the content to be prompted according to the position information and the speech rate text of the target object, and to display that content on the teleprompter display.
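The source does not say how the second determination module 204 locates the real-time transcript within the speech rate text. One simple possibility is sketched below, as an assumption: the most recently recognized characters are slid across the script, and the window that `difflib.SequenceMatcher` scores as most similar is kept. The `search_from` parameter, which lets the search start at the previous position, is also hypothetical. A production system would likely need something more robust, such as forced alignment.

```python
from difflib import SequenceMatcher


def locate_position(script: str, recognized_tail: str, search_from: int = 0) -> int:
    """Index in `script` just past the window that best matches the latest recognized text.

    A hypothetical fuzzy match: every window of the same length is scored with
    SequenceMatcher.ratio(); the first best-scoring window wins.
    """
    n = len(recognized_tail)
    if n == 0:
        return search_from
    best_end, best_score = search_from, 0.0
    for start in range(search_from, len(script) - n + 1):
        window = script[start:start + n]
        score = SequenceMatcher(None, window, recognized_tail).ratio()
        if score > best_score:
            best_score, best_end = score, start + n
    return best_end
```

The returned index is exactly the `position` that the text-to-be-pronounced step needs. In the poem example, recognizing "白日" yields 2.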
In some embodiments, the acquisition and prompting module 205 includes:
a first acquisition unit, configured to acquire the original text corresponding to the historical audio data;
a first determination unit, configured to determine the content to be prompted according to the position information, the original text and the speech rate text of the target object.
In some embodiments, the first determination module 202 includes:
a second acquisition unit, configured to acquire audio amplitude information and audio frequency information of the historical audio data;
a recognition unit, configured to perform recognition on the audio amplitude information and audio frequency information to obtain the text and speech rate information corresponding to the historical audio data;
a second determination unit, configured to determine the speech rate text of the target object according to the text and speech rate information corresponding to the historical audio data.
In some embodiments, the speech rate information includes at least one of the following: the duration of each character, the duration of each word, the duration of each sentence, and the start time of each sentence.
In some embodiments, the first determination unit includes:
a first sub-determination unit, configured to determine the text to be pronounced according to the position information and the original text;
a second sub-determination unit, configured to determine the prompt mode of the text to be pronounced according to the speech rate text of the target object.
In some embodiments, the prompt mode indicates the text to be pronounced with at least one of a cursor, highlighting, and a marquee.
In some embodiments, the second sub-determination unit is specifically configured to:
determine, according to the speech rate text of the target object, the duration of each character in the text to be pronounced and the interval between adjacent characters;
determine the prompt mode of the text to be pronounced according to the duration of each character and the interval between adjacent characters.
In some embodiments, the prompting device further includes:
a receiving module, configured to receive a control instruction for the currently displayed content after the content to be prompted has been displayed on the teleprompter display;
an update module. After the content to be prompted has been displayed on the teleprompter display, this module updates the currently displayed content according to the control instruction, the currently displayed content and the speech rate text of the target object, by:
determining that the control instruction is a playback speed adjustment instruction;
obtaining the speed adjustment parameter corresponding to the playback speed adjustment instruction and the current playback speed;
determining a target playback speed according to the speed adjustment parameter and the current playback speed;
updating the currently displayed content according to the target playback speed, the currently displayed content and the speech rate text of the target object.
The above product can execute the prompting method provided by any embodiment of the present application. It has the functional modules and beneficial effects that correspond to that method.
Embodiment 3
FIG. 4 is a schematic structural diagram of an electronic device 30 that can be used to implement embodiments of the present application. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes and other suitable computers. It may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, watches) and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only. They are not intended to limit the implementations of the present application described and/or claimed herein.
In some embodiments, the electronic device is a wearable device worn by the target object, and the teleprompter display is the screen produced by the wearable device's optical engine. In one embodiment, the electronic device may be a wearable device such as head-mounted AR glasses. The teleprompter display may then be the screen produced by its AR optical engine, a projection module that integrates the display and the optical system.
如图4所示,电子设备30包括至少一个处理器31,以及与至少一个处理器31通信连接的存储器,如只读存储器(ROM)32、随机访问存储器(RAM)33等,其中,存储器存储有可被至少一个处理器执行的计算机程序,处理器31可以根据存储在只读存储器(ROM)32中的计算机程序或者从存储单元38加载到随机访问存储器(RAM)33中的计算机程序,来执行各种适当的动作和处理。在RAM 33中,还可存储电子设备30操作所需的各种程序和数据。处理器31、ROM 32以及RAM 33通过总线34彼此相连。输入/输出(I/O)接口35也连接至总线34。As shown in FIG. 4 , the electronic device 30 includes at least one processor 31, and a memory connected in communication with the at least one processor 31, such as a read-only memory (ROM) 32, a random access memory (RAM) 33, etc., wherein the memory stores There is a computer program executable by at least one processor, and the processor 31 can operate according to a computer program stored in a read-only memory (ROM) 32 or loaded from a storage unit 38 into a random access memory (RAM) 33. Various appropriate actions and processes are performed. In the RAM 33, various programs and data necessary for the operation of the electronic device 30 are also stored. The processor 31 , ROM 32 and RAM 33 are connected to each other through a bus 34 . An input/output (I/O) interface 35 is also connected to the bus 34 .
电子设备30中的多个部件连接至I/O接口35,包括:输入单元36,例如键盘、鼠标等;输出单元37,例如各种类型的显示器、扬声器等;存储单元38,例如磁盘、光盘等;以及通信单元39,例如网卡、调制解调器、无线通信收发机等。通信单元39允许电子设备30通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。Multiple components in the electronic device 30 are connected to the I/O interface 35, including: an input unit 36, such as a keyboard, a mouse, etc.; an output unit 37, such as various types of displays, speakers, etc.; a storage unit 38, such as a magnetic disk, an optical disk etc.; and a communication unit 39, such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 39 allows the electronic device 30 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The processor 31 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the processor 31 include, but are not limited to:

- a central processing unit (CPU);
- a graphics processing unit (GPU);
- various dedicated artificial intelligence (AI) computing chips;
- various processors that run machine learning model algorithms;
- a digital signal processor (DSP); and
- any suitable processor, controller, or microcontroller.

The processor 31 executes the methods and processes described above, such as the prompting method, which comprises:
obtaining historical audio data of the target object;
determining a speech rate text of the target object according to the historical audio data;
acquiring real-time audio data of the target object;
determining position information of the text corresponding to the real-time audio data within the speech rate text of the target object; and
determining the content to be prompted according to the position information and the speech rate text of the target object, and displaying the content to be prompted on the teleprompter display screen.
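The five steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. The source does not say how the speech rate text is built, so the sketch makes four assumptions:

- The speech rate text is treated as the script paired with a speaking rate, in characters per second, estimated from transcribed historical audio.
- Speech recognition has already converted both historical and real-time audio into text.
- The position is found by fuzzy window matching with Python's `difflib.SequenceMatcher`.
- The content to be prompted is the script text the speaker is expected to reach within a fixed look-ahead window.

The names `Utterance`, `speech_rate`, `locate`, `prompt_content`, and `lookahead_s` are hypothetical.

```python
# Hypothetical sketch of the prompting steps. Assumptions (not from the source):
# audio is already transcribed, "speech rate text" = script + chars/second rate,
# and the prompt is the text expected within a fixed look-ahead window.
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Utterance:
    """One transcribed segment of historical audio (hypothetical type)."""
    text: str
    duration_s: float


def speech_rate(history: list[Utterance]) -> float:
    """Average speaking rate in characters per second over historical audio."""
    total_chars = sum(len(u.text) for u in history)
    total_time = sum(u.duration_s for u in history)
    if total_time <= 0:
        raise ValueError("historical audio has zero total duration")
    return total_chars / total_time


def locate(script: str, heard: str, start: int = 0) -> int:
    """Return the script index just past the best match of the recognized text.

    Slides a window of len(heard) characters from `start` and keeps the window
    with the highest SequenceMatcher ratio, so minor recognition errors still
    yield an approximate position. Ties keep the earliest window.
    """
    n = len(heard)
    best_end, best_score = start, -1.0
    for i in range(start, max(start, len(script) - n) + 1):
        score = SequenceMatcher(None, script[i:i + n], heard).ratio()
        if score > best_score:
            best_end, best_score = min(i + n, len(script)), score
    return best_end


def prompt_content(script: str, position: int, rate_cps: float,
                   lookahead_s: float = 3.0) -> str:
    """Script text the speaker is expected to say in the next `lookahead_s` seconds."""
    count = max(1, round(rate_cps * lookahead_s))
    return script[position:position + count]
```

For example, with a rate of 2 characters per second and a 3-second look-ahead, `prompt_content` returns the next 6 characters of the script after the matched position. Passing the previous position as `start` to `locate` searches only forward, which keeps a repeated phrase from pulling the prompt back to an earlier passage.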
In some embodiments, the prompting method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 38. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 30 via the ROM 32 and/or the communication unit 39. When the computer program is loaded into the RAM 33 and executed by the processor 31, one or more steps of the prompting method described above may be performed. Alternatively, in other embodiments, the processor 31 may be configured to execute the prompting method in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described above may be realized in any of the following, and/or combinations thereof:

- digital electronic circuitry;
- integrated circuit systems;
- field-programmable gate arrays (FPGAs);
- application-specific integrated circuits (ASICs);
- application-specific standard products (ASSPs);
- systems on chip (SOCs);
- complex programmable logic devices (CPLDs);
- computer hardware, firmware, or software.

These implementations may be embodied in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. That processor may be special-purpose or general-purpose. It may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus. When executed by the processor, they cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. A computer program may execute:

- entirely on the machine;
- partly on the machine;
- as a stand-alone software package, partly on the machine and partly on a remote machine; or
- entirely on the remote machine or server.
In the context of the present application, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by, or in connection with, an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include:

- an electrical connection based on one or more wires;
- a portable computer disk;
- a hard disk;
- a random access memory (RAM);
- a read-only memory (ROM);
- an erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber;
- a portable compact disc read-only memory (CD-ROM);
- an optical storage device;
- a magnetic storage device; or
- any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on an electronic device that has:

- a display device for displaying information to the user, for example a CRT (cathode ray tube) or LCD (liquid crystal display) monitor; and
- a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the electronic device.

Other kinds of devices may also be used to provide interaction with the user. For example, the feedback provided to the user may be any form of sensory feedback (visual, auditory, or tactile), and input from the user may be received in any form (acoustic, speech, or tactile input).
The systems and techniques described herein may be implemented in any of the following computing systems:

- a system that includes a back-end component (for example, a data server);
- a system that includes a middleware component (for example, an application server);
- a system that includes a front-end component, for example a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein; or
- a system that includes any combination of such back-end, middleware, or front-end components.

The components of the system may be interconnected by digital data communication in any form or medium, for example a communication network. Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
A computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also called a cloud computing server or cloud host. A cloud server is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and virtual private server (VPS) services.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order. No limitation is imposed herein, as long as the desired result of the technical solution of the present application can be achieved.
The specific implementations described above do not limit the scope of protection of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310552144.7A CN116501919A (en) | 2023-05-16 | 2023-05-16 | Prompting method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116501919A true CN116501919A (en) | 2023-07-28 |
Family
ID=87326602
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |