CN118573648B - Message sending method, device, equipment and storage medium - Google Patents

Info

Publication number
CN118573648B
Authority
CN
China
Prior art keywords
resource
multimedia
multimedia resource
voice
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411051670.6A
Other languages
Chinese (zh)
Other versions
CN118573648A (en
Inventor
汤宏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202411051670.6A
Publication of CN118573648A
Application granted
Publication of CN118573648B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10 Multimedia information

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to a message sending method, device, equipment and storage medium, and relates to the field of multimedia technology. The method includes: displaying a conversation interface between a first object and a second object; in response to a resource generation operation for a voice message input in the conversation interface, generating a multimedia resource corresponding to the voice message, the multimedia resource including at least one of a picture and a video; in response to a sending operation for the multimedia resource, sending the multimedia resource to the second object. The method enriches the interactive forms in the conversation scene and improves the diversity of conversation messages. Moreover, the method can generate multimedia resources based on voice messages in a conversation scene with one click, and also improves the convenience of generating multimedia resources, thereby improving the interaction efficiency. Moreover, since the multimedia resources are generated instantly based on voice messages, the content of the multimedia resources is more specific, so that the probability of user interaction in the conversation scene can be improved through personalized and novel multimedia resources.

Description

Message sending method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of multimedia, and in particular relates to a message sending method, a device, equipment and a storage medium.
Background
With the development of multimedia technology, more and more applications provide session functions; for example, people can hold sessions with others through a session interface within an application. In the session interface, people can send voice messages to hold a voice conversation with others. For the user, voice input is more efficient and more convenient than text input. How to conduct a session more effectively in combination with voice is therefore a problem to be solved.
Disclosure of Invention
The present disclosure provides a message sending method, apparatus, device, and storage medium that enrich the interaction modes in a session scene, improve the diversity of session messages, improve interaction efficiency, and can increase the probability that users interact in the session scene through personalized and novel multimedia resources. The technical solution of the present disclosure is as follows.
In one aspect, a method for sending a message is provided, including:
displaying a session interface between a first object and a second object;
in response to a resource generation operation on a voice message input in the session interface, generating a multimedia resource corresponding to the voice message, the multimedia resource including at least one of a picture and a video;
in response to a sending operation on the multimedia resource, sending the multimedia resource to the second object.
In some embodiments, the method further comprises:
displaying a voice input panel that includes a voice acquisition control and a resource generation control;
acquiring the voice message in response to the voice acquisition control being triggered;
wherein generating the multimedia resource corresponding to the voice message in response to the resource generation operation on the voice message input in the session interface comprises:
generating the multimedia resource corresponding to the voice message in response to an operation that runs from triggering the voice acquisition control to triggering the resource generation control.
In some embodiments, the resource generation control comprises at least one of a picture generation control and a video generation control, and generating the multimedia resource in response to the operation from triggering the voice acquisition control to triggering the resource generation control comprises at least one of the following:
generating a picture corresponding to the voice message in response to a drag operation from the voice acquisition control to the picture generation control;
generating a video corresponding to the voice message in response to a drag operation from the voice acquisition control to the video generation control.
In some embodiments, the method further comprises:
displaying resource prompt information in response to the operation that runs from triggering the voice acquisition control to triggering the resource generation control, the resource prompt information prompting the type of multimedia resource to be generated.
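As an illustration of the drag interaction described above, the following minimal Python sketch maps the control on which a drag ends to the type of resource to generate and to the prompt to display. The control identifiers and the prompt wording are hypothetical names chosen for this example; the disclosure does not specify them.

```python
from typing import Optional

# Hypothetical control identifiers; the disclosure does not name them.
VOICE_CONTROL = "voice_acquisition"
TARGET_TO_TYPE = {
    "picture_generation": "picture",
    "video_generation": "video",
}


def resolve_drag(start_control: str, end_control: str) -> Optional[str]:
    """Return the resource type selected by a drag, or None if none applies."""
    if start_control != VOICE_CONTROL:
        return None  # the drag must begin on the voice acquisition control
    return TARGET_TO_TYPE.get(end_control)


def prompt_for(resource_type: Optional[str]) -> str:
    """Build the resource prompt information for the selected type."""
    if resource_type is None:
        return ""
    return f"Release to generate a {resource_type}"  # illustrative wording
```

A drag that ends on a control other than a generation control resolves to None, in which case no resource would be generated and no prompt would be shown.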
In some embodiments, before sending the multimedia resource to the second object in response to the sending operation on the multimedia resource, the method further comprises:
displaying the multimedia resource on the voice input panel, and playing the multimedia resource in response to a triggering operation on the multimedia resource;
or displaying the multimedia resource on the voice input panel, and playing both the multimedia resource and the voice message in response to a triggering operation on the multimedia resource.
In some embodiments, the method further comprises at least one of:
in response to a regeneration operation on the displayed multimedia resource, regenerating a multimedia resource corresponding to the voice message, the regenerated multimedia resource being different from the previously displayed multimedia resource;
in response to a replacement operation on the displayed multimedia resource, displaying a plurality of candidate multimedia resources, and, in response to a selection operation on any candidate multimedia resource, replacing the displayed multimedia resource with that candidate multimedia resource.
In some embodiments, the method further comprises:
sending the voice message to the second object in response to the sending operation on the multimedia resource.
In some embodiments, a plurality of multimedia resources are generated, and before sending the multimedia resource to the second object in response to the sending operation on the multimedia resource, the method further comprises:
displaying the plurality of multimedia resources corresponding to the voice message;
wherein sending the multimedia resource to the second object in response to the sending operation on the multimedia resource comprises:
in response to a sending operation on any one of the plurality of multimedia resources, sending that multimedia resource to the second object.
In some embodiments, the method further comprises:
in response to a triggering operation on any historical voice message on the session interface, displaying at least one function control for that historical voice message, the at least one function control including a resource generation control;
in response to a triggering operation on the resource generation control, generating a first multimedia resource and displaying the first multimedia resource on the session interface, the first multimedia resource corresponding to that historical voice message.
In some embodiments, the method further comprises:
displaying, on the session interface, a resource generation control for at least one historical voice message;
in response to a triggering operation on the resource generation control of any one of the at least one historical voice message, generating a second multimedia resource and displaying the second multimedia resource on the session interface, the second multimedia resource corresponding to that historical voice message.
In some embodiments, the method further comprises at least one of:
in response to a sending operation on any multimedia resource displayed on the session interface, sending that multimedia resource to the second object;
in response to a forwarding operation on any multimedia resource displayed on the session interface, sending that multimedia resource to a third object.
In some embodiments, the method further comprises:
when the multimedia resource includes a face image, displaying a face replacement control in an area associated with the session message that contains the multimedia resource;
in response to a triggering operation on the face replacement control, displaying a plurality of candidate images;
in response to a selection operation on any candidate image, replacing the face image in the multimedia resource with the face image in that candidate image.
According to another aspect of the embodiments of the present disclosure, there is provided a message sending apparatus, the apparatus including:
a display unit configured to display a session interface between a first object and a second object;
a generation unit configured to generate, in response to a resource generation operation on a voice message input in the session interface, a multimedia resource corresponding to the voice message, the multimedia resource including at least one of a picture and a video;
a sending unit configured to send the multimedia resource to the second object in response to a sending operation on the multimedia resource.
In some embodiments, the display unit is further configured to perform:
displaying a voice input panel that includes a voice acquisition control and a resource generation control;
the apparatus further includes an acquisition unit configured to acquire the voice message in response to the voice acquisition control being triggered;
the generation unit is configured to perform:
generating the multimedia resource corresponding to the voice message in response to an operation that runs from triggering the voice acquisition control to triggering the resource generation control.
In some embodiments, the resource generation control comprises at least one of a picture generation control and a video generation control, and the generation unit is configured to perform at least one of the following:
generating a picture corresponding to the voice message in response to a drag operation from the voice acquisition control to the picture generation control;
generating a video corresponding to the voice message in response to a drag operation from the voice acquisition control to the video generation control.
In some embodiments, the display unit is further configured to perform:
displaying resource prompt information in response to the operation that runs from triggering the voice acquisition control to triggering the resource generation control, the resource prompt information prompting the type of multimedia resource to be generated.
In some embodiments, the display unit is further configured to perform:
displaying the multimedia resource on the voice input panel, and playing the multimedia resource in response to a triggering operation on the multimedia resource;
or displaying the multimedia resource on the voice input panel, and playing both the multimedia resource and the voice message in response to a triggering operation on the multimedia resource.
In some embodiments, the apparatus further comprises at least one of:
a regeneration unit configured to regenerate, in response to a regeneration operation on the displayed multimedia resource, a multimedia resource corresponding to the voice message, the regenerated multimedia resource being different from the previously displayed multimedia resource;
a candidate display unit configured to display a plurality of candidate multimedia resources in response to a replacement operation on the displayed multimedia resource, and to replace the displayed multimedia resource with any candidate multimedia resource in response to a selection operation on that candidate multimedia resource.
In some embodiments, the sending unit is further configured to perform:
sending the voice message to the second object in response to the sending operation on the multimedia resource.
In some embodiments, a plurality of multimedia resources are generated, and the display unit is further configured to perform:
displaying the plurality of multimedia resources corresponding to the voice message;
the sending unit is configured to perform:
in response to a sending operation on any one of the plurality of multimedia resources, sending that multimedia resource to the second object.
In some embodiments, the display unit is further configured to perform:
in response to a triggering operation on any historical voice message on the session interface, displaying at least one function control for that historical voice message, the at least one function control including a resource generation control;
the generation unit is further configured to generate a first multimedia resource in response to a triggering operation on the resource generation control and to display the first multimedia resource on the session interface, the first multimedia resource corresponding to that historical voice message.
In some embodiments, the display unit is further configured to perform:
displaying, on the session interface, a resource generation control for at least one historical voice message;
the generation unit is further configured to generate a second multimedia resource in response to a triggering operation on the resource generation control of any one of the at least one historical voice message and to display the second multimedia resource on the session interface, the second multimedia resource corresponding to that historical voice message.
In some embodiments, the sending unit is further configured to perform at least one of:
in response to a sending operation on any multimedia resource displayed on the session interface, sending that multimedia resource to the second object;
in response to a forwarding operation on any multimedia resource displayed on the session interface, sending that multimedia resource to a third object.
In some embodiments, the display unit is further configured to perform:
when the multimedia resource includes a face image, displaying a face replacement control in an area associated with the session message that contains the multimedia resource;
in response to a triggering operation on the face replacement control, displaying a plurality of candidate images;
in response to a selection operation on any candidate image, replacing the face image in the multimedia resource with the face image in that candidate image.
According to another aspect of the embodiments of the present disclosure, there is provided a terminal including:
one or more processors;
a memory for storing program code executable by the processor;
wherein the processor is configured to execute the program code to implement the message sending method described above.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing program code which, when executed by a processor of a terminal, enables the terminal to perform the message sending method described above.
According to another aspect of the embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the message sending method described above.
The embodiments of the present disclosure provide a message sending method that, in response to a resource generation operation on a voice message input in a session interface, generates multimedia resources such as pictures and videos corresponding to the voice message and sends them to the session object, which enriches the interaction modes in the session scene and improves the diversity of session messages. In the session scene, the method can generate multimedia resources such as pictures and videos from a voice message with a single operation, which makes generating multimedia resources more convenient and in turn improves interaction efficiency. In addition, because the multimedia resource is generated from the voice message in real time, its content is more specific, so personalized and novel multimedia resources can increase the probability that users interact in the session scene.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment.
Fig. 2 is a flowchart of a message sending method according to an exemplary embodiment.
Fig. 3 is a flowchart of another message sending method according to an exemplary embodiment.
Fig. 4 is a schematic diagram of a voice input panel according to an exemplary embodiment.
Fig. 5 is a schematic diagram of another voice input panel according to an exemplary embodiment.
Fig. 6 is a schematic diagram of yet another voice input panel according to an exemplary embodiment.
Fig. 7 is a schematic diagram of yet another voice input panel according to an exemplary embodiment.
Fig. 8 is a schematic diagram of a voice input panel in which the multimedia resource is a video, according to an exemplary embodiment.
Fig. 9 is a schematic diagram of a session interface according to an exemplary embodiment.
Fig. 10 is a schematic diagram of another session interface according to an exemplary embodiment.
Fig. 11 is a schematic diagram of yet another session interface according to an exemplary embodiment.
Fig. 12 is a flowchart illustrating interface display according to an exemplary embodiment.
Fig. 13 is a block diagram of a message sending apparatus according to an exemplary embodiment.
Fig. 14 is a block diagram of a terminal according to an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, the voice message, the candidate multimedia asset, the candidate image, etc., referred to in this disclosure are all acquired with sufficient authorization.
The message sending method provided by the embodiments of the present disclosure can be executed by a terminal. Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present disclosure. Referring to Fig. 1, the implementation environment includes a terminal 101 and a server 102. In the embodiments of the present disclosure, a target application is installed on the terminal 101; the target application provides a session function, and a user can hold a session with other people through a session interface within the application. Session messages include, but are not limited to, messages in the form of text, voice, pictures, and video. The target application may be any application that provides a session function, such as a shopping application, a news application, a search application, a video application, or a social application, and is not specifically limited herein. The server 102 is a background server of the target application and may provide background services for the session function on the terminal 101.
The terminal 101 may be at least one of a smart phone, a smart watch, a desktop computer, a laptop computer, a virtual reality terminal, an augmented reality terminal, and a wireless terminal. The terminal 101 has a communication function and can access a wired network or a wireless network. The terminal 101 may refer broadly to one of a plurality of terminals, and those skilled in the art will recognize that the number of terminals may be greater or smaller. The server 102 may be an independent physical server, a server cluster or a distributed file system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms. In some embodiments, the server 102 is directly or indirectly connected to the terminal 101 through wired or wireless communication, which is not limited by the embodiments of the present disclosure. Optionally, the number of servers 102 may be greater or smaller, which is not limited by the embodiments of the present disclosure. The server 102 may also include other functional servers to provide more comprehensive and diverse services. The server 102 may perform the primary computing work while the terminal 101 performs the secondary computing work, or the server 102 may perform the secondary computing work while the terminal 101 performs the primary computing work, or either the server 102 or the terminal 101 may perform the computing work alone, which is not limited in the embodiments of the present disclosure.
Fig. 2 is a flowchart of a message sending method according to an exemplary embodiment. As shown in Fig. 2, the method is performed by a terminal and includes the following steps.
In step S201, the terminal displays a session interface of the first object and the second object.
In the embodiments of the present disclosure, the session interface is used for a session between a first object and a second object, where the first object is the object using the terminal. Through the session interface, the terminal can send session messages to the second object and receive session messages sent by the second object. Session messages include, but are not limited to, messages in the form of text, voice, pictures, and video.
In the embodiment of the present disclosure, the session interface may be an interface within any application having a session function.
In step S202, in response to a resource generating operation for a voice message input at the session interface, the terminal generates a multimedia resource corresponding to the voice message, the multimedia resource including at least one of a picture and a video.
In the embodiments of the present disclosure, the resource generation operation is used to generate the multimedia resource corresponding to the voice message. The multimedia resource includes at least one of a picture and a video. One or more multimedia resources may be generated, and a plurality of generated multimedia resources may be of one or more types.
In some embodiments, the multimedia resource corresponding to the voice message contains an entity mentioned in the voice message. For example, if the entity "dog" appears in the voice message, the multimedia resource includes a dog. Optionally, the voice message is an instruction message indicating which multimedia resource to generate. If the voice message indicates that a "cat" is to be generated, the multimedia resource includes a cat.
The terminal may generate the multimedia resource corresponding to the voice message in ways including, but not limited to, the following implementations.
In a first implementation, the terminal inputs the voice message into a first resource generation model, processes the voice message through the first resource generation model, and outputs the multimedia resource. The first resource generation model is used to generate multimedia resources based on the voice message. Optionally, the first resource generation model is an AIGC (Artificial Intelligence Generated Content) model. In this implementation, the multimedia resource corresponding to the voice message is generated directly by the AIGC model, which improves convenience; and because the multimedia resource is generated directly from the voice message, the accuracy of the multimedia resource is maintained.
In a second implementation, the terminal converts the voice message into text, inputs the text into a second resource generation model, processes the text through the second resource generation model, and outputs the multimedia resource. The second resource generation model is used to generate multimedia resources based on the text. Optionally, the second resource generation model is an AIGC model. In this implementation, the voice message is first converted into text and the multimedia resource is then generated from the text; because text is easier for the model to recognize, the multimedia resource generated from the text is more accurate.
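The second implementation can be sketched as a two-stage pipeline. The Python below is only a minimal illustration under stated assumptions: `transcribe` and `generate` stand in for a speech recognizer and a text-based generation model that the disclosure does not specify, and `MultimediaResource`, `generate_via_text`, and the two stand-in functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MultimediaResource:
    kind: str    # "picture" or "video"
    prompt: str  # text that the resource was generated from


def generate_via_text(
    voice: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str, str], MultimediaResource],
    kind: str = "picture",
) -> MultimediaResource:
    """Convert the voice message to text, then generate a resource from it."""
    if kind not in ("picture", "video"):
        raise ValueError(f"unsupported resource kind: {kind}")
    text = transcribe(voice).strip()
    if not text:
        raise ValueError("the voice message produced no text")
    return generate(text, kind)


# Stand-ins for a real speech recognizer and a real generation model.
def fake_transcribe(voice: bytes) -> str:
    return voice.decode("utf-8")


def fake_generate(text: str, kind: str) -> MultimediaResource:
    return MultimediaResource(kind=kind, prompt=text)


resource = generate_via_text(
    b" a dog in the park ", fake_transcribe, fake_generate, kind="video"
)
```

Passing the two stages in as callables keeps the sketch independent of any particular recognizer or model, and the same function could run on the terminal or on the server.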
In a third implementation, the terminal extracts keywords from the voice message, inputs the keywords into the second resource generation model, processes the keywords through the second resource generation model, and outputs the multimedia resource. Optionally, the keywords are words that represent entities, which facilitates generating the multimedia resource. For example, the keyword is "house". In this implementation, the multimedia resource is generated from the keywords in the voice message, which avoids interference from redundant information and improves both the efficiency and the accuracy of generating the multimedia resource.
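The keyword extraction step can be illustrated with a deliberately simple sketch. It assumes the voice message has already been converted to text and matches tokens against a small entity vocabulary; the vocabulary and the function name `extract_keywords` are hypothetical, and the disclosure does not say how keywords are extracted (a production system would more likely use an entity recognition model).

```python
import re
from typing import Iterable, List

# Hypothetical entity vocabulary used only for this illustration.
ENTITY_VOCABULARY = frozenset({"dog", "cat", "house"})


def extract_keywords(
    text: str, vocabulary: Iterable[str] = ENTITY_VOCABULARY
) -> List[str]:
    """Return entity words found in the text, in order, without repeats."""
    known = set(vocabulary)
    keywords: List[str] = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in known and token not in keywords:
            keywords.append(token)
    return keywords


keywords = extract_keywords("Draw a dog next to the house, a big dog")
```

Only the entity words are passed on to the generation model, which is how this implementation avoids the other words in the utterance.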
In a fourth implementation, the terminal extracts keywords from the voice message, determines the multimedia resource corresponding to the keywords from a resource correspondence, and outputs it as the multimedia resource corresponding to the voice message. The resource correspondence includes a plurality of keywords and the multimedia resources corresponding to those keywords. In this implementation, the resource correspondence is established in advance, so the multimedia resource is obtained from the resource correspondence, which improves convenience.
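The fourth implementation amounts to a lookup in a table prepared in advance. The following sketch assumes the resource correspondence is a mapping from keyword to a list of resource identifiers; the table contents, file names, and the function name `lookup_resources` are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List, Sequence

# Hypothetical, pre-established resource correspondence.
RESOURCE_CORRESPONDENCE: Dict[str, List[str]] = {
    "dog": ["dog_picture.png", "dog_video.mp4"],
    "house": ["house_picture.png"],
}


def lookup_resources(
    keywords: Sequence[str],
    correspondence: Dict[str, List[str]] = RESOURCE_CORRESPONDENCE,
) -> List[str]:
    """Collect the resources mapped to each keyword, skipping unknown ones."""
    resources: List[str] = []
    for keyword in keywords:
        for resource in correspondence.get(keyword, []):
            if resource not in resources:
                resources.append(resource)
    return resources


matches = lookup_resources(["dog", "tree", "house"])
```

Because no model is invoked, the lookup is fast, but it can only return resources for keywords that were included when the correspondence was built.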
In the embodiments of the present disclosure, the terminal may generate the multimedia resource based on the voice message itself, or the terminal may send the voice message to the server, and the server generates the multimedia resource based on the voice message.
In step S203, the terminal sends the multimedia resource to the second object in response to a sending operation on the multimedia resource.
In the embodiment of the present disclosure, the terminal may send the multimedia resource alone or may send the voice message simultaneously with the multimedia resource, which is not limited herein.
After the terminal sends the multimedia resource to the second object, the session message where the multimedia resource is located is displayed on the session interface, and the terminal can play the multimedia resource in response to the triggering operation of the multimedia resource on the session interface.
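Steps S201 to S203 can be summarized in a small state sketch. It is a minimal illustration, not the disclosed implementation: the class `SessionInterface`, its method names, and the `(form, identifier)` tuples are hypothetical, and the `generate` callable stands in for whichever generation implementation is used.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Part = Tuple[str, str]  # (form, content identifier), e.g. ("picture", "img-1")


@dataclass
class SessionInterface:
    """Step S201: the session interface between the two objects."""

    first_object: str
    second_object: str
    sent: List[List[Part]] = field(default_factory=list)
    pending: Optional[Part] = None
    pending_voice: Optional[str] = None

    def on_resource_generation(
        self, voice_id: str, generate: Callable[[str], Part]
    ) -> Part:
        """Step S202: generate a multimedia resource for the voice message."""
        self.pending = generate(voice_id)
        self.pending_voice = voice_id
        return self.pending

    def on_send(self, include_voice: bool = False) -> List[Part]:
        """Step S203: send the resource, optionally with the voice message."""
        if self.pending is None:
            raise RuntimeError("no multimedia resource has been generated")
        parts: List[Part] = [self.pending]
        if include_voice and self.pending_voice is not None:
            parts.append(("voice", self.pending_voice))
        self.sent.append(parts)  # the sent message is now part of the session
        self.pending = None
        self.pending_voice = None
        return parts


session = SessionInterface("first object", "second object")
session.on_resource_generation("voice-1", lambda v: ("picture", f"picture-for-{v}"))
message = session.on_send(include_voice=True)
```

The `include_voice` flag mirrors the choice between sending the multimedia resource alone and sending it together with the voice message.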
The embodiments of the present disclosure provide a message sending method that, in response to a resource generation operation on a voice message input in a session interface, generates multimedia resources such as pictures and videos corresponding to the voice message and sends them to the session object, which enriches the interaction modes in the session scene and improves the diversity of session messages. In the session scene, the method can generate multimedia resources such as pictures and videos from a voice message with a single operation, which makes generating multimedia resources more convenient and in turn improves interaction efficiency. In addition, because the multimedia resource is generated from the voice message in real time, its content is more specific, so personalized and novel multimedia resources can increase the probability that users interact in the session scene.
The above-mentioned fig. 2 is a basic flow of message transmission, and the process of message transmission is further described below based on fig. 3. Referring to fig. 3, fig. 3 is a flowchart illustrating a message transmission method performed by a terminal according to an exemplary embodiment, the method including the steps of.
In step S301, the terminal displays a session interface of the first object and the second object.
In the embodiment of the disclosure, the session interface is an interface in the target application, and the session interface provides session functions. Optionally, the target application is a short video application. Correspondingly, a conversation control is displayed on the work playing page of the second object, and the terminal displays a conversation interface of the first object and the second object in response to triggering operation of the conversation control. Or the main page of the second object is displayed with a session control, and the terminal displays a session interface with the second object in response to the triggering operation of the session control. Optionally, status information of the second object is also displayed on the session interface, which is used to indicate whether the second object is in a live state, etc.
The session interface is a session interface between a first object and a second object, where the first object is an object of the terminal. A session message may be sent to the second object through the session interface, and a session message sent by the second object may also be received through the session interface. Session messages include, but are not limited to, messages in the form of text, voice, pictures, and video. Further, a single session message may include messages in several forms, for example both a voice message and a picture.
In the embodiment of the present disclosure, the second object may be plural, that is, the session interface may be a session interface of a group.
In step S302, in response to a voice input operation on the session interface, the terminal displays a voice input panel, the voice input panel including a voice acquisition control and a resource generation control, and in response to the voice acquisition control being triggered, the terminal acquires a voice message.
The voice input panel is used for acquiring voice messages and generating multimedia resources. Optionally, the voice input panel is displayed on the session interface in the form of a covered popup window.
In some embodiments, a voice input control is displayed on the session interface, and a voice input operation refers to a triggering operation on the voice input control. Optionally, the triggering operation of the voice input control is a long press operation of the voice input control. Accordingly, after entering the voice input panel in response to a long press operation on the voice input control, the long press operation is converted into a long press operation on the voice acquisition control on the voice input panel. In the long press operation process, the terminal collects the voice message input by the user to obtain the voice message.
In step S303, in response to a trigger operation that proceeds from the voice acquisition control to the resource generation control, the terminal generates a multimedia resource corresponding to the voice message.
In some embodiments, the trigger operation that proceeds from the voice acquisition control to the resource generation control is a drag operation from the voice acquisition control to the resource generation control. While the voice acquisition control is being triggered, the terminal generates the multimedia resource corresponding to the voice message in response to the drag operation from the voice acquisition control to the resource generation control; that is, acquiring the voice message and generating the multimedia resource form one continuous gesture. Taking a long-press operation as the trigger operation on the voice acquisition control as an example, the continuous gesture is a long press followed by a drag. Alternatively, after the trigger operation on the voice acquisition control ends, that is, after the finger leaves the screen, the terminal generates the multimedia resource corresponding to the voice message in response to a drag operation from the voice acquisition control to the resource generation control.
In other embodiments, the trigger operation that proceeds from the voice acquisition control to the resource generation control refers to triggering the voice acquisition control and then triggering the resource generation control. After the trigger operation on the voice acquisition control ends, that is, after the finger leaves the screen, the terminal generates the multimedia resource corresponding to the voice message in response to a trigger operation on the resource generation control. Optionally, the trigger operation on the voice acquisition control is a long-press operation, and the trigger operation on the resource generation control is a click operation.
In some embodiments, the resource generation control includes at least one of a picture generation control and a video generation control. Generating the multimedia resource corresponding to the voice message in response to the trigger operation that proceeds from the voice acquisition control to the resource generation control includes at least one of the following implementations: in response to a drag operation from the voice acquisition control to the picture generation control, the terminal generates a picture corresponding to the voice message; in response to a drag operation from the voice acquisition control to the video generation control, the terminal generates a video corresponding to the voice message.
In this embodiment, a picture generation control and a video generation control are provided on the voice input panel, and dragging from the voice acquisition control to either generation control generates a multimedia resource of the corresponding type. The voice input panel can therefore generate multimedia resources of several types, which improves the diversity of interaction, the diversity and richness of the multimedia resources, and the user's usage of the multimedia resource generation function.
In some embodiments, while generating the multimedia resource, the terminal also converts the voice message into text and displays the text on the multimedia resource, thereby improving the interface display effect.
For example, the terminal generates a picture, converts the voice message into text, and displays the text on the picture. Alternatively, the terminal generates a video, converts the voice message into text, and displays the text in the video. Further, the text is used as a caption in the video.
In some embodiments, the voice input panel further includes a text control, and the terminal converts the voice message into text in response to a drag operation from the voice capture control to the text control. Further, text is displayed on the voice input panel.
In some embodiments, the voice input panel further includes a close control, and in response to a drag operation from the voice capture control to the close control, the terminal cancels displaying the voice input panel and discards the captured voice message. In this embodiment, by providing a close control, the user may also discard the voice message and re-collect the voice message if the user is not satisfied with the collected voice message.
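The drag targets described above (picture generation, video generation, text, and close controls) can be viewed as a dispatch on the control where the drag from the voice acquisition control ends. The following is a minimal sketch under that reading; the enum `DropTarget`, the function `handle_drop`, and the returned dictionaries are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Dict


class DropTarget(Enum):
    """Controls on the voice input panel where a drag may end (names are assumptions)."""
    PICTURE = "picture_generation_control"
    VIDEO = "video_generation_control"
    TEXT = "text_control"
    CLOSE = "close_control"


def handle_drop(target: DropTarget) -> Dict[str, object]:
    """Map the control where the drag ended to the action the terminal takes."""
    if target is DropTarget.PICTURE:
        return {"action": "generate", "resource_type": "picture"}
    if target is DropTarget.VIDEO:
        return {"action": "generate", "resource_type": "video"}
    if target is DropTarget.TEXT:
        # Convert the voice message into text and show it on the panel.
        return {"action": "transcribe"}
    # Close control: hide the panel and discard the captured voice message.
    return {"action": "discard", "hide_panel": True}
```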
In some embodiments, in response to the trigger operation that proceeds from the voice acquisition control to the resource generation control, the terminal displays resource prompt information, which prompts the type of the multimedia resource to be generated. Types include, but are not limited to, pictures, videos, and text. Further, the type indicated by the resource prompt information is the same as the type corresponding to the resource generation control. For example, if the resource generation control is a picture generation control, the prompted type is picture; if the resource generation control is a video generation control, the prompted type is video. In this embodiment, displaying the resource prompt information to prompt the type of the multimedia resource to be generated increases the amount of information conveyed to the user, which can in turn increase the probability that the user interacts.
For example, referring to fig. 4, fig. 4 is a schematic diagram of a voice input panel shown according to an exemplary embodiment. The voice input panel is provided with a picture generation control 401, a video generation control 402, a text generation control 403, a closing control 404 and a voice acquisition control 405. In response to a drag operation from the voice acquisition control 405 to the picture generation control 401, the terminal displays resource prompt information 406, where the resource prompt information 406 is used to prompt that a multimedia resource to be generated is a picture.
As another example, referring to fig. 5, fig. 5 is a schematic diagram of a voice input panel, according to an exemplary embodiment. In response to a drag operation from the voice acquisition control to the video generation control, the terminal displays resource prompt information 501, where the resource prompt information 501 is used to prompt that a multimedia resource to be generated is a video.
In the embodiment of the present disclosure, step S303 above implements the process of generating the multimedia resource corresponding to the voice message in response to the resource generation operation on the voice message input in the session interface. In this embodiment, the voice acquisition control and the resource generation control are displayed on the voice input panel, and the multimedia resource corresponding to the acquired voice message is generated in response to the trigger operation that proceeds from the voice acquisition control to the resource generation control. The resource generation control can therefore be triggered directly after the voice message is acquired through the voice acquisition control, which establishes a visual connection between the voice message and the multimedia resource, allows the multimedia resource corresponding to the voice message to be generated directly, and improves interaction convenience.
In some embodiments, the terminal also displays the generation process of the multimedia resource on the voice input panel. For example, during generation, the terminal displays the multimedia resource with gradually decreasing transparency. Alternatively, during generation, the terminal displays a generation progress bar that indicates the generation progress of the multimedia resource. In this embodiment, displaying the transparency change of the multimedia resource improves its display effect, and displaying the generation progress bar increases the amount of information conveyed to the user and makes the generation progress easy to follow.
Optionally, in the case where the multimedia resource is a picture, the picture is displayed with gradually decreasing transparency. In the case where the multimedia resource is a video, a generation progress bar of the video is displayed and, further, the cover of the video is displayed with gradually decreasing transparency.
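One way to tie the displayed transparency to the generation progress is a simple linear mapping, sketched below. The disclosure only states that transparency gradually decreases; the linear relation, the 0.0 to 1.0 progress scale, and the function name `transparency_for_progress` are assumptions made for illustration.

```python
def transparency_for_progress(progress: float) -> float:
    """Return transparency in [0.0, 1.0] for a generation progress in [0.0, 1.0].

    Assumed mapping: fully transparent at progress 0, fully opaque at progress 1.
    Out-of-range progress values are clamped.
    """
    clamped = min(max(progress, 0.0), 1.0)
    return 1.0 - clamped
```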
Optionally, the voice input panel further displays a first prompt message for prompting that the multimedia resource is being generated.
In some embodiments, in the process of generating the multimedia resource, that is, after the end of obtaining the voice message, the terminal further displays a voice playing control on the voice input panel, where the voice playing control is used for playing the input voice message. Therefore, the acquired voice message can be played at any time without waiting for the generation of the multimedia resource to allow the voice message to be played, and the interaction convenience is improved.
In some embodiments, during the process of generating the multimedia resource, that is, after the end of obtaining the voice message, a voice sending control is further displayed on the voice input panel, and is used for sending the voice message after being triggered. Therefore, the voice message can be sent independently without waiting for the generation of the multimedia resource to allow the voice message to be sent, and the interaction convenience is improved.
In some embodiments, in the process of generating the multimedia resource, a closing control is further displayed on the voice input panel, and the closing control is used for canceling the generation of the multimedia resource and discarding the acquired voice message after being triggered.
In some embodiments, in the process of generating the multimedia resource, a resource sending control is further displayed on the voice input panel. During generation, the resource sending control is in a non-interactable state, indicating that the multimedia resource is being generated. After the multimedia resource is generated, the resource sending control is in an interactable state, indicating that the multimedia resource has been generated and can be sent. In response to a trigger operation on the resource sending control, the terminal sends the multimedia resource.
In some embodiments, after the multimedia resource is generated, the terminal further displays a second prompt message on the voice input panel for prompting that the multimedia resource has been successfully generated.
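The panel behavior during and after generation can be summarized as a small state model. The sketch below is illustrative only: the class `VoicePanel`, its field, and the control keys are hypothetical, and it assumes the voice playing and voice sending controls stay usable in both states.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VoicePanel:
    """Hypothetical model of the voice input panel once voice capture has ended."""
    resource_ready: bool = False  # False while the multimedia resource is being generated

    def interactable(self) -> Dict[str, bool]:
        """Report which controls can currently be triggered."""
        return {
            "voice_play": True,   # the voice message can be played without waiting
            "voice_send": True,   # the voice message can be sent on its own
            "resource_send": self.resource_ready,  # enabled only after generation
        }

    def prompt(self) -> str:
        """First prompt while generating, second prompt once generated."""
        return "generated" if self.resource_ready else "generating"
```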
For example, referring to fig. 6, fig. 6 is a schematic diagram of a voice input panel shown according to an exemplary embodiment. In the process of generating the multimedia resource, the voice input panel displays the first prompt information and the multimedia resource with gradually decreasing transparency, and also displays a voice playing control 601, a voice sending control 602, a closing control 603 and a resource sending control 604.
As another example, referring to fig. 7, fig. 7 is a schematic diagram of a voice input panel, according to an example embodiment. After the multimedia resource is generated, the terminal displays the multimedia resource and the second prompt information on the voice input panel, and the state of the resource sending control 701 is changed and is in an interactable state.
In step S304, the terminal displays the multimedia resource on the voice input panel, and plays the multimedia resource in response to a trigger operation on the multimedia resource.
In the embodiment of the disclosure, the terminal displays the generated multimedia resources on the voice input panel, so that a user can preview the generated multimedia resources, and the multimedia resources are transmitted only when the user is satisfied with the generated multimedia resources, thereby improving the integrity and convenience of interaction.
It should be noted that, in the case that the multimedia resource is a picture, the picture is directly displayed on the voice input panel, and is not required to be played after triggering. And in the case that the multimedia resource is a video, responding to the triggering operation of the video, and playing the video by the terminal. Optionally, in the case that the multimedia resource is a video, a playing progress bar of the video is further displayed on the voice input panel, so as to indicate the playing progress of the video.
For example, referring to fig. 8, fig. 8 is a schematic diagram illustrating a voice input panel in which a multimedia asset is video according to an exemplary embodiment. Taking a multimedia resource as an example of video, a play control 801 is displayed on the multimedia resource, and the terminal plays the video in response to a trigger operation on the play control 801.
In other embodiments, if the multimedia resource is a video, the terminal directly plays the video on the voice input panel after generating the multimedia resource, without triggering by the user, thereby further improving the interaction efficiency.
In other embodiments, in response to a trigger operation on the multimedia resource, the terminal plays both the multimedia resource and the voice message, so that the generated multimedia resource can be previewed and the input voice message can be listened to at the same time. Both pieces of information are presented through a single operation, which improves the interaction efficiency.
In some embodiments, the method further includes at least one of the following implementations: in response to a regeneration operation on the displayed multimedia resource, the terminal regenerates the multimedia resource corresponding to the voice message, the regenerated multimedia resource being different from the previously displayed multimedia resource; in response to a replacement operation on the displayed multimedia resource, the terminal displays a plurality of candidate multimedia resources and, in response to a selection operation on any one of the candidate multimedia resources, replaces the multimedia resource with the selected candidate multimedia resource.
In the embodiment, the multimedia resource can be regenerated in response to the regeneration operation of the multimedia resource, so that the multimedia resource can be regenerated under the condition that the user is not satisfied with the generated multimedia resource, the diversity of functions and the convenience of interaction are improved, the regeneration by one key is realized, and the interaction efficiency is also improved. And in response to the replacement operation of the multimedia resource, the multimedia resource can be replaced by other multimedia resources selected by the user, so that the diversity and convenience of interaction are further improved, and the utilization rate of the function by the user is further improved.
Optionally, the regenerating operation refers to a triggering operation of the regenerating control on the resource generating panel, or the regenerating operation refers to a triggering operation of the multimedia resource, where the triggering operation may be a long-press operation or a double-click operation, for example.
The plurality of candidate multimedia resources include at least one of the following: a plurality of multimedia resources stored by the terminal, or a plurality of multimedia resources that the user has historically sent more than a first number of times. Alternatively, the plurality of candidate multimedia resources are those multimedia resources, among the multimedia resources stored by the terminal, that are associated with the voice message. For example, a candidate multimedia resource includes an entity indicated by the voice message.
It should be noted that the multimedia resources stored on the terminal are acquired only with the user's authorization. The terminal displays an authorization interface that shows prompt information, a consent control, and a disagreement control. The prompt information prompts that the multimedia resources stored by the terminal are to be acquired. The consent control indicates that the user agrees to the acquisition of the multimedia resources stored by the terminal. In response to a trigger operation on the consent control, the multimedia resources stored by the terminal are acquired.
In step S305, the terminal sends the multimedia resource to the second object in response to a sending operation on the multimedia resource.
Optionally, in response to the sending operation of the multimedia resource, the terminal also sends a voice message to the second object. In this embodiment, the multimedia resource is sent while the voice message is also sent, that is, the multimedia resource and the voice message can be sent out simultaneously through one-key operation, so that the interaction efficiency is improved.
In some embodiments, the sending operation may be a triggering operation on the multimedia resource, which may be a click operation or a long press operation, etc.
In some embodiments, a resource sending control is displayed on the voice input panel, and the terminal sends the multimedia resource to the second object in response to a trigger operation on the resource sending control. Optionally, prompt information is displayed on the resource sending control to indicate that the control is used for sending the multimedia resource. Further, in response to the trigger operation on the resource sending control, the terminal also sends the voice message to the second object. Accordingly, the sent multimedia resource and voice message are displayed in one session message of the session interface, and the voice message may be displayed above or below the multimedia resource, which is not particularly limited herein.
The sent multimedia resource is displayed in the session interface. Further, the session interface also displays prompt information for the multimedia resource, which indicates that the multimedia resource was generated based on a voice message.
In some embodiments, a voice sending control is further displayed on the voice input panel, and the terminal sends the voice message to the second object in response to a trigger operation on the voice sending control. Optionally, prompt information is displayed on the voice sending control to indicate that the control is used for sending the voice message.
In some embodiments, a plurality of multimedia resources are generated. Before sending the multimedia resource to the second object in response to the sending operation on the multimedia resource, the method further includes: the terminal displays the plurality of multimedia resources corresponding to the voice message. Sending the multimedia resource to the second object in response to the sending operation then includes: in response to a sending operation on any one of the plurality of multimedia resources, the terminal sends that multimedia resource to the second object.
The types of the plurality of multimedia resources may be the same or different. If the resource generation control includes a picture generation control and a video generation control, the multimedia resources are all pictures when generated through the picture generation control and all videos when generated through the video generation control. If there is only one resource generation control, the multimedia resources are generated randomly, and their types may be the same or different.
Wherein the terminal displays a plurality of multimedia resources on the voice input panel. The sending operation to any multimedia resource may be a triggering operation to the multimedia resource.
In this embodiment, a plurality of multimedia resources corresponding to the voice message can be generated and displayed at one time, which provides the user with rich and varied multimedia resources and a wider range of choices. The user can freely select the multimedia resource to send, which further improves the diversity and convenience of interaction, increases the probability that a generated multimedia resource matches the user's preference, and in turn increases the probability that the user sends a message based on the multimedia resource.
The embodiments of the present disclosure are described by taking a session interface with a single object as an example. In other embodiments, the session interface is a group session interface; that is, in response to a sending operation on the multimedia resource, the terminal sends the multimedia resource to the group session and displays the multimedia resource on the group session interface.
In step S306, when the multimedia resource includes a face image, the terminal displays a face replacement control in an associated area of the session message in which the multimedia resource is located.
In the embodiment of the disclosure, the associated area of the session message is an area adjacent to the session message, and the distance between the associated area and the session message is within a preset range. The face replacement control is used to replace the face image in the multimedia resource.
Optionally, the terminal detects the face image of the multimedia resource, and in the case that the face image is detected, the terminal displays a face replacement control. Or the terminal sends the multimedia resource to the server, and the server detects the face image. Or the terminal generates the multimedia resource through the server, the server directly detects the face image of the multimedia resource after generating the multimedia resource, the multimedia resource and the detection result are sent to the terminal together, and the terminal displays the face replacement control when the detection result indicates that the multimedia resource comprises the face image.
In step S307, the terminal displays a plurality of candidate images in response to the triggering operation of the face replacement control, and replaces the face image in the multimedia resource with the face image in any candidate image in response to the selecting operation of any candidate image.
Optionally, the plurality of candidate images include at least one of the following: images stored on the terminal that include a face image; images stored on the terminal whose storage time is within a preset duration of the current time; images that the user has historically sent more than a second number of times; or images that the user has historically sent more than the second number of times and that include a face image.
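The candidate image criteria above can be expressed as optional filters over the images stored on the terminal. The sketch below is a hedged illustration: the record type `StoredImage`, its fields, and the parameter names are hypothetical, and the face flag is assumed to come from a separate detection step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredImage:
    """Hypothetical record for an image stored on the terminal."""
    name: str
    has_face: bool      # assumed result of a prior face detection step
    stored_at: float    # storage time, in seconds
    send_count: int     # how many times the user has sent this image


def candidate_images(
    images: List[StoredImage],
    now: float,
    require_face: bool = False,
    max_age_seconds: Optional[float] = None,
    min_sends: Optional[int] = None,
) -> List[StoredImage]:
    """Keep images that satisfy every criterion that is enabled."""
    result = []
    for image in images:
        if require_face and not image.has_face:
            continue
        if max_age_seconds is not None and now - image.stored_at > max_age_seconds:
            continue
        # "More than the second number of times" is read as a strict comparison.
        if min_sends is not None and image.send_count <= min_sends:
            continue
        result.append(image)
    return result
```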
In the embodiment of the disclosure, when the generated multimedia resource comprises the face image, a face replacement control is further provided on the session interface to replace the face image in the multimedia resource, and the face image in the multimedia resource can be replaced by the face image in any candidate image based on the selection of the user, so that the diversity of interaction is further improved, and the probability of interaction of the user in the session interface is improved.
In some embodiments, the terminal is also capable of generating multimedia resources for historical voice messages. In response to a trigger operation on any historical voice message on the session interface, the terminal displays at least one function control of that historical voice message, the at least one function control including a resource generation control. In response to a trigger operation on the resource generation control, the terminal generates a first multimedia resource corresponding to that historical voice message and displays the first multimedia resource on the session interface. In this embodiment, a resource generation control is provided among the function controls of a historical voice message in the session interface, so that a multimedia resource can also be generated for a historical voice message. This adds an entry point for generating multimedia resources, further improves the diversity of interaction, and can further increase the user's usage of the function.
The triggering operation on the historical voice message can be long-press operation, double-click operation and the like.
The resource generation control is used for generating at least one of a picture and a video. Accordingly, the process of generating the multimedia resource through the resource generation control includes the following three cases.
In the first case, the resource generation control is used only to generate pictures or only to generate videos. Or the resource generation control is used to randomly generate a picture or video. The terminal generates the picture corresponding to the historical voice message in response to the triggering operation of the resource generation control when the resource generation control is only used for generating the picture, and generates the video corresponding to the historical voice message in response to the triggering operation of the resource generation control when the resource generation control is only used for generating the video. And under the condition that the resource generation control is used for randomly generating pictures or videos, responding to the triggering operation of the resource generation control, and generating the pictures or videos corresponding to the historical voice messages by the terminal.
In the second case, the resource generation control comprises a picture generation control and a video generation control, namely, in response to the triggering operation on the historical voice message, the terminal directly displays the picture generation control and the video generation control. And responding to the triggering operation of the picture generation control, and generating a picture corresponding to the historical voice message by the terminal. And responding to the triggering operation of the video generation control, and generating the video corresponding to the historical voice message by the terminal.
In the third case, the picture generation control and the video generation control are lower-level controls of the resource generation control, namely, the terminal displays the resource generation control in response to the triggering operation of the historical voice message, and displays the picture generation control and the video generation control in response to the triggering operation of the resource generation control.
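The three cases differ in which generation-related controls appear at each step. The following sketch models them as three layouts; the layout labels `"single"`, `"split"`, and `"nested"` and both function names are hypothetical, and other function controls shown alongside them are omitted.

```python
from typing import List

GENERATION_CONTROLS = ["picture_generation", "video_generation"]


def controls_on_message_trigger(layout: str) -> List[str]:
    """Generation-related controls shown when a historical voice message is triggered."""
    if layout == "single":   # first case: one control (picture only, video only, or random)
        return ["resource_generation"]
    if layout == "split":    # second case: both type-specific controls shown directly
        return list(GENERATION_CONTROLS)
    if layout == "nested":   # third case: parent control shown first
        return ["resource_generation"]
    raise ValueError(f"unknown layout: {layout}")


def controls_on_resource_control_trigger(layout: str) -> List[str]:
    """Lower-level controls shown after the resource generation control is triggered."""
    # Only the third case opens a second level; the first case generates directly.
    return list(GENERATION_CONTROLS) if layout == "nested" else []
```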
For example, referring to fig. 9, fig. 9 is a schematic diagram of a session interface shown according to an example embodiment. In response to triggering operation on the historical voice message, the terminal displays a plurality of functional controls, wherein the plurality of functional controls comprise a resource generation control 901, and the resource generation control 901 is used for generating pictures.
In some embodiments, the terminal displays a resource generation control of at least one historical voice message on the session interface. In response to a trigger operation on the resource generation control of any one of the at least one historical voice message, the terminal generates a second multimedia resource corresponding to that historical voice message and displays the second multimedia resource on the session interface.
In this embodiment, the resource generation control for a historical voice message is displayed directly on the session interface, so that the corresponding multimedia resource can be generated with a single tap, which improves interaction efficiency.
In some embodiments, the at least one historical voice message is the most recent historical voice message, that is, the voice message whose sending time is closest to the current time. Because the most recent historical voice message is the freshest, the probability that the user is currently interacting through it is the highest, and so is the probability that the user wants to generate a multimedia resource from it. Displaying the resource generation control for this message therefore makes interaction more convenient and can increase the probability that the user interacts.
The process of generating the multimedia resource of the last historical voice message based on the resource generation control is the same as the process of generating the multimedia resource of any historical voice message based on the resource generation control, and is not described herein.
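Choosing the most recent historical voice message amounts to taking the voice message with the largest sending time that is not later than the current time. The following is a minimal sketch under that assumption; the `Message` type and its field names are hypothetical and are not defined in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical session message record."""
    msg_id: str
    kind: str       # e.g. "voice" or "text"
    sent_at: float  # sending time, in seconds


def latest_voice_message(messages: List[Message], now: float) -> Optional[Message]:
    """Return the voice message whose sending time is closest to `now`, if any."""
    voice = [m for m in messages if m.kind == "voice" and m.sent_at <= now]
    # max() with default=None handles a session that has no voice messages.
    return max(voice, key=lambda m: m.sent_at, default=None)
```

The resource generation control would then be attached to the returned message, and omitted when the function returns `None`.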
Optionally, the resource generating control is displayed with prompt information for prompting generation of the multimedia resource based on the voice message. The prompt message includes text, such as "phonetic drawing". Further, the prompt information also comprises an icon, and the form of the icon can be set according to the requirement.
For any historical voice message, the process of generating the corresponding multimedia resource by the terminal is the same as that of step S202, and will not be described again here.
For example, referring to fig. 10, fig. 10 is a schematic diagram of a session interface shown according to an example embodiment. The session interface displays a resource generation control 1001 for the most recent historical voice message.
In other embodiments, for a multimedia asset generated based on a historical voice message, if the multimedia asset includes a facial image, a facial replacement control is displayed in an associated area of a conversation message in which the multimedia asset is located, a plurality of candidate images are displayed in response to a triggering operation of the facial replacement control, and the facial image in the multimedia asset is replaced with the facial image in any candidate image in response to a selection operation of any candidate image.
Optionally, after replacing the face image in the multimedia resource, a face replacement control remains displayed on the session interface so that the user can replace again if he is not satisfied with the replaced face.
For example, referring to fig. 11, fig. 11 is a schematic diagram of a session interface shown according to an example embodiment. In response to a trigger operation on the resource generation control of any voice message, the terminal displays a loading interface on the session interface while the multimedia resource is being generated, and displays the generated multimedia resource once generation is complete. Further, if the multimedia resource includes a face image, a face replacement control 1101 is also displayed in an associated area of the session message in which the multimedia resource is located.
For example, referring to fig. 12, fig. 12 is a flow diagram illustrating an interface display according to an exemplary embodiment. The terminal displays a plurality of candidate images in response to triggering operation of the face replacement control, and replaces the face image in the multimedia resource with the face image in the candidate images in response to selection operation of any one of the candidate images.
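The face replacement flow can be sketched as two small functions: one decides whether the face replacement control is shown, and the other applies the user's selection. This is a sketch under stated assumptions: `MultimediaResource`, `face_id`, and both function names are hypothetical, and a face is represented by a plain identifier rather than by image data.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class MultimediaResource:
    """Hypothetical generated picture or video; face_id is None when no face is present."""
    resource_id: str
    face_id: Optional[str] = None


def shows_face_replacement_control(resource: MultimediaResource) -> bool:
    """The control is displayed only when the resource includes a face image."""
    return resource.face_id is not None


def replace_face(resource: MultimediaResource,
                 candidate_face_ids: List[str],
                 selected_index: int) -> MultimediaResource:
    """Return a copy of the resource whose face is the selected candidate's face."""
    if not shows_face_replacement_control(resource):
        raise ValueError("resource contains no face image to replace")
    return replace(resource, face_id=candidate_face_ids[selected_index])
```

Because the returned resource still contains a face, the control remains available afterwards, which matches the optional behavior in which the user can replace the face again.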
In the embodiment of the disclosure, the terminal can also send any multimedia resource displayed on the session interface to the second object or forward it to other objects. Accordingly, in response to a sending operation on any multimedia resource displayed on the session interface, the terminal sends that multimedia resource to the second object. In response to a forwarding operation on any multimedia resource displayed on the session interface, the terminal sends that multimedia resource to a third object. In this embodiment, multimedia resources can be generated on the session interface based on historical voice messages and can then be sent or forwarded to other objects, which makes message sending more convenient and further diversifies the interaction functions available on the session interface.
The multimedia resources displayed on the session interface comprise multimedia resources generated based on historical voice messages, multimedia resources sent by the second object, multimedia resources sent by the terminal and the like.
The method provided by the embodiment of the disclosure lowers the difficulty of operating AIGC functions and can increase the amount of user-created content. It also makes sending voice messages in the private message scenario more engaging, so that users are more inclined to send voice messages and the number of voice messages sent increases. In addition, the user can share the corresponding multimedia resource in real time when sending a voice message, which shortens the interaction path for sharing AIGC content, further improves interaction efficiency, and also increases the number of private messages sent.
The method provided by the embodiment of the disclosure can help a user input an instruction more conveniently in order to generate a picture or a video. Embedding the AIGC capability in the private message scenario shortens the user's sharing path. Message sending also becomes more engaging: when the generated multimedia resource includes a face image, the user can quickly replace it with the face from an uploaded photo, so that the user can play with the multimedia resource while chatting with the session object, which increases the diversity of functions.
The method provided by the embodiment of the application reduces the input cost of the user, excites the creativity of the user, enables the user to share and communicate with the conversation object more conveniently, and improves the interaction efficiency and convenience during conversation.
The embodiment of the disclosure provides a message sending method, which can respond to a resource generating operation for a voice message input in a session interface to generate a multimedia resource corresponding to the voice message, and can send the multimedia resource to a session object, so that interaction modes in a session scene are enriched, and the diversity of the session message is improved. In the session scene, the method can generate the multimedia resources based on the voice message by one key, and further improves the convenience of generating the multimedia resources, thereby improving the interaction efficiency. In addition, the multimedia resource is generated based on the voice message in real time, so that the content of the multimedia resource is more specific, and the interaction probability in the user session scene can be improved through the personalized and novel multimedia resource.
Fig. 13 is a block diagram of a message transmitting apparatus according to an exemplary embodiment. Referring to fig. 13, the apparatus includes:
a display unit 1301 configured to display a session interface of the first object and the second object;
a generating unit 1302 configured to generate, in response to a resource generation operation for a voice message input at the session interface, a multimedia resource corresponding to the voice message, the multimedia resource including at least one of a picture and a video;
a transmitting unit 1303 configured to transmit the multimedia resource to the second object in response to a transmission operation on the multimedia resource.
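As a rough illustration of how the three units might cooperate, the sketch below models each unit as a small class. All class, method, and parameter names are hypothetical, and the generation backend is an injected callable, because the disclosure does not specify how the multimedia resource is actually produced.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Resource:
    kind: str     # "picture" or "video"
    payload: str  # placeholder for the generated content


class DisplayUnit:
    """Records what would be displayed; a real unit would drive the UI."""
    def __init__(self) -> None:
        self.events: List[str] = []

    def show_session(self, first_object: str, second_object: str) -> None:
        self.events.append(f"session:{first_object}:{second_object}")


class GeneratingUnit:
    """Generates a resource from a voice message via an injected backend."""
    def __init__(self, backend: Callable[[bytes, str], str]) -> None:
        self._backend = backend  # assumed generator, not part of the disclosure

    def generate(self, voice_message: bytes, kind: str) -> Resource:
        if kind not in ("picture", "video"):
            raise ValueError("kind must be 'picture' or 'video'")
        return Resource(kind, self._backend(voice_message, kind))


class TransmittingUnit:
    """Collects outgoing resources; a real unit would send them over the network."""
    def __init__(self) -> None:
        self.sent: List[Tuple[str, Resource]] = []

    def send(self, resource: Resource, recipient: str) -> None:
        self.sent.append((recipient, resource))
```

Injecting the backend keeps the sketch independent of any particular generation model and makes each unit easy to exercise in isolation.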
In some embodiments, the display unit 1301 is further configured to perform:
Displaying a voice input panel, wherein the voice input panel comprises a voice acquisition control and a resource generation control;
The apparatus further includes an acquisition unit configured to perform acquiring a voice message in response to the voice acquisition control being triggered;
a generating unit 1302 configured to perform:
generating the multimedia resource corresponding to the voice message in response to a trigger operation that proceeds from the voice acquisition control to the resource generation control.
In some embodiments, the resource generation control comprises at least one of a picture generation control and a video generation control, the generation unit 1302 configured to perform at least one of:
responding to the drag operation from the voice acquisition control to the picture generation control, and generating a picture corresponding to the voice message;
and responding to the drag operation from the voice acquisition control to the video generation control, and generating the video corresponding to the voice message.
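One way to recognize a drag from the voice acquisition control to a picture or video generation control is a simple hit test on the start and end points of the gesture. The sketch below assumes rectangular control bounds in screen coordinates; `Rect`, `resource_type_for_drag`, and the example coordinates are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounds of a control, in screen coordinates."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, point: Point) -> bool:
        x, y = point
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def resource_type_for_drag(start: Point, end: Point,
                           voice_control: Rect,
                           targets: Dict[str, Rect]) -> Optional[str]:
    """Return the resource type whose control the drag ended on, or None.

    The drag counts only if it started on the voice acquisition control.
    """
    if not voice_control.contains(start):
        return None
    for kind, bounds in targets.items():
        if bounds.contains(end):
            return kind
    return None
```

A drag that starts on the voice acquisition control but ends outside both target controls returns `None`, so no resource is generated in that case.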
In some embodiments, the display unit 1301 is further configured to perform:
displaying resource prompt information in response to a trigger operation that proceeds from the voice acquisition control to the resource generation control, wherein the resource prompt information is used for prompting the type of the multimedia resource to be generated.
In some embodiments, the display unit 1301 is further configured to perform:
displaying the multimedia resource on the voice input panel, and responding to the triggering operation of the multimedia resource, and playing the multimedia resource;
Or displaying the multimedia resource on the voice input panel, and responding to the triggering operation of the multimedia resource, and playing the multimedia resource and the voice message.
In some embodiments, the apparatus further comprises at least one of:
A regeneration unit configured to perform regeneration of a multimedia resource corresponding to the voice message in response to a regeneration operation of the displayed multimedia resource, the regenerated multimedia resource being different from the multimedia resource;
A candidate display unit configured to display a plurality of candidate multimedia resources in response to a replacement operation on the displayed multimedia resource, and to replace the multimedia resource with any one of the candidate multimedia resources in response to a selection operation on that candidate multimedia resource.
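The regeneration unit requires that the regenerated multimedia resource differ from the one already displayed. A minimal sketch of that constraint is shown below, assuming a generator that may occasionally repeat its output; the `regenerate` function, its `max_attempts` parameter, and the retry strategy are illustrative assumptions and are not part of the disclosure.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def regenerate(generate: Callable[[], T], previous: T, max_attempts: int = 5) -> T:
    """Call `generate` until it yields a result different from `previous`.

    Raises RuntimeError if every attempt reproduces the previous resource.
    """
    for _ in range(max_attempts):
        candidate = generate()
        if candidate != previous:
            return candidate
    raise RuntimeError("could not produce a different multimedia resource")
```

Bounding the number of attempts avoids looping forever when the generator keeps returning the same result.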
In some embodiments, the transmitting unit 1303 is further configured to perform:
in response to the sending operation of the multimedia resource, a voice message is sent to the second object.
In some embodiments, the generated multimedia assets are multiple, and the display unit 1301 is further configured to perform:
displaying a plurality of multimedia resources corresponding to the voice message;
A transmitting unit 1303 configured to perform:
any of the plurality of multimedia assets is transmitted to the second object in response to a transmission operation of any of the plurality of multimedia assets.
In some embodiments, the display unit 1301 is further configured to perform:
Responding to the triggering operation of any historical voice message on the session interface, and displaying at least one functional control of any historical voice message, wherein the at least one functional control comprises a resource generation control;
The generating unit 1302 is further configured to perform generating a first multimedia resource in response to a triggering operation of the resource generating control, and displaying the first multimedia resource on the session interface, where the first multimedia resource corresponds to any one of the historical voice messages.
In some embodiments, the display unit 1301 is further configured to perform:
displaying a resource generation control of at least one historical voice message on the session interface;
The generating unit 1302 is further configured to generate a second multimedia resource in response to a trigger operation on the resource generation control of any one of the at least one historical voice message, and to display the second multimedia resource on the session interface, where the second multimedia resource corresponds to that historical voice message.
In some embodiments, the transmitting unit 1303 is further configured to perform at least one of:
transmitting any multimedia resource to the second object in response to a transmission operation of any multimedia resource displayed on the session interface;
and transmitting any multimedia resource to the third object in response to the forwarding operation of any multimedia resource displayed on the session interface.
In some embodiments, the display unit 1301 is further configured to perform:
when the multimedia resource comprises a face image, displaying a face replacement control in an associated area of a session message where the multimedia resource is located;
Responding to the triggering operation of the face replacement control, and displaying a plurality of candidate images;
and in response to the selection operation of any candidate image, replacing the face image in the multimedia resource with the face image in any candidate image.
The embodiment of the disclosure provides a message sending device, which can respond to a resource generating operation for a voice message input in a session interface, generate a multimedia resource corresponding to the voice message, and send the multimedia resource to a session object, so that interaction modes in a session scene are enriched, and the diversity of the session message is improved. In addition, the device can generate the multimedia resource based on the voice message by one key in the session scene, thereby improving the convenience of generating the multimedia resource and further improving the interaction efficiency. In addition, the multimedia resource is generated based on the voice message in real time, so that the content of the multimedia resource is more specific, and the interaction probability in the user session scene can be improved through the personalized and novel multimedia resource.
The specific manner in which the individual units perform the operations in relation to the apparatus of the above embodiments has been described in detail in relation to the embodiments of the method and will not be described in detail here.
Fig. 14 shows a block diagram of a terminal 1400 provided by an exemplary embodiment of the present disclosure. The terminal 1400 may be a smart phone, tablet computer, MP3 player (Moving Picture Experts Group Audio Layer III, MPEG audio layer 3), MP4 (Moving Picture Experts Group Audio Layer IV, MPEG audio layer 4) player, notebook computer, or desktop computer. Terminal 1400 may also be referred to as a user device, a portable terminal, a laptop terminal, a desktop terminal, and the like.
In general, terminal 1400 includes a processor 1401 and memory 1402.
Processor 1401 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 1401 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1401 may also include a main processor and a coprocessor. The main processor, also called a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1401 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1401 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1402 may include one or more computer-readable storage media, which may be non-transitory. Memory 1402 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1402 is used to store at least one program code for execution by processor 1401 to implement the messaging method provided by the method embodiments in the present disclosure.
In some embodiments, terminal 1400 can optionally include a peripheral interface 1403 and at least one peripheral. The processor 1401, memory 1402, and peripheral interface 1403 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1403 via buses, signal lines or a circuit board. Specifically, the peripheral devices include at least one of radio frequency circuitry 1404, a display screen 1405, a camera assembly 1406, audio circuitry 1407, and a power source 1408.
Peripheral interface 1403 may be used to connect at least one Input/Output (I/O) related peripheral to processor 1401 and memory 1402. In some embodiments, processor 1401, memory 1402, and peripheral interface 1403 are integrated on the same chip or circuit board, and in some other embodiments, either or both of processor 1401, memory 1402, and peripheral interface 1403 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The radio frequency circuit 1404 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1404 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1404 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1404 includes an antenna system, an RF transceiver, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuit 1404 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to, metropolitan area networks, successive generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1404 may also include NFC (Near Field Communication) related circuits, which is not limited by the present disclosure.
The display screen 1405 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1405 is a touch display screen, the display screen 1405 also has the ability to collect touch signals at or above its surface. The touch signal may be input to the processor 1401 as a control signal for processing. In this case, the display screen 1405 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 1405, disposed on the front panel of the terminal 1400. In other embodiments, there are at least two display screens 1405, disposed on different surfaces of the terminal 1400 or in a folded design. In still other embodiments, the display screen 1405 is a flexible display screen disposed on a curved surface or a folded surface of the terminal 1400. The display screen 1405 may even be arranged in a non-rectangular irregular shape, that is, as a special-shaped screen. The display screen 1405 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 1406 is used to capture images or video. Optionally, camera assembly 1406 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and Virtual Reality (VR) shooting functions or other fused shooting functions. In some embodiments, camera assembly 1406 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuitry 1407 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1401 for processing, or inputting the electric signals to the radio frequency circuit 1404 for voice communication. For purposes of stereo acquisition or noise reduction, a plurality of microphones may be provided at different portions of the terminal 1400, respectively. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1401 or the radio frequency circuit 1404 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuitry 1407 may also include a headphone jack.
A power supply 1408 is used to provide power to various components in terminal 1400. The power supply 1408 may be alternating current, direct current, disposable battery, or rechargeable battery. When the power supply 1408 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
Those skilled in the art will appreciate that the structure shown in fig. 14 is not limiting and that terminal 1400 may include more or less components than those illustrated, or may combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a computer readable storage medium is also provided, e.g., a memory, comprising instructions executable by a processor of a terminal to perform the above-described messaging method. Alternatively, the computer readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program which, when executed by a processor, implements the above-described messaging method. In some embodiments, a computer program product according to embodiments of the present disclosure may be deployed for execution on one terminal, or on multiple terminals located at one site, or on multiple terminals distributed across multiple sites and interconnected by a communication network, which may constitute a blockchain system.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims. Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described in detail herein.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A method of messaging, the method comprising:
Displaying a session interface of the first object and the second object;
Displaying a voice input panel, wherein the voice input panel comprises a voice acquisition control and a resource generation control;
responding to the triggering of the voice acquisition control to acquire a voice message;
generating a multimedia resource corresponding to the voice message in response to a trigger operation that proceeds from the voice acquisition control to the resource generation control, wherein the multimedia resource comprises at least one of a picture and a video;
transmitting the multimedia resource to the second object in response to the transmission operation of the multimedia resource;
when the multimedia resource comprises a face image, displaying a face replacement control in an associated area of a session message where the multimedia resource is located;
Responding to the triggering operation of the face replacement control, and displaying a plurality of candidate images;
and in response to a selection operation of any candidate image, replacing the face image in the multimedia resource with the face image in the any candidate image.
2. The message sending method according to claim 1, wherein the resource generation control comprises at least one of a picture generation control and a video generation control, and the generating a multimedia resource corresponding to the voice message in response to a trigger operation that proceeds from the voice acquisition control to the resource generation control comprises at least one of:
responding to the drag operation from the voice acquisition control to the picture generation control, and generating a picture corresponding to the voice message;
And responding to the drag operation from the voice acquisition control to the video generation control, and generating the video corresponding to the voice message.
3. The message transmission method according to claim 1, characterized in that the method further comprises:
displaying resource prompt information in response to a trigger operation that proceeds from the voice acquisition control to the resource generation control, wherein the resource prompt information is used for prompting the type of the multimedia resource to be generated.
4. The message transmission method according to claim 1, wherein before the transmission of the multimedia resource to the second object in response to the transmission operation of the multimedia resource, the method further comprises:
Displaying the multimedia resource on the voice input panel, and responding to the triggering operation of the multimedia resource, and playing the multimedia resource;
Or displaying the multimedia resource on the voice input panel, and responding to the triggering operation of the multimedia resource, and playing the multimedia resource and the voice message.
5. The message sending method of claim 4, further comprising at least one of:
Responsive to a regeneration operation of the displayed multimedia resource, regenerating the multimedia resource corresponding to the voice message, the regenerated multimedia resource being different from the multimedia resource;
Displaying a plurality of candidate multimedia resources in response to a replacement operation on the displayed multimedia resource, and replacing the multimedia resource with any candidate multimedia resource in response to a selection operation on the any candidate multimedia resource.
6. The message transmission method according to claim 1, characterized in that the method further comprises:
and transmitting the voice message to the second object in response to the transmission operation of the multimedia resource.
7. The message transmission method according to claim 1, wherein the generated multimedia resources are plural, and the method further comprises, before transmitting the multimedia resources to the second object in response to the transmission operation of the multimedia resources:
Displaying a plurality of multimedia resources corresponding to the voice message;
The transmitting the multimedia resource to the second object in response to the transmitting operation of the multimedia resource includes:
And transmitting any multimedia resource in the plurality of multimedia resources to the second object in response to a transmission operation of the any multimedia resource.
8. The message transmission method according to claim 1, characterized in that the method further comprises:
Responding to the triggering operation of any historical voice message on the session interface, and displaying at least one functional control of any historical voice message, wherein the at least one functional control comprises a resource generation control;
and responding to the triggering operation of the resource generation control, generating a first multimedia resource, and displaying the first multimedia resource on the session interface, wherein the first multimedia resource corresponds to any historical voice message.
9. The message transmission method according to claim 1, characterized in that the method further comprises:
Displaying a resource generation control of at least one historical voice message on the session interface;
and responding to the triggering operation of a resource generation control of any one of the at least one historical voice message, generating a second multimedia resource, and displaying the second multimedia resource on the session interface, wherein the second multimedia resource corresponds to any one historical voice message.
10. The message sending method according to claim 8 or 9, characterized in that the method further comprises at least one of the following:
transmitting any multimedia resource displayed on the session interface to the second object in response to a transmission operation of the any multimedia resource;
And transmitting any multimedia resource to a third object in response to a forwarding operation of the any multimedia resource displayed on the session interface.
11. A message transmission apparatus, the apparatus comprising:
A display unit configured to display a session interface of the first object and the second object;
The display unit is further configured to display a voice input panel, the voice input panel comprising a voice acquisition control and a resource generation control;
An acquisition unit configured to acquire a voice message in response to the voice acquisition control being triggered;
A generating unit configured to generate a multimedia resource corresponding to the voice message in response to a trigger operation that proceeds from the voice acquisition control to the resource generation control, the multimedia resource including at least one of a picture and a video;
A transmission unit configured to perform transmission of the multimedia resource to the second object in response to a transmission operation of the multimedia resource;
The display unit is further configured to perform:
when the multimedia resource comprises a face image, displaying a face replacement control in an associated area of a session message where the multimedia resource is located;
Responding to the triggering operation of the face replacement control, and displaying a plurality of candidate images;
and in response to a selection operation of any candidate image, replacing the face image in the multimedia resource with the face image in the any candidate image.
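The face replacement steps at the end of claim 11 can be sketched as follows. This is an illustrative model only: the names `CandidateImage`, `associated_controls`, and `replace_face` are hypothetical, and a face image is represented by a plain string identifier rather than pixel data. How a face is detected or blended into the resource is not described in the claim and is not modeled here.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class CandidateImage:
    name: str
    face: str  # identifier of the face image contained in this candidate


@dataclass(frozen=True)
class MultimediaResource:
    kind: str                   # "picture" or "video"
    face: Optional[str] = None  # None when the resource contains no face image


def associated_controls(resource: MultimediaResource) -> List[str]:
    """Controls shown in the associated area of the session message."""
    # The face replacement control is displayed only when a face image is present.
    return ["face_replacement"] if resource.face is not None else []


def replace_face(resource: MultimediaResource,
                 candidates: List[CandidateImage],
                 selected: int) -> MultimediaResource:
    """Return a copy of the resource whose face is taken from the selected candidate."""
    if resource.face is None:
        raise ValueError("resource has no face image to replace")
    return replace(resource, face=candidates[selected].face)


candidates = [CandidateImage("avatar", "face-a"), CandidateImage("selfie", "face-b")]
portrait = MultimediaResource("picture", face="face-generated")
landscape = MultimediaResource("picture")
swapped = replace_face(portrait, candidates, 1)  # user selects the second candidate
```

The resource is modeled as immutable so that the original generated resource remains available after a replacement. That is a design choice of this sketch, not something the claim requires.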
12. A terminal, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the message sending method of any one of claims 1 to 10.
13. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of a terminal, enable the terminal to perform the message sending method of any one of claims 1 to 10.
14. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the message sending method of any one of claims 1 to 10.
CN202411051670.6A 2024-08-01 2024-08-01 Message sending method, device, equipment and storage medium Active CN118573648B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411051670.6A CN118573648B (en) 2024-08-01 2024-08-01 Message sending method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411051670.6A CN118573648B (en) 2024-08-01 2024-08-01 Message sending method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN118573648A (en) 2024-08-30
CN118573648B (en) 2025-01-21

Family

ID=92463943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411051670.6A Active CN118573648B (en) 2024-08-01 2024-08-01 Message sending method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118573648B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111835621A (en) * 2020-07-10 2020-10-27 腾讯科技(深圳)有限公司 Session message processing method and device, computer equipment and readable storage medium
CN112711366A (en) * 2020-12-23 2021-04-27 维沃移动通信(杭州)有限公司 Image generation method and device and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20220130860A (en) * 2021-03-19 2022-09-27 주식회사 웨인힐스브라이언트에이아이 Operation method of a service providing device that converts voice information into multimedia video content
CN117834576A (en) * 2024-01-05 2024-04-05 北京字跳网络技术有限公司 Expression interaction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN118573648A (en) 2024-08-30

Similar Documents

Publication Publication Date Title
CN111078655B (en) Document content sharing method, device, terminal and storage medium
CN113709022B (en) Message interaction method, device, equipment and storage medium
CN107888965A (en) Image present methods of exhibiting and device, terminal, system, storage medium
CN118264846B (en) Information display method, device, electronic device and storage medium
CN113518143A (en) Interface input source switching method, device and electronic device
CN118573648B (en) Message sending method, device, equipment and storage medium
CN116962338A (en) Method and device for interaction between objects, electronic equipment and storage medium
CN116774897A (en) Work distribution method, device, equipment and storage medium
CN113542257B (en) Video processing method, video processing device, electronic apparatus, and storage medium
CN118502626B (en) Work recommendation method, device, electronic device and storage medium
CN117812352B (en) Object interaction method, device, electronic equipment and medium
CN118820494B (en) Resource recommendation method, device, equipment and storage medium
CN118860213B (en) Interactive method based on work, setting method, device and terminal for blocking image
CN118972668B (en) Resource adding method, resource publishing method, device, equipment and medium
CN115348240B (en) Voice call method, device, electronic equipment and storage medium for sharing document
CN120151313A (en) Information transmission method, device, terminal and storage medium
CN119676204A (en) Message sending method, device, equipment and storage medium
CN119676469A (en) Resource sending method, device, equipment and storage medium
CN118939160A (en) Multimedia resource display method, device, electronic device and storage medium
CN119030948A (en) Content sharing method, device, electronic device and storage medium
CN118524077A (en) Session method, device, equipment and storage medium
CN117896568A (en) Comment display method and device, electronic equipment and storage medium
CN119561941A (en) Resource sharing method, device, electronic device and storage medium
CN118093068A (en) Multimedia resource sharing method, device and equipment
CN118972658A (en) Bullet screen display method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant