
CN119629162A - A text display method, system and related device - Google Patents

A text display method, system and related device

Info

Publication number
CN119629162A
Authority
CN
China
Prior art keywords
text
text segment
cloud server
electronic device
target text
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311190565.6A
Other languages
Chinese (zh)
Inventor
苏庆
张淑庆
夏捷
谢光剑
宋凯凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN202311190565.6A
Priority to PCT/CN2024/118584 (WO2025055994A1)
Publication of CN119629162A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Information Transfer Between Computers (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a text display method, a text display system and a related device. An electronic device sends a first message to a cloud server over a long link. The cloud server generates, through a language model, a target text corresponding to the first message, where the target text comprises a plurality of tokens. While generating the target text, the cloud server determines the display time of a single character based on the generation time of a single token, and sends the generated tokens and the display time of a single character to the electronic device in the order of generation. The electronic device can display the target text based on the display time of a single character (that is, display one character per time interval equal to the display time of a single character). In this way, the waiting time of the user is reduced, fluent text output is ensured, output stuttering is avoided, and a better experience is provided for the user.

Description

Text display method, text display system and related device
Technical Field
The present application relates to the field of machine learning, and in particular, to a text display method, system and related device.
Background
With the continuous development of machine learning technologies, algorithms for natural language processing are maturing. An electronic device can generate a target text through a language model based on speech or original text entered by the user. After obtaining the entire content of the target text, the electronic device may display the target text on a display screen.
However, a language model has a very large number of parameters and a heavy computation load, so generating the target text takes a long time, and the longer the text, the more time is consumed. If the target text is output in this manner, the user waits too long and the user experience is poor.
Disclosure of Invention
The application provides a text display method, a text display system and a related device, which reduce the waiting time of the user, avoid stuttering during text display, and achieve smooth text output.
In a first aspect, the application provides a text display system, which comprises an electronic device and a cloud server. The electronic device is configured to send a first message to the cloud server. The cloud server is configured to generate, through a language model, a first target text corresponding to the first message. The cloud server is further configured to send a first display speed and a first text segment to the electronic device after the first text segment in the first target text is generated, where the first display speed indicates the number of characters of the first target text displayed per unit time. The electronic device is further configured to display the first text segment at the first display speed. The cloud server is further configured to generate a second text segment in the first target text before the electronic device finishes displaying the first text segment at the first display speed, where the second text segment is located after the first text segment in the first target text. The cloud server is further configured to send the second text segment to the electronic device after the second text segment is generated. The electronic device is further configured to receive the second text segment before it finishes displaying the first text segment at the first display speed, and to display the second text segment at the first display speed after it finishes displaying the first text segment.
Therefore, the generated characters can be displayed while the target text is still being generated, which reduces the waiting time of the user, avoids stuttering during display, and ensures smooth display of the characters.
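The pacing behaviour described above can be illustrated with a small simulation. This is a minimal sketch, not the method of the application: the function names `schedule_display` and `has_stall`, the tuple layout, and the sample arrival times are all assumptions introduced here. It computes when each character would appear on screen if the device shows one character per `1 / display_speed` seconds and can only show a segment after it has arrived.

```python
from typing import List, Tuple

def schedule_display(segments: List[Tuple[float, str]],
                     display_speed: float) -> List[Tuple[float, str]]:
    """Return (display_time_s, char) pairs.

    segments: (arrival_time_s, text) pairs in generation order (hypothetical layout).
    display_speed: characters shown per second.
    """
    interval = 1.0 / display_speed
    schedule = []
    next_slot = 0.0
    for arrival, text in segments:
        # A segment cannot be shown before it arrives; waiting here is a visible stall.
        next_slot = max(next_slot, arrival)
        for ch in text:
            schedule.append((next_slot, ch))
            next_slot += interval
    return schedule

def has_stall(segments: List[Tuple[float, str]], display_speed: float) -> bool:
    """True if any gap between consecutive characters exceeds the display interval."""
    interval = 1.0 / display_speed
    sched = schedule_display(segments, display_speed)
    return any(b[0] - a[0] > interval + 1e-9 for a, b in zip(sched, sched[1:]))

# The second segment arrives while the first is still being displayed: no stall.
smooth = [(0.0, "Hello"), (0.4, " world")]
# The second segment arrives after the first has been fully displayed: stall.
late = [(0.0, "Hello"), (1.0, " world")]
print(has_stall(smooth, 10.0), has_stall(late, 10.0))
```

In the first case the next segment is already buffered when the previous one finishes, which is the condition the system description relies on for smooth output.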
In one possible implementation, the first text segment comprises at least one token, and the second text segment comprises one token. The cloud server is further configured to obtain a first generation speed in the process of generating the first target text, where the first generation speed indicates the number of tokens generated per unit time. The cloud server is further configured to determine the first display speed based on the first generation speed and a first constraint relation, where the first constraint relation is that the first display speed is less than or equal to the product of the first generation speed and a first proportion, and the first proportion is the ratio of the number of characters to the number of tokens for the language model.
Therefore, the number of tokens sent the first time is greater than or equal to the number of tokens sent each subsequent time, so displaying all the characters sent the first time takes a relatively long time, which ensures smooth display of the characters.
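The constraint above can be written as a short calculation. The sketch below is illustrative only: the function names and the `safety_factor` parameter are assumptions introduced here, and the sample numbers are made up. It bounds the display speed (characters per second) by the generation speed (tokens per second) multiplied by the characters-per-token ratio of the model.

```python
def max_display_speed(generation_speed: float, chars_per_token: float) -> float:
    """Upper bound on display speed: tokens/s multiplied by characters per token."""
    return generation_speed * chars_per_token

def choose_display_speed(generation_speed: float, chars_per_token: float,
                         safety_factor: float = 0.8) -> float:
    """Pick a display speed at or below the bound.

    safety_factor is a hypothetical knob in (0, 1]; the source only requires
    that the display speed not exceed the bound.
    """
    if not 0.0 < safety_factor <= 1.0:
        raise ValueError("safety_factor must be in (0, 1]")
    return safety_factor * max_display_speed(generation_speed, chars_per_token)

# Example with made-up numbers: 20 tokens/s and 1.5 characters per token.
bound = max_display_speed(20.0, 1.5)
speed = choose_display_speed(20.0, 1.5)
print(bound, speed)
```

Displaying no faster than characters are produced means the device never runs out of buffered text on average.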
In one possible implementation, the cloud server is further configured to obtain a first generation speed in the process of generating the first target text, where the first generation speed indicates the number of tokens generated per unit time. The cloud server is further configured to determine the first display speed based on the first generation speed and a first constraint relation, where the first constraint relation is that the first display speed is equal to the product of the first generation speed and a first constant, the first constant is smaller than a first proportion, and the first proportion is the ratio of the number of characters to the number of tokens for the language model. The cloud server is further configured to set a word-count interval, which indicates the word-count range of a single text segment of the first target text. In the process of generating the first target text, the cloud server is further configured to determine the first text segment based on the word-count interval and to determine, before sending the first text segment, that the first text segment is risk-free. The cloud server is further configured to determine the second text segment based on the word-count interval and to determine, before sending the second text segment, that the second text segment is risk-free.
Thus, by setting the word-count range of a text segment and the display speed of the text, smooth display between two adjacent text segments can be ensured.
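One way to cut a token stream into segments whose length stays within a word-count interval is sketched below. The splitting rule, the function name `segment_tokens` and the parameters `min_chars` and `max_chars` are assumptions introduced for illustration; the source does not specify how segment boundaries are chosen within the interval.

```python
from typing import Iterable, Iterator

def segment_tokens(tokens: Iterable[str], min_chars: int, max_chars: int) -> Iterator[str]:
    """Group generated tokens into text segments of min_chars..max_chars characters.

    Hypothetical rule: emit a segment as soon as the buffer reaches min_chars,
    and split the buffer if a token pushes it past max_chars.
    """
    if not 0 < min_chars <= max_chars:
        raise ValueError("require 0 < min_chars <= max_chars")
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit full-size pieces while the buffer exceeds the upper bound.
        while len(buffer) > max_chars:
            yield buffer[:max_chars]
            buffer = buffer[max_chars:]
        if len(buffer) >= min_chars:
            yield buffer
            buffer = ""
    if buffer:
        # The last segment of a text may be shorter than min_chars.
        yield buffer

tokens = ["ab", "cd", "e", "fghij", "k"]
print(list(segment_tokens(tokens, min_chars=3, max_chars=4)))
```

Each emitted segment would then pass through the risk check before being sent.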
In one possible implementation, the cloud server is further configured to determine, during the generation of the first target text, a third text segment based on the word count interval, the third text segment being subsequent to the second text segment in the first target text, the cloud server is further configured to determine that the third text segment is at risk, the cloud server is further configured to send the third text segment and first risk information to the electronic device, the first risk information being configured to indicate that the first target text is at risk, and the electronic device is further configured to withdraw all content that has been displayed in the first target text after receiving the first risk information.
In this way, the electronic device may withdraw the displayed content in case of detecting that the target text is at risk.
In one possible implementation, the cloud server is further configured to send second risk information to the electronic device when the first text segment is sent, the second risk information being used to indicate that the first target text is temporarily risk-free, and the cloud server is further configured to send the second risk information to the electronic device when the second text segment is sent.
In this way, the cloud server can also send risk information to the electronic device under the condition that the target text is risk-free, wherein the risk information is used for informing the electronic device that the current text segment is risk-free.
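The handling of risk information on the electronic device can be sketched as follows. The classes `SegmentMessage` and `Display`, the boolean `at_risk` flag, and the choice to ignore segments that arrive after a withdrawal are all assumptions made for this illustration; the source only states that displayed content is withdrawn when the first risk information is received.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SegmentMessage:
    text: str
    at_risk: bool  # True models the first risk information, False the second

@dataclass
class Display:
    shown: List[str] = field(default_factory=list)
    withdrawn: bool = False

    @property
    def text(self) -> str:
        return "".join(self.shown)

    def on_message(self, msg: SegmentMessage) -> None:
        if self.withdrawn:
            # Assumption: once the text has been withdrawn, later segments are ignored.
            return
        if msg.at_risk:
            # Withdraw everything already displayed for this target text.
            self.shown.clear()
            self.withdrawn = True
            return
        self.shown.append(msg.text)

display = Display()
display.on_message(SegmentMessage("Hello ", at_risk=False))
display.on_message(SegmentMessage("world", at_risk=False))
print(repr(display.text))
display.on_message(SegmentMessage("flagged", at_risk=True))
print(repr(display.text), display.withdrawn)
```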
In one possible implementation, the electronic device is further configured to establish a long link with the cloud server before sending the first message to the cloud server.
Thus, the first message and the first target text and other data can be transmitted through the long link.
In one possible implementation, the electronic device is further configured to receive a first input from a user prior to establishing the long link with the cloud server, and the electronic device is further configured to determine the first message based on the first input.
The types of the first input can include voice input, text input and event input. The type of the first message may include a voice message, a text message, or an event message.
Determining the first message based on the first input specifically includes determining the type of the first message based on the type of the first input, and determining the content of the first message based on the content of the first input. The content of the first message may be the same as the content of the first input.
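As a minimal sketch of this mapping, the snippet below derives the message type from the input type and copies the content unchanged. The type strings, the `FirstMessage` class and the function name are hypothetical and chosen only to mirror the three input types named above.

```python
from dataclasses import dataclass

# Hypothetical labels for the three input and message types named in the text.
INPUT_TO_MESSAGE_TYPE = {
    "voice": "voice_message",
    "text": "text_message",
    "event": "event_message",
}

@dataclass(frozen=True)
class FirstMessage:
    type: str
    content: str

def build_first_message(input_type: str, input_content: str) -> FirstMessage:
    """Determine the message type from the input type; reuse the input content."""
    if input_type not in INPUT_TO_MESSAGE_TYPE:
        raise ValueError(f"unsupported input type: {input_type}")
    return FirstMessage(INPUT_TO_MESSAGE_TYPE[input_type], input_content)

print(build_first_message("text", "What is the weather today?"))
```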
In one possible implementation, the electronic device is further configured to disconnect the long link with the cloud server when detecting that a first disconnection condition is satisfied, where the first disconnection condition includes any one or more of the following: a network error occurs, or a first operation of the user is received, the first operation being used to trigger the electronic device to stop using the language model service.
In this way, the electronic device may disconnect the long link with the cloud server if it is detected that the first disconnection condition is satisfied.
In one possible implementation, the cloud server is further configured to disconnect the long link with the electronic device when detecting that a second disconnection condition is satisfied, where the second disconnection condition includes any one or more of the following: a network error occurs, the first target text has been completely sent, or no message from the electronic device is received within a first duration after the first target text has been completely sent.
In this way, the cloud server may disconnect the long link with the electronic device if the second disconnection condition is detected to be satisfied.
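The two sets of disconnection conditions can be expressed as simple predicates. This is a sketch under stated assumptions: the function and parameter names are invented here, and the `close_on_completion` flag is a hypothetical way to choose between closing as soon as the text is fully sent and waiting for the idle period to elapse.

```python
def client_should_disconnect(network_error: bool, user_stopped_service: bool) -> bool:
    """First disconnection condition, checked on the electronic device."""
    return network_error or user_stopped_service

def server_should_disconnect(network_error: bool,
                             text_fully_sent: bool,
                             idle_seconds: float,
                             first_duration: float,
                             close_on_completion: bool = False) -> bool:
    """Second disconnection condition, checked on the cloud server.

    idle_seconds: time since the target text was fully sent without any new
    message from the device. close_on_completion is a hypothetical switch.
    """
    if network_error:
        return True
    if not text_fully_sent:
        return False
    if close_on_completion:
        return True
    return idle_seconds >= first_duration

print(server_should_disconnect(False, True, idle_seconds=5.0, first_duration=30.0))
print(server_should_disconnect(False, True, idle_seconds=45.0, first_duration=30.0))
```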
In one possible implementation, the electronic device is further configured to receive a second input from the user after displaying the first target text at the first display speed, and to determine a second message based on the second input. The electronic device is further configured to send the second message to the cloud server. The cloud server is further configured to generate, through the language model, a second target text corresponding to the second message. The cloud server is further configured to send a second display speed and a fourth text segment to the electronic device after the fourth text segment in the second target text is generated, where the second display speed indicates the number of characters of the second target text displayed per unit time. The electronic device is further configured to display the fourth text segment at the second display speed. The cloud server is further configured to generate a fifth text segment in the second target text before the electronic device finishes displaying the fourth text segment at the second display speed, where the fifth text segment is located after the fourth text segment in the second target text, and to send the fifth text segment to the electronic device after the fifth text segment is generated. The electronic device is further configured to receive the fifth text segment before it finishes displaying the fourth text segment at the second display speed, and to display the fifth text segment at the second display speed after it finishes displaying the fourth text segment.
In this way, the second message and the second target text may also be transmitted over the long link.
In one possible implementation, the electronic device being further configured to send the second message to the cloud server specifically includes: the electronic device is further configured to send the second message to the cloud server when the time interval between the time at which display of the first target text is completed and the time at which the second input is received is less than a second duration.
Reuse of the long link may be subject to a time threshold. Within a preset time threshold, the second message may reuse the long link used for transmitting the first message. Therefore, if the long link is not used for a long time, it can be disconnected automatically, which reduces energy consumption.
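The time-threshold check for reusing the long link can be sketched as a single comparison. The function name, the parameter names and the `link_open` flag are assumptions for illustration; timestamps are plain seconds.

```python
def can_reuse_long_link(display_done_at: float,
                        second_input_at: float,
                        second_duration: float,
                        link_open: bool = True) -> bool:
    """Reuse the existing long link only if the second input arrives soon enough.

    display_done_at: time at which the first target text finished displaying.
    second_input_at: time at which the second input was received.
    second_duration: threshold (same unit) below which the link is reused.
    """
    return link_open and (second_input_at - display_done_at) < second_duration

# Made-up timestamps in seconds.
print(can_reuse_long_link(100.0, 112.0, second_duration=30.0))
print(can_reuse_long_link(100.0, 140.0, second_duration=30.0))
```

When the check fails, the device would establish a new long link before sending the second message.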
In a second aspect, the application provides a text display method applied to a cloud server. The method comprises: receiving a first message sent by an electronic device; generating, through a language model, a first target text corresponding to the first message; sending a first display speed and a first text segment to the electronic device after the first text segment in the first target text is generated, where the first display speed indicates the number of characters displayed per unit time; generating a second text segment in the first target text, where the second text segment is located after the first text segment in the first target text; and sending the second text segment to the electronic device before the electronic device finishes displaying the first text segment at the first display speed.
Therefore, the generated characters can be displayed while the target text is still being generated, which reduces the waiting time of the user, avoids stuttering during display, and ensures smooth display of the characters.
In one possible implementation, the first text segment comprises one or more tokens, and the second text segment comprises one token. The method further comprises: obtaining a first generation speed in the process of generating the first target text, where the first generation speed indicates the number of tokens generated per unit time; and determining the first display speed based on the first generation speed and a first constraint relation, where the first constraint relation is that the first display speed is less than or equal to the product of the first generation speed and a first proportion, and the first proportion is the ratio of the number of characters to the number of tokens for the language model.
Therefore, the number of tokens sent the first time is greater than or equal to the number of tokens sent each subsequent time, so displaying all the characters sent the first time takes a relatively long time, which ensures smooth display of the characters.
In one possible implementation, the method further comprises the steps of acquiring a first generation speed used for indicating the number of tokens generated in unit time in the process of generating the first target text, determining a first display speed based on the first generation speed and a first constraint relation, wherein the first constraint relation is that the first display speed is equal to the product of the first generation speed and a first constant, the first constant is smaller than a first proportion, the first proportion is the proportion of the number of characters corresponding to the language model to the number of tokens, setting a word number interval used for indicating the word number range of a single text segment of the first target text, determining the first text segment based on the word number interval in the process of generating the first target text, determining that the first text segment is free of risk before the first text segment is transmitted, determining the second text segment based on the word number interval in the process of generating the first target text, and determining that the word number of the second text segment belongs to the word number interval before the second text segment is transmitted.
Thus, by setting the word-count range of a text segment and the display speed of the text, smooth display between two adjacent text segments can be ensured.
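One way to see why bounding the segment length helps is the following back-of-the-envelope argument. It is a sketch only: the symbols $v_g$ (generation speed in tokens per unit time), $r$ (characters per token), $c$ (the first constant), $n_k$ (characters in segment $k$) and the bounds $n_{\min}$, $n_{\max}$ are introduced here for illustration, and transmission and risk-check delays, which the source does not quantify, are ignored.

```latex
% Display speed chosen below the character generation rate:
v_d = c\, v_g, \qquad c < r .

% Time to display segment k, and approximate time to generate segment k+1:
T^{\text{disp}}_k = \frac{n_k}{c\, v_g}, \qquad
T^{\text{gen}}_{k+1} \approx \frac{n_{k+1}}{r\, v_g} .

% Segment k+1 is ready before segment k has been fully displayed if
T^{\text{gen}}_{k+1} \le T^{\text{disp}}_k
\;\Longleftrightarrow\;
\frac{n_{k+1}}{n_k} \le \frac{r}{c} .

% With every segment length in [n_min, n_max], a sufficient condition is
\frac{n_{\max}}{n_{\min}} \le \frac{r}{c} .
```

Under these assumptions, a narrower word-count interval or a smaller constant $c$ leaves more slack between adjacent segments.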
In one possible implementation, the method further comprises determining a third text segment based on the word count interval in the process of generating the first target text, wherein the third text segment is behind the second text segment in the first target text, determining that the third text segment is at risk, and sending the third text segment and first risk information to the electronic device, wherein the first risk information is used for indicating that the first target text is at risk.
In this way, the electronic device may withdraw the displayed content in case of detecting that the target text is at risk.
In one possible implementation, the method further comprises sending second risk information to the electronic device when the first text segment is sent, the second risk information being used for indicating that the first target text is temporarily risk-free, and sending second risk information to the electronic device when the second text segment is sent.
In this way, the cloud server can also send risk information to the electronic device under the condition that the target text is risk-free, wherein the risk information is used for informing the electronic device that the current text segment is risk-free.
In one possible implementation, the method further includes establishing a long link with the electronic device prior to receiving the first message sent by the electronic device.
Thus, the first message and the first target text and other data can be transmitted through the long link.
In one possible implementation, the method further comprises disconnecting the long link with the electronic device when detecting that a second disconnection condition is satisfied, where the second disconnection condition includes any one or more of the following: a network error occurs, the first target text has been completely sent, or no message from the electronic device is received within a first duration after the first target text has been completely sent.
In this way, the cloud server may disconnect the long link with the electronic device if the second disconnection condition is detected to be satisfied.
In one possible implementation, the method further comprises the steps of receiving a second message sent by the electronic device after the first target text is sent, generating a second target text corresponding to the second message through the language model, sending a second display speed and a fourth text segment to the electronic device after a fourth text segment in the second target text is generated, generating a fifth text segment in the second target text before the fourth text segment is displayed by the electronic device at the second display speed, the fifth text segment being after the fourth text segment in the second target text, and sending the fifth text segment to the electronic device after the fifth text segment of the second target text is generated.
In this way, the second message and the second target text may also be transmitted over the long link.
In a third aspect, the application provides a text display method applied to an electronic device. The method comprises: sending a first message to a cloud server; receiving a first display speed and a first text segment in a first target text sent by the cloud server, where the first display speed indicates the number of characters of the first target text displayed per unit time, and the first target text is text generated by the cloud server through a language model based on the first message; displaying the first text segment at the first display speed; receiving a second text segment in the first target text before the electronic device finishes displaying the first text segment at the first display speed, where the second text segment is located after the first text segment in the first target text; and displaying the second text segment at the first display speed after the first text segment has been displayed.
Therefore, the generated characters can be displayed while the target text is still being generated, which reduces the waiting time of the user, avoids stuttering during display, and ensures smooth display of the characters.
In one possible implementation, the first text segment includes one or more tokens, and the second text segment includes one token.
Therefore, the number of tokens sent the first time is greater than or equal to the number of tokens sent each subsequent time, so displaying all the characters sent the first time takes a relatively long time, which ensures smooth display of the characters.
In one possible implementation, the method further comprises receiving a third text segment and first risk information sent by the cloud server, wherein the first risk information is used for indicating that the first target text is at risk, and withdrawing all content displayed in the first target text after receiving the first risk information.
In this way, the electronic device may withdraw the displayed content in case of detecting that the target text is at risk.
In a possible implementation manner, receiving the first display speed and the first text segment in the first target text sent by the cloud server specifically comprises receiving the first display speed, the first text segment in the first target text and second risk information sent by the cloud server, wherein the second risk information is used for indicating that the first target text is temporarily free of risk, and receiving the second text segment in the first target text specifically comprises receiving the second text segment in the first target text and the second risk information.
In this way, the electronic device can determine whether the current text segment is at risk or not based on the risk information sent by the cloud server.
In one possible implementation, the method further includes establishing a long link with the cloud server before sending the first message to the cloud server.
Thus, the first message and the first target text and other data can be transmitted through the long link.
In one possible implementation, the method further includes receiving a first input by the user prior to establishing the long link with the cloud server, and determining the first message based on the first input in response to the first input.
The types of the first input can include voice input, text input and event input. The type of the first message may include a voice message, a text message, or an event message.
Determining the first message based on the first input specifically includes determining the type of the first message based on the type of the first input, and determining the content of the first message based on the content of the first input. The content of the first message may be the same as the content of the first input.
In one possible implementation, the method further comprises disconnecting the long link with the cloud server when detecting that a first disconnection condition is satisfied, where the first disconnection condition includes any one or more of the following: a network error occurs, or a first operation of the user is received, the first operation being used to trigger the electronic device to stop using the language model service.
In this way, the electronic device may disconnect the long link with the cloud server if it is detected that the first disconnection condition is satisfied.
In one possible implementation, the method further includes receiving a second input from the user after the first target text has been displayed at the first display speed, determining a second message based on the second input in response to the second input, sending the second message to the cloud server, receiving the second display speed sent by the cloud server and a fourth text segment in the second target text, the second display speed indicating a number of characters in the second target text displayed per unit time, the second target text being text generated by the cloud server based on the second message by the language model, displaying the fourth text segment at the second display speed, receiving a fifth text segment in the second target text before the fourth text segment has been displayed by the electronic device at the second display speed, the fifth text segment being after the fourth text segment in the second target text, and displaying the fifth text segment at the second display speed after the fourth text segment has been displayed at the second display speed.
In this way, the second message and the second target text may also be transmitted over the long link.
In one possible implementation, the sending of the second message to the cloud server specifically includes sending the second message to the cloud server when a time interval between a time when the first target text is displayed and a time when the second input is received is less than a second time period.
Reuse of the long link may be subject to a time threshold. Within a preset time threshold, the second message may reuse the long link used for transmitting the first message. Therefore, if the long link is not used for a long time, it can be disconnected automatically, which reduces energy consumption.
In a fourth aspect, the present application provides a server comprising one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories being configured to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the server to perform the text display method in any of the possible implementations of the second aspect described above.
In a fifth aspect, the present application provides an electronic device comprising one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories being operable to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the text display method in any of the possible implementations of the third aspect described above.
In a sixth aspect, an embodiment of the present application provides a computer storage medium, including computer instructions, which when executed on a server, cause the server to perform the text display method in any one of the possible implementation manners of the second aspect.
In a seventh aspect, an embodiment of the present application provides a computer program product which, when run on an electronic device, causes the electronic device to perform the text display method in any of the possible implementations of the third aspect.
For the advantageous effects of the fourth aspect to the seventh aspect, refer to the advantageous effects described for the first aspect to the third aspect above.
Drawings
FIG. 1A is a schematic diagram of a token in text A according to an embodiment of the present application;
FIG. 1B is a schematic diagram of a system architecture of a text display system 10 according to an embodiment of the present application;
FIG. 1C is a schematic diagram of a hardware structure of an electronic device 100 according to an embodiment of the present application;
FIG. 1D is a schematic diagram of a hardware structure of a cloud server 200 according to an embodiment of the present application;
FIG. 2A is a schematic diagram showing a time relationship between a time of generating a token1, a time of transmitting the token1, and a display time according to an embodiment of the present application;
FIG. 2B is a schematic diagram of a time relationship between a time of generating token2, a time of transmitting token2, and a time of displaying token2 according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a text display method according to an embodiment of the present application;
FIG. 4A is a schematic diagram of a time relationship between a generation time, a transmission time and a display time of a text segment 1 according to an embodiment of the present application;
FIG. 4B is a schematic diagram of a time relationship between a generation time, a transmission time and a display time of a text segment 2 according to an embodiment of the present application;
FIG. 5 is a flowchart of another text display method according to an embodiment of the present application;
FIGS. 6A-6F are schematic diagrams illustrating an interface of a text display method according to an embodiment of the present application;
FIG. 7A is a schematic diagram of a functional module of a text display system 10 according to an embodiment of the present application;
FIG. 7B is a schematic diagram of two links between the modules of the electronic device 100 and the cloud server 200 according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a software architecture of an electronic device 100 according to an embodiment of the present application;
FIG. 9 is a flowchart of a text display method according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application are described clearly and thoroughly below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. The term "and/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent three cases: A alone, both A and B, and B alone. In addition, in the description of the embodiments of the present application, "a plurality of" means two or more.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, unless otherwise indicated, "a plurality of" means two or more.
The term "user interface (UI)" in the following embodiments of the present application refers to a media interface for interaction and information exchange between an application program or an operating system and a user, which converts information between its internal form and a form acceptable to the user. A user interface is implemented as source code written in a specific computer language such as Java or extensible markup language (XML); the interface source code is parsed and rendered on the electronic device and finally presented as content that the user can recognize. A commonly used presentation form of the user interface is the graphical user interface (GUI), which refers to a user interface related to computer operations that is displayed graphically. It may consist of visual interface elements such as text, icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets displayed on the display of the electronic device.
Some terms of art to which embodiments of the present application relate are described below.
Natural language understanding (NLU): NLU is a discipline that combines linguistics, logic, psychology, and computer science. NLU can obtain a semantic representation of natural language through analysis of grammar, semantics, and speech.
Large language model (LLM): an LLM is a neural network model trained on a large amount of text data. It has strong natural language processing and code generation capabilities and is widely applied in fields such as natural language processing, machine translation, dialogue systems, and code generation. An LLM can generate a piece of text after learning from a large amount of text data (e.g., web pages, books, papers, etc.). An LLM is typically composed of millions of parameters and has very high representation and generation capabilities. Given a question or task description as input, the LLM can generate code or text corresponding to the input.
Token: a token is a basic unit of natural language and may be a word, a phrase, a character, or the like. After receiving the input, the LLM may generate the target text token by token.
For example, FIG. 1A shows a schematic diagram of the tokens in a text A according to an embodiment of the present application.
As shown in FIG. 1A, text A may be the sentence "In our daily lives, we always face a variety of challenges." Based on grammar and semantics, text A may include a plurality of tokens: "in", "we", "daily", "living", "medium", "", "we", "general", "will", "face", "various", "look", "challenge" and ".". The tokens are listed in the same order in which they appear in text A. Because the LLM generates text with the token as the basic unit, in the process of generating text A the LLM may generate the tokens one by one, in the order in which they appear in text A, until all the tokens in text A have been generated.
It should be understood that the embodiment shown in FIG. 1A is merely an example. In embodiments of the present application, different texts may include more, fewer, or different tokens, and the present application is not limited thereto.
Long link: a long link, also referred to as a persistent link, is a link that is not disconnected immediately after data transmission is complete. Conversely, a link that is disconnected immediately after data transmission is complete is a short link. In an embodiment of the present application, the long link may include a WebSocket connection or the like.
The following describes a system architecture of a text display system 10 according to an embodiment of the present application.
As shown in fig. 1B, the text display system 10 may include an electronic device 100 and a cloud server 200.
The electronic device 100 may receive and respond to user input (e.g., voice input, text input, etc.) by determining message 1 (e.g., a voice message, a text message, or an event message, etc.). After obtaining message 1, the electronic device 100 may establish a long link with the cloud server 200 and send message 1 to the cloud server 200 over the long link. The electronic device 100 may receive the display time of a single character sent by the cloud server 200, receive the target text sent by the cloud server 200 in multiple parts, and display the target text based on the display time of a single character (i.e., display one character per time interval equal to the display time of a single character).
The cloud server 200 may store one or more language models, such as an LLM. After receiving message 1, the cloud server 200 may invoke the LLM to generate a target text corresponding to message 1 and send the target text to the electronic device 100 over the long link. It should be noted that, while generating the target text, the cloud server 200 may continuously send the tokens that have already been generated to the electronic device 100 until all the tokens of the target text have been sent, which reduces the waiting time of the user. In addition, while generating the target text, the cloud server 200 may measure the generation time of a single token (or obtain a pre-stored generation time of a single token corresponding to the LLM), determine the display time of a single character based on the generation time of a single token, and send the display time of a single character to the electronic device 100. This ensures that the electronic device 100 can display the target text smoothly.
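The paced display described above can be sketched as a small simulation. This is an illustrative sketch only: the function `paced_display`, the representation of arrivals as `(arrival_time, token_text)` pairs, and the use of seconds are assumptions made here, and no real network link is involved.

```python
from typing import Iterable, List, Optional, Tuple


def paced_display(
    arrivals: Iterable[Tuple[float, str]], t_display: float
) -> List[Tuple[float, str]]:
    """Simulate showing one character every t_display seconds.

    arrivals: (arrival_time, token_text) pairs in generation order
              (a stand-in for tokens received over the long link).
    Returns (display_time, character) pairs.
    """
    shown: List[Tuple[float, str]] = []
    next_slot: Optional[float] = None  # earliest time the next character may appear
    for arrival, token in arrivals:
        for ch in token:
            # A character cannot appear before its token has arrived, nor
            # before the previous character's display interval has elapsed.
            t = arrival if next_slot is None else max(arrival, next_slot)
            shown.append((t, ch))
            next_slot = t + t_display
    return shown
```

If a token arrives after the previous token has finished displaying, the gap between consecutive display times exceeds `t_display`, which corresponds to the visible stall that the single-character display time is chosen to avoid.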
It should be understood that the text display system 10 shown in fig. 1B is only an example, and in embodiments of the present application, the text display system 10 may further include more electronic devices or include one or more servers, which are not limited herein.
Fig. 1C shows a hardware configuration diagram of the electronic device 100.
The electronic device 100 may be a cell phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular telephone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, and/or a smart city device. The embodiments of the present application do not particularly limit the specific type of the electronic device.
The electronic device 100 may include a processor 110, an internal memory 121, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a sensor module 180, a display 194, and the like. Optionally, the electronic device 100 may further include any one or more of an external memory interface 120, keys 190, a motor 191, an indicator 192, and the like.
The sensor module 180 may include one or more sensors, such as a touch sensor 180K. In some embodiments, the sensor module 180 may also include any one or more of a pressure sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, an ambient light sensor, a bone conduction sensor, etc.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors.
The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can fetch them directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 connects the battery 142 and the charge management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the display 194, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example, the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (e.g., a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, demodulates and filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with a network and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, among others. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The internal memory 121 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM). The random access memory may be read from and written to directly by the processor 110, may be used to store executable programs (e.g., machine instructions) of an operating system or other running programs, and may also be used to store data of users and applications. The non-volatile memory may store executable programs and data of users and applications, which may be loaded into the random access memory in advance for the processor 110 to read and write directly.
The external memory interface 120 may be used to connect external non-volatile memory to enable expansion of the memory capabilities of the electronic device 100. The external nonvolatile memory communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and video are stored in an external nonvolatile memory.
The audio module 170 may include any one or more of a speaker, a receiver, a microphone, and an earphone interface, etc. The electronic device 100 may implement audio functions through an audio module 170, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert audio electrical signals into sound signals. A user may listen to music or take a hands-free call through the speaker 170A of the electronic device 100.
The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 is used to answer a telephone call or play a voice message, the voice can be heard by placing the receiver close to the human ear.
The microphone, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending voice information, a user can speak with the mouth close to the microphone to input a sound signal to the microphone. The electronic device 100 may be provided with at least one microphone. In other embodiments, the electronic device 100 may be provided with two microphones, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones to collect sound signals, reduce noise, identify sound sources, implement directional recording, etc.
The touch sensor 180K is also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touch screen. The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a location different from that of the display 194.
The keys 190 include a power key, a volume key, etc. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerts as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (such as time reminders, receiving information, alarm clocks, games, etc.) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also be customized.
The indicator 192 may be an indicator light and may be used to indicate a state of charge, a change in charge, a message, a missed call, a notification, etc.
Fig. 1D shows a schematic hardware structure of a cloud server 200 according to an embodiment of the present application.
As shown in FIG. 1D, the cloud server 200 may include one or more network device processors 201, a memory 202, a communication interface 203, a transmitter 205, a receiver 206, a coupler 207, and an antenna 208. These components may be connected by a bus 204 or in other ways; FIG. 1D takes connection via a bus as an example. Wherein:
The communication interface 203 may be used for the cloud server 200 to communicate with other communication devices, such as electronic devices used by users. In particular, the communication interface 203 may be a 3G communication interface, a long term evolution (LTE) (4G) communication interface, a 5G communication interface, a WLAN communication interface, a WAN communication interface, and the like. The cloud server 200 is not limited to a wireless communication interface and may also be configured with a wired communication interface 203 to support wired communication.
In some embodiments of the present application, the transmitter 205 and the receiver 206 may be considered as one wireless modem. The transmitter 205 may be used to transmit signals output by the network device processor 201. The receiver 206 may be used to receive signals. In cloud server 200, the number of transmitters 205 and receivers 206 may each be one or more. The antenna 208 may be used to convert electromagnetic energy in the transmission line into electromagnetic waves in free space or to convert electromagnetic waves in free space into electromagnetic energy in the transmission line. Coupler 207 may be used to split the mobile communication signal into multiple paths that are distributed to multiple receivers 206. It is appreciated that the antenna 208 of the network device may be implemented as a large-scale antenna array.
The memory 202 is coupled to the network device processor 201 for storing various software programs and/or sets of instructions. In particular, memory 202 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 202 may store an operating system (hereinafter referred to as the system), such as an embedded operating system like uCOS, VxWorks, or RTLinux. The memory 202 may also store network communication programs that may be used to communicate with other communication devices.
In an embodiment of the application, the network device processor 201 may be used to read and execute computer readable instructions. In particular, the network device processor 201 may be configured to invoke a program stored in the memory 202, for example, a program for implementing a text display method provided in one or more embodiments of the present application, and execute instructions included in the program.
It should be noted that, the cloud server 200 shown in fig. 1D is only one implementation manner of the embodiment of the present application, and in practical application, the cloud server 200 may further include more or fewer components, which is not limited herein.
An embodiment of the present application provides a text display method. The electronic device 100 may receive and respond to a user's input to determine the type and content of message 1, where message 1 may be a voice message, a text message, or an event message. The electronic device 100 establishes a long link with the cloud server 200 and sends message 1 to the cloud server 200 through the long link. Upon receiving message 1, the cloud server 200 may generate a target text corresponding to message 1 through a language model. While generating the target text, the cloud server 200 may send the tokens that have already been generated to the electronic device 100 one after another in their generation order. Moreover, the cloud server 200 may determine the display time of a single character based on the generation time of a single token and send the display time of a single character to the electronic device 100. The electronic device 100 may display the target text based on the display time of a single character (i.e., display one character per time interval equal to the display time of a single character).
Thus, the waiting time of the user can be reduced, the fluency of text output can be ensured, output jamming is avoided, and better use experience is provided for the user.
The relationship between the generation times and the display times of adjacent tokens in the target text provided by the embodiment of the application is described below.
FIG. 2A is a schematic diagram showing the generation, transmission and display time of token1 in a target text according to an embodiment of the present application.
As shown in FIG. 2A, the one-dimensional coordinate axis represents time. From time $T_{a0}$ to time $T_{a1}$, the cloud server 200 may generate token1. From time $T_{a1}$ to time $T_{a2}$, the cloud server 200 may send token1 to the electronic device 100. At time $T_{a2}$, the electronic device 100 may begin displaying the characters in token1 at a constant speed. At time $T_{a3}$, the electronic device 100 has displayed all the characters in token1. Since the display time $t_{display}$ of a single character is fixed, if the number of characters in token1 is $N_1$, the following relationship holds between $T_{a2}$ and $T_{a3}$:
$T_{a3} = T_{a2} + t_{display} \cdot N_1$    formula (1)
FIG. 2B is a schematic diagram showing the generation, transmission and display time of token2 in a target text according to an embodiment of the present application.
As shown in FIG. 2B, the one-dimensional coordinate axis represents time. Since token2 and token1 are two adjacent tokens in the target text and token1 is generated before token2, the cloud server 200 can start generating token2 at time $T_{a1}$. From time $T_{a1}$ to time $T_{b1}$, the cloud server 200 may generate token2. From time $T_{b1}$ to time $T_{b2}$, the cloud server 200 may send token2 to the electronic device 100. Time $T_{b2}$ is the time at which the electronic device 100 receives token2. From time $T_{b2}$ onward, the electronic device 100 can display all the characters in token2 at a constant speed.
In the process that the cloud server 200 starts generating a plurality of token of the target text, the speed of generating the token may be considered to be uniform, that is, the time taken by the cloud server 200 to generate a single token is t token. Thus, the following relationship is satisfied between T a1 and T b1:
t b1=Ta1+ttoken formula (2)
To prevent the display from stalling while the electronic device 100 displays the target text, the electronic device 100 needs to start displaying token2 immediately after it finishes displaying token1. The receiving time $T_{b2}$ of token2 must therefore be no later than the time $T_{a3}$ at which token1 has been fully displayed, that is:
$$T_{b2} \le T_{a3} \qquad \text{formula (3)}$$
In the case of stable network quality, the data sizes of individual tokens do not differ greatly, so the time $t_{trans}$ spent transmitting a single token can be regarded as the same for every token, and the following relations are satisfied between $T_{a1}$ and $T_{a2}$ and between $T_{b1}$ and $T_{b2}$:
$$T_{a2} = T_{a1} + t_{trans} \qquad \text{formula (4)}$$
$$T_{b2} = T_{b1} + t_{trans} \qquad \text{formula (5)}$$
The above formulas (1) to (5) are combined to obtain:
$$t_{display} \ge \frac{t_{token}}{N_1} = t_{word} \qquad \text{formula (6)}$$
In the above formula (6), $t_{word}$ represents the time it takes for the cloud server 200 to generate a single text. Therefore, under the condition of stable network quality, when the time for displaying a single text by the electronic device 100 is greater than or equal to the average time consumed by the cloud server 200 to generate a single text (i.e., the speed at which the electronic device 100 displays text is no greater than the speed at which the cloud server 200 generates text), the electronic device 100 can avoid stalling during the display process and ensure smooth display of the text.
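The condition above can be checked with a small timeline simulation. The following is a minimal sketch, assuming uniform token generation, a constant transmission time and the same number of characters in every token; the function names and the sample timings are illustrative and do not come from the application.

```python
def token_timeline(t_token, t_trans, t_display, chars_per_token, n_tokens):
    """Return (arrival, display_start, display_end) for each token, in ms."""
    rows = []
    prev_end = 0.0
    for i in range(n_tokens):
        generated = (i + 1) * t_token               # uniform generation, as in formula (2)
        arrival = generated + t_trans               # constant transmission time, formulas (4)/(5)
        start = max(arrival, prev_end)              # a token cannot be shown before it arrives
        end = start + t_display * chars_per_token   # constant display speed, formula (1)
        rows.append((arrival, start, end))
        prev_end = end
    return rows


def total_stall(rows):
    """Total time the display waits for a token after finishing the previous one."""
    return sum(max(0.0, rows[i][0] - rows[i - 1][2]) for i in range(1, len(rows)))


# 30 ms per token and 2 characters per token give 15 ms per generated character.
smooth = token_timeline(t_token=30, t_trans=50, t_display=15, chars_per_token=2, n_tokens=5)
jerky = token_timeline(t_token=30, t_trans=50, t_display=10, chars_per_token=2, n_tokens=5)
print(total_stall(smooth), total_stall(jerky))
```

With a display time per character equal to the generation time per character the stall is zero; displaying faster than the text is generated makes the display wait for every later token.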
If scale is defined as the ratio of the number of characters to the number of tokens in the same text, then:
$$scale = \frac{N_{word}}{N_{token}} \qquad \text{formula (7)}$$
In the above formula (7), $N_{word}$ is the number of characters in one text, $N_{token}$ is the number of tokens in the same text, and scale is the ratio of the number of characters to the number of tokens in that text. As can be seen from the embodiment shown in FIG. 1A, a token may include one or more characters, i.e., scale has a value greater than or equal to 1.
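As a hedged illustration of formula (7), the sketch below computes scale for a text that has already been split into tokens. The token split shown is hypothetical; a real language model's tokenizer may divide the same text differently.

```python
def scale_of(tokens):
    """Ratio of the number of characters to the number of tokens in one text."""
    n_word = sum(len(token) for token in tokens)   # characters in the text
    n_token = len(tokens)                          # tokens in the same text
    if n_token == 0:
        raise ValueError("the text must contain at least one token")
    return n_word / n_token


# Hypothetical split of an 8-character sentence into 5 tokens.
example_tokens = ["我们", "今天", "去", "公园", "。"]
print(scale_of(example_tokens))
```

Because every token holds at least one character, the result is never below 1.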
The following relationship may exist between the time $t_{token}$ that the cloud server 200 takes to generate a token and the average time $t_{word}$ that it takes to generate a single text:
$$t_{word} = \frac{t_{token}}{scale} \qquad \text{formula (8)}$$
In connection with formula (8), formula (6) above may be equivalently expressed as formula (9) below:
$$t_{display} \ge \frac{t_{token}}{scale} \qquad \text{formula (9)}$$
In formula (9) above, $t_{display}$ on the left of the inequality represents the time the electronic device 100 takes to display a single text, and $t_{token}/scale$ on the right of the inequality represents the time the cloud server 200 takes to generate a single text. As can be seen from formula (9), when the time for displaying a single text by the electronic device 100 is greater than or equal to the ratio of the time consumed by the cloud server 200 to generate a single token to scale, the electronic device 100 can avoid stalling during the display process and ensure smooth display of the text.
It can be understood that, in the embodiment of the present application, a certain time interval may exist between the time a token is generated and the time it is sent. In this case, as long as that time interval is fixed, the electronic device 100 can still avoid stalling during the display process and ensure smooth display of the text provided that the above formula (9) is satisfied.
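The reason a fixed interval does not matter can be shown in a short worked derivation. Here $d$ is a symbol introduced only for this illustration; it denotes the assumed fixed delay between generating a token and starting to send it.

```latex
\begin{aligned}
T_{a2} &= T_{a1} + d + t_{trans}, \\
T_{b2} &= T_{b1} + d + t_{trans} = T_{a1} + t_{token} + d + t_{trans}, \\
T_{b2} - T_{a2} &= t_{token}.
\end{aligned}
```

Since $d$ cancels in the difference, the requirement $T_{b2} \le T_{a3} = T_{a2} + t_{display} \cdot N_1$ again reduces to $t_{token} \le t_{display} \cdot N_1$, the same constraint as in the case without the delay.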
The following describes a specific flow of a text display method provided by the embodiment of the application.
As shown in fig. 3, a specific flow of a text display method provided by an embodiment of the present application may include the following steps:
S301, the electronic device 100 receives and responds to the user's input to determine message 1, which may be a voice message, a text message, or an event message.
In the embodiment of the present application, the user input may be text input, for example, inputting the word "please write an article of 100 words" or the like. The user's input may also be a voice input, such as the voice "recite a poem", etc.
In other embodiments, the user input may also be an event input, which is used to set an event and a trigger condition for the event. The triggering condition of the event may be a predetermined time, a predetermined state (such as an awake state, a motion state, a music listening state) or the like, and the triggering condition of the event is not limited in the present application. The electronic device 100 may receive and respond to user input, storing the correspondence of events and trigger conditions for the events. For example, table 1 shows a correspondence between an event and a trigger condition of the event according to an embodiment of the present application.
TABLE 1

| Trigger condition | Event |
|---|---|
| 14:00 | Output an article related to spring |
| Listening to music | Output a 100-word music appreciation article |
As shown in table 1, the electronic device 100 may store an event and a trigger condition of the event based on an input of a user, for example, the trigger condition of the event "output an article related to spring" is time "14:00", the trigger condition of the event "output a music appreciation article of 100 words" is listening to music, etc.
It should be understood that the embodiment shown in table 1 is only an example, and in the embodiment of the present application, the electronic device 100 may further store more, fewer or different events and trigger conditions than the embodiment shown in table 1, and the present application is not limited herein.
The electronic device 100 may determine message 1 based on the user's input and the type of message 1 may include a voice message, a text message, and an event message. When the user's input is text input, message 1 is a text message. When the user's input is voice input, message 1 is a voice message. Message 1 is an event message when the user's input is used to set the trigger condition for the event.
If the user's input is text input or voice input, the electronic device 100 may determine message 1 based on the user's input when the input is received. At this time, the content of message 1 may include the content of the user's input (text input or voice input). For example, if the user's input is "write an article related to an animal", message 1 may include "write an article related to an animal".
If the user's input is an event input, the electronic device 100 may receive and respond to the user's input, set an event and a trigger condition for the event, and determine message 1 when it detects that the trigger condition for the event is satisfied. The content of message 1 may include the content of the event. For example, if the event is "output an article related to spring", message 1 may include "output an article related to spring".
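The way message 1 is determined for the three input types can be sketched as follows. This is only an illustration under stated assumptions: the `Message` class, the `EVENT_TABLE` dictionary and both function names are hypothetical, and the table entries simply mirror Table 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    kind: str      # "text", "voice" or "event"
    content: str


# Stored correspondence between trigger conditions and events (after Table 1).
EVENT_TABLE = {
    "14:00": "Output an article related to spring",
    "listening to music": "Output a 100-word music appreciation article",
}


def message_from_input(kind: str, content: str) -> Message:
    """Text or voice input becomes message 1 as soon as it is received."""
    if kind not in ("text", "voice"):
        raise ValueError("direct input must be text or voice")
    return Message(kind, content)


def message_from_trigger(condition: str) -> Optional[Message]:
    """An event message is produced only when a stored trigger condition is met."""
    event = EVENT_TABLE.get(condition)
    return Message("event", event) if event is not None else None


print(message_from_input("text", "write an article related to an animal"))
print(message_from_trigger("14:00"))
```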
S302, the electronic device 100 establishes a long link with the cloud server 200.
After determining message 1, electronic device 100 may establish a long link with cloud server 200. In some embodiments, the long link may be a websocket. The details of the long link between the electronic device 100 and the cloud server 200 may be described with reference to the following related description in the embodiment shown in fig. 7B, which is not described in detail herein.
In other embodiments, if a long link has been established between the electronic device 100 and the cloud server 200, step S302 may not be performed, and at this time, the electronic device 100 may communicate with the cloud server 200 through the established long link.
S303, the electronic device 100 sends the message 1 to the cloud server 200 through the long link.
S304, the cloud server 200 starts generating a target text based on the message 1 through the language model, the target text including a plurality of token.
One or more language models, such as LLM, etc., may be stored in the cloud server 200.
In some embodiments, after receiving message 1, cloud server 200 may use message 1 as an input to LLM, and generate the target text corresponding to message 1 through LLM.
In generating a plurality of token of the target text, the cloud server 200 may store the plurality of token in the generation order.
Illustratively, table 2 shows a generated token stored by the cloud server 200 according to an embodiment of the present application.
TABLE 2

| Order of generation | token |
|---|---|
| 1 | token1 |
| 2 | token2 |
| 3 | token3 |
As shown in table 2, the cloud server 200 may store a plurality of token generated and a corresponding generation order. For example, token1, token2, and token3, wherein the generation order of token1 is 1, the generation order of token2 is 2, the generation order of token3 is 3, and the like.
It should be understood that the embodiment shown in table 2 is merely an example, and in the embodiment of the present application, the cloud server 200 may store more, fewer or different token than the embodiment shown in table 2, which is not limited herein.
Also by way of example, table 3 shows another generated token stored by the cloud server 200 provided by an embodiment of the present application.
TABLE 3

| Order of generation | token | Transmission identifier |
|---|---|---|
| 1 | token1 | Not transmitted |
| 2 | token2 | Not transmitted |
| 3 | token3 | Not transmitted |
As shown in table 3, the cloud server 200 may store a plurality of generated tokens together with their generation order and transmission identifier, for example token1, token2 and token3, where the generation order of token1 is 1, the generation order of token2 is 2, the generation order of token3 is 3, and so on. The transmission identifier may be used to indicate whether the token has been sent by the cloud server 200. As can be seen from table 3, the transmission identifiers of token1, token2 and token3 are all "not transmitted", indicating that none of the three tokens has yet been transmitted by the cloud server 200.
It should be understood that the embodiment shown in table 3 is merely an example, and in the embodiment of the present application, the cloud server 200 may store more, fewer or different token or transmission identifier from the embodiment shown in table 3, which is not limited herein.
In other embodiments, in the process of generating multiple token of the target text, the cloud server 200 may also store the generated text in the target text and the generation sequence of the text according to the generation sequence, and the specific storage form may refer to the embodiments shown in table 2 or table 3 and will not be described herein.
S305, the cloud server 200 obtains time t token consumed by the language model to generate a token.
In some embodiments, the cloud server 200 may store the time it takes for the LLM to generate a token. It should be noted that, after the language model is trained, the speed of generating the token by the language model is determined, that is, after the LLM training is finished, the time t token spent by the LLM in generating a token may be measured, and t token may be stored in the cloud server 200.
In other embodiments, the cloud server 200 may also detect t token during the LLM generation of the target text. Specifically, the cloud server 200 may detect the number of tokens generated by the LLM within a fixed time interval (e.g., 100 milliseconds) and determine t token based on the fixed time interval and the number of tokens generated. Wherein the fixed time interval may be a preset time interval.
It should be noted that, since the LLM needs to perform the parsing process on the message 1 first when generating the target text based on the message 1, the parsing process needs a certain time, and the LLM needs a long time (for example, 250 ms) from receiving the message 1 to generating the first token of the target text, which may be called the first token delay. After the first token is generated, the LLM can continuously generate subsequent tokens at a constant speed. Thus, when calculating the time t token taken for the LLM to generate a token, the cloud server 200 may select a time period of a fixed interval within the time period after the LLM generates the first token, detect the number of tokens generated during the time period of the fixed interval, and determine the time t token taken for the LLM to generate a token. In this way, errors introduced by the first token delay can be avoided.
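A minimal sketch of the measurement described above, assuming the cloud server can record the completion time of each token. The function name, the timestamp list and the 100 ms window are illustrative; the window opens only once the first token exists, so the first token delay is excluded.

```python
def estimate_t_token(completion_times_ms, window_ms=100.0):
    """Estimate the time per token from tokens completed in a fixed window.

    completion_times_ms lists, in generation order, when each token was
    completed, measured from the moment message 1 was received.
    """
    if len(completion_times_ms) < 2:
        raise ValueError("need the first token and at least one later token")
    window_start = completion_times_ms[0]          # skip the first token delay
    window_end = window_start + window_ms
    count = sum(1 for t in completion_times_ms[1:] if window_start < t <= window_end)
    if count == 0:
        raise ValueError("no token was completed inside the window")
    return window_ms / count


# First token after 250 ms, then one token every 20 ms.
times = [250, 270, 290, 310, 330, 350, 370]
print(estimate_t_token(times))          # time per token inside the window
print(times[-1] / len(times))           # naive average, inflated by the first token delay
```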
S306, the cloud server 200 determines, based on t token, a display time period t display of the single text.
The cloud server 200 may store a scale (e.g., 1.25,1.20, etc.) that indicates a ratio of the number of words of text to the number of tokens in the same text. The preset scale value may be set by an administrator of the cloud server 200, or may be determined by the cloud server 200 according to a ratio of the number of characters of a large amount of text stored in the cloud server to the number of token, and the scale value and the determination method of the value are not limited in the present application. In some embodiments, different scale values may correspond to different language models. The cloud server 200 may store a correspondence between the language model and scale values, and determine the corresponding scale values based on the language model currently used.
The cloud server 200 may obtain $t_{display}$ from the above formula (9) based on $t_{token}$ and scale. $t_{display}$ determines how many characters the electronic device 100 displays per unit time while displaying the target text. In this way, the electronic device 100 can avoid stalling in the process of displaying the target text and ensure smooth display.
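As a hedged sketch of this step, the function below takes the smallest display time that the condition described above allows, namely the time per token divided by scale. The function name and the sample values are assumptions; an implementation could equally choose a larger value.

```python
def display_time_per_char(t_token_ms: float, scale: float) -> float:
    """Smallest display time per character that keeps up with generation."""
    if t_token_ms <= 0:
        raise ValueError("t_token_ms must be positive")
    if scale < 1:
        raise ValueError("scale is a characters-per-token ratio and cannot be below 1")
    return t_token_ms / scale


# 25 ms per token with 1.25 characters per token on average.
print(display_time_per_char(25, 1.25))
```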
S307, after determining t display, the cloud server 200 sends t display and the generated token to the electronic device 100 through the long link.
After determining the display time $t_{display}$ of a single text, the cloud server 200 may send the token or tokens generated so far to the electronic device 100. It will be appreciated that the first transmission carries one or more tokens, while each subsequent transmission carries one token. The time taken to display all the tokens of the first transmission is therefore greater than or equal to the time required to generate the next token, which ensures that the next token is received before the electronic device 100 finishes displaying the tokens of the first transmission.
In some embodiments, when transmitting the generated token to the electronic device 100, the cloud server 200 may also transmit the generation order of the token, for example, when transmitting the token1 and the token2 shown in table 2, the generation order 1 of the token1 and the generation order 2 of the token2 are transmitted to the electronic device 100 together.
In other embodiments, the cloud server 200 may sequentially send the one or more tokens to the electronic device 100 in the order of token generation. For example, if, at the time of determining $t_{display}$, the cloud server 200 has generated the tokens shown in table 2, the cloud server 200 may send token1 to the electronic device 100 first, then token2, and then token3.
In some embodiments, after the cloud server 200 transmits the token to the electronic device 100, the cloud server 200 may prune the transmitted token from the stored plurality of generated tokens. For example, if the generated token stored in the cloud server 200 is a plurality of the token shown in table 2 above when t display is determined, the token stored in the cloud server 200 may refer to the embodiment shown in table 4 below after t display, token1, token2, and token3 are transmitted from the cloud server 200 to the electronic device 100.
TABLE 4

| Order of generation | token |
|---|---|
| 4 | token4 |
As shown in table 4, the cloud server 200 may store newly generated token and corresponding generation order. For example, token4, token4 is generated in the order of 4. The token4 is a token newly generated after t display is determined by the cloud server 200. Moreover, table 4 does not include transmitted token1, token2, and token3.
It should be understood that the embodiment shown in table 4 merely illustrates that the cloud server 200 may delete transmitted tokens from the stored tokens. In an embodiment of the present application, the tokens stored by the cloud server 200 while transmitting tokens may include more, fewer or different tokens than those in table 4, and the present application is not limited herein.
In other embodiments, if the cloud server 200 stores the transmission identifier corresponding to the generated token, the cloud server 200 may change the transmission identifier of the token from "not transmitted" to "transmitted" after transmitting the token. For example, if the generated token stored in the cloud server 200 is a plurality of the token shown in table 3 above when t display is determined, the token stored in the cloud server 200 may refer to the embodiment shown in table 5 below after t display, token1, token2, and token3 are transmitted to the electronic device 100 by the cloud server 200.
TABLE 5

| Order of generation | token | Transmission identifier |
|---|---|---|
| 1 | token1 | Transmitted |
| 2 | token2 | Transmitted |
| 3 | token3 | Transmitted |
| 4 | token4 | Not transmitted |
As shown in table 5, the cloud server 200 may store a plurality of generated tokens together with their generation order and transmission identifier, for example token1, token2, token3 and token4, where token4 is a token newly generated after the cloud server 200 determines $t_{display}$. The generation order of token1 is 1, the generation order of token2 is 2, the generation order of token3 is 3, the generation order of token4 is 4, and so on. The transmission identifier may be used to indicate whether the token has been sent by the cloud server 200. As can be seen from table 5, the transmission identifiers of token1, token2 and token3 are all "transmitted", indicating that these three tokens have been transmitted by the cloud server 200. The transmission identifier of token4 is "not transmitted", indicating that token4 has not yet been transmitted by the cloud server 200.
It should be understood that the embodiment shown in table 5 is merely an example, and in embodiments of the present application, the cloud server 200 may store more, fewer or different token or transmission identifier than the embodiment shown in table 5, which is not limited herein.
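The bookkeeping with a transmission identifier can be sketched as a small in-memory store. The class and method names below are hypothetical; the sketch only shows one way to keep the generation order, mark tokens once they are sent, and pick up newly generated tokens afterwards.

```python
class TokenStore:
    """Generated tokens with their generation order and a sent flag."""

    def __init__(self):
        self._rows = []   # each row: [order, token, sent]

    def append(self, token):
        """Record a newly generated token as not yet transmitted."""
        self._rows.append([len(self._rows) + 1, token, False])

    def take_unsent(self):
        """Return unsent (order, token) pairs in generation order and mark them sent."""
        batch = []
        for row in self._rows:
            if not row[2]:
                batch.append((row[0], row[1]))
                row[2] = True
        return batch

    def snapshot(self):
        """Rows in the three-column shape of the table with a transmission identifier."""
        return [(o, t, "transmitted" if s else "not transmitted") for o, t, s in self._rows]


store = TokenStore()
for tok in ("token1", "token2", "token3"):
    store.append(tok)
first_batch = store.take_unsent()   # sent together with the display time
store.append("token4")              # generated after the first transmission
print(first_batch)
print(store.snapshot())
```

Keeping a flag rather than deleting sent rows preserves the full target text on the server; deleting sent rows, as in the variant without an identifier, trades that for lower memory use.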
S308, the electronic device 100 displays the received token based on t display.
After receiving $t_{display}$ and one or more tokens sent by the cloud server 200, the electronic device 100 may display the one or more tokens based on $t_{display}$. Specifically, the electronic device 100 may display all received tokens in the order in which they were received (or in the order in which they were generated), adding one more character each time the interval $t_{display}$ elapses.
For example, if the received tokens, taken in order, form the two-character word "我们" ("we"), the electronic device 100 may display the character "我" first and, after $t_{display}$ has elapsed, display "们" at the display position following "我".
The specific manner in which the electronic device 100 displays the target text based on t display may also be referred to in the relevant description of the embodiments shown in fig. 6A-6F, described below.
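The character-by-character display can be illustrated by computing when each character should appear. This is a sketch only: the function name is an assumption, times are offsets in milliseconds from the moment display starts, and the sample token is a hypothetical two-character input.

```python
def display_schedule(tokens, t_display_ms, start_ms=0.0):
    """Return (time offset in ms, character) pairs, one character per interval."""
    schedule = []
    t = start_ms
    for token in tokens:          # tokens are taken in received (or generated) order
        for ch in token:
            schedule.append((t, ch))
            t += t_display_ms     # the next character appears one interval later
    return schedule


print(display_schedule(["我们"], 20))
```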
S309, when detecting the newly generated token, the cloud server 200 transmits the token to the electronic device 100 through the long link.
In some embodiments, after t display is sent, if cloud server 200 detects a newly generated token, cloud server 200 may send the token to electronic device 100. That is, the new token may be transmitted to the electronic device 100 after being generated.
It should be noted that, steps S309 to S310 are repeatedly executable steps, and when the cloud server 200 detects a new token, steps S309 and S310 may be executed again.
S310, the electronic device 100 displays the newly received token based on t display after displaying all previously received tokens.
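The repeated receive-and-display steps can be sketched as a buffer that queues each newly received token behind the characters still waiting to be shown. The class name and the explicit `now_ms` arguments are assumptions made so that the example runs without a real clock or user interface.

```python
class TypewriterBuffer:
    """Schedules characters so that a new token starts after the earlier ones."""

    def __init__(self, t_display_ms):
        self.t_display_ms = t_display_ms
        self._chars = []          # (show_time_ms, character)
        self._next_free = 0.0     # earliest time the next character may appear

    def receive(self, token, now_ms):
        """Queue a newly received token behind everything already scheduled."""
        t = max(now_ms, self._next_free)
        for ch in token:
            self._chars.append((t, ch))
            t += self.t_display_ms
        self._next_free = t

    def visible_text(self, now_ms):
        """Text that should be on screen at the given time."""
        return "".join(ch for show, ch in self._chars if show <= now_ms)


buf = TypewriterBuffer(t_display_ms=20)
buf.receive("ab", now_ms=0)    # first transmission
buf.receive("cd", now_ms=10)   # arrives while "ab" is still being displayed
print(buf.visible_text(19), buf.visible_text(45), buf.visible_text(60))
```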
S311, when it is detected that the disconnection condition is satisfied, the long link is disconnected.
The disconnection condition may include any one or more of the following: a network error occurs; the cloud server 200 detects that the target text has been completely transmitted and receives no message from the electronic device 100 within a preset time threshold after the transmission is completed; or the electronic device 100 detects an operation by which the user stops using the language model service (e.g., an operation of exiting the application 11, an operation of releasing the engine, etc.). The application 11 is an application capable of providing a language model service to a user through the cloud server 200. In some embodiments, the disconnection conditions and the way in which the long link is disconnected may also be understood with reference to the following description of the embodiment shown in FIG. 7B.
By adopting the text display method provided by the embodiment of the application, the waiting time of a user can be reduced, the phenomenon of blocking in the text display process can be avoided, and the smooth display of the text is ensured.
In some embodiments, the long link between the electronic device 100 and the cloud server 200 may be multiplexed by other messages (e.g., message 2) generated by the electronic device 100. Long link multiplexing means that the long link between the electronic device 100 and the cloud server 200 remains connected across multiple requests and responses, so that subsequent requests can reuse it without establishing a new long link each time. This reduces the overhead of establishing and closing connections and improves the efficiency of network communication. In the embodiment of the present application, a long link that is not multiplexed due to timeout means that the electronic device 100 does not initiate a long link use request within a preset time threshold after the target text of message 1 has been sent. The long link use request may be, for example, message 2 determined by the electronic device 100 based on another input by the user. In the case of long link multiplexing, the type of message 2 may be the same as or different from the type of message 1, which is not limited herein.
In some embodiments, the type of message 1 is different and the time threshold for long links is different. For example, when the message 1 is a voice message, the time threshold corresponding to the long link may be 3 seconds, that is, after the target text of the message 1 is sent, if the long link is not multiplexed within 3 seconds, the long link is disconnected. For another example, when the message 1 is a text message or an event message, the time threshold corresponding to the long link may be 60 seconds, that is, after the target text of the message 1 is sent, if the long link is not multiplexed within 60 seconds, the long link is disconnected. It should be understood that the embodiments herein are merely illustrative of different types of messages 1, and the time thresholds corresponding to the long links are different, and in the embodiments of the present application, the time thresholds may also be different values from those of the above embodiments, which is not limited herein.
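A minimal sketch of the type-dependent time threshold, using the example values of 3 seconds and 60 seconds given above. The dictionary and function names are hypothetical, and the idle time is assumed to be measured from the moment the target text of message 1 has been sent.

```python
# Example thresholds from the text: 3 s for voice, 60 s for text and event messages.
IDLE_TIMEOUT_S = {"voice": 3.0, "text": 60.0, "event": 60.0}


def should_disconnect(message_type: str, idle_s: float) -> bool:
    """True once the long link has gone unused for the threshold of message 1's type."""
    return idle_s >= IDLE_TIMEOUT_S[message_type]


print(should_disconnect("voice", 3.5))   # voice links are released quickly
print(should_disconnect("text", 3.5))    # text links are kept for reuse longer
```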
In the embodiment of the present application, the electronic device 100 may also determine the display speed of the target text based on the time $t_{display}$ taken to display a single text, where the display speed of the target text indicates how many characters are displayed per second while the electronic device 100 displays the target text. For example, if $t_{display}$ is 20 ms, the display speed of the target text is 1000 ms / 20 ms = 50 characters per second. After determining the display speed of the target text, the electronic device 100 may display the target text transmitted by the cloud server 200 based on that display speed. It will be appreciated that, in some embodiments, after determining the time $t_{display}$ for displaying a single text, the cloud server 200 may also determine the display speed of the target text based on $t_{display}$ and send the display speed of the target text to the electronic device 100, which is not limited herein.
In some application scenarios, the cloud server 200 also needs to perform risk control detection (hereinafter referred to as wind control detection) on the generated target text. In wind control detection, the target text may be divided into a plurality of text segments and each text segment is checked in turn, where each text segment may include one or more tokens. Since wind control detection requires each text segment to be semantically complete and different sentences have different lengths, the number of words (or number of tokens) in a text segment cannot be a fixed value. Instead, when the target text is divided into text segments, the number of words of a text segment may fall within an interval with a minimum value $N_{min}$ and a maximum value $N_{max}$. The maximum floating number $N_{float}$ of the text segment may be the difference between the maximum value $N_{max}$ and the minimum value $N_{min}$, that is:
$$N_{float} = N_{max} - N_{min} \qquad \text{formula (10)}$$
The following describes a method for determining a word count section of a text segment according to an embodiment of the present application.
Fig. 4A shows a schematic diagram of generation, transmission and display time of a text segment 1 in a target text according to an embodiment of the present application.
As shown in fig. 4A, the one-dimensional coordinate axis may represent time. From time $T_{c0}$ to time $T_{c1}$, the cloud server 200 may generate text segment 1. From time $T_{c1}$ to time $T_{c2}$, the cloud server 200 may perform wind control detection on text segment 1. From time $T_{c2}$ to time $T_{c3}$, the cloud server 200 may send text segment 1 to the electronic device 100. At time $T_{c3}$, the electronic device 100 may begin displaying the text in text segment 1 at a uniform speed. At time $T_{c4}$, the electronic device 100 may finish displaying all of the text in text segment 1. Since the display time $t_{display}$ of a single text is fixed, if the number of characters in text segment 1 is $N_{c1}$, the following relationship is satisfied between $T_{c3}$ and $T_{c4}$:
$$T_{c4} = T_{c3} + t_{display} \cdot N_{c1} \qquad \text{formula (11)}$$
Fig. 4B illustrates generation, transmission and display time of a text segment 2 in a target text according to an embodiment of the present application.
As shown in fig. 4B, the one-dimensional coordinate axis may represent time. Since text segment 2 and text segment 1 are two adjacent text segments in the target text and text segment 1 is generated before text segment 2, the cloud server 200 may begin generating text segment 2 at time $T_{c1}$. From time $T_{c1}$ to time $T_{d1}$, the cloud server 200 may generate text segment 2. From time $T_{d1}$ to time $T_{d2}$, the cloud server 200 may perform wind control detection on text segment 2. From time $T_{d2}$ to time $T_{d3}$, the cloud server 200 may send text segment 2 to the electronic device 100. Time $T_{d3}$ may be the time at which the electronic device 100 receives text segment 2. After time $T_{d3}$ arrives, the electronic device 100 may display all of the text in text segment 2 at a uniform speed. If the number of characters in text segment 2 is $N_{c2}$, the following relationship is satisfied between $T_{c1}$ and $T_{d1}$:
$$T_{d1} = T_{c1} + \frac{t_{token} \cdot N_{c2}}{scale} \qquad \text{formula (12)}$$
In the above formula (12), $t_{token}$ is the average time spent by the cloud server 200 in generating a single token, and scale is the ratio of the number of characters to the number of tokens in the same text.
Because the data volumes of the text segments do not differ greatly, and the wind control detection is based on the semantics of the text segments, the time consumed by the wind control detection of each text segment can be considered to be the same, namely:
$$T_{c2} - T_{c1} = T_{d2} - T_{d1} \qquad \text{formula (13)}$$
In the case of stable network quality, the transmission times of the text segments can also be considered to be approximately the same, namely:
$$T_{c3} - T_{c2} = T_{d3} - T_{d2} \qquad \text{formula (14)}$$
To prevent the display from stalling while the electronic device 100 displays the target text, the electronic device 100 needs to start displaying text segment 2 immediately after it finishes displaying text segment 1. The receiving time $T_{d3}$ of text segment 2 must therefore be no later than the time $T_{c4}$ at which text segment 1 has been fully displayed, namely:
$$T_{d3} \le T_{c4} \qquad \text{formula (15)}$$
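Combining the above formulas (11) to (15) gives:
$$t_{display} \cdot N_{c1} \ge \frac{t_{token} \cdot N_{c2}}{scale} \qquad \text{formula (16)}$$
If the number of words of text segment 1 is $N_{min}$ and the number of words of text segment 2 is $N_{max}$, then formula (16) may be replaced with the following formula (17):
$$t_{display} \cdot N_{min} \ge \frac{t_{token} \cdot N_{max}}{scale} \qquad \text{formula (17)}$$
As can be seen from the above formula (9), $t_{display} \ge t_{token}/scale$. If the equal sign in formula (9) holds, the display time of a single character is the same as the generation time of a single character, and in this case formula (17) cannot hold whenever $N_{max}$ is greater than $N_{min}$. Therefore, consider the case where the strict inequality holds in formula (9). If a constant prop exists with prop < scale, $t_{display}$ can be given by the following formula (18):
$$t_{display} = \frac{t_{token}}{prop} \qquad \text{formula (18)}$$
In connection with formula (18) and formula (10), formula (17) above may be replaced with formula (19) below:
$$\frac{N_{min}}{prop} \ge \frac{N_{min} + N_{float}}{scale} \qquad \text{formula (19)}$$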
As can be seen from the above embodiments, in the case where the difference in the number of words between two adjacent text segments is the largest, if $N_{float}$ and $N_{min}$ satisfy formula (19), the electronic device 100 can avoid stalling between the text segments when displaying the target text. If the difference in the number of words between two adjacent text segments is less than the maximum floating number $N_{float}$, the above formula (16) is also satisfied when $N_{float}$ and $N_{min}$ satisfy formula (19), and the electronic device 100 can likewise avoid stalling between the text segments when displaying the target text.
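The segment-level condition can be checked with a timeline simulation in the spirit of formulas (11) to (15). This is a sketch under the same assumptions as the text (constant wind control detection time, constant transmission time, generation of the next segment starting as soon as the previous one is generated); the function name and the sample timings are illustrative only.

```python
def segment_stalls(lengths, t_token, scale, t_check, t_trans, t_display):
    """Return the waiting time before each text segment after the first, in ms."""
    stalls = []
    gen_done = 0.0
    prev_end = None
    for n in lengths:
        gen_done += t_token * n / scale            # generation time, as in formula (12)
        arrival = gen_done + t_check + t_trans     # constant check and transmission times
        if prev_end is None:
            start = arrival
        else:
            stalls.append(max(0.0, arrival - prev_end))
            start = max(arrival, prev_end)
        prev_end = start + t_display * n           # display time, as in formula (11)
    return stalls


# 20 ms per token, 1.25 characters per token, 20 ms display time per character.
print(segment_stalls([40, 50], t_token=20, scale=1.25, t_check=30, t_trans=50, t_display=20))
print(segment_stalls([30, 50], t_token=20, scale=1.25, t_check=30, t_trans=50, t_display=20))
```

In this example a 40-character segment followed by a 50-character one is displayed without waiting, while a 30-character segment is too short to cover the generation of the 50-character segment that follows it.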
Table 6 shows a sentence length interval distribution provided by an embodiment of the present application.
TABLE 6
As shown in Table 6, after exhaustive statistics are collected on 1.2 million Chinese corpus entries, sentences can be divided into a plurality of intervals according to the number of Chinese characters they contain; the total number of sentences and the percentage of sentences differ from interval to interval. For example, interval 1 covers the word count range [1,5], with 25620 sentences (22.79% of all sentences) and a cumulative percentage of 22.79%. Interval 2 covers the range [6,10], with 45600 sentences (40.56%) and a cumulative percentage of 63.35%. Interval 3 covers the range [11,15], with 25275 sentences (22.48%) and a cumulative percentage of 85.83%. Interval 4 covers the range [16,20], with 10020 sentences (8.91%) and a cumulative percentage of 94.74%. Interval 5 covers the range [21,30], with 5007 sentences (4.45%) and a cumulative percentage of 99.19%. Interval 6 covers the range [31,40], with 725 sentences (0.65%) and a cumulative percentage of 99.84%. Interval 7 covers the range [41,50], with 150 sentences (0.13%) and a cumulative percentage of 99.97%. Interval 8 covers the range [51,63], with 34 sentences (0.03%) and a cumulative percentage of 100%. As can be seen from Table 6, sentences of 6 to 10 words are the most common.
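The cumulative percentages quoted above can be rechecked from the per-interval sentence counts. The following Python sketch is illustrative only: the function and variable names are hypothetical, and only the counts and ranges come from the description of Table 6.

```python
# Word count ranges and sentence counts as quoted in the description of Table 6.
INTERVALS = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 30), (31, 40), (41, 50), (51, 63)]
COUNTS = [25620, 45600, 25275, 10020, 5007, 725, 150, 34]


def cumulative_percentages(counts):
    """Return the running share of sentences, in percent, rounded to two decimals."""
    total = sum(counts)
    running = 0
    result = []
    for count in counts:
        running += count
        result.append(round(100 * running / total, 2))
    return result


cumulative = cumulative_percentages(COUNTS)

# Combined share of interval 2 and interval 3, the two most populated intervals.
share_2_3 = round(100 * (COUNTS[1] + COUNTS[2]) / sum(COUNTS), 2)
```

Intervals 2 and 3 together hold roughly 63% of the sentences, which is why the next paragraph draws the floating word count from those two intervals.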
When selecting the word count of the text segment, reference may be made to the contents of Table 6. Since the text in a text segment needs to be semantically complete, and the lengths of most sentences fall within interval 2 and interval 3 shown in Table 6, the maximum floating word count N_float of the text segment may take a value within interval 2 and interval 3 shown in Table 6.
By way of example, when N_float takes the value 10 and the accompanying parameter takes the value 1.2857, the word count interval of the text segment obtained according to formula (19) above may be [35,45].
It should be understood that this embodiment is merely an example. In the embodiments of the present application, N_float may take other values, and the word count interval of the text segment may be a different interval; as long as formula (19) above is satisfied, the present application does not limit the specific word count of the text segment.
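Formula (19) is not reproduced in this passage, so the following Python sketch rests on an assumption rather than on the formula itself: it reads the example as N_max - N_min = N_float and N_max / N_min = 1.2857, which is numerically consistent with the interval [35,45]. The function name and the parameter name `ratio` are hypothetical.

```python
def word_count_interval(n_float, ratio):
    """Derive (N_min, N_max) under the ASSUMED reading
    N_max - N_min = n_float and N_max / N_min = ratio.
    This is not confirmed to be formula (19) of the application."""
    n_min = round(n_float / (ratio - 1))
    n_max = n_min + n_float
    return n_min, n_max


interval = word_count_interval(10, 1.2857)
```

Under this reading, the example values 10 and 1.2857 yield the interval [35,45] stated in the text.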
The following describes a specific flow of another text display method provided by the embodiment of the present application.
S501, the electronic device 100 receives and responds to the user's input by determining message 1, which may be a voice message, a text message, or an event message.
S502, the electronic device 100 establishes a long link with the cloud server 200.
S503, the electronic device 100 sends the message 1 to the cloud server 200 through the long link.
S504, the cloud server 200 starts generating, through the language model and based on message 1, a target text including a plurality of tokens.
S505, the cloud server 200 obtains the time t_token that the language model takes to generate one token.
For details of step S501 to step S505, reference may be made to the descriptions related to step S301 to step S305 shown in fig. 3, and the details are not repeated here.
S506, the cloud server 200 determines, based on t_token, the display duration t_display of a single word.
The cloud server 200 may determine the display duration t_display of a single word from t_token through formula (18) above.
Here, prop may be a real number greater than 0 and less than scale. prop may be a predetermined constant; in other embodiments, if the value of scale depends on the language model used, prop may be determined in real time based on the value of scale.
S507, the cloud server 200 acquires the word count interval [N_min, N_max] of the text segment.
In some embodiments, the word count interval [N_min, N_max] of the text segment may be a predetermined interval, such as [35,45] or [20,26]. N_min and N_max need to satisfy the relationships shown in formula (10) and formula (19) above. The value of prop is the same as that in step S506.
In other embodiments, the word count interval [N_min, N_max] of the text segment may also be determined through formula (10) and formula (19) above, based on the values of scale and prop, after the language model to be used (for example, an LLM) is determined, which is not limited herein.
S508, when detecting that the word count of the generated tokens is greater than N_max, the cloud server 200 determines a text segment based on semantics and the word count interval [N_min, N_max].
The text segment may include one or more tokens that have been generated, and the word count of the text segment falls within the word count interval [N_min, N_max].
The cloud server 200 may determine, based on grammar and semantics, a text segment whose word count falls within the word count interval [N_min, N_max].
Illustratively, suppose the word count interval [N_min, N_max] is [35,45] and the generated text composed of a plurality of tokens is "I have a lovely cat, which is happy every day and carefree. It waits for me to come home from work every day, I like to play with the kitten, and the kitten also likes to play with me." When determining a text segment based on these tokens, the first 35 words may be taken first, that is, "I have a lovely cat, which is happy every day and carefree. It waits for me to come home from work every day, I". Semantic analysis shows that these 35 words are semantically incomplete, so further words are taken until the semantics of the text are complete or until the word count of the text reaches 45. The text segment determined based on the plurality of tokens is therefore "I have a lovely cat, which is happy every day and carefree. It waits for me to come home from work every day, I like to play with the kitten,".
It should be understood that this embodiment merely illustrates how a text segment may be determined based on semantics and the word count interval [N_min, N_max]. In the embodiments of the present application, the plurality of tokens generated by the cloud server 200 may differ from the foregoing embodiment, and the word count interval [N_min, N_max] may take different values, which is not limited herein.
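The take-N_min-then-extend procedure of step S508 can be sketched as follows. This is a minimal Python illustration, not the method of the application: it assumes that punctuation marks stand in for the semantic analysis (the text does not say how semantic completeness is judged), it counts one character as one word, and the names `cut_segment` and `BOUNDARIES` are hypothetical.

```python
# ASSUMPTION: a clause or sentence punctuation mark is treated as the point
# where the semantics are complete; the application uses semantic analysis.
BOUNDARIES = set("，。；！？,.;!?")


def cut_segment(text, n_min, n_max):
    """Return (segment, rest).

    Start from the first n_min characters and keep extending one character at
    a time until a boundary is reached or the length reaches n_max.
    """
    limit = min(n_max, len(text))
    for end in range(n_min, limit + 1):
        if text[end - 1] in BOUNDARIES:
            return text[:end], text[end:]
    # No boundary found inside [n_min, n_max]: cut at the upper bound.
    return text[:limit], text[limit:]
```

For instance, `cut_segment("abcde, fghij. klmno", 4, 8)` stops at the comma and returns `("abcde,", " fghij. klmno")`, mirroring how the example segment above ends at a comma rather than at exactly 35 or 45 words.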
In some embodiments, after determining a text segment, the cloud server 200 may store the text segment and delete all tokens (or text) included in the text segment from the table of generated tokens. For example, if the cloud server 200 determines token1 and token2 in Table 2 above as one text segment, the text segments stored by the cloud server 200 may be as shown in Table 7.
TABLE 7
Text segment | Content of text segment | Transmitted or not
1 | token1, token2 | Not transmitted
As shown in Table 7, the cloud server 200 may store a determined text segment, such as text segment 1, and the content of text segment 1 may include token1 and token2. Optionally, the cloud server 200 may also store a transmission identifier of the text segment, which is used to determine whether the text segment has been transmitted. The transmission identifier of text segment 1 shown in Table 7 is "not transmitted", indicating that text segment 1 has not yet been transmitted.
In the above case, the generated tokens stored by the cloud server 200 may be as shown in Table 8 below.
TABLE 8
Generation order | Token
3 | token3
As shown in Table 8, the tokens generated by the cloud server 200 may include token3, whose generation order is 3.
It should be understood that the embodiments shown in Table 7 and Table 8 are only examples. In the embodiments of the present application, the text segments stored in the cloud server 200 may include more, fewer, or different text segments than those shown in Table 7, and the generated tokens stored in the cloud server 200 may differ from those shown in Table 8, which is not limited herein.
It should be noted that step S508 to step S515 described below can be performed repeatedly: whenever the cloud server 200 detects that the word count of the tokens that have been generated but not transmitted is greater than N_max, the cloud server 200 can perform step S508 and the subsequent steps again.
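The bookkeeping described by Table 7 and Table 8 can be sketched as two small in-memory structures. This is an illustrative Python sketch under stated assumptions: the class names, field names, and the choice to count one character as one word are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class TextSegment:
    index: int                 # "Text segment" column of Table 7
    tokens: list               # "Content of text segment" column
    transmitted: bool = False  # transmission identifier


@dataclass
class GenerationState:
    pending: list = field(default_factory=list)   # (generation order, token), as in Table 8
    segments: list = field(default_factory=list)  # stored segments, as in Table 7

    def add_token(self, order, token):
        self.pending.append((order, token))

    def pending_word_count(self):
        # ASSUMPTION: one character counts as one word.
        return sum(len(token) for _, token in self.pending)

    def needs_new_segment(self, n_max):
        # Trigger for step S508: untransmitted generated words exceed N_max.
        return self.pending_word_count() > n_max

    def commit_segment(self, token_count):
        # Store the first token_count pending tokens as a segment and
        # delete them from the pending table.
        taken = self.pending[:token_count]
        self.pending = self.pending[token_count:]
        segment = TextSegment(len(self.segments) + 1, [tok for _, tok in taken])
        self.segments.append(segment)
        return segment


state = GenerationState()
for order, tok in enumerate(["token1", "token2", "token3"], start=1):
    state.add_token(order, tok)
segment_1 = state.commit_segment(2)
```

After `commit_segment(2)`, the stored segment holds token1 and token2 with the identifier "not transmitted", and only token3 (generation order 3) remains pending, matching Table 7 and Table 8.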
S509, the cloud server 200 performs risk detection on the newly determined text segment to obtain risk information of the text segment.
The risk information may be used to indicate whether the text segment is at risk. In some embodiments, the risk information may be either "at risk" or "risk-free".
S510, the cloud server 200 determines whether a text segment of the target text has been transmitted to the electronic device 100.
In the case where it is determined that the text segment of the target text has not been transmitted to the electronic device 100, the cloud server 200 may perform step S511 described below.
In the case where it is determined that the text segment of the target text has been transmitted to the electronic device 100, the cloud server 200 may perform step S512 described below.
S511, the cloud server 200 sends t_display, the newly determined text segment, and the risk information of the text segment to the electronic device 100 through the long link.
t_display may be used to indicate the number of words that the electronic device 100 displays per unit time when displaying the target text.
In some embodiments, the cloud server 200 may also determine whether to send the risk information to the electronic device 100 based on whether the text segment is at risk. If the risk information of the text segment is risk-free, the risk information is not sent; if the risk information of the text segment is at risk, the risk information of the text segment is sent to the electronic device 100. Optionally, in the case where the text segment is at risk, the cloud server 200 may send only the risk information and not the text segment.
After step S511, the electronic device 100 may perform step S513 described below.
S512, the cloud server 200 sends the newly determined text segment and the risk information of the text segment to the electronic device 100 through a long link.
After step S512, the electronic device 100 may perform step S513 described below.
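The branch in steps S510 to S512 amounts to attaching the single-word display duration only to the first text segment of a target text. A minimal Python sketch follows; the payload field names and the function name are hypothetical, since the application does not specify a message format.

```python
def build_payload(segment_text, risk_info, already_sent_a_segment, t_display):
    """Assemble what the cloud server sends for a newly determined text segment.

    S511: no segment of this target text has been sent yet, so the display
          duration of a single word is included.
    S512: an earlier segment was already sent, so only the segment and its
          risk information are included.
    All field names are illustrative assumptions.
    """
    payload = {"segment": segment_text, "risk": risk_info}
    if not already_sent_a_segment:
        payload["t_display"] = t_display
    return payload


first = build_payload("segment one", "risk-free", False, 0.05)
later = build_payload("segment two", "risk-free", True, 0.05)
```

The device keeps using the duration received with the first segment for every later segment, so the later payloads can omit it.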
S513, the electronic device 100 determines whether the target text is at risk based on the risk information.
The electronic device 100 may determine that the target text is at risk when the risk information is at risk, and may determine that the target text is temporarily risk-free when the risk information is risk-free.
If it is determined that the target text is risk-free, the electronic device 100 may perform step S514 described below.
If it is determined that the target text is at risk, the electronic device 100 may perform step S515 described below.
S514, after displaying all the previously received text segments based on t_display, the electronic device 100 displays the newly received text segment based on t_display.
When the electronic device 100 displays the target text, one more word is displayed every interval t_display; for the specific display manner, reference may be made to the related description of step S308 shown in fig. 3.
S515, the electronic device 100 stops displaying the target text and withdraws the already displayed text of the target text.
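The device-side handling in steps S513 to S515 can be sketched as a small state holder. This Python sketch is illustrative: the class and method names are hypothetical, and ignoring segments that arrive after a withdrawal is an assumption that the text does not state explicitly.

```python
class TargetTextView:
    """Tracks which text segments of the target text are currently shown."""

    def __init__(self):
        self.displayed = []   # segments shown so far, in arrival order
        self.stopped = False  # set once a risky segment has been seen

    def on_segment(self, segment, risk_info):
        if self.stopped:
            # ASSUMPTION: nothing more is shown after a withdrawal.
            return
        if risk_info == "at risk":
            # S515: stop displaying and withdraw what was already displayed.
            self.stopped = True
            self.displayed.clear()
        else:
            # S514: queue the new segment after the earlier ones.
            self.displayed.append(segment)


view = TargetTextView()
view.on_segment("segment one", "risk-free")
view.on_segment("segment two", "risk-free")
shown_before_risk = list(view.displayed)
view.on_segment("segment three", "at risk")
view.on_segment("segment four", "risk-free")
```

Because each segment is checked before it is shown, a risky segment never reaches the screen, and the text already on screen is withdrawn at the moment the risk is reported.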
S516, when the condition of disconnection is detected to be met, the long link is disconnected.
For details of step S516, reference may be made to the related content of step S311 shown in fig. 3, which is not described herein.
By adopting the text display method provided by the embodiment of the present application, the waiting time of the user can be reduced, risk control detection can be performed on the target text generated by the cloud server 200, and the electronic device 100 can display the target text smoothly, which avoids stuttering and improves the user experience.
In some application scenarios, if the user of the electronic device 100 chooses to output the target text sent by the cloud server 200 through voice, the electronic device 100 may also receive and respond to the operation of the user by outputting the received tokens (or text segments) at a constant speed based on t_display, that is, playing one word every interval t_display. In this application scenario, for the flow of the method for outputting text by voice, reference may be made to the related steps in the embodiment shown in fig. 3 or fig. 5, which are not described herein again.
A set of interface schematic diagrams of the text display method provided by an embodiment of the present application is described below.
In the case where the electronic device 100 displays the interface 1 of the application 11, the electronic device 100 may receive and respond to user input (e.g., voice input, text input, etc.) by displaying the content of the user input in the interface 1, determining message 1 based on the user input, and transmitting message 1 to the cloud server 200. After receiving the time t_display for displaying a single word and the tokens of the target text (or the text segments of the target text) sent by the cloud server 200, the electronic device 100 may display one more word of the target text in the interface 1 every interval t_display until all the words of the target text are displayed. In this way, the target text can be displayed while the cloud server 200 is still generating it, which reduces the time the user waits for the first word, and the time the electronic device 100 takes to display a single word can be controlled through t_display, which avoids stuttering during display.
For example, as shown in FIG. 6A, the electronic device 100 displays an application interface 600, and the application interface 600 may include a text input control 601 and a voice input control 602. The text input control 601 may be used to trigger the electronic device 100 to display, in the application interface 600, the text entered by the user based on the user's text input. The voice input control 602 may be used to trigger the electronic device 100 to display, in the application interface 600, the text content corresponding to the user's voice input based on the user's voice input.
The electronic device 100 may receive and respond to user input to a text input control 601, as shown in fig. 6B, by displaying a dialog box 603 in the application interface 600, where text entered by the user, such as "write an article around 100 words," may be displayed in the dialog box 603. After displaying the dialog box 603, the electronic device 100 may also display a wait identifier 604 below the dialog box 603, where the wait identifier 604 may be used to prompt the user to wait for text corresponding to the dialog box 603 to be generated.
After the electronic device 100 receives the time t_display for displaying a single word and the tokens of the target text (or a text segment of the target text) sent by the cloud server 200, as shown in fig. 6C, the electronic device 100 may display a dialog box 605 in the application interface 600 and display the first word of the received tokens (or text segment), e.g., "in", in the dialog box 605.
After time t_display has elapsed, as shown in fig. 6D, the electronic device 100 may display the first word after "in", namely "me", in the dialog box 605. The display position of "me" is the display position following that of the word "in".
After time t_display has elapsed again, as shown in FIG. 6E, the electronic device 100 may display the first word after "me", namely "people", in the dialog box 605. The display position of "people" is the display position following that of the word "me".
Thereafter, the electronic device 100 may repeat the above process, displaying the next word every interval t_display until all the words of the target text are displayed. Illustratively, after all the words of the target text are displayed, the electronic device 100 may display the application interface 600 shown in FIG. 6F.
As shown in fig. 6F, the electronic device 100 displays all the words of the target text in the dialog box 605, for example, "In our daily lives, we always face a variety of challenges. Sometimes these challenges come from our work, sometimes from our personal relationships, and sometimes from our health conditions." The target text is text generated by the cloud server 200 based on the content input by the user.
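The word-by-word display shown in fig. 6C to fig. 6F can be expressed as a schedule of display times. The following Python sketch is illustrative only: it treats each character of a string as one word, which approximates the character-by-character display of the original text, and the function name is hypothetical.

```python
def display_schedule(text, t_display, start=0.0):
    """Return a list of (time offset in seconds, text visible at that time).

    The first word appears at `start`, and one more word is appended every
    t_display seconds until the whole text is visible.
    """
    return [(start + i * t_display, text[: i + 1]) for i in range(len(text))]


schedule = display_schedule("abc", 0.5)
```

A user interface would typically drive this with a timer rather than a precomputed list; the list form only makes the constant spacing between successive words explicit.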
It should be understood that the embodiments shown in fig. 6A to 6F are only examples, and in the embodiments of the present application, the application 11 may be a mini-application, or may be other applications that may use the language model service provided by the cloud server 200, which is not limited herein.
The following describes a schematic functional block diagram of a text display system 10 according to an embodiment of the present application.
As shown in fig. 7A, the text display system 10 may include an electronic device 100 and a cloud server 200. The electronic device 100 may include an application 11 and a sound kit (voicekit) 12. The cloud server 200 may include an autonomous system (Autonomous System, AS) module 21, a dialog manager (dialogue manager, DM) 22, a natural language understanding (natural language understanding, NLU) module 23, and a speech recognition (automatic speech recognition, ASR) module 24. In some embodiments, the cloud server 200 may also include a risk control module 25.
The application 11 may include an interface module 11a and a dialog recording module 11b. The interface module 11a may display an interface and dynamic effects in the interface (e.g., the dynamic effect of text display), and may also display user input. The interface module 11a may also be used for user interaction, for example receiving and responding to the user's text input by determining the text entered by the user. After determining the text entered by the user, the interface module 11a may send that text to the sound kit 12. The dialog recording module 11b may detect the user's voice input and, when a voice input is detected, send an enabling instruction to the sound kit 12. The interface module 11a may also receive and respond to data sent by the sound kit 12 (e.g., the time t_display for displaying a single word of the target text, tokens of the target text, text segments of the target text, risk information of the text segments, etc.) by displaying the target text based on t_display.
The sound kit 12 may include a connector 12a, an acquisition module 12b, a processing module 12c, an understanding module 12d, and an execution module 12e. In some embodiments, the sound kit 12 may collect the user's voice input through the acquisition module 12b, process the voice input through the processing module 12c, perform semantic analysis on the processed voice input through the understanding module 12d to determine the instruction corresponding to the voice input, and send an execution instruction to the execution module 12e. The execution module 12e may receive and respond to the execution instruction sent by the understanding module 12d by executing the operation indicated by the execution instruction. In some embodiments, when the understanding module 12d determines, based on semantic analysis, that the voice input requires a language model in the cloud server 200, it may send execution instruction 1 to the execution module 12e, where execution instruction 1 may be used to instruct the execution module 12e to determine message 1 based on the processed voice input, the type of message 1 being a voice message. In some embodiments, the sound kit 12 may receive and respond to text input sent by the interface module 11a by determining message 1 based on the text input, the type of message 1 being a text message. After determining message 1, the sound kit 12 may establish a long link (e.g., websocket) with the AS module 21 through the connector 12a and send message 1 to the AS module 21. The connector 12a may also receive, through the AS module 21, data sent by the dialog manager 22 (e.g., the time t_display for displaying a single word of the target text, tokens of the target text, text segments of the target text, risk information of the text segments, etc.) and send the received data to the interface module 11a in the application 11.
Upon receiving message 1, the AS module 21 may establish a long link (e.g., websocket) with the dialog manager 22 or the ASR module 24 based on the type of message 1. For example, if the type of message 1 is a text message or an event message, the AS module 21 may establish a long link with the dialog manager 22 and send message 1 to the dialog manager 22 through the long link. If the type of message 1 is a voice message, the AS module 21 may establish a long link with the ASR module 24 and send message 1 to the ASR module 24. The AS module 21 may also receive the text content converted from message 1 that is sent by the ASR module 24, and send the text content to the dialog manager 22.
The dialog manager 22 may include a dialog management service module 22a and a streaming proxy service module 22b. After receiving message 1 (or the text content converted from the voice message), the dialog manager 22 may send message 1 to the NLU module 23 via the streaming proxy service module 22b. The streaming proxy service module 22b may also receive data sent by the NLU module 23 (e.g., the time t_display for displaying a single word of the target text, tokens of the target text, text segments of the target text, risk information of the text segments, etc.) and send the data to the sound kit 12 via the AS module 21. In some embodiments, the dialog management service module 22a may receive the text segment sent by the NLU module 23 and send the text segment to the risk control module 25. The dialog management service module 22a may also receive the risk information sent by the risk control module 25 and send the risk information of the text segment to the streaming proxy service module 22b.
After receiving the voice type message 1, the ASR module 24 may perform voice recognition on the voice message, determine text content corresponding to the voice message, and send the text content of the voice message to the AS module 21.
The NLU module 23 may include one or more service modules, such as a large model service module 23a and, optionally, an intent service module. The large model service module 23a may store one or more language models, such as an LLM. The large model service module 23a may receive message 1 (or the text content converted from the voice message) sent by the streaming proxy service module 22b. After receiving message 1 (or the content of message 1), the large model service module 23a may take the received message 1 as the input of the LLM and generate, through the LLM, the target text corresponding to message 1. While the large model service module 23a generates the target text, the NLU module 23 may also detect the time t_token taken to generate a single token of the target text, and determine, based on formula (9) above, the time t_display for displaying a single word of the target text. The NLU module 23 may also send data (e.g., the time t_display for displaying a single word of the target text, tokens of the target text, text segments of the target text, etc.) to the streaming proxy service module 22b. In some embodiments, the NLU module 23 may also store the word count interval [N_min, N_max] of the text segment and determine the text segment based on the word count interval [N_min, N_max] and the generated tokens. After determining the text segment, the NLU module 23 may send the text segment to the dialog manager 22.
The risk control module 25 may receive and respond to the text segment sent by the dialog management service module 22a by performing risk control detection on the text segment to obtain the risk information of the text segment, and send the risk information of the text segment to the dialog management service module 22a.
It should be understood that the embodiment shown in fig. 7A is merely an example, and in embodiments of the present application, the text display system 10 may include more, fewer, or different modules than the embodiment shown in fig. 7A, or may be a combination of the above modules into one module, or may be a decomposition of any of the above modules into a plurality of modules, which is not limited herein.
Fig. 7B illustrates two web socket (websocket) links provided by embodiments of the present application.
As shown in fig. 7B, while the electronic device 100 and the cloud server 200 execute the text display method provided by the embodiment of the present application, the web socket (websocket) link between the modules of the electronic device 100 and the cloud server 200 may be link 1 or link 2. Link 1 may include websocket1 between the connector 12a and the AS module 21, websocket2 between the AS module 21 and the dialog manager 22, and websocket3 between the dialog manager 22 and the NLU module 23. Link 2 may include websocket1 between the connector 12a and the AS module 21, websocket4 between the AS module 21 and the ASR module 24, websocket2 between the AS module 21 and the dialog manager 22, and websocket3 between the dialog manager 22 and the NLU module 23.
When message 1 sent by electronic device 100 is a text message or an event message, the relevant modules in electronic device 100 and cloud server 200 may establish link 1 and communicate with NLU module 23 via link 1.
When message 1 sent by the electronic device 100 is a voice message, the electronic device 100 may establish websocket1 with the AS module 21 through the connector 12a, and when the AS module 21 recognizes that message 1 is a voice message, the AS module 21 may establish websocket4 with the ASR module 24. The ASR module 24 may convert the voice message into a text message and send the text message to the AS module 21. The AS module 21 may also establish websocket2 with the dialog manager 22, and the dialog manager 22 may establish websocket3 with the NLU module 23, so that the text message converted by the ASR module 24 is sent to the NLU module 23 through websocket2 and websocket3. The NLU module 23 may then send data such as the target text of message 1 to the electronic device 100 over link 1.
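The composition of link 1 and link 2 described above can be written down as a lookup by message type. The sketch below is illustrative Python; the message type strings and the function name are hypothetical, while the websocket names and their order follow the description of fig. 7B.

```python
# Link compositions as described for fig. 7B.
LINK_1 = ["websocket1", "websocket2", "websocket3"]                # text or event message
LINK_2 = ["websocket1", "websocket4", "websocket2", "websocket3"]  # voice message


def link_for(message_type):
    """Return the websockets that carry a message of the given type."""
    if message_type in ("text", "event"):
        return list(LINK_1)
    if message_type == "voice":
        return list(LINK_2)
    raise ValueError(f"unknown message type: {message_type!r}")
```

Link 2 differs from link 1 only by websocket4, the detour through speech recognition; the response travels back over the three websockets that both links share.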
Since the long link between the electronic device 100 and the cloud server 200 may include a plurality of websockets, in step S311 and step S516 above, the websocket that is disconnected first may differ depending on which disconnection condition is satisfied. For example, Table 9 shows a correspondence between disconnection conditions and the websocket disconnection order provided in the embodiment of the present application.
TABLE 9
As shown in Table 9, when different disconnection conditions are satisfied, the websocket that is disconnected first may differ. For example: when the disconnection condition is that the electronic device 100 receives and responds to the user's operation of exiting the application 11 (or the user's operation of releasing the engine, etc.), websocket1 is disconnected first; when the disconnection condition is a network error, websocket1 is disconnected first; when the disconnection condition is that the speech recognition module times out (for example, 3 seconds), websocket4 is disconnected first; when the disconnection condition is that the AS module times out (for example, 60 seconds), websocket1 is disconnected first; and when the disconnection condition is that the LLM has output the target text, websocket3 is disconnected first.
It should be understood that the embodiment shown in table 9 above is merely illustrative of the difference in websocket that is first disconnected under different disconnection conditions, and in the embodiment of the present application, more, fewer, or different disconnection conditions than those of the embodiment above may be included, which is not limited herein.
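The correspondence described for Table 9 can be captured as a simple mapping. In the Python sketch below, the condition keys are hypothetical labels chosen for illustration; the websocket assigned to each condition and the example timeouts follow the description above.

```python
# Which websocket is disconnected first for each disconnection condition.
# Keys are illustrative labels, not identifiers from the application.
FIRST_DISCONNECTED = {
    "user_exits_application": "websocket1",
    "network_error": "websocket1",
    "speech_recognition_timeout": "websocket4",  # e.g. 3 seconds
    "as_module_timeout": "websocket1",           # e.g. 60 seconds
    "llm_output_finished": "websocket3",
}


def first_to_disconnect(condition):
    """Look up the websocket that is disconnected first for a condition."""
    try:
        return FIRST_DISCONNECTED[condition]
    except KeyError:
        raise ValueError(f"unknown disconnection condition: {condition!r}") from None
```

Conditions that originate at the device side or the AS module start at websocket1, whereas conditions that originate deeper in the cloud server start at the websocket nearest to the module concerned.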
Since the cost of the ASR module 24 is greater than the cost of the dialog manager 22, the cost of maintaining websocket4 is greater than the cost of maintaining websocket2. The connections (e.g., websockets) between different nodes in link 1 and link 2 may therefore correspond to different multiplexing time thresholds, and a connection may be multiplexed within its time threshold after the current target text has been transmitted. The cloud server 200 may store the link compositions of different links and the correspondence between the component parts of a link and the multiplexing time thresholds.
For example, Table 10 shows a correspondence among the message type, the link composition, and the multiplexing time threshold of a link component provided in the embodiment of the present application.
Table 10
As shown in Table 10, the cloud server 200 may store a correspondence among the message type, the link composition, and the multiplexing time threshold of a link component. For example, when the message type is a text message or an event message, the long link may be composed of websocket1, websocket2, and websocket3, and the multiplexing time threshold of websocket2 may be 60 seconds; that is, if websocket2 has not been multiplexed for more than 60 seconds, websocket2 may be disconnected. When the message type is a voice message, the long link may be composed of websocket1, websocket4, websocket2, and websocket3, and the multiplexing time threshold of websocket2 may be 60 seconds; that is, if websocket2 has not been multiplexed for more than 60 seconds, websocket4 may be disconnected.
It should be understood that the embodiment shown in the above table 10 is only an example, and in the embodiment of the present application, the link multiplexing time threshold corresponding to different links may also be different from the embodiment shown in the above table 10, and the present application is not limited herein.
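The multiplexing time threshold can be read as an idle timeout per websocket. The following Python sketch shows that check under stated assumptions: only the 60-second threshold of websocket2 comes from the text, websockets without a listed threshold are treated here as never expiring, and the names are hypothetical.

```python
# Multiplexing time thresholds in seconds. Only websocket2 is given in the text.
MULTIPLEX_THRESHOLD_S = {"websocket2": 60.0}


def may_disconnect(name, last_used_s, now_s):
    """Return True if the websocket has stayed unmultiplexed beyond its threshold.

    ASSUMPTION: a websocket with no configured threshold is never expired by
    this check.
    """
    threshold = MULTIPLEX_THRESHOLD_S.get(name)
    if threshold is None:
        return False
    return (now_s - last_used_s) > threshold
```

A server would typically record `last_used_s` when the current target text finishes transmitting and run this check periodically or on a timer.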
In the embodiment of the present application, if the electronic device 100 generates message 2 based on the user's input after the target text of message 1 has been sent, the electronic device 100 may send message 2 to the cloud server 200 through the long link with the cloud server 200. The type of message 1 and the type of message 2 do not affect the multiplexing of the long link, but may change some of the websockets in the link. For example, Table 11 shows a correspondence between link multiplexing and message types provided by the embodiment of the present application.
TABLE 11
As shown in Table 11, the type of message 1 and the type of message 2 correspond to how the link changes from message 1 to message 2. When message 1 is a text message or an event message, the link that transmits message 1 and the target text of message 1 includes websocket1, websocket2, and websocket3. In this case, when message 2 is a text message or an event message, the link that transmits message 2 and the target text of message 2 is the same as the link used for message 1; when message 2 is a voice message, websocket4 needs to be newly connected compared with the link used for message 1. When message 1 is a voice message, the link that transmits message 1 and the target text of message 1 includes websocket1, websocket4, websocket2, and websocket3. In this case, when message 2 is a text message or an event message, websocket4 needs to be disconnected compared with the link used for message 1; when message 2 is a voice message, the link used for message 2 is the same as the link used for message 1.
It will be appreciated that the embodiment shown in Table 11 above merely illustrates that the message type does not affect the multiplexing of the long link. Moreover, the embodiment shown in Table 11 assumes by default that all established websockets are still connected (i.e., within the period corresponding to their multiplexing time thresholds). In the embodiments of the present application, the case where different websockets correspond to different multiplexing time thresholds may also be considered in combination with the embodiment shown in Table 10, which is not limited herein.
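How the link changes between two consecutive messages can be derived by comparing the link compositions for the two message types. The Python sketch below is an illustration under assumptions: the message type strings, the function names, and the result keys are hypothetical, and it assumes, as the paragraph above does, that all previously established websockets are still connected.

```python
LINKS = {
    "text": ["websocket1", "websocket2", "websocket3"],
    "event": ["websocket1", "websocket2", "websocket3"],
    "voice": ["websocket1", "websocket4", "websocket2", "websocket3"],
}


def link_change(type_1, type_2):
    """Compare the link used for message 1 with the link needed for message 2.

    Returns the websockets that must be newly connected and those that the
    new link no longer uses; everything else is multiplexed as-is.
    """
    old, new = set(LINKS[type_1]), set(LINKS[type_2])
    return {
        "connect": sorted(new - old),
        "no_longer_needed": sorted(old - new),
    }
```

Only websocket4 ever appears in either list, which reflects the statement that the message type changes part of the link without preventing the rest of the long link from being multiplexed.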
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the invention, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 8 is a software structure block diagram of the electronic device 100 according to the embodiment of the present invention.
The layered architecture divides the software into several layers, each with a distinct role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android runtime and system libraries, and a kernel layer.
The application layer may include a series of application packages and the sound tool kit (voicekit) 12 shown in fig. 7, described above.
As shown in fig. 8, the application package may include applications for cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, etc. The application package may also include the application 11 shown in fig. 7 described above.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for the application of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 8, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, determine whether a status bar exists, lock the screen, capture screenshots, and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The phone manager is used to provide the communication functions of the electronic device 100, such as management of the call status (including connected, hung up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without requiring user interaction. For example, the notification manager is used to notify the user that a download is complete, to give message alerts, and the like. A notification may also appear in the status bar at the top of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialog window. For example, a text message is prompted in the status bar, a prompt tone is emitted, the electronic device vibrates, or an indicator light blinks.
The Android runtime includes a core library and virtual machines. The Android runtime is responsible for scheduling and management of the Android system.
The core library comprises two parts: one part is the functions that the Java language needs to call, and the other part is the Android core library.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules, such as a surface manager, media libraries, a three-dimensional graphics processing library (e.g., OpenGL ES), and a 2D graphics engine (e.g., SGL).
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media libraries may support a variety of audio, video and image encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG and PNG.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver and a sensor driver.
Fig. 9 shows a flow chart of a text display method according to an embodiment of the present application.
As shown in fig. 9, a specific flow of a text display method provided by an embodiment of the present application may include the following steps:
S901, the electronic device 100 sends a first message to the cloud server 200.
The first message may be message 1 in the above-described embodiment.
S902, the cloud server 200 generates a first target text corresponding to the first message through the language model.
The first target text may be the target text corresponding to the message 1 in the above embodiment.
S903, after generating the first text segment in the first target text, the cloud server 200 sends a first display speed and the first text segment to the electronic device, where the first display speed is used to indicate the number of characters in the first target text displayed in a unit time.
The first text segment may be text segment 1 in the above-described embodiment. The first text segment may also be the one or more token(s) that are first transmitted to the electronic device 100 in step S307 shown in fig. 3.
The first display speed may refer to the number of characters displayed in a unit time. In the embodiment of the present application, the number of characters may also be referred to as the word count, and the characters may include punctuation. The first display speed may also be expressed as the time t_display taken to display a single character in the above embodiment, in which case the number of characters displayed in a unit time can be determined from t_display.
For the determination of t_display, refer to the related description in the above embodiment, which is not repeated here.
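The relation between the per-character display time and the display speed can be sketched as follows. This is only an illustrative sketch: the function names and the choice of seconds as the unit time are assumptions, not part of the embodiment.

```python
def display_speed_from_t_display(t_display: float, unit_time: float = 1.0) -> float:
    """Characters shown per `unit_time` seconds if one character takes `t_display` seconds.

    Both names are hypothetical; the embodiment only states that the number of
    characters per unit time can be determined from t_display.
    """
    if t_display <= 0:
        raise ValueError("t_display must be positive")
    return unit_time / t_display


def t_display_from_speed(display_speed: float, unit_time: float = 1.0) -> float:
    """Inverse conversion: time spent on a single character at a given display speed."""
    if display_speed <= 0:
        raise ValueError("display_speed must be positive")
    return unit_time / display_speed
```

For example, under these assumptions a t_display of 0.25 seconds corresponds to 4 characters per second.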
S904, the electronic device 100 displays the first text segment at the first display speed.
For details of step S904, reference may be made to step S308 shown in fig. 3 or step S514 shown in fig. 5.
S905, the cloud server 200 generates a second text segment in the first target text before the electronic device 100 finishes displaying the first text segment at the first display speed, the second text segment being located after the first text segment in the first target text.
The first text segment and the second text segment are two adjacent text segments in the first target text. In the embodiment of the present application, the second text segment may be the text segment 2 in the above embodiment.
S906, after generating the second text segment of the first target text, the cloud server 200 transmits the second text segment to the electronic device 100.
S907, the electronic device 100 receives the second text segment before it finishes displaying the first text segment at the first display speed, or at the moment it finishes displaying the first text segment at the first display speed.
S908, the electronic device 100 displays the second text segment at the first display speed after displaying the first text segment at the first display speed.
For the specific content of step S908, reference may be made to the related content in step S514 shown in fig. 5 or step S310 shown in fig. 3.
With the text display method provided by the embodiment of the application, the text that has already been generated can be displayed while the target text is still being generated, which reduces the waiting time of the user, avoids stuttering during display, and ensures smooth display of the text.
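The timing condition behind steps S903 to S908 can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than the embodiment's implementation: `Segment`, `display_schedule` and the arrival times are hypothetical, and the display speed is taken in characters per second. A segment is marked as stalled when it arrives only after the device has already finished displaying the previous segment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    text: str
    arrival: float  # hypothetical time (s) at which the device receives the segment


def display_schedule(segments: List[Segment], display_speed: float) -> List[Tuple[float, float, bool]]:
    """Return (start, end, stalled) for each segment shown at `display_speed` chars/s."""
    schedule = []
    free_at = 0.0  # time at which the device finishes the previous segment
    for seg in segments:
        start = max(free_at, seg.arrival)
        # A stall means the device ran out of text before this segment arrived.
        stalled = bool(schedule) and seg.arrival > free_at
        end = start + len(seg.text) / display_speed
        schedule.append((start, end, stalled))
        free_at = end
    return schedule


if __name__ == "__main__":
    smooth = display_schedule([Segment("abcdefgh", 0.0), Segment("ijkl", 0.5)], 8.0)
    print(smooth)
```

In the usage example, the second segment arrives while the first is still being displayed, so no stall is reported; this corresponds to the situation described in S907.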
In one possible implementation, the first text segment comprises at least one token and the second text segment comprises one token. The method further includes: in the process of generating the first target text, the cloud server 200 obtains a first generation speed, where the first generation speed is used to indicate the number of tokens generated in a unit time; and the cloud server 200 determines the first display speed based on the first generation speed and a first constraint relation, where the first constraint relation is that the first display speed is less than or equal to the product of the first generation speed and a first scale, and the first scale is the ratio of the number of characters to the number of tokens for the language model.
In this way, the number of tokens in the first transmission is greater than or equal to the number of tokens in each subsequent transmission, so displaying all the characters of the first transmission takes a relatively long time, which helps ensure smooth display of the text.
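The first constraint relation can be sketched as a simple upper bound. The function and parameter names below are hypothetical, and the numbers in the usage note are illustrative only; the sketch merely shows that a display speed is capped at the generation speed multiplied by the character-to-token ratio, so that the device does not display characters faster than they are produced.

```python
def max_display_speed(generation_speed: float, chars_per_token: float) -> float:
    """Upper bound on display speed (chars per unit time).

    generation_speed: tokens generated per unit time (the "first generation speed").
    chars_per_token:  ratio of characters to tokens for the model (the "first scale").
    """
    if generation_speed <= 0 or chars_per_token <= 0:
        raise ValueError("both arguments must be positive")
    return generation_speed * chars_per_token


def choose_display_speed(generation_speed: float, chars_per_token: float,
                         preferred_speed: float) -> float:
    """Pick a display speed that satisfies: speed <= generation_speed * chars_per_token.

    `preferred_speed` is a hypothetical target reading speed, not defined in the source.
    """
    return min(preferred_speed, max_display_speed(generation_speed, chars_per_token))
```

For instance, with 20 tokens per second and 1.5 characters per token, a preferred speed of 40 characters per second would be reduced to 30, while a preferred speed of 25 would be kept.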
In one possible implementation, the method further includes the following. In the process of generating the first target text, the cloud server 200 obtains a first generation speed, where the first generation speed is used to indicate the number of tokens generated in a unit time. The cloud server 200 determines the first display speed based on the first generation speed and a first constraint relation, where the first constraint relation is that the first display speed is equal to the product of the first generation speed and a first constant, the first constant is smaller than a first proportion, and the first proportion is the ratio of the number of characters to the number of tokens for the language model. The cloud server 200 sets a word count interval, which indicates the word count range of a single text segment of the first target text. In the process of generating the first target text, the cloud server 200 determines the first text segment based on the word count interval and, before sending the first text segment, determines that the first text segment is risk-free. Likewise, the cloud server 200 determines the second text segment based on the word count interval, where the word count of the second text segment falls within the word count interval, and, before sending the second text segment, determines that the second text segment is risk-free.
Thus, by setting the word count interval of the text segments and the display speed of the text, smooth display across two adjacent text segments can be ensured.
In one possible implementation, the method further includes the following. In the process of generating the first target text, the cloud server 200 determines a third text segment based on the word count interval, where the third text segment is located after the second text segment in the first target text. The cloud server 200 determines that the third text segment is at risk, and sends the third text segment and first risk information to the electronic device 100, where the first risk information is used to indicate that the first target text is at risk. After receiving the first risk information, the electronic device 100 withdraws all content of the first target text that has already been displayed.
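One way to picture the word count interval is a buffer that releases a text segment once enough characters have accumulated. The following is a minimal sketch under stated assumptions: `segment_stream`, `lo` and `hi` are hypothetical names, tokens are modelled as plain strings, and the final remainder is allowed to be shorter than the lower bound because the source does not say how the tail of the text is handled.

```python
from typing import Iterable, Iterator


def segment_stream(tokens: Iterable[str], lo: int, hi: int) -> Iterator[str]:
    """Group streamed tokens into segments whose length lies in [lo, hi] characters."""
    if lo < 1 or hi < lo:
        raise ValueError("require 1 <= lo <= hi")
    buf = ""
    for tok in tokens:
        buf += tok
        # Release a segment as soon as the lower bound is reached,
        # never emitting more than `hi` characters at once.
        while len(buf) >= lo:
            cut = min(len(buf), hi)
            yield buf[:cut]
            buf = buf[cut:]
    if buf:
        yield buf  # assumed: the last segment may be shorter than `lo`


if __name__ == "__main__":
    print(list(segment_stream(["ab", "cde", "f", "ghij"], 3, 4)))
    # prints ['abcd', 'efgh', 'ij']
```

Each released segment could then be passed to a risk check before it is sent, as the implementation above describes.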
In this way, the electronic device can withdraw the displayed content when the target text is detected to be at risk.
In one possible implementation, the method further includes: the cloud server 200 sends second risk information to the electronic device 100 when sending the first text segment, where the second risk information is used to indicate that the first target text is risk-free so far; and the cloud server 200 sends the second risk information to the electronic device 100 when sending the second text segment.
In this way, the cloud server can also send risk information to the electronic device when the target text is risk-free, where the risk information informs the electronic device that the current text segment is risk-free.
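The interplay of the first and second risk information can be sketched as follows. Everything named here is an assumption made for illustration: the flag values, the `Device` class, and the keyword blocklist standing in for the risk check are hypothetical, and stopping the transmission after a risky segment is a simplification that the source does not state.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set

RISK_FREE_SO_FAR = "risk_free_so_far"  # stands in for the second risk information
AT_RISK = "at_risk"                    # stands in for the first risk information


@dataclass
class Device:
    shown: List[str] = field(default_factory=list)

    def on_segment(self, text: str, risk: str) -> None:
        if risk == AT_RISK:
            self.shown.clear()  # withdraw all content already displayed
            return
        self.shown.append(text)


def contains_risk(segment: str, blocklist: Set[str]) -> bool:
    """Hypothetical risk check: flags a segment containing any blocklisted term."""
    return any(term in segment for term in blocklist)


def serve(segments: Iterable[str], blocklist: Set[str], device: Device) -> None:
    """Send each segment with risk information; stop after the first risky one (assumed)."""
    for seg in segments:
        if contains_risk(seg, blocklist):
            device.on_segment(seg, AT_RISK)
            break
        device.on_segment(seg, RISK_FREE_SO_FAR)
```

With a blocklist of {"bad"}, serving "good ", "text ", "bad stuff" leaves the device with nothing displayed, since the third segment triggers the withdrawal.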
In one possible implementation, the method further includes: the electronic device 100 establishes a long link with the cloud server 200 before sending the first message to the cloud server 200.
Thus, the first message, the first target text and other data can be transmitted through the long link.
In one possible implementation, the method further includes: before establishing the long link with the cloud server 200, the electronic device 100 receives a first input from a user and determines the first message based on the first input.
The types of the first input can include voice input, text input and event input. The type of the first message may include a voice message, a text message, or an event message.
Determining the first message based on the first input specifically includes determining the type of the first message based on the type of the first input, and determining the content of the first message based on the content of the first input. The content of the first message may be the same as the content of the first input.
In one possible implementation, the method further includes: the electronic device 100 disconnects the long link with the cloud server 200 when detecting that a first disconnection condition is met, where the first disconnection condition includes any one or more of the following: a network error occurs; a first operation of the user is received, the first operation being used to trigger the electronic device to stop using the language model service.
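The mapping from a first input to a first message can be sketched as below. The class names and the one-to-one type mapping are assumptions for illustration; the source only states that the message type follows the input type and that the content may be the same.

```python
from dataclasses import dataclass
from enum import Enum


class InputType(Enum):
    VOICE = "voice"
    TEXT = "text"
    EVENT = "event"


class MessageType(Enum):
    VOICE = "voice"
    TEXT = "text"
    EVENT = "event"


# Hypothetical one-to-one mapping from input type to message type.
TYPE_MAP = {
    InputType.VOICE: MessageType.VOICE,
    InputType.TEXT: MessageType.TEXT,
    InputType.EVENT: MessageType.EVENT,
}


@dataclass(frozen=True)
class UserInput:
    kind: InputType
    content: str


@dataclass(frozen=True)
class Message:
    kind: MessageType
    content: str


def message_from_input(user_input: UserInput) -> Message:
    """Derive the message type from the input type and reuse the input content."""
    return Message(TYPE_MAP[user_input.kind], user_input.content)
```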
In this way, the electronic device may disconnect the long link with the cloud server if it is detected that the first disconnection condition is satisfied.
In one possible implementation, the method further includes: the cloud server 200 disconnects the long link with the electronic device 100 when detecting that a second disconnection condition is met, where the second disconnection condition includes any one or more of the following: a network error occurs; the first target text has been completely sent and no message sent by the electronic device is received within a first duration after the sending is completed.
In this way, the cloud server may disconnect the long link with the electronic device if the second disconnection condition is detected to be satisfied.
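The two disconnection conditions can be written as small predicates. This is a sketch under assumptions: the function names and timestamp parameters are hypothetical, and the second condition is read here as "the text has been fully sent and no message arrived within the first duration afterwards", which is one possible reading of the wording above.

```python
from typing import Optional


def device_should_disconnect(network_error: bool, user_stopped_service: bool) -> bool:
    """First disconnection condition: a network error or the user's stop operation."""
    return network_error or user_stopped_service


def server_should_disconnect(network_error: bool,
                             sent_done_at: Optional[float],
                             last_message_at: Optional[float],
                             now: float,
                             first_duration: float) -> bool:
    """Second disconnection condition (assumed reading, times in seconds).

    sent_done_at:    time the target text finished sending, or None while still sending.
    last_message_at: time of the most recent message from the device, or None.
    """
    if network_error:
        return True
    if sent_done_at is None:
        return False  # the target text is still being sent
    heard_since = last_message_at is not None and last_message_at > sent_done_at
    return (not heard_since) and (now - sent_done_at >= first_duration)
```

A real server would likely restart the idle timer after each new message; that refinement is omitted to keep the sketch short.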
In one possible implementation, the electronic device 100 receives a second input of the user after displaying the first target text at the first display speed, determines a second message based on the second input, and sends the second message to the cloud server 200. The cloud server 200 generates a second target text corresponding to the second message through the language model. After generating a fourth text segment in the second target text, the cloud server 200 sends a second display speed and the fourth text segment to the electronic device 100, where the second display speed is used to indicate the number of characters in the second target text displayed in a unit time. The electronic device 100 displays the fourth text segment at the second display speed. Before the electronic device 100 finishes displaying the fourth text segment at the second display speed, the cloud server 200 generates a fifth text segment in the second target text, where the fifth text segment is located after the fourth text segment in the second target text; after generating the fifth text segment, the cloud server 200 sends it to the electronic device 100. The electronic device 100 receives the fifth text segment before or at the moment it finishes displaying the fourth text segment at the second display speed, and displays the fifth text segment at the second display speed after the fourth text segment has been displayed.
In this way, the second message and the second target text may also be transmitted over the long link.
In one possible implementation, the electronic device 100 sending the second message to the cloud server 200 specifically includes: the electronic device 100 sends the second message to the cloud server 200 when the time interval between the time at which the first target text finishes being displayed and the time at which the second input is received is less than a second duration.
The multiplexing of a long link may correspond to a time threshold. Within the preset time threshold, the second message may reuse the long link used for transmitting the first message. Therefore, if the long link is not used for a long time, it can be disconnected automatically, which reduces energy consumption.
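The time threshold for reusing the long link can be sketched as a single comparison. The function and parameter names are hypothetical, and times are assumed to be in seconds.

```python
def can_reuse_long_link(display_done_at: float, second_input_at: float,
                        second_duration: float) -> bool:
    """True if the second input arrives within `second_duration` after the
    first target text finished displaying, so the existing long link is reused."""
    elapsed = second_input_at - display_done_at
    return 0 <= elapsed < second_duration
```

If the check fails, a new long link would presumably have to be established before the second message is sent; the source does not spell out that path in this passage.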
The embodiments of the present application may be arbitrarily combined to achieve different technical effects.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state drive (SSD)).
Those of ordinary skill in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium includes various media capable of storing program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disk.
In summary, the foregoing description is only exemplary embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made according to the disclosure of the present invention should be included in the protection scope of the present invention.

Claims (32)

1. A text display system, characterized by comprising an electronic device and a cloud server, wherein:
The electronic device is configured to send a first message to the cloud server;
the cloud server is configured to generate a first target text corresponding to the first message through a language model;
The cloud server is further configured to send, after generating a first text segment in the first target text, a first display speed and the first text segment to the electronic device, where the first display speed is used to indicate the number of characters in the first target text displayed in a unit time;
the electronic device is further configured to display the first text segment at the first display speed;
The cloud server is further configured to generate a second text segment in the first target text before the electronic device finishes displaying the first text segment at the first display speed, where the second text segment is located after the first text segment in the first target text;
The cloud server is further configured to send a second text segment of the first target text to the electronic device after generating the second text segment;
The electronic device is further configured to receive the second text segment before or when the display of the first text segment at the first display speed is completed;
The electronic device is further configured to display the second text segment at the first display speed after the first text segment has been displayed at the first display speed.
2. The system of claim 1, wherein the first text segment comprises at least one token, and the second text segment comprises a token;
The cloud server is further configured to obtain a first generation speed in a process of generating the first target text, where the first generation speed is used to indicate a number of tokens generated in a unit time;
The cloud server is further configured to determine the first display speed based on the first generation speed and a first constraint relationship, where the first constraint relationship is that the first display speed is less than or equal to a product of the first generation speed and a first scale, and the first scale is a ratio of the number of characters corresponding to the language model to the number of token.
3. The system of claim 1, wherein the cloud server is further configured to obtain a first generation speed in generating the first target text, the first generation speed being used to indicate a number of tokens generated per unit time;
The cloud server is further configured to determine the first display speed based on the first generation speed and a first constraint relationship, where the first constraint relationship is that the first display speed is equal to a product of the first generation speed and a first constant, the first constant is smaller than a first proportion, and the first proportion is a proportion of a number of characters corresponding to the language model to a number of token;
The cloud server is further configured to set a word count interval, where the word count interval is used to indicate a word count range of a single text segment of the first target text;
the cloud server is further configured to determine, in the process of generating the first target text, the first text segment based on the word count interval;
The cloud server is further configured to determine, before sending the first text segment, that the first text segment is risk-free;
The cloud server is further configured to determine, in the process of generating the first target text, the second text segment based on the word count interval, where the word count of the second text segment belongs to the word count interval;
The cloud server is further configured to determine that the second text segment is risk-free before sending the second text segment.
4. The system of claim 3, wherein the cloud server is further configured to determine a third text segment based on the word count interval in generating the first target text, the third text segment being subsequent to the second text segment in the first target text;
the cloud server is further configured to determine that the third text segment is at risk;
The cloud server is further configured to send the third text segment and first risk information to the electronic device, where the first risk information is used to indicate that the first target text is at risk;
the electronic device is further configured to withdraw all content that has been displayed in the first target text after receiving the first risk information.
5. The system of claim 3 or 4, wherein the cloud server is further configured to send second risk information to the electronic device when sending the first text segment, the second risk information being configured to indicate that the first target text is temporarily risk-free;
And the cloud server is further configured to send the second risk information to the electronic device when sending the second text segment.
6. The system of any of claims 1-5, wherein the electronic device is further configured to establish a long link with the cloud server before sending the first message to the cloud server.
7. The system of claim 6, wherein the electronic device is further configured to receive a first input from a user prior to establishing the long link with the cloud server;
The electronic device is further configured to determine the first message based on the first input.
8. The system of claim 6 or 7, wherein the electronic device is further configured to disconnect the long link with the cloud server when a first disconnect condition is detected to be satisfied, the first disconnect condition including any one or more of a network error, receipt of a first operation by a user, the first operation to trigger the electronic device to cease using a language model service.
9. The system of any of claims 6-8, wherein the cloud server is further configured to disconnect the long link with the electronic device when a second disconnect condition is detected to be satisfied, the second disconnect condition including any one or more of a network error, the first target text being sent out, and a message sent by the electronic device not being received within a first time period after the sending out.
10. The system of any of claims 1-9, wherein the electronic device is further configured to receive a second input from a user after the first target text has been displayed at the first display speed;
the electronic device is further configured to determine a second message based on the second input;
the electronic device is further configured to send the second message to the cloud server;
the cloud server is further used for generating a second target text corresponding to the second message through a language model;
The cloud server is further configured to send, after generating a fourth text segment in the second target text, a second display speed and the fourth text segment to the electronic device, where the second display speed is used to indicate a number of characters in the second target text displayed in a unit time;
The electronic device is further configured to display the fourth text segment at the second display speed;
The cloud server is further configured to generate a fifth text segment in the second target text before the electronic device finishes displaying the fourth text segment at the second display speed, where the fifth text segment is located after the fourth text segment in the second target text;
the cloud server is further configured to send a fifth text segment of the second target text to the electronic device after generating the fifth text segment;
the electronic device is further configured to receive the fifth text segment before or when the display of the fourth text segment at the second display speed is completed;
The electronic device is further configured to display the fifth text segment at the second display speed after the fourth text segment is displayed at the second display speed.
11. The system of claim 10, wherein the electronic device is further configured to send the second message to the cloud server, and specifically comprises:
the electronic device is further configured to send the second message to the cloud server when a time interval between a time when the first target text is displayed and a time when the second input is received is less than a second duration.
12. A text display method, characterized by being applied to a cloud server, the method comprising:
receiving a first message sent by an electronic device;
generating a first target text corresponding to the first message through a language model;
After a first text segment in the first target text is generated, sending a first display speed and the first text segment to the electronic device, wherein the first display speed is used for indicating the number of characters displayed in unit time;
Generating a second text segment in the first target text, wherein the second text segment is located after the first text segment in the first target text;
And before the electronic device finishes displaying the first text segment at the first display speed, sending the second text segment to the electronic device.
13. The method of claim 12, wherein the first text segment comprises one or more tokens and the second text segment comprises a token;
The method further comprises the steps of:
In the process of generating the first target text, acquiring a first generation speed, wherein the first generation speed is used for indicating the number of tokens generated in unit time;
The first display speed is determined based on the first generation speed and a first constraint relation, wherein the first constraint relation is that the first display speed is smaller than or equal to the product of the first generation speed and a first scale, and the first scale is the ratio of the number of characters corresponding to the language model to the number of token.
14. The method according to claim 12, wherein the method further comprises:
In the process of generating the first target text, acquiring a first generation speed, wherein the first generation speed is used for indicating the number of tokens generated in unit time;
Determining the first display speed based on the first generation speed and a first constraint relation, wherein the first constraint relation is that the first display speed is equal to the product of the first generation speed and a first constant, the first constant is smaller than a first proportion, and the first proportion is the proportion of the number of characters corresponding to the language model to the number of token;
Setting a word number interval, wherein the word number interval is used for indicating the word number range of a single text segment of the first target text;
determining the first text segment based on the word count interval in the process of generating the first target text;
before sending the first text segment, determining that the first text segment is risk-free;
Determining the second text segment based on the word count interval in the process of generating the first target text, wherein the word count of the second text segment belongs to the word count interval;
Before sending the second text segment, determining that the second text segment is risk-free.
15. The method of claim 14, wherein the method further comprises:
Determining a third text segment based on the word count interval in the process of generating the first target text, wherein the third text segment is located after the second text segment in the first target text;
determining that the third text segment is at risk;
and sending the third text segment and first risk information to the electronic device, wherein the first risk information is used for indicating that the first target text is at risk.
16. The method according to claim 14 or 15, characterized in that the method further comprises:
when the first text segment is sent, sending second risk information to the electronic device, wherein the second risk information indicates that the first target text is temporarily risk-free;
when the second text segment is sent, sending the second risk information to the electronic device.
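The pairing of each text segment with risk information can be sketched as a stream of small messages. Everything below is an assumption made for illustration: the `SegmentMessage` fields, the risk labels `"none_so_far"` and `"risky"`, and the `is_risky` callback are hypothetical, and stopping the stream after a risky segment is a design choice of this sketch rather than something the claims state.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class SegmentMessage:
    text: str
    risk: str                              # "none_so_far" or "risky" (hypothetical labels)
    display_speed: Optional[float] = None  # carried only by the first message


def build_messages(segments: Iterable[str],
                   display_speed: float,
                   is_risky: Callable[[str], bool]) -> Iterator[SegmentMessage]:
    """Attach risk information to each segment before it is sent."""
    for index, segment in enumerate(segments):
        risky = is_risky(segment)
        yield SegmentMessage(
            text=segment,
            risk="risky" if risky else "none_so_far",
            display_speed=display_speed if index == 0 else None,
        )
        if risky:
            # Assumption: nothing further is sent once a risk is found.
            return
```

The first message carries the display speed together with the first segment; later messages carry only a segment and its risk information.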
17. The method according to any one of claims 12-16, further comprising:
before receiving the first message sent by the electronic device, establishing a long link with the electronic device.
18. The method of claim 17, wherein the method further comprises:
when it is detected that a second disconnection condition is met, disconnecting the long link with the electronic device, wherein the second disconnection condition comprises any one or more of: a network error; and, after transmission of the first target text is finished, no message sent by the electronic device being received within a first duration.
19. The method according to any one of claims 12-18, further comprising:
after the first target text is sent, receiving a second message sent by the electronic device;
generating, through the language model, a second target text corresponding to the second message;
after generating a fourth text segment of the second target text, sending a second display speed and the fourth text segment to the electronic device;
generating a fifth text segment of the second target text before the electronic device finishes displaying the fourth text segment at the second display speed, wherein the fifth text segment is located after the fourth text segment in the second target text;
after generating the fifth text segment of the second target text, sending the fifth text segment to the electronic device.
20. A text display method, characterized in that it is applied to an electronic device, the method comprising:
sending a first message to a cloud server;
receiving a first display speed and a first text segment of a first target text sent by the cloud server, wherein the first display speed indicates the number of characters of the first target text displayed per unit time;
displaying the first text segment at the first display speed;
receiving a second text segment of the first target text before the electronic device finishes displaying the first text segment at the first display speed, wherein the second text segment is located after the first text segment in the first target text;
after the first text segment has been displayed at the first display speed, displaying the second text segment at the first display speed.
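The effect of displaying at a fixed speed while later segments arrive in the background can be shown with a deterministic timing calculation. This is a sketch under assumptions: `display_schedule` and its input format (arrival time in seconds paired with segment text) are hypothetical, and a real device would drive its user interface with a timer rather than compute a list.

```python
from typing import List, Tuple


def display_schedule(segments: List[Tuple[float, str]],
                     display_speed: float) -> List[Tuple[float, str]]:
    """Return (time_in_seconds, character) pairs for a paced display.

    segments:      (arrival_time, text) pairs in text order.
    display_speed: characters displayed per second.
    """
    if display_speed <= 0:
        raise ValueError("display_speed must be positive")
    interval = 1.0 / display_speed
    clock = 0.0
    schedule = []
    for arrival, text in segments:
        # If the next segment has not arrived yet, the display stalls until it does.
        clock = max(clock, arrival)
        for ch in text:
            schedule.append((clock, ch))
            clock += interval
    return schedule
```

When the second segment arrives before the first has finished displaying, the characters continue at an even pace; when it arrives late, the gap shows up as a stall in the schedule.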
21. The method of claim 20, wherein the first text segment comprises one or more tokens and the second text segment comprises a token.
22. The method of claim 20, wherein the method further comprises:
receiving a third text segment and first risk information sent by the cloud server, wherein the first risk information indicates that the first target text is at risk;
after receiving the first risk information, withdrawing all content of the first target text that has been displayed.
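On the device side, withdrawal amounts to clearing whatever part of the target text is already on screen. The class below is a hypothetical sketch: `StreamedTextView` and `on_segment` are illustrative names, and not displaying the risky segment itself is an assumption of this sketch.

```python
class StreamedTextView:
    """Keeps track of the part of the target text currently shown."""

    def __init__(self) -> None:
        self.displayed = ""

    def on_segment(self, text: str, risky: bool) -> None:
        if risky:
            # Risk information received: withdraw everything shown so far.
            # Assumption: the risky segment itself is never displayed.
            self.displayed = ""
        else:
            self.displayed += text
```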
23. The method of claim 22, wherein the receiving the first display speed and the first text segment in the first target text sent by the cloud server specifically includes:
Receiving a first display speed, a first text segment in a first target text and second risk information sent by the cloud server, wherein the second risk information is used for indicating that the first target text is temporarily risk-free;
the receiving the second text segment in the first target text specifically includes:
and receiving a second text segment in the first target text and the second risk information.
24. The method according to any one of claims 20-23, further comprising:
before sending the first message to the cloud server, establishing a long link with the cloud server.
25. The method of claim 24, wherein the method further comprises:
before the long link is established with the cloud server, receiving a first input of a user;
in response to the first input, determining the first message based on the first input.
26. The method according to claim 24 or 25, characterized in that the method further comprises:
when it is detected that a first disconnection condition is met, disconnecting the long link with the cloud server, wherein the first disconnection condition comprises any one or more of a network error and a first operation of a user, and the first operation is used to trigger the electronic device to stop using the language model service.
27. The method according to any one of claims 20-26, further comprising:
after the first target text has been displayed at the first display speed, receiving a second input of a user;
in response to the second input, determining a second message based on the second input;
sending the second message to the cloud server;
receiving a second display speed and a fourth text segment of a second target text sent by the cloud server, wherein the second display speed indicates the number of characters of the second target text displayed per unit time;
displaying the fourth text segment at the second display speed;
receiving a fifth text segment of the second target text before the electronic device finishes displaying the fourth text segment at the second display speed, wherein the fifth text segment is located after the fourth text segment in the second target text;
after the fourth text segment has been displayed at the second display speed, displaying the fifth text segment at the second display speed.
28. The method according to claim 27, wherein the sending the second message to the cloud server specifically comprises:
when the time interval between the moment at which display of the first target text is completed and the moment at which the second input is received is less than a second duration, sending the second message to the cloud server.
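The timing condition can be written as a single comparison. In this sketch the function name and the use of seconds are assumptions; what the device does when the interval is not smaller than the second duration is not stated in this claim and is left out.

```python
def within_second_duration(display_finished_at: float,
                           input_received_at: float,
                           second_duration: float) -> bool:
    """True if the second input arrived soon enough after display finished.

    All arguments are in seconds (hypothetical unit choice).
    """
    return (input_received_at - display_finished_at) < second_duration
```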
29. A cloud server comprising one or more processors and one or more memories coupled to the one or more processors, the one or more memories being configured to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the cloud server to perform the method of any one of claims 12-19.
30. An electronic device comprising one or more processors, one or more memories coupled to the one or more processors, the one or more memories to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of the preceding claims 20-28.
31. A computer readable storage medium comprising computer instructions which, when run on a cloud server, cause the cloud server to perform the method of any of the preceding claims 12-19.
32. A computer readable storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any of the preceding claims 20-28.
CN202311190565.6A 2023-09-14 2023-09-14 A text display method, system and related device Pending CN119629162A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202311190565.6A CN119629162A (en) 2023-09-14 2023-09-14 A text display method, system and related device
PCT/CN2024/118584 WO2025055994A1 (en) 2023-09-14 2024-09-12 Character display method and system, and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311190565.6A CN119629162A (en) 2023-09-14 2023-09-14 A text display method, system and related device

Publications (1)

Publication Number Publication Date
CN119629162A true CN119629162A (en) 2025-03-14

Family

ID=94908940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311190565.6A Pending CN119629162A (en) 2023-09-14 2023-09-14 A text display method, system and related device

Country Status (2)

Country Link
CN (1) CN119629162A (en)
WO (1) WO2025055994A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6130968A (en) * 1997-10-03 2000-10-10 Mcian; Peter Method of enhancing the readability of rapidly displayed text
KR20140102201A (en) * 2011-12-16 2014-08-21 소니 주식회사 Reception device, method for controlling same, distribution device, distribution method, program, and distribution system
CN109977390B (en) * 2017-12-27 2023-11-03 北京搜狗科技发展有限公司 Method and device for generating text
CN111722730A (en) * 2020-06-23 2020-09-29 平安医疗健康管理股份有限公司 Character input method, device and equipment based on all-in-one machine and readable storage medium

Also Published As

Publication number Publication date
WO2025055994A1 (en) 2025-03-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination