WO2023129182A1 - System, method and computer-readable medium for video processing - Google Patents
- Publication number
- WO2023129182A1 (PCT/US2021/073183)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- message
- user
- live video
- region
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1813—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
- H04L12/1827—Network arrangements for conference optimisation or adaptation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/07—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
- H04L51/10—Multimedia information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/4728—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
Definitions
- FIG. 5 shows a block diagram of a user terminal according to some embodiments of the present disclosure.
- the user terminal 10S is a user terminal of a streamer or a broadcaster.
- the user terminal 10S includes a live video capturing unit 12, a message reception unit 13, an object identifying unit 14, a region determining unit 15, an enlarging unit 16, and a transmitting unit 17.
- the live video capturing unit 12 includes a camera 122 and a microphone 124, and is configured to capture live video data (including audio data) of the streamer.
- the message reception unit 13 is configured to monitor the voice stream (or, in some embodiments, the image stream) in the live video, and to recognize a predetermined word (for example, “focus” or “zoom-in”) in the voice stream.
- the object identifying unit 14 is configured to identify one or more predetermined objects in the live video, and to recognize the identified one or more objects in the image or the live video.
- the identification of objects may be done by a look-up table and the predetermined word recognized by the message reception unit 13, which will be described later. In another embodiment, the identification of objects may be done by the message reception unit 13.
- the region determining unit 15 is configured to determine a region in the live video to be enlarged.
- the region to be enlarged is a region in the vicinity of the identified or recognized object.
- the enlarging unit 16 is configured to perform video processes related to enlarging a region of a live video.
- the camera 122 may be involved in the enlarging process.
- the transmitting unit 17 is configured to transmit the enlarged live video (or a live video with a region enlarged) to a server (such as a streaming server) if the enlarging process is performed. If an enlarging process is not performed, the transmitting unit 17 transmits the live video captured by the live video capturing unit 12.
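The flow through these units can be sketched as a minimal Python pipeline. This is an illustrative sketch only, not part of the patent text: the class name `StreamerTerminal`, the callable-based wiring and the `process_frame` method are assumptions, while the unit names in the comments follow the patent's FIG. 5.

```python
class StreamerTerminal:
    """Illustrative sketch of the FIG. 5 pipeline of user terminal 10S.
    Each unit is supplied as a callable; real implementations would wrap
    speech recognition, object detection, and video scaling."""

    def __init__(self, recognize_word, identify_object, determine_region,
                 enlarge, transmit):
        self.recognize_word = recognize_word      # message reception unit 13
        self.identify_object = identify_object    # object identifying unit 14
        self.determine_region = determine_region  # region determining unit 15
        self.enlarge = enlarge                    # enlarging unit 16
        self.transmit = transmit                  # transmitting unit 17

    def process_frame(self, frame, transcript):
        # If no predetermined word is heard, the captured frame is
        # transmitted unmodified, as described for transmitting unit 17.
        word = self.recognize_word(transcript)
        if word is None:
            return self.transmit(frame)
        # Otherwise: identify the object, determine its region, enlarge it.
        obj = self.identify_object(word, frame)
        region = self.determine_region(obj, frame)
        return self.transmit(self.enlarge(frame, region))
```

The wiring mirrors the dependency order of units 13 through 17: recognition gates the rest of the pipeline.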
- FIG. 6 shows an exemplary look-up table in accordance with some embodiments of the present disclosure, which may be utilized by the object identifying unit 14 of FIG. 5.
- the column “predetermined word” indicates the words to be identified in the voice stream of the live video.
- the column “object” indicates the object corresponding to each predetermined word to be recognized. In this example, an identified “zoom-in” leads to recognition of the streamer’s hand in the live video; an identified “pan” leads to recognition of a pan in the live video; and an identified “board please” leads to recognition of a chopping board in the live video.
- the predetermined words or the objects are pre-set by a user. In some embodiments, the predetermined words or the objects may be auto-created through AI or machine learning.
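The look-up table of FIG. 6 maps each predetermined word to the object to be recognized. A minimal sketch in Python, using the example entries given above (the dictionary and the helper name `object_for` are illustrative assumptions):

```python
# Example word-to-object entries mirroring FIG. 6.
LOOKUP_TABLE = {
    "zoom-in": "streamer's hand",
    "pan": "pan",
    "board please": "chopping board",
}

def object_for(word: str):
    """Return the object the object identifying unit should recognize for a
    predetermined word, or None if the word is not in the table."""
    return LOOKUP_TABLE.get(word.lower())
```

Auto-created entries (via AI or machine learning, as the text suggests) would simply extend this mapping.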
- processing and procedures described in the present disclosure may be realized by software, hardware, or any combination of these in addition to what was explicitly described.
- the processing and procedures described in the specification may be realized by implementing a logic corresponding to the processing and procedures in a medium such as an integrated circuit, a volatile memory, a non-volatile memory, a non-transitory computer-readable medium and a magnetic disk.
- the processing and procedures described in the specification can be implemented as a computer program corresponding to the processing and procedures, and can be executed by various kinds of computers.
- the system or method described in the above embodiments may be integrated into programs stored in a computer-readable non-transitory medium such as a solid state memory device, an optical disk storage device, or a magnetic disk storage device.
- the programs may be downloaded from a server via the Internet and be executed by processors.
- a person having ordinary skill in the technical field of the present invention may still make many variations and modifications without departing from the teachings of the present disclosure. Therefore, the scope of the present invention is not limited to the embodiments already disclosed, but includes all variations and modifications that do not depart from the present invention, as covered by the scope of the claims.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The present disclosure relates to a system, a method and a computer-readable medium for live video processing. The method includes receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object. The present disclosure can facilitate the presenting and focusing of a live video.
Description
SYSTEM, METHOD AND COMPUTER-READABLE MEDIUM FOR VIDEO PROCESSING
Field of the Invention
[0001] The present disclosure relates to video processing in video streaming.
Description of the Prior Art
[0002] Various technologies for enabling users to participate in mutual on-line communication are known. The applications include live streaming, live conference calls and the like. As these applications increase in popularity, user demands for improved communication efficiency and better understanding of each other’s messages during the communication are rising.
Summary of the Invention
[0003] A method according to one embodiment of the present disclosure is a method for live video processing. The method includes receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object.
[0004] A system according to one embodiment of the present disclosure is a system for live video processing that includes one or a plurality of processors, and the one or plurality of processors execute a machine-readable instruction to perform: receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object.
[0005] A computer-readable medium according to one embodiment of the present disclosure is a non-transitory computer-readable medium including a program for live video processing, and the program causes one or a plurality of computers to execute: receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object.
Brief description of the drawings
[0006] FIG. 1 shows an example of a live streaming.
[0007] FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D show exemplary streamings in accordance with some embodiments of the present disclosure.
[0008] FIG. 3 shows an exemplary streaming in accordance with some embodiments of the present disclosure.
[0009] FIG. 4 shows a schematic configuration of a communication system according to some embodiments of the present disclosure.
[0010] FIG. 5 shows a block diagram of a user terminal according to some embodiments of the present disclosure.
[0011] FIG. 6 shows an exemplary look-up table in accordance with some embodiments of the present disclosure.
Detailed Description
[0012] Conventionally, compared with face-to-face communication, on-line communication has some disadvantages which may reduce the communication efficiency or increase the chances of misunderstanding. For example, during a live video or a live streaming communication, it is difficult to keep the focus on the correct region, especially when there are distractions such as comments or special effects on the display on which the live video is being shown. For another example, during a live video or a live streaming communication, it is difficult to see the details of the video content due to the limited size of the display or the limited resolution of the video.
[0013] FIG. 1 shows an example of a live streaming. S1 is a screen of a user terminal displaying the live streaming. RA is a display region within the screen S1 displaying a live video of a user A. The live video of user A may be taken and provided by a video capturing device, such as a camera, positioned in the vicinity of user A. In this example, user A may be a streamer or a broadcaster who is distributing a live video to teach how to cook.
[0014] User A would like viewers of this live video to be able to focus on the right region of the video, and to be able to see the details of the region, in order for the viewers to get the correct knowledge such as cooking steps or cooking materials. Conventionally, user A may need to bring the object of interest (such as a pan or a chopping board) closer to the camera for viewers to see it clearly. Alternatively, user A may need to adjust the direction, position or focus of the camera to show the details user A wants to emphasize. These actions are inconvenient for user A and interrupt the cooking process.
[0015] Therefore, it is desirable to have a method by which a user can indicate the region of interest in the live video and present the details of the region without having to stop the ongoing process. It is also desirable to have a method to help a viewer focus on the correct region of a live video and to see the details of the region. The present disclosure can facilitate the presenting and focusing of a live video.
[0016] FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D show exemplary streamings in accordance with some embodiments of the present disclosure.
[0017] Referring to FIG. 2A, user A sends out a message or a signal M1. In this embodiment, the message M1 is a voice message indicating “zoom in.” In other embodiments, the message M1 may be a gesture message expressed by user A. For example, user A may use a body portion (such as a hand) to form a gesture message. In some embodiments, the message M1 may be a facial expression message expressed by user A. The message M1 is part of the video (including audio data) of user A.
[0018] The message M1 may be received by a user terminal used to capture the video of user A, such as a smartphone, a tablet, a laptop or any device with a video capturing function. In some embodiments, the message M1 is recognized by a user terminal used to produce or deliver the video of user A. In some embodiments, the message M1 is recognized by a system that provides the streaming service. In some embodiments, the message M1 is recognized by a server that supports the streaming service. In some embodiments, the message M1 is recognized by an application that supports the streaming service. In some embodiments, the message M1 is recognized by a voice recognition process, a gesture recognition process and/or a facial expression recognition process. In some embodiments, the message M1 may be an electrical signal, and can be transmitted and received over wireless connections.
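The voice-message branch of this recognition step can be sketched in a few lines. Everything here is an illustrative assumption rather than the patent's implementation: a real system would run an actual speech-recognition process on the audio stream, whereas this sketch matches predetermined words against an already-transcribed utterance, and the names `TRIGGER_WORDS` and `detect_trigger` are hypothetical.

```python
from typing import Optional

# Assumed vocabulary of predetermined voice messages.
TRIGGER_WORDS = ("zoom in", "focus")

def detect_trigger(transcript: str) -> Optional[str]:
    """Return the first predetermined word found in a transcribed
    utterance, or None if no trigger message is present."""
    text = transcript.lower()
    for word in TRIGGER_WORDS:
        if word in text:
            return word
    return None
```

The same shape applies to gesture or facial-expression messages, with the transcript replaced by the output of a gesture or expression classifier.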
[0019] Referring to FIG. 2B, objects O1 are recognized, and a region R1 is determined. The objects O1 are recognized according to the message M1. In some embodiments, the recognition of the object O1 follows the receiving of the message M1. In some embodiments, the receiving of the message M1 triggers the recognition of the object O1. In some embodiments, recognition of the message M1 is performed before the recognition of the object O1.
[0020] In this embodiment, the object O1 is set, taught or determined to be a body part (hands) of user A. In other embodiments, the object O1 may be determined to be a non-body object such as a chopping board or a pan. In some embodiments, the object O1 may be determined to be a wearable object on user A such as a watch, a bracelet or a sticker. The object O1 may be predetermined or set to be any object in the video of user A.
[0021] The region R1 is determined to be a region in the vicinity of the object O1. For example, the region R1 may be determined to be a region enclosing or surrounding all objects O1, so that user A can conveniently control the size of the region R1 by controlling the positions of the objects O1 (in this case, her hands). A distance between an edge of the region R1 and the object O1 may be set according to practical needs.
[0022] In some embodiments, different messages M1 may correspond to different predetermined objects O1. For example, user A may choose the object to be recognized, and the region to be determined, simply by sending out the corresponding message. For example, user A may speak “pan,” and then a pan (which is a predetermined object corresponding to the message “pan”) is recognized, and the region R1 is determined to be a region in the vicinity of the pan.
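The region-determination step described above — a region enclosing all recognized objects, with some margin — can be sketched as follows. The box convention `(x, y, width, height)`, the margin value and the function name `region_around` are illustrative assumptions, not from the patent.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed convention: (x, y, width, height)

def region_around(objects: List[Box], margin: int,
                  frame_w: int, frame_h: int) -> Box:
    """Smallest rectangle enclosing every detected object, padded by
    `margin` pixels and clamped to the frame, i.e. a region 'in the
    vicinity of' the recognized objects."""
    left   = min(x for x, y, w, h in objects)
    top    = min(y for x, y, w, h in objects)
    right  = max(x + w for x, y, w, h in objects)
    bottom = max(y + h for x, y, w, h in objects)
    # Pad by the margin, then clamp to the frame boundaries.
    left   = max(0, left - margin)
    top    = max(0, top - margin)
    right  = min(frame_w, right + margin)
    bottom = min(frame_h, bottom + margin)
    return (left, top, right - left, bottom - top)
```

Because the enclosing rectangle grows and shrinks with the object positions, moving the two hands apart directly enlarges the region, as described above.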
[0023] In some embodiments, an object O1 is recognized by a user terminal used to capture the live video of user A. In some embodiments, an object O1 is recognized by a user terminal used to produce or deliver the video of user A. In some embodiments, an object O1 is recognized by a system that provides the streaming service. In some embodiments, an object O1 is recognized by a server that supports the streaming service. In some embodiments, an object O1 is recognized by an application that supports the streaming service.
[0024] In some embodiments, the region R1 is determined by a user terminal used to capture the live video of user A. In some embodiments, the region R1 is determined by a user terminal used to produce or deliver the video of user A. In some embodiments, the region R1 is determined by a system that provides the streaming service. In some embodiments, the region R1 is determined by a server that supports the streaming service. In some embodiments, the region R1 is determined by an application that supports the streaming service.
[0025] Referring to FIG. 2C, the region R1 is enlarged such that details of the video content within the region R1 can be seen clearly. The enlarged region R1 may cover or overlap a portion of the video of user A that is outside the region R1. The enlarged region R1 may be displayed on any region of the screen S1.
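The enlargement itself can be illustrated with a deliberately simple nearest-neighbour scaler. In practice this would be done by a video-processing library or a hardware scaler; the frame representation here (a 2-D list of pixel values) and the function name are assumptions for illustration only.

```python
def enlarge_region(frame, region, scale):
    """Nearest-neighbour enlargement of `region` = (x, y, w, h) within
    `frame`, a 2-D list of pixel values.  Returns the enlarged crop as a
    new 2-D list; the caller decides where on the screen to display it."""
    x, y, w, h = region
    out = []
    for j in range(h * scale):
        row = []
        for i in range(w * scale):
            # Each source pixel is repeated `scale` times in each axis.
            row.append(frame[y + j // scale][x + i // scale])
        out.append(row)
    return out
```

The enlarged crop can then be composited over the original frame, covering part of the video outside R1 as the text describes.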
[0026] In some embodiments, the enlarging process is performed by a user terminal used to capture the live video of user A. In some embodiments, the enlarging process is performed by a user terminal used to produce or deliver the video of user A. In some embodiments, the enlarging process is performed by a system that provides the streaming service. In some embodiments, the enlarging process is performed by a server that supports the streaming service. In some embodiments, the enlarging process is performed by an application that supports the streaming service. In some embodiments, the enlarging process is performed by a user terminal displaying the video of user A, such as a user terminal of a viewer.
[0027] In an embodiment wherein the enlarging process is performed by a user terminal that captures the video of user A, the user terminal can be configured to capture the region R1 (which may move according to a movement of an object O1) with a higher resolution than the region outside of the region R1. The region of the live video to be enlarged therefore has a higher resolution than the rest of the live video, so the emphasized region carries more information for a viewer to see the details.
[0028] Referring to FIG. 2D, in some embodiments, regions within the display region RA other than the enlarged region R1 may be processed such that the enlarged region R1 stands out. For example, the other regions may be darkened or blurred, such that a viewer can focus more easily on the region R1.
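The darkening variant of this processing can be sketched as a per-pixel brightness scale applied outside the region. The grey-level frame representation and the `factor` value are illustrative assumptions.

```python
def dim_outside(frame, region, factor=0.4):
    """Darken every pixel outside `region` = (x, y, w, h) so the region of
    interest stands out.  `frame` is a 2-D list of grey levels; `factor`
    is an assumed attenuation (blurring would be an alternative)."""
    x, y, w, h = region
    return [
        [px if (x <= i < x + w and y <= j < y + h) else int(px * factor)
         for i, px in enumerate(row)]
        for j, row in enumerate(frame)
    ]
```

Replacing the multiplication with a local blur kernel would give the blurred variant mentioned in the same paragraph.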
[0029] FIG. 3 shows an exemplary streaming in accordance with some embodiments of the present disclosure.
[0030] Referring to FIG. 3, the object O1 is determined to be a wearable device or a wearable object on user A. The object O1 moves synchronously with a movement of user A, and the region of the live video to be enlarged moves synchronously with a movement of the object O1. Therefore, it is convenient for user A to determine which region is to be enlarged or emphasized simply by controlling the position of the object O1. In some embodiments, enlarging a region of a live video and/or moving the enlarged region are performed by video processes executed by a user terminal, a server, or an application. Therefore, the direction of the video capturing device used to capture the live video can be kept fixed while the region of the live video to be enlarged moves synchronously with the movement of the predetermined object.
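Having the enlarged region follow the wearable object in software, with the camera fixed, amounts to re-centering the crop on the tracked object each frame. A common refinement, sketched here as an assumption (the patent does not specify a smoothing method), is exponential smoothing of the region centre so the crop does not jitter with every small detection change.

```python
def follow_object(prev_center, obj_center, alpha=0.3):
    """Exponentially smoothed update of the enlarged region's centre
    toward the tracked object's centre.  `alpha` (an assumed value)
    trades responsiveness against jitter; the camera itself never moves."""
    px, py = prev_center
    ox, oy = obj_center
    return (px + alpha * (ox - px), py + alpha * (oy - py))
```

Called once per frame with the detector's latest object position, this keeps the enlarged region gliding after the wearable object rather than snapping to it.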
[0031] In some embodiments, a user may send out a first message to trigger a message recognition process, and then send out a second message to indicate which object to recognize. The recognized object then determines the region to be enlarged. The first message and/or the second message can be or can include a voice message, a gesture message, or a facial expression message. In some embodiments, the first message can be referred to as a trigger message.
[0032] For example, user A may speak “focus” or “zoom in” to indicate that whatever he or she sends out next is for recognizing the object O1. Next, user A may speak “pan” such that a pan in the video would be recognized as the object O1. Subsequently, a region in the vicinity of the pan would be enlarged.
[0033] In some embodiments, the above configuration may save the resources used in message recognition. For example, a constantly ongoing message recognition process (which may include comparing the video information with a message table) can focus only on the first message, which may be a single voice message. The second message may have more variants, each corresponding to a different object in the video. The message recognition process for the second message can be turned on only when the first message is received and/or detected.
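The gating described in paragraphs [0031]–[0033] can be sketched as a small two-stage state machine: while idle, only the trigger words are matched; after a trigger message arrives, the (larger) word-to-object table is consulted once. This is an illustrative Python sketch; the class name and the particular word/object entries are assumptions drawn from the examples in the text, not a definitive implementation:

```python
# Word-to-object table consulted by the second stage (illustrative entries
# taken from the examples in the text).
OBJECT_TABLE = {"pan": "pan", "board please": "chopping board"}
# Trigger (first) messages that arm the second-stage recognition.
TRIGGER_WORDS = {"focus", "zoom in"}

class TwoStageRecognizer:
    """Gate the more expensive object recognition behind a trigger word."""

    def __init__(self):
        self.armed = False  # True after a trigger message is detected

    def feed(self, message: str):
        message = message.lower().strip()
        if not self.armed:
            # Stage 1: while idle, only the trigger words are matched.
            if message in TRIGGER_WORDS:
                self.armed = True
            return None
        # Stage 2: match the second message against the object table,
        # then return to the idle state.
        self.armed = False
        return OBJECT_TABLE.get(message)

rec = TwoStageRecognizer()
rec.feed("pan")            # ignored: recognizer not yet triggered
rec.feed("focus")          # trigger message arms stage 2
target = rec.feed("pan")   # the pan is now recognized as the object
```

Because stage 1 matches only a handful of trigger words, the always-on part of the recognizer stays cheap, which reflects the resource saving described above.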
[0034] FIG. 4 shows a schematic configuration of a communication system according to some embodiments of the present disclosure. The communication system 1 may provide a live streaming service with interaction via a content. Here, the term “content” refers to a digital content that can be played on a computer device. The communication system 1 enables a user to participate in real-time interaction with other users on-line. The communication system 1 includes a plurality of user terminals 10, a backend server 30, and a streaming server 40. The user terminals 10, the backend server 30 and the streaming server 40 are connected via a network 90, which may be the Internet, for example. The backend server 30 may be a server for synchronizing interaction between the user terminals and/ or the streaming server 40. In some embodiments, the backend server 30 may be referred to as the origin server of an application (APP) provider. The streaming server 40 is a server for handling or providing streaming data or video data. In some embodiments, the backend server 30 and the streaming server 40 may be independent servers. In some embodiments, the backend server 30 and the streaming server 40 may be integrated into one server. In some embodiments, the user terminals 10 are client devices for the live streaming. In some embodiments, a user terminal 10 may be referred to as viewer, streamer, anchor, podcaster, audience, listener or the like. Each of the user terminals 10, the backend server 30, and the streaming server 40 is an example of an information-processing device. In some embodiments, the streaming may be live streaming or video replay. In some embodiments, the streaming may be audio streaming and/or video streaming. In some embodiments, the streaming may include contents such as online shopping, talk shows, talent shows, entertainment events, sports events, music videos, movies, comedy, concerts, group calls, conference calls or the like.
[0035] FIG. 5 shows a block diagram of a user terminal according to some embodiments of the present disclosure.
[0036] The user terminal 10S is a user terminal of a streamer or a broadcaster. The user terminal 10S includes a live video capturing unit 12, a message reception unit 13, an object identifying unit 14, a region determining unit 15, an enlarging unit 16, and a transmitting unit 17.
[0037] The live video capturing unit 12 includes a camera 122 and a microphone 124, and is configured to capture live video data (including audio data) of the streamer.
[0038] The message reception unit 13 is configured to monitor a voice stream (or an image stream in some embodiments) in the live video, and to recognize a predetermined word (for example, “focus” or “zoom-in”) in the voice stream.
[0039] The object identifying unit 14 is configured to identify one or more predetermined objects in the live video, and to recognize the identified one or more objects in the image or the live video. The identification of objects may be done using a look-up table and the predetermined word recognized by the message reception unit 13, as will be described later. In another embodiment, the identification of objects may be done by the message reception unit 13.
[0040] The region determining unit 15 is configured to determine a region in the live video to be enlarged. The region to be enlarged is a region in the vicinity of the identified or recognized object.
[0041] The enlarging unit 16 is configured to perform video processes related to enlarging a region of a live video. In an embodiment wherein the region to be enlarged is captured with a higher resolution, the camera 122 may be involved in the enlarging process.
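The video processing performed by the region determining unit 15 and the enlarging unit 16 can be illustrated by cropping a window around the recognized object and upscaling it. The sketch below uses integer nearest-neighbour repetition on a NumPy array as an illustrative stand-in for the actual video processing; the function name, window size, and scale factor are assumptions, not part of the disclosure:

```python
import numpy as np

def enlarge_region(frame: np.ndarray, cx: int, cy: int,
                   half: int, scale: int) -> np.ndarray:
    """Crop a (2*half) x (2*half) window centred on the recognized object
    at (cx, cy) and upscale it by nearest-neighbour repetition."""
    crop = frame[cy - half:cy + half, cx - half:cx + half]
    # Repeat each pixel 'scale' times along both axes.
    return np.repeat(np.repeat(crop, scale, axis=0), scale, axis=1)

frame = np.arange(64, dtype=np.uint8).reshape(8, 8)
zoomed = enlarge_region(frame, cx=4, cy=4, half=2, scale=2)
# zoomed is 8x8: the 4x4 region around the centre, doubled in size.
```

In a production pipeline the nearest-neighbour step would typically be replaced by a higher-quality interpolation, and the centre (cx, cy) would track the recognized object frame by frame, which is why the capturing direction can stay fixed.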
[0042] The transmitting unit 17 is configured to transmit the enlarged live video (or a live video with a region enlarged) to a server (such as a streaming server) if the enlarging process is performed. If an enlarging process is not performed, the transmitting unit 17 transmits the live video captured by the live video capturing unit 12.
[0043] FIG. 6 shows an exemplary look-up table in accordance with some embodiments of the present disclosure, which may be utilized by the object identifying unit 14 of FIG. 5.

[0044] The column “predetermined word” indicates the words to be identified in the voice stream of the live video. The column “object” indicates the object corresponding to each predetermined word to be recognized. In this example, an identified “zoom-in” leads to recognition of the streamer’s hand in the live video, an identified “pan” leads to recognition of a pan in the live video, and an identified “board please” leads to recognition of a chopping board in the live video.
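The look-up table of FIG. 6 can be sketched as a simple dictionary mapping each predetermined word to the object to be recognized. The entries below reproduce the examples given in the text; the function name is an illustrative assumption:

```python
# Illustrative reproduction of the FIG. 6 look-up table: predetermined
# word -> object to be recognized in the live video.
LOOKUP_TABLE = {
    "zoom-in": "streamer's hand",
    "pan": "pan",
    "board please": "chopping board",
}

def object_for_word(word: str):
    """Return the object associated with a recognized word, or None."""
    return LOOKUP_TABLE.get(word.lower().strip())
```

A recognized word outside the table simply yields no object, so unrelated speech does not trigger any enlarging.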
[0045] In some embodiments, the predetermined words or the objects are pre-set by a user. In some embodiments, the predetermined words or the objects may be auto-created through AI or machine learning.
[0046] The processing and procedures described in the present disclosure may be realized by software, hardware, or any combination of these in addition to what was explicitly described. For example, the processing and procedures described in the specification may be realized by implementing a logic corresponding to the processing and procedures in a medium
such as an integrated circuit, a volatile memory, a non-volatile memory, a non-transitory computer-readable medium and a magnetic disk. Further, the processing and procedures described in the specification can be implemented as a computer program corresponding to the processing and procedures, and can be executed by various kinds of computers.
[0047] The system or method described in the above embodiments may be integrated into programs stored in a computer-readable non-transitory medium such as a solid state memory device, an optical disk storage device, or a magnetic disk storage device. Alternatively, the programs may be downloaded from a server via the Internet and be executed by processors.

[0048] Although technical content and features of the present invention are described above, a person having common knowledge in the technical field of the present invention may still make many variations and modifications without departing from the teaching and disclosure of the present invention. Therefore, the scope of the present invention is not limited to the embodiments already disclosed, but includes variations and modifications that do not depart from the present invention, and is defined by the scope of the claims.
Description of reference numerals
S1 Screen
RA Region
O1 Object
R1 Region
1 System
10 User terminal
30 Backend server
40 Streaming server
90 Network
10S User terminal
12 Live video capturing unit
122 Camera
124 Microphone
13 Message reception unit
14 Object identifying unit
15 Region determining unit
16 Enlarging unit
17 Transmitting unit
Claims
1. A method for live video processing, comprising: receiving a message from a user while live video created by the user is being broadcasted; and enlarging a region of the live video in the vicinity of a predetermined object according to the message.
2. The method according to claim 1, further comprising recognizing the predetermined object in the live video according to the message.
3. The method according to claim 2, further comprising receiving a trigger message from the user, wherein the trigger message triggers the recognizing the predetermined object in the live video according to the message.
4. The method according to claim 1, wherein the message comprises a voice message, a gesture message, or a facial expression message.
5. The method according to claim 1, further comprising recognizing the message from the user.
6. The method according to claim 5, wherein the recognizing the message from the user comprises a voice recognition process, a gesture recognition process, or a facial expression recognition process.
7. The method according to claim 1, wherein the predetermined object comprises a body part of the user or a wearable object on the user.
8. The method according to claim 1, wherein the predetermined object moves synchronously with a movement of the user.
9. The method according to claim 1, wherein the message corresponds to the predetermined object.
10. The method according to claim 1, wherein the region of the live video to be enlarged
is captured by a video capturing device with a higher resolution than another region of the live video not to be enlarged.
11. The method according to claim 1, wherein the region of the live video to be enlarged moves synchronously with a movement of the predetermined object.
12. The method according to claim 11, wherein the live video is generated by a video capturing device in the vicinity of the user, and a direction of the video capturing device is kept fixed when the region of the live video to be enlarged moves synchronously with the movement of the predetermined object.
13. A system for live video processing, comprising one or a plurality of processors, wherein the one or plurality of processors execute a machine-readable instruction to perform: receiving a message from a user while live video created by the user is being broadcasted; and enlarging a region of the live video in the vicinity of a predetermined object according to the message.
14. A non-transitory computer-readable medium including a program for live video processing, wherein the program causes one or a plurality of computers to execute: receiving a message from a user while live video created by the user is being broadcasted; and enlarging a region of the live video in the vicinity of a predetermined object according to the message.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2021/073183 WO2023129182A1 (en) | 2021-12-30 | 2021-12-30 | System, method and computer-readable medium for video processing |
| JP2022528663A JP7449519B2 (en) | 2021-12-30 | 2021-12-30 | Systems, methods, and computer-readable media for video processing |
| US17/881,743 US12413685B2 (en) | 2021-09-30 | 2022-08-05 | System, method and computer-readable medium for video processing |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2021/073183 WO2023129182A1 (en) | 2021-12-30 | 2021-12-30 | System, method and computer-readable medium for video processing |
Related Child Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2021/073182 Continuation-In-Part WO2023129181A1 (en) | 2021-09-30 | 2021-12-30 | System, method and computer-readable medium for image recognition |
| US17/881,743 Continuation-In-Part US12413685B2 (en) | 2021-09-30 | 2022-08-05 | System, method and computer-readable medium for video processing |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023129182A1 true WO2023129182A1 (en) | 2023-07-06 |
Family
ID=87000027
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2021/073183 Ceased WO2023129182A1 (en) | 2021-09-30 | 2021-12-30 | System, method and computer-readable medium for video processing |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7449519B2 (en) |
| WO (1) | WO2023129182A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160205341A1 (en) * | 2013-08-20 | 2016-07-14 | Smarter Tv Ltd. | System and method for real-time processing of ultra-high resolution digital video |
| US20210099505A1 (en) * | 2013-02-13 | 2021-04-01 | Guy Ravine | Techniques for Optimizing the Display of Videos |
| US20210365707A1 (en) * | 2020-05-20 | 2021-11-25 | Qualcomm Incorporated | Maintaining fixed sizes for target objects in frames |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4232419B2 (en) * | 2001-09-18 | 2009-03-04 | ソニー株式会社 | TRANSMISSION DEVICE, TRANSMISSION METHOD, CONTENT DISTRIBUTION DEVICE, CONTENT DISTRIBUTION METHOD, AND PROGRAM |
| CN107851334A (en) * | 2015-08-06 | 2018-03-27 | 索尼互动娱乐股份有限公司 | Information processor |
| JP2020021225A (en) * | 2018-07-31 | 2020-02-06 | 株式会社ニコン | Display control system, display control method, and display control program |
| TW202133118A (en) * | 2020-02-21 | 2021-09-01 | 四葉草娛樂有限公司 | Panoramic reality simulation system and method thereof with which the user may feel like arbitrary passing through the 3D space so as to achieve the entertainment enjoyment with immersive effect |
2021
- 2021-12-30 WO PCT/US2021/073183 patent/WO2023129182A1/en not_active Ceased
- 2021-12-30 JP JP2022528663A patent/JP7449519B2/en active Active
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210099505A1 (en) * | 2013-02-13 | 2021-04-01 | Guy Ravine | Techniques for Optimizing the Display of Videos |
| US20160205341A1 (en) * | 2013-08-20 | 2016-07-14 | Smarter Tv Ltd. | System and method for real-time processing of ultra-high resolution digital video |
| US20210365707A1 (en) * | 2020-05-20 | 2021-11-25 | Qualcomm Incorporated | Maintaining fixed sizes for target objects in frames |
Also Published As
| Publication number | Publication date |
|---|---|
| JP7449519B2 (en) | 2024-03-14 |
| JP2024501091A (en) | 2024-01-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20130205322A1 (en) | Method and system for synchronization of dial testing and audience response utilizing automatic content recognition | |
| US20150334344A1 (en) | Virtual Window | |
| CN111343476A (en) | Video sharing method and device, electronic equipment and storage medium | |
| US20180077461A1 (en) | Electronic device, interractive mehotd therefor, user terminal and server | |
| US9736518B2 (en) | Content streaming and broadcasting | |
| JP6289651B2 (en) | Method and apparatus for synchronizing playback on two electronic devices | |
| US9756373B2 (en) | Content streaming and broadcasting | |
| EP3316582B1 (en) | Multimedia information processing method and system, standardized server and live broadcast terminal | |
| US20150029342A1 (en) | Broadcasting providing apparatus, broadcasting providing system, and method of providing broadcasting thereof | |
| US20240296195A1 (en) | System, method and computer-readable medium for recommendation | |
| WO2021204139A1 (en) | Video displaying method, device, equipment, and storage medium | |
| CN114095671A (en) | Cloud conference live broadcast system, method, device, device and medium | |
| WO2014190655A1 (en) | Application synchronization method, application server and terminal | |
| CN116437147A (en) | Live broadcast task interaction method and device, electronic equipment and storage medium | |
| WO2015035247A1 (en) | Virtual window | |
| CN108401163B (en) | Method and device for realizing VR live broadcast and OTT service system | |
| US9332206B2 (en) | Frame sharing | |
| US12413685B2 (en) | System, method and computer-readable medium for video processing | |
| WO2023129182A1 (en) | System, method and computer-readable medium for video processing | |
| US11825170B2 (en) | Apparatus and associated methods for presentation of comments | |
| CN108521579A (en) | The display methods and device of barrage information | |
| TW202327366A (en) | System, method and computer-readable medium for video processing | |
| US20170094367A1 (en) | Text Data Associated With Separate Multimedia Content Transmission | |
| CN105187934A (en) | Terminal platform for television interactive system | |
| CN112714331B (en) | Information prompting method and device, storage medium and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 2022528663; Country of ref document: JP |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21970189; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21970189; Country of ref document: EP; Kind code of ref document: A1 |