
WO2023129182A1 - System, method and computer-readable medium for video processing - Google Patents


Info

Publication number
WO2023129182A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
user
live video
region
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2021/073183
Other languages
French (fr)
Inventor
Shao Yuan Wu
Ming-Che Cheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
17Live Japan Inc
17Live USA Corp
Original Assignee
17Live Japan Inc
17Live USA Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 17Live Japan Inc and 17Live USA Corp
Priority to PCT/US2021/073183
Priority to JP2022528663A
Priority to US17/881,743
Publication of WO2023129182A1
Anticipated expiration
Current legal status: Ceased


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/02 Details
    • H04L 12/16 Arrangements for providing special services to substations
    • H04L 12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L 12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L 12/1827 Network arrangements for conference optimisation or adaptation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L 51/07 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L 51/10 Multimedia information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region

Definitions

  • a user may send out a first message to trigger a message recognition process, and then send out a second message to indicate which object to recognize. The object then determines the region to be enlarged.
  • the first message and/or the second message can be or can include voice message, gesture message or facial expression message.
  • the first message can be referred to as a trigger message.
  • user A may speak “focus” or “zoom in” to indicate that whatever he or she sends out next is for recognizing the object O1.
  • user A may then speak “pan” such that a pan in the video would be recognized as the object O1.
  • a region in the vicinity of the pan would then be enlarged.
  • the above configuration may save the resources used in message recognition.
  • a constantly ongoing message recognition process (which may include comparing the video information with a message table) can then focus only on the first message, which may be a single voice message.
  • the second message may have more variants, each corresponding to a different object in the video.
  • the message recognition process for the second message can be turned on only when the first message is received and/or detected.
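The two-stage gating described above can be sketched as a small state machine: a lightweight always-on matcher watches for the single trigger word, and the larger object-word vocabulary is checked only after the trigger fires. All names and word lists here are illustrative, not taken from the patent:

```python
# Sketch of two-stage message recognition: a cheap, always-on check for one
# trigger word gates a larger vocabulary of object words.
TRIGGER_WORDS = {"focus", "zoom in"}             # first message (always checked)
OBJECT_WORDS = {"pan", "board please", "hands"}  # second message (checked only after trigger)

class GatedRecognizer:
    def __init__(self):
        self.armed = False   # True after the trigger message is detected
        self.target = None   # object word recognized as the second message

    def feed(self, phrase: str):
        """Process one recognized phrase from the voice stream."""
        phrase = phrase.lower().strip()
        if not self.armed:
            # Cheap path: only the trigger vocabulary is compared.
            if phrase in TRIGGER_WORDS:
                self.armed = True
        else:
            # Larger-vocabulary path runs only while armed.
            if phrase in OBJECT_WORDS:
                self.target = phrase
                self.armed = False
        return self.target
```

Feeding "hello" does nothing; "focus" arms the recognizer; the next "pan" is then matched as the object word, which is the resource saving the text describes.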
  • FIG. 4 shows a schematic configuration of a communication system according to some embodiments of the present disclosure.
  • the communication system 1 may provide a live streaming service with interaction via a content.
  • content refers to a digital content that can be played on a computer device.
  • the communication system 1 enables a user to participate in real-time interaction with other users on-line.
  • the communication system 1 includes a plurality of user terminals 10, a backend server 30, and a streaming server 40.
  • the user terminals 10, the backend server 30 and the streaming server 40 are connected via a network 90, which may be the Internet, for example.
  • the backend server 30 may be a server for synchronizing interaction between the user terminals and/or the streaming server 40.
  • the backend server 30 may be referred to as the origin server of an application (APP) provider.
  • the streaming server 40 is a server for handling or providing streaming data or video data.
  • the backend server 30 and the streaming server 40 may be independent servers.
  • the backend server 30 and the streaming server 40 may be integrated into one server.
  • the user terminals 10 are client devices for the live streaming.
  • a user of a user terminal 10 may be referred to as a viewer, streamer, anchor, podcaster, audience member, listener or the like.
  • Each of the user terminals 10, the backend server 30, and the streaming server 40 is an example of an information-processing device.
  • the streaming may be live streaming or video replay.
  • the streaming may be audio streaming and/or video streaming.
  • the streaming may include contents such as online shopping, talk shows, talent shows, entertainment events, sports events, music videos, movies, comedy, concerts, group calls, conference calls or the like.
  • FIG. 5 shows a block diagram of a user terminal according to some embodiments of the present disclosure.
  • the user terminal 10S is a user terminal of a streamer or a broadcaster.
  • the user terminal 10S includes a live video capturing unit 12, a message reception unit 13, an object identifying unit 14, a region determining unit 15, an enlarging unit 16, and a transmitting unit 17.
  • the live video capturing unit 12 includes a camera 122 and a microphone 124, and is configured to capture live video data (including audio data) of the streamer.
  • the message reception unit 13 is configured to monitor the voice stream (or, in some embodiments, the image stream) in the live video, and to recognize a predetermined word (for example, “focus” or “zoom-in”) in the voice stream.
  • the object identifying unit 14 is configured to identify one or more predetermined objects in the live video, and to recognize the identified one or more objects in the image or the live video.
  • the identification of objects may be done by a look-up table and the predetermined word recognized by the message reception unit 13, which will be described later. In another embodiment, the identification of objects may be done by the message reception unit 13.
  • the region determining unit 15 is configured to determine a region in the live video to be enlarged.
  • the region to be enlarged is a region in the vicinity of the identified or recognized object.
  • the enlarging unit 16 is configured to perform video processes related to enlarging a region of a live video.
  • the camera 122 may be involved in the enlarging process.
  • the transmitting unit 17 is configured to transmit the enlarged live video (or a live video with a region enlarged) to a server (such as a streaming server) if the enlarging process is performed. If an enlarging process is not performed, the transmitting unit 17 transmits the live video captured by the live video capturing unit 12.
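The flow through these units can be sketched as a simple pipeline, with each function standing in for one unit of FIG. 5. This is an illustrative skeleton under assumed names and data shapes, not the patent's implementation:

```python
# Illustrative pipeline mirroring the units of FIG. 5: capture -> message
# reception (13) -> object identification (14) -> region determination (15)
# -> enlarging (16) -> transmitting (17).

def message_reception(audio_text):
    """Stand-in for unit 13: spot a predetermined word in the voice stream."""
    for word in ("focus", "zoom-in", "pan"):
        if word in audio_text.lower():
            return word
    return None

def identify_object(word):
    """Stand-in for unit 14: map the recognized word to an object label."""
    lookup = {"zoom-in": "hand", "focus": "hand", "pan": "pan"}
    return lookup.get(word)

def determine_region(obj_box, margin=10):
    """Stand-in for unit 15: region in the vicinity of the object's bounding box."""
    x0, y0, x1, y1 = obj_box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)

def process_frame(frame, audio_text, obj_box):
    """Decide whether to enlarge a region before transmitting the frame."""
    word = message_reception(audio_text)
    if word and identify_object(word):
        region = determine_region(obj_box)
        return {"frame": frame, "enlarge": region}   # unit 16 would enlarge here
    return {"frame": frame, "enlarge": None}         # unit 17 transmits unchanged
```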
  • FIG. 6 shows an exemplary look-up table in accordance with some embodiments of the present disclosure, which may be utilized by the object identifying unit 14 of FIG. 5.
  • the column “predetermined word” indicates the words to be identified in the voice stream of the live video.
  • the column “object” indicates the object corresponding to each predetermined word to be recognized. In this example, an identified “zoom-in” leads to recognition of the streamer’s hand in the live video, an identified “pan” leads to recognition of a pan, and an identified “board please” leads to recognition of a chopping board.
  • the predetermined words or the objects are pre-set by a user. In some embodiments, the predetermined words or the objects may be auto-created through AI or machine learning.
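Such a look-up table is naturally a small dictionary keyed by the predetermined word; the entries below mirror FIG. 6, while the helper function and its normalization are illustrative assumptions:

```python
# Look-up table from FIG. 6: predetermined word -> object to recognize.
WORD_TO_OBJECT = {
    "zoom-in": "streamer's hand",
    "pan": "pan",
    "board please": "chopping board",
}

def object_for_word(word, table=WORD_TO_OBJECT):
    """Return the object to recognize for a spotted word, or None if unknown."""
    return table.get(word.lower().strip())
```

Pre-set entries could be edited by the streamer, and an AI/ML step could append new word-object pairs to the same dictionary.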
  • processing and procedures described in the present disclosure may be realized by software, hardware, or any combination of these in addition to what was explicitly described.
  • the processing and procedures described in the specification may be realized by implementing a logic corresponding to the processing and procedures in a medium such as an integrated circuit, a volatile memory, a non-volatile memory, a non-transitory computer-readable medium and a magnetic disk.
  • the processing and procedures described in the specification can be implemented as a computer program corresponding to the processing and procedures, and can be executed by various kinds of computers.
  • the system or method described in the above embodiments may be integrated into programs stored in a computer-readable non-transitory medium such as a solid state memory device, an optical disk storage device, or a magnetic disk storage device.
  • the programs may be downloaded from a server via the Internet and be executed by processors.
  • a person having common knowledge in the technical field of the present invention may still make many variations and modifications without departing from the teaching and disclosure of the present invention. Therefore, the scope of the present invention is not limited to the embodiments already disclosed, but includes variations and modifications that do not depart from the present invention, and is covered by the scope of the patent claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present disclosure relates to a system, a method and a computer-readable medium for live video processing. The method includes receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object. The present disclosure can facilitate the presenting and focusing of a live video.

Description

SYSTEM, METHOD AND COMPUTER-READABLE MEDIUM FOR VIDEO PROCESSING
Field of the Invention
[0001] The present disclosure relates to video processing in video streaming.
Description of the Prior Art
[0002] Various technologies for enabling users to participate in mutual on-line communication are known. The applications include live streaming, live conference calls and the like. As these applications increase in popularity, user demands for improved communication efficiency and better understanding of each other’s message during the communication are rising.
Summary of the Invention
[0003] A method according to one embodiment of the present disclosure is a method for live video processing. The method includes receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object.
[0004] A system according to one embodiment of the present disclosure is a system for live video processing that includes one or a plurality of processors, and the one or plurality of processors execute a machine-readable instruction to perform: receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object.
[0005] A computer-readable medium according to one embodiment of the present disclosure is a non-transitory computer-readable medium including a program for live video processing, and the program causes one or a plurality of computers to execute: receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object.
Brief description of the drawings
[0006] FIG. 1 shows an example of a live streaming.
[0007] FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D show exemplary streamings in accordance with some embodiments of the present disclosure.
[0008] FIG. 3 shows an exemplary streaming in accordance with some embodiments of the present disclosure.
[0009] FIG. 4 shows a schematic configuration of a communication system according to some embodiments of the present disclosure.
[0010] FIG. 5 shows a block diagram of a user terminal according to some embodiments of the present disclosure.
[0011] FIG. 6 shows an exemplary look-up table in accordance with some embodiments of the present disclosure.
Detailed Description
[0012] Conventionally, compared with face-to-face communication, on-line communication has some disadvantages which may reduce communication efficiency or increase the chances of misunderstanding. For example, during a live video or a live streaming communication, it is difficult to keep the focus on the correct region, especially when there are distractions such as comments or special effects on the display on which the live video is shown. For another example, during a live video or a live streaming communication, it is difficult to see the details of the video content due to the limited size of the display or the limited resolution of the video.
[0013] FIG. 1 shows an example of a live streaming. S1 is a screen of a user terminal displaying the live streaming. RA is a display region within the screen S1 displaying a live video of a user A. The live video of user A may be captured and provided by a video capturing device, such as a camera, positioned in the vicinity of user A. In this example, user A may be a streamer or broadcaster who is distributing a live video teaching viewers how to cook.
[0014] User A would like viewers of this live video to be able to focus on the right region of the video, and to be able to see the details of that region, so that the viewers get the correct knowledge such as cooking steps or cooking materials. Conventionally, user A may need to bring the object of interest (such as a pan or a chopping board) closer to the camera for viewers to see it clearly. Alternatively, user A may need to adjust the direction, position or focus of the camera so that viewers can see the details user A wants to emphasize. These actions are inconvenient for user A and interrupt the cooking process.
[0015] Therefore, it is desirable to have a method by which a user can indicate the region of interest in the live video and present the details of the region without having to stop the ongoing process. It is also desirable to have a method to help a viewer to focus on the correct region of a live video and to see the details of the region. The present disclosure can facilitate the presenting and focusing of a live video.
[0016] FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D show exemplary streamings in accordance with some embodiments of the present disclosure.
[0017] Referring to FIG. 2A, user A sends out a message or a signal M1. In this embodiment, the message M1 is a voice message indicating “zoom in.” In other embodiments, the message M1 may be a gesture message expressed by user A. For example, user A may use a body portion (such as a hand) to form a gesture message. In some embodiments, the message M1 may be a facial expression message expressed by user A. The message M1 is part of the video (including audio data) of user A.
[0018] The message M1 may be received by a user terminal used to capture the video of user A, such as a smartphone, a tablet, a laptop or any device with a video capturing function. In some embodiments, the message M1 is recognized by a user terminal used to produce or deliver the video of user A. In some embodiments, the message M1 is recognized by a system that provides the streaming service. In some embodiments, the message M1 is recognized by a server that supports the streaming service. In some embodiments, the message M1 is recognized by an application that supports the streaming service. In some embodiments, the message M1 is recognized by a voice recognition process, a gesture recognition process and/or a facial expression recognition process. In some embodiments, the message M1 may be an electrical signal, and can be transmitted and received over wireless connections.
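As one illustrative sketch (not the patent's method), a voice message such as “zoom in” can be spotted by running the captured audio through any speech-to-text step and scanning the resulting transcript for trigger phrases; the phrase list and function names below are assumptions for illustration:

```python
# Minimal trigger spotting over a transcript produced by some speech-to-text
# step (the transcription itself is out of scope here). Phrases are illustrative.
TRIGGERS = ("zoom in", "focus")

def spot_message(transcript: str):
    """Return the first trigger phrase found in the transcript, else None."""
    text = transcript.lower()
    for phrase in TRIGGERS:
        if phrase in text:
            return phrase
    return None
```

A gesture or facial-expression message would replace the transcript scan with an image-based recognizer, but the downstream handling stays the same.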
[0019] Referring to FIG. 2B, objects O1 are recognized, and a region R1 is determined. The objects O1 are recognized according to the message M1. In some embodiments, the recognition of the object O1 follows the receiving of the message M1. In some embodiments, the receiving of the message M1 triggers the recognition of the object O1. In some embodiments, a recognition of the message M1 is done before the recognition of the object O1.
[0020] In this embodiment, the object O1 is set, taught or determined to be a body part (hands) of user A. In other embodiments, the object O1 may be determined to be a non-body object such as a chopping board or a pan. In some embodiments, the object O1 may be determined to be a wearable object on user A such as a watch, a bracelet or a sticker. The object O1 may be predetermined or set to be any object in the video of user A.
[0021] The region R1 is determined to be a region in the vicinity of the object O1. For example, the region R1 may be determined to be a region enclosing or surrounding all objects O1, so that user A may conveniently control the size of the region R1 by controlling the positions of the objects O1 (in this case, the objects O1 are her hands). A distance between an edge of the region R1 and the object O1 may be determined according to practical needs.
[0022] In some embodiments, different messages M1 may correspond to different predetermined objects O1. For example, user A may choose the object to be recognized, and the region to be determined, simply by sending out the corresponding message. For example, user A may speak “pan,” and then a pan (which is a predetermined object corresponding to the message “pan”) is recognized, and the region R1 would be determined to be a region in the vicinity of the pan.
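A region enclosing all recognized objects can be computed, for example, as the union of their bounding boxes plus a margin; this is an illustrative computation, with the margin as an assumed parameter:

```python
def enclosing_region(boxes, margin=20):
    """Smallest axis-aligned region enclosing all object boxes, padded by a margin.

    Each box is (x0, y0, x1, y1) in pixel coordinates. The margin models the
    'distance between an edge of the region and the object' mentioned in the text.
    """
    x0 = min(b[0] for b in boxes) - margin
    y0 = min(b[1] for b in boxes) - margin
    x1 = max(b[2] for b in boxes) + margin
    y1 = max(b[3] for b in boxes) + margin
    return (x0, y0, x1, y1)
```

Moving the two hand boxes apart naturally grows the returned region, which matches how user A controls the region's size by hand position.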
[0023] In some embodiments, an object O1 is recognized by a user terminal used to capture the live video of user A. In some embodiments, an object O1 is recognized by a user terminal used to produce or deliver the video of user A. In some embodiments, an object O1 is recognized by a system that provides the streaming service. In some embodiments, an object O1 is recognized by a server that supports the streaming service. In some embodiments, an object O1 is recognized by an application that supports the streaming service.
[0024] In some embodiments, the region R1 is determined by a user terminal used to capture the live video of user A. In some embodiments, the region R1 is determined by a user terminal used to produce or deliver the video of user A. In some embodiments, the region R1 is determined by a system that provides the streaming service. In some embodiments, the region R1 is determined by a server that supports the streaming service. In some embodiments, the region R1 is determined by an application that supports the streaming service.
[0025] Referring to FIG. 2C, the region R1 is enlarged such that details of the video content within the region R1 can be seen clearly. The enlarged region R1 may cover or overlap a portion of the video of user A that is outside the region R1. The enlarged region R1 may be displayed on any region of the screen S1.
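A minimal sketch of the enlarging step itself, using nearest-neighbour scaling on a frame represented as a 2-D list of pixels (a production implementation would operate on real video frames via a library such as OpenCV; the function and its parameters are illustrative):

```python
def enlarge_region(frame, region, scale=2):
    """Crop `region` = (x0, y0, x1, y1) from `frame` (a 2-D list of pixels)
    and return it scaled up by `scale` using nearest-neighbour sampling."""
    x0, y0, x1, y1 = region
    crop = [row[x0:x1] for row in frame[y0:y1]]
    h, w = len(crop), len(crop[0])
    # Each source pixel is repeated `scale` times in both dimensions.
    return [[crop[j // scale][i // scale] for i in range(w * scale)]
            for j in range(h * scale)]
```

The enlarged tile can then be composited anywhere on the screen, covering part of the original video as described above.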
[0026] In some embodiments, the enlarging process is performed by a user terminal used to capture the live video of user A. In some embodiments, the enlarging process is performed by a user terminal used to produce or deliver the video of user A. In some embodiments, the enlarging process is performed by a system that provides the streaming service. In some embodiments, the enlarging process is performed by a server that supports the streaming service. In some embodiments, the enlarging process is performed by an application that supports the streaming service. In some embodiments, the enlarging process is performed by a user terminal displaying the video of user A, such as a user terminal of a viewer.
[0027] In an embodiment wherein the enlarging process is performed by a user terminal that captures the video of user A, the user terminal can be configured to capture the region R1 (the region R1 may move according to a movement of an object O1) with a higher resolution than another region outside of the region R1. Therefore, the region of the live video to be enlarged has a higher resolution than another region of the live video not to be enlarged, and the region to be emphasized carries more information for a viewer to see the details.
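As a non-limiting illustration of the enlarging process described above, the following Python sketch crops the region to be emphasized from a frame and upscales it with nearest-neighbor interpolation. The function name, array shapes, and scale factor are illustrative assumptions, not part of the disclosed embodiments; a production implementation would typically use an optimized resizer.

```python
import numpy as np

def enlarge_region(frame, region, scale=2):
    """Crop the region to be emphasized and upscale it, as a minimal
    stand-in for the enlarging process.

    frame:  H x W x C uint8 array (one video frame)
    region: (top, left, height, width) in pixel coordinates
    """
    top, left, h, w = region
    crop = frame[top:top + h, left:left + w]
    # Nearest-neighbor upscale: repeat rows and columns `scale` times.
    return crop.repeat(scale, axis=0).repeat(scale, axis=1)

frame = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)
enlarged = enlarge_region(frame, (1, 2, 2, 3), scale=2)
print(enlarged.shape)  # (4, 6, 3)
```

Capturing the region at a higher resolution, as in paragraph [0027], would simply mean the crop already contains more pixels before this upscaling step.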
[0028] Referring to FIG. 2D, in some embodiments, regions within the display region RA other than the enlarged region R1 may be processed such that the enlarged region R1 stands out and becomes more obvious. For example, the other regions may be darkened or blurred, such that a viewer can focus more easily on the region R1.
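The darkening variant of paragraph [0028] can be sketched as follows; the function name and the dimming factor are illustrative assumptions (a blur variant would additionally require an image-filtering library).

```python
import numpy as np

def emphasize_region(frame, region, dim=0.4):
    """Darken everything outside `region` so the emphasized area
    stands out, as a minimal stand-in for the FIG. 2D processing."""
    top, left, h, w = region
    # Dim the whole frame, then restore the emphasized region.
    out = (frame.astype(np.float32) * dim).astype(frame.dtype)
    out[top:top + h, left:left + w] = frame[top:top + h, left:left + w]
    return out

frame = np.full((4, 4, 3), 200, dtype=np.uint8)
out = emphasize_region(frame, (1, 1, 2, 2), dim=0.5)
print(out[0, 0, 0], out[1, 1, 0])  # 100 200
```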
[0029] FIG. 3 shows an exemplary streaming in accordance with some embodiments of the present disclosure.
[0030] Referring to FIG. 3, the object O1 is determined to be a wearable device or a wearable object on user A. The object O1 moves synchronously with a movement of user A, and the region of the live video to be enlarged moves synchronously with a movement of the object O1. Therefore, it is convenient for user A to determine which region to enlarge or emphasize simply by controlling the position of the object O1. In some embodiments, enlarging a region of a live video and/or moving the enlarged region are done with video processes executed by a user terminal, a server, or an application. Therefore, a direction of a video capturing device used to capture the live video can be kept fixed when the region of the live video to be enlarged moves synchronously with the movement of the predetermined object.
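One way to realize the fixed-camera, moving-window behavior of paragraph [0030] is to recompute the crop window from the tracked object position on each frame, clamping it to the frame boundary. The following sketch is illustrative only; the function name and coordinate conventions are assumptions.

```python
def follow_object(center, frame_size, window_size):
    """Return the top-left corner of a crop window centered on the
    tracked object, clamped so the window stays inside the frame.
    The camera itself stays fixed; only this window moves.

    center:      (cx, cy) object position in pixels
    frame_size:  (W, H) of the full frame
    window_size: (w, h) of the region to enlarge
    """
    cx, cy = center
    W, H = frame_size
    w, h = window_size
    left = min(max(cx - w // 2, 0), W - w)
    top = min(max(cy - h // 2, 0), H - h)
    return left, top

# Object near the bottom-right corner: the window clamps to the edge.
print(follow_object((600, 340), (640, 360), (200, 100)))  # (440, 260)
```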
[0031] In some embodiments, a user may send out a first message to trigger a message recognition process, and then send out a second message to indicate which object to recognize. The recognized object then determines the region to be enlarged. The first message and/or the second message can be or can include a voice message, a gesture message, or a facial expression message. In some embodiments, the first message can be referred to as a trigger message.
[0032] For example, user A may speak “focus” or “zoom in” to indicate that whatever he or she sends out next is for recognizing the object O1. Next, user A may speak “pan” such that a pan in the video would be recognized as the object O1. Subsequently, a region in the vicinity of the pan would be enlarged.
[0033] In some embodiments, the above configuration may save the resources used in message recognition. For example, a constantly ongoing message recognition process (which may include comparing the video information with a message table) can focus only on the first message, which may be a single voice message. The second message may have more variants, each corresponding to a different object in the video. The message recognition process for the second message can be turned on only when the first message is received and/or detected.
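The two-stage scheme of paragraphs [0031] to [0033] can be sketched as a small state machine: a cheap, always-on check for the trigger word, and a richer object-word check that is enabled only after the trigger is detected. The class name and word tables below are illustrative assumptions drawn from the examples in paragraph [0032].

```python
class MessageRecognizer:
    """Two-stage recognizer: stage 1 matches only the trigger message;
    stage 2 (the larger object-word table) runs only once armed."""

    TRIGGER_WORDS = {"focus", "zoom in"}
    OBJECT_WORDS = {"pan": "pan", "board please": "chopping board"}

    def __init__(self):
        self.armed = False  # second-stage recognition off by default

    def feed(self, word):
        """Feed one recognized word; return an object name when the
        two-message sequence completes, otherwise None."""
        if not self.armed:
            # Stage 1: only the single trigger word is checked here.
            if word in self.TRIGGER_WORDS:
                self.armed = True
            return None
        # Stage 2: match against the object-word table, then disarm.
        self.armed = False
        return self.OBJECT_WORDS.get(word)

r = MessageRecognizer()
print(r.feed("pan"))    # None  (stage 2 not armed yet)
print(r.feed("focus"))  # None  (trigger received, stage 2 armed)
print(r.feed("pan"))    # pan
```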
[0034] FIG. 4 shows a schematic configuration of a communication system according to some embodiments of the present disclosure. The communication system 1 may provide a live streaming service with interaction via a content. Here, the term “content” refers to a digital content that can be played on a computer device. The communication system 1 enables a user to participate in real-time interaction with other users on-line. The communication system 1 includes a plurality of user terminals 10, a backend server 30, and a streaming server 40. The user terminals 10, the backend server 30 and the streaming server 40 are connected via a network 90, which may be the Internet, for example. The backend server 30 may be a server for synchronizing interaction between the user terminals and/or the streaming server 40. In some embodiments, the backend server 30 may be referred to as the origin server of an application (APP) provider. The streaming server 40 is a server for handling or providing streaming data or video data. In some embodiments, the backend server 30 and the streaming server 40 may be independent servers. In some embodiments, the backend server 30 and the streaming server 40 may be integrated into one server. In some embodiments, the user terminals 10 are client devices for the live streaming. In some embodiments, a user terminal 10 may be referred to as a viewer, streamer, anchor, podcaster, audience, listener or the like. Each of the user terminals 10, the backend server 30, and the streaming server 40 is an example of an information-processing device. In some embodiments, the streaming may be live streaming or video replay. In some embodiments, the streaming may be audio streaming and/or video streaming. In some embodiments, the streaming may include contents such as online shopping, talk shows, talent shows, entertainment events, sports events, music videos, movies, comedy, concerts, group calls, conference calls or the like.
[0035] FIG. 5 shows a block diagram of a user terminal according to some embodiments of the present disclosure.
[0036] The user terminal 10S is a user terminal of a streamer or a broadcaster. The user terminal 10S includes a live video capturing unit 12, a message reception unit 13, an object identifying unit 14, a region determining unit 15, an enlarging unit 16, and a transmitting unit 17.
[0037] The live video capturing unit 12 includes a camera 122 and a microphone 124, and is configured to capture live video data (including audio data) of the streamer.

[0038] The message reception unit 13 is configured to monitor the voice stream (or the image stream in some embodiments) in the live video, and to recognize a predetermined word (for example, “focus” or “zoom-in”) in the voice stream.
[0039] The object identifying unit 14 is configured to identify one or more predetermined objects in the live video, and to recognize the identified one or more objects in the image or the live video. The identification of objects may be done by a look-up table and the predetermined word recognized by the message reception unit 13, which will be described later. In another embodiment, the identification of objects may be done by the message reception unit 13.
[0040] The region determining unit 15 is configured to determine a region in the live video to be enlarged. The region to be enlarged is a region in the vicinity of the identified or recognized object.
[0041] The enlarging unit 16 is configured to perform video processes related to enlarging a region of a live video. In an embodiment wherein the region to be enlarged is captured with a higher resolution, the camera 122 may be involved in the enlarging process.
[0042] The transmitting unit 17 is configured to transmit the enlarged live video (or a live video with a region enlarged) to a server (such as a streaming server) if the enlarging process is performed. If an enlarging process is not performed, the transmitting unit 17 transmits the live video captured by the live video capturing unit 12.
[0043] FIG. 6 shows an exemplary look-up table in accordance with some embodiments of the present disclosure, which may be utilized by the object identifying unit 14 of FIG. 5.

[0044] The column “predetermined word” indicates the words to be identified in the voice stream of the live video. The column “object” indicates the object corresponding to each predetermined word to be recognized. For example, in this example, an identified “zoom-in” leads to recognition of the streamer’s hand in the live video; an identified “pan” leads to recognition of a pan in the live video; and an identified “board please” leads to recognition of a chopping board in the live video.
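The look-up table of FIG. 6 can be encoded directly as a dictionary; the entries below come from the examples in paragraph [0044], and the function name is an illustrative assumption.

```python
# The FIG. 6 look-up table: predetermined word -> object to recognize.
WORD_TO_OBJECT = {
    "zoom-in": "streamer's hand",
    "pan": "pan",
    "board please": "chopping board",
}

def object_for(predetermined_word):
    """Map an identified word to the object the object identifying
    unit should recognize; return None for unknown words."""
    return WORD_TO_OBJECT.get(predetermined_word)

print(object_for("pan"))  # pan
```

Entries pre-set by a user, or auto-created through AI or machine learning as in paragraph [0045], would simply be additional keys in this table.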
[0045] In some embodiments, the predetermined words or the objects are pre-set by a user. In some embodiments, the predetermined words or the objects may be auto-created through AI or machine learning.
[0046] The processing and procedures described in the present disclosure may be realized by software, hardware, or any combination of these in addition to what was explicitly described. For example, the processing and procedures described in the specification may be realized by implementing a logic corresponding to the processing and procedures in a medium such as an integrated circuit, a volatile memory, a non-volatile memory, a non-transitory computer-readable medium and a magnetic disk. Further, the processing and procedures described in the specification can be implemented as a computer program corresponding to the processing and procedures, and can be executed by various kinds of computers.
[0047] The system or method described in the above embodiments may be integrated into programs stored in a computer-readable non-transitory medium such as a solid state memory device, an optical disk storage device, or a magnetic disk storage device. Alternatively, the programs may be downloaded from a server via the Internet and be executed by processors.

[0048] Although technical content and features of the present invention are described above, a person having common knowledge in the technical field of the present invention may still make many variations and modifications without departing from the teaching and disclosure of the present invention. Therefore, the scope of the present invention is not limited to the embodiments already disclosed, but includes variations and modifications that do not depart from the present invention, and is covered by the scope of the appended claims.
Description of reference numerals
S1 Screen
RA Region
O1 Object
R1 Region
1 System
10 User terminal
30 Backend server
40 Streaming server
90 Network
10S User terminal
12 Live video capturing unit
122 Camera
124 Microphone
13 Message reception unit
14 Object identifying unit
15 Region determining unit
16 Enlarging unit
17 Transmitting unit

Claims

We Claim:
1. A method for live video processing, comprising: receiving a message from a user while live video created by the user is being broadcasted; and enlarging a region of the live video in the vicinity of a predetermined object according to the message.
2. The method according to claim 1, further comprising recognizing the predetermined object in the live video according to the message.
3. The method according to claim 2, further comprising receiving a trigger message from the user, wherein the trigger message triggers the recognizing of the predetermined object in the live video according to the message.
4. The method according to claim 1, wherein the message comprises a voice message, a gesture message, or a facial expression message.
5. The method according to claim 1, further comprising recognizing the message from the user.
6. The method according to claim 5, wherein the recognizing the message from the user comprises a voice recognition process, a gesture recognition process, or a facial expression recognition process.
7. The method according to claim 1, wherein the predetermined object comprises a body part of the user or a wearable object on the user.
8. The method according to claim 1, wherein the predetermined object moves synchronously with a movement of the user.
9. The method according to claim 1, wherein the message corresponds to the predetermined object.
10. The method according to claim 1, wherein the region of the live video to be enlarged is captured by a video capturing device with a higher resolution than another region of the live video not to be enlarged.
11. The method according to claim 1, wherein the region of the live video to be enlarged moves synchronously with a movement of the predetermined object.
12. The method according to claim 11, wherein the live video is generated by a video capturing device in the vicinity of the user, and a direction of the video capturing device is kept fixed when the region of the live video to be enlarged moves synchronously with the movement of the predetermined object.
13. A system for live video processing, comprising one or a plurality of processors, wherein the one or plurality of processors execute a machine-readable instruction to perform: receiving a message from a user while live video created by the user is being broadcasted; and enlarging a region of the live video in the vicinity of a predetermined object according to the message.
14. A non-transitory computer-readable medium including a program for live video processing, wherein the program causes one or a plurality of computers to execute: receiving a message from a user while live video created by the user is being broadcasted; and enlarging a region of the live video in the vicinity of a predetermined object according to the message.
PCT/US2021/073183 2021-09-30 2021-12-30 System, method and computer-readable medium for video processing Ceased WO2023129182A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/US2021/073183 WO2023129182A1 (en) 2021-12-30 2021-12-30 System, method and computer-readable medium for video processing
JP2022528663A JP7449519B2 (en) 2021-12-30 2021-12-30 Systems, methods, and computer-readable media for video processing
US17/881,743 US12413685B2 (en) 2021-09-30 2022-08-05 System, method and computer-readable medium for video processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2021/073183 WO2023129182A1 (en) 2021-12-30 2021-12-30 System, method and computer-readable medium for video processing

Related Child Applications (2)

Application Number Title Priority Date Filing Date
PCT/US2021/073182 Continuation-In-Part WO2023129181A1 (en) 2021-09-30 2021-12-30 System, method and computer-readable medium for image recognition
US17/881,743 Continuation-In-Part US12413685B2 (en) 2021-09-30 2022-08-05 System, method and computer-readable medium for video processing

Publications (1)

Publication Number Publication Date
WO2023129182A1 true WO2023129182A1 (en) 2023-07-06

Family

ID=87000027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/073183 Ceased WO2023129182A1 (en) 2021-09-30 2021-12-30 System, method and computer-readable medium for video processing

Country Status (2)

Country Link
JP (1) JP7449519B2 (en)
WO (1) WO2023129182A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160205341A1 (en) * 2013-08-20 2016-07-14 Smarter Tv Ltd. System and method for real-time processing of ultra-high resolution digital video
US20210099505A1 (en) * 2013-02-13 2021-04-01 Guy Ravine Techniques for Optimizing the Display of Videos
US20210365707A1 (en) * 2020-05-20 2021-11-25 Qualcomm Incorporated Maintaining fixed sizes for target objects in frames

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4232419B2 (en) * 2001-09-18 2009-03-04 ソニー株式会社 TRANSMISSION DEVICE, TRANSMISSION METHOD, CONTENT DISTRIBUTION DEVICE, CONTENT DISTRIBUTION METHOD, AND PROGRAM
CN107851334A (en) * 2015-08-06 2018-03-27 索尼互动娱乐股份有限公司 Information processor
JP2020021225A (en) * 2018-07-31 2020-02-06 株式会社ニコン Display control system, display control method, and display control program
TW202133118A (en) * 2020-02-21 2021-09-01 四葉草娛樂有限公司 Panoramic reality simulation system and method thereof with which the user may feel like arbitrary passing through the 3D space so as to achieve the entertainment enjoyment with immersive effect

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210099505A1 (en) * 2013-02-13 2021-04-01 Guy Ravine Techniques for Optimizing the Display of Videos
US20160205341A1 (en) * 2013-08-20 2016-07-14 Smarter Tv Ltd. System and method for real-time processing of ultra-high resolution digital video
US20210365707A1 (en) * 2020-05-20 2021-11-25 Qualcomm Incorporated Maintaining fixed sizes for target objects in frames

Also Published As

Publication number Publication date
JP7449519B2 (en) 2024-03-14
JP2024501091A (en) 2024-01-11

Similar Documents

Publication Publication Date Title
US20130205322A1 (en) Method and system for synchronization of dial testing and audience response utilizing automatic content recognition
US20150334344A1 (en) Virtual Window
CN111343476A (en) Video sharing method and device, electronic equipment and storage medium
US20180077461A1 (en) Electronic device, interractive mehotd therefor, user terminal and server
US9736518B2 (en) Content streaming and broadcasting
JP6289651B2 (en) Method and apparatus for synchronizing playback on two electronic devices
US9756373B2 (en) Content streaming and broadcasting
EP3316582B1 (en) Multimedia information processing method and system, standardized server and live broadcast terminal
US20150029342A1 (en) Broadcasting providing apparatus, broadcasting providing system, and method of providing broadcasting thereof
US20240296195A1 (en) System, method and computer-readable medium for recommendation
WO2021204139A1 (en) Video displaying method, device, equipment, and storage medium
CN114095671A (en) Cloud conference live broadcast system, method, device, device and medium
WO2014190655A1 (en) Application synchronization method, application server and terminal
CN116437147A (en) Live broadcast task interaction method and device, electronic equipment and storage medium
WO2015035247A1 (en) Virtual window
CN108401163B (en) Method and device for realizing VR live broadcast and OTT service system
US9332206B2 (en) Frame sharing
US12413685B2 (en) System, method and computer-readable medium for video processing
WO2023129182A1 (en) System, method and computer-readable medium for video processing
US11825170B2 (en) Apparatus and associated methods for presentation of comments
CN108521579A (en) The display methods and device of barrage information
TW202327366A (en) System, method and computer-readable medium for video processing
US20170094367A1 (en) Text Data Associated With Separate Multimedia Content Transmission
CN105187934A (en) Terminal platform for television interactive system
CN112714331B (en) Information prompting method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2022528663

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21970189

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21970189

Country of ref document: EP

Kind code of ref document: A1