US9270937B2 - Real time stream provisioning infrastructure - Google Patents
- Publication number
- US9270937B2 (application US14/140,839)
- Authority
- US
- United States
- Prior art keywords
- chat
- audio
- participants
- stream
- real time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- H04N7/15 — Conference systems
- H04L12/1813 — Arrangements for providing special services to substations for broadcast or conference, e.g. multicast, for computer conferences, e.g. chat rooms
- H04L12/1822 — Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission
- H04L65/1069 — Session establishment or de-establishment
- H04L65/611 — Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio, for multicast or broadcast
- H04L65/70 — Media network packetisation
- H04L69/24 — Negotiation of communication capabilities
Definitions
- This disclosure relates to a real time stream provisioning infrastructure for use in audio-visual multi-user chat interactions.
- Audio-visual chat has been available among a plurality of computer users for some time.
- Skype® enables audio-visual user-to-user calling via a peer-to-peer system with server-based initiation and messaging protocols.
- Skype®, Facetime®, and Google® Hangouts have enabled various permutations of so-called “group” audio-visual chat sessions.
- Facetime® and Skype® also enable mobile-to-mobile single-user-to-single-user audio-visual calling.
- Websites such as YouTube®, Netflix® and Vimeo® have enabled streaming of stored videos.
- Sites such as UStream® and Twit.tv® have enabled real time or “live” (or nearly-live) audio-visual streaming.
- Stored video streaming has relied upon conversion of the video into a format suitable for low-bandwidth streaming.
- Algorithms can dynamically alter the quality of the stream in real time to accommodate higher or lower bandwidth availability.
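One common way such dynamic quality alteration works is to pick, from a ladder of available encoding bitrates, the highest rung that fits within the currently measured bandwidth. The ladder values, the headroom factor, and the function name below are illustrative assumptions, not details taken from the patent:

```python
# Hypothetical sketch of dynamic stream-quality selection: given measured
# bandwidth, choose the highest rung of a bitrate "ladder" that fits,
# leaving headroom for network jitter. All values here are assumptions.

BITRATE_LADDER_KBPS = [250, 500, 1000, 2500, 5000]  # low -> high quality
HEADROOM = 0.8  # use at most 80% of the measured bandwidth

def select_bitrate(measured_kbps: float) -> int:
    """Return the highest ladder bitrate that fits within the headroom."""
    usable = measured_kbps * HEADROOM
    chosen = BITRATE_LADDER_KBPS[0]  # never drop below the lowest rung
    for rung in BITRATE_LADDER_KBPS:
        if rung <= usable:
            chosen = rung
    return chosen
```

Re-running this selection periodically as bandwidth estimates change gives the "higher or lower quality" behavior described above.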
- Real time audio-visual streaming typically relies upon a single stream that is encoded before being broadcast directly to any number of watchers using multicast protocols.
- Some of these real time streaming services can operate directly from mobile devices, but they offer limited streaming capabilities and low resolution, and they delay the stream in order to account for issues related to mobile device processing capability, latency and bandwidth considerations.
- FIG. 1 is a block diagram of a real time stream provisioning infrastructure.
- FIG. 2 is a block diagram of a computing device.
- FIG. 3 is a block diagram of an encoding computing device.
- FIG. 4 is a functional diagram of a real time stream provisioning infrastructure.
- FIG. 5 is a functional diagram of a server for video encoding.
- FIG. 6 is a functional diagram of a server for audio encoding.
- FIG. 7 is a flowchart of a video encoding process.
- FIG. 8 is a flowchart of an audio encoding process.
- Mobile devices have been virtually unable to take part in multi-party real-time audio-visual chat functionality.
- Most real-time audio-visual chat systems operate on a peer-to-peer basis, with the server's only function being to arrange the peer-to-peer connection.
- These systems rely upon one or more of the chat participants' computers to encode their own or each of the other participants' audio and video before rebroadcasting to the group.
- Each device may encode its own audio and video and provide it to the other party's computing device.
- The environment 100 includes a mobile device 110, a client system 120, a communication system 130, a cluster controller system 140, a server cluster 150, and a viewer system 160. Each of these elements is interconnected via a network 170.
- The mobile device 110 and client system 120 are computing devices (see FIG. 2) that are used by chat participants and viewers in order to take part in or to view a chat.
- The mobile device 110 may be a mobile phone including a screen, a microphone and a video camera.
- The client system 120 may be a personal desktop computer, a tablet, a laptop or a similar device including a screen, a microphone and a video camera.
- The screen, microphone and video camera may be independent of or integral to either the mobile device 110 or the client system 120.
- A "chat," as used herein, means a simultaneous audio and/or video communication involving at least two chat participants.
- A "chat participant" is an individual taking part in a chat, using a mobile device 110 or a client system 120, and providing an audio and/or video component making up a part of the chat.
- A "chat viewer," in contrast, is an individual viewing a chat, but not providing any audio and/or video component making up a part of the chat.
- A "chat viewer" may, permanently or temporarily, be converted into a chat participant, either of their own volition (if allowed by the system), by an administrator of a chat, or by another chat participant.
- An “audio component,” a “video component” or an “audio-visual component” as used herein means an audio and/or video stream provided by a single chat participant.
- A "combined" audio and video stream or audio-visual stream is an audio and/or video stream simultaneously incorporating the components of more than one chat participant in a single stream.
- A "master" audio stream, video stream, or audio-visual stream is an audio and/or video stream simultaneously incorporating the components of all chat participants.
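The component/combined/master terminology can be sketched as a small data model. The class and method names are hypothetical, chosen only to make the three definitions concrete:

```python
# Illustrative (assumed) data model for the stream terminology defined
# above: a stream is classified by how many participants' components it
# simultaneously incorporates.
from dataclasses import dataclass

@dataclass
class Stream:
    """An audio and/or video stream and the participant components in it."""
    participant_ids: list

    def is_component(self) -> bool:
        # A component is provided by a single chat participant.
        return len(self.participant_ids) == 1

    def is_combined(self) -> bool:
        # A combined stream incorporates more than one participant's component.
        return len(self.participant_ids) > 1

    def is_master(self, all_participants) -> bool:
        # A master stream incorporates the components of ALL chat participants.
        return set(self.participant_ids) == set(all_participants)
```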
- The communication system 130 is a computing device that is responsible for routing communications, such as chat initiation requests, any text-based communication between chat participants and viewers, any unique chat identifiers, and the protocol communications necessary to establish, initiate, maintain, and end chat sessions.
- The communication system 130 may enable peer-to-peer sessions to be initiated.
- The communication system 130 may be made up of more than one physical or logical computing device in one or more locations.
- The cluster controller system 140 is a computing device that is responsible for receiving chat initiation (and termination) requests and then identifying and allocating a server, from the server cluster 150, to handle audio-visual chats.
- The cluster controller system 140 may also maintain a full database of all ongoing chats and each participant in the ongoing chats.
- The cluster controller system 140 may operate as an organizer of the overall audio-visual chat process. In situations in which a server from the server cluster 150 ceases to function or is no longer reachable on the network, the cluster controller system 140 may transition an in-process audio-visual chat to a newly-provisioned server within the server cluster 150.
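The failover behavior described above can be sketched as follows. The data structures and function name are assumptions for illustration; the patent does not specify how the reassignment is computed:

```python
# Hedged sketch of the cluster controller's failover: when a server in the
# cluster stops responding, every in-process chat it hosted is transitioned
# to the least-loaded healthy server. Structures and names are assumed.

def transition_chats(chat_assignments: dict, failed: str, healthy: list) -> dict:
    """Reassign chats from `failed` to the least-loaded healthy server.

    chat_assignments maps chat_id -> server name.
    """
    updated = dict(chat_assignments)
    # Compute current per-server load among the healthy servers.
    load = {s: 0 for s in healthy}
    for server in updated.values():
        if server in load:
            load[server] += 1
    # Move each chat off the failed server, balancing as we go.
    for chat_id, server in chat_assignments.items():
        if server == failed:
            target = min(load, key=load.get)  # least-loaded healthy server
            updated[chat_id] = target
            load[target] += 1
    return updated
```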
- The server cluster 150 is a group of servers that are available to be used to host one or more audio-visual chats.
- A server within the server cluster 150 is used to receive a plurality of audio and/or video components from a plurality of chat participants and to encode those into one or more combined audio and/or video streams.
- The server cluster 150 may, for example, be a set of dynamically available servers that may be allocated on an as-needed basis for use in one or more audio-visual chats. Amazon® and Microsoft® currently offer such servers that may be paid for on an as-needed basis.
- The servers making up the server cluster 150 each incorporate at least one graphical processing unit (GPU) for use in encoding audio and/or video.
- The viewer system 160 is a computing device that is used to view an on-going audio-visual chat.
- The viewer system 160 is essentially the same as the mobile device 110 and the client system 120, but is used by a chat viewer. As a result, the viewer system 160 does not provide an audio and/or video component for inclusion in the chat. Instead, the viewer system 160 merely receives an audio and/or video stream.
- Referring to FIG. 2, there is shown a block diagram of a computing device 200, which is representative of the mobile device 110, the client system 120, the communication system 130, the cluster controller system 140, and the viewer system 160 in FIG. 1.
- The computing device 200 may be, for example, a desktop or laptop computer, a server computer, a tablet, a smartphone or other mobile device.
- The computing device 200 may include software and/or hardware for providing functionality and features described herein.
- The computing device 200 may therefore include one or more of: logic arrays, memories, analog circuits, digital circuits, software, firmware and processors.
- The hardware and firmware components of the computing device 200 may include various specialized units, circuits, software and interfaces for providing the functionality and features described herein.
- The computing device 200 has a processor 210 coupled to a memory 212, storage 214, a network interface 216 and an I/O interface 218.
- The processor 210 may be or include one or more microprocessors, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), programmable logic devices (PLDs) and programmable logic arrays (PLAs).
- The memory 212 may be or include RAM, ROM, DRAM, SRAM and MRAM, and may include firmware, such as static data or fixed instructions, BIOS, system functions, configuration data, and other routines used during the operation of the computing device 200 and processor 210.
- The memory 212 also provides a storage area for data and instructions associated with applications and data handled by the processor 210.
- The storage 214 provides non-volatile, bulk or long-term storage of data or instructions in the computing device 200.
- The storage 214 may take the form of a magnetic or solid state disk, tape, CD, DVD, or other reasonably high-capacity addressable or serial storage medium. Multiple storage devices may be provided or available to the computing device 200. Some of these storage devices may be external to the computing device 200, such as network storage or cloud-based storage.
- As used herein, the term storage medium corresponds to the storage 214 and does not include transitory media such as signals or waveforms. In some cases, such as those involving solid state memory devices, the memory 212 and storage 214 may be a single device.
- The network interface 216 includes an interface to a network such as network 170 (FIG. 1).
- The network interface 216 may be wired or wireless.
- The I/O interface 218 interfaces the processor 210 to peripherals (not shown) such as displays, video cameras, microphones, keyboards and USB devices.
- Referring to FIG. 3, there is shown a block diagram of an encoding computing device 300, which is representative of the servers making up the server cluster 150 in FIG. 1.
- The processor 310, memory 312, storage 314, network interface 316 and I/O interface 318 of FIG. 3 serve the same functions as the corresponding elements discussed with reference to FIG. 2 above, and will not be discussed further here.
- Graphical processing units (GPUs) 322, 324, 326, and 328 are also present in this computing device 300. There may be more or fewer GPUs depending upon the needs of the computing device 300.
- GPUs, such as GPU 322, are specialized processors including instruction sets designed specifically for processing visual-related algorithms. GPUs differ from CPUs (such as processor 310) primarily in that they can interact very rapidly with memory directly allocated to the GPU and, as a result, can quickly manipulate the large quantities of data pertaining to computer visual functions stored in that memory.
- GPUs typically incorporate a "frame buffer" which stores processed data in a format suitable for near-direct output to a computer display.
- GPUs, unlike most CPUs, offer high parallelism that enables them to process large blocks of data simultaneously.
- Multiple GPUs may be incorporated into a single computing device 300 to enable simultaneous processing by multiple GPUs.
- The computing device 300 may include, for example, five GPUs, each operating independently from one another and communicating with a single CPU (such as processor 310).
- A GPU is distinct from and in addition to a CPU.
- A GPU incorporates at least one specific instruction set for operating upon computer graphical data; this instruction set is specific to the GPU and is not incorporated in the CPU.
- Referring to FIG. 4, a functional diagram of a real time stream provisioning infrastructure 400 is shown.
- FIG. 4 corresponds to FIG. 1, but includes more detail regarding the functional elements making up the individual computing devices.
- The mobile device 410, the client system 420, the communication system 430, the cluster controller system 440, the server cluster 450 and the viewer system 460 each have counterparts in FIG. 1.
- The mobile device 410 and client system 420 each include a communication interface 412, 422 and an audio-visual chat application 414, 424.
- The communication interfaces 412, 422 are used to enable textual chat between chat participants.
- The textual chat may take the form of an asynchronous communication between the chat participants and may include text, images (such as .jpg, .gif) and embedded videos (such as from YouTube® and similar video sharing sites).
- The communication interfaces 412, 422 are also used to transfer signaling and protocol-related messages pertaining to the creation, maintenance and ending of chats between the mobile device 410, the client system 420, any viewer systems 460, the cluster controller system 440 and the server cluster 450. These messages signal to the communication system 430, which then signals to the cluster controller system 440, the server cluster 450 and the mobile devices and client systems associated with chat participants that at least one chat participant wishes to initiate, continue, and/or end a chat.
- The audio-visual chat applications 414, 424 operate to receive audio and/or video components provided by a chat participant using either a mobile device 410 or a client system 420 and to cause those components to be transmitted to (or through) a cluster controller system 440 for combination into one or more combined streams.
- The audio-visual chat applications 414, 424 may then receive the one or more combined streams and display those to a chat participant using the mobile device 410 or the client system 420.
- The communication system 430 uses a communication interface 432 to communicate chat requests, initiation messages, chat end messages, and related protocol messages to and from chat participants and any of the infrastructure 400 elements.
- The communication system 430 may provide, for example, a uniform resource locator (URL) for a particular chat session or a particular chat participant. This URL may redirect requests to an associated real-time audio and/or video stream.
- The communication system 430 also includes chat instance controllers, such as chat instance controller A 434 and chat instance controller B 436, for each concurrent chat operating on the system. These controllers 434 and 436 operate as central hubs for all protocol, text, audio components and video components making up a part of a particular chat.
- A chat may be identified by a particular chat ID, with protocol messages, text, audio components and video components directed to the communication system using the chat ID to determine to which chat instance controller the data is directed.
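The chat-ID dispatch described above is essentially a lookup from chat ID to chat instance controller. A minimal sketch, with all class and method names assumed for illustration:

```python
# Assumed-name sketch of routing incoming data (protocol messages, text,
# audio/video components) to the chat instance controller identified by
# the chat ID carried with the data.

class ChatInstanceController:
    """Central hub for all data belonging to one concurrent chat."""
    def __init__(self, chat_id):
        self.chat_id = chat_id
        self.received = []  # protocol messages, text, A/V components

    def handle(self, payload):
        self.received.append(payload)

class CommunicationSystem:
    def __init__(self):
        self._controllers = {}  # chat_id -> ChatInstanceController

    def controller_for(self, chat_id):
        # One controller is allocated per concurrent chat, on first use.
        if chat_id not in self._controllers:
            self._controllers[chat_id] = ChatInstanceController(chat_id)
        return self._controllers[chat_id]

    def route(self, chat_id, payload):
        # The chat ID alone determines which controller receives the data.
        self.controller_for(chat_id).handle(payload)
```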
- An example of a communication system like the communication system 430 is a system currently provided by Wowza®, which enables publication of audio and video components by individual users and further enables publication of a resulting combined or master stream to a plurality of chat participants and chat viewers.
- Wowza® incorporates a protocol for initiating the broadcast (or transmission) of those streams and for receipt of the combined stream for broadcasting to an identified group of chat participants and chat viewers.
- These protocols are an example of the communication protocols used by the communication interfaces 412, 422, although many other systems offering similar functionality may be used. Additional or different systems may make up all or a part of the communication system 430, which may also enable text chat, routing of protocol messages, sharing of audio/visual data via social networks, social network API integration, instant messaging and other, similar functions.
- The cluster controller system 440 is primarily responsible for acting as an orchestrator and a conduit for audio component and video component encoding operations that are passed to the server cluster 450.
- The cluster controller system 440 incorporates a communication interface 442, a chat router 444, a load/active database 446 and a load balancer 448.
- The communication interface 442 operates, in conjunction with the communication system 430, to receive and route chat requests, maintenance messages and chat termination messages.
- The chat router 444 operates to direct incoming audio and/or video components from one or more chat participants to a server within the server cluster (or to a newly-allocated server) for encoding of the audio and/or video components into one or more combined streams.
- The load/active database 446 maintains a database of all on-going chats and all participants and viewers for each of those chats. This enables the cluster controller system 440 to determine which audio and/or video components and which combined and master streams pertain to which chats and/or chat participants and chat viewers.
- The load/active database 446 also maintains a database of the overall load associated with the encoding operations of each server making up the server cluster 450. This enables the cluster controller system 440 to determine which server, of the server cluster 450, would be best suited to service a new chat and when to activate additional servers available to the server cluster 450 in order to avoid overextending any one server's capacity to host chats.
- The load balancer 448 uses the information in the load/active database 446 to activate new servers, deactivate unused (or underused) servers, transfer ongoing chats in real time to less-utilized servers and otherwise ensure that efficient use of the server cluster 450 is taking place. In the event of a server failure, the load balancer 448 may use the information in the load/active database 446 to quickly transition all ongoing chats to one or more other servers.
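The placement decision the load balancer makes for a new chat can be sketched as: pick the least-loaded active server, and activate a standby server when every active one is at capacity. The capacity threshold and names below are assumptions, not values from the patent:

```python
# Illustrative sketch of the load balancer's placement decision for a new
# chat, using per-server load data like that kept in the load/active
# database. The capacity limit and function name are assumed.

CAPACITY = 10  # max concurrent chats per server (assumed)

def place_chat(active_loads: dict, standby: list):
    """Return (server, activated) for hosting a new chat.

    active_loads maps server name -> number of ongoing chats.
    standby lists servers that can be activated on demand.
    """
    server = min(active_loads, key=active_loads.get)  # least-loaded
    if active_loads[server] >= CAPACITY and standby:
        # Every active server is full: bring a standby server online.
        return standby[0], True
    return server, False
```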
- The server cluster 450 is a group of servers 454, 455, 456, 457 and an associated communication interface 452 that are responsible for encoding multiple audio and/or video components into one or more combined or master streams.
- Each server in the server cluster 450 can include a transcoder controller 454-0 that is responsible for directing transcoding of audio and/or video components and other messages to one of a plurality of transcoder instances 454-1, 454-2 operating on that server 454.
- The transcoder controller 454-0 may also report usage and load data back to the load/active database 446 for use by the load balancer 448.
- Incoming audio and/or video components may be identified only by a chat ID, and the transcoder controller 454-0 may use that ID to direct the stream to the appropriate transcoder instance.
- The transcoder instances A 454-1 and B 454-2 are responsible for directing one or more GPUs on the server 454 to transcode audio and/or video components into a series of combined audio and/or video streams for service back to one or more of the chat participants.
- The viewer system 460 operates in virtually the same fashion as the mobile device 410 and the client system 420.
- However, the audio-visual chat application 464 of a chat viewer using the viewer system 460 does not provide an audio and/or video component for combination into a combined or master stream.
- The chat viewer using the viewer system 460 may, temporarily or permanently, become a chat participant.
- In that case, the communication interface 462 communicates a desire to become a chat participant in an ongoing chat or in a new chat, and the communication system 430 provisions that chat in interaction with the cluster controller system 440 and adds the chat viewer (now a participant) to a chat instance on a server available to the server cluster 450.
- FIG. 5 is a functional diagram of a server 500 and a communication system 530 for video encoding.
- The server 500 is one of the servers in a server cluster, such as server cluster 450 in FIG. 4.
- The communication system 530 may be the communication system 430 of FIG. 4. All communication in the system may be routed through the communication system 530 and may utilize one or more APIs operating on the server 500 or within the various elements allocated for a particular chat instance. For example, an API within a given transcoder controller may communicate, using the communication system 530, directly with a chat instance controller for that chat instance.
- The communication system 530 includes a plurality of chat instance controllers A 510, B 520, and n 530.
- The chat instance controller A 510 ensures that video received by the system that is associated with a particular chat instance is routed and returned to the appropriate chat instance for chat participants and any chat viewers.
- The server 500 includes a transcoder controller 590, transcoders A 592, B 594 and n 596, and a GPU 550.
- The transcoder controller 590 controls the operation of the transcoders A 592, B 594, and n 596.
- The transcoders A 592, B 594 and n 596 are responsible for directing the conversion of audio and video components received from their respective chat instance controllers A 510, B 520, and n 530 into combined or master audio or video streams while under the direction of transcoder controller 590.
- The transcoding itself takes place on the GPU 550.
- Each transcoder, such as transcoder A 560, includes a video receiver, such as video receiver 562, and a video publisher, such as video publisher 564.
- Each video receiver 562 receives each of the video components for the chat instance controller associated with the transcoder.
- The video receiver 562 for transcoder A 560 receives the video components A1 522, A2 524, and An 526 associated with chat instance controller A 510.
- The video components are then provided by the video receiver 562 to the GPU 550 to be combined into video stream A1+A2+An 552.
- The master stream is provided to the video publisher 564, which ensures that the master video stream reaches all of the chat participants A1 512, A2 514, An 516 and any chat viewers, such as chat viewer A5 518, associated with chat instance controller A 510.
- Chat instance controller A 510, and all other chat instance controllers on a given communication system 530, are made up of data objects that incorporate a unique chat ID associated with a set of chat participants and any chat viewers.
- Chat participant A1 512, chat participant A2 514, and chat participant An 516 make up chat instance A, operating on the chat instance controller A 510 allocated to that chat instance.
- Chat viewer A5 518 may be associated with chat instance A and, similarly, operate on chat instance controller A 510 as a chat viewer (not a chat participant). This means that audio and/or video components generated by these chat participants and viewers that are meant for the chat ID associated with chat instance A will be appropriately routed by the chat instance controller A 510 to these chat participants and chat viewers. Similarly, any resulting combined or master streams will be routed back to the appropriate chat participants and chat viewers.
- The chat instance controller A 510 may receive (from the chat router 444, through the communication interface 432), as a part of an ongoing chat instance, such as chat instance A, video components from the chat participants. Examples of such video components are shown as video component A1 522, video component A2 524, and video component An 526.
- Transcoder A 592 is allocated on the server 500 for chat instance A.
- Video components associated with chat instance A are routed to transcoder A 592.
- Video components associated with chat instance controller B 520 are routed to transcoder B 594 and to the transcoder B 570 associated with chat instance controller B 520.
- Video components associated with chat instance n 530 are routed to transcoder n 596 and to transcoder n 580.
- The transcoders, such as transcoder A 560, accept a plurality of video components, prepare those components for encoding, and package those components for encoding using GPU 550.
- The GPU 550 within the server 500 is used to encode the video components A1 522, A2 524 and An 526 into a single video stream, denoted video stream A1+A2+An 552.
- User and/or server settings may determine how this encoding takes place.
- The videos may be overlaid, may be boxed into set portions of an overall stream (when displayed) or may otherwise be organized into a single master A1+A2+An 552 stream.
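One concrete "boxed" policy is to tile the participant videos into a near-square grid within the master frame. The frame size and the ceil-square-root layout rule below are illustrative assumptions, not details from the patent:

```python
# Hedged sketch of a "boxed" layout: tile N participant videos into a
# near-square grid of equal boxes within the master frame. Frame size and
# the layout policy are assumed for illustration.
import math

def grid_layout(n_videos: int, frame_w: int = 1280, frame_h: int = 720):
    """Return (x, y, w, h) boxes, one per video, tiling the master frame."""
    cols = math.ceil(math.sqrt(n_videos))
    rows = math.ceil(n_videos / cols)
    w, h = frame_w // cols, frame_h // rows
    # Place video i at column i % cols, row i // cols.
    return [((i % cols) * w, (i // cols) * h, w, h) for i in range(n_videos)]
```

An encoder could then scale each component into its box before compositing the master frame.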
- Timing data may be transmitted with the video components in order to enable the GPU to properly synchronize video data that is not necessarily received simultaneously from all of the chat participants. This same master stream may then be returned to all of the chat participants A1 512, A2 514, An 516 and to any chat viewers, such as chat viewer A5 518.
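One way timing data can drive that synchronization is to buffer each participant's frames by capture timestamp and only compose a combined frame once every participant has contributed a frame at or before the target timestamp. The structures and function name are assumptions for illustration:

```python
# Assumed-structure sketch of timestamp-based synchronization: pick, per
# participant, the latest buffered frame not newer than the target
# timestamp, or signal that the composer must wait.

def synchronize(frames_by_participant: dict, target_ts: float):
    """Select one frame per participant for the combined frame at target_ts.

    frames_by_participant maps participant -> list of (timestamp, frame).
    Returns None if any participant has no frame at or before target_ts.
    """
    selected = {}
    for participant, frames in frames_by_participant.items():
        candidates = [(ts, f) for ts, f in frames if ts <= target_ts]
        if not candidates:
            return None  # this participant's data has not arrived yet
        selected[participant] = max(candidates)[1]  # latest eligible frame
    return selected
```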
- This master stream may be transmitted using the user datagram protocol (UDP).
- UDP prioritizes throughput over ensured delivery: the master stream is continuously sent, regardless of whether a particular chat participant or chat viewer has received or acknowledged delivery. The most important priority is ensuring that the master stream continues to be sent, not ensuring that every participant (one or more may have intentionally or unintentionally dropped out of the chat) has received every frame of video or audio.
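The fire-and-forget character of UDP delivery can be sketched with a plain datagram socket: each packet is pushed to every recipient without waiting for acknowledgements, so a slow or departed viewer never stalls the stream. The function name and addresses are illustrative:

```python
# Minimal sketch of fire-and-forget UDP transmission of master-stream
# packets. No delivery guarantee and no acknowledgement is involved, so
# one unreachable recipient cannot block the others.
import socket

def broadcast_packet(payload: bytes, recipients: list) -> int:
    """Send one packet to each (host, port) recipient; return count sent."""
    sent = 0
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for addr in recipients:
            try:
                sock.sendto(payload, addr)  # no ack is awaited
                sent += 1
            except OSError:
                pass  # skip an unreachable recipient; keep streaming
    finally:
        sock.close()
    return sent
```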
- The resulting video stream utilizes substantially less bandwidth than directly providing each individual video component to each of the chat participants and each chat viewer for combination there.
- This process also utilizes fewer CPU cycles on any one of the chat participant computing devices than would be necessary for that computing device to encode all of the video into a single stream, or for each participant or viewer computing device to simultaneously receive, decode, synchronize and then display each of these video components. This is particularly acute when there are many chat participants, each contributing a video stream, or when many or all of the chat participants are on mobile devices.
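The bandwidth claim can be made concrete with a rough stream count, under the simplifying assumption that every stream costs about the same: in a peer-to-peer mesh each of N participants exchanges a stream with every other participant, while with server-side encoding each endpoint sends one component up and receives one combined stream down. The functions below are an illustrative back-of-the-envelope model, not figures from the patent:

```python
# Rough worked comparison of total concurrent streams, assuming each
# stream costs one bandwidth "unit". Illustrative model only.

def mesh_streams(n: int) -> int:
    # Peer-to-peer mesh: each of n participants sends to and receives
    # from each of the other n - 1 participants.
    return n * (n - 1)

def server_streams(n: int, viewers: int = 0) -> int:
    # Server-side encoding: n components up, plus one combined stream
    # down to each participant and each chat viewer.
    return n + n + viewers
```

For six participants the mesh needs 30 streams while the server approach needs 12, and adding chat viewers costs the server approach only one extra downstream each, versus one extra stream from every participant in a mesh.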
- server 500 The use of a server, such as server 500 at all is unusual in these types of systems.
- these systems rely upon the computing power of one or more of the chat participant's computing devices to encode some or all of the video for each of the chat participants in a given video chat session.
- the chat participant computing devices are not necessarily capable of performing this function.
- the bandwidth considerations and latency issues related to mobile devices, in addition to the more-limited computing power associated with such devices, make using those devices to perform these functions difficult or impractical.
- a server such as server 500 , is used to perform these functions.
- utilizing a single server to perform the encoding of more than a few (3-5) simultaneous chats involving a few (3-5) chat participants results in that server being overloaded and, in some cases, ceasing to function or becoming otherwise unavailable.
- the raw data associated with multiple video streams alone is very bandwidth and memory intensive.
- a single server has difficulty keeping up with such huge volumes of data, both in network bandwidth and in CPU throughput.
- Adding additional servers is an option, but each additional server costs money to initialize, to maintain and to administer. The costs associated with providing an individual, dedicated server for each set of only a few simultaneous on-going audio-visual chat sessions makes continued allocation of additional servers prohibitive.
- the present system may use servers which incorporate a plurality of GPUs operating simultaneously within each of the servers to provide additional processing power necessary to overcome the single-server limitations regarding a total number of simultaneous chat sessions.
- the GPU's direct access to high-speed memory and the GPUs' capability to simultaneously operate on large chunks of data enable the GPUs to quickly synchronize and encode multiple, simultaneous video components into a plurality of combined and master video streams.
- the CPU, the primary processor for a server, may primarily operate the transcoder controller 590 for a given server 500 and direct the various video streams to their appropriate places, where they may then be operated upon by the GPUs.
- a single server, having at least one GPU, of current, typical capability may handle anywhere from 5-100 simultaneous chats involving three or more individual chat participants and any number of chat viewers. Additional simultaneous chats may be possible under the same system using later server and GPU technology.
- the GPU 550 encoding the video uses lock-free memory, meaning that no single chat participant or chat instance can make any part of the data in memory un-editable. This serves to enable the encoding process to continue operating even if one or more chat participants have high latency or are non-responsive. In addition, incoming new video components are not skipped in the encoding process. So, for example, if additional video data comes in for one chat participant while the remaining chat participants have yet to provide data, the video for the single chat participant is encoded along with the last video component received for the remaining chat participants so as to continue the master video stream advancing, even though only a single participant has provided new data. The GPU does not “lock” any data awaiting input from other chat participants.
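The lock-free, never-wait behavior described above can be sketched as a last-writer-wins frame store: writers overwrite, and the encoder always reads the newest frame per participant without blocking. This is an illustrative sketch (names and data shapes are hypothetical), not the patent's GPU implementation:

```python
# Illustrative sketch of lock-free encoding input: the encoder uses the
# last-known video component for every participant and never waits for
# slow or non-responsive participants.
latest = {}  # participant -> most recently received video component

def on_component_received(participant, frame):
    # Writers simply overwrite; no part of the store is ever locked
    # against the encoder.
    latest[participant] = frame

def encode_next_master_frame():
    # Snapshot whatever is newest for each participant, even if only one
    # participant has sent anything since the previous master frame.
    return dict(latest)

on_component_received("A1", "frame-10")
on_component_received("A2", "frame-7")
on_component_received("A1", "frame-11")   # A1 is faster; A2 has not updated
master = encode_next_master_frame()
```

Here the master frame advances using A 1 's new frame together with A 2 's last-received frame, mirroring the behavior the passage describes.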
- the GPU 550 may utilize blocking of data such that large collections of data are operated upon simultaneously. These collections may be time-allocated, meaning that they are allocated in collections based upon when the video components arrive at the GPU 550 . These collections may be type-allocated, meaning that similar video portions that are received within a reasonable time-frame of one another may be grouped for processing because the GPU 550 can perform similar operations at once on different collections of data.
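The time-allocated and type-allocated grouping described above can be sketched as bucketing incoming portions by arrival window and then by kind, so similar work can be dispatched to the GPU together. All names here (`batch_components`, the portion kinds) are illustrative assumptions:

```python
from collections import defaultdict

def batch_components(components, window_ms=50):
    """Group video portions for batch processing.

    components: list of (arrival_ms, kind, payload) tuples.
    Portions are time-allocated into arrival windows, then type-allocated
    within each window, so the GPU can run the same operation over each
    collection at once.
    """
    batches = defaultdict(list)
    for arrival, kind, payload in components:
        slot = arrival // window_ms             # time-allocated bucket
        batches[(slot, kind)].append(payload)   # type-allocated within it
    return dict(batches)

parts = [(10, "keyframe", "k1"), (30, "delta", "d1"),
         (40, "delta", "d2"), (70, "delta", "d3")]
batches = batch_components(parts)
```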
- FIG. 6 is a functional diagram of a server 600 and a communication system 630 for audio encoding.
- the communication system 630 incorporates all of the elements of the communication system 530 in FIG. 5 .
- the chat instance controller A 610, chat instance controller B 620, the chat instance controller n 630, the transcoder controller 690 and the GPU 650 may serve virtually identical functions to those described above with reference to FIG. 5, except that those systems operate upon audio components and combined or master audio streams rather than video components and streams. Those descriptions will not be repeated here.
- each audio transcoder, such as transcoder A 660, includes an audio receiver for each chat participant in an associated chat instance controller, such as chat instance controller A 610.
- an audio publisher is provided for each chat participant, along with a single viewer publisher used for all chat viewers.
- in chat instance controller A 610 there are three chat participants, A 1 612, A 2 614, and An 616, along with a single chat viewer A 5 618.
- the audio components A 1 622 , A 2 624 , and An 626 are each received in transcoder A 660 at their respective audio receivers A 1 662 , A 2 663 , and An 664 . These are passed to the GPU 650 for encoding.
- when the GPU 650 receives audio components A 1 622, A 2 624, and An 626 from transcoder A 660, the GPU 650 simultaneously encodes n combined audio streams, where n is the total number of chat participants. Each of these individual combined audio streams incorporates all of the audio associated with every chat participant except one. The GPU 650 then returns the n combined audio streams to the respective audio publishers in transcoder A 660, which route the combined audio streams such that the audio component missing from a given chat participant's stream is that participant's own.
- the audio publisher A 1 666 receives the audio stream A 2 +An 652
- the audio publisher A 2 667 receives the audio stream A 1 +An 654
- the audio publisher An 668 receives audio stream A 1 +A 2 656
- the viewer publisher 669 receives a master audio stream A 1 +A 2 +An 658 .
- Audio publisher A 1 666 passes the received audio stream A 2 +An 652 to chat participant A 1 612 .
- Audio publisher A 2 667 passes the received audio stream A 1 +An 654 to chat participant A 2 614 .
- Audio publisher An 668 passes the received audio stream A 1 +A 2 656 to chat participant An 616 .
- Chat viewer A 5 618 receives the master audio stream A 1 +A 2 +An 658 .
- chat viewers receive a master audio stream A 1 +A 2 +An 658 incorporating all audio components, which is also encoded by the GPU and transmitted, through the chat instance controller 610, to all chat viewers, such as chat viewer A 5 618.
- chat viewers receive all audio along with the master video discussed with reference to FIG. 5 .
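The routing described above, where each participant's combined stream omits only their own audio while viewers receive everything, is commonly called a "mix-minus". A minimal sketch, with all names (`build_audio_streams`, the component labels) assumed for illustration:

```python
def build_audio_streams(components):
    """Build n per-participant streams plus one master stream for viewers.

    components: dict mapping participant -> that participant's audio
    component. Each participant's combined stream contains every
    component except their own (n - 1 components); the master stream
    for chat viewers contains all n components.
    """
    participants = list(components)
    per_participant = {
        p: [components[q] for q in participants if q != p]
        for p in participants
    }
    master = [components[p] for p in participants]  # for chat viewers
    return per_participant, master

components = {"A1": "a1", "A2": "a2", "An": "an"}
per_participant, master = build_audio_streams(components)
```

This mirrors the publisher routing above: A 1 receives A 2 +An, A 2 receives A 1 +An, An receives A 1 +A 2, and viewers receive A 1 +A 2 +An.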
- a CPU in a server can quickly be overwhelmed by the encoding of multiple audio components, for multiple chat participants across multiple, simultaneous chats.
- the use of a plurality of GPUs to synchronize and encode the audio for each of the on-going chats enables a single current server to handle 10-80 simultaneous chats. Future servers, incorporating better GPUs, may increase this number dramatically.
- in FIG. 7, a flowchart of a video encoding process is shown.
- FIG. 7 has both a start 705 and an end 795 , but the process is cyclical in nature, particularly with regard to the encoding-related steps.
- One or more of the computing devices identified above may be programmed to carry out these steps, using these steps as their algorithm.
- the process starts at 705 when a chat request is received at 710 .
- This request may take the form of a user-initiated chat, or a pre-scheduled chat. In either case, a request is forwarded to, for example, the communication system 130 ( FIG. 1 ) for forwarding on to the appropriate recipient identified by the request.
- This chat request may take the form of a more traditional chat initiation request asking a potential chat participant that did not initiate the request at 710 to “accept” the incoming chat before the audio and/or video streams begin.
- the audio and/or video components of the initiating, potential chat participant may be provided in place of the more traditional “acceptance” dialogue box.
- the recipient potential chat participant may be able to pre-screen the call before his or her audio and/or video components begin streaming to the initiating, potential chat participant.
- a chat server is identified at 720 .
- This process, which may be undertaken by the cluster control system 140 ( FIG. 1 ), involves the load balancing functionality of the cluster control system 440 ( FIG. 4 ) to ensure that a given server is capable of handling the requested chat. For example, a test may determine that a CPU load is above a pre-determined threshold (e.g. 80%) and require that the cluster control system 440 allocate a new server on the server cluster 450 in order to host the requested chat.
- the cluster control system 440 may allocate a chat instance for the requested chat at 730. This involves creating a unique chat ID and associating a particular chat instance on a server of the server cluster 450 with the chat.
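The server identification and chat instance allocation steps above can be sketched as a threshold check followed by creation of a uniquely identified instance. This is a sketch under assumptions; the function names, the server list shape, and the `allocate_new` callable are illustrative stand-ins for the cluster control system, not APIs from the patent:

```python
import uuid

CPU_LOAD_THRESHOLD = 0.80  # the pre-determined threshold from the text

def identify_chat_server(servers, allocate_new):
    """Pick the first server under the load threshold, else allocate one.

    servers: list of (name, cpu_load) pairs; allocate_new: callable
    standing in for the cluster control system allocating a new server.
    """
    for name, load in servers:
        if load < CPU_LOAD_THRESHOLD:
            return name
    return allocate_new()

def allocate_chat_instance(server):
    """Create a unique chat ID and associate an instance on the server."""
    return {"chat_id": uuid.uuid4().hex, "server": server, "participants": []}

server = identify_chat_server([("s1", 0.92), ("s2", 0.35)], lambda: "s-new")
instance = allocate_chat_instance(server)
```

Here s1 is over the 80% threshold, so the chat instance lands on s2; had every server been loaded, a new one would be allocated.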
- At least two chat participants are then added to the chat instance at 740. Once all of the information involved in the allocation is returned to the chat participants, those participants may be added to the chat instance. Subsequent participants may also be added at 745. For simplification of explanation, this process is shown at this point in the flowchart of FIG. 7; however, additional chat participants may join an on-going chat at any time. In addition, chat participants may leave a chat at any time. In order to simplify the flowchart, the test for additional chat participants, shown at 745, is not shown after each step of FIG. 7.
- Each of the computing devices, such as the mobile device 110 or client system 120, associated with a chat participant begins providing video components, along with a unique user ID and chat ID, to the allocated server at 750. This video is for inclusion in the overall on-going chat.
- the server then transfers those video components to one of a plurality of GPUs operating on the server for encoding.
- the GPU or GPUs encode the individual video components into a master video stream at 770 .
- this process may, for example, involve encoding the video into a single master video stream with each of the chat participant's real-time images superimposed over a background or "tiled" across a screen, either as desired by one or more of the chat participants or as set by default, depending on the number of chat participants.
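The default "tiled" layout mentioned above can be sketched as a grid computed from the participant count. This is an illustrative sketch only; the function name, frame dimensions, and grid heuristic are assumptions, not the patent's layout logic:

```python
import math

def tile_layout(n_participants, width=1280, height=720):
    """Compute (x, y, w, h) tiles for n participants in a near-square grid."""
    cols = math.ceil(math.sqrt(n_participants))
    rows = math.ceil(n_participants / cols)
    tile_w, tile_h = width // cols, height // rows
    return [((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
            for i in range(n_participants)]

# Three participants yield a 2x2 grid with one empty cell.
layout = tile_layout(3)
```

Each participant's decoded frame would then be scaled into its tile before the GPU encodes the composite as the single master stream.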
- the server then returns the master video stream to all chat participants and, to the extent there are any, to all chat viewers at 780 .
- the chat participants and chat viewers, at that point, can see the real-time master video stream.
- a determination of whether the chat is complete is made at 785. If the chat is complete (all participants have left the chat), the process ends at 795. If the chat is not complete, then a determination is made whether there are additional chat participants or viewers at 745 and the process continues. Thus, the processing described from steps 740 - 780 continues for each frame of video for each of the chat participants until the chat is complete at 785.
- in FIG. 8, a flowchart of an audio encoding process is shown.
- FIG. 8 has both a start 805 and an end 895 , but the process is cyclical in nature, particularly with regard to the encoding-related steps.
- One or more of the computing devices identified above may be programmed to carry out these steps, using these steps as their algorithm.
- Steps 810 - 845 mirror steps 710 - 745 described above. Their description will not be repeated here.
- a plurality of audio components are received from each chat participant at 850 . These audio components are transferred to one or more GPUs at 860 .
- chats may be viewed by any number of chat viewers in addition to the chat participants. In some cases these numbers may be small, such as a select team of programmers viewing a meeting among supervisors. In other cases, a small number of chat participants may elect to broadcast their chat to a huge number of viewers, much like a live-streamed television program.
- the GPU operates to encode the audio components into n combined audio streams, where n is the number of chat participants.
- Each of the n combined audio streams will include n ⁇ 1 audio components.
- the combined audio stream returned to each of the chat participants will not include that chat participant's audio component.
- the GPU operates to encode the audio components into n+1 combined audio streams, where n is the number of chat participants. As discussed above, this is one stream, of the n, for each of the chat participants and one master stream (the +1) for all chat viewers.
- Each of the n combined audio streams includes audio components for n ⁇ 1 of the chat participants as discussed above.
- the master audio stream includes all audio components.
- the n combined audio streams are provided to each of the chat participants and the master audio stream is provided to all chat viewers.
- If the chat is complete at 892 (meaning that all chat participants have left the chat), the process ends at 895. If the chat is not complete at 892, then a determination is made whether there are additional chat participants or viewers at 845. Then, the process proceeds as described above with respect to elements 840 - 892 until a chat is complete at 892, when the process ends.
- “plurality” means two or more. As used herein, a “set” of items may include one or more of such items.
- the terms “comprising”, “including”, “carrying”, “having”, “containing”, “involving”, and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of”, respectively, are closed or semi-closed transitional phrases with respect to claims.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Computer Security & Cryptography (AREA)
- Information Transfer Between Computers (AREA)
- Telephonic Communication Services (AREA)
Abstract
Description
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/140,839 US9270937B2 (en) | 2013-12-26 | 2013-12-26 | Real time stream provisioning infrastructure |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/140,839 US9270937B2 (en) | 2013-12-26 | 2013-12-26 | Real time stream provisioning infrastructure |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150189234A1 US20150189234A1 (en) | 2015-07-02 |
US9270937B2 true US9270937B2 (en) | 2016-02-23 |
Family
ID=53483404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/140,839 Expired - Fee Related US9270937B2 (en) | 2013-12-26 | 2013-12-26 | Real time stream provisioning infrastructure |
Country Status (1)
Country | Link |
---|---|
US (1) | US9270937B2 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103841353B (en) * | 2014-02-24 | 2017-08-01 | 广州华多网络科技有限公司 | Video interactive method, terminal, server and system |
US10554713B2 (en) * | 2015-06-19 | 2020-02-04 | Microsoft Technology Licensing, Llc | Low latency application streaming using temporal frame transformation |
JP2017069744A (en) * | 2015-09-30 | 2017-04-06 | 沖電気工業株式会社 | Communication control device, program, recording medium, and communication control method |
US10764343B2 (en) | 2015-12-28 | 2020-09-01 | Google Llc | Methods, systems, and media for navigating through a stream of content items |
US20180007115A1 (en) * | 2016-07-01 | 2018-01-04 | Cisco Technology, Inc. | Fog enabled telemetry embedded in real time multimedia applications |
CN108696364B (en) * | 2017-04-06 | 2020-10-16 | 北京云中融信网络科技有限公司 | Request message processing method, chat room message server and chat room system |
CN112672100B (en) * | 2021-03-16 | 2021-07-09 | 浙江华创视讯科技有限公司 | Multi-display-card data cooperative processing method, video conference system and cloud server |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080010347A1 (en) * | 2006-05-02 | 2008-01-10 | Dan Houghton | Group communication system and method |
US20090222572A1 (en) | 2006-05-02 | 2009-09-03 | Sony Computer Entertainment Inc. | Communication system, communication apparatus, communication program, and computer-readable storage medium stored with the communication program |
US20100234002A1 (en) | 2006-05-22 | 2010-09-16 | Afrigis (Pty) Ltd. | Information distribution system and method for a mobile network |
US7904537B2 (en) | 2008-01-11 | 2011-03-08 | Microsoft Corporation | Architecture for online communal and connected experiences |
US20120082226A1 (en) | 2010-10-04 | 2012-04-05 | Emmanuel Weber | Systems and methods for error resilient scheme for low latency h.264 video coding |
US20120127262A1 (en) * | 2010-11-24 | 2012-05-24 | Cisco Technology, Inc. | Automatic Layout and Speaker Selection in a Continuous Presence Video Conference |
US20130123019A1 (en) | 2002-12-10 | 2013-05-16 | David R. Sullivan | System and method for managing audio and video channels for video game players and spectators |
US20130147906A1 (en) | 2011-12-07 | 2013-06-13 | Reginald Weiser | Systems and methods for offloading video processing of a video conference |
US8482593B2 (en) | 2010-05-12 | 2013-07-09 | Blue Jeans Network, Inc. | Systems and methods for scalable composition of media streams for real-time multimedia communication |
US20130218783A1 (en) * | 2012-02-21 | 2013-08-22 | Digital Manufacturing, Inc. | Apparatus and method for real-time data capture and usage for fault repair |
US8529356B2 (en) | 2010-08-26 | 2013-09-10 | Steelseries Aps | Apparatus and method for adapting audio signals |
US20130265378A1 (en) * | 2010-04-07 | 2013-10-10 | Apple Inc. | Switching Cameras During a Video Conference of a Multi-Camera Mobile Device |
US20140149522A1 (en) * | 2012-11-27 | 2014-05-29 | Nhn Corporation | System and method for online fan meeting |
US9001178B1 (en) * | 2012-01-27 | 2015-04-07 | Google Inc. | Multimedia conference broadcast system |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130123019A1 (en) | 2002-12-10 | 2013-05-16 | David R. Sullivan | System and method for managing audio and video channels for video game players and spectators |
US20090222572A1 (en) | 2006-05-02 | 2009-09-03 | Sony Computer Entertainment Inc. | Communication system, communication apparatus, communication program, and computer-readable storage medium stored with the communication program |
US20080010347A1 (en) * | 2006-05-02 | 2008-01-10 | Dan Houghton | Group communication system and method |
US20100234002A1 (en) | 2006-05-22 | 2010-09-16 | Afrigis (Pty) Ltd. | Information distribution system and method for a mobile network |
US7904537B2 (en) | 2008-01-11 | 2011-03-08 | Microsoft Corporation | Architecture for online communal and connected experiences |
US20130265378A1 (en) * | 2010-04-07 | 2013-10-10 | Apple Inc. | Switching Cameras During a Video Conference of a Multi-Camera Mobile Device |
US8482593B2 (en) | 2010-05-12 | 2013-07-09 | Blue Jeans Network, Inc. | Systems and methods for scalable composition of media streams for real-time multimedia communication |
US8514263B2 (en) | 2010-05-12 | 2013-08-20 | Blue Jeans Network, Inc. | Systems and methods for scalable distributed global infrastructure for real-time multimedia communication |
US8529356B2 (en) | 2010-08-26 | 2013-09-10 | Steelseries Aps | Apparatus and method for adapting audio signals |
US20120082226A1 (en) | 2010-10-04 | 2012-04-05 | Emmanuel Weber | Systems and methods for error resilient scheme for low latency h.264 video coding |
US20120127262A1 (en) * | 2010-11-24 | 2012-05-24 | Cisco Technology, Inc. | Automatic Layout and Speaker Selection in a Continuous Presence Video Conference |
US20130147906A1 (en) | 2011-12-07 | 2013-06-13 | Reginald Weiser | Systems and methods for offloading video processing of a video conference |
US9001178B1 (en) * | 2012-01-27 | 2015-04-07 | Google Inc. | Multimedia conference broadcast system |
US20130218783A1 (en) * | 2012-02-21 | 2013-08-22 | Digital Manufacturing, Inc. | Apparatus and method for real-time data capture and usage for fault repair |
US20140149522A1 (en) * | 2012-11-27 | 2014-05-29 | Nhn Corporation | System and method for online fan meeting |
Non-Patent Citations (1)
Title |
---|
UBM LLC, Cloud-Based Video Conferencing: a Flexible Approach to Face-to-Face Communication, Technology Brief, Mar. 2013, total of 5 pages. |
Also Published As
Publication number | Publication date |
---|---|
US20150189234A1 (en) | 2015-07-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20150188928A1 (en) | Private-public chat functionality | |
US9270937B2 (en) | Real time stream provisioning infrastructure | |
US9288441B2 (en) | Distributed transcoding of a video based on insufficient computing resources | |
US11153533B2 (en) | System and method for scalable media switching conferencing | |
US10356365B2 (en) | Framework to support a hybrid of meshed endpoints with non-meshed endpoints | |
US9900553B2 (en) | Multi-stream video switching with selective optimized composite | |
US8966095B2 (en) | Negotiate multi-stream continuous presence | |
US20100185956A1 (en) | Signaling support for sharer switching in application sharing | |
US9560096B2 (en) | Local media rendering | |
US9264662B2 (en) | Chat preauthorization | |
US10523730B2 (en) | Real-time transport protocol (RTP) media conference server routing engine | |
CN108063911B (en) | Video conference capacity expansion method | |
US9413540B2 (en) | Combining P2P and server-based conferencing | |
CN104348700A (en) | Method and system for sending microblog | |
WO2016045496A1 (en) | Media control method and device | |
CN115695387B (en) | Audio and video conference implementation method, audio and video conference system and related devices | |
CN119011551A (en) | Media data web page real-time communication method, system, electronic equipment and storage medium | |
Zhiyuan et al. | A Cloud-Based Pan-Terminal Video Conferencing System |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: ONCAM, INC., ARIZONA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SHAPIRO, JOE;REEL/FRAME:031849/0442 Effective date: 20131220 |
|
AS | Assignment |
Owner name: ONCAM, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SHAPIRO, JOE;REEL/FRAME:032005/0021 Effective date: 20131220 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: SOCAL IP LAW GROUP LLP, CALIFORNIA Free format text: LIEN;ASSIGNOR:ONCAM INC.;REEL/FRAME:041461/0897 Effective date: 20170117 |
|
AS | Assignment |
Owner name: ONCAM, INC., ARIZONA Free format text: RELEASE OF LIEN & SECURITY INTEREST;ASSIGNOR:SOCAL IP LAW GROUP, LLP;REEL/FRAME:048376/0771 Effective date: 20190215 |
|
AS | Assignment |
Owner name: TONN INVESTMENTS, LLC, ARIZONA Free format text: COURT ORDER;ASSIGNOR:ONCAM, INC.;REEL/FRAME:048832/0557 Effective date: 20180731 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20200223 |