CN118509423B

CN118509423B - A method for transmitting audio and video in fusion communication

Info

Publication number: CN118509423B
Application number: CN202410961307.1A
Authority: CN
Inventors: 孙辉
Original assignee: Beijing Nelda Technology Co ltd
Current assignee: Beijing Nelda Technology Co ltd
Priority date: 2024-07-18
Filing date: 2024-07-18
Publication date: 2024-09-13
Anticipated expiration: 2044-07-18
Also published as: CN118509423A

Abstract

A method for transmitting audio and video in converged communication, the steps of which include: a server collects device information of a terminal, and selects a coding standard with the smallest bandwidth usage as an adaptation coding standard corresponding to the terminal. The server dynamically allocates the bandwidth of the terminal based on congestion control technology, and pushes the corresponding stream of the adaptation coding standard according to the request information of the terminal. The server also collects user tags and performs a weighted evaluation on the terminal; when the server detects network congestion, the media service of the terminal is suspended based on the result of the weighted evaluation; when the correlation between the content of the media service and the user tag exceeds a threshold, the server performs a secondary weighted evaluation on the terminal, and suspends or resumes the media service of the terminal. The present invention maximizes the use of network bandwidth through the server's same-source multi-stream push method, combined with congestion control technology, to provide users of each participating terminal with a high-quality audio and video communication experience, reduce hardware investment in a specific communication system, and avoid waste of resources.

Description

A method for transmitting audio and video in converged communication

Technical Field

The present invention belongs to the field of communication technology, and in particular relates to a method for transmitting audio and video in converged communication.

Background Art

Converged communication technology is now widely used in many fields. Its main form is video conferencing: synchronized audio and video transmission lets participants join online meetings in real time from their own terminal devices. In particular, when an emergency occurs, a converged communication command and dispatch platform coordinates command between the front line and the rear, coordinates service deployment, and quickly starts a hybrid audio and video conference so that all personnel can share event information.

Converged communication is highly time-sensitive, and the network environment cannot be well guaranteed during emergencies. Optimizing transmission in the specific implementation therefore plays a crucial role in both dispatch effectiveness and cost.

At present, the audio and video transmission optimization schemes commonly found on the market rely on the optimizations built into the Web Real-Time Communication protocol (hereinafter: WebRTC), and have the following disadvantages:

(1) they cannot use network bandwidth effectively or resolve packet loss and delay on weak networks;

(2) they cannot adapt effectively to the decoding requirements of each terminal, which increases infrastructure investment and wastes resources;

(3) when network congestion occurs, they cannot adjust network bandwidth effectively, which wastes network resources and degrades meeting quality.

A new scheme for converged communication audio and video transmission is therefore needed in order to maximize the use of network bandwidth and provide a high-quality audio and video communication experience.

Summary of the Invention

The present invention provides a method for transmitting audio and video in converged communication to solve the above problems.

The present invention adopts the following technical solution:

A method for transmitting audio and video in converged communication, the main steps of which include:

A server establishes connections with multiple terminals, and the server collects device information of the multiple terminals;

The server lists, based on the device information, the coding standards supported by each of the multiple terminals, and for each terminal selects the supported coding standard with the smallest bandwidth share as that terminal's adapted coding standard;

The server dynamically allocates the bandwidth of the multiple terminals based on a congestion control technology;

The server pushes streams in the corresponding adapted coding standard to the multiple terminals according to the request information of the multiple terminals and the bandwidth of the multiple terminals;

The server collects a user tag of each user of each of the multiple terminals and extracts feature information from the user tag; the server calculates a weight of the feature information;

The server performs a weighted evaluation on each of the multiple terminals based on the weight;

When the server detects network congestion, the server suspends the media service of at least one of the multiple terminals based on the result of the weighted evaluation;

The weighted evaluation further includes a dynamic weight; while the server detects network congestion, if the correlation between the content of the media service pushed by the server and the user tag exceeds a threshold, the server performs a secondary weighted evaluation on each of the multiple terminals based on the dynamic weight, and suspends or resumes the media service of at least one of the multiple terminals based on the result of the secondary weighted evaluation.

Optionally, the weight is obtained by the analytic hierarchy process (AHP), and the step of the server calculating the weight includes:

Determining a hierarchical structure model for the weight calculation;

Establishing an importance judgment matrix for the feature information;

Calculating the importance of the feature information by the eigenvector method, and normalizing the resulting values to obtain the weight.

Optionally, the step of calculating the correlation between the content of the media service and the user tag includes:

The server extracts information from the content of the media service in real time using a large language model, and calculates the correlation between the extraction result and the user tag of each user of each of the multiple terminals to obtain a calculation result;

The server compares the calculation result with the threshold.

Optionally, the server and the multiple terminals are connected via a public network or a private network;

Data transmission between the server and the multiple terminals is based on WebRTC;

The congestion control technology is the WebRTC congestion control algorithm based on delay gradient and packet loss rate.

Optionally, the step of the server collecting device information of the multiple terminals includes:

WebRTC is configured with the Session Description Protocol (SDP), and the SDP text includes the device information;

The server collects the SDP text and obtains the device information from it;

The device information includes type information, operating system information, and firmware information of the multiple terminals.

Optionally, the coding standards include VP8, VP9, H.264, and H.265;

When at least one of the multiple terminals supports AV1 decoding, the coding standards also include AV1.

Optionally, the result of the weighted evaluation is a weighted value for each of the multiple terminals, and the multiple terminals are arranged in ascending order of weighted value;

When network congestion is detected, the media services of the corresponding terminals are suspended one by one in ascending order of weighted value until the network congestion ends;

When the end of network congestion is detected, the server resumes the media services of the suspended terminals in reverse order.

Optionally, the feature information of the user tag includes multiple titles; the user of each of the multiple terminals selects or annotates the titles.

Optionally, the feature information of the user tag further includes multiple preset options, and at least some of the titles are each provided with at least one preset option;

Each preset option has a corresponding score, and the score is used in the weighted evaluation.

Optionally, each title is configured with an importance coefficient, and the coefficient is used to calculate the weight.

Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:

Through same-source multi-stream pushing by the server, combined with congestion control technology, the present invention maximizes the use of network bandwidth, provides users of each participating terminal with a high-quality audio and video communication experience, reduces hardware investment in a specific communication system, and avoids wasting resources.

Brief Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the system of the present invention;

FIG. 2 is a flow chart of stream pushing in the present invention;

FIG. 3 is a flow chart of the weighted evaluation of the present invention.

Detailed Description

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be defined or explained again in subsequent drawings.

It should also be noted that, unless otherwise specified, the methods used in the present invention are conventional methods, and the raw materials and devices used are conventional commercially available products.

The present invention provides a method for transmitting audio and video in converged communication, the main steps of which include:

First, the server and the terminals in this embodiment communicate over WebRTC (Web Real-Time Communication, a technology that lets web browsers conduct real-time voice calls or video chats). Although WebRTC supports point-to-point communication, given the particular application scenario this embodiment still deploys servers in the communication network, namely a signaling server and a media server (the signaling service and the media service may also be integrated in a single server, as shown in FIG. 1, with other servers acting as redundant backups). These provide the signaling service and the media service respectively, and the two services are linked through event notifications. The media service is mainly responsible for processing and transmitting the actual media content, such as audio, video, and images. In a video call, the media service encodes, transmits, and decodes the video and audio data of both parties so that users can see and hear each other clearly and smoothly. The event notification service delivers information about specific events to the signaling service; such events may be changes in system state, results of user operations, the occurrence of abnormal conditions, and so on. The signaling service controls and coordinates communication: it establishes, manages, and terminates communication sessions and carries control information during a session. Each terminal, such as terminal 1, terminal 2, terminal 3, and terminal 4, is provided with a display device and corresponding terminal equipment, specifically data acquisition devices such as cameras and microphones, to enable real-time, online, two-way mixed audio and video communication.

The server collects the device information of the terminals. Optionally, during the media negotiation phase the server collects the Session Description Protocol text used by WebRTC (SDP, Unified Plan format), which includes the device information. Specifically, the developer defines the device information requirements for the Unified Plan text, specifying which device information it must contain, and uses the APIs provided by WebRTC to obtain that information. For example, navigator.mediaDevices can be used to obtain the list of media devices, from which relevant information such as the graphics card is obtained, and the corresponding Unified Plan text is constructed. The server collects the Unified Plan text and obtains the device information from it. Optionally, the device information includes the terminal's type information, operating system information, and firmware information: the type may be tablet, mobile phone, laptop, desktop, dedicated device, etc.; the operating system may be Android, iOS, Mac OS, Windows, Linux, etc.; and the firmware information is the corresponding version number of each system.
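The extraction of device information from the SDP text can be sketched as follows. This is a minimal illustration only: the source does not say how the device information is encoded in the Unified Plan text, so the attribute name a=x-device-info and its key=value layout are assumptions made for this example, as are the names DeviceInfo and parse_device_info.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical SDP attribute carrying device info; the source names no attribute.
DEVICE_ATTR = "a=x-device-info:"


@dataclass(frozen=True)
class DeviceInfo:
    device_type: str
    os: str
    firmware: str


def parse_device_info(sdp_text: str) -> Optional[DeviceInfo]:
    """Return the device info found in the SDP text, or None if it is absent."""
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith(DEVICE_ATTR):
            continue
        # Assumed layout: semicolon-separated key=value pairs.
        fields = dict(
            pair.split("=", 1)
            for pair in line[len(DEVICE_ATTR):].split(";")
            if "=" in pair
        )
        return DeviceInfo(
            device_type=fields.get("type", "unknown"),
            os=fields.get("os", "unknown"),
            firmware=fields.get("firmware", "unknown"),
        )
    return None


# Illustrative SDP fragment for a dedicated Android 8 tablet.
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 127.0.0.1",
    "s=-",
    "t=0 0",
    "a=x-device-info:type=tablet;os=Android;firmware=8",
])

info = parse_device_info(EXAMPLE_SDP)
```

Returning None when the attribute is missing lets the caller fall back to another way of learning the terminal's capabilities, such as querying the terminal directly.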

Based on the device information, the server lists the coding standards supported by each terminal and, for each terminal, selects the coding standard in its list with the smallest bandwidth share as that terminal's adapted coding standard. Optionally, the coding standards include VP8, VP9, H.264, and H.265; when the server finds from the Unified Plan text that a terminal has a graphics card and supports AV1 decoding, the coding standards also include AV1. H.265 is High Efficiency Video Coding (HEVC, MPEG-H Part 2), a video compression standard that, compared with AVC (H.264), offers roughly twice the data compression ratio at the same video quality, or significantly better video quality at the same bit rate. At the same resolution, H.265 uses about 39%-44% less bandwidth than H.264, and AV1 uses about 50% less bandwidth than H.264.

For example, the server detects that terminal 1 and terminal 4 are both dedicated Android tablets with a version number below 10 (Android 8) that support H.264 and H.265. The bandwidth share is estimated from the usage scenario and conditions (video resolution, frame rate, bit rate, etc.), and the coding standards in the lists of terminal 1 and terminal 4 are, in ascending order of bandwidth share, [H.265], [H.264]. The server therefore selects H.265 as the adapted coding standard for terminal 1 and terminal 4 and pushes streams in it. Similarly, if the server detects that terminal 2 and terminal 3 are web clients on dedicated devices that support only H.264 and other coding standards, of which H.264 has the smallest bandwidth share, it selects H.264 as the adapted coding standard for terminal 2 and terminal 3 and pushes streams in it. Likewise, if the server detects that terminal z (not shown in the figure) is a Windows desktop whose browser and hardware support AV1 decoding, the bandwidth share estimated from the usage scenario and conditions (video resolution, frame rate, bit rate, etc.) gives the list, in ascending order, [AV1], [H.265], [VP9], [VP8], [H.264]; the server therefore selects AV1 as the adapted coding standard for terminal z and pushes streams in it. The same result can also be reached by querying: as shown in FIG. 2, when a terminal subscribes to a stream, the server queries the terminal or collects the Unified Plan text, checks the coding standards in ascending order of bandwidth share (e.g. H.265, then H.264), selects the adapted coding standard, and pushes the stream; the terminal only needs to receive it. No terminal has to transcode, which saves network bandwidth and reduces the load on the terminals.
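The per-terminal selection in this example can be sketched as a lookup against a bandwidth ranking. The ranking below simply reuses the ascending order given for terminal z; in the method itself the bandwidth share is estimated per scenario, so the fixed table BANDWIDTH_RANK and the function name select_adapted_codec are assumptions for illustration.

```python
# Rank of each coding standard by estimated bandwidth share (0 = smallest).
# The order follows the terminal z example; a real estimate would depend on
# resolution, frame rate, and bit rate.
BANDWIDTH_RANK = {"AV1": 0, "H.265": 1, "VP9": 2, "VP8": 3, "H.264": 4}


def select_adapted_codec(supported):
    """Pick the supported coding standard with the smallest bandwidth share."""
    known = [codec for codec in supported if codec in BANDWIDTH_RANK]
    if not known:
        raise ValueError("terminal supports no known coding standard")
    return min(known, key=BANDWIDTH_RANK.__getitem__)


# Supported lists as described for the example terminals.
TERMINALS = {
    "terminal 1": ["H.264", "H.265"],
    "terminal 2": ["H.264"],
    "terminal z": ["H.264", "VP8", "VP9", "H.265", "AV1"],
}

adapted = {name: select_adapted_codec(codecs) for name, codecs in TERMINALS.items()}
```

With these inputs the sketch picks H.265 for terminal 1, H.264 for terminal 2, and AV1 for terminal z, matching the choices described above.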

Before pushing a stream, the server also needs to confirm that the terminal has been allocated enough bandwidth for the corresponding service. Therefore, at the start of and throughout the communication, the server dynamically allocates the terminals' bandwidth in real time based on congestion control technology: it monitors network congestion and adjusts the data sending rate according to the degree of congestion so as to avoid congestion. Specifically, the congestion control technology in this embodiment is the congestion control algorithm based on delay gradient and packet loss rate, namely GCC (Google Congestion Control). Further, in this embodiment, as shown in FIG. 2, when the packet loss rate exceeds 25% the server determines that the converged communication audio and video transmission network currently established is congested, and the server then refuses media service to terminals newly requesting to join, or suspends the media service of at least some terminals. The media service comprises a stream receiving service and a stream pushing service, referred to below as receiving and pushing. Because network congestion may be uplink or downlink congestion, how a terminal's media service is suspended in practice depends on the congestion: receiving or pushing may be suspended for some terminals, or both, so that the bandwidth usage of the whole meeting is adjusted to meet actual needs.
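The congestion decision and rate adjustment can be sketched as below. Only the 25% loss threshold comes from this embodiment. The rate update follows the loss-based controller described in the GCC Internet-Draft (raise the rate by 5% when loss is under 2%, hold it between 2% and 10%, reduce it by half the loss fraction above 10%); the delay-gradient part of GCC is omitted, and all function names are assumptions for illustration.

```python
CONGESTION_LOSS_THRESHOLD = 0.25  # embodiment: loss above 25% counts as congestion


def is_congested(packet_loss: float) -> bool:
    """Apply the embodiment's packet-loss threshold."""
    return packet_loss > CONGESTION_LOSS_THRESHOLD


def admit_new_terminal(packet_loss: float) -> bool:
    """New media service requests are refused while the network is congested."""
    return not is_congested(packet_loss)


def loss_based_rate(prev_rate_bps: float, packet_loss: float) -> float:
    """Loss-based send-rate update as described in the GCC Internet-Draft."""
    if packet_loss > 0.10:
        return prev_rate_bps * (1.0 - 0.5 * packet_loss)
    if packet_loss < 0.02:
        return prev_rate_bps * 1.05
    return prev_rate_bps


# A terminal sending at 1 Mbit/s that reports 20% loss is slowed down,
# but the conference is not yet treated as congested.
new_rate = loss_based_rate(1_000_000, 0.20)
congested = is_congested(0.20)
```

Keeping the rate update separate from the congestion flag mirrors the text: rate adaptation runs continuously, while refusing or suspending media service is reserved for loss above the threshold.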

When the server detects network congestion (as described above, a packet loss rate above 25%), it suspends the media services of the terminals one by one in ranked order, starting from the front of the ranking, until the network congestion ends;

When the end of network congestion is detected, the server resumes the media services of the suspended terminals in reverse order. Reverse order means the opposite of the order in which the terminals were suspended: starting from the most recently suspended terminal and working backwards in time, stopping when network congestion is detected again or when all terminals have resumed media service.
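The suspend-then-resume behaviour amounts to a stack: terminals are suspended from the front of the ranking and resumed last-in, first-out. A minimal sketch, in which the class name SuspensionController and its methods are hypothetical:

```python
class SuspensionController:
    """Suspend terminals in ranking order and resume them in reverse order."""

    def __init__(self, ranked_terminals):
        # The first terminal in the ranking is the first to be suspended.
        self._active = list(ranked_terminals)
        self._suspended = []  # stack, most recently suspended terminal on top

    def on_congested(self):
        """Suspend the next terminal in the ranking; return its id or None."""
        if not self._active:
            return None
        terminal = self._active.pop(0)
        self._suspended.append(terminal)
        return terminal

    def on_clear(self):
        """Resume the most recently suspended terminal; return its id or None."""
        if not self._suspended:
            return None
        terminal = self._suspended.pop()
        self._active.insert(0, terminal)  # restore its place in the ranking
        return terminal


controller = SuspensionController(["t3", "t2", "t1"])
suspended = [controller.on_congested(), controller.on_congested()]
resumed = [controller.on_clear(), controller.on_clear(), controller.on_clear()]
```

The server would call on_congested on each check that still finds congestion and on_clear on each check that finds none, so resumption stops naturally if congestion reappears.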

This should be understood as a dynamic process: when bandwidth does not match the needs of the video conference, the server adjusts in real time according to the detected network congestion. This avoids leaving bandwidth idle beyond the reserved bandwidth when network conditions improve or a terminal leaves voluntarily, and avoids terminals that should be suspended continuing to occupy bandwidth when bandwidth is insufficient.

Suspending a terminal's media service requires a corresponding strategy. The conventional approach of suspending terminals that joined the meeting later or that have a poor network environment suits only general media services, not special or emergency situations. In some special meetings in particular, a worse network environment indicates a more complex scene that needs meeting support, and some participants must stay online. The method of this embodiment therefore also includes a step in which the server performs a weighted evaluation on each terminal and suspends terminal services according to the result.

Specifically, the server collects the user tag of each user of each terminal and extracts feature information from the user tag; the server calculates the weight of the feature information. The weight is obtained by the analytic hierarchy process (AHP), and the server's calculation steps are:

Determine the hierarchical structure model for the weight calculation;

Establish the importance judgment matrix for the feature information;

Calculate the importance of the feature information by the eigenvector method and normalize the resulting values to obtain the weight;

The server performs a weighted evaluation on each terminal based on the weight;

When the server detects network congestion, the server suspends the media service of at least one of the multiple terminals based on the result of the weighted evaluation.

An approach that conserves system computing resources is as follows:

S1. When a terminal first connects to the server, the server collects the user tags of each terminal. In this embodiment, the feature information of a user tag is the tag's title and the corresponding options. At least some user tag titles and their options are assembled by the conference initiator at the conference creation stage from system presets, suggestions, or custom entries, such as [Position], [Willingness to participate], [Emergency], and [Special mark]. Each title has multiple options (different titles have different options, such as "Yes", "No", "Senior", "Intermediate", "Assistant", "Supervisor"), and the conference initiator can rate the importance of each title. The user selects options for these user tags at the terminal (each option has a built-in score), and the terminal reports them when it connects to the server.

S2. The server allocates part of its computing resources to calculate the weights and perform the weighted evaluation. The weights are obtained by the analytic hierarchy process, as in the following example:

S21. Determine the hierarchical structure model for the weight calculation, which can be divided in the conventional way into a goal layer, a criterion layer, and an alternative layer;

S22. Establish the importance judgment matrix (pairwise comparison matrix) for the feature information;

Suppose that, for the criterion of importance, the user tags include [Position] and [Special mark]. Comparing them pairwise gives the following judgment matrix (Table 1):

Importance      Position    Special mark
Position        1           3
Special mark    1/3         1

Table 1

S23. Calculate the weights;

The importance of the feature information is calculated by the eigenvector method, where the importance coefficients are the manually set values above: 1 for [Position] and 3 for [Special mark];

Calculate the product of each row of the matrix:

[Position] row product = 1 × 3 = 3;

[Special mark] row product = (1/3) × 1 = 1/3;

Calculate the n-th root of each row product (n is the order of the matrix, 2 in this embodiment);

n-th root for [Position] = 3^(1/2) ≈ 1.732;

n-th root for [Special mark] = (1/3)^(1/2) ≈ 0.577;

Normalize the importance values calculated for the titles to obtain the weight of each title:

Weight of [Position] = 1.732/(1.732+0.577) ≈ 0.75;

Weight of [Special mark] = 0.577/(1.732+0.577) ≈ 0.25;
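The arithmetic of step S23 can be reproduced with a few lines of code. The steps as written (row product, n-th root, normalization) are the geometric-mean form of the AHP weight calculation, which gives the same result as the normalized principal eigenvector when the judgment matrix is consistent, as any 2 × 2 reciprocal matrix is. The function name ahp_weights is an assumption for illustration.

```python
import math


def ahp_weights(matrix):
    """Row product, n-th root, then normalization, as in step S23."""
    n = len(matrix)
    roots = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(roots)
    return [root / total for root in roots]


# Judgment matrix from Table 1: rows and columns are [Position], [Special mark].
JUDGMENT = [
    [1.0, 3.0],
    [1.0 / 3.0, 1.0],
]

position_weight, special_mark_weight = ahp_weights(JUDGMENT)
```

For Table 1 this yields weights of about 0.75 and 0.25, in line with the hand calculation above. For larger matrices a consistency check on the judgments would normally accompany this step; it is left out here because the source does not describe one.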

S24. Weighted evaluation;

Suppose the terminals include terminal z₁ and terminal z₂. User A of terminal z₁ has an option score of 0.8 under the [Position] title and 0.6 under the [Special mark] title; user B of terminal z₂ has an option score of 0.6 under the [Position] title and 0.8 under the [Special mark] title. Because each terminal corresponds to a user, the result of a terminal's weighted evaluation is the result of its user's weighted evaluation, i.e. the weighted value.

Weighted evaluation result for user A (weighted value of terminal z₁) = 0.8 × 0.75 + 0.6 × 0.25 = 0.75;

Weighted evaluation result for user B (weighted value of terminal z₂) = 0.6 × 0.75 + 0.8 × 0.25 = 0.65;
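The weighted evaluation of step S24 is a weighted sum of option scores. The sketch below recomputes the two weighted values and then orders the terminals for suspension, assuming the optional scheme in which terminals are ranked by ascending weighted value so that the lowest value is suspended first. The names weighted_value and suspend_order are hypothetical.

```python
# Title weights from the AHP example.
WEIGHTS = {"Position": 0.75, "Special mark": 0.25}


def weighted_value(scores, weights=WEIGHTS):
    """Weighted sum of a user's option scores; missing titles count as 0."""
    return sum(weight * scores.get(title, 0.0) for title, weight in weights.items())


# Option scores reported by the two example terminals.
SCORES = {
    "z1": {"Position": 0.8, "Special mark": 0.6},
    "z2": {"Position": 0.6, "Special mark": 0.8},
}

values = {terminal: weighted_value(scores) for terminal, scores in SCORES.items()}

# Ascending weighted value: the first entry is suspended first under congestion.
suspend_order = sorted(values, key=values.get)
```

With these scores terminal z₂ (0.65) precedes terminal z₁ (0.75), so z₂ would be the first to lose media service if congestion were detected.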

Optionally, dynamic weighting can be introduced into the above weighted evaluation by providing a dynamic weight. As shown in FIG. 3, once the server has detected network congestion, it checks in real time whether the correlation between the content of the current media service and a user tag exceeds a threshold. When it does, the server performs a secondary weighted evaluation on the terminal based on the dynamic weight and, based on the result, adjusts the bandwidth of each terminal to suspend or resume the media service of some terminals. Specifically, a weight item (such as [Conference Relevance]) is preset in the media service, and its weight is calculated together with those of the other titles according to the steps above. A large language model is also deployed on the server, or an existing large language model is called.

Further, the server uses the large language model to extract information from the content of the media service in real time, calculates the correlation between the extraction result and the user tags of each terminal, and compares the calculation result with the threshold. Preferably, to save system computing resources, the conference process materials are prepared in advance: during the media service creation stage (i.e., the conference creation stage), the process materials are converted into text and input into the large language model. After communication starts, the content of the media service (conference audio converted into text) is continuously input into the large language model in real time through the attention mechanism. Meanwhile, the server periodically asks the large language model, through its question-and-answer mode, which process the media service is currently in. If the correlation between the reported process (which can be understood as the process title) and a user tag of some terminal exceeds the threshold, the server dynamically weights that terminal or all terminals, forming a secondary weighted evaluation on top of the original weighted evaluation. In this embodiment, the specific operation is to assign a score to the [Conference Relevance] title of that terminal, which increases its secondary weighted evaluation result; before this, every terminal is assigned 0 for this item, so it does not count toward the initial weighted evaluation.

Furthermore, the exit mechanism for secondary weighting can be determined according to actual conditions. On the basis of the above embodiment, a relatively simple approach is that when the server detects that the current conference process has ended or a new conference process has started, it resets the score of the [Conference Relevance] title to 0, thereby exiting the secondary weighted evaluation and returning to the initial weighted evaluation result.
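The entry into and exit from secondary weighting described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class `Terminal`, the functions `on_relevance_result` and `on_process_changed`, the key `conference_relevance`, and all numeric values are hypothetical, and the relevance figure is assumed to come from a separate correlation step.

```python
from dataclasses import dataclass

RELEVANCE_TITLE = "conference_relevance"  # hypothetical key for [Conference Relevance]


@dataclass
class Terminal:
    name: str
    scores: dict  # title -> score; the relevance title starts at 0


def weighted_value(terminal, weights):
    """Weighted evaluation: sum of score x weight over all titles."""
    return sum(w * terminal.scores.get(title, 0.0) for title, w in weights.items())


def on_relevance_result(terminal, relevance, threshold, boost):
    """Enter secondary weighting when the relevance exceeds the threshold."""
    if relevance > threshold:
        terminal.scores[RELEVANCE_TITLE] = boost


def on_process_changed(terminals):
    """Exit secondary weighting: reset the dynamic score of every terminal to 0."""
    for terminal in terminals:
        terminal.scores[RELEVANCE_TITLE] = 0.0


# Hypothetical weights for three titles (assumed values).
weights = {"position": 0.6, "special_tag": 0.2, RELEVANCE_TITLE: 0.2}
z2 = Terminal("z2", {"position": 0.6, "special_tag": 0.8, RELEVANCE_TITLE: 0.0})

initial = weighted_value(z2, weights)
on_relevance_result(z2, relevance=0.41, threshold=0.3, boost=1.0)
boosted = weighted_value(z2, weights)
on_process_changed([z2])
restored = weighted_value(z2, weights)
```

Because the dynamic score is 0 outside a relevant conference process, resetting it returns exactly the initial weighted evaluation result, which matches the exit behaviour described above.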

S3. When the server detects network congestion (packet loss rate greater than 25%), it suspends the media service of some terminals with low weighted evaluation results;

For example, based on the above weighted evaluation results:

The initial weighted evaluation result of user A: $a_{start} = x_a \times m_1 + y_a \times m_2 + 0 \times m_3$;

where $a_{start}$ is the initial weighted evaluation result of user A; $m_1$ is the weight of [Position], and $x_a$ is the score of the option selected by user A under the [Position] title; $m_2$ is the weight of [Special Tag], and $y_a$ is the score of the option selected by user A under the [Special Tag] title; $m_3$ is the weight of [Conference Relevance];

The initial weighted evaluation result of user B: $b_{start} = x_b \times m_1 + y_b \times m_2 + 0 \times m_3$;

where $b_{start}$ is the initial weighted evaluation result of user B; $x_b$ is the score of the option selected by user B under the [Position] title; $y_b$ is the score of the option selected by user B under the [Special Tag] title;

Here $b_{start} < a_{start}$, and $a_{start}$ is smaller than the weighted evaluation results of the other terminals; the server therefore suspends the media service of terminal z₂ first;

Further, suppose that one process in the conference is "Construction Site Guidance" and that one user tag of terminal z₂ is the user-defined "Construction Party". When the large language model reports that the current conference process is "Construction Site Guidance", the server calculates, using a model or algorithm (such as a bag-of-words model or a latent semantic analysis model), that the correlation between terminal z₂ and the current conference process exceeds the manually set threshold. It therefore assigns the value $\lambda$ to the [Conference Relevance] score of terminal z₂, and the adjusted secondary weighted evaluation result is:

The secondary weighted evaluation result of user B: $b' = x_b \times m_1 + y_b \times m_2 + \lambda \times m_3$;

where $b'$ is the secondary weighted evaluation result of user B after dynamic weighting, and it is assumed that $b' > a_{start}$;

Dynamic weighting is thus applied to terminal z₂: when the conference process is strongly correlated with a user tag, dynamic weighting restores the media service of terminal z₂ and suspends the media service of terminal z₁ instead.
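The example above can be traced numerically with a small sketch. Everything concrete here is an assumption made for illustration: the weights `M1`, `M2`, `M3`, the threshold, the value of λ, the English rendering of the tags, and the use of a word-level bag-of-words cosine similarity as the correlation measure (the text names the bag-of-words model only as one possible choice).

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between word-count (bag-of-words) vectors of two phrases."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[word] * vb[word] for word in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical values: weights of [Position], [Special Tag], [Conference Relevance].
M1, M2, M3 = 0.6, 0.2, 0.2
THRESHOLD = 0.3  # assumed manually set threshold
LAMBDA = 1.0     # assumed value assigned to the relevance score


def evaluate(x, y, relevance_score=0.0):
    """Weighted evaluation x*m1 + y*m2 + relevance_score*m3."""
    return x * M1 + y * M2 + relevance_score * M3


a_start = evaluate(0.8, 0.6)  # user A, terminal z1
b_start = evaluate(0.6, 0.8)  # user B, terminal z2

relevance = bow_cosine("construction site guidance", "construction party")
b_prime = evaluate(0.6, 0.8, LAMBDA if relevance > THRESHOLD else 0.0)

# Terminal with the lowest value is suspended first.
first_suspended_before = "z2" if b_start < a_start else "z1"
first_suspended_after = "z2" if b_prime < a_start else "z1"
```

With these assumed numbers, terminal z₂ is the first to be suspended under the initial evaluation, and terminal z₁ takes its place once the relevance score is assigned, which mirrors the swap described in the example.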

It should be noted that the above examples assume by default that suspending one terminal relieves the bandwidth pressure. When suspending one terminal does not end the network congestion, the server continues to suspend terminals in ascending order of weighted value until the network congestion ends.
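A minimal sketch of this suspension loop is given below. The 25% packet loss threshold comes from step S3; the function names and the `measure_loss` callback, which stands in for whatever congestion measurement the server actually performs, are hypothetical.

```python
LOSS_THRESHOLD = 0.25  # packet loss rate above which step S3 treats the network as congested


def is_congested(loss_rate: float) -> bool:
    return loss_rate > LOSS_THRESHOLD


def suspend_until_clear(weighted, measure_loss):
    """Suspend terminals in ascending order of weighted value until congestion ends.

    weighted:     dict mapping terminal name -> weighted value
    measure_loss: hypothetical callback returning the current packet loss rate,
                  given the list of terminals suspended so far
    """
    suspended = []
    for name in sorted(weighted, key=weighted.get):
        if not is_congested(measure_loss(suspended)):
            break
        suspended.append(name)
    return suspended


# Toy loss model for demonstration: each suspended terminal lowers the loss rate by 0.1.
weighted = {"z1": 0.75, "z2": 0.65, "z3": 0.90}
order = suspend_until_clear(weighted, lambda s: 0.4 - 0.1 * len(s))
```

In this toy run the loss rate starts at 0.4, so terminal z₂ is suspended first and terminal z₁ second, after which the simulated loss rate falls below the threshold and terminal z₃ keeps its media service.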

S4. When the server detects that the network congestion has ended, it resumes the media service of the suspended terminals in reverse order. For example, when terminal z₁ and terminal z₂ have both been suspended and no dynamic weighting applies, the stream pushing/receiving of terminal z₁ is resumed first; if no network congestion is detected, that of terminal z₂ is then resumed. By analogy, the server keeps resuming terminals until network congestion is detected again or until all terminals have resumed the media service.
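The reverse recovery of step S4 could be sketched as follows, assuming the server keeps the suspended terminals in the order in which they were suspended. The function name and the `measure_loss` callback are hypothetical, and the sketch assumes it is only called once congestion has ended.

```python
def resume_until_congested(suspended, measure_loss, loss_threshold=0.25):
    """Resume suspended terminals in reverse order of suspension.

    suspended:    list of terminal names in suspension order (lowest weighted value first)
    measure_loss: hypothetical callback returning the packet loss rate,
                  given the terminals that are still suspended
    Returns (resumed, still_suspended).
    """
    still_suspended = list(suspended)
    resumed = []
    while still_suspended:
        # The terminal suspended last (highest weighted value) is resumed first.
        resumed.append(still_suspended.pop())
        if measure_loss(still_suspended) > loss_threshold:
            break  # congestion detected again: stop resuming
    return resumed, still_suspended


resumed, remaining = resume_until_congested(["z2", "z1"], lambda s: 0.1)
```

Here terminal z₁ is resumed before terminal z₂, matching the example in step S4; if the loss rate rises above the threshold after a resume, the remaining terminals stay suspended.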

While dynamically allocating bandwidth to the terminals, the server also reserves bandwidth: it keeps a portion of the bandwidth idle for streams that have not yet been subscribed, so that terminals joining later can take part in the conference normally. The specific proportion of reserved bandwidth is determined based on actual or past conditions.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the technical principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for transmitting audio and video in converged communication, characterized in that the steps include:

The server establishes connections with multiple terminals, and the server collects device information of the multiple terminals;

The server lists the supported coding standards corresponding to each of the multiple terminals based on the device information, and selects the coding standard with the smallest bandwidth share from the corresponding supported coding standards as the adapted coding standard corresponding to each of the multiple terminals;

The server dynamically allocates bandwidth of the multiple terminals based on congestion control technology;

The server pushes the corresponding streams of the adapted coding standard to the multiple terminals according to the request information of the multiple terminals and the bandwidth of the multiple terminals;

The server collects a user tag of each user of each terminal among the multiple terminals, and extracts feature information from the user tag; the server calculates a weight of the feature information;

The server performs a weighted evaluation on each of the multiple terminals based on the weight;

When the server detects network congestion, the server suspends a media service of at least one of the plurality of terminals based on a result of the weighted evaluation;

A dynamic weight is also provided in the weighted evaluation; when the server detects network congestion and the correlation between the content of the media service pushed by the server and the user tag exceeds a threshold, the server performs a secondary weighted evaluation on each of the multiple terminals based on the dynamic weight, and the server suspends or resumes the media service of at least one of the multiple terminals based on the result of the secondary weighted evaluation.

2. The method for transmitting audio and video in converged communication according to claim 1, wherein the weight is obtained based on the analytic hierarchy process, and the step of the server calculating the weight comprises:

Determining a hierarchical model for said weight calculation;

Establishing an importance judgment matrix of the feature information;

The importance of the feature information is calculated by the eigenvector method, and the obtained values are normalized to obtain the weight.

3. The method for transmitting audio and video in converged communication according to claim 2, wherein the step of calculating the correlation between the content of the media service and the user tag comprises:

The server extracts information from the content of the media service in real time based on a large language model, and the server calculates the correlation between the information extraction result and the user tag of each user of each of the multiple terminals to obtain a calculation result;

The server compares the calculation result with the threshold value.

4. The method for transmitting audio and video in converged communication according to claim 1, characterized in that the server and the plurality of terminals are connected via a public network or a private network;

The data transmission between the server and the plurality of terminals is based on a web video and voice real-time communication protocol;

The congestion control technology is a congestion control algorithm based on the delay gradient and packet loss rate of the web video and voice real-time communication protocol.

5. The method for transmitting audio and video in converged communication according to claim 4, wherein the step of collecting device information of the plurality of terminals by the server comprises:

The web video and voice real-time communication protocol is configured with a session description protocol, and the text of the session description protocol includes the device information;

The server collects the text of the session description protocol and obtains the device information from it;

The device information includes type information, operating system information, and firmware information of the multiple terminals.

6. The method for transmitting audio and video in converged communication according to claim 1, wherein the coding standard comprises VP8, VP9, H.264, and H.265;

When at least one of the multiple terminals supports AV1 decoding, the encoding standard also includes AV1.

7. The method for transmitting audio and video in converged communication according to claim 1, characterized in that the result of the weighted evaluation is the weighted value of each terminal in the plurality of terminals, and the plurality of terminals are arranged from small to large according to the weighted value;

When network congestion is detected, media services of the corresponding terminals are suspended in ascending order of weighted value until the network congestion ends;

When it is detected that the network congestion has ended, the server resumes the media service of the suspended terminals in reverse order.

8. The method for transmitting audio and video in converged communication according to claim 1, characterized in that the feature information of the user tag includes multiple titles, and the user of each of the multiple terminals selects or marks the titles.

9. The method for transmitting audio and video in converged communication according to claim 8, characterized in that the feature information of the user tag also includes a plurality of preset options, and at least one of the preset options is correspondingly provided under at least some of the titles;

The preset options are each provided with a score, and the score is used for the weighted evaluation.

10. The method for transmitting audio and video in converged communication according to claim 8 or 9, characterized in that each of the titles is configured with an importance coefficient, and the importance coefficient is used to calculate the weight.