CN104144162A - A method and system for realizing intelligent voice linkage shouting

Info

Publication number: CN104144162A
Application number: CN201410337954.1A
Authority: CN
Inventors: 吴宁; 徐长福; 陶风波; 薄斌; 何在军; 何灿国
Original assignee: Hai Li River Shenzhen Science And Technology Ltd; State Grid Corp of China SGCC; State Grid Jiangsu Electric Power Co Ltd; Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd; Maintenance Branch of State Grid Jiangsu Electric Power Co Ltd
Current assignee: Hai Li River Shenzhen Science And Technology Ltd; State Grid Corp of China SGCC; State Grid Jiangsu Electric Power Co Ltd; Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd; Maintenance Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2014-07-16
Filing date: 2014-07-16
Publication date: 2014-11-12

Abstract

The invention discloses a method and system for realizing intelligent voice linkage shouting. The system's main hardware comprises an audio device, a video device, a storage server, a media server, a management server, and a client; the software adopts a C/S architecture. In the method, intelligent voice linkage shouting can be triggered automatically by the video device when its conditions are met, or the shouting function can be turned on from the client software so that the front-end video device and audio device are used for live shouting to the site. In theory the server places no fixed limit on the number of client connections; the practical limit is set mainly by bandwidth and computer performance. The invention makes possible shouted alarms and real-time shouted exchanges between objects (equipment) and people in both unattended and real-time operating environments, thereby improving efficiency and ensuring safety.

Description

A method and system for realizing intelligent voice linkage shouting

Technical field

The invention relates to multimedia and communication technology, and in particular to a method and system for implementing intelligent voice-linked shouting.

Background

As multimedia and communication technologies develop, on-site shouting has come into wide use. One common implementation is the walkie-talkie, which integrates wireless communication with voice technology; Fig. 1 shows the prior-art voice shouting flow, which comprises step 101, in which the shouting function is switched on or off between devices, and step 102, in which users shout one way to each other. Other implementations are voice intercoms that carry integrated video and audio over wired or wireless links, and voice systems built on conferencing systems. All of these were developed to make voice information smoother and more varied, and all serve direct communication between people: real-time, point-to-point or point-to-multipoint exchange of information between persons.

In practice, people and objects (equipment) need to shout to one another, or shouting needs to be triggered by judgment conditions, with different content selected under different conditions; the main purpose of intelligent voice-linked shouting is management. The prior art offers no corresponding technique. The present invention therefore provides an intelligent voice shouting method in which shouting can be initiated either by a trigger from the front-end intelligent video device or by a back-end client enabling the shouting function. The invention is developed on the TCP/IP protocol, makes voice shouting more intelligent, flexible, and efficient, separates the voice stream from the video stream, and provides both front-end intelligent voice-linked shouting and real-time voice shouting from the client.

Summary of the invention

To address the above shortcomings of the prior art, the present invention provides a method and system for intelligent voice-linked shouting that supports intelligent voice shouting in unattended environments and is also compatible with real-time voice shouting during attended operation.

The present invention provides both a method and a system for intelligent voice-linked shouting. Because the system and the method share the same specific technical feature, namely that shouting can be triggered by the front-end intelligent video device or enabled by the back-end client, they belong to a single inventive concept and can be filed as one application.

To achieve the above object, the technical solution of the invention is a method for implementing intelligent voice-linked shouting, characterized in that recognition by the video device automatically triggers voice shouting through the audio device, and the client can shout in real time. The method comprises the following steps:

A. The front-end video device decides, from the event type determined by its intelligent analysis module, whether to trigger linked voice shouting automatically; when intelligent voice-linked shouting is required, it opens the audio channel;

or B. The client enables the real-time shouting function, and its audio reaches the audio and video devices through the media server for real-time shouting with the site.

Step A proceeds as follows:

A00: the video device monitors the equipment and environment, uses the intelligent analysis module to analyze the scene, and takes the linked shouting action;

A01: the video device allocates shouting audio channel resources and prepares the shouting audio content;

A02: the shouting content is sent to the audio device, which issues a shouted alarm to the site.

Step B proceeds as follows:

B00: the client user enables the shouting function, and the management server checks the client's status and permissions;

B01: the management server assigns permissions to and authenticates the client user, and the media server allocates channel resources for the video device;

B02: the media server forwards the client's shouting content to the video device, which opens the audio channel and sends the content; at the same time, sound from the front-end site is captured by the audio device, uploaded to the media server, and forwarded to the client user.

In the aforementioned method, in step A00, when the analyzed video scene does not meet the intelligent shouting conditions, the audio channel is not opened. The intelligent analysis module logically associates video scenes with shouting content on an event basis: the event type determines the shouting mode and shouting duration, and, according to the defined event level, the response is a shouted warning prompt, shouted alarm handling, or shouting together with notification of management personnel. In step A01, the shouting content is stored on the video device's SD card in *.WAM file format.

In the aforementioned method, in step B01, when multiple clients attempt shouting operations on the same video device and audio device, they are handled according to permission level; clients with the same permission level are served first come, first served.
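The arbitration rule can be illustrated with a minimal Python sketch. It combines the permission and first-come-first-served ordering stated here with the 60-second rotation among equal top-level users that the detailed embodiment describes. The names `ShoutRequest`, `order_requests`, and `speaker_at` are hypothetical, a larger `priority` number is assumed to mean higher authority, and the rotation is assumed to start when the first top-level request arrives; none of these details are fixed by the source.

```python
from dataclasses import dataclass
from typing import List, Optional

ROTATION_SECONDS = 60  # per-turn limit among equal top-level users (from the embodiment)


@dataclass
class ShoutRequest:
    client_id: str
    priority: int    # assumption: larger number = higher permission level
    arrival: float   # time the request was made, in seconds


def order_requests(requests: List[ShoutRequest]) -> List[ShoutRequest]:
    """Highest permission first; equal permissions ordered by earliest arrival."""
    return sorted(requests, key=lambda r: (-r.priority, r.arrival))


def speaker_at(requests: List[ShoutRequest], now: float) -> Optional[str]:
    """Return the client that holds the shouting channel at time `now`."""
    active = [r for r in requests if r.arrival <= now]
    if not active:
        return None
    ordered = order_requests(active)
    # Only the clients sharing the highest permission level take turns.
    top = [r for r in ordered if r.priority == ordered[0].priority]
    elapsed = now - top[0].arrival
    turn = int(elapsed // ROTATION_SECONDS) % len(top)
    return top[turn].client_id
```

With requests from `a` (priority 1, t=0), `b` (priority 2, t=5), and `c` (priority 2, t=6), `a` holds the channel until `b` arrives, after which `b` and `c` alternate in 60-second turns.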

In the aforementioned method, in step B01, after the management server has decided from the user's permissions whether real-time shouting may be enabled, it checks that the video device's access identifier and IP address are correct and then queries the video device's running status and communication status. If everything is normal, the media server allocates audio channel resources to the video device and forwards the client user's audio data to it.

In the aforementioned method, after the client has finished shouting, the client must turn the shouting function off manually; the media server then notifies the video device to close the audio channel and release its resources.

In the aforementioned method, to handle a communication link that drops during an intercom session, the management server exchanges heartbeat data with the front-end video device once every 5 seconds. If no data is received from the video device within 5 seconds, the device is considered to have a communication fault and the client is notified, addressed by its client ID; if the link has not recovered within 5 heartbeat cycles, the media server is notified to close the audio channel resources.

An intelligent voice-linked shouting system, characterized in that it consists of the following parts:

an audio device, comprising an audio playback unit and an audio capture unit, which respectively play decoded audio content to the site as shouting and capture on-site sound and pass it to the video device for encoding;

a video device, which encodes captured audio and sends it to the media server and storage server, decodes audio content for playback, and pre-stores audio files that supply the content for intelligent voice shouting;

a client, used to turn voice shouting on or off in real time; its real-time voice shouting is processed by the media server and management server and delivered to the audio and video devices, and it also receives the on-site audio from the audio and video devices forwarded by the media server;

a storage server, which provides storage for the shouting content and process;

a media server, which allocates shouting audio channel resources for the video devices connected at the front end and forwards audio and video data;

a management server, which centrally manages device information, device status, and user permissions for the clients, storage server, media server, and video devices.

In the aforementioned system, the video device is provided with an intelligent analysis module that logically associates video scenes with shouting content on an event basis: the event type determines the shouting mode and shouting duration, and, according to the defined event level, the response is a shouted warning prompt, shouted alarm handling, or shouting together with notification of management personnel.

In the aforementioned system, the video device has an audio codec unit, an audio storage unit, and an audio sending unit. The audio codec unit provides G.711 standard audio for shouting; the audio storage unit is an SD card onto which shouting audio content can be recorded; the audio sending unit sends the audio captured by the audio device to the media server, which forwards it to the client.

The invention provides a method for intelligent voice-linked shouting that makes shouting flexible and intelligent. The front-end video device can trigger the shouting device according to video analysis results and supply flexible shouting content, enabling information transfer between objects, and between objects and people, when the site is unattended; when the site is attended, the client can start real-time audio shouting for information transfer between people.

Description of drawings

Fig. 1 is a flowchart of voice shouting in the prior art;

Fig. 2 is a basic flowchart of intelligent linked voice shouting by the front-end video device in an embodiment of the invention;

Fig. 3 is a basic flowchart of real-time voice shouting enabled from the client in an embodiment of the invention;

Fig. 4 is a detailed flowchart of intelligent linked voice shouting by the front-end video device in an embodiment of the invention;

Fig. 5 is a detailed flowchart of real-time voice shouting enabled from the client in an embodiment of the invention;

Fig. 6 is a system structure diagram of an embodiment of the invention;

Fig. 7 is a structural diagram of the front-end devices in an embodiment of the invention.

Detailed description of embodiments

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in detail below with reference to the drawings and specific embodiments.

The intelligent voice shouting provided by the embodiment consists mainly of intelligent linked voice shouting by the front-end video device and real-time voice shouting enabled from the client. Figs. 2 and 3 are the basic flowcharts of the embodiment. As shown in Fig. 2, intelligent linked voice shouting by the front-end video device comprises the following steps:

Step 201: shouting content is recorded and stored on the video device, and the conditions under which the video device's intelligent analysis triggers voice shouting are configured;

Note that in step 201 the pre-recorded shouting clips may number in the thousands; on-site shouting content is pre-recorded to the SD card according to different requirements, and the pre-stored audio format is *.wam.

Step 202: the front-end video device starts linked voice shouting according to the result of intelligent video analysis;

The video device is configured with an intelligent analysis module that logically associates video scenes with shouting content on an event basis; the event type determines the shouting mode and shouting duration.

Note that, on an event basis, the defined event level determines the response: a shouted warning prompt, shouted alarm handling, or shouting together with notification of management personnel. The shouting duration is defined as 5 consecutive repetitions following the intelligent analysis result.

Step 203: the front-end audio device plays the shouting content pushed by the intelligent video device, issuing a shouted alarm to the surroundings;
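The event-based association in steps 201 to 203 can be pictured as a lookup from event type to response and pre-stored clip. The following Python sketch is illustrative only: the event names, clip file names, and the `plan_shout` function are hypothetical, and only the three response kinds, the *.wam extension, and the 5 repetitions come from the text.

```python
from enum import Enum
from typing import NamedTuple, Optional

SHOUT_REPEATS = 5  # "5 consecutive repetitions" from the embodiment


class Action(Enum):
    WARNING_PROMPT = 1   # shouted warning prompt
    ALARM_HANDLING = 2   # shouted alarm handling
    NOTIFY_MANAGER = 3   # shouting plus notification of management personnel


class ShoutPlan(NamedTuple):
    action: Action
    audio_file: str
    repeats: int


# Hypothetical event table; real event types and clip names are not given in the source.
EVENT_TABLE = {
    "no_helmet": (Action.WARNING_PROMPT, "helmet.wam"),
    "intrusion": (Action.ALARM_HANDLING, "intrusion.wam"),
    "fire": (Action.NOTIFY_MANAGER, "fire.wam"),
}


def plan_shout(event_type: str) -> Optional[ShoutPlan]:
    """Map an analyzed event type to a shouting plan, or None if no shouting is needed."""
    entry = EVENT_TABLE.get(event_type)
    if entry is None:
        return None  # conditions not met: the audio channel stays closed
    action, audio_file = entry
    return ShoutPlan(action, audio_file, SHOUT_REPEATS)
```

Keeping the association in a table rather than in branching code matches the idea that different shouting content is pre-recorded for different requirements and can be changed without touching the analysis logic.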

As shown in Fig. 3, real-time voice shouting enabled from the client comprises the following steps:

Step 301: the client enables the real-time voice shouting function;

Step 302: the media server forwards the voice shouting content to the front-end video device, which pushes it to the audio device;

Step 303: the front-end audio device plays the shouting content and captures on-site audio in real time for transmission to the client;

This completes the basic flow provided by the embodiment of the invention.

The method of the invention has been described above in outline; to make it clearer, it is described in detail below.

Figs. 4 and 5 are detailed flowcharts of the invention. As shown in Fig. 4, intelligent linked voice shouting by the front-end video device may comprise the following steps:

Step 401: when an intelligent analysis condition is triggered, the video device sends intelligent voice shouting to the audio device.

The video device's intelligent analysis module analyzes the video scene and, from the result, decides whether linked voice shouting is needed; if so, it automatically turns on the shouting switch and prepares to shout through the audio device.

Step 402: if the analysis result does not meet the intelligent shouting conditions, the audio interface channel is not opened.

Step 403: if the analysis result meets the intelligent shouting conditions, the *.WAM audio content on the video device is linked; multiple audio source files can be pre-recorded for different shouting content, and the audio content to be shouted is assigned.

Step 404: the shouting function is enabled and the audio channel switch is turned on.

Note that real-time audio shouting uses the G.711 codec format and is kept separate from the video information.

Step 405: the shouting audio content is sent and shouted to the site.

Note that at this point the audio device and video device shout one way only, while capturing sound to monitor conditions on site. When shouting is complete, the video device closes the audio channel.
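The channel lifecycle in steps 401 to 405 (leave the channel closed when conditions are not met, open it, send the clip, then close it) can be sketched as follows. This is a minimal illustration under assumptions: `AudioChannel`, `shout_channel`, and `linked_shout` are hypothetical names, and "sending" is modelled as appending the clip name to a list rather than streaming audio.

```python
from contextlib import contextmanager
from typing import Iterator, List


class AudioChannel:
    """Stand-in for the video device's audio channel (hypothetical)."""

    def __init__(self) -> None:
        self.is_open = False
        self.sent: List[str] = []


@contextmanager
def shout_channel(channel: AudioChannel) -> Iterator[AudioChannel]:
    """Open the channel for one shouting session and always close it afterwards."""
    channel.is_open = True           # step 404: turn on the audio channel switch
    try:
        yield channel
    finally:
        channel.is_open = False      # the video device closes the channel when done


def linked_shout(channel: AudioChannel, triggered: bool, clip: str, repeats: int = 1) -> bool:
    """Run one linked shout; return False if the conditions were not met."""
    if not triggered:
        return False                 # step 402: channel is never opened
    with shout_channel(channel) as ch:
        for _ in range(repeats):
            ch.sent.append(clip)     # step 405: send the shouting audio content
    return True
```

Using a context manager guarantees the channel is released even if sending fails, which mirrors the requirement that the audio channel be closed once shouting is complete.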

As shown in Fig. 5, real-time voice shouting enabled from the client may comprise the following steps:

Step 501: the client user sends a request to the front-end video device to enable the real-time shouting function.

Note that when multiple clients attempt shouting operations on the same video device and audio device, they are handled according to permission level. Clients with the same permission level are served first come, first served; in addition, users sharing the highest permission level are each timed for 60 seconds, after which the turn rotates to the next client user.

Step 502: the management server assigns permissions to and authenticates the client user; once the client user is authenticated and has the right to shout, clicking "Enable shouting" succeeds.

Step 503: on receiving the instruction to enable the shouting function, the management server checks whether the video device's access identifier and IP address are valid.

Note that the management server manages access conformance for the video device, the client, and the media server, and also monitors their running status.

Step 504: the management server checks the video device's communication status and running status.

Step 505: once the management server has found the status normal, the media server allocates channel resources for the video device, allocates an audio memory buffer, and forwards the audio to the video device.

Step 506: the video device opens the shouting audio port and decodes the client's audio data; at the same time it encodes the sound captured by the audio device and uploads it to the media server.

Step 507: the audio device plays the client operator's shouting in real time while capturing on-site sound.

Note that once the exchange between the site (through the audio device) and the client is finished, the client must turn the shouting function off manually; the media server then notifies the video device to close the audio channel and release its resources.

Step 508: the shouting content is formed into an audio file, encoded by the video device, and uploaded to the media server, which also forwards a copy to the storage server.

This completes the flow provided by the embodiment of the invention.
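The checks that precede channel allocation in steps 502 to 505 can be summarized as a short gate function. The sketch below is a hedged illustration: `Device`, `ip_is_valid`, and `can_start_shout` are hypothetical names, the set of registered access identifiers is an assumed data structure, and IP validity is checked only syntactically with Python's standard `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass
class Device:
    access_id: str   # access identifier of the video device
    ip: str
    online: bool     # communication status
    running: bool    # running status


def ip_is_valid(text: str) -> bool:
    """Syntactic IP check; the source does not say what 'valid' means in detail."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def can_start_shout(user_authorised: bool, device: Device,
                    registered_ids: Set[str]) -> Tuple[bool, str]:
    """Apply the checks of steps 502-504 in order; step 505 may proceed only on success."""
    if not user_authorised:                                   # step 502
        return (False, "permission denied")
    if device.access_id not in registered_ids or not ip_is_valid(device.ip):
        return (False, "invalid access ID or IP")             # step 503
    if not (device.online and device.running):                # step 504
        return (False, "device not ready")
    return (True, "channel may be allocated")                 # step 505
```

Returning a reason string alongside the boolean is a design choice made here so the client could be told why "Enable shouting" failed; the source does not specify such feedback.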

There is one further special case: the communication link drops during an intercom session. The management server exchanges heartbeat data with the front-end video device once every 5 seconds. If no data is received from the video device within 5 seconds, the device is considered to have a communication fault and the client is notified, addressed by its client ID; if the link has not recovered within 5 heartbeat cycles, the media server is notified to close the audio channel resources, and the client's shouting function is closed as well.
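The heartbeat rule amounts to a small classification of how long the video device has been silent. The sketch below is one possible reading, assuming that "5 heartbeat cycles" means 25 seconds of continuous silence; the function name `link_state` and the state labels are hypothetical.

```python
HEARTBEAT_INTERVAL = 5.0   # seconds between heartbeats (from the text)
MAX_MISSED_CYCLES = 5      # cycles allowed for recovery (from the text)


def link_state(last_seen: float, now: float) -> str:
    """Classify the link from the time the last device data was received.

    Assumption: recovery is judged purely by elapsed silence, so the audio
    channel is closed once silence exceeds 5 * 5 = 25 seconds.
    """
    silent = now - last_seen
    if silent <= HEARTBEAT_INTERVAL:
        return "ok"
    if silent <= HEARTBEAT_INTERVAL * MAX_MISSED_CYCLES:
        return "fault_notify_client"     # report the fault to the client by its ID
    return "close_audio_channel"         # tell the media server to release the channel
```

A caller would evaluate `link_state` on each 5-second tick and act on transitions, for example notifying the client once when the state first becomes `fault_notify_client`.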

The shouting method of the invention has been described above; the system and equipment provided by the invention are described below with reference to specific embodiments.

Fig. 6 is a structural diagram of the system of the invention. As shown in Fig. 6, the system comprises an audio device 601, a video device 602, a management server 603, a storage server 604, a media server 605, and a client 606.

The audio device 601 and video device 602 communicate bidirectionally: the audio device 601 supplies on-site captured sound to the video device 602, and the video device 602 supplies the audio device 601 with either the stored shouting audio source or the client's real-time shouting audio.

The video device 602 communicates one way with the management server 603. On initialization the video device registers with the management server, sending its device type ID and IP address for confirmation; if they are correct it is registered normally, otherwise the management server does not process the request. Once registered, it sends heartbeat data to the management server every 5 seconds and status data every 5 seconds.

The storage server 604 communicates one way with the management server 603. On initialization the storage server registers with the management server, sending its device type ID and IP address for confirmation; if they are correct it is registered normally, otherwise the management server does not process the request. Once registered, it sends heartbeat data to the management server every 5 seconds and status data every 5 seconds.

The storage server 604 and media server 605 communicate bidirectionally; the storage server only receives the audio data forwarded by the media server.

The media server 605 communicates one way with the management server 603. On initialization the media server registers with the management server, sending its device type ID and IP address for confirmation; if they are correct it is registered normally, otherwise the management server does not process the request. Once registered, it sends heartbeat data to the management server every 5 seconds and status data every 5 seconds.

The media server 605 and client 606 communicate bidirectionally: the client receives data from the media server, and the media server also forwards the client's audio data.
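The registration behaviour shared by the video device, storage server, and media server can be sketched as a small registry on the management server. Everything named here is an assumption for illustration: the `ManagementServer` class, the idea that the server holds a table of expected IP-to-type pairs, and the use of plain strings for device type IDs are not specified in the source.

```python
from typing import Dict, Optional


class ManagementServer:
    """Minimal registry sketch: confirm (type ID, IP) on registration, then track heartbeats."""

    def __init__(self, expected: Dict[str, str]) -> None:
        # Assumption: the server is provisioned with the expected type ID for each IP.
        self.expected = dict(expected)
        self.registered: Dict[str, Dict[str, Optional[object]]] = {}

    def register(self, type_id: str, ip: str) -> bool:
        """Register a device if its type ID and IP match; otherwise ignore the request."""
        if self.expected.get(ip) != type_id:
            return False  # "the management server does not process the request"
        self.registered[ip] = {"type_id": type_id, "last_heartbeat": None}
        return True

    def heartbeat(self, ip: str, now: float) -> bool:
        """Record a heartbeat from a registered device; unknown senders are ignored."""
        entry = self.registered.get(ip)
        if entry is None:
            return False
        entry["last_heartbeat"] = now
        return True
```

For example, a server created with `{"10.0.0.2": "video"}` accepts `register("video", "10.0.0.2")` and thereafter records that device's 5-second heartbeats, while a registration from an unexpected address returns `False`.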

In the embodiment, the front-end audio device and video device can take several structural forms in practice; the structure shown in Fig. 7 is described as an example. Fig. 7 is a structural diagram of a front-end device provided by the embodiment. As shown in Fig. 7, the audio device comprises an audio playback unit 701 and an audio capture unit 702; the video device comprises an audio sending unit 703, an audio storage unit 704, and an audio codec unit 705.

The audio playback unit 701 plays shouting content sent by the video device and shouting content from the client. The audio capture unit 702 picks up on-site sound for upload. The audio sending unit 703 of the video device packages audio data over TCP/IP and sends it to the media server.

The audio storage unit 704 is where the video device stores data from the audio capture unit: directly on the SD card, or forwarded by the media server to the storage server.

The audio codec unit 705 mainly processes audio before sending, encoding to and decoding from G.711 format.
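To make the codec unit's role concrete, the following Python sketch shows G.711 companding of 16-bit linear PCM samples to 8-bit codes and back. G.711 defines two variants, A-law and μ-law, and the source does not say which one the device uses; μ-law is chosen here purely as an assumption for illustration. The function names are hypothetical, and the code follows the widely used bias-and-segment formulation rather than any implementation from the patent.

```python
BIAS = 0x84    # 132, added before locating the segment
CLIP = 32635   # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit PCM sample to an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, 7 down to 0.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF   # codes are stored bit-inverted


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code back to a signed PCM sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

Each sample shrinks from 16 bits to 8, so 8 kHz speech occupies 64 kbit/s, which is why G.711 suits a separate low-bandwidth voice stream alongside the video.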

As the above technical solutions show, the invention provides a method for intelligent voice-linked shouting that makes shouting flexible and intelligent. The front-end video device can trigger the shouting device according to video analysis results and supply flexible shouting content, enabling information transfer between objects, and between objects and people, when the site is unattended; when the site is attended, the client can start real-time audio shouting for information transfer between people.

The invention may be embodied in other specific forms without departing from its spirit and essential characteristics. The above embodiments are therefore to be regarded only as illustrations of the invention and not as limitations; any change within the meaning and range of equivalency of the claims shall be regarded as falling within the scope of the claims.

Claims

1. A method for implementing intelligent voice-linked shouting, characterized in that recognition by the video device automatically triggers voice shouting through the audio device, and the client can shout in real time, the method comprising the following steps:

A. the front-end video device decides, from the event type determined by its intelligent analysis module, whether to trigger linked voice shouting automatically; when intelligent voice-linked shouting is required, it opens the audio channel;

or B. the client enables the real-time shouting function, and its audio reaches the audio and video devices through the media server for real-time shouting with the site;

step A proceeds as follows:

A00: the video device monitors the equipment and environment, uses the intelligent analysis module to analyze the scene, and takes the linked shouting action;

A01: the video device allocates shouting audio channel resources and prepares the shouting audio content;

A02: the shouting content is sent to the audio device, which issues a shouted alarm to the site;

step B proceeds as follows:

B00: the client user enables the shouting function, and the management server checks the client's status and permissions;

B01: the management server assigns permissions to and authenticates the client user, and the media server allocates channel resources for the video device;

B02: the media server forwards the client's shouting content to the video device, which opens the audio channel and sends the content; at the same time, sound from the front-end site is captured by the audio device, uploaded to the media server, and forwarded to the client user.

2. The method for realizing intelligent voice linkage shouting according to claim 1, characterized in that, in step A00, during the intelligent linkage voice shouting action, the audio channel is not opened when analysis shows that the video scene does not meet the intelligent shouting condition; the intelligent analysis module logically associates the video scene with the shouting content and adopts an event mode, in which the shouting mode and the shouting duration are determined by event type and, according to the defined analysis event level, a shouting alarm prompt or shouting alarm processing is adopted and the shouting information is reported to management personnel for handling; in step A01, the shouting content is recorded on the SD card of the video device in *.WAM file format.

3. The method for realizing intelligent voice linkage shouting according to claim 1, characterized in that, in step B01, when multiple clients perform the intelligent shouting operation on the same video device and audio device, they are treated differently according to permission level; if their permissions are equal, they are selected in first-come, first-served order.
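The arbitration rule of claim 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `ShoutRequest`, `permission_level`, `arrival_seq`, and `select_request` are hypothetical, and it is assumed that a larger permission number means higher authority and that arrival order is tracked with an increasing sequence number.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ShoutRequest:
    """A client's request to shout through one video/audio device (hypothetical type)."""
    client_id: str
    permission_level: int  # assumption: larger value = higher authority
    arrival_seq: int       # assumption: increases with arrival order


def select_request(requests: Sequence[ShoutRequest]) -> Optional[ShoutRequest]:
    """Pick the request that wins the device: highest permission first,
    then earliest arrival among requests with equal permission."""
    if not requests:
        return None
    return min(requests, key=lambda r: (-r.permission_level, r.arrival_seq))


if __name__ == "__main__":
    pending = [
        ShoutRequest("operator-2", permission_level=1, arrival_seq=10),
        ShoutRequest("supervisor", permission_level=3, arrival_seq=12),
        ShoutRequest("operator-1", permission_level=1, arrival_seq=9),
    ]
    print(select_request(pending).client_id)
```

Sorting on the pair (negated permission, arrival sequence) keeps both rules in a single comparison, so equal-permission clients fall back to first-come, first-served without a separate branch.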

4. The method for realizing intelligent voice linkage shouting according to claim 1, characterized in that, in step B01, after the management server determines according to permissions whether to enable real-time shouting, the management server queries whether the access identifier and IP address of the video device are correct, and queries the running status and communication status of the video device; when everything is normal, the media server allocates audio channel resources to the video device and then forwards the client user's voice data to the video device.

5. The method for realizing intelligent voice linkage shouting according to claim 1, characterized in that, after the client has finished shouting, the client needs to close the shouting function manually, whereupon the media server notifies the video device to close the audio channel and release the resource.

6. The method for realizing intelligent voice linkage shouting according to claim 1, characterized in that, to handle disconnection of the communication link during the intercom process, the management server exchanges heartbeat data with the front-end video device once every 5 seconds; if no data is received from the video device within 5 seconds, a device communication fault is assumed and the client is notified of this information according to its client ID; if the state is still abnormal after 5 heartbeat cycles, the media server is notified to close the audio channel resource.
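The heartbeat supervision of claim 6 can be sketched as a small state machine. This is an illustrative reading only: the class name `HeartbeatMonitor` and the action strings are hypothetical, and it is assumed that the 5 heartbeat cycles are counted as consecutive missed replies starting from the first miss, which the claim does not state explicitly.

```python
from dataclasses import dataclass
from typing import List

HEARTBEAT_INTERVAL_S = 5  # from claim 6: one heartbeat every 5 seconds
MAX_MISSED_CYCLES = 5     # from claim 6: close the channel if still abnormal after 5 cycles


@dataclass
class HeartbeatMonitor:
    """Tracks one video device on behalf of one client (hypothetical type)."""
    client_id: str
    missed_cycles: int = 0
    channel_open: bool = True

    def on_cycle(self, reply_received: bool) -> List[str]:
        """Process the outcome of one heartbeat cycle and return the actions to take."""
        actions: List[str] = []
        if reply_received:
            self.missed_cycles = 0  # link is healthy again
            return actions
        self.missed_cycles += 1
        if self.missed_cycles == 1:
            # First missed reply: treat as a communication fault and tell the client.
            actions.append(f"notify_client:{self.client_id}")
        if self.missed_cycles >= MAX_MISSED_CYCLES and self.channel_open:
            # Still abnormal after the allowed cycles: release the audio channel.
            self.channel_open = False
            actions.append("close_audio_channel")
        return actions


if __name__ == "__main__":
    monitor = HeartbeatMonitor(client_id="client-01")
    for received in [True, False, False, False, False, False]:
        print(monitor.on_cycle(received))
```

In a running system `on_cycle` would be driven by a timer that fires every `HEARTBEAT_INTERVAL_S` seconds; the sketch leaves the timer out so that the fault logic can be read and tested on its own.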

7. A system for intelligent voice linkage shouting, characterized in that it consists of the following parts:

An audio device, comprising an audio playback unit and an audio collection unit, used respectively to play decoded audio content as shouting content to the scene, and to collect the content shouted at the scene and send it to the video device for encoding;

A video device, used to encode the collected audio and send it on to the media server and the storage server, to provide decoding for audio playback content, and to store prerecorded audio files that provide the content for intelligent voice shouting;

A client, used to turn real-time voice shouting on or off, the real-time voice shout being processed by the media server and the management server and then delivered to the audio and video devices, and also used to receive the on-site voice shouting content of the audio and video equipment forwarded by the media server;

A storage server, which provides storage space for the shouting content and the shouting process;

A media server, which allocates shouting audio channel resources for the video devices connected at the front end and forwards audio and video data;

A management server, which centrally manages the device information, device status, and user permissions of the client, storage server, media server, and video device.

8. The system for intelligent voice linkage shouting according to claim 7, characterized in that the video device is provided with an intelligent analysis module; the intelligent analysis module logically associates the video scene with the shouting content and adopts an event mode, in which the shouting mode and the shouting duration are determined by event type and, according to the defined analysis event level, a shouting alarm prompt or shouting alarm processing is adopted and the shouting information is reported to management personnel for handling.

9. The system for intelligent voice linkage shouting according to claim 7, characterized in that the video device has an audio codec unit, an audio storage unit, and an audio sending unit; the audio codec unit provides G.711 standard audio for the shouting audio; the audio storage unit is an SD card onto which the shouting audio content can be recorded; the audio sending unit sends the information collected by the audio device to the media server, which forwards it to the client.
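As an illustration of the G.711 audio named in claim 9, the following sketch shows companding of a single sample. The claim does not say whether the μ-law or the A-law variant of G.711 is used; μ-law on 16-bit signed PCM samples is assumed here, and the function names are illustrative only.

```python
BIAS = 0x84   # 132, added before segment search in the common mu-law algorithm
CLIP = 32635  # largest magnitude that still fits after the bias is added


def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit signed PCM sample into one 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, from bit 14 down to bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte: int) -> int:
    """Expand one 8-bit mu-law byte back into a 16-bit signed PCM sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    for pcm in (0, 1000, -1000, 32767, -32768):
        code = linear_to_ulaw(pcm)
        print(pcm, hex(code), ulaw_to_linear(code))
```

Companding halves the bit rate relative to 16-bit PCM while keeping finer resolution for quiet samples than for loud ones, which suits speech forwarded between the video device, the media server, and the client.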