CN111147894A

CN111147894A - Sign language video generation method, device and system

Info

Publication number: CN111147894A
Application number: CN201911251154.7A
Authority: CN
Inventors: 金国卿
Original assignee: Suning Intelligent Terminal Co ltd
Current assignee: Suning Intelligent Terminal Co ltd
Priority date: 2019-12-09
Filing date: 2019-12-09
Publication date: 2020-05-12

Abstract

The application discloses a sign language video generation method, a sign language video generation device and a sign language video generation system, wherein the method comprises the following steps: processing received character stream data by using a natural language processing technology to obtain a word segmentation result and a dependency syntax analysis result of the character stream data; searching and obtaining sign language image data corresponding to the word segmentation result according to a mapping relation between pre-stored word segmentation and sign language images; and sequencing and combining the sign language image data according to the dependency syntax analysis result to generate a sign language video and sending the sign language video to the user side for presentation by the user side, so that the conversion of character stream data into the sign language video which can be watched by the hearing-impaired user is realized, the video can be conveniently watched by the hearing-impaired user, and the use experience of the hearing-impaired user is improved.

Description

Sign language video generation method, device and system

Technical Field

The invention relates to the technical field of computers, in particular to a method, a device and a system for generating a sign language video.

Background

When viewing video image data such as television, a user with auditory handicap often cannot view the video normally without subtitles. Even if the current video has subtitles, the meaning of the subtitles cannot be accurately understood by users with low culture level and weak literacy, so that the users cannot normally watch the video even if the users have the subtitles. This brings very big inconvenience for hearing impaired user, has greatly influenced hearing impaired user's use experience.

Disclosure of Invention

In order to solve the defects of the prior art, the invention mainly aims to provide a sign language video generation method, a sign language video generation device, a sign language video generation system and a computer system.

In order to achieve the above object, a first aspect of the present invention provides a method for generating a sign language video, the method comprising:

processing received character stream data by using a natural language processing technology to obtain a word segmentation result and a dependency syntax analysis result of the character stream data;

searching and obtaining sign language image data corresponding to the word segmentation result according to a mapping relation between pre-stored word segmentation and sign language images;

and sequencing and combining the sign language image data according to the dependency syntax analysis result to generate a sign language video and sending the sign language video to a user side for presentation by the user side.

In some embodiments, prior to processing the received text stream data using natural language processing techniques, the method further comprises:

voice data is received and converted into text stream data.

In some embodiments, after receiving the voice data and converting to text stream data, the method further comprises:

and sending the text stream data to a user side so that the user side can generate subtitles to display.

In some embodiments, the sorting and combining the sign language images according to the dependency parsing result to generate the sign language video specifically includes:

sequentially arranging the sign language image data according to the dependency syntax analysis result to obtain sequentially arranged sign language image data;

and endowing the sequentially arranged sign language image data to a pre-constructed virtual role to generate a sign language video.

In a second aspect, the present invention provides a method for generating a sign language video, where the method includes:

the server side uses a natural language processing technology to process the received character stream data to obtain a word segmentation result and a dependency syntax analysis result of the character stream data;

the server side searches and obtains sign language image data corresponding to the word segmentation result according to a mapping relation between pre-stored word segmentation and sign language images;

the server side sorts and combines the sign language image data according to the dependency syntax analysis result to generate a sign language video and sends the sign language video to the user side;

and the user side receives and displays the sign language video.

In a third aspect, the present invention provides an apparatus for generating sign language video, the apparatus comprising:

the communication module is used for receiving character stream data and sending the generated sign language video to a user side;

the processing module is used for processing the character stream data;

the data storage module is used for storing the sign language image data and the mapping relation between the participles and the sign language images;

and the video generation module is used for sequencing and combining the sign language images to generate the sign language video.

In a fourth aspect, the present invention provides a sign language video generating system, including:

the server is used for processing the character stream data, matching the corresponding sign language image according to the word segmentation result, generating sign language video data and sending the sign language video data to the client;

and the user side is used for receiving and presenting the sign language video data returned by the server side.

In a fifth aspect, the present invention provides a computer system, the system comprising:

one or more processors;

and memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:

According to the specific embodiments provided herein, the present application discloses the following technical effects:

processing character stream data by using a natural language processing technology to obtain a word segmentation result and a dependency syntactic analysis result; the corresponding sign language image data are obtained according to the mapping relation between the word segmentation and the sign language image, and are sequenced and combined according to the dependency syntax analysis result, so that a sign language video is generated, the character stream data are converted into the sign language video which can be watched by the hearing-impaired user, the hearing-impaired user can watch the video conveniently, and the use experience of the hearing-impaired user is improved;

voice data is converted into character stream data in real time, and the conversion from the final voice data to sign language video is realized;

the text stream data is converted into the subtitles to be displayed, even if sound is not conveniently played, information which is wanted to be transmitted by the voice of the video can be still known, and the efficiency and convenience of watching the video by all users are improved.

Of course, it is not necessary for any product to achieve all of the above-described advantages at the same time for the practice of the present application.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a scene diagram of the present application;

FIG. 2 is a scenario flow diagram of the present application;

FIG. 3 is a flow chart of a method of the present application;

FIG. 4 is a flow chart of a method of the present application;

FIG. 5 is a diagram of the apparatus structure of the present application;

fig. 6 is a computer system configuration diagram of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Taking the smart television as an example, when the hearing-impaired user watches the video played in the smart television, the hearing-impaired user cannot watch the video normally if the watched video has no subtitles because the audio played in the television cannot be heard.

In order to improve the watching experience of the hearing-impaired user, the invention aims to provide a sign language video generation method, voice data are converted into character stream data, then the character stream data are processed according to the converted character stream data to obtain a word segmentation result and a dependency syntax analysis result, a matched sign language image in a sign language library is searched according to the word segmentation result, the matched sign language image is sequenced according to the dependency syntax analysis result, the sequenced sign language image is given to a pre-constructed virtual role to generate a sign language video watched by the hearing-impaired user, and the hearing-impaired user can watch the video without subtitles greatly conveniently.

The dependency parsing described in the present invention can be used to determine the syntactic structure of a sentence or the dependency between words in a sentence. The method mainly comprises two aspects of contents, namely, determining a grammar system of the language, namely, giving formal definition to a grammar structure of a legal sentence in the language; another aspect is syntactic analysis techniques, i.e. the automatic derivation of the syntactic structure of a sentence, according to a given syntactic hierarchy, the analysis of the syntactic units contained in the sentence and the relations between these syntactic units. And constructing a syntactic analysis tree according to the dependency syntactic analysis result, and determining the arrangement sequence of the target sentences according to the syntactic analysis tree.

Fig. 1 shows a system structure diagram of the present invention, which includes a server and a client. The server can be a service provider such as a cloud server with the functions of communication, word segmentation, video generation, data storage and the like. The device related to the user side can be any device with communication and display functions, such as an intelligent television, a computer, a mobile phone, a tablet and the like, voice data can be uploaded to the server side through the Internet, a sign language video is generated by the server side according to the voice data and is transmitted back to the user side, and the user side displays the received sign language video in a proper size and displays the sign language video on a screen for the user to watch.

The sign language image is a pre-drawn sign language action graph, the meaning of the graph is conveyed through sign language, and a pre-established virtual character can be endowed to generate a sign language video.

The invention can also be used for the communication between the user without hearing failure and the user with hearing failure. The user without hearing impairment can upload the voice data to the server side through the first user side, the server side can convert the received voice data into character stream data, the sign language video is generated by using the sign language video generation method provided by the invention according to the converted character stream data, and the character stream data and the sign language video are sent to the second user side of the user without hearing impairment, so that the user without hearing impairment can freely communicate with other users.

Specifically, as shown in fig. 2, taking the smart television as an example of a user side, the above scheme can be specifically implemented by the following steps:

210. the intelligent television and the server establish communication connection through three-way handshake.

After the three-way handshake establishes communication connection, the server side calls a local ASR (voice recognition) function module, requests an address of the smart television, and judges whether the smart television currently establishing communication connection has the authority to call the requested interface, wherein the interface can be used for realizing functions of natural language analysis, voice data conversion, text word segmentation and the like so as to prepare for receiving voice data uploaded by the smart television.

220. And the smart television uploads the real-time voice data to the cloud server.

230. The server receives and converts the real-time voice data into character stream data.

And the service end calls the ASR voice recognition function module to convert the voice data uploaded by the intelligent television into character stream data, so that the follow-up operation is laid.

240. And the server side transmits the converted text stream data back to the intelligent television.

250. The intelligent television displays the received character stream data in real time.

When the smart television judges that the currently played video has no subtitles or a user sends a subtitle display instruction, the received character stream data is displayed on a screen for the user to watch.

260. The server side carries out word segmentation and dependency syntactic analysis on the character stream data by using a natural language processing technology to obtain word segmentation results and dependency syntactic analysis results.

270. And the server side searches and obtains sign language image data corresponding to the word segmentation result according to a mapping relation between pre-stored word segmentation and sign language images.

And the server removes words which cannot be expressed by the hand language, such as articles and the like contained in the word segmentation result according to the word segmentation result, searches the sign language image corresponding to the residual words contained in the word segmentation result and obtains sign language image data corresponding to the word segmentation result.

280. And the server-side sorts the obtained sign language image data according to the dependency syntax analysis result, and endows the sorted sign language image data to a pre-constructed virtual character to generate a sign language animation video.

And the server sorts the hand language image data according to the dependency syntax analysis result and factors influencing the sorting, such as logical relations and the like, of the hand language image data, and then gives the sorted hand language image data to a pre-constructed virtual character to generate the hand language animation video.

The pre-constructed virtual character may be pre-created using Unity technology, which may enable rendering of image content such as 3D animation.

290. And the server side sends the generated sign language animation video to the smart television.

After receiving the sign language animation video sent by the server, the smart television displays the sign language animation video on the lower right corner of the current video in a small window mode, and therefore a user can watch the sign language animation video conveniently.

Example one

Correspondingly to the above steps, an embodiment of the present invention provides a method for generating a sign language video, which is applied to a server, and as shown in fig. 3, the method includes:

310. processing received character stream data by using a natural language processing technology to obtain a word segmentation result and a dependency syntax analysis result of the character stream data;

320. searching and obtaining sign language image data corresponding to the word segmentation result according to a mapping relation between pre-stored word segmentation and sign language images;

330. and sequencing and combining the sign language image data according to the dependency syntax analysis result to generate a sign language video and sending the sign language video to a user side for presentation by the user side.

When the user side receives the sign language video, the sign language video can be displayed on the screen for the user to watch, and the user experience of watching video programs and the like by the user with hearing disabilities is improved.

Preferably, when the user sends the voice data to the server, before processing the received text stream data by using the natural language processing technology, the method further includes:

301. receiving voice data and converting the voice data into character stream data;

preferably, in order to improve the user experience of watching the video, the text stream data generated by conversion can be sent to the user terminal to generate subtitles for display; after receiving the voice data and converting the voice data into text stream data, the method further comprises:

302. and sending the text stream data to a user side so that the user side can generate subtitles to display.

After receiving the text stream data, the user terminal can display the current video as a subtitle when judging that the current video has no subtitle or when the user sends a command for displaying the subtitle.

Preferably, the sorting and combining the sign language image data according to the dependency syntax analysis result may specifically include:

331. sequentially arranging the sign language image data according to the dependency syntax analysis result to obtain sequentially arranged sign language image data;

332. and endowing the sequentially arranged sign language image data to a pre-constructed virtual role to generate a sign language video.

The virtual character may be a pre-constructed virtual character, and may be a 3D character image pre-created through Unity technology.

Example two

Corresponding to the above embodiment, the present application further provides a method for generating a sign language video, so as to implement interaction between a user side and a server side. As shown in fig. 4, the method includes:

410. the server side uses a natural language processing technology to process the received character stream data to obtain a word segmentation result and a dependency syntax analysis result of the character stream data;

420. the server side searches and obtains sign language image data corresponding to the word segmentation result according to a mapping relation between pre-stored word segmentation and sign language images;

430. the server side sorts and combines the sign language image data according to the dependency syntax analysis result to generate a sign language video and sends the sign language video to the user side;

440. and the user side receives and displays the sign language video.

Preferably, when the user sends the voice data to the server, before the server uses the natural language processing technology to process the received text stream data, the method further includes:

401. the server receives the voice data and converts the voice data into text stream data.

Preferably, in order to improve the user experience of watching the video, the server may send the converted text stream data to the client to generate subtitles for display; after receiving the voice data and converting the voice data into text stream data, the method further comprises:

402. the server side sends the character stream data to the user side;

403. and generating the subtitles by the user side for presentation.

Preferably, the step of ordering and combining the sign language image data by the server according to the dependency syntax analysis result may specifically include:

431. the server sequentially arranges the sign language image data according to the dependency syntax analysis result to obtain sequentially arranged sign language image data;

432. and endowing the sequentially arranged sign language image data to a pre-constructed virtual role to generate a sign language video.

EXAMPLE III

In response to the first embodiment, the present application provides a sign language video generating device, which acts on a server, and as shown in fig. 5, the device includes:

a communication module 510, configured to receive text stream data and send the generated sign language video to a user side;

preferably, when the user terminal sends the voice data, the communication module may also be configured to receive the voice data sent by the user terminal.

The processing module 520 is configured to process the text stream data to obtain a word segmentation result and a dependency parsing result of the text stream data;

the data storage module 530 is used for storing the sign language image data and the mapping relation between the participles and the sign language images;

the data storage module comprises a sign language library, and sign language image data and a mapping relation between participles and sign language images are stored and used for providing the sign language image data;

and a video generating module 540, configured to sort and combine the sign language images to generate a sign language video.

Preferably, for the purpose of generating the sign language video by the voice data transmitted from the user side, the sign language video generating device may further include:

and a voice conversion module 550, configured to convert the voice data into text stream data.

Example four

Corresponding to the second embodiment, the present application further provides a sign language video generating system, as shown in fig. 1, including a user side and a server side:

EXAMPLE five

In accordance with the above embodiments, the present application also provides a computer system comprising one or more processors; and memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:

Fig. 6 illustrates an architecture of a computer system, which may include, in particular, a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, video display adapter 1511, disk drive 1512, input/output interface 1513, network interface 1514, and memory 1520 may be communicatively coupled via a communication bus 1530.

The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solution provided by the present Application.

The Memory 1520 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random access Memory), a static storage device, a dynamic storage device, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the computer system 1500, a Basic Input Output System (BIOS) for controlling low-level operations of the computer system 1500. In addition, a web browser 1523, a data storage management system 1524, an icon font processing system 1525, and the like can also be stored. The icon font processing system 1525 may be an application program that implements the operations of the foregoing steps in this embodiment of the application. In summary, when the technical solution provided by the present application is implemented by software or firmware, the relevant program codes are stored in the memory 1520 and called for execution by the processor 1510.

The input/output interface 1513 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.

The network interface 1514 is used to connect a communication module (not shown) to enable the device to communicatively interact with other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).

The bus 1530 includes a path to transfer information between the various components of the device, such as the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520.

In addition, the computer system 1500 may also obtain information of specific extraction conditions from the virtual resource object extraction condition information database 1541 for performing condition judgment, and the like.

It should be noted that although the above devices only show the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus 1530, etc., in a specific implementation, the devices may also include other components necessary for proper operation. Furthermore, it will be understood by those skilled in the art that the apparatus described above may also include only the components necessary to implement the solution of the present application, and not necessarily all of the components shown in the figures.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a cloud server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for generating sign language video, the method comprising:

2. The method of generating as claimed in claim 1, wherein prior to processing the received text stream data using natural language processing techniques, the method further comprises:

voice data is received and converted into text stream data.

3. The method of generating as claimed in claim 2, wherein after receiving voice data and converting to text stream data, the method further comprises:

4. The method according to any of claims 1-3, wherein the sorting and combining the sign language images according to the dependency parsing result to generate a sign language video specifically comprises:

5. A method for generating sign language video, the method comprising:

and the user side receives and displays the sign language video.

6. An apparatus for generating sign language video, the apparatus comprising:

the processing module is used for processing the character stream data to obtain a word segmentation result and a dependency syntactic analysis result of the character stream data;

and the video generation module is used for sequencing and combining the hand language image data to generate a sign language video.

7. The generation apparatus of claim 6, wherein the apparatus further comprises:

and the voice conversion module is used for converting the voice data into character stream data.

8. The generating device of claim 6 or 7, wherein the communication module is further configured to transmit the text stream data to a user terminal.

9. A system for generating sign language video, the system comprising:

10. A computer system, the system comprising:

one or more processors;