CN118364830A - Translation method, device and translator combining image analysis and voice recognition - Google Patents
Translation method, device and translator combining image analysis and voice recognition
- Publication number
- CN118364830A (application CN202410789540.6A)
- Authority
- CN
- China
- Prior art keywords
- model
- identity information
- translator
- translation
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Computer Hardware Design (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Acoustics & Sound (AREA)
- Bioethics (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a translation method, a device and a translator combining image analysis and voice recognition, comprising the following steps: acquiring a face image of a translator user, and analyzing the face image to obtain first identity information of the translator user; collecting voice information of the translator user, and analyzing the voice information to obtain second identity information of the translator user; matching corresponding first model parameters based on the first identity information; matching corresponding second model parameters based on the second identity information; performing fusion calculation on the first model parameters and the second model parameters to obtain fusion model parameters; updating a preset language translation model based on the fusion model parameters to obtain an updated translation model; and translating the call of the translator based on the updated translation model. In the invention, the translation model is customized according to the identity information of the user, overcoming the drawbacks of existing translation systems, which are prone to translation deviation and low accuracy.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a translation method, apparatus, and translator that combine image analysis and speech recognition.
Background
Traditional translation methods rely mainly on text input, achieving text-to-text conversion by building huge language databases and complex algorithmic models.
Most existing translation devices or applications rely on voice or text input for translation and neglect multi-dimensional verification of the user's identity, which limits the degree of personalization and the security of the translation service. For example, the identity features of the user are underutilized in optimizing the translation model, so the translation results lack specificity.
Specifically, conventional translation systems often adopt a unified translation model to serve all users, ignoring personalized requirements such as the language habits and preferred technical terms of different users. Such a translation mode is difficult to adapt to complex and changeable communication scenarios, is not tailored to individual users, and is prone to translation deviation and low accuracy.
Disclosure of Invention
The invention mainly aims to provide a translation method, a translation device and a translator combining image analysis and voice recognition, in order to overcome the drawbacks of existing translation systems, which are prone to translation deviation and low accuracy.
In order to achieve the above object, the present invention provides a translation method combining image analysis and speech recognition, comprising the steps of:
Acquiring a face image of a user of a translator, and analyzing the face image based on an image analysis model to obtain first identity information of the user of the translator;
Collecting voice information of a translator user, and analyzing the voice information based on a voice recognition model to obtain second identity information of the translator user;
Based on the first identity information, matching corresponding first model parameters; based on the second identity information, matching corresponding second model parameters; the first model parameter and the second model parameter are different parameter values corresponding to the model parameter of the same type;
Performing fusion calculation on the first model parameters and the second model parameters to obtain fusion model parameters;
updating a preset language translation model based on the fusion model parameters to obtain an updated translation model; and translating the call of the translator based on the updated translation model.
Further, the analyzing the face image based on the image analysis model to obtain first identity information of the translator user includes:
and extracting the face features of the face image based on the image analysis model, analyzing the face features, and predicting to obtain nationality information of the translator user as the first identity information.
Further, the analyzing the voice information based on the voice recognition model to obtain the second identity information of the translator user includes:
And extracting voice characteristics of the voice information based on the voice recognition model, analyzing the voice characteristics, and predicting to obtain regional information of the translator user as the second identity information.
Further, the performing fusion calculation on the first model parameter and the second model parameter to obtain a fusion model parameter includes:
Acquiring weight values respectively corresponding to the first model parameters and the second model parameters;
And carrying out weighted calculation on the first model parameter and the second model parameter based on the weight values respectively corresponding to the first model parameter and the second model parameter, wherein the obtained weighted model parameter is used as the fusion model parameter.
Further, after translating the call of the translator based on the updated translation model, the method includes:
The call content of the translator is stored in a database;
Determining a corresponding target encryption algorithm based on the first identity information and the second identity information;
encrypting the first identity information into a first character combination and encrypting the second identity information into a second character combination based on the target encryption algorithm;
Acquiring the total duration of the current call;
generating an identification code based on the first character combination, the second character combination and the total call duration;
and identifying the call content stored in the database based on the identification code.
Further, the determining a corresponding target encryption algorithm based on the first identity information and the second identity information includes:
converting the first identity information into pinyin characters, and acquiring the total number of characters in the pinyin characters as a first number;
Converting the second identity information into Chinese names, and acquiring the total number of Chinese characters in the Chinese names as a second number;
Acquiring a preset encryption algorithm matrix; the encryption algorithm matrix comprises a plurality of rows and a plurality of columns, and different encryption algorithm names are added to each position of the encryption algorithm matrix;
acquiring an encryption algorithm corresponding to the encryption algorithm name at the m-th row and n-th column position in the encryption algorithm matrix as a target encryption algorithm; wherein m is the number corresponding to the first number, and n is the number corresponding to the second number.
Further, the first character combination and the second character combination each include nine characters;
The generating an identification code based on the first character combination, the second character combination and the total call duration includes:
Creating a blank data table with three rows and three columns, and sequentially adding characters in the first character combination into the blank data table one by one to obtain a first data table;
Sequentially adding the characters in the second character combination into the first data table one by one to obtain a second data table; wherein, in each cell of the second data table, the character in the second character combination is located after the character in the first character combination;
Acquiring the number of minutes and the number of seconds in the total call duration; the number corresponding to the minutes is a first number, and the number corresponding to the seconds is a second number;
Respectively constructing a first ray and a second ray by taking the lower left corner of the second data table as a vertex; the included angle between the first ray and the bottom of the second data table is a first included angle, and the first included angle is the same as a first number; the included angle between the second ray and the bottom of the second data table is a second included angle, and the second included angle is the same as a second number;
And extracting characters in cells intersected with the first ray and the second ray from the second data table, and sequentially combining the characters to obtain combined characters serving as the identification codes.
The invention also provides a translation device combining image analysis and voice recognition, which comprises:
the first acquisition unit is used for acquiring a face image of a user of the translator, and analyzing the face image based on an image analysis model to obtain first identity information of the user of the translator;
The second acquisition unit is used for acquiring voice information of the translator user, analyzing the voice information based on a voice recognition model and obtaining second identity information of the translator user;
The matching unit is used for matching corresponding first model parameters based on the first identity information; based on the second identity information, matching corresponding second model parameters; the first model parameter and the second model parameter are different parameter values corresponding to the model parameter of the same type;
The computing unit is used for carrying out fusion computation on the first model parameters and the second model parameters to obtain fusion model parameters;
the translation unit is used for updating a preset language translation model based on the fusion model parameters to obtain an updated translation model; and translating the call of the translator based on the updated translation model.
The invention also provides a translation machine comprising a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of any one of the methods when executing the computer program.
The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the preceding claims.
The invention provides a translation method, a device and a translator combining image analysis and voice recognition, comprising the following steps: acquiring a face image of a translator user, and analyzing the face image based on an image analysis model to obtain first identity information of the translator user; collecting voice information of the translator user, and analyzing the voice information based on a voice recognition model to obtain second identity information of the translator user; matching corresponding first model parameters based on the first identity information, and matching corresponding second model parameters based on the second identity information, the first model parameters and the second model parameters being different parameter values of the same type of model parameter; performing fusion calculation on the first model parameters and the second model parameters to obtain fusion model parameters; updating a preset language translation model based on the fusion model parameters to obtain an updated translation model; and translating the call of the translator based on the updated translation model. In the invention, image analysis and voice recognition technologies are used together, which improves the accuracy of user identification and allows the translation model to be customized according to the user's identity information, thereby providing more accurate and personalized translation service and overcoming the drawbacks of existing translation systems, which are prone to translation deviation and low accuracy.
Drawings
FIG. 1 is a schematic diagram showing steps of a translation method combining image analysis and speech recognition according to an embodiment of the present invention;
FIG. 2 is a block diagram showing a translation device combining image analysis and speech recognition according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a translator according to an embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, in one embodiment of the present invention, a translation method combining image analysis and speech recognition is provided, including the following steps:
Step S1, acquiring a face image of a user of a translator, and analyzing the face image based on an image analysis model to obtain first identity information of the user of the translator;
s2, collecting voice information of a user of the translator, and analyzing the voice information based on a voice recognition model to obtain second identity information of the user of the translator;
Step S3, matching corresponding first model parameters based on the first identity information; based on the second identity information, matching corresponding second model parameters; the first model parameter and the second model parameter are different parameter values corresponding to the model parameter of the same type;
S4, carrying out fusion calculation on the first model parameter and the second model parameter to obtain a fusion model parameter;
Step S5, updating a preset language translation model based on the fusion model parameters to obtain an updated translation model; and translating the call of the translator based on the updated translation model.
In this embodiment, the method specifically includes:
In the above step S1, image acquisition technique: a high-definition camera is adopted, optionally combined with an infrared or depth sensor, to capture facial images of the user under various lighting conditions. Privacy protection measures are considered during collection, ensuring that acquisition takes place only with user consent and only when necessary.
Image analysis model: the model is usually based on a Convolutional Neural Network (CNN) or other deep learning architecture, and can identify facial feature points such as eye corners, nose tips, mouth corners and the like through training of a large number of face images, and analyze the facial feature points through geometric relations, texture features and the like. In addition, the model can analyze auxiliary information such as nationality, expression, age, gender and the like, and the robustness of identity recognition is improved.
In the above step S2, voice acquisition technique: a high-sensitivity microphone array is used together with algorithms such as noise suppression and echo cancellation, so that the user's voice can be captured clearly even in a noisy environment. Directional pickup is emphasized during collection to reduce environmental interference.
Speech recognition and voiceprint recognition model: the speech recognition portion converts the audio signal into text using models such as long short-term memory (LSTM) networks or Transducer architectures. Voiceprint recognition is further based on features such as spectrum analysis and Mel-frequency cepstral coefficients (MFCC); the unique acoustic characteristics of the individual are recognized through a machine learning algorithm, and the identity information of the user, mainly nationality and region information, is derived from these acoustic characteristics.
In the above step S3, the parameter library is personalized: a database is maintained containing user personalization parameters that are model parameters refined based on past usage habits, language habits, common expressions, etc. of the user.
Parameter matching logic: based on the identity information determined in the above steps, the model parameters that best match the user are found or calculated in the database. For example, for users who often use specialized terms, their language model parameters may favor accurate translation of that specialized vocabulary; a parameter set adapted to a specific accent can improve the accuracy of accent recognition; and users of different nationalities and regions can correspond to different model parameters. It should be understood that the first model parameters and the second model parameters are different parameter values of the same type of model parameter, that is, the parameter types are consistent while the specific parameter values differ.
In the above step S4, the fusion algorithm: fusion calculations may employ a variety of strategies such as simple weighted averages, more complex attention mechanisms, or adaptive fusion using a deep learning network. The key is to balance the importance of visual and audible information, and the weights can be dynamically adjusted according to the scene.
Parameter optimization: the fusion process can also include fine-tuning of the parameters; optimization algorithms such as gradient descent ensure that the newly generated fusion model parameters reflect the individual requirements of the user while preserving the generalization capability of the translation model.
In the above step S5, the model update flow: the fused parameters are injected into the base translation model, and the model parameters are adjusted quickly. This process may involve online learning or transfer learning strategies, ensuring continuous improvement in translation quality.
Real-time application: the updated translation model is deployed to the translation device or a cloud service and responds to the user's translation requests in real time. In actual dialogue, the translation model can understand and convert the user's language quickly and accurately while maintaining natural fluency and cultural adaptability, improving the user experience. The translation model is thus customized according to the identity information of the user, providing more accurate and personalized translation service.
In this embodiment, image analysis and voice recognition technologies are applied together, improving the accuracy of user identification and allowing the translation model to be customized according to the identity information of the user, thereby providing more accurate and personalized translation service and overcoming the drawbacks of existing translation systems, which are prone to translation deviation and low accuracy.
In an embodiment, the analyzing the face image based on the image analysis model to obtain the first identity information of the translator user includes:
and extracting the face features of the face image based on the image analysis model, analyzing the face features, and predicting to obtain nationality information of the translator user as the first identity information.
In this embodiment, the method specifically includes:
Image analysis model construction: this image analysis model is typically based on deep learning, particularly convolutional neural networks (CNNs) or more advanced architectures such as ResNet and MobileNet, which excel at image recognition and classification tasks. These networks extract complex features from the raw pixel data through multiple layers of learned neurons, thereby achieving efficient discrimination between classes.
Data set preparation: in order for the model to be able to predict nationality, a face image dataset containing a broad nationality population is required, with each sample having the correct nationality tag attached. This dataset should be as comprehensive as possible, covering populations of different ages, sexes, skin colors and cultural backgrounds to ensure generalization ability of the model.
Model training: by using the data set, the model learns to associate the specific face characteristics with the corresponding nationalities in a supervised learning mode. In the training process, the model adjusts internal parameters to minimize errors between the predicted nationality and the true nationality.
Feature and nationality mapping: the model learns not a direct nationality label but a complex mapping relation between the face features and nationalities. The model predicts the most likely corresponding nationality by comparing the user's face feature vector with the feature distribution of each category (i.e., different nationalities) in the training set.
After the image analysis model is constructed, nationality identification can be performed by using the image analysis model.
First, face feature extraction is performed, which includes key point detection: the image analysis model locates and marks key feature points in the face image, such as the positions of the eyes, nose and mouth and the facial contour. This step helps standardize and align the face image, facilitating subsequent analysis.
Feature vector extraction: the model then converts the whole face image into a high-dimensional feature vector through a series of convolution and pooling layers. This vector contains comprehensive information such as the shape, structure and texture of the face and is the core representation of individual facial differences. The image analysis model then analyzes this high-dimensional feature vector and compares it with the feature distribution of each category (i.e., each nationality), so as to predict the most likely nationality as the first identity information.
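For illustration only, the following minimal sketch shows one way such a nationality classifier could be assembled. The ResNet-18 backbone (via PyTorch/torchvision), the label set and the checkpoint path are assumptions introduced here, not features prescribed by this description.

```python
# A minimal, hypothetical sketch of a CNN-based nationality classifier
# (assumptions: a ResNet-18 backbone from torchvision, an illustrative label
# set and checkpoint path; none of these specifics come from the patent text).
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

NATIONALITIES = ["China", "Japan", "France", "Brazil"]  # hypothetical label set

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18()                                   # CNN backbone
model.fc = nn.Linear(model.fc.in_features, len(NATIONALITIES))
model.load_state_dict(torch.load("face_nationality.pt"))    # hypothetical weights
model.eval()

def predict_first_identity(face_image_path: str) -> str:
    """Predict nationality (first identity information) from a face image."""
    image = Image.open(face_image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)                  # (1, 3, 224, 224)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)
    return NATIONALITIES[int(probs.argmax(dim=1))]
```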
In an embodiment, the analyzing the voice information based on the voice recognition model to obtain the second identity information of the translator user includes:
And extracting voice characteristics of the voice information based on the voice recognition model, analyzing the voice characteristics, and predicting to obtain regional information of the translator user as the second identity information.
In this embodiment, the method specifically includes:
Voice collection and preprocessing: first, the translation device collects a voice sample of the user through a built-in high-quality microphone. The preprocessing stage includes noise reduction, de-reverberation, voice segmentation and the like, so that the extracted features are clean and accurate and free from environmental interference.
Feature extraction: key parameters reflecting physical and linguistic characteristics of speech are extracted from the processed speech signal using techniques such as Mel-frequency cepstral coefficient (MFCC), fundamental frequency (F0), speech envelope, and the like. These features include not only basic properties such as pitch, intensity, etc., but also more complex linguistic features such as pronunciation habits, rhythms, intonation variations, etc.
Regional voice model: existing regional voice databases are constructed or utilized, which contain voice samples from different regional populations, each sample labeled with the approximate regional information of the speaker. The breadth and depth of the database directly affect the accuracy of the predictions.
Model training and classification: based on the characteristics and the region labels, a machine learning or deep learning model (such as a support vector machine, a random forest, a deep neural network and the like) is used for training, and the distinction between the voice characteristics of different regions is learned. The training process aims to build a mapping model from speech features to geographical information.
Region identification: after features are extracted from the user's voice, the model compares them with the regional feature patterns obtained in training and predicts the region to which the user most probably belongs through a classification algorithm (such as nearest neighbor or maximum likelihood estimation). The regional information may be a country, a region, or even a finer geographical division, depending on the degree of refinement of the model and the richness of the training data.
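For illustration only, a minimal sketch of this regional prediction path is given below; librosa for MFCC extraction, a scikit-learn classifier loaded via joblib, the model file name and the mean/standard-deviation feature summary are all assumptions introduced here.

```python
# A minimal sketch of region prediction from voice (assumptions: librosa for
# MFCC extraction, a pre-trained scikit-learn classifier loaded via joblib;
# the file name and the mean/std feature summary are illustrative choices).
import joblib
import librosa
import numpy as np

region_clf = joblib.load("region_classifier.joblib")   # hypothetical trained model

def predict_second_identity(wav_path: str) -> str:
    """Predict the region (second identity information) from a voice sample."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)       # (13, frames)
    # Summarize the utterance by per-coefficient mean and standard deviation
    # to obtain a fixed-length feature vector for the classifier.
    features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)]).reshape(1, -1)
    return region_clf.predict(features)[0]
```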
In an embodiment, the performing a fusion calculation on the first model parameter and the second model parameter to obtain a fused model parameter includes:
Acquiring weight values respectively corresponding to the first model parameters and the second model parameters;
And carrying out weighted calculation on the first model parameter and the second model parameter based on the weight values respectively corresponding to the first model parameter and the second model parameter, wherein the obtained weighted model parameter is used as the fusion model parameter.
In this embodiment, by allocating the contribution degree (i.e., the weight value) of the image information and the voice information, their respective advantages complement each other to form comprehensive model parameters that better represent the identity of the user, thereby guiding the personalized update of the translation model and improving the quality of the translation service and the user experience. In this embodiment, the weight values corresponding to the first model parameters and the second model parameters are preset values.
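For illustration only, a minimal sketch of this weighted fusion is given below, assuming that the matched parameters are exposed as NumPy arrays keyed by parameter name and that the preset weights are illustrative values summing to 1.

```python
# A minimal sketch of the weighted fusion (assumptions: the matched parameters
# are NumPy arrays keyed by parameter name, and the preset weights are
# illustrative values that sum to 1).
import numpy as np

W_FIRST, W_SECOND = 0.6, 0.4   # preset weights for image- and voice-derived parameters

def fuse_parameters(first_params: dict, second_params: dict) -> dict:
    """Weighted average of two parameter sets of the same type."""
    return {
        name: W_FIRST * np.asarray(first_params[name])
              + W_SECOND * np.asarray(second_params[name])
        for name in first_params
    }
```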
In an embodiment, after translating the call of the translator based on the updated translation model, the method includes:
The call content of the translator is stored in a database;
Determining a corresponding target encryption algorithm based on the first identity information and the second identity information;
encrypting the first identity information into a first character combination and encrypting the second identity information into a second character combination based on the target encryption algorithm;
Acquiring the total duration of the current call;
generating an identification code based on the first character combination, the second character combination and the total call duration;
and identifying the call content stored in the database based on the identification code.
In this embodiment, the method specifically includes:
Data archiving: first, the call content translated by the translator is recorded and stored in a database. This facilitates subsequent querying, analysis or review, and is important for service quality monitoring, user feedback handling and continuous optimization of the translation model.
Identity information identification and algorithm selection: based on the first identity information and the second identity information, a corresponding secure encryption algorithm is determined. This step allows for customized security measures to be implemented, taking into account the fact that different users or groups may have different security requirements or preferences.
Encryption processing: the first and second identity information are converted into a first character combination and a second character combination, respectively, which are difficult to interpret, using a determined target encryption algorithm. This procedure enhances user privacy protection and even if data is illegally accessed, direct identity information cannot be easily revealed.
Recording the call duration: the total duration of the call is recorded and this information can be used for statistical analysis or as part of a security identification.
Generating an identification code: a unique identification code is generated by combining the encrypted first character combination, the encrypted second character combination and the total call duration. The identification code is a highly abstract piece of information that contains the basic information of the call and, because of the encryption processing, offers high security.
Identifying records in the database: finally, the call content stored in the database is marked with the generated identification code. This means that when a specific call record needs to be retrieved or accessed, the user's identity information does not need to be exposed directly; access is performed through a secure identification code, further enhancing data management and privacy protection.
In summary, through this series of operations, the scheme not only ensures effective storage of the translated call content to support service optimization and management, but also greatly improves data security through encryption and anonymization, meeting today's strict requirements for personal privacy protection. In this way, the requirements of the service functions are satisfied while the privacy of user information is ensured.
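For illustration only, a minimal sketch of the archiving step is given below, assuming sqlite3 as the database; the table and column names are hypothetical.

```python
# A minimal sketch of archiving translated call content keyed only by the
# identification code (assumptions: sqlite3 as the database; the table and
# column names are hypothetical).
import sqlite3

def archive_call(db_path: str, identification_code: str, call_content: str) -> None:
    """Store translated call content, identified only by the generated code."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS calls (code TEXT PRIMARY KEY, content TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO calls (code, content) VALUES (?, ?)",
            (identification_code, call_content),
        )
```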
In an embodiment, the determining the corresponding target encryption algorithm based on the first identity information and the second identity information includes:
converting the first identity information into pinyin characters, and acquiring the total number of characters in the pinyin characters as a first number;
Converting the second identity information into Chinese names, and acquiring the total number of Chinese characters in the Chinese names as a second number;
Acquiring a preset encryption algorithm matrix; the encryption algorithm matrix comprises a plurality of rows and a plurality of columns, and different encryption algorithm names are added to each position of the encryption algorithm matrix;
acquiring an encryption algorithm corresponding to the encryption algorithm name at the nth row and nth column positions in the encryption algorithm matrix as a target encryption algorithm; wherein m is a number corresponding to the first number, and n is a number corresponding to the second number.
In this embodiment, the method specifically includes:
Converting the first identity information into Pinyin and counting characters: first, the first identity information (e.g., a nationality such as China) is converted to its Pinyin form, and then the total number of Pinyin letters is counted. For example, if the first identity information is "China (中国)", it is converted to the Pinyin "ZhongGuo", whose total number of characters is 8.
Converting the second identity information into a Chinese name and counting Chinese characters: the second identity information is confirmed and kept in the form of a Chinese name, and the total number of Chinese characters in it is counted. If the second identity information is "Guangdong (广东)", the total number of Chinese characters is 2.
Constructing the encryption algorithm matrix: a preset encryption algorithm matrix consisting of multiple rows and columns is designed, with each cell representing a different encryption algorithm. For example, the cell at the first row and first column may correspond to the AES encryption algorithm, and the cell at the second row and first column to the RSA algorithm. The matrix is pre-configured and contains a number of efficient and widely accepted encryption algorithms, each with a fixed location.
Selecting the encryption algorithm by the numbers: the first number (total number of Pinyin characters) and the second number (total number of Chinese characters) obtained above are used as the row number m and column number n of the matrix. If the first number is 8 and the second number is 2, the encryption algorithm at the 8th row and 2nd column of the matrix is extracted as the target encryption algorithm. In this way, each selection is dynamic and unique, depending on the particular identity characteristics of the participant.
In this embodiment, the above mechanism enables a flexible and seemingly random selection of the encryption algorithm by mapping simple statistical features of the identity information to specific locations in the encryption algorithm matrix. Its advantages are that a personalized encryption scheme can be provided for each call while not depending directly on the specific content of the identity information, which increases the difficulty for an attacker to predict or crack the encryption scheme. In addition, it simplifies management and configuration because the actual selection logic is hidden behind a relatively simple mathematical operation.
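For illustration only, a minimal sketch of the matrix-based selection is given below, assuming the third-party pypinyin package for the Pinyin conversion; the 10×10 matrix is illustrative, with only two cells named to mirror the AES/RSA example above.

```python
# A minimal sketch of the matrix-based algorithm selection (assumptions: the
# third-party pypinyin package performs the Pinyin conversion, and the 10x10
# matrix below is illustrative, with only two cells named to mirror the
# AES/RSA example above).
from pypinyin import lazy_pinyin

ALGORITHM_MATRIX = [[f"ALG_{r}_{c}" for c in range(1, 11)] for r in range(1, 11)]
ALGORITHM_MATRIX[0][0] = "AES"   # row 1, column 1
ALGORITHM_MATRIX[1][0] = "RSA"   # row 2, column 1

def select_algorithm(first_identity: str, second_identity: str) -> str:
    """Map identity statistics to a cell of the encryption-algorithm matrix."""
    pinyin = "".join(lazy_pinyin(first_identity))    # e.g. "中国" -> "zhongguo"
    m = len(pinyin)                                  # first number  -> row index
    n = len(second_identity)                         # second number -> column index
    return ALGORITHM_MATRIX[m - 1][n - 1]            # m-th row, n-th column

# Example: select_algorithm("中国", "广东") reads the cell at row 8, column 2.
```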
In an embodiment, the first character combination and the second character combination each comprise nine characters;
The generating an identification code based on the first character combination, the second character combination and the total call duration includes:
Creating a blank data table with three rows and three columns, and sequentially adding characters in the first character combination into the blank data table one by one to obtain a first data table;
Sequentially adding the characters in the second character combination into the first data table one by one to obtain a second data table; wherein, in each cell of the second data table, the character in the second character combination is located after the character in the first character combination;
Acquiring the number of minutes and the number of seconds in the total call duration; the number corresponding to the minutes is a first number, and the number corresponding to the seconds is a second number;
Respectively constructing a first ray and a second ray by taking the lower left corner of the second data table as a vertex; the included angle between the first ray and the bottom of the second data table is a first included angle, and the first included angle is the same as a first number; the included angle between the second ray and the bottom of the second data table is a second included angle, and the second included angle is the same as a second number;
And extracting characters in cells intersected with the first ray and the second ray from the second data table, and sequentially combining the characters to obtain combined characters serving as the identification codes.
In this embodiment, the method specifically includes:
First, a basic three-row three-column blank data table is created. This data table will serve as a base framework for subsequent processing.
Filling the first character combination, namely filling the characters in the first character combination into the blank data table one by one according to the sequence to form a first data table. For example, if the first character combination is "ABC123456", then the nine characters would be sequentially filled into the nine cells of the data table one by one.
And merging the second character combinations, namely adding the characters of the second character combinations into the first data table one by one in sequence, wherein the characters of the first character combinations in each cell are immediately followed by corresponding characters of the second character combinations. For example, if the second character combination is "1016DEF58", the characters in the respective cells of the second data table are, eventually, in order: a1, B0, C1, 16, 2D, 3E, 4F, 55, 68.
And (3) extracting call time information, namely separating the minutes and seconds of the total call time, and marking the minutes and seconds as a first number and a second number respectively. For example, if the call duration is 12 minutes 49 seconds, then the first number is 12 and the second number is 49.
Constructing rays and locating characters: using the lower-left corner of the second data table as a starting point, two rays are constructed from the first number and the second number. The first ray forms a first included angle with the bottom of the data table equal to the angle corresponding to the first number (e.g., if the first number is 12, the first included angle is 12°). Similarly, the second ray forms a second included angle with the bottom equal to the angle corresponding to the second number (e.g., if the second number is 49, the second included angle is 49°).
Finally, according to the cells that the two rays intersect, the characters in those cells are extracted and combined in cell order to form the final identification code. Here, intersection means that a ray passes through the corresponding cell.
In summary, the scheme creatively combines character sequences and time information and uses a geometric construction (the intersection of rays with the table) to select specific characters, thereby generating a unique identification code that is associated with the information of both parties to the call and incorporates the call duration. This not only enhances the personalization and security of the identification code but also presents a novel approach to information encoding.
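For illustration only, a minimal sketch of the identification-code construction is given below. It assumes each cell of the three-by-three table is a 1×1 square, samples each ray at small steps to find the cells it crosses, and joins the extracted characters in row-major order from the top-left; the description above leaves these geometric details open.

```python
# A minimal sketch of the identification-code construction (assumptions: each
# cell of the 3x3 table is a 1x1 square, the rays are sampled at small steps to
# find the cells they cross, and the extracted characters are joined in
# row-major order from the top-left; these geometric details are illustrative).
import math

def build_table(first_combo: str, second_combo: str) -> list:
    """Fill a 3x3 table; each cell holds one first-combination character
    immediately followed by the corresponding second-combination character."""
    table = [["" for _ in range(3)] for _ in range(3)]
    for i in range(9):
        table[i // 3][i % 3] = first_combo[i] + second_combo[i]
    return table

def cells_hit_by_ray(angle_deg: float, step: float = 0.01) -> set:
    """Cells (row, col) crossed by a ray from the table's lower-left corner."""
    dx, dy = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    hit, t = set(), 0.0
    while t <= 3 * math.sqrt(2):                 # long enough to leave the grid
        x, y = t * dx, t * dy
        if 0 <= x < 3 and 0 <= y < 3:
            hit.add((2 - int(y), int(x)))        # y grows upward; row 0 is the top row
        t += step
    return hit

def make_identification_code(first_combo: str, second_combo: str,
                             minutes: int, seconds: int) -> str:
    table = build_table(first_combo, second_combo)
    hit = cells_hit_by_ray(minutes) | cells_hit_by_ray(seconds)
    return "".join(table[r][c] for r, c in sorted(hit))

# Example: make_identification_code("ABC123456", "1016DEF58", 12, 49)
```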
Referring to fig. 2, in an embodiment of the present invention, there is further provided a translation device combining image analysis and speech recognition, including:
the first acquisition unit is used for acquiring a face image of a user of the translator, and analyzing the face image based on an image analysis model to obtain first identity information of the user of the translator;
The second acquisition unit is used for acquiring voice information of the translator user, analyzing the voice information based on a voice recognition model and obtaining second identity information of the translator user;
The matching unit is used for matching corresponding first model parameters based on the first identity information; based on the second identity information, matching corresponding second model parameters; the first model parameter and the second model parameter are different parameter values corresponding to the model parameter of the same type;
The computing unit is used for carrying out fusion computation on the first model parameters and the second model parameters to obtain fusion model parameters;
the translation unit is used for updating a preset language translation model based on the fusion model parameters to obtain an updated translation model; and translating the call of the translator based on the updated translation model.
In this embodiment, for specific implementation of each unit in the above embodiment of the apparatus, please refer to the description in the above embodiment of the method, and no further description is given here.
Referring to fig. 3, in an embodiment of the present invention, a translator is further provided, and its internal structure may be as shown in fig. 3. The translator includes a processor, a memory, a display, an input device, a network interface and a database connected by a system bus. The processor of the translator is configured to provide computing and control capabilities. The memory of the translator includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the translator is used for storing the data involved in this embodiment. The network interface of the translator is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the above-described method.
It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of the portion of the architecture related to the present invention and is not intended to limit the translator to which the present invention is applied.
An embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above method. It is understood that the computer readable storage medium in this embodiment may be a volatile readable storage medium or a nonvolatile readable storage medium.
In summary, the translation method, device and translator for combining image analysis and voice recognition provided in the embodiments of the present invention include: acquiring a face image of a user of a translator, and analyzing the face image based on an image analysis model to obtain first identity information of the user of the translator; collecting voice information of a translator user, and analyzing the voice information based on a voice recognition model to obtain second identity information of the translator user; based on the first identity information, matching corresponding first model parameters; based on the second identity information, matching corresponding second model parameters; the first model parameter and the second model parameter are different parameter values corresponding to the model parameter of the same type; performing fusion calculation on the first model parameters and the second model parameters to obtain fusion model parameters; updating a preset language translation model based on the fusion model parameters to obtain an updated translation model; and translating the call of the translator based on the updated translation model. In the invention, the image analysis and voice recognition technology is comprehensively utilized, so that the accuracy of user identification is improved, and a translation model can be customized according to the user identification information, thereby providing more accurate and personalized translation service and overcoming the defects that the conventional translation system is easy to cause translation deviation and low accuracy.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided by the present invention and used in embodiments may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (SSRSDRAM), enhanced SDRAM (ESDRAM), synchronous link (SYNCHLINK) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article, or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and drawings of the present invention or direct or indirect application in other related technical fields are included in the scope of the present invention.
Claims (10)
1. A translation method combining image analysis and speech recognition, comprising the steps of:
Acquiring a face image of a user of a translator, and analyzing the face image based on an image analysis model to obtain first identity information of the user of the translator;
Collecting voice information of a translator user, and analyzing the voice information based on a voice recognition model to obtain second identity information of the translator user;
Based on the first identity information, matching corresponding first model parameters; based on the second identity information, matching corresponding second model parameters; the first model parameter and the second model parameter are different parameter values corresponding to the model parameter of the same type;
Performing fusion calculation on the first model parameters and the second model parameters to obtain fusion model parameters;
updating a preset language translation model based on the fusion model parameters to obtain an updated translation model; and translating the call of the translator based on the updated translation model.
2. The translation method combining image analysis and speech recognition according to claim 1, wherein the analyzing the face image based on the image analysis model to obtain the first identity information of the translator user comprises:
and extracting the face features of the face image based on the image analysis model, analyzing the face features, and predicting to obtain nationality information of the translator user as the first identity information.
3. The translation method combining image analysis and speech recognition according to claim 1, wherein analyzing the voice information based on a voice recognition model to obtain the second identity information of the translator user comprises:
And extracting voice characteristics of the voice information based on the voice recognition model, analyzing the voice characteristics, and predicting to obtain regional information of the translator user as the second identity information.
4. The translation method combining image analysis and speech recognition according to claim 1, wherein the performing fusion calculation on the first model parameter and the second model parameter to obtain a fused model parameter includes:
Acquiring weight values respectively corresponding to the first model parameters and the second model parameters;
And carrying out weighted calculation on the first model parameter and the second model parameter based on the weight values respectively corresponding to the first model parameter and the second model parameter, wherein the obtained weighted model parameter is used as the fusion model parameter.
5. The translation method combining image analysis and speech recognition according to claim 1, wherein after translating the call of the translator based on the updated translation model, the method comprises:
The call content of the translator is stored in a database;
Determining a corresponding target encryption algorithm based on the first identity information and the second identity information;
encrypting the first identity information into a first character combination and encrypting the second identity information into a second character combination based on the target encryption algorithm;
Acquiring the total duration of the current call;
generating an identification code based on the first character combination, the second character combination and the total call duration;
and identifying the call content stored in the database based on the identification code.
6. The method of claim 5, wherein determining a corresponding target encryption algorithm based on the first identity information and the second identity information comprises:
converting the first identity information into pinyin characters, and acquiring the total number of characters in the pinyin characters as a first number;
Converting the second identity information into Chinese names, and acquiring the total number of Chinese characters in the Chinese names as a second number;
Acquiring a preset encryption algorithm matrix; the encryption algorithm matrix comprises a plurality of rows and a plurality of columns, and different encryption algorithm names are added to each position of the encryption algorithm matrix;
acquiring an encryption algorithm corresponding to the encryption algorithm name at the m-th row and n-th column position in the encryption algorithm matrix as a target encryption algorithm; wherein m is the number corresponding to the first number, and n is the number corresponding to the second number.
7. The method according to claim 5, wherein the first character combination and the second character combination each include nine characters;
The generating an identification code based on the first character combination, the second character combination and the total call duration includes:
Creating a blank data table with three rows and three columns, and sequentially adding characters in the first character combination into the blank data table one by one to obtain a first data table;
Sequentially adding the characters in the second character combination into the first data table one by one to obtain a second data table; wherein, in each cell of the second data table, the character in the second character combination is located after the character in the first character combination;
Acquiring the number of minutes and the number of seconds in the total call duration; the number corresponding to the minutes is a first number, and the number corresponding to the seconds is a second number;
Respectively constructing a first ray and a second ray by taking the lower left corner of the second data table as a vertex; the included angle between the first ray and the bottom of the second data table is a first included angle, and the first included angle is the same as a first number; the included angle between the second ray and the bottom of the second data table is a second included angle, and the second included angle is the same as a second number;
And extracting characters in cells intersected with the first ray and the second ray from the second data table, and sequentially combining the characters to obtain combined characters serving as the identification codes.
8. A translation device combining image analysis and speech recognition, comprising:
the first acquisition unit is used for acquiring a face image of a user of the translator, and analyzing the face image based on an image analysis model to obtain first identity information of the user of the translator;
The second acquisition unit is used for acquiring voice information of the translator user, analyzing the voice information based on a voice recognition model and obtaining second identity information of the translator user;
The matching unit is used for matching corresponding first model parameters based on the first identity information; based on the second identity information, matching corresponding second model parameters; the first model parameter and the second model parameter are different parameter values corresponding to the model parameter of the same type;
The computing unit is used for carrying out fusion computation on the first model parameters and the second model parameters to obtain fusion model parameters;
the translation unit is used for updating a preset language translation model based on the fusion model parameters to obtain an updated translation model; and translating the call of the translator based on the updated translation model.
9. A translation machine comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410789540.6A CN118364830B (en) | 2024-06-19 | 2024-06-19 | Translation method, device and translator combining image analysis and voice recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410789540.6A CN118364830B (en) | 2024-06-19 | 2024-06-19 | Translation method, device and translator combining image analysis and voice recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118364830A true CN118364830A (en) | 2024-07-19 |
CN118364830B CN118364830B (en) | 2024-08-30 |
Family
ID=91881512
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410789540.6A Active CN118364830B (en) | 2024-06-19 | 2024-06-19 | Translation method, device and translator combining image analysis and voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118364830B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5917944A (en) * | 1995-11-15 | 1999-06-29 | Hitachi, Ltd. | Character recognizing and translating system and voice recognizing and translating system |
CN112949323A (en) * | 2019-12-11 | 2021-06-11 | 金德奎 | Translation system and method based on face recognition |
US20210404831A1 (en) * | 2020-06-30 | 2021-12-30 | Snap Inc. | Augmented reality-based translations associated with travel |
CN115867905A (en) * | 2020-06-30 | 2023-03-28 | 斯纳普公司 | Augmented reality based speech translation in travel situations |
CN112764549A (en) * | 2021-04-09 | 2021-05-07 | 北京亮亮视野科技有限公司 | Translation method, translation device, translation medium and near-to-eye display equipment |
Also Published As
Publication number | Publication date |
---|---|
CN118364830B (en) | 2024-08-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110444198B (en) | Retrieval method, retrieval device, computer equipment and storage medium | |
CN107492379B (en) | Voiceprint creating and registering method and device | |
CN107610709B (en) | Method and system for training voiceprint recognition model | |
US20180197547A1 (en) | Identity verification method and apparatus based on voiceprint | |
CN113192516B (en) | Voice character segmentation method, device, computer equipment and storage medium | |
KR20200123161A (en) | Human image attribute model construction method, apparatus, computer device and storage medium | |
CN111833845A (en) | Multi-language speech recognition model training method, device, equipment and storage medium | |
CN107221320A (en) | Train method, device, equipment and the computer-readable storage medium of acoustic feature extraction model | |
US11831644B1 (en) | Anomaly detection in workspaces | |
US11961515B2 (en) | Contrastive Siamese network for semi-supervised speech recognition | |
CN113254613B (en) | Dialogue question-answering method, device, equipment and storage medium | |
CN112632244A (en) | Man-machine conversation optimization method and device, computer equipment and storage medium | |
CN112331207B (en) | Service content monitoring method, device, electronic equipment and storage medium | |
CN112309398B (en) | Method and device for monitoring working time, electronic equipment and storage medium | |
JP2022523921A (en) | Liveness detection and verification method, biological detection and verification system, recording medium, and training method for biological detection and verification system. | |
CN118675501B (en) | Multilingual speech recognition method, device and equipment based on emotion perception AI model | |
CN111950295A (en) | Method and system for training natural language processing model | |
CN112735479A (en) | Speech emotion recognition method and device, computer equipment and storage medium | |
CN118364830B (en) | Translation method, device and translator combining image analysis and voice recognition | |
CN113408265B (en) | Semantic analysis method, device and equipment based on human-computer interaction and storage medium | |
CN115620749A (en) | Pre-training optimization method, device, equipment and medium based on artificial intelligence | |
CN114758664A (en) | Voice data screening method and device, electronic equipment and readable storage medium | |
CN113571041A (en) | Method and device for processing voice recognition text and electronic equipment | |
CN114283429A (en) | Material work order data processing method, device, equipment and storage medium | |
CN117877517B (en) | Method, device, equipment and medium for generating environmental sound based on antagonistic neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |