Handwritten poetry recognition method integrating deep learning with a scenic spot knowledge graph
Technical Field
The invention belongs to the field of electronic information and relates to a handwritten poetry recognition method based on deep learning and knowledge graphs that is applicable to scenic spots.
Background
With the rapid economic development of China, visiting and touring scenic spots have increasingly become an indispensable part of people's daily life. Meanwhile, in order to raise their popularity and attract visitors, scenic spots often quote famous poems of ancient and modern times to enrich their cultural image. These quoted poems are usually displayed as copies of famous handwriting, so the styles of the handwritten poems in scenic spots differ greatly, and it is difficult for today's tourists to recognize all of the characters in the quoted poems. The recognition of handwritten poetry in scenic spots has therefore clearly become a key problem. Mature technical methods for this field already exist at home and abroad and can be divided into two main categories. The first trains a text detection model and a text recognition model for scenic spot poetry images separately: the text detection method mainly borrows algorithms from the field of object detection and regresses the text boxes of the poetry text, while the text recognition method mainly borrows algorithms from the field of speech recognition, encodes the features of the poetry text image regions, and then decodes the poetry text content with the model. The second is an end-to-end recognition method that combines text detection with the text recognition model; it can optimize the text detection model using the text recognition results, but its computational complexity is higher than that of the first method. Although both kinds of methods currently achieve good recognition results, the recognition of handwritten poetry images in scenic spots remains poor because of complex fonts, poor imaging quality, and complex backgrounds. Given that more and more scenic spots are now building scenic-spot-oriented knowledge graphs so that tourists can conveniently retrieve scenic spot information, a new solution becomes available for the poor text recognition of handwritten poetry images in scenic spots.
Disclosure of Invention
In order to solve the problem that the poetry text extracted from a scenic spot handwritten poetry image by a traditional text recognition method can hardly cover the main text of the poem, the invention provides a scenic spot handwritten poetry recognition method that integrates deep learning with a scenic spot knowledge graph. The method mainly interacts with the user's smartphone: the position and orientation of the user are sensed by the integrated BeiDou/GPS sensor, and the user captures image data of handwritten poetry in the scenic spot with the smartphone camera. The position information of the poetry text in the scenic spot handwritten poetry image data is detected by an FPN-based scenic spot handwritten poetry detection technique, and the position information of the specific poetry text regions is extracted; the poetry text in these regions is recognized by an ACE-based scenic spot handwritten poetry recognition method to obtain a preliminary poetry text recognition result; the preliminarily recognized poetry text is then corrected and refined with the scenic spot knowledge graph, and the final poetry text recognition result is returned to the user for display.
The invention comprises the following steps:
Step 1, acquiring the scenic spot handwritten poetry text image to be recognized and the related attribute information of the image, including the geographic position where the image was shot, the image background texture, and the character direction; weakening the noise in the image with a spatial-domain enhancement algorithm so that the structural similarity index between the original image and the denoised image is greater than 0.9, thereby obtaining the preprocessing result.
Step 2, inputting the preprocessed scenic spot handwritten poetry image into a VGG16-based feature extraction network to extract poetry text features, obtaining the handwritten poetry text weight sequence of the scenic spot image through a poetry text classifier, and training the classifier.
Step 3, inputting the extracted handwritten poetry text feature maps into the FPN-based handwritten poetry detection network for fusion to obtain single-character Gaussian heat maps of the handwritten poetry text, and further obtaining the text region position information of the scenic spot handwritten poetry image through a multi-character text box linking algorithm.
Step 4, cropping the scenic spot handwritten poetry image into regions according to the extracted text position information, and sequentially inputting the cropped regions into the ACE-based (Aggregation Cross-Entropy) handwritten poetry recognition network for Encoder-Decoder processing of the handwritten poetry text, so as to obtain the recognition result of the scenic spot handwritten poetry image.
Step 5, inputting the related attribute information of the scenic spot handwritten poetry image into the scenic spot knowledge graph and performing a graph search to obtain a retrieved knowledge result set, and obtaining the final text recognition result of the scenic spot handwritten poetry image by applying the scenic spot handwritten poetry matching algorithm to the recognition result of step 4 and the retrieved knowledge result set. An illustrative sketch of the overall pipeline follows these steps.
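In the Python sketch below, all component callables (enhance, detect, recognize, query_kg, match) are hypothetical placeholders for the modules described in steps 1 to 5; only the SSIM check of step 1 is shown concretely, assuming uint8 grayscale images and the scikit-image library.

```python
from skimage.metrics import structural_similarity as ssim  # assumed SSIM implementation

def recognize_scenic_poetry(image, geo, text_direction,
                            enhance, detect, recognize, query_kg, match):
    """Glue-code sketch for steps 1-5; every processing stage is injected as a callable."""
    denoised = enhance(image)                          # step 1: spatial-domain enhancement
    if ssim(image, denoised) <= 0.9:                   # keep the result only if SSIM > 0.9
        denoised = image                               # (uint8 grayscale assumed)
    regions = detect(denoised, text_direction)         # steps 2-3: VGG16 + FPN detection
    draft = [recognize(region) for region in regions]  # step 4: ACE-based recognition
    candidates = query_kg(geo)                         # step 5: knowledge-graph retrieval
    return match(draft, candidates)                    # step 5: matching and correction
```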
Effects of the invention
The handwritten poetry recognition method designed around the smartphone, which fuses vision with the scenic spot knowledge graph, makes it convenient for visitors to recognize the text of handwritten poetry in scenic spots, enhances their understanding of scenic spot culture, increases visitor traffic, and promotes the culture associated with the scenic spot. The method performs text recognition on scenic spot handwritten poetry images by introducing deep learning, extracts related poetry information from the scenic spot knowledge graph according to the visitor's position and orientation acquired by the BeiDou/GPS sensor and the related poetry attribute information, and corrects the text recognition result, thereby helping visitors recognize the handwritten poetry image text accurately.
Difficulties of the invention
(1) An FPN-based detection technique for scenic spot handwritten poetry is designed. The difficulty lies in accurately predicting both large and small text regions in scenic spot handwritten poetry images while also meeting the low-latency requirement.
(2) An ACE-based recognition method for scenic spot handwritten poetry is designed to recognize the text in the detected poetry text regions. Since the method is intended for smartphone interaction, the difficulty lies in guaranteeing both the recognition accuracy and the real-time performance for the poetry text.
(3) A knowledge-graph-based poetry correction technique is designed to correct and refine the preliminarily recognized poetry text. The difficulty lies in how to perform fusion matching between the recognized text and the knowledge entities in the scenic spot knowledge graph.
Drawings
Fig. 1 is the framework diagram of scenic spot handwritten poetry recognition.
Fig. 2 is the network architecture of the multi-feature module.
Fig. 3 is the overall architecture diagram of the system.
Fig. 4 shows the scenic spot knowledge acquisition and processing steps.
Fig. 5 is the process diagram of the scenic spot knowledge cloud recognition service.
Fig. 6 is the flow chart of scenic spot poetry result matching.
Fig. 7 is the fusion architecture diagram of the knowledge graph and the scenic spot poetry recognition algorithm.
Core algorithms of the invention
(1) FPN-based detection technique for scenic spot handwritten poetry
The structure is shown as the scenic spot handwritten poetry image text region detection module in Fig. 1. The whole structure is divided into three parts: a handwritten poetry spatial feature extraction network, a character key point calibration algorithm, and a multi-character text box linking algorithm. First, the scenic spot handwritten poetry image is input into the handwritten poetry spatial feature extraction network to extract the handwritten poetry text features. Then, the extracted poetry text features are processed by the character key point calibration algorithm to mark the key points of single characters. Finally, the marked single-character key points are processed by the multi-character text box linking algorithm to obtain the region coordinate information of the handwritten poetry text in the scenic spot, which is passed as input to the scenic spot handwritten poetry image text character recognition module. The details are as follows:
VGG16 extracts features from the scenic spot handwritten poetry image to obtain the poetry text feature map of the image. Convolutional network features are hierarchical, and features at different levels complement one another, so multi-layer feature fusion can be incorporated directly and effectively into a single model, improving the accuracy of the network. The backbone with multi-feature module fusion uses a feature pyramid network. To address multi-scale target detection, the feature pyramid network extracts features of different scales from the same image bottom-up with a convolutional neural network, and the layers are interconnected on top of the existing network. The invention adopts a 7-layer pyramid network whose layers are connected to one another to promote feature reuse. In the pyramid network of the invention, data is transmitted through the horizontal pairing of a bottom-up path and a top-down path. The network structure is shown in Fig. 2.
1) Bottom-up path: refers to the upward flow of data. As shown on the left side of Fig. 2, each layer includes a convolutional layer, a pooling layer, an activation function layer, and a recurrent layer. This path yields 7 multi-scale feature maps, denoted {c1, c2, c3, c4, c5, c6, c7}. Different feature maps record character features at different levels: the low-level features reflect the shallow boundaries of the poetry characters, and the high-level features reflect the deeper characteristics of the poetry characters.
2) Top-down path: refers to the downward flow of data, as shown on the right side of Fig. 2. Upsampling the feature maps provides high-resolution feature maps with strong semantic information, which is important for detecting handwritten poetry text in the scenic spot. The outputs of the previous layers serve as the input of the current layer, so richer features are extracted. Meanwhile, a deformable convolution module, Def-Incept, is added to extract the features of partially deformed text, and the multi-layer feature maps {p1, p2, p3, p4, p5, p6, p7} are then generated. Finally, one more convolution operation is applied to the feature maps, which reduces the number of parameters and eliminates the aliasing effect. The process of feature extraction is shown in formula (1).
In formula (1), Conv denotes a convolution operation, ⊕ denotes feature fusion, UpSample denotes an upsampling operation, and Def-Incept denotes the deformable convolution operation. The feature pyramid network module constructed by the invention fuses shallow features with deep features, adding deep semantic information to the shallow feature maps; fusing the two produces more small-target features, which is very helpful for improving the model's ability to detect small text in the image. The output of this operation is then processed by a Gaussian kernel function to obtain the Gaussian heat map of the character key nodes of the image text.
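Because the body of formula (1) is not reproduced above, the exact composition of the operations is not fixed here. The following PyTorch-style sketch shows one plausible top-down fusion consistent with the description: the higher pyramid level is upsampled, fused with the lateral bottom-up map by element-wise addition (an assumption for ⊕), passed through a convolution standing in for the deformable Def-Incept block, and smoothed by a final convolution; the lateral 1×1 channel-matching convolutions are omitted for brevity.

```python
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Sketch of the top-down pathway: fuse UpSample(p_{i+1}) with c_i, then convolve."""

    def __init__(self, channels=256, levels=7):
        super().__init__()
        # Ordinary 3x3 convolutions stand in for the deformable Def-Incept module.
        self.def_incept = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(levels))
        self.smooth = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(levels))

    def forward(self, c):                      # c = [c1, ..., c7], shallow to deep
        p = [None] * len(c)
        p[-1] = self.smooth[-1](self.def_incept[-1](c[-1]))
        for i in range(len(c) - 2, -1, -1):
            up = F.interpolate(p[i + 1], size=c[i].shape[-2:], mode="nearest")
            fused = c[i] + up                  # feature fusion (assumed element-wise add)
            p[i] = self.smooth[i](self.def_incept[i](fused))
        return p                               # [p1, ..., p7]
```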
The invention designs a multi-character text box linking algorithm. Taking the obtained Gaussian heat maps of the character key nodes of the image text as the precondition, the link relations between character key nodes are calculated to obtain the final scenic spot handwritten poetry image text detection boxes. The calculation process of the multi-character text box linking algorithm is described in detail below.
First, each character in the scenic spot handwritten poetry image text is analyzed through the character key point Gaussian heat map to obtain the maximum diameter of that character's Gaussian heat map; a square box with this maximum diameter as its side length is drawn to mark the text box position of the single character, which avoids the situation where the text box fails to fully contain the character region because the Gaussian heat map has a rotation angle. Second, half of the diagonal length of the single-character text box is selected as the initial value of the radius r of the outward radiation circle, and the maximum value of the radius r is set according to the maximum side length of the input image. Then, the radius of the radiation circle is increased by a fixed step length while the search proceeds in the character direction input by the user; if another character text box is encountered, the search stops and the current text box is linked with the center of the encountered box; if the radius r reaches the maximum value, the text box linking ends. Finally, the linked text boxes are merged to obtain the final position result of the detected regions of the scenic spot handwritten poetry image text. An illustrative sketch of this linking procedure follows.
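The sketch below assumes that each character box is an axis-aligned square (cx, cy, side) derived from its Gaussian heat map and that the reading direction is a unit vector supplied by the user. The step length is a placeholder, since its value is not reproduced in the text above.

```python
import math

def link_character_boxes(char_boxes, direction, max_radius, step=1.0):
    """char_boxes: list of (cx, cy, side) squares; direction: unit vector (dx, dy).
    Returns index pairs of character boxes linked along the reading direction."""
    links = []
    for i, (cx, cy, side) in enumerate(char_boxes):
        r = 0.5 * math.hypot(side, side)       # start at half the box diagonal
        while r <= max_radius:
            px, py = cx + r * direction[0], cy + r * direction[1]  # probe point
            hit = next((j for j, (ox, oy, oside) in enumerate(char_boxes)
                        if j != i and abs(px - ox) <= oside / 2
                        and abs(py - oy) <= oside / 2), None)
            if hit is not None:
                links.append((i, hit))         # link the two box centres
                break
            r += step                          # enlarge the radiation circle
    return links

# Linked boxes would then be merged into word- or line-level text regions.
```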
(2) ACE-based recognition method for scenic spot handwritten poetry
The architecture is shown as the character recognition network module in Fig. 1. In the scenic spot handwritten poetry text recognition network, the local text regions obtained by the scenic spot handwritten poetry text detection are first normalized so that the data is more standard; the processed image data is then input into the handwritten poetry character feature extraction network to encode the poetry text features as a sequence; finally, the serialized poetry text feature codes are decoded by a character recognizer to obtain the preliminary scenic spot handwritten poetry text recognition result. The details are as follows:
After the scenic spot handwritten poetry text image is normalized, character features are extracted from the processed text image. The invention uses a Convolutional Recurrent Neural Network (CRNN) as the backbone network to extract features from the scenic spot handwritten poetry image text; the character feature extraction network mainly consists of convolutional layers and a recurrent network layer. First, the normalized poetry text image is input into the convolutional layers to extract the convolutional feature map of the image. Then, the extracted convolutional feature map is input into the recurrent network layer, which continues to extract character sequence features on the basis of the convolutional features; these features contain the context information of the poetry text, so the character recognition result is more accurate. Finally, the extracted features are output for subsequent decoding and analysis.
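As an illustration, a CRNN-style encoder sketch in PyTorch follows. The layer sizes and pooling schedule are assumptions for readability, not the invention's exact configuration.

```python
import torch.nn as nn

class CRNNEncoder(nn.Module):
    """Convolutional layers produce a feature map that is collapsed to a
    width-wise sequence and fed to a bidirectional LSTM, yielding one
    context-aware feature vector per horizontal position."""

    def __init__(self, in_channels=1, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),   # collapse the height dimension
        )
        self.rnn = nn.LSTM(256, hidden, bidirectional=True, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W) normalized crops
        f = self.cnn(x)                        # (B, 256, 1, W')
        f = f.squeeze(2).permute(0, 2, 1)      # (B, W', 256) feature sequence
        seq, _ = self.rnn(f)                   # (B, W', 2 * hidden)
        return seq
```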
The invention adopts an Aggregation Cross-Entropy (ACE) algorithm to decode the feature sequence of the scenic spot handwritten poetry image text and thereby recognize the text. The goal of the algorithm is to multiply the high-frequency portions of the image that express the specific content by gain values and then recombine them to obtain a better image, so calculating the gain coefficient of the high-frequency part is the core of the ACE algorithm. In the initial stage of model training, because the different characters are uniformly distributed over the time steps and character categories, the gain coefficient is set to 1; during training, when the probability of a certain category at some time step is far higher than that of the other categories, the gain coefficient is set to the number of text characters in the image. The ACE algorithm is well suited to recognizing long texts and can solve the alignment problem caused by variable-length sequences, which greatly helps decode the feature sequences of scenic spot handwritten poetry text.
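For reference, the following is a minimal sketch of the Aggregation Cross-Entropy loss as it is commonly formulated in the text recognition literature: the per-time-step class probabilities are aggregated into predicted character counts and compared with the label counts, which sidesteps sequence alignment. Whether the invention uses exactly this formulation is an assumption here.

```python
import torch

def ace_loss(probs, char_counts):
    """probs:       (B, T, C) softmax outputs over T time steps and C classes,
                    with class 0 assumed to be the blank/background class.
    char_counts:    (B, C) number of occurrences of each character in the label."""
    B, T, C = probs.shape
    counts = char_counts.float().clone()
    counts[:, 0] = T - counts[:, 1:].sum(dim=1)   # blank class absorbs the remainder
    pred = probs.sum(dim=1) / T                   # aggregated (normalised) prediction
    target = counts / T                           # normalised label counts
    return -(target * torch.log(pred + 1e-10)).sum(dim=1).mean()
```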
(3) Knowledge-graph-based poetry correction technique
The architecture is shown as the poetry correction technique module in Fig. 1. The invention provides knowledge reasoning and related content knowledge recommendation functions, specifically as follows. First, using the poetry description information queried by the user as the search condition, a graph search is performed on the scenic spot knowledge graph to obtain a search result set C.
Second, the search result set C and the preliminary recognition result x of the scenic spot handwritten poetry text image are segmented by the word segmentation algorithm f(·) to obtain the search result set keyword matrix S_{n×m} and the preliminary recognition result matrix E_{1×m} of the scenic spot handwritten poetry text image, where n represents the number of entities in the search result set, m represents the number of keywords after processing, k represents the index position of an entity in the search result set, f(·) represents the word segmentation function that segments a text, C[k] represents a single text in the search result set, and x represents the preliminary recognition result of the scenic spot handwritten poetry text image. The calculation formulas are shown in (2) and (3).
S_{k×m} = f(C[k])    (2)
E_{1×m} = f(x)    (3)
Then, the entity keywords in the matrices obtained above are processed with the distributional model for generating word vectors. Each entity keyword vector v_e is shown in (4), where e represents a single sample in the matrix, g(·) represents the distributional model function that generates a word vector, i represents the index position of an entity keyword, S_{n×m}[e] represents a single sample in the search result set keyword matrix, and S_{n×m}[e][i] means taking the entity keyword data of that single sample item by item.
The obtained poetry text vector v_e is normalized to obtain the final text vector v of the poetry text, and the n text vectors are combined to generate the search result vector set V. The calculation formula is shown in (5).
Likewise, the final vector q of the preliminary recognition result of the scenic spot handwritten poetry text image is obtained after the preliminary recognition result is processed in the same way.
Finally, the similarity between two poetry text vectors is calculated by the VSM from the vector q of the preliminary recognition result of the scenic spot handwritten poetry text image and the text vector v of each entity in the search result vector set V. The text in the search result set C that corresponds to the poetry text vector in V with the highest similarity to q is output as the text recognition result of the scenic spot handwritten poetry image. The calculation formulas are shown in (6) and (7), where VSM(·) represents the vector space model calculation function, j represents the index position of a single search result vector, s1 and s2 are the two texts, and a_t and b_t denote the word frequencies of the two texts at index position t.
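A minimal sketch of this correction step is given below. It assumes that the keyword rows of S and E from formulas (2) and (3) are already available (for example, from a Chinese word segmenter), that the text vector of formula (4) is the sum of the keyword vectors (the exact aggregation is not reproduced above), and that `embeddings` is any pretrained word-to-vector mapping.

```python
import numpy as np

def g(word, embeddings, dim=100):
    """Word-vector lookup standing in for the distributional model g(.) of (4)."""
    return embeddings.get(word, np.zeros(dim))

def text_vector(keywords, embeddings, dim=100):
    """Sum the keyword vectors (assumed aggregation) and normalise, cf. (4)-(5)."""
    v = np.zeros(dim)
    for w in keywords:
        v = v + g(w, embeddings, dim)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def vsm_similarity(a, b):
    """Cosine similarity of the vector space model, cf. (6)-(7)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom > 0 else 0.0

def correct_recognition(E, S, C, embeddings, dim=100):
    """E: keywords of the draft recognition x; S: keyword rows of C; C: result texts."""
    q = text_vector(E, embeddings, dim)                     # vector of the draft result
    V = [text_vector(row, embeddings, dim) for row in S]    # search result vector set
    best = max(range(len(V)), key=lambda j: vsm_similarity(q, V[j]))
    return C[best]                                          # corrected recognition text
```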
A knowledge node having an 'association' relation with the retrieved result knowledge node is obtained by the knowledge reasoning method, and its triple representation can be described as (retrieved result knowledge node, association, related knowledge node). Compared with the prior art, the system makes the fragmented knowledge related to scenic spot poetry more connected: by extracting and representing knowledge from the original data, relations are established between the poetry ontology and the poetry feature entities, so that a user can retrieve the complete information of a poem more accurately from partial content features. The scenic spot poetry knowledge graph is used to provide scenic spot cultural knowledge and poetry-related resource recommendation services, helping visitors absorb scenic spot cultural knowledge more effectively and intuitively. A toy example of this reasoning step is sketched below.
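In the toy illustration below, every node name is a hypothetical placeholder rather than real scenic spot data; it only shows how 'association' triples can be traversed.

```python
triples = [
    ("PoemA", "belongs_to", "ScenicSpotX"),
    ("PoemA", "association", "RelatedBookB"),
    ("PoemA", "association", "RelatedPoemC"),
]

def associated_nodes(result_node, triples):
    """Return every knowledge node linked to result_node by an 'association' relation."""
    return [tail for head, rel, tail in triples
            if head == result_node and rel == "association"]

print(associated_nodes("PoemA", triples))   # ['RelatedBookB', 'RelatedPoemC']
```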
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present disclosure, the technical solutions will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention shall fall within the protection scope of the present invention.
As pointed out in the Background section above, how to improve the accuracy of retrieving and recognizing poetry in scenic spots has become a critical issue. Current retrieval and poetry recognition methods mainly fall into two categories: (1) traditional scenic spot poetry database retrieval systems, and (2) computer-vision-based scenic spot poetry recognition methods. The first requires a very rich scenic spot poetry data set and places high demands on database retrieval performance; because such systems are based on keywords or shallow semantic analysis, the results are poor and the time and labor costs are high. The second uses currently popular artificial intelligence methods and trains models at scale on collected scenic spot poetry images, but it uses only single-modality image data, has low recognition accuracy, and cannot cover most scenic spot poetry retrieval and recognition scenarios.
In view of the above, the present invention provides a handwritten poetry recognition method that combines deep learning with a scenic spot knowledge graph and can solve the problems mentioned in the related art.
The following further describes example implementations of the present invention in conjunction with the accompanying drawings.
FIG. 3 shows the system architecture diagram of an example of the present invention. The system mainly comprises a knowledge acquisition and processing module, a knowledge storage module, and a knowledge application module. The base layer comprises the knowledge acquisition and processing module; the database layer and the cache layer comprise the knowledge storage module; and the Service end and the API end comprise the knowledge application module.
The knowledge acquisition and processing module performs three processes, data cleaning, knowledge processing, and knowledge representation, on the original scenic spot poetry knowledge from related books, websites, poetry Excel spreadsheets, and XML files to obtain the relation network between the scenic spot poetry feature entities and the scenic spot poetry ontologies. FIG. 4 shows the overall steps of the knowledge acquisition and processing module.
For example, taking a specific scenic spot as an example, data about the poetry of the scenic spot is collected from web pages; after data cleaning, knowledge processing yields the structured knowledge shown in Table 1 (some data omitted). The scenic spot poetry is then processed with Chinese word segmentation, part-of-speech tagging, named entity recognition, and word-vector relation calculation, and filtered to obtain the scenic spot poetry feature entities shown in Table 2. A series of relations is then obtained through the triple knowledge representation method, such as (poem, belongs to, scenic spot) relations between a quoted poem and the scenic spot, together with related-content relations of the form (poem, association, related work) linking a poem to its related content knowledge.
The knowledge storage module provides the scenic spot poetry knowledge graph storage service using the Neo4j graph database and stores the relations between poetry ontologies and feature entities as well as the related-content knowledge ontology relations. The poetry ontology attributes include ID, name, address, picture, latitude and longitude, content background, and related content; the attribute data types are shown in Table 3.
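As a hedged sketch only, the following shows how a poetry ontology node with the Table 3 attributes could be written to Neo4j with the official Python driver (v5 API assumed); the connection details, node label, and data are placeholders, not the platform's actual schema.

```python
from neo4j import GraphDatabase  # official Neo4j Python driver

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

poem = {  # placeholder values following the attribute names of Table 3
    "ID": 1, "Name": "Example Poem", "Image": "poem_001.jpg",
    "Address": "Example Scenic Spot", "LatitudeLongitude": "0.0,0.0",
    "Background": "Example background", "RelatedContent": "Example related content",
}

def store_poem(tx, p):
    # MERGE keeps the node unique on ID; SET fills in the remaining attributes.
    tx.run(
        "MERGE (n:Poem {ID: $ID}) "
        "SET n.Name = $Name, n.Image = $Image, n.Address = $Address, "
        "    n.LatitudeLongitude = $LatitudeLongitude, "
        "    n.Background = $Background, n.RelatedContent = $RelatedContent",
        **p,
    )

with driver.session() as session:
    session.execute_write(store_poem, poem)
driver.close()
```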
The system adopts a microservice-oriented design. Based on an SOA architecture, the core business of the system is divided into a user identity verification service, a user permission control service, a poetry feature entity extraction service, a poetry knowledge reasoning and retrieval service, a scenic spot poetry text image recognition service, and a poetry knowledge recommendation service, and the API interfaces are designed and implemented following the RESTful standard. User information and system logs are stored in a MySQL relational database, and Redis is used for distributed caching in consideration of platform scalability and high-concurrency support. The knowledge service applications of the system platform are packaged with Docker container technology, which facilitates distributed deployment and gives the knowledge service system high portability and scalability. Kubernetes manages the containers, so the system platform supports automated deployment, scaling, and management, giving the system high availability.
The knowledge application module includes common multi-user services, such as user login, user registration, and query and management of historical retrieval records; knowledge retrieval and reasoning, assisted by the image text detection and recognition model; and recommendation of poetry content resource knowledge ontologies related to the poetry ontology knowledge, such as related books, related poems, and related scenic spots. An illustrative sketch of a recognition service interface is given below.
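The sketch assumes FastAPI; the platform's actual framework, routes, and parameters are not specified above, and the pipeline call is a stub.

```python
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def recognize_and_correct(image_bytes: bytes, lat: float, lon: float) -> dict:
    """Stub standing in for the detection, recognition, and knowledge-graph pipeline."""
    return {"text": "", "related": []}

@app.post("/poetry/recognition")
async def poetry_recognition(image: UploadFile = File(...),
                             lat: float = 0.0, lon: float = 0.0):
    # Read the uploaded image, run it through the recognition pipeline, and return
    # the corrected text together with recommended related resources.
    return recognize_and_correct(await image.read(), lat, lon)
```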
Table 1 Example of the structured knowledge representation of the scenic spot
Table 2 Example of the representation of scenic spot poetry feature entities
Table 3 Attribute data types
Attribute name | Type | Description
ID | int | Knowledge point ID
Name | string | Knowledge point name
Image | string | Poetry picture
Address | string | Geographic location
LatitudeLongitude | string | Latitude and longitude
Background | string | Background
RelatedContent | string | Related content
Fig. 5 shows the process by which a user uses the scenic spot knowledge cloud recognition service. The user logs into the system through identity verification, inputs the poetry feature information, and uploads a scenic spot poetry text image; the knowledge reasoning module and the poetry image recognition module obtain the ontology knowledge of the related poem; the complete poetry resource ontology knowledge is then obtained by reasoning from this ontology knowledge; and the integrated knowledge is pushed to the user, completing the knowledge service flow. Fig. 6 shows the flow chart of scenic spot poetry result matching. Fig. 7 shows the fusion architecture diagram of deep learning and the scenic spot knowledge graph.