CN110929125A

CN110929125A - Search recall method, apparatus, device and storage medium thereof

Info

Publication number: CN110929125A
Application number: CN201911126486.2A
Authority: CN
Inventors: 陈诚; 冯帅; 邓威; 王军伟; 方高林; 郑楚涛; 郑黄晓为
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-11-15
Filing date: 2019-11-15
Publication date: 2020-03-27
Anticipated expiration: 2039-11-15
Also published as: CN110929125B

Abstract

The application discloses a search recall method, a search recall device, a search recall equipment and a storage medium thereof. The method comprises the following steps: receiving an input query word; performing query intention recognition on a query word to obtain a recall feature vector, wherein the recall feature vector comprises a first feature, and the first feature is represented by information for uniquely identifying an entity name in the query word; and recalling the target document related to the first characteristic from the candidate documents according to a pre-established inverted index list, wherein the inverted index list is established after the candidate documents are subjected to named entity recognition processing in advance, and comprises a corresponding relation between the first characteristic and at least one document identifier. According to the technical scheme of the embodiment of the application, the information for uniquely identifying the entity name in the query word represents the entity name in the query word, and the pre-established inverted index table is searched based on the information of the unique identification, so that the accuracy of the recall result is effectively improved.

Description

Search recall method, apparatus, device and storage medium thereof

Technical Field

The present application relates to the field of internet technologies, and in particular, to a search recall method, apparatus, device, and storage medium.

Background

The news information searching function provides a quick channel for obtaining information results for users. The search engine recalls the query results related to the query terms in the network according to the query terms input by the user, then sorts the query results, and displays the query results sorted in the front to the user.

During the search process, the results obtained by the user, although formally associated with the query terms, do not match the user's query objectives in substance. In particular, when it is desired to search for a query result related to a professional field, the accuracy of the query result obtained based on the query term is not high.

Disclosure of Invention

In view of the above-mentioned drawbacks and deficiencies of the prior art, it is desirable to provide a method, an apparatus, a device and a storage medium for recalling search, which improve the accuracy of the recall result by uniquely identifying the information object during the information search process.

In one aspect, an embodiment of the present application provides a search recall method, which includes the following steps:

receiving an input query word;

performing query intention recognition on a query word to obtain a recall feature vector, wherein the recall feature vector comprises a first feature, and the first feature is represented by information for uniquely identifying an entity name in the query word;

and recalling the target document related to the first characteristic from the candidate documents according to a pre-established inverted index list, wherein the inverted index list is established after the candidate documents are subjected to named entity recognition processing in advance, and comprises a corresponding relation between the first characteristic and at least one document identifier.

In one aspect, an embodiment of the present application provides a search recall apparatus, which includes:

the receiving unit is used for receiving input query words;

the identification unit is used for identifying the query intention of the query word to obtain a recall feature vector, wherein the recall feature vector comprises a first feature, and the first feature is represented by information for uniquely identifying an entity name in the query word;

and the recalling unit is used for recalling the target document related to the first characteristic from the candidate documents according to a pre-established inverted index list, wherein the inverted index list is established after the candidate documents are subjected to named entity recognition processing in advance, and the inverted index list comprises a corresponding relation between the first characteristic and at least one document identifier.

In one aspect, embodiments of the present application provide a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the program to implement the method as described in embodiments of the present application.

In one aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is configured to:

which when executed by a processor implements a method as described in embodiments of the present application.

According to the search recall method, the search recall device, the search recall equipment and the storage medium, the received query words are identified according to the query intentions, and unified labels of entity names contained in the query words are constructed, namely, the entity names in the query words are represented by information for uniquely identifying the entity names in the query words, a pre-established inverted index list is searched based on the information of the unique identification, and the inverted index list is also established in advance based on named entity identification processing.

Optionally, the uniformly labeled sorting feature is referred to in the sorting stage, and the recall result can be optimally sorted and provided for the user through the sorting feature, so that the display efficiency is improved.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is a schematic structural diagram illustrating an implementation environment related to a search recall method provided by an embodiment of the present application;

FIG. 2 is a schematic flow chart illustrating a search recall method according to an embodiment of the present application;

FIG. 3 is a schematic flow chart illustrating a search recall method according to an embodiment of the present application;

FIG. 4 is a schematic flow chart diagram illustrating a search recall method according to an embodiment of the present application;

FIG. 5 is a diagram illustrating a data structure of an inverted index list provided in an embodiment of the present application;

fig. 6 is a schematic structural diagram illustrating a search recall apparatus 500 according to an embodiment of the present application;

FIG. 7 is a block diagram illustrating an exemplary structure of a search recall apparatus 600 provided in accordance with an embodiment of the present application;

FIG. 8 is a diagram illustrating a complete flow of a search recall method provided by an embodiment of the present application;

FIG. 9 illustrates a schematic structural diagram of a computer system suitable for use in implementing the computer device of an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant disclosure and are not limiting of the disclosure. It should be further noted that, for the convenience of description, only the portions relevant to the disclosure are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

The following first describes an implementation environment related to the search recall method provided in the embodiment of the present application. Referring to fig. 1, fig. 1 is a schematic structural diagram illustrating an implementation environment related to a search recall method according to an embodiment of the present application. As shown in fig. 1, the implementation environment includes a terminal 11 and a server 12. The type of the terminal 11 includes, but is not limited to, a smart phone, a desktop computer, a notebook computer, a tablet computer, a wearable device, a multimedia playing device, and the like, and various applications, such as news information software, stock information software, or other information software, may be installed on the terminal, which is not specifically limited in this embodiment of the present application.

In the embodiment of the present application, the terminal 11 is configured to obtain a query term input by a user, and send the obtained query term to the server 12 in a network request manner, and the server 12 is configured to return a result related to the query term to the terminal 11 according to the query term sent by the terminal 11, so that the terminal 11 displays the result to the user. The server may be an independent server, a server cluster composed of several servers, or a cloud computing center. The server can provide query processing services for the terminal. The server may be a backend server of the application, for example: the server can be an intermediate server, and the terminal can interact with the server through an application program, so that the query processing flow is realized. The terminal can interact with the server in a wired or wireless mode, so that the query processing flow is realized.

The search recall method provided by the embodiment of the application can be executed by taking the search recall device as an execution subject. The search recall means may be integrated in a computer device such as a terminal or a server, and the search recall means may be hardware or a software module. Or may be performed by a single terminal or server, or by a combination of both.

Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a search recall method according to an embodiment of the present application. The method may be performed by a search recall apparatus.

Step 101, receiving an input query word;

and receiving the input query words, namely a query process. The query term refers to a term input by a user in an input area of the search interface, namely the content of the query. The query words can be words, sentences, numbers, English letters, or the combination of the above forms.

And 102, performing query intention identification on the query words to obtain recall feature vectors. The recall feature vector includes a first feature that is an information representation that uniquely identifies an entity name in the query term.

In this step, query intent recognition, which may also be referred to as user intent recognition, may understand the user search intent.

For the understanding of the search intention, various analysis processes may be performed on the query term in combination with the user historical behavior, for example, after the query term is subjected to a word segmentation process, a rewriting process may be performed. The rewriting process may be understood to include an error correction process, an expansion process, and the like for the query word. The error correction processing may be, for example, rewriting a query word from a complex font to a shorthand font, recognizing and rewriting a full-angle symbol and a half-angle symbol, unifying upper and lower cases of english characters, and removing punctuation and a last word from the query word. And the adjustment of the error correction result can be carried out based on pinyin error correction and font error correction and based on the session analysis of the search log. The expansion processing is to expand the words close to and related to the user query words. Preferably, the extension can be performed based on an identification attribute of the entity name, for example, a stock code, a security name, an english name, a pinyin abbreviation, a company full name, and the like of the entity name. And may further include board of listed companies, originators, chief executive officials, etc. Stock entities are identified by a plurality of dimensions.

Query intent recognition may also include classifying search intents based on the query terms entered by the user. By classifying the search intent, it may be clear that the user is the search direction. For example, it can be identified whether the user intends to know information related to the query term or whether the user wants to acquire a demand related to the query term. Query intent recognition is the information that is desired to be understood by the query terms entered by the user to obtain the most relevant. Query intent recognition may extract information through a named entity recognition algorithm (also known as entity recognition, entity chunking, and entity extraction) that aims to locate and classify named entities in text into predefined categories such as stock codes, security names, english names, pinyin abbreviations, company acronyms, company totals, and the like. Recognition of named entities can also be achieved by Conditional Random Field (CRF) algorithms, neural network-like algorithms, BERTs (acronyms of Bidirectional Encoder responses from transformations), and modified algorithms thereof.

The recalling characteristic vector is a result obtained by rewriting a query word input by a user, identifying at least one entity name according to a named entity rule, and labeling each entity name, so that a plurality of entity names can be uniformly labeled as a unique expression mode.

The recall feature vector may also include ranking features, which may be characterized by correlations between the re-adapted features. The recall feature vector may include the first feature or the second feature. The first feature is labeled with information for uniquely identifying an entity name, and may indicate that the entity name is a stock entity. The second feature is labeled by information for uniquely identifying an entity name and a negative component, and may indicate that the entity name is not a stock entity. For example, the information for uniquely identifying the entity name may be, for example, a stock code, or information obtained by encrypting and mapping a stock code, or information generated by using a stock code together with a stock name and corporate information of a stock entity, and the identification information may also be information for uniquely identifying the entity name. That is, a first feature may be labeled with a stock code and a second feature may be labeled with a stock code and a negative component.

The method for identifying the query intention of the query word to obtain the recall feature vector comprises the following steps:

performing word segmentation processing on the query word to obtain at least one word segmentation;

rewriting each word segmentation;

and carrying out named entity recognition on the participles after the processing to obtain at least one entity name, and determining that each entity name is represented by a first characteristic or a second characteristic.

And 103, recalling the target document related to the first characteristic from the candidate documents according to a pre-established inverted index list. The inverted index list is established after the candidate documents are subjected to named entity recognition processing in advance, and comprises a corresponding relation between the first characteristic and at least one document identifier.

In this step, the inverted index list is a pre-established data structure, and the data structure may include a correspondence between the first feature and at least one document identifier, and may further include a correspondence between the second feature and at least one document identifier. As shown in fig. 4, fig. 4 is a schematic diagram illustrating a data structure of an inverted index list provided in an embodiment of the present application. Where 401 denotes the first feature, 402 denotes the document identification, 403 denotes the negative component, which in combination with 401 constitutes the second feature. 401 may be, for example: stock code 1, stock code 2, stock code 3, stock code 4; 402, which corresponds to stock code 1, may include document 1, document 2. 402, which corresponds to stock code 2, may include document 3, document 5. The 402 corresponding to the stock code 3 may include document 1, document 4, and document N, N being a natural number. Where 403 represents a negative component that is combined with stock code 4, and 402 represents that what does not correspond to stock code 4 may include document 2, document 3. 403 may directly employ NO + stock code 4, or 10+ stock code 4.

Candidate documents are documents obtained through crawling and crawling techniques, or other documents that contain query targets, such as news information, bulletin papers, data summaries, and the like.

And obtaining a recall feature vector after performing query intention recognition on a query word input by a user, wherein the recall feature vector can comprise a first feature. According to the first characteristic, a pre-established reverse index list is searched to obtain a target document.

The pre-established reverse index list comprises the following steps:

acquiring a candidate document;

performing word segmentation and keyword extraction processing on the title and the text of the candidate document to obtain at least one word segmentation and at least one keyword;

carrying out named entity recognition on the participles and the keywords to obtain at least one entity name;

and determining whether each entity name is represented by a first characteristic or a second characteristic.

Taking stock codes XXXXX.HK of company A as an example, company A is called ABCD, Chinese Pinyin of company A is called ABCD, stock name of company A is AB stock, English name of company A is TT, and company A is called AB.

Suppose that a user inputs a query word in a financial information interface as AB stock control, and after word segmentation and rewriting, the query word is expanded to { XXXXX.HK, ABCD, TT, AB. And identifying the query intention of each rewritten word, and marking the AB stock control by XXXXX.HK to be used as the first characteristic of the recall characteristic vector if the AB stock control input by the user is determined to be a stock entity.

Looking up at least one document associated with XXXXX.HK in the inverted index table based on XXXXX.HK as the target document.

According to the embodiment of the application, the incidence relation between the query words and the information for uniquely identifying the entity names is established in the indexing and recalling stages in the information searching process, so that the accuracy of recalling can be effectively improved.

In the prior art financial security scenario, if the english name of company a is entered, only the information result containing the english name may be recalled, and if only the chinese name or stock code of company a is referred to in the partial document, the partial document may be missed. Or after the existing expansion processing of the english name, splitting the english name in the word splitting process, and recalling according to the split word, wherein the recall result may not be related to the english name of company a input by the user at all, and belongs to an incorrect recall result. Sometimes, even when the related results cannot be searched when the query is carried out by using the pinyin abbreviation of company A, the related results can not be searched.

According to the embodiment of the application, the unified information for the unique identification entity name is used as the characteristic value, all relevant results of the information of the unique identification entity name can be efficiently searched, irrelevant news information can be avoided being recalled by mistake, the requirement of multi-dimensional input query of a user can be met, and the recall accuracy and efficiency are improved.

Aiming at the query scene that the query object of the user is a non-stock entity, the embodiment of the application also provides a search recall method to improve the recall accuracy.

Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a search recall method according to an embodiment of the present application. The method may be performed by a search recall apparatus.

Step 201, receiving an input query word;

step 202, performing query intention identification on the query word to obtain a recall feature vector, wherein the recall feature vector comprises a second feature. The second feature is represented by information for uniquely identifying an entity name in the query term and a negative component indicating that the information for uniquely identifying an entity name in the query term is false.

And step 203, recalling the target document related to the second characteristic from the candidate documents according to a pre-established inverted index list. The inverted index list comprises a corresponding relation between the first characteristic and at least one document identification, and also comprises a corresponding relation between the second characteristic and at least one document identification.

In the above steps, a query word input by a user is received, query intention recognition is performed on the query word to obtain one or more entity names, and for each entity name, the entity name is labeled to represent a stock entity or labeled to represent a non-stock entity. Non-stock entities refer to non-listed companies. For example, the query term entered by the user is a fruit, and the results recalled according to the prior art may include the fruit and the company named for the fruit. The fruit query method and the fruit query device can understand and identify the query intention of the fruit queried by the user through the query intention identification so as to determine whether the fruit intention or the fruit named company is expected to be queried by the user. If it is a fruit named company, the fruit is labeled with the stock code of the fruit company, and if it is the fruit itself, the stock code and negative component of the fruit company. For example, the user inputs: eating fruit, segmenting the fruit, and then carrying out named entity recognition to obtain the fruit with two meanings, wherein the first meaning is a company name, and the second meaning is the fruit per se. Recognizing that eating fruit is the second meaning in connection with contextual understanding, the recall feature vector includes stock codes and negative components of the fruit company, indicating that "fruit" occurring in eating fruit is not a stock code, etc., precisely defining the object that the user intends to find within the document area that is not a stock code.

And searching the document which is not related to the stock code corresponding to the fruit in the inverted index list by using the stock code and the negative component in the recall feature vector as a recall result.

In the embodiment of the application, the recall feature vector comprises the second feature, the document associated with the second feature is searched by pre-establishing the corresponding relation between the second feature and the document identifier, and the retrieval range of the document is narrowed within the document range associated with the second feature, so that the recall accuracy is effectively improved.

In order to better show the result based on the first characteristic recall, the application also provides a search recall method. Referring to fig. 4, fig. 4 is a flow chart illustrating a search recall method according to an embodiment of the present application. The method may be performed by a search recall apparatus.

Step 301, receiving an input query term;

step 302, performing query intention identification on the query word to obtain a recall feature vector. The recall feature vector includes a first feature that is represented by information that uniquely identifies an entity name in the query term;

step 303, searching a document related to the first feature in the candidate documents according to a pre-established inverted index list;

step 304, a first numerical value of the recalled feature vector including the first feature and a second numerical value of each document including the first feature are obtained, and each document is a document related to the first feature and found from the candidate documents.

And 305, recalling the document related to the first characteristic as the target document when the first value is smaller than or equal to the second value.

After obtaining the target document, the method may further include:

step 306, forming a recalled target document related to the first characteristic into a recalled document list;

step 307, acquiring a user query feature vector based on the recall feature vector;

step 308, extracting a document feature vector from the recalled document list;

step 309, inputting the user query feature vector, the document feature vector and the sorting feature into a pre-trained rearrangement model, and outputting a reordered target document, wherein the sorting feature is calculated according to the number of the stock codes contained in the query word and the number of the stock codes contained in the document to be selected.

In the above steps, a query term is received, and query intent recognition is performed on the query term to obtain a recall feature vector, which can be described with reference to fig. 2 and fig. 3.

Before obtaining the recall document list, the processor may further obtain a first numerical value of the first feature included in the recall feature vector, obtain a second numerical value of the first feature included in each document of the documents related to the first feature found from the candidate documents, and determine whether to recall the document related to the first feature based on a comparison result of the first numerical value and the second numerical value. And when the first value is smaller than or equal to the second value, recalling the document related to the first characteristic as the target document. For example, if the query word input by the user includes the stock code of company a and the english name of company B, the recalled document at least includes the stock code of company a and the stock code of company B. If a document includes only the stock codes of company a, such document will not be recalled.

In the above embodiment, when the first feature is a stock code, the user query feature vector is obtained based on the recall feature vector, and the user query feature vector is used to indicate the correlation between the user query words after the word segmentation processing. The recall feature vector may contain stock codes or stock codes and negative components that result in different ranges of recall results. For example, if the recall feature vector includes a stock code 1, the recall document list indexed by the stock code 1 at least includes a document 1, a document 2, and a document N. Based on the number of stock codes contained in the recall feature vector, for example, only stock code 1 is contained in the query term, and it is assumed that the recall document list contains document 1, document 2, and document N. The document 1 contains a stock code 1, the stock code 3, the document 2 contains a stock code 1, and the document N contains a stock code 1. At this time, the number of the stock codes included in the recall feature vector is 1, and the number of the stock codes included in each document may be that document 1 is 2, document 2 is 1, and document N is 1.

A document feature vector is then extracted based on the recalled document list, the document feature vector being used to represent a correlation between keywords associated with the stock codes and the document. After calculating the correlation, the ranking features are further calculated. The sorting feature is obtained by calculation according to the number of the stock codes contained in the query words and the number of the stock codes contained in the document to be selected. The sorting characteristic can be calculated according to the following formula:

wherein, StockSet_queryRepresenting the number of stock codes contained in the query word;

StockSet_docrepresenting the number of stock codes contained in the document;

a represents a positive number less than 1.

And calculating according to the formula to obtain a ranking characteristic, wherein the ranking characteristic can influence the ranking results of all the documents in the recalled document list and preferentially show the information results which are most relevant to the query words input by the user.

According to the method and the device, the sequencing result is influenced by introducing the sequencing characteristics in the sequencing stage, so that the sequencing result is optimized, and the accuracy of the display result is effectively improved.

Preferably, the embodiment of the present application may further store the inverted index list in the blockchain network. The inverted index list includes a corresponding relationship between the first feature and at least one document identifier, and may be stored in a certain file of the disk, so as to form an inverted file. In order to better share data and maintain data consistency, the reverse index list may be preferably stored in the blockchain network.

The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.

The block chain underlying platform can comprise processing modules such as user management, basic service, intelligent contract and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, and comprises public and private key generation maintenance (account management), key management, user real identity and blockchain address corresponding relation maintenance (authority management) and the like, and under the authorization condition, the user management module supervises and audits the transaction condition of certain real identities and provides rule configuration (wind control audit) of risk control; the basic service module is deployed on all block chain node equipment and used for verifying the validity of the service request, recording the service request to storage after consensus on the valid request is completed, for a new service request, the basic service firstly performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts service information (consensus management) through a consensus algorithm, transmits the service information to a shared account (network communication) completely and consistently after encryption, and performs recording and storage; the intelligent contract module is responsible for registering and issuing contracts, triggering the contracts and executing the contracts, developers can define contract logics through a certain programming language, issue the contract logics to a block chain (contract registration), call keys or other event triggering and executing according to the logics of contract clauses, complete the contract logics and simultaneously provide the function of upgrading and canceling the contracts; the operation monitoring module is mainly responsible for deployment, configuration modification, contract setting, cloud adaptation in the product release process and visual output of real-time states in product operation, such as: alarm, monitoring network conditions, monitoring node equipment health status, and the like.

The platform product service layer provides basic capability and an implementation framework of typical application, and developers can complete block chain implementation of business logic based on the basic capability and the characteristics of the superposed business. The application service layer provides the application service based on the block chain scheme for the business participants to use.

It should be noted that while the operations of the disclosed methods are depicted in the above-described figures in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

The above method steps may be executed by a device corresponding thereto, and referring to fig. 6, fig. 6 shows a schematic structural diagram of a search recall device 500 provided in an embodiment of the present application. The apparatus 500 comprises:

a receiving unit 501, configured to receive an input query term;

an identifying unit 502, configured to perform query intent identification on a query term to obtain a recall feature vector, where the recall feature vector includes a first feature, and the first feature is represented by information for uniquely identifying an entity name in the query term;

a recalling unit 503, configured to recall, from the candidate documents, the target document related to the first feature according to a pre-established inverted index list, where the inverted index list is established after performing named entity identification processing on the candidate documents in advance, and the inverted index list includes a correspondence between the first feature and at least one document identifier.

On the basis of the above embodiment, the inverted index list further includes a corresponding relationship between a second feature and at least one document identifier, where the second feature is represented by information for uniquely identifying an entity name in the query term and a negative component, and the negative component represents that the information for uniquely identifying the entity name in the query term is false, then the identifying unit 502 is further configured to perform query intention identification on the query term to obtain a recall feature vector, where the recall feature vector includes the second feature;

the recalling unit 503 is further configured to recall the target document related to the second feature from the candidate documents according to a pre-established inverted index list.

The recall unit 503 may further include:

the acquisition subunit is used for acquiring a first numerical value of the recall feature vector, wherein the first numerical value comprises a first feature; obtaining a second numerical value of each document containing the first characteristic, wherein each document is a document which is searched from candidate documents and is related to the first characteristic;

and the recalling subunit is used for recalling the document related to the first characteristic as the target document when the first numerical value is smaller than or equal to the second numerical value.

The identifying unit 502 may further include:

the word segmentation subunit is used for performing word segmentation processing on the query word to obtain at least one word segmentation;

a rewriting subunit, configured to perform rewriting processing on each participle;

and the first entity name identification subunit is used for carrying out named entity identification on the processed participles to obtain at least one entity name, and determining that each entity name is represented by a first characteristic or a second characteristic.

The apparatus 500 may further include an inverted index building unit 504 for building an inverted index list in advance, which may include:

a document acquisition subunit, configured to acquire a candidate document;

the first extraction subunit is used for performing word segmentation and keyword extraction processing on the title and the text of the candidate document to obtain at least one word segmentation and at least one keyword;

the second entity name identification subunit is used for carrying out named entity identification on the participles and the keywords to obtain at least one entity name; and determining whether each entity name is represented by a first characteristic or a second characteristic.

On the basis of the above embodiments, referring to fig. 7, fig. 7 shows an exemplary structural block diagram of a search recall apparatus 600 provided according to a further embodiment of the present application. The information for uniquely identifying the entity name in the query term is a stock code, and on the basis of the apparatus 500, the apparatus 600 further includes:

a list constructing unit 505 configured to construct a recalled target document related to the first feature into a recalled document list;

a first extraction unit 506 for extracting a user query feature vector based on the recall feature vector;

a second extracting unit 507, configured to extract a document feature vector from the recalled document list;

and the sorting unit 508 is configured to input the user query feature vector, the document feature vector, and the sorting feature into a pre-trained rearrangement model, and output a reordered target document, where the sorting feature is calculated according to the number of stock codes included in the query term and the number of stock codes included in the document to be selected.

It should be understood that the units or modules described in the apparatus 500-600 correspond to the various steps in the method described with reference to fig. 1-3. Thus, the operations and features described above with respect to the method are equally applicable to the apparatus 500-600 and the units included therein, and will not be described again here. The corresponding units in the apparatus 500-600 may cooperate with units in the electronic device to implement the solution of the embodiment of the present application.

The division into several modules or units mentioned in the above detailed description is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.

For clarity of understanding of the present application, stock codes are used to uniquely identify entity names so that all consultations related to the stock codes can be searched. The method can be applied to the information search function of financial/security products, the search function of a news information reading platform related to financial/security/stock market and the document search scene related to listed companies. Referring to fig. 8, fig. 8 is a schematic diagram illustrating a complete flow of a search recall method according to an embodiment of the present application. The method may comprise three stages.

And an information indexing stage, wherein entity name recognition is carried out on each candidate document in the candidate document set through a stock named entity recognition algorithm from the candidate document set, and whether the entity names are uniformly marked by stock codes is recognized. The stock named entity recognition algorithm is used for recognizing the stock entities of listed companies in the document through a natural language processing algorithm, and the recognition dimensions of the stock named entity recognition algorithm can comprise stock codes, stock names, English names, Pinyin systems, company short names, company full names and the like. Further, other features, such as base features and ranking features, may also be extracted during the information indexing stage. Wherein the basic features are basic information for identifying the document, such as article title, article identification, article media source, article type, article publishing time, and the like. Ranking features such as Tittle2vec, article quality, etc. Wherein the ordering characteristic is used to influence the ordering of the final search result.

The stock entity name recognition algorithm can be realized by a deep learning algorithm, and the algorithm mainly comprises recognition processing and disambiguation processing. Wherein the identification process may discover potential stock entities by matching each document against pre-collected attribute text having a function of identifying stocks. The disambiguation processing is to classify potential stock entities through a plurality of classifier algorithms after word segmentation processing is carried out on the potential stock entities according to the context information. The classifier here may be, for example, a Multilayer Perceptron algorithm (MLP), an xgboost algorithm (xgboost, Extreme Gradient Boosting), a BERT algorithm, or the like. The classification results of the multiple classifiers are then voted to finally determine whether the entity name is a stock entity. For example, the first query term is eating fruit and the second query term is using fruit. Assuming that the fruit in the first query word is classified by three classifiers, the fruit candidate 'fruit' is judged to be non-stock; assuming the second query term, the judgment result after classification processing by the three classifiers is [ non-stock, stock ], and the candidate 'fruit' is the stock entity.

An association relationship is established between the document and the stock code processed by the stock named entity recognition algorithm, and an index database is constructed as shown in fig. 4. Can be realized by an ES (elastic search) index database, which is a distributed full-text retrieval framework that stores data in JSON (JavaScript Object Notification) format and adopts inverted index. By adopting the ES index database, the data searching speed can be greatly improved, and the processing time is saved. The method comprises the steps of performing word segmentation on a text, recording information such as words, word frequency and text identifications, and finding the text identifications based on contents (calculating scores according to word vectors, word frequency word vectors and the like) during searching.

And the information recalling stage is used for receiving the query words and processing the query words according to the same stock named entity identification algorithm as that in the information indexing stage so as to identify whether the query words contain stock entities. Other features, such as ranking features, may also be extracted during the consultation recall phase.

And in the consultation recall stage, searching a document related to the recall characteristic in an index database based on the recall characteristic obtained by the stock named entity recognition algorithm to obtain a recall list. Extracting article characteristics based on the recall list, extracting user query characteristics based on the recall characteristics, inputting the sequencing characteristics influencing sequencing in the characteristics into a rearrangement model, adjusting a rearrangement rule, and obtaining a result after sequencing the recall list according to the rearrangement rule and a pre-established rearrangement model. The rearrangement model may be implemented based on a machine Learning algorithm, such as a Learning ranking algorithm (LTR), a Gradient Boosting Tree (GBDT) algorithm, and the like.

Referring now to FIG. 9, FIG. 9 illustrates a block diagram of a computer system 800 suitable for use in implementing the computer devices of embodiments of the present application.

As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.

The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a signal such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is mounted on the storage section 808 as necessary.

In particular, the processes described above with reference to the flow diagrams of fig. 2-4 may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present application when executed by the Central Processing Unit (CPU) 801.

It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor, and may be described as: a processor includes a receiving unit, an identifying unit, and a recalling unit. Where the names of such units or modules do not in some cases constitute a limitation of the unit or module itself, for example, a receiving unit may also be described as a "unit for receiving an input query word".

As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments; or may be separate and not incorporated into the electronic device. The computer-readable storage medium stores one or more programs which, when executed by one or more processors, perform the search recall method described herein.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A search recall method comprising the steps of:

receiving an input query word;

performing query intention recognition on the query word to obtain a recall feature vector, wherein the recall feature vector comprises first features which are represented by information for uniquely identifying entity names in the query word;

and recalling a target document related to the first characteristic from candidate documents according to a pre-established inverted index list, wherein the inverted index list is established after the candidate documents are subjected to named entity recognition processing in advance, and the inverted index list comprises a corresponding relation between the first characteristic and at least one document identifier.

2. The search recall method of claim 1 wherein the inverted index list further comprises a correspondence between a second feature and at least one document identification, the second feature being represented by information uniquely identifying an entity name in the query term and a negative component representing that the information uniquely identifying an entity name in the query term is false, the method further comprising:

performing query intention recognition on the query word to obtain a recall feature vector, wherein the recall feature vector comprises the second feature;

and recalling the target document related to the second characteristic from the candidate documents according to a pre-established inverted index list.

3. The search recall method of claim 1 wherein after query intent recognition of the query term to obtain a recall feature vector, the method further comprises:

obtaining a first numerical value that the recalled feature vector includes the first feature;

obtaining a second numerical value of each document containing the first feature, wherein each document is a document which is searched from the candidate documents and is related to the first feature;

then the recalling the target document related to the first feature from the candidate documents according to the pre-established inverted index list further comprises the following steps:

and when the first numerical value is smaller than or equal to the second numerical value, recalling the document related to the first characteristic as the target document.

4. The search recall method according to claim 1 or 3 wherein the information for uniquely identifying the entity name in the query term is a stock code, and after recalling a target document related to the first feature from among candidate documents, the method further comprises:

constructing the target document into a recall document list;

extracting a user query feature vector based on the recall feature vector;

extracting a document feature vector from the recalled document list;

inputting the user query feature vector, the document feature vector and the sorting feature into a pre-trained rearrangement model, and outputting a reordered target document, wherein the sorting feature is obtained by calculating according to the number of stock codes contained in the query word and the number of stock codes contained in the document to be selected.

5. The search recall method of any one of claims 1-3 wherein the query intent recognition of the query term to obtain a recall feature vector comprises the steps of:

rewriting each word segmentation;

and carrying out named entity recognition on the above processed word segmentation to obtain at least one entity name, and determining that each entity name is represented by the first characteristic or the second characteristic.

6. The search recall method according to any one of claims 1 to 3 wherein the pre-established inverted index list comprises the steps of:

acquiring the candidate document;

carrying out named entity recognition on the word segmentation and the keyword to obtain at least one entity name;

and determining whether each of the entity names is represented by the first characteristic or the second characteristic.

7. The search recall method of claim 1 further comprising:

and storing the reverse index list to a block chain network.

8. A search recall apparatus, comprising:

the receiving unit is used for receiving input query words;

the identification unit is used for identifying the query word according to the query intention to obtain a recall feature vector, wherein the recall feature vector comprises a first feature, and the first feature is represented by information for uniquely identifying an entity name in the query word;

9. The search recall apparatus of claim 8 wherein the inverted index list further comprises a correspondence between a second feature and at least one document identification, the second feature being represented by information uniquely identifying an entity name in the query term and a negative component representing that the information uniquely identifying an entity name in the query term is false, the apparatus further comprising:

the identification unit is further used for identifying the query intention of the query word to obtain a recall feature vector, and the recall feature vector comprises the second feature;

and the recalling unit is also used for recalling the target document related to the second characteristic from the candidate documents according to a pre-established reverse index list.

10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the program.

11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.