Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, in many scenes, such as a private cloud, a large number of pictures are isolated and lack information such as titles and contexts, and in these scenes picture searching cannot be accomplished with existing picture search technology. To ameliorate this problem of the prior art, in some embodiments of the present application, picture description information can be generated based on the picture content data of a picture, so that the picture description information can be analyzed from the content contained in the picture itself even when the scene where the picture is located lacks information such as a title and a context. This ensures the feasibility of the picture search operation in such scenes, and the generated picture description information can represent the picture more accurately, thereby effectively improving the accuracy of picture searching.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a picture searching method according to an embodiment of the present application. The picture searching method provided by this embodiment may be executed by a picture search engine, which may be implemented as software or as a combination of software and hardware, and which may be deployed in a computing device.
As shown in fig. 1, the method includes:
100. acquiring search information;
101. searching a target picture matched with the search information in the at least one picture based on the picture description information of the at least one picture; wherein the picture description information is generated based on the picture content data;
102. and outputting the target picture.
The picture searching method provided in this embodiment may be applied to various scenes for searching pictures, especially to scenes in which pictures are relatively isolated and lack information such as titles and contexts, and of course, in addition to such special scenes, this embodiment may also be applied to scenes having information such as titles and contexts, and this embodiment is not limited thereto.
In step 100, search information may be obtained, and a source of the search information may be an individual user, or may be other computing devices, which is not limited in this embodiment.
In this embodiment, at least one picture exists within the search range. The search range here may be all of the pictures on a private cloud, a folder on a user terminal device, or, of course, a directory on a server; this embodiment is not limited in this regard.
In step 101, picture description information of at least one picture may be obtained, and a target picture matching the search information may be found from the at least one picture.
Wherein the picture description information is generated based on the picture content data, the picture description information being associated with the picture. The picture description information may include a background, a color, a size, an association relationship between objects, and the like, which is not limited in this embodiment.
Therefore, when the picture is searched, the picture description information can be used as a search basis, and the target picture matched with the search information can be found.
In step 102, the target picture may be output as a search result. According to different sources of the search information, the target picture can be returned to different searchers, and in addition, the output form of the target picture is not limited in the embodiment.
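As a non-limiting illustration of steps 100 to 102, the following minimal sketch shows how a picture search engine might chain acquisition, matching, and output. The names search_pictures, description_index, and match_degree are assumptions introduced for readability, not part of the embodiments themselves.

```python
# A minimal sketch of steps 100-102; search_pictures, description_index and
# match_degree are illustrative names, not part of the embodiments.
from typing import Dict, List


def match_degree(a: str, b: str) -> float:
    # Placeholder matching degree: exact match only. A word-vector
    # similarity (described later in this section) can be substituted.
    return 1.0 if a == b else 0.0


def search_pictures(search_keyword: str,
                    description_index: Dict[str, List[str]],
                    threshold: float = 0.8) -> List[str]:
    """description_index maps a picture identifier to the keywords generated
    from its picture content data, i.e. the search basis of step 101."""
    targets = []
    for picture_id, keywords in description_index.items():
        if any(match_degree(search_keyword, kw) >= threshold
               for kw in keywords):
            targets.append(picture_id)   # collected for output in step 102
    return targets
```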
In this embodiment, the search information may be text such as a keyword; in this case, this embodiment provides a scheme for searching pictures by text. The search information may also be a picture, in which case this embodiment provides a scheme for searching pictures by picture. Of course, the search information may also take other implementation forms, and this embodiment is not limited thereto.
In this embodiment, the picture description information may be generated based on the picture content data of the picture; therefore, in a case where the scene where the picture is located lacks information such as a title and a context, the picture description information can be analyzed from the content contained in the picture itself. This ensures the feasibility of the picture search operation in such scenes, and the generated picture description information can represent the picture more accurately, thereby effectively improving the accuracy of picture searching. Of course, this embodiment is also applicable to the case where the scene of the picture already has information such as a title and a context, in which case the generated picture description information can replace or supplement the existing description information, thereby improving the accuracy of picture searching.
In the above or below embodiments, the picture description information may be generated in advance for at least one picture before step 100 is performed.
Since the process of generating the picture description information for at least one picture is similar, for convenience of description, the first picture will be taken as an example to describe the method for generating the picture description information, wherein the first picture may be any one of the at least one picture.
In this embodiment, the picture content data may be extracted from the first picture; and generating picture description information of the first picture based on the picture content data.
The picture content data is data that can reflect the content included in the picture. In this embodiment, the association relationship between the objects included in the picture may be used as the picture content data, but it should be noted that this embodiment is not limited thereto, and other contents included in the picture may also be used as the picture content data.
The following describes a scheme for determining picture content data, taking an example of an association relationship between objects included in a picture as picture content data.
In this embodiment, the object information included in the first picture may be detected; and analyzing the association relation among the objects contained in the first picture according to the object information contained in the first picture to serve as the picture content data of the first picture.
The object information includes the position of an object, an object abstract feature, an object identifier, and the like, where the object identifier may be a name identifier, a category identifier, or the like, which is not limited in this embodiment. An object abstract feature is a feature abstracted from the picture region where the object is located that can reflect the characteristics of the object.
In practical application, the object information in the first picture can be detected by using a target detection model. The first picture may be input into a target detection model, and the position of the object, the abstract feature of the object, and the identifier of the object included in the first picture may be detected using the target detection model.
In this case, the abstract features of the object are carried by the feature map of the object, but this embodiment is not limited thereto.
The target detection model may adopt a Faster R-CNN model, an SSD model, a YOLO model, a RetinaNet model, or the like.
Taking the Faster R-CNN model as an example: after the first picture is input into the Faster R-CNN model, the first picture is processed by a pre-trained CNN backbone in the Faster R-CNN model to obtain a convolutional feature map; candidate bounding-box regions are selected around each position; a Region Proposal Network (RPN) is used to search for regions that may contain objects and their corresponding positions in the first picture, which serve as the positions of the objects; the objects in these regions are classified to produce object identifiers; and RoI Pooling is applied to the aforementioned candidate regions together with the convolutional feature map obtained by the CNN backbone to extract the feature map of each object as its object abstract feature. Accordingly, the object identifiers, the positions of the objects, and the object abstract features in the first picture can be detected.
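As an illustrative aid only, the following sketch shows one way the detection step described above could be approximated with torchvision's pre-trained Faster R-CNN. Calling the backbone directly (skipping the model's internal input normalization), the spatial_scale=0.25 RoI pooling over the highest-resolution FPN level, and the exact weights argument (which depends on the torchvision version) are all simplifying assumptions.

```python
# Sketch only: detecting object information with a pre-trained Faster R-CNN
# from torchvision. Skipping the model's internal input normalization and
# using spatial_scale=0.25 on FPN level "0" are simplifying assumptions.
import torch
import torchvision
from torchvision.ops import roi_align
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("first_picture.jpg").convert("RGB"))
with torch.no_grad():
    detection = model([image])[0]
    keep = detection["scores"] > 0.5
    boxes = detection["boxes"][keep]      # positions of objects (B)
    labels = detection["labels"][keep]    # object identifiers (L)

    # RoI-pool the backbone feature map over each detected box to obtain a
    # per-object feature map, used as the object abstract feature (F).
    feature_maps = model.backbone(image.unsqueeze(0))
    object_features = roi_align(feature_maps["0"], [boxes],
                                output_size=(7, 7), spatial_scale=0.25)
```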
The process of detecting object information by using other target detection models is not described herein again.
It should be noted that, in this embodiment, the detection method of the object information is not limited to this, and other methods may also be used to detect the object information in the first picture. In addition, the object information is not limited to the above-mentioned object identifier, the position of the object, and the abstract feature of the object, and the object information may also include other information.
In this embodiment, the association relationship between objects may include the objects and the relationships between them, and the relationships between objects may be geometric relationships, possessive relationships, or semantic relationships. In practical applications, the relationship between objects can be expressed as a triple "object, relationship, object".
In practical application, a picture analysis model may be used to analyze the association relationship between the objects contained in the first picture.
In this embodiment, the object information included in the first picture may be input into the picture analysis model; in the picture analysis model, the association relation between the objects contained in the first picture is analyzed according to the object information contained in the first picture, and the association relation is output as the picture content data of the first picture. In the above, the object information may include the position of the object, the abstract feature of the object, the identifier of the object, and the like.
In this embodiment, the picture analysis model may be connected to the target detection model, the target detection model may input object information, such as an object identifier, a position of the object, and an abstract feature of the object, detected from the first picture, into the picture analysis model, and the picture analysis model may analyze an association relationship between the objects included in the first picture according to the obtained object information.
As mentioned above, the association relationship between objects output by the picture analysis model can be expressed as a triple "object, relationship, object". In addition, the object information output by the target detection model can be expressed as a tuple "position of object, object abstract feature, object identifier". For the first picture, the object information output by the target detection model is thus represented as a plurality of such tuples, and the association relationships between objects output by the picture analysis model are represented as a plurality of such triples.
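The tuple and triple representations described above might be captured by data structures along the following lines; the field names are assumptions chosen for readability.

```python
# Illustrative containers for the tuples and triples described above;
# the field names are assumptions made for readability.
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class ObjectInfo:                 # one tuple from the target detection model
    position: Tuple[float, float, float, float]  # bounding box (x1, y1, x2, y2)
    abstract_feature: Any         # e.g. an RoI-pooled feature map
    identifier: str               # object identifier, e.g. "person"


@dataclass
class Association:                # one triple from the picture analysis model
    subject: str                  # first object, e.g. "person"
    relation: str                 # relationship, e.g. "riding"
    obj: str                      # second object, e.g. "horse"
```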
In this embodiment, the picture analysis model may adopt an encoder-decoder network structure, and fig. 2 is a logic schematic diagram of the picture analysis model provided in an embodiment of the present application, where the picture analysis model adopts the encoder-decoder network structure.
As shown in fig. 2, for the first picture, a plurality of tuples "position of object B, object abstract feature F, object identifier L" corresponding to the first picture may be used as the input of the picture analysis model. The encoder part may employ an RNN, where h1 and h2 in fig. 2 represent hidden layers of the RNN. The encoder encodes these inputs (typically through a non-linear transformation) to produce an intermediate semantic representation c1. The decoder part may also employ an RNN, which generates the association relationship to be produced at the current time based on the intermediate semantic representation c1 and the association relationships between objects that have already been generated, where h'1 and h'2 represent hidden layers of the decoder RNN.
Accordingly, the picture analysis model may output at least one triple "object O, relationship A, object O" contained in the first picture as the picture content data of the first picture.
In this embodiment, an attention mechanism may further be added to the picture analysis model. The difference from the logic diagram shown in fig. 2 is that the intermediate semantic representation is no longer the fixed c1, but a representation ci that varies with the output at different times. Each ci is generated based on attention weights over the tuples "position of object, object abstract feature, object identifier" input into the picture analysis model, computed for each output time. For the output at time i, the attention weight may be determined by comparing the decoder hidden state h'(i-1) at the immediately preceding time with the encoder hidden state hj corresponding to each input tuple, and taking the degree of alignment as the attention weight, where i indexes the output triples and j indexes the input tuples.
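The following PyTorch sketch illustrates the encoder-decoder structure with attention described above, under simplifying assumptions: each input tuple is taken to be already embedded as a fixed-size vector, and the output vocabulary is taken to cover the object and relationship tokens of the triples. It is a sketch of the technique, not a definitive implementation of the picture analysis model.

```python
# A sketch of the encoder-decoder picture analysis model with attention.
# Input tuples are assumed to be pre-embedded vectors; outputs are logits
# over an assumed vocabulary of object and relationship tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PictureAnalysisModel(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRUCell(vocab_size, hidden_dim)
        self.out = nn.Linear(hidden_dim * 2, vocab_size)

    def forward(self, tuples: torch.Tensor, max_len: int) -> torch.Tensor:
        # tuples: (batch, num_objects, input_dim), the embedded
        # "position, abstract feature, identifier" tuples.
        enc_states, h = self.encoder(tuples)   # enc_states holds each h_j
        dec_h = h.squeeze(0)                   # initial decoder state
        token = torch.zeros(tuples.size(0), self.out.out_features)
        outputs = []
        for _ in range(max_len):
            # Attention: align the previous decoder state h'_(i-1) with every
            # encoder state h_j, then form the time-varying representation c_i.
            scores = torch.bmm(enc_states, dec_h.unsqueeze(2)).squeeze(2)
            weights = F.softmax(scores, dim=1)
            c_i = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
            dec_h = self.decoder(token, dec_h)
            logits = self.out(torch.cat([dec_h, c_i], dim=1))
            token = F.softmax(logits, dim=1)   # fed back as the next input
            outputs.append(logits)
        return torch.stack(outputs, dim=1)     # one token position per step
```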
Of course, the above specific processing procedures are only exemplary, and the present embodiment is not limited to these processing details, and each processing detail may be replaced by another processing manner, which is not limited in the present embodiment.
Accordingly, the association relationship between the objects included in the first picture can be obtained as picture content data.
On this basis, in the present embodiment, at least one piece of feature information may be extracted from the picture content data of the first picture as the picture description information of the first picture. As mentioned above, at least one feature information may be extracted from the association relationship between the objects included in the first picture as the picture description information of the first picture.
The feature information may be a keyword, or may be a picture region capable of reflecting an association relationship between objects included in the first picture, and of course, the feature information in this embodiment is not limited thereto.
For example, if the association relationship between the objects contained in the first picture is the triple "person, riding, horse", keywords such as "person", "horse", "riding", "horse riding", and "person riding horse" can be extracted as the feature information.
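A toy illustration of this extraction, where the composed phrases are assumptions made for the "person, riding, horse" example:

```python
# A toy illustration; the composed phrases are assumptions for this example.
def keywords_from_triple(subject: str, relation: str, obj: str) -> set:
    return {subject, relation, obj,
            f"{relation} {obj}",               # e.g. "riding horse"
            f"{subject} {relation} {obj}"}     # e.g. "person riding horse"


print(keywords_from_triple("person", "riding", "horse"))
```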
In this embodiment, a target detection model and a picture analysis model cooperate with each other: object information is extracted from a picture by the target detection model, and the object information is analyzed by the picture analysis model to generate the association relationships between the objects contained in the picture as the picture content data of the picture; on this basis, feature information can be extracted from the picture content data as the picture description information. This allows the picture description information to describe the picture more accurately, thereby improving the accuracy of the search.
In the above or below embodiments, in order to enable the analysis function of the picture analysis model, the picture analysis model may be trained in advance.
In the embodiment, a sample picture can be obtained, and the object identification, the object position and the association relation among the objects contained in the sample picture are marked;
detecting an object abstract feature contained in a sample picture by using a target detection model;
and the position of the object, the object abstract feature, and the object identifier of the sample picture are input into the picture analysis model as the object information contained in the sample picture, and the picture analysis model is trained on the basis of the labeled association relationships between the objects in the sample picture.
In this embodiment, the source of the sample picture is not limited. For example, the sample pictures may come from the Visual Genome dataset, which provides very dense and complete descriptions of pictures, offers detailed labels for the interactions and attributes of objects, and grounds visual concepts at a semantic level. The labeling information of this dataset includes the object identifiers, object positions, and association relationships between objects in each picture; therefore, in this embodiment, the pictures and labeling information in the dataset can be used to train the picture analysis model.
In this embodiment, the target detection model may be used to detect the abstract characteristics of the object included in the sample picture, wherein the processing procedure of the target detection model may refer to the description in the foregoing, and is not described herein again.
On this basis, the position of the object, the object abstract feature, and the object identifier of the sample picture can be input into the picture analysis model as the object information contained in the sample picture, and the picture analysis model outputs the association relationships between the objects contained in the sample picture. In this embodiment, the picture analysis model may be corrected based on the error between the association relationships labeled in the sample picture and the result output by the picture analysis model, so that the picture analysis model learns the ability to analyze the association relationships between the objects contained in a picture from object information such as the position, abstract feature, and identifier of each object.
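A hedged sketch of one training step along these lines is shown below; it assumes the PictureAnalysisModel sketched earlier and assumes that the labeled association triples have already been tokenized into target indices.

```python
# Sketch of one training step; assumes the PictureAnalysisModel above and
# that the labeled association triples are already tokenized into indices.
import torch.nn as nn


def train_step(analysis_model, optimizer, object_info, target_triples):
    """object_info: (batch, num_objects, input_dim) tuples detected from the
    sample pictures; target_triples: (batch, max_len) token ids of the
    labeled association relationships."""
    criterion = nn.CrossEntropyLoss()
    logits = analysis_model(object_info, max_len=target_triples.size(1))
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     target_triples.reshape(-1))
    optimizer.zero_grad()
    loss.backward()    # correct the model by the error against the labels
    optimizer.step()
    return loss.item()
```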
Accordingly, the training of the picture analysis model can be realized.
In the embodiment, a picture analysis model with high accuracy can be trained by using a large number of sample pictures, so that the accuracy of picture description information generated for at least one picture in a search range by using the picture analysis model is ensured, and the search accuracy is further improved.
In the above or below embodiments, the target picture matching the search information may be found in the at least one picture based on the feature information included in the picture description information of the at least one picture.
As mentioned above, the feature information may include a keyword or a picture region, and the search information may include a keyword or a picture. The following description will be made taking the feature information as a keyword and the search information as a keyword as an example.
In this embodiment, at least one keyword may be extracted from the picture content data of at least one picture, respectively, as the picture description information of at least one picture.
In practical application, word segmentation technology can be adopted to extract keywords from the picture content data. Accordingly, each picture corresponds to at least one keyword.
In this embodiment, the matching degree between the search keyword and at least one keyword corresponding to each picture may be calculated, and the picture including the keyword whose matching degree meets the preset requirement is taken as the target picture.
In practical application, a keyword index can be established for the at least one keyword contained in each of the at least one picture, where the keywords in the keyword index are mutually distinct.
For example, the keyword index may be organized as a matrix of pictures against keywords: for a certain keyword, if a picture contains the keyword, a 1 is recorded under that keyword for the picture, and if the picture does not contain the keyword, a 0 is recorded. Of course, this is merely exemplary, and the present embodiment is not limited thereto.
Based on the keyword index, in this embodiment, a matching degree between the search keyword and at least one keyword in the keyword index may be calculated; and determining the picture under the keyword with the matching degree meeting the preset requirement as the target picture.
This avoids repeatedly calculating the matching degree between the search keyword and the same keyword contained in different pictures, which greatly reduces the amount of calculation and improves search efficiency.
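A minimal sketch of such a keyword index as an inverted index, in which every distinct keyword appears exactly once and maps to the pictures under it, so that each matching degree is computed once per keyword:

```python
# Minimal inverted keyword index: each distinct keyword appears once and
# maps to the pictures under it.
from collections import defaultdict
from typing import Dict, List, Set


def build_keyword_index(picture_keywords: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    index: Dict[str, List[str]] = defaultdict(list)
    for picture_id, keywords in picture_keywords.items():
        for kw in keywords:
            index[kw].append(picture_id)
    return index


index = build_keyword_index({"pic_1": {"person", "horse", "riding"},
                             "pic_2": {"car", "road"}})
# index["horse"] -> ["pic_1"]; each keyword is scored against the search
# keyword only once, however many pictures contain it.
```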
In the above implementations, the search keyword and the at least one keyword in the keyword index may be converted into a first word vector and a second word vector, respectively; and the similarity between the first word vector and the second word vector may be calculated as the matching degree between the search keyword and the at least one keyword in the keyword index.
Word2Vec word vector technology, knowledge graphs, synonym rewriting, or other query similarity calculation technologies may be used to calculate the matching degree between the search keyword and the at least one keyword in the keyword index, which is not limited in this embodiment.
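For illustration, the matching degree between two word vectors can be taken as their cosine similarity, as in the following sketch; the embedding lookup itself is assumed to be supplied by Word2Vec or any of the techniques above.

```python
# Matching degree as cosine similarity between a first and second word
# vector; where the vectors come from (Word2Vec, a knowledge graph, etc.)
# is left open, as above.
import numpy as np


def matching_degree(first_vector: np.ndarray, second_vector: np.ndarray) -> float:
    return float(np.dot(first_vector, second_vector) /
                 (np.linalg.norm(first_vector) * np.linalg.norm(second_vector)))
```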
In an implementation manner, the target picture may be determined in a manner of accurate matching of keywords, that is, the picture is determined to be the target picture under the condition that the search keywords are completely matched with the keywords included in the picture.
In another implementation manner, a matching degree threshold may be set, and when the matching degree of a search keyword and a certain keyword is greater than the matching degree threshold, a picture under the keyword may be determined as a target picture.
For example, when the search keyword is "car", if the matching degree between "car" and a picture keyword such as "automobile" is greater than the aforementioned matching degree threshold, the pictures under "automobile" will be determined as target pictures.
For the case where the feature information is a picture region and the search information is a picture, the matching degree can be determined by comparing the two images, which is not described in detail herein.
Fig. 3 is a schematic diagram of a private cloud scenario according to an embodiment of the present application. The picture searching method of the present application will be described in detail below with reference to the private cloud scenario shown in fig. 3.
As shown in fig. 3, the picture searching process can be divided into two stages, namely an information preparation stage and an actual searching stage.
In the information preparation stage, the object information of at least one picture on the private cloud can be extracted using the target detection model, and the extracted object information is input into the picture analysis model. In the picture analysis model, the association relationships between the objects contained in the at least one picture can be analyzed as picture content data based on the object information of the at least one picture. Subsequently, at least one keyword can be extracted from the picture content data of each of the at least one picture as its picture description information. For example, the picture description information of a certain picture may include keywords such as "person", "horse", and "person riding horse".
On the private cloud, the at least one picture and its picture description information can be stored in an associated manner.
In the actual search stage, a user may input the search keyword "horse riding" at a search terminal. The search terminal may generate a search request based on the search keyword and send it to the picture search engine of this embodiment. Triggered by the search keyword, the picture search engine may obtain the picture description information of the at least one picture on the private cloud, and may find the target picture matching the search keyword "horse riding" from the at least one picture by calculating the matching degree between the search keyword "horse riding" and the at least one keyword contained in the picture description information of the at least one picture.
Fig. 4 is a schematic flowchart of a method for generating picture description information according to another embodiment of the present application.
As shown in fig. 4, the method includes:
400. acquiring a picture to be processed;
401. extracting picture content data from a picture to be processed;
402. and generating picture description information of the picture to be processed based on the picture content data.
The method for generating picture description information provided in this embodiment may be applied to various scenes in which picture description information needs to be configured for a picture, for example, in the foregoing picture search scene, and of course, other scenes may also be used, which is not limited in this embodiment.
In this embodiment, the to-be-processed picture refers to a picture that needs to be configured with picture description information, and the to-be-processed picture may be a picture on the aforementioned private cloud, or may also be a picture in another scene, which is not limited in this embodiment.
In this embodiment, the picture content data may be extracted from the picture to be processed, where the picture content data refers to data that can reflect the content included in the picture.
Based on the picture content data of the picture to be processed, picture description information of the picture to be processed can be generated. Wherein the picture description information is associated with the picture. The picture description information may include a background, a color, a size, an association relationship between objects, and the like, which is not limited in this embodiment.
In practical application, the object information in the picture to be processed can be detected; and analyzing the association relation among the objects contained in the picture to be processed according to the object information in the picture to be processed, and using the association relation as the picture content data of the picture to be processed. The object information may include a position of the object, an abstract feature of the object, an identifier of the object, and the like, which is not limited in this embodiment.
The extraction process of the object information and the analysis process of the association relationship between the objects may refer to the related description in the foregoing, and are not described herein again.
Based on the picture content data analyzed from the picture to be processed, in this embodiment, at least one keyword may be extracted from the picture content data of the picture to be processed, and the keyword is used as the picture description information of the picture to be processed.
As mentioned above, in this embodiment, at least one keyword can be extracted from the association relationship between the objects included in the to-be-processed picture as the picture description information of the to-be-processed picture.
In this embodiment, the picture description information is generated based on the picture content data of the picture to be processed, so that the picture to be processed can be more accurately described. The picture description information generated for the picture to be processed may be used in the picture search process in the foregoing, and of course, may also be used in other processing processes, which is not limited in this embodiment.
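Tying steps 400 to 402 together, a minimal sketch might look as follows; detector, analysis_model, and decoder stand for the components sketched earlier and are passed in as assumptions rather than fixed APIs.

```python
# Sketch of steps 400-402; detector, analysis_model and decoder are the
# components sketched earlier, passed in here as assumptions.
def generate_picture_description(picture_path, detector, analysis_model,
                                 decoder, max_len: int = 16) -> set:
    object_info = detector(picture_path)              # step 401: detect tuples
    triples = decoder(analysis_model(object_info, max_len=max_len))
    keywords = set()
    for subject, relation, obj in triples:            # picture content data
        keywords.update({subject, relation, obj,
                         f"{subject} {relation} {obj}"})
    return keywords                                   # step 402: description
```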
It should be noted that, to keep the description concise, the technical details of the process of generating picture description information are not repeated in this embodiment; reference may be made to the related description of the picture description information generation process in the foregoing. This should not cause a loss of the scope of the present application.
It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may be used as the execution subjects of the methods. For example, the execution subjects of steps 100 to 101 may be device a; for another example, the execution subject of steps 101 and 102 may be device a, and the execution subject of step 100 may be device B; and so on.
In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 100, 101, etc., are merely used for distinguishing different operations, and the sequence numbers do not represent any execution order per se. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
Fig. 5 is a schematic structural diagram of a computing device according to another embodiment of the present application. As shown in fig. 5, the computing device includes: a memory 50 and a processor 51.
A processor 51 coupled to the memory 50 for executing computer programs in the memory for:
acquiring a search keyword;
searching a target picture matched with the search keyword in the at least one picture based on the picture description information of the at least one picture;
wherein the picture description information is generated based on the picture content data;
and outputting the target picture.
In an optional embodiment, the processor 51, before searching for a target picture matching the search keyword in the at least one picture based on the picture description information of the at least one picture, is further configured to:
extracting picture content data from a first picture aiming at the first picture;
generating picture description information of the first picture based on the picture content data;
the first picture is any one of the at least one picture.
In an alternative embodiment, the processor 51, when extracting picture content data from the first picture, is configured to:
detecting object information contained in the first picture, wherein the object information comprises the position of an object, an abstract feature of the object and an identifier of the object;
and analyzing the association relation among the objects contained in the first picture according to the positions of the objects, the abstract characteristics of the objects and the object identifications to serve as picture content data of the first picture.
In an alternative embodiment, the processor 51, when detecting the object information contained in the first picture, is configured to:
inputting the first picture into a target detection model;
and detecting the position, the abstract characteristics and the object identification of the object contained in the first picture by using the target detection model as the object information of the first picture.
In an alternative embodiment, the target detection model employs a Faster R-CNN model, an SSD model, a YOLO model, or a RetinaNet model.
In an optional embodiment, the processor 51, when analyzing the association relationship between the objects contained in the first picture according to the positions of the objects, the abstract features of the objects, and the identifiers of the objects, is configured to:
inputting the position of an object, the abstract characteristics of the object and the identification of the object contained in the first picture into a picture analysis model;
in the picture analysis model, the association relation among the objects contained in the first picture is analyzed according to the positions of the objects contained in the first picture, the abstract features of the objects and the object identifications, and the association relation is output as picture content data of the first picture.
In an alternative embodiment, the processor 51 is further configured to, before inputting the position of the object, the object abstract feature and the object identifier included in the first picture into the picture analysis model:
obtaining a sample picture, and marking an object identifier, an object position and an association relation among objects contained in the sample picture;
detecting an object abstract feature contained in a sample picture by using a target detection model;
and taking the position of the object, the object abstract feature, and the object identifier of the sample picture as the object information contained in the sample picture, inputting the object information into the picture analysis model, and training the picture analysis model on the basis of the labeled association relationships between the objects in the sample picture.
In an alternative embodiment, the picture analysis model includes an attention mechanism.
In an alternative embodiment, the processor 51, when generating the picture description information of the first picture based on the picture content data, is configured to:
at least one keyword is extracted from the incidence relation between the objects contained in the first picture and is used as picture description information of the first picture.
In an alternative embodiment, the processor 51 is further configured to:
establishing a keyword index for the at least one keyword contained in the at least one picture, wherein the keywords in the keyword index are mutually distinct;
based on the picture description information of at least one picture, searching a target picture matched with the search keyword in the at least one picture, wherein the searching comprises the following steps:
calculating the matching degree between the search keyword and at least one keyword in the keyword index;
and determining the picture under the keyword with the matching degree meeting the preset requirement as the target picture.
In an alternative embodiment, the processor 51, when calculating the degree of match between the search keyword and at least one keyword in the keyword index, is configured to:
converting at least one keyword in the search keyword and the keyword index into a first word vector and a second word vector respectively;
and calculating the similarity between the first word vector and the second word vector as the matching degree between the search keyword and at least one keyword in the keyword index.
Further, as shown in fig. 5, the computing device further includes: communication components 52, power components 53, and the like. Only some of the components are schematically shown in fig. 5, and the computing device is not meant to include only the components shown in fig. 5.
It should be noted that, for the technical details in the embodiments related to the computing device, reference may be made to the technical details described in the embodiments related to the aforementioned image searching method, and for the sake of brevity, no further description is provided herein, but this should not cause a loss of the scope of the present application.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement the steps that can be executed by a computing device in the foregoing method embodiments when executed.
Fig. 6 is a schematic structural diagram of another computing device according to yet another embodiment of the present application. As shown in fig. 6, the computing device includes: a memory 60 and a processor 61.
A processor 61, coupled to the memory 60, for executing computer programs in the memory for:
acquiring a picture to be processed;
extracting picture content data from a picture to be processed;
and generating picture description information of the picture to be processed based on the picture content data.
In an alternative embodiment, the processor 61, when extracting the picture content data from the picture to be processed, is configured to:
detecting object information contained in the picture to be processed, wherein the object information comprises the position of an object, an object abstract feature, and an object identifier;
and analyzing the association relationships between the objects contained in the picture to be processed according to the positions of the objects, the object abstract features, and the object identifiers, as the picture content data of the picture to be processed.
In an optional embodiment, the processor 61, when generating the picture description information of the picture to be processed based on the picture content data, is configured to:
and extracting at least one keyword from the incidence relation among the objects contained in the picture to be processed to be used as the picture description information of the picture to be processed.
Further, as shown in fig. 6, the computing device further includes: communication components 62, power components 63, and the like. Only some of the components are schematically shown in fig. 6, and the computing device is not meant to include only the components shown in fig. 6.
It should be noted that, for the technical details in the embodiments of the computing device, reference may be made to the technical details described in the related embodiments of the aforementioned method for generating picture description information, which are not repeated herein for brevity, but this should not cause a loss of the scope of the present application.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement the steps that can be executed by a computing device in the foregoing method embodiments when executed.
Fig. 7 is a flowchart illustrating a resource searching method according to another embodiment of the present application. As shown in fig. 7, the method includes:
700. acquiring search information;
701. searching a target resource matched with the search information in the at least one resource based on the description information of the at least one resource;
wherein the description information is generated based on the resource content data;
702. and outputting the target resource.
The resource searching method provided in this embodiment may be applied to various scenes for searching resources, especially to scenes where resources are relatively isolated and information such as titles and contexts is lacking.
In this embodiment, the resource may be a picture, a document, a video, or the like, and the storage location of the resource may be a private cloud, a user personal computer, or the like, which is not limited in this embodiment.
In an optional embodiment, before the step of searching for a target resource matching the search information in the at least one resource based on the description information of the at least one resource, the method further includes:
for a first resource, extracting resource content data from the first resource;
generating description information of the first resource based on the resource content data;
the first resource is any one of at least one resource.
In an alternative embodiment, the step of extracting resource content data from the first resource comprises:
detecting object information contained in the first resource, wherein the object information comprises the position of an object, an object abstract characteristic and an object identifier;
and analyzing the association relationships between the objects contained in the first resource according to the position of the object, the object abstract feature, and the object identifier, as the resource content data of the first resource.
In practical applications, for resources of the picture and video types, an object may be a person, an item, or the like appearing in the resource, while for resources of the document type, an object may be a word, a formula, or the like in the document. Different types of objects may be of interest for different resources; this embodiment is not limited in this regard.
In an optional embodiment, the step of detecting object information contained in the first resource includes:
inputting a first resource into a target detection model;
and detecting the position, the object abstract characteristics and the object identification of the object contained in the first resource as the object information of the first resource by using the target detection model.
In an alternative embodiment, the target detection model adopts a Faster R-CNN model, an SSD model, a YOLO model, or a RetinaNet model.
In an optional embodiment, the step of analyzing an association relationship between objects included in the first resource according to the position of the object, the abstract feature of the object, and the object identifier includes:
inputting the position of an object contained in the first resource, the abstract characteristics of the object and the object identification into a resource analysis model;
in the resource analysis model, the association relationships between the objects contained in the first resource are analyzed according to the positions of the objects contained in the first resource, the object abstract features, and the object identifiers, and the association relationships are output as the resource content data of the first resource.
In an optional embodiment, before the step of inputting the position of the object, the object abstract feature and the object identifier included in the first resource into the resource analysis model, the method further includes:
obtaining sample resources, and marking the object identifiers, object positions, and association relationships between the objects contained in the sample resources;
detecting an object abstract feature contained in a sample resource by using a target detection model;
and taking the position of the object, the object abstract feature, and the object identifier of the sample resource as the object information contained in the sample resource, inputting the object information into the resource analysis model, and training the resource analysis model on the basis of the labeled association relationships between the objects in the sample resource.
In an alternative embodiment, the resource analysis model includes an attention mechanism.
In an alternative embodiment, the step of generating description information of the first resource based on the resource content data includes:
at least one piece of feature information is extracted from the association relation between the objects contained in the first resource and is used as description information of the first resource.
The implementation form of the feature information may be different for different resources, and for resources such as pictures or videos, the feature information may be implemented as a keyword or a picture region reflecting an association relationship of an object, and for resources such as documents, the feature information may also be a keyword.
In an optional embodiment, the method further comprises:
establishing an information index for the at least one piece of feature information contained in the at least one resource, wherein the pieces of feature information in the information index are mutually distinct;
the step of searching a target resource matched with the search information in at least one resource based on the description information of at least one resource comprises the following steps:
calculating the matching degree between the search information and at least one piece of characteristic information in the information index;
and determining the resource under the characteristic information with the matching degree meeting the preset requirement as the target resource.
In an optional embodiment, the step of calculating a matching degree between the search information and at least one feature information in the information index includes:
converting at least one characteristic information in the search information and the information index into a first vector and a second vector respectively;
a similarity between the first vector and the second vector is calculated as a matching degree between the search information and at least one feature information in the information index.
In an alternative embodiment, the feature information includes keywords and the search information includes search keywords.
In an optional embodiment, the at least one resource is stored on a proprietary cloud.
It should be noted that a resource is a generalization of the concept of a picture; on this basis, the technical details in the embodiments of the resource searching method may refer to the technical details described in the related embodiments of the picture searching method, which are not repeated here for brevity, but this should not cause a loss of the scope of the present application.
Fig. 8 is a schematic structural diagram of another computing device according to another embodiment of the present application. As shown in fig. 8, the computing device includes: a memory 80 and a processor 81.
A processor 81, coupled to the memory 80, for executing computer programs in the memory for:
acquiring search information;
searching a target resource matched with the search information in the at least one resource based on the description information of the at least one resource;
wherein the description information is generated based on the resource content data;
and outputting the target resource.
The computing device provided in this embodiment may be applied to various scenes of searching for resources, especially to scenes where resources are relatively isolated and information such as titles and contexts is lacking, and of course, in addition to such special scenes, this embodiment may also be applied to scenes with information such as titles and contexts, and this embodiment is not limited to this.
In this embodiment, the resource may be a picture, a document, a video, or the like, and the storage location of the resource may be a private cloud, a user personal computer, or the like, which is not limited in this embodiment.
In an optional embodiment, the processor 81, before searching for a target resource matching the search information in the at least one resource based on the description information of the at least one resource, is further configured to:
for a first resource, extracting resource content data from the first resource;
generating description information of the first resource based on the resource content data;
the first resource is any one of at least one resource.
In an alternative embodiment, the processor 81, when extracting resource content data from a first resource, is configured to:
detecting object information contained in the first resource, wherein the object information comprises the position of an object, an object abstract characteristic and an object identifier;
and analyzing the association relationships between the objects contained in the first resource according to the position of the object, the object abstract feature, and the object identifier, as the resource content data of the first resource.
In practical applications, for resources of picture and video types, the object may be a person in the resource. While for a document class of resources, an object may be a word, formula, etc. in a document. Different types of objects may be of interest for different resources. This embodiment is not limited to this.
In an alternative embodiment, the processor 81, when detecting the object information contained in the first resource, is configured to:
inputting a first resource into a target detection model;
and detecting the position, the object abstract characteristics and the object identification of the object contained in the first resource as the object information of the first resource by using the target detection model.
In an alternative embodiment, the target detection model used by the processor 81 adopts a Faster R-CNN model, an SSD model, a YOLO model, or a RetinaNet model.
In an optional embodiment, the processor 81, when analyzing the association relationship between the objects included in the first resource according to the position of the object, the abstract feature of the object, and the object identifier, is configured to:
inputting the position of an object contained in the first resource, the abstract characteristics of the object and the object identification into a resource analysis model;
in the resource analysis model, the association relationships between the objects contained in the first resource are analyzed according to the positions of the objects contained in the first resource, the object abstract features, and the object identifiers, and the association relationships are output as the resource content data of the first resource.
In an alternative embodiment, the processor 81 is further configured to, before inputting the position of the object, the object abstract feature and the object identifier included in the first resource into the resource analysis model:
obtaining sample resources, and marking the object identifiers, object positions, and association relationships between the objects contained in the sample resources;
detecting an object abstract feature contained in a sample resource by using a target detection model;
and taking the position of the object, the object abstract feature, and the object identifier of the sample resource as the object information contained in the sample resource, inputting the object information into the resource analysis model, and training the resource analysis model on the basis of the labeled association relationships between the objects in the sample resource.
In an alternative embodiment, the resource analysis model includes an attention mechanism.
In an alternative embodiment, the processor 81, when generating the description information of the first resource based on the resource content data, is configured to:
at least one piece of feature information is extracted from the association relation between the objects contained in the first resource and is used as description information of the first resource.
The implementation form of the feature information may be different for different resources, and for resources such as pictures or videos, the feature information may be implemented as a keyword or a picture region reflecting an association relationship of an object, and for resources such as documents, the feature information may also be a keyword.
In an alternative embodiment, processor 81 is further configured to:
establishing an information index for the at least one piece of feature information contained in the at least one resource, wherein the pieces of feature information in the information index are mutually distinct;
the processor 81, when searching for a target resource matching the search information in the at least one resource based on the description information of the at least one resource, is configured to:
calculating the matching degree between the search information and at least one piece of characteristic information in the information index;
and determining the resource under the characteristic information with the matching degree meeting the preset requirement as the target resource.
In an alternative embodiment, the processor 81, when calculating the degree of matching between the search information and the at least one feature information in the information index, is configured to:
converting at least one characteristic information in the search information and the information index into a first vector and a second vector respectively;
a similarity between the first vector and the second vector is calculated as a matching degree between the search information and at least one feature information in the information index.
In an alternative embodiment, the feature information includes keywords and the search information includes search keywords.
In an optional embodiment, the at least one resource is stored on a proprietary cloud.
Further, as shown in fig. 8, the computing device further includes: communication components 82, power components 83, and the like. Only some of the components are schematically shown in fig. 8, and the computing device is not meant to include only the components shown in fig. 8.
It should be noted that, for the technical details in the embodiments related to the computing device, reference may be made to the technical details described in the embodiments related to the resource search method, which are not described herein for brevity, but this should not cause a loss of the scope of the present application.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement the steps that can be executed by a computing device in the foregoing method embodiments when executed.
The memories of fig. 5, 6 and 8, among other things, are used to store computer programs and may be configured to store various other data to support the operations on the computing device. Examples of such data include instructions for any application or method operating on the computing device, contact data, phonebook data, messages, pictures, videos, and so forth. The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Wherein the communication components of fig. 5, 6 and 8 are configured to facilitate wired or wireless communication between the device in which the communication components are located and other devices. The device in which the communication component is located may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component may be implemented based on Near Field Communication (NFC) technology, Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies to facilitate short-range communications.
The power supply components of figures 5, 6 and 8, among others, provide power to the various components of the device in which the power supply components are located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.