
CN116304156A - Picture retrieval method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116304156A
Authority
CN
China
Prior art keywords
picture
search
pictures
target
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310093900.4A
Other languages
Chinese (zh)
Inventor
徐程
朱雪兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boc Financial Technology Co ltd
Original Assignee
Boc Financial Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Boc Financial Technology Co ltd
Priority to CN202310093900.4A
Publication of CN116304156A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of computers and provides a picture retrieval method, a picture retrieval device, electronic equipment and a storage medium. The method comprises: receiving a search keyword; matching the search keyword with the keywords corresponding to each picture, and determining the picture global name of the target picture corresponding to the search keyword and the text content recognition result of the target picture; acquiring the target picture and basic information of the target picture based on the picture global name; and displaying the target picture together with its text content recognition result and basic information. The method and device make chat pictures searchable by the text recognized in them, which improves information retrieval efficiency.

Description

Picture retrieval method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for retrieving a picture, an electronic device, and a storage medium.
Background
With the development of internet office technology, a great deal of information exchange is completed through instant chat software. During chat, the communication content may include a large number of screenshots, which makes chat records hard to search.
For example, some text information appears in the form of pictures in chat records. If the text information needs to be retrieved, the picture needs to be retrieved, and the current instant chat software does not support the picture retrieval function. Therefore, the user can only rely on memory to indirectly search and locate according to time or related text chat records, and the information retrieval efficiency is low.
Disclosure of Invention
The invention provides a picture retrieval method, a picture retrieval device, electronic equipment and a storage medium, which are used for solving the technical problems that pictures in chat records cannot be retrieved and that information retrieval efficiency is low.
The invention provides a picture retrieval method, which comprises the following steps:
receiving a search keyword;
matching the search keywords with keywords corresponding to each picture, and determining the picture global names of the target pictures corresponding to the search keywords and the text content identification results of the target pictures;
acquiring the target picture and basic information of the target picture based on the picture global name;
and displaying the target picture, and the character content identification result and the basic information of the target picture.
In some embodiments, the method further comprises:
storing pictures in chat records of a plurality of chat software to a picture library, and determining picture global names of all pictures in the picture library;
generating basic information of each picture based on the original name, the picture global name, the picture creation time, the chat software, the source storage catalog and the picture storage catalog of each picture;
performing character recognition on each picture to obtain a character content recognition result of each picture;
word segmentation is carried out on the character content recognition results of the pictures to obtain keywords corresponding to the pictures;
and generating inverted indexes of the pictures based on the picture global names, the text content recognition results and the keywords of the pictures.
In some embodiments, the storing the pictures in the chat records of the plurality of chat software in a picture library includes:
determining a picture acquisition time range based on the current picture acquisition time and the last picture acquisition time;
and traversing chat record storage files of all chat software, and copying pictures with picture creation time smaller than or equal to the current picture acquisition time and larger than the last picture acquisition time in the chat record to the picture library.
In some embodiments, the matching the search keyword with the keywords corresponding to each picture, determining the global picture name of the target picture corresponding to the search keyword, and the text content recognition result of the target picture, includes:
matching the search keywords with each keyword in the inverted index of each picture;
under the condition that the search keyword is matched with any keyword in the inverted index of any picture, determining that picture as a target picture corresponding to the search keyword;
and determining the picture global name and the text content recognition result of the target picture based on the inverted index of that picture.
In some embodiments, before the matching the search keyword with the keyword corresponding to each picture, the method includes:
matching the search keywords with the historical search keywords in each search history record;
and under the condition that the search keywords are matched with the historical search keywords in any search history record, determining the target picture and the text content identification result and basic information of the target picture based on any search history record.
In some embodiments, the retrieval history is updated based on the steps of:
generating a current search history record based on the current search keyword, the current searched target picture, and the text content identification result and basic information of the target picture;
traversing each retrieval history;
and deleting any search history record under the condition that the search date and/or the repeated search times of any search history record meet the preset conditions.
The invention provides a picture retrieval device, comprising:
the receiving module is used for receiving the search keywords;
the search module is used for matching the search keywords with the keywords corresponding to each picture and determining the picture global names of the target pictures corresponding to the search keywords and the text content identification results of the target pictures;
the acquisition module is used for acquiring the target picture and basic information of the target picture based on the picture global name;
and the display module is used for displaying the target picture, and the character content identification result and the basic information of the target picture.
In some embodiments, the apparatus further comprises:
the picture library is used for storing pictures in chat records of a plurality of chat software;
and the storage library is used for storing the text content identification result, the basic information and the inverted index of each picture.
The invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the picture retrieval method when executing the program.
The present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a picture retrieval method as described.
In the picture retrieval method, device, electronic equipment and storage medium provided by the invention, the search keyword is matched with the keywords corresponding to each picture, and the picture global name of the target picture corresponding to the search keyword and the text content recognition result of the target picture are determined; the target picture and its basic information are acquired according to the picture global name; and the target picture, its text content recognition result and its basic information are displayed. By applying text recognition and keyword retrieval, the picture content is converted into text, so that chat pictures can be retrieved according to the text recognized in them. The user no longer needs to rely on memory to find a picture on the computer, which improves information retrieval efficiency and the management of pictures on the computer.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a first schematic flowchart of the picture retrieval method provided by the present invention;
FIG. 2 is a schematic structural diagram of the picture retrieval device provided by the present invention;
FIG. 3 is a second schematic flowchart of the picture retrieval method provided by the present invention;
FIG. 4 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," and the like herein are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic flow chart of a picture retrieval method provided by the present invention, as shown in fig. 1, the method includes steps 110, 120, 130 and 140. The method flow steps are only one possible implementation of the invention.
Step 110, receiving a search keyword.
Specifically, the execution subject of the picture retrieval method provided by the embodiment of the invention is a picture retrieval device. The picture retrieval means may be implemented by software, for example a picture retrieval program running in a computer; the device may also be a device for performing the picture retrieval method, such as a mobile terminal, a tablet computer, a desktop computer, or a server.
The application scenario of the picture retrieval method provided by the embodiment of the invention is one in which a user exchanges information through several chat software programs and some text information appears in the chat records in the form of pictures. The user needs to retrieve that picture-form information by search keyword.
The search keyword is a word or a combination of words input by the user for searching information. The user can input the search keywords in the picture search device through the modes of physical key input, touch operation input, voice input and the like.
Step 120, matching the search keyword with the keywords corresponding to each picture, and determining the picture global name of the target picture corresponding to the search keyword and the text content recognition result of the target picture.
Specifically, after receiving the search keyword, the search keyword may be matched with the keyword corresponding to each picture. The matching method can adopt a semantic similarity matching method. Semantic similarity between the search keywords and the keywords corresponding to each picture can be calculated. The higher the semantic similarity, the higher the matching degree between the search keywords and the keywords corresponding to each picture. The preset semantic similarity threshold value can be set, and if the semantic similarity between the keyword corresponding to any picture and the search keyword is greater than the preset semantic similarity threshold value, the keyword corresponding to the picture can be considered to be matched with the search keyword, and the picture can also be considered to be a target picture corresponding to the search keyword.
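The thresholded matching described above can be sketched as follows. The source does not name a similarity measure, so this sketch substitutes the character-level ratio from Python's `difflib` as a stand-in; that is string similarity, not true semantic similarity. The function names and the 0.8 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; a real system would use a semantic model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_pictures(query: str, keywords_by_picture: dict[str, list[str]],
                   threshold: float = 0.8) -> list[str]:
    """Return global names of pictures with any keyword scoring above threshold."""
    targets = []
    for global_name, keywords in keywords_by_picture.items():
        # A picture is a target as soon as one of its keywords matches.
        if any(similarity(query, kw) > threshold for kw in keywords):
            targets.append(global_name)
    return targets
```

Swapping `similarity` for an embedding-based score would leave the threshold logic unchanged.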
Picture names assigned by chat software may conflict: when different chat software uses the same naming convention, different pictures can be given the same name. In addition, a user's personal habits may lead to the same name being used for different pictures. To ensure that each picture has a unique identifier in the computer system, a picture global name may be added to each picture. The picture global name guarantees the global uniqueness of picture names.
The text content recognition result is the result of recognizing the text content in a picture. For example, optical character recognition (OCR) may be applied to each picture to obtain its text content recognition result.
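One way to obtain a globally unique name is sketched below. The source does not say how the picture global name is generated; using a random UUID and keeping the original file extension are assumptions made for illustration, and `make_global_name` is a hypothetical helper.

```python
import uuid
from pathlib import Path


def make_global_name(original_name: str) -> str:
    """Build a collision-resistant file name that keeps the original extension."""
    suffix = Path(original_name).suffix.lower()
    # uuid4 is random, so two pictures with the same original name get different names.
    return f"{uuid.uuid4().hex}{suffix}"
```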
The keyword, the global name of the picture and the text content recognition result can be stored as index information of the picture. When the search keywords are matched with the keywords corresponding to any picture, a picture global name and a text content identification result can be obtained according to the index information of the picture.
Here, the text content recognition result is used as a part of the index information, and can be fed back directly without going through other search steps, so that the response speed of the picture search device can be improved.
Step 130, acquiring the target picture and the basic information of the target picture based on the picture global name.
Specifically, the picture global name serves as the globally unique identifier of the picture, so the target picture and its basic information can be obtained by searching the file system. The basic information is attribute information of the picture, such as the chat software it belongs to, its original name, and its creation date.
Step 140, displaying the target picture, and the text content recognition result and the basic information of the target picture.
Specifically, the target picture, the text content identification result of the target picture and the basic information can be displayed to the user in the visual interface of the picture retrieval device, so that the user can view the target picture and view the text content identification result of the target picture, and corresponding chat software, chat time and the like can be further positioned according to the basic information, thereby obtaining richer chat information.
According to the picture retrieval method provided by the embodiment of the invention, the search keyword is matched with the keywords corresponding to each picture, and the picture global name of the target picture corresponding to the search keyword and the text content recognition result of the target picture are determined; the target picture and its basic information are acquired according to the picture global name; and the target picture, its text content recognition result and its basic information are displayed. By applying text recognition and keyword retrieval, the picture content is converted into text, so that chat pictures can be retrieved according to the text recognized in them. The user no longer needs to rely on memory to find a picture on the computer, which improves information retrieval efficiency and the management of pictures on the computer.
It should be noted that the embodiments of the present invention may be freely combined, reordered, or executed separately, and do not depend on a fixed execution sequence.
In some embodiments, the method further comprises:
storing pictures in chat records of a plurality of chat software to a picture library, and determining picture global names of all pictures in the picture library;
generating basic information of each picture based on the original name, the picture global name, the picture creation time, the chat software, the source storage catalog and the picture storage catalog of each picture;
performing character recognition on each picture to obtain a character content recognition result of each picture;
word segmentation is carried out on the character content recognition results of the pictures to obtain keywords corresponding to the pictures;
and generating inverted indexes of the pictures based on the picture global names, the text content recognition results and the keywords of the pictures.
In particular, different chat software creates separate folders in the computer system for storing text, pictures, files, etc. in the chat log. An independent picture library can be created, pictures in chat records of all chat software are copied and then stored in the picture library, and renaming is carried out on each stored picture to obtain the picture global name of the picture.
And when the pictures are stored in the picture library, the attribute information of each picture, including the original name, the picture global name, the picture creation time, the affiliated chat software, the source storage directory and the picture storage directory, can be obtained, and the information is used as the basic information of the pictures, so that a picture information table of the pictures is generated. The picture information table contains each attribute (field), and the type (data format) and related description of each attribute, as shown in table 1.
Table 1. Picture information table
[Table 1 appears as an image (BDA0004071136670000081) in the original publication and is not reproduced here.]
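Because the table itself is not available here, the record below is only a sketch of what a row of the picture information table might hold. The six fields follow the attributes listed in the text; the field names and Python types are assumptions, not the types given in the original table.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass(frozen=True)
class PictureInfo:
    """One row of the picture information table (field names are illustrative)."""
    original_name: str     # name given by the chat software or the user
    global_name: str       # globally unique name inside the picture library
    created_at: datetime   # picture creation time
    chat_software: str     # chat software the picture came from
    source_dir: str        # source storage directory in the chat software
    library_dir: str       # storage directory inside the picture library
```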
Text recognition can be performed on each picture to recognize the text it contains and obtain the text content recognition result of each picture. For example, a recognition engine may be created in the computer system. The engine can be a silent background program that scans the picture information table for newly added pictures in the picture library, obtains the storage directory of each newly added picture, reads the picture from that directory, and performs OCR on it. OCR comprises four steps: image preprocessing, text detection, text recognition, and language model correction. The final output is the text content recognition result in text form.
The text content recognition result of the picture can be added into the picture information table through the picture global name, and the original picture information table is expanded, as shown in table 2.
Table 2. Picture information table (expanded)
[Table 2 appears as an image (BDA0004071136670000091) in the original publication and is not reproduced here.]
Word segmentation can be performed on the text content recognition result of each picture to obtain the keywords corresponding to each picture. The word segmentation method may be dictionary-based, statistics-based, deep-learning-based, and so on. For example, a word segmentation table may be obtained. The word segmentation table is a list of complete Chinese and English words. The text content recognition result can be split according to the word segmentation table; the split result is a set of words or phrases, and this set is a subset of the word segmentation table.
The word segmentation results can be used as the keywords for retrieving pictures. An inverted index may be created for the pictures so that a picture can be retrieved by the value of an attribute; here, the attribute is the keyword.
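As an illustration of the dictionary-based option, the sketch below uses greedy forward maximum matching, one common dictionary-based technique; the source does not say which algorithm is used. The function name `segment` and the sample table are assumptions. Characters not covered by the table are skipped, so the output is always a subset of the table.

```python
def segment(text: str, vocabulary: set[str]) -> list[str]:
    """Greedy forward maximum matching against a word segmentation table."""
    max_len = max((len(w) for w in vocabulary), default=0)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a table entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocabulary:
                words.append(candidate)
                i += size
                break
        else:
            i += 1  # No table entry starts here; skip one character.
    return words


table = {"转账", "转账金额", "金额", "确认"}
keywords = segment("请确认转账金额", table)  # ['确认', '转账金额']
```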
And generating inverted indexes of the pictures according to the picture global names, the text content identification results and the keywords of the pictures, as shown in table 3.
Table 3. Inverted index table
[Table 3 appears as images (BDA0004071136670000092, BDA0004071136670000101) in the original publication and is not reproduced here.]
The inverted index is built on the word segmentation results and associates the keywords, the text content recognition results and the picture basic information. Specifically, the inverted index table and the picture information table both contain the picture global name, which links the two.
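A minimal in-memory sketch of such an inverted index follows. The actual column layout of the inverted index table is not available here, so the dictionary shape (keyword mapped to a list of entries holding the picture global name and recognized text) and the name `build_inverted_index` are assumptions.

```python
from collections import defaultdict


def build_inverted_index(pictures: dict[str, tuple[str, list[str]]]) -> dict[str, list[dict]]:
    """Map each keyword to the pictures whose keyword list contains it.

    `pictures` maps picture global name -> (recognized text, keywords).
    """
    index = defaultdict(list)
    for global_name, (text, keywords) in pictures.items():
        for keyword in set(keywords):  # one entry per picture per keyword
            index[keyword].append({"global_name": global_name, "text": text})
    return dict(index)
```

Storing the recognized text in each entry mirrors the point made earlier that the recognition result can be returned directly from the index without a second lookup.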
In some embodiments, storing pictures in chat records of a plurality of chat software to a picture library includes:
determining a picture acquisition time range based on the current picture acquisition time and the last picture acquisition time;
traversing the chat record storage files of each chat software, and copying the pictures with the picture creation time smaller than or equal to the current picture acquisition time and larger than the last picture acquisition time in the chat record to a picture library.
Specifically, a plurality of picture acquisition times can be set, and pictures in each chat software are copied to a picture library in a mode of acquiring incremental pictures.
And determining the acquisition time range of the picture according to the current picture acquisition time and the last picture acquisition time. And copying the pictures with the picture creation time smaller than or equal to the current picture acquisition time and larger than the last picture acquisition time in the chat record into a picture library as acquired objects.
The local storage directory (i.e., the source storage directory of pictures) may be configured for each chat software. The delta pictures are then retrieved under each local storage directory. The local storage directory configuration of each chat software is shown in table 4.
Table 4. Local storage directory configuration table of chat software
[Table 4 appears as images (BDA0004071136670000102, BDA0004071136670000111) in the original publication and is not reproduced here.]
To collect over the full scope, the configured directory should be the top-level directory that contains all chat records, and the traversal proceeds under it; the traversal algorithm can follow depth-first or breadth-first traversal of a tree. When the traversal starts, the pictures newly added between the last picture acquisition time and the current picture acquisition time (judged by picture creation time) form the set acquired in the current run. These pictures are copied to the local directory of the picture library and renamed so that picture names are globally unique across the whole picture library.
In the picture library, pictures can be stored in a directory hierarchy built, in order, from the chat software name and the creation time. The creation time can be split into a date and a time. For example, the picture library storage directory of a picture may be "picture library root directory/chat software directory/date directory/time directory".
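The incremental collection can be sketched with the standard library as shown below. Several details are assumptions rather than part of the source: the file modification time stands in for the picture creation time (which is not portably available), pictures are recognized by a fixed set of file extensions, a random UUID supplies the global name, and the library directory is assumed to lie outside the scanned tree.

```python
import os
import shutil
import uuid
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}  # assumed set


def collect_new_pictures(source_root: str, library_dir: str,
                         last_run: float, current_run: float) -> dict[str, str]:
    """Copy pictures with last_run < timestamp <= current_run into the library.

    Returns a mapping of new global name -> original path.
    """
    copied = {}
    os.makedirs(library_dir, exist_ok=True)
    # os.walk visits the top-level directory first, then each subdirectory.
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            # Modification time stands in for the picture creation time.
            stamp = path.stat().st_mtime
            if last_run < stamp <= current_run:
                global_name = f"{uuid.uuid4().hex}{path.suffix.lower()}"
                shutil.copy2(path, Path(library_dir) / global_name)
                copied[global_name] = str(path)
    return copied
```

After a run, the caller would store `current_run` as the next `last_run` so that no picture is collected twice.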
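The directory layout can be illustrated with a small path builder. The date format and the choice of one directory per hour for the "time directory" are assumptions; the source only states that the creation time is split into a date and a time. `library_directory` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import PurePosixPath


def library_directory(root: str, chat_software: str, created_at: datetime) -> str:
    """Return 'root/chat software/date/time' for a picture."""
    date_part = created_at.strftime("%Y-%m-%d")
    time_part = created_at.strftime("%H")  # assumed: one directory per hour
    return str(PurePosixPath(root) / chat_software / date_part / time_part)


example = library_directory("library", "ChatApp", datetime(2023, 1, 20, 14, 5))
# example == "library/ChatApp/2023-01-20/14"
```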
In some embodiments, step 120 comprises:
matching the search keywords with each keyword in the inverted index of each picture;
under the condition that the search keywords are matched with any keyword in the inverted index of any picture, determining any picture as a target picture corresponding to the search keywords;
and determining the picture global name and the text content identification result of the target picture based on the inverted index of any picture.
Specifically, each picture may correspond to a plurality of keywords. These keywords are recorded in the inverted index.
The search keywords can be matched with the keywords in the inverted index of each picture one by one, when the search keywords are matched with any keyword in the inverted index of any picture, the picture can be determined to be a target picture corresponding to the search keywords, and the picture global name and the text content identification result of the target picture are obtained according to the inverted index.
For example, the keyword input by the user is looked up in the keyword field of the inverted index table. If it matches, the content of the corresponding text content recognition result field is pushed to the user interface for display, so that the user sees the search result quickly. The complete basic information is then looked up in the picture information table by the picture global name and pushed to the user interface as well. With the basic information, the user can identify the chat software and chat time, and from there find the original chat record in the chat software.
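The two-stage lookup in this example can be sketched as follows, using exact keyword matching. The dictionary shapes for the inverted index table and the picture information table, and the name `search`, are assumptions made for illustration.

```python
def search(query: str, inverted_index: dict[str, list[dict]],
           picture_info: dict[str, dict]) -> list[dict]:
    """Stage 1: keyword -> recognized text. Stage 2: global name -> basic info."""
    results = []
    for entry in inverted_index.get(query, []):
        results.append({
            "global_name": entry["global_name"],
            "text": entry["text"],  # available straight from the index
            "info": picture_info.get(entry["global_name"]),  # second lookup
        })
    return results


index = {"invoice": [{"global_name": "g1.png", "text": "invoice total 100"}]}
info_table = {"g1.png": {"chat_software": "ChatApp", "original_name": "img.png"}}
hits = search("invoice", index, info_table)
```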
In some embodiments, step 120 is preceded by:
matching the search keywords with the history search keywords in each search history record;
when the search keyword matches with a history search keyword in any one of the search histories, a target picture and a text content recognition result and basic information of the target picture are determined based on the search histories.
Specifically, the picture library grows over time, and each picture may correspond to several keywords. To improve retrieval efficiency, each search keyword and its corresponding search result may be saved as a search history record.
After the user inputs the search keywords, the search keywords may be first matched with the history search keywords in each search history. If the search keywords are matched with the history search keywords in any search history record, the target picture and the text content identification result and the basic information of the target picture can be determined according to the search history record, and each picture in the picture library does not need to be searched one by one, so that the search efficiency is improved.
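The history-first lookup can be sketched as a small cache in front of the full search. The record fields, the exact-match comparison of keywords, and the choice to start the repeat count at zero are assumptions; `cached_search` and `full_search` are hypothetical names.

```python
from datetime import datetime
from typing import Callable


def cached_search(query: str, history: dict[str, dict],
                  full_search: Callable[[str], list],
                  now: datetime) -> list:
    """Answer from the search history when possible, else run the full search."""
    record = history.get(query)
    if record is not None:
        # History hit: refresh the record and skip the picture library entirely.
        record["last_searched"] = now
        record["repeat_count"] += 1
        return record["results"]
    results = full_search(query)
    history[query] = {"results": results, "last_searched": now, "repeat_count": 0}
    return results
```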
In some embodiments, the retrieval history is updated based on the steps of:
generating a current search history record based on the current search keyword, the current searched target picture, and the text content identification result and basic information of the target picture;
traversing each retrieval history;
and deleting the search history record when the search date and/or the repeated search times of any search history record meet the preset condition.
Specifically, the current search history may be generated according to the current search keyword, the current searched target picture, and the text content identification result and the basic information of the target picture. The retrieval history may be embodied in tabular form as shown in table 5.
Table 5: Search history table (provided as image BDA0004071136670000131 in the original publication)
Each time a picture is retrieved, it is recorded in the search history table; if the picture is retrieved again, its repeated retrieval count is updated.
History records that were last retrieved long ago and have few repeated retrievals can be cleaned up periodically, based on the last retrieval time and the repeated retrieval count. Specifically, each search history record is traversed, and a search history record is deleted when its search date and/or repeated retrieval count meets the preset condition.
The preset conditions may be set as:
the time difference between the last retrieval time and the current time is greater than a first time difference threshold;
or, the time difference between the last retrieval time and the current time is less than or equal to the first time difference threshold and greater than a second time difference threshold, and the repeated retrieval count is less than a first retrieval count threshold;
or, the time difference between the last retrieval time and the current time is less than or equal to the second time difference threshold, and the repeated retrieval count is less than or equal to a second retrieval count threshold.
The first time difference threshold is larger than the second time difference threshold, and the first retrieval times threshold is larger than the second retrieval times threshold. The above thresholds may all be set as desired.
For example, the first time difference threshold may be 3 months, the second time difference threshold may be 1 month, the first search number threshold is 3, and the second search number threshold is 1.
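The three preset conditions can be expressed as a single predicate. The sketch below uses the example thresholds from the text, with "3 months" and "1 month" approximated as 90 and 30 days; the function names, the constant names and the dictionary layout of a history record are assumptions made for illustration.

```python
from datetime import datetime, timedelta

FIRST_AGE_THRESHOLD = timedelta(days=90)   # "3 months", approximated as 90 days
SECOND_AGE_THRESHOLD = timedelta(days=30)  # "1 month", approximated as 30 days
FIRST_COUNT_THRESHOLD = 3
SECOND_COUNT_THRESHOLD = 1


def should_delete(last_retrieved, repeat_count, now):
    """Return True when a search history record meets one of the preset conditions."""
    age = now - last_retrieved
    if age > FIRST_AGE_THRESHOLD:
        # Condition 1: last retrieved longer ago than the first threshold.
        return True
    if age > SECOND_AGE_THRESHOLD:
        # Condition 2: between the two age thresholds and rarely repeated.
        return repeat_count < FIRST_COUNT_THRESHOLD
    # Condition 3: within the second age threshold and at most
    # SECOND_COUNT_THRESHOLD repeated retrievals.
    return repeat_count <= SECOND_COUNT_THRESHOLD


def prune(history, now):
    """Traverse all records and keep only those that should not be deleted.

    `history` maps keyword -> {"last_retrieved": datetime, "repeat_count": int}.
    """
    return {
        keyword: record
        for keyword, record in history.items()
        if not should_delete(record["last_retrieved"], record["repeat_count"], now)
    }
```

Because the first time threshold is larger than the second, the three branches cover every possible age exactly once, so each record is tested against exactly one count threshold.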
The apparatus provided by the embodiments of the present invention will be described below, and the apparatus described below and the method described above may be referred to correspondingly.
Fig. 2 is a schematic structural diagram of the picture retrieval apparatus provided by the present invention. As shown in Fig. 2, the apparatus includes:
a receiving module 210, configured to receive a search keyword;
a retrieval module 220, configured to match the search keyword with the keywords corresponding to each picture, and to determine the picture global name of the target picture corresponding to the search keyword and the text content recognition result of the target picture;
an obtaining module 230, configured to obtain the target picture and basic information of the target picture based on the picture global name;
a display module 240, configured to display the target picture together with the text content recognition result and basic information of the target picture.
The picture retrieval apparatus provided by the embodiment of the invention matches the search keyword with the keywords corresponding to each picture, and determines the picture global name of the target picture corresponding to the search keyword and the text content recognition result of the target picture; obtains the target picture and its basic information according to the picture global name; and displays the target picture together with its text content recognition result and basic information. By recognizing the text in pictures and retrieving by keyword, the picture content is converted into text, so that chat pictures can be retrieved by the text recognized in them. The user no longer needs to rely on memory to find and locate pictures on the computer, which improves information retrieval efficiency and the management of pictures on the computer.
In some embodiments, the apparatus further includes a management module, which is configured for:
storing pictures from the chat records of a plurality of chat software to the picture library, and determining the picture global name of each picture in the picture library;
generating basic information of each picture based on the original name, the picture global name, the picture creation time, the chat software, the source storage directory and the picture storage directory of each picture;
performing text recognition on each picture to obtain the text content recognition result of each picture;
performing word segmentation on the text content recognition result of each picture to obtain the keywords corresponding to each picture;
and generating the inverted index of each picture based on the picture global name, the text content recognition result and the keywords of each picture.
In some embodiments, the management module is specifically configured to:
based on the current picture acquisition time and the last picture acquisition time, traverse the chat record storage files of each chat software, and copy to the picture library the pictures in the chat records whose picture creation time is less than or equal to the current picture acquisition time and greater than the last picture acquisition time.
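The incremental collection step can be sketched as below. This is an illustration under several assumptions rather than the embodiment's own code: the function name, the list of picture suffixes and the mapping from chat software name to cache directory are hypothetical, acquisition times are POSIX timestamps, and the file modification time is used as a stand-in for the picture creation time.

```python
import shutil
from pathlib import Path

# Assumed set of picture formats; the text only says "files in the picture format".
PICTURE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def collect_new_pictures(chat_dirs, library_dir, last_time, current_time):
    """Copy pictures created in (last_time, current_time] into the picture library.

    `chat_dirs` maps a chat software name to the directory where that software
    stores its chat records. Returns the list of copied target paths.
    """
    library = Path(library_dir)
    copied = []
    for software, chat_dir in chat_dirs.items():
        for path in sorted(Path(chat_dir).rglob("*")):
            if not path.is_file() or path.suffix.lower() not in PICTURE_SUFFIXES:
                continue
            created = path.stat().st_mtime  # proxy for the picture creation time
            if last_time < created <= current_time:
                target_dir = library / software
                target_dir.mkdir(parents=True, exist_ok=True)
                target = target_dir / path.name
                shutil.copy2(path, target)  # copy2 keeps the original timestamps
                copied.append(target)
    return copied
```

The half-open window means a picture is collected exactly once across consecutive runs, provided each run passes the previous run's `current_time` as its `last_time`. Assigning picture global names to avoid file name collisions is left out of this sketch.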
In some embodiments, the retrieval module is specifically configured to:
match the search keyword with each keyword in the inverted index of each picture;
when the search keyword matches any keyword in the inverted index of a picture, determine that picture as a target picture corresponding to the search keyword;
and determine the picture global name and the text content recognition result of the target picture based on the inverted index of that picture.
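The matching step can be illustrated with a small sketch. It assumes each picture's inverted index entry holds the picture global name, the text content recognition result and a keyword set; `IndexEntry` and `find_targets` are hypothetical names, and exact string equality is assumed as the matching rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Inverted index entry of one picture (hypothetical layout)."""
    global_name: str
    recognized_text: str
    keywords: frozenset


def find_targets(search_keyword, index_entries):
    """Return (global_name, recognized_text) for every picture whose
    keyword set contains the search keyword."""
    return [
        (entry.global_name, entry.recognized_text)
        for entry in index_entries
        if search_keyword in entry.keywords
    ]
```

The returned picture global names are what the obtaining module would then use to fetch the target pictures and their basic information.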
In some embodiments, the retrieval module is further configured to:
matching the search keywords with the history search keywords in each search history record;
and determining the target picture and the text content identification result and the basic information of the target picture based on any search history when the search keyword is matched with the history search keyword in any search history.
In some embodiments, the retrieval module is further configured to:
generating a current search history record based on the current search keyword, the current searched target picture, and the text content identification result and basic information of the target picture;
traversing each retrieval history;
and deleting any search history record under the condition that the search date and/or the repeated search times of any search history record meet the preset conditions.
In some embodiments, the apparatus further includes:
a picture library, configured to store pictures from the chat records of a plurality of chat software;
and a storage library, configured to store the text content recognition result, the basic information and the inverted index of each picture.
Specifically, the picture retrieval apparatus provided by the embodiment of the invention may further include a picture library and a storage library. The storage library is essentially a database and may be of any type, such as a relational database, a non-relational database or a document database. It stores picture information, text content recognition results, word segmentation results, inverted indexes, and other management and configuration information.
In the embodiment of the invention, the receiving module, the retrieving module, the acquiring module and the management module can be realized by one search engine, the character recognition of the picture can be realized by one OCR engine, and the display module can be realized by one visual interface.
On this basis, the picture retrieval apparatus in the embodiment of the invention comprises five parts: a picture library, an OCR recognition engine, a search engine, a storage library and a visual interface. The picture library, the OCR recognition engine and the search engine are the core modules of the invention.
Fig. 3 is a second schematic flow chart of the picture retrieval method provided by the present invention. As shown in Fig. 3, the relationships and interaction flow among the modules of the picture retrieval apparatus are as follows:
(1) Picture library
Instant chat software caches chat records in a directory on the local computer. The picture library supports configuring these picture directories, searches them for files in picture formats, and copies those files to the picture library directory. The picture library also supports classifying pictures along different dimensions, for example by creating a two-level directory of the form "chat software name-picture creation time". When a picture enters the picture library, its basic information is stored in the picture information table; the basic information includes the picture name, the picture creation time, the chat software, the original directory of the picture and the picture storage directory.
(2) OCR recognition engine
The engine runs silently in the background. It periodically scans the picture library for newly added pictures, performs OCR (optical character recognition) on them, and stores the text content recognition results in the picture information table.
(3) Search engine
The search engine provides three main functions: word segmentation, index maintenance and retrieval. Word segmentation uses a lexicon to split the picture recognition result into words or phrases. On the basis of word segmentation, the search engine builds an inverted index that associates words or phrases with picture recognition results and picture basic information. During retrieval, the text content recognition result and the picture basic information are looked up by keyword matching, and the search results are pushed to the visual interface.
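Word segmentation and index building can be sketched together as follows. The text only says that segmentation uses a lexicon; forward maximum matching is swapped in here as one simple lexicon-based technique, and the function names and the keyword-to-picture dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict


def segment(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon
    word that starts there; characters that start no lexicon word are skipped."""
    max_len = max(map(len, lexicon), default=0)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                words.append(piece)
                i += size
                break
        else:
            i += 1  # no lexicon word starts at this character
    return words


def build_inverted_index(recognized, lexicon):
    """Map each keyword to the set of picture global names whose
    recognized text contains it.

    `recognized` maps picture global name -> text content recognition result.
    """
    index = defaultdict(set)
    for global_name, text in recognized.items():
        for word in segment(text, lexicon):
            index[word].add(global_name)
    return dict(index)
```

With this layout a search is a single dictionary lookup, for example `build_inverted_index(recognized, lexicon).get(keyword, set())`, after which the recognition result and basic information are fetched by picture global name.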
(4) Storage library
The storage library is a database that mainly stores picture information, picture content recognition results, search engine indexes, and other related configuration and management data. Picture information is registered in the storage library when the picture library collects pictures. After the OCR recognition engine finishes recognizing the picture content, it pushes the recognition result to the storage library, where the result is associated with the registered picture basic information through the picture global name. The search engine builds the index and stores the index information in the storage library.
(5) Visual interface
The visual interface covers application configuration and management, and search and result presentation. Application configuration and management is used to maintain configuration information, such as the picture storage directory of each chat software, or to manage pictures, for example by manually confirming OCR recognition results. Search and result presentation consists of a search box and a result display area. The user enters keywords in the search box and the visual interface pushes them to the search engine. When the search engine finishes searching, it pushes the search results to the visual interface, which displays them and, according to the picture basic information in the results, locates the pictures in the picture library and displays them.
For the Internet office scenario, the embodiment of the invention solves the problem that chat pictures in chat software are difficult to search. Commonly used chat software does not provide retrieval based on picture content, so the embodiment of the invention can supplement the functions of chat software.
Based on OCR recognition and search engine technology, the embodiment of the invention converts picture content into text and provides retrieval based on picture content. When a user queries historical chat content, the search range is extended from plain text to pictures, which solves the problem that the user can only find historical chat pictures from memory.
Fig. 4 is a schematic structural diagram of the electronic device provided by the present invention. As shown in Fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with one another via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform the following method:
receiving a search keyword; matching the search keywords with the keywords corresponding to each picture, and determining the picture global names of the target pictures corresponding to the search keywords and the text content identification results of the target pictures; acquiring basic information of a target picture based on the picture global name; and displaying the target picture, and the text content identification result and the basic information of the target picture.
In addition, the logic instructions in the memory described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The processor in the electronic device provided by the embodiment of the invention can call the logic instruction in the memory to realize the method, and the specific implementation mode is consistent with the implementation mode of the method, and the same beneficial effects can be achieved, and the detailed description is omitted here.
The embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the methods provided by the above embodiments.
The specific implementation is consistent with the foregoing method embodiments and achieves the same beneficial effects, so it is not repeated here.
The embodiments of the present invention provide a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A picture retrieval method, comprising:
receiving a search keyword;
matching the search keywords with keywords corresponding to each picture, and determining the picture global names of the target pictures corresponding to the search keywords and the text content identification results of the target pictures;
acquiring the target picture and basic information of the target picture based on the picture global name;
and displaying the target picture, and the character content identification result and the basic information of the target picture.
2. The picture retrieval method as recited in claim 1, further comprising:
storing pictures in chat records of a plurality of chat software to a picture library, and determining picture global names of all pictures in the picture library;
generating basic information of each picture based on the original name, the picture global name, the picture creation time, the chat software, the source storage catalog and the picture storage catalog of each picture;
performing character recognition on each picture to obtain a character content recognition result of each picture;
performing word segmentation on the text content recognition result of each picture to obtain the keywords corresponding to each picture;
and generating inverted indexes of the pictures based on the picture global names, the text content recognition results and the keywords of the pictures.
3. The picture retrieval method as recited in claim 2, wherein storing pictures in chat records of a plurality of chat software in a picture library comprises:
based on the current picture acquisition time and the last picture acquisition time, traversing the chat record storage files of each chat software, and copying to the picture library the pictures in the chat records whose picture creation time is less than or equal to the current picture acquisition time and greater than the last picture acquisition time.
4. The picture retrieval method according to claim 2, wherein the matching the retrieval keywords with keywords corresponding to respective pictures, determining a picture global name of a target picture corresponding to the retrieval keywords, and a text content recognition result of the target picture, includes:
matching the search keywords with each keyword in the inverted index of each picture;
under the condition that the search keyword is matched with any keyword in the inverted index of any picture, determining the any picture as a target picture corresponding to the search keyword;
and determining the picture global name and the text content identification result of the target picture based on the inverted index of any picture.
5. The picture retrieval method according to any one of claims 1 to 4, wherein before the matching of the retrieval keywords with the keywords corresponding to the respective pictures, the method comprises:
matching the search keywords with the historical search keywords in each search history record;
and under the condition that the search keywords are matched with the historical search keywords in any search history record, determining the target picture and the text content identification result and basic information of the target picture based on any search history record.
6. The picture retrieval method as recited in claim 5, wherein the retrieval history is updated based on the steps of:
generating a current search history record based on the current search keyword, the current searched target picture, and the text content identification result and basic information of the target picture;
traversing each retrieval history;
and deleting any search history record under the condition that the search date and/or the repeated search times of any search history record meet the preset conditions.
7. A picture retrieval apparatus, comprising:
the receiving module is used for receiving the search keywords;
the search module is used for matching the search keywords with the keywords corresponding to each picture and determining the picture global names of the target pictures corresponding to the search keywords and the text content identification results of the target pictures;
the acquisition module is used for acquiring the target picture and basic information of the target picture based on the picture global name;
and the display module is used for displaying the target picture, and the character content identification result and the basic information of the target picture.
8. The picture retrieval apparatus as recited in claim 7, further comprising:
the picture library is used for storing pictures in chat records of a plurality of chat software;
and the storage library is used for storing the text content identification result, the basic information and the inverted index of each picture.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the picture retrieval method according to any one of claims 1 to 6 when the program is executed by the processor.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the picture retrieval method according to any one of claims 1 to 6.
CN202310093900.4A 2023-02-03 2023-02-03 Picture retrieval method, device, electronic equipment and storage medium Pending CN116304156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310093900.4A CN116304156A (en) 2023-02-03 2023-02-03 Picture retrieval method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310093900.4A CN116304156A (en) 2023-02-03 2023-02-03 Picture retrieval method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116304156A true CN116304156A (en) 2023-06-23

Family

ID=86831395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310093900.4A Pending CN116304156A (en) 2023-02-03 2023-02-03 Picture retrieval method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116304156A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination