Disclosure of Invention
In view of this, embodiments of the present invention provide an information detection method and apparatus that can automatically detect one-product-multiple-merchants cases (the same product listed by multiple merchants), saving labor and reducing cost. Before detection, items listed under a wrong category or brand are screened out, which improves detection accuracy. During detection, the similarity between any two items is calculated based on the importance of the items' attributes, and the one-product-multiple-merchants items are determined according to that similarity; because rich attribute information is used, detection accuracy is effectively guaranteed.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided an information detection method including:
acquiring detail data of a plurality of articles to be analyzed, wherein the detail data comprises category information and brand information;
determining the object to be analyzed with wrong category information and brand information according to the detail data, and taking the object to be analyzed except the object to be analyzed with wrong brand information and category information as a target object;
for each of the target items, determining a plurality of attributes of the target item and a weight corresponding to each of the attributes;
calculating the similarity between any two target items according to the plurality of attributes of the two target items and the weight corresponding to each attribute, and determining, according to the similarity, the target items that are one-product-multiple-merchants items.
Optionally, the step of determining the object to be analyzed with wrong category information according to the detail data includes:
acquiring a category keyword dictionary;
identifying category keywords of the to-be-analyzed object by using a preset detection method and the category keyword dictionary to obtain keywords to be matched;
matching the keywords to be matched with the category information in the detail data, and taking the object to be analyzed of which the keyword to be matched and the category information in the detail data are not matched as the object to be analyzed of which the category information is wrong.
Optionally, the step of taking the object to be analyzed whose category information in the keyword to be matched and the detail data does not match as the object to be analyzed whose category information is wrong includes:
taking the object to be analyzed matched with the keyword to be matched and the category information in the detail data as a first object, and taking the object to be analyzed with the keyword to be matched and the category information in the detail data not matched as a second object;
training a category classification model according to the detail data of the first article;
determining the category to be matched of the second article according to the category classification model and the detail data of the second article;
and comparing the category to be matched of the second article with the category information in the detailed data, and if the category to be matched of the second article is inconsistent with the category information in the detailed data, determining that the second article is an article to be analyzed with wrong category information.
Optionally, the step of determining the article to be analyzed with wrong brand information according to the detail data includes:
acquiring a brand dictionary;
extracting brand words to be matched in the name of each to-be-analyzed article based on the brand dictionary;
and comparing the brand word to be matched with the brand information in the detail data, and if the brand word to be matched is inconsistent with the brand information in the detail data, determining that the article to be analyzed is the article to be analyzed with wrong brand information.
Optionally, the brand dictionary is obtained according to the following process: carrying out normalization processing on names of a plurality of different languages of the same brand so as to map the names of the plurality of different languages of the brand as standard names; a brand dictionary is generated based on the names of the plurality of different languages of the brand and the mapped standard names.
Optionally, the method further comprises: receiving weight configuration information, wherein the weight configuration information records, per category, the number and weights of the attributes relevant to each category;
for each of the target items, the step of determining a plurality of attributes of the target item and a weight corresponding to each of the attributes comprises: and determining a plurality of attributes of the target object and the weight corresponding to each attribute according to the weight configuration information for each target object.
Optionally, the step of calculating the similarity between any two target items according to the plurality of attributes of the two target items and the weight corresponding to each attribute, and determining the one-product-multiple-merchants target items according to the similarity, includes:
grouping the target items according to the category information of the target items;
and, for each group of target items, calculating the similarity between any two target items in the group according to the plurality of attributes of the two target items and the weight corresponding to each attribute, and determining the one-product-multiple-merchants target items according to the similarity.
Optionally, the step of, for each group of target items, calculating the similarity between any two target items in the group according to the plurality of attributes of the two target items and the weight corresponding to each attribute, and determining the one-product-multiple-merchants target items according to the similarity, includes:
aiming at each group of target articles, determining any two target articles in the group of target articles, and calculating the attribute similarity of the same attribute of any two target articles according to a plurality of attributes of any two target articles;
calculating a weighted sum of the similarity of the attributes according to the weight corresponding to each attribute;
calculating the similarity between any two target objects according to the weighted sum of the attribute similarities and the brand information of any two target objects;
and if the similarity between the two target items is greater than a preset threshold, determining that the two target items are one-product-multiple-merchants items.
Optionally, the step of calculating the attribute similarity of the same kind of attributes of any two target articles includes: converting the attribute values of the same attributes of any two target articles into feature vectors; and calculating the similarity of the feature vectors, and taking the similarity of the feature vectors as the attribute similarity of the same attribute of any two target articles.
To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided an information detecting apparatus including:
an acquisition module, configured to acquire detail data of a plurality of items to be analyzed, wherein the detail data comprises category information and brand information;
the screening module is used for determining the object to be analyzed with wrong category information and brand information according to the detail data and taking the object to be analyzed except the object to be analyzed with wrong brand information and category information as a target object;
a determination module for determining, for each of the target items, a plurality of attributes of the target item and a weight corresponding to each of the attributes;
a detection module, configured to calculate the similarity between any two target items according to the plurality of attributes of the two target items and the weight corresponding to each attribute, and to determine, according to the similarity, the target items that are one-product-multiple-merchants items.
Optionally, the screening module is further configured to: acquiring a category keyword dictionary; identifying category keywords of the to-be-analyzed object by using a preset detection method and the category keyword dictionary to obtain keywords to be matched; matching the keywords to be matched with the category information in the detail data, and taking the object to be analyzed of which the keyword to be matched and the category information in the detail data are not matched as the object to be analyzed of which the category information is wrong.
Optionally, the screening module is further configured to:
taking the object to be analyzed matched with the keyword to be matched and the category information in the detail data as a first object, and taking the object to be analyzed with the keyword to be matched and the category information in the detail data not matched as a second object;
training a category classification model according to the detail data of the first article;
determining the category to be matched of the second article according to the category classification model and the detail data of the second article;
and comparing the category to be matched of the second article with the category information in the detailed data, and if the category to be matched of the second article is inconsistent with the category information in the detailed data, determining that the second article is an article to be analyzed with wrong category information.
Optionally, the screening module is further configured to: acquiring a brand dictionary; extracting brand words to be matched in the name of each to-be-analyzed article based on the brand dictionary; and comparing the brand word to be matched with the brand information in the detail data, and if the brand word to be matched is inconsistent with the brand information in the detail data, determining that the article to be analyzed is the article to be analyzed with wrong brand information.
Optionally, the apparatus further includes a dictionary constructing module, configured to perform normalization processing on names of multiple different languages of the same brand to map the names of the multiple different languages of the brand to a standard name; a brand dictionary is generated based on the names of the plurality of different languages of the brand and the mapped standard names.
Optionally, the apparatus further includes a receiving module, configured to receive weight configuration information, where the weight configuration information records, according to category dimensions, the number and weight of attributes related to different categories;
the determination module is further to: and for each target item, determining a plurality of attributes of the target item and the weight corresponding to each attribute according to the weight configuration information.
Optionally, the detection module is further configured to: group the target items according to the category information of the target items; and, for each group of target items, calculate the similarity between any two target items in the group according to the plurality of attributes of the two target items and the weight corresponding to each attribute, and determine the one-product-multiple-merchants target items according to the similarity.
Optionally, the detection module is further configured to: for each group of target items, select any two target items in the group and calculate the attribute similarity of each attribute the two target items have in common; calculate a weighted sum of the attribute similarities according to the weight corresponding to each attribute; calculate the similarity between the two target items according to the weighted sum of the attribute similarities and the brand information of the two target items; and if the similarity between the two target items is greater than a preset threshold, determine that the two target items are one-product-multiple-merchants items.
Optionally, the detection module is further configured to: converting the attribute values of the same attributes of any two target articles into feature vectors; and calculating the similarity of the feature vectors, and taking the similarity of the feature vectors as the attribute similarity of the same attribute of any two target articles.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement the information detection method according to the embodiment of the present invention.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program implementing the information detection method of an embodiment of the present invention when executed by a processor.
One embodiment of the above invention has the following advantages or benefits: by acquiring detail data of a plurality of items to be analyzed, wherein the detail data comprises category information and brand information; determining, according to the detail data, the items to be analyzed with wrong category information or brand information, and taking the remaining items to be analyzed as target items; determining, for each target item, a plurality of attributes of the target item and a weight corresponding to each attribute; and calculating the similarity between any two target items according to their attributes and the corresponding weights and determining the one-product-multiple-merchants target items according to the similarity, the embodiment can automatically detect one-product-multiple-merchants cases, saving labor and reducing cost; and because items with a wrong category or brand are excluded before detection, noise is reduced and accuracy is improved.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic flow chart of main steps of an information detection method according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
step S101: acquiring detail data of a plurality of articles to be analyzed, wherein the detail data comprises category information and brand information;
step S102: determining the object to be analyzed with wrong category information and brand information according to the detail data, and taking the object to be analyzed except the object to be analyzed with wrong brand information and category information as a target object;
step S103: for each of the target items, determining a plurality of attributes of the target item and a weight corresponding to each of the attributes;
step S104: calculating the similarity between any two target items according to the plurality of attributes of the two target items and the weight corresponding to each attribute, and determining, according to the similarity, the target items that are one-product-multiple-merchants items.
For step S101, after an item is procured from a supplier, the item's data is recorded in predetermined fields of a data table. The item data may include basic information (also called distribution information), specifications and models, and parameters of the item, such as category, name, brand, model, series, price, shelf time, place of production, and warranty. Therefore, when the detail data of an item to be analyzed is acquired, its category information, name information, brand information, model information, specifications and models, parameters, and the like can be acquired.
For step S102, this embodiment determines the items to be analyzed with wrong category information or brand information, so that items whose information was recorded incorrectly, that is, items listed under a wrong category or brand, can be screened out. This reduces noise and improves the accuracy of one-product-multiple-merchants detection. To determine such items, a machine learning model can be trained to predict the category and brand of each item to be analyzed, or the category and brand can be recognized based on a dictionary. The predicted or recognized category and brand are then compared with the category information and brand information in the acquired detail data; if they are inconsistent, the item is determined to be an item to be analyzed with wrong category information or brand information. In an optional embodiment, after such items are determined, they may also be sent to an item manager for verification.
For step S103, the embodiment of the present invention calculates the similarity degree between the target items based on the importance degrees of the various attributes of the target items. It is therefore necessary to determine the attributes of the respective target items and the weight corresponding to each attribute. In view of the fact that there may be a large difference between the attributes of the articles in different categories, different attributes and corresponding weight values may be set in advance for the characteristics of each category of articles in the embodiment. Specifically, the information detection method according to the embodiment of the present invention may further include: and receiving weight configuration information, wherein the weight configuration information records the number and the weight of the attributes related to different categories according to the category dimension. In order to set the attribute weight more accurately, the corresponding category experts can determine the weight and the number of the attributes related to the current category according to the category dimension, the weight range is [0, 1], and a learning model based on an attention mechanism can also be adopted to independently learn the weight of each attribute.
Therefore, when determining the attribute of each target item and the weight corresponding to each attribute, the corresponding attribute and the weight corresponding to each attribute in the weight configuration information can be searched according to the category information of the target item.
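As an illustration, the weight configuration information could be held as a mapping keyed by category and looked up with the category information of each target item. The following is a minimal sketch only; the category names, attribute names, and weight values are hypothetical and are not taken from the embodiment.

```python
# Hypothetical weight configuration keyed by category; each weight lies in [0, 1].
WEIGHT_CONFIG = {
    "circuit breaker": {"model": 0.4, "rated_current": 0.3, "rated_voltage": 0.2, "series": 0.1},
    "safety helmet": {"model": 0.5, "material": 0.3, "color": 0.2},
}


def attributes_and_weights(item, config=WEIGHT_CONFIG):
    """Return {attribute: weight} for the item's category, or {} if the category is not configured."""
    weights = config.get(item["category"], {})
    for name, weight in weights.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {name!r} must lie in [0, 1]")
    return dict(weights)


item = {"category": "safety helmet", "name": "ABS safety helmet, yellow"}
print(attributes_and_weights(item))
```

Returning an empty mapping for an unconfigured category is one possible design choice; an implementation might instead fall back to default weights.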
Taking industrial items as an example, the attributes of a target item can be divided into three parts: distribution fields, specification parameters, and extended attributes. The distribution fields generally include basic information of the item, such as brand, model, series, and whether the item is a dangerous good, as well as business information such as price, unit, shelf time, place of production, and warranty. The specification parameters may include volume, weight, and the like. The extended attributes may include rated voltage, rated current, and the like.
For step S104, items of different categories are generally not the same product, so the one-product-multiple-merchants problem does not arise between them. Therefore, the target items determined in this step to be one-product-multiple-merchants items belong to the same category. Specifically, the method includes: grouping the target items according to the category information of the target items; and, for each group of target items, calculating the similarity between any two target items in the group according to the plurality of attributes of the two target items and the weight corresponding to each attribute, and determining the one-product-multiple-merchants target items according to the similarity. More specifically, as shown in fig. 2, the steps include:
step S201: aiming at each group of target articles, determining any two target articles in the group of target articles, and calculating the attribute similarity of the same attribute of any two target articles according to a plurality of attributes of any two target articles;
step S202: calculating a weighted sum of the similarity of the attributes according to the weight corresponding to each attribute;
step S203: calculating the similarity between any two target objects according to the weighted sum of the attribute similarities and the brand information of any two target objects;
step S204: if the similarity between the two target items is greater than a preset threshold, determining that the two target items are one-product-multiple-merchants items.
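The grouping and pairwise comparison in steps S201 to S204 can be sketched as follows. This is a minimal illustration under stated assumptions: items are dictionaries with hypothetical `id`, `category`, and `model` fields, and the similarity function is passed in as a parameter, here replaced by a toy function that only compares models.

```python
from collections import defaultdict
from itertools import combinations


def group_by_category(items):
    """Group target items by their category field."""
    groups = defaultdict(list)
    for item in items:
        groups[item["category"]].append(item)
    return groups


def detect_pairs(items, similarity, threshold):
    """Return id pairs, within each category group, whose similarity exceeds the threshold."""
    hits = []
    for group in group_by_category(items).values():
        for a, b in combinations(group, 2):
            if similarity(a, b) > threshold:
                hits.append((a["id"], b["id"]))
    return hits


def same_model(a, b):
    """Toy stand-in for the real similarity: 1.0 if the models are equal, else 0.0."""
    return 1.0 if a["model"] == b["model"] else 0.0


items = [
    {"id": 1, "category": "helmet", "model": "H1"},
    {"id": 2, "category": "helmet", "model": "H1"},
    {"id": 3, "category": "glove", "model": "H1"},
]
print(detect_pairs(items, same_model, threshold=0.5))
```

Because pairs are only formed inside a category group, items 1 and 3 are never compared even though their models are equal.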
In step S201, attribute contents vary widely and attributes such as specification and model are often written in non-standard ways, so it is difficult to calculate similarity with an exact text distance formula. The embodiment of the present invention therefore converts the attribute contents into feature vectors and uses the similarity of the feature vectors as the attribute similarity. Specifically, the method includes: converting the attribute values of the same attribute of the two target items into feature vectors; and calculating the similarity of the feature vectors, and taking it as the attribute similarity of that attribute for the two target items. As an example, the attribute values of all attributes can be converted into feature vectors using the FastText method, and the similarity between the feature vectors can be calculated using the cosine similarity formula. FastText is an open-source tool for computing word vectors, released by Facebook AI Research.
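The vectorize-then-compare idea can be sketched without any external library. The example below does not use FastText; as a simple stand-in it builds sparse vectors of character n-gram counts (FastText also relies on character n-grams, but learns dense embeddings for them) and applies the cosine similarity formula. All function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of a normalized attribute value, padded with boundary marks."""
    padded = f"<{text.strip().lower()}>"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def to_vector(text, n=3):
    """Sparse feature vector mapping each n-gram to its count."""
    return Counter(char_ngrams(text, n))


def cosine_similarity(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either vector is empty."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def attribute_similarity(value_a, value_b):
    """Similarity of two values of the same attribute."""
    return cosine_similarity(to_vector(value_a), to_vector(value_b))


print(attribute_similarity("DZ47-63 C16", "DZ47-63 C32"))
```

Values that differ only in case or surrounding whitespace map to the same vector, which is one way to tolerate the non-standard spellings mentioned above.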
For step S202, after determining the attribute similarity of the same kind of attributes of any two target items, all the attribute similarities are weighted and summed based on the determined attribute weights.
For step S203, this embodiment defines one-product-multiple-merchants detection as a similarity calculation problem between two items. Different thresholds can be set per category, and two items are detected as one-product-multiple-merchants items when their similarity is greater than the threshold. In practice, items of different brands are not the same product, so target items of different brands are not one-product-multiple-merchants items. Therefore, in this embodiment, a brand score parameter may be set to indicate whether the brands of the two target items are the same: if the brand information of the two target items is the same, the brand score is set to 1; otherwise it is set to 0. After the brand score of the two target items is determined, the product of the brand score and the weighted sum of the attribute similarities is calculated and taken as the similarity between the two target items. If the product is greater than a preset threshold, the two target items are determined to be one-product-multiple-merchants items.
Specifically, the similarity between any two target items can be calculated according to the following formula:
$$S_{ij} = B_{ij} \sum_{m=1}^{N} \lambda_m \, s_m^{ij}$$
wherein $S_{ij}$ represents the similarity between any two target items $i$ and $j$; $N$ represents the number of attributes the two items have in common; $\lambda_m$ represents the weight of the influence of the $m$-th attribute on determining whether the items are the same; $s_m^{ij}$ represents the similarity of the $m$-th attribute between the two items; and $B_{ij}$ is the brand score, which is 1 when the brands of the two items are the same and 0 when they differ, i.e., items of different brands are certainly not the same item.
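The calculation described above, the brand score multiplied by the weighted sum of attribute similarities, can be sketched as follows. The field names, weights, and the exact-match attribute similarity are hypothetical simplifications; skipping an attribute that is missing on either item is also an assumption, not something the embodiment specifies.

```python
def brand_score(item_a, item_b):
    """1.0 when the two items carry the same brand, otherwise 0.0."""
    return 1.0 if item_a["brand"] == item_b["brand"] else 0.0


def item_similarity(item_a, item_b, weights, attribute_similarity):
    """Brand score times the weighted sum of per-attribute similarities."""
    weighted_sum = 0.0
    for name, weight in weights.items():
        value_a = item_a.get(name)
        value_b = item_b.get(name)
        if value_a is None or value_b is None:
            continue  # Assumption: an attribute missing on either side contributes 0.
        weighted_sum += weight * attribute_similarity(value_a, value_b)
    return brand_score(item_a, item_b) * weighted_sum


def exact_match(a, b):
    """Toy attribute similarity used only for this illustration."""
    return 1.0 if a == b else 0.0


weights = {"model": 0.6, "rated_voltage": 0.4}  # hypothetical weights
a = {"brand": "X", "model": "M1", "rated_voltage": "220V"}
b = {"brand": "X", "model": "M1", "rated_voltage": "380V"}
c = {"brand": "Y", "model": "M1", "rated_voltage": "220V"}

THRESHOLD = 0.5  # hypothetical per-category threshold
print(item_similarity(a, b, weights, exact_match) > THRESHOLD)
print(item_similarity(a, c, weights, exact_match) > THRESHOLD)
```

Items `a` and `c` agree on every attribute, yet their similarity is zero because the brand score multiplies the whole sum.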
The information detection method provided by the embodiment of the present invention can automatically detect one-product-multiple-merchants cases, saving labor and reducing cost. It calculates the similarity between any two items based on the importance of the items' attributes and thereby determines the one-product-multiple-merchants items; because rich attribute information is used, detection accuracy is effectively guaranteed. Before detection, items with a wrong category or brand are excluded, which reduces noise and improves accuracy.
In an optional embodiment, after the one-product-multiple-merchants items are detected, a unique item code can be assigned to them, so that identical items from different suppliers are merged and the standardization and normalization of item information is improved.
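One possible way to assign a shared item code is to merge the detected pairs with a union-find structure, so that every item in a merged group receives the same code. This is a sketch under assumptions: the embodiment does not say how codes are generated, the `ITEM-` prefix is hypothetical, and treating matches as transitive (if A matches B and B matches C, all three share a code) is a design choice of this example.

```python
def assign_item_codes(item_ids, matched_pairs):
    """Give every group of matched items one shared code derived from its smallest id."""
    parent = {i: i for i in item_ids}

    def find(x):
        # Follow parent links to the group representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as representative so the resulting codes are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {i: f"ITEM-{find(i)}" for i in item_ids}


codes = assign_item_codes([1, 2, 3, 4], [(1, 2), (2, 3)])
print(codes)
```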
Fig. 3 is a schematic flow chart of main steps of an information detection method according to another embodiment of the present invention, as shown in fig. 3, the method includes:
step S301: acquiring detail data of a plurality of articles to be analyzed, wherein the detail data comprises category information and brand information;
step S302: acquiring a category keyword dictionary;
step S303: identifying category keywords of the to-be-analyzed object by using a preset detection method and the category keyword dictionary to obtain keywords to be matched;
step S304: matching the keywords to be matched with the category information in the detail data, and taking the object to be analyzed, of which the keyword to be matched and the category information in the detail data are not matched, as the object to be analyzed with wrong category information;
step S305: acquiring a brand dictionary;
step S306: extracting brand words to be matched in the name of each to-be-analyzed article based on the brand dictionary;
step S307: comparing the brand word to be matched with the brand information in the detail data, and if the brand word to be matched is inconsistent with the brand information in the detail data, determining that the article to be analyzed is an article to be analyzed with wrong brand information;
step S308: taking the articles to be analyzed except the articles to be analyzed with wrong brand information and category information as target articles; for each of the target items, determining a plurality of attributes of the target item and a weight corresponding to each of the attributes;
step S309: calculating the similarity between any two target items according to the plurality of attributes of the two target items and the weight corresponding to each attribute, and determining, according to the similarity, the target items that are one-product-multiple-merchants items.
Steps S301, S308 and S309 are the same as in the embodiment shown in fig. 1 and are not described again here.
Steps S302-S304 are used to remove the items to be analyzed that are listed under a wrong category. In application, a category keyword dictionary may be built from collected category keywords. Based on this dictionary, the detail data of each item to be analyzed is scanned using a DFA keyword matching method to obtain the keywords to be matched. If the matched keywords are inconsistent with the category information in the detail data, the item is determined to be an item to be analyzed with wrong category information. DFA stands for Deterministic Finite Automaton, an automaton that transitions from one state to another through a series of events. In keyword matching, a DFA can treat each text segment as a state and examine each input text segment in turn to determine the final state, or the set of states and the path, that it can reach.
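DFA keyword matching is commonly implemented as a trie in which each node is a state and each character is a transition. The sketch below follows that common approach; it is not necessarily the matcher used in the embodiment, and the keyword-to-category mapping and item fields are hypothetical.

```python
END = "__end__"  # Marks an accepting state; cannot collide with a single-character transition key.


def build_dfa(keywords):
    """Build a trie-shaped DFA: each nested dict is a state, each character a transition."""
    root = {}
    for word in keywords:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})
        state[END] = word
    return root


def find_keywords(text, dfa):
    """Return every dictionary keyword occurring in text, ordered by start position."""
    found = []
    for start in range(len(text)):
        state = dfa
        for ch in text[start:]:
            if ch not in state:
                break
            state = state[ch]
            if END in state:
                found.append(state[END])
    return found


def category_mismatch(item, dfa, keyword_to_category):
    """True if the keywords hit in the item name point only to categories other than the recorded one."""
    hits = find_keywords(item["name"], dfa)
    categories = {keyword_to_category[k] for k in hits}
    return bool(categories) and item["category"] not in categories


keyword_to_category = {"helmet": "head protection", "glove": "hand protection"}
dfa = build_dfa(keyword_to_category)
item = {"name": "yellow safety helmet", "category": "hand protection"}
print(find_keywords(item["name"], dfa), category_mismatch(item, dfa, keyword_to_category))
```

An item whose name hits no keyword is not flagged here; how to treat such items is left open by this sketch.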
In steps S305-S307, brand checking is performed on the items to be analyzed that remain after the items with wrong categories are removed. First, the brands covered by the item pool are normalized: the different forms of the same brand's name, such as its Chinese name, its English name, and its combined Chinese-English name, are mapped to one standard name, so that the names of the brand in different languages all map to the standard name; a brand dictionary is then generated from the names of the brand in different languages and the mapped standard names. Next, brand words are extracted from the name of each item to be analyzed based on the normalized brand dictionary and compared with the brand field carried by the item (that is, the brand information in the detail data). If they differ, the brand is judged to be wrongly assigned, that is, the item is an item to be analyzed with wrong brand information.
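The brand normalization and comparison can be sketched as follows. The brand names and aliases are invented for illustration, and plain substring matching is a simplification that a real system would likely replace with tokenization or the DFA matching used for category keywords.

```python
def build_brand_dictionary(brand_aliases):
    """Map every alias (lower-cased) to its standard brand name."""
    dictionary = {}
    for standard, aliases in brand_aliases.items():
        dictionary[standard.lower()] = standard
        for alias in aliases:
            dictionary[alias.lower()] = standard
    return dictionary


def extract_brand(name, dictionary):
    """Return the standard name of the longest alias found in the item name, or None."""
    lowered = name.lower()
    matches = [alias for alias in dictionary if alias in lowered]
    if not matches:
        return None
    return dictionary[max(matches, key=len)]


def brand_is_wrong(item, dictionary):
    """True if the brand word in the item name disagrees with the recorded brand field."""
    extracted = extract_brand(item["name"], dictionary)
    if extracted is None:
        return False  # Assumption: no brand word in the name means no evidence of an error.
    recorded = dictionary.get(item["brand"].lower(), item["brand"])
    return extracted != recorded


# Hypothetical brands, each with one Chinese alias.
brands = build_brand_dictionary({"Acme": ["艾克美"], "Globex": ["格罗贝"]})
print(brand_is_wrong({"name": "艾克美 safety helmet", "brand": "Globex"}, brands))
print(brand_is_wrong({"name": "ACME safety helmet", "brand": "艾克美"}, brands))
```

The recorded brand field is normalized through the same dictionary before comparison, so an item recorded under one language form of a brand is not flagged when its name uses another form.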
In an alternative embodiment, steps S305 to S307 may be performed before steps S302 to S304; that is, items with wrong brands may be detected first, and category checking is then performed on the items that remain after the wrong-brand items are removed.
The information detection method provided by the embodiment of the present invention can automatically detect one-product-multiple-merchants cases, saving labor and reducing cost. It calculates the similarity between any two items based on the importance of the items' attributes and thereby determines the one-product-multiple-merchants items; because rich attribute information is used, detection accuracy is effectively guaranteed. Detection is performed after items with a wrong category or brand are removed, which reduces noisy data and improves accuracy. Items with a wrong category or brand are identified by category keyword checking and brand word checking, which is simple and convenient and improves detection efficiency.
Fig. 4 is a schematic flow chart of main steps of an information detection method according to another embodiment of the present invention, as shown in fig. 4, the method includes:
step S401: acquiring detail data of a plurality of articles to be analyzed, wherein the detail data comprises category information and brand information;
step S402: acquiring a category keyword dictionary;
step S403: identifying category keywords of the to-be-analyzed object by using a preset detection method and the category keyword dictionary to obtain keywords to be matched;
step S404: comparing the keywords to be matched with the category information in the detail data, taking the articles to be analyzed whose keywords to be matched match the category information in the detail data as first articles, and taking the articles to be analyzed whose keywords to be matched do not match the category information in the detail data as second articles;
step S405: training a category classification model according to the detail data of the first article;
step S406: determining the category to be matched of the second article according to the category classification model and the detail data of the second article;
step S407: for each second article, comparing the category to be matched of the second article with the category information in the detail data, and if the category to be matched of the second article is inconsistent with the category information in the detail data, determining that the second article is the article to be analyzed with wrong category information;
step S408: acquiring a brand dictionary;
step S409: extracting brand words to be matched in the name of each to-be-analyzed article based on the brand dictionary;
step S410: comparing the brand word to be matched with the brand information in the detail data, and if the brand word to be matched is inconsistent with the brand information in the detail data, determining that the article to be analyzed is an article to be analyzed with wrong brand information;
step S411: taking the articles to be analyzed except the articles to be analyzed with wrong brand information and category information as target articles; for each of the target items, determining a plurality of attributes of the target item and a weight corresponding to each of the attributes;
step S412: calculating the similarity between any two target objects according to a plurality of attributes of any two target objects and the weight corresponding to each attribute, and determining the target objects belonging to one product, multiple merchants according to the similarity.
In this embodiment, steps S401 and S408-S412 are the same as the corresponding steps of the embodiment shown in fig. 3, and their description is omitted here.
For steps S402 to S407, after the items to be analyzed whose category keyword does not match the category information are identified based on the category keyword dictionary, the name information and model information of the items to be analyzed whose category keyword is consistent with the category information (the first articles) are used as training data, and a category classification model is trained using a BERT-CRF method. The trained category classification model then performs category prediction on the items whose category keyword does not match, that is, on the second articles, thereby obtaining the category to be matched. If the predicted category to be matched is inconsistent with the category under which the article is listed, the article is determined to have a mis-assigned category, that is, the second article is determined to be an article to be analyzed with wrong category information.
The information detection method of this embodiment differs from the embodiment shown in fig. 2 in that, after the items to be analyzed with a mis-assigned category are identified based on the category keyword dictionary, a secondary detection is performed on the identified items, which reduces the probability of false identification and improves detection accuracy.
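The two-stage flow of steps S402 to S407 can be sketched in Python as follows. This is not the BERT-CRF model named above: to keep the example self-contained, a simple token-voting classifier stands in for it, and the keyword dictionary, function names, and item fields are hypothetical. The sketch also assumes that a second article for which the stand-in model yields no prediction remains flagged.

```python
from collections import Counter, defaultdict

# Hypothetical category keyword dictionary: keyword -> category.
CATEGORY_KEYWORDS = {"phone": "Phones", "laptop": "Computers", "notebook": "Computers"}


def keyword_category(name):
    """Return the category of the first dictionary keyword found in the name."""
    lowered = name.lower()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in lowered:
            return category
    return None


def split_items(items):
    """First articles: keyword category matches the listed one. Second: the rest."""
    first, second = [], []
    for item in items:
        matched = keyword_category(item["name"]) == item["category"]
        (first if matched else second).append(item)
    return first, second


def train_token_model(first_items):
    """Stand-in for BERT-CRF training: count name tokens per category."""
    counts = defaultdict(Counter)
    for item in first_items:
        for token in item["name"].lower().split():
            counts[token][item["category"]] += 1
    return counts


def predict_category(model, name):
    """Predict the category with the most token votes, or None without votes."""
    votes = Counter()
    for token in name.lower().split():
        votes.update(model.get(token, {}))
    return votes.most_common(1)[0][0] if votes else None


def find_category_errors(items):
    """Return second articles whose predicted category differs from the listed one."""
    first, second = split_items(items)
    model = train_token_model(first)
    # Assumption: a second article with no prediction (None) stays flagged.
    return [it for it in second if predict_category(model, it["name"]) != it["category"]]
```

In this sketch, an article whose name lacks a category keyword is not rejected outright; it is rejected only if the model trained on the first articles also disagrees with the listed category, which mirrors the secondary detection of this embodiment.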
Fig. 5 is a schematic diagram of main blocks of an information detection apparatus 500 according to an embodiment of the present invention, and as shown in fig. 5, the information detection apparatus 500 includes:
an obtaining module 501, configured to obtain detail data of multiple items to be analyzed, where the detail data includes category information and brand information;
the screening module 502 is configured to determine, according to the detail data, an object to be analyzed for which category information and brand information are wrong, and use an object to be analyzed other than the object to be analyzed for which the brand information and category information are wrong as a target object;
a determining module 503, configured to determine, for each of the target items, a plurality of attributes of the target item and a weight corresponding to each of the attributes;
the detecting module 504 is configured to calculate a similarity between any two target items according to a plurality of attributes of any two target items and a weight corresponding to each of the attributes, and determine the target items belonging to one product, multiple merchants according to the similarity.
Optionally, the screening module is further configured to: acquiring a category keyword dictionary; identifying category keywords of the to-be-analyzed object by using a preset detection method and the category keyword dictionary to obtain keywords to be matched; matching the keywords to be matched with the category information in the detail data, and taking the object to be analyzed of which the keyword to be matched and the category information in the detail data are not matched as the object to be analyzed of which the category information is wrong.
Optionally, the screening module is further configured to:
taking the object to be analyzed whose keyword to be matched matches the category information in the detail data as a first object, and taking the object to be analyzed whose keyword to be matched does not match the category information in the detail data as a second object;
training a category classification model according to the detail data of the first article;
determining the category to be matched of the second article according to the category classification model and the detail data of the second article;
and comparing the category to be matched of the second article with the category information in the detailed data, and if the category to be matched of the second article is inconsistent with the category information in the detailed data, determining that the second article is an article to be analyzed with wrong category information.
Optionally, the screening module is further configured to: acquiring a brand dictionary; extracting brand words to be matched in the name of each to-be-analyzed article based on the brand dictionary; and comparing the brand word to be matched with the brand information in the detail data, and if the brand word to be matched is inconsistent with the brand information in the detail data, determining that the article to be analyzed is the article to be analyzed with wrong brand information.
Optionally, the apparatus further includes a receiving module, configured to receive weight configuration information, where the weight configuration information records, according to category dimensions, the number and weight of attributes related to different categories;
the determining module is further configured to: for each target item, determine a plurality of attributes of the target item and the weight corresponding to each attribute according to the weight configuration information.
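As an illustration of weight configuration information organized by category, the following Python sketch looks up the attributes and weights that apply to a target item. The structure of the configuration, the attribute names, and the weight values are assumptions; the embodiment only states that the number and weight of attributes are recorded per category.

```python
# Hypothetical weight configuration: category -> {attribute: weight}.
WEIGHT_CONFIG = {
    "Phones": {"model": 0.5, "storage": 0.3, "color": 0.2},
    "Computers": {"model": 0.4, "cpu": 0.3, "memory": 0.3},
}


def attributes_and_weights(item, config=WEIGHT_CONFIG):
    """Return {attribute: (value, weight)} for the attributes configured
    for the item's category; unconfigured categories yield an empty dict."""
    weights = config.get(item["category"], {})
    return {
        attr: (item["attributes"].get(attr), weight)
        for attr, weight in weights.items()
    }
```

An attribute that is configured for the category but missing from the item is returned with the value `None`, so that a later similarity step can decide how to treat missing values.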
Optionally, the detection module is further configured to: grouping the target articles according to the category information of the target articles; and for each group of target articles, calculating the similarity between any two target articles according to the multiple attributes of any two target articles in the group and the weight corresponding to each attribute, and determining the target articles belonging to one product, multiple merchants according to the similarity.
Optionally, the detection module is further configured to: for each group of target articles, determining any two target articles in the group of target articles, and calculating the attribute similarity of each same attribute of the two target articles according to the plurality of attributes of the two target articles; calculating a weighted sum of the attribute similarities according to the weight corresponding to each attribute; calculating the similarity between the two target articles according to the weighted sum of the attribute similarities and the brand information of the two target articles; and if the similarity between the two target articles is greater than a preset threshold value, determining that the two target articles are target articles belonging to one product, multiple merchants.
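The grouping, weighted sum, and threshold comparison can be sketched in Python as follows. Several points are assumptions rather than part of the embodiment: the brand information is used as a gate (items of different brands get similarity 0), the per-attribute similarity is a placeholder exact match, and the threshold value 0.8 and all names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def attribute_similarity(a, b):
    """Placeholder attribute similarity: exact match after trimming and
    lowercasing (an assumption); a missing value never matches."""
    if a is None or b is None:
        return 0.0
    return 1.0 if str(a).strip().lower() == str(b).strip().lower() else 0.0


def item_similarity(x, y, weights):
    """Weighted sum of attribute similarities, gated by brand equality."""
    if x["brand"] != y["brand"]:
        return 0.0  # assumption: different brands are never the same product
    return sum(
        weight * attribute_similarity(x["attributes"].get(attr), y["attributes"].get(attr))
        for attr, weight in weights.items()
    )


def detect_same_product(items, weight_config, threshold=0.8):
    """Group items by category and return the id pairs above the threshold."""
    groups = defaultdict(list)
    for item in items:
        groups[item["category"]].append(item)
    pairs = []
    for category, group in groups.items():
        weights = weight_config.get(category, {})
        for x, y in combinations(group, 2):
            if item_similarity(x, y, weights) > threshold:
                pairs.append((x["id"], y["id"]))
    return pairs
```

Grouping by category before comparing pairs keeps the number of comparisons down, since items from different categories are never compared.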
Optionally, the detection module is further configured to: converting the attribute values of the same attributes of any two target articles into feature vectors; and calculating the similarity of the feature vectors, and taking the similarity of the feature vectors as the attribute similarity of the same attribute of any two target articles.
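One way to realize the feature-vector comparison is sketched below in Python. The embodiment does not specify how attribute values are vectorized or which similarity measure is used; character-bigram counts and cosine similarity are assumptions chosen here for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def to_feature_vector(value, n=2):
    """Convert an attribute value into a sparse vector of character n-gram counts."""
    text = str(value).strip().lower()
    if len(text) < n:
        # Too short for an n-gram: use the whole text as a single feature.
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def attribute_similarity(a, b):
    """Attribute similarity as the cosine similarity of the two feature vectors."""
    return cosine_similarity(to_feature_vector(a), to_feature_vector(b))
```

Unlike an exact string match, this measure gives partial credit to values that share part of their text, such as two storage sizes that end in the same unit.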
The information detection apparatus acquires detail data of a plurality of items to be analyzed, the detail data comprising category information and brand information; determines, according to the detail data, the items to be analyzed with wrong category information and brand information, and takes the items to be analyzed other than those items as target items; determines, for each target item, a plurality of attributes of the target item and a weight corresponding to each attribute; and calculates the similarity between any two target items according to the plurality of attributes of the two target items and the weight corresponding to each attribute, determining from the similarity the target items belonging to one product, multiple merchants. By these technical means, one-product-multiple-merchant cases can be detected automatically, saving labor and reducing cost. Because the similarity between items is calculated based on the importance of their various attributes, rich attribute information is used and detection accuracy is effectively ensured. Because items with a wrong category or brand are removed before one-product-multiple-merchant detection, noise is reduced and accuracy is improved.
The device can execute the method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.
Fig. 6 shows an exemplary system architecture 600 to which the information detection method or the information detection apparatus of the embodiments of the present invention can be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. Network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 601, 602, and 603.
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 605 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 601, 602, and 603. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the information detection method provided in the embodiment of the present invention is generally executed by the server 605, and accordingly, the information detection apparatus is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an obtaining module, a screening module, a determining module, and a detecting module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the obtaining module may also be described as a "module that obtains detail data of a plurality of items to be analyzed".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following:
acquiring detail data of a plurality of items to be analyzed, wherein the detail data comprises category information and brand information;
determining the object to be analyzed with wrong category information and brand information according to the detail data, and taking the object to be analyzed except the object to be analyzed with wrong brand information and category information as a target object;
for each of the target items, determining a plurality of attributes of the target item and a weight corresponding to each of the attributes;
calculating the similarity of any two target objects according to a plurality of attributes of any two target objects and the weight corresponding to each attribute, and determining the target objects belonging to one product, multiple merchants according to the similarity.
The technical scheme of the embodiment of the invention can automatically detect one-product-multiple-merchant cases, saving labor and reducing cost. The similarity between any two articles is calculated based on the importance of the various attributes of the articles, and the articles belonging to one product, multiple merchants are determined accordingly, so that rich attribute information is used and detection accuracy is effectively ensured. Before one-product-multiple-merchant detection, the articles with a wrong category or brand are removed, which reduces noise and improves accuracy.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.