
CN111191001A - Enterprise multi-element label identification method for paper package and related industries thereof - Google Patents

Info

Publication number
CN111191001A
Authority
CN
China
Prior art keywords
labels
label
enterprise
data
industry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911335749.0A
Other languages
Chinese (zh)
Inventor
陈家银
龚小龙
陈曦
麻志毅
彭军民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Zhejiang Great Shengda Packing Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Zhejiang Great Shengda Packing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University, Hangzhou Weiming Information Technology Co Ltd, Zhejiang Great Shengda Packing Co Ltd
Priority to CN201911335749.0A
Publication of CN111191001A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an enterprise multi-element label identification method, in particular to one for paper packaging and related industries. It provides a hierarchical iterative identification framework that builds the required large data set using models derived from a small amount of labeled data, and finally applies a lightweight XML-CNN deep model to improve overall label identification efficiency. The method improves the efficiency of retrieving complex enterprise information.

Description

Enterprise multi-element label identification method for paper package and related industries thereof
Technical Field
The invention relates to an enterprise multi-element label identification method, in particular to an enterprise multi-element label identification method for paper packaging and related industries.
Background
An enterprise multi-element label means that one enterprise entity carries a plurality of attribute labels, each label being a high-level abstract summary of the enterprise's information along one dimension. Although enterprise multi-element labels can be applied in many fields, such as accurate enterprise classification, complex information retrieval and effective label recommendation [1,2], their application is not yet mature, for two main reasons: first, existing multi-element label systems are narrow and can hardly meet the personalized needs of industries in vertical fields; second, multi-label identification techniques require a large amount of labeled data.
Currently, enterprise multi-element label systems mainly focus on dimensions such as industry category and main business products, for example 'food manufacturing', 'cultural entertainment' and 'milk tea'. In the paper packaging industry, enterprises care more about deeper information such as logistics transportation distance, enterprise marketing relationship and paper packaging demand, which existing label systems cannot provide. The invention therefore designs concept labels such as 'same district', 'same city' and 'same province' on the logistics transportation distance dimension, and concept labels such as 'client', 'peer' and 'supplier' on the enterprise marketing relationship dimension. This label system lets marketing personnel in the paper packaging industry quickly learn how far a target enterprise is from their own and what relationship the two may have.
In the enterprise multi-element label identification task, deep learning methods need a large amount of labeled data, whereas in practice such labeled data is extremely scarce and manual labeling incurs high labor costs. The invention therefore uses several traditional models that iteratively identify the data and compare their results, generating a large amount of high-quality labeled data at reduced manual labeling cost. A lightweight XML-CNN deep model is then applied to improve overall label identification efficiency. The invention combines these two steps into a hierarchical iterative identification framework based on deep learning.
A patent similar to the present invention, 'a multi-label classification method for enterprise industry' (CN109783818A) [3], uses a two-layer recurrent neural network to identify the industry category labels of enterprises, and does not perform deep label identification for the individual needs of a specific industry. In addition, compared with other deep models, the two-layer recurrent neural network is complex and inefficient to run and deploy. To address this, the invention applies the lightweight XML-CNN deep model [4] to enterprise multi-element label identification. The model is based on a convolutional neural network (CNN) and uses dynamic max pooling and a hidden bottleneck layer, and its overall identification effect is superior to that of other deep models.
Disclosure of Invention
The main purpose of the invention is to overcome the shortcomings of the prior art, in particular the problem that the large amount of labeled data required by deep learning is often unavailable in practical application scenarios. It provides a hierarchical iterative identification framework that builds the required large data set using models derived from a small amount of labeled data, and finally applies a lightweight XML-CNN deep model to improve overall label identification efficiency.
The technical problem of the invention is mainly solved by the following technical scheme:
an enterprise multi-element label identification method for paper packaging and related industries comprises the following steps:
(I) constructing a multi-element label system:
enterprise multi-element labels are obtained from enterprise data that has no direct commercial value: through cleaning, sorting and mining, information that directly meets business requirements is abstracted and presented as a plurality of labels, supporting needs such as accurate enterprise classification and efficient complex queries;
the invention uses seven items of characteristic data disclosed by enterprises, namely company name, registered address, registered capital, company type, affiliated industry, business scope and company profile, to construct an enterprise multi-element label system for paper packaging and related industries;
the label system covers five dimensions: transportation distance, paper packaging demand, paper packaging type, enterprise marketing relationship and industry category; transportation distance comprises 5 labels, paper packaging demand 3 labels, paper packaging type 5 labels, enterprise marketing relationship 3 labels and industry category 198 labels, 214 labels in total;
the labels in these five dimensions are described in detail below:
① the transportation distance label summarizes the geographical relationship between enterprises and takes the values 'same district', 'same city', 'same province', 'domestic' and 'foreign'; it is mainly identified from the enterprise's 'registered address', and some 'company name' entries also contain information relevant to this label;
② the paper packaging demand label summarizes an enterprise's demand for paper packaging and takes the values 'large amount', 'medium amount' and 'small amount'; it mainly depends on three items of characteristic data: 'affiliated industry', 'registered capital' and 'company type';
③ the paper packaging type label, with values such as 'carton', 'paper box' and 'paper bag', indicates the main type of paper packaging an enterprise requires, and can be identified from 'affiliated industry';
④ the enterprise marketing relationship label takes the values 'peer', 'client' and 'supplier' and classifies a company's upstream and downstream enterprises; it is mainly identified from 'affiliated industry' and 'business scope'; with this label, a company in the paper packaging industry can easily see how enterprises in other industries relate to it and adopt different strategies accordingly;
⑤ the industry category labels mainly refer to the national industry classification standard published in 2017; based on the market characteristics and business requirements of the paper packaging industry, that standard is reduced and modified to form a new set of industry classification labels for the paper packaging industry, mainly covering manufacturing and the transportation, storage and postal industry; since many industries have no demand for paper packaging, they are represented by the 'other industries' label;
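The label system described above can be written down as a small data structure. The following is only an illustrative sketch: the dictionary keys and the English renderings of the label values are assumptions made for the example, while the per-dimension counts are the ones stated in the text.

```python
# Label counts per dimension, as stated in the text (keys are hypothetical names).
LABEL_SYSTEM = {
    "transportation_distance": 5,
    "paper_packaging_demand": 3,
    "paper_packaging_type": 5,
    "enterprise_marketing_relationship": 3,
    "industry_category": 198,
}

# Dimensions whose values the text lists in full (English renderings are assumptions).
KNOWN_VALUES = {
    "transportation_distance": [
        "same district", "same city", "same province", "domestic", "foreign",
    ],
    "paper_packaging_demand": ["large amount", "medium amount", "small amount"],
    "enterprise_marketing_relationship": ["peer", "client", "supplier"],
}


def total_labels(system):
    """Total number of labels across all dimensions."""
    return sum(system.values())


def known_values_match_counts(system, known):
    """Check that every fully listed dimension has as many values as its count."""
    return all(len(values) == system[dim] for dim, values in known.items())
```

Summing the counts reproduces the total of 214 labels given in the text.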
in the multi-element label system, the transportation distance, paper packaging demand, paper packaging type and enterprise marketing relationship dimensions contain only first-level labels, while the industry category dimension contains three levels of labels in a hierarchy, and the labels of the other dimensions are correlated with the industry category labels; if an enterprise matches the 'other industries' label in the industry category dimension, all its other labels are null, i.e. it is not an enterprise under consideration; otherwise, each enterprise obtains 7 corresponding multi-element labels;
(II) identifying the multi-element label:
at present there are many multi-label learning algorithms, which fall into two main categories: first, methods based on problem transformation (Problem Transformation), which transform the problem data so that existing algorithms can be applied; second, methods based on algorithm adaptation (Algorithm Adaptation), which extend a specific algorithm so that it handles multi-label problems directly; the former are mainly traditional machine learning methods, and the latter mainly deep learning methods;
considering that traditional machine learning methods are simple and well suited to identifying labels of a single dimension, the invention first uses them to iteratively identify the labels of each of the four dimensions separately, which addresses the shortage of labeled data; the specific steps are: (1) generating a small amount of labeled data based on rules, and training a KNN model, a decision tree model and a binary classification model on it; (2) identifying unlabeled data with the three trained models, adding a sample to the labeled data set if the three results agree, and handing it to manual correction if they disagree; (3) repeating the first two steps to iteratively form new labeled data; once there is enough labeled data, and taking the interrelations among labels into account, an XML-CNN deep learning model is finally used to identify all types of labels together, which improves accuracy and avoids the difficulty of maintaining several independent models later;
① generating a small amount of initial data based on rules:
labeled data is generated by rule heuristics on four dimensions of the label system; (1) for the transportation distance label, a place-name lexicon and a company-type lexicon available online are used to search and match the corresponding enterprise information, mainly matching data that follows fairly regular patterns, which is then converted into the corresponding label; (2) for the industry category label, the invention uses the established paper packaging industry classification standard to compile a simple industry category mapping lexicon, and uses a double-array trie to match the leading words of 'affiliated industry' and 'business scope', identifying a small number of labels; (3) for the enterprise marketing relationship, the relationship is judged from the identified industry label and the main business products; for example, if the industry is paper making and paper products, the enterprise can be labeled 'peer'; (4) for paper packaging demand and paper packaging type, rule heuristics based on industry knowledge and business experience are applied to 'affiliated industry', 'registered capital' and 'company type'; for example, if an enterprise belongs to the household appliance industry, its paper packaging type is 'carton', and if it is a joint-stock enterprise or its 'registered capital' exceeds 50 million RMB, its paper packaging demand is 'large'; through these rule heuristics, an initial labeled data set S_0 is generated and passed to the next stage;
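The rule heuristics above can be sketched as a few small functions. This is a minimal illustration, not the patented rule set: it assumes the registered address has already been parsed into country, province, city and district fields, and the function names, field names and string constants are hypothetical. Only the example rules quoted in the text are encoded, and a rule that does not fire returns None so the sample stays unlabeled.

```python
def distance_label(own, target):
    """Map two parsed addresses (dicts, hypothetical fields) to a distance label."""
    if target["country"] != own["country"]:
        return "foreign"
    if target["province"] != own["province"]:
        return "domestic"
    if target["city"] != own["city"]:
        return "same province"
    if target["district"] != own["district"]:
        return "same city"
    return "same district"


def demand_label(company_type, registered_capital_rmb):
    """Example rule from the text: joint-stock or capital above 50 million RMB."""
    if company_type == "joint-stock" or registered_capital_rmb > 50_000_000:
        return "large"
    return None  # rule does not fire; leave the sample unlabeled


def packaging_type_label(industry):
    """Example rule from the text: household appliance industry uses cartons."""
    return "carton" if industry == "household appliances" else None


def relationship_label(industry):
    """Example rule from the text: paper making and paper products are peers."""
    return "peer" if industry == "paper making and paper products" else None
```

Samples for which at least one rule fires would form the initial set S_0; everything else is left for the model iteration.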
② multi-model identification iteration:
starting from the small labeled data set $S_0$ generated in the previous step, the invention uses three common traditional algorithms, KNN, decision tree and binary classification, for iterative training to generate a large amount of labeled data; specifically, three corresponding models are trained on a single label dimension and then used to predict unlabeled data; if the three predictions agree, the sample is added to the training data set; if they disagree, the sample is manually corrected and then added to the training set, and the next iteration begins; when the training data set exceeds a certain size, for example 200,000 samples, a complete multi-element label recognition algorithm is built with the deep model method; let the initial data be $S_0$ (split into a training set and a test set) and the unlabeled data set be $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$ with $x_i=\{x_{i1},x_{i2},\dots,x_{i6}\}$, i.e. each sample has 6 corresponding characteristic data vectors; each vector is formed by segmenting the characteristic data text into words and concatenating, row by row, the word vectors trained with word2vec, and $x_i\in R^{h\times d}$, where h is the length of each feature vector and d is the word vector dimension, generally 100; $y_i$ is the corresponding label, with values in $[L_1,L_2,\dots,L_t]$; n is the number of samples and t is the number of labels; during recognition, the training samples are converted to single-label form and then classified;
②.1 KNN recognition model:
the idea of the KNN model is to compute the distance between each pair of samples and determine which known samples an unknown sample is closest to; the label of the unknown sample is then decided by voting; the loss function is the commonly used squared loss, and the distance is computed as:
$L_p(x_i,x_j)=\left(\sum_{l}\left|x_i^{(l)}-x_j^{(l)}\right|^{p}\right)^{1/p}$
generally the Euclidean distance, p = 2, is used; for a sample to be predicted, the k nearest surrounding samples are found, and the most frequent label category among them is taken as the label of the predicted sample;
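The KNN step can be illustrated in a few lines of plain Python. This is a sketch under simple assumptions: samples are flat numeric tuples rather than the word-vector matrices described earlier, and the function names are hypothetical.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance between two equal-length vectors (p = 2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train, query, k=3, p=2):
    """Predict a label by majority vote among the k nearest training samples.

    train is a list of (vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda s: minkowski(s[0], query, p))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With k = 3, a query whose two nearest neighbours share a label takes that label even if the third neighbour disagrees.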
②.2 decision tree:
for the decision tree, a CART classification tree is selected, and node purity is measured with the Gini coefficient:
$Gini(p)=\sum_{i}p_i(1-p_i)=1-\sum_{i}p_i^{2}$
where $p_i$ is the probability that a sample belongs to category i; this process can adopt the ensemble method of random forests, but considering that this step is already a fusion of multiple algorithms, the CART classification tree method is adopted;
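As a small worked example of the Gini measure, the sketch below computes the impurity of a set of labels and the weighted impurity of a candidate split, which is the quantity a CART classification tree minimizes when choosing a split. The function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_i p_i^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a binary split into left and right."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

A perfectly mixed two-class node has impurity 0.5, and a split that separates the two classes completely has weighted impurity 0.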
②.3 binary classification:
the idea of the binary classification method is to build one classifier per label; the classifier can be logistic regression or an SVM (support vector machine), and the invention selects the SVM; at prediction time, the results of all the classifiers are combined to obtain the final result;
the three models are trained separately on the data set $S_0$, giving $f_1, f_2, f_3$, and each model is tuned to its best using the F-score as the evaluation criterion; let the results of identifying the same unlabeled sample be $r_1, r_2, r_3$; if $r_1=r_2=r_3$, no correction is needed, where the sample refers to $(x_j, y_j)$; otherwise, the sample is corrected manually; the newly generated labeled data is added to $S_0$ and the iteration continues in the same manner;
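The agreement check and the iteration can be sketched independently of the concrete models. In the sketch below the three trained models are passed in as plain callables, and manual correction is simulated by a callable as well; all names (consensus_round, iterate, correct_manually, target_size) are hypothetical and chosen for illustration only.

```python
def consensus_round(models, unlabeled, correct_manually):
    """Label a batch: accept unanimous predictions, send the rest to correction.

    models: callables x -> label (f_1, f_2, f_3 in the text).
    correct_manually: callable (x, predictions) -> label, standing in for a person.
    Returns the newly labeled pairs and the number of manual corrections.
    """
    new_labeled, n_manual = [], 0
    for x in unlabeled:
        preds = [model(x) for model in models]
        if all(p == preds[0] for p in preds):
            y = preds[0]  # r_1 = r_2 = r_3: no correction needed
        else:
            y = correct_manually(x, preds)
            n_manual += 1
        new_labeled.append((x, y))
    return new_labeled, n_manual


def iterate(train_fns, labeled, batches, correct_manually, target_size):
    """Retrain on the growing labeled set and repeat until it is large enough.

    train_fns: callables labeled_data -> model, one per traditional algorithm.
    """
    for batch in batches:
        if len(labeled) >= target_size:
            break
        models = [fit(labeled) for fit in train_fns]
        new_labeled, _ = consensus_round(models, batch, correct_manually)
        labeled = labeled + new_labeled
    return labeled
```

Only the disagreeing samples reach a human, which is where the reduction in manual labeling cost comes from.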
③ XML-CNN Enterprise multi-element tag identification:
when the labeled data set reaches a certain size, an XML-CNN deep model is applied to train the full recognition model; the reason is that it represents and learns the association information between labels, which improves the overall identification effect; the XML-CNN model is a variant of the CNN (convolutional neural network) model; compared with other deep models such as bidirectional recurrent neural networks and transformer models, it runs much more efficiently and gives the best recognition effect; specifically, each information dimension of an enterprise is represented at word granularity, followed by convolution and dynamic pooling and then a fully connected layer; finally the output uses a sigmoid with binary loss, which turns multi-element label identification into a per-label probability problem, and a label is output if its probability exceeds a set threshold;
(1) Embedding:
an enterprise is represented along its information dimensions as $e_{1:m}=[e_1,\dots,e_m]\in R^{m\times d}$, where m is the total text length of the seven information dimensions; 'business scope' and 'company profile' are length-limited, being truncated if the text length exceeds 200, and the whole number in 'registered capital' is treated as a single word; d is the word vector dimension, typically 100;
(2) Convolution:
$c_i=g_c(v^{T}e_{i:i+h-1})$, with convolution kernel $v\in R^{h\times d}$; generally h = 2, 3 and 4, representing different window sizes that extract N-gram features; different convolution kernels extract semantic information at different levels, and the number of kernels is generally 128; one convolution kernel yields $c=[c_1,\dots,c_r]$, with $r=m-h+1$;
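The convolution step can be made concrete with a plain-Python sketch of a single kernel sliding over the word embeddings. The text does not specify the activation g_c, so the ReLU default here is an assumption, and the function name is hypothetical.

```python
def conv1d_text(embeddings, kernel, activation=lambda z: max(0.0, z)):
    """Slide one kernel of h rows over m word embeddings of dimension d.

    embeddings: list of m vectors (each of length d).
    kernel: list of h vectors (each of length d), i.e. v in R^{h x d}.
    Returns c = [c_1, ..., c_r] with r = m - h + 1.
    The ReLU default activation is an assumption, not taken from the text.
    """
    m, h = len(embeddings), len(kernel)
    feature_map = []
    for i in range(m - h + 1):
        # Dot product between the kernel and the window e_{i:i+h-1}.
        s = 0.0
        for j in range(h):
            s += sum(w * e for w, e in zip(kernel[j], embeddings[i + j]))
        feature_map.append(activation(s))
    return feature_map
```

For m = 4 words and a window of h = 2, the feature map has r = 3 entries, one per bigram position.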
(3) Dynamic Max Pooling:
after convolution, c is divided into p segments (the invention takes p = 3 because the maximum depth of the label system is 3), the maximum of each segment is taken, and the results are output as $P(c)=\left[\max\{c_{1:\frac{r}{p}}\},\dots,\max\{c_{r-\frac{r}{p}+1:r}\}\right]$;
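Dynamic max pooling differs from ordinary max pooling in that it keeps one maximum per segment rather than one per feature map. A minimal sketch follows; the handling of lengths that are not divisible by p (nearly equal chunks) is an assumption, since the text only covers the evenly divisible case.

```python
def dynamic_max_pool(c, p=3):
    """Split feature map c into p contiguous chunks and keep each chunk's maximum.

    When len(c) is not divisible by p the chunks are made nearly equal
    (an assumption for this sketch).
    """
    r = len(c)
    if r < p:
        raise ValueError("feature map shorter than the number of chunks")
    bounds = [(k * r) // p for k in range(p + 1)]
    return [max(c[bounds[k]:bounds[k + 1]]) for k in range(p)]
```

With p = 3, each kernel contributes three values instead of one, so positional information from the beginning, middle and end of the text is retained.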
(4) Fully connected bottleneck layer:
the result of dynamic pooling is fed into a bottleneck fully connected layer, i.e. a layer whose number of hidden units is far smaller than the number of labels in the output layer, which has the advantage of improving fitting capability; $f=W_o\,g(W_h P)$, where $W_h\in R^{h\times tp}$ and $W_o\in R^{L\times h}$; t is the number of convolution kernels, h is the number of hidden units in this layer, L is the number of output labels, and g is the activation function, for which tanh is adopted; the output layer follows the fully connected layer, and the sigmoid function is used for prediction;
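The bottleneck layer and the thresholded sigmoid output can be sketched with plain lists. The weights in any real model are learned; here they are simply passed in, and the helper names and the 0.5 default threshold are assumptions for illustration.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def bottleneck_forward(P, W_h, W_o):
    """Compute f = W_o * tanh(W_h * P) for a pooled feature vector P."""
    hidden = [math.tanh(z) for z in matvec(W_h, P)]
    return matvec(W_o, hidden)


def predict_labels(scores, label_names, threshold=0.5):
    """Output every label whose sigmoid probability exceeds the threshold."""
    return [name for s, name in zip(scores, label_names) if sigmoid(s) > threshold]
```

With h hidden units the two weight matrices hold h·t·p + L·h parameters instead of the L·t·p needed to connect the pooled features directly to the outputs, which is the size reduction the bottleneck provides when h is small.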
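(5) Loss function:
the loss function is the binary cross-entropy loss:
$\min_{\Theta}\;-\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{L}\left[y_{ij}\log\sigma(f_{ij})+(1-y_{ij})\log\left(1-\sigma(f_{ij})\right)\right]$
where $y_{ij}$ indicates whether sample i carries label j, $f_{ij}$ is the corresponding network output, and σ is the sigmoid function:
$\sigma(x)=\frac{1}{1+e^{-x}}$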
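A direct transcription of a binary cross-entropy loss over n samples and L labels is shown below as a sketch. It assumes the usual form in which per-label losses are summed and then averaged over samples; the small eps guarding the logarithm is an implementation detail added for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def bce_loss(scores, targets, eps=1e-12):
    """Binary cross-entropy summed over labels and averaged over samples.

    scores: n rows of L raw outputs f_ij; targets: n rows of L values in {0, 1}.
    """
    total = 0.0
    for f_i, y_i in zip(scores, targets):
        for f, y in zip(f_i, y_i):
            s = sigmoid(f)
            total -= y * math.log(s + eps) + (1 - y) * math.log(1 - s + eps)
    return total / len(scores)
```

Because each label has its own sigmoid, the labels of one enterprise are scored independently in the loss while still sharing all earlier layers.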
for evaluation, the DCG@K and NDCG@K measures from the ranking field are adopted, with K = 7, and an additional classification rule is imposed: if the parent class of a label is predicted wrongly, the sample is counted as wrongly predicted regardless of whether the subclass is correct; NDCG stands for normalized discounted cumulative gain: DCG sums the relevance score $rel_i$ of each label in the predicted list, each divided by the logarithm of its position, so that labels nearer the front matter more, and NDCG normalizes DCG:
$DCG@K=\sum_{i=1}^{K}\frac{rel_i}{\log_2(i+1)}$
$NDCG@K=\frac{DCG@K}{IDCG@K}$
where IDCG@K is the DCG@K of the ideal ranking;
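The evaluation can be sketched as follows, assuming binary relevance (a predicted label is relevant if it is in the true label set) and the common discount 1 / log2(position + 1). The parent-class rule is illustrated by a helper that marks a level of the industry hierarchy as correct only when all of its ancestor levels are correct; that helper and all names are hypothetical.

```python
import math


def dcg_at_k(ranked, true_labels, k):
    """DCG@K with binary relevance and a log2(position + 1) discount."""
    return sum(
        1.0 / math.log2(i + 2)
        for i, label in enumerate(ranked[:k])
        if label in true_labels
    )


def ndcg_at_k(ranked, true_labels, k):
    """DCG@K divided by the DCG@K of an ideal ranking of the true labels."""
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(true_labels))))
    return dcg_at_k(ranked, true_labels, k) / ideal if ideal else 0.0


def levels_correct(pred_path, true_path):
    """Per-level correctness along a hierarchy path, top level first.

    Once a parent level is wrong, every level below it counts as wrong.
    """
    result, parent_ok = [], True
    for pred, true in zip(pred_path, true_path):
        parent_ok = parent_ok and pred == true
        result.append(parent_ok)
    return result
```

With K = 7, a prediction list that ranks all of an enterprise's true labels first scores an NDCG of 1.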
Starting from the needs of the paper packaging industry, the invention designs the enterprise multi-element label system along five dimensions: logistics transportation distance, paper packaging demand, paper packaging type, enterprise marketing relationship and industry category. Multi-element label identification has three basic steps: (1) a small amount of labeled data is first generated by rule heuristics; (2) several traditional identification algorithms identify and cross-check the data, with some manual correction, to generate new labeled data at reduced manual labeling cost; (3) the previous two steps are repeated, and once enough data has accumulated, an XML-CNN deep model is applied for overall label identification.
XML-CNN has the advantages of being simple and computationally efficient; it removes the need to fuse several traditional models and reduces complexity. Identifying all labels at once avoids the assumption that labels are independent of each other, fully considers the dependencies among labels, and improves recognition performance.
Therefore, the enterprise multi-label identification method facing paper packaging and related industries improves the retrieval efficiency of complex information of enterprises.
Drawings
FIG. 1 is a schematic flow diagram of an enterprise multi-tag system of the present invention;
FIG. 2 is a schematic flow chart of the enterprise multi-tag identification process of the present invention;
FIG. 3 is a schematic structural diagram of XML-CNN training on an enterprise sample in the present invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the attached drawings.
Example 1: as shown in the figure, the enterprise multi-element label identification method for paper packaging and related industries comprises the following steps:
(I) constructing a multi-element label system:
enterprise multi-element labels are obtained from enterprise data that has no direct commercial value: through cleaning, sorting and mining, information that directly meets business requirements is abstracted and presented as a plurality of labels, supporting needs such as accurate enterprise classification and efficient complex queries;
the invention uses seven items of characteristic data disclosed by enterprises, namely company name, registered address, registered capital, company type, affiliated industry, business scope and company profile, to construct an enterprise multi-element label system for paper packaging and related industries;
the label system covers five dimensions: transportation distance, paper packaging demand, paper packaging type, enterprise marketing relationship and industry category; transportation distance comprises 5 labels, paper packaging demand 3 labels, paper packaging type 5 labels, enterprise marketing relationship 3 labels and industry category 198 labels, 214 labels in total;
the labels in these five dimensions are described in detail below:
① the transportation distance label summarizes the geographical relationship between enterprises and takes the values 'same district', 'same city', 'same province', 'domestic' and 'foreign'; it is mainly identified from the enterprise's 'registered address', and some 'company name' entries also contain information relevant to this label;
② the paper packaging demand label summarizes an enterprise's demand for paper packaging and takes the values 'large amount', 'medium amount' and 'small amount'; it mainly depends on three items of characteristic data: 'affiliated industry', 'registered capital' and 'company type';
③ the paper packaging type label, with values such as 'carton', 'paper box' and 'paper bag', indicates the main type of paper packaging an enterprise requires, and can be identified from 'affiliated industry';
④ the enterprise marketing relationship label takes the values 'peer', 'client' and 'supplier' and classifies a company's upstream and downstream enterprises; it is mainly identified from 'affiliated industry' and 'business scope'; with this label, a company in the paper packaging industry can easily see how enterprises in other industries relate to it and adopt different strategies accordingly;
⑤ the industry category labels mainly refer to the national industry classification standard published in 2017; based on the market characteristics and business requirements of the paper packaging industry, that standard is reduced and modified to form a new set of industry classification labels for the paper packaging industry, mainly covering manufacturing and the transportation, storage and postal industry; since many industries have no demand for paper packaging, they are represented by the 'other industries' label;
in the multi-element label system, the transportation distance, paper packaging demand, paper packaging type and enterprise marketing relationship dimensions contain only first-level labels, while the industry category dimension contains three levels of labels in a hierarchy, and the labels of the other dimensions are correlated with the industry category labels; if an enterprise matches the 'other industries' label in the industry category dimension, all its other labels are null, i.e. it is not an enterprise under consideration; otherwise, each enterprise obtains 7 corresponding multi-element labels;
(II) identifying the multi-element label:
at present there are many multi-label learning algorithms, which fall into two main categories: first, methods based on problem transformation (Problem Transformation), which transform the problem data so that existing algorithms can be applied; second, methods based on algorithm adaptation (Algorithm Adaptation), which extend a specific algorithm so that it handles multi-label problems directly; the former are mainly traditional machine learning methods, and the latter mainly deep learning methods;
considering that traditional machine learning methods are simple and well suited to identifying labels of a single dimension, the invention first uses them to iteratively identify the labels of each of the four dimensions separately, which addresses the shortage of labeled data; the specific steps are: (1) generating a small amount of labeled data based on rules, and training a KNN model, a decision tree model and a binary classification model on it; (2) identifying unlabeled data with the three trained models, adding a sample to the labeled data set if the three results agree, and handing it to manual correction if they disagree; (3) repeating the first two steps to iteratively form new labeled data; once there is enough labeled data, and taking the interrelations among labels into account, an XML-CNN deep learning model is finally used to identify all types of labels together, which improves accuracy and avoids the difficulty of maintaining several independent models later; the overall identification process is detailed as follows.
TABLE 1 comparison of multiple tag identification methods
Figure RE-GDA0002396518040000111
① Generating a small amount of initial data based on rules:
Labeled data for the respective labels is generated by rule-based heuristics on the four dimensions of the label system: (1) for the transportation distance label, a place-name lexicon and a company-type lexicon available online are used to search and match against the corresponding enterprise information, mainly matching data that follows regular patterns, and the matches are then converted into the corresponding labels; (2) for the industry category label, the invention uses the formulated paper packaging industry classification standard to compile a simple industry category mapping lexicon, and uses the double-array trie method to match against the leading words of the "affiliated industry" and "business scope" fields, identifying a small number of labels; (3) for the enterprise marketing relationship, the relationship is judged from the identified industry label and the main products; for example, if the industry is the papermaking and paper products industry, the enterprise can be given the "peer" label; (4) for the paper packaging demand and the paper packaging type, rule heuristics based on industry knowledge and business experience are applied to "affiliated industry", "registered capital" and "company type"; for example, if an enterprise belongs to the household appliance industry, its paper packaging type is "carton", and if it is a joint-stock company or its "registered capital" exceeds RMB 50 million, its paper packaging demand is "large". Through these rule heuristics, an initial labeled data set $S_0$ is generated and passed to the next step;
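The two example rules in item (4) can be sketched as follows. This is only an illustration under stated assumptions: the field names (`industry`, `company_type`, `registered_capital_rmb`), the literal strings and the function name `rule_labels` are hypothetical; only the household-appliance rule and the joint-stock / RMB 50 million rule come from the text.

```python
from typing import Dict, Optional

CAPITAL_THRESHOLD_RMB = 50_000_000  # "5000 ten thousand" RMB in the text


def rule_labels(record: Dict[str, object]) -> Dict[str, Optional[str]]:
    """Return heuristic labels for one enterprise; None means no rule fired."""
    labels: Dict[str, Optional[str]] = {"package_type": None, "demand": None}
    # Example rule from the text: household appliance industry gives "carton".
    if record.get("industry") == "household appliances":
        labels["package_type"] = "carton"
    # Example rule from the text: joint-stock company OR capital above the
    # threshold gives a "large" paper packaging demand.
    capital = record.get("registered_capital_rmb", 0)
    if record.get("company_type") == "joint-stock" or (
        isinstance(capital, (int, float)) and capital > CAPITAL_THRESHOLD_RMB
    ):
        labels["demand"] = "large"
    return labels


example = {
    "industry": "household appliances",
    "company_type": "limited liability",
    "registered_capital_rmb": 80_000_000,
}
print(rule_labels(example))
```

Records for which no rule fires keep `None` and would remain in the unlabeled pool for the later iteration step.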
② Multi-model identification iteration:
Starting from the small amount of labeled data $S_0$ generated in the previous step, the invention uses the common traditional algorithms KNN, decision tree and binary classification for iterative training, so as to generate a large amount of labeled data. Specifically, three corresponding models are trained on a single label dimension and then used to predict the unlabeled data; if the predictions of the three models agree, the data can be added to the training data set; if they disagree, the data is manually corrected and then added to the training set, and the next iteration is carried out. When the training data set exceeds a certain size, for example 200,000 samples, a complete multi-element label identification algorithm is built with a deep model. Let the initial data be $S_0$ (split into a training set and a test set) and the unlabeled data set be $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ with $x_i = \{x_{i1}, x_{i2}, \ldots, x_{i6}\}$, i.e., each sample $x_i$ has 6 corresponding feature vectors. Each vector is obtained by segmenting the feature text into words and stacking the word2vec-trained word vectors row by row, with $x_i \in \mathbb{R}^{h \times d}$, where $h$ is the length of each feature vector and $d$ is the word-vector dimension, generally 100. $y_i$ is the corresponding label value, taking values in $[L_1, L_2, \ldots, L_t]$; $n$ is the number of samples and $t$ is the number of labels. During identification, the training samples are converted into binary classification problems on a single label;
②.1 KNN recognition model:
The idea of the KNN model is to compute the distance between every pair of samples and then determine which known samples an unknown sample is closest to; the label of the unknown sample is then decided by voting. The loss function is the commonly used squared loss, and the distance is computed as follows:
Figure RE-GDA0002396518040000121
Generally p = 2 is taken, i.e., the Euclidean distance; for a sample to be predicted, its k nearest neighbouring samples are found, and the most frequent label category among those k samples is taken as the label of the predicted sample;
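A minimal sketch of the voting step, assuming the distance is the usual Minkowski ($L_p$) distance, which reduces to the Euclidean distance at p = 2. The toy samples and the names `minkowski` and `knn_predict` are illustrative only.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance; p = 2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(
    train: List[Tuple[Sequence[float], str]], query: Sequence[float], k: int = 3
) -> str:
    """Label the query with the most common label among its k nearest samples."""
    nearest = sorted(train, key=lambda item: minkowski(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((0.0, 0.0), "small"),
    ((0.1, 0.2), "small"),
    ((1.0, 1.0), "large"),
    ((0.9, 1.1), "large"),
    ((1.2, 0.8), "large"),
]
print(knn_predict(train, (0.2, 0.1), k=3))  # two of the three neighbours are "small"
```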
②.2 decision tree:
For the decision tree, a CART classification tree is selected, and node purity is measured with the Gini coefficient, specifically:
Figure RE-GDA0002396518040000131
where $p_i$ is the probability that a sample belongs to category $i$; the process adopts the random forest ensemble method, and the CART classification tree method is used because this step is a fusion of multiple algorithms;
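For reference, the Gini coefficient of a set of labels can be computed as below. The sketch assumes the standard CART definition $1 - \sum_i p_i^2$, since the formula itself appears only as a figure; the function name `gini` is illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_i p_i^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# p = (0.5, 0.25, 0.25), so the impurity is 1 - (0.25 + 0.0625 + 0.0625) = 0.625
print(gini(["peer", "peer", "client", "supplier"]))
```

A pure node (all samples in one category) has impurity 0, which is what a CART split tries to approach.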
②.3 Binary classification:
The idea of the binary classification method is to build one classifier per label; the classifier can be logistic regression or an SVM (support vector machine), and the invention selects the SVM. At prediction time, the results of the individual classifiers are added together to obtain the final result;
The three models are trained separately on the data set $S_0$, yielding $f_1, f_2, f_3$, and each model is tuned to its optimum using the F-score as the evaluation criterion. Let the results of identifying the same unlabeled sample be $r_1, r_2, r_3$; if $r_1 = r_2 = r_3$, no correction is needed (the sample here refers to $(x_j, y_j)$); otherwise, the result is corrected manually. The newly generated labeled data is added to $S_0$, and the iteration proceeds in this manner;
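The agreement check between the three models can be sketched as below. The lambdas are toy stand-ins for the trained KNN, decision tree and SVM models; `consensus_split` and the thresholds are hypothetical names and values chosen for the demonstration.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], str]


def consensus_split(
    models: Sequence[Model], unlabeled: Sequence[Sequence[float]]
) -> Tuple[List[Tuple[Sequence[float], str]], List[Sequence[float]]]:
    """Split samples into auto-labeled pairs and samples needing manual correction."""
    agreed: List[Tuple[Sequence[float], str]] = []
    disputed: List[Sequence[float]] = []
    for x in unlabeled:
        predictions = {model(x) for model in models}
        if len(predictions) == 1:  # r1 = r2 = r3: accept the shared label
            agreed.append((x, predictions.pop()))
        else:  # disagreement: hand the sample over for manual correction
            disputed.append(x)
    return agreed, disputed


# Toy stand-ins for the trained models f1, f2, f3.
f1 = lambda x: "large" if x[0] > 0.5 else "small"
f2 = lambda x: "large" if x[0] > 0.4 else "small"
f3 = lambda x: "large" if x[0] > 0.6 else "small"

agreed, disputed = consensus_split([f1, f2, f3], [(0.9,), (0.45,), (0.1,)])
print(agreed)    # samples on which all three models agree
print(disputed)  # samples left for manual correction
```

In each iteration the agreed pairs, together with the manually corrected ones, would be appended to $S_0$ before the models are retrained.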
③ XML-CNN Enterprise multi-element tag identification:
When the labeled data set reaches a certain size, an XML-CNN deep model is applied to train the full recognition model, the reason being that it represents and learns the association information between labels and thereby improves the overall label identification performance. The XML-CNN model is a variant of the CNN (convolutional neural network) model; compared with other deep models such as bidirectional recurrent neural networks and Transformer models, it runs much more efficiently and achieves the best recognition performance. Specifically, each information dimension of an enterprise is represented at word granularity, followed by convolution and dynamic pooling, then a fully connected layer, and finally output with a sigmoid binary loss, which converts the task into a probability problem over the multi-element labels: a label is output if its probability exceeds a set threshold;
(1)Embedding:
An enterprise is represented across its information dimensions as $e_{1:m} = [e_1, \ldots, e_m] \in \mathbb{R}^{m \times d}$, where $m$ is the total length of the text in the seven information dimensions. The "business scope" and "company profile" fields are length-limited: text longer than 200 is truncated. Each whole number in "registered capital" is treated as a single word. $d$ is the word embedding dimension, typically 100;
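The truncation and stacking described for the embedding step might look like the sketch below. It assumes the text has already been segmented into tokens; the zero vector for unknown tokens, the toy dimension of 4 (the text uses 100) and the name `embed` are illustrative assumptions.

```python
from typing import Dict, List

MAX_LEN = 200  # length limit stated for "business scope" and "company profile"
DIM = 4        # the text uses d = 100; 4 keeps the demonstration small


def embed(
    tokens: List[str], vectors: Dict[str, List[float]], max_len: int = MAX_LEN
) -> List[List[float]]:
    """Truncate a token list and stack one d-dimensional row per token."""
    zero = [0.0] * DIM  # assumed handling of out-of-vocabulary tokens
    return [list(vectors.get(tok, zero)) for tok in tokens[:max_len]]


# A whole number such as "5000" is kept as a single token.
vectors = {"paper": [0.1, 0.2, 0.3, 0.4], "5000": [0.5, 0.5, 0.5, 0.5]}
matrix = embed(["paper", "5000", "unknown"] * 100, vectors)
print(len(matrix), len(matrix[0]))  # rows after truncation, embedding dimension
```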
(2)Convolution:
$c_i = g_c(v^T e_{i:i+h-1})$, where the convolution kernel is $v \in \mathbb{R}^{f \times d}$; $f$ is generally 2, 3 and 4, representing different window sizes that extract N-gram features; different convolution kernels extract semantic information at different levels, and the number of kernels is generally 128. One convolution kernel yields $c = [c_1, \ldots, c_r]$ with $r = m - h + 1$, where $h$ is the window size used in the formula (the $f$ of the kernel);
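A small pure-Python sketch of one convolution kernel sliding over the embedding matrix. The nonlinearity $g_c$ is not specified in the text, so ReLU is assumed here; the toy matrices and the name `conv1d` are illustrative.

```python
from typing import Callable, List


def conv1d(
    e: List[List[float]],
    v: List[List[float]],
    g: Callable[[float], float] = lambda z: max(0.0, z),  # assumed ReLU for g_c
) -> List[float]:
    """Slide an h x d kernel v over the m x d matrix e; returns m - h + 1 values."""
    m, h, d = len(e), len(v), len(v[0])
    out = []
    for i in range(m - h + 1):
        # Inner product of the kernel with the window of rows e[i : i + h].
        z = sum(v[a][b] * e[i + a][b] for a in range(h) for b in range(d))
        out.append(g(z))
    return out


e = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # m = 4 words, d = 2
v = [[1.0, -1.0], [0.5, 0.5]]                          # window size h = 2
print(conv1d(e, v))  # feature map of length m - h + 1 = 3
```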
(3)Dynamic Max Pooling:
After convolution, $c$ is divided into $p$ segments ($p = 3$ in the invention, because the maximum depth of the label system is 3), the maximum of each segment is taken, and the output is $P(c) = [\max\{c_{1:r/p}\}, \ldots, \max\{c_{r-r/p:r}\}]$;
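Dynamic max pooling can be sketched as below. How the segments are formed when r is not divisible by p is not stated in the text; the floor-based boundaries used here are an assumption, as is the name `dynamic_max_pool`.

```python
from typing import List


def dynamic_max_pool(c: List[float], p: int = 3) -> List[float]:
    """Split c into p contiguous chunks and keep the maximum of each chunk."""
    r = len(c)
    if r < p:
        raise ValueError("feature map shorter than the number of chunks")
    # Floor-based boundaries: chunk sizes differ by at most one element.
    bounds = [i * r // p for i in range(p + 1)]
    return [max(c[bounds[i]:bounds[i + 1]]) for i in range(p)]


# Chunks are [0.1, 0.9], [0.3, 0.4] and [0.2, 0.8, 0.7].
print(dynamic_max_pool([0.1, 0.9, 0.3, 0.4, 0.2, 0.8, 0.7], p=3))
```

Compared with a single global maximum, keeping p values per kernel preserves coarse positional information about where in the text a feature fired.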
(4)Fully connected bottleneck layer:
The dynamically pooled result is fed into a bottleneck fully connected layer, i.e., the number of hidden units in this layer is far smaller than the number of labels in the output layer, which has the advantage of improving fitting capability; $F = W_o\, g(W_h P)$, where $W_h \in \mathbb{R}^{h \times tp}$ and $W_o \in \mathbb{R}^{L \times h}$, $t$ is the number of convolution kernels, $h$ is the number of hidden units in this layer, $L$ is the number of output labels, and $g$ is the activation function, for which tanh is adopted. The output layer follows the fully connected layer, and the sigmoid function is used for prediction;
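The shapes of the bottleneck layer can be checked with a small forward pass. The sizes, the random weights and the 0.5 threshold below are toy assumptions (in practice h would be far smaller than L); only the structure $W_o\, g(W_h P)$ with tanh and a sigmoid output comes from the text.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def bottleneck_forward(pooled: List[float], w_h: Matrix, w_o: Matrix) -> List[float]:
    """Compute sigmoid(W_o tanh(W_h P)): one probability per label."""
    hidden = [math.tanh(z) for z in matvec(w_h, pooled)]
    logits = matvec(w_o, hidden)
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


rng = random.Random(0)
t, p, h, L = 4, 3, 2, 6  # kernels, pooling chunks, hidden units, labels (toy sizes)
w_h = [[rng.uniform(-1, 1) for _ in range(t * p)] for _ in range(h)]  # h x (t*p)
w_o = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(L)]      # L x h
probs = bottleneck_forward([rng.random() for _ in range(t * p)], w_h, w_o)

threshold = 0.5  # hypothetical threshold; the text only says "a set threshold"
print([i for i, pr in enumerate(probs) if pr > threshold])
```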
(5)Loss function
The loss function used is a binary classification loss, expressed as:
Figure RE-GDA0002396518040000141
where $\sigma$ is the sigmoid function,
Figure RE-GDA0002396518040000142
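A sketch of a sigmoid-based binary loss over all labels, assuming the standard binary cross-entropy form (the exact expression appears only as a figure). The names `sigmoid` and `bce_loss` are illustrative, and the code is not numerically hardened for very large logits.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(logits: List[List[float]], targets: List[List[int]]) -> float:
    """Mean over samples of the summed per-label binary cross-entropy."""
    total = 0.0
    for f_i, y_i in zip(logits, targets):
        for f_ij, y_ij in zip(f_i, y_i):
            s = sigmoid(f_ij)
            total -= y_ij * math.log(s) + (1 - y_ij) * math.log(1.0 - s)
    return total / len(logits)


# With zero logits every probability is 0.5, so the loss is 2 * ln(2).
print(bce_loss([[0.0, 0.0]], [[1, 0]]))
```

Because each label gets its own sigmoid term, several labels can be active for the same enterprise, which is what the multi-element setting requires.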
For evaluation, the DCG@K and NDCG@K metrics from the ranking field are adopted, with K = 7; in addition, a binary classification rule is imposed as a constraint: if the parent class of a label is predicted incorrectly, the sample is counted as incorrectly predicted regardless of whether the subclass is correct. NDCG stands for normalized discounted cumulative gain: DCG sums the label relevance scores $rel_i$ of each predicted list, each divided by the logarithm of its position, which means that labels ranked nearer the front are more important; NDCG normalizes DCG;
Figure RE-GDA0002396518040000143
Figure RE-GDA0002396518040000151
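The two metrics can be sketched as follows, assuming the common form $\mathrm{DCG@K} = \sum_{i=1}^{K} rel_i / \log_2(i+1)$ and an ideal ordering obtained by sorting the same relevance list; the exact expressions appear only as figures. The parent-class rule is not implemented here, and the function names are illustrative.

```python
import math
from typing import List


def dcg_at_k(rels: List[float], k: int) -> float:
    """Sum of rel_i / log2(i + 1) over the first k ranked positions (i starts at 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:k], start=1))


def ndcg_at_k(rels: List[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


ranked = [1, 0, 1, 1, 0, 0, 1]  # 1 = the label predicted at this rank is correct
print(round(ndcg_at_k(ranked, k=7), 4))
```

A list whose correct labels all appear at the front scores 1.0, and pushing a correct label further down lowers the score.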

Claims (1)

1. An enterprise multi-element label identification method oriented to paper packaging and related industries, characterized by comprising the following steps:
(I) constructing a multi-element label system:
Enterprise multi-element labels are obtained from enterprise data that has no direct commercial value: through cleaning, sorting and mining, information that directly meets business requirements is abstracted and then presented in the form of multiple labels, thereby supporting requirements such as accurate classification of enterprises and efficient, complex queries;
The invention uses seven items of publicly disclosed enterprise feature data, namely company name, registered address, registered capital, company type, affiliated industry, business scope and company profile, to construct an enterprise multi-label system oriented to paper packaging and related industries;
The label system covers five main dimensions: transportation distance, paper packaging demand, paper packaging type, enterprise marketing relationship and industry category; the transportation distance dimension comprises 5 labels, the paper packaging demand 3 labels, the paper packaging type 5 labels, the enterprise marketing relationship 3 labels, and the industry category 198 labels, for a total of 214 labels;
The labels in these five dimensions are described in detail below:
① The transportation distance label summarizes the geographical position relationship between enterprises, taking the values "same district", "same city", "same province", "domestic" and "foreign"; the label is identified mainly from the "registered address" feature data of the enterprise, and some "company name" values also contain information relevant to the transportation distance label;
② The paper packaging demand label summarizes an enterprise's demand for paper packaging, taking the values "large", "medium" and "small"; the label depends mainly on three items of feature data: "affiliated industry", "registered capital" and "company type";
③ The paper packaging type label, taking the values "carton", "carton" and "paper bag", indicates which type the paper packaging required by an enterprise mainly belongs to, and can be identified from the "affiliated industry";
④ The enterprise marketing relationship label, taking the values "peer", "client" and "supplier", classifies the upstream and downstream enterprises of a company; it is identified mainly from the "affiliated industry" and "business scope" feature data. With this label, a company in the paper packaging industry can easily understand the relationship between itself and enterprises in other industries, and adopt different strategies accordingly;
⑤ Industry category labels: these are based mainly on the national industry classification standard published in 2017. Taking into account the market characteristics and business requirements of the paper packaging industry, the national standard is pruned and modified to formulate a new set of industry classification labels oriented to the paper packaging industry, consisting mainly of the manufacturing industry and the transportation, storage and postal industry. Because many industries have no demand for paper packaging, they are represented by an "other industries" label;
In the multi-label system, the transportation distance, paper packaging demand, paper packaging type and enterprise marketing relationship dimensions contain only first-level labels, while the industry category dimension contains three levels of labels with a hierarchical relationship; the labels of the other dimensions are correlated with the industry category labels. In this label system, if an enterprise matches the "other industries" label in the industry category dimension, all of its other labels are null, i.e., the enterprise is not under consideration. Apart from that case, each enterprise receives 7 corresponding multi-element labels;
(II) identifying the multi-element label:
Many learning algorithms for multi-element labels currently exist, and they fall mainly into two categories. The first is methods based on problem transformation (Problem Transformation), whose idea is to transform the problem data so that existing algorithms can be applied. The second is methods based on algorithm adaptation (Algorithm Adaptation), which extend a specific algorithm so that it can handle the multi-element label problem directly. The former are mainly traditional machine learning methods, and the latter are mainly deep-learning-based methods;
Because methods based on traditional machine learning are simple and well suited to identifying single-dimension labels, the invention first uses them to iteratively identify the labels of the four dimensions separately, which addresses the shortage of labeled data. The specific steps are: (1) generate a small amount of labeled data based on rules, and train a KNN model, a decision tree model and a binary classification model on it; (2) use the three trained models to identify the unlabeled data; if the identification results agree, add the data to the labeled data set, and if they disagree, hand the data over for manual correction; (3) repeat the first two steps to iteratively form new labeled data. Once enough labeled data is available, the relationships among the labels are taken into account, and an XML-CNN deep learning model is finally used to identify all types of labels together, which improves accuracy and avoids the later maintenance difficulty of multiple independent models;
① Generating a small amount of initial data based on rules:
Labeled data for the respective labels is generated by rule-based heuristics on the four dimensions of the label system: (1) for the transportation distance label, a place-name lexicon and a company-type lexicon available online are used to search and match against the corresponding enterprise information, mainly matching data that follows regular patterns, and the matches are then converted into the corresponding labels; (2) for the industry category label, the invention uses the formulated paper packaging industry classification standard to compile a simple industry category mapping lexicon, and uses the double-array trie method to match against the leading words of the "affiliated industry" and "business scope" fields, identifying a small number of labels; (3) for the enterprise marketing relationship, the relationship is judged from the identified industry label and the main products; for example, if the industry is the papermaking and paper products industry, the enterprise can be given the "peer" label; (4) for the paper packaging demand and the paper packaging type, rule heuristics based on industry knowledge and business experience are applied to "affiliated industry", "registered capital" and "company type"; for example, if an enterprise belongs to the household appliance industry, its paper packaging type is "carton", and if it is a joint-stock company or its "registered capital" exceeds RMB 50 million, its paper packaging demand is "large". Through these rule heuristics, an initial labeled data set $S_0$ is generated and passed to the next step;
② Multi-model identification iteration:
Starting from the small amount of labeled data $S_0$ generated in the previous step, the invention uses the common traditional algorithms KNN, decision tree and binary classification for iterative training, so as to generate a large amount of labeled data. Specifically, three corresponding models are trained on a single label dimension and then used to predict the unlabeled data; if the predictions of the three models agree, the data can be added to the training data set; if they disagree, the data is manually corrected and then added to the training set, and the next iteration is carried out. When the training data set exceeds a certain size, for example 200,000 samples, a complete multi-element label identification algorithm is built with a deep model. Let the initial data be $S_0$ (split into a training set and a test set) and the unlabeled data set be $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ with $x_i = \{x_{i1}, x_{i2}, \ldots, x_{i6}\}$, i.e., each sample $x_i$ has 6 corresponding feature vectors. Each vector is obtained by segmenting the feature text into words and stacking the word2vec-trained word vectors row by row, with $x_i \in \mathbb{R}^{h \times d}$, where $h$ is the length of each feature vector and $d$ is the word-vector dimension, generally 100. $y_i$ is the corresponding label value, taking values in $[L_1, L_2, \ldots, L_t]$; $n$ is the number of samples and $t$ is the number of labels. During identification, the training samples are converted into binary classification problems on a single label;
②.1 KNN recognition model:
The idea of the KNN model is to compute the distance between every pair of samples and then determine which known samples an unknown sample is closest to; the label of the unknown sample is then decided by voting. The loss function is the commonly used squared loss, and the distance is computed as follows:
Figure RE-FDA0002396518030000041
Generally p = 2 is taken, i.e., the Euclidean distance; for a sample to be predicted, its k nearest neighbouring samples are found, and the most frequent label category among those k samples is taken as the label of the predicted sample;
②.2 decision tree:
For the decision tree, a CART classification tree is selected, and node purity is measured with the Gini coefficient, specifically:
Figure RE-FDA0002396518030000051
where $p_i$ is the probability that a sample belongs to category $i$; the process adopts the random forest ensemble method, and the CART classification tree method is used because this step is a fusion of multiple algorithms;
②.3 Binary classification:
The idea of the binary classification method is to build one classifier per label; the classifier can be logistic regression or an SVM (support vector machine), and the invention selects the SVM. At prediction time, the results of the individual classifiers are added together to obtain the final result;
The three models are trained separately on the data set $S_0$, yielding $f_1, f_2, f_3$, and each model is tuned to its optimum using the F-score as the evaluation criterion. Let the results of identifying the same unlabeled sample be $r_1, r_2, r_3$; if $r_1 = r_2 = r_3$, no correction is needed (the sample here refers to $(x_j, y_j)$); otherwise, the result is corrected manually. The newly generated labeled data is added to $S_0$, and the iteration proceeds in this manner;
③ XML-CNN Enterprise multi-element tag identification:
When the labeled data set reaches a certain size, an XML-CNN deep model is applied to train the full recognition model, the reason being that it represents and learns the association information between labels and thereby improves the overall label identification performance. The XML-CNN model is a variant of the CNN (convolutional neural network) model; compared with other deep models such as bidirectional recurrent neural networks and Transformer models, it runs much more efficiently and achieves the best recognition performance. Specifically, each information dimension of an enterprise is represented at word granularity, followed by convolution and dynamic pooling, then a fully connected layer, and finally output with a sigmoid binary loss, which converts the task into a probability problem over the multi-element labels: a label is output if its probability exceeds a set threshold;
(1)Embedding:
An enterprise is represented across its information dimensions as $e_{1:m} = [e_1, \ldots, e_m] \in \mathbb{R}^{m \times d}$, where $m$ is the total length of the text in the seven information dimensions. The "business scope" and "company profile" fields are length-limited: text longer than 200 is truncated. Each whole number in "registered capital" is treated as a single word. $d$ is the word embedding dimension, typically 100;
(2)Convolution:
$c_i = g_c(v^T e_{i:i+h-1})$, where the convolution kernel is $v \in \mathbb{R}^{f \times d}$; $f$ is generally 2, 3 and 4, representing different window sizes that extract N-gram features; different convolution kernels extract semantic information at different levels, and the number of kernels is generally 128. One convolution kernel yields $c = [c_1, \ldots, c_r]$ with $r = m - h + 1$, where $h$ is the window size used in the formula (the $f$ of the kernel);
(3)Dynamic Max Pooling:
After convolution, $c$ is divided into $p$ segments ($p = 3$ in the invention, because the maximum depth of the label system is 3), the maximum of each segment is taken, and the output is $P(c) = [\max\{c_{1:r/p}\}, \ldots, \max\{c_{r-r/p:r}\}]$;
(4)Fully connected bottleneck layer:
The dynamically pooled result is fed into a bottleneck fully connected layer, i.e., the number of hidden units in this layer is far smaller than the number of labels in the output layer, which has the advantage of improving fitting capability; $F = W_o\, g(W_h P)$, where $W_h \in \mathbb{R}^{h \times tp}$ and $W_o \in \mathbb{R}^{L \times h}$, $t$ is the number of convolution kernels, $h$ is the number of hidden units in this layer, $L$ is the number of output labels, and $g$ is the activation function, for which tanh is adopted. The output layer follows the fully connected layer, and the sigmoid function is used for prediction;
(5)Loss function
The loss function used is a binary classification loss, expressed as:
Figure RE-FDA0002396518030000061
where $\sigma$ is the sigmoid function,
Figure RE-FDA0002396518030000062
For evaluation, the DCG@K and NDCG@K metrics from the ranking field are adopted, with K = 7; in addition, a binary classification rule is imposed as a constraint: if the parent class of a label is predicted incorrectly, the sample is counted as incorrectly predicted regardless of whether the subclass is correct. NDCG stands for normalized discounted cumulative gain: DCG sums the label relevance scores $rel_i$ of each predicted list, each divided by the logarithm of its position, which means that labels ranked nearer the front are more important; NDCG normalizes DCG;
Figure RE-FDA0002396518030000071
Figure RE-FDA0002396518030000072
CN201911335749.0A 2019-12-23 2019-12-23 Enterprise multi-element label identification method for paper package and related industries thereof Withdrawn CN111191001A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911335749.0A CN111191001A (en) 2019-12-23 2019-12-23 Enterprise multi-element label identification method for paper package and related industries thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911335749.0A CN111191001A (en) 2019-12-23 2019-12-23 Enterprise multi-element label identification method for paper package and related industries thereof

Publications (1)

Publication Number Publication Date
CN111191001A true CN111191001A (en) 2020-05-22

Family

ID=70709287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911335749.0A Withdrawn CN111191001A (en) 2019-12-23 2019-12-23 Enterprise multi-element label identification method for paper package and related industries thereof

Country Status (1)

Country Link
CN (1) CN111191001A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680895A (en) * 2020-05-26 2020-09-18 中国平安财产保险股份有限公司 Data automatic labeling method and device, computer equipment and storage medium
CN112580332A (en) * 2020-11-19 2021-03-30 淮阴工学院 Enterprise portrait method based on label layering and deepening modeling
CN113159709A (en) * 2021-03-24 2021-07-23 深圳闪回科技有限公司 Automatic label system and system
CN113298352A (en) * 2021-04-28 2021-08-24 北京网核精策科技管理中心(有限合伙) Enterprise industry information processing method and device, electronic equipment and readable storage medium
CN113378907A (en) * 2021-06-04 2021-09-10 南京大学 Automatic software traceability recovery method for enhancing data preprocessing process
US20230162020A1 (en) * 2021-11-23 2023-05-25 Microsoft Technology Licensing, Llc Multi-Task Sequence Tagging with Injection of Supplemental Information
CN118984254A (en) * 2024-10-22 2024-11-19 江苏康缘药业股份有限公司 A standardized module and method for connecting traditional Chinese medicine enterprise nodes with secondary nodes

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045911A (en) * 2015-08-12 2015-11-11 北京搜狗科技发展有限公司 Label generating method for user to mark and label generating equipment for user to mark
CN106777335A (en) * 2017-01-13 2017-05-31 深圳爱拼信息科技有限公司 It is a kind of to be remembered based on shot and long term(LSTM)The multi-tag trade classification method and device of model
CN107133293A (en) * 2017-04-25 2017-09-05 中国科学院计算技术研究所 A kind of ML kNN improved methods and system classified suitable for multi-tag
CN107944480A (en) * 2017-11-16 2018-04-20 广州探迹科技有限公司 A kind of enterprises ' industry sorting technique
CN108536800A (en) * 2018-04-03 2018-09-14 有米科技股份有限公司 File classification method, system, computer equipment and storage medium
CN109582792A (en) * 2018-11-16 2019-04-05 北京奇虎科技有限公司 A kind of method and device of text classification
CN109783818A (en) * 2019-01-17 2019-05-21 上海三零卫士信息安全有限公司 A kind of enterprises ' industry multi-tag classification method
CN110532542A (en) * 2019-07-15 2019-12-03 西安交通大学 It is a kind of that recognition methods and system are write out falsely with the invoice for not marking study based on positive example

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JINGZHOU LIU等: "Deep Learning for Extreme Multi-label Text Classification", 《SIGIR "17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680895A (en) * 2020-05-26 2020-09-18 中国平安财产保险股份有限公司 Data automatic labeling method and device, computer equipment and storage medium
CN112580332A (en) * 2020-11-19 2021-03-30 淮阴工学院 Enterprise portrait method based on label layering and deepening modeling
CN112580332B (en) * 2020-11-19 2022-07-12 淮阴工学院 An enterprise portrait method based on label layered and deep modeling
CN113159709A (en) * 2021-03-24 2021-07-23 深圳闪回科技有限公司 Automatic label system and system
CN113298352A (en) * 2021-04-28 2021-08-24 北京网核精策科技管理中心(有限合伙) Enterprise industry information processing method and device, electronic equipment and readable storage medium
CN113378907A (en) * 2021-06-04 2021-09-10 南京大学 Automatic software traceability recovery method for enhancing data preprocessing process
CN113378907B (en) * 2021-06-04 2024-01-09 南京大学 Automated software traceability recovery method for enhancing data preprocessing process
US20230162020A1 (en) * 2021-11-23 2023-05-25 Microsoft Technology Licensing, Llc Multi-Task Sequence Tagging with Injection of Supplemental Information
US12353998B2 (en) * 2021-11-23 2025-07-08 Microsoft Technology Licensing, Llc Multi-task sequence tagging with injection of supplemental information
CN118984254A (en) * 2024-10-22 2024-11-19 江苏康缘药业股份有限公司 A standardized module and method for connecting traditional Chinese medicine enterprise nodes with secondary nodes
CN118984254B (en) * 2024-10-22 2025-03-18 江苏康缘药业股份有限公司 A standardized module and method for connecting traditional Chinese medicine enterprise nodes with secondary nodes

Similar Documents

Publication Publication Date Title
CN111191001A (en) Enterprise multi-element label identification method for paper package and related industries thereof
CN116541911B (en) Packaging design system based on artificial intelligence
CN100464332C (en) Picture inquiry method and system
CN106407352A (en) Traffic image retrieval method based on depth learning
CN103810299A (en) Image retrieval method on basis of multi-feature fusion
CN106599037A (en) Recommendation method based on label semantic normalization
CN114723994B Hyperspectral image classification method based on dual-classifier adversarial enhancement network
CN107391565B (en) Matching method of cross-language hierarchical classification system based on topic model
CN106778834A An AP clustering image annotation method based on distance metric learning
CN113378913A (en) Semi-supervised node classification method based on self-supervised learning
CN114817454A (en) NLP knowledge graph construction method combining information content and BERT-BiLSTM-CRF
CN107169061A A text multi-label classification method fusing dual information sources
CN114818963A (en) Small sample detection algorithm based on cross-image feature fusion
CN112465226B (en) User behavior prediction method based on feature interaction and graph neural network
CN117173702A (en) Multi-view multi-label learning method based on deep feature map fusion
CN113076490A (en) Case-related microblog object-level emotion classification method based on mixed node graph
CN116934531A (en) An intelligent management method and system for wine information based on data analysis
CN108876643A A multimodal representation method for collected items (Pins) on social curation networks
CN102663445A (en) Image understanding system based on layered temporal memory algorithm and image understanding method thereof
CN109697257A A network information retrieval method based on pre-classification and noise-resistant feature learning
CN111339303B (en) Text intention induction method and device based on clustering and automatic abstracting
CN116823321B Method and system for analyzing economic management data of e-commerce
CN113254688A (en) Trademark retrieval method based on deep hash
CN119807335A (en) A method and system for generating customer service information based on preset fields
CN117493962A (en) Method and device for classifying bulk commodity events by fusing event attributes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200522
