CN111191001A - Enterprise multi-element label identification method for paper package and related industries thereof - Google Patents
- Publication number
- CN111191001A (application CN201911335749.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06F16/3344—Query execution using natural language analysis
- G06F16/35—Clustering; Classification
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The invention relates to an enterprise multi-element label identification method, in particular to one for the paper packaging and related industries. It provides a hierarchical iterative identification framework that derives models from a small amount of labeled data to construct the large labeled data set required, and finally applies a lightweight XML-CNN deep model to improve overall label identification efficiency. The resulting multi-element label identification method for enterprises in the paper packaging and related industries improves the retrieval efficiency of complex enterprise information.
Description
Technical Field
The invention relates to an enterprise multi-element label identification method, in particular to one for the paper packaging and related industries.
Background
An enterprise multi-element label scheme attaches several attribute labels to one enterprise entity, each label being a high-level abstract summary of one dimension of information about the enterprise. Although enterprise multi-element labels can be applied in many fields, such as accurate enterprise classification, complex information retrieval and effective label recommendation [1,2], their application is not yet mature, for two main reasons: first, existing multi-element label systems are narrow and can hardly meet the personalized needs of vertical industries; second, multi-label identification techniques require a large amount of labeled data.
Current enterprise multi-element label systems focus mainly on dimensions such as industry category and main products, for example 'food manufacturing', 'cultural entertainment' and 'milk tea'. In the paper packaging industry, however, enterprises care more about deeper information such as logistics transportation distance, marketing relationships between enterprises and demand for paper packaging, which existing label systems cannot capture. The invention therefore designs concept labels such as 'same district', 'same city' and 'same province' for the logistics transportation distance dimension, and concept labels such as 'customer', 'peer' and 'supplier' for the enterprise marketing relationship dimension. This label system lets marketing staff in the paper packaging industry quickly see how far a target enterprise is from their own enterprise and what relationship the two may have.
In enterprise multi-element label identification, deep learning methods need a large amount of labeled data, yet in practice labeled data for enterprise multi-element labels are extremely scarce, and labeling by hand consumes considerable labor. The invention therefore first uses iterative identification that compares several traditional models to generate a large amount of high-quality labeled data at reduced manual labeling cost, and then applies a lightweight XML-CNN deep model to improve overall label identification efficiency. Combining these two steps yields a hierarchical iterative identification framework based on deep learning.
A patent similar to the present invention, 'A multi-label classification method for enterprise industry' (CN109783818A) [3], uses a two-layer recurrent neural network to identify enterprise industry-class labels and does not identify deeper labels tailored to the specific needs of particular industries. Moreover, compared with other deep models, the two-layer recurrent network is complex and inefficient to run and deploy. To address this, the invention applies the lightweight XML-CNN deep model [4] to enterprise multi-element label identification. Built on a convolutional neural network (CNN), it uses dynamic max pooling and a bottleneck hidden layer, and its overall identification performance is better than that of other deep models.
Disclosure of Invention
The main aim of the invention is to overcome these shortcomings of the prior art, in particular the fact that the large amount of labeled data needed by deep learning is often unavailable in real applications. It provides a hierarchical iterative identification framework that derives models from a small amount of labeled data to construct the required large data set, and finally applies a lightweight XML-CNN deep model to improve overall label identification efficiency.
The technical problem of the invention is mainly solved by the following technical scheme:
An enterprise multi-element label identification method for the paper packaging and related industries comprises the following steps:
(I) Constructing the multi-element label system:
Enterprise multi-element labels take enterprise data that have no direct commercial value, abstract from them, through cleaning, sorting and mining, information that directly meets business needs, and present it as a set of labels, supporting needs such as accurate enterprise classification and efficient complex queries;
The invention uses seven items of publicly disclosed enterprise data (company name, registered address, registered capital, company type, affiliated industry, business scope and company profile) to construct an enterprise multi-label system for the paper packaging and related industries;
The label system covers five dimensions: transportation distance, paper packaging demand, paper packaging type, enterprise marketing relationship and industry category. Transportation distance has 5 labels, paper packaging demand 3, paper packaging type 5, enterprise marketing relationship 3 and industry category 198, for a total of 214 labels;
The labels in these five dimensions are described in detail below:
① The transportation distance label summarizes the geographical relationship between enterprises and takes the values 'same district', 'same city', 'same province', 'domestic' and 'foreign'. It is identified mainly from the enterprise's 'registered address' field, although some 'company name' values also carry information relevant to this label;
② The paper packaging demand label summarizes an enterprise's demand for paper packaging and takes the values 'large', 'medium' and 'small'. It depends mainly on three fields: 'affiliated industry', 'registered capital' and 'company type';
③ The paper packaging type label indicates which type of paper packaging an enterprise mainly needs, with values including 'carton', 'paper box' and 'paper bag', and can be judged from the 'affiliated industry' field;
④ The enterprise marketing relationship label takes the values 'peer', 'customer' and 'supplier' and classifies enterprises as upstream or downstream of the company. It is identified mainly from the 'affiliated industry' and 'business scope' fields. With these labels, a company in the paper packaging industry can easily see how enterprises in other industries relate to it and adopt different strategies accordingly;
⑤ The industry category labels draw mainly on the national industry classification standard published in 2017, trimmed and modified to fit the market characteristics and business needs of the paper packaging industry, yielding a new set of industry classification labels for that industry. They mainly cover the manufacturing industry and the transportation, storage and postal industry; because many industries have no demand for paper packaging, those industries are grouped under an 'other industries' label;
In this label system, the transportation distance, paper packaging demand, paper packaging type and enterprise marketing relationship dimensions contain only first-level labels, while the industry category dimension has three hierarchical levels, and the labels of the other dimensions are correlated with the industry category labels. If an enterprise matches the 'other industries' label in the industry category dimension, all its other labels are null, i.e. it is not an enterprise of interest. Otherwise, each enterprise receives 7 multi-element labels: one in each of the four single-level dimensions plus one at each of the three industry levels;
(II) Identifying the multi-element labels:
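As a rough illustration of how one enterprise's label record could be represented, the Python sketch below encodes the value sets of the four single-level dimensions and the 'other industries' null rule stated in the next paragraph. The dictionary keys, the function name build_label_record and the example industry path are hypothetical; the packaging-type list holds only the three values the text names (the system defines five, and the English renderings are assumptions), and the 198 industry labels are not enumerated.

```python
from typing import Optional, Tuple

# Hypothetical encoding of the four single-level dimensions (values from the text).
LABEL_SYSTEM = {
    "transport_distance": ["same district", "same city", "same province", "domestic", "foreign"],
    "packaging_demand": ["large", "medium", "small"],
    "packaging_type": ["carton", "paper box", "paper bag"],  # partial: the system defines 5
    "marketing_relation": ["peer", "customer", "supplier"],
}
OTHER_INDUSTRIES = "other industries"


def build_label_record(industry: Tuple[Optional[str], Optional[str], Optional[str]],
                       **dims: str) -> dict:
    """Return the 7 labels of one enterprise: 4 single-level + 3 industry levels."""
    l1, l2, l3 = industry
    if l1 == OTHER_INDUSTRIES:
        # Out-of-scope enterprise: every other label is null.
        record = {dim: None for dim in LABEL_SYSTEM}
        record.update(industry_l1=OTHER_INDUSTRIES, industry_l2=None, industry_l3=None)
        return record
    record = {}
    for dim, allowed in LABEL_SYSTEM.items():
        value = dims[dim]
        if value not in allowed:
            raise ValueError(f"{value!r} is not a valid {dim} label")
        record[dim] = value
    record.update(industry_l1=l1, industry_l2=l2, industry_l3=l3)
    return record


rec = build_label_record(
    ("manufacturing", "paper and paper products", "paper containers"),  # hypothetical path
    transport_distance="same city", packaging_demand="large",
    packaging_type="carton", marketing_relation="peer",
)
print(len(rec))  # 7
```

Validating each value against its dimension's allowed list keeps rule-generated and model-generated labels inside the agreed label system.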
Many learning algorithms exist for multi-element labels, and they fall into two main categories. The first, Problem Transformation, transforms the problem data so that existing algorithms can be applied; the second, Algorithm Adaptation, extends a specific algorithm so that it can handle multi-element labels directly. The former consists mainly of traditional machine learning methods, the latter mainly of deep learning methods;
Because traditional machine learning methods are simple and suited to identifying labels of a single dimension, the invention first uses them to iteratively identify the labels of the four dimensions separately, which addresses the shortage of labeled data. The specific steps are: (1) generate a small amount of labeled data from rules, and train a KNN model, a decision tree model and a binary classification model on it; (2) use the three trained models to identify unlabeled data, adding a sample to the labeled data set if the three results agree and handing it over for manual correction if they disagree; (3) repeat the first two steps to iteratively produce new labeled data. Once enough labeled data exist, an XML-CNN deep learning model that takes the correlations among labels into account identifies all label types jointly, which improves accuracy and avoids the maintenance burden of many independent models;
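The problem-transformation route can be pictured with binary relevance, one common transformation scheme (others, such as label powerset, exist; the text does not say which one it has in mind). This sketch turns each sample's label set into a 0/1 indicator row and splits the matrix into one target column per label; the function name and the toy vocabulary are hypothetical.

```python
def to_binary_relevance(label_sets, label_vocab):
    """Problem transformation (binary relevance): one 0/1 target column per label."""
    index = {label: j for j, label in enumerate(label_vocab)}
    matrix = [[0] * len(label_vocab) for _ in label_sets]
    for i, labels in enumerate(label_sets):
        for label in labels:
            matrix[i][index[label]] = 1
    # Column j is the target vector of an independent binary classifier for label j.
    per_label = {label: [row[j] for row in matrix] for label, j in index.items()}
    return matrix, per_label


vocab = ["same city", "large", "carton", "peer"]
matrix, per_label = to_binary_relevance(
    [{"same city", "carton"}, {"large", "carton", "peer"}], vocab)
print(matrix)               # [[1, 0, 1, 0], [0, 1, 1, 1]]
print(per_label["carton"])  # [1, 1]
```

Each column can then be fed to any ordinary binary classifier, which is what makes the traditional models usable here.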
① Generating a small amount of initial data from rules:
Labeled data for the individual labels are generated with rule heuristics in four areas of the label system. (1) For the transportation distance label, place-name and company-type lexicons from the web are used to search and match the corresponding enterprise information; matches that satisfy the comparison rules are converted into the corresponding label. (2) For the industry category label, a simple industry-category mapping lexicon is compiled from the paper packaging industry classification standard established above, and a double-array trie is used to match the leading words of the 'affiliated industry' and 'business scope' fields, identifying a small number of labels. (3) For the enterprise marketing relationship, the relationship is inferred from the identified industry label and the main products; for example, an enterprise in the papermaking and paper products industry can be labeled 'peer'. (4) For paper packaging demand and paper packaging type, rule heuristics on 'affiliated industry', 'registered capital' and 'company type' are derived from industry knowledge and business experience; for example, if an enterprise is in the household appliance industry its paper packaging type is 'carton', and if it is a joint-stock company or its registered capital exceeds 50 million RMB its paper packaging demand is 'large'. These rule heuristics produce an initial labeled data set $S_0$, which enters the next stage;
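The rule heuristics above might be sketched as follows. Only the example rules quoted in the text are encoded; address parsing into country/province/city/district and the lexicons are assumed to happen upstream, all function names are hypothetical, and a plain longest-prefix dictionary lookup stands in for the double-array trie (same matching result, worse performance).

```python
# Hypothetical rule heuristics for seeding S_0; only the rules quoted in the text.
LARGE_CAPITAL_RMB = 50_000_000  # "registered capital above 50 million RMB"
PACKAGING_TYPE_RULES = {"household appliances": "carton"}


def transport_distance(own: dict, other: dict) -> str:
    """Compare parsed addresses (country/province/city/district) of two enterprises."""
    if other["country"] != own["country"]:
        return "foreign"
    if other["province"] != own["province"]:
        return "domestic"
    if other["city"] != own["city"]:
        return "same province"
    if other["district"] != own["district"]:
        return "same city"
    return "same district"


def match_industry(text: str, lexicon: dict):
    """Longest-prefix lexicon lookup (a plain stand-in for the double-array trie)."""
    for end in range(len(text), 0, -1):
        if text[:end] in lexicon:
            return lexicon[text[:end]]
    return None


def packaging_demand(registered_capital_rmb: float, company_type: str):
    if "joint-stock" in company_type or registered_capital_rmb > LARGE_CAPITAL_RMB:
        return "large"
    return None  # no rule fires: leave the sample unlabeled rather than guess


def marketing_relation(industry: str):
    return "peer" if industry == "papermaking and paper products" else None


own = {"country": "CN", "province": "Zhejiang", "city": "Hangzhou", "district": "Xihu"}
other = dict(own, district="Binjiang")
print(transport_distance(own, other))  # same city
```

Returning None when no rule fires keeps $S_0$ small but high-precision, which is what the later consensus iterations rely on.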
② Multi-model iterative identification:
Starting from the small labeled data set $S_0$ generated in the previous step, the invention uses the traditional, widely used KNN, decision tree and binary classification methods for iterative training to generate a large amount of labeled data. Specifically, three models are trained for a single label dimension and then predict the unlabeled data. If the three models' predictions agree, the sample is added to the training set; if they disagree, it is corrected manually before being added, and the next iteration begins. Once the training set exceeds a certain size, for example 200,000 samples, a complete multi-element label identification algorithm is built with a deep model. Let the initial data be $S_0$ (split into training and test sets) and the unlabeled data set be $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $x_i = \{x_{i1}, x_{i2}, \dots, x_{i6}\}$ means each sample has 6 feature vectors. Each feature vector is obtained by segmenting the feature text into words and stacking, row by row, the word vectors trained with word2vec, so that $x_{ij} \in \mathbb{R}^{h \times d}$, where $h$ is the length of the feature vector and $d$ is the word-vector dimension, typically 100. $y_i = [L_1, L_2, \dots, L_t]$ is the corresponding label vector, $n$ is the number of samples and $t$ is the number of labels. During identification, each training sample is converted into per-label binary classification problems;
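The feature construction described above (word2vec vectors of a segmented field stacked row by row into an $h \times d$ matrix) could look like this sketch. Truncating long fields to $h$ rows, zero-padding short ones, mapping out-of-vocabulary words to zero vectors and the default $h = 50$ are all assumptions, as are the function names; the toy vectors use $d = 2$ instead of 100.

```python
from typing import Dict, List, Sequence


def field_matrix(tokens: Sequence[str], word_vectors: Dict[str, List[float]],
                 h: int, d: int) -> List[List[float]]:
    """Stack word vectors of one segmented field row-wise into an h x d matrix."""
    # Assumption: unknown words map to a zero vector; fields longer than h are cut.
    rows = [list(word_vectors.get(tok, [0.0] * d)) for tok in tokens[:h]]
    rows += [[0.0] * d for _ in range(h - len(rows))]  # zero-pad short fields
    return rows


def sample_features(fields: Sequence[Sequence[str]], word_vectors, h=50, d=100):
    """x_i = {x_i1, ..., x_i6}: one h x d matrix per segmented feature field."""
    return [field_matrix(tokens, word_vectors, h, d) for tokens in fields]


vecs = {"paper": [1.0, 0.0], "box": [0.0, 1.0]}  # toy word2vec vectors, d = 2
print(field_matrix(["paper", "box"], vecs, h=3, d=2))
# [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
```

Fixing $h$ per field gives every sample the same shape, which distance-based models such as KNN need.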
②.1 KNN identification model:
The idea of KNN is to compute the distance between pairs of samples, find which known samples an unknown sample is closest to, and then determine the unknown sample's label by voting. The loss function is the commonly used squared loss, and the distance between samples $x_i$ and $x_j$ with $N$ feature components is computed as
$$L_p(x_i, x_j) = \left( \sum_{l=1}^{N} \left| x_i^{(l)} - x_j^{(l)} \right|^p \right)^{1/p}$$
Usually $p = 2$ (the Euclidean distance) is used. The $k$ samples nearest to the prediction sample are found, and the label that occurs most often among these $k$ samples is taken as the label of the prediction sample;
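A minimal sketch of the KNN step just described, with the Minkowski distance ($p = 2$ by default) and a majority vote over the $k$ nearest samples. Feature vectors are assumed to be flat lists of floats, and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2) -> float:
    """L_p distance; p = 2 gives the Euclidean distance used in the text."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(train_x: List[Sequence[float]], train_y: List[str],
                query: Sequence[float], k: int = 3, p: float = 2) -> str:
    # Indices of the k training samples closest to the query.
    nearest = sorted(range(len(train_x)), key=lambda i: minkowski(train_x[i], query, p))[:k]
    # Majority vote over their labels (ties go to the label seen first among the neighbours).
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]
train_y = ["small", "small", "small", "large", "large"]
print(knn_predict(train_x, train_y, [5, 5.5]))  # large
```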
②.2 Decision tree:
A CART classification tree is chosen, with impurity measured by the Gini coefficient, summed over the $K$ classes:
$$\mathrm{Gini}(p) = 1 - \sum_{i=1}^{K} p_i^2$$
where $p_i$ is the probability that a sample belongs to class $i$. An ensemble method such as a random forest could be used here, but because this step already fuses several algorithms, a single CART classification tree is adopted;
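The Gini measure above, together with the size-weighted Gini that a CART tree minimizes when it chooses a split, can be sketched in a few lines. Class probabilities are estimated from label frequencies; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_i p_i^2, with p_i estimated from label frequencies."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left: Sequence[str], right: Sequence[str]) -> float:
    """Size-weighted Gini of a candidate split; CART picks the split minimising it."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


print(gini(["carton", "paper bag"]))  # 0.5
```

A pure node scores 0, and a 50/50 node with two classes scores 0.5, so lower values mean purer splits.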
②.3 Binary classification:
The binary classification method builds one classifier per label; the classifier can be logistic regression or an SVM (support vector machine), and the invention chooses the SVM. At prediction time, the outputs of all the per-label classifiers are combined to obtain the final result;
The three models, denoted $f_1$, $f_2$ and $f_3$, are trained on the data set $S_0$, and each is tuned to its best F-score. Let $r_1$, $r_2$ and $r_3$ be their predictions for the same unlabeled sample $(x_j, y_j)$. If $r_1 = r_2 = r_3$, no correction is needed; otherwise the sample is corrected manually. The newly labeled data are added to $S_0$ and the procedure is iterated;
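Combining the per-label classifiers at prediction time might look like the sketch below. Each label is assumed to have a trained linear classifier, represented by its weight vector and bias (a linear SVM's decision function has the form $w \cdot x + b$); the weights in the example are made up, and a label is emitted when its decision value is positive.

```python
from typing import Dict, List, Sequence, Set, Tuple


def predict_label_set(x: Sequence[float],
                      classifiers: Dict[str, Tuple[List[float], float]]) -> Set[str]:
    """Binary relevance: keep every label whose decision function w.x + b is positive."""
    return {label for label, (w, b) in classifiers.items()
            if sum(wi * xi for wi, xi in zip(w, x)) + b > 0}


# Hypothetical weights standing in for two trained linear SVMs.
classifiers = {"carton": ([1.0, 0.0], -0.5), "paper bag": ([0.0, 1.0], -0.5)}
print(predict_label_set([1.0, 0.0], classifiers))  # {'carton'}
```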
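The consensus iteration can be summarized as follows. In this sketch, train is a hypothetical callback that retrains the three models on the current labeled set and returns them as callables, and ask_human is a hypothetical hook for manual correction; the default target of 200,000 samples follows the example size given in the text.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Model = Callable[[object], object]  # hypothetical: a trained model maps a sample to a label


def consensus_round(models: Sequence[Model], unlabeled: Iterable,
                    ask_human: Callable) -> List[Tuple]:
    """Accept samples on which f1, f2, f3 agree; route disagreements to manual correction."""
    agreed, corrected = [], []
    for x in unlabeled:
        r1, r2, r3 = (f(x) for f in models)
        if r1 == r2 == r3:
            agreed.append((x, r1))
        else:
            corrected.append((x, ask_human(x, (r1, r2, r3))))
    return agreed + corrected


def iterate_labeling(s0, unlabeled_batches, train, ask_human, target_size=200_000):
    """Grow S_0 batch by batch until it reaches target_size samples."""
    labeled = list(s0)
    for batch in unlabeled_batches:
        if len(labeled) >= target_size:
            break
        models = train(labeled)  # hypothetical: retrain KNN, CART and the binary SVMs
        labeled += consensus_round(models, batch, ask_human)
    return labeled


models = [lambda x: x > 0, lambda x: x > 0, lambda x: x > 1]  # toy stand-ins
print(consensus_round(models, [2, 0.5, -1], lambda x, rs: "checked"))
# [(2, True), (-1, False), (0.5, 'checked')]
```

Retraining on every round lets the growing labeled set feed back into the three models, which is the point of the iteration.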
③ XML-CNN enterprise multi-element label identification:
Once the labeled data set reaches a sufficient size, an XML-CNN deep model is trained as the full identification model, because it represents and learns the correlations between labels and thereby improves overall label identification. XML-CNN is a variant of the CNN (convolutional neural network) model; compared with other deep models such as bidirectional recurrent neural networks and Transformer models, it runs much more efficiently and gives the best identification results. Specifically, each information dimension of the enterprise is represented at word granularity, followed by convolution and dynamic max pooling and then a fully connected layer; the output uses sigmoid units with a binary loss, turning multi-element labeling into a per-label probability problem, and a label is output if its probability exceeds a set threshold;
(1) Embedding:
The enterprise is represented by its information dimensions as $e_{1:m} = [e_1, \dots, e_m] \in \mathbb{R}^{m \times d}$, where $m$ is the total text length across the seven information dimensions and $d$ is the word-embedding dimension, typically 100. The 'business scope' and 'company profile' fields are length-limited and are truncated when their length exceeds 200, and the whole number in 'registered capital' is treated as a single word;
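Assembling the input sequence $e_{1:m}$ could be sketched as below. The field keys, the tokenize callback and the assumption that the 200 limit is counted in tokens are hypothetical; the embedding lookup that turns each token into a $d$-dimensional vector is left out.

```python
from typing import Callable, Dict, List

MAX_LONG_FIELD_TOKENS = 200  # assumption: the 200 limit is counted in tokens
LONG_FIELDS = {"business_scope", "company_profile"}


def build_token_sequence(fields: Dict[str, str],
                         tokenize: Callable[[str], List[str]]) -> List[str]:
    """Concatenate the enterprise's fields into one token sequence e_1..e_m."""
    tokens: List[str] = []
    for name, text in fields.items():
        if name == "registered_capital":
            tokens.append(text.strip())  # the whole figure is kept as a single word
            continue
        field_tokens = tokenize(text)
        if name in LONG_FIELDS:
            field_tokens = field_tokens[:MAX_LONG_FIELD_TOKENS]
        tokens.extend(field_tokens)
    return tokens  # each token is then looked up as a d-dimensional embedding


fields = {"company_name": "Acme Paper Co", "registered_capital": "5000",
          "business_scope": " ".join(["w"] * 250)}
print(len(build_token_sequence(fields, str.split)))  # 204
```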
(2) Convolution:
$c_i = g_c(v^{\top} e_{i:i+f-1})$, where the convolution kernel is $v \in \mathbb{R}^{f \times d}$ and $g_c$ is the activation function. Typically $f$ = 2, 3 and 4; the different window sizes extract N-gram features, and different kernels extract semantic information at different levels, usually with 128 kernels. One kernel yields $c = [c_1, \dots, c_r]$ with $r = m - f + 1$;
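One convolution kernel sliding over the embedded sequence can be sketched as follows. The activation $g_c$ is not specified in the text, so ReLU is assumed here, and the bias term (default 0) is an addition not present in the formula; the function names are hypothetical.

```python
from typing import Callable, List


def relu(z: float) -> float:
    return max(0.0, z)


def conv1d(e: List[List[float]], v: List[List[float]], bias: float = 0.0,
           g_c: Callable[[float], float] = relu) -> List[float]:
    """c_i = g_c(<v, e_{i:i+f-1}>) for an f-row window; returns r = m - f + 1 values."""
    m, f = len(e), len(v)
    c = []
    for i in range(m - f + 1):
        window = e[i:i + f]
        # Elementwise product of the f x d kernel with the f x d window, summed.
        s = sum(vk * ek for vrow, erow in zip(v, window) for vk, ek in zip(vrow, erow))
        c.append(g_c(s + bias))
    return c


e = [[1, 0], [0, 1], [1, 1], [0, 0]]  # m = 4 tokens, d = 2
v = [[1, 0], [0, 1]]                  # f = 2 (a bigram window)
print(conv1d(e, v))  # [2.0, 1.0, 1.0]
```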
(3) Dynamic max pooling:
After convolution, $c$ is divided into $p$ segments ($p = 3$ in the invention, because the label system is at most 3 levels deep), the maximum of each segment is taken, and the results are output: $P(c) = [\max\{c_{1:r/p}\}, \dots, \max\{c_{r-r/p+1:r}\}]$;
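Dynamic max pooling as described keeps one maximum per segment. The formula assumes $r$ is divisible by $p$; this sketch assumes floor-division segment boundaries when it is not, which keeps every segment non-empty as long as $r \ge p$. The function name is hypothetical.

```python
from typing import List, Sequence


def dynamic_max_pool(c: Sequence[float], p: int = 3) -> List[float]:
    """Split c into p contiguous, near-equal segments and keep each segment's maximum."""
    r = len(c)
    if r < p:
        raise ValueError("need at least p convolution outputs")
    bounds = [j * r // p for j in range(p + 1)]  # floor division keeps every segment non-empty
    return [max(c[bounds[j]:bounds[j + 1]]) for j in range(p)]


print(dynamic_max_pool([1, 5, 2, 8, 3, 0, 4, 7, 6]))  # [5, 8, 7]
```

Compared with a single global maximum, keeping $p$ maxima preserves coarse positional information about where in the text each feature fired.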
(4) Fully connected bottleneck layer:
The dynamically pooled output is fed into a bottleneck fully connected layer, i.e. one whose number of hidden units is far smaller than the number of output labels, which improves the model's fitting ability: $f = W_o\, g(W_h P)$, where $W_h \in \mathbb{R}^{h \times tp}$ and $W_o \in \mathbb{R}^{L \times h}$, $t$ is the number of convolution kernels, $h$ is the number of hidden units in this layer, $L$ is the number of output labels and $g$ is the activation function, here tanh. The output layer follows the fully connected layer and uses the sigmoid function for prediction;
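The bottleneck layer and sigmoid output can be sketched as a plain forward pass. $P$ is taken to be the concatenation of the pooled outputs of all $t$ kernels (length $tp$); the weight matrices in the example are made up, and the 0.5 threshold is an assumption since the text only speaks of "a set threshold".

```python
import math
from typing import List, Sequence, Tuple


def matvec(W: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def bottleneck_forward(P: Sequence[float], W_h, W_o, threshold: float = 0.5
                       ) -> Tuple[List[float], List[int]]:
    """Return label probabilities and the indices of labels above the threshold."""
    hidden = [math.tanh(z) for z in matvec(W_h, P)]  # h bottleneck units, g = tanh
    logits = matvec(W_o, hidden)                     # f = W_o g(W_h P), one per label
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return probs, [j for j, prob in enumerate(probs) if prob > threshold]


# Toy sizes: tp = 2 pooled features, h = 1 hidden unit, L = 3 labels.
probs, labels = bottleneck_forward([2.0, 1.0], [[1.0, 0.0]], [[5.0], [-5.0], [0.0]])
print(labels)  # [0]
```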
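(5) Loss function:
The loss is the binary cross-entropy over the sigmoid outputs:
$$\mathcal{L} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{L} \left[ y_{ij} \log \sigma(f_{ij}) + (1 - y_{ij}) \log\left(1 - \sigma(f_{ij})\right) \right]$$
where $\sigma$ is the sigmoid function, $f_{ij}$ is the output for label $j$ of sample $i$, and $y_{ij} \in \{0, 1\}$ is the true label;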
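The binary (sigmoid) cross-entropy commonly paired with this kind of output layer can be computed as below, summing over labels and averaging over samples. The clipping constant eps is an assumption added to avoid log(0); the function name is hypothetical.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[Sequence[int]],
                         probs: Sequence[Sequence[float]], eps: float = 1e-12) -> float:
    """Mean over samples of the per-label binary cross-entropy, summed over labels."""
    total = 0.0
    for ys, ps in zip(y_true, probs):
        for y, p in zip(ys, ps):
            p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


print(round(binary_cross_entropy([[1, 0]], [[0.5, 0.5]]), 4))  # 1.3863
```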
For evaluation, the ranking metrics DCG@K and NDCG@K are used with $K = 7$, with an additional rule: if the parent class of a label is predicted wrongly, the prediction counts as wrong regardless of whether the subclass is correct. NDCG is the normalized discounted cumulative gain: DCG sums the relevance score $rel_i$ of each label in the prediction list divided by the logarithm of its position, so labels ranked earlier matter more, and NDCG normalizes DCG.
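A sketch of the evaluation under stated assumptions: relevance is binary, the parent rule zeroes a level and every deeper level once a level is wrong, positions are discounted by $\log_2(i + 1)$ for 1-based position $i$ (the usual choice; the text only says "the logarithm of the position"), and the ideal DCG assumes all relevant labels are ranked first. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def parent_aware_relevance(pred_path: Sequence[str], true_path: Sequence[str]) -> List[int]:
    """Once a level is wrong, that level and every deeper level count as wrong."""
    rels, ok = [], True
    for pred, true in zip(pred_path, true_path):
        ok = ok and pred == true
        rels.append(1 if ok else 0)
    return rels


def dcg_at_k(rels: Sequence[float], k: int = 7) -> float:
    # 0-based index i is discounted by log2(i + 2), so earlier labels weigh more.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))


def ndcg_at_k(rels: Sequence[float], n_relevant: int, k: int = 7) -> float:
    """Binary-relevance NDCG@k: DCG over the best DCG achievable with n_relevant hits."""
    ideal = sum(1 / math.log2(i + 2) for i in range(min(k, n_relevant)))
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


print(parent_aware_relevance(["a", "x", "c"], ["a", "b", "c"]))  # [1, 0, 0]
```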
Based on the enterprise multi-element label system, the labels are designed around the needs of the paper packaging industry along five dimensions: logistics transportation distance, paper packaging demand, paper packaging type, enterprise marketing relationship and industry category. Multi-element label identification has three basic steps: (1) generate a small amount of labeled data with rule heuristics; (2) identify and judge with several traditional algorithms, adding some manual correction, to produce new labeled data at reduced manual labeling cost; (3) repeat the first two steps and, once enough data have accumulated, apply the XML-CNN deep model to identify all labels.
XML-CNN is simple and computationally efficient, removes the need to fuse several traditional models, and reduces complexity. Because it identifies all labels at once, it avoids assuming that labels are independent, fully takes the dependencies among labels into account, and improves identification performance.
The enterprise multi-label identification method for the paper packaging and related industries therefore improves the retrieval efficiency of complex enterprise information.
Drawings
FIG. 1 is a schematic flow diagram of an enterprise multi-tag system of the present invention;
FIG. 2 is a schematic flow chart of the enterprise multi-tag identification process of the present invention;
FIG. 3 is a schematic structural diagram of XML-CNN training on an enterprise sample in the present invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the attached drawings.
Example 1: as shown in the figure, the enterprise multi-element label identification method for paper packaging and related industries comprises the following steps:
firstly, constructing a multi-element label system:
enterprise multi-element labels refer to enterprise data without direct commercial value, information which can directly meet business requirements is abstracted through cleaning, sorting and mining, and then displayed in a form of a plurality of labels, so that the related requirements of accurate classification, high-efficiency and complex query of enterprises are supported;
the invention utilizes seven characteristic data disclosed by enterprises, specifically comprising company name, registration address, registration capital, company type, affiliated industry, operation range and company profile, to construct an enterprise multi-label system facing paper packaging and related industries;
the label system mainly covers five dimensions: the transportation distance, the paper packaging demand, the paper packaging type, the enterprise marketing relationship and the industry category; the transportation distance comprises 5 labels, the paper packaging demand amount comprises 3 labels, the paper packaging type comprises 5 labels, the enterprise marketing relationship comprises 3 labels, the industry category comprises 198 labels, and the total number of the labels is 214;
the labels in these five dimensions are described in detail below:
① The transportation-distance label summarizes the geographic relationship between enterprises and takes the values 'same district', 'same city', 'same province', 'domestic' and 'foreign'. It is identified mainly from the enterprise's 'registration address' feature, although some 'company name' values also carry information relevant to this label;
② The paper-packaging-demand label summarizes an enterprise's demand for paper packaging and takes the values 'large', 'medium' and 'small'. It depends mainly on three features: 'affiliated industry', 'registered capital' and 'company type';
③ The paper-packaging-type label takes values such as 'carton', 'paper box' and 'paper bag'. It indicates which type of paper packaging an enterprise mainly needs and can be inferred from the 'affiliated industry' feature;
④ The enterprise-marketing-relationship label takes the values 'peer', 'customer' and 'supplier' and classifies enterprises as upstream or downstream of the company. It is identified mainly from the 'affiliated industry' and 'business scope' features. With this label, a company in the paper-packaging industry can readily see how enterprises in other industries relate to it, and can adopt a different strategy toward each;
⑤ The industry-category labels are based mainly on the national industry classification standard published in 2017, adapted to the market characteristics and business needs of the paper-packaging industry. The national standard was trimmed and modified to formulate a new set of industry-category labels for the paper-packaging industry, covering mainly manufacturing and transportation, storage and postal services. Because many industries have no need for paper packaging, they are represented by an 'other industries' label;
In this multi-label system, the transportation-distance, paper-packaging-demand, paper-packaging-type and enterprise-marketing-relationship dimensions contain only first-level labels. The industry-category dimension has three levels of labels arranged in a hierarchy, and the labels of the other dimensions are correlated with the industry-category labels. If an enterprise matches the 'other industries' label in the industry-category dimension, all of its other labels are null, i.e., it is not an enterprise of interest. Otherwise, each enterprise receives 7 labels: one for each of the four single-level dimensions plus one for each of the three industry levels;
(II) Multi-label identification:
Many learning algorithms exist for multi-label problems, and they fall into two main categories. The first is problem transformation, which converts the problem data so that existing algorithms can be applied. The second is algorithm adaptation, which extends a specific algorithm so that it can handle multi-label problems directly. The former consists mainly of traditional machine-learning methods, the latter mainly of deep-learning methods;
Traditional machine-learning methods are simple and suited to identifying labels in a single dimension. The invention therefore first applies them to each of the four label dimensions separately, identifying labels iteratively to overcome the shortage of labeled data. The specific steps are:
(1) generate a small amount of labeled data with rules, and train a KNN model, a decision-tree model and a binary classification model on it;
(2) use the three trained models to label unlabeled data; if the three results agree, the sample is added to the labeled set, and if they disagree, it is passed to manual correction;
(3) repeat the first two steps to iteratively build up new labeled data.
Once enough labeled data are available, an XML-CNN deep-learning model, which takes the correlations among labels into account, identifies all label types jointly. This improves accuracy and avoids the difficulty of maintaining several independent models. The overall identification process is detailed below.
Table 1: Comparison of multi-label identification methods
① Rule-based generation of a small initial labeled set:
Label data are generated with rule heuristics on four dimensions of the label system.
(1) For the transportation-distance label, a place-name lexicon and a company-type lexicon obtained from the web are used to search and match the corresponding enterprise information. Data matching the comparison rules are then converted into the corresponding labels.
(2) For the industry-category label, the invention compiles a simple industry-category mapping lexicon from the paper-packaging industry classification standard it has established. It then applies double-array trie matching to the leading words of the 'affiliated industry' and 'business scope' fields to identify a small number of labels.
(3) For the enterprise marketing relationship, the relationship is inferred from the identified industry label and the main products. For example, if the industry is papermaking and paper products, the enterprise can be labeled 'peer'.
(4) For paper-packaging demand and paper-packaging type, rules based on industry knowledge and business experience are applied to 'affiliated industry', 'registered capital' and 'company type'. For example, if an enterprise is in the household-appliance industry, its paper-packaging type is 'carton'; if it is a joint-stock company or its registered capital exceeds 50 million RMB, its paper-packaging demand is 'large'.
These rule heuristics produce the initial labeled dataset $S_0$, which passes to the next step;
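The rule heuristics in steps (1), (3) and (4) can be sketched as plain conditionals. This is a minimal illustration, not the invention's rule base. It assumes the registration address has already been split into country, province, city and district; the text does this with place-name lexicons, which are not reproduced here. The `Enterprise` record, its field names and the literal industry strings are hypothetical. Only the example rules themselves come from the text:

- household appliances → 'carton';
- joint-stock company or registered capital above 50 million RMB → 'large';
- papermaking and paper products → 'peer'.

```python
from dataclasses import dataclass


@dataclass
class Enterprise:
    """Hypothetical pre-parsed enterprise record; field names are illustrative."""
    country: str
    province: str
    city: str
    district: str
    industry: str                 # the "affiliated industry" feature
    company_type: str
    registered_capital_rmb: float


def distance_label(home: Enterprise, other: Enterprise) -> str:
    """Transportation-distance label of `other` as seen from `home`."""
    if home.country != other.country:
        return "foreign"
    if home.province != other.province:
        return "domestic"
    if home.city != other.city:
        return "same province"
    if home.district != other.district:
        return "same city"
    return "same district"


def heuristic_labels(e: Enterprise) -> dict:
    """Apply the example rules from the text; dimensions no rule covers stay unlabeled."""
    labels = {}
    if e.industry == "household appliances":
        labels["packaging_type"] = "carton"
    if e.company_type == "joint-stock company" or e.registered_capital_rmb > 50_000_000:
        labels["demand"] = "large"
    if e.industry == "papermaking and paper products":
        labels["relation"] = "peer"
    return labels
```

Enterprises for which no rule fires simply contribute nothing to $S_0$. They are left for the model-based iteration that follows.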
② Multi-model identification iteration:
Starting from the small labeled set $S_0$ generated in the previous step, the invention trains the traditional, widely used KNN, decision-tree and binary classification methods iteratively to generate a large amount of labeled data. Specifically, three models are trained on a single label dimension and then used to predict unlabeled data. If the three models' predictions agree, the sample is added to the training set. If they disagree, the sample is corrected manually before being added, and the next iteration begins. Once the training set exceeds a certain size, for example 200,000 samples, a complete multi-label recognition algorithm is built with a deep model.

Notation:
- The initial data are $S_0$, split into a training set and a test set.
- The unlabeled dataset is $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$ with $x_i=\{x_{i1},x_{i2},\dots,x_{i6}\}$, meaning that each sample has 6 feature-data vectors.
- Each vector is obtained by segmenting the feature text into words and stacking, row by row, the word vectors trained with word2vec. Thus $x_{ij}\in\mathbb{R}^{h\times d}$, where $h$ is the length of each feature vector and $d$ is the word-vector dimension, typically 100.
- $y_i$ is the label value of the corresponding sample and takes values in $\{L_1,L_2,\dots,L_t\}$; $n$ is the number of samples and $t$ the number of labels.

During identification, the training samples are converted into one binary classification problem per label;
②.1 KNN recognition model:
KNN computes the distance between pairs of samples and determines which known samples an unknown sample is closest to. It then assigns the unknown sample's label by voting. A squared loss function is used, and the distance between samples $x_i$ and $x_j$ is the Minkowski ($L_p$) distance, summed over all vector components $l$:
$$L_p(x_i,x_j)=\left(\sum_{l}\left|x_i^{(l)}-x_j^{(l)}\right|^{p}\right)^{1/p}$$
Usually $p=2$, the Euclidean distance. The label that occurs most often among the $k$ nearest samples is taken as the label of the sample being predicted;
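The distance and voting steps can be sketched in a few lines of standard-library Python. This sketch assumes each sample has already been flattened into a plain numeric vector; the text's feature matrices would need flattening or pooling first. The function names are hypothetical. Ties in the vote are resolved by `Counter.most_common`, which keeps the label encountered first, i.e. the label of the nearer neighbor. The text does not specify a tie-breaking rule.

```python
from collections import Counter
from typing import Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """L_p distance between two vectors; p = 2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_x, train_y, query, k: int = 3, p: float = 2.0):
    """Return the most common label among the k training samples nearest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: minkowski(train_x[i], query, p))
    votes = Counter(train_y[i] for i in order[:k])   # neighbors counted nearest-first
    return votes.most_common(1)[0][0]
```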
②.2 Decision tree:
A CART classification tree is used, with the Gini index measuring impurity:
$$\mathrm{Gini}(p)=\sum_{i=1}^{C}p_i(1-p_i)=1-\sum_{i=1}^{C}p_i^{2}$$
where $p_i$ is the probability that a sample belongs to class $i$ and $C$ is the number of classes. A random-forest ensemble could be used at this stage. However, because this step already fuses several algorithms, a single CART classification tree is used;
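The Gini criterion above can be computed directly. The sketch below shows two quantities: the impurity of a node, and the size-weighted impurity of a binary split. A CART classification tree picks the split that minimizes the latter. This illustrates the criterion only; it is not a full tree learner, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity 1 - sum_i p_i^2, where p_i is the share of samples in class i."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right) -> float:
    """Size-weighted Gini impurity of a binary split (lower is better for CART)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```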
②.3 Binary classification:
The binary classification method builds one classifier per label. The base classifier may be logistic regression or an SVM (support vector machine), and the invention uses an SVM. At prediction time, the outputs of all per-label classifiers are combined to give the final result;
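The one-classifier-per-label scheme (often called binary relevance) can be sketched as follows. To stay self-contained, the sketch swaps in a toy nearest-centroid classifier for the SVM named in the text. `nearest_centroid_trainer` and the other function names are hypothetical. Any trainer with the same shape could be passed in instead, such as one wrapping an SVM from a machine-learning library.

```python
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]
Classifier = Callable[[Vector], bool]


def nearest_centroid_trainer(X: List[Vector], t: List[bool]) -> Classifier:
    """Toy stand-in for the per-label SVM: True if x is closer to the positive centroid."""
    pos = [x for x, flag in zip(X, t) if flag]
    neg = [x for x, flag in zip(X, t) if not flag]
    if not pos or not neg:                     # only one class seen: always predict it
        constant = bool(pos)
        return lambda x: constant

    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    cp, cn = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, cp) < sq_dist(x, cn)


def train_binary_relevance(X, Y: List[Set[str]], labels,
                           trainer=nearest_centroid_trainer) -> Dict[str, Classifier]:
    """One binary problem per label: the target is whether the sample carries that label."""
    return {lab: trainer(X, [lab in y for y in Y]) for lab in labels}


def predict_binary_relevance(models: Dict[str, Classifier], x: Vector) -> Set[str]:
    """Combine the per-label decisions into the final label set."""
    return {lab for lab, clf in models.items() if clf(x)}
```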
The three models, denoted $f_1$, $f_2$ and $f_3$, are trained separately on dataset $S_0$. Each is tuned to its best setting using the F-score as the evaluation criterion. Let the results of identifying the same unlabeled sample be $r_1$, $r_2$ and $r_3$. If $r_1=r_2=r_3$, no correction is needed and the sample $(x_j,y_j)$ is accepted as labeled; otherwise it is corrected manually. The newly labeled data are added to $S_0$, and the process iterates in the same way;
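One round of the agreement check, and the iteration around it, can be sketched as below. The three models are passed in as trainer functions. The manual correction step is represented by a hypothetical `correct` callback. The batch size and `target_size` are illustrative parameters; the text's example threshold is 200,000 samples.

```python
from typing import Callable, Sequence

Model = Callable[[object], str]


def agreement_round(models: Sequence[Model], batch):
    """Split a batch into (x, label) pairs all models agree on and (x, [r1, r2, r3]) disputes."""
    agreed, disputed = [], []
    for x in batch:
        r = [f(x) for f in models]
        if all(ri == r[0] for ri in r):
            agreed.append((x, r[0]))
        else:
            disputed.append((x, r))
    return agreed, disputed


def co_label(trainers, seed, unlabeled, correct, batch_size, target_size):
    """Grow the labeled set S0 batch by batch until it reaches target_size or data run out.

    trainers: functions mapping the current labeled list to a fitted model (f1, f2, f3).
    correct:  stand-in for manual correction, called as correct(x, [r1, r2, r3]).
    """
    labeled, pool = list(seed), list(unlabeled)
    while pool and len(labeled) < target_size:
        batch, pool = pool[:batch_size], pool[batch_size:]
        models = [train(labeled) for train in trainers]   # retrain on S0 plus new data
        agreed, disputed = agreement_round(models, batch)
        labeled.extend(agreed)
        labeled.extend((x, correct(x, r)) for x, r in disputed)
    return labeled
```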
③ XML-CNN enterprise multi-label identification:
Once the labeled dataset reaches a sufficient size, an XML-CNN deep model is trained as the full recognition model. It is chosen because it can represent and learn correlations between labels, which improves overall labeling accuracy. XML-CNN is a variant of the CNN (convolutional neural network). Compared with other deep models such as bidirectional recurrent neural networks and Transformer models, it runs much more efficiently and, according to the inventors, gives the best recognition results. The pipeline is as follows:
- each information dimension of an enterprise is represented at word granularity;
- convolution and dynamic max pooling are applied, followed by a fully connected layer;
- the output uses sigmoid units with a binary cross-entropy loss, which turns the multi-label problem into per-label probabilities;
- a label is output if its probability exceeds a set threshold;
(1) Embedding:
An enterprise is represented, by information dimension, as $e_{1:m}=[e_1,\dots,e_m]\in\mathbb{R}^{m\times d}$. Here $m$ is the total text length across the seven information dimensions and $d$ is the word-vector dimension, typically 100. The 'business scope' and 'company profile' fields are length-limited and are truncated when their text exceeds 200. The number in 'registered capital' is treated as a single word;
(2) Convolution:
$c_i=g_c\left(v^{\top}e_{i:i+f-1}\right)$, where the convolution kernel is $v\in\mathbb{R}^{f\times d}$. Typically $f\in\{2,3,4\}$: the different window sizes extract N-gram features, and different kernels extract semantic information at different levels. Typically 128 kernels are used. One kernel produces $c=[c_1,\dots,c_r]$ with $r=m-f+1$;
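The convolution formula can be checked with a small pure-Python version. The text does not name the activation $g_c$; ReLU is assumed here. The function name `conv_over_words` is hypothetical. Each output $c_i$ is the activation of the kernel's dot product with $f$ consecutive word vectors, so an $m$-word input yields $r=m-f+1$ values.

```python
from typing import Callable, List, Sequence


def relu(z: float) -> float:
    return max(0.0, z)


def conv_over_words(e: Sequence[Sequence[float]], v: Sequence[Sequence[float]],
                    g_c: Callable[[float], float] = relu) -> List[float]:
    """c_i = g_c(v^T e_{i:i+f-1}) for every window of f consecutive word vectors.

    e: m x d matrix of word vectors; v: f x d kernel. Returns r = m - f + 1 values.
    """
    m, f = len(e), len(v)
    c = []
    for i in range(m - f + 1):
        window = e[i:i + f]
        s = sum(vk * ek for v_row, e_row in zip(v, window) for vk, ek in zip(v_row, e_row))
        c.append(g_c(s))
    return c
```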
(3) Dynamic max pooling:
After convolution, $c$ is divided into $p$ segments. In the invention $p=3$, because the label system has a maximum depth of 3. The maximum of each segment is taken, giving the output $P(c)=\left[\max\{c_{1:r/p}\},\dots,\max\{c_{r-r/p+1:r}\}\right]$;
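Dynamic max pooling can be sketched as below. The text's formula assumes $r$ is divisible by $p$. Otherwise, this sketch places segment boundaries by integer floor division, so some segments may be one element longer than others. The function name is hypothetical.

```python
from typing import List, Sequence


def dynamic_max_pool(c: Sequence[float], p: int = 3) -> List[float]:
    """Split c into p contiguous segments and keep the maximum of each (P(c) in the text)."""
    r = len(c)
    if r < p:
        raise ValueError("the feature map must have at least p values")
    bounds = [k * r // p for k in range(p + 1)]   # floor division handles r not divisible by p
    return [max(c[bounds[k]:bounds[k + 1]]) for k in range(p)]
```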
(4) Fully connected bottleneck layer:
The pooled outputs are fed into a bottleneck fully connected layer, i.e., one whose number of hidden units is much smaller than the number of output labels; this improves the model's fitting capability. The layer computes
$$f=W_o\,g\left(W_h\left[P(c^{(1)}),\dots,P(c^{(t)})\right]\right)$$
with the following symbols:
- $W_h\in\mathbb{R}^{h\times tp}$ and $W_o\in\mathbb{R}^{L\times h}$ are the weight matrices;
- $t$ is the number of convolution kernels;
- $h$ is the number of hidden units in this layer;
- $L$ is the number of output labels;
- $g$ is the activation function, here $\tanh$.

The output layer follows the fully connected layer and makes predictions with the sigmoid function;
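A forward pass through the bottleneck and output layers can be sketched with explicit matrix-vector products. The weights are assumed to be given, since training is out of scope here. The 0.5 threshold is an illustrative default for the "set threshold" in the text, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def bottleneck_scores(pooled: Sequence[Sequence[float]], W_h, W_o) -> List[float]:
    """f = W_o tanh(W_h z), where z concatenates the p pooled values of each of the t kernels.

    W_h: h x (t*p) matrix, W_o: L x h matrix; returns L raw label scores.
    """
    z = [value for kernel_out in pooled for value in kernel_out]
    hidden = [math.tanh(a) for a in matvec(W_h, z)]
    return matvec(W_o, hidden)


def labels_above_threshold(scores: Sequence[float], names: Sequence[str],
                           threshold: float = 0.5) -> List[str]:
    """Sigmoid each score independently; keep every label whose probability exceeds the threshold."""
    return [n for n, s in zip(names, scores) if sigmoid(s) > threshold]
```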
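(5) Loss function:
The loss is the binary cross-entropy over all labels, averaged over samples and minimized over the model parameters $\Theta$:
$$\min_{\Theta}\; -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{L}\Big[y_{ij}\log\sigma(f_{ij})+(1-y_{ij})\log\big(1-\sigma(f_{ij})\big)\Big]$$
where $\sigma$ is the sigmoid function, $f_{ij}$ is the output score of sample $i$ for label $j$, and $y_{ij}\in\{0,1\}$ indicates whether sample $i$ carries label $j$;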
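The binary cross-entropy above, written out for a batch of raw scores $f_{ij}$, can be sketched as follows. This illustrates the loss only; it is not training code. The clamping constant `eps` is an assumed safeguard against taking the logarithm of zero, and the function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def multilabel_bce(scores: Sequence[Sequence[float]], targets: Sequence[Sequence[int]],
                   eps: float = 1e-12) -> float:
    """-(1/n) sum_i sum_j [y_ij log s(f_ij) + (1 - y_ij) log(1 - s(f_ij))]."""
    total = 0.0
    for f_row, y_row in zip(scores, targets):
        for f, y in zip(f_row, y_row):
            prob = min(max(sigmoid(f), eps), 1.0 - eps)   # clamp to keep log finite
            total -= y * math.log(prob) + (1 - y) * math.log(1.0 - prob)
    return total / len(scores)
```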
For evaluation, the ranking metrics DCG@K and NDCG@K are used with $K=7$, together with an additional hierarchical constraint: if a label's parent class is predicted wrongly, the prediction for that sample counts as wrong regardless of whether the child class is correct. NDCG stands for normalized discounted cumulative gain. DCG sums the relevance scores $rel_i$ of the predicted list, each divided by the logarithm of its position, so that higher-ranked labels count more. NDCG then normalizes DCG;
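DCG@K and NDCG@K can be computed as below, using the common forms
$$\mathrm{DCG@K}=\sum_{i=1}^{K}\frac{rel_i}{\log_2(i+1)},\qquad \mathrm{NDCG@K}=\frac{\mathrm{DCG@K}}{\mathrm{IDCG@K}},$$
where IDCG@K is the DCG@K of the ideally ordered list. The text does not state the logarithm base, so base 2 is an assumption. `hierarchical_rels` is one hypothetical way to apply the parent-class rule: once a level of the industry path is wrong, every deeper level gets relevance 0.

```python
import math
from typing import List, Sequence


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG@K = sum over positions i = 1..K of rel_i / log2(i + 1)."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(rels[:k], start=1))


def ndcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG@K divided by the DCG@K of the ideally ordered list (0.0 if nothing is relevant)."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


def hierarchical_rels(pred_path: Sequence[str], true_path: Sequence[str]) -> List[int]:
    """Relevance per industry level; once a parent level is wrong, all deeper levels count as wrong."""
    rels, still_correct = [], True
    for pred, true in zip(pred_path, true_path):
        still_correct = still_correct and pred == true
        rels.append(1 if still_correct else 0)
    return rels
```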
Claims (1)
1. An enterprise multi-label identification method for paper packaging and related industries, characterized by comprising the following steps:
(I) Constructing a multi-label system:
Enterprise multi-labels are produced from enterprise data that has no direct commercial value. The data are cleaned, organized and mined to abstract information that directly meets business needs, and this information is presented as multiple labels. The labels support accurate classification of enterprises and efficient complex queries;
The invention uses seven publicly disclosed enterprise features (company name, registration address, registered capital, company type, affiliated industry, business scope and company profile) to construct an enterprise multi-label system for paper packaging and related industries;
The label system covers five dimensions: transportation distance, paper-packaging demand, paper-packaging type, enterprise marketing relationship and industry category. Transportation distance has 5 labels, paper-packaging demand 3, paper-packaging type 5, enterprise marketing relationship 3 and industry category 198, for a total of 214 labels;
the labels in these five dimensions are described in detail below:
① The transportation-distance label summarizes the geographic relationship between enterprises and takes the values 'same district', 'same city', 'same province', 'domestic' and 'foreign'. It is identified mainly from the enterprise's 'registration address' feature, although some 'company name' values also carry information relevant to this label;
② The paper-packaging-demand label summarizes an enterprise's demand for paper packaging and takes the values 'large', 'medium' and 'small'. It depends mainly on three features: 'affiliated industry', 'registered capital' and 'company type';
③ The paper-packaging-type label takes values such as 'carton', 'paper box' and 'paper bag'. It indicates which type of paper packaging an enterprise mainly needs and can be inferred from the 'affiliated industry' feature;
④ The enterprise-marketing-relationship label takes the values 'peer', 'customer' and 'supplier' and classifies enterprises as upstream or downstream of the company. It is identified mainly from the 'affiliated industry' and 'business scope' features. With this label, a company in the paper-packaging industry can readily see how enterprises in other industries relate to it, and can adopt a different strategy toward each;
⑤ The industry-category labels are based mainly on the national industry classification standard published in 2017, adapted to the market characteristics and business needs of the paper-packaging industry. The national standard was trimmed and modified to formulate a new set of industry-category labels for the paper-packaging industry, covering mainly manufacturing and transportation, storage and postal services. Because many industries have no need for paper packaging, they are represented by an 'other industries' label;
In this multi-label system, the transportation-distance, paper-packaging-demand, paper-packaging-type and enterprise-marketing-relationship dimensions contain only first-level labels. The industry-category dimension has three levels of labels arranged in a hierarchy, and the labels of the other dimensions are correlated with the industry-category labels. If an enterprise matches the 'other industries' label in the industry-category dimension, all of its other labels are null, i.e., it is not an enterprise of interest. Otherwise, each enterprise receives 7 labels: one for each of the four single-level dimensions plus one for each of the three industry levels;
(II) Multi-label identification:
Many learning algorithms exist for multi-label problems, and they fall into two main categories. The first is problem transformation, which converts the problem data so that existing algorithms can be applied. The second is algorithm adaptation, which extends a specific algorithm so that it can handle multi-label problems directly. The former consists mainly of traditional machine-learning methods, the latter mainly of deep-learning methods;
Traditional machine-learning methods are simple and suited to identifying labels in a single dimension. The invention therefore first applies them to each of the four label dimensions separately, identifying labels iteratively to overcome the shortage of labeled data. The specific steps are:
(1) generate a small amount of labeled data with rules, and train a KNN model, a decision-tree model and a binary classification model on it;
(2) use the three trained models to label unlabeled data; if the three results agree, the sample is added to the labeled set, and if they disagree, it is passed to manual correction;
(3) repeat the first two steps to iteratively build up new labeled data.
Once enough labeled data are available, an XML-CNN deep-learning model, which takes the correlations among labels into account, identifies all label types jointly. This improves accuracy and avoids the difficulty of maintaining several independent models;
① Rule-based generation of a small initial labeled set:
Label data are generated with rule heuristics on four dimensions of the label system.
(1) For the transportation-distance label, a place-name lexicon and a company-type lexicon obtained from the web are used to search and match the corresponding enterprise information. Data matching the comparison rules are then converted into the corresponding labels.
(2) For the industry-category label, the invention compiles a simple industry-category mapping lexicon from the paper-packaging industry classification standard it has established. It then applies double-array trie matching to the leading words of the 'affiliated industry' and 'business scope' fields to identify a small number of labels.
(3) For the enterprise marketing relationship, the relationship is inferred from the identified industry label and the main products. For example, if the industry is papermaking and paper products, the enterprise can be labeled 'peer'.
(4) For paper-packaging demand and paper-packaging type, rules based on industry knowledge and business experience are applied to 'affiliated industry', 'registered capital' and 'company type'. For example, if an enterprise is in the household-appliance industry, its paper-packaging type is 'carton'; if it is a joint-stock company or its registered capital exceeds 50 million RMB, its paper-packaging demand is 'large'.
These rule heuristics produce the initial labeled dataset $S_0$, which passes to the next step;
② Multi-model identification iteration:
at the upper partA small amount of marking data S generated in one step0In the invention, the traditional common algorithm KNN, decision tree and binary classification method are used for iterative training learning to generate a large amount of labeled data; specifically, three corresponding models are trained on a single label dimension, then unlabeled data are predicted, and if the prediction results of the three models are consistent, the data can be added into a training data set; if the data are inconsistent, the data are manually corrected, then a training set is added, next iteration is carried out, and when the training data set exceeds a certain amount, such as 20 ten thousand, a complete multi-element label recognition algorithm is established by using a depth model method; setting initial data as S0(partition training set and test set), unlabeled dataset D { (x)1,y1),(x2,y2),...,(xn,yn)},xi={xi1,xi2,..,xi6},xiRepresenting each piece of data has 6 corresponding characteristic data vectors, each vector is formed by splicing word vectors trained by word2vec according to rows after the characteristic data text is participled, and xi∈Rh*dWherein h is the length of each feature vector, d is the dimension of a word vector, and generally 100 dimensions are taken; y isiValue of y for the corresponding labeliIs of [ L1,L2,..,Lt]N is the number of samples, and t is the number of labels; in the identification process, the training samples are converted into two classes of a single label for identification;
②.1 KNN recognition model:
KNN computes the distance between pairs of samples and determines which known samples an unknown sample is closest to. It then assigns the unknown sample's label by voting. A squared loss function is used, and the distance between samples $x_i$ and $x_j$ is the Minkowski ($L_p$) distance, summed over all vector components $l$:
$$L_p(x_i,x_j)=\left(\sum_{l}\left|x_i^{(l)}-x_j^{(l)}\right|^{p}\right)^{1/p}$$
Usually $p=2$, the Euclidean distance. The label that occurs most often among the $k$ nearest samples is taken as the label of the sample being predicted;
②.2 Decision tree:
A CART classification tree is used, with the Gini index measuring impurity:
$$\mathrm{Gini}(p)=\sum_{i=1}^{C}p_i(1-p_i)=1-\sum_{i=1}^{C}p_i^{2}$$
where $p_i$ is the probability that a sample belongs to class $i$ and $C$ is the number of classes. A random-forest ensemble could be used at this stage. However, because this step already fuses several algorithms, a single CART classification tree is used;
②.3 Binary classification:
The binary classification method builds one classifier per label. The base classifier may be logistic regression or an SVM (support vector machine), and the invention uses an SVM. At prediction time, the outputs of all per-label classifiers are combined to give the final result;
The three models, denoted $f_1$, $f_2$ and $f_3$, are trained separately on dataset $S_0$. Each is tuned to its best setting using the F-score as the evaluation criterion. Let the results of identifying the same unlabeled sample be $r_1$, $r_2$ and $r_3$. If $r_1=r_2=r_3$, no correction is needed and the sample $(x_j,y_j)$ is accepted as labeled; otherwise it is corrected manually. The newly labeled data are added to $S_0$, and the process iterates in the same way;
③ XML-CNN enterprise multi-label identification:
Once the labeled dataset reaches a sufficient size, an XML-CNN deep model is trained as the full recognition model. It is chosen because it can represent and learn correlations between labels, which improves overall labeling accuracy. XML-CNN is a variant of the CNN (convolutional neural network). Compared with other deep models such as bidirectional recurrent neural networks and Transformer models, it runs much more efficiently and, according to the inventors, gives the best recognition results. The pipeline is as follows:
- each information dimension of an enterprise is represented at word granularity;
- convolution and dynamic max pooling are applied, followed by a fully connected layer;
- the output uses sigmoid units with a binary cross-entropy loss, which turns the multi-label problem into per-label probabilities;
- a label is output if its probability exceeds a set threshold;
(1) Embedding:
An enterprise is represented, by information dimension, as $e_{1:m}=[e_1,\dots,e_m]\in\mathbb{R}^{m\times d}$. Here $m$ is the total text length across the seven information dimensions and $d$ is the word-vector dimension, typically 100. The 'business scope' and 'company profile' fields are length-limited and are truncated when their text exceeds 200. The number in 'registered capital' is treated as a single word;
(2) Convolution:
$c_i=g_c\left(v^{\top}e_{i:i+f-1}\right)$, where the convolution kernel is $v\in\mathbb{R}^{f\times d}$. Typically $f\in\{2,3,4\}$: the different window sizes extract N-gram features, and different kernels extract semantic information at different levels. Typically 128 kernels are used. One kernel produces $c=[c_1,\dots,c_r]$ with $r=m-f+1$;
(3) Dynamic max pooling:
After convolution, $c$ is divided into $p$ segments. In the invention $p=3$, because the label system has a maximum depth of 3. The maximum of each segment is taken, giving the output $P(c)=\left[\max\{c_{1:r/p}\},\dots,\max\{c_{r-r/p+1:r}\}\right]$;
(4) Fully connected bottleneck layer:
The pooled outputs are fed into a bottleneck fully connected layer, i.e., one whose number of hidden units is much smaller than the number of output labels; this improves the model's fitting capability. The layer computes
$$f=W_o\,g\left(W_h\left[P(c^{(1)}),\dots,P(c^{(t)})\right]\right)$$
with the following symbols:
- $W_h\in\mathbb{R}^{h\times tp}$ and $W_o\in\mathbb{R}^{L\times h}$ are the weight matrices;
- $t$ is the number of convolution kernels;
- $h$ is the number of hidden units in this layer;
- $L$ is the number of output labels;
- $g$ is the activation function, here $\tanh$.

The output layer follows the fully connected layer and makes predictions with the sigmoid function;
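(5) Loss function:
The loss is the binary cross-entropy over all labels, averaged over samples and minimized over the model parameters $\Theta$:
$$\min_{\Theta}\; -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{L}\Big[y_{ij}\log\sigma(f_{ij})+(1-y_{ij})\log\big(1-\sigma(f_{ij})\big)\Big]$$
where $\sigma$ is the sigmoid function, $f_{ij}$ is the output score of sample $i$ for label $j$, and $y_{ij}\in\{0,1\}$ indicates whether sample $i$ carries label $j$;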
For evaluation, the ranking metrics DCG@K and NDCG@K are used with $K=7$, together with an additional hierarchical constraint: if a label's parent class is predicted wrongly, the prediction for that sample counts as wrong regardless of whether the child class is correct. NDCG stands for normalized discounted cumulative gain. DCG sums the relevance scores $rel_i$ of the predicted list, each divided by the logarithm of its position, so that higher-ranked labels count more. NDCG then normalizes DCG.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911335749.0A CN111191001A (en) | 2019-12-23 | 2019-12-23 | Enterprise multi-element label identification method for paper package and related industries thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911335749.0A CN111191001A (en) | 2019-12-23 | 2019-12-23 | Enterprise multi-element label identification method for paper package and related industries thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111191001A (en) | 2020-05-22 |
Family
ID=70709287
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911335749.0A Withdrawn CN111191001A (en) | 2019-12-23 | 2019-12-23 | Enterprise multi-element label identification method for paper package and related industries thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111191001A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105045911A (en) * | 2015-08-12 | 2015-11-11 | 北京搜狗科技发展有限公司 | Label generating method for user to mark and label generating equipment for user to mark |
CN106777335A * | 2017-01-13 | 2017-05-31 | 深圳爱拼信息科技有限公司 | Multi-label industry classification method and device based on a long short-term memory (LSTM) model |
CN107133293A * | 2017-04-25 | 2017-09-05 | 中国科学院计算技术研究所 | Improved ML-kNN method and system suitable for multi-label classification |
CN107944480A * | 2017-11-16 | 2018-04-20 | 广州探迹科技有限公司 | Enterprise industry classification method |
CN108536800A * | 2018-04-03 | 2018-09-14 | 有米科技股份有限公司 | Text classification method, system, computer device and storage medium |
CN109582792A * | 2018-11-16 | 2019-04-05 | 北京奇虎科技有限公司 | Text classification method and device |
CN109783818A * | 2019-01-17 | 2019-05-21 | 上海三零卫士信息安全有限公司 | Enterprise industry multi-label classification method |
CN110532542A * | 2019-07-15 | 2019-12-03 | 西安交通大学 | False-invoicing identification method and system based on positive and unlabeled learning |
2019
- 2019-12-23: application CN201911335749.0A filed in CN, published as CN111191001A; status: not active (withdrawn)
Non-Patent Citations (1)
Title |
---|
Jingzhou Liu et al.: "Deep Learning for Extreme Multi-label Text Classification", SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111680895A (en) * | 2020-05-26 | 2020-09-18 | 中国平安财产保险股份有限公司 | Data automatic labeling method and device, computer equipment and storage medium |
CN112580332A (en) * | 2020-11-19 | 2021-03-30 | 淮阴工学院 | Enterprise portrait method based on label layering and deepening modeling |
CN112580332B (en) * | 2020-11-19 | 2022-07-12 | 淮阴工学院 | An enterprise portrait method based on label layered and deep modeling |
CN113159709A (en) * | 2021-03-24 | 2021-07-23 | 深圳闪回科技有限公司 | Automatic label system and system |
CN113298352A (en) * | 2021-04-28 | 2021-08-24 | 北京网核精策科技管理中心(有限合伙) | Enterprise industry information processing method and device, electronic equipment and readable storage medium |
CN113378907A (en) * | 2021-06-04 | 2021-09-10 | 南京大学 | Automatic software traceability recovery method for enhancing data preprocessing process |
CN113378907B (en) * | 2021-06-04 | 2024-01-09 | 南京大学 | Automated software traceability recovery method for enhancing data preprocessing process |
US20230162020A1 (en) * | 2021-11-23 | 2023-05-25 | Microsoft Technology Licensing, Llc | Multi-Task Sequence Tagging with Injection of Supplemental Information |
US12353998B2 (en) * | 2021-11-23 | 2025-07-08 | Microsoft Technology Licensing, Llc | Multi-task sequence tagging with injection of supplemental information |
CN118984254A (en) * | 2024-10-22 | 2024-11-19 | 江苏康缘药业股份有限公司 | A standardized module and method for connecting traditional Chinese medicine enterprise nodes with secondary nodes |
CN118984254B (en) * | 2024-10-22 | 2025-03-18 | 江苏康缘药业股份有限公司 | A standardized module and method for connecting traditional Chinese medicine enterprise nodes with secondary nodes |
Similar Documents
Publication | Title |
---|---|
CN111191001A (en) | Enterprise multi-element label identification method for paper package and related industries thereof |
CN116541911B (en) | Packaging design system based on artificial intelligence |
CN100464332C (en) | Picture inquiry method and system |
CN106407352A (en) | Traffic image retrieval method based on deep learning |
CN103810299A (en) | Image retrieval method based on multi-feature fusion |
CN106599037A (en) | Recommendation method based on label semantic normalization |
CN114723994B (en) | Hyperspectral image classification method based on a dual-classifier adversarial enhancement network |
CN107391565B (en) | Matching method for cross-language hierarchical classification systems based on a topic model |
CN106778834A (en) | Image labeling method using AP clustering based on distance metric learning |
CN113378913A (en) | Semi-supervised node classification method based on self-supervised learning |
CN114817454A (en) | NLP knowledge graph construction method combining information content and BERT-BiLSTM-CRF |
CN107169061A (en) | Text multi-label classification method fusing dual information sources |
CN114818963A (en) | Few-shot detection algorithm based on cross-image feature fusion |
CN112465226B (en) | User behavior prediction method based on feature interaction and graph neural network |
CN117173702A (en) | Multi-view multi-label learning method based on deep feature map fusion |
CN113076490A (en) | Case-related microblog object-level sentiment classification method based on mixed node graph |
CN116934531A (en) | An intelligent management method and system for wine information based on data analysis |
CN108876643A (en) | Multimodal representation method for items (Pins) collected on a social curation network |
CN102663445A (en) | Image understanding system based on hierarchical temporal memory algorithm and image understanding method thereof |
CN109697257A (en) | Anti-noise network information retrieval method based on pre-classification and feature learning |
CN111339303B (en) | Text intent induction method and device based on clustering and automatic summarization |
CN116823321B (en) | Method and system for analyzing e-commerce economic management data |
CN113254688A (en) | Trademark retrieval method based on deep hashing |
CN119807335A (en) | A method and system for generating customer service information based on preset fields |
CN117493962A (en) | Method and device for classifying bulk commodity events by fusing event attributes |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20200522 |