CN113342984A

CN113342984A - Park enterprise classification method and system, intelligent terminal and storage medium

Info

Publication number: CN113342984A
Application number: CN202110756765.8A
Authority: CN
Inventors: 杨毅; 吴孝林
Original assignee: Shenzhen Yungu Xingchen Information Technology Co ltd
Current assignee: Shenzhen Yungu Xingchen Information Technology Co ltd
Priority date: 2021-07-05
Filing date: 2021-07-05
Publication date: 2021-09-03

Abstract

The invention relates to a park enterprise classification method and system, an intelligent terminal and a storage medium. The method comprises: extracting keywords from the business information of each enterprise in a park, and performing cluster analysis on the enterprises according to the keywords so as to classify each enterprise into a corresponding industry; acquiring enterprise information, capturing a plurality of high-frequency words, and fusing the high-frequency words to generate a feature word set; analyzing the similarity between enterprises based on the feature word set, and performing cluster analysis according to that similarity to form a plurality of industry classes; and classifying and summarizing the enterprises according to industry and industry class to form a plurality of industry class groups. The method extracts enterprise feature values from the acquired business information and enterprise information, classifies the park enterprises into industries and industry classes by cluster analysis, and summarizes them along these two dimensions, so that park enterprises can be classified and managed systematically.

Description

Park enterprise classification method and system, intelligent terminal and storage medium

Technical Field

The invention relates to the technical field of park management, in particular to a park enterprise classification method, a park enterprise classification system, an intelligent terminal and a storage medium.

Background

An intelligent park is a standard building or group of buildings that is generally planned and constructed by the government (or by private enterprises in cooperation with the government), has complete and reasonably laid-out supporting facilities such as water supply, power supply, gas supply, communication, roads and storage, and can meet the production and scientific experiment needs of a specific industry. Examples include industrial parks, logistics parks, urban industrial parks, science and technology parks and creative parks. At present, information on park enterprises is mainly entered manually into archive forms and filed by building number. When enterprises of a certain type need to be identified, staff must read through every archive form to select them, which wastes time and effort, and manual screening easily misses enterprises. As a result, it is difficult to manage and promote the enterprises in the park effectively as a whole.

Disclosure of Invention

The main object of the invention is to provide a park enterprise classification method, a park enterprise classification system, an intelligent terminal and a storage medium. The method extracts enterprise feature values from the acquired business information and enterprise information, classifies park enterprises into industries and industry classes by cluster analysis, and summarizes them along these two dimensions, so that park enterprises can be classified and managed systematically. The technical solution is as follows.

In one aspect, a park enterprise classification method is provided, comprising the following steps:

S1, extracting keywords from the business information of each enterprise in the park, and performing cluster analysis on the enterprises according to the keywords so as to classify each enterprise in the park into a corresponding industry;

S2, acquiring enterprise information, capturing a plurality of high-frequency words in the enterprise information, and fusing the high-frequency words to generate a feature word set;

S3, analyzing the similarity between enterprises based on the feature word set, and performing cluster analysis according to the similarity between enterprises to form a plurality of industry classes;

S4, classifying and summarizing the enterprises according to industry and industry class to form a plurality of industry class groups.

In one possible implementation, step S1 includes:

S11, acquiring the business information of the enterprise, performing semantic analysis on the business information, and extracting a plurality of information fragments;

S12, processing the information fragments to construct an information outline;

S13, extracting and screening phrases in the information outline that exceed a first preset threshold, and setting those phrases as keywords.

In one possible implementation, the information fragments are processed in step S12 by at least one of: cleaning, removing, merging, reshaping and standardizing.

In one possible implementation, step S2 includes:

S21, screening and de-duplicating the enterprise information to obtain the original information;

S22, extracting a plurality of high-frequency words from the original information;

S23, performing semantic analysis on the high-frequency words, and representing all high-frequency words with the same or similar meanings by a single word to form a feature word set.

In one possible implementation, step S21 specifically comprises: performing semantic analysis on the information, and if the similarity between the semantic analysis results of any two pieces of information is greater than a first preset threshold, taking the piece with the earlier release time as the original information and deleting the other piece.
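The rule in step S21 can be illustrated with a minimal sketch. The patent does not specify how semantic similarity is computed, so the token-overlap (Jaccard) score below is only a stand-in; the function names, the sample items and the 0.8 threshold are all assumptions made for illustration.

```python
from datetime import date


def jaccard(a, b):
    """Token-set overlap, used here as a stand-in for semantic similarity."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def deduplicate(items, threshold):
    """Keep the earliest-released item of each group of near-duplicates.

    items: list of (released: date, text: str) tuples.
    """
    kept = []
    # Visiting items in release order guarantees that a kept item is never
    # later than a near-duplicate that is compared against it afterwards.
    for released, text in sorted(items, key=lambda item: item[0]):
        if all(jaccard(text, other) <= threshold for _, other in kept):
            kept.append((released, text))
    return kept


# Hypothetical sample data.
items = [
    (date(2021, 3, 2), "Acme wins national technology award"),
    (date(2021, 3, 1), "Acme wins national technology award today"),
    (date(2021, 4, 10), "Acme opens new factory"),
]
originals = deduplicate(items, threshold=0.8)
for released, text in originals:
    print(released, text)
```

Sorting by release time first keeps the comparison simple: each incoming item only has to be checked against items that are already known to be originals.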

In one possible implementation, step S3 includes:

S31, calculating the feature value of each feature word according to the category information and the word frequency of the feature word;

S32, establishing a vector model for each enterprise according to the feature values;

S33, determining the industry similarity between enterprises according to the cosine similarity between their vector models.
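As an illustration of steps S32 and S33, the sketch below represents each enterprise as a sparse vector of feature values and compares two vectors by cosine similarity. The dictionary representation, the function name and the sample values are assumptions; the patent only states that cosine similarity between vector models is used.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {feature_word: value}."""
    dot = sum(value * v.get(word, 0.0) for word, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Convention chosen here for an enterprise with no feature words.
    return dot / (norm_u * norm_v)


# Hypothetical feature values for three enterprises.
enterprise_a = {"robot": 3.0, "sensor": 4.0}
enterprise_b = {"robot": 4.0, "sensor": 3.0}
enterprise_c = {"bank": 2.0}

print(cosine_similarity(enterprise_a, enterprise_b))  # dot 24 over norms 5 * 5
print(cosine_similarity(enterprise_a, enterprise_c))  # no shared feature words
```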

In one aspect, a park enterprise classification system is provided, comprising:

an information extraction and analysis module, configured to extract keywords from the business information of each enterprise in the park and to perform cluster analysis on the enterprises according to the keywords so as to classify each enterprise in the park into a corresponding industry;

a feature word extraction and processing module, configured to acquire enterprise information, capture a plurality of high-frequency words in the enterprise information, and fuse the high-frequency words to generate a feature word set;

a cluster analysis module, configured to analyze the similarity between enterprises based on the feature word set and to perform cluster analysis according to the similarity between enterprises to form a plurality of industry classes;

a data analysis and summarization module, configured to classify and summarize the enterprises according to industry and industry class to form a plurality of industry class groups.

In one aspect, an intelligent terminal is provided, comprising a memory and a processor, the memory storing a computer program that can be loaded by the processor to execute any of the methods described above.

In one aspect, a computer readable storage medium is provided, storing a computer program that can be loaded by a processor and executed to perform any of the methods described above.

In the park enterprise classification method, the park enterprises are first pre-classified by industry according to their business information. Enterprise information is then acquired, its feature words are extracted, and cluster analysis is used to group the park enterprises into a plurality of industry classes. Finally, the enterprises are classified and summarized by industry and industry class to form a plurality of industry class groups, so that park managers can classify and manage the park enterprises systematically.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from the structures shown in these drawings without creative effort.

Fig. 1 is a schematic diagram of the overall steps of the park enterprise classification method of the present invention.

Fig. 2 is a schematic diagram of the detailed steps of step S1 in Fig. 1.

Fig. 3 is a schematic diagram of the detailed steps of step S2 in Fig. 1.

Fig. 4 is a schematic diagram of the detailed steps of step S3 in Fig. 1.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Descriptions involving "first", "second" and the like in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by a person skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, such a combination should be considered not to exist and falls outside the protection scope of the present invention.

Referring to Fig. 1, the park enterprise classification method disclosed by the present invention includes steps S1 to S4.

In step S1, keywords are extracted from the business information of each enterprise in the park, and the enterprises are clustered according to the keywords so as to classify each enterprise in the park into a corresponding industry.

In step S1, the business information of each enterprise may be obtained by web crawling, and information such as the products the enterprise deals in and its staff size is mainly extracted from the business information as keywords for assigning the enterprise to an industry.

In step S2, the enterprise information includes news reports, periodicals, web page information and the like. It covers the growth history of the enterprise, its staff size, whether it is listed, company product introductions, awards won by the enterprise, intellectual property and so on. Feature words for the enterprise can be formed from this related information, so that corresponding labels can be assigned to the enterprise. For example, if company A is listed on the A-share market, it is labeled as an A-share listed company with stock code XXX.

In step S3, the similarity between enterprises is analyzed based on the feature word set, and cluster analysis is performed according to that similarity to form a plurality of industry classes.

Specifically, the feature words of any two enterprises are compared according to an algorithm preset in the system to calculate the similarity between them; if the similarity is greater than a preset value, the two enterprises are classified into the same industry class.
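One way to turn the pairwise rule of step S3 into industry classes is sketched below. It merges every pair of enterprises whose similarity exceeds the preset value using a union-find structure, so classes are formed transitively; that transitive (single-linkage) behaviour is an assumption, since the patent does not say how chains of similar pairs are handled. The function name, the score table and the 0.7 threshold are hypothetical.

```python
def cluster_by_threshold(names, similarity, threshold):
    """Group names so that any pair with similarity > threshold shares a class."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the class representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) > threshold:
                parent[find(a)] = find(b)

    classes = {}
    for name in names:
        classes.setdefault(find(name), []).append(name)
    return sorted(classes.values())


# Hypothetical pairwise similarity scores; missing pairs count as 0.0.
scores = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "D")): 0.1,
}


def lookup(a, b):
    return scores.get(frozenset((a, b)), 0.0)


industry_classes = cluster_by_threshold(["A", "B", "C", "D"], lookup, threshold=0.7)
print(industry_classes)
```

In this example A and C end up in the same class through B even though their direct similarity is not above the threshold; a stricter rule would need a different clustering method.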

In this embodiment, multidimensional clustering is adopted: enterprises are clustered by sector, by region, by staff and by enterprise scale.

Sector clustering is described here as an example. Original enterprise information is crawled using web crawler technology and repeated information is filtered out. Feature words relating to sectors, such as industries and concepts, are extracted from the enterprise information, meaningless words are eliminated, and synonyms or words whose similarity meets a threshold are merged to generate a sector feature lexicon. For example, the generated sector feature words include, but are not limited to, securities, banking, environmental protection engineering, real estate development, petroleum and chemical engineering. The enterprises are then analyzed and clustered according to the sector feature lexicon.

In step S4, the enterprises in the park are classified and summarized according to industry and industry class to form a plurality of industry class groups. Compared with filing by building number, this classifies and summarizes the enterprises more systematically and comprehensively, and helps park managers classify and manage the park enterprises.
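The two-dimensional summary of step S4 amounts to grouping enterprises by the pair (industry, industry class). A minimal sketch follows; the record layout, the function name and the sample labels are assumptions for illustration.

```python
from collections import defaultdict


def summarize(enterprises):
    """Group enterprise names by the (industry, industry_class) pair."""
    groups = defaultdict(list)
    for name, industry, industry_class in enterprises:
        groups[(industry, industry_class)].append(name)
    return dict(groups)


# Hypothetical output of steps S1 (industry) and S3 (industry class).
enterprises = [
    ("Acme Robotics", "manufacturing", "industrial automation"),
    ("Beta Sensors", "manufacturing", "industrial automation"),
    ("Gamma Bank", "finance", "banking"),
    ("Delta Tools", "manufacturing", "hardware"),
]
groups = summarize(enterprises)
for key, names in groups.items():
    print(key, names)
```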

Compared with the prior art, this embodiment first pre-classifies the park enterprises by industry according to their business information, then acquires enterprise information and extracts its feature words, and uses cluster analysis to group the park enterprises into a plurality of industry classes. Finally, the enterprises are classified and summarized by industry and industry class to form a plurality of industry class groups, which helps park managers classify and manage the park enterprises systematically.

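In an alternative embodiment, as shown in Fig. 2, step S1 includes the following steps:

S11, acquiring the business information of the enterprise, performing semantic analysis on the business information, and extracting a plurality of information fragments;

S12, processing the information fragments to construct an information outline;

S13, extracting and screening phrases in the information outline that exceed a first preset threshold, and setting those phrases as keywords.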

In this embodiment, the main body of the information is extracted from the business information by semantic analysis and segmented into a plurality of important information fragments. In step S12, the information fragments may be processed by one or more of cleaning, removing, merging, reshaping and standardizing to form an information outline that meets the requirements, which makes it easier to extract words from the outline as keywords later.
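Steps S12 and S13 can be sketched as follows. The patent does not say what quantity is compared with the first preset threshold, so this sketch assumes it is the number of fragments in which a two-word phrase occurs; the cleaning rules, function names and sample fragments are likewise assumptions.

```python
import re
from collections import Counter


def build_outline(fragments):
    """Clean, de-duplicate and standardize fragments (a stand-in for S12)."""
    outline = []
    seen = set()
    for fragment in fragments:
        # Standardize: lowercase, replace punctuation with spaces, collapse whitespace.
        text = re.sub(r"[^\w\s]", " ", fragment.lower())
        text = " ".join(text.split())
        # Remove empty fragments and exact repeats.
        if text and text not in seen:
            seen.add(text)
            outline.append(text)
    return outline


def extract_keywords(outline, min_count):
    """Return phrases whose fragment count exceeds min_count (a stand-in for S13)."""
    counts = Counter()
    for text in outline:
        words = text.split()
        # Count each two-word phrase at most once per fragment.
        counts.update({" ".join(pair) for pair in zip(words, words[1:])})
    return sorted(phrase for phrase, count in counts.items() if count > min_count)


# Hypothetical information fragments taken from one enterprise's business information.
fragments = [
    "Industrial robots: design and manufacture.",
    "Sales of industrial robots",
    "industrial robots: design and manufacture.",
    "After-sales service for industrial robots!",
]
outline = build_outline(fragments)
keywords = extract_keywords(outline, min_count=2)
print(outline)
print(keywords)
```

Whitespace splitting is used only to keep the sketch self-contained; Chinese business information would need a word segmenter before phrases can be counted.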

Referring to Fig. 3, in an alternative embodiment, step S2 includes the following steps:

S21, screening and de-duplicating the enterprise information to obtain the original information;

S22, extracting a plurality of high-frequency words from the original information;

S23, performing semantic analysis on the high-frequency words, and representing all high-frequency words with the same or similar meanings by a single word to form a feature word set.

Specifically, in step S21, semantic analysis is performed on the information; if the similarity between the semantic analysis results of any two pieces of information is greater than a first preset threshold, the piece with the earlier release time is taken as the original information and the other piece is deleted. The first preset threshold may be set empirically. Determining the original information from the semantic analysis and the release time removes redundant repeated information, which reduces the server load, improves operating efficiency, and yields a de-duplication scheme that saves computing resources while maintaining de-duplication accuracy. If too many feature words are extracted, the feature dimensionality becomes too high, which hinders the subsequent cluster analysis; the feature words processed in step S23 are therefore more accurate and concise.
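The fusion of high-frequency words in steps S22 and S23 can be sketched as below. The synonym table is a hypothetical stand-in for the semantic analysis, which the patent does not detail, and the function name, the `top_n` parameter and the sample documents are assumptions.

```python
from collections import Counter

# Hypothetical synonym table mapping variants to one representative word.
SYNONYMS = {"ipo": "listed", "listing": "listed", "patents": "patent"}


def feature_word_set(documents, top_n):
    """Pick the top_n most frequent words, then merge synonyms (S22 and S23)."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    high_frequency = [word for word, _ in counts.most_common(top_n)]
    # Map each high-frequency word to its representative; unknown words map to themselves.
    return {SYNONYMS.get(word, word) for word in high_frequency}


# Hypothetical de-duplicated enterprise information.
documents = [
    "listed company wins patent award",
    "company ipo and new patents",
    "listing news company patents patent ipo",
]
features = feature_word_set(documents, top_n=4)
print(sorted(features))
```

Four high-frequency words collapse to three feature words here, which shows how merging words of the same or similar meaning keeps the feature dimensionality down.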

In one embodiment, referring to Fig. 4, step S3 includes:

S31, calculating the feature value of each feature word according to the category information, the word frequency and the feature word weight of the feature word;

In this embodiment, phrase features are selected on the basis of word frequency, category information and mutual information. Category information refers to the category of a segmented phrase, such as place name or algorithm. Mutual information measures the mutual dependence between two objects; in filtering problems it is used to measure how well a feature discriminates a topic. Mutual information is a concept from information theory that represents the relationship between pieces of information and is a measure of the statistical correlation of two random variables. Feature extraction based on mutual information assumes that a term which occurs frequently in one category but rarely in other categories has high mutual information with that category. Mutual information is usually used as a measure between feature words and categories: if a feature word belongs to a category, their mutual information is the largest. Word frequency is used to measure how well a word describes the content of a document. The feature value is calculated as follows:

$$\mathrm{TF\text{-}IDF}(w_i, b_j) = tf_i(w_i) \times \mathrm{MI}(w_i, b_j) \times \frac{N}{N_{ij}}$$

where $\mathrm{TF\text{-}IDF}(w_i, b_j)$ is the feature value of the current term $w_i$ in topic $b_j$, $tf_i(w_i)$ denotes the word frequency of $w_i$ in topic $b_j$, $\mathrm{MI}(w_i, b_j)$ is the mutual information between $w_i$ and topic $b_j$, $N$ denotes the total number of topics, and $N_{ij}$ is the number of topics, among all topics, in which the current term $w_i$ occurs.
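A small worked example may help in reading the formula. The sketch below simply multiplies the three factors; how the mutual information itself is estimated is not given in the text, so it is passed in as a number. The function and parameter names and the sample values are assumptions.

```python
def feature_value(tf, mi, n_topics, n_topics_with_term):
    """Feature value of a term in a topic: tf * MI * N / N_ij."""
    if n_topics_with_term <= 0:
        raise ValueError("the term must occur in at least one topic")
    return tf * mi * n_topics / n_topics_with_term


# Hypothetical values: the term occurs 4 times in the topic, has mutual
# information 0.5 with it, and appears in 2 of 10 topics overall.
value = feature_value(tf=4, mi=0.5, n_topics=10, n_topics_with_term=2)
print(value)  # 4 * 0.5 * 10 / 2 = 10.0
```

A term that appears in fewer topics gets a larger factor N / N_ij, so words concentrated in a few topics weigh more in the enterprise vector than words spread across all topics.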

Based on the same inventive concept, the embodiment of the invention provides a park enterprise classification system, which comprises:

The embodiments of the present application also provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps in the above-mentioned method embodiments can be implemented.

The embodiments of the present application further provide an intelligent terminal comprising a memory and a processor, wherein the memory stores a computer program that can be loaded by the processor to execute any one of the methods described above.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the above method embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and used by a processor to implement the steps of the above method embodiments. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include at least: any entity or apparatus capable of carrying computer program code to a camera/terminal device, a recording medium, computer Memory, ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, etc. The computer-readable storage medium referred to herein may be a non-volatile storage medium, in other words, a non-transitory storage medium.

It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A park enterprise classification method is characterized by comprising the following steps:

2. The park enterprise classification method of claim 1, wherein step S1 comprises:

S12, processing the information fragments to construct an information outline;

3. The park enterprise classification method of claim 2, wherein the information fragments are processed in step S12 by at least one of: cleaning, removing, merging, reshaping and standardizing.

4. The park enterprise classification method of claim 1, wherein step S2 comprises:

5. The park enterprise classification method of claim 4, wherein step S21 specifically comprises: performing semantic analysis on the information, and if the similarity between the semantic analysis results of any two pieces of information is greater than a first preset threshold, taking the piece with the earlier release time as the original information and deleting the other piece.

6. The park enterprise classification method of claim 1, wherein step S3 comprises:

7. A park enterprise classification system, comprising:

8. An intelligent terminal, comprising a memory and a processor, the memory having stored thereon a computer program that can be loaded by the processor and that executes the method according to any one of claims 1 to 6.

9. A computer-readable storage medium, in which a computer program is stored which can be loaded by a processor and which executes the method of any one of claims 1 to 6.