CN106570168A - Big data analysis-based internet + development index computing method - Google Patents
- Publication number
- CN106570168A (CN201610982627.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Physics & Mathematics (AREA)
- Development Economics (AREA)
- Educational Administration (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- General Engineering & Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for calculating an Internet+ development index based on big data analysis. The method comprises: collecting the Internet industry domain name resources of each province to obtain original element data, wherein the original element data include domain name data, the information content of websites, and the industry classification of the websites; performing data cleaning and data association on the original element data to obtain available element data; and inputting the available element data into an Internet+ development index calculation model to calculate the Internet+ development index. By computing multiple Internet+ development indexes for individual industries and for all industries at both the provincial and national levels, the method reflects, as a whole, the overall state of national Internet+ development, the overall state of each province's Internet+ development and its share of the national total, and the overall development status and trend of each industry and of all industries within Internet+.
Description
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a method for calculating an Internet+ development index based on big data analysis.
Background
With the rapid development of the Internet, various Internet-based development indexes continue to emerge, and they are increasingly becoming an effective means of quantitatively measuring the actual development of Internet+ industries.
A well-known "Internet+" development index in the industry is Tencent's "China 'Internet+' Index Report". The report is based mainly on Tencent's instant messaging tools such as QQ and WeChat. It derives a development index from a comprehensive analysis of users' access behavior patterns, payment behavior, and the like, and can to a large extent guide people's judgment of "Internet+" development.
However, that statistical report lacks an analysis of industry attributes and an analysis of user visits to all websites across the Internet. It is therefore not comprehensive enough and cannot measure the overall state of Internet+ development fully and accurately.
Summary of the invention
To measure the state of Internet+ development nationwide and in each province more comprehensively and accurately, the invention provides a method for calculating an Internet+ development index based on big data analysis.
The technical solution is as follows:
In a first aspect, a method for calculating an Internet+ development index based on big data analysis is provided, characterized in that the method includes:
collecting the Internet industry domain name resources of each province to obtain original element data, wherein the original element data include domain name data, the information content of websites, and the industry classification of the websites;
performing data cleaning and data association on the original element data to obtain available element data;
inputting the available element data into an Internet+ development index calculation model to calculate the Internet+ development index.
With reference to the first aspect, in a first possible implementation, collecting the Internet industry domain name resources of each province to obtain original element data includes:
collecting domain name data to obtain active domain name information and user visit counts;
crawling web pages according to the domain name data to obtain the information content of the websites;
classifying the websites according to their information content to obtain the industry classification of the websites.
With reference to the first possible implementation, in a second possible implementation, the information content of a website includes: the website name, the website URL, the text of the website's home page, the frame layout information of the home page, and the crawl time.
With reference to the first possible implementation, in a third possible implementation, the industry classification of the websites includes: government departments, manufacturing, agriculture, energy, finance, healthcare, education, tourism, logistics, e-commerce, transportation, and real estate.
With reference to the first aspect, in a fourth possible implementation, the method further includes building the Internet+ development index calculation model, which includes:
building the Internet+ development index calculation model with the base-period index as a first dimension;
building the Internet+ development index calculation model with the current-period index as a second dimension.
With reference to the fourth possible implementation, in a fifth possible implementation, building the Internet+ development index calculation model with the current-period index as the second dimension includes:
building a current-period index calculation model for the available element data, per industry and for all industries, of each province and of the whole country;
building a composite current-period index calculation model for the available element data, per industry and for all industries, of each province and of the whole country.
With reference to the fifth possible implementation, in a sixth possible implementation, building the Internet industry basic resource development index calculation model with the current-period index as the second dimension further includes:
building a period-over-period growth calculation model for the current-period index of the available element data, per industry and for all industries, of each province and of the whole country;
building a period-over-period growth calculation model for the composite current-period index of the available element data, per industry and for all industries, of each province and of the whole country.
In a second aspect, a system for calculating an Internet+ development index based on big data analysis is provided, characterized in that the system includes:
a data acquisition module, configured to collect the Internet industry domain name resources of each province and obtain original element data, wherein the original element data include domain name data, the information content of websites, and the industry classification of the websites;
a data analysis module, configured to perform data cleaning and data association on the original element data to obtain available element data;
an index calculation module, configured to input the available element data into an Internet+ development index calculation model and calculate the Internet+ development index.
With reference to the second aspect, in a first possible implementation, the data acquisition module is specifically configured to:
collect domain name data to obtain active domain name information and user visit counts;
crawl web pages according to the domain name data to obtain the information content of the websites;
classify the websites according to their information content to obtain the industry classification of the websites.
With reference to the second aspect, in a second possible implementation, the system further includes a model construction module for building the Internet+ development index calculation model, specifically configured to:
build the Internet+ development index calculation model with the base-period index as a first dimension;
build the Internet+ development index calculation model with the current-period index as a second dimension, wherein the current-period index includes, for the available element data per industry and for all industries of each province and of the whole country: the current-period index, the composite current-period index, the period-over-period growth of the current-period index, and the period-over-period growth of the composite current-period index.
The embodiments of the present invention provide a method for calculating an Internet+ development index based on big data analysis. By obtaining the original element data of Internet industry domain name resources, the state of Internet+ development can be measured more accurately and completely from several kinds of data, such as domain name data, website information content, and website industry classification. By obtaining Internet+ data for the whole country and for each province, the state and trend of Internet+ development nationwide and in each province can be reflected, giving a more comprehensive picture and estimate of Internet+ development. By building the Internet+ development index calculation model in the current-period and base-period dimensions, the state and trend of Internet+ development in a specific period can be estimated, so that Internet+ development is assessed more comprehensively and reasonably.
Description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for calculating an Internet+ development index based on big data analysis provided by one embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the system for calculating an Internet+ development index based on big data analysis provided by another preferred embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
1. Definitions
1.1 "Internet+"
In this scheme, "Internet+" refers to the following 12 industries that use the Internet for business development and organizational publicity: government departments, manufacturing, agriculture, energy, finance, healthcare, education, tourism, logistics, e-commerce, transportation, and real estate.
This scheme is concerned with the number of Internet websites in these 12 industries, how active the websites are, the distribution range of the websites, and so on.
1.2 "Internet+" development index
The index takes a certain point in time as the starting point (the base period) and, according to a certain algorithm or model, takes the number of Internet websites, their activity, their distribution, and so on as inputs to calculate a current-period value that reflects the "Internet+" development of the 12 industries above. From the development indexes of one industry in the same period, the competitive situation within that industry can be examined. From the development indexes of different industries in the same period, how balanced development is across industries can be examined. From the current-period values for the whole country, each province, and each industry, the trend relative to the base period can be examined, and the change relative to the previous period can be calculated as a period-over-period figure.
2. Element analysis
2.1 Number of websites
A website has two attributes, the sponsor location and the access location. The sponsor location is the province where the website's sponsoring entity is located, and the access location is the province where the website is connected to the network.
Indexes are designed and calculated in two regional dimensions, provincial and national. For a given province, both the number of websites whose sponsor is in the province and the number of websites whose access is in the province are considered. The former reflects how far the province's real economy has adopted "Internet+"; the latter reflects how the province supports "Internet+" development.
2.2 Website activity
Website activity is the number of times Internet users visit a website. In general, within a given period, the more visits a website receives, the more active it is and the better the corresponding "Internet+" development.
Similarly, website activity is calculated separately by sponsor location and by access location, to measure how active the "Internet+" websites of a province's real economy are and how heavily the basic resources that support "Internet+" are used.
2.3 Website coverage
Website coverage is the number of IP addresses through which a website is accessed. In general, the more access IP addresses a website has, the larger the website and the more likely it is to be visited by users, so the better the corresponding "Internet+" development.
Website coverage is likewise divided into two indexes, by sponsor location and by access location, which are calculated separately.
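The three elements above, each counted once by sponsor location and once by access location, give six values per province and industry. A minimal sketch of that aggregation follows. The record layout (`SiteRecord`), the field names, and the factor names such as `sponsor_site_count` are assumptions made for illustration and do not come from the patent text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRecord:
    """One website in one period (hypothetical record layout)."""
    domain: str
    industry: str
    sponsor_province: str  # province of the sponsoring entity
    access_province: str   # province where the site is connected to the network
    visits: int            # user visits in the period (activity)
    access_ips: int        # number of access IP addresses (coverage)


def aggregate_factors(records):
    """Return {(province, industry): {factor_name: value}} for the six factors."""
    totals = defaultdict(lambda: defaultdict(int))
    for r in records:
        # Each record contributes once on the sponsor side and once on the access side.
        for side, province in (("sponsor", r.sponsor_province),
                               ("access", r.access_province)):
            cell = totals[(province, r.industry)]
            cell[f"{side}_site_count"] += 1
            cell[f"{side}_activity"] += r.visits
            cell[f"{side}_coverage"] += r.access_ips
    return {key: dict(cell) for key, cell in totals.items()}


records = [
    SiteRecord("a.example", "finance", "Beijing", "Guangdong", 120, 3),
    SiteRecord("b.example", "finance", "Beijing", "Beijing", 80, 1),
]
factors = aggregate_factors(records)
```

In this example both sites count toward Beijing on the sponsor side, while only `b.example` counts toward Beijing on the access side.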
3. Dimension analysis
3.1 Regional dimension
The "Internet+" indexes are distinguished by national and provincial dimension. In general, the overall national "Internet+" index reflects the general or average state of "Internet+" nationwide, and each province's "Internet+" index reflects the state of that province. The index of a province where "Internet+" is well developed can be higher than the national "Internet+" index.
3.2 Industry dimension
The development index of each of the 12 industries in 1.1 is distinguished, and these indexes are further broken down to individual provinces and the whole country. For the development index of one industry viewed from the provincial and the national perspective, the principle in 3.1 applies. Across different industries, the gap between industries should be distinguishable both nationally and in each province; that is, the index system should faithfully reflect the differences in "Internet+" development between industries.
4. Base determination and release cycle
In general, a development index has a relative starting time, called the base period, and the index for that time is the base-period index (the base for short). The current time is the current period, and its index is the current-period index. Besides the absolute value of the current-period index, the change of the current period relative to the previous period (the period-over-period change) is sometimes also of interest.
The development index is released monthly, together with its rate of change relative to the previous period.
5. Development index design
This index design assumes that all data provided by external systems are accurate. In addition, because development both nationwide and in each province must be considered, the index design should satisfy the following:
for a given province and a specific industry, the index reflects the change of that industry in that province between the current period and the base period;
for a given province and all industries, the index reflects the change of all industries in that province between the current period and the base period;
across provinces, for the same industry, the index reflects the relative index differences between provinces in the same period;
across provinces, for all industries, the index reflects the relative index differences between provinces in the same period;
for the whole country and a specific industry, the overall national index of a period equals the sum of the provincial indexes;
for the whole country and all industries, the overall national index of a period equals the sum of the provincial indexes.
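The last two requirements hold automatically if every provincial index is a share of one common base-period national total. The sketch below checks this with hypothetical numbers. The scaling constant is assumed to be 10^5, as read from the per-industry formula later in this description, and the province labels and counts are invented for illustration.

```python
SCALE = 10 ** 5  # assumed scaling constant


def provincial_index(value, base_national_total):
    """Current value relative to the base-period national total, scaled."""
    return value / base_national_total * SCALE


# Hypothetical website counts for one industry in three provinces.
base_counts = {"A": 400, "B": 350, "C": 250}     # base period
current_counts = {"A": 500, "B": 300, "C": 450}  # current period

base_total = sum(base_counts.values())
by_province = {p: provincial_index(v, base_total) for p, v in current_counts.items()}

# The national index uses the national current value over the same denominator,
# so it equals the sum of the provincial indexes.
national = provincial_index(sum(current_counts.values()), base_total)
```

Because all provinces share the same denominator, the national value is the sum of the provincial values, and in the base period the national index equals the scaling constant itself.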
Referring to Fig. 1, a preferred embodiment of the invention provides a method for calculating an Internet+ development index based on big data analysis. The method includes:
S110: collect the Internet industry domain name resources of each province to obtain original element data, wherein the original element data include domain name data, the information content of websites, and the industry classification of the websites.
Specifically, domain name data are collected to obtain active domain name information and user visit counts;
web pages are crawled according to the domain name data to obtain the information content of the websites;
the websites are classified according to their information content to obtain their industry classification.
Specifically, when domain name data are collected, enterprise-side domain name data and visit data are collected first, then provincial-level domain name data and visit data, and then ministry-level domain name data and visit data.
The active domain name information includes the active domain name, the visit count, the current time, the province it belongs to, and so on. The website domain name data above include more than 5 million active domain names nationwide and in each province.
The information content of a website includes: the website name, the website URL, the text of the website's home page, the frame layout information of the home page, the crawl time, and so on.
The industry classification of websites covers 12 industries: government departments, manufacturing, agriculture, energy, finance, healthcare, education, tourism, logistics, e-commerce, transportation, and real estate.
Optionally, the industry classification of websites is automatically trained and extended according to newly added categories in the Internet industry. "Internet+" websites can be assigned to the different categories, and when a website cannot be assigned to any of them, the classification is extended automatically, which widens the scope of application.
By collecting Internet domain name data, the three kinds of original element data above are obtained. They reflect the state of provincial and national Internet+ development accurately and completely, and make it possible to estimate the development trend and development pattern of Internet+ industries.
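The text does not say how websites are classified from their information content. Purely as a placeholder, the sketch below assigns an industry by counting keyword matches in the website name and home page text. The keyword table, the function name `classify_site`, and the `unclassified` fallback are all assumptions for illustration and are not the classification method of the patent.

```python
# Hypothetical keyword table covering three of the 12 industries.
INDUSTRY_KEYWORDS = {
    "finance": ("bank", "insurance", "securities"),
    "education": ("school", "university", "course"),
    "tourism": ("travel", "hotel", "scenic"),
}


def classify_site(name, homepage_text, default="unclassified"):
    """Pick the industry whose keywords occur most often; fall back to `default`."""
    text = f"{name} {homepage_text}".lower()
    scores = {
        industry: sum(text.count(keyword) for keyword in keywords)
        for industry, keywords in INDUSTRY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


label = classify_site("City Bank", "Online banking and insurance services")
```

Sites that end up in the fallback bucket are the natural candidates for the optional extension of the classification described above.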
S120: perform data cleaning and data association on the original element data to obtain available element data.
Data cleaning re-examines and verifies the data. Collected big data inevitably contain incomplete, erroneous, or duplicate records. The original element data are cleaned according to certain filtering rules to wash out this "dirty data".
Data association: after data cleaning is finished, the data are associated on the basis of specific data identifiers so that they can be classified and stored, which yields the available element data used by the calculation model. Note that the data identifiers differ among the kinds of original element data above.
After data cleaning and data association, the resulting available element data are input into the respective calculation models to obtain the index of each element.
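A minimal sketch of this step is given below. The filtering rules (drop rows with missing fields, negative visit counts, or a repeated domain and period) and the use of the lower-cased domain name as the association key are assumptions. The text only states that filtering rules and data identifiers exist, not what they are.

```python
def clean(rows, required=("domain", "period", "visits")):
    """Drop incomplete, erroneous, and duplicate rows (hypothetical rules)."""
    seen, kept = set(), []
    for row in rows:
        if any(row.get(field) in (None, "") for field in required):
            continue  # incomplete record
        if not isinstance(row["visits"], int) or row["visits"] < 0:
            continue  # erroneous record
        key = (row["domain"].lower(), row["period"])
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        kept.append(row)
    return kept


def associate(domain_rows, site_info, industry_by_domain):
    """Join domain data, site content, and industry label on the domain name."""
    joined = []
    for row in domain_rows:
        domain = row["domain"].lower()
        if domain in site_info and domain in industry_by_domain:
            joined.append({**row, "domain": domain, **site_info[domain],
                           "industry": industry_by_domain[domain]})
    return joined


raw = [
    {"domain": "A.example", "period": "2016-10", "visits": 10},
    {"domain": "a.example", "period": "2016-10", "visits": 10},    # duplicate
    {"domain": "b.example", "period": "2016-10", "visits": None},  # incomplete
    {"domain": "c.example", "period": "2016-10", "visits": -5},    # erroneous
]
cleaned = clean(raw)
available = associate(cleaned,
                      {"a.example": {"site_name": "Site A"}},
                      {"a.example": "finance"})
```

Only the first row survives cleaning, and after association it carries the site name and industry label alongside the visit count.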
S130: build the Internet+ development index calculation model.
Specifically, the Internet+ development index calculation model is built with the base-period index as a first dimension;
the Internet+ development index calculation model is built with the current-period index as a second dimension.
By building the calculation model from the base-period index and the current-period index, the state of Internet+ development and its changes over subsequent periods can be reflected more intuitively and dynamically.
Building the Internet+ development index calculation model with the current-period index as the second dimension includes:
building a current-period index calculation model for the available element data, per industry and for all industries, of each province and of the whole country;
building a composite current-period index calculation model for the available element data, per industry and for all industries, of each province and of the whole country.
Let the current period be period i, i ≥ 0 (i = 0 is the base period). Let Ω = {government departments, manufacturing, agriculture, energy, finance, healthcare, education, tourism, logistics, e-commerce, transportation, real estate} and Ω1 = {sponsor-location website count, sponsor-location website activity, sponsor-location website coverage, access-location website count, access-location website activity, access-location website coverage}.
1) When building the current-period index of each province's available element data:
(1) Per-industry current-period index
Current-period index of the industry-l available element data of province k = (industry-l available element data of province k in the current period / national website total in the base period) × 10^5, k = 1, 2, ..., 31, l ∈ Ω.
Here the website total is, depending on the element, the total number of websites, the total website activity, or the total number of website access IP addresses.
(2) All-industry current-period index
2) When building the current-period index of the national available element data:
(1) Per-industry index
Optionally, the average index is:
(2) All-industry index
Optionally, the average index is:
3) When building the composite current-period index of each province's available element data:
(1) Per-industry index
(2) All-industry index
When the overall current-period index is calculated, the weights are set to:
λ_{sponsor-location website count} = 12.5%, λ_{sponsor-location website activity} = 25%, λ_{sponsor-location website coverage} = 12.5%,
λ_{access-location website count} = 12.5%, λ_{access-location website activity} = 25%, λ_{access-location website coverage} = 12.5%
4) When building the composite current-period index of the national available element data:
(1) Per-industry index
Optionally, the average index is:
(2) All-industry index
Optionally, the average index is:
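The per-industry formula and the six weights listed above can be sketched as follows. The composite formulas themselves did not survive in this text, so combining the six element indexes as a weighted sum is an assumption, as are the factor names (`sponsor_*` for websites counted by sponsor location, `access_*` for websites counted by access location) and the reading of the scaling constant as 10^5.

```python
# Weights as listed in the text; they sum to 100%.
WEIGHTS = {
    "sponsor_site_count": 0.125, "sponsor_activity": 0.25, "sponsor_coverage": 0.125,
    "access_site_count": 0.125, "access_activity": 0.25, "access_coverage": 0.125,
}


def factor_index(current_value, base_national_total):
    """Element index: current value over the base-period national total, times 10^5."""
    return current_value / base_national_total * 10 ** 5


def composite_index(factor_indices):
    """Weighted sum of the six element indexes (assumed combination rule)."""
    return sum(WEIGHTS[name] * factor_indices[name] for name in WEIGHTS)


# Hypothetical element indexes for one province and one industry.
indices = {
    "sponsor_site_count": 80.0, "sponsor_activity": 120.0, "sponsor_coverage": 80.0,
    "access_site_count": 40.0, "access_activity": 160.0, "access_coverage": 40.0,
}
composite = composite_index(indices)
```

With these weights, the two activity elements together carry half of the composite value, and the website count and coverage elements share the other half.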
Optionally, a period-over-period growth calculation model is built.
Specifically, a period-over-period growth calculation model is built for the current-period index, per industry and for all industries, of each province and of the whole country;
a period-over-period growth calculation model is built for the composite current-period index, per industry and for all industries, of each province and of the whole country.
Let the current period be period i (i ≥ 1).
5) When building the period-over-period growth of each province's current-period index:
(1) Per-industry growth
(2) All-industry growth
6) When building the period-over-period growth of the national current-period index:
(1) Per-industry growth
(2) All-industry growth
7) When building the period-over-period growth of each province's composite current-period index:
(1) Per-industry growth
(2) All-industry growth
8) When building the period-over-period growth of the national composite current-period index:
(1) Per-industry growth
R_i(l) = (I_i(l) / I_{i-1}(l) - 1) × 100%, l ∈ Ω, i ≥ 1
(2) All-industry growth
R_i = (I_i / I_{i-1} - 1) × 100%, i ≥ 1
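The growth formula R_i = (I_i / I_{i-1} - 1) × 100% can be applied to a whole series of index values at once. The sketch below does so, with the function name and the sample values chosen only for illustration.

```python
def period_over_period_growth(index_series):
    """Return [R_1, R_2, ...] in percent, given [I_0, I_1, I_2, ...]."""
    return [
        (current / previous - 1) * 100
        for previous, current in zip(index_series, index_series[1:])
    ]


# Hypothetical monthly index values, starting with the base period I_0.
growth = period_over_period_growth([100.0, 110.0, 99.0])
```

The result has one entry fewer than the input because the base period (i = 0) has no previous period to compare with.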
By building the calculation models above, the available element data can be turned into the required current-period indexes and overall current-period indexes. These reflect the change of a province between the current period and the base period and the relative index differences between provinces, as well as the state and trend of Internet+ development for single industries and for all industries, so that the state and trend of Internet+ development in each province and nationwide are quantified comprehensively and accurately.
S140: input the available element data into the Internet+ development index calculation model to calculate the Internet+ development index.
Specifically, the current-period index of each element is calculated as follows:
1 Sponsor-location website count index
A website is counted under the province where its sponsor is located, regardless of where the website is actually connected to the network.
1.1 Current-period index
Let the current period be period i, i ≥ 0 (i = 0 is the base period). The current-period index is then:
1) Provincial index:
(1) Per-industry index
(2) All-industry index
2) National index:
(1) Per-industry index
Aggregate index:
Average index:
(2) All industries
Aggregate index:
Average index:
1.2 Period-over-period growth
Let the current period be period i (i ≥ 1). The period-over-period growth of the current-period index is then:
1) Provincial growth
(1) A given industry
(2) All industries
2) National growth
(1) A given industry
(2) All industries
2 Sponsor-location website activity index
The activity of a sponsor-location website is defined as the total number of visits to the website within a period. The sponsor-location website activity of an industry is the sum of the activity of all sponsor-location websites in that industry.
2.1 Current-period index
Let the current period be period i, i ≥ 0 (i = 0 is the base period). The current-period index is then:
1) Provincial index:
(1) Per-industry index
(2) All-industry index
2) National index:
(1) Per-industry index
Aggregate index:
Average index:
(2) All industries
Aggregate index:
Average index:
2.2 Period-over-period growth
Let the current period be period i (i ≥ 1). The period-over-period growth of the current-period index is then:
1) Provincial growth
(1) A given industry
(2) All industries
2) National growth
(1) A given industry
(2) All industries
3 Sponsor-location website coverage index
The number of IP addresses through which a sponsor-location website is actually accessed is taken as the coverage of that website. The sponsor-location website coverage index of an industry is the share represented by the sum of the access IP addresses of all sponsor-location websites in that industry.
3.1 Current-period index
Let the current period be period i, i ≥ 0 (i = 0 is the base period). The current-period index is then:
1) Provincial index:
(1) Per-industry index
(2) All-industry index
2) National index:
(1) Per-industry index
Aggregate index:
Average index:
(2) All industries
Aggregate index:
Average index:
3.2 Period-over-period growth
Let the current period be period i (i ≥ 1). The period-over-period growth of the current-period index is then:
1) Provincial growth
(1) A given industry
(2) All industries
2) National growth
(1) A given industry
(2) All industries
4 Access-location website count index
A website is counted under the province where it is connected to the network, regardless of where its sponsor is located.
4.1 Current-period index
Let the current period be period i, i ≥ 0 (i = 0 is the base period). The current-period index is then:
1) Provincial index:
(1) Per-industry index
(2) All-industry index
2) National index:
(1) Per-industry index
Aggregate index:
Average index:
(2) All-industry index
Aggregate index:
Average index:
4.2 Period-over-period growth
Let the current period be period i (i ≥ 1). The period-over-period growth of the current-period index is then:
1) Provincial growth
(1) A given industry
(2) All industries
2) National growth
(1) A given industry
(2) All industries
5 Access-location website activity index
The activity of an access-location website is defined as the total number of visits to the website within a period. The access-location website activity of an industry is the sum of the activity of all access-location websites in that industry.
5.1 Current-period index
Let the current period be period i, i ≥ 0 (i = 0 is the base period). The current-period index is then:
1) Provincial index:
(1) Per-industry index
(2) All-industry index
2) National index:
(1) Per-industry index
Aggregate index:
Average index:
(2) All industries
Aggregate index:
Average index:
5.2 Period-over-period growth
Let the current period be period i (i ≥ 1). The period-over-period growth of the current-period index is then:
1) Provincial growth
(1) A given industry
(2) All industries
2) National growth
(1) A given industry
(2) All industries
6 Access-location website coverage index
The number of IP addresses through which an access-location website is actually accessed is taken as the coverage of that website. The access-location website coverage index of an industry is the share represented by the sum of the access IP addresses of all access-location websites in that industry.
6.1 Current-period index
Let the current period be period i, i ≥ 0 (i = 0 is the base period). The current-period index is then:
1) Provincial index:
(1) Per-industry index
(2) All-industry index
2) National index:
(1) Per-industry index
Aggregate index:
Average index:
(2) All industries
Aggregate index:
Average index:
6.2 Period-over-period growth
Let the current period be period i (i ≥ 1). The period-over-period growth of the current-period index is then:
1) Provincial growth
(1) A given industry
(2) All industries
2) National growth
(1) A given industry
(2) All industries
7 Composite development index
7.1 Current-period index
If the current period is the i-th period (i ≥ 0, where i = 0 is the base period), the current-period index is:
1) Provincial index:
(1) Per-industry index:
(2) All-industry index:
2) National index:
(1) Per-industry index:
Combined index:
Average index:
(2) All-industry index:
Combined index:
Average index:
7.2 Period-over-period growth
If the current period is the i-th period (i ≥ 1), the period-over-period growth of the current-period index is:
1) Provincial growth
(1) By industry
(2) All industries
2) National growth
(1) By industry
$R_i(l) = \left( \frac{I_i(l)}{I_{i-1}(l)} - 1 \right) \times 100\%, \quad l \in \Omega, \ i \ge 1$
(2) All industries
$R_i = \left( \frac{I_i}{I_{i-1}} - 1 \right) \times 100\%, \quad i \ge 1$
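The national growth formulas above can be sketched in Python as follows. The function names and the sample index values are hypothetical and chosen only to illustrate the arithmetic; the guard against a zero previous-period index is an added assumption, not something the patent specifies.

```python
def period_growth(current, previous):
    """Growth in percent: R_i = (I_i / I_{i-1} - 1) * 100."""
    if previous == 0:
        # Assumption: a zero previous-period index is treated as an error.
        raise ValueError("previous-period index must be non-zero")
    return (current / previous - 1) * 100


def growth_series(index_values):
    """Growth for periods i >= 1, given index values I_0, I_1, ..., I_n."""
    return [
        period_growth(current, previous)
        for previous, current in zip(index_values, index_values[1:])
    ]


def growth_by_industry(index_by_industry):
    """Apply growth_series to each industry l in the industry set."""
    return {
        industry: growth_series(values)
        for industry, values in index_by_industry.items()
    }


# Hypothetical all-industry index values for periods 0, 1, 2.
print(growth_series([100.0, 110.0, 99.0]))  # roughly +10 % and then -10 %

# Hypothetical per-industry index values for the same periods.
print(growth_by_industry({"finance": [100.0, 120.0], "education": [100.0, 95.0]}))
```

The series has one fewer element than the list of index values, because the base period (i = 0) has no preceding period to compare against.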
The Internet+ development index calculation method based on big data analysis provided by this embodiment of the present invention obtains the original factor data of Internet industry domain name resources, so that the state of Internet+ development can be measured more accurately and completely from several kinds of data, such as domain name data, website information content, and website industry classification. By obtaining Internet+ data for the whole country and for each province, it can reflect the state and trend of Internet+ development nationally and in each province, and present and assess Internet+ development more comprehensively. By building the Internet+ development index calculation model along a current-period dimension and a base-period dimension, it can assess the state and trend of Internet+ development in a specific period, and evaluate Internet+ development more comprehensively and reasonably.
Referring to Fig. 2, another preferred embodiment of the present invention provides an Internet+ development index calculation system based on big data analysis. The system includes:
a data acquisition module 210, configured to collect the Internet industry domain name resources of each province and obtain original factor data, wherein the original factor data include domain name data, the information content of websites, and the industry classification of websites;
a data analysis module 220, configured to perform data cleansing and data association on the original factor data to obtain available factor data;
an index calculation module 230, configured to input the available factor data into the Internet+ development index calculation model and calculate the Internet+ development index.
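As a rough illustration of the work of the data analysis module 220, the following Python sketch cleans domain records and then associates them with crawled site information and industry labels. The patent does not state the cleansing or association rules; the rules here (normalize the domain name, drop blanks and duplicates, join on the domain name), the field names, and the sample data are all assumptions.

```python
def clean_domains(raw_domains):
    """Assumed cleansing: normalize domain names, drop blank and duplicate records."""
    seen = set()
    cleaned = []
    for record in raw_domains:
        name = (record.get("domain") or "").strip().lower()
        if not name or name in seen:
            continue
        seen.add(name)
        cleaned.append({**record, "domain": name})
    return cleaned


def associate(domains, site_info, classification):
    """Assumed association: join the three kinds of data on the domain name.

    Records without both site information and an industry label are skipped.
    """
    joined = []
    for record in domains:
        name = record["domain"]
        if name in site_info and name in classification:
            joined.append(
                {**record, **site_info[name], "industry": classification[name]}
            )
    return joined


# Hypothetical original factor data.
raw_domains = [
    {"domain": " A.example ", "province": "Beijing", "visits": 10},
    {"domain": "a.example", "province": "Beijing", "visits": 10},  # duplicate
    {"domain": "", "province": "Hebei", "visits": 3},              # blank name
    {"domain": "b.example", "province": "Hebei", "visits": 5},
]
site_info = {"a.example": {"title": "A"}}
classification = {"a.example": "finance", "b.example": "education"}

available = associate(clean_domains(raw_domains), site_info, classification)
print(available)
```

In this sketch `b.example` is dropped from the result because no crawled site information exists for it; a real pipeline might instead keep such records and flag them as incomplete.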
Specifically, the data acquisition module 210 is configured to:
collect domain name data to obtain active domain name information and user visit volume;
crawl web pages according to the domain name data to obtain the information content of the websites;
classify the websites according to their information content to obtain the industry classification of the websites.
The system further includes a model construction module 240 for building the Internet+ development index calculation model, which is specifically configured to:
build the Internet+ development index calculation model with the base-period index as the first dimension;
build the Internet+ development index calculation model with the current-period index as the second dimension.
The model construction module 240 is further configured to:
build current-period index calculation models of the available factor data, by industry and for all industries, for each province and for the whole country;
build composite current-period index calculation models of the available factor data, by industry and for all industries, for each province and for the whole country.
The model construction module 240 is further configured to:
build period-over-period growth calculation models for the current-period index of the available factor data, by industry and for all industries, for each province and for the whole country;
build period-over-period growth calculation models for the composite current-period index of the available factor data, by industry and for all industries, for each province and for the whole country.
In the Internet+ development index calculation system based on big data analysis provided by this embodiment of the present invention, the data acquisition module 210 collects the domain name data, website information content, and website industry classification of multiple provinces, so that a more comprehensive and accurate development index can be derived through big data analysis. The data analysis module 220 cleans and associates the collected data to obtain available factor data, which further improves the accuracy of the data and makes it convenient to feed into the calculation module. The model construction module 240 builds different models of the available factor data, so that the current-period index, the base-period index, and the development of each industry and of all industries, nationally and in each province, can be shown and measured intuitively from all aspects. The index calculation module 230 obtains the current-period index of each industry and of all industries, nationally and for each province, to measure Internet+ development in each province and in the whole country.
It should be noted that, when the Internet+ development index calculation based on big data analysis provided by the above embodiment is performed, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the Internet+ development index calculation method and system based on big data analysis provided by the above embodiments belong to the same concept; the specific implementation process is described in the method embodiment and is not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. An Internet+ development index calculation method based on big data analysis, characterized in that the method includes:
collecting the Internet industry domain name resources of each province to obtain original factor data, wherein the original factor data include domain name data, the information content of websites, and the industry classification of websites;
performing data cleansing and data association on the original factor data to obtain available factor data;
inputting the available factor data into an Internet+ development index calculation model to calculate the Internet+ development index.
2. The method according to claim 1, characterized in that collecting the Internet industry domain name resources of each province to obtain original factor data includes:
collecting domain name data to obtain active domain name information and user visit volume;
crawling web pages according to the domain name data to obtain the information content of the website;
classifying the website according to the information content of the website to obtain the industry classification of the website.
3. The method according to claim 2, characterized in that the information content of the website includes: the website name, the website URL, the text information of the website home page, the frame layout information of the website home page, and the crawl time.
4. The method according to claim 2, characterized in that the industry classification of the website includes: government departments, manufacturing, agriculture, energy, finance, medical care, education, tourism, logistics, e-commerce, transportation, and real estate.
5. The method according to claim 1, characterized in that the method further includes building the Internet+ development index calculation model, which includes:
building the Internet+ development index calculation model with the base-period index as the first dimension;
building the Internet+ development index calculation model with the current-period index as the second dimension.
6. The method according to claim 5, characterized in that building the Internet+ development index calculation model with the current-period index as the second dimension includes:
building current-period index calculation models of the available factor data, by industry and for all industries, for each province and for the whole country;
building composite current-period index calculation models of the available factor data, by industry and for all industries, for each province and for the whole country.
7. The method according to claim 6, characterized in that building the Internet industry basic resources development index calculation model with the current-period index as the second dimension further includes:
building period-over-period growth calculation models for the current-period index of the available factor data, by industry and for all industries, for each province and for the whole country;
building period-over-period growth calculation models for the composite current-period index of the available factor data, by industry and for all industries, for each province and for the whole country.
8. An Internet+ development index calculation system based on big data analysis, characterized in that the system includes:
a data acquisition module, configured to collect the Internet industry domain name resources of each province and obtain original factor data, wherein the original factor data include domain name data, the information content of websites, and the industry classification of websites;
a data analysis module, configured to perform data cleansing and data association on the original factor data to obtain available factor data;
an index calculation module, configured to input the available factor data into an Internet+ development index calculation model and calculate the Internet+ development index.
9. The system according to claim 8, characterized in that the data acquisition module is specifically configured to:
collect domain name data to obtain active domain name information and user visit volume;
crawl web pages according to the domain name data to obtain the information content of the website;
classify the website according to the information content of the website to obtain the industry classification of the website.
10. The system according to claim 6, characterized in that the system further includes a model construction module for building the Internet+ development index calculation model, which is specifically configured to:
build the Internet+ development index calculation model with the base-period index as the first dimension;
build the Internet+ development index calculation model with the current-period index as the second dimension, wherein the current-period index includes, for the available factor data by industry and for all industries in each province and in the whole country, the current-period index, the composite current-period index, the period-over-period growth of the current-period index, and the period-over-period growth of the composite current-period index.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610982627.0A CN106570168A (en) | 2016-11-08 | 2016-11-08 | Big data analysis-based internet + development index computing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610982627.0A CN106570168A (en) | 2016-11-08 | 2016-11-08 | Big data analysis-based internet + development index computing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106570168A true CN106570168A (en) | 2017-04-19 |
Family
ID=58540531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610982627.0A Pending CN106570168A (en) | 2016-11-08 | 2016-11-08 | Big data analysis-based internet + development index computing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106570168A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491374A (en) * | 2018-02-12 | 2018-09-04 | 郑长敬 | Dictionary construction method based on real estate industry and system |
CN108880883A (en) * | 2018-06-15 | 2018-11-23 | 恒安嘉新(北京)科技股份公司 | A kind of calculation method of the linking Internet Websites quantity based on the passive data of master |
CN109325675A (en) * | 2018-09-10 | 2019-02-12 | 北京电力交易中心有限公司 | A kind of capital electrical index calculation method and system |
CN113220967A (en) * | 2021-05-11 | 2021-08-06 | 北京百度网讯科技有限公司 | Method and device for measuring ecological health degree of Internet environment and electronic equipment |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491374A (en) * | 2018-02-12 | 2018-09-04 | 郑长敬 | Dictionary construction method based on real estate industry and system |
CN108491374B (en) * | 2018-02-12 | 2022-05-27 | 郑长敬 | Word stock construction method and system based on real estate industry |
CN108880883A (en) * | 2018-06-15 | 2018-11-23 | 恒安嘉新(北京)科技股份公司 | A kind of calculation method of the linking Internet Websites quantity based on the passive data of master |
CN108880883B (en) * | 2018-06-15 | 2021-11-05 | 恒安嘉新(北京)科技股份公司 | Method for calculating number of internet access websites based on active and passive data |
CN109325675A (en) * | 2018-09-10 | 2019-02-12 | 北京电力交易中心有限公司 | A kind of capital electrical index calculation method and system |
CN113220967A (en) * | 2021-05-11 | 2021-08-06 | 北京百度网讯科技有限公司 | Method and device for measuring ecological health degree of Internet environment and electronic equipment |
CN113220967B (en) * | 2021-05-11 | 2023-09-22 | 北京百度网讯科技有限公司 | Ecological health degree measuring method and device for Internet environment and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Address after: 100191 Beijing City, North Third Ring Road West, No. 27, building 25, room five, floor 5002. Applicant after: Heng Jia Jia (Beijing) Technology Co., Ltd. Address before: 100191 Beijing City, North Third Ring Road West, No. 27, building 25, room five, floor 5002. Applicant before: Eversec (Beijing) Technology Co., Ltd. |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170419 |