CN113792084A - Data heat analysis method, device, equipment and storage medium - Google Patents

Info

Publication number
CN113792084A
Authority
CN
China
Prior art keywords
data
information
heat
counted
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110925776.4A
Other languages
Chinese (zh)
Inventor
徐小康
蔡抒扬
夏曙东
孙智彬
张志平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Transwiseway Information Technology Co Ltd
Original Assignee
Beijing Transwiseway Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Transwiseway Information Technology Co Ltd
Priority to CN202110925776.4A
Publication of CN113792084A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data heat analysis method, apparatus, device and storage medium. The method includes: receiving query information, reference information, user interaction information, business attribute importance information, data release time information and data usage time information of a data table to be counted in a data warehouse; calculating a data dimension value from the query information, reference information, user interaction information and business attribute importance information of the data table to be counted according to a pre-trained linear regression model; and calculating the heat of the data from the data dimension value, the data release time information and the data usage time information. The data heat analysis method provided in this embodiment considers multiple dimensions of the data together and uses a linear regression algorithm as the model, so that the weight of each dimension is calculated by the model and a heat value with relatively high accuracy is obtained.

Figure 202110925776

Description

Data heat analysis method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data analysis technologies, and in particular, to a method, an apparatus, a device, and a storage medium for analyzing data heat.
Background
With the development of big data, demand for data keeps growing, and studying data in various ways lets people make the fullest use of it. The heat of data intuitively reflects how widely the data is used and how important it is.
In the prior art, heat values are mainly calculated by rules. Because new business is added frequently to data warehouses in the freight transportation field, it is difficult to set the weight thresholds of the data dimensions scientifically. Moreover, when data heat is determined only by counting accesses, influences from multiple aspects cannot be taken into account, so an accurate heat value cannot be obtained.
Disclosure of Invention
The embodiment of the disclosure provides a method, a device, equipment and a storage medium for analyzing data heat. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present disclosure provides a method for analyzing data heat, including:
receiving query information, reference information, user interaction information, business attribute importance degree information, data release time information and data use time information of a data table to be counted in a data warehouse;
calculating data dimension values of query information, reference information, user interaction information and service attribute importance degree information of a data table to be counted according to a pre-trained linear regression model;
and calculating the heat of the data according to the data dimension value, the data release time information and the data use time information.
In one embodiment, after receiving the user interaction information of the data table to be counted in the data warehouse, the method further includes:
calculating the exposure count, view count, like count and user score of the data table to be counted;
and calculating a user interaction information value according to the exposure count, the view count, the like count and the user score.
In one embodiment, after receiving the business attribute importance information of the data table to be counted in the data warehouse, the method further includes:
acquiring a service attribute category corresponding to a data table to be counted;
and inquiring the corresponding business attribute importance degree value according to the business attribute category.
In one embodiment, before calculating the data dimension value according to the pre-trained linear regression model, the method further comprises:
carrying out data dimension numerical value labeling on query information, reference information, user interaction information and service attribute importance degree information of a plurality of data tables;
dividing the marked data into a training set and a test set;
and training the linear regression model according to the training set and the test set to obtain the trained linear regression model.
In one embodiment, the formula for the linear regression model is as follows:
$S = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$
where $S$ represents the data dimension value, $x_1$ represents the number of times the data table is queried within a preset period, $x_2$ represents the number of times the data table is referenced within a preset period, $x_3$ represents the user interaction information value, $x_4$ represents the business attribute importance value, and $w_0, \ldots, w_4$ represent the weight parameters.
In one embodiment, calculating the heat of the data according to the data dimension value, the data publishing time information and the data using time information comprises calculating the heat of the data according to the following formula:
Figure BDA0003209131490000021
where $f$ represents the data heat, $s$ represents the data dimension value (the value $S$ produced by the linear regression model), $M_{\text{ageInhours}}$ represents the difference between the data release time and the current time, and $M_{\text{usedTimeInHour}}$ represents the most recent time the data was used.
In one embodiment, after calculating the heat of the data, the method further comprises:
and sorting all the data in the data warehouse from high to low by heat value, and pushing a preset number of top-ranked data items to a client for display.
In a second aspect, an embodiment of the present disclosure provides an apparatus for analyzing data heat, including:
the data receiving module is used for receiving query information, reference information, user interaction information, service attribute importance degree information, data release time information and data use time information of a data table to be counted in the data warehouse;
the first calculation module is used for calculating data dimension values of query information, reference information, user interaction information and service attribute importance degree information of a data table to be counted according to a pre-trained linear regression model;
and the second calculation module is used for calculating the heat of the data according to the data dimension value, the data release time information and the data use time information.
In a third aspect, the present disclosure provides an apparatus for analyzing data heat, including a processor and a memory storing program instructions, where the processor is configured to execute the method for analyzing data heat provided by the foregoing embodiments when executing the program instructions.
In a fourth aspect, the disclosed embodiments provide a computer-readable medium, on which computer-readable instructions are stored, the computer-readable instructions being executable by a processor to implement a method for analyzing data heat provided by the above embodiments.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the data heat degree analysis method provided by the embodiment of the disclosure, data information of multiple dimensions such as query times, reference times, release time, user behaviors and business importance degrees of data are comprehensively considered, a linear regression algorithm is used as a model, the weight of each data dimension is calculated through the model, a heat degree value with high accuracy is obtained, and the method is more suitable for a calculation scene of data heat degree in a data warehouse in the field of freight transportation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic flow chart diagram illustrating a method for analyzing data heat in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating a data relationship according to an exemplary embodiment;
FIG. 3 is a block diagram illustrating an apparatus for analyzing data heat according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating an analysis device of data heat according to an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating a computer storage medium in accordance with an exemplary embodiment.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of systems and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "And/or" describes the association relationship of the associated objects and covers three cases; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The data heat analysis method in this embodiment adopts dimensions better suited to data heat in a data warehouse and uses linear regression to select suitable dimension weights. The mart-layer data of the data warehouse and the underlying layers on which it depends are used as one dimension, and the heat is calculated from this together with several other dimensions such as user interaction. The method therefore accommodates dimension expansion well and is better suited to the data warehouse scenario.
Fig. 1 is a schematic flow chart illustrating a method for analyzing data heat according to an exemplary embodiment, and referring to fig. 1, the method specifically includes the following steps.
S101, receiving query information, reference information, user interaction information, service attribute importance degree information, data release time information and data use time information of a data table to be counted in a data warehouse.
This embodiment takes the analysis of data heat in a freight-field data warehouse as an example. To improve the accuracy of the heat statistics, data from multiple dimensions is used.
First, the query information of the data table to be counted in the data warehouse is acquired. The SQL scripts that users execute on the computing platform are parsed to generate syntax data, the tables used are extracted, and the number of uses of each table is accumulated to obtain the query count of the data table to be counted.
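The query-count step can be sketched as follows. This is a minimal illustration only: the regular expression is a hypothetical stand-in for the syntax analysis described above (a real implementation would use a proper SQL parser), and the table names and scripts are made up.

```python
import re
from collections import Counter

# Assumption: table names are the identifiers that follow FROM or JOIN.
# This regex is a simplification and does not handle subqueries, CTEs or quoting.
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_tables(sql: str) -> set[str]:
    """Return the distinct table names read by one SQL statement."""
    return {name.lower() for name in TABLE_PATTERN.findall(sql)}


def count_queries(scripts: list[str]) -> Counter:
    """Accumulate, per table, the number of statements that read it."""
    counts: Counter = Counter()
    for sql in scripts:
        counts.update(extract_tables(sql))
    return counts


# Hypothetical scripts executed by users on the computing platform.
scripts = [
    "SELECT * FROM dw.vehicle v JOIN dw.enterprise e ON v.eid = e.id",
    "SELECT count(*) FROM dw.vehicle",
]
query_counts = count_queries(scripts)
```

Here each statement contributes at most one count per table; whether repeated uses inside one statement should count more than once is a design choice the text leaves open.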
Next, the number of references is calculated from the lineage (blood relationship) of the data table to be counted. The table lineage is built from the SQL scripts that users execute, as follows:
i. collect DML statements and insert DDL statements;
ii. parse the statements collected in step i to generate an abstract syntax tree;
iii. traverse the syntax tree to obtain the inputTable and outputTable information in it;
iv. build the relationships obtained in step iii into a tree-structured lineage.
Fig. 2 is a schematic diagram illustrating a data relationship according to an exemplary embodiment. As shown in Fig. 2, data table2 references the data table, and data table3 and data table4 reference the data table; the number of references of the data table to be counted can be calculated from this lineage.
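The last two lineage steps and the reference count can be sketched as below. The sketch assumes the parsing steps have already produced (inputTable, outputTable) pairs; the edges are hypothetical and are not a transcription of Fig. 2. Counting indirect as well as direct dependants is also an assumption, since the text does not say which is intended.

```python
from collections import defaultdict


def build_lineage(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each input table to the set of output tables that read from it."""
    children: dict[str, set[str]] = defaultdict(set)
    for input_table, output_table in edges:
        children[input_table].add(output_table)
    return children


def reference_count(children: dict[str, set[str]], table: str) -> int:
    """Count every table that depends on `table`, directly or indirectly."""
    seen: set[str] = set()
    stack = [table]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    seen.discard(table)  # ignore a cycle that leads back to the table itself
    return len(seen)


# Hypothetical (inputTable, outputTable) pairs extracted from syntax trees.
edges = [("table1", "table2"), ("table2", "table3"), ("table2", "table4")]
lineage = build_lineage(edges)
```

With these edges, `reference_count(lineage, "table1")` counts table2, table3 and table4, while a leaf such as table3 has no references.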
Further, user interaction information of the data table to be counted is collected, and a user interaction information value is calculated from it: the exposure count, view count, like count and user score of the data table to be counted are calculated, and the user interaction information value is calculated from these four quantities. In one possible implementation, the user interaction information value is calculated according to the following formula:
Figure BDA0003209131490000051
Furthermore, business attribute importance information of the data table to be counted is collected. This information indicates how important the business corresponding to the data table is; for example, in the freight transportation field, vehicle data is of higher importance and financial data is of lower importance. The importance is expressed as a business attribute importance value.
In one embodiment, a service importance information table is set according to the importance of the service corresponding to the data, and the service importance information table stores different types of services and importance values corresponding to the different types of services. As shown in the following table:
Business      Business attribute importance value
Vehicle       100
User          80
Enterprise    60
Others        40
The business attribute category corresponding to the data table to be counted is obtained, and the business importance information table is queried by that category to obtain the corresponding business attribute importance value. Taking business attribute information into account yields high-heat data that better fits the application field and improves the applicability of the data.
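The lookup can be sketched as a simple mapping. The values are copied from the table above; the function name, the lowercase keys and the treatment of unknown categories as "Others" are assumptions made for illustration.

```python
# Importance values from the business importance information table.
IMPORTANCE_BY_CATEGORY = {"vehicle": 100, "user": 80, "enterprise": 60}
DEFAULT_IMPORTANCE = 40  # the "Others" row


def importance_value(category: str) -> int:
    """Return the business attribute importance value for a category.

    Assumption: any category not listed in the table falls under "Others".
    """
    return IMPORTANCE_BY_CATEGORY.get(category.strip().lower(), DEFAULT_IMPORTANCE)


vehicle_importance = importance_value("Vehicle")
finance_importance = importance_value("finance")
```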
Finally, the time information of the data is collected and stored, including the release time and the usage time of the data.
Through this step, data of multiple dimensions of the data table to be counted is collected and analyzed, which improves the accuracy and applicability of the heat calculation.
S102, calculating data dimension values of query information, reference information, user interaction information and service attribute importance degree information of the data table to be counted according to the pre-trained linear regression model.
In order to improve the calculation accuracy, the embodiment of the present disclosure calculates the weight of each dimension data by using a linear regression model, and obtains the numerical value of each data dimension.
Specifically, firstly, query information, reference information, user interaction information and service attribute importance information of a large number of data tables are obtained, and query times, reference times, user interaction information values and service attribute importance values of each data table are calculated according to the obtained data.
Then, professional business personnel evaluate the heat of each table, and the acquired data set is labeled manually to obtain labeled data. The labeled data is preprocessed, for example by filling null values with zero, deleting abnormal data and normalizing the dimension data. The preprocessed labeled data is divided into a training set and a test set, and the linear regression model is trained to obtain a trained linear regression model. The dimension weight parameters in this embodiment are obtained by training on a large amount of data, so they fit the application scenario better and address the inaccuracy of manual labeling and rule setting in the prior art.
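The three preprocessing operations can be sketched as follows. Two points are assumptions, because the text does not define them: "abnormal" is taken to mean a row containing a negative value, and normalization is taken to be min-max scaling per column. The sample rows are made up.

```python
from typing import Optional


def preprocess(rows: list[list[Optional[float]]]) -> list[list[float]]:
    """Fill nulls with zero, drop abnormal rows, then min-max normalize each column."""
    # 1. Replace missing values with zero.
    filled = [[0.0 if v is None else float(v) for v in row] for row in rows]
    # 2. Drop rows containing a negative value (assumed definition of "abnormal").
    kept = [row for row in filled if all(v >= 0 for v in row)]
    if not kept:
        return []
    # 3. Min-max normalize every column to [0, 1]; constant columns map to 0.
    columns = list(zip(*kept))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in kept
    ]


# Columns: query count, reference count, interaction value, importance value.
raw = [[10, 2, None, 100], [30, 4, 5.0, 60], [-1, 0, 1.0, 40], [20, 3, 2.5, 80]]
clean = preprocess(raw)
```

The third sample row is dropped because of its negative query count, and the remaining three rows are scaled column by column.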
In one embodiment, the formula for the linear regression model is as follows:
$S = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$
where $S$ represents the data dimension value, $x_1$ represents the number of times the data table is queried within a preset period, $x_2$ represents the number of times the data table is referenced within a preset period, $x_3$ represents the user interaction information value, $x_4$ represents the business attribute importance value, and $w_0, \ldots, w_4$ represent the weight parameters.
In an alternative embodiment, the data dimension may be extended according to the following formula:
$S = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + \cdots + w_n x_n$
where $x_n$ represents an extended data dimension, $S$ represents the data dimension value, $x_1$ represents the number of times the data table is queried within a preset period, $x_2$ represents the number of times the data table is referenced within a preset period, $x_3$ represents the user interaction information value, $x_4$ represents the business attribute importance value, and $w_0, \ldots, w_n$ represent the weight parameters. The linear regression model provided by the embodiment of the disclosure accommodates dimension expansion well, and a person skilled in the art can extend the data dimensions as the practical application requires.
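Evaluating the model for an arbitrary number of dimensions can be sketched as below. The weights and feature values are invented for illustration; in the method they would come from the trained model and from step S101.

```python
def dimension_value(weights: list[float], features: list[float]) -> float:
    """Evaluate S = w0 + w1*x1 + ... + wn*xn."""
    if len(weights) != len(features) + 1:
        raise ValueError("expected one bias weight plus one weight per feature")
    bias, *coefficients = weights
    return bias + sum(w * x for w, x in zip(coefficients, features))


weights = [1.0, 0.5, 0.25, 2.0, 0.1]  # w0..w4, made-up values
features = [10.0, 4.0, 1.5, 100.0]    # x1..x4: queries, references, interaction, importance
s = dimension_value(weights, features)
```

Adding a fifth dimension only requires appending one weight and one feature, which is the compatibility with dimension expansion that the text describes.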
In a possible implementation, when the linear regression model is trained, model parameters can be selected through 5-fold cross-validation and a grid search over hyperparameters. The best-performing configuration is selected by its RMSE value and tested on the test set, and the model parameters are adjusted until the performance on the test set is optimal and close to that on the training set, yielding the trained linear regression model.
The 5-fold cross-validation proceeds as follows. Step 1: divide the data set into 5 parts. Step 2: select one part as the test set and use the other four parts as the training set. Step 3: perform step 2 five times, selecting a different test set each time. Evaluating the model by cross-validation improves the accuracy of model training.
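The three cross-validation steps and the RMSE score can be sketched in plain Python as below. To keep the sketch short it fits a single-feature least-squares line rather than the four-feature model, omits the grid search, and uses noiseless synthetic data; all function names are hypothetical.

```python
import math
import random


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Return k (train_indices, test_indices) pairs; each sample is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # step 1: divide into k parts
    return [
        # steps 2 and 3: each part serves as the test set once, the rest as training set
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared error between predictions and labels."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = w0 + w1*x (single feature, for brevity)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - w1 * mean_x, w1


def cross_validated_rmse(xs: list[float], ys: list[float], k: int = 5) -> float:
    """Average the test-fold RMSE over k folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        w0, w1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([w0 + w1 * xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


xs = [float(i) for i in range(20)]
ys = [3.0 + 2.0 * x for x in xs]  # synthetic labels lying exactly on a line
score = cross_validated_rmse(xs, ys)
```

A grid search would call `cross_validated_rmse` once per candidate hyperparameter setting and keep the setting with the lowest score.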
After the trained linear regression model is obtained, the query times, the reference times, the user interaction information values and the business attribute importance degree values of the data table to be counted in the step S101 are input into the linear regression model to obtain data dimension values.
S103, calculating the heat of the data according to the data dimension value, the data release time information and the data use time information.
In one embodiment, calculating the heat of the data according to the data dimension value, the data publishing time information and the data using time information comprises calculating the heat of the data according to the following formula:
Figure BDA0003209131490000071
where $f$ represents the data heat, $s$ represents the data dimension value (the value $S$ produced by the linear regression model), $M_{\text{ageInhours}}$ represents the difference between the data release time and the current time, and $M_{\text{usedTimeInHour}}$ represents the most recent time the data was used.
In an optional embodiment, after the heat of the data is calculated, the heat value is stored in the metadata. The method further includes sorting all data in the data warehouse from high to low by heat value and pushing a preset number of top-ranked data items to the client for display. When a user searches or browses data assets, data with higher heat is displayed more prominently, so the user can quickly find assets of interest. Furthermore, after data is released, the value of an asset can be evaluated through its heat, and resources can be invested in the iterative development of higher-heat data assets to improve data quality and broaden the scenarios in which the data is applied.
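Selecting the preset number of hottest items can be sketched as follows. The heat values and table names are invented, and the function name is hypothetical; pushing the result to a client is outside the scope of the sketch.

```python
import heapq


def top_n_by_heat(heat_by_table: dict[str, float], n: int) -> list[tuple[str, float]]:
    """Return the n hottest tables, highest heat first."""
    return heapq.nlargest(n, heat_by_table.items(), key=lambda item: item[1])


# Hypothetical heat values already stored in the metadata.
heat = {"dw.vehicle": 91.5, "dw.user": 40.2, "dw.enterprise": 77.0, "dw.misc": 3.1}
top = top_n_by_heat(heat, 2)
```

`heapq.nlargest` avoids sorting the whole warehouse when only a small preset number of items is displayed; a full `sorted(..., reverse=True)` gives the same order.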
In an optional embodiment, after the heat of the data is calculated, the method further includes obtaining the type of each data table, classifying the data by type, sorting the data within each type from high to low by heat value, and adding a preset number of top-ranked data items to a heat information table. The data in the heat information table is stored by category, giving the higher-heat data of each data type. Data with higher heat can thus be displayed more prominently, so the user can quickly find assets of interest.
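The per-type ranking can be sketched as below, with the heat information table represented as a dictionary keyed by type. The record layout, names and values are assumptions for illustration.

```python
from collections import defaultdict


def top_n_per_type(
    records: list[tuple[str, str, float]], n: int
) -> dict[str, list[tuple[str, float]]]:
    """Group (table, type, heat) records by type and keep the n hottest per type."""
    grouped: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for table, table_type, heat in records:
        grouped[table_type].append((table, heat))
    return {
        table_type: sorted(items, key=lambda item: item[1], reverse=True)[:n]
        for table_type, items in grouped.items()
    }


# Hypothetical (table, type, heat) records.
records = [
    ("dw.vehicle_daily", "vehicle", 91.5),
    ("dw.vehicle_track", "vehicle", 55.0),
    ("dw.vehicle_tmp", "vehicle", 2.0),
    ("dw.user_profile", "user", 40.2),
]
heat_info_table = top_n_per_type(records, 2)
```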
In an optional embodiment: with the development of big data platforms, large data centers such as large-scale data warehouses and data lakes are increasingly common, and as they keep accumulating data they also come under storage and performance pressure. Therefore, after the heat of the data is calculated, the method further includes cleaning up data whose heat value is below a preset heat threshold within a preset period. For example, the storage time of the data is acquired, and the data is cleaned up automatically when its storage time exceeds a preset time threshold and its heat is below the preset heat threshold. Alternatively, the data meeting the cleanup condition can be sent to an administrator and cleaned up after a deletion instruction is received from the administrator.
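The cleanup condition can be sketched as a filter over table statistics. The `TableStats` fields, the use of days as the storage-time unit and the threshold values are assumptions; the sketch only selects candidates and leaves the deletion, or the administrator confirmation, to the caller.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    heat: float
    storage_days: int  # assumed unit for the storage time


def cleanup_candidates(
    tables: list[TableStats], heat_threshold: float, max_storage_days: int
) -> list[str]:
    """Return tables stored longer than the time threshold whose heat is below the heat threshold."""
    return [
        t.name
        for t in tables
        if t.storage_days > max_storage_days and t.heat < heat_threshold
    ]


# Hypothetical statistics and thresholds.
tables = [
    TableStats("dw.vehicle", heat=91.5, storage_days=400),
    TableStats("dw.tmp_export", heat=1.2, storage_days=400),
    TableStats("dw.new_report", heat=0.5, storage_days=10),
]
candidates = cleanup_candidates(tables, heat_threshold=5.0, max_storage_days=180)
```

Only `dw.tmp_export` satisfies both conditions: `dw.vehicle` is old but hot, and `dw.new_report` is cold but recently stored.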
The data heat analysis method provided by the embodiment of the disclosure considers not only the number of times data is queried and used, but also other dimensions such as the business attribute importance of the data and the user interaction information. It uses a linear regression algorithm as the model and calculates the weight of each data dimension through the model. The model can be trained and adjusted continuously during actual use to obtain weight parameters that better fit data in the freight transportation field, and hence a heat value with higher accuracy, which also facilitates later data application and expansion.
The embodiment of the present disclosure further provides an apparatus for analyzing data heat, which is configured to execute the method for analyzing data heat of the foregoing embodiment, as shown in fig. 3, the apparatus includes:
the data receiving module 301 is configured to receive query information, reference information, user interaction information, service attribute importance information, data publishing time information, and data using time information of a data table to be counted in a data warehouse;
the first calculation module 302 is configured to calculate data dimension values of query information, reference information, user interaction information, and service attribute importance information of a data table to be counted according to a pre-trained linear regression model;
the second calculating module 303 is configured to calculate the heat of the data according to the data dimension value, the data publishing time information, and the data using time information.
It should be noted that, when the data heat analysis apparatus provided in the foregoing embodiment executes the data heat analysis method, only the division of the functional modules is taken as an example, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules, so as to complete all or part of the functions described above. In addition, the data heat degree analysis device provided by the above embodiment and the data heat degree analysis method embodiment belong to the same concept, and the detailed implementation process is shown in the method embodiment, which is not described herein again.
The embodiment of the present disclosure further provides an electronic device corresponding to the method for analyzing data heat provided by the foregoing embodiment, so as to execute the method for analyzing data heat.
Referring to fig. 4, a schematic diagram of an electronic device provided in some embodiments of the present application is shown. As shown in fig. 4, the electronic apparatus includes: a processor 400, a memory 401, a bus 402 and a communication interface 403, wherein the processor 400, the communication interface 403 and the memory 401 are connected through the bus 402; the memory 401 stores a computer program that can be executed on the processor 400, and the processor 400 executes the computer program to perform the method for analyzing the heat of data provided by any of the foregoing embodiments of the present application.
The memory 401 may include a high-speed random access memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 403 (wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.
Bus 402 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 401 is used for storing a program, and the processor 400 executes the program after receiving an execution instruction, and the method for analyzing data heat disclosed in any of the foregoing embodiments of the present application may be applied to the processor 400, or implemented by the processor 400.
Processor 400 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 400. The processor 400 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory 401, and the processor 400 reads the information in the memory 401 and completes the steps of the method in combination with its hardware.
The electronic device provided by this embodiment of the application has the same beneficial effects as the method for analyzing data heat that it adopts, runs, or implements.
Referring to Fig. 5, the computer-readable storage medium is shown as an optical disc 500 on which a computer program (i.e., a program product) is stored. When executed by a processor, the computer program performs the method for analyzing data heat provided by any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above embodiments of the present application has the same beneficial effects as the method for analyzing data heat that the application program stored on it adopts, runs, or implements.
The above examples show only some embodiments of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. A person skilled in the art can make several variations and modifications without departing from the inventive concept, and all of these fall within the scope of protection of the present invention. The scope of protection of this patent is therefore defined by the appended claims.

Claims (10)

1. A method for analyzing data heat, characterized by comprising the following steps:
receiving query information, reference information, user interaction information, business attribute importance degree information, data release time information and data use time information of a data table to be counted in a data warehouse;
calculating a data dimension value from the query information, reference information, user interaction information and business attribute importance degree information of the data table to be counted according to a pre-trained linear regression model;
and calculating the heat of the data according to the data dimension value, the data release time information and the data use time information.
2. The method of claim 1, further comprising, after receiving the user interaction information of the data table to be counted in the data warehouse:
calculating the number of exposures, the number of views, the number of likes and the user scores of the data table to be counted;
and calculating a user interaction information value according to the number of exposures, the number of views, the number of likes and the user scores.
3. The method of claim 1, further comprising, after receiving the business attribute importance degree information of the data table to be counted in the data warehouse:
acquiring the business attribute category corresponding to the data table to be counted;
and looking up the corresponding business attribute importance degree value according to the business attribute category.
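The lookup described in claim 3 can be pictured as a table keyed by business attribute category. The sketch below is illustrative only: the category names, the importance values, the default value and the function name `importance_for` are hypothetical and do not come from the application.

```python
# Hypothetical mapping from business attribute category to importance value.
# The categories and numbers are placeholders, not values from the application.
IMPORTANCE_BY_CATEGORY = {
    "finance": 0.9,
    "operations": 0.6,
    "logging": 0.2,
}


def importance_for(category: str, default: float = 0.0) -> float:
    """Return the importance value for a category, or `default` if it is unknown."""
    return IMPORTANCE_BY_CATEGORY.get(category, default)
```

Returning a default for unknown categories is a design choice of this sketch; an implementation could equally reject tables whose category has no configured importance value.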
4. The method of claim 1, further comprising, before calculating the data dimension value according to the pre-trained linear regression model:
labeling the query information, reference information, user interaction information and business attribute importance degree information of a plurality of data tables with data dimension values;
dividing the labeled data into a training set and a test set;
and training the linear regression model according to the training set and the test set to obtain a trained linear regression model.
5. The method of any one of claims 1-4, wherein the linear regression model has the formula:
$S = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$
wherein $S$ represents the data dimension value, $x_1$ represents the number of times the data table is queried within a preset period, $x_2$ represents the number of times the data table is referenced within a preset period, $x_3$ represents the user interaction information value, $x_4$ represents the business attribute importance degree value, and $w_0 \dots w_4$ represent weight parameters.
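As an illustration of claims 4 and 5, the sketch below splits labeled rows into a training set and a test set, fits the five weights of the linear model, and reports the error on the test set. The claims do not say how the weights are fitted, so ordinary least squares via the normal equations is an assumption here, as are the function names, the 20% test fraction and the fixed shuffle seed.

```python
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_least_squares(rows, targets):
    """Fit [w0, w1, ..., wk] by solving the normal equations (A^T A) w = A^T y."""
    A = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept w0
    k = len(A[0])
    ata = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    aty = [sum(a[i] * t for a, t in zip(A, targets)) for i in range(k)]
    return solve_linear_system(ata, aty)


def predict(weights, row):
    """Evaluate S = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 for one data table."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def train_and_evaluate(rows, targets, test_fraction=0.2, seed=0):
    """Split the labeled data, fit on the training part, return (weights, test MSE)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    weights = fit_least_squares([rows[i] for i in train_idx],
                                [targets[i] for i in train_idx])
    mse = sum((predict(weights, rows[i]) - targets[i]) ** 2
              for i in test_idx) / n_test
    return weights, mse
```

Each row holds the four inputs in the order $x_1, x_2, x_3, x_4$, and each target is the labeled data dimension value for that table. The normal equations require the training rows to be linearly independent; with degenerate data the solve step would divide by zero.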
6. The method of claim 1, wherein calculating the heat of the data according to the data dimension value, the data publishing time information and the data using time information comprises calculating the heat of the data according to the following formula:
Figure FDA0003209131480000021
wherein f represents the data heat, s represents the data dimension value, MageInhours represents the difference between the data publishing time and the current time, and MusedTimeInHour represents the most recent time at which the data was used.
7. The method of claim 1, further comprising, after calculating the heat of the data:
sorting all data in the data warehouse by heat value from high to low, and pushing a preset number of the data items with the highest heat values to a client for display.
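The ranking step of claim 7 can be sketched as a top-N selection over already computed heat values. The function name `top_n_by_heat` and the dictionary layout (table name mapped to heat value) are assumptions of this sketch, and the push to the client is left out.

```python
import heapq


def top_n_by_heat(heat_by_table, n):
    """Return the n (table_name, heat) pairs with the highest heat, hottest first."""
    return heapq.nlargest(n, heat_by_table.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the whole warehouse when only a small preset number of tables is displayed; a full `sorted(..., reverse=True)` gives the same leading items when the complete ordering is also needed.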
8. An apparatus for analyzing data heat, comprising:
a data receiving module, used for receiving query information, reference information, user interaction information, business attribute importance degree information, data release time information and data use time information of a data table to be counted in a data warehouse;
a first calculation module, used for calculating a data dimension value from the query information, reference information, user interaction information and business attribute importance degree information of the data table to be counted according to a pre-trained linear regression model;
and a second calculation module, used for calculating the heat of the data according to the data dimension value, the data release time information and the data use time information.
9. Equipment for analyzing data heat, comprising a processor and a memory storing program instructions, the processor being configured, when executing the program instructions, to perform the method for analyzing data heat according to any one of claims 1 to 7.
10. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement a method of data heat analysis as claimed in any one of claims 1 to 7.
CN202110925776.4A 2021-08-12 2021-08-12 Data heat analysis method, device, equipment and storage medium Pending CN113792084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110925776.4A CN113792084A (en) 2021-08-12 2021-08-12 Data heat analysis method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110925776.4A CN113792084A (en) 2021-08-12 2021-08-12 Data heat analysis method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113792084A true CN113792084A (en) 2021-12-14

Family

ID=78875997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110925776.4A Pending CN113792084A (en) 2021-08-12 2021-08-12 Data heat analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113792084A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987372A (en) * 2021-12-27 2022-01-28 昆仑智汇数据科技(北京)有限公司 Hot data acquisition method, device and equipment of domain business object model
CN114722243A (en) * 2022-04-15 2022-07-08 北京科杰科技有限公司 Data table sorting method and device, electronic equipment and storage medium
CN114971289A (en) * 2022-05-26 2022-08-30 国网安徽省电力有限公司信息通信分公司 A data resource intelligent recommendation system based on heat analysis
CN115617923A (en) * 2022-09-09 2023-01-17 中国银行股份有限公司 A data value evaluation method, device and equipment
CN115757480A (en) * 2022-11-15 2023-03-07 平安付科技服务有限公司 Data preheating method and device, storage medium, computer equipment
CN116775808A (en) * 2023-06-26 2023-09-19 中国建设银行股份有限公司 Data processing method and device, equipment, medium and product thereof
CN118445330A (en) * 2024-04-29 2024-08-06 中电云计算技术有限公司 A table dimension statistical caliber calculation method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104657496A (en) * 2015-03-09 2015-05-27 杭州朗和科技有限公司 A method and device for calculating the heat value of information
CN109325781A (en) * 2018-09-04 2019-02-12 中国平安人寿保险股份有限公司 Client's Quality Analysis Methods, device, computer equipment and storage medium
CN111311321A (en) * 2020-02-14 2020-06-19 北京百度网讯科技有限公司 User consumption behavior prediction model training method, device, equipment and storage medium
CN111915156A (en) * 2020-07-14 2020-11-10 中国联合网络通信集团有限公司 User value-based service push method, electronic device and storage medium
CN112559504A (en) * 2020-12-09 2021-03-26 北京思特奇信息技术股份有限公司 Data cleaning method and device based on data heat and storage medium
CN112883267A (en) * 2021-02-22 2021-06-01 深圳市星网储区块链有限公司 Data heat degree statistical method and device based on deep learning

Similar Documents

Publication Publication Date Title
CN113792084A (en) Data heat analysis method, device, equipment and storage medium
US11238065B1 (en) Systems and methods for generating and implementing knowledge graphs for knowledge representation and analysis
US11734233B2 (en) Method for classifying an unmanaged dataset
CN111724238B (en) Method, device and equipment for evaluating product recommendation accuracy and storage medium
CN107122467B (en) Search engine retrieval result evaluation method and device and computer readable medium
WO2019214248A1 (en) Risk assessment method and apparatus, terminal device, and storage medium
CN107862022B (en) Cultural resource recommendation system
CN103970753B (en) The method for pushing and device of association knowledge
JP2017512344A (en) System and method for rapid data analysis
JP2015537296A (en) Data profiling using location information
CN111310052A (en) User portrait construction method, device and computer-readable storage medium
CN107273519A (en) Data analysis method, device, terminal and storage medium
CN113704236A (en) Government affair system data quality evaluation method, device, terminal and storage medium
CN111242318A (en) Business model training method and device based on heterogeneous feature library
CN108229999B (en) Method and device for evaluating competitive products
CN116362823A (en) Recommendation model training method, recommendation method and recommendation device for behavior sparse scene
CN116245580A (en) Data asset value acquisition method, device, equipment, medium and program product
Del Bianco et al. Model-based early and rapid estimation of COSMIC functional size–An experimental evaluation
CN111414410A (en) Data processing method, device, equipment and storage medium
CN112949963B (en) Method, device, storage medium and intelligent device for evaluating employee service quality
CN110232119B (en) Meta-analysis-based general intelligent measurement model construction method and system
US11366833B2 (en) Augmenting project data with searchable metadata for facilitating project queries
CN111782691B (en) Method, device, electronic device and storage medium for determining index correlation
CN114049016B (en) Index similarity judgment method, system, terminal device and computer storage medium
CN110033184A (en) A kind of operation flow recommended method and device based on metadata

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211214