CN101621529B - High-efficient and low-cost loading method for heterogeneous mass data - Google Patents
- Publication number
- CN101621529B
- Authority
- CN
- China
- Prior art keywords
- file
- data
- interface
- heterogeneous mass
- loading method
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a high-efficiency, low-cost loading method for heterogeneous mass data, comprising the following steps: heterogeneous data sources are converted into a general, unified flat-file interface; the user transmits the data to an FTP server; the target database creates an interface buffer table and partitions the interface table by date; a warehousing loader program receives a file and checks whether the verification file is consistent with the original data file; if so, the file is loaded, marked as loaded in the warehousing interface buffer table, and its file state is marked as normal; otherwise, the file is marked as abnormal and the cause of the abnormality is recorded; finally, the data in the buffer is cleaned and inserted into the formal warehousing interface table. Compared with the prior art, the method effectively solves the problem of loading heterogeneous mass data, avoids the high cost of purchasing third-party ETL software and employing specialist technical staff, and gives small and medium-sized companies a way to reduce development cost.
Description
Technical field
The present invention relates to the efficient, reliable, and stable loading of mass data, and in particular to a high-efficiency, low-cost loading method for heterogeneous mass data.
Background technology
The information systems on which BI operations rely form a complex collection of data made up of legacy systems, incompatible data sources, databases, and applications, and the individual parts cannot exchange data with one another. Seen from this angle, the application systems an enterprise currently runs, and especially the data in them, were built at great effort and expense and cannot be replaced. A newly built BI system is meant to support decision-making through data analysis, but the sources and formats of the data differ, which makes system implementation and data integration difficult. Enterprises therefore want a comprehensive solution that resolves data consistency and integration problems, so that data can be collected from all legacy environments and platforms and transformed efficiently by a single solution. That solution is ETL (extraction, transformation, and loading).
One way to implement ETL is to adopt third-party tools such as Data Integrator, DataStage, or Informatica. These tools are expensive to purchase, need dedicated server hardware and software, and also require professional developers and system maintenance staff, which most small and medium-sized enterprises cannot afford.
Summary of the invention
The object of the invention is to overcome the defects of the prior art described above and to provide an efficient, reliable, and stable loading method for heterogeneous mass data at low cost.
The object of the invention is achieved by the following technical scheme: a high-efficiency, low-cost loading method for heterogeneous mass data, characterized in that it comprises the following steps:
(1) heterogeneous data sources are converted into a general, unified flat-file interface;
(2) the user sends the data to an FTP server;
(3) the target database creates an interface buffer table and partitions the interface table by date;
(4) after the warehousing loader receives a file, it checks whether the verification file is consistent with the original data file;
(5) if the result of step (4) is yes, the file is loaded, marked as loaded in the warehousing interface buffer table, and its file state is marked as normal;
(6) if the result of step (4) is no, the file is marked as abnormal and the cause of the abnormality is recorded;
(7) the buffered data is cleaned and inserted into the formal warehousing interface table.
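The check in steps (4) to (6) can be sketched roughly as follows. The patent does not say what the verification file contains, so this sketch assumes (hypothetically) a single line holding the expected record count and an MD5 digest of the data file. The function name `check_interface_file` and the state strings are illustrative, not taken from the source.

```python
import hashlib
from pathlib import Path


def check_interface_file(data_path, check_path):
    """Compare a data file with its verification file.

    Assumed (hypothetical) verification format: "<record_count> <md5_hex>".
    Returns (state, reason), where state is "normal" or "abnormal".
    """
    data = Path(data_path).read_bytes()
    fields = Path(check_path).read_text().split()
    if len(fields) != 2 or not fields[0].isdigit():
        return "abnormal", "malformed verification file"

    expected_count, expected_md5 = int(fields[0]), fields[1].lower()
    actual_count = len(data.splitlines())
    if actual_count != expected_count:
        return "abnormal", f"record count {actual_count} != {expected_count}"
    if hashlib.md5(data).hexdigest() != expected_md5:
        return "abnormal", "checksum mismatch"
    return "normal", ""
```

The returned state and reason correspond to what steps (5) and (6) record for each file; how they are written to the buffer table is left out here.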
The method loads data using highly parallel direct-path loading.
The warehousing interface buffer table is parallel, has logging disabled, and has no indexes or constraints.
The flat file is a compressed file.
The warehousing interface table uses composite partitioning; it is likewise parallel, has logging disabled, and has no indexes or constraints.
The composite partitioning of the warehousing interface table first partitions the data by day and then applies a second-level list partition based on the end of the ID.
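To illustrate how a record maps onto the two partitioning levels described above, here is a minimal sketch. In practice the database performs this routing itself from the table definition; the naming scheme, the function `partition_for`, and the `tail_digits` parameter are all assumptions, since the source only states that the second level is a list partition on the end of the ID.

```python
from datetime import date


def partition_for(load_date, record_id, tail_digits=1):
    """Return (day_partition, list_subpartition) names for one record.

    tail_digits is an assumption; the source only says the second level
    is a list partition on the end of the ID.
    """
    day_partition = f"P{load_date:%Y%m%d}"
    tail = str(record_id)[-tail_digits:].zfill(tail_digits)
    return day_partition, f"{day_partition}_S{tail}"
```

For example, `partition_for(date(2008, 6, 30), 12345)` places the record in the day partition for 30 June 2008 and, within it, in the subpartition for IDs ending in 5.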
The method automatically generates an interface warehousing status monitoring report.
Compared with the prior art, the invention effectively solves the problem of loading heterogeneous mass data, avoids the high cost of purchasing third-party ETL software and employing specialist technical staff, and gives small and medium-sized businesses a way to reduce development cost.
Description of drawings
Fig. 1 is a flow chart of the high-efficiency, low-cost loading method for heterogeneous mass data of the invention;
Fig. 2 is a schematic diagram of the hardware structure used by the high-efficiency, low-cost loading method for heterogeneous mass data of the invention.
Embodiment
As shown in Fig. 1, the invention relates to a high-efficiency, low-cost loading method for heterogeneous mass data. In this method, other vendors send data through an FTP client to the FTP server, which is on the same local area network as the database server. After the warehousing loader receives a file, it checks the verification file against the original data file. If they are consistent, the file is loaded, marked as loaded in the database buffer table, and its file state is marked as normal; otherwise the file is marked as abnormal and the cause of the abnormality is recorded. In the final step, the buffered data is cleaned and inserted into the formal interface table.
A loading and processing method for heterogeneous mass data, in which heterogeneous data sources are integrated:
(1) highly parallel direct-path data loading is adopted; a warehousing buffer interface table is added, and the warehousing interface table is partitioned by day;
(2) there are several data sources, and data from multiple stores is merged;
(3) data from the different sources is dumped to flat CSV files, with fields separated by tabs;
(4) the generated flat CSV files are compressed to reduce network traffic and improve communication efficiency;
(5) the compressed files are transferred to the FTP server over the Internet or an intranet;
(6) an interface buffer table is added in front of the warehousing interface table; it is parallel, has logging disabled, and has no indexes or constraints;
(7) the warehousing interface table uses composite partitioning: the data is first partitioned by day, and on top of the daily partitions a second-level list partition based on the end of the ID is applied; the table is parallel, has logging disabled, and has no indexes or constraints;
(8) interface data is loaded into the interface buffer table in parallel by direct path using the SQLLDR command; the interface buffer table is then cleaned, and the data is inserted into the interface table with a parallel direct-path insert that uses minimal logging;
(9) an interface warehousing status monitoring report is generated.
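Steps (3) and (4) above, dumping to a tab-separated flat file and compressing it, might look like the following sketch. The source does not name a compression format, so gzip is an assumption here, and `dump_to_flat_file` is a hypothetical helper name.

```python
import csv
import gzip


def dump_to_flat_file(rows, out_path):
    """Write rows as a tab-separated flat file, gzip-compressed.

    gzip is an assumed format; the source only says the file is compressed.
    Returns the number of records written.
    """
    count = 0
    with gzip.open(out_path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

The returned record count is the kind of value a source system could place in the accompanying verification file so that the loader can check it on arrival.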
The method comprises converting heterogeneous data sources into a general, unified flat-file interface; compressing the interface files and transmitting them to the target FTP server; creating the interface buffer table in the target database and partitioning the interface table by date; and writing shell warehousing scripts that analyze the warehousing logs and verification files to generate the warehousing interface summary table.
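The summary table mentioned above could be produced along these lines. The source builds it with shell scripts from the warehousing logs and verification files; this Python sketch only illustrates the aggregation, and the record keys (`file`, `state`, `rows`, `reason`) are hypothetical.

```python
from collections import Counter


def summarize_loads(results):
    """Summarize per-file load results.

    results: iterable of dicts with hypothetical keys
    "file", "state" ("normal" or "abnormal"), "rows", and "reason".
    """
    results = list(results)
    states = Counter(r["state"] for r in results)
    return {
        "files": len(results),
        "normal": states.get("normal", 0),
        "abnormal": states.get("abnormal", 0),
        # Only rows from successfully loaded files count as loaded.
        "rows_loaded": sum(r["rows"] for r in results if r["state"] == "normal"),
        "failures": [(r["file"], r["reason"]) for r in results if r["state"] == "abnormal"],
    }
```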
Compared with existing ETL tools, the method borrows the ideas of third-party ETL tools and makes full use of the company's existing technical staff to build a loading method for heterogeneous mass data. On the source side, every heterogeneous data source dumps its output in a uniform flat-file structure, which removes the inconsistency in managing and maintaining interfaces across different sources. For file transfer, file compression greatly improves transmission efficiency. For interface warehousing, the added interface buffer table makes highly parallel direct-path loading convenient, and the interface table is partitioned by day, keeping only the detail of the most recent 5 days (adjustable as needed), which also leaves a buffer for re-sending or overwriting abnormal interface files that hold large volumes of data. For data loading, shell scripts are used, which keeps the technology simple.
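The retention rule described above (keep only the most recent 5 days of daily partitions) can be sketched as follows. The function name and the idea of passing in the list of existing partition dates are assumptions; the source does not describe how the script finds the partitions to delete.

```python
from datetime import date, timedelta


def partitions_to_drop(existing_days, today, keep_days=5):
    """Return the day partitions older than the retention window, oldest first.

    keep_days defaults to 5, the value given in the source as adjustable.
    """
    cutoff = today - timedelta(days=keep_days - 1)
    return sorted(d for d in existing_days if d < cutoff)
```

With `keep_days=5` and `today` set to 30 June, the partitions for 26 to 30 June are kept and anything earlier is returned for deletion.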
As shown in Fig. 2, the hardware of the invention is divided into three layers:
Layer 1: dump, compression, and transmission of heterogeneous data sources. Data from the different sources is dumped to a general, unified flat file, compressed, and transferred to the FTP server over the Internet or an intranet.
Layer 2: shell scripts control the warehousing flow and generate the warehousing interface summary table. Purpose-built shell scripts decompress the interface files and load them, using the parallel direct-path loading method, into the third layer described below; once all interfaces have been loaded, the scripts generate the consolidated interface-file warehousing status table.
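The second layer's decompress-then-load step might be driven as in the sketch below. The source uses shell scripts and the SQLLDR command; Python is used here only for illustration. The `userid`, `control`, `data`, `log`, `direct`, and `parallel` parameters are given as I understand the SQL*Loader command line and should be checked against the Oracle version in use. File names, the helper names, and gzip as the compression format are assumptions.

```python
import gzip
import shutil


def decompress(gz_path, out_path):
    """Decompress a gzip interface file (gzip is an assumed format)."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def build_sqlldr_command(userid, control_file, data_file, log_file):
    """Build an argument list for a direct-path, parallel SQL*Loader run.

    The list is only constructed here, not executed; a caller could pass
    it to subprocess.run on a host where sqlldr is installed.
    """
    return [
        "sqlldr",
        f"userid={userid}",
        f"control={control_file}",
        f"data={data_file}",
        f"log={log_file}",
        "direct=true",
        "parallel=true",
    ]
```

Keeping command construction separate from execution makes it easy to log the exact command for each interface file, which is useful when the load log is later analyzed for the summary table.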
Layer 3: physical design of the interface tables. The interface table is partitioned by day, and a layer of interface buffer table is added. Data is first loaded into the interface buffer table; after row and column cleaning it is inserted into the target date partition, and the earliest date partition and its data are deleted.
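The row and column cleaning applied to the buffer table is not specified in the source. As a purely illustrative sketch, the rules below (trim whitespace in every column, discard rows with the wrong column count or an empty first column) are assumptions, as is the function name `clean_rows`.

```python
def clean_rows(rows, expected_columns):
    """Yield cleaned rows; drop rows that fail basic checks.

    Rules here are illustrative only: trim whitespace in every column,
    and discard rows with the wrong column count or an empty first column.
    expected_columns is assumed to be at least 1.
    """
    for row in rows:
        if len(row) != expected_columns:
            continue
        cleaned = [value.strip() for value in row]
        if not cleaned[0]:
            continue
        yield cleaned
```

In the method itself this step runs inside the database, between the buffer table and the formal interface table, so a real implementation would more likely express such rules in SQL.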
Claims (7)
1. A loading method for heterogeneous mass data, characterized in that it comprises the following steps:
(1) heterogeneous data sources are converted into a general, unified flat-file interface;
(2) the user sends the data to an FTP server;
(3) the target database creates an interface buffer table and partitions the interface table by date;
(4) after the warehousing loader receives a file, it checks whether the verification file is consistent with the original data file;
(5) if the result of step (4) is yes, the file is loaded, marked as loaded in the warehousing interface buffer table, and its file state is marked as normal;
(6) if the result of step (4) is no, the file is marked as abnormal and the cause of the abnormality is recorded;
(7) the buffered data is cleaned and inserted into the formal warehousing interface table.
2. The loading method for heterogeneous mass data according to claim 1, characterized in that the method loads data using highly parallel direct-path loading.
3. The loading method for heterogeneous mass data according to claim 1, characterized in that the warehousing interface buffer table is parallel, has logging disabled, and has no indexes or constraints.
4. The loading method for heterogeneous mass data according to claim 1, characterized in that the flat file is a compressed file.
5. The loading method for heterogeneous mass data according to claim 1, characterized in that the warehousing interface table uses composite partitioning, and the warehousing interface table is parallel, has logging disabled, and has no indexes or constraints.
6. The loading method for heterogeneous mass data according to claim 5, characterized in that the composite partitioning of the warehousing interface table first partitions the data by day and then applies a second-level list partition based on the end of the ID.
7. The loading method for heterogeneous mass data according to claim 1, characterized in that the method automatically generates an interface warehousing status monitoring report.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810039896A CN101621529B (en) | 2008-06-30 | 2008-06-30 | High-efficient and low-cost loading method for heterogeneous mass data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101621529A CN101621529A (en) | 2010-01-06 |
CN101621529B (en) | 2012-10-10 |
Family
ID=41514570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200810039896A Active CN101621529B (en) | 2008-06-30 | 2008-06-30 | High-efficient and low-cost loading method for heterogeneous mass data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101621529B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101800663B (en) * | 2010-02-09 | 2011-12-21 | 中国电信股份有限公司 | interface buffering method and system |
CN102411569A (en) * | 2010-09-20 | 2012-04-11 | 上海众融信息技术有限公司 | Database conversion and cleaning information processing method |
CN102497353B (en) * | 2011-10-28 | 2015-08-26 | 深圳第七大道网络技术有限公司 | Multi-server distributed data processing method, server and system |
CN102591725A (en) * | 2011-12-20 | 2012-07-18 | 浙江鸿程计算机系统有限公司 | Method for multithreading data interchange among heterogeneous databases |
CN104462562B (en) * | 2014-12-29 | 2018-05-18 | 浪潮软件集团有限公司 | Data migration system and method based on data warehouse automation |
CN104750814B (en) * | 2015-03-30 | 2019-03-05 | 大连理工大学 | Automatic storage method of multi-heterogeneous data stream based on multi-sensor |
CN105068805B (en) * | 2015-08-07 | 2018-09-11 | 北京思特奇信息技术股份有限公司 | The method and system of data auditing during a kind of data migration |
CN108205732A (en) * | 2017-12-26 | 2018-06-26 | 云南电网有限责任公司 | A kind of method of calibration of the new energy prediction data access based on file |
CN108776710B (en) * | 2018-06-28 | 2020-06-30 | 农信银资金清算中心有限责任公司 | Concurrent loading method and device for database data |
CN109635023B (en) * | 2018-11-13 | 2021-01-15 | 广州欧赛斯信息科技有限公司 | Lightweight custom source data decomposition reading system and method based on ETL |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1897025A (en) * | 2006-04-27 | 2007-01-17 | 南京联创科技股份有限公司 | Parallel ETL technology of multi-thread working pack in mass data process |
WO2007085634A1 (en) * | 2006-01-26 | 2007-08-02 | International Business Machines Corporation | Autonomic recommendation and placement of materialized query tables for load distribution |
CN101086732A (en) * | 2006-06-11 | 2007-12-12 | 上海全成通信技术有限公司 | A high magnitude of data management method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |