CN101621529B - High-efficient and low-cost loading method for heterogeneous mass data - Google Patents

Info

Publication number
CN101621529B
Authority
CN
China
Prior art keywords
file
data
interface
heterogeneous mass
loading method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN200810039896A
Other languages
Chinese (zh)
Other versions
CN101621529A (en
Inventor
冯谧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUCCESSFULL TELECOM TECHNOLOGY Co Ltd
Original Assignee
SUCCESSFULL TELECOM TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUCCESSFULL TELECOM TECHNOLOGY Co Ltd
Priority to CN200810039896A
Publication of CN101621529A
Application granted
Publication of CN101621529B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a high-efficiency, low-cost loading method for heterogeneous mass data, comprising the following steps: heterogeneous data sources are converted into a general, unified flat-file interface; the user transmits the data to an FTP server; an interface buffer table is created in the target database and the interface table is partitioned by date; after the loading program receives a file, it checks whether the verification file is consistent with the original data file; if so, the file is loaded, marked as loaded in the loading interface buffer table, and its file state is marked as normal; otherwise, the file is marked as abnormal and the cause of the abnormality is recorded; finally, the data in the buffer are cleansed and inserted into the formal loading interface table. Compared with the prior art, the method effectively solves the problem of loading heterogeneous mass data while avoiding the high cost of purchasing third-party ETL software and employing specialist technical personnel, and so offers small and medium-sized companies a way to reduce development cost.

Description

A high-efficiency, low-cost loading method for heterogeneous mass data
Technical field
The present invention relates to an efficient, stable and reliable way of loading mass data, and in particular to a high-efficiency, low-cost loading method for heterogeneous mass data.
Background art
The information systems on which BI (business intelligence) operations rely form a complex collection of data made up of legacy systems, incompatible data sources, databases and applications, whose parts cannot exchange data with one another. From this perspective, the application systems now in operation were built by the enterprise at great effort and expense and cannot be replaced, and the data they hold are even less replaceable. The purpose of a newly built BI system is precisely to support decision-making through data analysis, yet the sources and formats of these data differ, which makes system implementation and data integration difficult. An enterprise in this position wants a comprehensive solution that resolves its data consistency and integration problems, so that data can be collected from all of its traditional environments and platforms and transformed efficiently by a single solution. That solution is ETL (extraction, transformation and loading).
One way to implement ETL is to adopt third-party tools such as Data Integrator, DataStage or Informatica. Such tools are not only expensive to purchase; they also require dedicated server hardware and software configurations, as well as specialist developers and system maintenance staff, which most small and medium-sized enterprises cannot afford.
Summary of the invention
The object of the invention is to overcome the defects of the prior art described above by providing a loading method for heterogeneous mass data that is efficient, stable, reliable and low in cost.
This object can be achieved by the following technical solution: a high-efficiency, low-cost loading method for heterogeneous mass data, characterized in that it comprises the following steps:
(1) heterogeneous data sources are converted into a general, unified flat-file interface;
(2) the user sends the data to an FTP server;
(3) an interface buffer table is created in the target database and the interface table is partitioned by date;
(4) after the loading program receives a file, it checks whether the verification file is consistent with the original data file;
(5) if the result of the check in step (4) is yes, the file is loaded, marked as loaded in the loading interface buffer table, and its file state is marked as normal;
(6) if the result of the check in step (4) is no, the file is marked as abnormal and the cause of the abnormality is recorded;
(7) the buffered data are cleansed and inserted into the formal loading interface table.
The method loads data using highly parallel direct-path loading.
The loading interface buffer table has the following attributes: parallel, logging disabled, no indexes and no constraints.
The flat file is a compressed file.
The loading interface table uses composite partitioning and has the following attributes: parallel, logging disabled, no indexes and no constraints.
The composite partitioning of the loading interface table first partitions the data by day and then list-partitions each day by the last two digits of the ID.
The method automatically generates an interface loading status monitoring report.
Compared with the prior art, the invention effectively solves the problem of loading heterogeneous mass data while avoiding the high cost of purchasing third-party ETL software and employing specialist technical personnel, and so offers small and medium-sized enterprises a way to reduce development cost.
Description of drawings
Fig. 1 is a flow chart of the high-efficiency, low-cost loading method for heterogeneous mass data of the present invention;
Fig. 2 is a schematic diagram of the hardware structure used by the high-efficiency, low-cost loading method for heterogeneous mass data of the present invention.
Embodiment
As shown in Fig. 1, the present invention relates to a high-efficiency, low-cost loading method for heterogeneous mass data. In this method, other vendors send data through an FTP client to an FTP server (located on the same local area network as the database server); after the loading program receives a file, it checks the verification file against the original data file; if they are consistent, the file is loaded, marked as loaded in the database buffer table, and its file state is marked as normal; otherwise the file is marked as abnormal and the cause of the abnormality is recorded; in the final step, the buffered data are cleansed and inserted into the formal interface table.
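The consistency check described above can be sketched as follows. The patent does not say what the verification file contains, so this minimal Python sketch assumes, purely for illustration, that it holds the expected record count and an MD5 digest of the data file on a single tab-separated line. The names `FileStatus` and `check_file` and the verification-file layout are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileStatus:
    name: str
    state: str          # "normal" or "abnormal"
    reason: str = ""    # cause of the abnormality, empty when normal


def check_file(data_path: Path, check_path: Path) -> FileStatus:
    """Compare a data file with its verification file.

    Assumed (hypothetical) verification-file layout:
    '<record_count>\t<md5_hex>' on a single line.
    """
    expected_count, expected_md5 = check_path.read_text().strip().split("\t")
    payload = data_path.read_bytes()
    actual_count = len(payload.splitlines())
    actual_md5 = hashlib.md5(payload).hexdigest()
    if actual_count != int(expected_count):
        return FileStatus(data_path.name, "abnormal",
                          f"record count {actual_count} != {expected_count}")
    if actual_md5 != expected_md5:
        return FileStatus(data_path.name, "abnormal", "checksum mismatch")
    return FileStatus(data_path.name, "normal")
```

A loader built along these lines would load only files whose status is "normal" and would write the state and reason of every file to the status record, which is what later feeds the monitoring report.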
A processing method for loading heterogeneous mass data, in which heterogeneous data sources are integrated:
(1) highly parallel direct-path data loading is used; a loading buffer interface table is added, and the loading interface table is partitioned by day;
(2) there are multiple data sources, and data from these multiple stores are merged;
(3) data from the different sources are dumped to flat CSV files, with fields separated by tabs;
(4) the generated flat CSV files are compressed, which reduces network traffic and improves communication efficiency;
(5) the generated compressed files are transferred to the FTP server over the Internet or an intranet;
(6) an interface buffer table layer is added in front of the loading interface table, with the attributes: parallel, logging disabled, no indexes and no constraints;
(7) the loading interface table uses composite partitioning: the data are first partitioned by day, and each daily partition is then list-partitioned by the last two digits of the ID. The table has the attributes: parallel, logging disabled, no indexes and no constraints;
(8) interface data are loaded into the interface buffer table by parallel direct-path loading using the SQLLDR command; the interface buffer table is then cleansed, and the data are inserted into the interface table by parallel direct-path insertion with minimal logging;
(9) an interface loading status monitoring report is generated.
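Steps (3) and (4), dumping rows to a tab-separated flat file and compressing it, can be illustrated with a short Python sketch. The patent does not name a compression format, so gzip is an assumption made here, and the function name `dump_compressed` is hypothetical.

```python
import csv
import gzip
from pathlib import Path
from typing import Iterable, Sequence


def dump_compressed(rows: Iterable[Sequence[object]], out_path: Path) -> int:
    """Write rows as tab-separated text into a gzip file; return the row count."""
    count = 0
    # Text mode ("wt") lets csv.writer write straight into the compressed stream.
    with gzip.open(out_path, "wt", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

The returned row count is the kind of figure a sender could place in the accompanying verification file so that the receiving side can confirm nothing was lost in transit.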
The method comprises: converting heterogeneous data sources into a general, unified flat-file interface; compressing the interface files and transmitting them to the target FTP server; creating an interface buffer table in the target database and partitioning the interface table by date; and writing a shell loading script that analyzes the loading logs and verification files and generates a loading interface summary table.
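The parallel direct-path load with SQLLDR named in step (8) might be driven by something like the sketch below, which only builds the control-file text and the command line and does not run anything. The patent implements this in shell; Python is used here for illustration. The table name, column names, file names and credentials are hypothetical, while `direct=true`, `parallel=true`, `APPEND` and `FIELDS TERMINATED BY X'09'` are standard SQL*Loader options.

```python
from typing import List, Sequence


def build_control_file(data_file: str, table: str, columns: Sequence[str]) -> str:
    """Return SQL*Loader control-file text for a tab-delimited append load."""
    column_list = ", ".join(columns)
    return (
        "LOAD DATA\n"
        f"INFILE '{data_file}'\n"
        "APPEND\n"                        # parallel direct-path loads must append
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY X'09'\n"    # X'09' is the tab character
        "TRAILING NULLCOLS\n"
        f"({column_list})\n"
    )


def build_sqlldr_command(userid: str, control_file: str, log_file: str) -> List[str]:
    """Return the argument list for a parallel direct-path sqlldr run."""
    return [
        "sqlldr",
        f"userid={userid}",
        f"control={control_file}",
        f"log={log_file}",
        "direct=true",
        "parallel=true",
    ]
```

Because the buffer table has no indexes or constraints, several such loads can run side by side against it, and the log file each run writes is one possible input for the loading summary table.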
Compared with existing ETL tools, the invention borrows the ideas of third-party ETL tools and makes full use of the company's existing technical staff to build a loading method for heterogeneous mass data. Heterogeneous data sources are dumped to a uniform flat-file structure, which resolves the inconsistencies in managing and maintaining interfaces to different data sources. For file transfer, file compression is used, which greatly improves transmission efficiency. For interface loading, an interface buffer table is added, which makes highly parallel direct-path loading convenient; at the same time the interface table is partitioned by day and only the detail for the most recent 5 days (adjustable as needed) is retained, which leaves a buffer for abnormal interface files containing large volumes of data to be re-sent or overwritten. Finally, data loading is done with shell scripts, a simple and accessible technology.
As shown in Fig. 2, the hardware of the invention is divided into three main layers:
First layer: dump, compression and transmission of heterogeneous data sources. Data from the different sources are dumped to a general, unified flat file and, after compression, transferred to the FTP server over the Internet or an intranet.
Second layer: shell-script control of the loading flow and generation of the loading interface summary table. A purpose-written shell script decompresses the interface files and loads them, by parallel direct-path loading, into the third layer described below; once all interfaces have been loaded, it generates a consolidated table of interface file loading states.
Third layer: physical design of the interface table. The interface table is partitioned by day, and an interface buffer table layer is added. Data are first loaded into the interface buffer table; after row and column cleansing they are inserted into the target date partition, and the earliest date partition and its data are deleted.
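The retention rule for daily partitions (keep the most recent 5 days, delete the earliest) can be sketched as below. The partition naming scheme `P_YYYYMMDD` and the function names are hypothetical; the patent states only that partitions are per day and that the retention window is adjustable.

```python
from datetime import date, timedelta
from typing import Iterable, List


def partition_name(day: date) -> str:
    """Hypothetical naming scheme for a daily partition: P_YYYYMMDD."""
    return f"P_{day:%Y%m%d}"


def partitions_to_drop(existing: Iterable[date], today: date,
                       keep_days: int = 5) -> List[str]:
    """Return names of daily partitions that fall outside the retention window."""
    oldest_kept = today - timedelta(days=keep_days - 1)
    return [partition_name(d) for d in sorted(existing) if d < oldest_kept]
```

The names returned could then be substituted into `ALTER TABLE ... DROP PARTITION ...` statements by the loading script, which removes a whole day of data without a row-by-row delete.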

Claims (7)

1. A loading method for heterogeneous mass data, characterized in that it comprises the following steps:
(1) heterogeneous data sources are converted into a general, unified flat-file interface;
(2) the user sends the data to an FTP server;
(3) an interface buffer table is created in the target database and the interface table is partitioned by date;
(4) after the loading program receives a file, it checks whether the verification file is consistent with the original data file;
(5) if the result of the check in step (4) is yes, the file is loaded, marked as loaded in the loading interface buffer table, and its file state is marked as normal;
(6) if the result of the check in step (4) is no, the file is marked as abnormal and the cause of the abnormality is recorded;
(7) the buffered data are cleansed and inserted into the formal loading interface table.
2. The loading method for heterogeneous mass data according to claim 1, characterized in that the method loads data using highly parallel direct-path loading.
3. The loading method for heterogeneous mass data according to claim 1, characterized in that the loading interface buffer table has the attributes: parallel, logging disabled, no indexes and no constraints.
4. The loading method for heterogeneous mass data according to claim 1, characterized in that the flat file is a compressed file.
5. The loading method for heterogeneous mass data according to claim 1, characterized in that the loading interface table uses composite partitioning and has the attributes: parallel, logging disabled, no indexes and no constraints.
6. The loading method for heterogeneous mass data according to claim 5, characterized in that the composite partitioning of the loading interface table first partitions the data by day and then list-partitions each day by the last two digits of the ID.
7. The loading method for heterogeneous mass data according to claim 1, characterized in that the method automatically generates an interface loading status monitoring report.
CN200810039896A 2008-06-30 2008-06-30 High-efficient and low-cost loading method for heterogeneous mass data Active CN101621529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810039896A CN101621529B (en) 2008-06-30 2008-06-30 High-efficient and low-cost loading method for heterogeneous mass data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810039896A CN101621529B (en) 2008-06-30 2008-06-30 High-efficient and low-cost loading method for heterogeneous mass data

Publications (2)

Publication Number Publication Date
CN101621529A CN101621529A (en) 2010-01-06
CN101621529B true CN101621529B (en) 2012-10-10

Family

ID=41514570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810039896A Active CN101621529B (en) 2008-06-30 2008-06-30 High-efficient and low-cost loading method for heterogeneous mass data

Country Status (1)

Country Link
CN (1) CN101621529B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101800663B (en) * 2010-02-09 2011-12-21 中国电信股份有限公司 interface buffering method and system
CN102411569A (en) * 2010-09-20 2012-04-11 上海众融信息技术有限公司 Database conversion and cleaning information processing method
CN102497353B (en) * 2011-10-28 2015-08-26 深圳第七大道网络技术有限公司 Multi-server distributed data processing method, server and system
CN102591725A (en) * 2011-12-20 2012-07-18 浙江鸿程计算机系统有限公司 Method for multithreading data interchange among heterogeneous databases
CN104462562B (en) * 2014-12-29 2018-05-18 浪潮软件集团有限公司 Data migration system and method based on data warehouse automation
CN104750814B (en) * 2015-03-30 2019-03-05 大连理工大学 Automatic storage method of multi-heterogeneous data stream based on multi-sensor
CN105068805B (en) * 2015-08-07 2018-09-11 北京思特奇信息技术股份有限公司 The method and system of data auditing during a kind of data migration
CN108205732A (en) * 2017-12-26 2018-06-26 云南电网有限责任公司 A kind of method of calibration of the new energy prediction data access based on file
CN108776710B (en) * 2018-06-28 2020-06-30 农信银资金清算中心有限责任公司 Concurrent loading method and device for database data
CN109635023B (en) * 2018-11-13 2021-01-15 广州欧赛斯信息科技有限公司 Lightweight custom source data decomposition reading system and method based on ETL

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1897025A (en) * 2006-04-27 2007-01-17 南京联创科技股份有限公司 Parallel ETL technology of multi-thread working pack in mass data process
WO2007085634A1 (en) * 2006-01-26 2007-08-02 International Business Machines Corporation Autonomic recommendation and placement of materialized query tables for load distribution
CN101086732A (en) * 2006-06-11 2007-12-12 上海全成通信技术有限公司 A high magnitude of data management method

Also Published As

Publication number Publication date
CN101621529A (en) 2010-01-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant