CN104320460A - Big data processing method - Google Patents
- Publication number
- CN104320460A (application number CN201410577834.9A)
- Authority
- CN
- China
- Prior art keywords
- user
- processing method
- data
- data processing
- hadoop
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a big data processing method comprising the following steps: creating a Hadoop cluster on an OpenStack cloud platform to provide the basic environment for big data processing; importing data into HDFS and Swift to create data sources; and having a user process the data in the data sources created in step 2, with the processing result either displayed on a Web page or written to an output file under a specified path. The method, based on OpenStack and Hadoop, improves server resource utilization and lowers the barrier to entry for big data processing.
Description
Technical field
The invention belongs to the technical field of big data processing and relates to a big data processing method.
Background
With the spread of the network information age, the mobile Internet, social networks and e-commerce have greatly expanded the boundaries and applications of the Internet. We are in a "big data" era of explosive data growth. Big data has a far-reaching influence on the economy, politics, culture and daily life, and the big data era poses new opportunities and challenges for humanity's ability to manage data. Big data is characterized by massive volume, diversity, high velocity and variability; its data types are varied, its value density is relatively low, and its timeliness requirements are high, all of which exceed the processing capability of traditional database systems. Valuable patterns and information are hidden in the data, but mining them with traditional data processing takes a long time and a great cost, and some data cannot be processed at all. The wave of cloud computing and big data has driven the development of the data analysis industry: cloud computing provides the base platform and big data applications run on it, which is currently recognized as one of the most efficient ways to process big data. Using cloud computing for big data analysis is bound to become one of the future trends. Big data analysis typified by Hadoop applications is among the workloads best suited to running on a cloud platform.
OpenStack is an open-source cloud computing technology whose main goals are to simplify cloud deployment and to give clouds good scalability.
To process and analyze big data quickly and conveniently and to extract value from it, we propose a new processing method based on OpenStack Sahara, which makes it possible to mine the information in big data quickly and at low cost.
Summary of the invention
The object of the invention is to provide a big data processing method that improves server resource utilization and lowers the barrier to entry for big data processing.
The technical solution of the invention is a big data processing method implemented according to the following steps:
Step 1: create a Hadoop cluster on the OpenStack cloud platform to provide the basic environment for big data processing;
Step 2: create data sources by importing data into HDFS and Swift;
Step 3: the user processes the data in the data sources created in step 2, and the result is either displayed on a Web page or written to an output file under a specified path.
The invention is further characterized as follows.
Step 1 is implemented according to the following steps:
Step 1.1: the user applies for an OpenStack account and uses it to log in to the OpenStack cloud platform;
Step 1.2: the user uploads an image to the OpenStack cloud platform and registers it;
Step 1.3: the user creates a network and router, node group templates and a cluster template;
Step 1.4: the user creates the Hadoop cluster by selecting the plugin and Hadoop version, entering a cluster name, and selecting the cluster template, base image, key pair and network.
In step 2, the data sources comprise an HDFS data source and a Swift data source.
In step 3, the user processes the data by either a user interface method or a command line method.
The user interface method means interacting through the user interface: creating Job Binaries and a Job, executing the job, and viewing the execution result on a web page;
The command line method means that the user, at the command line interface, submits and executes the job with commands and views the result in the output file under the specified output path.
In step 3, the user processes the data with Hadoop's Map-Reduce framework.
The beneficial effects of the invention are as follows. Sahara allows a Hadoop cluster to be deployed rapidly in an OpenStack cloud environment. Acting as a bridge between cloud computing and big data, it promotes the integration of the OpenStack cloud platform with Hadoop, so that the information in big data can be mined quickly and at low cost. This improves server resource utilization and greatly lowers the barrier to entry for big data processing; running big data applications on a cloud platform is one of the most efficient ways to process big data.
Brief description of the drawings
Fig. 1 is a flow chart of the big data processing method of the invention;
Fig. 2 is a schematic diagram of the Hadoop cluster creation process in the method of the invention;
Fig. 3 is a flow chart of the Map-Reduce processing in the method of the invention.
Detailed description
The invention is described in detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the big data processing method of the invention comprises the following steps:
Step 1: create a Hadoop cluster on the OpenStack cloud platform to provide the basic environment for big data processing;
As shown in Fig. 2, step 1 is implemented according to the following steps:
Step 1.1: the user applies for an OpenStack account and uses it to log in to the OpenStack cloud platform;
Step 1.2: the user uploads an image to the OpenStack cloud platform and registers it;
Step 1.3: the user creates a network and router, node group templates and a cluster template;
Step 1.4: the user creates the Hadoop cluster by selecting the plugin and Hadoop version, entering a cluster name, and selecting the cluster template, base image, key pair and network;
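The selections made in step 1.4 can be pictured as the body of a single cluster-creation request. The following is a minimal sketch, not the method of the invention itself: the field names are assumed to mirror the cluster object of the OpenStack Sahara REST API, the helper `build_cluster_request` is hypothetical, and every identifier value is a placeholder.

```python
import json

# Fields the user supplies in step 1.4. The names are assumed to mirror the
# Sahara cluster object; all values used below are hypothetical placeholders.
REQUIRED_FIELDS = (
    "name",                        # cluster name entered by the user
    "plugin_name",                 # selected plugin
    "hadoop_version",              # selected Hadoop version
    "cluster_template_id",         # cluster template created in step 1.3
    "default_image_id",            # base image registered in step 1.2
    "user_keypair_id",             # key pair
    "neutron_management_network",  # network created in step 1.3
)


def build_cluster_request(**fields):
    """Return a JSON-serializable request body, rejecting missing or empty fields."""
    missing = [key for key in REQUIRED_FIELDS if not fields.get(key)]
    if missing:
        raise ValueError("missing cluster fields: " + ", ".join(missing))
    return {key: fields[key] for key in REQUIRED_FIELDS}


if __name__ == "__main__":
    request = build_cluster_request(
        name="demo-hadoop",
        plugin_name="vanilla",
        hadoop_version="2.4.1",
        cluster_template_id="TEMPLATE-ID-PLACEHOLDER",
        default_image_id="IMAGE-ID-PLACEHOLDER",
        user_keypair_id="demo-keypair",
        neutron_management_network="NETWORK-ID-PLACEHOLDER",
    )
    print(json.dumps(request, indent=2))
```

Validating the selections before anything is sent keeps a missing template or key pair from surfacing only after the platform has started provisioning.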
Step 2: create data sources by importing data into HDFS and Swift;
In step 2, the data sources comprise an HDFS data source and a Swift data source.
An HDFS data source is defined by an input/output data source name, the data source type HDFS, and an input/output URL path.
A Swift data source is defined by an input/output data source name, the data source type Swift, an input/output URL path, and a username and password.
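The two kinds of data source differ only in that a Swift source also carries credentials. A minimal sketch of that distinction follows; the `DataSource` class, its field names and the rule that the URL scheme must match the type are illustrative assumptions, and the URLs and credentials are placeholders.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class DataSource:
    """Hypothetical record for one input or output data source."""

    name: str
    source_type: str              # "hdfs" or "swift"
    url: str                      # input/output URL path
    username: Optional[str] = None
    password: Optional[str] = None

    def validate(self):
        """Check the fields each source type needs; return self when valid."""
        if self.source_type not in ("hdfs", "swift"):
            raise ValueError("unknown data source type: " + self.source_type)
        # Assumption: the URL scheme names the storage backend.
        if urlparse(self.url).scheme != self.source_type:
            raise ValueError("URL scheme does not match type " + self.source_type)
        # Only Swift sources need a username and password.
        if self.source_type == "swift" and not (self.username and self.password):
            raise ValueError("a Swift data source needs a username and password")
        return self


if __name__ == "__main__":
    hdfs_in = DataSource("demo-input", "hdfs", "hdfs://namenode:8020/user/demo/input")
    swift_out = DataSource(
        "demo-output", "swift", "swift://demo-container/output",
        username="demo-user", password="demo-password",
    )
    for source in (hdfs_in, swift_out):
        print(source.validate().name, "is valid")
```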
Step 3: the user can process the data in two ways. The first is through the user interface: the user creates Job Binaries, creates a job, executes the job, and views the execution result on a web page. The second is through the command line interface: the user submits and executes the job with commands and views the result in the output file under the specified output path. The actual data processing uses Hadoop's Map-Reduce framework. Map-Reduce amounts to decomposing a task and aggregating the results. The processing procedure is shown in Fig. 3:
Map stage: the Hadoop Map/Reduce framework spawns one map task for each InputSplit, and each InputSplit is generated by the job's InputFormat. The framework groups all intermediate values associated with a given key; the Mapper outputs are sorted and then partitioned among the Reducers.
Reduce stage: a Reducer has three main phases: shuffle, sort and reduce.
Shuffle: the input to the Reducer is the sorted output of the Mappers. In this phase the framework fetches, over HTTP, the relevant partition of every Mapper's output for each Reducer.
Sort: in this phase the framework groups the Reducer inputs by key, because different Mappers may have output the same key.
The shuffle and sort phases occur simultaneously: map outputs are merged while they are being fetched.
Reduce: in this phase the framework calls the reduce method once for each <key, (list of values)> pair in the grouped input. The output of the reduce task is typically written to the file system by calling OutputCollector.collect.
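The map, shuffle/sort and reduce phases can be illustrated with a small in-process word count. This is a teaching sketch in plain Python rather than a Hadoop job: the function names are hypothetical, there is a single reducer and no partitioning, and nothing is fetched over HTTP.

```python
from collections import defaultdict


def map_phase(split):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_and_sort(mapper_outputs):
    """Group values by key across all mapper outputs, then order by key."""
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reducer: called once per <key, (list of values)> pair."""
    return key, sum(values)


def run_job(splits):
    """Run one map task per split, then reduce the grouped output."""
    mapper_outputs = [map_phase(split) for split in splits]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapper_outputs)]


if __name__ == "__main__":
    for word, count in run_job(["big data on cloud", "cloud data"]):
        print(word, count)
```

Both splits emit the keys `cloud` and `data`, which is the case the sort phase exists for: values for the same key coming from different mappers end up in one list before the reducer is called.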
Claims (5)
1. A big data processing method, characterized in that it is implemented according to the following steps:
Step 1: create a Hadoop cluster on the OpenStack cloud platform to provide the basic environment for big data processing;
Step 2: create data sources by importing data into HDFS and Swift;
Step 3: the user processes the data in the data sources created in step 2, and the result is either displayed on a Web page or written to an output file under a specified path.
2. The big data processing method according to claim 1, characterized in that step 1 is implemented according to the following steps:
Step 1.1: the user applies for an OpenStack account and uses it to log in to the OpenStack cloud platform;
Step 1.2: the user uploads an image to the OpenStack cloud platform and registers it;
Step 1.3: the user creates a network and router, node group templates and a cluster template;
Step 1.4: the user creates the Hadoop cluster by selecting the plugin and Hadoop version, entering a cluster name, and selecting the cluster template, base image, key pair and network.
3. The big data processing method according to claim 1, characterized in that, in step 2, the data sources comprise an HDFS data source and a Swift data source.
4. The big data processing method according to claim 1, characterized in that, in step 3, the user processes the data by either a user interface method or a command line method,
the user interface method meaning interaction through the user interface: creating Job Binaries and a Job, executing the job, and viewing the execution result on a web page;
the command line method meaning that the user, at the command line interface, submits and executes the job with commands and views the result in the output file under the specified output path.
5. The big data processing method according to any one of claims 1 to 4, characterized in that, in step 3, the user processes the data with Hadoop's Map-Reduce framework.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410577834.9A CN104320460A (en) | 2014-10-24 | 2014-10-24 | Big data processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410577834.9A CN104320460A (en) | 2014-10-24 | 2014-10-24 | Big data processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104320460A true CN104320460A (en) | 2015-01-28 |
Family
ID=52375629
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410577834.9A Pending CN104320460A (en) | 2014-10-24 | 2014-10-24 | Big data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104320460A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104734892A (en) * | 2015-04-02 | 2015-06-24 | 江苏物联网研究发展中心 | Automatic deployment system for big data processing system Hadoop on cloud platform OpenStack |
CN104767813A (en) * | 2015-04-08 | 2015-07-08 | 江苏国盾科技实业有限责任公司 | Public bank big data service platform based on openstack |
CN106971011A (en) * | 2017-05-19 | 2017-07-21 | 肇庆市智高电机有限公司 | A kind of big data analysis method based on cloud platform |
CN108241722A (en) * | 2016-12-23 | 2018-07-03 | 北京金山云网络技术有限公司 | A data processing system, method and device |
CN110647379A (en) * | 2018-06-27 | 2020-01-03 | 复旦大学 | Hadoop cluster automatic scaling deployment and Plugin deployment method based on OpenStack cloud |
CN113341899A (en) * | 2015-10-09 | 2021-09-03 | 费希尔-罗斯蒙特系统公司 | Distributed industrial performance monitoring and analysis |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130332568A1 (en) * | 2012-06-11 | 2013-12-12 | France Telecom | Method of data processing by a navigation module |
CN103561061A (en) * | 2013-10-17 | 2014-02-05 | 南京邮电大学 | Flexible cloud data mining platform deploying method |
CN104065716A (en) * | 2014-06-18 | 2014-09-24 | 江苏物联网研究发展中心 | OpenStack based Hadoop service providing method |
2014-10-24: application CN201410577834.9A filed (published as CN104320460A), status Pending
Non-Patent Citations (3)
Title |
---|
ECCP研发团队博客: "ECCP云平台 Hadoop集群使用文档", 《HTTPS://WWW.TUICOOL.COM/ARTICLES/JRI3EYM》 * |
一棹凌烟: ""Sahara浅析"系列之三——Sahara使用方式", 《HTTPS://WWW.TUICOOL.COM/ARTICLES/7RBQNUN》 * |
杨赛: "OpenStack大数据项目Sahara概述", 《HTTP://WWW.INFOQ.COM/CN/NEWS/2014/04/OPENSTACK-SAHARA/》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104734892A (en) * | 2015-04-02 | 2015-06-24 | 江苏物联网研究发展中心 | Automatic deployment system for big data processing system Hadoop on cloud platform OpenStack |
CN104767813A (en) * | 2015-04-08 | 2015-07-08 | 江苏国盾科技实业有限责任公司 | Public bank big data service platform based on openstack |
CN104767813B (en) * | 2015-04-08 | 2018-06-08 | 江苏国盾科技实业有限责任公司 | Public's row big data service platform based on openstack |
CN113341899A (en) * | 2015-10-09 | 2021-09-03 | 费希尔-罗斯蒙特系统公司 | Distributed industrial performance monitoring and analysis |
CN108241722A (en) * | 2016-12-23 | 2018-07-03 | 北京金山云网络技术有限公司 | A data processing system, method and device |
CN106971011A (en) * | 2017-05-19 | 2017-07-21 | 肇庆市智高电机有限公司 | A kind of big data analysis method based on cloud platform |
CN110647379A (en) * | 2018-06-27 | 2020-01-03 | 复旦大学 | Hadoop cluster automatic scaling deployment and Plugin deployment method based on OpenStack cloud |
CN110647379B (en) * | 2018-06-27 | 2023-10-17 | 复旦大学 | Automatic scaling deployment and Plugin deployment method of Hadoop cluster based on OpenStack cloud |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104320460A (en) | Big data processing method | |
MX2020014293A (en) | Artificial intelligence-based generation of sequencing metadata. | |
KR20210036226A (en) | A distributed computing system including multiple edges and cloud, and method for providing model for using adaptive intelligence thereof | |
WO2020257812A3 (en) | Modeling dependencies with global self-attention neural networks | |
EP4312157A3 (en) | Progressive neural networks | |
CN108197486A (en) | Big data desensitization method, system, computer-readable medium and equipment | |
JP2019513246A (en) | Training method of random forest model, electronic device and storage medium | |
CN105159148B (en) | Robot instruction processing method and device | |
RU2013128608A (en) | METHODOLOGY FOR ELECTRONIC AGGREGATION OF INFORMATION | |
CN106649426A (en) | Data analysis method, data analysis platform and server | |
CN103927338A (en) | Log information storage processing method and log information storage processing device | |
CN104462037A (en) | Consistency maintaining method in document collaborative editing | |
CN103309893A (en) | Character string comparing method and device | |
CN106919697B (en) | Method for simultaneously importing data into multiple Hadoop assemblies | |
CN112825154A (en) | Method and device for optimizing online reasoning in deep learning and computer storage medium | |
US20200159594A1 (en) | Systems and methods for dynamic partitioning in distributed environments | |
CN102915344A (en) | SQL (structured query language) statement processing method and device | |
CN102402606B (en) | High-efficiency text data mining method | |
WO2005008414A3 (en) | Method and apparatus for parallel action processing | |
CN103853938A (en) | High-throughput sequencing data processing and analysis flow control method | |
CN113177088B (en) | Multi-scale simulation big data management system for material irradiation damage | |
CN103514769A (en) | Intelligent learning line design system and method | |
CN106502842A (en) | Data reconstruction method and system | |
CN106599244B (en) | General original log cleaning device and method | |
CN105808577A (en) | HBase database-based data batch loading method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150128 |