CN104391748A - Mapreduce calculation process optimization method - Google Patents
- Publication number: CN104391748A
- Application number: CN201410673548.2A
- Authority: CN (China)
- Prior art keywords: map, data, subjob, file, mapreduce
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for optimizing the MapReduce computation process. The original data file is first divided into several files. One file is selected from the set of unprocessed files as the input of a sub-job, and the method determines whether there is a file that needs to be merged. If there is none, the job is submitted, Map tasks that all share the same processing logic are started, the Map operation is executed, the Map output is sorted, merged, and partitioned, the Map output is then received, the Reduce operation is executed, and the output is saved. If there is a file that needs to be merged, the job is submitted, Map tasks with several kinds of processing logic are started, each kind of input data is sent to its corresponding Map, the Map operation is executed, and the multiple outputs are sorted, merged, and partitioned. Finally, the method checks whether any data file in the original data file set is still unprocessed. If none remains, the program ends; otherwise the process is repeated for the remaining data files. The invention spreads the Map output over time, reduces instantaneous network traffic, lowers local disk occupancy, and thereby improves the MapReduce computation process.
Description
Technical field
The invention relates to computer software and parallel computing, and specifically to a method for optimizing the MapReduce computation process by reducing the amount of intermediate data kept on local disk while a program runs, thereby lowering disk load.
Background
With the rapid development of computer and Internet technology, Internet penetration and the number of Internet users are rising year by year. The combination of a growing user base and rapidly growing data volumes poses new challenges for Internet applications. Massive data requires storage resources on a very large scale, and as network applications depend more heavily on data, demand for the ability to compute over and process massive data keeps growing, while the cost of storing and processing the data behind these applications keeps rising. Hadoop developed vigorously within only a few years of its appearance, which demonstrates its technical capability and practical value. Hadoop is nevertheless still young and incomplete in many respects, and many companies are researching ways to improve and optimize it. Further improving MapReduce performance is therefore both necessary and worthwhile.
MapReduce is a software framework proposed by Google for parallel computation over large data sets (larger than 1 TB). The concepts "Map" and "Reduce", and the main ideas behind them, are borrowed from functional programming languages, with some features borrowed from vector programming languages. In current implementations, the user specifies a Map function, which maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function, which processes together all mapped key-value pairs that share the same key.
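The Map and Reduce roles described above can be illustrated with a small in-memory sketch. This is not Hadoop code and is not taken from the patent; the word-count task and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are assumptions chosen only to show how key-value pairs flow from Map, through grouping by key, into Reduce.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: turn one input pair (offset, line) into a list of (word, 1) pairs."""
    return [(word, 1) for word in line.split()]


def reduce_fn(key, values):
    """Reduce: combine all values that share the same key."""
    return key, sum(values)


def run_mapreduce(records):
    """Run Map on every record, group the output by key, then run Reduce per key."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # grouping step (the "shuffle")
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


counts = run_mapreduce([(0, "a b a"), (1, "b c")])
```

Here `counts` holds one total per distinct word. In a real cluster the grouping step moves data between machines, which is where the disk and network load discussed in this document arises.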
Summary of the invention
The technical problem to be solved by the invention is to improve MapReduce task-processing capacity. MapReduce task processing currently suffers from high memory usage, concentrated consumption of network resources with resulting network congestion, and high disk load, all of which strain resources. The invention provides a MapReduce computation process optimization method to address this situation.
The technical solution adopted by the invention is as follows:
A method for optimizing the MapReduce computation process. The original data file is first divided into several files. One file is selected from the set of unprocessed files as the input of a sub-job, and the method determines whether there is a file that needs to be merged. If there is none, the job is submitted, Map tasks that all share the same processing logic are started, the Map operation is executed, the Map output is sorted, merged, and partitioned, the Map output is then received, the Reduce operation is executed, and the output is saved. If there is a file that needs to be merged, the job is submitted, Map tasks with several kinds of processing logic are started, each kind of input data is sent to its corresponding Map, the Map operation is executed, and the multiple outputs are sorted, merged, and partitioned. Finally, the method checks whether any data file in the original data file set is still unprocessed. If none remains, the program ends; otherwise the process is repeated for the remaining data files.
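The control flow of the method can be sketched as a simple driver loop. This is an illustration under stated assumptions, not the patent's implementation: `optimized_job`, `run_plain`, and `run_with_merge` are hypothetical names, and the two callables stand in for submitting a real MapReduce job without and with a file to merge.

```python
def optimized_job(chunks, run_plain, run_with_merge):
    """Process the chunks one sub-job at a time, carrying the result forward.

    run_plain(chunk)              -> result   (no file to merge yet)
    run_with_merge(chunk, result) -> result   (merge the previous result in)
    """
    pending = list(chunks)   # the set of unprocessed files
    previous = None          # nothing to merge before the first sub-job
    while pending:           # "is any data file still unprocessed?"
        chunk = pending.pop(0)
        if previous is None:
            previous = run_plain(chunk)
        else:
            previous = run_with_merge(chunk, previous)
    return previous


# Toy usage: each "sub-job" just sums numbers, and merging adds the earlier total.
total = optimized_job([[1, 2], [3], [4, 5]], sum, lambda chunk, prev: sum(chunk) + prev)
```

Only one sub-job is in flight at any time, so the intermediate data that exists at a given moment belongs to a single chunk plus the carried-forward result.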
The architecture involved in the method comprises an input data processing module, a data result merging module, and a module that combines the computation and merging processes, where:
The input data processing module splits the original data into several parts. This split is not the system's own partitioning of input data into input splits; it is done manually, and each part is far larger than an input split (data block). One job is started for each part of the data, and the sub-jobs are executed one after another in order. At any moment during the run only one sub-job is running on the system and only part of the original data is being processed. As a result, each sub-job produces relatively little intermediate data, the running time of a single job is shortened, and intermediate data is deleted promptly;
The output of each sub-job is computed from only part of the data and is not the result for the whole original data set, so the partial results overlap heavily. The data result merging module merges the duplicated results to reduce redundant intermediate data;
The module that combines the computation and merging processes handles sub-job computation and merging together, merging while computing. Every sub-job except the first receives both its raw input data and the result of the previous sub-job. Depending on the application type, the map function applied to the data being merged is adjusted accordingly, so that when the Map phase ends all intermediate data is a set of key-value pairs in the same format and the subsequent Reduce tasks are unaffected.
The method is implemented as follows:
1) Set up a Hadoop cluster, and allocate 2 GB and 20 GB input data files to the unoptimized single-job MapReduce program and to the optimized multi-job MapReduce program;
2) Split the 2 GB and 20 GB input data into several data files and run the sub-jobs iteratively. One job is started for each data file, and the sub-jobs are executed in order. A multiple-input operation is used to handle sub-job computation and merging together, merging while computing: every sub-job except the first receives both its raw input data and the result of the previous sub-job, and the map function applied to the merged data is adjusted according to the application type.
Beneficial effects of the invention: because the input data is spread across multiple jobs, each job's Map phase outputs relatively little intermediate data and each job runs for a short time, so the Map-phase output of a job can be deleted from local disk promptly instead of occupying it for a long time. The optimized MapReduce program does not actually produce less intermediate data than the unoptimized program; it spreads the output over time. This spreads out disk I/O, reduces instantaneous network traffic, lets the Reduce stage consume intermediate data promptly, and lowers local disk occupancy, thereby improving the MapReduce computation process.
Description of drawings
Fig. 1 is a flow chart of the optimized MapReduce computation process of the invention.
Detailed description
The invention is further described below with reference to the accompanying drawing and a specific embodiment:
A method for optimizing the MapReduce computation process. The original data file first needs to be divided into several files. One file is selected from the set of unprocessed files as the input of a sub-job, and the method determines whether there is a file that needs to be merged. If there is none, the job is submitted, Map tasks that all share the same processing logic are started, the Map operation is executed, the Map output is sorted, merged, and partitioned, the Map output is then received, the Reduce operation is executed, and the output is saved. If there is a file that needs to be merged, the job is submitted, Map tasks with several kinds of processing logic are started, each kind of input data is sent to its corresponding Map, the Map operation is executed, and the multiple outputs are sorted, merged, and partitioned. Finally, the method checks whether any data file in the original data file set is still unprocessed. If none remains, the program ends; otherwise the process is repeated for the remaining data files.
The architecture involved in the method comprises an input data processing module, a data result merging module, and a module that combines the computation and merging processes, where:
The input data processing module splits the original data into several parts. This split is not the system's own partitioning of input data into input splits; it is done manually, and each part is far larger than an input split (data block). One job is started for each part of the data, and the sub-jobs are executed one after another in order. At any moment during the run only one sub-job is running on the system and only part of the original data is being processed. As a result, each sub-job produces relatively little intermediate data, the running time of a single job is shortened, and intermediate data is deleted promptly;
The output of each sub-job is computed from only part of the data and is not the result for the whole original data set, so the partial results overlap heavily. The data result merging module merges the duplicated results to reduce redundant intermediate data;
The module that combines the computation and merging processes handles sub-job computation and merging together, merging while computing. Every sub-job except the first receives both its raw input data and the result of the previous sub-job. Depending on the application type, the map function applied to the data being merged is adjusted accordingly, so that when the Map phase ends all intermediate data is a set of key-value pairs in the same format and the subsequent Reduce tasks are unaffected.
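The idea that raw input and a previous sub-job's result go through different map functions yet produce key-value pairs of one format can be sketched as follows. The word-count task, the tab-separated result format, and the names `map_raw`, `map_previous`, and `run_map_phase` are assumptions for illustration. On Hadoop, routing different input paths to different mapper classes is commonly done with `MultipleInputs`; the text does not say whether the patented implementation uses it.

```python
def map_raw(line):
    """Mapper for raw text: emit (word, 1) for every word."""
    return [(word, 1) for word in line.split()]


def map_previous(line):
    """Mapper for a previous result line 'word<TAB>count': re-emit it as (word, count)."""
    word, count = line.split("\t")
    return [(word, int(count))]


def run_map_phase(inputs):
    """inputs is a list of (mapper, lines) pairs; each source goes to its own mapper."""
    intermediate = []
    for mapper, lines in inputs:
        for line in lines:
            intermediate.extend(mapper(line))
    return sorted(intermediate)  # one uniform list of (word, count) pairs


pairs = run_map_phase([
    (map_raw, ["b c b"]),               # raw input of the current sub-job
    (map_previous, ["a\t2", "b\t1"]),   # output of the previous sub-job
])
```

Because both mappers emit `(word, count)` pairs, a single Reduce function can sum them without knowing which source a pair came from.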
The method is implemented as follows:
1) Set up a Hadoop cluster, and allocate 2 GB and 20 GB input data files to the unoptimized single-job MapReduce program and to the optimized multi-job MapReduce program;
2) Split the 2 GB and 20 GB input data into several data files and run the sub-jobs iteratively. One job is started for each data file, and the sub-jobs are executed in order. A multiple-input operation is used to handle sub-job computation and merging together, merging while computing: every sub-job except the first receives both its raw input data and the result of the previous sub-job, and the map function applied to the merged data is adjusted according to the application type.
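The manual split in step 2 can be sketched as cutting the input into a fixed number of contiguous pieces on record boundaries. The text does not state how many pieces are used or how large each one is, so the function `split_lines` and the piece count in the usage line are hypothetical.

```python
def split_lines(lines, parts):
    """Split a list of records into `parts` contiguous pieces of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    size, extra = divmod(len(lines), parts)
    pieces, start = [], 0
    for i in range(parts):
        # The first `extra` pieces take one additional record each.
        end = start + size + (1 if i < extra else 0)
        pieces.append(lines[start:end])
        start = end
    return pieces


pieces = split_lines(list(range(7)), 3)
```

Each piece would then be written to its own file and used as the input of one sub-job. In practice each piece is far larger than a framework-level input split, as the module description notes.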
After the computation finished, we found that in the optimized MapReduce program the input data is spread across multiple jobs, each job's Map phase outputs relatively little intermediate data, and each job runs for a short time, so a job's Map-phase output can be deleted from local disk promptly instead of occupying it for a long time. The total amount of intermediate data produced by the optimized program is not less than that of the unoptimized program; its output is merely spread over time, which spreads out disk I/O, reduces instantaneous network traffic, lets the Reduce stage consume intermediate data promptly, and lowers local disk occupancy. The optimized MapReduce program therefore performed better in testing.
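The trade-off described here, a total that is no smaller but a lower peak, can be made concrete with a toy model. All numbers below are made up for illustration and are not measurements from the patent. The parameters `ratio` (intermediate volume per unit of input) and `carried` (size of the previous result that each later sub-job maps again) are assumptions.

```python
def intermediate_profile(chunk_sizes, ratio=1.0, carried=0.0):
    """Return (peak, total) intermediate volume for sub-jobs that run one at a time."""
    per_job = []
    for i, size in enumerate(chunk_sizes):
        extra = carried if i > 0 else 0.0  # the first sub-job has nothing to merge
        per_job.append(ratio * size + extra)
    return max(per_job), sum(per_job)


single_job = intermediate_profile([20.0])                  # one 20-unit job
split_jobs = intermediate_profile([2.0] * 10, carried=0.5) # ten 2-unit sub-jobs
```

Under these assumed values the split run produces more intermediate data in total (24.5 versus 20.0 units) but never holds more than 2.5 units at once, which matches the qualitative claim that the output is spread over time rather than reduced.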
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410673548.2A CN104391748A (en) | 2014-11-21 | 2014-11-21 | Mapreduce calculation process optimization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410673548.2A CN104391748A (en) | 2014-11-21 | 2014-11-21 | Mapreduce calculation process optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104391748A (en) | 2015-03-04 |
Family
ID=52609656
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410673548.2A Pending CN104391748A (en) | 2014-11-21 | 2014-11-21 | Mapreduce calculation process optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104391748A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8190610B2 (en) * | 2006-10-05 | 2012-05-29 | Yahoo! Inc. | MapReduce for distributed database processing |
EP2182478A1 (en) * | 2008-10-31 | 2010-05-05 | Software AG | Method and server cluster for MapReducing FLOW services and large documents |
CN102456031A (en) * | 2010-10-26 | 2012-05-16 | 腾讯科技(深圳)有限公司 | MapReduce system and method for processing data stream |
CN102541858A (en) * | 2010-12-07 | 2012-07-04 | 腾讯科技(深圳)有限公司 | Data equality processing method, device and system based on mapping and protocol |
CN103327128A (en) * | 2013-07-23 | 2013-09-25 | 百度在线网络技术(北京)有限公司 | Intermediate data transmission method and system for MapReduce |
CN103699442A (en) * | 2013-12-12 | 2014-04-02 | 深圳先进技术研究院 | Iterable data processing method under MapReduce calculation framework |
CN103838626A (en) * | 2014-02-27 | 2014-06-04 | 华为技术有限公司 | Data processing device and method for processing serial tasks |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095455B (en) * | 2015-07-27 | 2018-10-19 | 中国联合网络通信集团有限公司 | A kind of data connection optimization method and data arithmetic system |
CN105095455A (en) * | 2015-07-27 | 2015-11-25 | 中国联合网络通信集团有限公司 | Data connection optimization method and data operation system |
CN106708606A (en) * | 2015-11-17 | 2017-05-24 | 阿里巴巴集团控股有限公司 | MapReduce based data processing method and MapReduce based data processing device |
WO2017084509A1 (en) * | 2015-11-17 | 2017-05-26 | 阿里巴巴集团控股有限公司 | Mapreduce-based data processing method and device |
CN106708606B (en) * | 2015-11-17 | 2020-07-07 | 阿里巴巴集团控股有限公司 | Data processing method and device based on MapReduce |
CN105357124A (en) * | 2015-11-22 | 2016-02-24 | 华中科技大学 | MapReduce bandwidth optimization method |
CN105975577A (en) * | 2016-05-04 | 2016-09-28 | 乐视控股(北京)有限公司 | Data optimization method and device in big data processing |
WO2017190469A1 (en) * | 2016-05-04 | 2017-11-09 | 乐视控股(北京)有限公司 | Data optimisation method and apparatus in big data processing |
CN105975600A (en) * | 2016-05-11 | 2016-09-28 | 乐视控股(北京)有限公司 | Big data processing task optimization method and device |
CN108415912A (en) * | 2017-02-09 | 2018-08-17 | 阿里巴巴集团控股有限公司 | Data processing method based on MapReduce model and equipment |
CN107426611B (en) * | 2017-05-25 | 2019-12-10 | 山大鲁能信息科技有限公司 | multi-path output method and system based on video transcoding |
CN107426611A (en) * | 2017-05-25 | 2017-12-01 | 山大鲁能信息科技有限公司 | Multiple-channel output method and system based on video code conversion |
CN110222105A (en) * | 2019-05-14 | 2019-09-10 | 联动优势科技有限公司 | Data summarization processing method and processing device |
CN110704183A (en) * | 2019-09-18 | 2020-01-17 | 深圳前海大数金融服务有限公司 | Data processing method, system and computer readable storage medium |
CN110704183B (en) * | 2019-09-18 | 2021-01-08 | 深圳前海大数金融服务有限公司 | Data processing method, system and computer readable storage medium |
CN115422126A (en) * | 2022-11-04 | 2022-12-02 | 浪潮软件股份有限公司 | Method, system and device for rapidly transferring certificate OFD format file to picture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150304 |