CN104050291A - Parallel processing method and system for account balance data - Google Patents
Parallel processing method and system for account balance data Download PDFInfo
- Publication number
- CN104050291A CN104050291A CN201410306448.6A CN201410306448A CN104050291A CN 104050291 A CN104050291 A CN 104050291A CN 201410306448 A CN201410306448 A CN 201410306448A CN 104050291 A CN104050291 A CN 104050291A
- Authority
- CN
- China
- Prior art keywords
- remaining sum
- record
- account
- output parameter
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000003672 processing method Methods 0.000 title description 4
- 238000000034 method Methods 0.000 claims abstract description 27
- 238000013507 mapping Methods 0.000 claims description 54
- 239000012634 fragment Substances 0.000 claims description 21
- 230000002354 daily effect Effects 0.000 claims 43
- 230000003203 everyday effect Effects 0.000 claims 8
- 241001269238 Data Species 0.000 claims 4
- 238000009826 distribution Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 2
- 238000013467 fragmentation Methods 0.000 description 2
- 238000006062 fragmentation reaction Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 230000001788 irregular Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2219—Large Object storage; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
本发明公开了一种账户余额数据的并行处理方法,该方法包括:一个或多个执行第一任务的Map节点读取账户余额明细数据的不同分片数据,生成所读取的分片数据中每一条余额记录的第一输出参数和第二输出参数;其中,所述第一输出参数至少包括账户ID,所述第二输出参数设定为账户状态信息,所述账户状态信息至少包括:余额值、交易日期和当天交易序号;一个或多个执行第一任务的Reduce节点读取所述执行第一任务的Map节点处理完毕的不同余额记录,根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录;其中,第一输出参数相同的余额记录由同一个Reduce节点读取。本发明能够快速获取大数据量下的帐户日均余额的统计结果。本发明还公开了一种账户余额数据的并行处理系统。
The invention discloses a method for parallel processing of account balance data. The method includes: one or more Map nodes executing the first task read different slice data of account balance detailed data, and generate the data in the read slice data. The first output parameter and the second output parameter of each balance record; wherein, the first output parameter includes at least an account ID, and the second output parameter is set as account status information, and the account status information includes at least: balance value, transaction date, and transaction number of the day; one or more Reduce nodes executing the first task read the different balance records processed by the Map node executing the first task, and according to the first output parameter and the second output parameter of the balance record The two output parameters respectively generate the daily average balance value records of each account; among them, the balance records with the same first output parameter are read by the same Reduce node. The invention can quickly acquire the statistical results of the daily average balance of the account under a large amount of data. The invention also discloses a parallel processing system for account balance data.
Description
技术领域technical field
本发明涉及大数据处理技术领域,尤其涉及的是一种大数据量下的帐户余额数据的并行处理方法和系统。The invention relates to the technical field of big data processing, in particular to a method and system for parallel processing of account balance data under a large amount of data.
背景技术Background technique
数据是企业生产、经营、战略等几乎所有经营活动所依赖的、不可或缺的信息。数据就犹如企业经营者的眼睛一样,通过数据可以反映出经营的问题,就犹如舵手依赖导航一样。随着人类社会全面进入信息时代,数据更是成为与水、石油同等重要的战略资源。目前企业面临着数据量的大规模增长。例如,IDC最近的报告预测称,到2020年,全球数据量将扩大50倍。目前,大数据的规模尚是一个不断变化的指标,单一数据集的规模范围从几十TB到数PB不等。此外,各种意想不到的来源都能产生数据。Data is an indispensable information that almost all business activities such as production, operation, and strategy rely on. Data is like the eyes of business operators. Data can reflect business problems, just like the helmsman relies on navigation. As human society enters the information age, data has become a strategic resource as important as water and oil. Enterprises are currently facing massive growth in data volume. For example, a recent IDC report predicts that by 2020, the amount of data worldwide will expand 50-fold. At present, the size of big data is still a changing indicator, and the size of a single data set ranges from tens of terabytes to several petabytes. In addition, data can be generated from a variety of unexpected sources.
传统业务数据随时间演变已拥有标准的格式,能够被标准的商务智能软件识别。相较传统的业务数据,大数据具有多层结构,这意味着大数据会呈现出多变的形式和类型。由于大数据存在不规则和模糊不清的特性,造成很难甚至无法使用传统的应用软件进行分析。Traditional business data has evolved over time in a standard format that can be recognized by standard business intelligence software. Compared with traditional business data, big data has a multi-layered structure, which means that big data will present various forms and types. Due to the irregular and ambiguous nature of big data, it is difficult or even impossible to analyze it using traditional application software.
目前,企业面临的挑战是从各种形式的复杂数据中挖掘价值。Today, the challenge for enterprises is to extract value from various forms of complex data.
发明内容Contents of the invention
本发明所要解决的技术问题是提供一种账户余额数据的并行处理方法和系统,快速获取大数据量下的帐户日均余额的统计结果。The technical problem to be solved by the present invention is to provide a parallel processing method and system for account balance data, which can quickly obtain statistical results of account daily average balance under a large amount of data.
为了解决上述技术问题,本发明提供了一种账户余额数据的并行处理方法,该方法包括:In order to solve the above technical problems, the present invention provides a method for parallel processing of account balance data, the method comprising:
一个或多个执行第一任务的Map节点读取账户余额明细数据的不同分片数据,生成所读取的分片数据中每一条余额记录的第一输出参数和第二输出参数;其中,所述第一输出参数至少包括账户ID,所述第二输出参数设定为账户状态信息,所述账户状态信息至少包括:余额值、交易日期和当天交易序号;One or more Map nodes performing the first task read different fragmented data of the account balance detailed data, and generate the first output parameter and the second output parameter of each balance record in the read fragmented data; wherein, the The first output parameter includes at least an account ID, and the second output parameter is set as account status information, and the account status information includes at least: balance value, transaction date and transaction number of the day;
一个或多个执行第一任务的Reduce节点读取所述执行第一任务的Map节点处理完毕的不同余额记录,根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录;其中,第一输出参数相同的余额记录由同一个Reduce节点读取。One or more Reduce nodes executing the first task read the different balance records processed by the Map node executing the first task, and generate the daily balance records of each account according to the first output parameter and the second output parameter of the balance record respectively. Average balance value records; where the balance records with the same first output parameter are read by the same Reduce node.
进一步地,该方法还包括下述特点:Further, the method also includes the following features:
所述根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录,包括:The generating of the daily average balance value records of each account according to the first output parameter and the second output parameter of the balance record includes:
按照所述第一输出参数中的账户ID,遍历同一账户的各条余额记录,根据所述余额记录的第二输出参数确定该账户在查询起止时间范围内的每一天的余额值,将每一天的余额值在所述查询起止时间范围内取平均得到该账户的日均余额值,生成该账户的日均余额值记录。According to the account ID in the first output parameter, traverse each balance record of the same account, determine the balance value of the account for each day within the query start and end time range according to the second output parameter of the balance record, and calculate each day The balance value of the account is averaged within the start and end time range of the query to obtain the daily average balance value of the account, and a record of the daily average balance value of the account is generated.
进一步地,该方法还包括下述特点:Further, the method also includes the following features:
在所述执行第一任务的Reduce节点根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录后,还包括:After the Reduce node performing the first task respectively generates the daily average balance value records of each account according to the first output parameter and the second output parameter of the balance record, it also includes:
一个或多个执行第二任务的Map节点读取不同账户的日均余额值记录,生成所读取的日均余额值记录的第一输出参数和第二输出参数;其中,所述日均余额值记录的第一输出参数设定为所述日均余额值所在的区间,所述日均余额值记录的第二输出参数设定为1;One or more Map nodes performing the second task read the daily average balance value records of different accounts, and generate the first output parameter and the second output parameter of the read daily average balance value records; wherein, the daily average balance value The first output parameter of the value record is set to the interval where the daily average balance value is located, and the second output parameter of the daily average balance value record is set to 1;
一个或多个执行第二任务的Reduce节点读取所述执行第二任务的Map节点处理完毕的不同日均余额值记录,根据所述日均余额值记录的第一输出参数和第二输出参数统计各日均余额值区间的账户数,包括:按照所述第一输出参数中的日均余额值区间,遍历同一日均余额值区间的各条日均余额值记录,将每一条日均余额值记录的第二输出参数进行累加,获得该日均余额值区间的账户数;其中,第一输出参数相同的日均余额值记录由同一个Reduce节点读取。One or more Reduce nodes executing the second task read the different daily average balance value records processed by the Map node executing the second task, and according to the first output parameter and the second output parameter of the daily average balance value record Counting the number of accounts in each daily average balance value interval, including: according to the daily average balance value interval in the first output parameter, traversing each daily average balance value record in the same daily average balance value interval, and recording each daily average balance value The second output parameter of the value record is accumulated to obtain the number of accounts in the daily average balance value interval; wherein, the daily average balance value records with the same first output parameter are read by the same Reduce node.
进一步地,该方法还包括下述特点:Further, the method also includes the following features:
根据所述余额记录的第二输出参数确定该账户在查询起止时间范围内的每一天的余额值,包括:According to the second output parameter of the balance record, determine the balance value of the account for each day within the query start and end time range, including:
从查询起始时间的当日至查询终止时间的当日,判断每一天是否存在余额记录,如存在,将当日交易序号最大的余额记录的余额值作为当日的最终余额值,如不存在,追溯早于当日且具有余额记录的最近日期,将所述最近日期那一天的交易序号最大的余额记录的余额值作为当日的最终余额值。From the day of the query start time to the day of the query end time, judge whether there is a balance record for each day. If it exists, use the balance value of the balance record with the largest transaction number on that day as the final balance value of the day. If it does not exist, trace back to earlier than For the latest date with a balance record on that day, the balance value of the balance record with the largest transaction sequence number on the day of the latest date is taken as the final balance value of the day.
进一步地,该方法还包括下述特点:Further, the method also includes the following features:
在一个或多个执行第一任务的Reduce节点读取所述执行第一任务的Map节点处理完毕的不同余额记录之前,还包括:Before one or more Reduce nodes executing the first task read the different balance records processed by the Map node executing the first task, it also includes:
计算每一条余额记录的第一参数的哈希值,建立第一参数的哈希值与所述执行第一任务的Reduce节点的映射关系;其中,所述映射关系用于供所述执行第一任务的Reduce节点根据所述映射关系读取对应的余额记录。Calculate the hash value of the first parameter of each balance record, and establish a mapping relationship between the hash value of the first parameter and the Reduce node that executes the first task; wherein, the mapping relationship is used for the execution of the first task. The Reduce node of the task reads the corresponding balance record according to the mapping relationship.
进一步地,该方法还包括下述特点:Further, the method also includes the following features:
在一个或多个执行第二任务的Reduce节点读取所述执行第二任务的Map节点处理完毕的不同日均余额值记录之前,还包括:Before one or more Reduce nodes executing the second task read the different daily average balance value records processed by the Map node executing the second task, it also includes:
计算每一条日均余额值记录的第一参数的哈希值,建立第一参数的哈希值与执行第二任务的Reduce节点的映射关系;其中,所述映射关系用于供所述执行第二任务的Reduce节点根据所述映射关系读取对应的日均余额值记录。Calculate the hash value of the first parameter of each daily average balance value record, and establish a mapping relationship between the hash value of the first parameter and the Reduce node executing the second task; wherein, the mapping relationship is used for the execution of the second task The Reduce node of the second task reads the corresponding daily average balance value record according to the mapping relationship.
进一步地,该方法还包括下述特点:Further, the method also includes the following features:
在一个或多个执行第一任务的Map节点读取账户余额明细数据的不同分片数据之前,还包括:Before one or more Map nodes performing the first task read different fragmented data of the account balance detailed data, it also includes:
根据查询起止时间确定账户余额明细数据的读取范围,包括:将全量余额明细数据和截止到查询终止时间当天的增量余额明细数据确定为账户余额明细数据的读取范围;Determine the reading range of account balance detailed data according to the start and end time of the query, including: determining the full balance detailed data and the incremental balance detailed data up to the day when the query ends as the reading range of account balance detailed data;
将属于该读取范围内的账户余额明细数据分片,建立每一个分片与执行第一任务的Map节点的映射关系;其中,所述映射关系用于供所述执行第一任务的Map节点根据所述映射关系读取对应的分片数据。Fragmenting the account balance detail data belonging to the reading range, and establishing a mapping relationship between each fragment and the Map node performing the first task; wherein, the mapping relationship is used for the Map node performing the first task The corresponding fragment data is read according to the mapping relationship.
为了解决上述技术问题,本发明还提供了一种账户余额数据的并行处理系统,该系统包括:In order to solve the above technical problems, the present invention also provides a parallel processing system for account balance data, the system includes:
Map处理模块,包括一个或多个执行第一任务的Map节点;各执行第一任务的Map节点用于读取账户余额明细数据的不同分片数据,生成所读取的分片数据中每一条余额记录的第一输出参数和第二输出参数;其中,所述第一输出参数至少包括账户ID,所述第二输出参数设定为账户状态信息,所述账户状态信息至少包括:余额值、交易日期和当天交易序号;The Map processing module includes one or more Map nodes that execute the first task; each Map node that executes the first task is used to read different fragment data of the account balance detailed data, and generates each item in the read fragment data. The first output parameter and the second output parameter of the balance record; wherein, the first output parameter includes at least an account ID, and the second output parameter is set as account status information, and the account status information includes at least: balance value, Transaction date and transaction serial number of the day;
Reduce处理模块,包括一个或多个执行第一任务的Reduce节点;各执行第一任务的Reduce节点用于读取所述执行第一任务的Map节点处理完毕的不同余额记录,根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录;其中,第一输出参数相同的余额记录由同一个Reduce节点读取。The Reduce processing module includes one or more Reduce nodes that perform the first task; each Reduce node that performs the first task is used to read the different balance records that the Map node that performs the first task has processed, according to the balance record The first output parameter and the second output parameter respectively generate the daily average balance value records of each account; wherein, the balance records with the same first output parameter are read by the same Reduce node.
进一步地,该系统还包括下述特点:Further, the system also includes the following features:
所述执行第一任务的Reduce节点用于根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录,包括:按照所述第一输出参数中的账户ID,遍历同一账户的各条余额记录,根据所述余额记录的第二输出参数确定该账户在查询起止时间范围内的每一天的余额值,将每一天的余额值在所述查询起止时间范围内取平均得到该账户的日均余额值,生成该账户的日均余额值记录。The Reduce node performing the first task is used to generate the daily average balance value record of each account according to the first output parameter and the second output parameter of the balance record, including: according to the account ID in the first output parameter , traverse the balance records of the same account, determine the balance value of the account for each day within the query start and end time range according to the second output parameter of the balance record, and calculate the balance value of each day within the query start and end time range Take the average to obtain the daily average balance value of the account, and generate the daily average balance value record of the account.
进一步地,该系统还包括下述特点:Further, the system also includes the following features:
所述Map处理模块还包括一个或多个执行第二任务的Map节点,所述Reduce处理模块还包括一个或多个执行第二任务的Reduce节点;The Map processing module also includes one or more Map nodes that execute the second task, and the Reduce processing module also includes one or more Reduce nodes that execute the second task;
各执行第二任务的Map节点用于读取不同账户的日均余额值记录,生成所读取的日均余额值记录的第一输出参数和第二输出参数;其中,所述日均余额值记录的第一输出参数设定为所述日均余额值所在的区间,所述日均余额值记录的第二输出参数设定为1;Each Map node performing the second task is used to read the daily average balance value records of different accounts, and generate the first output parameter and the second output parameter of the read daily average balance value records; wherein, the daily average balance value The first output parameter of the record is set to the interval where the daily average balance value is located, and the second output parameter of the record of the daily average balance value is set to 1;
各执行第二任务的Reduce节点用于读取所述执行第二任务的Map节点处理完毕的不同日均余额值记录,根据所述日均余额值记录的第一输出参数和第二输出参数统计各日均余额值区间的账户数,包括:按照所述第一输出参数中的日均余额值区间,遍历同一日均余额值区间的各条日均余额值记录,将每一条日均余额值记录的第二输出参数进行累加,获得该日均余额值区间的账户数;其中,第一输出参数相同的日均余额值记录由同一个Reduce节点读取。Each Reduce node executing the second task is used to read the different daily average balance value records processed by the Map node executing the second task, and make statistics according to the first output parameter and the second output parameter of the daily average balance value record The number of accounts in each daily average balance value interval, including: according to the daily average balance value interval in the first output parameter, traverse each daily average balance value record in the same daily average balance value interval, and record each daily average balance value The second output parameter of the record is accumulated to obtain the number of accounts in the daily average balance value interval; wherein, the daily average balance value records with the same first output parameter are read by the same Reduce node.
进一步地,该系统还包括下述特点:Further, the system also includes the following features:
所述执行第一任务的Reduce节点用于根据所述余额记录的第二输出参数确定该账户在查询起止时间范围内的每一天的余额值,包括:从查询起始时间的当日至查询终止时间的当日,判断每一天是否存在余额记录,如存在,将当日交易序号最大的余额记录的余额值作为当日的最终余额值,如不存在,追溯早于当日且具有余额记录的最近日期,将所述最近日期那一天的交易序号最大的余额记录的余额值作为当日的最终余额值。The Reduce node executing the first task is used to determine the balance value of the account for each day within the query start and end time range according to the second output parameter of the balance record, including: from the day of the query start time to the query end time On the current day, judge whether there is a balance record for each day. If it exists, take the balance value of the balance record with the largest transaction number on the current day as the final balance value of the current day. The balance value of the balance record with the largest transaction sequence number on the most recent date mentioned above is taken as the final balance value of the day.
进一步地,该系统还包括下述特点:Further, the system also includes the following features:
所述Reduce处理模块还包括第一任务路由模块:The Reduce processing module also includes the first task routing module:
所述第一任务路由模块,用于在一个或多个执行第一任务的Reduce节点读取所述执行第一任务的Map节点处理完毕的不同余额记录之前,计算每一条余额记录的第一参数的哈希值,建立第一参数的哈希值与所述执行第一任务的Reduce节点的映射关系;其中,所述映射关系用于供所述执行第一任务的Reduce节点根据所述映射关系读取对应的余额记录。The first task routing module is used to calculate the first parameter of each balance record before one or more Reduce nodes executing the first task read the different balance records processed by the Map node executing the first task Hash value, establishes the mapping relationship between the hash value of the first parameter and the Reduce node performing the first task; wherein, the mapping relationship is used for the Reduce node performing the first task according to the mapping relationship Read the corresponding balance record.
进一步地,该系统还包括下述特点:Further, the system also includes the following features:
所述Reduce处理模块还包括第二任务路由模块:The Reduce processing module also includes a second task routing module:
所述第二任务路由模块,用于在一个或多个执行第二任务的Reduce节点读取所述执行第二任务的Map节点处理完毕的不同日均余额值记录之前,计算每一条日均余额值记录的第一参数的哈希值,建立第一参数的哈希值与执行第二任务的Reduce节点的映射关系;其中,所述映射关系用于供所述执行第二任务的Reduce节点根据所述映射关系读取对应的日均余额值记录。The second task routing module is used to calculate each daily average balance value record before one or more Reduce nodes executing the second task read the different daily average balance value records processed by the Map node executing the second task The hash value of the first parameter of the value record, establishes the mapping relationship between the hash value of the first parameter and the Reduce node performing the second task; wherein, the mapping relationship is used for the Reduce node performing the second task according to The mapping relationship reads the corresponding daily average balance value records.
进一步地,该系统还包括下述特点:所述Map处理模块还包括分片模块:Further, the system also includes the following features: the Map processing module also includes a fragmentation module:
所述分片模块,用于在一个或多个执行第一任务的Map节点读取账户余额明细数据的不同分片数据之前,根据查询起止时间确定账户余额明细数据的读取范围,包括:将全量余额明细数据和截止到查询终止时间当天的增量余额明细数据确定为账户余额明细数据的读取范围;将属于该读取范围内的账户余额明细数据分片,建立每一个分片与执行第一任务的Map节点的映射关系;其中,所述映射关系用于供所述执行第一任务的Map节点根据所述映射关系读取对应的分片数据。The sharding module is used to determine the reading range of the account balance detail data according to the query start and end time before one or more Map nodes performing the first task read different shard data of the account balance detail data, including: The full balance detailed data and the incremental balance detailed data up to the day when the query is terminated are determined as the reading range of the account balance detailed data; the account balance detailed data belonging to the reading range is fragmented, and each fragment is established and executed A mapping relationship of the Map node of the first task; wherein, the mapping relationship is used for the Map node executing the first task to read corresponding fragment data according to the mapping relationship.
与现有技术相比,本发明提供的一种账户余额数据的并行处理方法和系统,基于MapReduce将大规模的帐户余额明细数据分成若干份交给Map节点并行处理,Map阶段对数据按照帐户进行了分类,处理完成后根据帐户id分组并路由给多个Reduce节点并行处理,从而快速获取大数据量下的帐户日均余额的统计结果,处理效率高、可扩展性强。Compared with the prior art, the present invention provides a method and system for parallel processing of account balance data. Based on MapReduce, the large-scale account balance detailed data is divided into several parts and handed over to Map nodes for parallel processing. In the Map stage, the data is processed according to the account. After the processing is completed, it is grouped according to the account id and routed to multiple Reduce nodes for parallel processing, so as to quickly obtain the statistical results of the average daily balance of the account under a large amount of data, with high processing efficiency and strong scalability.
附图说明Description of drawings
图1为本发明实施例的一种账户余额数据的并行处理方法中获取各用户的日均余额值的流程图。FIG. 1 is a flow chart of obtaining the daily average balance value of each user in a parallel processing method of account balance data according to an embodiment of the present invention.
图2为本发明实施例的一种账户余额数据的并行处理方法中统计各日均余额值区间的账户数的流程图。Fig. 2 is a flow chart of counting the number of accounts in each daily average balance value interval in a parallel processing method of account balance data according to an embodiment of the present invention.
图3为本发明实施例的一种账户余额数据的并行处理系统的结构示意图。FIG. 3 is a schematic structural diagram of a parallel processing system for account balance data according to an embodiment of the present invention.
图4为本发明应用示例的基于MapReduce的账户余额数据的处理架构示意图。FIG. 4 is a schematic diagram of a processing architecture of account balance data based on MapReduce in an application example of the present invention.
具体实施方式Detailed ways
为使本发明的目的、技术方案和优点更加清楚明白,下文中将结合附图对本发明的实施例进行详细说明。需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互任意组合。In order to make the purpose, technical solution and advantages of the present invention more clear, the embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined arbitrarily with each other.
如图1所示,本发明实施例提供了一种账户余额数据的并行处理方法,该方法包括:As shown in Figure 1, an embodiment of the present invention provides a method for parallel processing of account balance data, the method comprising:
S10,一个或多个执行第一任务的Map节点读取账户余额明细数据的不同分片数据,生成所读取的分片数据中每一条余额记录的第一输出参数和第二输出参数;其中,所述第一输出参数至少包括账户ID,所述第二输出参数设定为账户状态信息,所述账户状态信息至少包括:余额值、交易日期和当天交易序号;S10, one or more Map nodes performing the first task read different fragmented data of the account balance detailed data, and generate the first output parameter and the second output parameter of each balance record in the read fragmented data; wherein , the first output parameter includes at least an account ID, the second output parameter is set as account status information, and the account status information includes at least: balance value, transaction date and transaction number of the day;
S20,一个或多个执行第一任务的Reduce节点读取所述执行第一任务的Map节点处理完毕的不同余额记录,根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录;其中,第一输出参数相同的余额记录由同一个Reduce节点读取;S20, one or more Reduce nodes executing the first task read the different balance records processed by the Map node executing the first task, and generate each account according to the first output parameter and the second output parameter of the balance record The daily average balance value record; wherein, the balance records with the same first output parameter are read by the same Reduce node;
该方法还可以包括下述特点:The method may also include the following features:
优选地,在一个或多个执行第一任务的Map节点读取账户余额明细数据的不同分片数据之前,还包括:Preferably, before one or more Map nodes performing the first task read different fragmented data of account balance detailed data, it also includes:
根据查询起止时间确定账户余额明细数据的读取范围,包括:将全量余额明细数据和截止到查询终止时间当天的增量余额明细数据确定为账户余额明细数据的读取范围;Determine the reading range of account balance detailed data according to the start and end time of the query, including: determining the full balance detailed data and the incremental balance detailed data up to the day when the query ends as the reading range of account balance detailed data;
将属于该读取范围内的账户余额明细数据分片,建立每一个分片与执行第一任务的Map节点的映射关系;其中,所述映射关系用于供所述执行第一任务的Map节点根据所述映射关系读取对应的分片数据。Fragmenting the account balance detail data belonging to the reading range, and establishing a mapping relationship between each fragment and the Map node performing the first task; wherein, the mapping relationship is used for the Map node performing the first task The corresponding fragment data is read according to the mapping relationship.
优选地,所述根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录,包括:Preferably, the generating of the daily average balance value records of each account according to the first output parameter and the second output parameter of the balance record includes:
按照所述第一输出参数中的账户ID,遍历同一账户的各条余额记录,根据所述余额记录的第二输出参数确定该账户在查询起止时间范围内的每一天的余额值,将每一天的余额值在所述查询起止时间范围内取平均得到该账户的日均余额值,生成该账户的日均余额值记录。According to the account ID in the first output parameter, traverse each balance record of the same account, determine the balance value of the account for each day within the query start and end time range according to the second output parameter of the balance record, and calculate each day The balance value of the account is averaged within the start and end time range of the query to obtain the daily average balance value of the account, and a record of the daily average balance value of the account is generated.
优选地,根据所述余额记录的第二输出参数确定该账户在查询起止时间范围内的每一天的余额值,包括:Preferably, according to the second output parameter of the balance record, the balance value of the account for each day within the query start and end time range is determined, including:
从查询起始时间的当日至查询终止时间的当日,判断每一天是否存在余额记录,如存在,将当日交易序号最大的余额记录的余额值作为当日的最终余额值,如不存在,追溯早于当日且具有余额记录的最近日期,将所述最近日期那一天的交易序号最大的余额记录的余额值作为当日的最终余额值。From the day of the query start time to the day of the query end time, judge whether there is a balance record for each day. If it exists, use the balance value of the balance record with the largest transaction number on that day as the final balance value of the day. If it does not exist, trace back to earlier than For the latest date with a balance record on that day, the balance value of the balance record with the largest transaction sequence number on the day of the latest date is taken as the final balance value of the day.
优选地,在一个或多个执行第一任务的Reduce节点读取所述执行第一任务的Map节点处理完毕的不同余额记录之前,还包括:Preferably, before one or more Reduce nodes executing the first task read the different balance records processed by the Map node executing the first task, further comprising:
计算每一条余额记录的第一参数的哈希值,建立第一参数的哈希值与所述执行第一任务的Reduce节点的映射关系;其中,所述映射关系用于供所述执行第一任务的Reduce节点根据所述映射关系读取对应的余额记录。Calculate the hash value of the first parameter of each balance record, and establish a mapping relationship between the hash value of the first parameter and the Reduce node that executes the first task; wherein, the mapping relationship is used for the execution of the first task. The Reduce node of the task reads the corresponding balance record according to the mapping relationship.
优选地,每一条余额记录的第一参数的哈希值为将所述第一参数对执行第一任务的Reduce节点总个数取模;Preferably, the hash value of the first parameter of each balance record is the modulus of the first parameter to the total number of Reduce nodes executing the first task;
优选地,如图2所示,步骤S20后还包括:Preferably, as shown in Figure 2, after step S20, it also includes:
S30,确定每个账户的日均余额值所在的区间,统计每个区间内的账户数;S30, determining the interval in which the daily average balance of each account is located, and counting the number of accounts in each interval;
优选地,确定每个账户的日均余额值所在的区间,统计每个区间内的账户数,包括:Preferably, determine the interval in which the daily average balance of each account is located, and count the number of accounts in each interval, including:
S301,一个或多个执行第二任务的Map节点读取不同账户的日均余额值记录,生成所读取的日均余额值记录的第一输出参数和第二输出参数;其中,所述日均余额值记录的第一输出参数设定为所述日均余额值所在的区间,所述日均余额值记录的第二输出参数设定为1;S301, one or more Map nodes executing the second task read the daily average balance value records of different accounts, and generate the first output parameter and the second output parameter of the read daily average balance value records; wherein, the daily average balance value records The first output parameter of the average balance value record is set to the interval where the daily average balance value is located, and the second output parameter of the daily average balance value record is set to 1;
S302,一个或多个执行第二任务的Reduce节点读取所述执行第二任务的Map节点处理完毕的不同日均余额值记录,根据所述日均余额值记录的第一输出参数和第二输出参数统计各日均余额值区间的账户数,包括:按照所述第一输出参数中的日均余额值区间,遍历同一日均余额值区间的各条日均余额值记录,将每一条日均余额值记录的第二输出参数进行累加,获得该日均余额值区间的账户数;其中,第一输出参数相同的日均余额值记录由同一个Reduce节点读取。S302. One or more Reduce nodes executing the second task read the different daily average balance value records processed by the Map node executing the second task, and according to the first output parameter and the second output parameter of the daily average balance value record, The output parameter counts the number of accounts in each average daily balance value interval, including: according to the daily average balance value interval in the first output parameter, traverse each daily average balance value record in the same daily average balance value interval, and record each daily average balance value interval The second output parameter of the average balance value record is accumulated to obtain the number of accounts in the daily average balance value interval; wherein, the daily average balance value records with the same first output parameter are read by the same Reduce node.
优选地,在一个或多个执行第二任务的Reduce节点读取所述执行第二任务的Map节点处理完毕的不同日均余额值记录之前,还包括:Preferably, before one or more Reduce nodes executing the second task read the different daily average balance value records processed by the Map node executing the second task, it also includes:
计算每一条日均余额值记录的第一参数的哈希值,建立第一参数的哈希值与执行第二任务的Reduce节点的映射关系;其中,所述映射关系用于供所述执行第二任务的Reduce节点根据所述映射关系读取对应的日均余额值记录。Calculate the hash value of the first parameter of each daily average balance value record, and establish a mapping relationship between the hash value of the first parameter and the Reduce node executing the second task; wherein, the mapping relationship is used for the execution of the second task The Reduce node of the second task reads the corresponding daily average balance value record according to the mapping relationship.
优选地,每一条日均余额值记录的第一参数的哈希值为将所述第一参数对执行第二任务的Reduce节点总个数取模;Preferably, the hash value of the first parameter of each daily average balance value record takes the modulus of the first parameter to the total number of Reduce nodes executing the second task;
其中,所述执行第二任务的Map节点与执行第一任务的Map节点是同一批节点或不同批的节点,也即,Map节点在执行完第一任务后,才可以执行第二任务。同理,所述执行第二任务的Reduce节点与执行第一任务的Reduce节点是同一批节点或不同批的节点,也即,Reduce节点在执行完第一任务后,才可以执行第二任务。Wherein, the Map nodes executing the second task and the Map nodes executing the first task are the same batch of nodes or different batches of nodes, that is, the Map nodes can execute the second task only after executing the first task. Similarly, the Reduce nodes executing the second task and the Reduce nodes executing the first task are the same batch of nodes or different batches of nodes, that is, the Reduce nodes can execute the second task only after executing the first task.
如图3所示,本发明实施例提供了一种账户余额数据的并行处理系统,该系统包括:As shown in Figure 3, an embodiment of the present invention provides a parallel processing system for account balance data, the system includes:
Map处理模块,包括一个或多个执行第一任务的Map节点;各执行第一任务的Map节点用于读取账户余额明细数据的不同分片数据,生成所读取的分片数据中每一条余额记录的第一输出参数和第二输出参数;其中,所述第一输出参数至少包括账户ID,所述第二输出参数设定为账户状态信息,所述账户状态信息至少包括:余额值、交易日期和当天交易序号;The Map processing module includes one or more Map nodes that execute the first task; each Map node that executes the first task is used to read different fragment data of the account balance detailed data, and generates each item in the read fragment data. The first output parameter and the second output parameter of the balance record; wherein, the first output parameter includes at least an account ID, and the second output parameter is set as account status information, and the account status information includes at least: balance value, Transaction date and transaction serial number of the day;
Reduce处理模块,包括一个或多个执行第一任务的Reduce节点;各执行第一任务的Reduce节点用于读取所述执行第一任务的Map节点处理完毕的不同余额记录,根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录;其中,第一输出参数相同的余额记录由同一个Reduce节点读取。The Reduce processing module includes one or more Reduce nodes that perform the first task; each Reduce node that performs the first task is used to read the different balance records that the Map node that performs the first task has processed, according to the balance record The first output parameter and the second output parameter respectively generate the daily average balance value records of each account; wherein, the balance records with the same first output parameter are read by the same Reduce node.
该系统还可以包括下述特点:The system may also include the following features:
优选地,所述执行第一任务的Reduce节点用于根据所述余额记录的第一输出参数和第二输出参数分别生成各账户的日均余额值记录,包括:按照所述第一输出参数中的账户ID,遍历同一账户的各条余额记录,根据所述余额记录的第二输出参数确定该账户在查询起止时间范围内的每一天的余额值,将每一天的余额值在所述查询起止时间范围内取平均得到该账户的日均余额值,生成该账户的日均余额值记录。Preferably, the Reduce node executing the first task is used to generate the daily average balance value records of each account according to the first output parameter and the second output parameter of the balance record, including: according to the first output parameter Account ID of the same account, traverse each balance record of the same account, determine the balance value of the account for each day within the query start and end time range according to the second output parameter of the balance record, and calculate the balance value of each day at the start and end of the query Take the average within the time range to obtain the daily average balance value of the account, and generate the daily average balance value record of the account.
优选地,所述Map处理模块还包括一个或多个执行第二任务的Map节点,所述Reduce处理模块还包括一个或多个执行第二任务的Reduce节点;Preferably, the Map processing module further includes one or more Map nodes executing the second task, and the Reduce processing module further includes one or more Reduce nodes executing the second task;
各执行第二任务的Map节点用于读取不同账户的日均余额值记录,生成所读取的日均余额值记录的第一输出参数和第二输出参数;其中,所述日均余额值记录的第一输出参数设定为所述日均余额值所在的区间,所述日均余额值记录的第二输出参数设定为1;Each Map node performing the second task is used to read the daily average balance value records of different accounts, and generate the first output parameter and the second output parameter of the read daily average balance value records; wherein, the daily average balance value The first output parameter of the record is set to the interval where the daily average balance value is located, and the second output parameter of the record of the daily average balance value is set to 1;
各执行第二任务的Reduce节点用于读取所述执行第二任务的Map节点处理完毕的不同日均余额值记录,根据所述日均余额值记录的第一输出参数和第二输出参数统计各日均余额值区间的账户数,包括:按照所述第一输出参数中的日均余额值区间,遍历同一日均余额值区间的各条日均余额值记录,将每一条日均余额值记录的第二输出参数进行累加,获得该日均余额值区间的账户数;其中,第一输出参数相同的日均余额值记录由同一个Reduce节点读取。Each Reduce node executing the second task is used to read the different daily average balance value records processed by the Map node executing the second task, and make statistics according to the first output parameter and the second output parameter of the daily average balance value record The number of accounts in each daily average balance value interval, including: according to the daily average balance value interval in the first output parameter, traverse each daily average balance value record in the same daily average balance value interval, and record each daily average balance value The second output parameter of the record is accumulated to obtain the number of accounts in the daily average balance value interval; wherein, the daily average balance value records with the same first output parameter are read by the same Reduce node.
优选地,所述执行第一任务的Reduce节点用于根据所述余额记录的第二输出参数确定该账户在查询起止时间范围内的每一天的余额值,包括:从查询起始时间的当日至查询终止时间的当日,判断每一天是否存在余额记录,如存在,将当日交易序号最大的余额记录的余额值作为当日的最终余额值,如不存在,追溯早于当日且具有余额记录的最近日期,将所述最近日期那一天的交易序号最大的余额记录的余额值作为当日的最终余额值。Preferably, the Reduce node executing the first task is used to determine the balance value of the account for each day within the query start and end time range according to the second output parameter of the balance record, including: from the current day of the query start time to On the day of the query termination time, determine whether there is a balance record for each day. If it exists, use the balance value of the balance record with the largest transaction number on that day as the final balance value of the day. If it does not exist, trace back to the latest date with a balance record that is earlier than the current day , taking the balance value of the balance record with the largest transaction sequence number on the most recent date as the final balance value of the day.
优选地,所述Reduce处理模块还包括第一任务路由模块:Preferably, the Reduce processing module also includes a first task routing module:
所述第一任务路由模块,用于在一个或多个执行第一任务的Reduce节点读取所述执行第一任务的Map节点处理完毕的不同余额记录之前,计算每一条余额记录的第一参数的哈希值,建立第一参数的哈希值与所述执行第一任务的Reduce节点的映射关系;其中,所述映射关系用于供所述执行第一任务的Reduce节点根据所述映射关系读取对应的余额记录。The first task routing module is used to calculate the first parameter of each balance record before one or more Reduce nodes executing the first task read the different balance records processed by the Map node executing the first task Hash value, establishes the mapping relationship between the hash value of the first parameter and the Reduce node performing the first task; wherein, the mapping relationship is used for the Reduce node performing the first task according to the mapping relationship Read the corresponding balance record.
优选地,所述Reduce处理模块还包括第二任务路由模块:Preferably, the Reduce processing module also includes a second task routing module:
所述第二任务路由模块,用于在一个或多个执行第二任务的Reduce节点读取所述执行第二任务的Map节点处理完毕的不同日均余额值记录之前,计算每一条日均余额值记录的第一参数的哈希值,建立第一参数的哈希值与执行第二任务的Reduce节点的映射关系;其中,所述映射关系用于供所述执行第二任务的Reduce节点根据所述映射关系读取对应的日均余额值记录。The second task routing module is used to calculate each daily average balance value record before one or more Reduce nodes executing the second task read the different daily average balance value records processed by the Map node executing the second task The hash value of the first parameter of the value record, establishes the mapping relationship between the hash value of the first parameter and the Reduce node performing the second task; wherein, the mapping relationship is used for the Reduce node performing the second task according to The mapping relationship reads the corresponding daily average balance value records.
优选地,所述Map处理模块还包括分片模块:Preferably, the Map processing module also includes a fragmentation module:
所述分片模块,用于在一个或多个执行第一任务的Map节点读取账户余额明细数据的不同分片数据之前,根据查询起止时间确定账户余额明细数据的读取范围,包括:将全量余额明细数据和截止到查询终止时间当天的增量余额明细数据确定为账户余额明细数据的读取范围;将属于该读取范围内的账户余额明细数据分片,建立每一个分片与执行第一任务的Map节点的映射关系;其中,所述映射关系用于供所述执行第一任务的Map节点根据所述映射关系读取对应的分片数据。The sharding module is used to determine the reading range of the account balance detail data according to the query start and end time before one or more Map nodes performing the first task read different shard data of the account balance detail data, including: The full balance detailed data and the incremental balance detailed data up to the day when the query is terminated are determined as the reading range of the account balance detailed data; the account balance detailed data belonging to the reading range is fragmented, and each fragment is established and executed A mapping relationship of the Map node of the first task; wherein, the mapping relationship is used for the Map node executing the first task to read corresponding fragment data according to the mapping relationship.
应用示例Application example
下面给出一个应用例子:统计各账户今年1月1日到1月8日的日均余额,以及各账户日均余额的分布区间。(假设有两个账户:分别是id001和id002)An application example is given below: statistics of the average daily balance of each account from January 1 to January 8 this year, and the distribution range of the average daily balance of each account. (Suppose there are two accounts: id001 and id002 respectively)
两个账户的原始明细数据如表1所示,包括去年(2013年)的部分全量数据以及今年(2014年)1月1日至1月8日的增量数据。The original detailed data of the two accounts are shown in Table 1, including part of the full data of last year (2013) and incremental data from January 1 to January 8 of this year (2014).
表1Table 1
如图4所示,基于MapReduce启动2个任务,对帐户信息进行并行处理和计算。As shown in Figure 4, two tasks are started based on MapReduce to process and calculate account information in parallel.
第一个任务将输入的大规模数据文件分成若干分片交给Map节点并行处理,Map阶段将数据按照<帐户id,账户信息〉的形式进行输出,该步操作对海量的数据进行了分类预处理,将相同帐户的明细数据定向到同一个Reduce节点中合并处理,比如,根据账号id哈希Hash值(针对Reduce节点数量取模)分组并路由给多个Reduce并行处理;Reduce阶段对Map阶段输出的每条帐户的明细数据进行处理,包括:The first task divides the input large-scale data file into several fragments and sends them to the Map node for parallel processing. In the Map stage, the data is output in the form of <account id, account information>. This step classifies and pre-classifies the massive data Processing, directing the detailed data of the same account to the same Reduce node for combined processing, for example, grouping and routing to multiple Reduce nodes for parallel processing according to the account id hash Hash value (modulo the number of Reduce nodes); The output detailed data of each account is processed, including:
a)读取从去年全量文件到查询终止时间当天增量文件;a) Read the incremental files from the full amount of files last year to the query termination time;
b)Map阶段对所有的数据进行分类,输出<帐户id,帐户明细数据信息〉b) The Map stage classifies all the data and outputs <account id, account detail data information>
c)Reduce阶段选出同一个帐户id中明细数据进行组织和处理,确定每一个账户在查询起始时间到终止时间的统计周期内每一天的余额值,然后计算出每个帐户在该统计周期内的日均余额。c) In the Reduce stage, the detailed data in the same account id is selected for organization and processing, and the balance value of each account is determined for each day in the statistical cycle from the query start time to the end time, and then calculated for each account in the statistical cycle average daily balance.
帐户id001和帐户id002的所有数据分别被两个Reduce节点拉取处理。对每一个账户,Reduce节点从查询起始时间的当日至查询终止时间的当日,判断每一天是否存在余额记录,如存在,将当日交易序号最大的余额记录的余额值作为当日的最终余额值,如不存在,追溯早于当日且具有余额记录的最近日期,将所述最近日期那一天的交易序号最大的余额记录的余额值作为当日的最终余额值。All data of account id001 and account id002 are fetched and processed by two Reduce nodes respectively. For each account, the Reduce node judges whether there is a balance record for each day from the day of the query start time to the day of the query end time. If there is, the balance value of the balance record with the largest transaction number on the day is taken as the final balance value of the day. If it does not exist, trace back to the latest date that is earlier than the current day and has a balance record, and use the balance value of the balance record with the largest transaction sequence number on the day of the latest date as the final balance value of the current day.
比如,查询起始时间当天(今年1月1日),002账户有余额记录,则将002账户在2014年1月1日交易序号为1的余额记录的余额值“20元”作为002账户在在2014年1月1日的余额值;001账户在2014年1月1日没有余额记录,则将2013年的全量数据中距离今年最近的一条余额记录(2013年12月25日,交易序号为1的余额记录)的余额值“90元”作为001账户在2014年1月1日的余额值。For example, on the day when the query starts (January 1 this year), and there is a balance record in account 002, the balance value "20 yuan" in the balance record of account 002 on January 1, 2014 with transaction number 1 is taken as the balance value of account 002. The balance value on January 1, 2014; if the 001 account has no balance record on January 1, 2014, the balance record closest to this year in the full amount of data in 2013 (on December 25, 2013, the transaction number is 1) The balance value "90 yuan" is used as the balance value of the 001 account on January 1, 2014.
对002账户,第一Reduce节点确定的002账户在查询起止时间范围内每一天的余额值,如表2所示;For the 002 account, the balance value of the 002 account determined by the first Reduce node within the query start and end time range for each day, as shown in Table 2;
表2Table 2
对001账户,第一Reduce节点确定的002账户在查询起止时间范围内每一天的余额值,如表3所示;For account 001, the balance value of account 002 determined by the first Reduce node within the query start and end time range for each day, as shown in Table 3;
表3table 3
帐户002截止到2014年1月8日的日均余额计算如下:(20+50+15+15+40+60+60+30)/8=36.25(元);The daily average balance of account 002 as of January 8, 2014 is calculated as follows: (20+50+15+15+40+60+60+30)/8=36.25 (yuan);
帐户001截止到2014年1月8日的日均余额计算如下:(90+90+90+90+90+60+60+30)/8=75;The average daily balance of account 001 as of January 8, 2014 is calculated as follows: (90+90+90+90+90+60+60+30)/8=75;
第二个任务根据第一个任务的计算结果,统计出各账户日均余额值的分布情况。The second task calculates the distribution of the daily average balance of each account based on the calculation results of the first task.
根据第一个任务所输出的日均余额,在第二个任务的Map阶段对日均余额进行判断。例如,设置区间[0,15),[15,50),[50,100]。则帐户002的日均余额所属区间为[15,50),帐户001的日均余额所属区间为[50,100]。对账户002,以区间[15,50)作为第一输出参数(key),1作为第二输出参数(value);对账户001,以区间[50,100]作为第一输出参数(key),1作为第二输出参数(value)。在第二个任务的Reduce端,对每一个区间中的value进行累加并输出,结果即为各区间的账户分布数。According to the daily average balance output by the first task, the daily average balance is judged in the Map stage of the second task. For example, set intervals [0,15), [15,50), [50,100]. Then the interval of the average daily balance of account 002 is [15,50), and the interval of the average daily balance of account 001 is [50,100]. For account 002, the interval [15,50] is used as the first output parameter (key), and 1 is used as the second output parameter (value); for account 001, the interval [50,100] is used as the first output parameter (key), and 1 is used as the second output parameter (value). The second output parameter (value). At the Reduce end of the second task, the value in each interval is accumulated and output, and the result is the account distribution number in each interval.
上述实施例提供的一种账户余额数据的并行处理方法和系统,基于MapReduce将大规模的帐户余额明细数据分成若干份交给Map节点并行处理,Map阶段对数据按照帐户进行了分类,处理完成后根据帐户id分组并路由给多个Reduce节点并行处理,从而快速获取大数据量下的帐户日均余额的统计结果,处理效率高、可扩展性强。The method and system for parallel processing of account balance data provided by the above embodiments, based on MapReduce, divides large-scale account balance detailed data into several parts and hands them to Map nodes for parallel processing. In the Map stage, the data is classified according to accounts. After the processing is completed According to the account id, it is grouped and routed to multiple Reduce nodes for parallel processing, so as to quickly obtain the statistical results of the average daily balance of the account under a large amount of data, with high processing efficiency and strong scalability.
本领域普通技术人员可以理解上述方法中的全部或部分步骤可通过程序来指令相关硬件完成,所述程序可以存储于计算机可读存储介质中,如只读存储器、磁盘或光盘等。可选地,上述实施例的全部或部分步骤也可以使用一个或多个集成电路来实现,相应地,上述实施例中的各模块/单元可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。本发明不限制于任何特定形式的硬件和软件的结合。Those skilled in the art can understand that all or part of the steps in the above method can be completed by instructing relevant hardware through a program, and the program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disk, and the like. Optionally, all or part of the steps in the above embodiments can also be implemented using one or more integrated circuits. Correspondingly, each module/unit in the above embodiments can be implemented in the form of hardware, or can be implemented in the form of software function modules. The form is realized. The present invention is not limited to any specific combination of hardware and software.
需要说明的是,本发明还可有其他多种实施例,在不背离本发明精神及其实质的情况下,熟悉本领域的技术人员可根据本发明作出各种相应的改变和变形,但这些相应的改变和变形都应属于本发明所附的权利要求的保护范围。It should be noted that the present invention can also have other various embodiments, without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and deformations according to the present invention, but these Corresponding changes and deformations should belong to the scope of protection of the appended claims of the present invention.
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410306448.6A CN104050291B (en) | 2014-06-30 | 2014-06-30 | A kind of method for parallel processing and system of account balance data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410306448.6A CN104050291B (en) | 2014-06-30 | 2014-06-30 | A kind of method for parallel processing and system of account balance data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104050291A true CN104050291A (en) | 2014-09-17 |
CN104050291B CN104050291B (en) | 2017-11-10 |
Family
ID=51503123
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410306448.6A Active CN104050291B (en) | 2014-06-30 | 2014-06-30 | A kind of method for parallel processing and system of account balance data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104050291B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740063A (en) * | 2014-12-08 | 2016-07-06 | 杭州华为数字技术有限公司 | Data processing method and apparatus |
CN106022901A (en) * | 2015-03-17 | 2016-10-12 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN107357679A (en) * | 2016-05-10 | 2017-11-17 | 银联数据服务有限公司 | A kind of backup method and device |
CN110659265A (en) * | 2019-09-27 | 2020-01-07 | 广州峻林互联科技有限公司 | Distributed parallel database resource management method and system |
CN111680080A (en) * | 2020-04-16 | 2020-09-18 | 中邮消费金融有限公司 | Data processing method and data processing system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100162230A1 (en) * | 2008-12-24 | 2010-06-24 | Yahoo! Inc. | Distributed computing system for large-scale data handling |
CN101799808A (en) * | 2009-02-10 | 2010-08-11 | 中国移动通信集团公司 | Data processing method and system thereof |
CN102467570A (en) * | 2010-11-17 | 2012-05-23 | 日电(中国)有限公司 | Connection query system and method for distributed data warehouse |
-
2014
- 2014-06-30 CN CN201410306448.6A patent/CN104050291B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100162230A1 (en) * | 2008-12-24 | 2010-06-24 | Yahoo! Inc. | Distributed computing system for large-scale data handling |
CN101799808A (en) * | 2009-02-10 | 2010-08-11 | 中国移动通信集团公司 | Data processing method and system thereof |
CN102467570A (en) * | 2010-11-17 | 2012-05-23 | 日电(中国)有限公司 | Connection query system and method for distributed data warehouse |
Non-Patent Citations (1)
Title |
---|
阿都建华: "基于OGSA的网格记帐系统的研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740063A (en) * | 2014-12-08 | 2016-07-06 | 杭州华为数字技术有限公司 | Data processing method and apparatus |
CN106022901A (en) * | 2015-03-17 | 2016-10-12 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN107357679A (en) * | 2016-05-10 | 2017-11-17 | 银联数据服务有限公司 | A kind of backup method and device |
CN110659265A (en) * | 2019-09-27 | 2020-01-07 | 广州峻林互联科技有限公司 | Distributed parallel database resource management method and system |
CN110659265B (en) * | 2019-09-27 | 2020-11-24 | 广州峻林互联科技有限公司 | Distributed parallel database resource management method |
CN111680080A (en) * | 2020-04-16 | 2020-09-18 | 中邮消费金融有限公司 | Data processing method and data processing system |
Also Published As
Publication number | Publication date |
---|---|
CN104050291B (en) | 2017-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3117347B1 (en) | Systems and methods for rapid data analysis | |
CN103593436B (en) | file merging method and device | |
WO2017096892A1 (en) | Index construction method, search method, and corresponding device, apparatus, and computer storage medium | |
CN104050291B (en) | A kind of method for parallel processing and system of account balance data | |
CN107220261B (en) | Real-time mining method and device based on distributed data | |
CN106649828B (en) | A data query method and system | |
CN107832407A (en) | For generating the information processing method, device and readable storage medium storing program for executing of knowledge mapping | |
CN103064933A (en) | Data query method and system | |
CN107247811A (en) | SQL statement performance optimization method and device based on oracle database | |
CN105426375A (en) | Relationship network calculation method and apparatus | |
CN112699142A (en) | Cold and hot data processing method and device, electronic equipment and storage medium | |
CN110209732A (en) | Method of data synchronization and device of the relevant database to Hadoop database | |
CN107291770A (en) | The querying method and device of mass data in a kind of distributed system | |
CN106649344B (en) | Weblog compression method and device | |
CN113934767B (en) | A data processing method and device, computer equipment and storage medium | |
CN106648839A (en) | Method and device for processing data | |
CN118760714B (en) | Big data real-time statistical analysis method and system based on distributed computing architecture | |
CN107871055A (en) | A data analysis method and device | |
CN109389271B (en) | Application performance management method and system | |
WO2019153546A1 (en) | Ten-thousand-level dimension data generation method, apparatus and device, and storage medium | |
CN109241388A (en) | A kind of application programming interfaces behavior analysis method and system | |
CN103440351A (en) | Parallel computing method and device of association rule data mining algorithm | |
CN104021169B (en) | A kind of Hive Connection inquiring methods based on the algorithms of SDD 1 | |
CN107422991A (en) | A kind of storage strategy management system | |
CN114116908A (en) | Data management method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20180817 Address after: 200436 Room 411, No. three, JIANGCHANG Road, Jingan District, Shanghai, 411 Patentee after: Shanghai wave Cloud Computing Service Co., Ltd. Address before: 100085 floor 1, C 2-1, No. 2, Shang Di Road, Haidian District, Beijing. Patentee before: Electronic information industry Co.,Ltd of the tide (Beijing) |