CN103744878B - Large-scale Bayesian network parallel inference method based on MapReduce - Google Patents
Large-scale Bayesian network parallel inference method based on MapReduce
- Publication number
- CN103744878B CN201310709499.9A CN103744878A
- Authority
- CN
- China
- Prior art keywords
- value
- bayesian network
- node
- probability
- probability distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Computer And Data Communications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a large-scale Bayesian network parallel inference method based on MapReduce. The method addresses the low inference efficiency and heavy computational load caused by the large number of nodes in a Bayesian network, or by the many conditional probability parameters of each node. Taking the removal of this efficiency bottleneck as the main objective, the large-scale Bayesian network is stored in the distributed database HBase, a correspondence is established between HBase query processing and the Bayesian inference task, and parallel inference over the Bayesian network is realized based on MapReduce. The method fits the characteristics of practical problems in fields such as data analysis, medical diagnosis, industrial control and economic forecasting, removes the limitation on the number of nodes of a Bayesian network, and provides a supporting technology for the representation, inference and application of uncertain knowledge.
Description
Technical field
The invention discloses a large-scale Bayesian network parallel inference method based on MapReduce. It relates to storing a large-scale Bayesian network in the distributed database HBase, converting probabilistic inference over the Bayesian network into data query processing on HBase, and realizing the probabilistic inference of the Bayesian network based on MapReduce. The invention belongs to the fields of artificial intelligence and information processing.
Background technology
With the increasing variety of data acquisition means and data forms and the rapid growth of data scale, the representation, understanding and application of the knowledge implied in data have attracted growing attention. A Bayesian network expresses, as a graphical model, uncertain knowledge that combines probability distributions with causal relations; it represents the dependence relations among random variables in both a qualitative and a quantitative way, and has become the basic framework for representing and reasoning about uncertain knowledge. Bayesian networks are widely used in fields such as data analysis, medical diagnosis, industrial control and economic forecasting, for example to describe the mutual influence between users in a social network, or the interactions between gene fragments.
Because both the construction of a Bayesian network and inference over it have exponential time complexity, the upper limit on the number of nodes in a traditional Bayesian network is generally a few dozen. However, with the growth of all kinds of data and the emergence of new applications, the scale of the Bayesian networks used to capture the uncertain knowledge contained in such data keeps increasing; these large-scale Bayesian networks are characterized by many nodes and many conditional probability parameters per node. In recent years, the construction of large-scale Bayesian networks and their practical application have received growing attention; for example, Lu Yang (Master's thesis, Shanghai Jiao Tong University, 2013) proposed a large-scale Bayesian network construction method based on block-wise learning and merging for gene data analysis. For the probabilistic inference problem of large-scale Bayesian networks, known methods improve inference efficiency by exploiting special structures in the Bayesian network to accelerate probabilistic inference, or propose techniques such as parallel exact or approximate inference. Hu Chunling et al. (Pattern Recognition and Artificial Intelligence, 2011, 24(6): 846-855) improved the junction-tree-based exact inference algorithm; another study (Journal of Yunnan University, 2010, 32(4): 392-395) parallelized the variable-elimination exact inference algorithm; Yang Feng (Master's thesis, Hefei University of Technology, 2008) performed approximate inference based on sampling techniques; and Sun Yongmei et al. (patent 2011110319410, 2012) introduced ontologies and user feedback to improve inference speed. These methods improve the efficiency of Bayesian network inference to some extent, but for ever-growing Bayesian network scales and complex practical applications, a scalable and general inference method that is insensitive to the scale of the Bayesian network is still needed.
A large-scale Bayesian network describing uncertain knowledge is itself a fairly large data source, and MapReduce is a programming model for efficiently processing massive, distributed data. For knowledge discovery on massive data and related problems, known methods take MapReduce as the basis for designing and implementing parallel algorithms, overcoming the inability of traditional centralized algorithms to process massive data in parallel, so that many algorithms of high computational complexity can still meet the demands of massive data mining and analysis. Zhou Jiashuai et al. (patent 201210157463.x, 2012) proposed a MapReduce-based distance join query method over large graphs; Li Lixian et al. (Computer Systems & Applications, 2013, 22(2): 108-111) proposed a parallelization of the naive Bayes classification algorithm under the MapReduce framework; and Wang Yuan (Master's thesis, Yunnan University, 2013) proposed a MapReduce-based Bayesian network learning method. However, none of these MapReduce-based methods addresses inference over large-scale Bayesian networks.
Aiming at the problem of efficient inference over large-scale Bayesian networks, the present invention uses the distributed database HBase running on the Hadoop distributed file system HDFS, regards a large-scale Bayesian network as large-scale data, proposes a method for storing the large-scale Bayesian network in HBase, establishes the correspondence between the Bayesian network inference task and distributed database query processing, and gives a method for realizing parallel Bayesian network inference through distributed database query processing based on MapReduce. This provides a new, scalable method using open-source tools for the efficient inference of large-scale Bayesian networks, and provides a new technical foundation for the representation of uncertain knowledge and for applications such as correlation analysis, prediction and decision making.
Content of the invention
It is an object of the invention to provide a large-scale Bayesian network parallel inference method based on MapReduce. To overcome the low inference efficiency and heavy computational load caused by the large number of nodes in a large-scale Bayesian network, or by the many conditional probability parameters of each node, the method takes the removal of this efficiency bottleneck as its main target: it stores the large-scale Bayesian network in the distributed database HBase, establishes the correspondence between HBase query processing and the Bayesian network inference task, and realizes parallel inference over the Bayesian network based on MapReduce. The given method fits the characteristics of practical problems in fields such as data analysis, medical diagnosis, industrial control and economic forecasting, removes the limitation on the number of nodes of a Bayesian network, and provides a supporting technology for the representation, inference and application of the uncertain knowledge contained therein.
The present invention is accomplished according to the following steps.
The processing flow of the present invention is: first, a table for storing the large-scale Bayesian network is created in the HBase database, and based on MapReduce the directed acyclic graph structure of the Bayesian network and the conditional probability table of each node are stored into the HBase table in parallel in <key, value> form; then, the inference task over the Bayesian network is decomposed, the corresponding probability parameters are queried from HBase in parallel based on MapReduce, and the marginal probability distributions involved in the inference task are obtained by multiplication and addition of these probability parameters, from which the probabilistic inference result is derived.
Distributed storage of the large-scale Bayesian network
A Bayesian network is a directed acyclic graph, expressed as g=(v, e), where v={a1, …, an} is the set of nodes, n is the number of nodes in g, and e is the set of directed edges. Each node ai (1≤i≤n) has a conditional probability table, abbreviated CPT, which describes the influence of ai's parent node set pa(ai) on ai and contains the conditional probability p(ai|pa(ai)) for each value of ai, where pa(ai) also denotes a concrete value assignment of the parent set. To efficiently carry out probabilistic inference over a large-scale Bayesian network, the Bayesian network is first stored on disk, i.e., the following two kinds of information are preserved: the parent-child relations between the nodes of the Bayesian network, and the conditional probability table of each node.
The distributed database HBase runs on the Hadoop distributed file system HDFS, and HBase can be operated through the Hadoop master node (namenode). Storing the Bayesian network simply means storing the above two kinds of information, in HBase format, distributed across the data nodes (datanodes) of the Hadoop platform.
For each node ai in g, ai, pa(ai) and ai's conditional probability table are stored as a row of an HBase table t_bn in <key, value> form, where: ai is the row identifier; the key is expressed in the form "column family name = column family member", with ai pa(ai) as the column family name and ai pa(ai) as the column family member; and the value is p(ai|pa(ai)). For each node ai, based on MapReduce, a map function reads each p(ai|pa(ai)) value in ai's conditional probability table in parallel and stores it in t_bn as a row whose logical form is (ai || ai pa(ai) = ai pa(ai) || p(ai|pa(ai))), where "||" is the logical separator between the row identifier, the key and the value. HBase can then be accessed through the Hadoop master node, thereby supporting inference over the Bayesian network.
Parallel inference over the Bayesian network
Probability computations, of which posterior probability computation is the typical representative, are the main class of Bayesian network inference tasks; their essence is to look up the conditional probability table of each node and to use the conditional independences in the Bayesian network to simplify the computation of joint probability distributions. Posterior probability computation over a Bayesian network g means computing the probability of the query nodes taking given values given the values of the evidence nodes, expressed as p(q=q|e=e), where e and q are the evidence nodes and query nodes respectively, e⊆v, q⊆v, e∩q=∅, and e and q (in lower case) are the values of e and q respectively.
First, the probabilistic inference task is decomposed.
According to p(q=q|e=e) = p(q=q, e=e) / p(e=e), the inference task is converted into the computation of the two marginal probability distributions p(q=q, e=e) and p(e=e). p(q=q, e=e) is the sum, over all possible value combinations of the hidden nodes not in q or e (i.e., v-q-e), of the joint probability distributions for q and e; according to the conditional independences in g, each joint probability distribution is converted into a product of a series of conditional probabilities, which are obtained by querying HBase. p(e=e) is the sum, over all possible value combinations of the hidden nodes not in e (i.e., v-e), of the joint probability distributions for e.
The combinations of all possible values of q and e, together with those of the nodes in v-q-e, are stored in HDFS in file form, denoted t_jdp, one combination per line.
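A minimal sketch of building the t_jdp lines (assuming discrete nodes whose value sets are listed explicitly; the function and variable names are illustrative, not from the patent):

```python
from itertools import product

def enumerate_combinations(fixed, hidden_domains):
    """Enumerate the value combinations stored in t_jdp.

    fixed          -- the assigned evidence/query values, e.g. ["a4", "j1", "f1"]
    hidden_domains -- value sets of the hidden nodes, e.g. [["g1","g2"], ["s1","s2"]]
    Each line fixes the evidence/query values and ranges over all
    value combinations of the hidden nodes.
    """
    return [tuple(fixed) + combo for combo in product(*hidden_domains)]

# For p(f=f1, a=a4, j=j1) in the later example: g and s are hidden,
# giving 2 * 2 = 4 lines of t_jdp.
lines = enumerate_combinations(["a4", "j1", "f1"],
                               [["g1", "g2"], ["s1", "s2"]])
```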
Then, HBase is queried based on MapReduce, and the related joint probability distributions are computed.
A map function queries every row r of t_bn in parallel and considers all the lines of t_jdp in turn; the results are written to an HDFS file f_jdp in <key, value> form, where the key is a line of t_jdp that contains the column family member of r, and the value is the value of r (i.e., one of the conditional probability values in g). A reduce function then groups the <key, value> pairs in f_jdp by key and multiplies all the values with the same key within each group, thereby obtaining, for q and e, the joint probability distribution under each possible value combination of the nodes in v-q-e.
Finally, the marginal probability distributions are computed and the posterior probability distribution is obtained.
According to the combinations of evidence node values, query node values and hidden node values involved in p(q=q, e=e) and p(e=e), the results produced by the reduce function are summed, yielding the marginal probability distributions p(q=q, e=e) and p(e=e); the required posterior probability distribution p(q=q|e=e) = p(q=q, e=e) / p(e=e) is then obtained, completing the probabilistic inference task.
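As a further illustration (names hypothetical, not the patent's code), the marginals are sums of the reduced joint probabilities and the posterior is their ratio; since the evidence values are fixed in every line of t_jdp, p(e=e) is simply the sum over all combinations:

```python
def posterior_prob(joint, query_value):
    """p(q=q, e=e) sums the joints whose combination contains the
    query value; p(e=e) sums all joints (evidence is fixed in each)."""
    p_qe = sum(p for combo, p in joint.items() if query_value in combo)
    p_e = sum(joint.values())
    return p_qe / p_e

# Reduced joints from the tiny two-node example above would give
# p(f=f1 | j=j1) = 0.05 / (0.05 + 0.18).
p = posterior_prob({("j1", "f1"): 0.05, ("j1", "f2"): 0.18}, "f1")
```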
In steps (1)–(2) above, the large-scale Bayesian network is regarded as large-scale data, and the probabilistic inference task over the large-scale Bayesian network is converted into query processing over large-scale data, effectively exploiting the technical advantages of distributed databases for massive data storage and random access, so that probabilistic inference over the Bayesian network is carried out efficiently in parallel.
Compared with known techniques, the present invention has the following advantages and beneficial effects:
(1) It overcomes the efficiency bottleneck of known Bayesian network inference methods such as exact inference, parallel inference and approximate inference; efficient inference over a large-scale Bayesian network is achieved without limiting the scale of the network, and the proposed method has good scalability.
(2) It regards the large-scale Bayesian network as large-scale data and converts the probabilistic inference task over the network into query processing over that data, which is a new technique for solving the inference problem of large-scale Bayesian networks.
(3) It extends the known centralized Bayesian network inference paradigm to the distributed setting, using large-scale data storage and query processing techniques, open-source platforms and systems including HDFS and HBase, and the widely used and accepted MapReduce programming model, to efficiently complete the parallel inference of a large-scale Bayesian network; the method is easy to implement and has good extensibility.
(4) The storage and parallel inference method for large-scale Bayesian networks can more faithfully describe the uncertain knowledge in fields such as social media data analysis, medical diagnosis, information services and bioinformatics, meeting their underlying characteristics of many variables and complex relations; for practical applications it is more general and versatile than known techniques.
In short, an efficient inference method that is not limited by the scale of the Bayesian network is established, meeting the intrinsic needs of uncertain knowledge discovery and related applications in fields such as data analysis, medical diagnosis, industrial control and economic forecasting. This efficient inference method for large-scale Bayesian networks provides strong key-technology support for a range of practical applications that use Bayesian networks as their knowledge representation and inference framework.
Brief description of the drawings
Fig. 1 is the technical roadmap of the present invention, comprising three parts: distributed storage of the large-scale Bayesian network, decomposition of the probabilistic inference task, and parallel inference.
Fig. 2 is the directed acyclic graph structure of the "credit card fraud detection" Bayesian network. The nodes f, g, j, a and s represent, respectively: whether a card transaction is fraudulent, whether liquefied gas was purchased, whether jewelry was purchased, the cardholder's age, and the cardholder's sex.
Specific embodiment
Embodiment: inference over the "credit card fraud detection" Bayesian network
(1) Distributed storage of the Bayesian network
To store the nodes f, g, j, a and s of the "credit card fraud detection" Bayesian network in t_bn, a map function reads each value in the conditional probability table of each node in parallel and stores it in t_bn in <key, value> form. For f, given p(f=f1)=0.1 and p(f=f2)=0.9, the rows stored in the HBase table t_bn take f as the row identifier, with column families (f=f1 || 0.1) and (f=f2 || 0.9). The stored "credit card fraud detection" Bayesian network t_bn is shown in Table 1.
(2) Decomposing the probabilistic inference task
Given the evidence node values (a=a4, j=j1) and the query node value f=f1, the inference task is to compute p(f=f1|a=a4, j=j1). Since p(f=f1|a=a4, j=j1) = p(f=f1, a=a4, j=j1) / p(a=a4, j=j1), the inference task is converted into the computation of the two marginal probability distributions p(f=f1, a=a4, j=j1) and p(a=a4, j=j1). For p(f=f1, a=a4, j=j1), g and s are hidden nodes, and this marginal is the sum of the joint probability distributions for f=f1, a=a4, j=j1 under the different value combinations of g and s; for p(a=a4, j=j1), f, g and s are hidden nodes, and this marginal is the sum of the joint probability distributions for a=a4, j=j1 under the different value combinations of f, g and s. These combinations of possible values are stored in the file t_jdp.
Table 1. The "credit card fraud detection" Bayesian network as stored in t_bn
Table 2. The file t_jdp storing the combinations of possible values
| a4 j1 f1 g1 s1 |
| a4 j1 f1 g1 s2 |
| a4 j1 f1 g2 s1 |
| a4 j1 f1 g2 s2 |
| a4 j1 f2 g1 s1 |
| a4 j1 f2 g1 s2 |
| a4 j1 f2 g2 s1 |
| a4 j1 f2 g2 s2 |
(3) Querying HBase based on MapReduce and computing the related joint probability distributions
A map function queries every row of t_bn in parallel and compares it with all lines of t_jdp in turn. For the first row of t_bn, the row identifier f and the column family member f1 are taken out; the lines of t_jdp containing f1 are a4j1f1g1s1, a4j1f1g1s2, a4j1f1g2s1 and a4j1f1g2s2, so these are used as keys, the probability value 0.1 of the current row of t_bn is used as the value, and <a4j1f1g1s1, 0.1>, <a4j1f1g1s2, 0.1>, <a4j1f1g2s1, 0.1> and <a4j1f1g2s2, 0.1> are stored in f_jdp in <key, value> form. In the same way, the corresponding <key, value> pairs for the other rows of Table 1 are stored in f_jdp.
The reduce function multiplies the values in f_jdp that share the same key, obtaining
p(a4j1f1g1s1)=0.0009, p(a4j1f1g1s2)=0.0013, p(a4j1f1g2s1)=0.0036,
p(a4j1f1g2s2)=0.0052, p(a4j1f2g1s1)=0.00036, p(a4j1f2g1s2)=0.00036,
p(a4j1f2g2s1)=0.03564, p(a4j1f2g2s2)=0.06237.
(4) Computing the marginal probability distributions and obtaining the posterior probability distribution
According to the combinations of evidence node values, query node values and hidden node values involved in p(f=f1, a=a4, j=j1) and p(a=a4, j=j1), the reduce results above are summed, giving
p(f=f1, a=a4, j=j1) = 0.0009 + 0.0013 + 0.0036 + 0.0052 = 0.011 and
p(a=a4, j=j1) = 0.011 + 0.00036 + 0.00036 + 0.03564 + 0.06237 = 0.10973.
The required posterior probability distribution is then
p(f=f1|a=a4, j=j1) = 0.011 / 0.10973 ≈ 0.1002,
thus completing the probabilistic inference task.
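The sums in the embodiment can be checked numerically (an illustrative verification script, not part of the patent):

```python
# Joint probabilities produced by the reduce step, keyed by combination.
joint = {
    "a4j1f1g1s1": 0.0009, "a4j1f1g1s2": 0.0013,
    "a4j1f1g2s1": 0.0036, "a4j1f1g2s2": 0.0052,
    "a4j1f2g1s1": 0.00036, "a4j1f2g1s2": 0.00036,
    "a4j1f2g2s1": 0.03564, "a4j1f2g2s2": 0.06237,
}

# p(f=f1, a=a4, j=j1): sum over the combinations containing f1.
p_f1_a4_j1 = sum(p for k, p in joint.items() if "f1" in k)
# p(a=a4, j=j1): sum over all combinations.
p_a4_j1 = sum(joint.values())
posterior = p_f1_a4_j1 / p_a4_j1  # ≈ 0.1002
```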
Claims (2)
1. a kind of large-scale Bayesian network parallel inference method based on mapreduce it is characterised in that: complete according to the following steps
Become:
(1) distributed storage of large-scale Bayesian network
One Bayesian network is a directed acyclic graph, is expressed as g=(v, e), wherein: v={ a1,…,anFor node collection
Close, n is the number of g interior joint;E is the set of directed edge;Each node ai(1≤i≤n) has a conditional probability parameter list,
It is abbreviated as cpt, describe aiFather set of node pa (ai) to aiImpact, comprise aiConditional probability p (a of different valuesi|pa
(ai)), pa (ai) it is pa (ai) value, in order to be able to efficiently carry out the probability inference of large-scale Bayesian network, first by Bayes
Net storage on disk, that is, preserves two categories below information: the filiation between Bayesian network interior joint, and the condition of each node
Probability parameter table;
For distributed data base hbase operating on hadoop distributed file system hdfs, hadoop master can be passed through
Operating hbase, the storage of Bayesian network is it is simply that divide above-mentioned two category informations according to the form of hbase for node (namenode)
Store to cloth on each back end (datanode) in hadoop platform;
It is respectively directed to each node a in gi, by ai、pa(ai) and aiConditional probability parameter list with<key, value>form, work
Store the table t of hbase for a linebnIn, wherein: aiFor line identifier;Key is expressed as " row Praenomen=row race member " form,
aipa(ai) it is row Praenomen, aipa(ai) for arranging race member;Value is p (ai|pa(ai)), for each node ai, it is based on
Mapreduce, concurrently reads a using map functioniEach of conditional probability parameter list p (ai|pa(ai)) value, and deposit
Store up as tbnMiddle logical form is (ai||aipa(ai)=aipa(ai)||p(ai|pa(ai))) a line, " | | " is line identifier, key
With the logical separator of value, thus, can access hbase by hadoop host node, and then support pushing away of Bayesian network
Reason;
(2) parallel inference of Bayesian network
Probability calculation with posteriority probability calculation as Typical Representative, is several generic tasks of Bayesian network reasoning, and its essence is to look for
The conditional probability parameter list of each node simultaneously simplifies the calculating of joint probability distribution using the conditional independence in Bayesian network,
Posterior probability on Bayesian network g calculates it is simply that calculating the probability of query node value under given evidence node value situation, represents
For p (q=q | e=e), wherein: e and q is respectively evidence node and query node, e ∈ v, q ∈ v, e ∩ q=φ;E and q is respectively
For the value of e and q,
First, decompose probability inference task,
According to, reasoning task is converted to p (q=q, e=e) and p (e
=e) this two marginal probability distributions calculating, not those hidden node in q and e are designated as v-q-e, p (q=q, e=e) is
For the joint probability distribution sum under the be possible to valued combinations situation of q and e and v-q-e and only according to the condition in g
Each joint probability distribution is converted to the product of series of conditional distribution by vertical property, obtains condition by inquiry hbase general
Rate is distributed;P (e=e) is for e and the not joint under the be possible to valued combinations situation of those hidden node in e
Probability distribution sum, the combination of q and e and the be possible to value of v-q-e interior joint stores hdfs with document form, note
For tjdp, each is combined as a line;
Then, hbase is inquired about based on mapreduce, and calculates related joint probability distribution,
Concurrently inquire about t using map functionbnIn every a line r, successively consider tjdpIn all row, result is with < key, value
> form write hdfs file fjdpIn, wherein: key is tjdpIn comprise r row race member row, value is r
Value, i.e. a certain conditional probability value in g;Using reduce function pair file fjdpIn<key, value>divide to by key
Group, and each group is had all value multiplications of identical key, thus obtaining for q and e and each possibility of v-q-e interior joint
Joint probability distribution under valued combinations situation,
Then, calculate marginal probability distribution, obtain Posterior probability distribution,
Combination according to p (q=q, e=e) and evidence node value, query node value and hidden node value involved by p (e=e), will
The result that reduce function obtains adds up, thus obtaining marginal probability distribution p (q=q, e=e) and p (e=e), finally gives
Required Posterior probability distribution, completes probability inference task.
2. The large-scale Bayesian network parallel inference method based on MapReduce according to claim 1, characterized in that: the method refers to inference over the "credit card fraud detection" Bayesian network, accomplished according to the following steps:
(1) Distributed storage of the Bayesian network
to store the nodes f, g, j, a and s of the "credit card fraud detection" Bayesian network in t_bn, a map function reads each value in the conditional probability table of each node in parallel and stores it in t_bn in <key, value> form; for f, given p(f=f1)=0.1 and p(f=f2)=0.9, the rows stored in the HBase table t_bn take f as the row identifier, with column families (f=f1 || 0.1) and (f=f2 || 0.9); the stored "credit card fraud detection" Bayesian network t_bn is shown in Table 1;
(2) Decomposing the probabilistic inference task
given the evidence node values (a=a4, j=j1) and the query node value f=f1, the inference task is to compute p(f=f1|a=a4, j=j1); since p(f=f1|a=a4, j=j1) = p(f=f1, a=a4, j=j1) / p(a=a4, j=j1), the inference task is converted into the computation of the two marginal probability distributions p(f=f1, a=a4, j=j1) and p(a=a4, j=j1); for p(f=f1, a=a4, j=j1), g and s are hidden nodes, and this marginal is the sum of the joint probability distributions for f=f1, a=a4, j=j1 under the different value combinations of g and s; for p(a=a4, j=j1), f, g and s are hidden nodes, and this marginal is the sum of the joint probability distributions for a=a4, j=j1 under the different value combinations of f, g and s; these combinations of possible values are stored in the file t_jdp;
(3) Querying HBase based on MapReduce and computing the related joint probability distributions
a map function queries every row of t_bn in parallel and compares it with all lines of t_jdp in turn; for the first row of t_bn, the row identifier f and the column family member f1 are taken out; the lines of t_jdp containing f1 are a4j1f1g1s1, a4j1f1g1s2, a4j1f1g2s1 and a4j1f1g2s2, so these are used as keys, the probability value 0.1 of the current row of t_bn is used as the value, and <a4j1f1g1s1, 0.1>, <a4j1f1g1s2, 0.1>, <a4j1f1g2s1, 0.1> and <a4j1f1g2s2, 0.1> are stored in f_jdp in <key, value> form; in the same way, the corresponding <key, value> pairs for the other rows of Table 1 are stored in f_jdp;
the reduce function multiplies the values in f_jdp that share the same key, obtaining
p(a4j1f1g1s1)=0.0009, p(a4j1f1g1s2)=0.0013, p(a4j1f1g2s1)=0.0036,
p(a4j1f1g2s2)=0.0052, p(a4j1f2g1s1)=0.00036, p(a4j1f2g1s2)=0.00036,
p(a4j1f2g2s1)=0.03564, p(a4j1f2g2s2)=0.06237;
(4) Computing the marginal probability distributions and obtaining the posterior probability distribution
according to the combinations of evidence node values, query node values and hidden node values involved in p(f=f1, a=a4, j=j1) and p(a=a4, j=j1), the reduce results above are summed, giving p(f=f1, a=a4, j=j1) = 0.011 and p(a=a4, j=j1) = 0.10973; the required posterior probability distribution p(f=f1|a=a4, j=j1) = 0.011 / 0.10973 ≈ 0.1002 is finally obtained, thus completing the probabilistic inference task.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310709499.9A CN103744878B (en) | 2013-12-21 | 2013-12-21 | Large-scale Bayesian network parallel inference method based on MapReduce |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310709499.9A CN103744878B (en) | 2013-12-21 | 2013-12-21 | Large-scale Bayesian network parallel inference method based on MapReduce |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN103744878A CN103744878A (en) | 2014-04-23 |
| CN103744878B true CN103744878B (en) | 2017-02-01 |
Family
ID=50501896
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201310709499.9A Expired - Fee Related CN103744878B (en) | 2013-12-21 | 2013-12-21 | Large-scale Bayesian network parallel inference method based on MapReduce |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN103744878B (en) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106294323B (en) * | 2016-08-10 | 2020-03-06 | 上海交通大学 | Methods for commonsense causal inference on short texts |
| CN106446145A (en) * | 2016-09-21 | 2017-02-22 | 郑州云海信息技术有限公司 | Quick creation method based on Hadoop for big data index |
| CN108134680B (en) * | 2016-11-30 | 2019-11-29 | 中国科学院沈阳自动化研究所 | A kind of systematic survey node optimization configuration method based on Bayesian network |
| CN108154380A (en) * | 2017-04-28 | 2018-06-12 | 华侨大学 | The method for carrying out the online real-time recommendation of commodity to user based on extensive score data |
| CN111681044A (en) * | 2020-05-28 | 2020-09-18 | 中国工商银行股份有限公司 | Method and device for processing point exchange cheating behaviors |
| WO2024055191A1 (en) * | 2022-09-14 | 2024-03-21 | Huawei Technologies Co., Ltd. | Methods, system, and apparatus for inference using probability information |
| CN120045594B (en) * | 2025-04-25 | 2025-07-04 | 中国人民解放军国防科技大学 | A distance range connection query visualization method, device, equipment and medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6076083A (en) * | 1995-08-20 | 2000-06-13 | Baker; Michelle | Diagnostic system utilizing a Bayesian network model having link weights updated experimentally |
| CN102194145A (en) * | 2011-06-15 | 2011-09-21 | 天津大学 | Bayesian network method for autonomously fusing prior knowledge |
| CN102724199A (en) * | 2012-06-26 | 2012-10-10 | 北京航空航天大学 | Attack intention recognition method based on Bayesian network inference |
| CN103455842A (en) * | 2013-09-04 | 2013-12-18 | 福州大学 | Credibility measuring method combining Bayesian algorithm and MapReduce |
Application Events
- 2013-12-21: Application filed, CN201310709499.9A, granted as patent CN103744878B/en; status: not_active, Expired - Fee Related
Non-Patent Citations (1)
| Title |
|---|
| Design and Implementation of a Parallel Bayesian Classification Algorithm Based on MapReduce; Ding Guanghua et al.; Microcomputer Information (《微计算机信息》); 2010-03-31; Vol. 26, No. 3-3; pp. 176, 190-191 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN103744878A (en) | 2014-04-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN103744878B (en) | Large-scale Bayesian network parallel inference method based on MapReduce | |
| Kim et al. | Hats: A hierarchical graph attention network for stock movement prediction | |
| Song et al. | Flexible job-shop scheduling via graph neural network and deep reinforcement learning | |
| Rendon-Sanchez et al. | Structural combination of seasonal exponential smoothing forecasts applied to load forecasting | |
| Yang et al. | Bayesian deep learning-based probabilistic load forecasting in smart grids | |
| Marinakis et al. | Particle swarm optimization for the vehicle routing problem: a survey and a comparative analysis | |
| Li | Credibilistic programming | |
| Alkhateeb et al. | A survey for recent applications and variants of nature-inspired immune search algorithm | |
| Pareek et al. | A review report on knowledge discovery in databases and various techniques of data mining | |
| Guan et al. | Ridesharing in urban areas: multi-objective optimisation approach for ride-matching and routeing with commuters’ dynamic mode choice | |
| Abdou et al. | Tourism demand modelling and forecasting: A Review Literature | |
| Jaddi et al. | Multi-population kidney-inspired algorithm with migration policy selections for feature selection problems | |
| Skulimowski | A foresight support system to manage knowledge on information society evolution | |
| Changdar et al. | A modified ant colony optimisation based approach to solve sub-tour constant travelling salesman problem | |
| Basha et al. | Recent Trends in Sustainable Big Data Predictive Analytics: Past Contributions and Future Roadmap | |
| Islam et al. | A framework for effective big data analytics for decision support systems | |
| Sikarwar et al. | A review on social network analysis methods and algorithms | |
| Chen et al. | Incremental community detection on large complex attributed network | |
| Sindhura et al. | Human resource management based economic analysis using data mining | |
| Anderer et al. | Forecasting reconciliation with a top-down alignment of independent level forecasts | |
| Sakawa et al. | Interactive fuzzy multiobjective stochastic programming with simple recourse | |
| Jiang et al. | A Data-Driven Evolutionary Algorithm for Dynamic Vehicle Routing Problems With Time Windows Under Limited Computational Time | |
| CN114511205A (en) | AI big data classification decision optimization system for situation interference recovery of smart city | |
| Kumar et al. | An analysis and literature review of algorithms for frequent itemset mining | |
| Shi-Ting | Management of tourism resources and demand based on neural networks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| 2019-11-04 | TR01 | Transfer of patent right | Patentee before: Yunnan University (College of Information, Yunnan University, No. 2 Cuihu North Road, Kunming City, Yunnan Province, 650091). Patentee after: Yunnan yunshanghui Network Technology Co., Ltd. (Room 302, Tianyu Jiayuan, intersection of Wanhua Road and Huancheng North Road, Panlong District, Kunming City, Yunnan Province). |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2017-02-01 |