CN103744878B - Large-scale Bayesian network parallel inference method based on MapReduce - Google Patents
Large-scale Bayesian network parallel inference method based on MapReduce
- Publication number
- CN103744878B CN201310709499.9A CN103744878A
- Authority
- CN
- China
- Prior art keywords
- value
- bayesian network
- node
- probability
- probability distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Computer And Data Communications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a large-scale Bayesian network parallel inference method based on MapReduce. The method addresses the low inference efficiency and heavy computational load caused by the large number of nodes in a Bayesian network, or by the many conditional probability parameters of each node. Taking the removal of this efficiency bottleneck as the main objective, the large-scale Bayesian network is stored in the distributed database HBase, a correspondence is established between HBase query processing and the Bayesian inference task, and parallel inference over the Bayesian network is realized based on MapReduce. The method fits the characteristics of practical problems in fields such as data analysis, medical diagnosis, industrial control and economic forecasting, removes the limitation on the number of nodes of a Bayesian network, and provides a supporting technology for the representation, inference and application of uncertain knowledge.
Description
Technical field
The invention discloses a large-scale Bayesian network parallel inference method based on MapReduce. It relates to storing a large-scale Bayesian network in the distributed database HBase, converting probabilistic inference over the Bayesian network into data query processing on HBase, and realizing the probabilistic inference of the Bayesian network based on MapReduce. The invention belongs to the fields of artificial intelligence and information processing.
Background technology
With the increasing variety of data acquisition means and data forms and the rapid growth of data scale, the representation, understanding and application of the knowledge implied in data have attracted growing attention. A Bayesian network expresses, as a graphical model, uncertain knowledge that combines probability distributions with causal relations; it represents the dependence relations among random variables in both a qualitative and a quantitative way, and has become the basic framework for representing and reasoning about uncertain knowledge. Bayesian networks are widely used in fields such as data analysis, medical diagnosis, industrial control and economic forecasting, for example to describe the mutual influence between users in a social network, or the interactions between gene fragments.
Because both the construction of a Bayesian network and inference over it have exponential time complexity, the upper limit on the number of nodes in a traditional Bayesian network is generally a few dozen. However, with the growth of all kinds of data and the emergence of new applications, the scale of the Bayesian networks used to capture the uncertain knowledge contained in such data keeps increasing; these large-scale Bayesian networks are characterized by many nodes and many conditional probability parameters per node. In recent years, the construction of large-scale Bayesian networks and their practical application have received growing attention; for example, Lu Yang (Master's thesis, Shanghai Jiao Tong University, 2013) proposed a large-scale Bayesian network construction method based on block-wise learning and merging for gene data analysis. For the probabilistic inference problem of large-scale Bayesian networks, known methods improve inference efficiency by exploiting special structures in the Bayesian network to accelerate probabilistic inference, or propose techniques such as parallel exact or approximate inference. Hu Chunling et al. (Pattern Recognition and Artificial Intelligence, 2011, 24(6): 846-855) improved the junction-tree-based exact inference algorithm; another study (Journal of Yunnan University, 2010, 32(4): 392-395) parallelized the variable-elimination exact inference algorithm; Yang Feng (Master's thesis, Hefei University of Technology, 2008) performed approximate inference based on sampling techniques; and Sun Yongmei et al. (patent 2011110319410, 2012) introduced ontologies and user feedback to improve inference speed. These methods improve the efficiency of Bayesian network inference to some extent, but for ever-growing Bayesian network scales and complex practical applications, a scalable and general inference method that is insensitive to the scale of the Bayesian network is still needed.
A large-scale Bayesian network describing uncertain knowledge is itself a fairly large data source, and MapReduce is a programming model for efficiently processing massive, distributed data. For knowledge discovery on massive data and related problems, known methods take MapReduce as the basis for designing and implementing parallel algorithms, overcoming the inability of traditional centralized algorithms to process massive data in parallel, so that many algorithms of high computational complexity can still meet the demands of massive data mining and analysis. Zhou Jiashuai et al. (patent 201210157463.x, 2012) proposed a MapReduce-based distance join query method over large graphs; Li Lixian et al. (Computer Systems & Applications, 2013, 22(2): 108-111) proposed a parallelization of the naive Bayes classification algorithm under the MapReduce framework; and Wang Yuan (Master's thesis, Yunnan University, 2013) proposed a MapReduce-based Bayesian network learning method. However, none of these MapReduce-based methods addresses inference over large-scale Bayesian networks.
Aiming at the problem of efficient inference over large-scale Bayesian networks, the present invention uses the distributed database HBase running on the Hadoop distributed file system HDFS, regards a large-scale Bayesian network as large-scale data, proposes a method for storing the large-scale Bayesian network in HBase, establishes the correspondence between the Bayesian network inference task and distributed database query processing, and gives a method for realizing parallel Bayesian network inference through distributed database query processing based on MapReduce. This provides a new, scalable method using open-source tools for the efficient inference of large-scale Bayesian networks, and provides a new technical foundation for the representation of uncertain knowledge and for applications such as correlation analysis, prediction and decision making.
Content of the invention
It is an object of the invention to provide a large-scale Bayesian network parallel inference method based on MapReduce. To overcome the low inference efficiency and heavy computational load caused by the large number of nodes in a large-scale Bayesian network, or by the many conditional probability parameters of each node, the method takes the removal of this efficiency bottleneck as its main target: it stores the large-scale Bayesian network in the distributed database HBase, establishes the correspondence between HBase query processing and the Bayesian network inference task, and realizes parallel inference over the Bayesian network based on MapReduce. The given method fits the characteristics of practical problems in fields such as data analysis, medical diagnosis, industrial control and economic forecasting, removes the limitation on the number of nodes of a Bayesian network, and provides a supporting technology for the representation, inference and application of the uncertain knowledge contained therein.
The present invention is accomplished according to the following steps.
The processing flow of the present invention is: first, a table for storing the large-scale Bayesian network is created in the HBase database, and based on MapReduce the directed acyclic graph structure of the Bayesian network and the conditional probability table of each node are stored into the HBase table in parallel in <key, value> form; then, the inference task over the Bayesian network is decomposed, the corresponding probability parameters are queried from HBase in parallel based on MapReduce, and the marginal probability distributions involved in the inference task are obtained by multiplication and addition of these probability parameters, from which the probabilistic inference result is derived.
Distributed storage of the large-scale Bayesian network
A Bayesian network is a directed acyclic graph, expressed as g=(v, e), where v={a1, …, an} is the set of nodes, n is the number of nodes in g, and e is the set of directed edges. Each node ai (1≤i≤n) has a conditional probability table, abbreviated CPT, which describes the influence of ai's parent node set pa(ai) on ai and contains the conditional probability p(ai|pa(ai)) for each value of ai, where pa(ai) also denotes a concrete value assignment of the parent set. To efficiently carry out probabilistic inference over a large-scale Bayesian network, the Bayesian network is first stored on disk, i.e., the following two kinds of information are preserved: the parent-child relations between the nodes of the Bayesian network, and the conditional probability table of each node.
The distributed database HBase runs on the Hadoop distributed file system HDFS, and HBase can be operated through the Hadoop master node (namenode). Storing the Bayesian network simply means storing the above two kinds of information, in HBase format, distributed across the data nodes (datanodes) of the Hadoop platform.
For each node ai in g, ai, pa(ai) and ai's conditional probability table are stored as a row of an HBase table t_bn in <key, value> form, where: ai is the row identifier; the key is expressed in the form "column family name = column family member", with ai pa(ai) as the column family name and ai pa(ai) as the column family member; and the value is p(ai|pa(ai)). For each node ai, based on MapReduce, a map function reads each p(ai|pa(ai)) value in ai's conditional probability table in parallel and stores it in t_bn as a row whose logical form is (ai || ai pa(ai) = ai pa(ai) || p(ai|pa(ai))), where "||" is the logical separator between the row identifier, the key and the value. HBase can then be accessed through the Hadoop master node, thereby supporting inference over the Bayesian network.
Parallel inference over the Bayesian network
Probability computations, of which posterior probability computation is the typical representative, are the main class of Bayesian network inference tasks; their essence is to look up the conditional probability table of each node and to use the conditional independences in the Bayesian network to simplify the computation of joint probability distributions. Posterior probability computation over a Bayesian network g means computing the probability of the query nodes taking given values given the values of the evidence nodes, expressed as p(q=q|e=e), where e and q are the evidence nodes and query nodes respectively, e⊆v, q⊆v, e∩q=∅, and e and q (in lower case) are the values of e and q respectively.
First, the probabilistic inference task is decomposed.
According to p(q=q|e=e) = p(q=q, e=e) / p(e=e), the inference task is converted into the computation of the two marginal probability distributions p(q=q, e=e) and p(e=e). p(q=q, e=e) is the sum, over all possible value combinations of the hidden nodes not in q or e (i.e., v-q-e), of the joint probability distributions for q and e; according to the conditional independences in g, each joint probability distribution is converted into a product of a series of conditional probabilities, which are obtained by querying HBase. p(e=e) is the sum, over all possible value combinations of the hidden nodes not in e (i.e., v-e), of the joint probability distributions for e.
The combinations of all possible values of q and e, together with those of the nodes in v-q-e, are stored in HDFS in file form, denoted t_jdp, one combination per line.
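A minimal sketch of building the t_jdp lines (assuming discrete nodes whose value sets are listed explicitly; the function and variable names are illustrative, not from the patent):

```python
from itertools import product

def enumerate_combinations(fixed, hidden_domains):
    """Enumerate the value combinations stored in t_jdp.

    fixed          -- the assigned evidence/query values, e.g. ["a4", "j1", "f1"]
    hidden_domains -- value sets of the hidden nodes, e.g. [["g1","g2"], ["s1","s2"]]
    Each line fixes the evidence/query values and ranges over all
    value combinations of the hidden nodes.
    """
    return [tuple(fixed) + combo for combo in product(*hidden_domains)]

# For p(f=f1, a=a4, j=j1) in the later example: g and s are hidden,
# giving 2 * 2 = 4 lines of t_jdp.
lines = enumerate_combinations(["a4", "j1", "f1"],
                               [["g1", "g2"], ["s1", "s2"]])
```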
Then, HBase is queried based on MapReduce, and the related joint probability distributions are computed.
A map function queries every row r of t_bn in parallel and considers all the lines of t_jdp in turn; the results are written to an HDFS file f_jdp in <key, value> form, where the key is a line of t_jdp that contains the column family member of r, and the value is the value of r (i.e., one of the conditional probability values in g). A reduce function then groups the <key, value> pairs in f_jdp by key and multiplies all the values with the same key within each group, thereby obtaining, for q and e, the joint probability distribution under each possible value combination of the nodes in v-q-e.
Finally, the marginal probability distributions are computed and the posterior probability distribution is obtained.
According to the combinations of evidence node values, query node values and hidden node values involved in p(q=q, e=e) and p(e=e), the results produced by the reduce function are summed, yielding the marginal probability distributions p(q=q, e=e) and p(e=e); the required posterior probability distribution p(q=q|e=e) = p(q=q, e=e) / p(e=e) is then obtained, completing the probabilistic inference task.
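As a further illustration (names hypothetical, not the patent's code), the marginals are sums of the reduced joint probabilities and the posterior is their ratio; since the evidence values are fixed in every line of t_jdp, p(e=e) is simply the sum over all combinations:

```python
def posterior_prob(joint, query_value):
    """p(q=q, e=e) sums the joints whose combination contains the
    query value; p(e=e) sums all joints (evidence is fixed in each)."""
    p_qe = sum(p for combo, p in joint.items() if query_value in combo)
    p_e = sum(joint.values())
    return p_qe / p_e

# Reduced joints from the tiny two-node example above would give
# p(f=f1 | j=j1) = 0.05 / (0.05 + 0.18).
p = posterior_prob({("j1", "f1"): 0.05, ("j1", "f2"): 0.18}, "f1")
```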
In steps (1)–(2) above, the large-scale Bayesian network is regarded as large-scale data, and the probabilistic inference task over the large-scale Bayesian network is converted into query processing over large-scale data, effectively exploiting the technical advantages of distributed databases for massive data storage and random access, so that probabilistic inference over the Bayesian network is carried out efficiently in parallel.
Compared with known techniques, the present invention has the following advantages and beneficial effects:
(1) It overcomes the efficiency bottleneck of known Bayesian network inference methods such as exact inference, parallel inference and approximate inference; efficient inference over a large-scale Bayesian network is achieved without limiting the scale of the network, and the proposed method has good scalability.
(2) It regards the large-scale Bayesian network as large-scale data and converts the probabilistic inference task over the network into query processing over that data, which is a new technique for solving the inference problem of large-scale Bayesian networks.
(3) It extends the known centralized Bayesian network inference paradigm to the distributed setting, using large-scale data storage and query processing techniques, open-source platforms and systems including HDFS and HBase, and the widely used and accepted MapReduce programming model, to efficiently complete the parallel inference of a large-scale Bayesian network; the method is easy to implement and has good extensibility.
(4) The storage and parallel inference method for large-scale Bayesian networks can more faithfully describe the uncertain knowledge in fields such as social media data analysis, medical diagnosis, information services and bioinformatics, meeting their underlying characteristics of many variables and complex relations; for practical applications it is more general and versatile than known techniques.
In short, an efficient inference method that is not limited by the scale of the Bayesian network is established, meeting the intrinsic needs of uncertain knowledge discovery and related applications in fields such as data analysis, medical diagnosis, industrial control and economic forecasting. This efficient inference method for large-scale Bayesian networks provides strong key-technology support for a range of practical applications that use Bayesian networks as their knowledge representation and inference framework.
Brief description of the drawings
Fig. 1 is the technical roadmap of the present invention, comprising three parts: distributed storage of the large-scale Bayesian network, decomposition of the probabilistic inference task, and parallel inference.
Fig. 2 is the directed acyclic graph structure of the "credit card fraud detection" Bayesian network. The nodes f, g, j, a and s represent, respectively: whether a card transaction is fraudulent, whether liquefied gas was purchased, whether jewelry was purchased, the cardholder's age, and the cardholder's sex.
Specific embodiment
Embodiment: inference over the "credit card fraud detection" Bayesian network
(1) Distributed storage of the Bayesian network
To store the nodes f, g, j, a and s of the "credit card fraud detection" Bayesian network in t_bn, a map function reads each value in the conditional probability table of each node in parallel and stores it in t_bn in <key, value> form. For f, given p(f=f1)=0.1 and p(f=f2)=0.9, the rows stored in the HBase table t_bn take f as the row identifier, with column families (f=f1 || 0.1) and (f=f2 || 0.9). The stored "credit card fraud detection" Bayesian network t_bn is shown in Table 1.
(2) Decomposing the probabilistic inference task
Given the evidence node values (a=a4, j=j1) and the query node value f=f1, the inference task is to compute p(f=f1|a=a4, j=j1). Since p(f=f1|a=a4, j=j1) = p(f=f1, a=a4, j=j1) / p(a=a4, j=j1), the inference task is converted into the computation of the two marginal probability distributions p(f=f1, a=a4, j=j1) and p(a=a4, j=j1). For p(f=f1, a=a4, j=j1), g and s are hidden nodes, and this marginal is the sum of the joint probability distributions for f=f1, a=a4, j=j1 under the different value combinations of g and s; for p(a=a4, j=j1), f, g and s are hidden nodes, and this marginal is the sum of the joint probability distributions for a=a4, j=j1 under the different value combinations of f, g and s. These combinations of possible values are stored in the file t_jdp.
Table 1. The "credit card fraud detection" Bayesian network as stored in t_bn
Table 2. The file t_jdp storing the combinations of possible values
| a4 j1 f1 g1 s1 |
| a4 j1 f1 g1 s2 |
| a4 j1 f1 g2 s1 |
| a4 j1 f1 g2 s2 |
| a4 j1 f2 g1 s1 |
| a4 j1 f2 g1 s2 |
| a4 j1 f2 g2 s1 |
| a4 j1 f2 g2 s2 |
(3) Querying HBase based on MapReduce and computing the related joint probability distributions
A map function queries every row of t_bn in parallel and compares it with all lines of t_jdp in turn. For the first row of t_bn, the row identifier f and the column family member f1 are taken out; the lines of t_jdp containing f1 are a4j1f1g1s1, a4j1f1g1s2, a4j1f1g2s1 and a4j1f1g2s2, so these are used as keys, the probability value 0.1 of the current row of t_bn is used as the value, and <a4j1f1g1s1, 0.1>, <a4j1f1g1s2, 0.1>, <a4j1f1g2s1, 0.1> and <a4j1f1g2s2, 0.1> are stored in f_jdp in <key, value> form. In the same way, the corresponding <key, value> pairs for the other rows of Table 1 are stored in f_jdp.
The reduce function multiplies the values in f_jdp that share the same key, obtaining
p(a4j1f1g1s1)=0.0009, p(a4j1f1g1s2)=0.0013, p(a4j1f1g2s1)=0.0036,
p(a4j1f1g2s2)=0.0052, p(a4j1f2g1s1)=0.00036, p(a4j1f2g1s2)=0.00036,
p(a4j1f2g2s1)=0.03564, p(a4j1f2g2s2)=0.06237.
(4) Computing the marginal probability distributions and obtaining the posterior probability distribution
According to the combinations of evidence node values, query node values and hidden node values involved in p(f=f1, a=a4, j=j1) and p(a=a4, j=j1), the reduce results above are summed, giving
p(f=f1, a=a4, j=j1) = 0.0009 + 0.0013 + 0.0036 + 0.0052 = 0.011 and
p(a=a4, j=j1) = 0.011 + 0.00036 + 0.00036 + 0.03564 + 0.06237 = 0.10973.
The required posterior probability distribution is then
p(f=f1|a=a4, j=j1) = 0.011 / 0.10973 ≈ 0.1002,
thus completing the probabilistic inference task.
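The sums in the embodiment can be checked numerically (an illustrative verification script, not part of the patent):

```python
# Joint probabilities produced by the reduce step, keyed by combination.
joint = {
    "a4j1f1g1s1": 0.0009, "a4j1f1g1s2": 0.0013,
    "a4j1f1g2s1": 0.0036, "a4j1f1g2s2": 0.0052,
    "a4j1f2g1s1": 0.00036, "a4j1f2g1s2": 0.00036,
    "a4j1f2g2s1": 0.03564, "a4j1f2g2s2": 0.06237,
}

# p(f=f1, a=a4, j=j1): sum over the combinations containing f1.
p_f1_a4_j1 = sum(p for k, p in joint.items() if "f1" in k)
# p(a=a4, j=j1): sum over all combinations.
p_a4_j1 = sum(joint.values())
posterior = p_f1_a4_j1 / p_a4_j1  # ≈ 0.1002
```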
Claims (2)
1. a kind of large-scale Bayesian network parallel inference method based on mapreduce it is characterised in that: complete according to the following steps
Become:
(1) distributed storage of large-scale Bayesian network
One Bayesian network is a directed acyclic graph, is expressed as g=(v, e), wherein: v={ a1,…,anFor node collection
Close, n is the number of g interior joint;E is the set of directed edge;Each node ai(1≤i≤n) has a conditional probability parameter list,
It is abbreviated as cpt, describe aiFather set of node pa (ai) to aiImpact, comprise aiConditional probability p (a of different valuesi|pa
(ai)), pa (ai) it is pa (ai) value, in order to be able to efficiently carry out the probability inference of large-scale Bayesian network, first by Bayes
Net storage on disk, that is, preserves two categories below information: the filiation between Bayesian network interior joint, and the condition of each node
Probability parameter table;
For distributed data base hbase operating on hadoop distributed file system hdfs, hadoop master can be passed through
Operating hbase, the storage of Bayesian network is it is simply that divide above-mentioned two category informations according to the form of hbase for node (namenode)
Store to cloth on each back end (datanode) in hadoop platform;
It is respectively directed to each node a in gi, by ai、pa(ai) and aiConditional probability parameter list with<key, value>form, work
Store the table t of hbase for a linebnIn, wherein: aiFor line identifier;Key is expressed as " row Praenomen=row race member " form,
aipa(ai) it is row Praenomen, aipa(ai) for arranging race member;Value is p (ai|pa(ai)), for each node ai, it is based on
Mapreduce, concurrently reads a using map functioniEach of conditional probability parameter list p (ai|pa(ai)) value, and deposit
Store up as tbnMiddle logical form is (ai||aipa(ai)=aipa(ai)||p(ai|pa(ai))) a line, " | | " is line identifier, key
With the logical separator of value, thus, can access hbase by hadoop host node, and then support pushing away of Bayesian network
Reason;
(2) parallel inference of Bayesian network
Probability calculation with posteriority probability calculation as Typical Representative, is several generic tasks of Bayesian network reasoning, and its essence is to look for
The conditional probability parameter list of each node simultaneously simplifies the calculating of joint probability distribution using the conditional independence in Bayesian network,
Posterior probability on Bayesian network g calculates it is simply that calculating the probability of query node value under given evidence node value situation, represents
For p (q=q | e=e), wherein: e and q is respectively evidence node and query node, e ∈ v, q ∈ v, e ∩ q=φ;E and q is respectively
For the value of e and q,
First, decompose probability inference task,
According to, reasoning task is converted to p (q=q, e=e) and p (e
=e) this two marginal probability distributions calculating, not those hidden node in q and e are designated as v-q-e, p (q=q, e=e) is
For the joint probability distribution sum under the be possible to valued combinations situation of q and e and v-q-e and only according to the condition in g
Each joint probability distribution is converted to the product of series of conditional distribution by vertical property, obtains condition by inquiry hbase general
Rate is distributed;P (e=e) is for e and the not joint under the be possible to valued combinations situation of those hidden node in e
Probability distribution sum, the combination of q and e and the be possible to value of v-q-e interior joint stores hdfs with document form, note
For tjdp, each is combined as a line;
Then, hbase is inquired about based on mapreduce, and calculates related joint probability distribution,
Concurrently inquire about t using map functionbnIn every a line r, successively consider tjdpIn all row, result is with < key, value
> form write hdfs file fjdpIn, wherein: key is tjdpIn comprise r row race member row, value is r
Value, i.e. a certain conditional probability value in g;Using reduce function pair file fjdpIn<key, value>divide to by key
Group, and each group is had all value multiplications of identical key, thus obtaining for q and e and each possibility of v-q-e interior joint
Joint probability distribution under valued combinations situation,
Then, calculate marginal probability distribution, obtain Posterior probability distribution,
Combination according to p (q=q, e=e) and evidence node value, query node value and hidden node value involved by p (e=e), will
The result that reduce function obtains adds up, thus obtaining marginal probability distribution p (q=q, e=e) and p (e=e), finally gives
Required Posterior probability distribution, completes probability inference task.
2. The large-scale Bayesian network parallel inference method based on MapReduce according to claim 1, characterized in that: the method refers to inference over the "credit card fraud detection" Bayesian network, accomplished according to the following steps:
(1) Distributed storage of the Bayesian network
to store the nodes f, g, j, a and s of the "credit card fraud detection" Bayesian network in t_bn, a map function reads each value in the conditional probability table of each node in parallel and stores it in t_bn in <key, value> form; for f, given p(f=f1)=0.1 and p(f=f2)=0.9, the rows stored in the HBase table t_bn take f as the row identifier, with column families (f=f1 || 0.1) and (f=f2 || 0.9); the stored "credit card fraud detection" Bayesian network t_bn is shown in Table 1;
(2) Decomposing the probabilistic inference task
given the evidence node values (a=a4, j=j1) and the query node value f=f1, the inference task is to compute p(f=f1|a=a4, j=j1); since p(f=f1|a=a4, j=j1) = p(f=f1, a=a4, j=j1) / p(a=a4, j=j1), the inference task is converted into the computation of the two marginal probability distributions p(f=f1, a=a4, j=j1) and p(a=a4, j=j1); for p(f=f1, a=a4, j=j1), g and s are hidden nodes, and this marginal is the sum of the joint probability distributions for f=f1, a=a4, j=j1 under the different value combinations of g and s; for p(a=a4, j=j1), f, g and s are hidden nodes, and this marginal is the sum of the joint probability distributions for a=a4, j=j1 under the different value combinations of f, g and s; these combinations of possible values are stored in the file t_jdp;
(3) Querying HBase based on MapReduce and computing the related joint probability distributions
a map function queries every row of t_bn in parallel and compares it with all lines of t_jdp in turn; for the first row of t_bn, the row identifier f and the column family member f1 are taken out; the lines of t_jdp containing f1 are a4j1f1g1s1, a4j1f1g1s2, a4j1f1g2s1 and a4j1f1g2s2, so these are used as keys, the probability value 0.1 of the current row of t_bn is used as the value, and <a4j1f1g1s1, 0.1>, <a4j1f1g1s2, 0.1>, <a4j1f1g2s1, 0.1> and <a4j1f1g2s2, 0.1> are stored in f_jdp in <key, value> form; in the same way, the corresponding <key, value> pairs for the other rows of Table 1 are stored in f_jdp;
the reduce function multiplies the values in f_jdp that share the same key, obtaining
p(a4j1f1g1s1)=0.0009, p(a4j1f1g1s2)=0.0013, p(a4j1f1g2s1)=0.0036,
p(a4j1f1g2s2)=0.0052, p(a4j1f2g1s1)=0.00036, p(a4j1f2g1s2)=0.00036,
p(a4j1f2g2s1)=0.03564, p(a4j1f2g2s2)=0.06237;
(4) Computing the marginal probability distributions and obtaining the posterior probability distribution
according to the combinations of evidence node values, query node values and hidden node values involved in p(f=f1, a=a4, j=j1) and p(a=a4, j=j1), the reduce results above are summed, giving p(f=f1, a=a4, j=j1) = 0.011 and p(a=a4, j=j1) = 0.10973; the required posterior probability distribution p(f=f1|a=a4, j=j1) = 0.011 / 0.10973 ≈ 0.1002 is finally obtained, thus completing the probabilistic inference task.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310709499.9A CN103744878B (en) | 2013-12-21 | 2013-12-21 | Large-scale Bayesian network parallel inference method based on MapReduce |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310709499.9A CN103744878B (en) | 2013-12-21 | 2013-12-21 | Large-scale Bayesian network parallel inference method based on MapReduce |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN103744878A CN103744878A (en) | 2014-04-23 |
| CN103744878B true CN103744878B (en) | 2017-02-01 |
Family
ID=50501896
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201310709499.9A Expired - Fee Related CN103744878B (en) | 2013-12-21 | 2013-12-21 | Large-scale Bayesian network parallel inference method based on MapReduce |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN103744878B (en) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106294323B (en) * | 2016-08-10 | 2020-03-06 | 上海交通大学 | Methods for commonsense causal inference on short texts |
| CN106446145A (en) * | 2016-09-21 | 2017-02-22 | 郑州云海信息技术有限公司 | Quick creation method based on Hadoop for big data index |
| CN108134680B (en) * | 2016-11-30 | 2019-11-29 | 中国科学院沈阳自动化研究所 | A kind of systematic survey node optimization configuration method based on Bayesian network |
| CN108154380A (en) * | 2017-04-28 | 2018-06-12 | 华侨大学 | The method for carrying out the online real-time recommendation of commodity to user based on extensive score data |
| CN111681044A (en) * | 2020-05-28 | 2020-09-18 | 中国工商银行股份有限公司 | Method and device for processing point exchange cheating behaviors |
| WO2024055191A1 (en) * | 2022-09-14 | 2024-03-21 | Huawei Technologies Co., Ltd. | Methods, system, and apparatus for inference using probability information |
| CN120045594B (en) * | 2025-04-25 | 2025-07-04 | 中国人民解放军国防科技大学 | A distance range connection query visualization method, device, equipment and medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6076083A (en) * | 1995-08-20 | 2000-06-13 | Baker; Michelle | Diagnostic system utilizing a Bayesian network model having link weights updated experimentally |
| CN102194145A (en) * | 2011-06-15 | 2011-09-21 | 天津大学 | Bayesian network method for autonomously fusing prior knowledge |
| CN102724199A (en) * | 2012-06-26 | 2012-10-10 | 北京航空航天大学 | Attack intention recognition method based on Bayesian network inference |
| CN103455842A (en) * | 2013-09-04 | 2013-12-18 | 福州大学 | Credibility measuring method combining Bayesian algorithm and MapReduce |
Application Events
- 2013-12-21: Application filed, CN201310709499.9A, granted as patent CN103744878B/en; status: not_active, Expired - Fee Related
Non-Patent Citations (1)
| Title |
|---|
| Design and Implementation of a Parallel Bayesian Classification Algorithm Based on MapReduce; Ding Guanghua et al.; Microcomputer Information (《微计算机信息》); 2010-03-31; Vol. 26, No. 3-3; pp. 176, 190-191 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN103744878A (en) | 2014-04-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN103744878B (en) | Large-scale Bayesian network parallel inference method based on MapReduce | |
| Kim et al. | Hats: A hierarchical graph attention network for stock movement prediction | |
| Song et al. | Flexible job-shop scheduling via graph neural network and deep reinforcement learning | |
| Rendon-Sanchez et al. | Structural combination of seasonal exponential smoothing forecasts applied to load forecasting | |
| Yang et al. | Bayesian deep learning-based probabilistic load forecasting in smart grids | |
| Marinakis et al. | Particle swarm optimization for the vehicle routing problem: a survey and a comparative analysis | |
| Li | Credibilistic programming | |
| Alkhateeb et al. | A survey for recent applications and variants of nature-inspired immune search algorithm | |
| Pareek et al. | A review report on knowledge discovery in databases and various techniques of data mining | |
| Guan et al. | Ridesharing in urban areas: multi-objective optimisation approach for ride-matching and routeing with commuters’ dynamic mode choice | |
| Abdou et al. | Tourism demand modelling and forecasting: A Review Literature | |
| Jaddi et al. | Multi-population kidney-inspired algorithm with migration policy selections for feature selection problems | |
| Skulimowski | A foresight support system to manage knowledge on information society evolution | |
| Changdar et al. | A modified ant colony optimisation based approach to solve sub-tour constant travelling salesman problem | |
| Basha et al. | Recent Trends in Sustainable Big Data Predictive Analytics: Past Contributions and Future Roadmap | |
| Islam et al. | A framework for effective big data analytics for decision support systems | |
| Sikarwar et al. | A review on social network analysis methods and algorithms | |
| Chen et al. | Incremental community detection on large complex attributed network | |
| Sindhura et al. | Human resource management based economic analysis using data mining | |
| Anderer et al. | Forecasting reconciliation with a top-down alignment of independent level forecasts | |
| Sakawa et al. | Interactive fuzzy multiobjective stochastic programming with simple recourse | |
| Jiang et al. | A Data-Driven Evolutionary Algorithm for Dynamic Vehicle Routing Problems With Time Windows Under Limited Computational Time | |
| CN114511205A (en) | AI big data classification decision optimization system for situation interference recovery of smart city | |
| Kumar et al. | An analysis and literature review of algorithms for frequent itemset mining | |
| Shi-Ting | Management of tourism resources and demand based on neural networks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| 2019-11-04 | TR01 | Transfer of patent right | Patentee before: Yunnan University (College of Information, Yunnan University, No. 2 Cuihu North Road, Kunming City, Yunnan Province, 650091). Patentee after: Yunnan yunshanghui Network Technology Co., Ltd. (Room 302, Tianyu Jiayuan, intersection of Wanhua Road and Huancheng North Road, Panlong District, Kunming City, Yunnan Province). |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2017-02-01 |