GB2431255A - Anomalous behaviour detection system - Google Patents
Anomalous behaviour detection system
- Publication number
- GB2431255A (application GB0520789A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- anomalous
- data records
- probability
- profile
- call
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 230000002547 anomalous effect Effects 0.000 title claims abstract description 111
- 238000001514 detection method Methods 0.000 title claims abstract description 13
- 238000009826 distribution Methods 0.000 claims abstract description 81
- 238000000034 method Methods 0.000 claims abstract description 35
- 230000006399 behavior Effects 0.000 claims description 34
- 230000006870 function Effects 0.000 claims description 11
- 238000004364 calculation method Methods 0.000 claims description 8
- 238000004590 computer program Methods 0.000 claims description 2
- 238000013459 approach Methods 0.000 abstract description 2
- 238000012544 monitoring process Methods 0.000 abstract description 2
- 238000012360 testing method Methods 0.000 description 11
- 230000008569 process Effects 0.000 description 5
- 230000001419 dependent effect Effects 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000012545 processing Methods 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000000977 initiatory effect Effects 0.000 description 2
- 238000007476 Maximum Likelihood Methods 0.000 description 1
- 238000005054 agglomeration Methods 0.000 description 1
- 230000002776 aggregation Effects 0.000 description 1
- 230000003542 behavioural effect Effects 0.000 description 1
- 230000001364 causal effect Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 238000007418 data mining Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000005315 distribution function Methods 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 238000010998 test method Methods 0.000 description 1
- 239000013598 vector Substances 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0604—Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
-
- G06F17/60—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
-
- H04L12/242—
-
- H04L12/2423—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M15/00—Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M15/00—Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
- H04M15/41—Billing record details, i.e. parameters, identifiers, structure of call data record [CDR]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M15/00—Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
- H04M15/47—Fraud detection or prevention means
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M15/00—Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
- H04M15/70—Administration or customization aspects; Counter-checking correct charges
- H04M15/74—Backing up
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/22—Arrangements for supervision, monitoring or testing
- H04M3/2281—Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/4228—Systems providing special services or facilities to subscribers in networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04Q—SELECTING
- H04Q3/00—Selecting arrangements
- H04Q3/0016—Arrangements providing connection between exchanges
- H04Q3/0062—Provisions for network management
-
- H04Q7/3881—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/60—Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
- H04M2203/6027—Fraud preventions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2215/00—Metering arrangements; Time controlling arrangements; Time indicating arrangements
- H04M2215/01—Details of billing arrangements
- H04M2215/0148—Fraud detection or prevention means
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2215/00—Metering arrangements; Time controlling arrangements; Time indicating arrangements
- H04M2215/01—Details of billing arrangements
- H04M2215/0164—Billing record, e.g. Call Data Record [CDR], Toll Ticket[TT], Automatic Message Accounting [AMA], Call Line Identifier [CLI], details, i.e. parameters, identifiers, structure
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2215/00—Metering arrangements; Time controlling arrangements; Time indicating arrangements
- H04M2215/70—Administration aspects, modify settings or limits or counter-check correct charges
- H04M2215/709—Backup
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Technology Law (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Finance (AREA)
- Life Sciences & Earth Sciences (AREA)
- Economics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Business, Economics & Management (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Accounting & Taxation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Telephonic Communication Services (AREA)
- Remote Monitoring And Control Of Power-Distribution Networks (AREA)
Abstract
A method for detecting anomalous behaviour comprises the steps of: generating a non-anomalous profile of a plurality of data records; calculating a first probability that one or more new data records belong to the non-anomalous profile; and calculating a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile. The data records may be telephone call data records. The characterising features of the call data records may be one or more of the following: day of call, time call initiated, destination of call and duration of call. The method may include an alarm generating means. The approach of the present invention may be based on Bayesian networks. Potential applications include: telecommunications and network infrastructure monitoring for failure and intrusion, software application usage, banking, credit card, debit card, loyalty/reward card, freight/package distribution, asset tracking, insurance fraud, health-care fraud, internet transactions and intrusion detection.
Description
Anomalous behaviour detection system

The present invention relates to an anomalous behaviour detection system and particularly, but not exclusively, to an anomalous behaviour detection and scoring system for telephone calls.

Each year in the telecommunications sector, fraudulent transactions account for a substantial loss of annual revenue for telecom providers. The detection of such fraudulent activity is an arduous task and presents a significant challenge to researchers and practitioners alike. This is due to the nature of the telecommunications domain, where a high volume of transactional call data is produced. In fact, only a very small percentage of call transactions are actually fraudulent, and the need to detect these in real time compounds the problem. Various solutions have been proposed. For example, in "Detection of Fraud in Mobile Telecommunications" (Shawe-Taylor, J., Howker, K. and Burge, P., Information Security Technical Report 4(1), pp. 16-28, 1999) a system comprising rule-based and artificial neural network components is developed, whilst in "Signature Based Methods for Data Streams" (C. Cortes and D. Pregibon, Data Mining and Knowledge Discovery, 5, pp. 167-182, 2001) and "Detecting Fraud in the Real World" (M. Cahill, D. Lambert, J. Pinheiro and D. Sun, Handbook of Massive Data Sets, pp. 911-929, 2002) signature-based methods are proposed.

Typically, these solutions have drawbacks which limit their suitability to many of the areas in which they might be employed. Limitations of these and other similar solutions include:

* current systems have dependencies on additional data from other sources (e.g. billing systems), and such links are expensive to engineer and maintain;
* solutions comprising neural networks are not necessarily sensitive to behaviour on which they have not been trained: a new type of behaviour will often not be detected, or will simply be mislabelled by the system;
* the complexity of the existing underlying computation is generally such that it mandates the use of moderately, or extremely, expensive computer hardware to fulfil the objective of processing a day of observed behaviour within a day;
* when the modes of behaviour between the most divergent elements of the observed population are notionally close together (for example, for a set of telephone calls from a residential area), existing systems can oscillate chaotically between zero output and returning most of the input; existing systems therefore often require constant "tuning" of internal parameters to limit this behaviour;
* presentation to an operator of the rationale for a "detected" condition is often very difficult (or indeed impossible) with current systems; neural networks in particular do not lend themselves to explaining how a result has actually been arrived at;
* existing systems are not easily able to model and refine hypothetical patterns of behaviour which can then be immediately detected;
* many existing systems disregard entire classes of input on the grounds that they are too similar, too expensive to process, or simply do not conform to some other criterion that the system requires to enable processing, when removing elements of observed behaviour from the process will often reduce the effectiveness of the results; and
* such systems have limited or no ability to handle new users, or those with small numbers of historically observed events.
According to a first aspect of the present invention there is provided a method for detecting anomalous behaviour comprising the steps of: generating a non-anomalous profile of a plurality of data records; calculating a first probability that one or more new data records belong to the non-anomalous profile; and calculating a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.

Preferably, the method further comprises the step of generating an anomalous profile of a plurality of anomalous data records and calculating a second probability that one or more new data records belong to an anomalous profile.

Preferably, the likelihood value is based on the second probability as well as the first probability.

Preferably, the method further comprises the step of comparing the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.

Preferably, the threshold value is calculated by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.

Preferably, the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.

Preferably, the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.

Preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution.

Further preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.

Preferably, the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.

Preferably, the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.

Preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution.

Further preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.

Preferably, each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.

Preferably, the threshold value is calculated for each user.

Preferably, for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.

Preferably, the data records are call data records.

Preferably, the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.

Preferably, the method further comprises the step of generating an alarm to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.
According to a second aspect of the present invention there is provided an anomalous behaviour detection system comprising: a plurality of data records; a non-anomalous profile generation means enabled to generate a non-anomalous profile from the plurality of data records; a probability calculation means enabled to calculate a first probability that one or more new data records belong to the non-anomalous profile; and a likelihood calculation means enabled to calculate a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.

Preferably, the system further comprises an anomalous profile generation means enabled to generate an anomalous profile from a plurality of anomalous data records, and wherein the probability calculation means is further enabled to calculate a second probability that one or more new data records belong to an anomalous profile.

Preferably, the likelihood value is based on the second probability as well as the first probability.

Preferably, the system further comprises a likelihood comparison means enabled to compare the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.

Preferably, the system further comprises a threshold calculation means enabled to calculate the threshold value by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.

Preferably, the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.

Preferably, the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.

Preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution.

Further preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.

Preferably, the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.

Preferably, the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.

Preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution.

Further preferably, the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.

Preferably, each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.

Preferably, the threshold value is calculated for each user.

Preferably, for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.

Preferably, the data records are call data records.

Preferably, the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.

Preferably, the system further comprises an alarm generation means enabled to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.

According to a third aspect of the present invention there is provided a computer program product directly loadable into the internal memory of a digital computer comprising software code portions for performing the method of the first aspect of the present invention.
Embodiments of the present invention will now be described with reference to the accompanying drawings, in which:

Fig. 1 illustrates a flow diagram of an anomalous behaviour detection and scoring system according to the present invention.

The following description of the working of the present invention and the associated examples is made with reference to detecting anomalous behaviour in telephone usage. It should be appreciated that the invention can be applied to detecting anomalous behaviour in other applications, and to any streams of data originating from multiple sources where the behavioural patterns may vary both between the sources and, for each source, over time.

Examples of potential applications include, but are not restricted to:

* telephone call data from mobile networks;
* telecommunications and network infrastructure monitoring for failure and intrusion;
* software application usage;
* user internet browsing behaviour;
* transaction streams for any banking, credit card, debit card, loyalty/reward card or similar scheme;
* freight/package distribution tracking;
* asset tracking;
* insurance fraud;
* health-care fraud;
* internet transactions; and
* intrusion detection.

Additionally, the present invention provides the ability for a group of system owners to share profile information, anonymously or otherwise, between themselves to facilitate the identification of any users who migrate between systems.
With reference to Fig. 1, an anomalous behaviour detection and scoring system 10 is shown having a non-anomalous profile generator 12 and an anomalous profile generator 14.

The non-anomalous profile generator 12 has a non-anomalous Call Data Record database 16 for non-anomalous data records from user accounts. A profile generator 18 generates an account profile for each user corresponding to non-anomalous usage and stores the account profiles in a non-anomalous profile database 20.

The anomalous profile generator 14 has an anomalous Call Data Record database 22 for anomalous data records from identified fraudulent usage. An anomalous profile generator 24 generates a fraudulent profile corresponding to anomalous usage and stores the fraudulent profile in an anomalous profile database 26.

Account profile generation can be a computationally intensive exercise, dependent on the number of call data records, but only needs to be performed at the outset of deployment of the system.

A probability calculator 28 calculates a probability distribution function in respect of new data records 30 for a particular account profile and for the fraudulent profile. A likelihood calculator 32 determines the likelihood that the new data records 30 belong to the particular account profile.

A threshold calculator 34 determines a particular threshold value for the account profile, and the likelihood value is then compared to the threshold value in a comparator 36. If the likelihood value is greater than the threshold value then an alert 38 is generated to handle the anomalous data records accordingly.

A new data adder 40 may then add the identified anomalous data records to the call data records for the fraudulent profile. If the likelihood value is less than the threshold value then the new data adder may add the new data records to the call data records for the account profile. The account profile can then be updated at regular intervals based on the new usage, thereby increasing the accuracy of the account profile.

Furthermore, an operator defined profile 42 may be added to either the non-anomalous or anomalous profile database 20, 26. This allows profiles to be added which are not necessarily drawn from data record history. For example, the profiles could define infrequently occurring but high-risk fraud in call records.
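The flow of Fig. 1 can be summarised in pseudocode form. The following Python sketch is illustrative only: the object interfaces and names (log_predictive, add_records and so on) are hypothetical assumptions, not part of the patent; it presumes profile objects exposing a predictive log-likelihood and an update method, and uses the likelihood-ratio convention introduced in the hypothesis-testing discussion below.

```python
def process_new_records(account_id, new_records, profiles, fraud_profile,
                        thresholds, alerts):
    """Score one account's new call data records, as in Fig. 1.
    Profile objects, thresholds and method names are all assumed here."""
    profile = profiles[account_id]
    # Log-likelihood of the new records under the account profile (H0)
    # and under the fraudulent profile (H1); their difference is the
    # log of the likelihood-ratio statistic introduced in the test below.
    log_lambda = (profile.log_predictive(new_records)
                  - fraud_profile.log_predictive(new_records))
    if log_lambda < thresholds[account_id]:
        # Small ratio: the account profile explains the calls poorly,
        # so raise an alert and attribute the records to the fraud profile.
        alerts.append((account_id, new_records))
        fraud_profile.add_records(new_records)
    else:
        # Consistent with the account: fold the records into its profile
        # so the account signature keeps refining over time.
        profile.add_records(new_records)
```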
The steps for the generation of the account profile, fraudulent profile, probability, likelihood value and threshold value will now be described. It should be noted that the method employed in generating a decision on the call data records is particularly computationally efficient. For example, it is possible using the present invention to process a city's telephone calls, in this case approximately 4 million calls, in real time on a relatively basic desktop personal computer (of 2005 standards).

The approach of the present invention is based on Bayesian networks. Bayesian networks are directed acyclic graphs in which nodes represent variables of interest and the links represent causal influence amongst the variables. To each node in a Bayesian network corresponds a conditional probability, whose conditioning variables are the parents of the node. Bayesian networks allow information to be obtained about associations in the network and allow the reasons which caused a given result to be observed. This information is defined by the network topology, domain knowledge of the data, and testing data. Depending on the direction of movement through the network it is possible to determine either the consequences or the causes of events. Bayesian networks also have a high degree of flexibility and an ability to adapt to changes in the environment. A network can be expanded to accommodate additional conditional variables and relations. If the incoming data shows that a variable is insignificant, or that the relation between variables is slight, the network complexity can be reduced, which subsequently reduces the required amount of computation. At the same time, changes in network configuration do not affect the whole network but are conducted locally.

Given a customer's telephone service usage (non-anomalous) profile and a series of recent telephone calls attributed to the customer account, a test is required to assess whether the logged transaction data provides evidence which is sufficiently strong to accept that the calling activity genuinely originated from the account.

Having logged a series of telephone calls which have supposedly been made by a customer, the null hypothesis H0 is that the calls genuinely originated from the customer's account and are consistent with all previous patterns of calling behaviour from the account. The alternate hypothesis H1 regarding the telephone calls is that they were not made by the owner of the account but in fact originate from another account, which we will denote as f_l.

From classical hypothesis testing there will be a rate α of TYPE I errors made for any test procedure. In this case, the TYPE I errors (rejection of H0 when it is true) correspond to genuine calls from an account being labelled as fraudulent or as not having originated from the customer. Given that the number of telephone calls being tested in a 24 hour period can be of the order of tens of millions, the TYPE I error rate α has to be very carefully controlled and kept low to ensure that the number of calls exceeding the threshold is kept to a manageable level for the operators who may be required to process the calls which raise alarms.

On the other hand, the TYPE II error rate β (acceptance of H0 when it is false) indicates the number of deviant, and possibly fraudulent, telephone calls which are classified as normal. The TYPE II error rate also needs to be kept very small to ensure that the test is particularly sensitive to deviations from normal patterns of usage which may be highly indicative of fraudulent behaviour. The practical reality of such a fraud detection system is that the false rejection rate (the TYPE I error rate) will have to be controlled.

Consider a number of independent and identically distributed random vectors C_1, C_2, ..., C_N denoting the representation of N logged telephone calls; under the null hypothesis these have a probability distribution P(C = c | a_m), that is, they are generated from a customer account a_m. Furthermore, there is a probability distribution P(C = c | f_l) defining the distribution of telephone calls under the alternate hypothesis that another, alternate signature, possibly a fraudulent one, is responsible for the generation of the phone calls.

The Neyman-Pearson lemma states that the most powerful test (that which maximizes the power of the test, 1 - β) for a fixed significance level α is obtained by using the likelihood ratio as the test statistic:

$$\Lambda = \frac{\prod_{n=1}^{N} P(C = c_n \mid a_m)}{\prod_{n=1}^{N} P(C = c_n \mid f_l)}$$

The null hypothesis will be rejected when the value of the test statistic Λ is smaller than a critical value Λ_crit such that Prob(Λ < Λ_crit : H0 is true) = α.
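As a concrete illustration of the test statistic, the sketch below computes log Λ for a batch of calls over a single categorical feature, with both distributions assumed known; all of the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical example: one categorical feature (say, the destination bin
# of each call), with both distributions assumed known.
p_account = np.array([0.70, 0.20, 0.08, 0.02])  # P(C = c | a_m)
p_fraud = np.array([0.10, 0.10, 0.30, 0.50])    # P(C = c | f_l)
calls = np.array([0, 0, 3, 3, 3, 2])            # observed bins, N = 6 calls

# log Lambda = sum_n log P(c_n | a_m) - sum_n log P(c_n | f_l);
# working in log space avoids underflow over long call sequences.
log_lambda = np.log(p_account[calls]).sum() - np.log(p_fraud[calls]).sum()
# H0 is rejected when log_lambda < log(Lambda_crit), with Lambda_crit
# chosen so that Prob(Lambda < Lambda_crit | H0 true) = alpha.
```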
The definition of the probability distributions under both H0 and H1, and the definition of Λ_crit to set the level of significance of the tests, i.e. the false rejection rate, will now follow.

Consider a population of customers, denoted by the set A, each of whom has an account with the telecom provider. The m-th customer makes a series of N_m telephone calls during a given period T, ..., T + ε, defined by C = [c_1, c_2, ..., c_{N_m}], where each c_n defines the counts of the number of times that each of the events which defines a telephone call has occurred.

The account for the m-th customer will be characterized by a consistent pattern of service usage over a given period of time 1, ..., T - δ, from account initiation (time point 1) until δ time points prior to the set of telephone calls initiated during the period T, ..., T + ε. This will be reflected in a set of sufficient statistics, a_m, describing the number of times, for this account, that a particular event related to the initiation and completion of a particular telephone service has occurred. This set of sufficient statistics will consist of, for example, the number of times a call is initiated in the morning between 6.00 am and midday, or the number of times that a call lasted longer than 15 minutes given that the call was international.

In this example, the set of sufficient statistics, a_m, are the counts of the number of times that events are observed: for example, the number of times that a telephone number dialled falls into one of the predefined categories.

The sufficient statistics are generated through histogram binning. Histogram binning is the discretisation of continuous values, such as time, or the agglomeration of a large number of discrete values, such as actual telephone numbers, into a smaller number of values. A "bin" is a single point representing a range of values which are covered by a discrete probability value. In this example, the corpus of calls has been mathematically decomposed to ascertain the boundary conditions most appropriate for the bins, and the dependencies which most affect the results. These results were used to define the sets of "bins" used. For example, there are relevant bins for day of the week, time of day, call type (essentially destination), call duration, and also for some combinations of these. Histograms are the simplest way of visualising the values stored in the relevant bins.

In this example, four independent sets of events, or features, are used to define a simple account model. These are: Day of Week (W); Call Start Time (S); Call Destination (D); and Call Duration (L), each denoted as w ∈ W, s ∈ S, d ∈ D, l ∈ L. It should be noted that the present invention is not in any way limited to the sets of events described above, nor does the assumption of statistical independence between events need to be made.
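A minimal sketch of the histogram binning that produces the sufficient statistics might look as follows; the bin boundaries and record field names are illustrative assumptions, not those mandated by the patent.

```python
from collections import Counter

def time_bin(hour):
    """Discretise call start hour (0-23) into one of |S| = 4 time-of-day
    bins. Boundaries here are assumed, not the patent's."""
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"

def sufficient_stats(calls):
    """Build the sufficient statistics a_m: one histogram of event counts
    per feature (Day of Week W, Start Time S, Destination D, Duration L).
    Each call is assumed to be a dict with day, hour, dest_bin, duration_bin."""
    stats = {"W": Counter(), "S": Counter(), "D": Counter(), "L": Counter()}
    for call in calls:
        stats["W"][call["day"]] += 1            # days 0..6, so |W| = 7
        stats["S"][time_bin(call["hour"])] += 1
        stats["D"][call["dest_bin"]] += 1       # pre-binned destination class
        stats["L"][call["duration_bin"]] += 1   # pre-binned duration range
    return stats
```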
These assumptions (independence between the four events defined) are used to illustrate the overall concept. It should be appreciated that other dependent or independent sets of events could be used for this example and would inevitably be used in other models using the present invention.

If the number of possible values for each event is defined as |W|, |S|, |D| and |L| (for example, with seven days of the week on which to make telephone calls, |W| = 7), then

$$a_m = [a_{m,w=1}, \ldots, a_{m,w=|W|},\; a_{m,s=1}, \ldots, a_{m,s=|S|},\; a_{m,d=1}, \ldots, a_{m,d=|D|},\; a_{m,l=1}, \ldots, a_{m,l=|L|}]^T \in \mathbb{Z}_+^{|W|+|S|+|D|+|L|}$$

The definition for the telephone calls made during the period T, ..., T + ε follows in a similar manner, such that

$$c_n = [c_{n,w=1}, \ldots, c_{n,w=|W|},\; c_{n,s=1}, \ldots, c_{n,s=|S|},\; c_{n,d=1}, \ldots, c_{n,d=|D|},\; c_{n,l=1}, \ldots, c_{n,l=|L|}]^T \in \mathbb{Z}_+^{|W|+|S|+|D|+|L|}$$

It would be possible to consider further conditional events such as Call Duration GIVEN Call Destination and Call Destination GIVEN Start Time, but for the purposes of this example we will assume that each event is independent.

For the purposes of this example, the assumption is made that, given the customer account, all call related events are independent of each other, i.e. $w \perp s \perp d \perp l \mid a_m$.

The series of telephone calls, C, made during the time period T, ..., T + ε are made with probability P(C | a_m); in other words, this series of calls was likely to have been made by the m-th customer account with probability P(C | a_m). For the simplest case, where it is assumed that each call is independent of all previous calls made from the customer account, then:

$$P(c_1, \ldots, c_N \mid a_m) = \prod_{n=1}^{N} P(c_n \mid a_m)$$

Now each P(c_n | a_m) will be defined by the distribution over the available features which characterize the call, which, in this case, will be the day that the call was made (w), the time that the call was initiated (s), the destination of the call (d), and the duration of the call (l). Assuming conditional independence, then:

$$P(c_n \mid a_m) = P(l_n \mid a_m) \, P(d_n \mid a_m) \, P(s_n \mid a_m) \, P(w_n \mid a_m)$$

What is now required is a representation of each account in terms of a set of parameters which define each of the conditional probability distributions employed in each P(C | a_m). Assuming independence of the features, each account m is then defined by the following set of multinomial parameters:

$$\theta_m = (\theta_m^w, \theta_m^s, \theta_m^d, \theta_m^l) \in [0, 1]^{|W|+|S|+|D|+|L|}$$

where the strictly positive parameters define the multinomial distributions such that:

$$\sum_{i=1}^{|W|} \theta_{m,i}^w = 1, \qquad \sum_{i=1}^{|S|} \theta_{m,i}^s = 1, \qquad \sum_{i=1}^{|D|} \theta_{m,i}^d = 1, \qquad \sum_{i=1}^{|L|} \theta_{m,i}^l = 1$$

The distribution over each of the required multinomial parameters will be defined by a Dirichlet prior probability distribution, which is the conjugate of the multinomial likelihood, such that, for example, the Start Time parameters have a Dirichlet prior distribution defined as:

$$P(\theta_m^s) = \frac{\Gamma(\alpha_{m,\cdot}^s)}{\prod_{i=1}^{|S|} \Gamma(\alpha_{m,i}^s)} \prod_{i=1}^{|S|} (\theta_{m,i}^s)^{\alpha_{m,i}^s - 1}$$

where $\alpha_{m,\cdot}^s = \sum_{i=1}^{|S|} \alpha_{m,i}^s$, each $\alpha_{m,i}^s > 0$, and Γ denotes the Gamma function. The corresponding multinomial likelihood for the start time is:

$$P(a_m^s \mid \theta_m^s) = \frac{a_{m,\cdot}^s!}{\prod_{i=1}^{|S|} a_{m,i}^s!} \prod_{i=1}^{|S|} (\theta_{m,i}^s)^{a_{m,i}^s}$$

where $a_{m,\cdot}^s = \sum_{i=1}^{|S|} a_{m,i}^s$. Then the marginal distribution for the account based on, for example, Start Time alone is:

$$P(a_m^s \mid \alpha_m^s) = \int P(a_m^s \mid \theta_m^s) \, P(\theta_m^s) \, d\theta_m^s = \frac{a_{m,\cdot}^s!}{\prod_{i=1}^{|S|} a_{m,i}^s!} \cdot \frac{\Gamma(\alpha_{m,\cdot}^s)}{\Gamma(\alpha_{m,\cdot}^s + a_{m,\cdot}^s)} \prod_{i=1}^{|S|} \frac{\Gamma(a_{m,i}^s + \alpha_{m,i}^s)}{\Gamma(\alpha_{m,i}^s)}$$

and this follows for the other terms $P(a_m^w \mid \alpha_m^w)$, $P(a_m^d \mid \alpha_m^d)$ and $P(a_m^l \mid \alpha_m^l)$. The specific account profile is thus dependent on the Dirichlet parameters $\alpha_m^w$, $\alpha_m^s$, $\alpha_m^d$ and $\alpha_m^l$, as the multinomial parameters have been integrated out due to the conjugacy of the multinomial-Dirichlet distributions.
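This marginal is the standard Dirichlet-multinomial form and is conveniently evaluated in log space with the log-Gamma function. A minimal sketch follows, with the multinomial coefficient omitted since it depends only on the counts and so cancels in ratios over the same data; the function name and example numbers are assumptions for illustration.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(counts, alpha):
    """log P(a_m | alpha) for one feature under the Dirichlet-multinomial
    marginal above, omitting the multinomial coefficient.
    counts: event counts a_m,i; alpha: Dirichlet parameters alpha_m,i."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (gammaln(alpha.sum()) - gammaln(alpha.sum() + counts.sum())
            + np.sum(gammaln(counts + alpha) - gammaln(alpha)))

# Example: start-time counts for an account, with a uniform prior.
# print(log_marginal([40, 25, 30, 5], [1.0, 1.0, 1.0, 1.0]))
```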
The parameters of the Dirichlet can be written as the product of a normalised measure (over, for example, S in the case of Start Time) and a positive real value, that is:

$$\alpha_{m,i}^s = \mu_m^s \, g_i^s, \qquad \sum_{i=1}^{|S|} g_i^s = 1, \qquad \mu_m^s = \sum_{i=1}^{|S|} \alpha_{m,i}^s$$

The values of the parameters of the Dirichlet prior probabilities have a direct effect on the predictive probability assigned to a series of calls given a particular account. For the case where the prior parameter values for all variables are set to the value of one, it is implicitly assumed that all parameter values are equally likely a priori, as for $\alpha_{m,i}^s = 1 \; \forall i$ we have $P(\theta_m^s) = \Gamma(|S|)$, a uniform density.

In practical terms, given that the m-th account is new and has made no calls, we are then assuming that all possible behaviours or modes of service usage are equally likely to emerge. The form of prior probability just discussed is particularly naive in that, given the existing population of customer accounts A, it ignores all the information available regarding specific characteristics of service usage from the population, or from market-segmented parts of the population.

Therefore, in the absence of account specific information, that is for a brand new account, our prior should be guided by the population average signature (or that of the part of the market segment the new customer is attributed to), in which case each

$$\alpha_{m,i}^s = \mu_m^s \, P(s_i \mid A)$$

where $P(s_i \mid A)$ is the probability of a call being initiated at Start Time equal to i (the i-th start time event, for example Morning), given the whole population of accounts A. The values of the coefficients $\mu_m^s$ will be account specific and can be identified via some form of grid search.

Alternatively, to obtain the scalar values $\mu_m^s$ for each account we can employ Empirical Bayes (Type II maximum likelihood) such that:

$$\hat{\mu}_m^s = \arg\max_{\mu_m^s} \log P(a_m^s \mid \mu_m^s)$$

Now denoting

$$f'(\mu_m^s) = \frac{\partial}{\partial \mu_m^s} \log P(a_m^s \mid \mu_m^s) \qquad \text{and} \qquad f''(\mu_m^s) = \frac{\partial^2}{\partial (\mu_m^s)^2} \log P(a_m^s \mid \mu_m^s)$$

then

$$f'(\mu_m^s) = \Psi(\mu_m^s) - \Psi(\mu_m^s + a_{m,\cdot}^s) + \sum_{i=1}^{|S|} P(s_i \mid A) \left[ \Psi(a_{m,i}^s + \mu_m^s P(s_i \mid A)) - \Psi(\mu_m^s P(s_i \mid A)) \right]$$

$$f''(\mu_m^s) = \Psi'(\mu_m^s) - \Psi'(\mu_m^s + a_{m,\cdot}^s) + \sum_{i=1}^{|S|} P(s_i \mid A)^2 \left[ \Psi'(a_{m,i}^s + \mu_m^s P(s_i \mid A)) - \Psi'(\mu_m^s P(s_i \mid A)) \right]$$

where Ψ denotes the digamma function. These expressions can be employed in a Newton iteration

$$\mu_m^s \leftarrow \mu_m^s - \frac{f'(\mu_m^s)}{f''(\mu_m^s)}$$

for each attribute of every account.
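The Newton iteration can be implemented directly with the digamma and trigamma functions. The sketch below follows the derivative expressions above under the stated parameterisation; safeguards such as keeping mu strictly positive are deliberately left out.

```python
import numpy as np
from scipy.special import psi, polygamma  # digamma and its derivatives

def fit_mu(counts, base, mu=1.0, iters=100, tol=1e-8):
    """Type II maximum likelihood for one feature's Dirichlet scale mu,
    with alpha_i = mu * base_i and base_i = P(s_i | A). A sketch of the
    Newton iteration above, assuming the derivatives as given."""
    counts = np.asarray(counts, dtype=float)
    base = np.asarray(base, dtype=float)
    total = counts.sum()
    for _ in range(iters):
        f1 = (psi(mu) - psi(mu + total)
              + np.sum(base * (psi(counts + mu * base) - psi(mu * base))))
        f2 = (polygamma(1, mu) - polygamma(1, mu + total)
              + np.sum(base ** 2 * (polygamma(1, counts + mu * base)
                                    - polygamma(1, mu * base))))
        step = f1 / f2
        mu -= step
        if abs(step) < tol:
            break
    return mu
```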
We now have to consider how to assign the required probabilities to a new sequence of calls originating from the accounts. The posterior probability over the parameters follows as:

$$P(\theta_m^s \mid a_m^s, \alpha_m^s) = \frac{\Gamma(a_{m,\cdot}^s + \mu_m^s)}{\prod_{i=1}^{|S|} \Gamma(a_{m,i}^s + \mu_m^s P(s_i \mid A))} \prod_{i=1}^{|S|} (\theta_{m,i}^s)^{a_{m,i}^s + \mu_m^s P(s_i \mid A) - 1}$$

Now, for the N calls made during the new period T, ..., T + ε, we require the following probability:

$$P(C \mid a_m) = \prod_{n=1}^{N} P(c_n \mid a_m) = P(w \mid a_m) \, P(s \mid a_m) \, P(d \mid a_m) \, P(l \mid a_m)$$

where:

$$P(w \mid a_m) = \prod_{n=1}^{N} P(w_n \mid a_m), \quad P(s \mid a_m) = \prod_{n=1}^{N} P(s_n \mid a_m), \quad P(d \mid a_m) = \prod_{n=1}^{N} P(d_n \mid a_m), \quad P(l \mid a_m) = \prod_{n=1}^{N} P(l_n \mid a_m)$$

Now, defining $c_i^s$ as the count of new-period calls with Start Time i, and $c_{\cdot}^s = \sum_{i=1}^{|S|} c_i^s$, then:

$$P(s \mid a_m) = \int P(s \mid \theta_m^s) \, P(\theta_m^s \mid a_m^s, \alpha_m^s) \, d\theta_m^s = \frac{\Gamma(a_{m,\cdot}^s + \mu_m^s)}{\Gamma(a_{m,\cdot}^s + \mu_m^s + c_{\cdot}^s)} \prod_{i=1}^{|S|} \frac{\Gamma(c_i^s + a_{m,i}^s + \mu_m^s P(s_i \mid A))}{\Gamma(a_{m,i}^s + \mu_m^s P(s_i \mid A))} \qquad (7)$$

and this follows for the other terms required, that is $P(l \mid a_m)$, $P(d \mid a_m)$ and $P(w \mid a_m)$, to compute the predictive likelihood $P(C \mid a_m)$ of the series of calls originating from the specific account.
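Equation (7) is again a ratio of Gamma functions and can be evaluated per feature in log space. A sketch follows, with hypothetical argument names and the multinomial coefficient again omitted.

```python
import numpy as np
from scipy.special import gammaln

def log_predictive(new_counts, profile_counts, alpha):
    """log P(s | a_m) for one feature, per equation (7): the posterior
    predictive probability of the new-period counts given the profile
    counts and the prior alpha_i = mu * P(s_i | A)."""
    c = np.asarray(new_counts, dtype=float)
    post = (np.asarray(profile_counts, dtype=float)
            + np.asarray(alpha, dtype=float))  # posterior Dirichlet params
    return (gammaln(post.sum()) - gammaln(post.sum() + c.sum())
            + np.sum(gammaln(c + post) - gammaln(post)))

# By the assumed independence, the full score log P(C | a_m) is the sum
# of this term over the four features W, S, D and L.
```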
The non-anomalous profiles for each account comprise the sufficient statistics (the counts of each event) and the estimated values of the parameters of the Dirichlet priors, which require little storage overhead. The scoring of a series of calls amounts simply to the iterated application of the Gamma function in each term of equation (7) defining P(C | a_m).

Account specific thresholds are also required to capitalize on the individual descriptive statistics. For a given level of test significance α, each account will require a corresponding value Λ_crit^m such that:

$$\mathrm{Prob}(\Lambda^m < \Lambda_{crit}^m : H_0 \text{ is true}) = \alpha$$

This has important practical consequences in that the false rejection rate, the number of calls which are actually genuine being rejected by the system as inconsistent with the current profile, will be controlled by this value.

To this end, a form of parametric bootstrap is employed: the above predictive distributions are used to repeatedly simulate series of calls from each account, their associated scores are computed, and the empirical distribution of the scores is then obtained. These account specific empirical distributions can then be used to obtain the account specific threshold scores which will yield the required test significance levels (i.e. TYPE I error rates).
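A sketch of the parametric bootstrap for the account-specific thresholds follows; profile.sample_calls and profile.log_predictive_calls are assumed interfaces standing in for the predictive distributions described above.

```python
import numpy as np

def bootstrap_threshold(profile, n_calls, alpha=0.001, n_sims=10000, seed=0):
    """Account-specific threshold by parametric bootstrap: simulate call
    batches from the account's own predictive distribution, score each
    batch, and take the alpha-quantile of the empirical score distribution,
    so that Prob(score < threshold | H0 true) is approximately alpha."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_sims)
    for i in range(n_sims):
        simulated = profile.sample_calls(n_calls, rng)   # assumed method
        scores[i] = profile.log_predictive_calls(simulated)
    return np.quantile(scores, alpha)
```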
16 These account specific empirical distributions can 17 then be used to obtain the account specific 18 threshold scores which will yield the required test 19 significance levels (i.e. TYPE I error rates) 21 As mentioned previously, the example above assumes 22 that the set of events used to describe the model 23 are independent of each other. That is, for three 24 independent variables x1, x2, x3 the probability is defined as: 26 p(x)=p(x).p(x2).p(x3) 28 The probability in the case of dependent variables 29 should be calculated as a product of conditional probabilities. For example: 31 p(x) = p(x1). p(x2 x1). p(x3 I x1, x2) 1 The present invention has a number of distinct 2 advantages over prior solutions including: 3 * There is no empirical reliance on external data 4 other than call data records, that is the system can work in isolation and no expensive 6 integration with other systems is mandated.
7 Given that many companies have a number of 8 different billing systems for different 9 services, this is not a trivial point; * the system has the ability to identify classes 11 of anomalous behaviour which have not been seen 12 before, simply because they are not what the 13 user normally does, as opposed to only being 14 able to identify patterns the system has already seen; 16 * there is immediate result scoring against new, 17 or entirely speculative, fraud patterns and no 18 reliance on updates to the system to cope with 19 the new fraud pattern; * the system has the ability to processes all 21 data records for a customer or user, thereby 22 building an accurate pattern of behaviour, 23 rather than simply looking at high-value calls 24 allowing a level of discrimination not achievable when considering only a portion of a 26 customer's behaviour; 27 * the system is also insensitive to the nature of 28 calls being made, that is, it does not matter 29 whether the calls were to or from a mobile cellular phone or an internet connection, all 31 data is processed and no pre-filtering of call 32 information is necessary; 1 * as more call data for a given account is 2 processed, the accuracy of the system improves 3 since the account signature will become more 4 and more refined; and * cost of ownership is reduced due to reduced 6 hardware costs and real-time processing.
8 Improvements and modifications may be incorporated 9 without departing from the scope of the present invention.
Claims

1. A method for detecting anomalous behaviour comprising the steps of: generating a non-anomalous profile of a plurality of data records; calculating a first probability that one or more new data records belong to the non-anomalous profile; and calculating a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.

2. A method as claimed in claim 1, further comprising the step of generating an anomalous profile of a plurality of anomalous data records and calculating a second probability that one or more new data records belong to an anomalous profile.

3. A method as claimed in claim 2, wherein the likelihood value is based on the second probability as well as the first probability.

4. A method as claimed in any of claims 1 to 3, further comprising the step of comparing the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.

5. A method as claimed in any of claims 1 to 4, wherein the threshold value is calculated by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.

6. A method as claimed in any of claims 1 to 5, wherein the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.

7. A method as claimed in claim 6, wherein the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.

8. A method as claimed in claim 7, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution.

9. A method as claimed in claim 7, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.

10. A method as claimed in any of claims 1 to 9, wherein the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.

11. A method as claimed in claim 10, wherein the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.

12. A method as claimed in claim 11, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution.

13. A method as claimed in claim 11, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.

14. A method as claimed in any of claims 1 to 13, wherein each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.

15. A method as claimed in claim 14, wherein the threshold value is calculated for each user.

16. A method as claimed in any of claims 1 to 15, wherein, for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.

17. A method as claimed in any of claims 1 to 16, wherein the data records are call data records.

18. A method as claimed in claim 17, wherein the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.

19. A method as claimed in any of claims 1 to 18, further comprising the step of generating an alarm to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.
20. An anomalous behaviour detection system comprising: a plurality of data records; a non-anomalous profile generation means enabled to generate a non-anomalous profile from the plurality of data records; a probability calculation means enabled to calculate a first probability that one or more new data records belong to the non-anomalous profile; and a likelihood calculation means enabled to calculate a likelihood value, based on the first probability, that the one or more new data records do not belong to the non-anomalous profile.

21. A system as claimed in claim 20, further comprising an anomalous profile generation means enabled to generate an anomalous profile from a plurality of anomalous data records, and wherein the probability calculation means is further enabled to calculate a second probability that one or more new data records belong to an anomalous profile.

22. A system as claimed in claim 21, wherein the likelihood value is based on the second probability as well as the first probability.

23. A system as claimed in any of claims 20 to 22, further comprising a likelihood comparison means enabled to compare the likelihood value to a predetermined threshold value, wherein the one or more new data records are classified as anomalous if the likelihood value is greater than the threshold value.

24. A system as claimed in any of claims 20 to 23, further comprising a threshold calculation means enabled to calculate the threshold value by simulating data records according to the non-anomalous profile and generating a simulated distribution, the threshold value being taken from the simulated distribution.

25. A system as claimed in any of claims 20 to 24, wherein the non-anomalous profile is a non-anomalous probability distribution of the plurality of data records corresponding to non-anomalous behaviour.

26. A system as claimed in claim 25, wherein the non-anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.

27. A system as claimed in claim 26, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution.

28. A system as claimed in claim 26, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.

29. A system as claimed in any of claims 20 to 28, wherein the anomalous profile is an anomalous probability distribution of the plurality of anomalous data records corresponding to anomalous behaviour.

30. A system as claimed in claim 29, wherein the anomalous probability distribution is generated from a function of a series of feature probability distributions representing characterising features of the data records.

31. A system as claimed in claim 30, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution.

32. A system as claimed in claim 30, wherein the feature probability distributions are derived from a Dirichlet prior probability distribution and the corresponding multinomial likelihood.

33. A system as claimed in any of claims 20 to 32, wherein each of a plurality of users has an associated plurality of data records and the likelihood value is calculated for each user.

34. A system as claimed in claim 33, wherein the threshold value is calculated for each user.

35. A system as claimed in any of claims 20 to 34, wherein, for a new user, the associated plurality of data records is taken from one or more other users and the non-anomalous profile is generated accordingly.

36. A system as claimed in any of claims 20 to 35, wherein the data records are call data records.

37. A system as claimed in claim 36, wherein the characterising features of the call data records are one or more of the following: day of call, time call initiated, destination of call and duration of call.

38. A system as claimed in any of claims 20 to 37, wherein the system further comprises an alarm generation means enabled to alert one or more operators when the one or more new data records have a likelihood value above the threshold value.

39. A computer program product directly loadable into the internal memory of a digital computer comprising software code portions for performing the method of any of claims 1 to 19.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0520789A GB2431255A (en) | 2005-10-13 | 2005-10-13 | Anomalous behaviour detection system |
GB0808625A GB2445142A (en) | 2005-10-13 | 2006-09-21 | Anomalous behaviour detection system |
PCT/GB2006/050299 WO2007042837A1 (en) | 2005-10-13 | 2006-09-21 | Anomalous behaviour detection system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0520789A GB2431255A (en) | 2005-10-13 | 2005-10-13 | Anomalous behaviour detection system |
Publications (2)
Publication Number | Publication Date |
---|---|
GB0520789D0 GB0520789D0 (en) | 2005-11-23 |
GB2431255A true GB2431255A (en) | 2007-04-18 |
Family
ID=35451645
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0520789A Pending GB2431255A (en) | 2005-10-13 | 2005-10-13 | Anomalous behaviour detection system |
GB0808625A Withdrawn GB2445142A (en) | 2005-10-13 | 2006-09-21 | Anomalous behaviour detection system |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0808625A Withdrawn GB2445142A (en) | 2005-10-13 | 2006-09-21 | Anomalous behaviour detection system |
Country Status (2)
Country | Link |
---|---|
GB (2) | GB2431255A (en) |
WO (1) | WO2007042837A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2169992A1 (en) * | 2008-09-30 | 2010-03-31 | Alcatel Lucent | Detection of abnormal behaviour among users of mobile terminals in a telecommunications network |
EP2973141A4 (en) * | 2013-03-15 | 2016-10-26 | Cyberricade Inc | Cyber security |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102567788A (en) * | 2010-12-28 | 2012-07-11 | 中国移动通信集团重庆有限公司 | Real-time identification system and real-time identification method for fraudulent practice in communication services |
US9910882B2 (en) | 2014-12-19 | 2018-03-06 | International Business Machines Corporation | Isolation anomaly quantification through heuristical pattern detection |
US9922071B2 (en) | 2014-12-19 | 2018-03-20 | International Business Machines Corporation | Isolation anomaly quantification through heuristical pattern detection |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5375244A (en) * | 1992-05-29 | 1994-12-20 | At&T Corp. | System and method for granting access to a resource |
US5966650A (en) * | 1995-07-13 | 1999-10-12 | Northern Telecom Limited | Detecting mobile telephone misuse |
US6038555A (en) * | 1997-01-21 | 2000-03-14 | Northern Telecom Limited | Generic processing capability |
US6601048B1 (en) * | 1997-09-12 | 2003-07-29 | Mci Communications Corporation | System and method for detecting and managing fraud |
EP1589716A1 (en) * | 2004-04-20 | 2005-10-26 | Ecole Polytechnique Fédérale de Lausanne (EPFL) | Method of detecting anomalous behaviour in a computer network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5790645A (en) * | 1996-08-01 | 1998-08-04 | Nynex Science & Technology, Inc. | Automatic design of fraud detection systems |
- 2005-10-13 GB GB0520789A patent/GB2431255A/en active Pending
- 2006-09-21 WO PCT/GB2006/050299 patent/WO2007042837A1/en active Application Filing
- 2006-09-21 GB GB0808625A patent/GB2445142A/en not_active Withdrawn
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5375244A (en) * | 1992-05-29 | 1994-12-20 | At&T Corp. | System and method for granting access to a resource |
US20040111305A1 (en) * | 1995-04-21 | 2004-06-10 | Worldcom, Inc. | System and method for detecting and managing fraud |
US5966650A (en) * | 1995-07-13 | 1999-10-12 | Northern Telecom Limited | Detecting mobile telephone misuse |
US6038555A (en) * | 1997-01-21 | 2000-03-14 | Northern Telecom Limited | Generic processing capability |
US6601048B1 (en) * | 1997-09-12 | 2003-07-29 | Mci Communications Corporation | System and method for detecting and managing fraud |
US20050075992A1 (en) * | 1997-09-12 | 2005-04-07 | Mci Worldcom, Inc. | System, method and computer program product for processing event records |
EP1589716A1 (en) * | 2004-04-20 | 2005-10-26 | Ecole Polytechnique Fédérale de Lausanne (EPFL) | Method of detecting anomalous behaviour in a computer network |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2169992A1 (en) * | 2008-09-30 | 2010-03-31 | Alcatel Lucent | Detection of abnormal behaviour among users of mobile terminals in a telecommunications network |
FR2936670A1 (en) * | 2008-09-30 | 2010-04-02 | Alcatel Lucent | DETECTION OF ABNORMARY BEHAVIORS OF MOBILE TERMINAL USERS IN A TELECOMMUNICATIONS NETWORK. |
EP2973141A4 (en) * | 2013-03-15 | 2016-10-26 | Cyberricade Inc | Cyber security |
Also Published As
Publication number | Publication date |
---|---|
WO2007042837A1 (en) | 2007-04-19 |
GB0520789D0 (en) | 2005-11-23 |
GB2445142A (en) | 2008-06-25 |
GB0808625D0 (en) | 2008-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0894378B1 (en) | Signature based fraud detection system | |
US11734692B2 (en) | Data breach detection | |
EP0897566B1 (en) | Monitoring and retraining neural network | |
US6038555A (en) | Generic processing capability | |
Cahill et al. | Detecting fraud in the real world | |
US7403931B2 (en) | Artificial intelligence trending system | |
US20150046332A1 (en) | Behavior tracking smart agents for artificial intelligence fraud protection and management | |
Irarrázaval et al. | Telecom traffic pumping analytics via explainable data science | |
Arafat et al. | Detection of wangiri telecommunication fraud using ensemble learning | |
WO2007042837A1 (en) | Anomalous behaviour detection system | |
US20230344933A1 (en) | Systems and methods for use in blocking of robocall and scam call phone numbers | |
Qayyum et al. | Fraudulent call detection for mobile networks | |
Lopes et al. | Applying user signatures on fraud detection in telecommunications networks | |
Thaddeus et al. | Sim-Boxing Fraud Detection System Using Artificial Neural Network | |
CN119398891A (en) | Intelligent generation method, device and equipment for financial statement and readable storage medium | |
Van Heerden | Detecting fraud in cellular telephone networks | |
Narciso | Debt Analytics: Proactive prediction of debtors in the telecommunications industry | |
Nonyelum | Fraud Detection in Mobile Communications Using Rule-Based and Neural Network System. | |
Girolami | Bayesian Profiling to Identify Fraudulent Telephone Usage | |
KAMARUDDIN | DEVELOPMENT OF A METHOD FOR FRAUD SEVERITY MEASUREMENT BASED ON USAGE PROFILING | |
Parviainen | Fraud detection |