# Anomaly Detection in the Number of Phone Calls

| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|----------------|-------------|------------|-------------|------------|-------------------------------|---------------------------------|-------------------------------------------------|
| v5.2 | Dynamic API | Up-to-date | Console app | .csv files | Phone Calls Anomaly Detection | Time Series - Anomaly Detection | SrCnnEntire Anomaly Detection, Period Detection |

In this introductory sample, you'll see how to use [ML.NET](https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet) to detect **anomalies** in a time series of daily phone call counts. In machine learning, this type of task is called time-series anomaly detection.

## Problem
We have data on the number of phone calls over 10 weeks at daily granularity. The series has a periodic (weekly) pattern: call volume is high on weekdays and low on weekends. We want to find the points that fall outside this regular pattern.

To solve this problem, we will build an ML model that takes as inputs:
* Date
* Number of calls

and outputs the anomalies in the number of calls.

## Dataset
We have created a sample dataset of daily call counts. The dataset `phone_calls.csv` can be found [here](./SrCnnEntireDetection/Data/phone_calls.csv).

The **Phone Calls dataset** has the following format:

| timestamp | value |
|-----------|-------------|
| 2018/9/3 | 36.69670857 |
| 2018/9/4 | 35.74160571 |
| ... | ... |
| 2018/10/3 | 34.49893429 |
| ... | ... |

The data in the Phone Calls dataset was collected from real-world transactions and then normalized and rescaled.
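
The sample works with this data as an ML.NET `IDataView` (`dataView` in the steps below). Purely to illustrate the row format, here is a minimal plain C# sketch, without ML.NET, that parses the rows shown in the table above. `ParseRow` is a hypothetical helper, and the `yyyy/M/d` pattern is an assumption inferred from those rows, where month and day are not zero-padded.

```CSharp
using System;
using System.Globalization;
using System.Linq;

// Rows copied from the format table above, in "timestamp,value" form.
string[] csvLines =
{
    "2018/9/3,36.69670857",
    "2018/9/4,35.74160571",
    "2018/10/3,34.49893429",
};

var rows = csvLines.Select(ParseRow).ToArray();
foreach (var (day, count) in rows)
    Console.WriteLine($"{day:yyyy-MM-dd} ({day.DayOfWeek}): {count}");

// Hypothetical helper: parses one "yyyy/M/d,value" line (month and day are not zero-padded).
static (DateTime Date, double Value) ParseRow(string line)
{
    string[] parts = line.Split(',');
    DateTime date = DateTime.ParseExact(parts[0], "yyyy/M/d", CultureInfo.InvariantCulture);
    double value = double.Parse(parts[1], CultureInfo.InvariantCulture);
    return (date, value);
}
```

Printing the day of the week shows that the first row in the table (2018/9/3) falls on a Monday. This lines up with the weekday/weekend pattern described in the Problem section.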

## ML task - Time Series Anomaly Detection
Anomaly detection is the process of detecting outliers in data. In a time series, it means finding the timestamps, or points of a given input series, at which the series behaves differently from what was expected. These deviations typically indicate events of interest in the problem domain: a cyber-attack on user accounts, a power outage, a burst of requests per second (RPS) on a server, a memory leak, and so on.

## Solution
The solution has two stages. First, we determine the period of the series. Second, we remove the periodic component and apply anomaly detection to what remains, the residual part of the series. In ML.NET, the `DetectSeasonality` function finds the period of a given series. Given the period, the STL algorithm decomposes the time series into three components, `Y = T + S + R`, where `Y` is the original series, `T` is the trend component, `S` is the seasonal component, and `R` is the residual component (refer to [this](http://www.nniiem.ru/file/news/2016/stl-statistical-model.pdf) paper for more details on STL). Then, the SR-CNN detector is applied to `R` to capture the anomalies (refer to [this](https://arxiv.org/pdf/1906.03821.pdf) paper for more details on SR-CNN).
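
To make `Y = T + S + R` concrete, here is a deliberately simplified additive decomposition in plain C#. It is **not** STL. It assumes the period is already known, uses a constant trend (the overall median) and a per-phase median for the seasonal part, and runs on a hypothetical toy weekly series with one injected spike. `Decompose` and `Median` are illustrative helpers, not ML.NET APIs.

```CSharp
using System;
using System.Linq;

// Hypothetical toy series (not the sample dataset): three weeks starting on a Monday,
// weekdays at 30, weekends at 5, and one injected spike at index 10.
double[] y =
{
    30, 30, 30, 30, 30, 5, 5,
    30, 30, 30, 45, 30, 5, 5,
    30, 30, 30, 30, 30, 5, 5,
};
const int period = 7;

var (trend, seasonal, residual) = Decompose(y, period);

// The point that the trend and seasonal parts explain worst.
int worst = Enumerable.Range(0, y.Length).OrderByDescending(i => Math.Abs(residual[i])).First();
Console.WriteLine($"Largest |R| at index {worst}: R = {residual[worst]}");

// Simplified additive decomposition Y = T + S + R for a known period p.
// Not STL: the trend is a constant and the seasonal part is a per-phase median.
static (double[] T, double[] S, double[] R) Decompose(double[] series, int p)
{
    int n = series.Length;

    // T: a constant level (the overall median); STL would instead fit a smooth,
    // time-varying trend with LOESS.
    double level = Median(series);
    double[] t = Enumerable.Repeat(level, n).ToArray();

    // S: the median of the detrended values at each phase (i % p).
    var phaseMedian = new double[p];
    for (int phase = 0; phase < p; phase++)
        phaseMedian[phase] = Median(Enumerable.Range(0, n).Where(i => i % p == phase)
                                              .Select(i => series[i] - level).ToArray());
    double[] s = Enumerable.Range(0, n).Select(i => phaseMedian[i % p]).ToArray();

    // R: whatever the trend and seasonal parts do not explain.
    double[] r = Enumerable.Range(0, n).Select(i => series[i] - t[i] - s[i]).ToArray();
    return (t, s, r);
}

static double Median(double[] xs)
{
    double[] sorted = xs.OrderBy(v => v).ToArray();
    int m = sorted.Length / 2;
    return sorted.Length % 2 == 1 ? sorted[m] : (sorted[m - 1] + sorted[m]) / 2.0;
}
```

The weekly shape is absorbed by `S`, so the residual is zero everywhere except at the spike. This is why detecting on `R` rather than on `Y` keeps the regular weekend dips from being mistaken for anomalies.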

ML.NET makes this process simple, as this sample shows.

### 1. Detect Period

In the first step, we invoke the `DetectSeasonality` function to obtain the period.

```CSharp
// Estimate the period (seasonality) of the values in the input column.
// DetectSeasonality returns -1 when no seasonality is detected.
int period = mlContext.AnomalyDetection.DetectSeasonality(dataView, inputColumnName);
```
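
For intuition about what period detection looks for, here is a simple autocorrelation-based estimator in plain C#. This is a swapped-in textbook technique, **not** how `DetectSeasonality` is implemented. The `EstimatePeriod` helper, its lag range, and the toy series are all assumptions. To mirror `DetectSeasonality`, it returns -1 when it finds no periodic structure.

```CSharp
using System;
using System.Linq;

// Hypothetical toy series: three weeks of weekday/weekend call volumes.
double[] calls =
{
    30, 30, 30, 30, 30, 5, 5,
    30, 30, 30, 30, 30, 5, 5,
    30, 30, 30, 30, 30, 5, 5,
};

int detected = EstimatePeriod(calls, maxLag: calls.Length / 2);
Console.WriteLine($"Estimated period: {detected}");

// Returns the lag in [2, maxLag] with the highest positive sample autocorrelation,
// or -1 if there is none (e.g. a constant series).
static int EstimatePeriod(double[] series, int maxLag)
{
    double mean = series.Average();
    double sumSq = series.Sum(v => (v - mean) * (v - mean));
    if (sumSq == 0)
        return -1;

    int bestLag = -1;
    double bestAcf = 0;
    for (int lag = 2; lag <= maxLag; lag++)
    {
        // Sample autocorrelation at this lag, normalised by the lag-0 sum of squares.
        double acf = 0;
        for (int i = 0; i + lag < series.Length; i++)
            acf += (series[i] - mean) * (series[i + lag] - mean);
        acf /= sumSq;

        if (acf > bestAcf)
        {
            bestAcf = acf;
            bestLag = lag;
        }
    }
    return bestLag;
}
```

On three weeks of a weekday/weekend pattern, lag 7 wins because every pair of points it compares falls on the same day of the week.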

### 2. Detect Anomaly

First, we specify the parameters of the SrCnnEntire detector (see the [`DetectEntireAnomalyBySrCnn` documentation](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.timeseriescatalog.detectentireanomalybysrcnn?view=ml-dotnet#Microsoft_ML_TimeSeriesCatalog_DetectEntireAnomalyBySrCnn_Microsoft_ML_AnomalyDetectionCatalog_Microsoft_ML_IDataView_System_String_System_String_System_Double_System_Int32_System_Double_Microsoft_ML_TimeSeries_SrCnnDetectMode_) for details on each parameter). Then, we invoke the detector and obtain a view of the output data.

```CSharp
// SrCnnEntireAnomalyDetectorOptions and SrCnnDetectMode are in the Microsoft.ML.TimeSeries namespace.
var options = new SrCnnEntireAnomalyDetectorOptions()
{
    // Points whose score exceeds this threshold are flagged as anomalies.
    Threshold = 0.3,
    // Used in AnomalyAndMargin mode to determine the range of the boundaries.
    Sensitivity = 64.0,
    // AnomalyAndMargin also outputs the expected value and the upper/lower boundaries.
    DetectMode = SrCnnDetectMode.AnomalyAndMargin,
    // The period found in step 1.
    Period = period,
};
var outputDataView = mlContext.AnomalyDetection.DetectEntireAnomalyBySrCnn(dataView, outputColumnName, inputColumnName, options);
```
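
SR-CNN builds on the spectral residual (SR) transform described in the SR-CNN paper. The plain C# sketch below computes only the SR saliency map. It takes the log amplitude of the Fourier spectrum, subtracts its local average, and transforms back using the original phase. It relies on a naive O(n²) DFT, a circular moving average of assumed odd width `q`, and a hypothetical toy series with one spike. The CNN part of SR-CNN and any extra steps ML.NET performs are omitted, so treat this as an illustration, not a reimplementation of `DetectEntireAnomalyBySrCnn`.

```CSharp
using System;
using System.Linq;
using System.Numerics;

// Hypothetical toy series: constant level 10 with one spike at index 12.
double[] spiky = Enumerable.Repeat(10.0, 24).ToArray();
spiky[12] = 20.0;

double[] saliency = SpectralResidual(spiky, q: 3);
int peak = Array.IndexOf(saliency, saliency.Max());
Console.WriteLine($"Saliency peaks at index {peak}");

// Spectral residual saliency map (SR step only; no CNN, no extra preprocessing).
static double[] SpectralResidual(double[] signal, int q)
{
    int n = signal.Length;
    Complex[] spectrum = Dft(signal.Select(v => new Complex(v, 0)).ToArray(), inverse: false);

    // Log amplitude spectrum; the floor avoids log(0).
    double[] logAmp = spectrum.Select(c => Math.Log(Math.Max(c.Magnitude, 1e-8))).ToArray();

    var residualSpectrum = new Complex[n];
    for (int f = 0; f < n; f++)
    {
        // Circular moving average of width q (assumed odd) centred on frequency f.
        double avg = 0;
        for (int j = -(q / 2); j <= q / 2; j++)
            avg += logAmp[((f + j) % n + n) % n];
        avg /= q;

        // Spectral residual = log amplitude minus its local average,
        // recombined with the original phase.
        residualSpectrum[f] = Complex.FromPolarCoordinates(Math.Exp(logAmp[f] - avg), spectrum[f].Phase);
    }

    // Saliency = magnitude of the inverse transform.
    return Dft(residualSpectrum, inverse: true).Select(c => c.Magnitude).ToArray();
}

// Naive O(n^2) discrete Fourier transform; the inverse includes the 1/n factor.
static Complex[] Dft(Complex[] x, bool inverse)
{
    int n = x.Length;
    var result = new Complex[n];
    double sign = inverse ? 1.0 : -1.0;
    for (int k = 0; k < n; k++)
    {
        Complex sum = Complex.Zero;
        for (int t = 0; t < n; t++)
            sum += x[t] * Complex.FromPolarCoordinates(1.0, sign * 2.0 * Math.PI * k * t / n);
        result[k] = inverse ? sum / (double)n : sum;
    }
    return result;
}
```

A lone spike on a constant level has a flat amplitude spectrum apart from frequency 0. Subtracting the local average removes that flat part, and the inverse transform of what remains (unit magnitude, original phase) concentrates back at the spike's position.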

### 3. Consume results
The results can be retrieved by enumerating the output data view. In `AnomalyAndMargin` mode, each row's output is a vector of seven values: `Anomaly` (1 for an anomaly, 0 otherwise), `AnomalyScore`, `Mag`, `ExpectedValue`, `BoundaryUnit`, `UpperBoundary` and `LowerBoundary`. Of these, `Anomaly`, `ExpectedValue`, `UpperBoundary` and `LowerBoundary` are the most useful.

```CSharp
// Get the detection results as an IEnumerable.
// PhoneCallsPrediction is the sample's output class; its Prediction vector holds the seven values listed above.
var predictions = mlContext.Data.CreateEnumerable<PhoneCallsPrediction>(
    outputDataView, reuseRowObject: false);

Console.WriteLine("Anomaly detection results obtained.");
var index = 0;

Console.WriteLine("Index,Anomaly,AnomalyScore,Mag,ExpectedValue,BoundaryUnit,UpperBoundary,LowerBoundary");
foreach (var p in predictions)
{
    // The row index followed by the seven output values, comma-separated.
    var line = string.Format("{0},{1},{2},{3},{4},{5},{6},{7}", index,
        p.Prediction[0], p.Prediction[1], p.Prediction[2], p.Prediction[3], p.Prediction[4], p.Prediction[5], p.Prediction[6]);

    // Prediction[0] is 1 when the point is detected as an anomaly.
    if (p.Prediction[0] == 1)
    {
        line += " <-- alert is on, detected anomaly";
    }

    Console.WriteLine(line);
    ++index;
}

//Index,Anomaly,AnomalyScore,Mag,ExpectedValue,BoundaryUnit,UpperBoundary,LowerBoundary
//0,0,0,0.012431224740909462,36.841787256739266,32.92296779138513,41.14206982401966,32.541504689458876
//1,0,0,0.06732467206114204,35.67303618137362,32.92296779138513,39.97331874865401,31.372753614093227
//2,0,0,0.053027383620274836,34.710132999891826,33.06901172138514,39.029491313022824,30.390774686760828
//3,0,0,0.027326808903921952,33.44765248883495,33.215055651385136,37.786086547816545,29.10921842985335
//4,0,0,0.0074169435448767015,28.937110922276364,33.06901172138514,33.25646923540736,24.61775260914537
//5,0,0,0.01068288760963436,5.143895892785781,32.92296779138513,9.444178460066171,0.843613325505391
//6,0,0,0.02901575691006479,5.163325228419392,32.92296779138513,9.463607795699783,0.8630426611390014
//7,0,0,0.015220262187074987,36.76414836240396,32.92296779138513,41.06443092968435,32.46386579512357
//8,0,0,0.029223955855920452,35.77908590657007,32.92296779138513,40.07936847385046,31.478803339289676
//9,0,0,0.05014588266429284,34.547259536635245,32.92296779138513,38.847542103915636,30.246976969354854
//10,0,0,0.006478629327524482,33.55193524820608,33.06901172138514,37.871293561337076,29.23257693507508
//11,0,0,0.0144699438892775,29.091800129624648,32.92296779138513,33.392082696905035,24.79151756234426
//12,0,0,0.00941397738418861,5.154836630338823,32.92296779138513,9.455119197619213,0.8545540630584334
//13,0,0,0.01012680059746895,5.234332502492464,32.92296779138513,9.534615069772855,0.934049935212073
//14,0,0,0.0391359937506989,36.54992549471526,32.92296779138513,40.85020806199565,32.24964292743487
//15,0,0,0.01879091709088552,35.79526470980883,32.92296779138513,40.095547277089224,31.494982142528443
//16,0,0,0.04275209137629126,34.34099013096804,32.92296779138513,38.64127269824843,30.040707563687647
//17,0,0,0.024479312458949517,33.61201516582131,32.92296779138513,37.9122977331017,29.31173259854092
//18,0,0,0.010781906482188448,29.223563320561812,32.92296779138513,33.5238458878422,24.923280753281425
//19,0,0,0.006907498717766534,5.170512168851533,32.92296779138513,9.470794736131923,0.8702296015711433
//20,0,0,0.003183991678813579,5.2614938889462834,32.92296779138513,9.561776456226674,0.9612113216658926
//21,0,0,0.04256581040333137,36.37103858487317,32.92296779138513,40.67132115215356,32.07075601759278
//22,0,0,0.022860533704528126,35.813544599026855,32.92296779138513,40.113827166307246,31.513262031746464
//23,0,0,0.019266922707912835,34.05600492733225,32.92296779138513,38.356287494612644,29.755722360051863
//24,0,0,0.008008656062259012,33.65828319077884,32.92296779138513,37.95856575805923,29.358000623498448
//25,0,0,0.018746201354033914,29.381125690882463,32.92296779138513,33.681408258162854,25.080843123602072
//26,0,0,0.0141022037992637,5.261543539820418,32.92296779138513,9.561826107100808,0.9612609725400283
//27,0,0,0.013396001938040617,5.4873712582971805,32.92296779138513,9.787653825577571,1.1870886910167897
//28,1,0.4971326063712256,0.3521692757832201,36.504694001629254,32.92296779138513,40.804976568909645,32.20441143434886 <-- alert is on, detected anomaly
```
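
Positional accesses such as `p.Prediction[3]` are easy to get wrong. Here is a minimal sketch, assuming the column order printed by the sample above, that unpacks the seven values into a named tuple. `Unpack` is a hypothetical helper, and the input is row 28 of the recorded output with its index column dropped.

```CSharp
using System;

// Row 28 of the recorded output above (index column dropped): the seven Prediction values.
double[] row28 =
{
    1, 0.4971326063712256, 0.3521692757832201, 36.504694001629254,
    32.92296779138513, 40.804976568909645, 32.20441143434886,
};

var unpacked = Unpack(row28);
Console.WriteLine($"Anomaly: {unpacked.IsAnomaly}, expected {unpacked.ExpectedValue}, " +
                  $"boundaries [{unpacked.LowerBoundary}, {unpacked.UpperBoundary}]");

// Hypothetical helper: names the AnomalyAndMargin output vector using the column order printed by the sample.
static (bool IsAnomaly, double AnomalyScore, double Mag, double ExpectedValue,
        double BoundaryUnit, double UpperBoundary, double LowerBoundary) Unpack(double[] prediction)
{
    if (prediction.Length != 7)
        throw new ArgumentException("Expected a 7-element vector.", nameof(prediction));
    return (prediction[0] == 1, prediction[1], prediction[2], prediction[3],
            prediction[4], prediction[5], prediction[6]);
}
```

Inside the loop of step 3, `Unpack(p.Prediction)` could replace the positional `p.Prediction[i]` accesses.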