Anomaly detection refers to the technique of identifying unusual patterns and finding outliers in a set of observations. Outliers are data points that differ considerably from the remainder of the dataset, usually extreme values that diverge from the normal or expected behavior.
Historically, statistical methods were applied to find and remove outliers, for example from the tails of a Gaussian distribution. The idea was that outliers resulting from errors (noise, human error, etc.) may lead to misleading interpretations. In addition, by filtering them out, modern supervised learning algorithms can gain in accuracy. On the other hand, anomalies are nowadays often themselves the object of interest, as is the case for "rare events" in physics, medicine, business, or cybersecurity.
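The classic Gaussian-tail rule can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `filter_gaussian_outliers` helper, the sample data, and the z-score cutoff of 2 are all assumptions made for this example.

```python
import statistics

def filter_gaussian_outliers(data, z_thresh=3.0):
    """Keep only points within z_thresh standard deviations of the
    sample mean (the classic Gaussian-tail filtering rule)."""
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / stdev <= z_thresh]

# Hypothetical measurements; 25.0 sits far out in the tail.
values = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]
clean = filter_gaussian_outliers(values, z_thresh=2.0)
```

A tighter `z_thresh` removes more points; the usual textbook choice of 3 only flags very extreme values, so small samples often use a lower cutoff as done here.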
Datasets vary in their nature, but the most typical ones are time series and spatial data. There are three main types of outliers: point outliers, contextual outliers, and collective outliers.
Figure: example of a collective outlier. The red portion of the signal is an outlier because it remained at that value for a significantly longer duration than normal.
Anomalies can be studied in a supervised learning framework as a binary classification problem: training data points are labeled as "normal" or "anomalous", and the model's predictions are compared with the actual classes to check for agreement.
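A minimal sketch of this supervised setting follows. The labeled data, the midpoint cutoff rule, and the `predict` helper are illustrative assumptions for this example, not a standard algorithm.

```python
# Hypothetical labeled training data: (value, label) pairs.
train = [(0.2, "normal"), (0.5, "normal"), (0.4, "normal"),
         (3.1, "anomalous"), (0.3, "normal"), (2.8, "anomalous")]

# A deliberately simple classifier: flag values above a cutoff placed
# halfway between the largest "normal" and smallest "anomalous" value
# seen in training (an illustrative choice only).
max_normal = max(v for v, y in train if y == "normal")
min_anom = min(v for v, y in train if y == "anomalous")
cutoff = (max_normal + min_anom) / 2

def predict(x):
    return "anomalous" if x > cutoff else "normal"

# Compare predictions with the actual classes to check agreement.
accuracy = sum(predict(v) == y for v, y in train) / len(train)
```

In practice the classifier would be a trained model rather than a hand-set threshold, but the evaluation step, comparing predicted against true labels, is the same.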
Anomalies can also be studied in an unsupervised learning setting, where a score function is applied over the whole dataset. Outliers are then determined either by setting a threshold on the scores or by taking the n samples with the largest scores. Both frameworks can be approached with statistical or machine learning methods, as listed below.
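One simple choice of score function, absolute deviation from the median, illustrates both ways of turning scores into outliers. The score function, the data, the threshold of 3.0, and n = 2 are all assumptions for this sketch.

```python
def anomaly_scores(data):
    """Score each point by its absolute deviation from the median
    (one simple score function among many possible choices)."""
    s = sorted(data)
    n = len(s)
    median = (s[n // 2] + s[(n - 1) // 2]) / 2
    return [abs(x - median) for x in data]

data = [1.0, 1.2, 0.9, 8.0, 1.1, -6.0, 1.0]
scores = anomaly_scores(data)

# Option 1: set a threshold on the scores.
outliers_by_threshold = [x for x, s in zip(data, scores) if s > 3.0]

# Option 2: take the n samples with the largest scores.
n = 2
outliers_top_n = [x for _, x in sorted(zip(scores, data), reverse=True)[:n]]
```

The threshold variant needs a domain-specific cutoff, while the top-n variant always flags a fixed number of points; which is preferable depends on whether the expected anomaly rate is known.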
1: Statistical Techniques
2: Machine Learning Techniques