1. What is anomaly detection

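Anomaly detection (also called outlier detection), as the name suggests, identifies data that differ from normal data, that is, data that deviate markedly from expected behavior. Typical problems include credit card fraud, abnormalities in industrial production, and anomalies in network traffic (network intrusion). In all of these, the targets are rare events.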
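To make "data that deviate markedly from expected behavior" concrete, here is a minimal sketch that flags one unusually large value in a list of transaction amounts. The amounts are made up for illustration, and the scoring rule (a modified z-score based on the median and the median absolute deviation, with the conventional constants 0.6745 and 3.5) is just one simple choice, not a method prescribed by this text.

```python
import numpy as np

# Hypothetical transaction amounts; one of them is far larger than the rest.
amounts = np.array([12.5, 9.9, 11.2, 10.4, 13.1, 10.8, 950.0, 12.0, 11.5, 9.7])

# Robust center and spread: the median and the median absolute deviation (MAD).
median = np.median(amounts)
mad = np.median(np.abs(amounts - median))

# Modified z-score; 0.6745 rescales the MAD to be comparable to a standard deviation.
modified_z = 0.6745 * (amounts - median) / mad

# Assumed cutoff: points whose score exceeds 3.5 in absolute value are flagged.
threshold = 3.5
outlier_idx = np.where(np.abs(modified_z) > threshold)[0]

print("flagged indices:", outlier_idx)
print("flagged amounts:", amounts[outlier_idx])
```

A robust score is used here because, with so few points, a single extreme value inflates the ordinary mean and standard deviation enough to hide itself.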

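1.1 Categories of anomalies

I. Overview of Anomaly Detection - Figure 1

1.2 Classification of anomaly detection tasks

I. Overview of Anomaly Detection - Figure 2

1.3 Anomaly detection scenarios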

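Fault detection
IoT anomaly detection
Fraud detection
Industrial anomaly detection
Time series anomaly detection
Video anomaly detection
Log anomaly detection
Medical anomaly detection
Network intrusion detection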
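2. Common anomaly detection methods
A worked mathematical example of the PCA algorithm
PCA (Principal Component Analysis) is a widely used data analysis method. It applies a linear transformation that maps the original data to a representation whose dimensions are linearly uncorrelated. This makes it possible to extract the principal feature components of the data, and it is commonly used for dimensionality reduction of high-dimensional data.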
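The transformation described above can be sketched in a few lines of NumPy: center the data, compute the covariance matrix, take its eigenvectors, and project onto the direction with the largest eigenvalue. The five 2-D points below are a toy dataset chosen for illustration, and dividing by n (rather than n - 1) in the covariance is an assumption made to keep the numbers simple.

```python
import numpy as np

# Toy data (assumed for illustration): 5 samples, 2 features, one sample per row.
X = np.array([[-1.0, -2.0],
              [-1.0,  0.0],
              [ 0.0,  0.0],
              [ 2.0,  1.0],
              [ 0.0,  1.0]])

# Step 1: center each feature so that it has zero mean.
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix of the features (dividing by n).
n_samples = X_centered.shape[0]
cov = X_centered.T @ X_centered / n_samples

# Step 3: eigendecomposition of the symmetric covariance matrix.
# eigh returns eigenvalues in ascending order, so reorder to descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Step 4: project onto the first principal component (2-D -> 1-D).
first_component = eigvecs[:, 0]
projected = X_centered @ first_component

print("covariance matrix:\n", cov)
print("eigenvalues (descending):", eigvals)
print("1-D projection:", projected)
```

The sign of an eigenvector is arbitrary, so the projected values may come out negated on a different platform. The variance of the projection equals the largest eigenvalue, which is why this direction retains the most information.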

Source of the detailed worked example: http://blog.codinglabs.org/articles/pca-tutorial.html

An application example of PCA: https://blog.csdn.net/qq_15111861/article/details/94185363

Similarities and differences between DBSCAN and LOF: http://blog.sina.com.cn/s/blog_ab089a840102ylin.html

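3. Common open-source libraries for anomaly detection

Scikit-learn

Scikit-learn is an open-source machine learning library for Python. It provides a wide range of classification, regression, and clustering algorithms, and it also includes several anomaly detection algorithms, such as LOF and Isolation Forest.
Website: https://scikit-learn.org/stable/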
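As a rough illustration of the two detectors named above, the sketch below runs scikit-learn's `IsolationForest` and `LocalOutlierFactor` on synthetic 2-D data. The data, the five planted outliers, and the parameter values (`contamination=0.05`, `n_neighbors=20`) are assumptions chosen for the demonstration, not recommended settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (assumed): 200 inliers around the origin plus 5 planted outliers.
rng = np.random.RandomState(42)
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -8.0],
                       [-8.0, -9.0], [9.0, 0.0]])
X = np.vstack([X_inliers, X_outliers])

# Isolation Forest: points that are isolated by few random splits score as anomalous.
iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
iso_labels = iso.fit_predict(X)  # +1 = inlier, -1 = outlier

# LOF: compares the local density of each point with that of its neighbors.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
lof_labels = lof.fit_predict(X)  # +1 = inlier, -1 = outlier

print("Isolation Forest labels for the planted outliers:", iso_labels[-5:])
print("LOF labels for the planted outliers:", lof_labels[-5:])
```

Scikit-learn marks outliers with -1 and inliers with +1, so labels need converting before they are compared with a library that uses 1 for outliers and 0 for inliers.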

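PyOD

Python Outlier Detection (PyOD) is currently among the most popular Python toolkits for anomaly detection. Its main highlights include:
Nearly 20 common anomaly detection algorithms, from classics such as LOF, LOCI, and ABOD to more recent deep learning approaches such as generative adversarial networks (GAN), as well as outlier ensembles.
Support for multiple Python versions (2.7 and 3.5+) and multiple operating systems (Windows, macOS, and Linux).
A simple, consistent API: anomaly detection takes only a few lines of code, which makes it easy to evaluate a large number of algorithms.
Optimization with JIT compilation and parallelization, which speeds up the algorithms and improves scalability so that large amounts of data can be processed.
Website: https://pyod.readthedocs.io/en/latest/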

Exercise

Learn the basic operations of the pyod library.

from __future__ import division
from __future__ import print_function
import os
import sys
# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname("__file__"), '..')))
from pyod.models.lof import LOF
from pyod.utils.data import generate_data
from pyod.utils.data import evaluate_print
from pyod.utils.example import visualize
if __name__ == "__main__":
    contamination = 0.1  # proportion of outliers (0.1 = 10%)
    n_train = 200  # number of training points
    n_test = 100  # number of testing points
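    # Generate sample data
    # Note: recent PyOD releases return (X_train, X_test, y_train, y_test);
    # older releases returned (X_train, y_train, X_test, y_test). Check your version.
    X_train, X_test, y_train, y_test = \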
        generate_data(n_train=n_train,
                      n_test=n_test,
                      n_features=2,
                      contamination=contamination,
                      random_state=42)
    # train LOF detector
    clf_name = 'LOF'
    clf = LOF()
    clf.fit(X_train)
    # get the prediction labels and outlier scores of the training data
    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
    y_train_scores = clf.decision_scores_  # raw outlier scores
    # get the prediction on the test data
    y_test_pred = clf.predict(X_test)  # outlier labels (0 or 1)
    y_test_scores = clf.decision_function(X_test)  # outlier scores
    # evaluate and print the results
    print("\nOn Training Data:")
    evaluate_print(clf_name, y_train, y_train_scores)
    print("\nOn Test Data:")
    evaluate_print(clf_name, y_test, y_test_scores)
    # visualize the results
    visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,
              y_test_pred, show_figure=True, save_figure=False)

Illustration of normalization (mean centering)

import numpy as np
import matplotlib.pyplot as plt

# 20 random integers in [1, 100), plotted against x = 1..20
y = np.random.randint(1, 100, 20)
x = np.arange(1, 21)
mean = np.mean(y)

plt.figure(figsize=(10, 5))
data, = plt.plot(x, y)                                  # raw data
meanLine, = plt.plot(x, [mean] * 20, linestyle='--')    # mean of the raw data
translationLine, = plt.plot(x, y - mean)                # data shifted down by its mean
centringLine, = plt.plot(x, [np.mean(y - mean)] * 20)   # mean after centering (about 0)
plt.legend([data, meanLine, translationLine, centringLine],
           ["data", "mean", "y-mean", "centring"], loc='upper left')
plt.show()
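The plot shows mean centering only. As a small numeric companion, the sketch below compares centering with two other common rescalings, min-max normalization and z-score standardization. The four-element array is an assumed toy input.

```python
import numpy as np

# Toy input (assumed for illustration).
y = np.array([2.0, 4.0, 6.0, 8.0])

# Mean centering: subtract the mean, so the result has mean 0.
centered = y - y.mean()

# Min-max normalization: rescale the values into the range [0, 1].
min_max = (y - y.min()) / (y.max() - y.min())

# Z-score standardization: mean 0 and standard deviation 1.
z_score = (y - y.mean()) / y.std()

print("centered:", centered)
print("min-max :", min_max)
print("z-score :", z_score)
```

Centering changes only the location of the data, while the other two also change its scale. Scale matters for distance-based detectors such as LOF, where a feature with a large range can dominate the distance.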
