Java class name: com.alibaba.alink.operator.batch.outlier.OcsvmOutlierBatchOp
Python class name: OcsvmOutlierBatchOp

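Description

Unlike a traditional SVM, one-class SVM is an unsupervised learning algorithm that is frequently used for outlier detection. Its training set contains only one class of data (positive or negative) and no samples, or very few, of the other class; samples of that other class are usually called outliers. What the algorithm learns is a boundary around the training data rather than a maximum margin, and it uses that boundary to predict whether a point is an outlier.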
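For readers who want the boundary spelled out, the block below is a sketch of the standard one-class SVM formulation from the literature. The source does not state which formulation Alink implements, so treat the symbols ($\Phi$, $\xi_i$, $\rho$, $\alpha_i$) as assumptions made for illustration. Here $n$ is the number of training samples, $K$ is the kernel, and $\nu$ corresponds to the `nu` parameter.

```latex
\min_{w,\,\xi,\,\rho}\quad \frac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\qquad\text{s.t.}\quad
  w\cdot\Phi(x_i) \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

f(x) = \operatorname{sgn}\!\Big(\sum_{i=1}^{n}\alpha_i\,K(x_i, x) - \rho\Big)
```

In this formulation a point with $f(x) = -1$ falls outside the learned boundary and is treated as an outlier. $\nu$ is an upper bound on the fraction of training points outside the boundary and a lower bound on the fraction of support vectors.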

Parameters

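| Name | Display name | Description | Type | Required? | Value range | Default |
| --- | --- | --- | --- | --- | --- | --- |
| predictionCol | Prediction result column | Name of the prediction result column | String | ✓ | | |
| coef0 | Kernel parameter coef0 | Kernel function parameter; only takes effect for the POLY and SIGMOID kernels. | Double | | | 0.0 |
| degree | Polynomial degree | Degree of the polynomial kernel | Integer | | [1, +inf) | 2 |
| epsilon | Convergence threshold | Termination threshold of the iterative solver | Double | | [0.0, +inf) | 1.0E-6 |
| featureCols | Feature column names | Array of feature column names; all columns are selected by default | String[] | | Column types must be one of [BIGDECIMAL, BIGINTEGER, BYTE, DOUBLE, FLOAT, INTEGER, LONG, SHORT] | null |
| gamma | Kernel parameter gamma | Kernel function parameter; only takes effect for the RBF, POLY and SIGMOID kernels. If not set, it defaults to 1/d, where d is the feature dimension. | Double | | | -1.0 |
| groupCols | Group column names | Names of the grouping columns (multiple allowed); optional, none selected by default | String[] | | | null |
| kernelType | Kernel type | Type of kernel function | String | | "RBF", "POLY", "SIGMOID", "LINEAR" | "RBF" |
| maxOutlierNumPerGroup | Max outliers per group | Maximum number of outliers per group | Integer | | | |
| maxOutlierRatio | Max outlier ratio | Maximum ratio of outliers the algorithm reports | Double | | | |
| maxSampleNumPerGroup | Max samples per group | Maximum number of samples per group | Integer | | | |
| nu | Outlier ratio upper bound nu | Must lie in (0,1); the value is positively correlated with the number of support vectors. | Double | | | 0.01 |
| outlierThreshold | Outlier score threshold | A sample is considered an outlier only if its score is greater than this threshold | Double | | | |
| predictionDetailCol | Prediction detail column | Name of the prediction detail column | String | | | |
| tensorCol | Tensor column | Name of the tensor column | String | | Column type must be one of [BOOL_TENSOR, BYTE_TENSOR, DOUBLE_TENSOR, FLOAT_TENSOR, INT_TENSOR, LONG_TENSOR, STRING, STRING_TENSOR, TENSOR, UBYTE_TENSOR] | null |
| vectorCol | Vector column | Name of the vector column | String | | Column type must be one of [DENSE_VECTOR, SPARSE_VECTOR, STRING, VECTOR] | null |
| numThreads | Number of threads | Number of threads used by the component | Integer | | | 1 |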
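To make the roles of `kernelType`, `gamma`, `coef0` and `degree` concrete, here is a minimal pure-Python sketch of the four kernels. It assumes the common LIBSVM-style definitions; whether Alink's implementation matches them term for term is an assumption, and `kernel` is a hypothetical helper, not part of the Alink API.

```python
import math


def kernel(u, v, kernel_type="RBF", gamma=None, coef0=0.0, degree=2):
    """Evaluate one kernel value between two equal-length feature vectors.

    Hypothetical helper for illustration only (LIBSVM-style definitions).
    """
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    if gamma is None:
        # Mirrors the documented default: 1/d, where d is the feature dimension.
        gamma = 1.0 / len(u)

    dot = sum(a * b for a, b in zip(u, v))

    if kernel_type == "LINEAR":
        # gamma, coef0 and degree are ignored.
        return dot
    if kernel_type == "POLY":
        return (gamma * dot + coef0) ** degree
    if kernel_type == "RBF":
        # coef0 and degree are ignored.
        squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-gamma * squared_distance)
    if kernel_type == "SIGMOID":
        return math.tanh(gamma * dot + coef0)
    raise ValueError("unknown kernel type: " + kernel_type)
```

With `gamma=None`, four-feature input such as the example data below would use gamma = 0.25 under this sketch; the examples set `gamma` to 0.5 explicitly instead.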

Code examples

Python code

    from pyalink.alink import *
    import pandas as pd

    # Run Alink in a local environment with parallelism 1.
    useLocalEnv(1)

    df = pd.DataFrame([
        [0.730967787376657, 0.24053641567148587, 0.6374174253501083, 0.5504370051176339],
        [0.7308781907032909, 0.41008081149220166, 0.20771484130971707, 0.3327170559595112],
        [0.7311469360199058, 0.9014476240300544, 0.49682259343089075, 0.9858769332362016],
        [0.731057369148862, 0.07099203475193139, 0.06712000939049956, 0.768156984078079],
        [0.7306094602878371, 0.9187140138555101, 0.9186071189908658, 0.6795571637816596],
        [0.730519863614471, 0.08825840967622589, 0.4889045498516358, 0.461837214623537],
        [0.7307886238322471, 0.5796252073129174, 0.7780122870716483, 0.11499709190022733],
        [0.7306990420600421, 0.7491696031336331, 0.34830970303125697, 0.8972771427421047],
    ])

    # load data
    data = BatchOperator.fromDataframe(df, schemaStr="x1 double, x2 double, x3 double, x4 double")

    # Detect outliers with an RBF-kernel one-class SVM and print the result.
    OcsvmOutlierBatchOp() \
        .setFeatureCols(["x1", "x2", "x3", "x4"]) \
        .setGamma(0.5) \
        .setNu(0.1) \
        .setKernelType("RBF") \
        .setPredictionCol("pred") \
        .linkFrom(data) \
        .print()

Java code

    package com.alibaba.alink.operator.batch.outlier;

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import org.junit.Test;

    public class OcsvmBatchOpTest {

        @Test
        public void testOutlier() throws Exception {
            // Build an in-memory source with four numeric feature columns.
            BatchOperator<?> data = new MemSourceBatchOp(
                new Object[][] {
                    {0.730967787376657, 0.24053641567148587, 0.6374174253501083, 0.5504370051176339},
                    {0.7308781907032909, 0.41008081149220166, 0.20771484130971707, 0.3327170559595112},
                    {0.7311469360199058, 0.9014476240300544, 0.49682259343089075, 0.9858769332362016},
                    {0.731057369148862, 0.07099203475193139, 0.06712000939049956, 0.768156984078079},
                    {0.7306094602878371, 0.9187140138555101, 0.9186071189908658, 0.6795571637816596},
                    {0.730519863614471, 0.08825840967622589, 0.4889045498516358, 0.461837214623537},
                    {0.7307886238322471, 0.5796252073129174, 0.7780122870716483, 0.11499709190022733},
                    {0.7306990420600421, 0.7491696031336331, 0.34830970303125697, 0.8972771427421047}
                },
                new String[] {"x1", "x2", "x3", "x4"});

            // Detect outliers with an RBF-kernel one-class SVM and print the result.
            new OcsvmOutlierBatchOp()
                .setFeatureCols("x1", "x2", "x3", "x4")
                .setGamma(0.5)
                .setNu(0.2)
                .setKernelType("RBF")
                .setPredictionCol("pred")
                .linkFrom(data)
                .print();
        }
    }

Output

| x1 | x2 | x3 | x4 | pred |
| --- | --- | --- | --- | --- |
| 0.7310 | 0.2405 | 0.6374 | 0.5504 | false |
| 0.7309 | 0.4101 | 0.2077 | 0.3327 | false |
| 0.7311 | 0.9014 | 0.4968 | 0.9859 | false |
| 0.7311 | 0.0710 | 0.0671 | 0.7682 | false |
| 0.7306 | 0.9187 | 0.9186 | 0.6796 | true |
| 0.7305 | 0.0883 | 0.4889 | 0.4618 | false |
| 0.7308 | 0.5796 | 0.7780 | 0.1150 | false |
| 0.7307 | 0.7492 | 0.3483 | 0.8973 | false |