Java class name: com.alibaba.alink.operator.batch.audio.ExtractMfccFeatureBatchOp
Python class name: ExtractMfccFeatureBatchOp

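Overview

  • Extracts MFCC (mel-frequency cepstral coefficient) features from audio data.
  • Accepts input as an Alink Vector or as a one- or two-dimensional Alink FloatTensor.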

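Usage

Used for acoustic feature extraction. It is typically placed directly after the ReadAudioToTensor component and consumes that component's output column.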

References

[1] Davis S, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, 28(4): 357-366.

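Parameters

| Name | Display name | Description | Type | Required? | Value range | Default |
| --- | --- | --- | --- | --- | --- | --- |
| sampleRate | Sample rate | Sample rate of the audio | Integer | ✓ | | |
| selectedCol | Selected column | Name of the column to compute on | String | ✓ | | |
| hopTime | Hop time | Time interval between adjacent windows, in seconds | Double | | | 0.032 |
| numMfcc | Number of MFCCs | Number of MFCC coefficients to return per window | Integer | | | 128 |
| outputCol | Output column | Name of the output column; optional | String | | | null |
| reservedCols | Reserved columns | Names of the input columns to keep in the output | String[] | | | null |
| windowTime | Window time | Duration of one window, in seconds | Double | | | 0.128 |
| numThreads | Number of threads | Number of threads the component uses | Integer | | | 1 |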
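To make the roles of sampleRate, windowTime, hopTime and numMfcc concrete, here is a minimal, standard-library-only sketch of a textbook MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT). It is not Alink's implementation: the Hann window, the choice of one mel filter per output coefficient, the unnormalized DCT-II and all function names are assumptions made for illustration, and the naive DFT is far too slow for real audio.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, expressed as fractional FFT bin positions
    edges = [mel_to_hz(mel_max * i / (num_filters + 1)) * n_fft / sample_rate
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((k - lo) / (mid - lo), (hi - k) / (hi - mid)))
                     for k in range(n_bins)])
    return bank


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum over the non-negative frequencies."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def dct2(values, num_out):
    """Unnormalized DCT-II, keeping the first num_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(num_out)]


def mfcc(signal, sample_rate, window_time, hop_time, num_mfcc):
    """Return one list of num_mfcc coefficients per window (assumed pipeline)."""
    win = round(window_time * sample_rate)  # samples per window
    hop = round(hop_time * sample_rate)     # samples between window starts
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / win) for i in range(win)]
    # Assumption: use as many mel filters as requested coefficients
    bank = mel_filterbank(num_mfcc, win, sample_rate)
    features = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + win], hann)]
        power = power_spectrum(frame)
        # Floor the filter energies to avoid log(0) on empty filters
        log_mel = [math.log(max(sum(f * p for f, p in zip(row, power)), 1e-10))
                   for row in bank]
        features.append(dct2(log_mel, num_mfcc))
    return features


# Tiny demo: 0.1 s of a 440 Hz sine at 8 kHz, 16 ms windows, 8 ms hop
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
feats = mfcc(signal, sample_rate=8000, window_time=0.016, hop_time=0.008, num_mfcc=8)
print(len(feats), len(feats[0]))
```

The result has one row per window and numMfcc values per row. A shorter hopTime yields more rows, and windowTime together with sampleRate fixes how many samples each row summarizes.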

Code examples

Python code

The following code is for illustration only. You may need to modify it or configure your environment before it runs.

    from pyalink.alink import *
    import pandas as pd

    useLocalEnv(1)

    # Root location of the sample audio files
    dataDir = "https://alink-test-data.oss-cn-hangzhou.aliyuncs.com/audio"
    # One row per file, with paths relative to dataDir
    df = pd.DataFrame([
        ["246.wav"],
        ["247.wav"]
    ])
    allFiles = BatchOperator.fromDataframe(df, schemaStr='wav_file_path string')

    SAMPLE_RATE = 16000

    # Read each audio file into a tensor column
    readOp = ReadAudioToTensorBatchOp().setRootFilePath(dataDir) \
        .setSampleRate(SAMPLE_RATE) \
        .setRelativeFilePathCol("wav_file_path") \
        .setOutputCol("tensor") \
        .linkFrom(allFiles)

    # Extract MFCC features from the tensor column, using the default window and hop
    mfccOp = ExtractMfccFeatureBatchOp() \
        .setSampleRate(SAMPLE_RATE) \
        .setSelectedCol("tensor") \
        .linkFrom(readOp)

    mfccOp.print()

Java code

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.audio.ExtractMfccFeatureBatchOp;
    // Assumed to live in the same package as ExtractMfccFeatureBatchOp
    import com.alibaba.alink.operator.batch.audio.ReadAudioToTensorBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import com.alibaba.alink.testutil.AlinkTestBase;
    import org.junit.Test;

    public class ExtractMfccFeatureBatchOpTest extends AlinkTestBase {

        @Test
        public void testExtractMfccFeatureBatchOp() throws Exception {
            String dataDir = "https://alink-test-data.oss-cn-hangzhou.aliyuncs.com/audio";
            String[] allFiles = {"246.wav", "247.wav"};
            int sampleRate = 16000;
            String tensorName = "tensor";
            String mfccName = "mfcc";
            String wavFile = "wav_file_path";

            BatchOperator<?> source = new MemSourceBatchOp(allFiles, wavFile)
                // Read each audio file into a tensor column
                .link(new ReadAudioToTensorBatchOp()
                    .setRootFilePath(dataDir)
                    .setSampleRate(sampleRate)
                    .setRelativeFilePathCol(wavFile)
                    .setDuration(2)
                    .setOutputCol(tensorName)
                )
                // Extract 26 MFCC coefficients per window
                .link(new ExtractMfccFeatureBatchOp()
                    .setSelectedCol(tensorName)
                    .setSampleRate(sampleRate)
                    .setWindowTime(0.128)
                    .setHopTime(0.032)
                    .setNumMfcc(26)
                    .setOutputCol(mfccName))
                .select(new String[] {wavFile, mfccName})
                .print();
        }
    }

Output

| wav_file_path | mfcc |
| --- | --- |
| 246.wav | FLOAT#59,26,1#48.78127 -32.02646 12.432438 … |
| 247.wav | FLOAT#59,26,1#-50.62911 -13.844937 24.176699 … |
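The tensor header `FLOAT#59,26,1` can be sanity-checked with a little arithmetic. The sketch below rests on several assumptions that this page does not state:

- `setDuration(2)` in the Java example means 2 seconds of audio.
- windowTime and hopTime are given in seconds.
- Windows are taken without padding.
- The three dimensions are (windows, coefficients, channels).

`num_frames` is a hypothetical helper, not an Alink function.

```python
def num_frames(sample_rate, duration, window_time, hop_time):
    """Count full windows that fit into the signal when no padding is applied."""
    total = round(duration * sample_rate)   # samples in the clip
    win = round(window_time * sample_rate)  # samples per window
    hop = round(hop_time * sample_rate)     # samples between window starts
    if total < win:
        return 0
    return (total - win) // hop + 1


# Settings from the Java example: 16 kHz, 2 s clip, 0.128 s window, 0.032 s hop
print(num_frames(16000, 2, 0.128, 0.032))  # 59
```

Under these assumptions the 2048-sample window and 512-sample hop give 59 windows, which matches the first dimension of the output. The second dimension matches `setNumMfcc(26)`.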