Standard workflow for NLP training and deployment

Code examples
Data download
if [ ! -f ./nlu_train.csv ]; then
  wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/knowledge_nlu/nlu_train.csv
fi
if [ ! -f ./nlu_dev.csv ]; then
  wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/knowledge_nlu/nlu_dev.csv
fi
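The training commands in this section declare three input columns (label, text1, text2) through --input_schema. A quick check of the downloaded files against that column count can catch a truncated or malformed download early. The sketch below is not part of EasyNLP: count_bad_rows is a hypothetical helper, and it assumes the fields are tab-separated, which the source does not state.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an EasyNLP command): print how many rows of FILE
# do not have exactly EXPECTED fields.
# ASSUMPTION: fields are tab-separated; change -F if the files use another delimiter.
count_bad_rows() {
  local file="$1" expected="$2"
  awk -F'\t' -v n="$expected" 'NF != n { bad++ } END { print bad + 0 }' "$file"
}

# Check each downloaded file if it exists; 3 columns = label, text1, text2.
for f in nlu_train.csv nlu_dev.csv; do
  if [ -f "$f" ]; then
    echo "$f: $(count_bad_rows "$f" 3) rows without exactly 3 columns"
  fi
done
```

A result of 0 for both files suggests the data matches the declared schema under the delimiter assumption; a nonzero count is a hint to inspect the file before training.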
Standard fine-tuning test script
easynlp \
  --mode=train \
  --worker_count=1 \
  --worker_gpu=1 \
  --tables=nlu_train.csv,nlu_dev.csv \
  --input_schema=label:str:1,text1:str:1,text2:str:1 \
  --first_sequence=text1 \
  --second_sequence=text2 \
  --label_name=label \
  --label_enumerate_values=0,1 \
  --checkpoint_dir=./base_model \
  --learning_rate=3e-5 \
  --epoch_num=3 \
  --random_seed=42 \
  --save_checkpoint_steps=200 \
  --sequence_length=128 \
  --micro_batch_size=32 \
  --app_name=text_match \
  --user_defined_parameters="pretrain_model_name_or_path=bert-base-chinese"
Knowledge-enhanced pre-training test script
easynlp \
  --mode=train \
  --worker_count=1 \
  --worker_gpu=1 \
  --tables=nlu_train.csv,nlu_dev.csv \
  --input_schema=label:str:1,text1:str:1,text2:str:1 \
  --first_sequence=text1 \
  --second_sequence=text2 \
  --label_name=label \
  --label_enumerate_values=0,1 \
  --checkpoint_dir=./dkplm_model \
  --learning_rate=3e-5 \
  --epoch_num=3 \
  --random_seed=42 \
  --save_checkpoint_steps=200 \
  --sequence_length=128 \
  --micro_batch_size=32 \
  --app_name=text_match \
  --user_defined_parameters="pretrain_model_name_or_path=alibaba-pai/pai-dkplm-medical-base-zh"
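The standard fine-tuning command and the knowledge-enhanced command use the same flags and differ only in --checkpoint_dir and pretrain_model_name_or_path. A minimal sketch of a wrapper that makes this explicit follows. The function name run_text_match and the EASYNLP_CMD variable are hypothetical and exist only for illustration; every easynlp flag is copied from the commands in this section.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of EasyNLP): run the shared text_match
# training command with a given checkpoint directory and pretrained model.
# Usage: run_text_match CHECKPOINT_DIR PRETRAINED_MODEL
# EASYNLP_CMD is an illustrative override; it defaults to the real easynlp CLI.
run_text_match() {
  local checkpoint_dir="$1" model="$2"
  ${EASYNLP_CMD:-easynlp} \
    --mode=train \
    --worker_count=1 \
    --worker_gpu=1 \
    --tables=nlu_train.csv,nlu_dev.csv \
    --input_schema=label:str:1,text1:str:1,text2:str:1 \
    --first_sequence=text1 \
    --second_sequence=text2 \
    --label_name=label \
    --label_enumerate_values=0,1 \
    --checkpoint_dir="${checkpoint_dir}" \
    --learning_rate=3e-5 \
    --epoch_num=3 \
    --random_seed=42 \
    --save_checkpoint_steps=200 \
    --sequence_length=128 \
    --micro_batch_size=32 \
    --app_name=text_match \
    --user_defined_parameters="pretrain_model_name_or_path=${model}"
}

# Dry run: print both commands instead of launching training.
EASYNLP_CMD="echo easynlp" run_text_match ./base_model bert-base-chinese
EASYNLP_CMD="echo easynlp" run_text_match ./dkplm_model alibaba-pai/pai-dkplm-medical-base-zh
```

Calling run_text_match without setting EASYNLP_CMD launches the actual training run, so the two experiments stay identical except for the model being compared.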
Knowledge distillation test script:
# forward teacher logits
easynlp \
  --mode=predict \
  --worker_count=1 \
  --worker_gpu=1 \
  --tables=nlu_train.csv \
  --outputs=logits_pred.csv \
  --input_schema=label:str:1,text1:str:1,text2:str:1 \
  --output_schema=logits \
  --first_sequence=text1 \
  --second_sequence=text2 \
  --checkpoint_path=./dkplm_model \
  --micro_batch_size=32 \
  --sequence_length=128 \
  --app_name=text_match

# data augmentation
easynlp \
  --app_name=data_augmentation \
  --worker_count=1 \
  --worker_gpu=1 \
  --mode=predict \
  --tables=nlu_train.csv \
  --input_schema=label:str:1,text1:str:1,text2:str:1 \
  --first_sequence=text1 \
  --second_sequence=text2 \
  --label_name=label \
  --outputs=nlu_aug.tsv \
  --output_schema=augmented_data \
  --checkpoint_dir=_ \
  --sequence_length=128 \
  --micro_batch_size=32 \
  --user_defined_parameters="
    pretrain_model_name_or_path=alibaba-pai/pai-dkplm-medical-base-zh
    type=mlm_da
    expansion_rate=10
    mask_proportion=0.1
    remove_blanks=True
  "

# train student
Few-shot learning test script
See: https://www.yuque.com/easyx/easynlp/ochmnf