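Users can create an instance directly in DSW and develop on top of EasyNLP. The steps are as follows:
Create a DSW instance
Go to the Alibaba Cloud website → Products → Artificial Intelligence → Machine Learning Platform PAI
After registering or logging in, click Go to Console
Select Interactive Modeling (DSW), then click Create Instance
Select a suitable GPU instance type
For the image, choose one of the official images: pytorch:1.7PAI-gpu-py36-cu101-ubuntu18.04 or pytorch:1.8PAI-gpu-py36-cu101-ubuntu18.04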
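Once the instance is running, it can help to confirm that the command-line tools used in this guide are available before going further. The following is a minimal sketch, not part of EasyNLP: the helper name missing_tools is hypothetical, and it assumes that nvidia-smi is present on the GPU images.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools used by the commands in this guide.
# Assumption: nvidia-smi is available on the GPU image.
missing=$(missing_tools python pip wget nvidia-smi)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```

If anything is reported as missing, install it or switch to another image before running the steps below.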
Extracting features with EasyNLP in DSW
# 1. Download the data
wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/classification/dev2.tsv
# 2. Extract features with the model
easynlp \
--mode=predict \
--worker_gpu=1 \
--tables=dev2.tsv \
--outputs=dev.pred.tsv \
--input_schema=label:str:1,sid1:str:1,sid2:str:1,sent1:str:1,sent2:str:1 \
--output_schema=pooler_output,first_token_output,all_hidden_outputs \
--first_sequence=sent1 \
--append_cols=label \
--checkpoint_dir=bert-small-uncased \
--micro_batch_size=32 \
--sequence_length=128 \
--app_name=vectorization
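The --input_schema value lists five fields, which suggests the input file should have five tab-separated columns per row. The sketch below checks a TSV file against a schema string before running the command. It assumes that each schema entry maps to exactly one column, in order. The helper names schema_columns and bad_rows are hypothetical and not part of EasyNLP.

```shell
# Count the comma-separated entries in an --input_schema value.
schema_columns() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Print the line number of every row in file $1 whose number of
# tab-separated fields differs from the expected count $2.
bad_rows() {
    awk -F'\t' -v want="$2" 'NF != want { print NR }' "$1"
}

schema='label:str:1,sid1:str:1,sid2:str:1,sent1:str:1,sent2:str:1'
# Only run the check if the data file has already been downloaded.
if [ -f dev2.tsv ]; then
    bad_rows dev2.tsv "$(schema_columns "$schema")"
fi
```

No output means every row has the expected number of columns under that assumption.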
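Training with EasyNLP in DSW
Start the instance and run the following commands in the Terminal to check that EasyNLP works correctly:
# 1. Install easynlp
# If easynlp is not yet installed in the environment, install it first: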
pip install path-to-easynlp.whl
# Or git clone the repository and install from source
# 2. Prepare the data
wget -P tmp http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/classification/train.tsv
wget -P tmp http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/classification/dev.tsv
# 3. Train the model
DISTRIBUTED_ARGS="--nproc_per_node 2 --nnodes 1 --node_rank 0 --master_addr localhost --master_port 6009"
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch $DISTRIBUTED_ARGS main.py \
--mode train \
--tables=tmp/train.tsv,tmp/dev.tsv \
--input_schema=label:str:1,sid1:str:1,sid2:str:1,sent1:str:1,sent2:str:1 \
--first_sequence=sent1 \
--second_sequence=sent2 \
--label_name=label \
--label_enumerate_values=0,1 \
--checkpoint_dir=./tmp/classification_model/ \
--learning_rate=3e-5 \
--epoch_num=3 \
--random_seed=42 \
--logging_steps=20 \
--save_checkpoint_steps=50 \
--sequence_length=128 \
--micro_batch_size=32 \
--app_name=text_classify \
--user_defined_parameters='pretrain_model_name_or_path=bert-small-uncased'
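The launch command above hard-codes --nproc_per_node 2 to match CUDA_VISIBLE_DEVICES=0,1. On an instance with a different number of GPUs, the two values need to stay in sync. The sketch below derives the process count from the device list. The helper name count_devices is hypothetical, and the fallback to 0,1 simply mirrors the example above.

```shell
# Count the device ids in a comma-separated list such as "0,1".
count_devices() {
    if [ -z "$1" ]; then
        echo 0
    else
        printf '%s\n' "$1" | awk -F',' '{ print NF }'
    fi
}

# Fall back to the two devices used in the example when the variable is unset.
devices="${CUDA_VISIBLE_DEVICES:-0,1}"
nproc=$(count_devices "$devices")
DISTRIBUTED_ARGS="--nproc_per_node $nproc --nnodes 1 --node_rank 0 --master_addr localhost --master_port 6009"
echo "$DISTRIBUTED_ARGS"
```

The resulting DISTRIBUTED_ARGS can be passed to python -m torch.distributed.launch exactly as in the training command.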
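The expected result after the evaluation step completes:
Users can also invoke EasyNLP through AppZoo:
# 1. Install easynlp
# If easynlp is not yet installed in the environment, install it first: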
pip install path-to-easynlp.whl
# Or git clone the repository and install from source
# 2. Prepare the data
wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/classification/train.tsv
wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/classification/dev.tsv
# 3. Train the model
easynlp \
--mode=train \
--worker_gpu=1 \
--tables=train.tsv,dev.tsv \
--input_schema=label:str:1,sid1:str:1,sid2:str:1,sent1:str:1,sent2:str:1 \
--first_sequence=sent1 \
--second_sequence=sent2 \
--label_name=label \
--label_enumerate_values=0,1 \
--checkpoint_dir=./classification_model \
--learning_rate=3e-5 \
--epoch_num=3 \
--random_seed=42 \
--save_checkpoint_steps=50 \
--sequence_length=128 \
--micro_batch_size=32 \
--app_name=text_classify \
--user_defined_parameters='pretrain_model_name_or_path=bert-small-uncased'
# 4. Evaluate the model
easynlp \
--mode=evaluate \
--worker_gpu=1 \
--tables=dev.tsv \
--input_schema=label:str:1,sid1:str:1,sid2:str:1,sent1:str:1,sent2:str:1 \
--first_sequence=sent1 \
--second_sequence=sent2 \
--label_name=label \
--label_enumerate_values=0,1 \
--checkpoint_dir=./classification_model \
--sequence_length=128 \
--micro_batch_size=32 \
--app_name=text_classify
# 5. Run prediction
easynlp \
--mode=predict \
--worker_gpu=1 \
--tables=dev.tsv \
--outputs=dev.pred.tsv \
--input_schema=label:str:1,sid1:str:1,sid2:str:1,sent1:str:1,sent2:str:1 \
--output_schema=predictions,probabilities,logits,output \
--append_cols=label \
--first_sequence=sent1 \
--second_sequence=sent2 \
--checkpoint_path=./classification_model \
--micro_batch_size=32 \
--sequence_length=128 \
--app_name=text_classify
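After prediction finishes, a quick sanity check is to compare the number of rows in the input and output files. This is a minimal sketch under the assumption that predict mode writes one output row per input row, which this guide does not state explicitly. The helper name same_line_count is hypothetical.

```shell
# Succeed only if both files exist and have the same number of lines.
same_line_count() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    [ "$(wc -l < "$1" | tr -d ' ')" -eq "$(wc -l < "$2" | tr -d ' ')" ]
}

# Assumption: dev.pred.tsv should contain one row per row of dev.tsv.
if same_line_count dev.tsv dev.pred.tsv; then
    echo "dev.pred.tsv has one row per input row"
else
    echo "row counts differ, or a file is missing"
fi
```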
After the prediction step completes, the expected content of dev.pred.tsv is:
[
