ConvLab-2
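Focuses mainly on task-oriented dialogue.
- Training: each task/model gets a data interface designed for it, and all datasets are preprocessed to fit that interface
- Evaluation: a unified interface is designed for each dataset, with data in `dict` format
Overall the design is rather messy: the code contains if/else statements for selecting the dataset, which is partly because the project currently supports only 4 datasets.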
ParlAI
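Aims to be a unified platform for building dialogue systems.
- A dataset is called a task; once it follows the provided interface, it can be used with every model. Adding a task takes three steps:
  - Implement `build.py` to download and build any needed data
  - Implement `agents.py`, with at least a `DefaultTeacher` which extends `Teacher` or one of its children
  - Add the task to the task list
Each example is wrapped in a `Message` and passed between the agent and the environment.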
build.py
Implement the `build` method:
- download and store the data
agents.py
### Text files
`ParlAIDialogTeacher(DialogTeacher)`
- data in the ParlAI Dialog format
- example:
```
text:Sam went to the kitchen.
Pat gave Sam the milk. Where is the milk? labels:kitchen reward:1 label_candidates:hallway|kitchen|bathroom
text:Sam went to the hallway.
```
- `key:value` fields, separated by tabs
- supported attributes: text (str), labels (list, initially a `|`-joined str), label_candidates (str, initially a `|`-joined str), episode_done (bool), and any other field you like, as long as its value is plain text (str)

`DialogTeacher(FixedDialogTeacher)`
- an iterable with each call returning a tuple in the form `((x, y, r, c, i), new_episode?)`
- supports query, label, reward, label candidates, image and anything else (you can put it in a str or an iterable according to your format, no limit)
  - `x` (str) is a query and possibly context
  - `y` (iter) is an iterable of label(s) for that query
  - `r` (str) is the str reward for getting that query correct, optional
  - `c` (iter) is an iterable of label candidates that the student can choose from, optional
  - `i` (str) is a str path to an image on disk, which will be loaded by the data class at request time. It should always point to the raw image file; optional
  - `new_episode?` (bool) is a boolean value specifying whether that example is the start of a new episode. If you don't use episodes, set this to `True` every time.

### Json
`ConversationTeacher(DialogTeacher)`
- jsonl
```json
{
  'possible_conversation_level_info': True,
  'dialog': [
    [
      {'id': 'speaker_1', 'text': <first utterance>},
      {'id': 'speaker_2', 'text': <second utterance>},
      ...
    ],
    ...
  ]
  ...
}
```
- only `id` and `text` are supported in `dialog`
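The two on-disk formats above can be made concrete with a small sketch that uses only the Python standard library. It is not ParlAI's own loader: `parse_dialog_line` and `read_conversation` are hypothetical helper names, the sample utterances are made up, and splitting `label_candidates` on `|` (like `labels`) is an assumption of this sketch.

```python
import json


def parse_dialog_line(line):
    """Turn one tab-separated line of key:value fields into a dict."""
    example = {}
    for field in line.rstrip("\n").split("\t"):
        # Split on the first colon only, so values may themselves contain colons.
        key, _, value = field.partition(":")
        if key in ("labels", "label_candidates"):
            # Assumption of this sketch: both fields are '|'-joined lists.
            example[key] = value.split("|")
        elif key == "episode_done":
            example[key] = value == "True"
        else:
            example[key] = value
    return example


def read_conversation(json_line):
    """Return (speaker id, text) pairs for each inner list of 'dialog'."""
    conversation = json.loads(json_line)
    return [
        [(turn["id"], turn["text"]) for turn in turn_group]
        for turn_group in conversation["dialog"]
    ]


# One line in the tab-separated text format.
text_line = (
    "text:Where is the milk?\tlabels:kitchen\treward:1"
    "\tlabel_candidates:hallway|kitchen|bathroom"
)
parsed = parse_dialog_line(text_line)

# One line of a JSONL file in the conversation format.
json_line = json.dumps(
    {
        "dialog": [
            [
                {"id": "speaker_1", "text": "hi"},
                {"id": "speaker_2", "text": "hello"},
            ]
        ]
    }
)
turns = read_conversation(json_line)
```

Note that every value that is not explicitly converted stays a string (for example `reward`), which matches the "just text" restriction of the text format.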
### Others
- `ChunkTeacher`: for datasets that do not fit in memory
- From scratch: for non-fixed data and other cases
Additional dataset options can be specified through command-line arguments:
- `-t babi` sets up the `DefaultTeacher` in 'parlai/core/tasks/babi/agents.py'.
- `-t babi:task1k` sets up the `Task1kTeacher` in the babi/agents.py file, which allows you to specify specific settings for certain tasks. For bAbI, this refers to the setting where there are only 1000 unique training examples per task.
- `-t babi:task1k:1` provides 1 as a parameter to `Task1kTeacher`, which is interpreted by the `Task1kTeacher` to mean "I want task 1" (as opposed to the 19 other bAbI tasks).
- `-t babi,squad` sets up the `DefaultTeacher` for both babi and squad. Any number of tasks can be chained together with commas to load up each one of them.
- `-t #qa` specifies the 'qa' category, loading up all tasks with that category in the 'parlai/core/task_list.py' file.
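The structure of the `-t` argument described above (commas between tasks, colons between task, teacher and parameters, `#` for a category) can be illustrated with a short sketch. `parse_task_string` is a hypothetical helper written for these notes, not ParlAI's actual argument handling, and the dictionary layout it returns is likewise an assumption.

```python
def parse_task_string(task_string):
    """Split a '-t' value into one spec per comma-separated item."""
    specs = []
    for item in task_string.split(","):
        item = item.strip()
        if item.startswith("#"):
            # '#qa' selects a whole category instead of a single task.
            specs.append({"category": item[1:]})
            continue
        task, *rest = item.split(":")
        specs.append(
            {
                "task": task,
                # None stands for "use the DefaultTeacher" in this sketch.
                "teacher": rest[0] if rest else None,
                # Anything after the teacher name is passed on as parameters.
                "params": rest[1:],
            }
        )
    return specs


single = parse_task_string("babi:task1k:1")
chained = parse_task_string("babi,squad")
category = parse_task_string("#qa")
```

Here `single` describes one task with teacher `task1k` and the parameter `"1"`, while `chained` holds two specs that both fall back to the default teacher.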
Message

- primary medium for information flow (messages between agents and the environment) in ParlAI
- a subclass of a Python `dict` containing the actions of an agent (observable by other agents or the environment)
- The primary function of the `Message` object is to ensure that agents do not unintentionally edit the fields within observations and actions. In order to edit the field of a `Message` object, one must call `message.force_set(key, new_value)`.
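The protection described above can be sketched with a plain `dict` subclass. This is a simplified stand-in written for these notes, not the ParlAI class itself: only the `force_set(key, new_value)` name comes from the text, while the choice of `RuntimeError` and the decision to guard only item assignment are assumptions.

```python
class Message(dict):
    """Simplified stand-in: a dict that refuses to overwrite existing keys."""

    def __setitem__(self, key, value):
        # Adding a new field is allowed; silently replacing one is not.
        if key in self:
            raise RuntimeError(
                f"Message already contains key {key!r}; use force_set to overwrite it."
            )
        super().__setitem__(key, value)

    def force_set(self, key, new_value):
        # Explicit opt-in for overwriting, bypassing the guard above.
        super().__setitem__(key, new_value)


observation = Message({"text": "Where is the milk?"})
observation["labels"] = ["kitchen"]  # new key: accepted

try:
    observation["text"] = "something else"  # existing key: rejected
    overwrite_blocked = False
except RuntimeError:
    overwrite_blocked = True

observation.force_set("text", "Where is Sam?")
```

In this sketch only `msg[key] = value` is guarded; other mutating `dict` methods such as `update` are left untouched to keep the example short.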
