🚀 Original article: https://rasa.com/docs/action-server/knowledge-bases

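In the Rasa framework, ActionQueryKnowledgeBase lets an assistant draw on information from a knowledge base during a conversation. Knowledge base actions make it possible to handle conversations in which users ask about objects and their attributes, as described below.

A common problem in conversational AI is that users do not only refer to objects by name, but also use reference terms such as "the first one" or "it". We need to keep track of the information that was presented so that these mentions can be resolved to the correct object.

In addition, users may want detailed information about objects during a conversation, for example whether a restaurant has outside seating or how expensive it is. Responding to such requests requires knowledge about the restaurant domain. Because this information can change, hard-coding it is not a solution.

To address these challenges, Rasa can be integrated with a knowledge base. To use this integration, create a custom action that inherits from ActionQueryKnowledgeBase, a pre-written custom action that contains the logic for querying a knowledge base for objects and their attributes.

A complete example is available in examples/knowledgebasebot, and instructions for implementing this custom action follow below.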

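1. Using ActionQueryKnowledgeBase

1.1 Create a Knowledge Base

The data used to answer user requests is stored in a knowledge base. A knowledge base can store complex data structures. We recommend starting with InMemoryKnowledgeBase; once you need to work with large amounts of data, you can switch to a custom knowledge base.

To initialize an InMemoryKnowledgeBase, provide the data in a JSON file. The following example contains data about restaurants and hotels. The JSON structure should contain a key for every object type, here restaurant and hotel. Each object type maps to a list of objects; in this example there are 3 restaurants and 3 hotels.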

{
    "restaurant": [
        {
            "id": 0,
            "name": "Donath",
            "cuisine": "Italian",
            "outside-seating": true,
            "price-range": "mid-range"
        },
        {
            "id": 1,
            "name": "Berlin Burrito Company",
            "cuisine": "Mexican",
            "outside-seating": false,
            "price-range": "cheap"
        },
        {
            "id": 2,
            "name": "I due forni",
            "cuisine": "Italian",
            "outside-seating": true,
            "price-range": "mid-range"
        }
    ],
    "hotel": [
        {
            "id": 0,
            "name": "Hilton",
            "price-range": "expensive",
            "breakfast-included": true,
            "city": "Berlin",
            "free-wifi": true,
            "star-rating": 5,
            "swimming-pool": true
        },
        {
            "id": 1,
            "name": "Hilton",
            "price-range": "expensive",
            "breakfast-included": true,
            "city": "Frankfurt am Main",
            "free-wifi": true,
            "star-rating": 4,
            "swimming-pool": false
        },
        {
            "id": 2,
            "name": "B&B",
            "price-range": "mid-range",
            "breakfast-included": false,
            "city": "Berlin",
            "free-wifi": false,
            "star-rating": 1,
            "swimming-pool": false
        }
    ]
}

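Once the data is defined in a JSON file, for example data.json, you can use that file to create an InMemoryKnowledgeBase, which is then passed to the action that queries the knowledge base. To work with the default implementation, every object in the knowledge base should have at least the fields name and id. If it does not, you must customize InMemoryKnowledgeBase.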
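Because the default implementation relies on the name and id fields, it can help to check the data before handing the file to InMemoryKnowledgeBase. The following is a minimal sketch that uses only the standard library. The helper name find_incomplete_objects and the inline sample data are assumptions made for illustration and are not part of Rasa.

```python
import json

# Fields the default implementation is described as relying on.
REQUIRED_FIELDS = ("id", "name")


def find_incomplete_objects(data):
    """Return (object_type, index, missing_fields) for objects lacking required fields."""
    problems = []
    for object_type, objects in data.items():
        for index, obj in enumerate(objects):
            missing = [field for field in REQUIRED_FIELDS if field not in obj]
            if missing:
                problems.append((object_type, index, missing))
    return problems


# Inline sample standing in for the contents of data.json (hypothetical data).
sample = json.loads("""
{
    "restaurant": [
        {"id": 0, "name": "Donath", "cuisine": "Italian"},
        {"id": 1, "cuisine": "Mexican"}
    ],
    "hotel": [
        {"id": 0, "name": "Hilton", "city": "Berlin"}
    ]
}
""")

problems = find_incomplete_objects(sample)
print(problems)
```

In a real project the same check could be run on the parsed contents of data.json; an empty result means every object carries both fields.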

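1.2 Define the NLU Data

In this section:

we introduce a new intent, query_knowledge_base
we annotate mention entities so that the model detects indirect references to objects such as "the first one"
we make extensive use of synonyms

For the bot to understand that the user wants to retrieve information from the knowledge base, we need to define a new intent, here called query_knowledge_base. The requests that ActionQueryKnowledgeBase can handle fall into two categories:

the user wants a list of objects of a specific type
the user wants to know about a certain attribute of an object

The intent should contain many variations of both request types: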

nlu:
- intent: query_knowledge_base
  examples: |
    - what [restaurants]{"entity": "object_type", "value": "restaurant"} can you recommend?
    - list some [restaurants]{"entity": "object_type", "value": "restaurant"}
    - can you name some [restaurants]{"entity": "object_type", "value": "restaurant"} please?
    - can you show me some [restaurants]{"entity": "object_type", "value": "restaurant"} options
    - list [German](cuisine) [restaurants]{"entity": "object_type", "value": "restaurant"}
    - do you have any [mexican](cuisine) [restaurants]{"entity": "object_type", "value": "restaurant"}?
    - do you know the [price range]{"entity": "attribute", "value": "price-range"} of [that one](mention)?
    - what [cuisine](attribute) is [it](mention)?
    - do you know what [cuisine](attribute) the [last one]{"entity": "mention", "value": "LAST"} has?
    - does the [first one]{"entity": "mention", "value": "1"} have [outside seating]{"entity": "attribute", "value": "outside-seating"}?
    - what is the [price range]{"entity": "attribute", "value": "price-range"} of [Berlin Burrito Company](restaurant)?
    - what about [I due forni](restaurant)?
    - can you tell me the [price range](attribute) of [that restaurant](mention)?
    - what [cuisine](attribute) do [they](mention) have?

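The examples above only cover the restaurant domain. You should add examples for every object type that exists in the knowledge base to the query_knowledge_base intent.

In addition to adding a variety of training examples for each query type, you need to specify and annotate the following entities in the training examples:

object_type: Whenever a training example references a specific object type from the knowledge base, the object type should be marked as an entity. Use synonyms to map, for example, restaurants to restaurant, the correct object type that is used as a key in the knowledge base.
mention: If the user refers to an object with "the first one", "that one", or "it", mark those terms as mention. Synonyms are also used to map some mentions to symbols.
attribute: All attribute names defined in the knowledge base should be identified as attribute in the NLU data. Again, use synonyms to map variations of an attribute name to the name used in the knowledge base.

Remember to add these entities to the domain file (as entities and slots):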

entities:
  - object_type
  - mention
  - attribute
slots:
  object_type:
    type: any
    influence_conversation: false
    mappings:
    - type: from_entity
      entity: object_type
  mention:
    type: any
    influence_conversation: false
    mappings:
    - type: from_entity
      entity: mention
  attribute:
    type: any
    influence_conversation: false
    mappings:
    - type: from_entity
      entity: attribute

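1.3 Create an Action to Query the Knowledge Base

To create your own knowledge base action, inherit from ActionQueryKnowledgeBase and pass the knowledge base to the constructor of ActionQueryKnowledgeBase.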

from rasa_sdk.knowledge_base.actions import ActionQueryKnowledgeBase
from rasa_sdk.knowledge_base.storage import InMemoryKnowledgeBase


class MyKnowledgeBaseAction(ActionQueryKnowledgeBase):
    def __init__(self):
        # Load the knowledge base from the JSON file created in section 1.1.
        knowledge_base = InMemoryKnowledgeBase("data.json")
        super().__init__(knowledge_base)

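Whenever you create an ActionQueryKnowledgeBase, you need to pass a knowledge base to the constructor. It can be either an InMemoryKnowledgeBase or your own implementation of a KnowledgeBase. Using multiple knowledge bases at the same time is not supported, so information can be pulled from only one knowledge base.

That is the entire code for the action. The name of the action is action_query_knowledge_base; do not forget to add it to the domain file: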

actions:
- action_query_knowledge_base

:::info 📌 Note
——————————
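If you override the default action name action_query_knowledge_base, you need to add the following three unfeaturized slots to the domain file. These slots are used internally by ActionQueryKnowledgeBase; if you keep the default action name, they are added automatically: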

knowledge_base_objects
knowledge_base_last_object
knowledge_base_last_object_type :::

Make sure to add a story to the stories file that includes the query_knowledge_base intent and the action_query_knowledge_base action. For example:

stories:
- story: knowledge base happy path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: query_knowledge_base
  - action: action_query_knowledge_base
  - intent: goodbye
  - action: utter_goodbye

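The last thing to do is to define the response utter_ask_rephrase in the domain file. If the action does not know how to handle the user's request, it uses this response to ask the user to rephrase: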

responses:
  utter_ask_rephrase:
  - text: "Sorry, I'm not sure I understand. Could you rephrase it?"
  - text: "Could you please rephrase your message? I didn't quite get that."

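With all relevant pieces in place, the action can now query the knowledge base.

2. How It Works

ActionQueryKnowledgeBase looks at the entities extracted from the request, as well as previously set slots, to decide what to query.

2.1 Querying the Knowledge Base for Objects

To query the knowledge base for objects of any type, the user's request needs to include the object type.

Take the example "Can you please name some restaurants?". This question includes the object type of interest, restaurant. The bot needs to pick up this entity to formulate the query; otherwise the action does not know which objects the user is interested in.

When the user says something like "What Italian restaurant options in Berlin do I have?", the restaurants returned should satisfy two conditions: they serve Italian cuisine and they are located in Berlin. If named entity recognition (NER) detects these attributes in the user's request, the action uses them to filter the restaurants found in the knowledge base.

For the bot to detect these attributes, "Italian" and "Berlin" need to be marked as entities in the NLU data: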

nlu:
- intent: query_knowledge_base
  examples: |
    - What [Italian](cuisine) [restaurant](object_type) options in [Berlin](city) do I have?

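The attribute names, cuisine and city, should match the names used in the knowledge base. They also need to be added to the domain file as entities and slots.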
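The filtering step described above can be pictured with a small standard-library sketch. This is not the library's implementation; the function name filter_objects and the shape of the requested list (dictionaries with name and value keys) are assumptions made for illustration. The sample uses hotels because the hotel objects in the example data carry a city field.

```python
def filter_objects(objects, attributes):
    """Keep the objects whose fields match every requested attribute (case-insensitive)."""

    def matches(obj):
        return all(
            str(obj.get(attr["name"], "")).lower() == str(attr["value"]).lower()
            for attr in attributes
        )

    return [obj for obj in objects if matches(obj)]


hotels = [
    {"id": 0, "name": "Hilton", "city": "Berlin", "price-range": "expensive"},
    {"id": 1, "name": "Hilton", "city": "Frankfurt am Main", "price-range": "expensive"},
    {"id": 2, "name": "B&B", "city": "Berlin", "price-range": "mid-range"},
]

# Attributes as they might be collected from the extracted entities (hypothetical shape).
requested = [
    {"name": "city", "value": "Berlin"},
    {"name": "price-range", "value": "mid-range"},
]

result = filter_objects(hotels, requested)
print([hotel["name"] for hotel in result])
```

Each extracted attribute narrows the result further, and a request without any attributes leaves the full list of objects untouched.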

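2.2 Querying the Knowledge Base for an Attribute of an Object

If the user wants specific information about an object, the request should include both the object and the attribute of interest.

For example, if the user asks "What is the cuisine of Berlin Burrito Company?", the user wants the cuisine (attribute of interest) of the restaurant Berlin Burrito Company (object of interest). The attribute and the object of interest should be marked as entities in the NLU training data: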

nlu:
- intent: query_knowledge_base
  examples: |
    - What is the [cuisine](attribute) of [Berlin Burrito Company](restaurant)?

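Make sure to add the object type, restaurant, to the domain file as an entity and a slot.

2.3 Resolving Mentions

Following the example above, users may not always refer to restaurants by name. A user can refer to the object of interest by its name, for example "Berlin Burrito Company" (the string representation of the object), or by a mention of a previously listed object, for example "What is the cuisine of the second restaurant you mentioned?". The action can resolve these mentions to the actual object in the knowledge base. More specifically, it can resolve two types of mentions: ordinal mentions (such as "the first one") and referential mentions (such as "it" or "that one").

1) Ordinal mentions

When a user refers to an object by its position in a list, this is called an ordinal mention. Here is an example: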

User: What restaurants in Berlin do you know?
Bot: Found the following objects of type 'restaurant': 1: I due forni 2: PastaBar 3: Berlin Burrito Company
User: Does the first one have outside seating?

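The user refers to "I due forni" with the phrase "the first one". Other ordinal mentions might include "the second one", "the last one", "any", or "3".

Ordinal mentions are typically used when a list of objects has been presented to the user. To resolve these mentions to the actual object, the action uses the ordinal mention mapping that is set in the KnowledgeBase class. The default mapping looks like this: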

{
    "1": lambda l: l[0],
    "2": lambda l: l[1],
    "3": lambda l: l[2],
    "4": lambda l: l[3],
    "5": lambda l: l[4],
    "6": lambda l: l[5],
    "7": lambda l: l[6],
    "8": lambda l: l[7],
    "9": lambda l: l[8],
    "10": lambda l: l[9],
    "ANY": lambda l: random.choice(l),
    "LAST": lambda l: l[-1],
}

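The ordinal mention mapping maps a string such as "1" to an object in a list, for example lambda l: l[0], meaning the object at index 0. Because the mapping has no entry for "the first one", it is important to use entity synonyms to map "the first one" in the NLU data to "1":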

nlu:
- intent: query_knowledge_base
  examples: |
    - Does the [first one]{"entity": "mention", "value": "1"} have [outside seating]{"entity": "attribute", "value": "outside-seating"}

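NER detects "first one" as a mention entity but puts "1" into the mention slot. The action can therefore use the mention slot together with the ordinal mention mapping to resolve "first one" to the actual object, "I due forni".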
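The lookup can be sketched in a few lines of plain Python. This is an illustration only, not the library's code: the function resolve_ordinal_mention is a hypothetical helper, and the mapping is a shortened copy of the default mapping shown earlier.

```python
import random

# Shortened copy of the default ordinal mention mapping.
ordinal_mention_mapping = {
    "1": lambda l: l[0],
    "2": lambda l: l[1],
    "3": lambda l: l[2],
    "ANY": lambda l: random.choice(l),
    "LAST": lambda l: l[-1],
}


def resolve_ordinal_mention(mention, listed_objects):
    """Return the listed object a mention points to, or None if it cannot be resolved."""
    resolver = ordinal_mention_mapping.get(mention)
    if resolver is None or not listed_objects:
        return None
    try:
        return resolver(listed_objects)
    except IndexError:
        # For example "3" when only two objects were listed.
        return None


listed = ["I due forni", "PastaBar", "Berlin Burrito Company"]

first = resolve_ordinal_mention("1", listed)
last = resolve_ordinal_mention("LAST", listed)
unmapped = resolve_ordinal_mention("the first one", listed)
print(first, last, unmapped)
```

The unmapped case shows why the synonym matters: the raw text "the first one" is not a key of the mapping, so only the synonym value "1" can be resolved.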

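The ordinal mention mapping can be overwritten by calling set_ordinal_mention_mapping on the KnowledgeBase implementation.

2) Referential mentions

Consider the following conversation: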

User: What is the cuisine of PastaBar?
Bot: PastaBar has an Italian cuisine.
User: Does it have wifi?
Bot: Yes.
User: Can you give me an address?

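In the question "Does it have wifi?", the user refers to PastaBar with the word "it". If NER detects "it" as a mention entity, the knowledge base action resolves it to the last object mentioned in the conversation, PastaBar.

In the next input, the user refers to the object PastaBar indirectly rather than mentioning it explicitly. The knowledge base action detects that the user wants the value of a specific attribute, in this case the address. If NER detects neither a mention nor an object, the action assumes the user means the most recently mentioned object, PastaBar.

This behavior can be disabled by setting use_last_object_mention to False when initializing the action.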
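The fallback described here can be summarized in a simplified sketch. The function pick_object and its parameters are hypothetical and only mirror the behavior described in the text; ordinal mentions are left out, and the real action may check these cases in a different order.

```python
def pick_object(object_name, mention, last_object, use_last_object_mention=True):
    """Decide which object a request refers to (simplified illustration).

    object_name: explicitly named object, or None
    mention: value of the mention slot such as "it", or None
    last_object: object most recently talked about, or None
    """
    if object_name is not None:
        # The user named the object explicitly.
        return object_name
    if mention is not None:
        # A referential mention such as "it" points at the last object.
        return last_object
    if use_last_object_mention:
        # Nothing was detected: assume the user means the last object.
        return last_object
    return None


with_mention = pick_object(None, "it", "PastaBar")
with_fallback = pick_object(None, None, "PastaBar")
without_fallback = pick_object(None, None, "PastaBar", use_last_object_mention=False)
print(with_mention, with_fallback, without_fallback)
```

With the fallback turned off, a request such as "Can you give me an address?" no longer resolves to any object unless a mention or an object name is detected.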

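3. Customization

3.1 Customizing ActionQueryKnowledgeBase

To customize what the bot says to the user, you can override two functions of ActionQueryKnowledgeBase: utter_objects and utter_attribute_value.

utter_objects is used when the user requests a list of objects. Once the bot has retrieved the objects from the knowledge base, it responds by default with a message in the following format: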

Found the following objects of type 'restaurant':
1. I due forni
2. PastaBar
3. Berlin Burrito Company

Or, if no objects are found:

I could not find any objects of type 'restaurant'.

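To change the format of this utterance, override the utter_objects method in your action.

The utter_attribute_value function determines what the bot says when the user asks for specific information about an object. If the attribute of interest is found in the knowledge base, the bot responds with the following utterance: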

'Berlin Burrito Company' has the value 'Mexican' for attribute 'cuisine'.

If no value is found for the requested attribute, the bot responds with:

Did not find a valid value for attribute 'cuisine' for object 'Berlin Burrito Company'.

To change this utterance, override the utter_attribute_value function.
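One way to keep such overrides small is to build the message text in plain functions and send the result from the overridden methods. The two helpers below are hypothetical and the wording of the messages is invented for illustration; how the overridden utter_objects and utter_attribute_value receive their arguments and send the text depends on the rasa_sdk version in use, so that part is deliberately not shown.

```python
def format_object_list(object_type, names):
    """Build a custom message for a list of objects (hypothetical wording)."""
    if not names:
        return f"Sorry, I don't know any {object_type}s."
    lines = [f"Here are some {object_type}s you might like:"]
    lines += [f"{position}. {name}" for position, name in enumerate(names, start=1)]
    return "\n".join(lines)


def format_attribute_value(object_name, attribute_name, attribute_value):
    """Build a custom message for a single attribute value (hypothetical wording)."""
    if attribute_value is None:
        return f"I don't know the {attribute_name} of {object_name}."
    return f"The {attribute_name} of {object_name} is {attribute_value}."


listing = format_object_list("restaurant", ["I due forni", "PastaBar"])
answer = format_attribute_value("Berlin Burrito Company", "cuisine", "Mexican")
print(listing)
print(answer)
```

Keeping the formatting separate from the action also makes the wording easy to test without starting an action server.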

:::info ⌛ Note
——————————
There is a tutorial on the Rasa blog about how to use knowledge bases in custom actions. It explains the implementation behind ActionQueryKnowledgeBase in detail. :::

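3.2 Creating Your Own Knowledge Base Actions

ActionQueryKnowledgeBase makes it easy to integrate a knowledge base into your actions. However, the action can only handle two kinds of user requests:

the user wants a list of objects from the knowledge base
the user wants the value of an attribute of a specific object

The action cannot compare objects or take relations between objects in the knowledge base into account. In addition, resolving any mention to the last object mentioned in the conversation is not always the best choice.

If you want to handle more complex use cases, you can write your own custom action. The Rasa SDK provides helper functions in [rasa_sdk.knowledge_base.utils](https://github.com/RasaHQ/rasa-sdk/tree/main/rasa_sdk/knowledge_base/) to help you implement your own solution. We recommend using the KnowledgeBase interface so that ActionQueryKnowledgeBase can still be used alongside the new custom action.

3.3 Customizing the InMemoryKnowledgeBase

InMemoryKnowledgeBase inherits from KnowledgeBase. You can customize InMemoryKnowledgeBase by overriding the following functions:

get_key_attribute_of_object: To keep track of the object the user talked about last, the value of the key attribute is stored in a dedicated slot. Every object should have a unique key attribute, similar to a primary key in a relational database. By default, the name of the key attribute for every object type is set to id. You can overwrite the name of the key attribute for a specific object type by calling set_key_attribute_of_object.

get_representation_function_of_object: Consider the following example: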

{
  "id": 0,
  "name": "Donath",
  "cuisine": "Italian",
  "outside-seating": true,
  "price-range": "mid-range"
}

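When the user asks the bot to list any Italian restaurants, they do not need every detail of each restaurant. Instead, the bot should provide a meaningful name that identifies the restaurant; in most cases the object's name will do. The get_representation_function_of_object function returns a lambda function that maps the restaurant object above to its name: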

lambda obj: obj["name"]

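This function is used whenever the bot talks about a specific object, so that the user sees a meaningful name for it. By default, the lambda function returns the value of the object's name attribute. If an object has no name attribute, or its name is ambiguous, set a new lambda function for that object type by calling set_representation_function_of_object.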
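The hotel data from section 1.1 shows why the default can be ambiguous: two hotels share the name "Hilton". The sketch below compares the default representation with one that adds the city. The dictionary representation_functions is only a stand-in for illustration; in a real InMemoryKnowledgeBase, the hotel function would presumably be registered by calling set_representation_function_of_object for the object type hotel.

```python
hotels = [
    {"id": 0, "name": "Hilton", "city": "Berlin"},
    {"id": 1, "name": "Hilton", "city": "Frankfurt am Main"},
]

# Stand-in registry: one representation function per object type (hypothetical).
representation_functions = {
    "default": lambda obj: obj["name"],
    "hotel": lambda obj: f'{obj["name"]} ({obj["city"]})',
}

default_names = [representation_functions["default"](hotel) for hotel in hotels]
hotel_names = [representation_functions["hotel"](hotel) for hotel in hotels]

print(default_names)
print(hotel_names)
```

With the default representation both entries read the same to the user, while the hotel-specific function keeps them distinguishable.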

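set_ordinal_mention_mapping: The ordinal mention mapping is needed to resolve an ordinal mention, such as "second one", to an object in a list. By default, the ordinal mention mapping looks like this: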

{
  "1": lambda l: l[0],
  "2": lambda l: l[1],
  "3": lambda l: l[2],
  "4": lambda l: l[3],
  "5": lambda l: l[4],
  "6": lambda l: l[5],
  "7": lambda l: l[6],
  "8": lambda l: l[7],
  "9": lambda l: l[8],
  "10": lambda l: l[9],
  "ANY": lambda l: random.choice(l),
  "LAST": lambda l: l[-1],
}

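You can overwrite it by calling the set_ordinal_mention_mapping function.

For an example implementation of an InMemoryKnowledgeBase, see the example bot, which uses the set_representation_function_of_object method to override the default representation of objects of type hotel.

3.4 Creating Your Own Knowledge Base

If you have more data, or want to use a more complex data structure (for example one that involves relations between different objects), you can create your own knowledge base implementation. You only need to inherit from KnowledgeBase and implement get_objects, get_object, and get_attributes_of_object. The knowledge base code provides more information on what these methods should do.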
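To give a feel for the three methods, here is a standalone sketch backed by an in-memory SQLite database from the standard library. It does not inherit from KnowledgeBase, and the class name, table layout, and parameter shapes are assumptions made for illustration. In a real implementation the class would inherit from KnowledgeBase, and depending on the rasa_sdk version the methods may need to be coroutines, so check the knowledge base code of the version in use.

```python
import sqlite3


class SQLiteKnowledgeBaseSketch:
    """Hypothetical store with one table per object type."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.row_factory = sqlite3.Row

    def _table(self, object_type):
        # Only accept object types that exist as tables, since the name is put into SQL.
        rows = self.connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
        if object_type not in {row[0] for row in rows}:
            raise ValueError(f"Unknown object type: {object_type}")
        return object_type

    def get_attributes_of_object(self, object_type):
        # The column names of the table are the attributes of the object type.
        table = self._table(object_type)
        cursor = self.connection.execute(f'SELECT * FROM "{table}" LIMIT 0')
        return [column[0] for column in cursor.description]

    def get_objects(self, object_type, attributes, limit=5):
        # attributes: list of {"name": ..., "value": ...} filters; unknown names are ignored.
        table = self._table(object_type)
        allowed = set(self.get_attributes_of_object(object_type))
        filters = [attr for attr in attributes if attr["name"] in allowed]
        query = f'SELECT * FROM "{table}"'
        if filters:
            query += " WHERE " + " AND ".join(f'"{attr["name"]}" = ?' for attr in filters)
        query += " ORDER BY rowid LIMIT ?"
        params = [attr["value"] for attr in filters] + [limit]
        return [dict(row) for row in self.connection.execute(query, params)]

    def get_object(self, object_type, object_identifier):
        # Match the identifier against the id first and the name second.
        table = self._table(object_type)
        row = self.connection.execute(
            f'SELECT * FROM "{table}" WHERE id = ? OR name = ? LIMIT 1',
            (object_identifier, object_identifier),
        ).fetchone()
        return dict(row) if row else None


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT, cuisine TEXT)")
connection.executemany(
    "INSERT INTO restaurant VALUES (?, ?, ?)",
    [(0, "Donath", "Italian"), (1, "Berlin Burrito Company", "Mexican"), (2, "I due forni", "Italian")],
)

kb = SQLiteKnowledgeBaseSketch(connection)
italian = kb.get_objects("restaurant", [{"name": "cuisine", "value": "Italian"}])
print([restaurant["name"] for restaurant in italian])
```

The same three methods could sit in front of any other store, such as a graph database, as long as they return objects as dictionaries that contain the key attribute.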

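You can also customize your knowledge base further by adapting the methods described in the section on customizing the InMemoryKnowledgeBase. There is also a blog post that explains how to set up your own knowledge base.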
