🚀 Original link: https://rasa.com/docs/rasa/generating-nlu-data

NLU (Natural Language Understanding) is the part of Rasa Open Source that performs intent classification, entity extraction, and response retrieval.

NLU will take in a sentence such as “I am looking for a French restaurant in the center of town” and return structured data like:

    {
      "intent": "search_restaurant",
      "entities": {
        "cuisine": "French",
        "location": "center"
      }
    }
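
As a rough illustration of how such a result might be consumed downstream, here is a minimal Python sketch. It assumes the simplified shape shown above (an intent name plus a flat mapping of entities), which is not necessarily the exact payload a running Rasa server returns. The `summarize` helper is hypothetical and exists only for this example.

```python
import json

# Simplified NLU result in the shape shown above. This is an assumption for
# illustration, not necessarily the exact payload a Rasa server returns.
raw = """
{
  "intent": "search_restaurant",
  "entities": {
    "cuisine": "French",
    "location": "center"
  }
}
"""


def summarize(nlu_json: str) -> str:
    """Turn a structured NLU result into a one-line summary (hypothetical helper)."""
    result = json.loads(nlu_json)
    intent = result["intent"]
    # Entities are optional: a message such as a greeting may carry none.
    entities = result.get("entities", {})
    slots = ", ".join(f"{name}={value}" for name, value in sorted(entities.items()))
    return f"{intent}({slots})"


summary = summarize(raw)
print(summary)
```

The point is only that the intent tells the assistant what the user wants, while the entities carry the details needed to act on it.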

Building NLU models is hard, and building ones that are production-ready is even harder. Here are some tips for designing your NLU training data and pipeline to get the most out of your bot.

1. Conversation-Driven Development for NLU

Conversation-Driven Development (CDD) means letting real user conversations guide your development. For building a great NLU model, this means two key things:

1.1 Gather Real Data

When it comes to building out NLU training data, developers are sometimes tempted to use text generation tools or templates to quickly increase the number of training examples. This is a bad idea for two reasons:

  • First, your synthetic data won’t look like the messages that users actually send to your assistant, so your model will underperform.
  • Second, by training and testing on synthetic data, you trick yourself into thinking that your model is actually performing well, and you won’t notice major issues.

Remember that if you use a script to generate training data, the only thing your model can learn is how to reverse-engineer the script.
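
To make this concrete, the following sketch fills a couple of templates with slot values and then strips the slot values back out. The templates, cuisines, and locations are invented for illustration and do not come from any real dataset. Under these assumptions, the number of generated examples grows quickly, but the number of distinct sentence patterns never exceeds the number of templates.

```python
import itertools
import re

# Hypothetical templates and slot values, invented for illustration only.
TEMPLATES = [
    "I am looking for a {cuisine} restaurant in the {location}",
    "find me {cuisine} food in the {location}",
]
CUISINES = ["French", "Italian", "Thai", "Mexican"]
LOCATIONS = ["center", "north", "south"]


def generate(templates, cuisines, locations):
    """Fill every template with every combination of slot values."""
    return [
        template.format(cuisine=cuisine, location=location)
        for template, cuisine, location in itertools.product(
            templates, cuisines, locations
        )
    ]


def skeleton(message, cuisines, locations):
    """Replace known slot values with placeholders to expose the sentence pattern."""
    for cuisine in cuisines:
        message = re.sub(rf"\b{re.escape(cuisine)}\b", "{cuisine}", message)
    for location in locations:
        message = re.sub(rf"\b{re.escape(location)}\b", "{location}", message)
    return message


examples = generate(TEMPLATES, CUISINES, LOCATIONS)
patterns = {skeleton(message, CUISINES, LOCATIONS) for message in examples}

print(len(examples))  # total generated training examples
print(len(patterns))  # distinct sentence patterns behind them
```

Here 24 generated examples collapse back into just 2 patterns, which is all a model trained on them has any chance to learn.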

To avoid these problems, collect as much real user data as possible to use as training data. Real user messages can be messy, contain typos, and be far from ‘ideal’ examples of your intents. But keep in mind that those are the messages you’re asking your model to make predictions about! Your assistant will always make mistakes initially, but the process of training and evaluating on user data will set your model up to generalize much more effectively in real-world scenarios.

1.2 Share with Test Users Early

In order to gather real data, you’re going to need real user messages. A bot developer can only come up with a limited range of examples, and users will always surprise you with what they say. This means you should share your bot with test users outside the development team as early as possible. See the full CDD guidelines for more details.