Basic Concepts of Natural Language

  • Human language is a system specifically constructed to convey meaning, and is not produced by a physical manifestation of any kind.
    • It is a discrete / symbolic / categorical system.
  • Word: a signifier that maps to a signified (an idea or thing).

    Tasks in NLP

    The goal of NLP: design algorithms to allow computers to “understand” natural language in order to perform some task.

  • Example Tasks

    • Easy: Spell Checking, Keyword Search, Finding Synonyms
    • Medium: Parsing information from websites, documents, etc.
    • Hard: Machine Translation, Semantic Analysis (what is the meaning of a query statement?), Coreference, Question Answering.
  • Representing Words
    • How words are represented is the first and arguably most important common denominator across all NLP tasks.
    • Earlier NLP: treated words as atomic symbols.
    • Now: word vectors
      • We need some notion of similarity and difference between words.
      • Word vectors can encode this notion in the vectors themselves, using a distance measure such as Jaccard, cosine, or Euclidean.
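
    The contrast between atomic symbols and word vectors can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the three-word vocabulary, the dense vector values, and the helper names (cosine_similarity, euclidean_distance, jaccard_similarity) are all made up for the example. With one-hot (atomic) vectors, every pair of distinct words looks equally unrelated; with dense vectors, a distance measure can separate related words from unrelated ones.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors (0.0 = identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jaccard_similarity(a, b):
    """Size of the intersection over size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Atomic-symbol view: each word is a one-hot vector over a toy vocabulary.
vocab = ["hotel", "motel", "banana"]
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hypothetical dense word vectors; the numbers are invented for illustration only.
dense = {
    "hotel": [0.9, 0.8, 0.1],
    "motel": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

if __name__ == "__main__":
    # One-hot vectors of distinct words are always orthogonal.
    print(cosine_similarity(one_hot["hotel"], one_hot["motel"]))
    # Dense vectors can place related words closer together.
    print(cosine_similarity(dense["hotel"], dense["motel"]))
    print(cosine_similarity(dense["hotel"], dense["banana"]))
    print(euclidean_distance(dense["hotel"], dense["motel"]))
    # Jaccard works on sets, for example the character sets of two words.
    print(jaccard_similarity("hotel", "motel"))
```

    Cosine and Euclidean measures operate on the numeric vectors directly, while Jaccard compares sets, so it suits set-like representations (such as the set of contexts or characters associated with a word) rather than dense real-valued vectors.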