Basic Concepts of Natural Language
- Human language is a system specifically constructed to convey meaning; it is not produced by a physical manifestation of any kind.
- It is a discrete, symbolic, categorical system.
Word: a signifier that maps to a signified (an idea or thing).
Tasks in NLP
The goal of NLP: design algorithms to allow computers to “understand” natural language in order to perform some task.
Example Tasks
- Easy: Spell Checking, Keyword Search, Finding Synonyms
- Medium: Parsing information from websites, documents, etc.
- Hard: Machine Translation, Semantic Analysis (what is the meaning of a query statement?), Coreference, Question Answering.
Representing Words
- How words are represented is the first and arguably most important common denominator across all NLP tasks.
- Earlier NLP: treated words as atomic symbols.
- Now: words are represented as vectors (word vectors).
- We need some notion of similarity and difference between words.
- Word vectors can encode this notion in the vectors themselves, using a distance measure such as Jaccard, cosine, or Euclidean.
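
The contrast between atomic symbols and word vectors can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from these notes: the three-word vocabulary, the dense vector values, and the function names are all made up for the example. Atomic symbols are modeled here as one-hot vectors, which is one common way to picture them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy vocabulary.
vocab = ["hotel", "motel", "banana"]

# Atomic symbols modeled as one-hot vectors: every pair of distinct words is orthogonal.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hand-picked dense vectors (illustrative values only, not trained embeddings).
dense = {
    "hotel": [0.9, 0.8, 0.1],
    "motel": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

# One-hot vectors carry no similarity information between distinct words.
print(cosine_similarity(one_hot["hotel"], one_hot["motel"]))  # 0.0

# Dense vectors can place related words closer together than unrelated ones.
print(cosine_similarity(dense["hotel"], dense["motel"]))
print(cosine_similarity(dense["hotel"], dense["banana"]))
print(euclidean_distance(dense["hotel"], dense["motel"]))

# Jaccard works on sets, e.g. sets of context words observed near each word.
print(jaccard_similarity({"room", "stay", "night"}, {"room", "night", "fruit"}))
```

With these made-up values, "hotel" and "motel" score a higher cosine similarity than "hotel" and "banana", while the one-hot representation gives zero similarity for any pair of distinct words. Which measure is appropriate depends on the representation: cosine and Euclidean apply to real-valued vectors, while Jaccard compares sets.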