NLTK is a toolkit for natural language processing (NLP). Its basic usage is shown below.
    1. Word and sentence tokenization

    from nltk.tokenize import word_tokenize, sent_tokenize

    # Requires the Punkt models: run nltk.download('punkt') once
    # (newer NLTK releases use 'punkt_tab' instead).

    # Word tokenization
    sentence = "Hello Mr. Smith, how are you doing today?"
    print(word_tokenize(sentence))
    # ['Hello', 'Mr.', 'Smith', ',', 'how', 'are', 'you', 'doing', 'today', '?']

    # Sentence tokenization
    # Adjacent string literals are concatenated; the trailing spaces keep
    # the sentences separated (a backslash continuation would glue them together).
    paragraph = ("Hello Mr. Smith, how are you doing today? "
                 "The weather is great, and Python is awesome. "
                 "The sky is pinkish-blue. You shouldn't eat cardboard.")
    print(sent_tokenize(paragraph))
    # ['Hello Mr. Smith, how are you doing today?',
    #  'The weather is great, and Python is awesome.',
    #  'The sky is pinkish-blue.', "You shouldn't eat cardboard."]
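
In the output above, "Hello Mr. Smith, ..." stays in one sentence, most likely because the pretrained Punkt model behind sent_tokenize treats "Mr." as an abbreviation. As a rough contrast, here is a minimal sketch that splits on sentence-ending punctuation with a plain regular expression. `naive_sent_split` is a hypothetical helper written for illustration only; it is not part of NLTK.

```python
import re

def naive_sent_split(text):
    # Hypothetical helper: split wherever '.', '?' or '!' is followed by whitespace.
    return re.split(r"(?<=[.?!])\s+", text.strip())

paragraph = ("Hello Mr. Smith, how are you doing today? "
             "The weather is great, and Python is awesome.")
naive_sentences = naive_sent_split(paragraph)
print(naive_sentences)
# ['Hello Mr.', 'Smith, how are you doing today?',
#  'The weather is great, and Python is awesome.']
```

The naive version breaks the first sentence after "Mr.", which is the kind of mistake sent_tokenize avoids.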

    2. Stop words

    from nltk.corpus import stopwords

    # Requires the stopword corpus: run nltk.download('stopwords') once.
    print(stopwords.words('english'))
    # ['i', 'me', 'my', 'myself', ..., "won't", 'wouldn', "wouldn't"]
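
A stop word list is typically used to filter tokens before further processing. The sketch below shows one way this might look. `remove_stopwords` is a hypothetical helper, and `sample_stop_words` is a small hand-picked list standing in for `stopwords.words('english')` so the example runs without downloading any corpus.

```python
def remove_stopwords(tokens, stop_words):
    # Compare in lowercase, since the stop word entries shown above are lowercase.
    stop_set = set(stop_words)
    return [t for t in tokens if t.lower() not in stop_set]

# Tokens copied from the word_tokenize output shown earlier.
tokens = ['Hello', 'Mr.', 'Smith', ',', 'how', 'are', 'you', 'doing', 'today', '?']

# Assumption: a tiny stand-in list; in practice pass stopwords.words('english').
sample_stop_words = ['how', 'are', 'you', 'doing']

filtered = remove_stopwords(tokens, sample_stop_words)
print(filtered)
# ['Hello', 'Mr.', 'Smith', ',', 'today', '?']
```

Punctuation tokens such as ',' and '?' are not stop words, so they survive this filter and would need a separate check if they are unwanted.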