NLTK is a toolkit for natural language processing (NLP). Its basic usage is shown below.
1. Word and sentence tokenization
from nltk.tokenize import word_tokenize, sent_tokenize

# Requires the Punkt tokenizer data: run nltk.download('punkt') once
# (newer NLTK releases may ask for 'punkt_tab' instead).

# Word tokenization
sentence = "Hello Mr. Smith, how are you doing today? "
print(word_tokenize(sentence))
# ['Hello', 'Mr.', 'Smith', ',', 'how', 'are', 'you', 'doing', 'today', '?']

# Sentence tokenization
paragraph = ("Hello Mr. Smith, how are you doing today? "
             "The weather is great, and Python is awesome. "
             "The sky is pinkish-blue. You shouldn't eat cardboard.")
print(sent_tokenize(paragraph))
# ['Hello Mr. Smith, how are you doing today?',
#  'The weather is great, and Python is awesome.',
#  'The sky is pinkish-blue.', "You shouldn't eat cardboard."]
2. Stop words
from nltk.corpus import stopwords

# Requires the stop-word corpus: run nltk.download('stopwords') once.
print(stopwords.words('english'))
# ['i', 'me', 'my', 'myself', ......"won't", 'wouldn', "wouldn't"]
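A stop-word list is usually used to filter tokens after tokenization. The following is a minimal sketch of that step, not part of the original notes. The helper name `remove_stopwords` and the small `demo_stop_words` list are assumptions made for illustration, chosen so the snippet runs without downloading any NLTK data. In real use the second argument would be `stopwords.words('english')`.

```python
def remove_stopwords(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words.

    Hypothetical helper for illustration; it is not an NLTK function.
    """
    # NLTK's English stop words are lowercase, so compare case-insensitively.
    stop_set = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stop_set]


# Tokens as printed by word_tokenize in section 1.
tokens = ['Hello', 'Mr.', 'Smith', ',', 'how', 'are', 'you', 'doing', 'today', '?']

# Hand-picked subset used only for this demo (an assumption);
# in practice pass stopwords.words('english') instead.
demo_stop_words = ['how', 'are', 'you', 'doing']

filtered = remove_stopwords(tokens, demo_stop_words)
print(filtered)
# ['Hello', 'Mr.', 'Smith', ',', 'today', '?']
```

Converting the list to a set first keeps each membership check constant-time, which matters when filtering long documents. Punctuation tokens such as ',' and '?' are not stop words, so they stay in the result unless they are removed separately.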
