Python nltk.TreebankWordTokenizer() Examples
The following are 4 code examples of nltk.TreebankWordTokenizer().
Example #1
Source File: tokenization.py, from language (Apache License 2.0)
def __init__(self):
    self._word_tokenizer = nltk.TreebankWordTokenizer()
    if FLAGS.punkt_tokenizer_file is not None:
        self._sent_tokenizer = py_utils.load_pickle(FLAGS.punkt_tokenizer_file)
    else:
        self._sent_tokenizer = nltk.load("tokenizers/punkt/english.pickle")
Example #2
Source File: word.py, from claf (MIT License)
def _treebank_en(self, text):
    if self.word_tokenizer is None:
        import nltk

        self.word_tokenizer = nltk.TreebankWordTokenizer()

    return [
        token.replace("''", '"').replace("``", '"')
        for token in self.word_tokenizer.tokenize(text)
    ]
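Example #2 post-processes the tokenizer output by mapping the Treebank-style quote tokens (`` and '') back to a plain double quote. The sketch below isolates that step on a hand-written token list, so it runs without NLTK installed. The function name normalize_quotes and the sample tokens are assumptions made for illustration; they are not part of claf or NLTK.

```python
def normalize_quotes(tokens):
    """Map Penn Treebank quote tokens (`` and '') back to a plain double quote."""
    # Same replacement chain as in Example #2, applied to an already-tokenized list.
    return [token.replace("''", '"').replace("``", '"') for token in tokens]


# Hand-written tokens in the style the Treebank tokenizer uses for quoted text.
treebank_style = ["She", "said", "``", "hello", "''", "."]
print(normalize_quotes(treebank_style))
# prints ['She', 'said', '"', 'hello', '"', '.']
```

Because str.replace works on substrings, a token that merely contains two consecutive single quotes or backticks is rewritten as well, which may or may not be desirable depending on the input.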
Example #3
Source File: tokenizer.py, from textblob-ar (MIT License)
def tokenize(self, text):
    return TreebankWordTokenizer().tokenize(text)
Example #4
Source File: text_utils.py, from document-qa (Apache License 2.0)
def __init__(self):
    self.sent_tokenzier = nltk.load('tokenizers/punkt/english.pickle')
    self.word_tokenizer = nltk.TreebankWordTokenizer()