Python nltk.TreebankWordTokenizer() Examples

The following are 4 code examples of nltk.TreebankWordTokenizer().

Example #1

Source File: tokenization.py From language with Apache License 2.0

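def __init__(self):
    # Rule-based word tokenizer; FLAGS and py_utils come from the source project.
    self._word_tokenizer = nltk.TreebankWordTokenizer()
    if FLAGS.punkt_tokenizer_file is not None:
        # Use a custom pickled Punkt sentence tokenizer when a path is given.
        self._sent_tokenizer = py_utils.load_pickle(FLAGS.punkt_tokenizer_file)
    else:
        # Otherwise fall back to NLTK's bundled English Punkt model.
        self._sent_tokenizer = nltk.load("tokenizers/punkt/english.pickle")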

Example #2

Source File: word.py From claf with MIT License

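def _treebank_en(self, text):
    # Create the tokenizer lazily so nltk is imported only on first use.
    if self.word_tokenizer is None:
        import nltk

        self.word_tokenizer = nltk.TreebankWordTokenizer()

    # Map the Treebank quote tokens (`` and '') back to plain double quotes.
    return [
        token.replace("''", '"').replace("``", '"')
        for token in self.word_tokenizer.tokenize(text)
    ]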
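
The quote handling in Example #2 can be tried on its own. The sketch below is only an illustration: `normalize_quotes` is a hypothetical helper name, and `sample_tokens` is a hand-written list standing in for tokenizer output, based on the Treebank convention of emitting `` for an opening double quote and '' for a closing one.

```python
def normalize_quotes(tokens):
    """Replace Treebank-style quote tokens with a plain double quote."""
    # Hypothetical helper: same replacements as in Example #2, applied per token.
    return [token.replace("''", '"').replace("``", '"') for token in tokens]


# Hand-written stand-in for what a Treebank-style tokenizer might return.
sample_tokens = ["``", "Hello", ",", "''", "she", "said", "."]

normalized = normalize_quotes(sample_tokens)
print(normalized)
```

Tokens that contain neither quote marker pass through unchanged, so the helper can be applied to every token without checking first.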

Example #3

Source File: tokenizer.py From textblob-ar with MIT License

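def tokenize(self, text):
    # Requires TreebankWordTokenizer to be imported, e.g. from nltk.tokenize.
    # A new tokenizer instance is created on every call.
    return TreebankWordTokenizer().tokenize(text)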
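
For reference, a minimal standalone sketch of calling the tokenizer directly, assuming NLTK is installed. The tokenizer is regular-expression based, so this call should not need any downloaded NLTK data. The sample sentence and the variable names are illustrative only.

```python
from nltk.tokenize import TreebankWordTokenizer

# Build the tokenizer once and reuse it for every call.
tokenizer = TreebankWordTokenizer()

sentence = "They'll save and invest more."
tokens = tokenizer.tokenize(sentence)

# Contractions are split off and the final period becomes its own token.
print(tokens)
```

Reusing a single instance, as in Examples #1, #2 and #4, avoids constructing a new tokenizer object on every call.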

Example #4

Source File: text_utils.py From document-qa with Apache License 2.0

def __init__(self):
    self.sent_tokenzier = nltk.load('tokenizers/punkt/english.pickle')
    self.word_tokenizer = nltk.TreebankWordTokenizer()