cc.mallet.pipe.TokenSequence2FeatureSequence Java Examples
The following examples show how to use
cc.mallet.pipe.TokenSequence2FeatureSequence.
Example #1
Source File: TopicModelPipe.java From baleen with Apache License 2.0
/**
 * Construct topic model pipe with given stopwords and alphabets
 *
 * @param stopwords to be removed
 * @param alphabet to use
 */
public TopicModelPipe(Collection<String> stopwords, Alphabet alphabet) {
  // @formatter:off
  super(
      ImmutableList.of(
          new CharSequenceLowercase(),
          new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")),
          new RemoveStopwords(stopwords),
          new TokenSequence2FeatureSequence(alphabet)));
  // @formatter:on
}
Example #2
Source File: ReferencesClassifierTrainer.java From bluima with Apache License 2.0
static List<Pipe> getPipes() {
  List<Pipe> pipes = newArrayList();
  pipes.add(new Target2Label());
  pipes.add(new MyInput2RegexTokens());
  // pipes.add(new PrintInputAndTarget());
  pipes.add(new TokenSequence2FeatureSequence());
  pipes.add(new FeatureSequence2FeatureVector());
  return pipes;
}
Example #3
Source File: LDA.java From topic-detection with Apache License 2.0
/**
 * Creates a list of Mallet instances from a list of documents
 *
 * @param texts a list of documents
 * @return a list of Mallet instances
 * @throws IOException
 */
private InstanceList createInstanceList(List<String> texts) throws IOException {
  // Tokenize, lowercase, drop stopwords, then map tokens to feature indices
  ArrayList<Pipe> pipes = new ArrayList<Pipe>();
  pipes.add(new CharSequence2TokenSequence());
  pipes.add(new TokenSequenceLowercase());
  pipes.add(new TokenSequenceRemoveStopwords());
  pipes.add(new TokenSequence2FeatureSequence());
  InstanceList instanceList = new InstanceList(new SerialPipes(pipes));
  instanceList.addThruPipe(new ArrayIterator(texts));
  return instanceList;
}
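Conceptual sketch
In all three examples, TokenSequence2FeatureSequence is the step that turns a sequence of string tokens into a sequence of integer feature indices drawn from an alphabet. The plain-Java sketch below illustrates that idea only. It is not MALLET's implementation and does not use the MALLET API; the class AlphabetSketch and its methods are hypothetical names introduced for illustration. It also shows why Example #1 passes in an existing alphabet: when two documents share one alphabet, the same token gets the same index in both.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; NOT part of MALLET.
final class AlphabetSketch {
    // Maps each distinct token to an integer index, assigned in order of first appearance.
    private final Map<String, Integer> indexOf = new LinkedHashMap<>();

    // Returns the index of a token, assigning the next free index if the token is new.
    int lookupIndex(String token) {
        Integer idx = indexOf.get(token);
        if (idx == null) {
            idx = indexOf.size();
            indexOf.put(token, idx);
        }
        return idx;
    }

    // Number of distinct tokens seen so far.
    int size() {
        return indexOf.size();
    }

    // Converts a token sequence into a sequence of alphabet indices.
    int[] toFeatureSequence(List<String> tokens) {
        int[] features = new int[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            features[i] = lookupIndex(tokens.get(i));
        }
        return features;
    }

    public static void main(String[] args) {
        // One alphabet shared by two documents, so indices stay consistent.
        AlphabetSketch alphabet = new AlphabetSketch();
        int[] first = alphabet.toFeatureSequence(Arrays.asList("topic", "model", "topic"));
        int[] second = alphabet.toFeatureSequence(Arrays.asList("model", "pipe"));
        System.out.println(Arrays.toString(first));
        System.out.println(Arrays.toString(second));
    }
}
```

In this sketch the repeated token "topic" maps to the same index both times, and "model" keeps its index in the second document because the alphabet is shared.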