Java Code Examples for cc.mallet.types.Instance#setData()

The following examples show how to use cc.mallet.types.Instance#setData().
Example 1
Source File: CorpusRepresentationMalletTarget.java    From gateplugin-LearningFramework with GNU Lesser General Public License v2.1
/**
 * Extract the independent features for a single instance annotation.
 *
 * The features are extracted according to the attribute specifications in the
 * featureInfo object, which is updated as a side effect.
 *
 * NOTE: this method is static so that it can be used in the CorpusRepresentationMalletSeq class too.
 *
 * @param instanceAnnotation instance annotation
 * @param inputAS input annotation set
 * @param featureInfo feature info instance
 * @param pipe mallet pipe
 * @return the Instance holding the extracted feature vector
 */
static Instance extractIndependentFeaturesHelper(
        Annotation instanceAnnotation,
        AnnotationSet inputAS,
        FeatureInfo featureInfo,
        Pipe pipe) {
  
  AugmentableFeatureVector afv = new AugmentableFeatureVector(pipe.getDataAlphabet());
  // Constructor parameters: data, target, name, source
  Instance inst = new Instance(afv, null, null, null);
  for(FeatureSpecAttribute attr : featureInfo.getAttributes()) {
    FeatureExtractionMalletSparse.extractFeature(inst, attr, inputAS, instanceAnnotation);
  }
  // TODO: we destructively replace the AugmentableFeatureVector by a FeatureVector here,
  // but it is not clear if this is beneficial - our assumption is that yes.
  inst.setData(((AugmentableFeatureVector)inst.getData()).toFeatureVector());
  return inst;
}
 
Example 2
Source File: RemoveStopwords.java    From baleen with Apache License 2.0
@Override
public Instance pipe(Instance carrier) {
  TokenSequence input = (TokenSequence) carrier.getData();
  TokenSequence output = new TokenSequence();
  for (int i = 0; i < input.size(); i++) {
    Token t = input.get(i);
    if (!stopwords.contains(t.getText())) {
      output.add(t);
    }
  }
  carrier.setData(output);
  return carrier;
}
 
Example 3
Source File: MalletCalculator.java    From TagRec with GNU Affero General Public License v3.0
private void initializeDataStructures() {
	this.instances = new InstanceList(new StringList2FeatureSequence());
	for (Map<Integer, Integer> map : this.maps) {
		List<String> tags = new ArrayList<String>();
		for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
			for (int i = 0; i < entry.getValue(); i++) {
				tags.add(entry.getKey().toString());
			}				
		}
		Instance inst = new Instance(tags, null, null, null);
		// Redundant: the constructor above already stored tags as the data.
		inst.setData(tags);
		this.instances.addThruPipe(inst);
	}
}
 
Example 4
Source File: MalletCalculatorTweet.java    From TagRec with GNU Affero General Public License v3.0
private void initializeDataStructures() {
    this.instances = new InstanceList(new StringList2FeatureSequence());
    for (Map<Integer, Integer> map : this.maps) {
        List<String> tags = new ArrayList<String>();
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                tags.add(entry.getKey().toString());
            }               
        }
        Instance inst = new Instance(tags, null, null, null);
        // Redundant: the constructor above already stored tags as the data.
        inst.setData(tags);
        this.instances.addThruPipe(inst);
    }
}
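
The examples above share one pattern: read the current value with getData(), build a new value from it, and store the result with setData() on the same object. The sketch below shows that pattern without Mallet on the classpath. It is an illustration under stated assumptions, not Mallet code: the Carrier class and the removeStopwords method are hypothetical stand-ins with an untyped data slot, and plain strings take the place of Mallet tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the pattern only; this is not Mallet code.
class SetDataSketch {

  // Minimal carrier with an untyped data slot (assumed shape, for illustration).
  static class Carrier {
    private Object data;
    Carrier(Object data) { this.data = data; }
    Object getData() { return data; }
    void setData(Object data) { this.data = data; }
  }

  // Reads the token list, builds a filtered copy, and stores it back on the same carrier.
  static Carrier removeStopwords(Carrier carrier, Set<String> stopwords) {
    @SuppressWarnings("unchecked")
    List<String> input = (List<String>) carrier.getData();
    List<String> output = new ArrayList<>();
    for (String token : input) {
      if (!stopwords.contains(token)) {
        output.add(token);
      }
    }
    carrier.setData(output);
    return carrier;
  }

  public static void main(String[] args) {
    Carrier c = new Carrier(new ArrayList<>(Arrays.asList("the", "quick", "fox")));
    Set<String> stop = new HashSet<>(Arrays.asList("the"));
    System.out.println(removeStopwords(c, stop).getData());
  }
}
```

Building a new list and swapping it in, rather than removing elements from the original while iterating, mirrors Example 2 and avoids modifying a collection during traversal.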