Java Code Examples for de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token#getCoveredText()

The following examples show how to use de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token#getCoveredText().
Example 1
Source File: WordShapeExtractor.java    From ambiverse-nlu with Apache License 2.0
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token;
import edu.stanford.nlp.process.WordShapeClassifier;

@Override
protected String getFeatureValue(Token a) {
	String text = a.getCoveredText();

	CharsetEncoder isoEncoder = StandardCharsets.ISO_8859_1.newEncoder();

	// Map typographic quotes, dashes and ellipses to plain ASCII equivalents
	text = text.replace("“", "\"").replace("„", "\"").replace("–", "-").replace("−", "-")
			.replace("…", "...").replace("—", "-").replace("’", "'");

	// here you can choose which word shape classifier you'd like to use
	String shape = WordShapeClassifier.wordShape(text, WordShapeClassifier.WORDSHAPECHRIS4);

	// Replace every character that ISO-8859-1 cannot represent with a marker
	StringBuilder sb = new StringBuilder();
	for (int i = 0, n = shape.length(); i < n; i++) {
		char c = shape.charAt(i);

		if (!isoEncoder.canEncode(c)) {
			sb.append("-NON-ISO-");
		} else {
			sb.append(c);
		}
	}

	return sb.toString();
}
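
The two text-handling steps in Example 1 (punctuation normalization and masking of characters outside ISO-8859-1) can be tried without UIMA or Stanford CoreNLP on the classpath. The following is a minimal sketch under that assumption: the class name `IsoMaskSketch` and its method names are hypothetical and not part of ambiverse-nlu, a plain `String` stands in for the result of `Token#getCoveredText()`, and the word-shape classification step is left out.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of ambiverse-nlu).
final class IsoMaskSketch {

    // Maps the typographic characters handled in Example 1 to ASCII equivalents.
    static String normalizePunctuation(String text) {
        return text.replace("\u201C", "\"")   // left double quotation mark
                .replace("\u201E", "\"")      // double low-9 quotation mark
                .replace("\u2013", "-")       // en dash
                .replace("\u2212", "-")       // minus sign
                .replace("\u2026", "...")     // horizontal ellipsis
                .replace("\u2014", "-")       // em dash
                .replace("\u2019", "'");      // right single quotation mark
    }

    // Replaces each char that ISO-8859-1 cannot encode with the marker "-NON-ISO-".
    static String maskNonIso(String s) {
        CharsetEncoder isoEncoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder sb = new StringBuilder();
        for (int i = 0, n = s.length(); i < n; i++) {
            char c = s.charAt(i);
            if (isoEncoder.canEncode(c)) {
                sb.append(c);
            } else {
                sb.append("-NON-ISO-");
            }
        }
        return sb.toString();
    }
}
```

In this sketch the marker is applied to the normalized text directly; in Example 1 it is applied to the word shape returned by the classifier.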
 
Example 2
Source File: WebannoTsv3Writer.java    From webanno with Apache License 2.0
// Assigns every token an address of the form "<sentence number>-<token number>" (both 1-based)
private void setTokenSentenceAddress(JCas aJCas)
{
    int sentNumber = 1;
    for (Sentence sentence : select(aJCas, Sentence.class)) {
        int lineNumber = 1;
        for (Token token : selectCovered(Token.class, sentence)) {
            AnnotationUnit unit = new AnnotationUnit(token.getBegin(), token.getEnd(), false,
                    token.getCoveredText());
            units.add(unit);
            // The first token of a sentence also carries the full sentence text
            if (lineNumber == 1) {
                sentenceUnits.put(unit, sentence.getCoveredText());
            }
            unitsLineNumber.put(unit, sentNumber + "-" + lineNumber);
            lineNumber++;
        }
        sentNumber++;
    }
}
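
The addressing scheme in Example 2 numbers sentences and the tokens inside each sentence from 1 and joins the two counters with a hyphen. A minimal sketch of just that numbering, assuming sentences are already available as lists of token strings: the class `TokenAddressSketch` and its method `addresses` are hypothetical names and not part of webanno, and the strings stand in for values that would come from `Token#getCoveredText()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only (not part of webanno).
final class TokenAddressSketch {

    // Returns one "<sentence>-<token>\t<text>" entry per token, with 1-based counters.
    static List<String> addresses(List<List<String>> sentences) {
        List<String> result = new ArrayList<>();
        int sentNumber = 1;
        for (List<String> sentence : sentences) {
            int tokenNumber = 1; // restarts for every sentence
            for (String tokenText : sentence) {
                result.add(sentNumber + "-" + tokenNumber + "\t" + tokenText);
                tokenNumber++;
            }
            sentNumber++;
        }
        return result;
    }
}
```

An empty sentence still consumes a sentence number in this sketch, which matches the loop structure of Example 2.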