Java Code Examples for de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token#getCoveredText()
The following examples show how to use
de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token#getCoveredText().
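As background for the examples, getCoveredText() on a UIMA annotation returns the slice of the document text between the annotation's begin and end offsets. The following is a minimal, dependency-free sketch of that idea. The Span class and its coveredText method are hypothetical stand-ins written for illustration; they are not part of DKPro Core or UIMA.

```java
public class CoveredTextSketch {

    // Hypothetical stand-in for an annotation: it stores only offsets into the document text.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        // Illustrates the idea behind getCoveredText(): slice the document by [begin, end).
        String coveredText(String documentText) {
            return documentText.substring(begin, end);
        }
    }

    public static void main(String[] args) {
        String doc = "Tokens cover text.";
        Span first = new Span(0, 6);
        Span second = new Span(7, 12);
        System.out.println(first.coveredText(doc));  // prints "Tokens"
        System.out.println(second.coveredText(doc)); // prints "cover"
    }
}
```

Because the annotation holds only offsets, the covered text always reflects the underlying document string rather than a separately stored copy.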
Example 1
Source File: WordShapeExtractor.java From ambiverse-nlu with Apache License 2.0
@Override
protected String getFeatureValue(Token a) {
    String text = a.getCoveredText();
    CharsetEncoder isoEncoder = Charset.forName("ISO-8859-1").newEncoder();

    // Normalize typographic quotes, dashes, and ellipses to ASCII equivalents.
    text = text.replace("“", "\"")
               .replace("„", "\"")
               .replace("–", "-")
               .replace("−", "-")
               .replace("…", "...")
               .replace("—", "-")
               .replace("’", "'")
               .replace("’", "'"); // repeats the previous replacement; no further effect

    // here you can choose which word shape classifier you'd like to use
    String shape = WordShapeClassifier.wordShape(text, WordShapeClassifier.WORDSHAPECHRIS4);

    // Replace every shape character that ISO-8859-1 cannot encode with a marker.
    StringBuilder sb = new StringBuilder();
    for (int i = 0, n = shape.length(); i < n; i++) {
        char c = shape.charAt(i);
        if (!isoEncoder.canEncode(c)) {
            sb.append("-NON-ISO-");
        } else {
            sb.append(c);
        }
    }
    return sb.toString();
}
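The encoder check at the end of Example 1 can be tried in isolation. The sketch below uses only the JDK and assumes nothing about ambiverse-nlu or the Stanford WordShapeClassifier; the class name IsoFilterSketch and the method markNonIso are hypothetical names chosen for illustration.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class IsoFilterSketch {

    // Replaces every character that ISO-8859-1 cannot encode with the marker "-NON-ISO-".
    static String markNonIso(String input) {
        CharsetEncoder isoEncoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isoEncoder.canEncode(c)) {
                sb.append(c);
            } else {
                sb.append("-NON-ISO-");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is part of ISO-8859-1, so it passes through unchanged.
        System.out.println(markNonIso("caf\u00e9").equals("caf\u00e9")); // prints "true"
        // U+20AC (euro sign) is not part of ISO-8859-1, so it is replaced.
        System.out.println(markNonIso("5\u20ac")); // prints "5-NON-ISO-"
    }
}
```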
Example 2
Source File: WebannoTsv3Writer.java From webanno with Apache License 2.0
private void setTokenSentenceAddress(JCas aJCas) {
    int sentNumber = 1;
    for (Sentence sentence : select(aJCas, Sentence.class)) {
        int lineNumber = 1;
        for (Token token : selectCovered(Token.class, sentence)) {
            AnnotationUnit unit = new AnnotationUnit(token.getBegin(), token.getEnd(), false,
                    token.getCoveredText());
            units.add(unit);
            // The first token of each sentence also records the full sentence text.
            if (lineNumber == 1) {
                sentenceUnits.put(unit, sentence.getCoveredText());
            }
            // Address format: "<sentence number>-<token number>", both starting at 1.
            unitsLineNumber.put(unit, sentNumber + "-" + lineNumber);
            lineNumber++;
        }
        sentNumber++;
    }
}
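The addressing scheme in Example 2 (sentence number, a hyphen, then the token's position within the sentence) can be sketched without UIMA. In the sketch below, plain lists of strings stand in for Sentence and Token annotations; the class TokenAddressSketch and the method addresses are hypothetical and not part of WebAnno.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenAddressSketch {

    // Maps an address such as "2-1" (sentence 2, token 1) to the token text.
    static Map<String, String> addresses(List<List<String>> sentences) {
        Map<String, String> result = new LinkedHashMap<>();
        int sentenceNumber = 1;
        for (List<String> sentence : sentences) {
            int tokenNumber = 1;
            for (String token : sentence) {
                result.put(sentenceNumber + "-" + tokenNumber, token);
                tokenNumber++;
            }
            sentenceNumber++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("Hello", "world"),
                Arrays.asList("Bye"));
        System.out.println(addresses(sentences)); // prints "{1-1=Hello, 1-2=world, 2-1=Bye}"
    }
}
```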