org.apache.lucene.analysis.charfilter.MappingCharFilter Java Examples
The following examples show how to use
org.apache.lucene.analysis.charfilter.MappingCharFilter.
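Before the examples: conceptually, MappingCharFilter rewrites matched character sequences in the input before tokenization, according to rules held in a NormalizeCharMap. As a stdlib-only sketch of that mapping semantics (this is a hypothetical helper, not the real Lucene class, which additionally tracks offset corrections back into the unmapped text):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CharMappingSketch {
    // Scan the input left to right; whenever a rule's source sequence matches
    // at the current position, emit its target instead and skip past the match.
    public static String applyMappings(String input, Map<String, String> rules) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        outer:
        while (i < input.length()) {
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                if (input.startsWith(rule.getKey(), i)) {
                    out.append(rule.getValue());
                    i += rule.getKey().length();
                    continue outer;
                }
            }
            out.append(input.charAt(i++));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("&uuml;", "ü"); // map the 6-char HTML entity to the character
        System.out.println(applyMappings("G&uuml;nther is here", rules)); // Günther is here
    }
}
```

The examples below build the equivalent rules with NormalizeCharMap.Builder and wrap a Reader with MappingCharFilter instead.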
Example #1
Source File: TestSimplePatternTokenizer.java From lucene-solr with Apache License 2.0
public void testOffsetCorrection() throws Exception {
  final String INPUT = "G&uuml;nther G&uuml;nther is here";

  // create MappingCharFilter
  List<String> mappingRules = new ArrayList<>();
  mappingRules.add("\"&uuml;\" => \"ü\"");
  NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
  builder.add("&uuml;", "ü");
  NormalizeCharMap normMap = builder.build();
  CharFilter charStream = new MappingCharFilter(normMap, new StringReader(INPUT));

  // create SimplePatternTokenizer
  Tokenizer stream = new SimplePatternTokenizer("Günther");
  stream.setReader(charStream);

  assertTokenStreamContents(stream,
      new String[] {"Günther", "Günther"},
      new int[] {0, 13},
      new int[] {12, 25},
      INPUT.length());
}
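The asserted offsets refer to the raw input, in which (per the original Lucene source) each ü is written as the six-character HTML entity &uuml;, so the first token spans 12 characters of raw text even though the mapped token "Günther" is only 7 characters long. A quick check of that arithmetic:

```java
public class OffsetCheck {
    public static void main(String[] args) {
        String input = "G&uuml;nther G&uuml;nther is here";
        System.out.println("G&uuml;nther".length());          // 12 -> end offset of the first token
        System.out.println(input.indexOf("G&uuml;nther", 1)); // 13 -> start offset of the second token
        System.out.println(input.length());                   // 33 -> final offset
    }
}
```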
Example #2
Source File: TestSimplePatternSplitTokenizer.java From lucene-solr with Apache License 2.0
public void testOffsetCorrection() throws Exception {
  final String INPUT = "G&uuml;nther G&uuml;nther is here";

  // create MappingCharFilter
  List<String> mappingRules = new ArrayList<>();
  mappingRules.add("\"&uuml;\" => \"ü\"");
  NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
  builder.add("&uuml;", "ü");
  NormalizeCharMap normMap = builder.build();
  CharFilter charStream = new MappingCharFilter(normMap, new StringReader(INPUT));

  // create SimplePatternSplitTokenizer
  Tokenizer stream = new SimplePatternSplitTokenizer("Günther");
  stream.setReader(charStream);

  assertTokenStreamContents(stream,
      new String[] {" ", " is here"},
      new int[] {12, 25},
      new int[] {13, 33},
      INPUT.length());
}
Example #3
Source File: TestPathHierarchyTokenizer.java From lucene-solr with Apache License 2.0
public void testNormalizeWinDelimToLinuxDelim() throws Exception {
  NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
  builder.add("\\", "/");
  NormalizeCharMap normMap = builder.build();
  String path = "c:\\a\\b\\c";
  Reader cs = new MappingCharFilter(normMap, new StringReader(path));
  PathHierarchyTokenizer t = new PathHierarchyTokenizer(newAttributeFactory(),
      DEFAULT_DELIMITER, DEFAULT_DELIMITER, DEFAULT_SKIP);
  t.setReader(cs);
  assertTokenStreamContents(t,
      new String[]{"c:", "c:/a", "c:/a/b", "c:/a/b/c"},
      new int[]{0, 0, 0, 0},
      new int[]{2, 4, 6, 8},
      new int[]{1, 0, 0, 0},
      path.length());
}
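The two steps in this example, normalizing backslashes with the char filter and then emitting one token per path prefix with PathHierarchyTokenizer, can be imitated in a stdlib-only sketch (a hypothetical helper, not the Lucene API):

```java
import java.util.ArrayList;
import java.util.List;

public class PathHierarchySketch {
    // Step 1: map '\' to '/' (the MappingCharFilter rule above).
    // Step 2: emit every path prefix up to each delimiter, then the full path,
    // the way PathHierarchyTokenizer produces its token stream.
    public static List<String> hierarchy(String winPath) {
        String normalized = winPath.replace('\\', '/');
        List<String> tokens = new ArrayList<>();
        int from = normalized.indexOf('/');
        while (from != -1) {
            tokens.add(normalized.substring(0, from));
            from = normalized.indexOf('/', from + 1);
        }
        tokens.add(normalized);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(hierarchy("c:\\a\\b\\c")); // [c:, c:/a, c:/a/b, c:/a/b/c]
    }
}
```

Note the tokens match the strings asserted in the test above, which is why all tokens except the first carry position increment 0: they are progressively longer views of the same path.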
Example #4
Source File: MappingCharFilterFactory.java From Elasticsearch with Apache License 2.0
@Override
public Reader create(Reader tokenStream) {
  return new MappingCharFilter(normMap, tokenStream);
}
Example #5
Source File: MappingCharFilterFactory.java From crate with Apache License 2.0
@Override
public Reader create(Reader tokenStream) {
  return new MappingCharFilter(normMap, tokenStream);
}