Java Code Examples for com.jstarcraft.core.utility.StringUtility#SPACE
The following examples show how to use com.jstarcraft.core.utility.StringUtility#SPACE.
Example 1
Source File: LanguageDetector.java From jstarcraft-nlp with Apache License 2.0
/**
 * Detects the languages of the given text.
 *
 * @param text the text to analyze
 * @param options per-language flags: true puts the language on the whitelist, false on the blacklist
 * @return the candidate languages with their scores (empty if none qualify)
 */
public SortedSet<DetectionLanguage> detectLanguages(String text, Object2BooleanMap<String> options) {
    SortedSet<DetectionLanguage> locales = new TreeSet<>();
    // Minimum length limit
    int size = text.length();
    if (size < minimum) {
        return locales;
    }
    // Maximum length limit
    if (size > maximum) {
        text = text.substring(0, maximum);
        size = maximum;
    }
    // Whitelist (writes) and blacklist (blacks)
    Set<String> writes = options.size() == 0 ? Collections.emptySet() : new HashSet<>();
    Set<String> blacks = options.size() == 0 ? Collections.emptySet() : new HashSet<>();
    for (Object2BooleanMap.Entry<String> option : options.object2BooleanEntrySet()) {
        if (option.getBooleanValue()) {
            writes.add(option.getKey());
        } else {
            blacks.add(option.getKey());
        }
    }
    /*
     * Get the script whose characters occur most often in `text`.
     */
    int count = -1;
    String script = null;
    for (DetectionPattern regulation : patterns.values()) {
        Pattern pattern = regulation.getPattern();
        Matcher matcher = pattern.matcher(text);
        int match = 0;
        while (matcher.find()) {
            match++;
        }
        if (match > count) {
            count = match;
            script = regulation.getName();
        }
    }
    /*
     * If no matches occurred, such as for a digit-only string, return an empty result.
     */
    if (script == null || count <= 0) {
        return locales;
    }
    /* If no dictionary exists for the most-used script, only one language uses it. */
    Set<DetectionTrie> dictionaries = tires.get(script);
    if (dictionaries == null) {
        /*
         * If that language is excluded by the whitelist or blacklist, return an empty result.
         */
        if (!checkLanguage(script, writes, blacks)) {
            return locales;
        }
        locales.add(new DetectionLanguage(Locale.forLanguageTag(script), 1D));
        return locales;
    }
    /*
     * Get all scores for the given script, and normalize the score values.
     */
    // Pad both ends with spaces for N-gram processing
    text = StringUtility.SPACE + REPLACE.matcher(text).replaceAll(StringUtility.SPACE).toLowerCase() + StringUtility.SPACE;
    CharacterNgram ngram = new CharacterNgram(3, text);
    Object2IntMap<CharSequence> tuples = new Object2IntOpenHashMap<>();
    for (CharSequence character : ngram) {
        count = tuples.getInt(character);
        tuples.put(character, count + 1);
    }
    for (DetectionTrie dictionary : dictionaries) {
        String language = dictionary.getName();
        if (checkLanguage(language, writes, blacks)) {
            double score = getScore(tuples, dictionary.getTrie());
            DetectionLanguage locale = new DetectionLanguage(Locale.forLanguageTag(language), score);
            locales.add(locale);
        }
    }
    if (!locales.isEmpty()) {
        normalizeScores(text, locales);
    }
    return locales;
}
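The method above first picks the script with the most pattern matches and then counts character trigrams over a space-padded, lower-cased copy of the text. The following is a minimal, self-contained sketch of those two steps, not the jstarcraft-nlp implementation: SPACE is assumed to be a single space, the REPLACE pattern and the script map are hypothetical stand-ins, and a plain substring loop takes the place of CharacterNgram.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the script-selection and trigram-counting steps of detectLanguages.
// SPACE, REPLACE and the script patterns are hypothetical stand-ins.
class TrigramSketch {

    // Assumed to match StringUtility.SPACE (a single space character).
    static final String SPACE = " ";

    // Hypothetical REPLACE: runs of punctuation, symbols, digits and whitespace.
    static final Pattern REPLACE = Pattern.compile("[\\p{P}\\p{S}\\p{N}\\s]+");

    // Returns the name of the pattern with the most matches, or null if nothing matched.
    static String dominantScript(String text, Map<String, Pattern> scripts) {
        int count = 0;
        String script = null;
        for (Map.Entry<String, Pattern> entry : scripts.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            int match = 0;
            while (matcher.find()) {
                match++;
            }
            if (match > count) {
                count = match;
                script = entry.getKey();
            }
        }
        return script;
    }

    // Collapses separators to SPACE, lower-cases, and pads both ends with SPACE.
    static String pad(String text) {
        return SPACE + REPLACE.matcher(text).replaceAll(SPACE).toLowerCase(Locale.ROOT) + SPACE;
    }

    // Counts every overlapping three-character window of the padded text.
    static Map<String, Integer> trigrams(String padded) {
        Map<String, Integer> tuples = new HashMap<>();
        for (int index = 0; index + 3 <= padded.length(); index++) {
            tuples.merge(padded.substring(index, index + 3), 1, Integer::sum);
        }
        return tuples;
    }

    public static void main(String[] arguments) {
        Map<String, Pattern> scripts = new LinkedHashMap<>();
        scripts.put("Latin", Pattern.compile("\\p{IsLatin}"));
        scripts.put("Cyrillic", Pattern.compile("\\p{IsCyrillic}"));
        String text = "Hello, world";
        System.out.println(dominantScript(text, scripts)); // Latin
        String padded = pad(text);
        System.out.println("[" + padded + "]"); // [ hello world ]
        System.out.println(trigrams(padded).get(" he")); // 1
    }
}
```

The padding is what lets a trigram such as " he" record that "he" starts a word: inner word boundaries come from REPLACE turning separators into SPACE, but without the leading and trailing SPACE the first and last words of the text would never produce boundary trigrams.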
Example 2
Source File: ComplexCondition.java From jstarcraft-core with Apache License 2.0
@Override
public String toString() {
    return "(" + left + StringUtility.SPACE + operator.getOperate() + StringUtility.SPACE + right + ")";
}
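Example 2 wraps each binary condition in parentheses and uses SPACE only to separate the operands from the operator, so nested conditions render correctly without extra bookkeeping. The sketch below illustrates that rendering under stated assumptions: the LogicOperator enum, its "&&" and "||" symbols, and the operand strings are hypothetical and not taken from jstarcraft-core, and SPACE is assumed to be a single space.

```java
// Hypothetical stand-in for ComplexCondition: renders "(left op right)".
class ComplexConditionSketch {

    // Assumed to match StringUtility.SPACE (a single space character).
    static final String SPACE = " ";

    // Hypothetical logical operators; getOperate() returns the rendered symbol.
    enum LogicOperator {
        AND("&&"), OR("||");

        private final String operate;

        LogicOperator(String operate) {
            this.operate = operate;
        }

        String getOperate() {
            return operate;
        }
    }

    static final class ComplexCondition {
        private final Object left;
        private final LogicOperator operator;
        private final Object right;

        ComplexCondition(Object left, LogicOperator operator, Object right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }

        // Operands call their own toString(), so nested conditions keep their parentheses.
        @Override
        public String toString() {
            return "(" + left + SPACE + operator.getOperate() + SPACE + right + ")";
        }
    }

    public static void main(String[] arguments) {
        ComplexCondition inner = new ComplexCondition("$1 == 1", LogicOperator.OR, "$2 == 2");
        ComplexCondition outer = new ComplexCondition(inner, LogicOperator.AND, "$3 > 0");
        System.out.println(outer); // (($1 == 1 || $2 == 2) && $3 > 0)
    }
}
```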
Example 3
Source File: SimpleCondition.java From jstarcraft-core with Apache License 2.0
@Override
public String toString() {
    return prefix + index + StringUtility.SPACE + operator.getOperate() + StringUtility.SPACE + (operator == AwkOperator.IN ? property : value);
}
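In Example 3 the right-hand operand depends on the operator: IN renders `property`, while every other operator renders `value`. A minimal sketch of that branch follows, assuming that `prefix` plus `index` forms an awk-style field reference such as `$2` and that IN renders as awk's `in` membership test. The AwkOperator constants and their symbols here are hypothetical, not the jstarcraft-core enum, and SPACE is assumed to be a single space.

```java
// Hypothetical stand-in for SimpleCondition: renders "field op operand".
class SimpleConditionSketch {

    // Assumed to match StringUtility.SPACE (a single space character).
    static final String SPACE = " ";

    // Hypothetical operators; getOperate() returns the rendered symbol.
    enum AwkOperator {
        EQUAL("=="), GREATER(">"), IN("in");

        private final String operate;

        AwkOperator(String operate) {
            this.operate = operate;
        }

        String getOperate() {
            return operate;
        }
    }

    static final class SimpleCondition {
        private final String prefix;     // e.g. "$" for an awk field
        private final int index;         // field number
        private final AwkOperator operator;
        private final String property;   // array name, used only by IN
        private final Object value;      // right-hand operand for the other operators

        SimpleCondition(String prefix, int index, AwkOperator operator, String property, Object value) {
            this.prefix = prefix;
            this.index = index;
            this.operator = operator;
            this.property = property;
            this.value = value;
        }

        // IN renders "field in property"; every other operator renders "field op value".
        @Override
        public String toString() {
            return prefix + index + SPACE + operator.getOperate() + SPACE + (operator == AwkOperator.IN ? property : value);
        }
    }

    public static void main(String[] arguments) {
        System.out.println(new SimpleCondition("$", 2, AwkOperator.EQUAL, "names", 42)); // $2 == 42
        System.out.println(new SimpleCondition("$", 3, AwkOperator.IN, "names", 42)); // $3 in names
    }
}
```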