org.apache.spark.ml.feature.NGram Java Examples
The following examples show how to use org.apache.spark.ml.feature.NGram.
Example #1
Source File: NGramBuilder.java From vn.vitk with GNU General Public License v3.0
/**
 * Creates a n-gram data frame from text lines.
 * @param lines
 * @return a n-gram data frame.
 */
DataFrame createNGramDataFrame(JavaRDD<String> lines) {
    JavaRDD<Row> rows = lines.map(new Function<String, Row>(){
        private static final long serialVersionUID = -4332903997027358601L;
        @Override
        public Row call(String line) throws Exception {
            return RowFactory.create(Arrays.asList(line.split("\\s+")));
        }
    });
    StructType schema = new StructType(new StructField[] {
        new StructField("words",
            DataTypes.createArrayType(DataTypes.StringType), false,
            Metadata.empty()) });
    DataFrame wordDF = new SQLContext(jsc).createDataFrame(rows, schema);
    // build a bigram language model
    NGram transformer = new NGram().setInputCol("words")
        .setOutputCol("ngrams").setN(2);
    DataFrame ngramDF = transformer.transform(wordDF);
    ngramDF.show(10, false);
    return ngramDF;
}
Example #2
Source File: JavaNGramExample.java From SparkDemo with MIT License
public static void main(String[] args) {
    SparkSession spark = SparkSession
        .builder()
        .appName("JavaNGramExample")
        .getOrCreate();

    // $example on$
    List<Row> data = Arrays.asList(
        RowFactory.create(0, Arrays.asList("Hi", "I", "heard", "about", "Spark")),
        RowFactory.create(1, Arrays.asList("I", "wish", "Java", "could", "use", "case", "classes")),
        RowFactory.create(2, Arrays.asList("Logistic", "regression", "models", "are", "neat"))
    );

    StructType schema = new StructType(new StructField[]{
        new StructField("id", DataTypes.IntegerType, false, Metadata.empty()),
        new StructField(
            "words", DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty())
    });

    Dataset<Row> wordDataFrame = spark.createDataFrame(data, schema);

    NGram ngramTransformer = new NGram().setN(2).setInputCol("words").setOutputCol("ngrams");

    Dataset<Row> ngramDataFrame = ngramTransformer.transform(wordDataFrame);
    ngramDataFrame.select("ngrams").show(false);
    // $example off$

    spark.stop();
}
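To make the transformer's output easier to picture, here is a minimal plain-Java sketch of the sliding-window computation that an n-gram transform performs on one row of tokens. It does not use Spark. The class name NGramSketch and the method ngrams are hypothetical names chosen for this illustration, and the code is an approximation of the behaviour (consecutive tokens joined by a single space, empty result when the input has fewer than n tokens), not Spark's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Spark or of the projects above.
public class NGramSketch {

    // Returns every run of n consecutive tokens, joined by a single space.
    // If the input holds fewer than n tokens, the result is an empty list.
    static List<String> ngrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hi", "I", "heard", "about", "Spark");
        // prints [Hi I, I heard, heard about, about Spark]
        System.out.println(ngrams(words, 2));
    }
}
```

With n set to 2, as in the examples above, each output element is a bigram; a row with a single token yields an empty list.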
Example #3
Source File: NGramConverter.java From jpmml-sparkml with GNU Affero General Public License v3.0
@Override
public List<Feature> encodeFeatures(SparkMLEncoder encoder){
    NGram transformer = getTransformer();

    DocumentFeature documentFeature = (DocumentFeature)encoder.getOnlyFeature(transformer.getInputCol());

    return Collections.singletonList(documentFeature);
}
Example #4
Source File: NGramConverter.java From jpmml-sparkml with GNU Affero General Public License v3.0
public NGramConverter(NGram transformer){
    super(transformer);
}