org.apache.spark.mllib.stat.KernelDensity Java Examples
The following examples show how to use
org.apache.spark.mllib.stat.KernelDensity.
Example #1
Source File: JavaKernelDensityEstimationExample.java From SparkDemo with MIT License
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.stat.KernelDensity;

public class JavaKernelDensityEstimationExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("JavaKernelDensityEstimationExample");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    // $example on$
    // an RDD of sample data
    JavaRDD<Double> data = jsc.parallelize(
      Arrays.asList(1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0));

    // Construct the density estimator with the sample data
    // and a standard deviation for the Gaussian kernels
    KernelDensity kd = new KernelDensity().setSample(data).setBandwidth(3.0);

    // Find density estimates for the given values
    double[] densities = kd.estimate(new double[]{-1.0, 2.0, 5.0});

    System.out.println(Arrays.toString(densities));
    // $example off$

    jsc.stop();
  }
}
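Example #1 describes the bandwidth as a standard deviation for the Gaussian kernels. As a rough sketch of what such an estimate computes, assuming the textbook Gaussian kernel density formula (the average of normal densities centred on each sample point), a plain-Java version without Spark might look like the following. The class and method names are hypothetical, and this is not Spark's implementation.

```java
import java.util.Arrays;

// Hypothetical, Spark-free sketch of a Gaussian kernel density estimate.
final class GaussianKdeSketch {

    /**
     * For each evaluation point, averages Gaussian kernels whose standard
     * deviation equals the bandwidth and whose centres are the sample values.
     */
    static double[] estimate(double[] sample, double bandwidth, double[] points) {
        if (sample.length == 0) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        if (!(bandwidth > 0.0)) {
            throw new IllegalArgumentException("bandwidth must be positive");
        }
        // Normalising constant of a Gaussian with standard deviation = bandwidth.
        double norm = 1.0 / (bandwidth * Math.sqrt(2.0 * Math.PI));
        double[] densities = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            double sum = 0.0;
            for (double s : sample) {
                double z = (points[i] - s) / bandwidth;
                sum += norm * Math.exp(-0.5 * z * z);
            }
            densities[i] = sum / sample.length;
        }
        return densities;
    }

    public static void main(String[] args) {
        // Same sample, bandwidth, and evaluation points as Example #1.
        double[] sample = {1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0};
        double[] densities = estimate(sample, 3.0, new double[]{-1.0, 2.0, 5.0});
        System.out.println(Arrays.toString(densities));
    }
}
```

A larger bandwidth smooths the estimate and a smaller one makes it follow individual sample points more closely, which is why the choice of bandwidth matters in the examples on this page.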
Example #2
Source File: PosteriorSummaryUtils.java From gatk-protected with BSD 3-Clause "New" or "Revised" License
/**
 * Given a list of posterior samples, returns an estimate of the posterior mode (using
 * mllib kernel density estimation in {@link KernelDensity} and {@link BrentOptimizer}).
 * Note that estimate may be poor if number of samples is small (resulting in poor kernel density estimation),
 * or if posterior is not unimodal (or is sufficiently pathological otherwise). If the samples contain
 * {@link Double#NaN}, {@link Double#NaN} will be returned.
 * @param samples   posterior samples, cannot be {@code null} and number of samples must be greater than 0
 * @param ctx       {@link JavaSparkContext} used by {@link KernelDensity} for mllib kernel density estimation
 */
public static double calculatePosteriorMode(final List<Double> samples, final JavaSparkContext ctx) {
    Utils.nonNull(samples);
    Utils.validateArg(samples.size() > 0, "Number of samples must be greater than zero.");

    //calculate sample min, max, mean, and standard deviation
    final double sampleMin = Collections.min(samples);
    final double sampleMax = Collections.max(samples);
    final double sampleMean = new Mean().evaluate(Doubles.toArray(samples));
    final double sampleStandardDeviation = new StandardDeviation().evaluate(Doubles.toArray(samples));

    //if samples are all the same or contain NaN, can simply return mean
    if (sampleStandardDeviation == 0. || Double.isNaN(sampleMean)) {
        return sampleMean;
    }

    //use Silverman's rule to set bandwidth for kernel density estimation from sample standard deviation
    //see https://en.wikipedia.org/wiki/Kernel_density_estimation#Practical_estimation_of_the_bandwidth
    final double bandwidth =
            SILVERMANS_RULE_CONSTANT * sampleStandardDeviation * Math.pow(samples.size(), SILVERMANS_RULE_EXPONENT);

    //use kernel density estimation to approximate posterior from samples
    final KernelDensity pdf = new KernelDensity().setSample(ctx.parallelize(samples, 1)).setBandwidth(bandwidth);

    //use Brent optimization to find mode (i.e., maximum) of kernel-density-estimated posterior
    final BrentOptimizer optimizer =
            new BrentOptimizer(RELATIVE_TOLERANCE, RELATIVE_TOLERANCE * (sampleMax - sampleMin));
    final UnivariateObjectiveFunction objective =
            new UnivariateObjectiveFunction(f -> pdf.estimate(new double[] {f})[0]);

    //search for mode within sample range, start near sample mean
    final SearchInterval searchInterval = new SearchInterval(sampleMin, sampleMax, sampleMean);
    return optimizer.optimize(objective, GoalType.MAXIMIZE, searchInterval, BRENT_MAX_EVAL).getPoint();
}
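The snippet above computes its bandwidth from the sample standard deviation and the sample count, but the values of SILVERMANS_RULE_CONSTANT and SILVERMANS_RULE_EXPONENT are not shown. As an illustration only, the sketch below assumes the commonly quoted rule-of-thumb form h = 1.06 * sigma * n^(-1/5) and a bias-corrected (n - 1) standard deviation; the class name, the constant names, and both numeric values are assumptions rather than the project's definitions.

```java
// Hypothetical sketch of a Silverman-style rule-of-thumb bandwidth.
final class SilvermanBandwidthSketch {

    // Assumed values; the source does not show its own constants.
    static final double ASSUMED_CONSTANT = 1.06;
    static final double ASSUMED_EXPONENT = -0.2;

    /** Bias-corrected (n - 1) sample standard deviation; 0 for a single value. */
    static double sampleStandardDeviation(double[] x) {
        if (x.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        if (x.length == 1) {
            return 0.0;
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double sumSq = 0.0;
        for (double v : x) {
            sumSq += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSq / (x.length - 1));
    }

    /** Bandwidth = constant * standard deviation * n^exponent. */
    static double bandwidth(double[] x) {
        return ASSUMED_CONSTANT * sampleStandardDeviation(x) * Math.pow(x.length, ASSUMED_EXPONENT);
    }

    public static void main(String[] args) {
        double[] samples = {1.0, 2.0, 3.0, 4.0, 5.0};
        System.out.println(bandwidth(samples));
    }
}
```

With a negative exponent the bandwidth shrinks as the number of samples grows. It is zero when all samples are equal, the case the snippet above handles by returning the mean before any density is estimated.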
Example #3
Source File: PosteriorSummaryUtils.java From gatk with BSD 3-Clause "New" or "Revised" License
/**
 * Given a list of posterior samples, returns an estimate of the posterior mode (using
 * mllib kernel density estimation in {@link KernelDensity} and {@link BrentOptimizer}).
 * Note that estimate may be poor if number of samples is small (resulting in poor kernel density estimation),
 * or if posterior is not unimodal (or is sufficiently pathological otherwise). If the samples contain
 * {@link Double#NaN}, {@link Double#NaN} will be returned.
 * @param samples   posterior samples, cannot be {@code null} and number of samples must be greater than 0
 * @param ctx       {@link JavaSparkContext} used by {@link KernelDensity} for mllib kernel density estimation
 */
public static double calculatePosteriorMode(final List<Double> samples, final JavaSparkContext ctx) {
    Utils.nonNull(samples);
    Utils.validateArg(samples.size() > 0, "Number of samples must be greater than zero.");

    //calculate sample min, max, mean, and standard deviation
    final double sampleMin = Collections.min(samples);
    final double sampleMax = Collections.max(samples);
    final double sampleMean = new Mean().evaluate(Doubles.toArray(samples));
    final double sampleStandardDeviation = new StandardDeviation().evaluate(Doubles.toArray(samples));

    //if samples are all the same or contain NaN, can simply return mean
    if (sampleStandardDeviation == 0. || Double.isNaN(sampleMean)) {
        return sampleMean;
    }

    //use Silverman's rule to set bandwidth for kernel density estimation from sample standard deviation
    //see https://en.wikipedia.org/wiki/Kernel_density_estimation#Practical_estimation_of_the_bandwidth
    final double bandwidth =
            SILVERMANS_RULE_CONSTANT * sampleStandardDeviation * Math.pow(samples.size(), SILVERMANS_RULE_EXPONENT);

    //use kernel density estimation to approximate posterior from samples
    final KernelDensity pdf = new KernelDensity().setSample(ctx.parallelize(samples, 1)).setBandwidth(bandwidth);

    //use Brent optimization to find mode (i.e., maximum) of kernel-density-estimated posterior
    final BrentOptimizer optimizer =
            new BrentOptimizer(RELATIVE_TOLERANCE, RELATIVE_TOLERANCE * (sampleMax - sampleMin));
    final UnivariateObjectiveFunction objective =
            new UnivariateObjectiveFunction(f -> pdf.estimate(new double[] {f})[0]);

    //search for mode within sample range, start near sample mean
    final SearchInterval searchInterval = new SearchInterval(sampleMin, sampleMax, sampleMean);
    return optimizer.optimize(objective, GoalType.MAXIMIZE, searchInterval, BRENT_MAX_EVAL).getPoint();
}
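The method above finds the mode by maximising a kernel density estimate over the sample range. To show that idea without the Spark and Commons Math dependencies, the sketch below swaps in a plain golden-section search in place of Brent optimisation and a hand-written Gaussian density in place of KernelDensity. All names are hypothetical. Like the original, it can only be expected to find the mode when the estimated density is unimodal on the search interval.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: mode of a Gaussian KDE via golden-section search
// (a simpler stand-in for the Brent optimiser used in the snippet above).
final class KdeModeSketch {

    private static final double INV_PHI = (Math.sqrt(5.0) - 1.0) / 2.0;

    /** Gaussian kernel density estimate at a single point x. */
    static double density(double[] sample, double bandwidth, double x) {
        double norm = 1.0 / (bandwidth * Math.sqrt(2.0 * Math.PI));
        double sum = 0.0;
        for (double s : sample) {
            double z = (x - s) / bandwidth;
            sum += norm * Math.exp(-0.5 * z * z);
        }
        return sum / sample.length;
    }

    /** Golden-section search for the maximiser of f on [lo, hi]. */
    static double argMax(DoubleUnaryOperator f, double lo, double hi, double tol) {
        if (!(tol > 0.0) || lo > hi) {
            throw new IllegalArgumentException("need tol > 0 and lo <= hi");
        }
        double a = lo;
        double b = hi;
        double c = b - INV_PHI * (b - a);
        double d = a + INV_PHI * (b - a);
        double fc = f.applyAsDouble(c);
        double fd = f.applyAsDouble(d);
        while (b - a > tol) {
            if (fc > fd) {
                // Maximum lies in [a, d]; the old c becomes the new d.
                b = d;
                d = c;
                fd = fc;
                c = b - INV_PHI * (b - a);
                fc = f.applyAsDouble(c);
            } else {
                // Maximum lies in [c, b]; the old d becomes the new c.
                a = c;
                c = d;
                fc = fd;
                d = a + INV_PHI * (b - a);
                fd = f.applyAsDouble(d);
            }
        }
        return 0.5 * (a + b);
    }

    /** Searches for the KDE mode within the sample range. */
    static double mode(double[] sample, double bandwidth, double tol) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : sample) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        return argMax(x -> density(sample, bandwidth, x), min, max, tol);
    }

    public static void main(String[] args) {
        double[] samples = {-1.0, 0.0, 1.0};
        System.out.println(mode(samples, 1.0, 1e-8));
    }
}
```

Golden-section search needs no derivative and shrinks the bracket by a fixed ratio at each step. Brent's method typically converges in fewer evaluations, which matters when each density evaluation is a Spark job.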