Java Code Examples for org.carrot2.core.ControllerFactory#createSimple()
The following examples show how to use org.carrot2.core.ControllerFactory#createSimple().
Example 1
Source File: SavingResultsToXml.java From scava with Eclipse Public License 2.0
public static void main(String[] args) throws Exception {
    // Cluster the bundled sample documents; no live search engine is queried here.
    final Controller controller = ControllerFactory.createSimple();
    final Map<String, Object> attributes = Maps.newHashMap();
    CommonAttributesDescriptor.attributeBuilder(attributes)
        .documents(new ArrayList<Document>(SampleDocumentData.DOCUMENTS_DATA_MINING))
        .query("data mining");
    final ProcessingResult result = controller.process(attributes, LingoClusteringAlgorithm.class);

    // Serialize the entire result (documents and clusters) to XML.
    result.serialize(System.out);
    System.out.println();

    // Optionally, choose whether to serialize documents and clusters.
    result.serialize(System.out, false /* don't save documents */, true /* save clusters */);
}
Example 2
Source File: SavingResultsToJson.java From scava with Eclipse Public License 2.0
public static void main(String[] args) throws Exception {
    // Cluster the bundled sample documents; no live search engine is queried here.
    final Controller controller = ControllerFactory.createSimple();
    final Map<String, Object> attributes = Maps.newHashMap();
    CommonAttributesDescriptor.attributeBuilder(attributes)
        .documents(new ArrayList<Document>(SampleDocumentData.DOCUMENTS_DATA_MINING))
        .query("data mining");
    final ProcessingResult result = controller.process(attributes, LingoClusteringAlgorithm.class);

    // Serialize the entire result to JSON.
    result.serializeJson(new PrintWriter(System.out));
    System.out.println();

    // Optionally, provide a callback name for JSON-P-style calls.
    result.serializeJson(
        new PrintWriter(System.out),
        "loadResults",
        true /* indent */,
        false /* save documents */,
        true /* save clusters */);
}
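The JSON-P variant in Example 2 passes a callback name ("loadResults"). As a rough illustration of the general JSON-P convention only, the sketch below wraps a JSON payload in a call to a named function using nothing but the JDK. The class name, the wrap helper and the payload are hypothetical; the exact text that Carrot2's serializeJson emits is not shown in the source and may differ.

```java
public class JsonpWrapSketch {
    /** Wraps a JSON payload in a call to the given callback function (JSON-P style). */
    static String wrap(String callback, String json) {
        return callback + "(" + json + ");";
    }

    public static void main(String[] args) {
        // Hypothetical payload; a real result would contain documents and clusters.
        String json = "{\"clusters\":[]}";
        // Prints: loadResults({"clusters":[]});
        System.out.println(wrap("loadResults", json));
    }
}
```

A page that defines a JavaScript function named loadResults can then load such output through a script tag and receive the parsed object as the function's argument.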
Example 3
Source File: TopicsTransMetricProvider.java From scava with Eclipse Public License 2.0
private List<Cluster> produceTopics(ArrayList<Document> documents) {
    /* A controller to manage the processing pipeline. */
    final Controller controller = ControllerFactory.createSimple();

    /*
     * Perform clustering by topic using the Lingo algorithm. Lingo can take
     * advantage of the original query, but none is available here, so null is passed.
     */
    final ProcessingResult byTopicClusters =
        controller.process(documents, null, LingoClusteringAlgorithm.class);
    final List<Cluster> clustersByTopic = byTopicClusters.getClusters();
    return clustersByTopic;
}
Example 4
Source File: CommitsMessageTopicsTransMetricProvider.java From scava with Eclipse Public License 2.0
private List<Cluster> produceTopics(ArrayList<Document> documents) {
    /* A controller to manage the processing pipeline. */
    final Controller controller = ControllerFactory.createSimple();

    /*
     * Perform clustering by topic using the Lingo algorithm. Lingo can take
     * advantage of the original query, but none is available here, so null is passed.
     */
    final ProcessingResult byTopicClusters =
        controller.process(documents, null, LingoClusteringAlgorithm.class);
    final List<Cluster> clustersByTopic = byTopicClusters.getClusters();
    return clustersByTopic;
}
Example 5
Source File: ClusteringDocumentList.java From scava with Eclipse Public License 2.0
public static void main(String[] args) {
    /* [[[start:clustering-document-list-intro]]]
     *
     * <div>
     * <p>
     * The easiest way to get started with Carrot2 is to cluster a collection
     * of {@link org.carrot2.core.Document}s. Each document can consist of:
     * </p>
     *
     * <ul>
     * <li>document content: a query-in-context snippet, document abstract or full text,</li>
     * <li>document title: optional, some clustering algorithms give more weight to document titles,</li>
     * <li>document URL: optional, used by the {@link org.carrot2.clustering.synthetic.ByUrlClusteringAlgorithm},
     * ignored by other algorithms.</li>
     * </ul>
     *
     * <p>
     * To make the example short, the code shown below clusters only 5 documents. Use
     * at least 20 to get reasonable clusters. If you have access to the query that generated
     * the documents being clustered, you should also provide it to Carrot2 to get better clusters.
     * </p>
     * </div>
     *
     * [[[end:clustering-document-list-intro]]]
     */
    {
        // [[[start:clustering-document-list]]]
        /* A few example documents, normally you would need at least 20 for reasonable clusters. */
        final String[][] data = new String[][] {
            {
                "http://en.wikipedia.org/wiki/Data_mining",
                "Data mining - Wikipedia, the free encyclopedia",
                "Article about knowledge-discovery in databases (KDD), the practice of automatically searching large stores of data for patterns."
            },
            {
                "http://www.ccsu.edu/datamining/resources.html",
                "CCSU - Data Mining",
                "A collection of Data Mining links edited by the Central Connecticut State University ... Graduate Certificate Program. Data Mining Resources. Resources. Groups ..."
            },
            {
                "http://www.kdnuggets.com/",
                "KDnuggets: Data Mining, Web Mining, and Knowledge Discovery",
                "Newsletter on the data mining and knowledge industries, offering information on data mining, knowledge discovery, text mining, and web mining software, courses, jobs, publications, and meetings."
            },
            {
                "http://en.wikipedia.org/wiki/Data-mining",
                "Data mining - Wikipedia, the free encyclopedia",
                "Data mining is considered a subfield within the Computer Science field of knowledge discovery. ... claim to perform \"data mining\" by automating the creation ..."
            },
            {
                "http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm",
                "Data Mining: What is Data Mining?",
                "Outlines what knowledge discovery, the process of analyzing data from different perspectives and summarizing it into useful information, can do and how it works."
            },
        };

        /* Prepare Carrot2 documents: each row is { URL, title, content }. */
        final ArrayList<Document> documents = new ArrayList<Document>();
        for (String[] row : data) {
            documents.add(new Document(row[1], row[2], row[0]));
        }

        /* A controller to manage the processing pipeline. */
        final Controller controller = ControllerFactory.createSimple();

        /*
         * Perform clustering by topic using the Lingo algorithm. Lingo can
         * take advantage of the original query, so we provide it along with the documents.
         */
        final ProcessingResult byTopicClusters =
            controller.process(documents, "data mining", LingoClusteringAlgorithm.class);
        final List<Cluster> clustersByTopic = byTopicClusters.getClusters();

        /* Perform clustering by domain. In this case query is not useful, hence it is null. */
        final ProcessingResult byDomainClusters =
            controller.process(documents, null, ByUrlClusteringAlgorithm.class);
        final List<Cluster> clustersByDomain = byDomainClusters.getClusters();
        // [[[end:clustering-document-list]]]

        ConsoleFormatter.displayClusters(clustersByTopic);
        ConsoleFormatter.displayClusters(clustersByDomain);
    }
}
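Example 5 clusters the same documents a second time by domain, which depends only on each document's URL. To give a feel for that idea without the library, here is a minimal JDK-only sketch that groups URLs by host name. It is a simplified stand-in, not Carrot2's ByUrlClusteringAlgorithm, whose actual grouping rules are not shown in the source; the class and method names are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByHostSketch {
    /** Groups URLs by host name, keeping hosts in first-seen order. */
    static Map<String, List<String>> groupByHost(List<String> urls) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String url : urls) {
            // URI.create throws IllegalArgumentException for malformed input.
            String host = URI.create(url).getHost();
            if (host == null) {
                host = "(no host)";
            }
            groups.computeIfAbsent(host, k -> new ArrayList<>()).add(url);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "http://en.wikipedia.org/wiki/Data_mining",
            "http://www.ccsu.edu/datamining/resources.html",
            "http://www.kdnuggets.com/",
            "http://en.wikipedia.org/wiki/Data-mining");
        // Print each host with the number of URLs that fall under it.
        groupByHost(urls).forEach((host, members) ->
            System.out.println(host + ": " + members.size()));
    }
}
```

With the four URLs above, the two Wikipedia pages end up in one group and the other two hosts each get a group of their own, which is the kind of split one would expect from clustering by domain.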
Example 6
Source File: LoadingAttributeValuesFromXml.java From scava with Eclipse Public License 2.0
public static void main(String[] args) throws Exception {
    InputStream xmlStream = null;
    try {
        xmlStream = LoadingAttributeValuesFromXml.class
            .getResourceAsStream("algorithm-lingo-attributes.xml");

        // Load attribute value sets from the XML stream
        final AttributeValueSets attributeValueSets = AttributeValueSets
            .deserialize(xmlStream);

        // Get the desired set of attribute values for use with further processing
        final Map<String, Object> defaultAttributes = attributeValueSets
            .getDefaultAttributeValueSet().getAttributeValues();
        final Map<String, Object> fasterClusteringAttributes = attributeValueSets
            .getAttributeValueSet("faster-clustering").getAttributeValues();

        // Perform processing using the attribute values
        final Controller controller = ControllerFactory.createSimple();

        // Initialize the controller with one attribute set
        controller.init(fasterClusteringAttributes);

        // Perform clustering using the attribute set provided at initialization time
        Map<String, Object> requestAttributes = Maps.newHashMap();
        CommonAttributesDescriptor.attributeBuilder(requestAttributes)
            .documents(Lists.newArrayList(SampleDocumentData.DOCUMENTS_DATA_MINING))
            .query("data mining");
        ProcessingResult results = controller.process(requestAttributes,
            LingoClusteringAlgorithm.class);
        ConsoleFormatter.displayClusters(results.getClusters());

        // Perform clustering using some other attribute set, in this case the
        // one that is the default in the XML file.
        requestAttributes = CommonAttributesDescriptor
            .attributeBuilder(Maps.newHashMap(defaultAttributes))
            .documents(Lists.newArrayList(SampleDocumentData.DOCUMENTS_DATA_MINING))
            .query("data mining").map;
        results = controller.process(requestAttributes, LingoClusteringAlgorithm.class);
        ConsoleFormatter.displayClusters(results.getClusters());
    } finally {
        CloseableUtils.close(xmlStream);
    }
}
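In the second request of Example 6, the default attribute set is copied into a fresh map before the documents and query are added, so the loaded defaults stay unchanged. The JDK-only sketch below shows that copy-then-add pattern on plain maps. The class name, the layer helper and both attribute keys are hypothetical and chosen for illustration; they are not real Carrot2 attribute names.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeLayeringSketch {
    /** Returns a new map: a copy of the defaults with the per-request entries added on top. */
    static Map<String, Object> layer(Map<String, Object> defaults, Map<String, Object> request) {
        Map<String, Object> merged = new HashMap<>(defaults); // copy, so defaults is not mutated
        merged.putAll(request);                               // request entries win on key clashes
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("example.clusterCount", 30);     // hypothetical key

        Map<String, Object> request = new HashMap<>();
        request.put("example.query", "data mining");  // hypothetical key

        Map<String, Object> merged = layer(defaults, request);
        System.out.println(merged.size());   // 2: one default plus one request entry
        System.out.println(defaults.size()); // 1: the original defaults are untouched
    }
}
```

Copying first means the same loaded attribute set can be reused for later requests without carrying over documents or queries from an earlier one.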