org.jsoup.select.NodeTraversor Java Examples
The following examples show how to use org.jsoup.select.NodeTraversor.
Example #1
Source File: PatentDocument.java From act with GNU General Public License v3.0 | 6 votes |
private static List<String> extractTextFromHTML(DocumentBuilder docBuilder, NodeList textNodes)
        throws ParserConfigurationException, TransformerConfigurationException,
        TransformerException, XPathExpressionException {
    List<String> allTextList = new ArrayList<>(0);
    if (textNodes != null) {
        for (int i = 0; i < textNodes.getLength(); i++) {
            Node n = textNodes.item(i);
            /* This extremely around-the-horn approach to handling text content is due to the mix of HTML and
             * XML in the patent body. We use Jsoup to parse the HTML entities we find in the body, and use
             * its extremely convenient NodeVisitor API to recursively traverse the document and extract the
             * text content in reasonable chunks.
             */
            Document contentsDoc = Util.nodeToDocument(docBuilder, "body", n);
            String docText = Util.documentToString(contentsDoc);
            // With help from http://stackoverflow.com/questions/832620/stripping-html-tags-in-java
            org.jsoup.nodes.Document htmlDoc = Jsoup.parse(docText);
            HtmlVisitor visitor = new HtmlVisitor();
            NodeTraversor traversor = new NodeTraversor(visitor);
            traversor.traverse(htmlDoc);
            List<String> textSegments = visitor.getTextContent();
            allTextList.addAll(textSegments);
        }
    }
    return allTextList;
}
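The HtmlVisitor used above is not shown in the example. As a rough illustration of what such a text-collecting visitor might look like, here is a minimal sketch. It does not use jsoup: SimpleNode, Visitor, TextCollector, and extractText are hypothetical stand-ins invented for this sketch, and the recursive walk is only an assumption about the general shape of the technique, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class TextCollectorSketch {
    /** Hypothetical stand-in for a DOM node: an element with children, or a text leaf. */
    static final class SimpleNode {
        final String tag;   // null for text nodes
        final String text;  // null for element nodes
        final List<SimpleNode> children = new ArrayList<>();

        private SimpleNode(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static SimpleNode element(String tag, SimpleNode... kids) {
            SimpleNode n = new SimpleNode(tag, null);
            for (SimpleNode k : kids) {
                n.children.add(k);
            }
            return n;
        }

        static SimpleNode text(String text) {
            return new SimpleNode(null, text);
        }
    }

    /** Hypothetical callback pair, modeled loosely on a head/tail visitor. */
    interface Visitor {
        void head(SimpleNode node, int depth);
        void tail(SimpleNode node, int depth);
    }

    /** Recursive depth-first walk: head before the children, tail after them. */
    static void traverse(Visitor visitor, SimpleNode node, int depth) {
        visitor.head(node, depth);
        for (SimpleNode child : node.children) {
            traverse(visitor, child, depth + 1);
        }
        visitor.tail(node, depth);
    }

    /** Collects the trimmed, non-blank text of every text node, in document order. */
    static final class TextCollector implements Visitor {
        private final List<String> segments = new ArrayList<>();

        public void head(SimpleNode node, int depth) {
            if (node.text != null && !node.text.trim().isEmpty()) {
                segments.add(node.text.trim());
            }
        }

        public void tail(SimpleNode node, int depth) {
            // Nothing to do when leaving a node in this sketch.
        }

        List<String> getTextContent() {
            return segments;
        }
    }

    static List<String> extractText(SimpleNode root) {
        TextCollector collector = new TextCollector();
        traverse(collector, root, 0);
        return collector.getTextContent();
    }

    public static void main(String[] args) {
        SimpleNode body = SimpleNode.element("body",
                SimpleNode.element("p", SimpleNode.text("First claim."), SimpleNode.text("  ")),
                SimpleNode.element("p", SimpleNode.text(" Second claim. ")));
        System.out.println(extractText(body)); // prints [First claim., Second claim.]
    }
}
```

With jsoup itself, the same idea would presumably be expressed as a NodeVisitor whose head() inspects text nodes, handed to NodeTraversor as in the example above.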
Example #2
Source File: HtmlToPlainText.java From astor with GNU General Public License v2.0 | 5 votes |
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor.traverse(formatter, element); // walk the DOM, and call .head() and .tail() for each node
    return formatter.toString();
}
Example #3
Source File: DocumentFragmentBuilder.java From storm-crawler with Apache License 2.0 | 5 votes |
public static DocumentFragment fromJsoup(org.jsoup.nodes.Document jsoupDocument) {
    HTMLDocumentImpl htmlDoc = new HTMLDocumentImpl();
    htmlDoc.setErrorChecking(false);
    DocumentFragment fragment = htmlDoc.createDocumentFragment();
    org.jsoup.nodes.Element rootEl = jsoupDocument.child(0); // skip the #root node
    NodeTraversor.traverse(new W3CBuilder(htmlDoc, fragment), rootEl);
    return fragment;
}
Example #4
Source File: Node.java From jsoup-learning with MIT License | 5 votes |
/**
 * Perform a depth-first traversal through this node and its descendants.
 * @param nodeVisitor the visitor callbacks to perform on each node
 * @return this node, for chaining
 */
public Node traverse(NodeVisitor nodeVisitor) {
    Validate.notNull(nodeVisitor);
    NodeTraversor traversor = new NodeTraversor(nodeVisitor);
    traversor.traverse(this);
    return this;
}
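To make the "depth-first traversal" with head and tail callbacks concrete, the following is a minimal, self-contained sketch of an iterative walk that calls head() when a node is first reached and tail() once all of its descendants have been visited. TreeNode, Visitor, traverse, and trace are hypothetical names made up for this illustration. The explicit stack of child iterators is one possible way to avoid recursion and is an assumption here, not a description of how jsoup implements NodeTraversor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class DepthFirstSketch {
    /** Hypothetical tree node with a name and an ordered list of children. */
    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name, TreeNode... kids) {
            this.name = name;
            for (TreeNode k : kids) {
                children.add(k);
            }
        }
    }

    /** Hypothetical visitor: head on entering a node, tail on leaving it. */
    interface Visitor {
        void head(TreeNode node, int depth);
        void tail(TreeNode node, int depth);
    }

    /** Iterative depth-first walk using explicit stacks instead of recursion. */
    static void traverse(Visitor visitor, TreeNode root) {
        Deque<TreeNode> open = new ArrayDeque<>();               // nodes entered but not yet left
        Deque<Iterator<TreeNode>> pending = new ArrayDeque<>();  // their remaining children

        visitor.head(root, 0);
        open.push(root);
        pending.push(root.children.iterator());

        while (!open.isEmpty()) {
            Iterator<TreeNode> remaining = pending.peek();
            if (remaining.hasNext()) {
                // Descend: the child's depth equals the number of open ancestors.
                TreeNode child = remaining.next();
                visitor.head(child, open.size());
                open.push(child);
                pending.push(child.children.iterator());
            } else {
                // All children done: leave the node on top of the stack.
                pending.pop();
                TreeNode done = open.pop();
                visitor.tail(done, open.size());
            }
        }
    }

    /** Records the order of head/tail calls as a tag-like string. */
    static String trace(TreeNode root) {
        StringBuilder sb = new StringBuilder();
        traverse(new Visitor() {
            public void head(TreeNode node, int depth) {
                sb.append('<').append(node.name).append(':').append(depth).append('>');
            }

            public void tail(TreeNode node, int depth) {
                sb.append("</").append(node.name).append('>');
            }
        }, root);
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeNode html = new TreeNode("html",
                new TreeNode("head"),
                new TreeNode("body", new TreeNode("p")));
        // prints <html:0><head:1></head><body:1><p:2></p></body></html>
        System.out.println(trace(html));
    }
}
```

An iterative walk like this keeps stack usage independent of how deeply the document is nested, which is one reason a traversal helper might avoid plain recursion.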
Example #5
Source File: HtmlToPlainText.java From jsoup-learning with MIT License | 5 votes |
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor traversor = new NodeTraversor(formatter);
    traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node
    return formatter.toString();
}
Example #6
Source File: HtmlToPlainText.java From astor with GNU General Public License v2.0 | 5 votes |
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor.traverse(formatter, element); // walk the DOM, and call .head() and .tail() for each node
    return formatter.toString();
}
Example #7
Source File: W3CDom.java From astor with GNU General Public License v2.0 | 5 votes |
/**
 * Converts a jsoup document into the provided W3C Document. If required, you can set options on the output document
 * before converting.
 * @param in jsoup doc
 * @param out w3c doc
 * @see org.jsoup.helper.W3CDom#fromJsoup(org.jsoup.nodes.Document)
 */
public void convert(org.jsoup.nodes.Document in, Document out) {
    if (!StringUtil.isBlank(in.location()))
        out.setDocumentURI(in.location());

    org.jsoup.nodes.Element rootEl = in.child(0); // skip the #root node
    NodeTraversor.traverse(new W3CBuilder(out), rootEl);
}
Example #8
Source File: HtmlToPlainText.java From intellij-quarkus with Eclipse Public License 2.0 | 5 votes |
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor traversor = new NodeTraversor(formatter);
    traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node
    return formatter.toString();
}
Example #9
Source File: W3CDom.java From astor with GNU General Public License v2.0 | 5 votes |
/**
 * Converts a jsoup document into the provided W3C Document. If required, you can set options on the output document
 * before converting.
 * @param in jsoup doc
 * @param out w3c doc
 * @see org.jsoup.helper.W3CDom#fromJsoup(org.jsoup.nodes.Document)
 */
public void convert(org.jsoup.nodes.Document in, Document out) {
    if (!StringUtil.isBlank(in.location()))
        out.setDocumentURI(in.location());

    org.jsoup.nodes.Element rootEl = in.child(0); // skip the #root node
    NodeTraversor.traverse(new W3CBuilder(out), rootEl);
}
Example #10
Source File: Node.java From astor with GNU General Public License v2.0 | 5 votes |
/**
 * Perform a depth-first traversal through this node and its descendants.
 * @param nodeVisitor the visitor callbacks to perform on each node
 * @return this node, for chaining
 */
public Node traverse(NodeVisitor nodeVisitor) {
    Validate.notNull(nodeVisitor);
    NodeTraversor traversor = new NodeTraversor(nodeVisitor);
    traversor.traverse(this);
    return this;
}
Example #11
Source File: HtmlToPlainText.java From firebase-android-sdk with Apache License 2.0 | 5 votes |
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor.traverse(formatter, element); // walk the DOM, and call .head() and .tail() for each node
    return formatter.toString();
}
Example #12
Source File: W3CDom.java From astor with GNU General Public License v2.0 | 5 votes |
/**
 * Converts a jsoup document into the provided W3C Document. If required, you can set options on the output document
 * before converting.
 * @param in jsoup doc
 * @param out w3c doc
 * @see org.jsoup.helper.W3CDom#fromJsoup(org.jsoup.nodes.Document)
 */
public void convert(org.jsoup.nodes.Document in, Document out) {
    if (!StringUtil.isBlank(in.location()))
        out.setDocumentURI(in.location());

    org.jsoup.nodes.Element rootEl = in.child(0); // skip the #root node
    NodeTraversor traversor = new NodeTraversor(new W3CBuilder(out));
    traversor.traverse(rootEl);
}
Example #13
Source File: HtmlToPlainText.java From lemminx with Eclipse Public License 2.0 | 5 votes |
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor traversor = new NodeTraversor(formatter);
    traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node
    return formatter.toString();
}
Example #14
Source File: HtmlToPlainText.java From astor with GNU General Public License v2.0 | 5 votes |
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor traversor = new NodeTraversor(formatter);
    traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node
    return formatter.toString();
}
Example #15
Source File: HtmlToPlainText.java From eclipse.jdt.ls with Eclipse Public License 2.0 | 5 votes |
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor traversor = new NodeTraversor(formatter);
    traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node
    return formatter.toString();
}
Example #16
Source File: HtmlToPlainText.java From echo with Apache License 2.0 | 4 votes |
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor traversor = new NodeTraversor(formatter);
    traversor.traverse(element);
    return formatter.toString();
}
Example #17
Source File: Node.java From jsoup-learning with MIT License | 4 votes |
protected void outerHtml(StringBuilder accum) {
    new NodeTraversor(new OuterHtmlVisitor(accum, getOutputSettings())).traverse(this);
}
Example #18
Source File: Javadocs.java From Recaf with MIT License | 4 votes |
private static String text(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor.traverse(formatter, element);
    return formatter.toString();
}
Example #19
Source File: Cleaner.java From jsoup-learning with MIT License | 4 votes |
private int copySafeNodes(Element source, Element dest) {
    CleaningVisitor cleaningVisitor = new CleaningVisitor(source, dest);
    NodeTraversor traversor = new NodeTraversor(cleaningVisitor);
    traversor.traverse(source);
    return cleaningVisitor.numDiscarded;
}
Example #20
Source File: Node.java From astor with GNU General Public License v2.0 | 4 votes |
protected void outerHtml(Appendable accum) {
    NodeTraversor.traverse(new OuterHtmlVisitor(accum, getOutputSettings()), this);
}
Example #21
Source File: Node.java From astor with GNU General Public License v2.0 | 4 votes |
/**
 * Perform a depth-first filtering through this node and its descendants.
 * @param nodeFilter the filter callbacks to perform on each node
 * @return this node, for chaining
 */
public Node filter(NodeFilter nodeFilter) {
    Validate.notNull(nodeFilter);
    NodeTraversor.filter(nodeFilter, this);
    return this;
}
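Filtering differs from plain traversal in that each callback returns a result that steers the walk. The sketch below illustrates that idea with a reduced set of results (CONTINUE, SKIP_CHILDREN, STOP). It is self-contained and does not use jsoup: Result, TreeNode, Filter, filter, and visitedNames are hypothetical names for this illustration, and the control flow is an assumption about how such a filter could behave, not a copy of NodeTraversor.filter.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterSketch {
    /** Reduced, hypothetical set of results a filter callback may return. */
    enum Result { CONTINUE, SKIP_CHILDREN, STOP }

    /** Hypothetical tree node with a name and an ordered list of children. */
    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name, TreeNode... kids) {
            this.name = name;
            for (TreeNode k : kids) {
                children.add(k);
            }
        }
    }

    /** Hypothetical filter: like a visitor, but each callback steers the walk. */
    interface Filter {
        Result head(TreeNode node, int depth);
        Result tail(TreeNode node, int depth);
    }

    /**
     * Depth-first filtering. Returns STOP if any callback asked to abort the
     * whole walk, otherwise CONTINUE.
     */
    static Result filter(Filter f, TreeNode node, int depth) {
        Result onEnter = f.head(node, depth);
        if (onEnter == Result.STOP) {
            return Result.STOP;
        }
        if (onEnter == Result.CONTINUE) {
            // SKIP_CHILDREN bypasses this loop but still reaches tail() below.
            for (TreeNode child : node.children) {
                if (filter(f, child, depth + 1) == Result.STOP) {
                    return Result.STOP;
                }
            }
        }
        return f.tail(node, depth) == Result.STOP ? Result.STOP : Result.CONTINUE;
    }

    /** Lists the nodes whose head() ran, skipping below one name and stopping at another. */
    static List<String> visitedNames(TreeNode root, String skipBelow, String stopAt) {
        List<String> seen = new ArrayList<>();
        filter(new Filter() {
            public Result head(TreeNode node, int depth) {
                seen.add(node.name);
                if (node.name.equals(stopAt)) {
                    return Result.STOP;
                }
                if (node.name.equals(skipBelow)) {
                    return Result.SKIP_CHILDREN;
                }
                return Result.CONTINUE;
            }

            public Result tail(TreeNode node, int depth) {
                return Result.CONTINUE;
            }
        }, root, 0);
        return seen;
    }

    public static void main(String[] args) {
        TreeNode html = new TreeNode("html",
                new TreeNode("head", new TreeNode("title")),
                new TreeNode("body", new TreeNode("p"), new TreeNode("div")));
        // "title" is skipped (below head) and "div" is never reached (walk stops at p).
        System.out.println(visitedNames(html, "head", "p")); // prints [html, head, body, p]
    }
}
```

jsoup's own NodeFilter.FilterResult defines further values (for example for removing nodes), which this sketch leaves out.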
Example #22
Source File: Node.java From astor with GNU General Public License v2.0 | 4 votes |
/**
 * Perform a depth-first traversal through this node and its descendants.
 * @param nodeVisitor the visitor callbacks to perform on each node
 * @return this node, for chaining
 */
public Node traverse(NodeVisitor nodeVisitor) {
    Validate.notNull(nodeVisitor);
    NodeTraversor.traverse(nodeVisitor, this);
    return this;
}
Example #23
Source File: Cleaner.java From astor with GNU General Public License v2.0 | 4 votes |
private int copySafeNodes(Element source, Element dest) {
    CleaningVisitor cleaningVisitor = new CleaningVisitor(source, dest);
    NodeTraversor.traverse(cleaningVisitor, source);
    return cleaningVisitor.numDiscarded;
}
Example #24
Source File: Node.java From astor with GNU General Public License v2.0 | 4 votes |
protected void outerHtml(Appendable accum) {
    NodeTraversor.traverse(new OuterHtmlVisitor(accum, getOutputSettings()), this);
}
Example #25
Source File: Node.java From astor with GNU General Public License v2.0 | 4 votes |
/**
 * Perform a depth-first filtering through this node and its descendants.
 * @param nodeFilter the filter callbacks to perform on each node
 * @return this node, for chaining
 */
public Node filter(NodeFilter nodeFilter) {
    Validate.notNull(nodeFilter);
    NodeTraversor.filter(nodeFilter, this);
    return this;
}
Example #26
Source File: Node.java From astor with GNU General Public License v2.0 | 4 votes |
/**
 * Perform a depth-first traversal through this node and its descendants.
 * @param nodeVisitor the visitor callbacks to perform on each node
 * @return this node, for chaining
 */
public Node traverse(NodeVisitor nodeVisitor) {
    Validate.notNull(nodeVisitor);
    NodeTraversor.traverse(nodeVisitor, this);
    return this;
}
Example #27
Source File: Cleaner.java From astor with GNU General Public License v2.0 | 4 votes |
private int copySafeNodes(Element source, Element dest) {
    CleaningVisitor cleaningVisitor = new CleaningVisitor(source, dest);
    NodeTraversor traversor = new NodeTraversor(cleaningVisitor);
    traversor.traverse(source);
    return cleaningVisitor.numDiscarded;
}
Example #28
Source File: Cleaner.java From astor with GNU General Public License v2.0 | 4 votes |
private int copySafeNodes(Element source, Element dest) {
    CleaningVisitor cleaningVisitor = new CleaningVisitor(source, dest);
    NodeTraversor.traverse(cleaningVisitor, source);
    return cleaningVisitor.numDiscarded;
}
Example #29
Source File: Node.java From astor with GNU General Public License v2.0 | 4 votes |
protected void outerHtml(Appendable accum) {
    new NodeTraversor(new OuterHtmlVisitor(accum, getOutputSettings())).traverse(this);
}