org.cyberneko.html.HTMLConfiguration Java Examples

The following examples show how to use org.cyberneko.html.HTMLConfiguration.
Example #1
Source File: HtmlUtils.java    From openemm with GNU Affero General Public License v3.0
/**
 * Parse an entire HTML document or a document fragment. Tag and attribute names are translated to lowercase.
 * @param document the HTML code to parse.
 * @param encoding the default encoding for the parser.
 * @return the parsed document.
 */
public static Document parseDocument(String document, String encoding) throws IOException, SAXException {
    DOMParser parser = new DOMParser(new HTMLConfiguration());

    try {
        // These URIs are predefined property names (see org.cyberneko.html.HTMLConfiguration for details)
        parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
        parser.setProperty("http://cyberneko.org/html/properties/default-encoding", encoding);
    } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
        logger.error("Unexpected parser configuration error occurred: " + e.getMessage());
        throw new RuntimeException(e);
    }

    StringReader reader = new StringReader(document);
    InputSource source = new InputSource(reader);
    parser.parse(source);

    return parser.getDocument();
}
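
Example #1 hands the parser a String wrapped in a StringReader, so the parser receives characters that are already decoded. In that case the default-encoding property presumably acts only as a fallback and would matter mainly when the input arrives as a byte stream, as in Example #2. The sketch below shows the two ways of building an org.xml.sax.InputSource using only JDK classes. The class InputSources and its method names are hypothetical helpers for illustration, not part of openemm or NekoHTML.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class InputSources {
    private InputSources() {}

    /** Wraps already-decoded text: the source carries a character stream, so no charset is needed. */
    static InputSource fromString(String html) {
        return new InputSource(new StringReader(html));
    }

    /** Wraps raw bytes and declares their charset so a parser can decode them. */
    static InputSource fromBytes(byte[] html, String encoding) {
        InputSource source = new InputSource(new ByteArrayInputStream(html));
        source.setEncoding(encoding);
        return source;
    }
}
```

Either result can be passed to parser.parse(source) in the same way the examples on this page do.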
 
Example #2
Source File: DefaultDOMSource.java    From CSSBox with GNU Lesser General Public License v3.0
@Override
public Document parse() throws SAXException, IOException
{
    // temporary workaround until NekoHTML itself is fixed
    if (!neko_fixed)
    {
        HTMLElements.Element li = HTMLElements.getElement(HTMLElements.LI);
        HTMLElements.Element[] oldparents = li.parent;
        li.parent = new HTMLElements.Element[oldparents.length + 1];
        for (int i = 0; i < oldparents.length; i++)
            li.parent[i] = oldparents[i];
        li.parent[oldparents.length] = HTMLElements.getElement(HTMLElements.MENU);
        neko_fixed = true;
    }
    
    DOMParser parser = new DOMParser(new HTMLConfiguration());
    parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
    if (charset != null)
        parser.setProperty("http://cyberneko.org/html/properties/default-encoding", charset);
    parser.parse(new org.xml.sax.InputSource(getDocumentSource().getInputStream()));
    return parser.getDocument();
}
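
The workaround in Example #2 grows the li.parent array by one element with a manual copy loop. A minimal sketch of the same append step using java.util.Arrays.copyOf follows. The class ArrayAppend and its append method are hypothetical names introduced here, not part of CSSBox or NekoHTML.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
final class ArrayAppend {
    private ArrayAppend() {}

    /** Returns a new array one element longer than source, with extra in the last slot. */
    static <T> T[] append(T[] source, T extra) {
        // copyOf pads the new trailing slot with null; the original array is left untouched
        T[] grown = Arrays.copyOf(source, source.length + 1);
        grown[source.length] = extra;
        return grown;
    }
}
```

Assuming the field stays assignable as in the example, the loop could then collapse to li.parent = ArrayAppend.append(li.parent, HTMLElements.getElement(HTMLElements.MENU));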
 
Example #3
Source File: DOMSource.java    From jStyleParser with GNU Lesser General Public License v3.0
public Document parse() throws SAXException, IOException
{
    DOMParser parser = new DOMParser(new HTMLConfiguration());
    parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
    parser.setProperty("http://cyberneko.org/html/properties/names/attrs", "lower");
    if (charset != null)
        parser.setProperty("http://cyberneko.org/html/properties/default-encoding", charset);
    
    //preparation for filters, not used now
    /*XMLDocumentFilter attributeFilter = new DOMAttributeFilter();
    XMLDocumentFilter[] filters = { attributeFilter };
    parser.setProperty("http://cyberneko.org/html/properties/filters", filters);*/        
    
    parser.parse(new org.xml.sax.InputSource(is));
    doc = parser.getDocument();
    return doc;
}
 
Example #4
Source File: HTMLSAXParser.java    From document-management-software with GNU Lesser General Public License v3.0
public HTMLSAXParser() {
    super(new HTMLConfiguration());
}
 
Example #5
Source File: NekoDOMParser.java    From lams with GNU General Public License v2.0
NekoDOMParser( HTMLConfiguration configuration, DocumentAdapter adapter ) {
    super( configuration );
    _documentAdapter = adapter;
}
 
Example #6
Source File: ScriptFilter.java    From lams with GNU General Public License v2.0
/** Constructs a script object with the specified configuration. */
ScriptFilter( HTMLConfiguration config ) {
    _configuration = config;
}
 
Example #7
Source File: HTMLParser.java    From document-management-system with GNU General Public License v2.0
public HTMLParser() {
    super(new HTMLConfiguration());
}