This example shows how to parse HTML in Java using jsoup. Java has many HTML parsers, and developers often wonder which one is best before committing to one. Jsoup is a good place to start.
The following Java code fetches a page from a URL, selects elements by class name, and prints every link found on the page.
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.programcreek.com").get();
        Elements titles = doc.select(".entrytitle");

        // print all titles in main page
        for (Element e : titles) {
            System.out.println("text: " + e.text());
            System.out.println("html: " + e.html());
        }

        // print all available links on page
        Elements links = doc.select("a[href]");
        for (Element l : links) {
            System.out.println("link: " + l.attr("abs:href"));
        }
    }
}
You can download the jsoup HTML parser by searching Google for "jsoup". The jar must then be on your class path both when you compile and when you run the program.
Richard Dickinson, this is because your class path is not correct. I followed the same steps and got this error. I was running the project with this command: java -cp target/htmlLParser-1.0-SNAPSHOT.jar com.fatBas.com.Main. I was getting the error because the class path (-cp) was not defined. Then I ran the class from main.java by right-clicking on main.java, and it worked. Hope this helps.
ClassNotFoundException: org.jsoup.Jsoup. Easy solution: download jsoup (search Google) and add it as a library in your project.
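For anyone hitting the same error, a quick way to confirm the diagnosis is to ask the JVM whether it can see the jsoup class at all. The following is only a minimal sketch using the standard library; the class name ClasspathCheck and the helper isOnClasspath are made up for illustration and are not part of jsoup.

```java
class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // on the class path of the running JVM.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Show the class path the JVM was actually started with
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        String name = "org.jsoup.Jsoup";
        if (isOnClasspath(name)) {
            System.out.println(name + " was found on the class path");
        } else {
            System.out.println(name + " is missing; add the jsoup jar to the class path");
        }
    }
}
```

If the class is reported missing, the jsoup jar has to be listed on the class path when the program starts, for example with java -cp followed by the current directory and the path to the downloaded jar (separated by : on Linux and macOS, ; on Windows), and then the main class name. The exact jar file name depends on the version you downloaded.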
I’ve probably made an error compiling but when I try this I get these errors:
java Main
Exception in thread "main" java.lang.NoClassDefFoundError: org/jsoup/Jsoup
at Main.main(Main.java:34)
Caused by: java.lang.ClassNotFoundException: org.jsoup.Jsoup
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
… 1 more
Any ideas?