A Set contains no duplicate elements, which is one of the main reasons to use one. There are three commonly used implementations of Set: HashSet, TreeSet, and LinkedHashSet. Knowing when to use which is an important question. In brief: if you need a fast set, use HashSet; if you need a sorted set, use TreeSet; if you need a set that preserves insertion order, use LinkedHashSet.
1. Set Interface
2. HashSet vs. TreeSet vs. LinkedHashSet
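HashSet is implemented using a hash table. Elements are not ordered. The add, remove, and contains methods run in constant time, O(1), on average.
TreeSet is implemented using a tree structure (a red-black tree, in algorithm-textbook terms). The elements in the set are sorted, but the add, remove, and contains methods take O(log n) time.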
LinkedHashSet is between HashSet and TreeSet. It is implemented as a hash table with a linked list running through it, so it provides the order of insertion. The time complexity of basic methods is O(1).
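To make the ordering differences concrete, here is a minimal sketch that feeds the same values, including one duplicate, into all three implementations. The class name SetOrderDemo and the helper fill are illustrative names, not part of the article's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderDemo {
    // Hypothetical helper: adds the same values, one duplicate included, to the given set.
    static <S extends Set<Integer>> S fill(S set) {
        List<Integer> input = Arrays.asList(34, 12, 63, 12, 45);
        for (Integer value : input) {
            set.add(value); // add() returns false for the second 12
        }
        return set;
    }

    public static void main(String[] args) {
        // Sorted ascending, regardless of insertion order.
        System.out.println("TreeSet:       " + fill(new TreeSet<Integer>()));
        // Insertion order, with the duplicate dropped.
        System.out.println("LinkedHashSet: " + fill(new LinkedHashSet<Integer>()));
        // Some order that the API does not promise.
        System.out.println("HashSet:       " + fill(new HashSet<Integer>()));
    }
}
```

The TreeSet prints [12, 34, 45, 63] and the LinkedHashSet prints [34, 12, 63, 45]. The HashSet holds the same four elements, but its iteration order should not be relied on.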
3. TreeSet Example
TreeSet<Integer> tree = new TreeSet<Integer>();
tree.add(12);
tree.add(63);
tree.add(34);
tree.add(45);

Iterator<Integer> iterator = tree.iterator();
System.out.print("Tree set data: ");
while (iterator.hasNext()) {
    System.out.print(iterator.next() + " ");
}
Output is sorted as follows:
Tree set data: 12 34 45 63
Now let’s define a Dog class as follows:
class Dog {
    int size;

    public Dog(int s) {
        size = s;
    }

    public String toString() {
        return size + "";
    }
}
Let’s add some dogs to TreeSet like the following:
import java.util.Iterator;
import java.util.TreeSet;

public class TestTreeSet {
    public static void main(String[] args) {
        TreeSet<Dog> dset = new TreeSet<Dog>();
        dset.add(new Dog(2));
        dset.add(new Dog(1));
        dset.add(new Dog(3));

        Iterator<Dog> iterator = dset.iterator();
        while (iterator.hasNext()) {
            System.out.print(iterator.next() + " ");
        }
    }
}

It compiles, but a run-time error occurs:
Exception in thread "main" java.lang.ClassCastException: collection.Dog cannot be cast to java.lang.Comparable
    at java.util.TreeMap.put(Unknown Source)
    at java.util.TreeSet.add(Unknown Source)
    at collection.TestTreeSet.main(TestTreeSet.java:22)
Because TreeSet is sorted, the Dog class needs to implement java.lang.Comparable so that the set can compare its elements:
class Dog implements Comparable<Dog> {
    int size;

    public Dog(int s) {
        size = s;
    }

    public String toString() {
        return size + "";
    }

    @Override
    public int compareTo(Dog o) {
        // Integer.compare avoids the overflow that size - o.size could cause
        return Integer.compare(size, o.size);
    }
}
The output is:
1 2 3
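Implementing Comparable is not the only option. TreeSet also has a constructor that takes a Comparator, which can help when the element class cannot be modified. The following is a minimal sketch, assuming Java 8 or later; PlainDog and ComparatorDemo are illustrative names, not part of the article's code.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class ComparatorDemo {
    // Illustrative stand-in for Dog; it does NOT implement Comparable.
    static class PlainDog {
        final int size;

        PlainDog(int size) {
            this.size = size;
        }

        @Override
        public String toString() {
            return size + "";
        }
    }

    static TreeSet<PlainDog> sortedBySize() {
        // The comparator tells the TreeSet how to order its elements.
        TreeSet<PlainDog> dset = new TreeSet<PlainDog>(
                Comparator.comparingInt((PlainDog d) -> d.size));
        dset.add(new PlainDog(2));
        dset.add(new PlainDog(1));
        dset.add(new PlainDog(3));
        return dset;
    }

    public static void main(String[] args) {
        System.out.println(sortedBySize()); // prints [1, 2, 3]
    }
}
```

With a comparator supplied, adding elements no longer throws ClassCastException, because the set never tries to cast them to Comparable.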
4. HashSet Example
HashSet<Dog> dset = new HashSet<Dog>();
dset.add(new Dog(2));
dset.add(new Dog(1));
dset.add(new Dog(3));
dset.add(new Dog(5));
dset.add(new Dog(4));

Iterator<Dog> iterator = dset.iterator();
while (iterator.hasNext()) {
    System.out.print(iterator.next() + " ");
}
Output:
5 3 2 1 4
Note that the iteration order is not guaranteed.
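One caveat with hash-based sets: the Dog class above inherits equals() and hashCode() from Object, so two Dog objects with the same size count as different elements. If dogs of equal size should be treated as duplicates, both methods need to be overridden. Here is a sketch under that assumption; SizedDog and EqualsDemo are illustrative names.

```java
import java.util.HashSet;
import java.util.Set;

public class EqualsDemo {
    // Illustrative variant of Dog in which equality is assumed to be defined by size.
    static class SizedDog {
        final int size;

        SizedDog(int size) {
            this.size = size;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof SizedDog)) return false;
            return size == ((SizedDog) other).size;
        }

        // Equal objects must produce equal hash codes.
        @Override
        public int hashCode() {
            return Integer.hashCode(size);
        }
    }

    static int distinctCount() {
        Set<SizedDog> dset = new HashSet<SizedDog>();
        dset.add(new SizedDog(1));
        dset.add(new SizedDog(1)); // rejected: equal to the first element
        dset.add(new SizedDog(2));
        return dset.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 2
    }
}
```

Without the two overrides, the same three add() calls would leave three elements in the set.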
5. LinkedHashSet Example
LinkedHashSet<Dog> dset = new LinkedHashSet<Dog>();
dset.add(new Dog(2));
dset.add(new Dog(1));
dset.add(new Dog(3));
dset.add(new Dog(5));
dset.add(new Dog(4));

Iterator<Dog> iterator = dset.iterator();
while (iterator.hasNext()) {
    System.out.print(iterator.next() + " ");
}
The order of the output is certain and it is the insertion order:
2 1 3 5 4
6. Performance testing
The following method tests the performance of the three classes on the add() method:
public static void main(String[] args) {
    Random r = new Random();

    HashSet<Dog> hashSet = new HashSet<Dog>();
    TreeSet<Dog> treeSet = new TreeSet<Dog>();
    LinkedHashSet<Dog> linkedSet = new LinkedHashSet<Dog>();

    // start time
    long startTime = System.nanoTime();
    for (int i = 0; i < 1000; i++) {
        int x = r.nextInt(1000 - 10) + 10;
        hashSet.add(new Dog(x));
    }
    // end time
    long endTime = System.nanoTime();
    long duration = endTime - startTime;
    System.out.println("HashSet: " + duration);

    // start time
    startTime = System.nanoTime();
    for (int i = 0; i < 1000; i++) {
        int x = r.nextInt(1000 - 10) + 10;
        treeSet.add(new Dog(x));
    }
    // end time
    endTime = System.nanoTime();
    duration = endTime - startTime;
    System.out.println("TreeSet: " + duration);

    // start time
    startTime = System.nanoTime();
    for (int i = 0; i < 1000; i++) {
        int x = r.nextInt(1000 - 10) + 10;
        linkedSet.add(new Dog(x));
    }
    // end time
    endTime = System.nanoTime();
    duration = endTime - startTime;
    System.out.println("LinkedHashSet: " + duration);
}
From the output below, we can clearly see that HashSet is the fastest one.
HashSet: 2244768
TreeSet: 3549314
LinkedHashSet: 2263320
* The test is not precise, but can reflect the basic idea that TreeSet is much slower because it is sorted.
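A single timed loop per set is sensitive to JIT warm-up and to the cost of generating random numbers inside the measured region. The sketch below tries to reduce both effects by generating the input once and running untimed warm-up rounds first. It uses Integer elements instead of Dog to stay self-contained, and the names SetAddBenchmark, randomValues, and timeAdds are illustrative. It is still not a rigorous benchmark; a dedicated harness such as JMH would be the more reliable choice.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class SetAddBenchmark {
    // Generates the input once so that random-number generation is not timed.
    static int[] randomValues(int count, long seed) {
        Random r = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = r.nextInt(1000 - 10) + 10;
        }
        return values;
    }

    // Returns the elapsed nanoseconds for adding every value to the set.
    static long timeAdds(Set<Integer> set, int[] values) {
        long start = System.nanoTime();
        for (int v : values) {
            set.add(v);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] values = randomValues(1000, 42L);

        // Warm-up rounds give the JIT compiler a chance to optimize before measuring.
        for (int i = 0; i < 1000; i++) {
            timeAdds(new HashSet<Integer>(), values);
            timeAdds(new TreeSet<Integer>(), values);
            timeAdds(new LinkedHashSet<Integer>(), values);
        }

        System.out.println("HashSet: " + timeAdds(new HashSet<Integer>(), values));
        System.out.println("TreeSet: " + timeAdds(new TreeSet<Integer>(), values));
        System.out.println("LinkedHashSet: " + timeAdds(new LinkedHashSet<Integer>(), values));
    }
}
```

The absolute numbers will vary between machines and runs, so only the relative ordering across repeated runs is worth reading into.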
LinkedHashSet in Java is not thread safe. Also performance wise it is not as fast as HashSet as it has to maintain insertion order too.
The code example is actually wrong, since the Dog class does not correctly define .hashCode() and .equals() method and, thus, hash-based structures WON’T work correctly.
E.g. it will add two new Dog(1) objects into the set separately.
Bad, baad article.
HashSet is faster than both LinkedHashSet and TreeSet as LinkedHashSet has the overhead of maintaining the insertion order and TreeSet has the overhead of maintaining the sorted order.
The random number generation slows the program down, so the result is inaccurate here. Replace int x = r.nextInt(1000 - 10) + 10; with something
like treeSet.add(new Dog(i));
Thanks for detailed explanation in simple manner
In your performance test you do not take the .next operation into account. That’s important for the comparison between HashSet [O(h/n)] and LinkedHashSet [O(1)]
Good article, bad way to present data: the numbers on the diagram show about a 1.5 times difference, while the picture shows a 6 times difference… Looks pretty scary for a reader skimming through the page…
If you find this article helpful, think! What this article benchmarks is adding values to various sets.
The much more important question is: What are you doing with the set once the values are added?
Of course a TreeSet is slow when adding values because all new values are sorted. That’s why you would normally avoid filling a TreeSet like that altogether (you’d fill an array/LinkedHashSet and initialize your TreeSet with that array).
BUT: A TreeSet is very fast if you want to randomly access values in your Set because it has been sorted and there is no need to walk through the Array or LinkedList that backs the other lists.
All LinkedStuff is slow when randomly accessing values but they are very fast when moving or replacing objects.
Lastly: A HashSet is the most memory demanding of them all. If you have lots of RAM and you want reasonable performance for reads and writes to your set, this is the one to use.
Summary: Which Set is the fastest cannot simply be benchmarked by adding values to different types of sets. The right set for your use case may be dependent on more than just the performance during set initialization.
Dear friends, this is nice code and it works 100% perfectly.
Do not worry about the results. They depend on the system you are using and on how busy the processor is. Every time you run this code, it will give different results.
Thanks 🙂
Thanks Buddy 🙂 Simple and concise
My results come out in a different order, as shown below:
HashSet: 1109680
TreeSet: 500201
LinkedHashSet: 1675388
When I run your code, an exception is thrown:
Exception in thread "main" java.lang.ClassCastException: setproject.Dog1 cannot be cast to java.lang.Comparable
at java.util.TreeMap.compare(Unknown Source)
at java.util.TreeMap.put(Unknown Source)
at java.util.TreeSet.add(Unknown Source)
at setproject.Main.main(Main.java:36)
I just added the Comparable interface to the Dog class, but the results are swapped. Please give me the correct solution. Thanks in advance.
explained in very simple and easy to understand manner.. Performance test results helped to visualise and convince the theory learnt using practical way.. Awesome article.. keep writing ..
The performance test results are unexpected. We cannot say the order would always be this.
The results are swapped; the correct ones are:
TreeSet: 2263320
LinkedHashSet: 3549314
Well done, very nicely and crisply explained. Went through the entire page within 1 minute. Thanks.