Backreferences are another useful feature of regular expressions in Java.
To understand backreferences, we first need to understand groups. A group in a regular expression treats multiple characters as a single unit. Groups are created by placing the characters to be grouped inside a set of parentheses: (). Each set of parentheses corresponds to one group.
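As a small illustration of groups, the sketch below reads back what each set of parentheses captured. The class name GroupDemo, the helper capture, and the sample input "10-20" are made up for this example; group 0 is the whole match, and groups are numbered from 1, left to right.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns the text captured by the given group,
    // or null if the regex is not found anywhere in the input.
    static String capture(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String regex = "(\\d+)-(\\d+)"; // two groups, numbered 1 and 2
        System.out.println(capture(regex, "10-20", 0)); // whole match
        System.out.println(capture(regex, "10-20", 1)); // first group
        System.out.println(capture(regex, "10-20", 2)); // second group
    }
}
```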
Backreferences are convenient because they let us require that text matched earlier appears again, without writing the group a second time. We refer to a previously defined group with \# (where # is the group number, so \1 refers to the first group). This will make more sense after you read the following two examples.
Example 1: Finding Repeated Pattern
(\d\d\d)\1 matches 123123, but does not match 123456. This shows that the backreference must match exactly the same text that the group captured, not just any text that fits the group's pattern.
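import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "123123";
Pattern p = Pattern.compile("(\\d\\d\\d)\\1");
Matcher m = p.matcher(str);
// groupCount() is the number of capturing groups in the pattern
System.out.println(m.groupCount());
while (m.find()) {
    String word = m.group();
    // print the matched text with its start and end offsets
    System.out.println(word + " " + m.start() + " " + m.end());
}

Output: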
1
123123 0 6
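To see why "exactly the same" matters, the following sketch compares the backreference pattern with one that simply writes the group twice. The class name BackrefContrast and the helper matchesWhole are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

class BackrefContrast {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String withBackref = "(\\d\\d\\d)\\1";            // same three digits twice
        String withoutBackref = "(\\d\\d\\d)(\\d\\d\\d)"; // any six digits

        for (String input : new String[] {"123123", "123456"}) {
            System.out.println(input
                    + " backreference: " + matchesWhole(withBackref, input)
                    + ", repeated group: " + matchesWhole(withoutBackref, input));
        }
    }
}
```

With the backreference, only 123123 is accepted; writing the group twice accepts both inputs, because the second group is free to match different digits.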
Example 2: Finding Duplicate Words
String pattern = "\\b(\\w+)\\b[\\w\\W]*\\b\\1\\b";
Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
Matcher m = p.matcher(phrase);
while (m.find()) {
    String val = m.group();
    System.out.println("Matching subsequence is \"" + val + "\"");
    // group(1) holds the word that the backreference \1 matched again
    System.out.println("Duplicate word: " + m.group(1) + "\n");
}

Output:
Matching subsequence is "unique is not duplicate but unique"
Duplicate word: unique

Matching subsequence is "Duplicate is duplicate"
Duplicate word: Duplicate
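Note: This is not a good way to find duplicate words with a regular expression. In the example above, the first "duplicate" is not matched, because it lies inside the first match (the one that starts at "unique"), and the search resumes only after that match ends.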
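One possible alternative, sketched below, drops the backreference and instead extracts each word with \w+ and tracks the words already seen in a set. This is a different technique from the example above, not a fix to its pattern; the class name DuplicateWords and the method findDuplicates are hypothetical, and lower-casing the words is an assumption that mirrors the CASE_INSENSITIVE flag.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateWords {
    // Returns every word that occurs more than once, ignoring case,
    // in the order in which each repeat is first detected.
    static Set<String> findDuplicates(String text) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            // add() returns false when the word is already in the set
            if (!seen.add(word)) {
                duplicates.add(word);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
        System.out.println(findDuplicates(phrase));
    }
}
```

For the phrase used above, this reports "unique", "duplicate" and also "is", which occurs twice as well.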
Why Use Backreferences?
Check out more regular expression examples.