Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = “leetcode”,
dict = [“leet”, “code”].
Return true because “leetcode” can be segmented as “leet code”.
1. Naive Approach
This problem can be solved with a naive recursive approach, which is trivial. A discussion can always start from there, though.
public class Solution {
    public boolean wordBreak(String s, Set<String> dict) {
        return wordBreakHelper(s, dict, 0);
    }

    public boolean wordBreakHelper(String s, Set<String> dict, int start) {
        if (start == s.length())
            return true;

        for (String a : dict) {
            int len = a.length();
            int end = start + len;

            // end index should be <= string length
            if (end > s.length())
                continue;

            if (s.substring(start, start + len).equals(a))
                if (wordBreakHelper(s, dict, start + len))
                    return true;
        }

        return false;
    }
}
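In the worst case the running time is exponential, because the same start position can be reached through many different sequences of words and is explored again each time (for example, s = “aaaaaaab” with dict = [“a”, “aa”]). It exceeds the time limit.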
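To get a feel for how the naive recursion grows, the sketch below counts its calls on inputs made of k copies of "a" followed by a "b", with the dictionary ["a", "aa"]. The class NaiveCallCounter and its static call counter are hypothetical additions for illustration, and the code assumes Java 11 or later for String.repeat. Because the trailing "b" never matches, every split of the a's is explored and nothing is cached.

```java
import java.util.List;

class NaiveCallCounter {
    // Hypothetical counter, incremented on every recursive call.
    static long calls = 0;

    // Same recursion as the naive solution, over a List for simplicity.
    static boolean helper(String s, List<String> dict, int start) {
        calls++;
        if (start == s.length()) return true;
        for (String a : dict) {
            int end = start + a.length();
            if (end > s.length()) continue;
            if (s.substring(start, end).equals(a) && helper(s, dict, end)) return true;
        }
        return false;
    }

    // Number of calls the naive recursion makes on k 'a's followed by "b".
    static long countCalls(int k) {
        calls = 0;
        String s = "a".repeat(k) + "b";
        helper(s, List.of("a", "aa"), 0);
        return calls;
    }

    public static void main(String[] args) {
        for (int k = 10; k <= 30; k += 10) {
            System.out.println(k + " -> " + countCalls(k));
        }
    }
}
```

On this family of inputs the count satisfies C(k) = C(k-1) + C(k-2) + 1, so each extra "a" multiplies the work by roughly 1.6.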
2. Dynamic Programming
The key to solving this problem with dynamic programming:
- Define an array t[] such that t[i] == true means the prefix s[0..i-1] can be segmented using the dictionary
- The initial state is t[0] == true (the empty prefix)
public class Solution {
    public boolean wordBreak(String s, Set<String> dict) {
        boolean[] t = new boolean[s.length() + 1];
        t[0] = true; // set first to be true, why?
                     // Because we need an initial state

        for (int i = 0; i < s.length(); i++) {
            // should continue from match position
            if (!t[i])
                continue;

            for (String a : dict) {
                int len = a.length();
                int end = i + len;
                if (end > s.length())
                    continue;

                if (t[end])
                    continue;

                if (s.substring(i, end).equals(a)) {
                    t[end] = true;
                }
            }
        }

        return t[s.length()];
    }
}
Time: O(string length * dict size).
3. Java Solution 3 – Simple and Efficient
In Solution 2, if the dictionary is very large, the running time is poor. Instead, we can solve the problem with O(n^2) dictionary lookups (n is the length of the string), regardless of the dictionary size.
public boolean wordBreak(String s, Set<String> wordDict) {
    int[] pos = new int[s.length() + 1];
    Arrays.fill(pos, -1);
    pos[0] = 0;

    for (int i = 0; i < s.length(); i++) {
        if (pos[i] != -1) {
            for (int j = i + 1; j <= s.length(); j++) {
                String sub = s.substring(i, j);
                if (wordDict.contains(sub)) {
                    pos[j] = i;
                }
            }
        }
    }

    return pos[s.length()] != -1;
}
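When dictionary words are much shorter than s, a common refinement (not part of the original solution) is to stop the inner loop once j - i exceeds the length of the longest dictionary word, since no longer substring can be in the dictionary. A minimal sketch, assuming a Set<String> dictionary; the class name WordBreakBounded is hypothetical:

```java
import java.util.Set;

class WordBreakBounded {
    static boolean wordBreak(String s, Set<String> wordDict) {
        // Longest dictionary word; no substring longer than this can match.
        int maxLen = 0;
        for (String w : wordDict) maxLen = Math.max(maxLen, w.length());

        boolean[] reachable = new boolean[s.length() + 1]; // reachable[i]: s[0..i) can be segmented
        reachable[0] = true;
        for (int i = 0; i < s.length(); i++) {
            if (!reachable[i]) continue;
            // Only try end positions within maxLen characters of i.
            int limit = Math.min(s.length(), i + maxLen);
            for (int j = i + 1; j <= limit; j++) {
                if (wordDict.contains(s.substring(i, j))) reachable[j] = true;
            }
        }
        return reachable[s.length()];
    }
}
```

With L the length of the longest word, this performs O(n * L) lookups instead of O(n^2).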
4. The More Interesting Problem
The dynamic programming solution can tell us whether the string can be broken into words, but cannot tell us which words it is broken into. So how do we get those words?
Check out Word Break II.
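As a rough sketch of that follow-up (not necessarily the approach in the Word Break II article), all segmentations can be listed with recursion that memoizes results per start index. The class WordBreakAll and its methods are hypothetical names. The number of segmentations can grow exponentially with the length of s, so this is only practical when there are few of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WordBreakAll {
    static List<String> allBreaks(String s, Set<String> dict) {
        return breaksFrom(s, dict, 0, new HashMap<>());
    }

    // Returns every space-separated segmentation of s[start..]; memo caches results per start index.
    private static List<String> breaksFrom(String s, Set<String> dict, int start,
                                           Map<Integer, List<String>> memo) {
        List<String> cached = memo.get(start);
        if (cached != null) return cached;
        List<String> result = new ArrayList<>();
        if (start == s.length()) {
            result.add(""); // exactly one way to segment the empty suffix
        } else {
            for (int end = start + 1; end <= s.length(); end++) {
                String word = s.substring(start, end);
                if (!dict.contains(word)) continue;
                for (String rest : breaksFrom(s, dict, end, memo)) {
                    result.add(rest.isEmpty() ? word : word + " " + rest);
                }
            }
        }
        memo.put(start, result);
        return result;
    }
}
```

For example, allBreaks("catsanddog", Set.of("cat", "cats", "and", "sand", "dog")) returns ["cat sand dog", "cats and dog"].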
Hi Guys
I made a small video explaining the recursion and DP approach, please check this out
https://www.youtube.com/watch?v=gjmfFd0Pmh4&t=15s
It works OK for the example given in the article but can easily be broken:
s = “leetcode”,
dict = [“lee”,”leet”, “code”].
I like this problem… it is so simple, but a nice exercise.
Here is a version using a stack instead of recursion (just for fun), however the complexity is O(n^2)… not acceptable.
// Brute-force:
// T(n) = O(n^2)
private boolean wordBreak_(String s, Set<String> dict) {
if (s == null || s.length() == 0)
return false;
Stack<Integer> stack = new Stack<>();
int n = s.length();
stack.push(0);
while (!stack.empty()) {
int start = stack.pop();
if (start == n)
return true;
for (String a:dict) {
int len = a.length();
if (start + len > n)
continue;
if (s.substring(start, start+len).equals(a))
stack.push(start + len);
}
}
return false;
}
I like this problem… it is so simple, but a nice exercise.
Here is a version using a stack instead of recursion (just for fun), however the complexity is O(n^2)… not acceptable.
public boolean wordBreak_(String s, Set<String> dict) {
if (s == null || s.length() == 0)
return false;
Stack<Integer> stack = new Stack<>();
int n = s.length();
stack.push(0);
while (!stack.empty()) {
int start = stack.pop();
for (String a:dict) {
int len = a.length();
if (start + len > n)
continue;
boolean isSame = s.substring(start, start+len).equals(a);
if (isSame && start + len == n)
return true;
if (isSame)
stack.push(start + len);
}
}
return false;
}
The description is not complete:
1) not all of the words in the dictionary have to be used
2) a word from the dictionary can be used multiple times
3) words in the dictionary can be substrings of other words in the dictionary
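The three points can be checked with a small driver. This sketch assumes a boolean DP in the style of Solution 3 (the class name WordBreakRules is hypothetical); each line in main exercises one of the rules and prints true.

```java
import java.util.Set;

class WordBreakRules {
    // Boolean DP in the style of Solution 3: ok[j] is true when s[0..j) can be segmented.
    static boolean wordBreak(String s, Set<String> dict) {
        boolean[] ok = new boolean[s.length() + 1];
        ok[0] = true;
        for (int i = 0; i < s.length(); i++) {
            if (!ok[i]) continue;
            for (int j = i + 1; j <= s.length(); j++) {
                if (dict.contains(s.substring(i, j))) ok[j] = true;
            }
        }
        return ok[s.length()];
    }

    public static void main(String[] args) {
        // 1) Not every word must be used: "extra" is never needed.
        System.out.println(wordBreak("leetcode", Set.of("leet", "code", "extra")));
        // 2) A word may be reused: "leet" appears twice.
        System.out.println(wordBreak("leetleetcode", Set.of("leet", "code")));
        // 3) A word may be a substring of another word ("lee" inside "leet").
        System.out.println(wordBreak("leetcode", Set.of("lee", "leet", "code")));
    }
}
```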
Java one loop solution. This is the shortest I have seen here and probably the most efficient. See my code below :)
public static boolean WordBreak(String str, List words){
StringBuilder sb = new StringBuilder(str);
int start=0, end=str.length(), counter=1;
while(counter <= end){
if(words.contains(sb.substring(start,counter))){
sb.delete(start,counter);
counter=0; // incremented to 1 below, so the next check starts with a one-character prefix
end=sb.length();
}
counter++;
}
return sb.length() == 0;
}
Solution 2 exhaustively iterates through the word dictionary, which is problematic (as is the use of a HashSet). Using a SortedSet, you can constrain the words you compare against via set.subSet over [minimal substring of the input string, maximal substring of the input string].
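A possible sketch of the SortedSet idea, assuming the dictionary is copied into a java.util.TreeSet (the class name WordBreakSorted is hypothetical). Any dictionary word that is a prefix of the suffix s[i..] sorts between the one-character string s.charAt(i) and the suffix itself, so only the words in that subSet range need a startsWith check. The range can still be large for some dictionaries, so this narrows the search rather than bounding it.

```java
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

class WordBreakSorted {
    static boolean wordBreak(String s, Set<String> dict) {
        NavigableSet<String> sorted = new TreeSet<>(dict);
        boolean[] ok = new boolean[s.length() + 1]; // ok[i]: s[0..i) can be segmented
        ok[0] = true;
        for (int i = 0; i < s.length(); i++) {
            if (!ok[i]) continue;
            String suffix = s.substring(i);
            String first = String.valueOf(s.charAt(i));
            // Every dictionary word that is a prefix of suffix lies in [first, suffix].
            for (String w : sorted.subSet(first, true, suffix, true)) {
                if (suffix.startsWith(w)) ok[i + w.length()] = true;
            }
        }
        return ok[s.length()];
    }
}
```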
Use java.util.BitSet for space. 🙂
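A minimal sketch of the BitSet suggestion, using the same DP as Solution 3 but storing reachability in a java.util.BitSet (the class name WordBreakBitSet is hypothetical). BitSet packs one bit per position, whereas a boolean[] typically uses a byte per element, and nextSetBit lets the outer loop jump straight to the next reachable position.

```java
import java.util.BitSet;
import java.util.Set;

class WordBreakBitSet {
    static boolean wordBreak(String s, Set<String> dict) {
        int n = s.length();
        BitSet ok = new BitSet(n + 1); // bit i set: s[0..i) can be segmented
        ok.set(0);
        // nextSetBit skips positions that cannot be reached; bits set ahead of i are visited later.
        for (int i = ok.nextSetBit(0); i >= 0 && i < n; i = ok.nextSetBit(i + 1)) {
            for (int j = i + 1; j <= n; j++) {
                if (dict.contains(s.substring(i, j))) ok.set(j);
            }
        }
        return ok.get(n);
    }
}
```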
Why do I think that the first solution is the most efficient? The remaining two solutions loop through each char in string s, while the first one did not. When you call dict.contains() in solution 3, I think below the surface the dictionary is looped through too.
The naive approach is actually the best, isn’t it? This approach does not loop string s from 0 to s.length-1
The brute force solution seems to be wrong?
Input: s = soybean ; dict = {“so”, “y”, “bean”}
Output: NO
But output should be YES
Yes. Do you have a solution for the case where words can’t be repeated?
If Approach 1 and Approach 3 are both O(N^2), why is 3 so much better than 1?
Slight improvement on Solution 3: use boolean instead of int to avoid confusion.
public boolean wordBreak(String s, Set wordDict) {
boolean[] pos = new boolean[s.length()+1];
pos[0] = true;
for(int i = 0; i < s.length(); i++){
for(int j = i+1; pos[i] && j <= s.length(); j++){
if (wordDict.contains(s.substring(i, j)))
pos[j] = true;
}
}
return pos[s.length()];
}
public boolean wordBreak(String s, Set<String> dict){
HashMap<String, Boolean> map = new HashMap<>();
return wordBreak(s, dict, map);
}
public boolean wordBreak(String s, Set<String> dict, HashMap<String, Boolean> map){
if(map.containsKey(s)){
return map.get(s);
}
for (int i = 1; i <= s.length(); i++) {
String prefix = s.substring(0,i);
if(dict.contains(prefix)){
if(i == s.length()){
return true;
}
if(wordBreak(s.substring(i),dict,map)){
map.put(s.substring(i), true);
return true;
}
}
}
map.put(s, false);
return false;
}
This implementation looks neat. But the complexity is exponential so I would choose polynomial implementation for my case.
http://www.ideserve.co.in/learn/word-break-problem
Here is a detailed explanation of the algorithm.
Yes, I had commented that a Trie would be a better solution, before seeing that you had already posted it.
A Trie is a better solution than DP for this problem.
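For reference, one way to combine a trie with the DP is sketched below: instead of restarting at the first complete word (a greedy choice that can miss segmentations such as “goal special” when “go” is also a word), it walks the trie from every reachable start position and marks every word end it passes. The sketch assumes lowercase ASCII words a-z; TrieWordBreak and Node are hypothetical names.

```java
import java.util.Set;

class TrieWordBreak {
    // Trie node for lowercase ASCII letters 'a'..'z' (an assumption of this sketch).
    static final class Node {
        final Node[] next = new Node[26];
        boolean isWord;
    }

    static boolean wordBreak(String s, Set<String> dict) {
        Node root = new Node();
        for (String w : dict) {
            Node cur = root;
            for (char c : w.toCharArray()) {
                int k = c - 'a';
                if (cur.next[k] == null) cur.next[k] = new Node();
                cur = cur.next[k];
            }
            cur.isWord = true;
        }
        int n = s.length();
        boolean[] ok = new boolean[n + 1]; // ok[i]: s[0..i) can be segmented
        ok[0] = true;
        for (int i = 0; i < n; i++) {
            if (!ok[i]) continue;
            // Walk the trie along s[i..]; every complete word found marks a reachable end.
            Node cur = root;
            for (int j = i; j < n; j++) {
                cur = cur.next[s.charAt(j) - 'a'];
                if (cur == null) break;
                if (cur.isWord) ok[j + 1] = true;
            }
        }
        return ok[n];
    }
}
```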
The dynamic solution fails for the case
s = “aaaab”
dict = [“a”, “aa”, “ac”]
It returns true but should return false.
how about this?
import java.util.*;
public class WordBreak {
static Set dictionary = new HashSet();
public static void main (String[] args) {
initializeDictionary();
System.out.println(wordBreak("leetcodesamsung"));
}
private static Boolean wordBreak(String s) {
Boolean[] memo = new Boolean[s.length()];
Arrays.fill(memo, Boolean.FALSE);
int startIndex = 0;
for (int i = 0; i < s.length(); i++) {
int endIndex = i+1;
if (!memo[i] && dictionary.contains(s.substring(startIndex,endIndex)))
{
memo[i] = true;
startIndex = endIndex;
}
}
return memo[s.length() - 1];
}
private static void initializeDictionary() {
dictionary.add("le");
dictionary.add("et");
dictionary.add("code");
dictionary.add("samsun");
dictionary.add("g");
}
}
Yes, the wordBreak(s.substring(i), dict) call should be put in the if condition. However, my solution cannot pass the LeetCode online judge.
I think instead of returning wordBreak(s.substring(i), dict), you need to put that call in the if statement together with dict.contains(sstr). Otherwise your function returns too early in some cases.
if (dict.contains(sstr) && wordBreak(s.substring(i), dict)) return true;
Your solution does not pass leetcode online judge.
Input: “goalspecial”, [“go”,”goal”,”goals”,”special”]
Output: false
Expected: true
Thanks for your nice and complete post. Verifying the validity of a string is easy; it is more complex to split a valid string into words. My discussion and Java program can be found here http://www.capacode.com/?p=335
Very short Python solution, also using trie:
class WordSplitTreeNode:
    def __init__(self):
        self.children = [None for i in range(ord("z") - ord("a") + 1)]
        self.final = False

class WordSplitTree:
    def __init__(self):
        self.root = WordSplitTreeNode()

    def addWord(self, word):
        node = self.root
        for c in word:
            i = ord(c) - ord("a")
            if node.children[i] is None:
                node.children[i] = WordSplitTreeNode()
            node = node.children[i]
        node.final = True

    def iterate(self, word, pos, node):
        # At the end of the input, succeed only if the last word is complete.
        if pos == len(word):
            if node.final:
                yield ""
            return
        nextNodes = []
        i = ord(word[pos]) - ord("a")
        if node.final:
            # A word ends here, so word[pos] may also start a new word.
            nextNodes.append(self.root.children[i])
        nextNodes.append(node.children[i])
        for nextNode in nextNodes:
            if nextNode is not None:
                for s in self.iterate(word, pos + 1, nextNode):
                    yield (" " if node.final and nextNode is self.root.children[i] else "") + word[pos] + s

def checkWordCanBeSplit(word, dictWords):
    tree = WordSplitTree()
    for w in dictWords:
        tree.addWord(w)
    return [splitWord for splitWord in tree.iterate(word, 0, tree.root)]
The problem is supposed to be equivalent to matching the regexp (leet|code)*, which means that it can be solved by building a DFA in O(2^m) and executing it in O(n)
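The regular-expression view can be tried directly with java.util.regex, as a sketch (the class name RegexWordBreak is hypothetical): each word is escaped with Pattern.quote and the alternation must match the whole string one or more times. Note that java.util.regex is a backtracking engine rather than a DFA, so this does not give the O(n) matching time described above and can be slow on adversarial inputs.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexWordBreak {
    static boolean wordBreak(String s, List<String> dict) {
        if (dict.isEmpty()) return false;
        // Builds (?:w1|w2|...)+ with each word quoted literally.
        String alternation = dict.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Pattern.matches requires the entire input to match.
        return Pattern.matches("(?:" + alternation + ")+", s);
    }
}
```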
DP and Recursive Solution with working code at http://www.gohired.in/2014/12/word-break-problem.html
not true if dict = {“a”, “ab”}
Can anyone clarify the complexity of naïve?
Because you skip the last IF condition when t[end] is already true, not all possibilities are reported.
For INPUT: “leetcode”, [“leetcode”,”leet”,”code”].
matches are : leetcode leet
code is not given as t[end] is made true by match leetcode.
import java.util.*;
import java.lang.*;
import java.io.*;
class Wordbreak
{
public static boolean wordBreak(String s, String[] dict){
int counter = 0;
System.out.println("Given string to break: " + s);
for(String ds : dict){
//System.out.println("dict string :" + ds);
if(s.contains(ds)){
counter = counter +1;
//System.out.println("Index of :" + ds + " " + s.indexOf(ds));
int strtindex = s.indexOf(ds);
int len = ds.length();
String sb = s.substring( strtindex, strtindex+len);
//System.out.println("Am der " + sb);
//System.out.println("dict string :" + ds);
}
}
if (counter>0){
return true;
}
return false;
}
public static void main (String[] args) throws java.lang.Exception
{
String s = "programcreek";
String[] dict = new String[]{"programcree","program","creek"};
boolean b = wordBreak(s, dict);
if(b){
System.out.println("Can be done");
}
else{
System.out.println("Not possible!!");
}
}
}
public boolean wordBreak(String s, Set<String> dict) {
if (s == null || s.length() == 0 || dict.isEmpty()) {
return false;
}
for (String w : dict) {
if (s.equals(w)) {
return true;
} else if (s.startsWith(w)) {
String newS = s;
do {
newS = newS.substring(w.length()); // strip one leading occurrence of w
} while (newS.startsWith(w));
if (newS.equals("")) {
return true;
}
boolean result = this.wordBreak(newS, dict);
if (result) {
return true;
}
}
}
return false;
}
Another solution, O(n^3), where n is the length() of s. I assume that the set is a HashSet.
public boolean wordBreak(String s, Set dict) {
if(s == null || s.length() == 0){
return true;
}
boolean[] arr = new boolean[s.length()];
if(dict.contains(s.charAt(0)+"")){
arr[0] = true;
}
for(int i = 1;i<arr.length;i++){
if (dict.contains(s.substring(0, i + 1))) {
arr[i] = true;
}
for (int j = 0; j < i; j++) {
int a = i - j + 1;
String subWord = s.substring(j + 1, j + a);
if (dict.contains(subWord) && arr[j]) {
arr[i] = true;
j = i;
break;
}
}
}
return arr[arr.length-1];
}
One of the questions will be: can we use the same dictionary word more than once? For example, if we have the string “leetcodeleetcode” and the dictionary has the words {“leet”, “code”}, will the result be true?
Not true from Java 7 on: since Java 7, substring() is an O(n) operation!
Apparently, this algorithm is not correct.
If you use this dictionary:
dict.add("leet");
dict.add("code");
dict.add("lee");
dict.add("programcree");
dict.add("program");
dict.add("creek");
then it will not recognize “leetcode”.
I guess this can be solved by using Tries also.
Time Complexity : O(n) + O(m)
Space Complexity : O(m)
Let me know if the following code will work for all cases.
import java.util.*;
import java.lang.*;
import java.io.*;
class TrieNode {
char val;
boolean isRoot;
boolean isLeaf;
TrieNode children[]; // There can be at most 26 children (English letters)
TrieNode() {
val = '^';
isRoot = true;
children = new TrieNode[26];
initializeChildren();
}
TrieNode(char val) {
this.val = val;
isRoot = false;
isLeaf = false;
children = new TrieNode[26];
initializeChildren();
}
void initializeChildren() {
for(int i=0;i<children.length;i++) {
children[i] = null;
}
}
}
class Trie {
TrieNode root;
Trie() {
root = new TrieNode();
}
void insert(String input) {
int start = 'A';
TrieNode current = root;
for(int i=0;i<input.length();i++) {
int val = Character.toUpperCase(input.charAt(i));
int index = val - start;
if(current.children[index] == null)
current.children[index] = new TrieNode(input.charAt(i));
current = current.children[index];
}
current.isLeaf = true;
}
}
class Example {
boolean wordBreak(String s, Set<String> dict) {
// First Construct Trie from the dictionary
Trie a = new Trie();
for(String i : dict) {
a.insert(i);
}
TrieNode current = a.root;
int start = 'A';
boolean result = false;
for(int i=0; i<s.length();i++) {
int val = Character.toUpperCase(s.charAt(i));
int index = val - start;
if(current.children[index] == null) {
// Word is not in the dictionary
current = a.root;
result = false;
break;
}
current = current.children[index];
if(current.isLeaf == true) {
// Start from the beginning for the next character
current = a.root;
result = true;
}
}
return result;
}
public static void main(String[] args) {
Example temp = new Example() ;
Set<String> dict = new HashSet<String>();
dict.add("leet");
dict.add("code");
dict.add("programcree");
dict.add("program");
dict.add("creek");
System.out.println("Wordbreak (programcreek) = " + temp.wordBreak("programcreek", dict));
System.out.println("Wordbreak (leetcode) = " + temp.wordBreak("leetcode", dict));
}
}
I can think of a trie-based solution:
Time Complexity = O(n) + O(m)
Space Complexity = O(m)
Let me know if this will work:
import java.util.*;
import java.lang.*;
import java.io.*;
class TrieNode {
char val;
boolean isRoot;
boolean isLeaf;
TrieNode children[]; // There can be atmost 26 children (english alphabets)
TrieNode() {
val = '^';
isRoot = true;
children = new TrieNode[26];
initializeChildren();
}
TrieNode(char val) {
this.val = val;
isRoot = false;
isLeaf = false;
children = new TrieNode[26];
initializeChildren();
}
void initializeChildren() {
for(int i=0;i<children.length;i++) {
children[i] = null;
}
}
}
class Trie {
TrieNode root;
Trie() {
root = new TrieNode();
}
void insert(String input) {
int start = 'A';
TrieNode current = root;
for(int i=0;i<input.length();i++) {
int val = Character.toUpperCase(input.charAt(i));
int index = val - start;
if(current.children[index] == null)
current.children[index] = new TrieNode(input.charAt(i));
current = current.children[index];
}
current.isLeaf = true;
}
}
public class Example {
boolean wordBreak(String s, Set dict) {
// First Construct Trie from the dictionary
Trie a = new Trie();
for(String i : dict) {
a.insert(i);
}
TrieNode current = a.root;
int start = 'A';
boolean result = false;
for(int i=0; i<s.length();i++) {
int val = Character.toUpperCase(s.charAt(i));
int index = val - start;
if(current.children[index] == null) {
// Word is not in the dictionary
current = a.root;
result = false;
break;
}
current = current.children[index];
if(current.isLeaf == true) {
// Start from the beginning for the next character
current = a.root;
result = true;
}
}
return result;
}
public static void main(String[] args) {
Example temp = new Example() ;
Set dict = new HashSet();
dict.add("leet");
dict.add("code");
dict.add("programcree");
dict.add("program");
dict.add("creek");
System.out.println("Wordbreak (programcreek) = " + temp.wordBreak("programcreek", dict));
System.out.println("Wordbreak (leetcode) = " + temp.wordBreak("leetcode", dict));
System.out.println("Wordbreak (lesscode) = " + temp.wordBreak("lesscode", dict));
}
}
Small correction of complexity in the 2nd case.
It’s mentioned that “Time: O(string length * dict size)” but you also run equals (and substring is not constant for Java > 1.6) for every word in dictionary so it’s more like O(string length * dict size * length of the longest word in dict).
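One way to avoid creating a new substring for every comparison is String.startsWith(prefix, offset), which compares characters in place. Below is a sketch of Solution 2 rewritten that way (the class name WordBreakNoSubstring is hypothetical). Each comparison still costs up to the length of the word, so the bound above still applies, but no temporary strings are allocated.

```java
import java.util.Set;

class WordBreakNoSubstring {
    static boolean wordBreak(String s, Set<String> dict) {
        boolean[] t = new boolean[s.length() + 1]; // t[i]: s[0..i) can be segmented
        t[0] = true;
        for (int i = 0; i < s.length(); i++) {
            if (!t[i]) continue;
            for (String a : dict) {
                int end = i + a.length();
                // startsWith(a, i) checks s[i..end) == a without allocating a substring.
                if (end <= s.length() && !t[end] && s.startsWith(a, i)) t[end] = true;
            }
        }
        return t[s.length()];
    }
}
```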
complexity of naïve is O(n^2) not O(2^n)
Using a HashMap can reduce repeated calculation.
Recursion with hashMap:
class Solution:
    def __init__(self):
        self.table = {}

    def wordBreak(self, s, dict):
        if len(s) == 0:
            return True
        if len(s) == 1:
            return s in dict
        if s in self.table:
            return self.table[s]
        isBreakable = False
        for i in range(len(s)):
            word = s[:i+1]
            if word in dict:
                subFlag = self.wordBreak(s[i+1:], dict)
                if s[i+1:] not in self.table:
                    self.table[s[i+1:]] = subFlag
                isBreakable |= subFlag
        return isBreakable
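For comparison, a Java sketch of the same memoization idea (the class and method names are hypothetical). It keys the memo by start index rather than by suffix string, which avoids hashing long suffixes; each start index is solved at most once, so there are O(n^2) dictionary lookups in total.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class WordBreakMemo {
    static boolean wordBreak(String s, Set<String> dict) {
        return canBreak(s, dict, 0, new HashMap<>());
    }

    // memo maps a start index to whether s[start..] can be segmented.
    private static boolean canBreak(String s, Set<String> dict, int start, Map<Integer, Boolean> memo) {
        if (start == s.length()) return true;
        Boolean cached = memo.get(start);
        if (cached != null) return cached;
        boolean result = false;
        for (int end = start + 1; end <= s.length() && !result; end++) {
            if (dict.contains(s.substring(start, end))) {
                result = canBreak(s, dict, end, memo);
            }
        }
        memo.put(start, result);
        return result;
    }
}
```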
recursive solution
public static boolean wordBreak(String s, Set dict){
//input validation
//Base case
if(dict.contains(s))
return true;
else {
for(int i = 0; i < s.length(); i++){
String sstr = s.substring(0, i);
if(dict.contains(sstr))
return wordBreak(s.substring(i), dict);
}
}
return false;
}
We can do it in O(n), right? Assuming Set dic is actually a HashSet, retrieval from a HashSet is O(1).
List<String> arr = new ArrayList<>();
StringBuilder sb = new StringBuilder(s);
int i=0;
int wordIndex=0;
while(i < s.length()){
if(dic.contains(sb.substring(wordIndex,i))){
wordIndex=i+1;
}else{
arr.add(sb.substring(wordIndex, i));
}
i++;
}
Another approach
package test;
import java.util.HashSet;
import java.util.Set;
public class WordBreak2 {
public static boolean wordBreak(String s,Set dict) {
if (s.length()==0) {
return true;
}
for(int i=1; i<=s.length(); i++) {
String firstWord=s.substring(0, i);
String remaining=s.substring(i);
if (dict.contains(firstWord) && wordBreak(remaining, dict) ) {
System.out.print(" ");
System.out.print(firstWord);
return true;
}
}
return false;
}
public static void main(String[] args) {
Set dict=new HashSet(5);
dict.add("program");
if (wordBreak("pprogram", dict)) {
System.out.println(" YES");
} else {
System.out.println(" NO");
}
dict=new HashSet(5);
dict.add("ab");
dict.add("abc");
dict.add("de");
if (wordBreak("abcde", dict)) {
System.out.println(" YES");
} else {
System.out.println(" NO");
}
}
}
This is more efficient if the dictionary is big, which it usually is.
Seems good to me, I will try later. Thanks!
Thanks for these solutions. Just starting to go through the problems but looks like very useful website.
As for how to get the words that the string breaks up to:
Change the “t” array from boolean to integer, with -1 meaning “no break-up found”.
Instead of setting t[end] to true (i.e. saying you have found a break-up of the substring 0..end), set t[end] to i, saying you have found a break-up of the substring 0..end whose last word is the substring i..end of the main string.
Then, at the end, if the string can be broken up, check t[s.length()]. The last word in the break-up is the substring starting at t[s.length()] and ending at s.length()-1. Repeat this procedure to get the other words.
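The back-pointer idea described above can be sketched as follows (the class name WordBreakSplit is hypothetical). Here t[end] holds the start of the last word in some break-up of s[0..end), with -1 meaning no break-up was found, and t[0] = 0 serves as a sentinel for the empty prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class WordBreakSplit {
    // Returns one segmentation of s, or null if none exists.
    static List<String> split(String s, Set<String> dict) {
        int n = s.length();
        int[] t = new int[n + 1]; // t[end] = start of the last word in a break-up of s[0..end), or -1
        Arrays.fill(t, -1);
        t[0] = 0;
        for (int i = 0; i < n; i++) {
            if (t[i] == -1) continue;
            for (int end = i + 1; end <= n; end++) {
                if (t[end] == -1 && dict.contains(s.substring(i, end))) t[end] = i;
            }
        }
        if (t[n] == -1) return null;
        List<String> words = new ArrayList<>();
        // Walk the back-pointers from the end of the string to the front.
        for (int end = n; end > 0; end = t[end]) {
            words.add(s.substring(t[end], end));
        }
        Collections.reverse(words);
        return words;
    }
}
```

For example, split("goalspecial", Set.of("go", "goal", "goals", "special")) returns [goal, special].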
I don’t get what you mean, can you explain in more detail? Thanks.
You should skip the string comparison in the last IF condition if t[end] is already true.
Do you know if a better one exists? Can you provide a better solution? Appreciate it!
I don’t think looping through the dic is a good idea.