How substring works in Java
The substring method of the String class is one of the most frequently used methods in Java. Used carelessly, however, it could cause memory leaks in JDK 6; this was fixed later in JDK 7. How substring works in Java and how it leads to memory leaks are well-known interview questions. If you have not seen the implementation of the substring method, it is hard to explain how it leads to a memory leak.
How substring works in Java
The substring(int beginIndex, int endIndex) method of the String class returns a new string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1, so its length is endIndex - beginIndex.
substring is an overloaded method in Java. One version, String substring(int beginIndex), takes only beginIndex and returns the part of the string from beginIndex to the end. The other, String substring(int beginIndex, int endIndex), takes two parameters, beginIndex and endIndex, and returns the part of the string from beginIndex to endIndex - 1.
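As a quick illustration of the two overloads, here is a minimal sketch. The class name SubstringOverloads, the helper names tail and slice, and the sample text are made up for this example.

```java
class SubstringOverloads {
    // Returns everything from beginIndex to the end of the string.
    static String tail(String text, int beginIndex) {
        return text.substring(beginIndex);
    }

    // Returns the characters from beginIndex up to, but not including, endIndex.
    static String slice(String text, int beginIndex, int endIndex) {
        return text.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String text = "Hello, World";
        System.out.println(tail(text, 7));               // World
        System.out.println(slice(text, 0, 5));           // Hello
        System.out.println(slice(text, 7, 12).length()); // 5, i.e. endIndex - beginIndex
    }
}
```

Both overloads throw StringIndexOutOfBoundsException when an index falls outside the string, so callers usually validate the indexes first.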
Have a look at the actual bug report, which describes the behaviour that leads to the memory leak and was fixed in JDK 7:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6294060
The actual behaviour is that substring() uses an performance optimized package private constructor which does not perform a copy of the char array for substring but reuses the original char array and so the memory will not be released as long a reference to the extracted substring exists.
String is immutable in Java, and calling substring on a string creates a new String object. Before JDK 7, however, that new object did not get its own character data: it kept a reference to the original string's char array. If the original string holds gigabytes of characters and you take a substring of only 10 characters, the substring still references the whole gigabyte-sized array. That prevents the array from being garbage collected for as long as the substring is reachable, which amounts to a memory leak. The issue was fixed in JDK 7; the change shipped in update 6 (7u6).
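The code that typically triggered this problem looks innocent: keep a small piece of each large string and discard the rest. The sketch below is illustrative only; the class KeyExtractor, its methods firstChars and makeRecord, and the generated records are all hypothetical. On JDK 6 every retained key would keep its whole source record's char array reachable, while on JDK 7 and later each key owns just its own characters.

```java
import java.util.ArrayList;
import java.util.List;

class KeyExtractor {
    // Keeps only the first keyLength characters of each record.
    // On JDK 6 each returned key would still reference the full record's char[].
    static List<String> firstChars(List<String> records, int keyLength) {
        List<String> keys = new ArrayList<String>();
        for (String record : records) {
            keys.add(record.substring(0, keyLength));
        }
        return keys;
    }

    // Builds a record of the given length: a zero-padded id followed by filler.
    static String makeRecord(int id, int length) {
        StringBuilder sb = new StringBuilder(length);
        sb.append(String.format("%010d", id));
        while (sb.length() < length) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<String>();
        for (int i = 0; i < 100; i++) {
            records.add(makeRecord(i, 100000));
        }
        List<String> keys = firstChars(records, 10);
        records = null; // the large records are no longer referenced directly
        System.out.println(keys.size() + " keys, first = " + keys.get(0));
    }
}
```

Here 100 keys of 10 characters each are all the program still needs, yet under the JDK 6 behaviour roughly 10 million characters of record data would stay on the heap.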
How to deal with this problem in JDK 6
If for any reason you cannot migrate to JDK 7 or later, there are still a few workarounds that avoid this memory leak.
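- Using the intern() method: Calling intern() on a String object adds it to the String pool if it is not already present, and returns the existing pooled string if it is. This does not allocate space for a duplicate but reuses the existing entry. Keep in mind that on JDK 6 the pool lives in the permanent generation, so interning many distinct strings can exhaust it.
- Using the String(String original) constructor: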
/**
 * Initializes a newly created {@code String} object so that it represents
 * the same sequence of characters as the argument; in other words, the
 * newly created string is a copy of the argument string. Unless an
 * explicit copy of {@code original} is needed, use of this constructor is
 * unnecessary since Strings are immutable.
 *
 * @param  original
 *         A {@code String}
 */
public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}
As the constructor implementation above shows, if the original string's backing array is larger than the string itself, the constructor copies just the needed characters with Arrays.copyOfRange() and builds a new, smaller string. The new string no longer references the original array, which becomes eligible for garbage collection once nothing else refers to it.
To use this constructor, write client code like the following. It creates a new string and so releases the reference to the original string's array.
String configValue = getConfigValues();
String output = new String(configValue.substring(startIndex, endIndex));
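Wrapped in a helper, the same idea might look like the sketch below. The names SubstringCopy and compactSubstring and the sample configuration string are invented for illustration. On JDK 7 and later the extra constructor call is redundant, though harmless, because substring() already copies the characters.

```java
class SubstringCopy {
    // Returns the requested range as a String. The new String(...) wrapper is
    // what trimmed the shared backing array on JDK 6.
    static String compactSubstring(String source, int beginIndex, int endIndex) {
        return new String(source.substring(beginIndex, endIndex));
    }

    public static void main(String[] args) {
        String configValue = "timeout=30;retries=5;mode=fast";
        String output = compactSubstring(configValue, 8, 10);
        System.out.println(output); // 30
    }
}
```

Keeping the copy in one helper makes it easy to remove once the code base no longer has to run on JDK 6.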
substring() in JDK 6 versus JDK 7
substring() in JDK 6
The code below is simplified to show the difference. In JDK 6 the new string references the same char array as the original.
//JDK 6
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

public String substring(int beginIndex, int endIndex) {
    ......
    return new String(offset + beginIndex, endIndex - beginIndex, value);
}
substring() in JDK 7
JDK 7 improves on this: substring() copies the requested range out of the original character array into a new char array, so the new String object no longer shares storage with the original.
//JDK 7
public String(char value[], int offset, int count) {
    ....
    this.value = Arrays.copyOfRange(value, offset, offset + count);
}

public String substring(int beginIndex, int endIndex) {
    ....
    int subLen = endIndex - beginIndex;
    return new String(value, beginIndex, subLen);
}
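To make the difference concrete, the following is a simplified model, not the JDK source: a hypothetical CharView class that mimics the value/offset/count layout and offers both strategies side by side. The method names sharingSubstring, copyingSubstring and retainedChars are assumptions made for this sketch.

```java
import java.util.Arrays;

final class CharView {
    private final char[] value; // backing array
    private final int offset;   // index of the first character in value
    private final int count;    // number of characters in this view

    CharView(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // JDK 6 style: the result points at the same backing array.
    CharView sharingSubstring(int beginIndex, int endIndex) {
        return new CharView(value, offset + beginIndex, endIndex - beginIndex);
    }

    // JDK 7 style: the result gets its own, exactly sized array.
    CharView copyingSubstring(int beginIndex, int endIndex) {
        char[] copy = Arrays.copyOfRange(value, offset + beginIndex, offset + endIndex);
        return new CharView(copy, 0, endIndex - beginIndex);
    }

    // Number of chars kept reachable by this view.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] big = new char[1000000];
        Arrays.fill(big, 'a');
        CharView original = new CharView(big, 0, big.length);

        CharView shared = original.sharingSubstring(0, 10);
        CharView copied = original.copyingSubstring(0, 10);

        System.out.println(shared.retainedChars()); // 1000000
        System.out.println(copied.retainedChars()); // 10
    }
}
```

The sharing variant is constant-time but pins the large array; the copying variant costs time proportional to the substring length and lets the original array be collected.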