Java 9 brings a new, improved String implementation which, in most cases, halves the memory consumed by String contents.

Why were Compact Strings needed in Java 9?

Before Java 9, a String was internally represented by a char[] holding its characters. Every char occupies 2 bytes, because Java represents text internally as UTF-16.

For instance, if a String contains an English word, the upper 8 bits of every char are 0, since an ASCII character can be represented in a single byte.
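To see those wasted bytes concretely, here is a small sketch that encodes an ASCII word as big-endian UTF-16 using the standard StandardCharsets.UTF_16BE charset and counts the zero high bytes. Encoding to UTF-16BE is used as a stand-in for the old char[] layout (each char maps to exactly two bytes); the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf16BytesDemo {

    // Encodes s as big-endian UTF-16 (two bytes per char, no byte-order mark)
    // and counts how many of the high-order bytes are zero.
    static int zeroHighBytes(String s) {
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);
        int zeros = 0;
        // In big-endian order the high byte of each char sits at the even index.
        for (int i = 0; i < utf16.length; i += 2) {
            if (utf16[i] == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    public static void main(String[] args) {
        String word = "hello";
        byte[] utf16 = word.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.toString(utf16)); // [0, 104, 0, 101, 0, 108, 0, 108, 0, 111]
        System.out.println(zeroHighBytes(word) + " of " + word.length() + " high bytes are zero");
    }
}
```

Half of the encoded bytes carry no information for this word, which is exactly the overhead Compact Strings target.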

Many characters require 16 bits, but statistically most characters in use need only 8 bits, which is enough for the Latin-1 character set. So there is scope to improve both memory consumption and performance.
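Latin-1 (ISO-8859-1) covers exactly the code points U+0000 to U+00FF, so a String fits in one byte per char if and only if every char is at most 0xFF. The sketch below checks that and compares the character-data size of a char[] with a one-byte-per-char layout. The names are hypothetical, and array headers and object overhead are deliberately ignored.

```java
class Latin1Check {

    // True if every char is within the Latin-1 range U+0000..U+00FF.
    static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Bytes needed for the character data alone (array headers ignored):
    // 1 per char for Latin-1 content, 2 per char otherwise.
    static int payloadBytes(String s) {
        return fitsInLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        // "Gr\u00FC\u00DFe" is "Gruesse" with u-umlaut and sharp s, both Latin-1;
        // "\u20AC100" starts with the euro sign, which is outside Latin-1.
        for (String s : new String[] {"Hello", "Gr\u00FC\u00DFe", "\u20AC100"}) {
            System.out.println(s + ": Latin-1=" + fitsInLatin1(s)
                    + ", char[] bytes=" + 2 * s.length()
                    + ", compact bytes=" + payloadBytes(s));
        }
    }
}
```

Only the String containing the euro sign still needs two bytes per character; the other two shrink from 10 to 5 bytes of character data.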

Strings can consume as much as 25% of heap memory. Halving the size of their character data would significantly reduce not only memory consumption but also Garbage Collection overhead.

Compressed Strings in Java 6

The String memory consumption issue is not new; it has been discussed for quite some time. In fact, Java 6 introduced a feature to address it: Compressed Strings.

The idea was to declare the internal field as a plain Object instead of a char[]. When a String needed two bytes per character, a char[] would still be assigned to that field; when one byte per character was sufficient, a byte[] could be used instead.
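A minimal sketch of that idea, assuming nothing about the actual JDK 6 source: the field is typed as Object and holds either a byte[] or a char[], so every operation has to check which one it got. The class name and methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the Java 6 Compressed Strings idea.
// This is NOT the JDK 6 implementation.
final class CompressedStringSketch {

    private final Object value; // either byte[] (1 byte/char) or char[] (2 bytes/char)

    CompressedStringSketch(String s) {
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        // Store one byte per char when possible, otherwise fall back to char[].
        this.value = latin1 ? s.getBytes(StandardCharsets.ISO_8859_1) : s.toCharArray();
    }

    boolean isCompressed() {
        return value instanceof byte[];
    }

    int length() {
        return isCompressed() ? ((byte[]) value).length : ((char[]) value).length;
    }

    char charAt(int index) {
        if (value instanceof byte[]) {
            // Mask so the byte is read as unsigned (0..255), matching Latin-1 code points.
            return (char) (((byte[]) value)[index] & 0xFF);
        }
        return ((char[]) value)[index];
    }

    public static void main(String[] args) {
        CompressedStringSketch a = new CompressedStringSketch("caf\u00E9");
        CompressedStringSketch b = new CompressedStringSketch("\u20AC5");
        System.out.println(a.isCompressed() + " " + a.length()); // true 4
        System.out.println(b.isCompressed() + " " + b.length()); // false 2
    }
}
```

The runtime type checks and casts in every method hint at why such a design is awkward to make fast.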

This was an optional, experimental feature that could be enabled on demand with the -XX:+UseCompressedStrings flag.

Compact Strings in Java 9

Instead of a char[], a String is now backed by a byte[]. Depending on the characters it contains, it uses either Latin-1 or UTF-16, that is, either one or two bytes per character.
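Here is a simplified sketch of that layout: a byte[] plus a one-byte flag recording the encoding, which the JDK calls coder (0 for Latin-1, 1 for UTF-16). The class is hypothetical, and storing UTF-16 in big-endian order is an assumption made for readability; the JDK's own internal byte order may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of the Java 9 String layout (not the JDK source).
final class CompactStringSketch {

    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    final byte[] value;
    final byte coder;

    CompactStringSketch(String s) {
        if (fitsInLatin1(s)) {
            value = s.getBytes(StandardCharsets.ISO_8859_1); // 1 byte per char
            coder = LATIN1;
        } else {
            value = s.getBytes(StandardCharsets.UTF_16BE);   // 2 bytes per char (assumed big-endian)
            coder = UTF16;
        }
    }

    private static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CompactStringSketch ascii = new CompactStringSketch("hello");
        CompactStringSketch euro = new CompactStringSketch("hello\u20AC");
        System.out.println(ascii.value.length + " bytes, coder " + ascii.coder); // 5 bytes, coder 0
        System.out.println(euro.value.length + " bytes, coder " + euro.coder);   // 12 bytes, coder 1
    }
}
```

A single non-Latin-1 character switches the whole String to two bytes per character, which is why "hello€" takes 12 bytes rather than 7.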

This raises a question: how do all the String operations work, and how does a String know whether it holds Latin-1 or UTF-16 data?

To handle this, the internal implementation of String gained another change: a final field named coder that records which representation is in use.

Unlike Compressed Strings, this feature is enabled by default. If necessary, for example when an application uses mainly UTF-16 Strings, it can still be disabled with -XX:-CompactStrings.
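To check whether the flag is active in a running JVM, one option is the HotSpot diagnostic management bean. This sketch assumes a HotSpot-based JVM such as OpenJDK; other JVMs may not expose com.sun.management.HotSpotDiagnosticMXBean or the CompactStrings option, in which case getVMOption throws an exception.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Reads the current value of the HotSpot CompactStrings flag at runtime.
// Assumes a HotSpot-based JVM (e.g. OpenJDK).
class CompactStringsFlag {

    static String compactStringsValue() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("CompactStrings").getValue();
    }

    public static void main(String[] args) {
        // Expected to print "true" by default and "false" under -XX:-CompactStrings.
        System.out.println("CompactStrings = " + compactStringsValue());
    }
}
```

Running the same class once normally and once with -XX:-CompactStrings on the command line is a quick way to confirm which mode an application actually runs in.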

The change does not affect the public interface of String or of any related class. Internally, however, many classes, such as StringBuilder and StringBuffer, were reworked to support the new representation.

Compact Strings Implementation in Java 9

Until Java 8, the String stored its characters in a char array field, declared as private final char value[].

From Java 9, that field is a byte array instead, declared as private final byte[] value.

A second field, private final byte coder, records which encoding the bytes in value use.

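The coder is either 0 (the constant LATIN1), when the String is stored as Latin-1, or 1 (the constant UTF16), when it is stored as UTF-16.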

Most String operations now check the coder and dispatch to an encoding-specific implementation, which in the JDK lives in the internal helper classes StringLatin1 and StringUTF16.
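The dispatch pattern can be sketched for charAt: check the coder once, then call a routine that knows how to read that encoding. The helpers charAtLatin1 and charAtUtf16 are hypothetical stand-ins for the JDK's internal StringLatin1 and StringUTF16 classes, and big-endian UTF-16 is assumed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of coder-based dispatch (not the JDK source).
final class CoderDispatch {

    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    static char charAt(byte[] value, byte coder, int index) {
        // Check the coder once, then hand off to the encoding-specific routine.
        return coder == LATIN1 ? charAtLatin1(value, index) : charAtUtf16(value, index);
    }

    static char charAtLatin1(byte[] value, int index) {
        return (char) (value[index] & 0xFF); // one unsigned byte per char
    }

    static char charAtUtf16(byte[] value, int index) {
        int hi = value[2 * index] & 0xFF;     // big-endian: high byte first
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        byte[] latin1 = "Java".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16 = "\u20AC9".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(charAt(latin1, LATIN1, 2));     // v
        System.out.println((int) charAt(utf16, UTF16, 0)); // 8364 (U+20AC)
    }
}
```

This extra branch is the "additional logic" that costs 2-byte Strings a little performance.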

How the coder works

In the Java 9 String class, length() is calculated as value.length >> coder(), that is, the byte array length shifted right by the coder value.

If the String contains only Latin-1 characters, the coder is 0, so the length of the String equals the length of the byte array.

Otherwise, the String is stored as UTF-16 and the coder is 1, so the length is half the size of the byte array, since each char occupies two bytes.
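The two cases can be checked with a tiny worked example of the shift, using the coder values given above (0 for Latin-1, 1 for UTF-16). The method name is illustrative only.

```java
// Worked example of length = value.length >> coder (illustrative, not the JDK source).
final class CoderLength {

    static int length(byte[] value, byte coder) {
        // Shifting right by 0 leaves the byte count unchanged (1 byte per char);
        // shifting right by 1 halves it (2 bytes per char).
        return value.length >> coder;
    }

    public static void main(String[] args) {
        byte[] latin1 = new byte[5]; // e.g. "hello" stored as Latin-1
        byte[] utf16 = new byte[12]; // e.g. "hello€" stored as UTF-16
        System.out.println(length(latin1, (byte) 0)); // 5
        System.out.println(length(utf16, (byte) 1));  // 6
    }
}
```

Using a shift rather than a branch lets one expression serve both encodings.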

Performance Impact

Unlike Compressed Strings, the new solution does not involve repacking String contents between representations, so it should perform considerably better.

Besides the significant reduction in memory footprint, it should improve performance when processing 1-byte Strings, since there is much less data to process.

Garbage Collection overhead is reduced as well. Processing 2-byte Strings does incur a minor performance hit, because of the extra logic needed to handle both representations. Overall, though, performance should improve, as 2-byte Strings are expected to be a minority of all String instances.

If performance suffers in applications where most Strings are 2-byte, the feature can always be disabled.

Conclusion

Java 9 introduces Compact Strings, which in most cases halve the memory used by String contents. The feature is backward compatible, since no public interfaces changed, and it can be disabled with the -XX:-CompactStrings flag if required. It is the successor to the Compressed Strings feature of Java 6.
