Text Operations in Java using String, StringBuilder and StringJoiner

By Adam McQuistan in Java 09/19/2019

Introduction

In this article I go through some fundamental Java techniques for working with textual data via the String, StringBuilder and StringJoiner classes. My use of "text" is deliberate when referring to the type of data because I want to make clear when I'm talking about a kind of data (data as a sequence of char primitives) as opposed to the String class, which is an abstraction of text data as strings.

The Immutable World of the String Class

In Java the most common way to represent text data is as instances of the immutable String class. Here immutability means that once an instance is created you cannot change anything about it and are forced to create new instances composed of parts of the original and, optionally, other String instances. You create String instances either as literal assignments or with the less common approach of using the new keyword.

// literal String instance
String name = "Adam";

// String instance created via new keyword
String name2 = new String("Adam");

In practice you should exclusively use the first approach, and for good reason, not just preference. When you use a String literal the Java Virtual Machine (JVM) keeps a proverbial bucket of these literal instances, known as the String Pool, and when it sees a literal that it has already seen elsewhere in the program it will essentially ignore the second (or third, fourth, etc.) and reuse the first one rather than creating another identical instance of an immutable object. The same is not true for the second approach of creating an explicit instance with the new keyword.
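
To make the difference concrete, here is a minimal sketch comparing references with ==. The class and method names are my own illustrative choices; intern() is the standard String method that returns the pooled instance with the same contents.

```java
// Illustrative sketch; the class and method names are hypothetical.
class StringPoolDemo {

    // Two identical literals resolve to the same pooled instance.
    static boolean literalsShareInstance() {
        String a = "Adam";
        String b = "Adam";
        return a == b;
    }

    // new String(...) always allocates a distinct object.
    static boolean constructedSharesInstance() {
        String a = "Adam";
        String b = new String("Adam");
        return a == b;
    }

    // intern() hands back the pooled instance with equal contents.
    static boolean internedSharesInstance() {
        String a = "Adam";
        String b = new String("Adam").intern();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());     // true
        System.out.println(constructedSharesInstance()); // false
        System.out.println(internedSharesInstance());    // true
    }
}
```

Reference comparison is used here only to expose which objects are shared. For comparing contents you would still reach for equals.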

The fact that the JVM is capable of maintaining this resource conserving String Pool is a big help, but the fact of the matter is that there is an enormous number of String instances capable of being assembled from the various alphanumeric and symbol characters available in the world. What this means is that a whole lot of immutable, single purpose String objects can be created in a hurry, and the JVM has to allocate each one, track whether it is still reachable, and eventually garbage collect it, which has been known to lead to memory and performance problems.

Mutability via the StringBuilder Class

Because String instances are so common in Java programs and are often operated on at high frequency, there is a need to perform repeated operations on text data in a way that doesn't require creating a new object instance for every action. This is where the StringBuilder class comes in to save the day.

Creating an instance of StringBuilder is usually done using either the default empty constructor or passing it an initial String instance.

// default empty constructor
var sb = new StringBuilder();

// construction with string literal
var sb2 = new StringBuilder("Java");

Generally the desired end result of using the StringBuilder class is a String instance which is accomplished via the use of the toString method just like any of the other reference types in Java.

sb2.toString(); // "Java"

Mirroring Operations Among String and StringBuilder Classes

For the code examples in the remaining sections I will be using JShell to execute the operations interactively with immediate feedback. If you are unfamiliar with JShell take a look at the docs, but to simply follow along with the examples just type jshell in your terminal / command prompt (assuming you have Java 9 or later, the release that introduced JShell) and you should be off and running.

$ jshell
|  Welcome to JShell -- Version 12
|  For an introduction type: /help intro

jshell> 

Concatenation

I will start off with what is probably the most common operation when dealing with text data and that is concatenation. Concatenation is where you build a sequence of characters up from two or more other things. Note that I said things because in Java you can combine String data with other non-String based data types.

String Concatenation using the + operator

The common way of performing String concatenation is with the + operator, similar in concept to the traditional add operation where two things are added together. In fact, normal addition can take place during a sequence of concatenation operations if both operands are numeric in type.

jshell> String firstName = "Adam";
firstName ==> "Adam"

jshell> String lastName = "McQuistan";
lastName ==> "McQuistan"

jshell> String fullName = firstName + " " + lastName;
fullName ==> "Adam McQuistan"

1) If one of the operands of the pair undergoing the operation is a String then the other operand, whatever its type, is implicitly converted to a String and concatenation is performed.

jshell> 1 + "2"
$2 ==> "12"

jshell> "3" + 4
$3 ==> "34"

jshell> class Person {
   ...>   String firstName;
   ...>   String lastName;
   ...>   Person(String firstName, String lastName) {
   ...>     this.firstName = firstName;
   ...>     this.lastName = lastName;
   ...>   }
   ...>   public String toString() {
   ...>     return firstName + " " + lastName;
   ...>   }
   ...> }
|  created class Person

jshell> var me = new Person("Adam", "McQuistan");
me ==> Adam McQuistan

jshell> "7" + me
$3 ==> "7Adam McQuistan"

2) If both operands are numeric in type then addition occurs.

jshell> 2 + 3
$7 ==> 5

3) The expression is evaluated in pairs of operands from left to right.

jshell> 1 + 30 + " is my age and my name is " + me
$8 ==> "31 is my age and my name is Adam McQuistan"
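
The three rules above interact in ways that are easy to misread, so here is a small sketch showing how operand order and parentheses change the result. The class and method names are illustrative only.

```java
// Illustrative sketch; the class and method names are hypothetical.
class ConcatOrderDemo {

    // 1 + 2 is numeric addition (3), then 3 + "3" is concatenation.
    static String numbersFirst() {
        return 1 + 2 + "3";
    }

    // "1" + 2 is concatenation ("12"), then "12" + 3 is concatenation.
    static String stringFirst() {
        return "1" + 2 + 3;
    }

    // Parentheses force the addition to happen before the concatenation.
    static String grouped() {
        return "1" + (2 + 3);
    }

    public static void main(String[] args) {
        System.out.println(numbersFirst()); // 33
        System.out.println(stringFirst());  // 123
        System.out.println(grouped());      // 15
    }
}
```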

Using the String concat method

You can also perform concatenation using the concat method of the String class, which returns a new String instance when called, allowing for repeated chained calls.

jshell> String greeting = "Hello";
greeting ==> "Hello"

jshell> greeting.concat(" my name is Adam").concat(" and I'm a software developer");
$10 ==> "Hello my name is Adam and I'm a software developer"
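
As a rough sketch of two details worth knowing about concat: the receiver is left untouched, and unlike the + operator it does not accept null. The class and method names below are hypothetical.

```java
// Illustrative sketch; the class and method names are hypothetical.
class ConcatMethodDemo {

    // Chained concat calls; each call returns a new String.
    static String greet(String greeting, String name) {
        return greeting.concat(", ").concat(name);
    }

    // The + operator converts a null reference to the text "null".
    static String plusAcceptsNull() {
        String missing = null;
        return "Hello " + missing;
    }

    // concat throws NullPointerException when given null.
    static boolean concatRejectsNull() {
        String missing = null;
        try {
            "Hello ".concat(missing);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        System.out.println(greet(greeting, "Adam")); // Hello, Adam
        System.out.println(greeting);                // Hello (unchanged)
        System.out.println(plusAcceptsNull());       // Hello null
        System.out.println(concatRejectsNull());     // true
    }
}
```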

Using the append method of StringBuilder

Concatenation using the StringBuilder class is similar to using the String concat method, except that the append method mutates the object by appending the new text data to the internal state of the StringBuilder instance. In the JShell interpreter it may appear as though a new instance is being returned, but the StringBuilder methods actually work under a different paradigm of generally returning a reference to their own instance. This allows for method chaining similar to that of the String class, with the major exception that you are operating on the same instance rather than a new one during each method invocation.

jshell> var sb = new StringBuilder("Java");
sb ==> Java

jshell> sb.append(" is a ");
$13 ==> Java is a 

jshell> sb.append(" robust and popular language");
$14 ==> Java is a  robust and popular language

jshell> sb.toString()
$15 ==> "Java is a  robust and popular language"
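
A quick way to convince yourself that append hands back the very same object is to compare references. This is a minimal sketch with illustrative names, not something from the JShell session above.

```java
// Illustrative sketch; the class and method names are hypothetical.
class AppendIdentityDemo {

    // append returns a reference to the builder it was called on.
    static boolean appendReturnsSameInstance() {
        StringBuilder sb = new StringBuilder("Java");
        StringBuilder returned = sb.append(" is robust");
        return returned == sb;
    }

    // append is overloaded for many types, so chaining mixed values works.
    static String chained() {
        return new StringBuilder("Java ")
                .append(12)
                .append(' ')
                .append("is out")
                .append('!')
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(appendReturnsSameInstance()); // true
        System.out.println(chained());                   // Java 12 is out!
    }
}
```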

Length (Number of Characters In Sequence)

Determining the number of characters that represent the text data within the String and StringBuilder classes is accomplished through the length() method.

jshell> "Java".length()
$16 ==> 4

jshell> new StringBuilder("Java").length()
$17 ==> 4

Finding the Location of a Character in Text Data

Oftentimes it is desirable to retrieve the index of a character or sequence of characters within text data and, again, for both String and StringBuilder this is accomplished through consistent methods with the signatures indexOf(String) and lastIndexOf(String).

jshell> var javaStr = "Java";
javaStr ==> "Java"

jshell> var javaSB = new StringBuilder("Java");
javaSB ==> Java

jshell> javaStr.indexOf("a");
$20 ==> 1

jshell> javaSB.indexOf("a");
$21 ==> 1

jshell> javaStr.lastIndexOf("a");
$22 ==> 3

jshell> javaSB.lastIndexOf("a");
$23 ==> 3

jshell> javaStr.indexOf("va");
$24 ==> 2

jshell> javaSB.indexOf("va");
$25 ==> 2
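
Two behaviors not shown in the session above are that indexOf returns -1 when nothing is found, and that both classes also offer an indexOf(String, int) overload that starts searching from a given index. The sketch below uses them to collect every match; allIndexesOf is a hypothetical helper of my own.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the class and method names are hypothetical.
class IndexOfDemo {

    // Collects the starting index of every occurrence of target in text.
    static List<Integer> allIndexesOf(String text, String target) {
        List<Integer> positions = new ArrayList<>();
        if (target.isEmpty()) {
            return positions; // an empty target would match everywhere
        }
        int idx = text.indexOf(target);
        while (idx >= 0) {
            positions.add(idx);
            idx = text.indexOf(target, idx + 1); // resume after this match
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(allIndexesOf("Java", "a"));               // [1, 3]
        System.out.println("Java".indexOf("z"));                     // -1
        System.out.println(new StringBuilder("Java").indexOf("a", 2)); // 3
    }
}
```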

Retrieving a Character from the Text Data

In other situations you may want to retrieve a character from the text data based on a positional index. To do this you use the charAt(int) method of either the String or StringBuilder class.

jshell>  javaStr.charAt(2);
$26 ==> 'v'

jshell> javaSB.charAt(2);
$27 ==> 'v'
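
Because String and StringBuilder both implement the CharSequence interface, a single method can use charAt on either one. The following is a sketch under that assumption; countOccurrences is a hypothetical helper.

```java
// Illustrative sketch; the class and method names are hypothetical.
class CharAtDemo {

    // Counts how many times wanted appears, one charAt call per index.
    static int countOccurrences(CharSequence text, char wanted) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == wanted) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("Java", 'a'));                    // 2
        System.out.println(countOccurrences(new StringBuilder("Java"), 'a')); // 2

        // Valid indexes run from 0 to length() - 1; anything else throws.
        try {
            "Java".charAt(4);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("index 4 is out of range");
        }
    }
}
```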

Substituting a Character for Another in the Text Data

To swap out one character, or subset of characters, with another character or sequence of characters (aka String) while using instances of the String class you use replace(char oldChar, char newChar) or replace(CharSequence target, CharSequence replacement). These replace all occurrences of either a single char or a CharSequence such as a String, and they treat their arguments literally. If you instead want to replace just the first occurrence of something then there is the replaceFirst(String regex, String replacement) method. Note that its first argument is interpreted as a regular expression rather than a literal, which makes no difference for plain letters but matters for characters such as the period. Again, I want to remind readers that these methods always produce a new String instance, leaving the original String instance unchanged.

jshell> "Java Java Java".replace("a", "e");
$29 ==> "Jeve Jeve Jeve"

jshell> "Java Java Java".replace("va", "ve");
$30 ==> "Jave Jave Jave"

jshell> "Java Java Java".replaceFirst("a", "e");
$31 ==> "Jeva Java Java"
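
One thing to watch for is that replace treats its target literally while replaceFirst treats its first argument as a regular expression. Here is a minimal sketch of the difference using a period, which matches any character in a regex. The class and method names are illustrative; Pattern.quote is the standard way to escape a literal.

```java
import java.util.regex.Pattern;

// Illustrative sketch; the class and method names are hypothetical.
class ReplaceRegexDemo {

    // replace: the target "." is a literal period.
    static String literalReplaceAll(String s) {
        return s.replace(".", "-");
    }

    // replaceFirst: "." is a regex that matches the first character.
    static String regexReplaceFirst(String s) {
        return s.replaceFirst(".", "-");
    }

    // Quoting the pattern makes replaceFirst match a literal period.
    static String quotedReplaceFirst(String s) {
        return s.replaceFirst(Pattern.quote("."), "-");
    }

    public static void main(String[] args) {
        System.out.println(literalReplaceAll("a.b.c"));  // a-b-c
        System.out.println(regexReplaceFirst("a.b.c"));  // -.b.c
        System.out.println(quotedReplaceFirst("a.b.c")); // a-b.c
    }
}
```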

When using StringBuilder to swap character data within text you again have a replace method but the signature varies slightly. The StringBuilder method is replace(int start, int end, String replacement), consisting of the start index (inclusive) and end index (exclusive) along with the content you want to put in place. You can additionally swap out an individual character at a given index position in the underlying sequence of characters using setCharAt(int index, char ch). Recall that most StringBuilder methods, replace included, return a reference to the instance they were called on, which is useful for chaining operations. The setCharAt method is an exception: it returns void, which is why it prints nothing in JShell and cannot be chained.

jshell> var javaSB = new StringBuilder("Java Java Java");
javaSB ==> Java Java Java

jshell> javaSB.replace(1, 2, "e")
$2 ==> Jeva Java Java

jshell> javaSB = new StringBuilder("Java Java Java")
javaSB ==> Java Java Java

jshell> javaSB.replace(1, 2, "e").replace(6, 7, "e").replace(11, 12, "e");
$4 ==> Jeva Jeva Jeva

jshell> javaSB.setCharAt(3, 'e')

jshell> System.out.println(javaSB)
Jeve Jeva Jeva
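
StringBuilder has no built-in method that replaces every occurrence of a target, so the index positions above had to be worked out by hand. One possible way around that, sketched here with a hypothetical replaceAll helper of my own, is to combine indexOf with the index-based replace.

```java
// Illustrative sketch; the class and the replaceAll helper are hypothetical
// and are not part of the StringBuilder API.
class BuilderReplaceAllDemo {

    // Replaces every occurrence of target in place and returns the same builder.
    static StringBuilder replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return sb; // nothing sensible to search for
        }
        int idx = sb.indexOf(target);
        while (idx >= 0) {
            sb.replace(idx, idx + target.length(), replacement);
            // Continue after the inserted text so it is never re-matched.
            idx = sb.indexOf(target, idx + replacement.length());
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Java Java Java");
        System.out.println(replaceAll(sb, "a", "e"));   // Jeve Jeve Jeve
        System.out.println(replaceAll(sb, "ve", "va")); // Jeva Jeva Jeva
    }
}
```

Searching onward from idx + replacement.length() keeps the loop from looping forever when the replacement itself contains the target.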

Be Careful with Equality

Checking for equality among variables is useful for many data types, and text data is no exception. However, the way equals behaves has long been known to be a tricky subject, and the difference in behavior between String and StringBuilder is certainly a confusing one.

In the case of the String class equals compares the contents of the underlying character sequences to determine equality whereas the StringBuilder relies only on instance equality.

String instances are logically equivalent

jshell> String x = "X";
x ==> "X"

jshell> String x2 = "X";
x2 ==> "X"

jshell> x.equals(x2);
$22 ==> true

Because of the String Pool that I mentioned earlier, literal String instances exhibit instance equality also.

jshell> x == x2;
$23 ==> true

In the case of the StringBuilder class the behavior of equals is actually a bit of a problem. StringBuilder does not override equals, so it inherits the implementation from Object and only exhibits instance equality rather than logical equality. Furthermore, StringBuilder does implement the Comparable interface's compareTo method (as of Java 11), but it is not consistent with equals. See my previous article on Comparable and Comparator for specifics on why this last bit is important.

jshell> var sb1 = new StringBuilder(x);
sb1 ==> X

jshell> var sb2 = new StringBuilder(x2);
sb2 ==> X

jshell> sb1.equals(sb2)
$26 ==> false
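
If what you actually want is to compare the contents of two StringBuilder instances, there are a few options. The sketch below shows three that I am aware of; the class and method names are illustrative, and the compareTo variant assumes Java 11 or later.

```java
// Illustrative sketch; the class and method names are hypothetical.
class BuilderEqualityDemo {

    // Convert both builders to String and rely on String.equals.
    static boolean sameContentsViaToString(StringBuilder a, StringBuilder b) {
        return a.toString().equals(b.toString());
    }

    // compareTo compares contents lexicographically (requires Java 11+).
    static boolean sameContentsViaCompareTo(StringBuilder a, StringBuilder b) {
        return a.compareTo(b) == 0;
    }

    // String.contentEquals accepts any CharSequence, including a builder.
    static boolean matchesString(String s, StringBuilder sb) {
        return s.contentEquals(sb);
    }

    public static void main(String[] args) {
        StringBuilder sb1 = new StringBuilder("X");
        StringBuilder sb2 = new StringBuilder("X");

        System.out.println(sb1.equals(sb2));                    // false
        System.out.println(sameContentsViaToString(sb1, sb2));  // true
        System.out.println(sameContentsViaCompareTo(sb1, sb2)); // true
        System.out.println(matchesString("X", sb1));            // true
    }
}
```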

Parsing Out Part of the Sequence of Chars

A very common manipulation is to extract a portion of the text data, and this is accomplished via the substring(int start) or substring(int start, int end) methods for both the String and StringBuilder classes.

The substring(int start) method extracts the portion of the sequence of characters beginning with the start index and running through the end. The substring(int start, int end) method also begins at the start index but runs up to, and does not include, the end index. In both classes the result is a new String and the original is left unchanged.

jshell> var javaStr = "Java is a stable and performant programming language"
javaStr ==> "Java is a stable and performant programming language"

jshell> var javaSB = new StringBuilder(javaStr);
javaSB ==> Java is a stable and performant programming language

jshell> var endPortion =  javaStr.substring(10);
endPortion ==> "stable and performant programming language"

jshell> var endPortionSB = javaSB.substring(10);
endPortionSB ==> "stable and performant programming language"

jshell> var subStr = javaStr.substring(2, 4)
subStr ==> "va"

jshell> var subSB = javaSB.substring(2,4);
subSB ==> "va"

jshell> javaStr
javaStr ==> "Java is a stable and performant programming language"

jshell> javaSB
javaSB ==> Java is a stable and performant programming language
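
In practice the indexes passed to substring usually come from indexOf or lastIndexOf rather than being hard coded. Here is a small sketch of that combination; firstWord and lastWord are hypothetical helpers.

```java
// Illustrative sketch; the class and method names are hypothetical.
class SubstringDemo {

    // Everything before the first space, or the whole text if there is none.
    static String firstWord(String text) {
        int space = text.indexOf(' ');
        return space < 0 ? text : text.substring(0, space);
    }

    // Everything after the last space; with no space, lastIndexOf returns -1
    // and substring(0) yields the whole text.
    static String lastWord(String text) {
        return text.substring(text.lastIndexOf(' ') + 1);
    }

    public static void main(String[] args) {
        String sentence = "Java is a stable and performant programming language";
        System.out.println(firstWord(sentence)); // Java
        System.out.println(lastWord(sentence));  // language

        // StringBuilder.substring returns a String and leaves the builder as is.
        StringBuilder sb = new StringBuilder(sentence);
        String part = sb.substring(0, 4);
        System.out.println(part);                              // Java
        System.out.println(sb.length() == sentence.length()); // true
    }
}
```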

A Tour of ways to Combine Strings in Java

It is often useful to be able to combine many existing String instances into one resultant String. With Java there are a number of different ways to accomplish this, all of which are useful, so I'll leave it up to the reader to choose their favorite. However, I would like to at least demonstrate some of the ways I have accomplished this. For demonstration I will use the same input on all operations.

jshell> import java.util.*;

jshell> var strings = Arrays.asList("My", "name", "is", "Adam");
strings ==> [My, name, is, Adam]

Concatenation using the String Class

jshell> String concatStr = "";
concatStr ==> ""

jshell> for (var s : strings) {
   ...>   concatStr += " " + s;
   ...> }

jshell> concatStr
concatStr ==> " My name is Adam"

Appending using the StringBuilder Class

jshell> var sbAppending = new StringBuilder();
sbAppending ==> 

jshell> for (var s : strings) {
   ...>   sbAppending.append(" ").append(s);
   ...> }

jshell> sbAppending.toString()
$12 ==> " My name is Adam"
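
Notice that both loops so far leave a leading space in the result because the delimiter is added before every element, including the first. One possible way to avoid that, sketched below with a hypothetical joinWithSpaces helper, is to skip the delimiter on the first pass.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; the class and method names are hypothetical.
class DelimitedAppendDemo {

    // Appends a single space between elements but not before the first one.
    static String joinWithSpaces(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String part : parts) {
            if (!first) {
                sb.append(' ');
            }
            sb.append(part);
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("My", "name", "is", "Adam");
        System.out.println(joinWithSpaces(strings)); // My name is Adam
    }
}
```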

Using the StringJoiner class

Java has a class specifically for the use case of joining strings named StringJoiner and it is constructed by passing a String to the constructor to specify the value you want to delimit the strings being joined.

jshell> var sj = new StringJoiner(" ");
sj ==> 

jshell> for (var s : strings) {
   ...>   sj.add(s);
   ...> }

jshell> sj.toString();
$15 ==> "My name is Adam"
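
StringJoiner also has a three argument constructor that takes a prefix and suffix in addition to the delimiter, plus a setEmptyValue method for controlling what is returned when nothing has been added. A minimal sketch follows; the class and method names, and the "(none)" placeholder, are my own illustrative choices.

```java
import java.util.StringJoiner;

// Illustrative sketch; the class and method names are hypothetical.
class StringJoinerDemo {

    // Joins items with ", " and wraps the result in square brackets.
    static String bracketed(String... items) {
        StringJoiner sj = new StringJoiner(", ", "[", "]");
        for (String item : items) {
            sj.add(item);
        }
        return sj.toString();
    }

    // Same as above, but returns a placeholder when no items were added.
    static String bracketedOrPlaceholder(String... items) {
        StringJoiner sj = new StringJoiner(", ", "[", "]").setEmptyValue("(none)");
        for (String item : items) {
            sj.add(item);
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        System.out.println(bracketed("My", "name", "is", "Adam")); // [My, name, is, Adam]
        System.out.println(bracketed());                           // []
        System.out.println(bracketedOrPlaceholder());              // (none)
    }
}
```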

Using the Stream API

The final way I know of to join strings is to use the Collectors#joining(String) collector on a Stream of String instances. The parameter passed to the joining method represents the delimiter to use to separate the joined parts of the resultant string.

jshell> strings.stream().collect(Collectors.joining(" "));
$16 ==> "My name is Adam"
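
Outside of JShell, which imports java.util.stream for you by default, the stream example needs an explicit import of Collectors. The sketch below shows that in a standalone class, along with the three argument Collectors.joining overload and the static String.join method, which is one more option not covered above. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch; the class and method names are hypothetical.
class JoinAlternativesDemo {

    // Collectors.joining with a delimiter only.
    static String viaCollector(List<String> parts) {
        return parts.stream().collect(Collectors.joining(" "));
    }

    // Collectors.joining with a delimiter, prefix and suffix.
    static String viaCollectorWrapped(List<String> parts) {
        return parts.stream().collect(Collectors.joining(" ", "<", ">"));
    }

    // String.join accepts a delimiter and any Iterable of CharSequence.
    static String viaStringJoin(List<String> parts) {
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("My", "name", "is", "Adam");
        System.out.println(viaCollector(strings));        // My name is Adam
        System.out.println(viaCollectorWrapped(strings)); // <My name is Adam>
        System.out.println(viaStringJoin(strings));       // My name is Adam
    }
}
```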

Conclusion

In this article I have demonstrated many of the common methods I know of for performing operations on text data in the Java programming language. I've compared and contrasted the String and StringBuilder classes, explained why you might use one over the other, and pointed out some of the peculiarities of these classes. To finish up I gave a few code examples of how to join collections (or arrays) of existing String instances.
