Ways to Count Words in a String Using Multiple Delimiters

1. Overview

We often come across business scenarios where we need to determine the number of words in a string.
In this article, we'll count the words in a given string using a set of delimiters (the characters that separate words), such as space, comma, exclamation mark, question mark, and full stop. To be specific, the set is ,-@;|!:[](){}_*#%^.~ plus whitespace. We'll do this first with Core Java and then with external libraries.

2. Count Words – Using Core Java

Let's take a string that has extra spaces: multiple spaces between words, plus leading and trailing spaces.

    String stringInput = " I  like   Cookies, Cake and Icecream but not chocolates! ";

2.1. Split

The split() method is part of java.lang.String, so we don't need an additional library to use it. It takes a regular expression as an argument and returns the array of strings obtained by splitting the input around matches of that expression.

So, in our example, to get the words from stringInput, we first trim() it to remove all leading and trailing spaces, and then pass a regex containing our delimiters to split(). The length of the resulting array gives us the word count:

    static int countUsingSplit(String stringInput) {
        // Character class of delimiters; '|' is a literal character here, not alternation
        String delimiter = "[,:?.;~!@_|(){}\\[\\]\\s]+";
        String stringTocheck = stringInput.trim();
        int counter = stringTocheck.split(delimiter).length;
        return counter;
    }

Here, the test below passes because Mother's is counted as one word: the apostrophe is not among the delimiters.

    @Test
    public void testCountUsingSplit() {
        String stringTocheck = " I  like    my Mother's Cookies, Cake and Icecream but not chocolates! ";
        int count = WordCount.countUsingSplit(stringTocheck);
        assertEquals(11, count);
    }

split() is convenient because we can tokenize the string and get the count in a single line.
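One thing to watch with split() is empty tokens. An empty string splits into an array containing one empty string, and a delimiter at the very start of the trimmed input produces an empty leading token, so the array length can overstate the word count. The following is only a sketch of one possible guard, not part of the original code: the class name SplitWordCountSketch and the helper countNonEmptyTokens are hypothetical, and the delimiter set is assumed to be the same as in countUsingSplit above.

```java
import java.util.Arrays;

final class SplitWordCountSketch {

    // Same delimiter characters as in countUsingSplit above.
    private static final String DELIMITERS = "[,:?.;~!@_|(){}\\[\\]\\s]+";

    // Hypothetical helper: counts only the non-empty tokens produced by split().
    static int countNonEmptyTokens(String input) {
        if (input == null) {
            return 0;
        }
        return (int) Arrays.stream(input.split(DELIMITERS))
                .filter(token -> !token.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        // The leading "(" yields an empty first token, so the raw length is 3.
        System.out.println("(hello) world".split(DELIMITERS).length);
        // Filtering out empty tokens gives 2.
        System.out.println(countNonEmptyTokens("(hello) world"));
        // An empty input gives 0 instead of 1.
        System.out.println(countNonEmptyTokens(""));
    }
}
```

Because empty tokens are filtered out, this variant does not need the trim() call; whether the extra stream pass is acceptable depends on the use case.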

2.2. StringTokenizer

Like split(), the StringTokenizer class is part of the JDK; it lives in the java.util package.

By default, StringTokenizer uses the space, tab, newline, carriage-return, and form-feed characters (" \t\n\r\f") as delimiters, so a string passed to it is split into tokens on whitespace. Here we pass our own delimiter set instead, and countTokens() then returns the number of words in the string.

    static int countUsingStringTokenizer(String stringTocheck) {
        String delimiter = " ',-@;|!:[](){}_*#%^~.";
        StringTokenizer tokenizer = new StringTokenizer(stringTocheck, delimiter);
        int counter = tokenizer.countTokens();
        return counter;
    }

In the above function, we have included the apostrophe in the set of delimiters, so Mother's now counts as two tokens and the word count becomes 12:

    @Test
    public void testCountUsingStringTokenizer() {
        String stringTocheck = " I  like    my Mother's Cookies, Cake and Icecream but not chocolates! ";
        int count = WordCount.countUsingStringTokenizer(stringTocheck);
        assertEquals(12, count);
    }

Here, the leading and trailing spaces are skipped implicitly by StringTokenizer, since the space is one of the delimiters.
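To make the effect of the delimiter set concrete, here is a small sketch comparing the default delimiters with a custom set. The class and method names (TokenizerDelimiterSketch, countWithDefaultDelimiters, countWithCustomDelimiters) are hypothetical and serve only as illustration.

```java
import java.util.StringTokenizer;

final class TokenizerDelimiterSketch {

    // Uses the default delimiters: space, tab, newline, carriage return, form feed.
    static int countWithDefaultDelimiters(String input) {
        return new StringTokenizer(input).countTokens();
    }

    // Every character of 'delimiters' is treated as a separate delimiter.
    static int countWithCustomDelimiters(String input, String delimiters) {
        return new StringTokenizer(input, delimiters).countTokens();
    }

    public static void main(String[] args) {
        String input = "Cookies,Cake and Icecream";
        // "Cookies,Cake" stays one token because the comma is not a default delimiter.
        System.out.println(countWithDefaultDelimiters(input));      // 3
        // With space and comma as delimiters, it becomes two tokens.
        System.out.println(countWithCustomDelimiters(input, " ,")); // 4
    }
}
```

Note that the delimiter argument is a plain set of characters, not a regular expression, so nothing in it needs escaping.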

StringTokenizer is a legacy class and its use in new code is discouraged, but it prepares us for some of the other techniques, which delimit strings in the same way.

3. Count Words – Using External Library

Now, we'll count the words using some external libraries.

3.1. Spring StringUtils

We need to include the spring-core dependency to use the StringUtils class.

In our example, we are using version 5.1.8.RELEASE; the latest version can be found under the org.springframework group:

        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>5.1.8.RELEASE</version>
        </dependency>

So, if our project already uses Spring, it's worth using this utility.

The tokenizeToStringArray() method in StringUtils takes a String and a set of delimiter characters as input. It tokenizes the given stringTocheck into a String array with the help of StringTokenizer. The length of that array gives us the word count.

    static int countUsingSpringStringUtils(String stringTocheck) {
        String delimiter = " ',-@;|!:[](){}_*#%^~.";
        String[] counterStr = org.springframework.util.StringUtils.tokenizeToStringArray(stringTocheck, delimiter);
        int counter = counterStr.length;
        return counter;
    }

Let’s test it now:

    @Test
    public void testCountUsingSpringStringUtils() {
        String stringTocheck = " I  like    my Mother's Cookies, Cake and Icecream but not chocolates! ";
        int count = WordCount.countUsingSpringStringUtils(stringTocheck);
        assertEquals(12, count);
    }

3.2. Apache StringUtils

To use this utility, we first need to include its dependency in our Maven build.

In our example, we are using version 3.8.1; the latest version can be found under the org.apache.commons group:

        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.8.1</version>
        </dependency>

The split() method of Apache StringUtils splits the provided text into an array using the specified delimiter characters, treating adjacent separators as one. This is an alternative to using StringTokenizer. Again, the length of the array gives us the word count.

    static int countUsingApacheStringUtils(String stringTocheck) {
        String delimiter = " ',-@;|!:[](){}_*#%^~.";
        int counter = StringUtils.split(stringTocheck, delimiter).length;
        return counter;
    }

Let's test it:

    @Test
    public void testCountUsingApacheStringUtils() {
        String stringTocheck = " I  like    my Mother's Cookies, Cake and Icecream but not chocolates! ";
        int count = WordCount.countUsingApacheStringUtils(stringTocheck);
        assertEquals(12, count);
    }

Here, the above test passes, as our method returns 12 as the count.

Now, if we change stringTocheck to " the[[[end  ", our method should return 2, because the consecutive [ characters are treated as a single separator, and the test below should pass:

    @Test
    public void testCountUsingApacheStringUtils1() {
        String stringTocheck = " the[[[end  ";
        int count = WordCount.countUsingApacheStringUtils(stringTocheck);
        assertEquals(2, count);
    }

4. Conclusion

To summarize, in this article, we have shown a variety of ways to count the words in a string using multiple delimiters. In terms of performance, String.split can be somewhat slow, since for a regex like ours it compiles the regular expression on every call, whereas StringUtils.split works on plain delimiter characters and avoids that cost. Hence, if we want to tokenize millions of strings, it is worth considering Apache Commons StringUtils.
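If we'd rather stay with Core Java, one way to avoid repeated compilation is to compile the regular expression once and reuse the resulting Pattern. The sketch below assumes the same delimiter set as countUsingSplit; the class name PrecompiledPatternSketch and the method countWords are hypothetical, and no benchmark is implied.

```java
import java.util.regex.Pattern;

final class PrecompiledPatternSketch {

    // Compiled once when the class is loaded and reused for every call.
    private static final Pattern DELIMITERS =
            Pattern.compile("[,:?.;~!@_|(){}\\[\\]\\s]+");

    static int countWords(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0; // split() would otherwise return one empty token
        }
        return DELIMITERS.split(trimmed).length;
    }

    public static void main(String[] args) {
        String input = " I  like    my Mother's Cookies, Cake and Icecream but not chocolates! ";
        System.out.println(countWords(input)); // 11
    }
}
```

Pattern instances are immutable and safe to share between threads, so a static final field is a reasonable place to keep one.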

The source code is available on GitHub.
