Counting Words with Java 8

Sadly, in my day job, I am not yet able to use the awesomeness that is Java 8. However, from time to time, I like to kill a little time solving programming challenges, and I try to use Java 8 for those.

Today’s challenge came from /r/dailyprogrammer on Reddit. It was a pretty straightforward challenge – given a text file, count the number of occurrences for each word. This turns out to be very easy to do with streams!

We need to do the following operations:

  1. Read in all the lines from the file.
  2. Break up each line into words.
  3. Count each occurrence of the word.
  4. Sort the result.
  5. Print it out.

For simplicity’s sake, let’s assume a word is defined as any group of characters separated by whitespace.

Here’s the code:

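       // Imports: java.nio.file.Files, java.nio.file.Paths,
       //          java.util.stream.Collectors, java.util.stream.Stream
       Files.lines(Paths.get(args[0]))
            .flatMap(line -> Stream.of(line.split("\\s+")))
            .map(String::toLowerCase)
            .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum))
            .entrySet()
            .stream()
            // Compare the boxed counts with equals(); == would test object identity.
            .sorted((a, b) -> a.getValue().equals(b.getValue()) ? a.getKey().compareTo(b.getKey()) : b.getValue() - a.getValue())
            .forEach(System.out::println);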

Let’s break it down.

Files.lines reads a file and returns a Stream of its lines. But we want words, not lines. No problem. Stream.flatMap takes a function, that returns a Stream, to apply to each element. This gives us a mini-stream of words for each line. Then, flatMap flattens all those Streams into one big Stream, containing all the words in the file. In this case, we want to split the line on whitespace to form our words. Then we pass it along to String::toLowerCase so that we’re doing a case-insensitive word count.
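To see the flattening in isolation, here is a minimal sketch that runs the same two steps on an in-memory list instead of a file. The class and method names (FlatMapWords, words) are made up for illustration and are not part of the original solution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapWords {
    // Splits each line on whitespace, flattens the pieces into a single
    // stream, and lowercases every word.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Stream.of(line.split("\\s+")))
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("The cat", "sat on THE mat");
        System.out.println(words(lines)); // [the, cat, sat, on, the, mat]
    }
}
```

One thing to watch for: splitting an empty line, or a line that starts with whitespace, produces an empty string as the first piece, so a real input file may need a filter to drop empty words.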

Now that we have a Stream of all the words in the file, we can start processing. What we want is a Map<String, Integer> that maps each word to the number of occurrences. Collectors.toMap does this for us. The first argument is a function that should return the key in the map. In this case, the key is just the word, which explains the somewhat pointless looking word -> word. The second argument is a function that returns the value in the map. Here’s where it gets tricky. We’re using the three-argument version of Collectors.toMap, which handles the case where two elements map to the same key. The third argument is a function that will combine the two colliding values to form a new value.

To sum up the number of occurrences of each word, we start with a value of 1. Here’s what happens. Say the word “cat” appears 3 times in the input file. This call to Collectors.toMap will result in three mappings whose key is “cat”, and whose value is 1. To get the word count, we want to add the three values (of 1 each) in the event of a collision. So we use Integer::sum to do this for us.
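Here is a small sketch of that merge step on a hard-coded stream of words. The class and method names are hypothetical. For comparison, it also shows Collectors.groupingBy with Collectors.counting, a different collector that should give the same counts as Long values. That is an alternative, not what the solution above uses.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MergeCounts {
    // Every word starts with a count of 1; Integer::sum merges the counts
    // whenever the same key shows up again.
    static Map<String, Integer> countWithToMap(Stream<String> words) {
        return words.collect(Collectors.toMap(word -> word, word -> 1, Integer::sum));
    }

    // Alternative collector: group equal words together and count each group.
    static Map<String, Long> countWithGroupingBy(Stream<String> words) {
        return words.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countWithToMap(Stream.of("cat", "dog", "cat", "cat")).get("cat"));      // 3
        System.out.println(countWithGroupingBy(Stream.of("cat", "dog", "cat", "cat")).get("cat")); // 3
    }
}
```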

The hard part is done, but we still need to sort and print the results. Because collect is a terminal operation, we’ll need a new stream to proceed. Calling stream() on the resulting Map’s entrySet will give us the stream we need, with each element holding a word and its count.

To do the sorting, our comparison function should first check the word counts. If two words have the same count, they should be sorted in alphabetical order. Otherwise, the word with more occurrences should come first.
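The same ordering can also be built from Comparator pieces. This is a sketch with hypothetical names (CountOrdering, byCountThenWord, wordsInOrder), not the code from the solution. It uses Integer.compare for the counts, which sidesteps two pitfalls of hand-written comparisons: == on boxed Integer values checks object identity rather than numeric equality, and subtracting two ints can overflow.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CountOrdering {
    // Highest count first; ties are broken alphabetically by word.
    static Comparator<Map.Entry<String, Integer>> byCountThenWord() {
        Comparator<Map.Entry<String, Integer>> byCountDescending =
                (a, b) -> Integer.compare(b.getValue(), a.getValue());
        return byCountDescending.thenComparing(Map.Entry::getKey);
    }

    // Returns just the words, in sorted order.
    static List<String> wordsInOrder(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .sorted(byCountThenWord())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("zebra", 1000);
        counts.put("apple", 1000);
        counts.put("cat", 3);
        System.out.println(wordsInOrder(counts)); // [apple, zebra, cat]
    }
}
```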

Lastly, we print the sorted stream to the console to get the output.

To summarize, Java 8 lambdas and streams are insanely cool and I hope I get more experience with them soon!

Functional FizzBuzz with Java 8 Streams

As of today, the first release of Java 8 is ready for public use. The most notable new feature is lambda expressions, which finally let us write functional programs in Java. To illustrate this, let’s look at the classic FizzBuzz problem.

If you aren’t familiar with FizzBuzz, it’s a very simple coding problem. For each number between 1 and 100:

  • If the number is divisible by 3, print “Fizz”
  • If the number is divisible by 5, print “Buzz”
  • If the number is divisible by both 3 and 5, print “FizzBuzz”
  • Otherwise, print the number itself

Here’s the code:

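import java.util.stream.IntStream;

public class FizzBuzz {
    public static void main(String... args) {
        // The upper bound of range is exclusive, so this covers 1 through 100.
        IntStream.range(1, 101)
            .mapToObj(n -> {
                // Check 15 first; otherwise multiples of both 3 and 5 would stop at "Fizz".
                if (n % 15 == 0) return "FizzBuzz";
                else if (n % 3 == 0) return "Fizz";
                else if (n % 5 == 0) return "Buzz";
                else return String.valueOf(n); // keep every element a String
            }).forEach(System.out::println);
    }
}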

The first step is to get a stream of the numbers from 1 to 100. This is done by calling IntStream.range(1, 101). The second argument is exclusive, so to get 1 to 100 we need to specify 101 as the end value.
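If the exclusive end value feels error-prone, IntStream also has rangeClosed, whose end value is inclusive. A quick sketch comparing the two (the class and helper names are made up for illustration):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class RangeDemo {
    // End value is excluded.
    static int[] exclusive(int from, int to) {
        return IntStream.range(from, to).toArray();
    }

    // End value is included.
    static int[] inclusive(int from, int to) {
        return IntStream.rangeClosed(from, to).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(exclusive(1, 4))); // [1, 2, 3]
        System.out.println(Arrays.toString(inclusive(1, 4))); // [1, 2, 3, 4]
    }
}
```

So IntStream.rangeClosed(1, 100) would produce the same numbers as IntStream.range(1, 101).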

Once we have a stream, we can do a map operation. Since this is an IntStream, but the desired output is Strings, we can use IntStream’s mapToObj method to do the conversion. If we used the standard map method, we’d get a compile error: IntStream.map expects a function that returns an int, so returning a String is an incompatible return type.

The mapToObj method expects a lambda expression. It will apply the lambda to each item in the stream (the numbers from 1 to 100), and return a new stream of the mapped values. In this case, the mapping is straightforward. We check if the number is divisible by 3, 5, or 15 (both 3 and 5) and return the appropriate value.
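One variation, sketched here under my own naming (FizzBuzzMapping, fizzBuzz, firstN), is to pull the mapping into a named method that always returns a String. That makes the rule easy to test on its own, and mapToObj can take it as a method reference.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class FizzBuzzMapping {
    // Maps one number to its FizzBuzz text. The check for 15 comes first
    // because multiples of 15 are also multiples of 3 and of 5.
    static String fizzBuzz(int n) {
        if (n % 15 == 0) return "FizzBuzz";
        if (n % 3 == 0) return "Fizz";
        if (n % 5 == 0) return "Buzz";
        return String.valueOf(n);
    }

    // Applies the mapping to 1..count and collects the results.
    static List<String> firstN(int count) {
        return IntStream.rangeClosed(1, count)
                .mapToObj(FizzBuzzMapping::fizzBuzz)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstN(5)); // [1, 2, Fizz, 4, Buzz]
    }
}
```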

To print the results to the console, we use another new Java 8 feature: method references. After mapping the stream, we want to print each item. For this we’ll use forEach. Instead of a lambda expression, like we used with mapToObj, we’ll pass a method reference to System.out.println. The syntax is System.out::println. Reminds me of C++!
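A method reference is essentially shorthand for a lambda that only forwards its argument to an existing method. The sketch below shows the two forms side by side. The class and helper names are hypothetical, and the helpers copy into a list only so the equivalence is easy to check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MethodReferenceDemo {
    // Lambda form: explicitly passes each item along to out.add.
    static List<String> copyWithLambda(List<String> items) {
        List<String> out = new ArrayList<>();
        items.forEach(item -> out.add(item));
        return out;
    }

    // Method reference form: same behavior, without naming the argument.
    static List<String> copyWithMethodRef(List<String> items) {
        List<String> out = new ArrayList<>();
        items.forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("Fizz", "Buzz");
        items.forEach(item -> System.out.println(item)); // lambda
        items.forEach(System.out::println);              // method reference
    }
}
```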

This is a very simple example, but I hope it demonstrates the power of lambda expressions.

For more information, see: