Counting Words with Java 8

Sadly, in my day job, I am not yet able to use the awesomeness that is Java 8. However, from time to time, I like to kill a little time solving programming challenges, and I try to use Java 8 for those.

Today’s challenge came from /r/dailyprogrammer on Reddit. It was a pretty straightforward challenge – given a text file, count the number of occurrences for each word. This turns out to be very easy to do with streams!

We need to do the following operations:

  1. Read in all the lines from the file.
  2. Break up each line into words.
  3. Count each occurrence of the word.
  4. Sort the result.
  5. Print it out.

For simplicity’s sake, let’s assume a word is defined as any group of characters separated by whitespace.

Here’s the code:

       Files.lines(Paths.get(args[0]))
            .flatMap(line -> Stream.of(line.split("\\s+")))
            .map(String::toLowerCase)
            .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum))
            .entrySet()
            .stream()
            .sorted((a, b) -> a.getValue() == b.getValue() ? a.getKey().compareTo(b.getKey()) : b.getValue() - a.getValue())
            .forEach(System.out::println);

Let’s break it down.

Files.lines reads a file and returns a Stream of its lines. But we want words, not lines. No problem. Stream.flatMap takes a function, that returns a Stream, to apply to each element. This gives us a mini-stream of words for each line. Then, flatMap flattens all those Streams into one big Stream, containing all the words in the file. In this case, we want to split the line on whitespace to form our words. Then we pass it along to String::toLowerCase so that we’re doing a case-insensitive word count.

Now that we have a Stream of all the words in the file, we can start processing. What we want is a Map<String, Integer> that maps each word to the number of occurrences. Collectors.toMap does this for us. The first argument is a function that should return the key in the map. In this case, the key is just the word, which describes the somewhat pointless looking word -> word. The second argument is a function that returns the value in the map. Here’s where it gets tricky. We’re using the three-argument version of Collectors.toMap, which handles collisions in the value function. The third argument is a function that will combine two colliding values to form a new value.

To sum up the number of occurrences of each word, we start with a value of 1. Here’s what happens. Say the word “cat” appears 3 times in the input file. This call to Collectors.toMap will result in three mappings whose key is “cat”, and whose value is 1. To get the word count, we want to add the three values (of 1 each) in the event of a collision. So we use Integer::sum to do this for us.

The hard part is done, but we still need to sort and print the results. Because collect is a terminal operation, we’ll need a new stream to proceed. Calling stream() on the resulting Map’s keySet will give us the stream we need.

To do the sorting, our comparison function should first check the word counts. If the words have the same count, then they should be sorted in alphabetical order. Otherwise, they should be sorted based on the number of occurrences.

Lastly, we print the sorted stream to the console to get the output.

To summarize, Java 8 lambdas and streams are insanely cool and I hope I get more experience with them soon!

Advertisements

At least spell your job title correctly

I’ve seen several articles floating around lately about resume/interview tips. Most of it is really good advice. Things like:

  • Do research on the company, have a good answer for why you want to work for them.
  • Tailor your resume and cover letter for each company you apply to.
  • Dress for the job you want.
  • Proofread your resume.

That last point is in bold, because I think it’s really important and really under-utilized. In the field of software development, attention to detail isn’t just a good idea. It’s critical for success. And if you can’t catch errors on your resume, what does that say about your attention to detail?

One thing I see misspelled on resumes – a lot – is the job title. I have lost track of how many times I’ve seen a resume or LinkedIn profile for a “Principle Software Engineer”. It’s good to be a principled software engineer, but what you’re looking for here is “Principal.”

Another thing to watch out for is how you spell or capitalize certain terminology. For example, I’ve seen many resumes boasting the applicant’s skills in “JAVA”. Java has never been stylized in all capital letters.

I may be criticized for this post for being too pedantic or picky. That may be. But if I see these errors on your resume, my confidence in your attention to detail will be reduced.

Functional FizzBuzz with Java 8 Streams

As of today, the first release of Java 8 is ready for public use. The most notable new feature is lambda expressions, which finally lets us write functional programs in Java. To illustrate this, let’s look at the classic FizzBuzz problem.

If you aren’t familiar with FizzBuzz, it’s a very simple coding problem. For each number between 1 and 100:

  • If the number is divisible by 3, print “Fizz”
  • If the number is divisible by 5, print “Buzz”
  • If the number is divisible by both 3 and 5, print “FizzBuzz”

Here’s the code:

import java.util.stream.IntStream;

public class FizzBuzz {
    public static void main(String...args) {
        IntStream.range(1, 101)
            .mapToObj(n -> {
                if (n % 15 == 0) return "FizzBuzz";
                else if (n % 3 == 0) return "Fizz";
                else if (n % 5 == 0) return "Buzz";
                else return n;
            }).forEach(System.out::println);
    }
}

The first step is to get a stream of the numbers from 1 to 100. This is done by calling IntStream.range(1, 101). The second argument is exclusive, so to get 1 to 100 we need to specify 101 as the end value.

Once we have a stream, we can do a map operation. Since this is an IntStream, but the desired output is Strings, we can use IntStream‘s mapToObj method to do the conversion. If we used the standard map method, we’d get a compile error because of an incompatible return type.

The mapToObj method expects a lambda expression. It will apply the lambda to each item in the stream (the numbers from 1 to 100), and return a new stream of the mapped values. In this case, the mapping is straightforward. We check if the number is divisible by 3, 5, or 15 (both 3 and 5) and return the appropriate value.

To print the results to the console, we use another new Java 8 feature: method references. After mapping the stream, we want to print each item. For this we’ll use forEach. Instead of a lambda expression, like we used with mapToObj, we’ll pass a method reference to System.out.println. The syntax is System.out::println. Reminds me of C++!

This is a very simple example, but I hope it demonstrates the power of lambda expressions.

For more information, see:

Time for a new adventure

ImageIn this industry, it’s become the norm to change jobs every few years. Gone are the days where you worked for one company your whole career.

I’ve been a software engineer at Dell for just over five years. In that time, I’ve worked on some interesting projects and learned from some very smart people. My favorite project from my time at Dell was a cross-platform mobile application made using Sencha Touch. I’ve always been interested in mobile devices (I was carrying a PDA back in the day!), and this project got me more interested in the mobile space.

Recently, a job opportunity came along to work in the mobile space, at ROAM Data. ROAM is a very cool company that works on mobile point-of-sale technology. I’ll be working on the back end, not the mobile applications themselves, but I’ve always wanted to be a part of the ever-growing mobile industry.

I will miss my colleagues at Dell and enjoyed my time there. In a couple of weeks, though, it will be time to start my next adventure.

Why you should stop playing Candy Crush Saga

Like many of my friends, I got sucked into Candy Crush Saga. It’s a very addicting “match 3” game that I spent too much time playing. I frequently asked friends for extra lives, extra moves, or tickets to access the next level. It was a good game to pass the time when I had a few minutes to kill.

King, the maker of Candy Crush, recently made headlines when they filed for a trademark on the word “Candy”, and started threatening other games with names containing “Candy”. This already was bad enough, but recently something else has come to light that has made King look even worse.

One of the games targeted by King is called Candy Swipe. It’s not a “match 3” game, but it has an appearance very similar to Candy Crush Saga. Just like Candy Crush, the game even exclaims “Sweet!” if the player makes a high-scoring move. Due to the similarity of the names, many people have mistaken Candy Swipe for Candy Crush. The game’s art assets are also strikingly similar:

ImageGiven these facts, one would think that King has a reasonable complaint in this case. There’s just one problem: Candy Swipe was released in 2010, and Candy Crush Saga was released in 2012. Candy Swipe is an independent game, developed by Albert Ransom, in memory of his late mother. He has since been supporting his family with the income he earns from the game. 

King’s audacity in blatantly copying Candy Swipe is bad enough; it is even more insulting for them to turn around and take action against Ransom’s trademark on “Candy Swipe” (which was granted in 2011, a year before Candy Crush was released).

In a further move of hypocrisy, King posted an open letter about intellectual property on their website. This letter states, among other things:

We believe in a thriving game development community, and believe that good game developers – both small and large – have every right to protect the hard work they do and the games they create.

The developer of Candy Swipe responded with an open letter of his own, where he essentially waves the white flag of surrender. A single indie developer can’t hope to match the legal challenge from a company like King. What other choice did he have?

After following this saga closely, I promptly uninstalled Candy Crush Saga from my phone and removed it from my Facebook page. If you care about small businesses and indie developers, you should do the same.

On Technical Intimidation

The software industry can be extremely intimidating. New technologies, libraries, and tools pop up seemingly every day and we’re supposed to keep up with it all. Unless you’re one of those rockstar developers, I’m sure you’ve gone through times where you felt like you were far behind the curve. And it can be downright scary.

The good news, though, is that it’s extremely common. Our industry is hard, and keeping up with everything is simply unrealistic.

I recently came across a lightning talk that Chris Morris, of LivingSocial, gave at RailsConf 2013 on this very topic:

Word Chains and Graphs

Recently, I’ve been working through Dave Thomas’s Code Kata problems in an effort to sharpen my programming skills. The early ones were fairly easy, and I got through them pretty quickly. That changed once I got to Kata 19, Word Chains.

The idea sounds simple enough. The puzzle is to write a program that will transform a starting word to an ending word. One letter is changed at a time, and each word in the chain must be a real word.

I initially coded a seemingly easy solution, but it only worked in a small subset of cases. I didn’t think the problem through enough before jumping to code. In my original implementation, each word in the chain needed to have a letter in the proper position of the ending word. This worked for a case like “cat” to “dog” (cat, cot, cog, dog) but not in a more tricky case such as “the” to “end” (the, tie, die, did, aid, and, end) that had more intermediate steps.

Clearly, I was on the wrong path.

Finally, it occurred to me that I would have to calculate all the possible word transformations, then find the shortest path. Uh-oh. Sounds like a job for a graph. Now, I haven’t looked at graph theory since college. I spent a couple of days researching the topic, even watching some interesting MIT lectures on algorithms (which I highly recommend).

Finally, I set to coding my new solution. After loading the list of words from the dictionary file, a graph is constructed. The starting node is the starting word, then all possible edges are recursively calculated. In a classic case of over-engineering, I then implemented Dijkstra’s algorithm to find the shortest path.

The solution was plenty fast, finding any word chain I could throw at it in well under a second. But as I thought about it more, I realized Dijkstra was a bit of overkill. Part of Dijkstra’s algorithm involves measuring the weights of different edges to find the shortest path. However, in this case, all the edges are equal in weight (a weight of 1).

With all edge weights equal, it was essentially a breadth-first search. Realizing that, I wrote a new implementation to find the shortest path, using a simple breadth-first search.


(Image: Wikipedia)

After getting that working, I found another optimization. Due to the way the breadth-first search traverses the graph, the graph doesn’t need to be generated ahead of time. At each step of the search, when a node (a word) is visited, all the next words need to be generated. This can actually be calculated at each step, building the graph on the fly. This can reduce memory usage, since it will result in a smaller graph, only building what it needs to as it searches the graph.

The key to the breadth-first search is a Map that associates a node with its predecessor in the shortest path. This Map also is used to remember what words have already been visited:

		Map<String, String> previous = new HashMap<>();

		Queue<String> q = new LinkedList<>();
		q.add(fromWord);

		String word = null;
		while (!q.isEmpty() && !(word = q.remove()).equals(toWord)) {
			List<String> nextWords = getNextWords(word);
			for (String nextWord : nextWords) {
				if (!previous.containsKey(nextWord)) {
					previous.put(nextWord, word);
					q.add(nextWord);
				}
			}
		}

When the search is completed, the shortest path can be constructed using the gathered predecessor data. Starting with the ending word, the path is built in reverse by looking up each word’s precedessor. This continues until the starting word is reached:

		List<String> path = new ArrayList<>();
		String word = toWord;
		while (!word.equals(fromWord)) {
			path.add(0, word);
			word = previous.get(word);
		}

		path.add(0, fromWord);

With that, the shortest word chain is calculated. This was a tough challenge for me, but it was interesting to re-learn all that graph theory that I had forgotten.

You can see the full Java solution on my GitHub page here:
https://github.com/joeattardi/codekata/tree/master/wordchains