William Brinkert's Blog

Developer/Designer In Training

Ruby's Enumerable Method #group_by

Easy to use, but powerful

October 31, 2015

Ruby has a multitude of built-in, ready-to-go methods that are flexible and powerful. So many, in fact, that if you are used to programming in other languages, like Java or C++, it might seem like you are cheating when writing Ruby code because so much of the work has already been done for you. The trick is often finding the right method for the job, and a lot of that comes down to your personal programming style and preference.

But you also need to know where to look for methods that may be useful for your purposes. That could be a very large topic, so let's just look at some types of objects you might often find yourself using, such as arrays and hashes. Array and Hash are distinct classes, and each class has its own set of methods, which are listed on the Array and Hash pages of the Ruby API documentation. But both classes also include the module Enumerable, which has its own page in the documentation. So what is the difference between a class and a module? A class describes a kind of object, and you can create instances of it. A module is more abstract: it provides methods that can be shared across multiple classes. You cannot create an instance of a module. It is more like a library of goodies that becomes available to any class that includes it. In the case of Enumerable, the including class only has to define an each method, and the module builds everything else on top of that.
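To make the class/module distinction a little more concrete, here is a minimal sketch. The Playlist class and its song titles are made up for illustration and are not part of Ruby. The only assumption is that a class which defines each and includes Enumerable picks up group_by along with the rest of the module's methods.

```ruby
# Array and Hash both mix in the Enumerable module.
puts Array.include?(Enumerable) # true
puts Hash.include?(Enumerable)  # true

# Enumerable.new would raise NoMethodError: a module cannot be instantiated.

# Hypothetical class, for illustration only.
class Playlist
  include Enumerable # brings in group_by, map, select, and friends

  def initialize(*songs)
    @songs = songs
  end

  # Enumerable only needs an each method that yields every element.
  def each(&block)
    @songs.each(&block)
    self
  end
end

playlist = Playlist.new("Hey Jude", "Help!", "Yesterday")

# Group the titles by their first letter.
by_letter = playlist.group_by { |title| title[0] }
puts by_letter["H"].inspect # ["Hey Jude", "Help!"]
puts by_letter["Y"].inspect # ["Yesterday"]
```

Playlist never defines group_by itself. The method comes from the module, which is the sense in which Enumerable acts like a shared library.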

Today we are going to look at the method #group_by which can be found in the Module Enumerable. To quote directly from the Ruby 2.2.3 API:

group_by {|obj| block} → a_hash

group_by → an_enumerator

Groups the collection by the result of the block. Returns a hash where the keys are the evaluated result from the block and the values are the arrays of elements in the collection that correspond to the key. If no block is given, an enumerator is returned instead.

And they provide a helpful little example so that you can see what the method does:

(1..6).group_by { |i| i%3 } #=> {0 => [3, 6], 1 => [1, 4], 2 => [2, 5]}

So group_by in this case takes each element of the range, passes it to the block, and evaluates i % 3. For each distinct answer it builds an array of the elements that produce that answer. The answer is stored as the key in the hash, and the array of elements that yield it is stored as the corresponding value. Looking at our returned hash, we see that the key 0 holds 3 and 6, because both leave a remainder of zero when divided by 3. Each distinct answer generated by the block results in a new key/value pair being added to the hash.
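If it helps to see the moving parts, here is a rough hand-rolled equivalent. This is only a sketch of the behavior described in the documentation, not how Ruby actually implements group_by, and the method names manual_group_by and group_by_position are my own. The second method shows the no-block form: group_by hands back an enumerator, which can be chained with with_index so the grouping can depend on an element's position.

```ruby
# A hand-written stand-in for group_by, for illustration only.
def manual_group_by(collection)
  groups = {}
  collection.each do |element|
    key = yield(element)           # the block's answer becomes the key
    (groups[key] ||= []) << element # start a new array the first time a key appears
  end
  groups
end

# Without a block, group_by returns an Enumerator. Chaining with_index
# passes each element's position to the block as a second argument.
def group_by_position(items)
  items.group_by.with_index { |_item, index| index.even? }
end

remainders = manual_group_by(1..6) { |i| i % 3 }
puts remainders == (1..6).group_by { |i| i % 3 } # true
puts remainders.keys.inspect                     # [1, 2, 0]

positions = group_by_position(%w[a b c d])
puts positions[true].inspect  # ["a", "c"]
puts positions[false].inspect # ["b", "d"]
```

One detail worth noticing: a Ruby hash remembers insertion order, so the keys come out in the order they were first produced (1, 2, 0 here) rather than sorted.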

Now, that example might not seem to jump off the page as particularly exciting or useful. But let's look at other ways that we could use this handy method. Its power is in the ability to take a collection of data and group it in any way that might be useful to us. It can be an extremely useful tool if you need to figure out how frequently certain things are happening. Take a look at the code below:

set = []
1000.times { set << rand(10) } # create a random data set of values from 0 to 9
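grp = set.group_by { |x| x } # each distinct value becomes a key
sorted_groups = grp.sort_by { |k, v| k } # array of [key, values] pairs, ordered by key
sorted_groups.each do |k, v|
  print "#{k} : "
  puts "*" * v.length # one star per occurrence
end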

The output of this program looks something like this:

0 :****************************************************************************
1 :*********************************************************************
2 :*******************************************************************
3 :***************************************************************************
4 :****************************************************************
5 :******************************************************************************
6 :*****************************************************************
7 :***********************************************************************
8 :*****************************************************************
9 :**********************************************************************

The data set here is arbitrary and generated randomly, but the point is that group_by easily groups the data so that we can see how frequently certain numbers (or ranges of numbers) occur in any given data set. That is a wonderful tool for data analysis! It is interesting to note that Ruby's random number generator doesn't distribute the numbers perfectly evenly over this many iterations, which is to be expected with a sample this small. But try this code snippet out on your own and you'll find that if you run about a million iterations, the counts produced by rand flatten out almost completely, to about 0.1% variance between the values ranging from 0 to 9.
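If you want numbers rather than rows of stars, the groups can be collapsed into plain counts. The sketch below is one way to do that. The method names frequencies and spread_percent are made up, the fixed seed of 42 is only there so repeated runs match, and measuring the gap between the most and least common value as a percentage of all draws is just one possible reading of "variance" here.

```ruby
# Turn the groups into a hash of value => number of occurrences.
def frequencies(values)
  values.group_by { |x| x }.map { |value, group| [value, group.length] }.to_h
end

# Gap between the most and least frequent value, as a percentage of all draws.
# Values that never come up are not counted as zero in this simple version.
def spread_percent(draws, sides = 10)
  rng = Random.new(42) # assumption: fixed seed, purely for repeatable runs
  counts = frequencies(Array.new(draws) { rng.rand(sides) }).values
  (counts.max - counts.min) * 100.0 / draws
end

puts frequencies([3, 1, 3, 2, 3, 1]).inspect
puts spread_percent(1_000)
puts spread_percent(1_000_000)
```

On Ruby 2.7 and later, values.tally returns the same counts directly, but the group_by version also works on the older Ruby this post was written against.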

But how about a less mathy/geeky application? Suppose you have a data collection of people and their addresses and you want to separate them into groups depending on what state they live in. group_by can easily and handily do this for you without having to write loops and conditionals. Check out this last code snippet to see how you could do this with both arrays and hashes.

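peeps = [["john", "ca"], ["sam", "ny"], ["sara", "ca"], ["lucy", "ny"]] # array of [name, state] pairs
people = peeps.to_h # the same data as a hash: name => state
grouped_peeps = peeps.group_by { |e| e[1] } # group the array by its second element, the state
grouped_people = people.group_by { |k, v| v } # group the hash by its values
puts grouped_peeps
puts grouped_people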

And the output from that would be:

{"ca"=>[["john", "ca"], ["sara", "ca"]], "ny"=>[["sam", "ny"], ["lucy", "ny"]]}
{"ca"=>[["john", "ca"], ["sara", "ca"]], "ny"=>[["sam", "ny"], ["lucy", "ny"]]}
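Notice that each group still carries the full [name, state] pairs. If all you want is the names under each state, one extra step strips the pairs down. This is a small sketch with a method name of my own choosing (names_by_state), and it works the same way whether the input is the array of pairs or the hash.

```ruby
# Group [name, state] pairs by state, then keep only the names in each group.
def names_by_state(pairs)
  grouped = pairs.group_by { |_name, state| state }
  grouped.map { |state, members| [state, members.map(&:first)] }.to_h
end

peeps = [["john", "ca"], ["sam", "ny"], ["sara", "ca"], ["lucy", "ny"]]

puts names_by_state(peeps)["ca"].inspect      # ["john", "sara"]
puts names_by_state(peeps.to_h)["ny"].inspect # ["sam", "lucy"]
```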

I hope this provides some insight into the power of Ruby's built-in methods. Happy hunting!