Tuesday, February 17, 2009

Examples of Bag and MultiKeys

In this installment, I’m going to discuss Java usages tips and tricks for the Apache Commons Collection Bag interface. Honestly, the first time I saw the Apache Commons Collection Bags API, I thought "Well, that isn't very useful." The Bag interface states the intention of the class; it is to count occurrences of objects. Like all the products in the Apache Commons Collections, Predicates, Closures, and Transformers can be plugged into the bag at run time to determine the behavior. I ran some tests, which are at the end of the article. The cost of using the framework is negligible, at about 10 to 20 milliseconds. This of course is compared to a traditional method of sampling counting occurrences and stuffing into a map. You can be the judge of the test, let me know if you think the test is accurate.

So, what kind of problem justifies adding the complexity of Bags to a project? Joining the magic of a Transformer, MultiKey, and a decorator you can produce a dynamic solution for counting groups of occurrences of properties of beans in a collection. This would be suitable for a generic controller responsible for counting groups of properties. It would be very helpful in a reporting system. Let’s take a look at the sample and walk through the code.

The Code:

package com.blogspot.apachecommonstipsandtricks.bags;
import java.util.*;
import java.net.*;
import java.io.*;
import java.lang.reflect.*;
import org.apache.commons.collections.*;
import org.apache.commons.collections.comparators.*;
import org.apache.commons.collections.keyvalue.*;
import org.apache.commons.collections.bag.*;
import org.apache.commons.collections.bag.HashBag;
import org.apache.commons.lang.*;
import org.apache.commons.beanutils.*;
import com.blogspot.apachecommonstipsandtricks.*;
public class MagicBagOfTricks
{
public static void main(String[] args) throws IOException, InvocationTargetException, NoSuchMethodException, IllegalAccessException
{
List<LaborForce> list = new ArrayList<LaborForce>();
URL url = new URL("http", "1796193846474123283-a-1802744773732722657-s-sites.googlegroups.com", 80, "/site/psenger/Home/data.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String str;
if (((str = in.readLine()) != null))
{
do
{
String[] strings = str.split("\t");
strings = StringUtils.stripAll(strings);

State state = State.valueOf(StringUtils.trim(strings[0]));
Gender gender = "Male".equals(StringUtils.trim(strings[1])) ? Gender.Male : Gender.Female;
Integer year = Integer.valueOf(StringUtils.trim(strings[2]));

list.add(new LaborForce(state, gender, year));
} while ((str = in.readLine()) != null);
}
in.close();

Map map = PropertyUtils.describe(new LaborForce());
Set s = map.keySet();
s.remove("class");
System.out.println("All the possible properties on the LaborForce bean are [" + StringUtils.join(s.toArray(),",") + "]");

Bag masterBag = new HashBag();
Bag genderBag = TransformedBag.decorate(masterBag, new PropertiesMultiKeyTransformer( new String[]{ "gender" } ) );
Bag genderYearBag = TransformedBag.decorate(masterBag, new PropertiesMultiKeyTransformer( new String[]{ "gender", "year" } ) );
Bag genderYearStateBag = TransformedBag.decorate(masterBag, new PropertiesMultiKeyTransformer( new String[]{ "gender", "year" , "state" } ) );

genderBag.addAll( list );
genderYearBag.addAll( list );
genderYearStateBag.addAll( list );

Comparator comparator = ComparatorUtils.chainedComparator(new Comparator[]{new MultiKeyCompartor(0),new MultiKeyCompartor(1),new MultiKeyCompartor(2)});
Set<MultiKey> set = new TreeSet<MultiKey>(comparator);
set.addAll(masterBag.uniqueSet());
for (MultiKey multiKey : set)
{
System.out.println( "[" +StringUtils.join(multiKey.getKeys(),',') + "] = " + masterBag.getCount(multiKey));
}
}
private static class PropertiesMultiKeyTransformer implements Transformer
{
String[] methodNames;
private PropertiesMultiKeyTransformer(String[] methodNames)
{
this.methodNames = methodNames;
}
public Object transform(Object o)
{
List<Object> ooos = new ArrayList<Object>();
for (String methodName : methodNames)
{
try
{
ooos.add(PropertyUtils.getProperty(o, methodName));
}
catch (Exception e)
{
throw new FunctorException(e);
}
}
return new MultiKey(ooos.toArray(new Object[ooos.size()]));
}
}
private static class MultiKeyCompartor implements Comparator<MultiKey>
{
private int i;
private MultiKeyCompartor(int i)
{
this.i = i;
}
public int compare(MultiKey o1, MultiKey o2)
{
Object[] keys1 = o1.getKeys();
Object[] keys2 = o2.getKeys();
Object oo1 = null;
try
{
oo1 = keys1[i];
}
catch (ArrayIndexOutOfBoundsException e)
{
}
Object oo2 = null;
try
{
oo2 = keys2[i];
}
catch (ArrayIndexOutOfBoundsException e)
{
}
NullComparator nullComparator = new NullComparator(false);
return nullComparator.compare(oo1, oo2);
}
}
}


Between lines 18 and 36 of the MagicBagOfTricks class, we fetch the sample data. The data is in the format of StatetabGendertabyear. If you want to download the data, you can get it here.


List<LaborForce> list = new ArrayList<LaborForce>();
URL url = new URL("http", "1796193846474123283-a-1802744773732722657-s-sites.googlegroups.com", 80, "/site/psenger/Home/data.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String str;
if (((str = in.readLine()) != null))
{
do
{
String[] strings = str.split("\t");
strings = StringUtils.stripAll(strings);

State state = State.valueOf(StringUtils.trim(strings[0]));
Gender gender = "Male".equals(StringUtils.trim(strings[1])) ? Gender.Male : Gender.Female;
Integer year = Integer.valueOf(StringUtils.trim(strings[2]));

list.add(new LaborForce(state, gender, year));
} while ((str = in.readLine()) != null);
}
in.close();


The purpose of code on line 38 through 41 is to just print the properties of the bean, LaborForce, minus the class method. It is important to see how the PropertyUtils describes the bean, because we will be accessing it later via the same API.


Map map = PropertyUtils.describe(new LaborForce());
Set s = map.keySet();
s.remove("class");
System.out.println("All the possible properties on the LaborForce bean are [" + StringUtils.join(s.toArray(),",") + "]");


On line 43, we create a single HashBag object. I will refer to this HashBag object as the masterBag. A HashBag is an implementation of Bag with the characteristics of a HashSet. When objects are stored in the HashBag, the inserted object’s hash code is assessed, just like a HashSet, and ensures the uniqueness of the object in the Bag as it is stored. You might think the count would be 1 no mater how many times you insert the object, but in fact, the bag counts the inserted and removed occurrences. Even more confusing is the fact that there is an iterator method that returns all multiple occurrences and a uniqueSet method that pulls a set of objects. Line 44 through 46, is where the magic happens. We create three decorated Bags, with TransformedBag.decorate. All of them with variations of the PropertiesMultiKeyTransformer and backed by a single HashBag, the masterBag from line 43. When an object is inserted into one of the decorated HashBags, it is transformed by PropertiesMultiKeyTransformer and the results are physically placed in the masterBag. PropertiesMultiKeyTransformer uses the PropertyUtils to pull all the given properties off the bean to create a MultiKey object. It’s important that the we always put the same property in each object index in the array of objects of the MultiKey. If we didn’t, we would get an exception. Furthermore, we can’t transform the object into an array of objects because HashCode doesn’t exist on an Array of Objects. This becomes a problem when we insert Objects into the bag. The MultiKey is a perfect fit for this problem and that is the reason for its use. This class is effectively a wrapper class.


Bag masterBag = new HashBag();
Bag genderBag = TransformedBag.decorate(masterBag, new PropertiesMultiKeyTransformer( new String[]{ "gender" } ) );
Bag genderYearBag = TransformedBag.decorate(masterBag, new PropertiesMultiKeyTransformer( new String[]{ "gender", "year" } ) );
Bag genderYearStateBag = TransformedBag.decorate(masterBag, new PropertiesMultiKeyTransformer( new String[]{ "gender", "year" , "state" } ) );


On line 48 through 50 we add the object to the decorated transformed bags, which are all backed by the masterBag.


genderBag.addAll( list );
genderYearBag.addAll( list );
genderYearStateBag.addAll( list );


On line 52 through 58 I ask the masterBag for a set, load the results in a TreeSet that has a chained Transformer. The results are a sorted set of MultiKeys with counts of the occurrences of the vectors in the list.


Comparator comparator = ComparatorUtils.chainedComparator(new Comparator[]{new MultiKeyCompartor(0),new MultiKeyCompartor(1),new MultiKeyCompartor(2)});
Set<MultiKey> set = new TreeSet<MultiKey>(comparator);
set.addAll(masterBag.uniqueSet());
for (MultiKey multiKey : set)
{
System.out.println( "[" +StringUtils.join(multiKey.getKeys(),',') + "] = " + masterBag.getCount(multiKey));
}


The inner class PropertiesMultiKeyTransformer is a Transformer constructed with an array of Strings that represent the method names to use with PropertyUtils.getProperty on the beans in the list. It dynamically pulls the given properties from the bean and loads them into the new MultiKey. There is a limit to the number of Objects a MultiKey can accept, but for this demo, we are ok. In addition, the Objects in the MultiKey have to be consistently in the same order for each comparator. We are using a ComparatorUtils.chainedComparator of MultiKeyCompartor which will throw a ClassCastException if the objects are the wrong order.


private static class PropertiesMultiKeyTransformer implements Transformer
{
String[] methodNames;
private PropertiesMultiKeyTransformer(String[] methodNames)
{
this.methodNames = methodNames;
}
public Object transform(Object o)
{
List<Object> ooos = new ArrayList<Object>();
for (String methodName : methodNames)
{
try
{
ooos.add(PropertyUtils.getProperty(o, methodName));
}
catch (Exception e)
{
throw new FunctorException(e);
}
}
return new MultiKey(ooos.toArray(new Object[ooos.size()]));
}
}
private static class MultiKeyCompartor implements Comparator<MultiKey>
{
private int i;
private MultiKeyCompartor(int i)
{
this.i = i;
}
public int compare(MultiKey o1, MultiKey o2)
{
Object[] keys1 = o1.getKeys();
Object[] keys2 = o2.getKeys();
Object oo1 = null;
try
{
oo1 = keys1[i];
}
catch (ArrayIndexOutOfBoundsException e)
{
}
Object oo2 = null;
try
{
oo2 = keys2[i];
}
catch (ArrayIndexOutOfBoundsException e)
{
}
NullComparator nullComparator = new NullComparator(false);
return nullComparator.compare(oo1, oo2);
}
}


The code for the data transfer object used within this project is called the LaborForce object. See the following sample for the object. You can find State and Gender in some of the other projects used in this blog.

The Code:

package com.blogspot.apachecommonstipsandtricks.bags;
import com.blogspot.apachecommonstipsandtricks.*;
public class LaborForce
{
private State state;
private Gender gender;
private int year;
public LaborForce()
{
}
public LaborForce(State state, Gender gender, int year)
{
this.state = state;
this.gender = gender;
this.year = year;
}
public State getState()
{
return state;
}
public void setState(State state)
{
this.state = state;
}
public Gender getGender()
{
return gender;
}
public void setGender(Gender gender)
{
this.gender = gender;
}
public int getYear()
{
return year;
}
public void setYear(int year)
{
this.year = year;
}
}


The results:

All the possible properties on the LaborForce bean are [state,gender,year]
[Male] = 7842
[Male,1950] = 134
[Male,1950,AL] = 1
[Male,1950,AK] = 6
[Male,1950,AZ] = 2
[Male,1950,CA] = 1
[Male,1950,CO] = 4
[Male,1950,CT] = 3
[Male,1950,DE] = 3
[Male,1950,DC] = 1
[Male,1950,FL] = 2
[Male,1950,GA] = 2
[Male,1950,HI] = 4
..
..
[Female,2008,SD] = 3
[Female,2008,TN] = 2
[Female,2008,TX] = 6
[Female,2008,UT] = 2
[Female,2008,VT] = 1
[Female,2008,VA] = 2
[Female,2008,WA] = 3
[Female,2008,WI] = 3


Performance Test


The test I ran was very simple. I wrapped a timer around parts of the code that were unique to the process. And the results of using the Bag are small, compared to a traditional method.

The code:

package com.blogspot.apachecommonstipsandtricks.bags;
import java.util.*;
import java.text.*;
import java.net.*;
import java.io.*;
import org.apache.commons.collections.*;
import org.apache.commons.collections.bag.*;
import org.apache.commons.collections.bag.HashBag;
import org.apache.commons.lang.time.*;
import org.apache.commons.lang.*;
import com.blogspot.apachecommonstipsandtricks.*;
public class BagStressTest
{
private static final int NumberOfTimesToRun = 1000;
private static final int MasterListSize = 10;

public static void main(String[] args) throws IOException
{
int i = 1;
SimpleDateFormat sdf = new SimpleDateFormat("ss.SSS");
StopWatch sw;
Bag bag = null;
Map<State, Integer> map = null;
long sumOfTime = 0;

List<LaborForce> list = new ArrayList<LaborForce>();
URL url = new URL("http", "1796193846474123283-a-1802744773732722657-s-sites.googlegroups.com", 80, "/site/psenger/Home/data.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String str;
if (((str = in.readLine()) != null))
{
do
{
String[] strings = str.split("\t");
strings = StringUtils.stripAll(strings);

State state = State.valueOf(StringUtils.trim(strings[0]));
Gender gender = "Male".equals(StringUtils.trim(strings[1])) ? Gender.Male : Gender.Female;
Integer year = Integer.valueOf(StringUtils.trim(strings[2]));

list.add(new LaborForce(state, gender, year));
} while ((str = in.readLine()) != null);
}
in.close();

List<LaborForce> listOfSampleData = new ArrayList<LaborForce>();
for (int j = 0; j < MasterListSize; j++)
{
listOfSampleData.addAll( list );
}
System.out.println("The sample listOfSampleData length is " + listOfSampleData.size());
System.out.println(" ---------- Start Test ---------- ");
sumOfTime = 0;
sw = new StopWatch();
for (int j = 0; j < NumberOfTimesToRun; j++)
{
sw.start();
bag = TransformedBag.decorate(new HashBag(), new Transformer()
{
public Object transform(Object o)
{
LaborForce dto = (LaborForce) o;
return dto.getState();
}
});
bag.addAll(listOfSampleData);
sw.stop();
sumOfTime += sw.getTime();
sw.reset();
}
System.out.println("With a Bag, Average Time in seconds and milliseconds is " + sdf.format( new Date( sumOfTime / NumberOfTimesToRun ) ) );
System.out.println(" ---------- End Test ---------- ");

// Alternative Way.
System.out.println(" ---------- Start Test ---------- ");
sumOfTime = 0;
sw.reset();
for (int j = 0; j < NumberOfTimesToRun; j++)
{
sw.start();
map = new HashMap<State, Integer>();
for (LaborForce lbf : listOfSampleData)
{
Integer count = map.get(lbf.getState());
if (null == count)
{
count = 0;
}
count++;
map.put(lbf.getState(), count);
}
sw.stop();
sumOfTime += sw.getTime();
sw.reset();
}
System.out.println("Traditional counting, Average Time in seconds and milliseconds is " + sdf.format( new Date( sumOfTime / NumberOfTimesToRun ) ) );
System.out.println(" ---------- End Test ---------- ");
}
}


The results:

The sample listOfSampleData length is 156000
---------- Start Test ----------
With a Bag, Average Time in seconds and milliseconds is 00.032
---------- End Test ----------
---------- Start Test ----------
Traditional counting, Average Time in seconds and milliseconds is 00.026
---------- End Test ----------


Thanks for reading this article, I look forward to any questions or feedback.

Author: Philip A Senger

1 comment:

  1. Wow, I’ve been using this api for years, and I cant tell you how many times I could have used this. Thanks for sharing this with us.

    ReplyDelete