Tagshadow: Jama, Weka and Scatter Plots
There’s nothing quite like the double-edged blade of progress for producing equal parts spectacular failure and insight. I linked to some of my experiments with displaying book recommendations as scatter plots. Following feedback, I’ve iterated over various displays, each with less and less of a “graph” feel. You’ll get a look at those when my first dimension reductions see the light.
On that front I’ve made some progress with Jama, which lets me represent and manipulate matrices in Java. I put my Lovecraft example data through its paces with SVD, QR, LU, and similar decompositions, but the real test will come when I try to manipulate the significantly larger Amazon tag matrix.
I DID use the larger data set with Weka, a VERY neat tool that exposes a TON of interesting ways to manipulate large sets of data. This is where I had my largest insights and failures. The failures involved quickly discovering the limits of my memory as I repeatedly crashed the application with too much data. Once I found a subset of the data (and an appropriate heap size) that it could handle, I got my first plots of principal component data. On one hand I’m giddy, and on the other I realize that I still have a long way to go. Weka’s Knowledge Flow functionality (complete with flashbacks to MATLAB in college) allows for some crazy fast prototyping and experimenting.
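For anyone wondering what goes into a principal component plot, here is a minimal sketch in plain Java. It is not Weka's implementation and does not use Jama: the class `PcaSketch`, its method names, and the four sample points are all made up for illustration. It centers the data, builds the covariance matrix, and finds the first principal component by power iteration, which is only one of several ways to do this and can stall if the starting vector happens to be orthogonal to the answer.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of Weka or Jama.
class PcaSketch {

    // Subtract each column's mean so the data is centered on the origin.
    static double[][] center(double[][] data) {
        int n = data.length, d = data[0].length;
        double[] mean = new double[d];
        for (double[] row : data)
            for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] out = new double[n][d];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++) out[i][j] = data[i][j] - mean[j];
        return out;
    }

    // Sample covariance matrix (d x d) of data that is already centered.
    static double[][] covariance(double[][] centered) {
        int n = centered.length, d = centered[0].length;
        double[][] cov = new double[d][d];
        for (double[] row : centered)
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    cov[a][b] += row[a] * row[b] / (n - 1);
        return cov;
    }

    // Power iteration: multiply by the covariance matrix and renormalize
    // until the vector settles on the direction of greatest variance.
    static double[] firstComponent(double[][] cov, int iterations) {
        int d = cov.length;
        double[] v = new double[d];
        Arrays.fill(v, 1.0 / Math.sqrt(d)); // arbitrary unit-length start
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[d];
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++) next[a] += cov[a][b] * v[b];
            double norm = 0;
            for (double x : next) norm += x * x;
            norm = Math.sqrt(norm);
            if (norm == 0) return v; // no variance along v; give up
            for (int a = 0; a < d; a++) v[a] = next[a] / norm;
        }
        return v;
    }

    // Coordinate of each centered row along the given component.
    static double[] project(double[][] centered, double[] component) {
        double[] scores = new double[centered.length];
        for (int i = 0; i < centered.length; i++)
            for (int j = 0; j < component.length; j++)
                scores[i] += centered[i][j] * component[j];
        return scores;
    }

    public static void main(String[] args) {
        // Made-up sample: four points lying on the line y = 2x.
        double[][] data = {{1, 2}, {2, 4}, {3, 6}, {4, 8}};
        double[][] centered = center(data);
        double[] pc1 = firstComponent(covariance(centered), 50);
        System.out.println("PC1: " + Arrays.toString(pc1));
        System.out.println("Scores: " + Arrays.toString(project(centered, pc1)));
    }
}
```

The scores are the values that would go on one axis of a plot. A second component, for a two-dimensional scatter plot, would need an extra step such as removing the first component's contribution from the covariance matrix and iterating again.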
I’ve got earlier editions of Numerical Recipes: The Art of Scientific Computing and Data Mining: Practical Machine Learning Tools and Techniques on hold at the library, so it should be a fun weekend.
~ by mentatjack on September 11, 2009.