Parallelizing Bootstrapping At Least…

I understand that some workloads don't lend themselves to parallel processing. Yet bootstrapping is so clearly parallelizable. Why, in this day and age, do we still have to rely on serial processing for bootstrapping tasks? I don't understand. This became infuriatingly obvious to me when a bootstrapping task I left running on Friday...
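Each bootstrap replicate resamples the data independently of every other replicate. That means the replicates can be split into chunks and computed on separate cores, then concatenated. Below is a minimal sketch in Python. It assumes the statistic is a picklable module-level function such as `statistics.mean`. The names `parallel_bootstrap` and `_bootstrap_chunk` are hypothetical, and seeding each chunk with `seed + i` is just one simple convention, not the only way to get separate random streams.

```python
import random
import statistics
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Sequence


def _bootstrap_chunk(data: Sequence[float],
                     stat: Callable[[Sequence[float]], float],
                     n_reps: int,
                     seed: int) -> list[float]:
    """Compute n_reps bootstrap replicates of `stat` using its own seeded RNG."""
    rng = random.Random(seed)  # hypothetical convention: one generator per chunk
    n = len(data)
    # Each replicate resamples n points with replacement, independently of the others.
    return [stat(rng.choices(data, k=n)) for _ in range(n_reps)]


def parallel_bootstrap(data: Sequence[float],
                       stat: Callable[[Sequence[float]], float] = statistics.mean,
                       n_boot: int = 1000,
                       n_workers: int = 4,
                       seed: int = 0,
                       executor_cls: type[Executor] = ProcessPoolExecutor) -> list[float]:
    """Split n_boot replicates into up to n_workers chunks and run the chunks concurrently."""
    # Spread the replicates as evenly as possible across the workers.
    base, extra = divmod(n_boot, n_workers)
    sizes = [base + (1 if i < extra else 0) for i in range(n_workers)]
    data = list(data)  # a plain list pickles cleanly for process-based workers
    with executor_cls(max_workers=n_workers) as pool:
        futures = [pool.submit(_bootstrap_chunk, data, stat, size, seed + i)
                   for i, size in enumerate(sizes) if size > 0]
        # Collect in submission order so output is reproducible for a given seed and n_workers.
        return [rep for fut in futures for rep in fut.result()]
```

The output is reproducible for a fixed `seed` and `n_workers`. Changing the worker count changes which seed produces which replicates. With `ProcessPoolExecutor`, call `parallel_bootstrap` under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Lambdas cannot be pickled, so pass a module-level function as `stat`.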

OpenRefine for Data Mangling

I often use data from various sources and sometimes have to get creative about transforming it into a format I can use easily. So far I have done this mainly with R (and occasionally with Python). The other day, Dr. Mazzolla pointed me to OpenRefine, an application for data clean-up and...

Writing Better R Code

When I teach R, I always caution my audience about the quirks of R programming. It is very common for someone with a background in Java or Python to write R code that takes forever to execute (as I used to do, and sometimes still do). I warn the students about loops and...

Google TensorFlow

Google open-sourced TensorFlow (TF), a distributed machine learning library, in November. The basic idea is that you express your ML process as a graph and let TF handle executing it and distributing the work across cores, whether those cores are in your CPU or your GPU. The dataflow graph works much...
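To make the dataflow idea concrete, here is a toy graph evaluator in plain Python. It is hypothetical and does not use TensorFlow's API. `Node` and `run_graph` are illustrative names. The scheduling runs the graph in waves and uses a thread pool, which simplifies how a real runtime places operations on CPU or GPU devices. The threads only stand in for cores here.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Node:
    """One operation in a toy dataflow graph (hypothetical; not TensorFlow's API)."""
    name: str
    op: Callable[..., Any]
    inputs: list[str] = field(default_factory=list)


def run_graph(nodes: list[Node], feeds: dict[str, Any], max_workers: int = 4) -> dict[str, Any]:
    """Evaluate the graph in waves: every node whose inputs are ready is submitted together."""
    values = dict(feeds)  # input values supplied by the caller
    pending = {n.name: n for n in nodes}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [n for n in pending.values() if all(i in values for i in n.inputs)]
            if not ready:
                raise ValueError("graph has a cycle or a missing input")
            # Nodes in the same wave do not depend on each other, so they can run concurrently.
            futures = {n.name: pool.submit(n.op, *(values[i] for i in n.inputs)) for n in ready}
            for name, fut in futures.items():
                values[name] = fut.result()
                del pending[name]
    return values


if __name__ == "__main__":
    # Build the graph for (a + b) * (a - b): "sum" and "diff" are independent, "prod" needs both.
    graph = [
        Node("sum", operator.add, ["a", "b"]),
        Node("diff", operator.sub, ["a", "b"]),
        Node("prod", operator.mul, ["sum", "diff"]),
    ]
    print(run_graph(graph, {"a": 5, "b": 3})["prod"])  # prints 16
```

Separating the description of the computation (the list of nodes) from its execution (`run_graph`) is what lets a runtime decide for itself which pieces to run in parallel.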
