This week the conditions aligned, and I got to use two technologies that were new to me: VADER sentiment analysis and Copilot. It was a fortuitous coincidence that the two came together, as their combined power broke me out of my comfort zone while keeping me productive. https://youtu.be/mQEom8ESs8k Wrote a simple sentiment analysis... Continue Reading →
Processing Large JSON Files on the Command Line
For one of the projects I am working on, I have to deal with large JSON files. My first instinct is to use what I know: R, Python, bash... But this time I had to use a tool that was new to me, jq, with great results. A jq one-liner processes the data and... Continue Reading →
Parallelizing Bootstrapping At Least…
I understand that some workloads don't lend themselves to parallel processing. Yet bootstrapping is so clearly parallelizable. Why, in this day and age, we still have to rely on serial processing of bootstrapping tasks, I don't understand. This became infuriatingly obvious to me when a bootstrapping task I left running on Friday... Continue Reading →
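To illustrate why bootstrapping parallelizes so easily, here is a minimal Python sketch (not the code from the post). Each replicate is an independent resample, so the replicates can be split into chunks and handed to separate workers. The function names, the choice of the mean as the statistic, and the per-chunk seeding scheme are all assumptions made for the example.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor


def bootstrap_chunk(args):
    """Draw n_reps resamples of data (with replacement) and return their means."""
    data, n_reps, seed = args
    rng = random.Random(seed)  # each chunk gets its own seeded generator
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_reps)]


def parallel_bootstrap_means(data, n_reps=1000, n_workers=4, seed=0,
                             executor_cls=ProcessPoolExecutor):
    """Split n_reps replicates across n_workers and collect all the means."""
    # Spread the replicates as evenly as possible over the workers.
    base, extra = divmod(n_reps, n_workers)
    sizes = [base + (1 if i < extra else 0) for i in range(n_workers)]
    # Hypothetical seeding scheme: one distinct seed per chunk.
    tasks = [(data, size, seed + i) for i, size in enumerate(sizes)]
    with executor_cls(max_workers=n_workers) as pool:
        chunks = pool.map(bootstrap_chunk, tasks)
        return [m for chunk in chunks for m in chunk]
```

On platforms that start worker processes by spawning, the call to `parallel_bootstrap_means` should sit under an `if __name__ == "__main__":` guard. The sorted means can then be used to read off percentile confidence intervals.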
Non-Linear Regressions on Large Scale Data With nlsLoop nls.multstart Package
A brief aside (I am in crunch mode and can't write more). I am getting into nls (very exciting) these days and learning quite a bit about it. One issue with fitting these models is the difficulty of finding appropriate starting values. When you are trying to fit one of these models, finding starting values is not... Continue Reading →
My Experience with LFE Package and Why Open Source is the Way for Research
I keep talking about using large datasets and how some models take quite a long time to run. Two-way fixed effects models are typically among them. While regular econometrics packages like plm are quite good for reasonably large datasets, more and more I find them hard to use with my datasets. Here is... Continue Reading →
3D plots in R with Plotly
Plotly presents a visually pleasing way to create 3D plots in R and Python.
OpenRefine for Data Mangling
I often use data from various sources and sometimes have to get creative about transforming the data into a format that I can use easily. So far I have done this mainly with R (and with Python from time to time). The other day, Dr. Mazzolla pointed to OpenRefine, an application for data clean-up and... Continue Reading →
Writing Better R Code
When I teach R, I always caution the audience about the quirks of R programming. It is very typical for someone with a background in Java or Python to write R code that takes forever to execute (like I used to do, and sometimes still do). I warn the students about loops and... Continue Reading →
Google TensorFlow
Google open-sourced TensorFlow (TF), a distributed machine learning library, in November. The basic idea is that you build your ML process into a graph and let TF handle running the work and distributing it between cores. Be it cores in your CPU or GPU, TF has you covered. The dataflow graph works much... Continue Reading →
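The build-a-graph-then-run-it idea can be sketched in plain Python. This toy is not TensorFlow's API; `Node`, `constant`, and `evaluate` are hypothetical names invented for the illustration. It only shows the separation between describing a computation and executing it, which is what lets a runtime decide where each node runs.

```python
class Node:
    """One operation in the graph: a function plus the nodes it depends on."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs


def constant(value):
    """A node with no inputs that simply yields a fixed value."""
    return Node(lambda: value)


def evaluate(node, cache=None):
    """Evaluate a node after its inputs, computing each node only once."""
    if cache is None:
        cache = {}
    if node not in cache:
        args = [evaluate(dep, cache) for dep in node.inputs]
        cache[node] = node.func(*args)
    return cache[node]


# Build the graph first; nothing is computed at this point.
a = constant(2.0)
b = constant(3.0)
total = Node(lambda x, y: x + y, a, b)
product = Node(lambda x, y: x * y, total, b)

# Only now is the graph actually run.
result = evaluate(product)
```

Because the whole graph is known before execution, a scheduler could in principle dispatch nodes that do not depend on each other to different cores; this sketch just walks the graph serially.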