Detecting feature importance in fast.ai neural networks

I'm working on a new neural network that tries to predict an outcome – true or false – based on 65 different variables in a table.

The tabular model I made with fast.ai is somewhat accurate at making those predictions (it's a small data set of just 5,000 rows). But what interests me even more is determining which of the 65 features matter most.

I knew calculating this "feature importance" was possible with random forests, but could I do it with neural nets?

It turns out I can. The trick is, essentially, to try the model without each feature. The degree to which the model gets worse with that feature missing indicates its importance – or lack of importance.
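A minimal sketch of that idea in plain Python, offered as an illustration rather than the notebook code: it assumes "without a feature" is approximated by shuffling that feature's column (the common permutation approach), and it uses a hypothetical toy_predict function in place of a trained fast.ai model. All of the names here are made up for the example.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows where predict(row) matches the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(rows)


def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Return the accuracy drop for each feature when its column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(n_features):
        # Shuffle only column j, so the feature no longer lines up with the labels.
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled_rows = [
            row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)
        ]
        drops.append(baseline - accuracy(predict, shuffled_rows, labels))
    return drops


# Hypothetical stand-in for a trained model: only feature 0 matters.
def toy_predict(row):
    return row[0] > 0.5


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(500)]
labels = [row[0] > 0.5 for row in rows]

importances = permutation_importance(toy_predict, rows, labels, n_features=3)
for j, drop in enumerate(importances):
    print(f"feature {j}: accuracy drop {drop:.3f}")
```

A large drop suggests the model leans on that feature; a drop near zero suggests it barely uses it. With a real model, toy_predict would be replaced by a function that returns the model's prediction for one row.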

This blog post describes how to run this test, and this adaptation worked perfectly in my fast.ai notebook. Here's the code in a Gist:

Unfortunately, because my project uses internal Quartz analytics, I can't share the data or the charts I'm playing with. But with the code above, I can now "see into" the neural network and get cool insights about what's going on.


Converting videos to images for machine learning

This week I kept to my summer of training plan, though the model-building I did was for a Quartz project we're not ready to share. But! I learned something super useful in the process: how to quickly turn videos into many still images.

For our latest project, I'm training a model to identify specific objects available to me – much like how I trained a model to identify items in the office.

The fastest way to get lots of images of an object is to take a video of it. And a quick way to turn that video into images – called an "image sequence" – is ffmpeg. It seems to convert from many formats like .mp4, .mov, .avi to lots of different image formats such as .jpg and .png.

There's plenty more detail in the ffmpeg docs, but here's what I did that worked so quickly on my Mac:

brew install ffmpeg

I use Homebrew to put things on my Mac, so this went pretty quickly. I had to update my Xcode command line tools, but Homebrew is super helpful and told me exactly what I needed to do.

Next, I did this from the Terminal:

ffmpeg -i IMG_1019.MOV -r 15 coolname%04d.jpg

Here's what's going on:

  • -i means the next thing is the input file
  • IMG_1019.MOV is the movie I Airdropped from my phone to my laptop
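  • -r is the flag for the sample rate, meaning how many frames per second ffmpeg writes out.
  • 15 is the rate. I wanted every other frame, which for a 30-frames-per-second video means 15 frames every second. 1 would be one frame every second; 0.25 would be one frame every 4 seconds.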
  • coolname is just a prefix I picked for each image
  • %04d means each frame gets a zero-padded sequence number, starting with 0001 and going to 9999, so my image files are named coolname0001.jpg, coolname0002.jpg, coolname0003.jpg, etc.
  • .jpg is the image format I want. If I put .png I got PNGs instead.
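The pieces of that command can also be assembled in a short Python sketch, which is handy when converting several videos. The helper names ffmpeg_args and expected_names are hypothetical, and the frame count is only an estimate (seconds times rate); ffmpeg's actual output may differ by a frame or so.

```python
def ffmpeg_args(input_path, rate, prefix, ext="jpg", digits=4):
    """Build the argument list for the ffmpeg command described above."""
    pattern = f"{prefix}%0{digits}d.{ext}"  # e.g. coolname%04d.jpg
    return ["ffmpeg", "-i", input_path, "-r", str(rate), pattern]


def expected_names(prefix, seconds, rate, ext="jpg", digits=4):
    """Estimate the output file names for a clip of the given length."""
    count = int(seconds * rate)  # rough estimate, not ffmpeg's exact count
    return [f"{prefix}{n:0{digits}d}.{ext}" for n in range(1, count + 1)]


args = ffmpeg_args("IMG_1019.MOV", 15, "coolname")
names = expected_names("coolname", seconds=2, rate=15)

print(" ".join(args))
print(names[0], "...", names[-1])

# To actually run the conversion (requires ffmpeg to be installed):
# import subprocess
# subprocess.run(args, check=True)
```

Passing the arguments as a list rather than one string avoids shell-quoting problems if a file name contains spaces.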

In mere moments I had dozens of JPG files I could use for training. And that's pretty great.

A bot now updates my Slack status

One of my closest collaborators is a teammate far away — I'm in New York and Emily Withrow is in Chicago.

We stay connected chatting on Slack. But recently Emily asked if I could regularly update my Slack status to indicate what I was doing at the moment, like coding, meeting, eating. It's the kind of thing colleagues in New York know just by glancing toward my desk.

Changing my Slack status isn't hard; remembering to do it is. So I built a bot to change it for me.

New Kid on the Blockchain

UPDATED at 7:45 pm ET on 9/17/2018 with new information. See the end of the post for details.

It's my time to go crypto.

I've followed blockchain technology, principles and trends for years without getting involved, but now have a couple of reasons to get real: A new blockchain-based journalism project is about to launch, and my employer, Quartz, just launched a new cryptocurrency newsletter.

It also seemed perfect for my practice of beginning new things repeatedly.

The inspiration

Earlier this year, friends Manoush Zomorodi and Jen Poyant left their public radio jobs to join a new journalism … thing … called Civil. I had heard snippets about Civil, and started listening to Manoush's and Jen's podcast, ZigZag, part of which attempts to explain it.

After weeks of being pretty confused, I think I get it. Here's my attempt: Civil is a system designed to foster and reward quality journalism in a decentralized way, in contrast to platforms like Facebook and Google upon which so much journalism rests today.

The system’s backbone is the blockchain-based Civil token, abbreviated CVL. Holders of tokens can start news organizations in the system, challenge the membership of other news organizations in the system and/or cast votes when such challenges arise.

I have no idea if it will work. But I’m interested, and I’d rather participate than watch from the sidelines. So I’m willing to give it a whirl and okay with losing a little money in the process.

To participate, I just needed to buy some CVL ... though it turns out there's no just about it. But that's okay, too.

Beginning as a Practice

[I recently presented this post as a 5-minute Ignite talk.]

On a morning flight some years back, the pilot's cheerful voice came over the speakers.

"I'm glad you're flying with us. This is the first time I've flown a Boeing 747," the captain said with a pause. "Today."

We all laughed, of course. Who’d want to be on a pilot’s maiden flight?!

Not us. We want experts. Society counts on them. Companies pay them better. Spectators watch them play. Vacationers rely on their forecasts. We attend educational institutions and work long hours to become them — the qualified, the trusted, the best.

Nobody likes being a beginner.

Except that I do.

Make Every Week: Programs in Python

“Daddy, I want to learn Python,” announced my 12-year-old daughter a couple of weeks ago. Boys in her youth group know it, she said. She wanted to, too.

Say no more.

I’ve introduced my daughters to a variety of friendly programming platforms, including Kids Ruby, Hopscotch, Codea and Lua in Minecraft. They’ve sweetly tolerated my programmatic prodding. This was the first direct request.

I quickly ordered two paper copies of “Learn Python the Hard Way,” by Zed A Shaw, and we’ve been walking through each lesson together — one every week.