Converting videos to images for machine learning

This week I kept to my summer-of-training plan, but the model-building I did was for a Quartz project we're not ready to share. But! I learned something super useful in the process: how to quickly turn videos into many still images.

For our latest project, I'm training a model to identify specific objects available to me – much like how I trained a model to identify items in the office.

The fastest way to get lots of images of an object is to take a video of it. And a quick way to turn that video into images – called an "image sequence" – is the command-line tool ffmpeg. It can convert many video formats, like .mp4, .mov, and .avi, into lots of different image formats, such as .jpg and .png.

There's plenty more detail in the ffmpeg docs, but here's what I did that worked so quickly on my Mac:

brew install ffmpeg

I use Homebrew to put things on my Mac, so this went pretty quickly. I had to update my Xcode command line tools, but Homebrew is super helpful and told me exactly what I needed to do.

Next, I did this from the Terminal:

ffmpeg -i IMG_1019.MOV -r 15 coolname%04d.jpg

Here's what's going on:

  • -i means the next thing is the input file
  • IMG_1019.MOV is the movie I Airdropped from my phone to my laptop
  • -r is the flag for the frame rate, or how many frames get saved for each second of video.
  • 15 is the rate. I wanted roughly every other frame of my 30-frames-per-second video, so 15 frames each second; 1 would give one frame every second, and 0.25 one frame every 4 seconds.
  • coolname is just a prefix I picked for each image
  • %04d means each frame gets a zero-padded, four-digit sequence number starting with 0001, so my image files are named coolname0001.jpg, coolname0002.jpg, coolname0003.jpg, and so on.
  • .jpg is the image format I want. If I put .png there instead, I get PNGs.

In mere moments I had dozens of JPG files I could use for training. And that's pretty great.
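If you have a handful of clips to convert, a short script can run that same command on each one. Here's a rough sketch in Python; the folder names and the "clip" prefix are placeholders, and it assumes ffmpeg is already installed:

import subprocess
from pathlib import Path

videos = Path("clips")    # folder of .MOV files (placeholder name)
output = Path("frames")
output.mkdir(exist_ok=True)

for i, movie in enumerate(sorted(videos.glob("*.MOV"))):
    # same flags as above: sample 15 frames per second into numbered JPGs
    pattern = output / f"clip{i:02d}_%04d.jpg"
    subprocess.run(["ffmpeg", "-i", str(movie), "-r", "15", str(pattern)], check=True)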

Artisanal AI: Detecting objects in our office

Off-the-shelf services like Google Vision are trained to identify objects in general, like car, vehicle, and road in the image below.

But many of the journalism projects we're encountering in the Quartz AI Studio benefit from custom-built models that identify very specific items. I recently heard Meredith Broussard call this kind of work "artisanal AI," which cracked me up and also fits nicely.

So as an experiment, and as part of my summer training program, I trained an artisanal model to distinguish between the three objects at the top of this page, all from the Quartz offices: a Bevi water dispenser, a coffee urn, and a Quartz Creative arcade game (don't you wish you had one of those?!).

I also made a little website where my colleagues and I can test the model. You can, too — though you'll have to come visit to get the best experience!

The results

The model is 100% accurate at identifying the images I fed it — which probably is not all that surprising. It's based on an existing model called resnet34, which was trained on the ImageNet data set to distinguish between thousands of things. Using a technique called transfer learning, I taught that base model to use all of its existing power to distinguish between just three objects.
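I'm not publishing my training notebook here, but the transfer-learning step itself is only a few lines of fast.ai code. This is a minimal sketch rather than my exact code; the folder name and layout (one subfolder of images per object) are assumptions:

from fastai.vision import *   # fast.ai v1

# images sorted into folders such as bevi/, coffee_urn/, arcade/ (hypothetical names)
data = ImageDataBunch.from_folder(
    "data/office-objects", valid_pct=0.2,
    ds_tfms=get_transforms(), size=224
).normalize(imagenet_stats)

# start from resnet34's ImageNet weights and train new final layers for just three classes
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)
learn.export()  # saves a model file an app or website could load later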

Making music with my arms

The brilliant Imogen Heap performed in New York a few weeks ago, and I got to experience live how she crafts sounds with her arms and hands.

It was a great night of beautiful music and technology, both.

One mystery I couldn't solve from the audience was how her computer detected the position of her arms. Unlike in her early videos, I didn't see something akin to a Kinect on stage.

Now I think maybe I know.

That's because this week I took a workshop from Hannah Davis on using the ml5.js coding library, which touts itself as "friendly machine learning for the web," letting me use machine learning models in a browser. The class was part of the art+tech Eyeo Festival in Minneapolis.

One of the models Davis demonstrated was PoseNet, which estimates the position of various body parts — elbows, wrists, knees, and so on — in an image or video. I'd never seen PoseNet work before, let alone in JavaScript and in a browser.


Inspired by Heap, I set out to quickly code a music controller based on my arm movements, as seen by PoseNet through my laptop camera.

Try it yourself

It's pretty rough, but you can try it here. Just let the site use your camera, toggle the sound on, and try controlling the pitch by moving your right hand up and down in the camera frame!
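The actual demo is JavaScript running ml5.js in the browser, but the core idea is just a mapping: take the wrist position PoseNet reports and scale it to a frequency. Here's that idea as a tiny Python sketch; the frequency range and frame size are made up for illustration:

def hand_to_pitch(wrist_y, frame_height, low_hz=220.0, high_hz=880.0):
    """Map the wrist's y position (0 = top of the frame) to a frequency in Hz.

    Higher hand means higher pitch. The 220-880 Hz range is arbitrary.
    """
    # normalize so 0.0 is the bottom of the frame and 1.0 is the top
    height = 1.0 - (wrist_y / frame_height)
    height = max(0.0, min(1.0, height))
    return low_hz + height * (high_hz - low_hz)

# e.g. a wrist detected 120 pixels from the top of a 480-pixel-tall frame
print(hand_to_pitch(120, 480))  # -> 715.0 Hz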

I put it on Glitch, which means you can remix it. Or take a peek at the code on GitHub.

There are lots more ml5.js examples you can try. Just put the index.html, script.js, and models folder (if there is one) somewhere on the web where the files can be served. Or put them on your local machine and run a simple "localhost" server.
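One easy way to get that local server, assuming Python 3 is installed, is to run this from the folder holding the files and then open http://localhost:8000 in your browser:

python3 -m http.server 8000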

My summer training program

This summer is all about training. Yes, I'm trying to run regularly, but I'm actually talking about training machine-learning algorithms.

I've been trying to learn machine learning for about three years — only to feel hopelessly overwhelmed. It was as though someone said, "With a chicken, a cow, and a field of wheat, you can make a lovely soufflé!"  

I took online classes, read books, and tried to modify sample code. But unless I devoted myself to the computer version of animal husbandry, it seemed, I was stuck.

Then someone at work mentioned fast.ai. It's a machine-learning library for Python that got me to the eggs-milk-flour stage, and provided some great starter recipes. Thanks to free guides and videos, I was soon baking algorithms that actually worked.

Now I want to get good, and experiment with different flavors and styles.

So this summer, I'm setting out to train and use new machine learning models, at least one each week. I'll try several techniques, use different kinds of data, and solve a variety of problems. It's a little like my Make Every Week project, providing constraints to inspire and motivate me.

I'll share what I learn, both here and at qz.ai, where the Quartz AI Studio is helping journalists use machine learning and where I get to practice machine learning at work.

In the fall I'll be teaching a few workshops and classes that will incorporate, I hope, some of the things I've learned this summer. If you'd like to hear about those once they're announced, drop your email address into the signup box on this page and I'll keep you posted.

Time to train!

You can't make your NY Times subscription online-only online

Our family believes in paying for good journalism, so we have a few subscriptions – including the New York Times.

When we signed up, we got online access along with physical papers delivered on the weekend. But we almost never read the paper version anymore, and thought it a waste. So today I went online to change my subscription to all-digital.

But you can't.

You must actually call the New York Times and speak to someone. I had to call two phone numbers and speak to two robots and two people. Altogether, it took me 15 minutes. Not forever, but the user experience was a C-minus at best.

Here's what I did:

A bot now updates my Slack status

One of my closest collaborators is a teammate far away — I'm in New York and Emily Withrow is in Chicago.

We stay connected chatting on Slack. But recently Emily asked if I could regularly update my Slack status to indicate what I was doing at the moment, like coding, meeting, eating. It's the kind of thing colleagues in New York know just by glancing toward my desk.

Changing my Slack status isn't hard; remembering to do it is. So I built a bot to change it for me.
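I won't walk through the bot's code here, but the heart of any version of it is a single call to Slack's users.profile.set Web API method. Here's a minimal Python sketch of that call, not my bot itself; the SLACK_TOKEN environment variable and the example status values are placeholders, and the token needs the users.profile:write scope:

import os
import requests

def set_slack_status(text, emoji):
    """Update my Slack status via the users.profile.set Web API method."""
    resp = requests.post(
        "https://slack.com/api/users.profile.set",
        headers={"Authorization": f"Bearer {os.environ['SLACK_TOKEN']}"},
        json={"profile": {"status_text": text, "status_emoji": emoji}},
    )
    resp.raise_for_status()
    return resp.json()

# e.g. set_slack_status("coding", ":computer:")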

New Kid on the Blockchain

UPDATED at 7:45 pm ET on 9/17/2018 with new information. See the end of the post for details.

It's my time to go crypto.

I've followed blockchain technology, principles and trends for years without getting involved, but now I have a couple of reasons to get real: A new blockchain-based journalism project is about to launch, and my employer, Quartz, just launched a new cryptocurrency newsletter.

It also seemed perfect for my practice of beginning new things repeatedly.

The inspiration

Earlier this year, friends Manoush Zomorodi and Jen Poyant left their public radio jobs to join a new journalism … thing … called Civil. I had heard snippets about Civil, and started listening to Manoush's and Jen's podcast, ZigZag, part of which attempts to explain it.

After weeks of being pretty confused, I think I get it. Here's my attempt: Civil is a system designed to foster and reward quality journalism in a decentralized way, in contrast to platforms like Facebook and Google upon which so much journalism rests today.

The system’s backbone is the blockchain-based Civil token, abbreviated CVL. Holders of tokens can start news organizations in the system, challenge the membership of other news organizations in the system and/or cast votes when such challenges arise.

I have no idea if it will work. But I’m interested, and I’d rather participate than watch from the sidelines. So I’m willing to give it a whirl and okay with losing a little money in the process.

To participate, I just needed to buy some CVL ... though it turns out there's no just about it. But that's okay, too.

Beginning as a Practice

[I recently presented this post as a 5-minute Ignite talk.]

On a morning flight some years back, the pilot's cheerful voice came over the speakers.

"I'm glad you're flying with us. This is the first time I've flown a Boeing 747,” the captain said with a pause. “Today."

We all laughed, of course. Who’d want to be on a pilot’s maiden flight?!

Not us. We want experts. Society counts on them. Companies pay them better. Spectators watch them play. Vacationers rely on their forecasts. We attend educational institutions and work long hours to become them — the qualified, the trusted, the best.

Nobody likes being a beginner.

Except that I do.