Converting videos to images for machine learning

This week I kept to my summer of training plan, however the model-building I did was for a Quartz project we're not ready to share. But! I learned something super useful in the process: how to quickly turn videos into many still images.

For our latest project, I'm training a model to identify specific objects available to me – much like how I trained a model to identify items in the office.

The fastest way to get lots of images of an object is to take a video of it. And a quick way to turn that video into images – called an "image sequence" – is ffmpeg. It seems to convert from many formats like .mp4, .mov, .avi to lots different image formats such as .jpg and .png.

There's plenty more detail in the ffmpeg docs, but here's what I did that worked so quickly on my Mac:

brew install ffmpeg

I use Homebrew to put things on my Mac, so this went pretty quickly. I had to update my Xcode command line tools, but Homebrew is super helpful and told me exactly what I needed to do.

Next, I did this from the Terminal:

ffmpeg -i IMG_1019.MOV -r 15 coolname%04d.jpg

Here's what's going on:

  • -i means the next thing is the input file
  • IMG_1019.MOV is the movie I Airdropped from my phone to my laptop
  • -r is the flag for the sample rate.
  • 15 is the rate. I wanted every other image, so 15 frames every second. 1 would be every second; 0.25 every 4th second.
  • coolname is just a prefix I picked for each image
  • %04d means each frame gets a zero-padded sequence number, starting with 0001 and going to 9999– so my image files are named coolname0001.jpg, coolname0002.jpg, coolname0003.jpg, etc.
  • .jpg is the image format I want. If I put .png I got PNGs instead.

In mere moments I had a dozens of JPG files I could use for training. And that's pretty great.