Artisanal AI: Detecting objects in our office

Off-the-shelf services such as Google Vision are trained to identify objects in general, like the car, vehicle, and road in the image below.

But many of the journalism projects we're encountering in the Quartz AI Studio benefit from custom-built models that identify very specific items. I recently heard Meredith Broussard call this kind of work "artisanal AI," which cracked me up and also fits nicely.

So as an experiment, and as part of my summer training program, I trained an artisanal model to tell apart three objects from the Quartz offices, pictured at the top of this page: a Bevi water dispenser, a coffee urn, and a Quartz Creative arcade game. (Don't you wish you had one of those?!)

I also made a little website where my colleagues and I can test the model. You can, too — though you'll have to come visit to get the best experience!

The results

The model is 100% accurate at identifying the images I fed it, which probably isn't all that surprising. It's based on an existing model called resnet34, which was trained on the ImageNet data set to distinguish among 1,000 categories of things. Using a technique called transfer learning, I taught that base model to put all of its existing power toward distinguishing just three objects.

But I also needed to know when something is none of those three things. For example, of the three objects, the model thinks this fire extinguisher is closest to a coffee urn:

arcade: 0.1335
bevi: 0.1811
coffee: 0.6854

But as you can see, it's only 69% certain it's a coffee urn, which is not very certain. So I added some code that essentially says "none of the above" when the top score falls below a certainty threshold.

Apparently, an upside-down picture of me also registers as most like a coffee urn — but doesn't meet my threshold for certainty.
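The "none of the above" logic can be sketched in a few lines of Python. The post doesn't give the actual cutoff or code, so the 0.8 threshold and the `label_or_none` name below are hypothetical; the scores are the fire extinguisher numbers from above.

```python
# Hypothetical cutoff: the post only says 69% wasn't certain enough.
CONFIDENCE_THRESHOLD = 0.8


def label_or_none(probs: dict[str, float], threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return the class with the highest score, or "none of the above"
    if even that score falls below the threshold."""
    best_class = max(probs, key=probs.get)
    if probs[best_class] < threshold:
        return "none of the above"
    return best_class


# Scores the model gave the fire extinguisher photo
extinguisher = {"arcade": 0.1335, "bevi": 0.1811, "coffee": 0.6854}
print(label_or_none(extinguisher))                 # none of the above
print(label_or_none(extinguisher, threshold=0.5))  # coffee
```

Where to set the threshold is a judgment call: too high and real Bevi or arcade photos get rejected, too low and fire extinguishers start passing as coffee urns.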

You could imagine a model like this helping journalists spot specific patterns in thousands of satellite images, as was done for this investigation into illegal amber mining in Ukraine.

My steps

I've been learning how to use the Python library mainly through this amazing and free online course, and my steps are based on what I learned there:

  1. Set up a Jupyter notebook on a computer with a GPU. I used an Amazon EC2 setup because I'm familiar with that, but there are several easier ways listed in the course documents.
  2. Turned short videos of the objects into hundreds of separate images for training. Quartz's Ankur Thakkar did this with Adobe Premiere, I believe. The ridiculously large set of these images is here (it's 2.2 gigabytes). We later realized picking every 10th frame works just fine.
  3. Put those images in three separate folders named for their objects, and put those all in a folder called "office."
  4. Loaded the data into a "data block," adding nifty tweaks to randomize the lighting, rotation, and other aspects of the images to make the model more resilient.
  5. Trained the model on that data, using resnet34 as the base.
  6. Exported the model as a single file.
  7. Put that model file online at a public URL (it's in an Amazon S3 bucket, but could be a public Dropbox or Google Drive file).
  8. Updated this repo with the URL for that model and the "classes" the model looks for, which in this case are ['arcade', 'bevi', 'coffee'].
  9. Deployed that repo to a hosting service called Render.
  10. Gave it a whirl.
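Steps 2 and 3 can be sketched in plain Python. This is a minimal sketch, assuming the video frames have already been exported as image files into one folder per object (the `exports/arcade`, `exports/bevi`, `exports/coffee` layout and both helper names are hypothetical) with zero-padded filenames, so that sorting by name puts them in frame order. It keeps every 10th frame and copies the keepers into `office/<object>/`.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def copy_every_nth_frame(frames_dir: Path, dest_dir: Path, n: int = 10) -> int:
    """Copy every nth frame (in filename order) from frames_dir into
    dest_dir. Assumes zero-padded names like frame_0001.jpg."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    frames = sorted(
        p for p in frames_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    kept = frames[::n]  # frames 0, n, 2n, ...
    for frame in kept:
        shutil.copy2(frame, dest_dir / frame.name)
    return len(kept)


def build_office_dataset(exports_root: Path, office_root: Path, n: int = 10) -> dict[str, int]:
    """Mirror each per-object folder in exports_root into office_root,
    keeping every nth frame. Folder names become the class labels."""
    counts = {}
    for obj_dir in sorted(p for p in exports_root.iterdir() if p.is_dir()):
        counts[obj_dir.name] = copy_every_nth_frame(obj_dir, office_root / obj_dir.name, n)
    return counts
```

Calling `build_office_dataset(Path("exports"), Path("office"))` would return how many frames were kept per object. Copying rather than moving leaves the full export untouched, so you can try a different `n` later without re-exporting the videos.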

More detailed steps

First off, I'm going to figure out a better way to share these things. So if you return in a week or two, there may be an easier way to play with my code.

Until then, if you'd like to replicate or build on my work, your best bet is to set up a Jupyter notebook on a GPU using one of the methods described here.

Then you can download this Jupyter notebook, which has all of my steps. You'll also need the ridiculously large set of images (again, 2.2 GB).
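Before training on the downloaded images, it can help to confirm they're laid out the way the step list describes: one folder per object inside `office`. This is a minimal sketch, assuming the folder names match the classes list `['arcade', 'bevi', 'coffee']`; the `count_images` name and the accepted file extensions are hypothetical choices, not part of the original notebook.

```python
from pathlib import Path

# Class names from the post; folder names are assumed to match them
CLASSES = ["arcade", "bevi", "coffee"]
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(office_root: Path, classes: list[str] = CLASSES) -> dict[str, int]:
    """Count images in office_root/<class>/, raising an error if a
    class folder is missing or holds no images."""
    counts = {}
    for name in classes:
        folder = office_root / name
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")
        n = sum(1 for p in folder.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
        if n == 0:
            raise ValueError(f"no images in {folder}")
        counts[name] = n
    return counts
```

Running `count_images(Path("office"))` before training makes a typo in a folder name fail loudly, instead of quietly producing a model with the wrong set of classes.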

Let me know if you build anything!