Heel, Rotson! My list of computer-generated dog names

Shadoopy. Dango. Ray-Bella. Figgie.

If I told you those were names of actual dogs in New York City, would you believe me?

They're not. They were generated by a machine-learning algorithm mimicking dog names after it "studied" a list of 81,542 dogs registered in NYC.

The experiment, which took just a few hours Saturday, was something I've wanted to try since I saw the playful, awesome work of Janelle Shane and her experiments using neural networks to generate paint colors, guinea pig names and Harry Potter fan fiction.

I happened to have some free time, and decided to give it a shot. Along the way I:

built, in mere minutes, a computer in the cloud powerful enough for machine learning
made and played with a recurrent neural network
learned a little more about machine learning
had a lot of fun

The program generated lots of names, including many that existed in the original data. Once I filtered those out, I had almost 400 computer-created, mostly plausible dog names. Here are some of my favorites:

Rotson
Dudly
Lenzy
Murta
Cookees
Geortie
Dewi
Chocobe
Sckrig
Booncy
Cramp
Dango
Ray-Bella
Santha
Coocoda
Satty
Bronz
Shadoopy
Mishtak
Figgie
Grimby
Phince
Bum-Charmo
Soma
Blant
Snowflatey

If you'd like to geek out about how I did this, read on. You can do it, too.

The Software

Shane describes using char-rnn for her projects. "Char-rnn all the things," she says.

Char-rnn ALL THE THINGS! https://t.co/dsL7PoVyrz
— Janelle Shane (@JanelleCShane) May 12, 2017

The char-rnn documentation is pretty great and has some helpful info. It also suggests using torch-rnn as a new, faster way to do this character-based learning. I knew I'd have access to Torch on my cloud computer, so I went with torch-rnn.
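Like char-rnn, torch-rnn learns one character at a time. It never sees "names," only a stream of letters and line breaks, and it is trained to guess the next character. Below is a minimal, hypothetical illustration of that framing in plain Python. It is not torch-rnn's actual preprocessing code (which also writes HDF5 and JSON files), and the three-name list is made up.

```python
# Sketch: how a character-level model "sees" a list of names.
# Not torch-rnn's own code; the tiny name list is made up for illustration.
from __future__ import annotations


def build_vocab(text: str) -> dict[str, int]:
    """Give every distinct character (newline included) an integer index."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Turn a string into the sequence of indices the network is trained on."""
    return [vocab[ch] for ch in text]


def next_char_pairs(encoded: list[int]) -> list[tuple[int, int]]:
    """At each position, the training target is the character that comes next."""
    return list(zip(encoded, encoded[1:]))


if __name__ == "__main__":
    text = "Max\nBella\nRocky\n"  # hypothetical training text, one name per line
    vocab = build_vocab(text)
    ids = encode(text, vocab)
    print(len(vocab), "distinct characters")
    print(next_char_pairs(ids)[:3])
```

Because the newline is just another character, a model trained this way also picks up where names tend to end.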

The Hardware

All the cool kids are using GPUs -- graphics processing units -- for machine learning. Unlike the CPU (central processing unit) that runs your laptop or desktop, GPUs are chips designed for the fast, simultaneous computations needed for video, graphics, and gaming. And that power happens to be excellent for machine learning programs.

Buying, configuring and running a computer to use GPUs for machine learning can be tricky. I've tried. But at an artificial intelligence conference last week I learned that I could spin up a fully configured computer with all the necessary gear in the cloud. And it takes about 3 minutes to set up.

The "cloud" computer actually lives in the Amazon Web Services (AWS) system. Learning the AWS landscape and its plethora of three-character creatures is trickier than it should be. If you've never done it, it'll take some work. (Here's how I teach it to college students.) But for anyone already comfortable starting an Elastic Compute Cloud (EC2) instance, it's pretty straightforward.

I went to the EC2 section of my AWS dashboard and picked "Launch instance."
In step 1, "Choose an Amazon Machine Image (AMI)," I clicked on "AWS Marketplace" on the left side of the page.
I searched for "deep learning."
I picked the Amazon-branded "Deep Learning AMI Ubuntu Version." This has a ton of current machine learning packages and drivers already on board.
In step 2, "Choose an Instance Type," I picked "p2.xlarge," which comes with NVIDIA graphics hardware; the Deep Learning AMI supplies the CUDA drivers to run it.
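If you'd rather script those console clicks, here's a rough sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed and your AWS credentials are configured. The AMI ID, key pair name and region are hypothetical placeholders; the Deep Learning AMI's ID differs by region and version, so look it up first.

```python
# Sketch: launch one p2.xlarge from a Deep Learning AMI with boto3.
# Hypothetical placeholders: the AMI ID, key pair name and default region.


def launch_params(ami_id: str, key_name: str, instance_type: str = "p2.xlarge") -> dict:
    """Keyword arguments for the EC2 RunInstances call (one instance)."""
    return {
        "ImageId": ami_id,              # the Deep Learning AMI for your region
        "InstanceType": instance_type,  # p2.xlarge: one NVIDIA GPU
        "KeyName": key_name,            # an existing key pair, for ssh access
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(ami_id: str, key_name: str, region: str = "us-east-1") -> str:
    """Start the instance and return its ID. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so launch_params() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**launch_params(ami_id, key_name))
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # Print the request instead of sending it; "ami-XXXXXXXX" is a placeholder.
    print(launch_params("ami-XXXXXXXX", "my-key-pair"))
```

You'd still need a security group that allows ssh before you can log in. The console wizard walks you through that; this sketch leaves it out.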

If you try this yourself, be careful with the instance size you pick! The p2.xlarge is about a dollar an hour. The next size up is $7.20 an hour. Also, you'll want to be sure to "stop" the instance when you are done playing -- a dollar an hour adds up quickly!
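To put numbers on that warning, here's a small sketch of the arithmetic, plus a boto3 call that stops an instance from a script. The rates are just the approximate figures above, and I'm assuming "the next size up" means p2.8xlarge. Real prices vary by region and change over time, so check AWS's pricing page.

```python
# Sketch: rough cost of leaving an instance running, and stopping it.
# Rates are the approximate on-demand figures quoted in the post (assumption:
# "the next size up" is p2.8xlarge); check current AWS pricing.

HOURLY_RATE_USD = {
    "p2.xlarge": 1.00,
    "p2.8xlarge": 7.20,
}


def running_cost(instance_type: str, hours: float) -> float:
    """Estimated compute charge, in dollars, for `hours` of running time."""
    return round(HOURLY_RATE_USD[instance_type] * hours, 2)


def stop(instance_id: str, region: str = "us-east-1") -> None:
    """Stop (not terminate) an instance: compute billing ends, its disk is still billed."""
    import boto3  # needs boto3 and AWS credentials

    boto3.client("ec2", region_name=region).stop_instances(InstanceIds=[instance_id])


if __name__ == "__main__":
    print(running_cost("p2.xlarge", 24 * 7))   # forgotten for a week
    print(running_cost("p2.8xlarge", 24 * 7))
```

By that math, a p2.xlarge forgotten for a week comes to roughly $168, and the bigger instance to about $1,210.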

The Data

I used the dog-license data obtained through a public-records request from the NYC health department back in 2012, which we used for the WNYC Dogs of NYC project and made public shortly thereafter.
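torch-rnn trains on a single plain-text file. Here's a minimal sketch of turning a spreadsheet of licenses into that kind of file, one name per line. The column header "dog_name" is hypothetical, so check it against the actual data before running this.

```python
# Sketch: pull the name column out of a license CSV and write one name per line.
# Hypothetical: the "dog_name" column header and the sample rows below.
import csv
import io


def names_from_csv(csv_file, column: str = "dog_name") -> list:
    """Return the non-blank, whitespace-trimmed values of one CSV column."""
    names = []
    for row in csv.DictReader(csv_file):
        name = (row.get(column) or "").strip()
        if name:
            names.append(name)
    return names


def write_training_text(names, out_file) -> None:
    """Write one name per line. Repeats are kept, so popular names weigh more."""
    for name in names:
        out_file.write(name + "\n")


if __name__ == "__main__":
    sample = io.StringIO("dog_name,breed\nMax,Beagle\n  Bella ,Pug\n,Mutt\n")
    out = io.StringIO()
    write_training_text(names_from_csv(sample), out)
    print(out.getvalue(), end="")
```

With a real file, open the CSV with `open(path, newline="")` and write the output to whatever text file you plan to feed torch-rnn.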

The Process

Once I spun up the Amazon computer, I roughly followed the instructions in the torch-rnn repo. Since Torch is already loaded on that computer, but not fully installed, I had to add some steps and eliminate some others.
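The torch-rnn README boils the work down to three stages: preprocess the text, train, then sample. Here's a sketch that strings them together from Python. The script names and flags come from that README; the file names, the checkpoint number (it depends on how long training runs) and the sample length are placeholders. By default it only prints the commands.

```python
# Sketch: the three torch-rnn stages, run from the torch-rnn checkout.
# Script names and flags follow the torch-rnn README; file names, the
# checkpoint number and the sample length are hypothetical placeholders.
import subprocess


def pipeline_commands(prefix: str = "dognames",
                      checkpoint: str = "cv/checkpoint_10000.t7",
                      length: int = 2000) -> list:
    """Return the command line for each stage, in the order they run."""
    return [
        ["python", "scripts/preprocess.py",
         "--input_txt", f"data/{prefix}.txt",
         "--output_h5", f"data/{prefix}.h5",
         "--output_json", f"data/{prefix}.json"],
        ["th", "train.lua",
         "-input_h5", f"data/{prefix}.h5",
         "-input_json", f"data/{prefix}.json"],
        ["th", "sample.lua",
         "-checkpoint", checkpoint,
         "-length", str(length)],
    ]


def run(commands, dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(pipeline_commands())
```

Sampling writes generated text to standard output, so in practice you'd redirect it to a file to collect the candidate names.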

For all of the steps I took, check out the process notes I wrote along the way. My work, along with the data, is in this dognames branch of my fork of the torch-rnn repo.
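Near the top I mentioned filtering out generated names that already existed in the original data. Here's a minimal sketch of that step, which may differ from how it was actually done: it assumes names are compared case-insensitively and also drops repeats within the generated batch.

```python
# Sketch: keep only generated names that never appear in the original data.
# Assumptions: case-insensitive matching, and duplicates in the output dropped.


def new_names(generated_lines, original_names) -> list:
    """Return unseen generated names, in the order they were generated."""
    known = {name.strip().lower() for name in original_names}
    seen = set()
    fresh = []
    for line in generated_lines:
        name = line.strip()
        key = name.lower()
        if name and key not in known and key not in seen:
            seen.add(key)
            fresh.append(name)
    return fresh


if __name__ == "__main__":
    generated = ["Max", "Rotson", "rotson", "", "Bella ", "Dango"]
    print(new_names(generated, ["max", "Bella"]))
```

Both arguments can be open text files, since iterating a file yields lines and the function strips the trailing newlines.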

Woof!

Photo taken in New York City's Inwood Hill Park by Steve Guttman, used under creative commons from Flickr.

johnkeefe.net

journalism, data & diy hackery