06 Jan

Fun With New York City Birth Names

Mayor Bloomberg and New York City have made data from many city agencies available to the public through NYC Open Data. The city has even launched a competition, BigApps, with prizes of up to $10,000 for app developers. Submissions for the current round, BigApps 3.0, are due January 25th, so if you want in, you’d better get started.

I was playing around with the site when two data sets caught my eye: 2009 Popular Birth Names (Male) and 2009 Popular Birth Names (Female). I’ve always found name frequency data fascinating. In fact, I’ve wasted far too much time over the years at the Baby Name Voyager (contrast the precipitous drop in Gertrudes and Ethels with the spike in Madisons). So, I decided to have some fun with the New York City name data.

For starters, I wondered how New York compared to the rest of the country. Here are the ten most popular names of 2009 in New York City and nationwide. Red names didn’t make the top 15 in the other column, while yellow names didn’t crack the top 10. New Yorkers, apparently, really love the name Jayden. We’re also way out of sync when it comes to female names.
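For anyone who wants to reproduce the color coding, here is a minimal Python sketch. It assumes my reading of the legend: "yellow" means a name made the other column's top 15 but not its top 10. The function `flag_names` and the toy lists are hypothetical and are not the real 2009 rankings.

```python
def flag_names(column, other_column, top_n=10, outer_cutoff=15):
    """Color-code the first `top_n` names of `column` against `other_column`.

    Both arguments are ranked lists, most popular first. Each returned pair is
    (name, color), where color is:
      "red"    if the name is not in the other list's top `outer_cutoff`,
      "yellow" if it is in the other top `outer_cutoff` but not its top `top_n`,
      None     if it is in the other list's top `top_n` as well.
    """
    other_inner = set(other_column[:top_n])
    other_outer = set(other_column[:outer_cutoff])
    flagged = []
    for name in column[:top_n]:
        if name not in other_outer:
            color = "red"
        elif name not in other_inner:
            color = "yellow"
        else:
            color = None
        flagged.append((name, color))
    return flagged


# Toy ranked lists (placeholders, not the real 2009 rankings).
nyc = ["Ava", "Ben", "Cy"]
usa = ["Ava", "Cy", "Dee"]
print(flag_names(nyc, usa, top_n=2, outer_cutoff=3))
# prints [('Ava', None), ('Ben', 'red')]
```

Run it once with the NYC list as `column` and once with the national list as `column` to color both sides of the comparison.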

Next, I wondered how letter frequency in names differs from letter frequency in typical English text. Fortunately, Wikipedia has an entire entry devoted to the topic. Here’s a table that compares letter frequency in male, female, and all names against typical text. (Unsurprisingly, measuring letter frequency in English turns out to be a thorny topic. What type of text do you use? The front page of the New York Times, as Alfred Mosher Butts did when working out the letter distribution for Scrabble? We won’t go there.)
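Here is a minimal sketch of how one column of a letter-frequency table like this might be computed. It assumes each name counts once unless you pass per-name weights (for example, the number of babies given that name); the post doesn't say which weighting the table uses. The function `letter_frequency` is hypothetical.

```python
import string
from collections import Counter


def letter_frequency(names, weights=None):
    """Return {letter: percent of all letters} for A-Z across `names`.

    Case is ignored and non-letters (hyphens, apostrophes, spaces) are skipped.
    If `weights` is given, it must line up with `names`; each name's letters
    are counted that many times (e.g. once per baby given the name).
    """
    if weights is None:
        weights = [1] * len(names)
    counts = Counter()
    for name, weight in zip(names, weights):
        for ch in name.upper():
            if ch in string.ascii_uppercase:
                counts[ch] += weight
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: 100 * counts[letter] / total for letter in string.ascii_uppercase}


freq = letter_frequency(["Amanda", "Sarah"])
print(round(freq["A"], 1))  # 5 of the 11 letters are A
# prints 45.5
```

Running it separately on the male list, the female list, and both combined gives three columns that can be set against a published table for typical English text.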

The single most interesting row in this table is A. One in five letters in NYC female names is an A? Flipping through the raw data reveals why: Amanda, Sarah, Samantha, Kayla, and Maya all cracked the top 20. Also of interest is how strongly some letters skew female or male: “female letters” include A, I, L, and Y, while “male letters” include D and O. Here’s the full table:

Lastly, I was curious how the first letters of names compare to the first letters of all English words. T starts a whopping 16.7% of English words, yet only 3.1% of popular names in New York City. Conversely, the least common word-starters (X, Z, Q, V, and J) together start just 1.5% of English words, yet punch way above their weight with names, starting 12.8% of them. Even excluding J, these letters are twice as likely to start a name as they are to start a word.
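A minimal sketch of the first-letter comparison, assuming each name counts once (the post doesn't say whether its percentages are weighted by births). The function `first_letter_share` and the toy name list are hypothetical; the last line just divides the post's own two figures to show how far above their weight those letters punch.

```python
def first_letter_share(names, letters):
    """Percentage of non-empty `names` that start with one of `letters`."""
    wanted = {letter.upper() for letter in letters}
    names = [name for name in names if name]
    if not names:
        return 0.0
    hits = sum(1 for name in names if name[0].upper() in wanted)
    return 100 * hits / len(names)


# Toy list of names (placeholders, not the NYC data).
print(first_letter_share(["Jayden", "Zoe", "Tom", "Ava"], "XZQVJ"))
# prints 50.0

# Using the post's own figures: 12.8% of names vs. 1.5% of words.
print(round(12.8 / 1.5, 1))
# prints 8.5
```

In other words, as a group X, Z, Q, V, and J are roughly eight and a half times more likely to start a popular NYC name than an English word.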

There’s a lot of other cool stuff on the NYC Open Data site, so stay tuned. Who knows what’s next… 311 call data? Campaign contributions? Electricity consumption?

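PS: For the Excel geeks out there, I found a neat way to count how many times a token appears in a string without using VBA. SUBSTITUTE deletes every occurrence of the token, so the drop in length gives the count: =LEN([WORD])-LEN(SUBSTITUTE([WORD],[TOKEN],"")). That works as-is for single letters; for longer tokens, divide the result by LEN([TOKEN]). Note that SUBSTITUTE is case-sensitive, so the capital A in "Amanda" won’t match "a" unless you wrap [WORD] in LOWER().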
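The same length-difference trick carries over directly to Python. This is a minimal sketch with a hypothetical `count_token` helper; it divides by the token's length so multi-character tokens are counted correctly, and, like SUBSTITUTE, it is case-sensitive.

```python
def count_token(word, token):
    """Count non-overlapping occurrences of `token` in `word`.

    Mirrors the Excel trick: delete every occurrence of the token, measure how
    much shorter the string got, and divide by the token's length.
    """
    if not token:
        raise ValueError("token must be non-empty")
    removed = len(word) - len(word.replace(token, ""))
    return removed // len(token)


print(count_token("Samantha", "a"))        # prints 3
print(count_token("Amanda", "a"))          # prints 2 (case-sensitive, like SUBSTITUTE)
print(count_token("Amanda".lower(), "a"))  # prints 3
print(count_token("banana", "an"))         # prints 2
```

In Python you would normally just call `str.count`, which gives the same non-overlapping count; the sketch is only meant to show why the spreadsheet formula works.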