Some time ago, I heard a statistic about the number of Chinese cities with more than one million people that surprised me. The original purpose of this post was to dig up those statistics for myself, and make the same point—with an elegant data visualization—in a way that would surprise you. I think I made it about a quarter of the way there. You can see the results below, or live over here.
For my money, the line graph and the so-called “pack” are the two most compelling visualizations I created. The line graph makes a strong point about both city size and count at the same time, especially when you consider that it mixes continents and continent-sized countries. The pack is a favorite because I did not realize you could generate something so powerful with so few lines of code. Note that these depict all urban areas with more than two million people.
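Part of why the pack takes so little code is that d3's pack layout only needs a nested hierarchy: a root whose `children` are groups, whose own `children` are leaves that carry a size. Below is a minimal sketch of the data-prep step. It assumes the raw data is a flat list of (continent, city, population) rows. The function name `build_pack_hierarchy`, the `"world"` root label, and the sample rows are hypothetical, not the data behind the charts in this post.

```python
import json
from collections import defaultdict


def build_pack_hierarchy(records, min_population=2_000_000):
    """Group (continent, city, population) rows into the nested
    {"name": ..., "children": [...]} shape a d3 pack layout reads.

    Leaves carry a "value" field. Cities below min_population are dropped,
    mirroring the two-million cutoff used for the charts.
    """
    by_continent = defaultdict(list)
    for continent, city, population in records:
        if population >= min_population:
            by_continent[continent].append({"name": city, "value": population})
    # Sort continents so the JSON output is stable from run to run.
    return {
        "name": "world",
        "children": [
            {"name": continent, "children": cities}
            for continent, cities in sorted(by_continent.items())
        ],
    }


# Hypothetical placeholder rows, not real population figures.
rows = [
    ("Asia", "City A", 12_000_000),
    ("Asia", "City B", 3_500_000),
    ("North America", "City C", 4_600_000),
    ("North America", "City D", 900_000),  # below the cutoff, so it is dropped
]

print(json.dumps(build_pack_hierarchy(rows), indent=2))
```

The resulting JSON can then be handed to d3. In d3 v4 and later that means `d3.hierarchy(data).sum(d => d.value)` followed by `d3.pack()`, while older versions used `d3.layout.pack()`, which reads `children` and `value` by default.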



There were two things that ended up taking some time to sort through. First, a city’s population is a fuzzy number. Commonly reported figures in the US include:
- The number of people within a city’s legally defined limits;
- An urban core area: “a densely settled core of census tracts and/or census blocks that meet minimum population density requirements”; or
- A metropolitan statistical area: “one or more adjacent counties or county equivalents that have at least one urban core area of at least 50,000 population, plus adjacent territory that has a high degree of social and economic integration.”
To bring these definitions to life, consider Boston. Boston proper is the 21st-largest city in the US, with a population of about 625,000. This figure excludes 106,000 residents of the Republic of Cambridge, 77,000 residents of Somerville, and many others. By contrast, the metropolitan statistical area, which includes Cambridge, Quincy, and even parts of New Hampshire, is the 10th-largest in the US, with 4.6 million residents. Rather than worry too much about these definitions and how they work across countries and continents, I let Wikipedia decide for me.
The original statistic that prompted this post, by the way, is that China has over 160 cities with more than one million people. In this case, a “city” includes both the urban core area and the surrounding suburbs. The United States has 51 metropolitan statistical areas (MSAs) with at least one million people, the smallest of which is Rochester, NY.
When I finally had some workable raw data in hand, I set out to find a new tool for the visualizations. It was time to move on from Excel, and I didn’t think R would cut it. As luck would have it, we hosted a Lot18 hackathon at FirstMark’s office last week, and one of Lot18’s engineers told me about a JavaScript library called d3. Reload the d3 homepage a few times to see some of the cool things you can do with it. Using this new toy was so much fun that I ended up creating six OK visualizations rather than one really good one. What’s most exciting, though, is having a powerful new tool for future posts. In any case, here are the rest of the visualizations I created with d3:



