Who's On First is a gazetteer of places. Not quite all the places in the world but a whole lot of them and, we hope, the kinds of places that we mostly share in common.
A gazetteer is a big list of places, each with a stable identifier and some number of descriptive properties about that location.
An interesting way to think about a gazetteer is to consider it as the space where debate about a place is managed but not decided. We call our gazetteer Who's On First (or sometimes "WOF" for short). According to Wikipedia, Who’s on First:
...is a comedy routine made famous by Abbott and Costello. The premise of the sketch is that Abbott is identifying the players on a baseball team for Costello, but their names and nicknames can be interpreted as non-responsive answers to Costello's questions. For example, the first baseman is named "Who"; thus, the utterance "Who's on first" is ambiguous between the question ("Which person is the first baseman?") and the answer ("The name of the first baseman is 'Who'"). "Who's on First?" is descended from turn-of-the-century burlesque sketches that used plays on words and names. Examples are "The Baker Scene" (the shop is located on Watt Street) and "Who Dyed" (the owner is named Who). In the 1930 movie Cracked Nuts, comedians Bert Wheeler and Robert Woolsey examine a map of a mythical kingdom with dialogue like this: "What is next to Which." "What is the name of the town next to Which?" "Yes." In English music halls (Britain's equivalent of vaudeville theatres), comedian Will Hay performed a routine in the early 1930s (and possibly earlier) as a schoolmaster interviewing a schoolboy named Howe who came from Ware but now lives in Wye.
Which sort of sums up the problem of geo, nicely. It might be easier, perhaps, if we all understood and experienced the world as coordinate data but we don’t, so the burden of “place” and its many meanings is one we trundle along with to this day.
Our gazetteer is absolutely not finished – both in terms of data coverage as well as data quality – so, in the near-term, you should adjust your expectations accordingly when you approach the data. We are releasing the data now because we believe it is important not just to articulate our goals and intentions around the project but also to back them up with tangible proofs.
Tools and documentation for Who's On First are available from our GitHub repository:
There is also a public spelunker, a tool for browsing the data, and a get lat lon style application for investigating specific points on the big planet-ball we all live on.
And if you feel like getting into the details, there is a very thorough and very long introductory blog post about Who's On First, followed by another twenty-or-so thousand words (and counting) of thinking about the project as it evolves:
When we source other open data projects we make a best effort to indicate them (e.g.: src:geom:naturalearth). We also include the original source's properties, prefixed with a suitable namespace. A complete list of sources and their namespaces is maintained in a separate whosonfirst-sources GitHub repository:
Please notify us if you believe that an open data project has not been properly noted. Our original work is generally indicated with properties that are either prefixed with wof: or not prefixed at all (like "name").
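To make the prefix convention described above a little more concrete, here is a minimal Python sketch that groups a record's properties by namespace. It is an illustration, not part of the Who's On First tooling. The function name group_by_namespace, the sample record, and every key other than "name" and the wof: prefix are assumptions made up for the example. In particular, "wof:id", "ne:example_field", and the reading of src:geom:naturalearth as a "src:geom" key with the value "naturalearth" are guesses about shape, not documented fields.

```python
from collections import defaultdict


def group_by_namespace(properties):
    """Group property keys by the text before the first ':'.

    Keys with no ':' at all (like "name") are filed under the empty string.
    """
    groups = defaultdict(dict)
    for key, value in properties.items():
        namespace, separator, _ = key.partition(":")
        if not separator:
            namespace = ""  # unprefixed property
        groups[namespace][key] = value
    return dict(groups)


# Hypothetical record: these keys and values are invented for illustration.
example_properties = {
    "name": "Example Place",
    "wof:id": 1,
    "src:geom": "naturalearth",
    "ne:example_field": "some value from the original source",
}

grouped = group_by_namespace(example_properties)
for namespace, props in grouped.items():
    label = namespace or "(unprefixed)"
    print(label, sorted(props))
```

Under this reading, anything filed under "wof" or under the unprefixed group would generally be original work, and the other groups would point back to a source listed in the whosonfirst-sources repository.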
Crediting Who's On First is recommended and linking back to the License is required. For example:
Data from Who's On First. License.
The Who's On First dataset is both original work and a modification of existing open data. Some of those open data projects do require attribution. We have listed all sources in the full license file.
Remember, some sources require attribution, some do not. Mapzen's original work, including the format and structure that allows Who's On First to operate, is made available under a Creative Commons Zero designation, and a shout out would be lovely.