Data Principles

Who’s On First data is guided by two principles.

  1. The canonical URL for any given Who's On First ID is relative. This might seem counter-intuitive to the point of even being contradictory.
  2. Given any Who's On First ID it should be possible to generate a (relative) URL for that record using a simple and well-defined formula.

The current model is to split a Who's On First ID into three-digit chunks representing nested subdirectories, followed by a filename consisting of the ID followed by .geojson.

For example, the ID for Montréal is 101736545, which becomes 101/736/545/101736545.geojson.
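The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's own implementation: the function name id_to_relative_path is hypothetical, and the handling of IDs whose length is not a multiple of three (a shorter final chunk) is an assumption that the text above does not spell out.

```python
def id_to_relative_path(wof_id: int) -> str:
    """Build the relative path for a Who's On First ID.

    The ID is split into three-digit chunks (nested subdirectories),
    followed by a filename made of the full ID plus ".geojson".
    """
    if wof_id < 0:
        raise ValueError("expected a non-negative integer ID")

    digits = str(wof_id)

    # Assumption: chunks are taken from the left, so the last chunk
    # may be shorter than three digits.
    chunks = [digits[i:i + 3] for i in range(0, len(digits), 3)]

    return "/".join(chunks + [digits + ".geojson"])


# The Montréal example from the text.
print(id_to_relative_path(101736545))  # 101/736/545/101736545.geojson
```

Because the path depends only on the ID, anyone holding an ID can compute where the record lives without consulting an index.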

As of this writing, it is clear that this approach (lots of tiny files parented by lots of nested directories) can be problematic. We may be forced to choose another approach, such as fewer subdirectories, but nothing has been decided and anything we do will be backwards compatible. By definition, this means anything we do (to rewrite or redirect any existing URLs that people are using) should be possible for anyone else hosting their own copy of the data.

And that's the important bit: It should be possible for multiple groups to host their own copy of the Who's On First data while still maintaining stable references to any place in the dataset.

For example, consider two organizations, each with their own domain. The URLs they publish for a record may be completely different, but both refer to the city of Montréal, or Who's On First ID 101736545.

Even though each organization hosts their own copy of the Who's On First dataset — and the reasons for doing so are entirely their own business — they still have a simple and unintrusive way to preserve parity when referring to places.
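As a rough sketch of how that parity works in practice, the Python below joins the same relative path onto two different base URLs and then recovers the ID from each result. The two hosts are placeholders on reserved example domains, not real Who's On First endpoints, and the helper names (relative_path, absolute_url, id_from_url) are hypothetical.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def relative_path(wof_id: int) -> str:
    """Three-digit chunks as subdirectories, then <id>.geojson."""
    digits = str(wof_id)
    chunks = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "/".join(chunks + [digits + ".geojson"])


def absolute_url(base: str, wof_id: int) -> str:
    """Resolve the relative path against one host's base URL."""
    # urljoin drops the last path segment unless the base ends in "/".
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, relative_path(wof_id))


def id_from_url(url: str) -> int:
    """Recover the ID from the filename, whatever the host or prefix."""
    filename = posixpath.basename(urlparse(url).path)
    stem, ext = posixpath.splitext(filename)
    if ext != ".geojson" or not stem.isdigit():
        raise ValueError(f"not a Who's On First record URL: {url}")
    return int(stem)


# Hypothetical hosts: placeholders, not real data endpoints.
HOSTS = ["https://example.com/data/", "https://places.example.org/wof"]

urls = [absolute_url(host, 101736545) for host in HOSTS]
for url in urls:
    print(url, "->", id_from_url(url))
```

The URLs differ in host and prefix, but both end in the same relative path and resolve to the same ID, which is all two parties need to agree that they are talking about the same place.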

Mapzen publishes Who's On First data under its own canonical URL, and we expect that these URLs will become canonical for other people by virtue of our efforts around the project. But we've tried to design things in such a way that this doesn't have to be the case for everyone.