“Who's On First” - Mapzen's New Geographic Directory / Sudo Null IT News

Short version

All the administrative units! While everything is still raw and complicated!!! Today only!!!

Long version

Mapzen is building a geographic directory of administrative units. Not all of them, but the overwhelming majority and, we hope, most of their kinds. A geographic directory is a very large list of administrative units, each of which has a stable identifier and a number of properties that describe its location. It is interesting to think of the directory as a space where debates about administrative units are held, but not resolved. We call our directory "Who's On First" or, for short, "WOF".

According to Wikipedia, Who's on First:

... is a comedy routine made famous by Abbott and Costello. The premise is that Abbott is identifying the players on a baseball team for Costello, but their names and nicknames can be interpreted as meaningless answers to Costello's questions. For example, the first baseman is named "Who"; therefore, to the ear, "Who's on first" is both the question ("Which player is on first base?") and the answer ("The name of the player on first base is Who"). "Who's on First" comes from the burlesque sketches of the beginning of the last century, which played on words and names. For example, "The Baker Scene" (the shop is located on Watt Street, which sounds like "What street") and "Who Dyed" (the owner's name is Who). In the 1930 film Cracked Nuts, comedians Bert Wheeler and Robert Woolsey study a map of a mythical kingdom with approximately the following dialogue: "What is the town next to Which?" "Yes." In the English music halls (the British analogue of vaudeville theaters) of the early 1930s, comic Will Hay played a scene in which a schoolmaster interviews a schoolboy named Howe, who came from Ware but now lives in Wye.

The name neatly underlines one of the "problems" of geography. Of course, it would be simpler if we could perceive the world only as a set of coordinates. But we cannot, and the burden of the "administrative unit", with everything that term implies, is still with us to this day.

Our work is very far from complete (both in data coverage and in data quality), so in the near future you should not expect too much when using it.

We are publishing the data now because it is important for us not only to state our goals and intentions, but also to turn them into tangible results. Consider this blog post and data release as a reflection of our focus, not as our final result.

Our directory is not the first (and, hopefully, not the last). A great deal has been created before us. The most notable examples:

  • Getty Institute Place Name Thesaurus
  • Alexandria Digital Library
  • GeoPlanet from Yahoo
  • Geonames
  • Joint contribution of the New York Public Library (NYPL) and the Library of Congress (LoC) to the creation of a historic geographic directory
  • NYPL's recently announced Space/Time Directory
  • Natural Earth
  • Quattroshapes

We use Quattroshapes as the foundation for our data set, since it has the most information about the most places over the most time.

We supplement this foundation with relevant geometries and metadata from the Natural Earth, GeoPlanet, GeoNames and Zetashapes projects, as far as their licenses allow.

People who are familiar with the problem are probably thinking: does this mean that the sparse coverage of certain types of administrative units (many regions and rural settlements around the world) will delay the start of the whole process? The answer is yes. One of our short-term goals is to find out which administrative units in the GeoPlanet set lack a corresponding counterpart in Quattroshapes. Once we have that list, we will be able to import names and hierarchies for these units from GeoPlanet, and data coverage will improve immediately. True, many units will be left without coordinates after the import, but we are sure that this problem will be solved with time.

We are not the first to set out to create a comprehensive open dataset about the whole world. We consider each such project a unique contribution to this common cause and, by combining these contributions, we hope to begin creating something greater than their sum.

All data is available under a Creative Commons Zero license.

A number of the open sources we use require attribution. We have listed these sources here.

Very long version

The very long version is long. It makes sense to make yourself a cup of coffee, or maybe something stronger, if you are going to climb into the same jungle as we did.

So what is a geographic directory, anyway?

As we said earlier, a geographic directory is a large list of administrative units, each of which has a stable identifier and a number of properties that describe its location. It is interesting to think of the directory as a space where debates about administrative units are held, but not resolved.

The simplest and friendliest expression of this idea is to consider how different people can name the same place. Sometimes two people call a place the same thing but write it down in different ways. Remember how pedantic geocoders still are about the correctness of queries, and how funny the results are when an inaccurate query is entered. Now multiply this by the number of languages in the world.

The easiest way to explain the purpose of the directory is to say that it solves the problem of misspelling. If every place has a simple (often numeric, but above all stable) identifier, then you and I can refer to the same place in any linguistic context using this identifier, and not waste a ton of time on spelling.

For example, I can call the city "Montreal" (in English), you might call it "몬트리올" (in Korean), and someone else "Montréal" (in French) or "MTL" (an international abbreviation), and so on. It would be great if the directory were the space in which all these representations of one administrative unit (or, for fun, one hallucination shared by everyone) could coexist.

To convey the more complex idea that the directory exists for debate, recall that the "administrative unit" is a big problem, because it often becomes a subject of controversy at a social, political and often very emotional level. This problem is not new. People have argued, and sometimes fought, over their vision of ownership and the borders of a territory for as long as they can remember.

In not a single book, not a single paragraph, not a single sentence have different minds, over so many years, been able, or tried, or needed to come to a common opinion about a given territory. Perhaps this is futile, but I want to believe that some list of key-value pairs, however incomplete, created in an attempt to capture all the nuances of such a territory, will help in resolving a long-running dispute.

Basic principles

The directory is based on a few basic principles:

Mapzen has an opinion

It is important that Mapzen has an opinion not about each specific administrative unit, but about the nature of the unit as such. This essential point sets the boundaries and gives us an understanding of what our project is and, just as important, what it is not.

Reflect all points of view that fall within the boundaries of the project

The world is a complicated thing, and we would like the directory to be a kind of platform for sometimes conflicting opinions about this world. We intend to reflect as many opinions or decisions about a specific unit as we can, for applications and users. How this will manifest itself in practice remains to be seen, but we have set ourselves this goal.

Portability

The canonical source of information about an administrative unit is a GeoJSON text file with a unique 64-bit numeric ID. All computers can work with text files and numbers. Text files can be viewed or edited in any old text editor. Text files can be printed on a printer. Numbers are quickly and easily indexed by databases.

We use text files because the qualities that matter most for our data are ease of use, reliability and portability over time. The benefits of a good old text format outweigh the benefits of the other options.

For example, Google's Protocol Buffers are great, but they require a lot of other Google software to use. Shapefiles from ESRI are also wonderful, and their prevalence and long history confirm the convenience of the format; however, they too require installing special software for the sake of a trivial edit.

This does not mean that text or static files are always the best choice. It all depends on the specific task and, if necessary, we will convert all the data into a more lightweight and convenient format, but you will always have access to simple text files.
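
To make the portability argument concrete, here is a minimal sketch of writing and reading one record with nothing but a standard JSON parser. The file name pattern `<id>.geojson`, the helper names and the coordinates are assumptions made for illustration; they are not taken from the Who's On First tooling.

```python
import json
import tempfile
from pathlib import Path

# An illustrative record: a GeoJSON Feature with a numeric ID.
# The coordinates are made up for the example.
record = {
    "type": "Feature",
    "id": 85922583,
    "properties": {"wof:id": 85922583, "wof:name": "San Francisco"},
    "geometry": {"type": "Point", "coordinates": [-122.42, 37.77]},
}


def save_record(directory: Path, feature: dict) -> Path:
    """Write one feature to '<id>.geojson' as plain UTF-8 text (hypothetical layout)."""
    path = directory / f"{feature['id']}.geojson"
    text = json.dumps(feature, ensure_ascii=False, indent=2)
    path.write_text(text, encoding="utf-8")
    return path


def load_record(path: Path) -> dict:
    """Read a feature back; any JSON parser in any language could do the same."""
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_record(Path(tmp), record)
    loaded = load_record(saved_path)

# The ID must fit in a signed 64-bit integer to be stored as such in a database.
fits_in_64_bits = -(2**63) <= loaded["id"] < 2**63
```

The round trip loses nothing, which is the property the text format is chosen for.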

GeoJSON

We use GeoJSON as the default exchange format for two complementary reasons:

  • It is the data structure with the least markup at the moment. If someone comes up with a more concise markup language, we will switch to it.
  • There are many tools for working with GeoJSON and, importantly, for converting it to all the other formats in use.

Tell me more (complicated things)

If for now you are only interested in the simple things (such as names, geometries, and the required minimum of properties related to them), then scroll further.

Concordances

When dealing with other directories (and we want to interact with all available directories: obsolete, current, and future), a good place to start is to look for concordances between them.

A concordance in this case is the basis for asserting that, for example, "their Boston" and "our Boston" are one and the same. Their details may differ completely because of different tasks and views. Having different points of view is fine. A concordance allows anyone to work with the things that interest them while taking the work of others into account, and provides a mechanism for interaction.

Each WOF record has a wof:concordances property in the form of key/value pairs, which is a list of pointers to the same object in other databases. For example:

    "wof:id": 101736545,
    "wof:concordances": {
        "fct:id": "03c06bce-8f76-11e1-848f-cfd5bf3ef515",
        "gn:id": "6077243",
        "gp:id": "3534"
    }

At the time of this publication, we have concordances with GeoNames (159,359 objects), GeoPlanet (135,399), QuattroShapes (115,550), Factual (80,973), various airport classifiers (ICAO, IATA, FAA and OurAirports), Wikipedia (so far only for airports) and even with Mapzen Border countries. More coming soon.
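
One way an application might use concordances is to build a reverse index from a foreign identifier to a WOF ID. The sketch below assumes records are available as plain dictionaries of properties; the function names and the second sample record are hypothetical.

```python
# Sample properties: the first record repeats the example above,
# the second is a made-up record with no concordances yet.
records = [
    {
        "wof:id": 101736545,
        "wof:concordances": {
            "fct:id": "03c06bce-8f76-11e1-848f-cfd5bf3ef515",
            "gn:id": "6077243",
            "gp:id": "3534",
        },
    },
    {"wof:id": 85922583, "wof:concordances": {}},
]


def build_concordance_index(records):
    """Map (source key, foreign id) to the WOF id that points at it."""
    index = {}
    for props in records:
        for source_key, other_id in props.get("wof:concordances", {}).items():
            # Foreign ids are compared as strings, as in the sample record.
            index[(source_key, str(other_id))] = props["wof:id"]
    return index


def lookup(index, source_key, other_id):
    """Return the WOF id for a foreign id, or None if there is no concordance."""
    return index.get((source_key, str(other_id)))


def without_counterpart(index, source_key, other_ids):
    """List foreign ids that no WOF record points at yet."""
    return [i for i in other_ids if (source_key, str(i)) not in index]


index = build_concordance_index(records)
```

The last helper mirrors the short-term goal described earlier: finding which units in another data set have no counterpart yet.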

Types of administrative units

For any hierarchy of administrative units, we have identified three "classes", and any type of unit belongs to one of them. This does not mean that there cannot be other classes (or types of administrative units). We simply decided to start with this set.

Common (C)

These units are common to all hierarchies and all administrative units in Who's On First.

An important point: this means that any object must have one or more common ancestors of this class (for example, a country, or a continent, or sometimes simply planet Earth). This does not prevent adding specific levels to the hierarchy for a particular place in order to fit it into the existing common hierarchy.

Common-optional (CO)

Units of this class are implied as part of the general hierarchy, but may be absent because they are not appropriate, or because we do not have the data. An example of this type is the county.

Optional (O)

These parts of the hierarchy are typically specific to a particular country or area. For example, the many nested departments in France or Germany. The only rule: optional (O) types must sit somewhere inside the common (C) hierarchy.

The minimal list of administrative unit types for the most general hierarchy looks something like this:

- continent (C)
  - country (C)
    - region (C)
      - "county" (CO)
        - locality (C)
          - neighbourhood (C)

A more detailed version might be:

- continent (C)
  - empire (CO)
    - country (C)
      - region (C)
        - "county" (CO)
          - "metro area" (CO)
            - locality (C)
              - macrohood (O)
                - neighbourhood (C)
                  - microhood (O)
                    - campus (CO)
                      - building (CO)
                        - address (CO)
                          - venue (C)

Venues! Buildings!!! Microhoods!!! Empires!!!

There are still many untested types, and that is not all. So far you see only a rough skeleton. GitHub has a whole repository dedicated to types, including a discussion (and canonical links) about each type given above.
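
The class rules can be expressed as a small check. The sketch below encodes the detailed skeleton above as an ordered table and verifies two things for a top-down chain of types: the order follows the skeleton, and the last element has at least one common (C) ancestor. The spelling of the type names, the `check_chain` helper and the wording of the messages are assumptions for illustration only.

```python
# The detailed skeleton from the list above, top to bottom, with its class.
SKELETON = [
    ("continent", "C"), ("empire", "CO"), ("country", "C"), ("region", "C"),
    ("county", "CO"), ("metro area", "CO"), ("locality", "C"),
    ("macrohood", "O"), ("neighbourhood", "C"), ("microhood", "O"),
    ("campus", "CO"), ("building", "CO"), ("address", "CO"), ("venue", "C"),
]
ORDER = {name: position for position, (name, _) in enumerate(SKELETON)}
CLASS = dict(SKELETON)


def check_chain(chain):
    """Return a list of problems for a top-down chain of type names."""
    unknown = [p for p in chain if p not in ORDER]
    if unknown:
        return [f"unknown type: {p}" for p in unknown]

    problems = []
    positions = [ORDER[p] for p in chain]
    if positions != sorted(positions):
        problems.append("types are out of order")

    # Everything above the last element counts as an ancestor.
    ancestors = chain[:-1]
    if not any(CLASS[p] == "C" for p in ancestors):
        problems.append("no common (C) ancestor")
    return problems
```

For example, `check_chain(["macrohood", "microhood"])` reports a missing common ancestor, because an optional type is not anchored inside the common hierarchy.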

Hierarchies

The hierarchies in Who's On First are represented as a list, each element of which is a dictionary containing a flat hierarchy. Like this:

    "wof:hierarchy": [
        {
            "country_id": "85633147",
            "region_id": "85683255",
            "county_id": "102072387",
            "locality_id": "101750223",
            "neighbourhood_id": "85794581"
        }
    ]

This is because a place in Who's On First can belong to several different hierarchies. Take, for example, a type such as the urban agglomeration (the "San Francisco Bay Area" in and around San Francisco, or "New York", which includes all five boroughs and even parts of New Jersey, and so on), which often includes units of the same type. And disputed territories, again. Why and how we came to this decision is a subject for a separate article, but in short:

Why this solution is good:
  • it is easy to see at a glance
  • it is easy to compare multiple hierarchies
  • it does not require the user to strain their brain unnecessarily to reconstruct the full hierarchy, or to support the next "insight" that has just visited us
  • it is easier to make changes during development (... we are still before the "official" launch)

Why this solution is bad, or seems bad:

  • if we support urban agglomerations, then many other units (neighbourhoods, districts, venues) may have several hierarchies, where some territory extends beyond all of its parent units
  • file size, disk space, bandwidth: all these are consequences of the first point, and are akin to whitespace and coordinates with more than 6 decimal places in GeoJSON files, which can quickly get heavy
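
Because each hierarchy is a flat dictionary, comparing several of them takes only a few lines. The sketch below uses the sample hierarchy shown earlier plus a second, made-up one (the locality ID "999999999" is hypothetical) to show how an application might list all ancestors of a given type and find the ancestors shared by every hierarchy. The function names are assumptions.

```python
place = {
    "wof:hierarchy": [
        {
            "country_id": "85633147",
            "region_id": "85683255",
            "county_id": "102072387",
            "locality_id": "101750223",
            "neighbourhood_id": "85794581",
        },
        {
            # Hypothetical second hierarchy that differs only in its locality.
            "country_id": "85633147",
            "region_id": "85683255",
            "county_id": "102072387",
            "locality_id": "999999999",
            "neighbourhood_id": "85794581",
        },
    ]
}


def ancestor_ids(props, type_name):
    """Collect the distinct '<type>_id' values across all hierarchies, in order."""
    key = f"{type_name}_id"
    seen = []
    for hierarchy in props.get("wof:hierarchy", []):
        value = hierarchy.get(key)
        if value is not None and value not in seen:
            seen.append(value)
    return seen


def shared_ancestors(props):
    """Return the key/value pairs that appear in every hierarchy."""
    hierarchies = props.get("wof:hierarchy", [])
    if not hierarchies:
        return {}
    common = set(hierarchies[0].items())
    for hierarchy in hierarchies[1:]:
        common &= set(hierarchy.items())
    return dict(common)
```

No tree walking is needed: every hierarchy already carries its full chain of ancestors, which is the trade of file size for simplicity described above.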

Disputed territories

Although all regions and many settlements are "disputed" at the level of friendly banter, disputes over such territories take a really serious turn when two or more states are involved (and sometimes so-called non-state actors in international law). Such disputes are fraught with violence and consequences, far from "friendly banter".

For the duration of the disputed status, we assign such territories the type disputed. Disputed territories, by definition, have two or more parent states in their hierarchy. This approach does not reflect all the facts of the situation for each dispute. On the other hand, it allows you to highlight the parties to the dispute and, as we said above, to decide how to reflect the dispute in the context of your task.

Parent IDs and Parent Rights

Even if a territory can belong to different hierarchies, we assume that in most cases it is actually "controlled" by one party. For example, the Golan Heights are disputed by Syria and Israel, which is reflected in the hierarchy, but they are still under Israeli control.

    "wof:hierarchy": [
        {
            "continent_id": "102191569",
            "country_id": "85632315",
            "disputed_id": "85632221"
        },
        {
            "continent_id": "102191569",
            "country_id": "85632413",
            "disputed_id": "85632221"
        }
    ],
    "wof:id": "85632221",
    "wof:name": "Golan Heights",
    "wof:parent_id": "85632315",

In some cases we cannot say for sure who controls the territory, or we are not certain about it, because the dispute began recently and we are still checking the data. Then we set the parent ID to -1.

Sometimes we also assign -2. This should be interpreted as ":shrug: The world is a complicated thing." For example, the Baikonur Cosmodrome in Kazakhstan.
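
An application reading these records has to treat the parent ID as a three-way signal. The sketch below shows one possible reading, using the Golan Heights sample above; the function names and the returned labels are assumptions, and treating a missing parent ID as unknown is a choice made for this example only.

```python
golan = {
    "wof:hierarchy": [
        {"continent_id": "102191569", "country_id": "85632315", "disputed_id": "85632221"},
        {"continent_id": "102191569", "country_id": "85632413", "disputed_id": "85632221"},
    ],
    "wof:id": "85632221",
    "wof:name": "Golan Heights",
    "wof:parent_id": "85632315",
}


def parent_status(props):
    """Classify the parent ID: 'known', 'unknown' (-1) or 'complicated' (-2)."""
    # Assumption for this sketch: a missing parent ID is treated like -1.
    parent_id = int(props.get("wof:parent_id", -1))
    if parent_id == -1:
        return "unknown"
    if parent_id == -2:
        return "complicated"
    return "known"


def claimants(props):
    """List the distinct country IDs that appear across all hierarchies."""
    return sorted(
        {h["country_id"] for h in props.get("wof:hierarchy", []) if "country_id" in h}
    )


def controlling_country(props):
    """Return the parent ID if it is one of the claimants, otherwise None."""
    parent = str(props.get("wof:parent_id", -1))
    return parent if parent in claimants(props) else None
```

With this in place, a map renderer could draw the territory under the controlling country while still listing every claimant.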

Supersedes / superseded by

One of the big and complex humanities questions that arise when working with geography: how do you tell a simple correction from a fundamental change?

This problem is not strictly geographical, but in geography it is most visible. For example, Poland, France and Germany have seized and ceded territory over the centuries (sometimes handing it back again later). Their boundaries, and the periods when the borders were drawn, are serious contextual information not only for cartography, but also for many other areas of activity. Take works of art that were completed when a territory belonged to Poland, but were begun when it belonged to Germany. How would you consistently identify changing territory?

Another example. In New York City there is a sort of not-quite-neighbourhood called "BoCoCa". BoCoCa is short for Boerum Hill, Cobble Hill and Carroll Gardens, three adjacent neighbourhoods south of Brooklyn's business center. BoCoCa is neither a name in the usual sense nor a neighbourhood, as most people see it. In many maps and datasets, on the other hand, it is a neighbourhood (and a name). Whatever we think, BoCoCa "exists."

At Who's On First, we made BoCoCa a "macrohood", which includes the three neighbourhoods from which its name is derived.
The type of an administrative unit is a very important property, and it is obviously used by various applications. We do not need to know how or why applications handle the properties associated with a place. And once we have decided to treat WOF ID 85892915 as a neighbourhood (which it was when imported from Quattroshapes), we probably should not change it lightly, on a whim.

True, we do not consider BoCoCa a neighbourhood. We hold a firm opinion on this. BoCoCa has been treated as a neighbourhood, but from our point of view this is no longer the case. Our way of resolving a problem like this is to create a new record with a new type (BoCoCa as a macrohood) and adjust both records, indicating that one is superseded by the other.

For example, the record for BoCoCa as a neighbourhood looks like this:

    "wof:superseded_by": [102147495],
    "wof:supersedes": [],

While BoCoCa as a macrohood looks like this:

    "wof:superseded_by": [],
    "wof:supersedes": [85892915]

It is up to applications to decide how (and whether) to account for superseded objects separately. A search engine, for example, can rank superseded objects separately or exclude them from processing altogether.
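
As a sketch of what "accounting for superseded objects" could mean in practice, the code below follows wof:superseded_by pointers until it reaches records that nothing supersedes. The in-memory `records` dictionary and the function names are assumptions; a real application would load the records from wherever it stores them.

```python
# The two BoCoCa records from the example above, keyed by WOF ID.
records = {
    85892915: {"wof:id": 85892915, "wof:superseded_by": [102147495], "wof:supersedes": []},
    102147495: {"wof:id": 102147495, "wof:superseded_by": [], "wof:supersedes": [85892915]},
}


def is_current(props):
    """A record is current when nothing supersedes it."""
    return not props.get("wof:superseded_by")


def resolve_current(records, wof_id):
    """Follow superseded_by pointers and return the IDs that are still current.

    IDs that are not present in `records` are treated as current, because
    there is nothing further to follow. Cycles are guarded against.
    """
    current, seen, stack = [], set(), [wof_id]
    while stack:
        candidate = stack.pop()
        if candidate in seen:
            continue
        seen.add(candidate)
        successors = records.get(candidate, {}).get("wof:superseded_by", [])
        if successors:
            stack.extend(successors)
        else:
            current.append(candidate)
    return sorted(current)
```

A search engine that wants to hide superseded places could filter on `is_current`, while one that wants to redirect old IDs could call `resolve_current`.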

Breaches

Each record has a wof:breaches list. At the time you read this article, most of these lists may still be empty. "Breaches" occur when the geometry of one unit intersects the geometry of another unit of the same type.

These lists are used as a signal to Who's On First users that either there are errors in the data (as a rule, the borders of countries do not cross neighbouring borders), or there is a disagreement about the boundaries of a territory (for example, a region).

As with many other signals, its value, importance, and method of processing are left to the discretion of the end applications.
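
One simple way for an application to consume this signal is to collect the pairs of records that breach one another, for review or for special rendering. The sketch below assumes wof:breaches holds a list of WOF IDs; the sample IDs are made up and the function name is hypothetical.

```python
# Made-up records: 1 and 2 breach each other, 3 breaches nothing.
sample = [
    {"wof:id": 1, "wof:breaches": [2]},
    {"wof:id": 2, "wof:breaches": [1]},
    {"wof:id": 3, "wof:breaches": []},
]


def breach_pairs(records):
    """Return each breaching pair once, as a sorted list of (low id, high id)."""
    pairs = set()
    for props in records:
        for other_id in props.get("wof:breaches", []):
            # Sorting the pair makes (1, 2) and (2, 1) the same entry.
            pairs.add(tuple(sorted((props["wof:id"], other_id))))
    return sorted(pairs)
```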

Tell me more (simple things)

Remember what is meant by "simple things" ...

Names

All names are originally taken from the Quattroshapes and Natural Earth sets. However, GeoPlanet (GP) is generally better in terms of multilingual and colloquial names.

GP has two properties for naming:

  1. The ISO 639-3 language code
  2. The name "type", from a well-known list of descriptions compiled by the excellent people at GP:
The Name_Type field is a one-letter code that takes the following values:
  • P - preferred name in English
  • Q - preferred name in other languages
  • V - a common (but unofficial) variant of the name (for example, "New York City" for New York)
  • S - synonym or colloquial name ("Big Apple" for New York)
  • A - abbreviation or code for an administrative unit ("NYC" for New York)

GP also distinguishes between names and aliases; in their world, you can find the following:

    Name: Montréal
    Language: FRE
    Alias (ENG_P): Montreal
    Alias (KOR_Q): 몬트리올

GP does not take into account that some countries have several official languages. We thought about all this and decided:

  • We must support names in multiple languages, and the ability to specify the coordinates for the name
  • For the preferred name we use only p, regardless of language
  • We should use the name: namespace, because it is more convenient.

For example:

    {
        "wof:lang": ["eng", "fre"],
        "name:eng_p": "Montreal",
        "name:eng_a": "YMQ",
        "name:fre_p": "Montréal",
        "name:kor_p": "몬트리올"
    }
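
The naming scheme above is easy to parse: the key carries a three-letter language code and a one-letter name type. The sketch below groups names by language and picks a preferred name with a fallback. The regular expression, the function names and the choice of English as the fallback are assumptions for illustration; the values are plain strings, as in the sample.

```python
import re

# Matches keys such as "name:eng_p": three-letter language, one-letter type.
NAME_KEY = re.compile(r"^name:([a-z]{3})_([a-z])$")

montreal = {
    "wof:lang": ["eng", "fre"],
    "name:eng_p": "Montreal",
    "name:eng_a": "YMQ",
    "name:fre_p": "Montréal",
    "name:kor_p": "몬트리올",
}


def names_by_language(props):
    """Group name properties as {language: {name type: value}}."""
    names = {}
    for key, value in props.items():
        match = NAME_KEY.match(key)
        if match:
            language, name_type = match.groups()
            names.setdefault(language, {})[name_type] = value
    return names


def preferred_name(props, language, fallback="eng"):
    """Return the preferred (p) name in a language, else in the fallback language."""
    names = names_by_language(props)
    for code in (language, fallback):
        if "p" in names.get(code, {}):
            return names[code]["p"]
    return None
```

Because the preferred type is always p regardless of language, the lookup does not need to know whether a language is English.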

Geometries


Consensus Geometry

Each administrative unit will have one "consensus" geometry. The concept of "consensus" has not yet been defined. In any case, the use of this word is fraught with problems. It will be replaced by a more accurate term.

All "other" geometries

Also, each unit will have an "alternate" file with differently named geometries. It is meant to store disputed geometries, simplified geometries, or geometries optimized for specific tasks (for example, geocoding).

The main thing here: TAKE ALL GEOMETRIES INTO ACCOUNT.

The source of a geometry, consensus (sic) or alternate, is included in each record. For example:

    {
        "src:geom": "zetashapes",
        "src:geom_alt": ["quattroshapes", "naturalearth"]
    }

Centroids

Each record can have several centroids. The phrase "several centroids" does sound like an oxymoron. By the term "centroid" we mean the focal point of any geometry. Different centroids are indicated by prefixes denoting the type of use. For example:

  • geom:latitude and geom:longitude - the center of the consensus polygon, obtained using mathematical magic.
  • lbl:latitude and lbl:longitude - the coordinates of the optimal placement of the label. For example, the San Francisco polygon includes the Farallon Islands, which means that the "center" of this polygon lies in the Pacific Ocean, and that is not the best place to put the name.
  • nav:latitude and nav:longitude - the point to which a navigator should bring you. For example, the entrance to the emergency room, and not the loading dock for trucks arriving at the same place.
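
An application will usually ask for the centroid that suits its purpose and fall back to the geometric one when a more specific centroid is absent. The sketch below shows that lookup; the fallback rule, the function name and the sample coordinates are assumptions made for illustration.

```python
# Made-up coordinates: a geometric center out at sea and a label point on land.
san_francisco = {
    "geom:latitude": 37.76,
    "geom:longitude": -122.69,
    "lbl:latitude": 37.78,
    "lbl:longitude": -122.42,
}


def centroid(props, purpose="geom"):
    """Return (latitude, longitude) for a purpose prefix, falling back to 'geom'."""
    for prefix in (purpose, "geom"):
        latitude = props.get(f"{prefix}:latitude")
        longitude = props.get(f"{prefix}:longitude")
        if latitude is not None and longitude is not None:
            return (latitude, longitude)
    return None
```

Here `centroid(san_francisco, "nav")` falls back to the geometric center because the sample has no nav: properties.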

Required minimum properties

The Flickr API was designed according to the rule: "What is the minimum data set that the API should return for any request related to photos?"

The essence of Flickr's answer is given in the "standard photo response", namely: "The minimum data set should allow building a URL that points to the photo page on Flickr"

In the case of Mapzen, the answer to a similar question would be: "The data set should allow displaying the API response on a map."

For example, it should be possible to get a response from Pelias (or any other API), simply pass it to Leaflet as a GeoJSON layer, and see the answer on the map.

Given all this, a "minimal set of properties" might look like this:

    {
        "wof:id": 85922583,
        "wof:name": "San Francisco",
        "wof:fullname": "San Francisco, California US",
        "wof:placetype": "locality",
        "wof:parent_id": 85688637,
        "wof:quality": 9,
        "wof:score": 100
    }

A few words about the example above:

  • When we say "properties", here we mean the metadata associated with a place, not its geometry.
  • The wof:score property should be thought of as the equivalent of a search rank, determined by the Pelias team. Likewise, wof:quality should be thought of as the equivalent of a quality / trustworthiness rank determined by the Data team.
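
To illustrate the "display it on a map" requirement, the sketch below checks a record for the minimal properties and wraps it in a GeoJSON Feature that a mapping library could consume as a layer. The `REQUIRED` tuple follows the sample above; the helper names and the point coordinates are assumptions, and GeoJSON stores positions as longitude first, then latitude.

```python
REQUIRED = (
    "wof:id", "wof:name", "wof:fullname", "wof:placetype",
    "wof:parent_id", "wof:quality", "wof:score",
)

minimal = {
    "wof:id": 85922583,
    "wof:name": "San Francisco",
    "wof:fullname": "San Francisco, California US",
    "wof:placetype": "locality",
    "wof:parent_id": 85688637,
    "wof:quality": 9,
    "wof:score": 100,
}


def missing_properties(props):
    """List the required keys that a record lacks."""
    return [key for key in REQUIRED if key not in props]


def to_feature(props, latitude, longitude):
    """Wrap properties and a point in a GeoJSON Feature; reject incomplete records."""
    missing = missing_properties(props)
    if missing:
        raise ValueError(f"missing properties: {', '.join(missing)}")
    return {
        "type": "Feature",
        "id": props["wof:id"],
        "properties": dict(props),
        # GeoJSON positions are [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
    }


# Illustrative coordinates only.
feature = to_feature(minimal, 37.78, -122.42)
```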

Future magic (under development)

The following is a list of properties that are currently not supported, or whose support is so crude that it is easier to say there is none. We discuss these properties because they must (and will) be supported in the future.

Rank (s)


Quality

How complete or reliable, in our opinion, the data in this record is.

Coverage

By "coverage" we mean the number of attributes that a place record has. A record can have a wonderful set of official and alternative names, but very little metadata (population, elevation, etc.).

Dates

What is the date of establishment or, in some cases, abolition of an administrative unit? This data becomes especially important in a data set where one record can supersede another.

In general, dates are a rather messy space, and we intend to start with simple forms, gradually increasing the complexity in order to describe historical and modern realities. The Library of Congress is working on the Extended Date/Time Format (EDTF); it is worth a look if you are interested in this subject.

Wait ... Where can I get this data from you?

The first (and very, very, very important) thing to understand: Who's On First is still in development, which means that:

  • Some (possibly many) of the data will be incorrect.
  • Some things are missing. Some things are missing and we know that we don't know about them, so we will deal with them soon. Some things are missing and we don't know that we don't know about them, so we will deal with them as we discover the errors.
  • Some (maybe a large) part of the data will change in one way or another, for the reason given in point 1.
  • Correcting a single record may require updating the records associated with it. We have not yet formalized or finalized the tools for updating related records. This means that in the short term, inconsistencies between related records are possible. We'll deal with that.

The goal of this release was not to trumpet and herald a new dawn of beautiful data, but to give substance, for everyone, to what we have been talking about, to have a data set with which our hypotheses can be confirmed or refuted, and to provide a primer on the practice of working with this data.

If you lack the time or the temperament (personal or collective) to fight through the difficulties with the same slightly wild zeal as we do, then it is probably still too early for you to dive into our data. We plan to keep you informed and to take part in open discussions about our project, so follow the blog and let us know what needs improving.

The raw data in GeoJSON format lives in two places: a public AWS S3 access point and a GitHub repository with a pile of small files. Their URLs, respectively:

  • s3.amazonaws.com/whosonfirst.mapzen.com/data
  • github.com/whosonfirst/whosonfirst-data

Note: the S3 link above should not be opened in a browser, since it is an access point, and only people who can work with S3 can do anything with it. If these words are gibberish to you, you do not need to click on the S3 link. Follow the GitHub link right away.

There is no publicly available tool for viewing the data yet. We have an internal "spelunker", whose code we plan to open (along with libraries for working with Who's On First data), but this has not happened yet.

The GitHub repository also has "meta" files collected in a directory cleverly named meta. These are mostly CSV files with a minimum of information about administrative units of a certain type. Like everything else, the meta files are in development, but they provide a minimal ability to view the data without loading the entire set into a database.

You said ... "venues"?

Yes. We have not included venues in this release, but we are working on them. Venues make up a really large part of Who's On First, but they are both complex and numerous, and often change suddenly. So we will work from the simple to the complex.

A few words about Git (and GitHub)

We strongly advise against becoming tied to Who's On First data on GitHub (and to Git as a whole). Right now, we have little idea of the best way to share the data and accept corrections and suggestions from the community at the same time.

Despite the fact that the good people at GitHub continue their excellent work of making Git easier to use, reality shows that Git remains a barrier for many people. In the absence of a firmer decision on an alternative, GitHub at least allows us to outline the main wishes:

  • An open and easily distributed dataset that people can download and use.
  • Anyone can make corrections and more serious changes to place data.
  • The ability to review data changes.
  • A place to store the change log.

And again: don't get too attached to Git while working with Who's On First data. It is needed to demonstrate the idea of the project.

What's next?

A lot of work remains to be done.

In more detail: releasing tools and libraries for working with Who's On First data, releasing the internal spelunker web application, which we use to dig into and validate the data, building prototype services on top of our data, finishing (and in some places, starting) the documentation of everything above, and fixing all the bugs.

Stay tuned!

Image by Aaron Cope.


Posted by: brownagen1949.blogspot.com
