Data dump files:
- Weekly dumps of the entire Wikidata database can be downloaded from http://dumps.wikimedia.org/other/wikidata/
- JSON (JavaScript Object Notation) is the recommended format
- JSON dump files are named YYYYMMDD.json
File format:
- The file is encoded as a single list (i.e. a sequence of elements between the characters [ and ])
- The brackets have their own lines in the file
- Each of the remaining lines holds one compact (single-line) JSON-encoded object
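Because each object sits on its own line, the dump can be read one line at a time instead of loading the whole list into memory. A minimal Python sketch of that idea follows. It assumes an uncompressed dump, and it assumes that an object line may end with a comma separating it from the next list element. The function name iter_entities is hypothetical.

```python
import json


def iter_entities(lines):
    """Yield one decoded JSON object per object line of a dump.

    `lines` is any iterable of text lines, for example an open file.
    """
    for line in lines:
        # Drop surrounding whitespace and the comma that separates
        # list elements (assumption: it trails each object line).
        text = line.strip().rstrip(",")
        # Skip blank lines and the lines holding only a bracket.
        if text in ("", "[", "]"):
            continue
        yield json.loads(text)
```

With a dump opened as a text file f, the objects could then be processed with `for entity in iter_entities(f): ...`, keeping only one object in memory at a time.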
There are two types of JSON objects:
- Items (e.g. universe, happiness, Jack Bauer, etc.)
- Properties (e.g. father, instance of, employer, etc.)
Both types have these common elements:
- id: “Q###” for items, “P###” for properties
- type: “item” or “property”
- labels: name of the item or property in different languages
- descriptions: description in different languages
- aliases: lists of aliases in different languages
- claims: statements, grouped by property
- sitelinks: links to pages on different sites describing the item
- lastrevid: The JSON document’s MediaWiki revision ID
- modified: The JSON document’s publication date
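The common elements listed above could be read from one decoded object as sketched below. This is an illustration only: the function name summarize and the keys of the returned dict are hypothetical. It also assumes that labels maps a language code to an object with a "value" key, and that claims maps each property id to a list of statements.

```python
def summarize(entity, lang="en"):
    """Return a small dict describing one decoded item or property."""
    # Assumed layout: labels[lang]["value"] holds the label text.
    # `or {}` also covers a missing or empty labels element.
    labels = entity.get("labels") or {}
    label = labels.get(lang, {}).get("value")
    # Claims are grouped by property; assumed to map each property id
    # to a list of statements. An empty element is treated as no claims.
    claims = entity.get("claims") or {}
    return {
        "id": entity["id"],        # "Q###" or "P###"
        "kind": entity["type"],    # "item" or "property"
        "label": label,            # None if no label in `lang`
        "statements_per_property": {p: len(s) for p, s in claims.items()},
        "revision": entity.get("lastrevid"),
    }
```

Optional elements are read with get so that an object lacking, say, a label in the requested language yields None rather than raising an error.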
To look up an id, browse to http://www.wikidata.org/Q### (or P###)