How to put your open dataset on the Wikidata knowledge graph
The article explains how the author put their open dataset, the Swedish Construction FAQ, on the Wikidata knowledge graph. It highlights the benefits of having a Wikidata entity for your dataset, such as a stable identifier, machine-readable citability, and free cross-linking.
Why it matters
Putting open datasets on Wikidata can increase their visibility and discoverability, making them more accessible to AI systems and researchers.
Key Points
- Putting the dataset on Wikidata as a first-class entity is more impactful than other distribution channels
- Wikidata provides a stable identifier, machine-readable citability, and free cross-linking for the dataset
- The author created six connected Wikidata entities for the dataset and its companion resources
Details
The article describes three benefits the author gained by putting the Swedish Construction FAQ dataset on Wikidata. First, the dataset received a stable identifier (Q139393633) that will outlive the author's GitHub account, domain, or company. Second, the dataset became citable in machine-readable form: the author describes Wikidata as the knowledge graph that major AI systems such as Google, Siri, Alexa, OpenAI, and Anthropic draw on, and notes that it stores the dataset's DOI, license, and language as proper RDF triples. Third, Wikidata's cross-linking connects the dataset to related entities such as its publisher, companion datasets, and subject matter, since each statement that points at another item becomes a link in the graph. All of this costs nothing and requires only a Wikidata user account.
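To make "machine-readable" concrete, here is a minimal Python sketch of how a script might read those statements back from Wikidata. It is not the author's code. It assumes the entity is served through Wikidata's public `Special:EntityData` JSON endpoint and that the statements use the standard properties P356 (DOI), P275 (copyright license), P407 (language of work or name), and P123 (publisher); which properties the dataset's item actually carries is an assumption. The function names and the User-Agent string are hypothetical.

```python
import json
from urllib.request import Request, urlopen

DATASET_QID = "Q139393633"  # identifier quoted in the article

# Standard Wikidata property IDs; whether the dataset's item
# uses each of them is an assumption of this sketch.
PROPERTIES = {
    "P356": "DOI",
    "P275": "copyright license",
    "P407": "language of work or name",
    "P123": "publisher",
}


def entity_data_url(qid: str) -> str:
    """Build the URL of the JSON export for one Wikidata entity."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def fetch_entity(qid: str) -> dict:
    """Download one entity record (needs network access)."""
    request = Request(
        entity_data_url(qid),
        # Hypothetical User-Agent; replace with your own contact details.
        headers={"User-Agent": "dataset-citation-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    # The export wraps the record in {"entities": {<id>: {...}}}.
    return next(iter(data["entities"].values()))


def simple_claims(entity: dict) -> dict:
    """Reduce an entity record to {property label: [values]}."""
    result = {}
    for pid, label in PROPERTIES.items():
        values = []
        for statement in entity.get("claims", {}).get(pid, []):
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip "no value" / "unknown value" statements
            value = snak["datavalue"]["value"]
            # Item-valued statements carry a dict with an "id" (the
            # cross-link target); string-valued ones such as a DOI are plain strings.
            if isinstance(value, dict) and "id" in value:
                values.append(value["id"])
            else:
                values.append(value)
        if values:
            result[label] = values
    return result
```

Calling `simple_claims(fetch_entity(DATASET_QID))` would return whichever of these statements the item holds. Item-valued entries come back as Q-identifiers, which is the cross-linking described above: each one points at another Wikidata entity that can be fetched the same way.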