View on GitHub

artsdata-data-model

Overview of how data is modelled in Artsdata.ca.

Artsdata Data Model v0.1

A simple data model for Performing Arts Events and related Places, People and Organizations.

You are welcome to give feedback and review current issues with GitHub Issues.

There is also a PDF of the data that Footlight feeds into Artsdata. At the moment, the data that Footlight publishes on websites of arts organizations and data in Artsdata are very similar, but as more and more sources feed into Artsdata, the shape of the data between Footlight and Artsdata may diverge, with Footlight being a subset of classes and properties collected in Artsdata.

The classes and properties used in Artsdata represent a “thin” layer of data roughly specified by Google Event Structured Data. The main difference is that Artsdata enforces links between entities within Artsdata and interlinks URIs outside of Artsdata including links to Wikidata and other LOD sources. Artsdata also generates unique global identifiers (IRIs also called URIs) for classes such as Event, Person, Place, and Organization.

Here are the main Classes used in Artsdata.

Image

[open drawing tool]

Classes

  1. Event
  2. Offer
  3. Organization
  4. Person
  5. Place
  6. PostalAddress
  7. WebPage

Ontologies & Inferencing

Artsdata.ca uses a basic set of RDFS and OWL entailments (or ruleset) to enable simple inferencing, called OWL-Horst (optimized).

The main ontology used in Artsdata.ca is Schema.org. Artsdata.ca imports the core Schema.org schema and the pending Schema.org schema (to include schema:EventSeries which is a pending class). In addition, Artsdata.ca imports the DBpedia core schema which includes class and property mappings between Schema.org and Wikidata.org. Note: in order to make use of the mappings in Artsdata.ca, we transform the domain wikidata.dbpedia.org/resource to match Wikidata’s domain www.wikidata.org/entity.

The versions of OWL-Horst, Schema.org schemas and DBpedia schema used by Artsdata.ca are located in this GitHub repository under “_triples”.

Artsdata.ca has a large number of class and property mappings between Schema.org, Wikidata.org, DBpedia.org, FOAF and DOLCE+DnS Ultralite (Ontology Design Patterns) using owl:equivalentClass and owl:equivalentProperty. The majority of mappings come prebuilt from external ontologies (see above) with some additional Artsdata.ca specific mappings to Wikidata.

Current work into the next version of the Artsdata.ca ontology is being influenced by the work at CAPACOA’s Linked Digital Future initiative and involves aligning the data model with data models used in cultural heritage including CIDOC-CRM, FRBRoo, PROV and RDA to mention a few. The data models will be futher specificed by a domain-specifc vocabulary to be released in the upcoming versions.

Provenance

Data is great, but it is not the ultimate truth, and without traceability it can lose our trust. For example, what if two web pages have different dates for the same performing arts event. Which source is more trust worthy? How can we follow the data back to the source to decide for ourselves?

To track provenance Artsdata.ca uses metadata attached to named graphs. Each data source in Artsdata.ca is stored in a separate named graph. The graph’s URI is used as the subject of the provenance metadata. This technique to track provenance is generally called the Named Graphs approach. Each named graph URI is a prov:Entity and is linked to provenance metadata including the date when the data was loaded, the software used to collect it and the email of the contributing organization. Each time data is imported, whether from a web site, spreadsheet or existing triple store, the graphs provenance metadata is updated. In addition, when the data source is directly from a crawled web page, the schema:WebPage entity includes the date when the web page was crawled.

In the future we will likely switch to RDF* (pronounced “RDF star”) inorder to have more granular provenance data on individual statements.

Data Flow Architecture

In principle anyone can add data to Artsdata.ca as long as certain data requirements are met. Here is a diagram about how data flows in and out of Artsdata.ca.

Caching LOD

Artsdata.ca loads LOD from Wikidata and DBpedia in order to cache it for performance reasons. The triples are obtained using content negotiation (instead of data dumps) and are cached unmodified in their respective named graphs. *Note: there is one notable exception, the Wikidata property P31 (instance of) is transformed to rdf:type. This same result could have been accomplished using owl:equivalentProperty but it was not selected for performance reasons.

Naming Conventions

Conventions on how to name things when in doubt.

Support or Contact

Contact support and we’ll help you sort it out.