
artsdata-data-model

Overview of how data is modelled in Artsdata.ca.


Artsdata Data Model v0.2


A simple data model for Performing Arts Events and related Places, People and Organizations.

The Artsdata data model (ontology) is a subset of Schema.org, along with a few controlled vocabularies specific to Artsdata. The data model is formally represented in SHACL.

The classes and properties used in Artsdata resemble Google Event Structured Data. The main difference is that Artsdata creates links between entities within Artsdata and also links them to URIs outside of Artsdata, including Wikidata and other LOD (Linked Open Data) sources. Artsdata also generates globally unique identifiers (IRIs, also called URIs) for entities of classes such as Event, Person, Place, and Organization.

Here are the main Classes used in Artsdata.


Classes

  1. EventAttendanceModeEnumeration
  2. EventStatusType
  3. Event
  4. Offer
  5. Organization
  6. Person
  7. Place
  8. PostalAddress
  9. VirtualLocation
  10. WebPage

Bridge Identifiers

In addition to Artsdata identifiers, the Artsdata Knowledge Graph relies on other persistent and unique identifiers, such as Wikidata and ISNI identifiers, to recognize and reconcile entities of type Organization, Person and Place.

Recommendations on using persistent identifiers in the performing arts

Structured Data Templates

Event templates
Person templates

SHACL Validation Reports

SHACL shapes are used to validate data before importing.

Ontologies & Inferencing

To enable simple inferencing, Artsdata.ca uses a basic set of RDFS and OWL entailments (a ruleset) called OWL-Horst (optimized).

The main ontology used in Artsdata.ca is Schema.org. Artsdata.ca imports the core Schema.org schema and the pending Schema.org schema (to include schema:EventSeries which is a pending class).

Artsdata.ca has a large number of class and property mappings between Schema.org, Wikidata.org, DBpedia.org, FOAF and DOLCE+DnS Ultralite (Ontology Design Patterns) using owl:equivalentClass and owl:equivalentProperty. The mappings come prebuilt from external ontologies.
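As a rough illustration of what an owl:equivalentClass mapping gives a query, the sketch below propagates rdf:type across equivalent classes. It is a minimal stand-in for a single entailment rule, not the OWL-Horst ruleset the triple store applies, and the function name `infer_equivalent_types` and the tuple representation of triples are assumptions made for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"


def infer_equivalent_types(triples):
    """Return the input triples plus rdf:type triples implied by owl:equivalentClass.

    Triples are plain (subject, predicate, object) tuples of IRI strings
    (a hypothetical representation chosen for this sketch).
    """
    inferred = set(triples)

    # owl:equivalentClass is symmetric, so record the mapping in both directions.
    equivalents = {}
    for s, p, o in inferred:
        if p == OWL_EQUIVALENT_CLASS:
            equivalents.setdefault(s, set()).add(o)
            equivalents.setdefault(o, set()).add(s)

    # Repeat until no new triples appear, so chains of equivalences are followed.
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for equivalent in equivalents.get(o, ()):
                new_triple = (s, RDF_TYPE, equivalent)
                if new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred
```

With a mapping between schema:Person and foaf:Person loaded, an entity typed as one class can then be found by a query for the other.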

Current work on the next version of the Artsdata.ca ontology is influenced by CAPACOA’s Linked Digital Future initiative and involves aligning the data model with data models used in cultural heritage including, but not limited to, CIDOC-CRM, FRBRoo, PROV and RDA. The data models will be further specified by a domain-specific vocabulary to be released in upcoming versions.

Exceptions in handling Schema.org in Artsdata

Artsdata converts all schema.org https URIs to http URIs, and also makes the following transformations:

  1. schema:eventStatus and schema:eventAttendanceMode objects are converted to URIs in Artsdata, whereas the schema.org @context sets them to Literals.
  2. schema:url objects are converted to Literals in Artsdata, whereas the schema.org @context sets them to URIs.
  3. The datatype schema:DateTime is converted to xsd:dateTime so that SPARQL can compare and compute with date-times.
  4. The datatype schema:Date is converted to xsd:date so that SPARQL can compare and compute with dates.
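The transformations listed above can be sketched as a single normalization pass over incoming triples. This is an illustrative sketch only, not the Artsdata import code: the dictionary representation of an object term (`kind`, `value`, `datatype`) and the function names `to_http` and `normalize` are assumptions made for the example.

```python
SCHEMA_HTTPS = "https://schema.org/"
SCHEMA_HTTP = "http://schema.org/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Properties whose objects become URIs (item 1) or Literals (item 2).
URI_VALUED = {SCHEMA_HTTP + "eventStatus", SCHEMA_HTTP + "eventAttendanceMode"}
LITERAL_VALUED = {SCHEMA_HTTP + "url"}

# Datatype rewrites (items 3 and 4).
DATATYPE_MAP = {
    SCHEMA_HTTP + "DateTime": XSD + "dateTime",
    SCHEMA_HTTP + "Date": XSD + "date",
}


def to_http(iri):
    """Rewrite a https://schema.org/ IRI to its http://schema.org/ form."""
    if iri.startswith(SCHEMA_HTTPS):
        return SCHEMA_HTTP + iri[len(SCHEMA_HTTPS):]
    return iri


def normalize(triple):
    """Apply the schema.org exceptions to one (subject, predicate, object) triple.

    The object is a dict with keys "kind" ("uri" or "literal"), "value",
    and "datatype" (hypothetical representation for this sketch).
    """
    s, p, o = triple
    s, p = to_http(s), to_http(p)
    kind, value, datatype = o["kind"], o["value"], o.get("datatype")

    if p in URI_VALUED and kind == "literal":
        kind, datatype = "uri", None      # Literal -> URI
    elif p in LITERAL_VALUED and kind == "uri":
        kind = "literal"                  # URI -> Literal, value kept verbatim

    if kind == "uri":
        value = to_http(value)
    elif datatype is not None:
        datatype = to_http(datatype)
        datatype = DATATYPE_MAP.get(datatype, datatype)

    return (s, p, {"kind": kind, "value": value, "datatype": datatype})
```

Handling the property-specific cases before the generic https-to-http rewrite keeps a schema:url value exactly as published, since it ends up as a Literal rather than a URI.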

Ontologies loaded into Artsdata

Provenance

Data is great, but it is not the ultimate truth, and without traceability it can lose our trust. For example, what if two web pages have different dates for the same performing arts event? Which source is more trustworthy? How can we follow the data back to the source to decide for ourselves?

To track provenance, Artsdata.ca uses metadata attached to named graphs. Each data source in Artsdata.ca is stored in a separate named graph. The graph’s URI is used as the subject of the provenance metadata. This technique is generally called the Named Graphs approach. Each named graph URI is a prov:Entity and is linked to provenance metadata including the date when the data was loaded, the software used to collect it and the email of the contributing organization. Each time data is imported, whether from a website, spreadsheet or existing triple store, the graph’s provenance metadata is updated. In addition, when the data comes directly from a crawled web page, the schema:WebPage entity includes the date when the web page was crawled.
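The following sketch models the Named Graphs approach with plain Python dictionaries, to show how a conflicting value can be traced back to its source. It is a simplified in-memory model, not how the triple store holds the data: the functions `import_graph` and `trace`, the keys `software` and `contact_email`, and the choice of prov:generatedAtTime for the load date are assumptions made for illustration.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def import_graph(store, graph_uri, triples, software, contact_email, loaded_at=None):
    """Load triples into a named graph and refresh that graph's provenance.

    Re-importing the same graph_uri replaces both the data and the metadata,
    mirroring "each time data is imported, the provenance metadata is updated".
    """
    loaded_at = loaded_at or datetime.now(timezone.utc)
    store[graph_uri] = {
        "triples": set(triples),
        "provenance": {
            RDF_TYPE: PROV + "Entity",
            PROV + "generatedAtTime": loaded_at.isoformat(),
            "software": software,            # hypothetical key
            "contact_email": contact_email,  # hypothetical key
        },
    }


def trace(store, subject, predicate):
    """Return (graph_uri, value, provenance) for every graph stating the property."""
    hits = []
    for graph_uri, graph in store.items():
        for s, p, o in graph["triples"]:
            if s == subject and p == predicate:
                hits.append((graph_uri, o, graph["provenance"]))
    return hits
```

If two sources give different start dates for the same event, `trace` returns both values together with the graph each came from, so a reader can judge which source to trust.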

Minted entities in Artsdata.ca are master data and therefore do not come from an external source. To track provenance metadata on minted entity master data, RDF-star is used to quote triples as provenance entities using the provenance ontology.

Data Flow Architecture

In principle, anyone can add data to Artsdata.ca as long as certain data requirements are met. Here is a diagram about how data flows in and out of Artsdata.ca.

Caching LOD

Artsdata.ca loads LOD from Wikidata and DBpedia in order to cache it for performance reasons. The triples are obtained using content negotiation (instead of data dumps) and are cached unmodified in their respective named graphs.

Note: there is one notable exception: the Wikidata property P31 (instance of) is transformed to rdf:type. The same result could have been achieved using owl:equivalentProperty, but that approach was not selected for performance reasons.
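The P31 exception amounts to a predicate rewrite applied while caching. A minimal sketch follows, assuming the cached triples use the Wikidata "direct" property IRI for P31 and are represented as plain tuples; the function name `rewrite_instance_of` is hypothetical.

```python
# Assumption: the cached Wikidata triples use the "direct" (wdt:) form of P31.
WDT_P31 = "http://www.wikidata.org/prop/direct/P31"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rewrite_instance_of(triples):
    """Replace the P31 predicate with rdf:type and leave all other triples unmodified."""
    return [(s, RDF_TYPE if p == WDT_P31 else p, o) for s, p, o in triples]
```

Rewriting once at load time means every later query sees rdf:type directly, without the reasoner having to expand an owl:equivalentProperty mapping.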

Naming Conventions

Conventions on how to name things when in doubt.

  1. LDF/ANL Recommended identifiers: spreadsheet with guidelines on using identifiers. This document may eventually be converted into an Artsdata official recommendation.

Support or Contact

Contact support and we’ll help you sort it out.