Introduction
This is the third lesson on Design Patterns for the Semantic Web. In the last lesson we looked at some identifier patterns that can help guide the creation of an identifier scheme for a dataset. Thinking about the identifiers for a dataset can help us create a useful framework around which we can assemble our data.
In this lesson we will be looking at some modeling patterns. Modeling is a very broad area and so is a rich source of patterns. Some patterns are very general, while others are domain-specific and may simply recommend particular RDF vocabularies or terms.
For this lesson we’ll focus on a few general patterns that provide a good starting point for those new to RDF modeling.
Prerequisites
Semantic Web Design Patterns—Introduction
Today’s Lesson
Modeling in RDF is really not that different from modeling in any other context. RDF builds on the Entity-Attribute-Value model that has been used for many years. We focus our efforts on identifying the key entities in the data (the RDF resources and their types), their attributes, and their relationships (RDF properties). This lets us build up a logical domain model that can be used to capture our data.
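To make the Entity-Attribute-Value connection concrete, here is a minimal Python sketch (standard library only) that breaks one entity’s attribute/value pairs into (subject, predicate, object) triples. It represents IRIs and literal values alike as plain strings, which a real RDF library would distinguish. The book IRI, the `pageCount` property, and the values are hypothetical.

```python
# Sketch: each (entity, attribute, value) row maps onto one RDF-style triple.
# The subject IRI, the ex: pageCount property and the values are hypothetical.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def record_to_triples(entity: str, record: Dict[str, str]) -> List[Triple]:
    """Turn each attribute/value pair of the entity into one triple."""
    return [(entity, attribute, value) for attribute, value in record.items()]

book = {
    "http://purl.org/dc/terms/title": "An Example Book",   # Dublin Core title
    "http://example.org/schema/pageCount": "250",           # hypothetical property
}
triples = record_to_triples("http://example.org/book/1", book)
for triple in triples:
    print(triple)
```

Each entity contributes one triple per attribute. The logical model therefore needs no further reshaping before it is stored.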
Unlike other approaches, when using RDF we don’t need to translate our logical model into a physical model, such as a set of relational database tables or an object-oriented class hierarchy. In an RDF system the logical model is exactly how the data is stored: an RDF triple store can store any RDF graph using any RDF vocabulary.
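The next sketch gives a rough illustration of why no physical schema is needed. It assumes a toy in-memory store that is nothing more than a set of string triples with a wildcard `match` method. This is a hypothetical class, not any library’s API. Triples using FOAF and Dublin Core terms are added side by side without declaring tables or classes first.

```python
# Sketch of a toy triple store (hypothetical, not a real library's API).
# The "schema" is just the data, so any vocabulary can be stored immediately.
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

class TripleStore:
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Return the triples matching the pattern; None acts as a wildcard."""
        return {
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }

FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

store = TripleStore()
# FOAF and Dublin Core terms live side by side with no schema migration.
store.add("http://example.org/person/bob", RDF_TYPE, FOAF + "Person")
store.add("http://example.org/person/bob", FOAF + "name", "Bob")
store.add("http://example.org/book/1", DCT + "creator", "http://example.org/person/bob")

print(store.match(p=FOAF + "name"))
```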
Often we can avoid a modeling exercise entirely. RDF vocabularies and OWL ontologies can be shared on the web and re-used and combined to help us describe our data. The first step of any RDF modeling exercise should be to look at what has already been published. Where existing vocabularies don’t provide enough coverage, it may be enough to add some simple extensions that customize or extend the model for our needs. This lets us directly benefit from efforts in the broader community.
In our first lesson on patterns we learned that design patterns help to communicate modeling decisions between engineers. So, even when we’re re-using an existing model or someone else’s data, it’s useful to know some basic modeling patterns so that we can understand how that data has been structured.
Modeling Patterns
The next sections introduce three useful modeling patterns that address questions commonly encountered by people new to RDF modeling.
As with the last article, we’ll use a slightly abbreviated form of design pattern focusing on the question, a solution and some brief discussion. Links are provided to the full pattern description in the Linked Data Patterns book.
Link Not Label
How do we model a dataset to maximize the benefits of a graph-based model?
Solution
Ensure that all entities in a dataset are modeled as first-class resources. In many applications, and even in industry-standard models and formats, modeling effort is focused on a few core entities whilst others are under-described. For example, the author of a book might be captured as just a name, or a subject category as a simple tag or label.
A good approach is to look for any controlled vocabularies, keywords, or dimensions in a dataset and model those as resources. Even structured literal values like dates might be more usefully modeled as resources.
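Here is a minimal sketch of this step, assuming subject keywords arrive as free-text labels. Each label becomes a resource with an IRI minted under a hypothetical `http://example.org/subject/` namespace, using a simple slug rule that is also an assumption. The original text is kept as an `rdfs:label` on that resource, and the book is then linked to the resource with `dcterms:subject` instead of carrying the label directly.

```python
# Sketch of "Link Not Label": replace free-text subject tags with resources.
# SUBJECT_BASE and the slug rule are hypothetical choices for illustration.
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]
SUBJECT_BASE = "http://example.org/subject/"  # hypothetical namespace
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def slug(label: str) -> str:
    """Derive a URI-safe local name, e.g. 'Semantic Web' -> 'semantic-web'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")

def link_not_label(book: str, labels: List[str]) -> List[Triple]:
    triples: List[Triple] = []
    for label in labels:
        subject = SUBJECT_BASE + slug(label)
        triples.append((book, DCT_SUBJECT, subject))   # link the book to the resource
        triples.append((subject, RDFS_LABEL, label))   # keep the label on the resource
    return triples

triples = link_not_label("http://example.org/book/1", ["Semantic Web", "RDF"])
for triple in triples:
    print(triple)
```

Because the slug rule is deterministic, every book tagged “RDF” links to the same subject resource. Isolated tags thus become shared nodes in the graph.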
Discussion
A graph model becomes richer as we add more entities and relationships. We can enrich our models if we Link Not Label resources. Creating a richer graph model gives us more flexibility when it comes to adding more data to the graph. For example, if we later need to capture biographical data about an author, then we can simply annotate the relevant resource. This isn’t possible if we have only captured the author’s name as a literal label. A richer model also offers more options for querying and navigating over a dataset.
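The sketch below contrasts the two approaches in the same hypothetical graph. Book 1 records its author only as a name string, while book 2 links to an author resource. Adding a birthplace to that resource later makes a two-step navigation query possible. In the labelled case there is no resource to attach the birthplace to. The `ex:authorName` and `ex:birthPlace` properties and all IRIs are assumptions for illustration.

```python
# Sketch: a literal author name versus an author resource.
# The ex: properties and all IRIs are hypothetical.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/schema/"  # hypothetical vocabulary
DCT_CREATOR = "http://purl.org/dc/terms/creator"

graph: Set[Triple] = {
    # Label: the author is only a string, so there is nothing to annotate later.
    ("http://example.org/book/1", EX + "authorName", "Jane Doe"),
    # Link: the author is a resource that further data can describe.
    ("http://example.org/book/2", DCT_CREATOR, "http://example.org/person/jane"),
}

# Later enrichment: biographical data attaches to the author resource.
graph.add(("http://example.org/person/jane", EX + "birthPlace", "http://example.org/place/york"))

def books_by_authors_born_in(g: Set[Triple], place: str) -> List[str]:
    """Navigate book -> dcterms:creator -> ex:birthPlace."""
    authors = {s for (s, p, o) in g if p == EX + "birthPlace" and o == place}
    return sorted(s for (s, p, o) in g if p == DCT_CREATOR and o in authors)

print(books_by_authors_born_in(graph, "http://example.org/place/york"))
```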
In RDF if we want to allow for annotation, linking and enrichment then we need to start with resources not literal values.
Once the entities in a data model have been identified, we may find that the relationships between resources are more complex than a simple binary relationship between two resources. We may also need to qualify the relationship between those resources. The next two design patterns illustrate how to do this.
N-Ary Relation
How can a complex relation, involving several resources, be modeled as RDF?
Solution
RDF triples express binary relationships between two resources. To express more complex relationships we must model the relationship. The relationship becomes a resource in our data model. We can then associate together many different resources via this relationship resource.
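For example, we could model a marriage as a simple binary relationship between two people. However, we could also describe the marriage as a resource in its own right, which would then let us also capture the venue of the marriage:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix eg: <http://example.org/person/> .   # eg:bob expands to <http://example.org/person/bob>
@prefix ex: <http://example.org/schema/> .   # hypothetical vocabulary for this example
<http://example.org/person/bob> a foaf:Person .
<http://example.org/person/mary> a foaf:Person .
<http://example.org/marriage/mary-bob> a ex:Marriage ;
    ex:partner eg:bob ;
    ex:partner eg:mary ;
    ex:venue <http://example.org/venue/las-vegas> .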
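The same marriage can be sketched in Python (standard library only) to show how queries navigate through the relationship resource. The `n_ary` helper and the `partners_married_at` query are hypothetical. The `ex:` terms use an assumed `http://example.org/schema/` namespace.

```python
# Sketch of the N-Ary Relation pattern: the marriage itself is a resource.
# EX, the helper functions and all IRIs are hypothetical.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/schema/"  # assumed namespace for ex:
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def n_ary(relation: str, rel_type: str, roles: List[Tuple[str, str]]) -> List[Triple]:
    """Type the relation resource and link it to each participant by role."""
    return [(relation, RDF_TYPE, rel_type)] + [(relation, p, v) for p, v in roles]

graph: Set[Triple] = set(n_ary(
    "http://example.org/marriage/mary-bob",
    EX + "Marriage",
    [(EX + "partner", "http://example.org/person/bob"),
     (EX + "partner", "http://example.org/person/mary"),
     (EX + "venue", "http://example.org/venue/las-vegas")],
))

def partners_married_at(g: Set[Triple], venue: str) -> List[str]:
    """Find marriages at the venue, then follow their ex:partner links."""
    marriages = {s for (s, p, o) in g if p == EX + "venue" and o == venue}
    return sorted(o for (s, p, o) in g if s in marriages and p == EX + "partner")

print(partners_married_at(graph, "http://example.org/venue/las-vegas"))
```

With a direct binary property between the two people, there would be no node to attach the venue to. The relationship resource gives every participant and context value a common anchor.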
Discussion
An N-Ary Relationship is a relationship that involves more than two resources. N-Ary relationships are often necessary when we need to capture information about an event and its context. Examples include a marriage, an order in an e-commerce system, or a diagnosis. In these cases we may need to know where the event occurred, who and what was involved, and perhaps identify the outcome.
N-Ary relationships allow our model to be more expressive at the cost of capturing more data. Instead of a single triple we end up with a whole new resource and several new relationships. This increases the size of the dataset and we often need new vocabulary terms to describe the relationships. That extra flexibility may not always be necessary: for some datasets simple binary relationships may be sufficient as there is little or no context to be captured.
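To put rough numbers on that cost, the sketch below encodes the marriage both ways and counts triples and vocabulary terms. The binary version assumes a hypothetical `ex:spouse` property. All `ex:` terms use an assumed `http://example.org/schema/` namespace.

```python
# Sketch: compare the size of a binary and an N-ary encoding of one marriage.
# EX, ex:spouse and the IRIs are hypothetical.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/schema/"  # assumed namespace for ex:
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BOB, MARY = "http://example.org/person/bob", "http://example.org/person/mary"
MARRIAGE = "http://example.org/marriage/mary-bob"

# Binary relationship: a single triple using a single vocabulary term.
binary: List[Triple] = [(BOB, EX + "spouse", MARY)]

# N-ary relationship: a new resource plus one triple per role or context value.
n_ary: List[Triple] = [
    (MARRIAGE, RDF_TYPE, EX + "Marriage"),
    (MARRIAGE, EX + "partner", BOB),
    (MARRIAGE, EX + "partner", MARY),
    (MARRIAGE, EX + "venue", "http://example.org/venue/las-vegas"),
]

def vocabulary_terms(triples: List[Triple]) -> Set[str]:
    """Collect ex: terms used as predicates or as classes in rdf:type triples."""
    terms = {p for (_, p, _) in triples if p.startswith(EX)}
    terms |= {o for (_, p, o) in triples if p == RDF_TYPE and o.startswith(EX)}
    return terms

print(len(binary), len(vocabulary_terms(binary)))  # 1 1
print(len(n_ary), len(vocabulary_terms(n_ary)))    # 4 3
```

The N-ary form needs four triples instead of one. It also introduces three new terms (`ex:Marriage`, `ex:partner`, `ex:venue`) in place of one, and it can still be extended with further context.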
Qualified Relation
How can we describe or qualify a relationship between two resources?
Solution
We may need to qualify relationships to indicate a level of certainty (“how confident is the diagnosis?”), or capture some additional context (“when did the marriage take place?”). As with the N-Ary Relationship pattern, whenever we need more than a simple binary relationship, we must explicitly model the relationship in our data. Modeling the relationship gives us a richer graph model that can then be annotated with additional data. Extending our previous example, we can add a date to the marriage relationship:
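@prefix eg: <http://example.org/person/> .
@prefix ex: <http://example.org/schema/> .   # hypothetical vocabulary for this example
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://example.org/marriage/mary-bob> a ex:Marriage ;
    ex:partner eg:bob ;
    ex:partner eg:mary ;
    ex:venue <http://example.org/venue/las-vegas> ;
    ex:date "2012-08-05"^^xsd:date .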
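The confidence question from the solution above can be sketched the same way. Here each diagnosis is a resource that links a patient to a condition and carries its own confidence score, so a query can filter on that qualification. The `ex:patient`, `ex:condition`, and `ex:confidence` properties, the IRIs, and the scores are all hypothetical. Confidences are stored as plain strings, whereas real RDF would use a typed literal such as `xsd:decimal`.

```python
# Sketch of the Qualified Relation pattern: a diagnosis resource qualifies
# the patient-condition link with a confidence. All terms are hypothetical.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/schema/"  # hypothetical vocabulary

def diagnosis(node: str, patient: str, condition: str, confidence: float) -> List[Triple]:
    """Model the relationship as a resource so it can carry a confidence."""
    return [
        (node, EX + "patient", patient),
        (node, EX + "condition", condition),
        (node, EX + "confidence", str(confidence)),
    ]

graph: Set[Triple] = set()
graph.update(diagnosis("http://example.org/diagnosis/1", "http://example.org/patient/42",
                       "http://example.org/condition/flu", 0.9))
graph.update(diagnosis("http://example.org/diagnosis/2", "http://example.org/patient/42",
                       "http://example.org/condition/measles", 0.3))

def confident_conditions(g: Set[Triple], patient: str, threshold: float) -> List[str]:
    """Return conditions diagnosed for the patient with confidence >= threshold."""
    nodes = {s for (s, p, o) in g if p == EX + "patient" and o == patient}
    confidence = {s: float(o) for (s, p, o) in g if s in nodes and p == EX + "confidence"}
    return sorted(o for (s, p, o) in g
                  if p == EX + "condition" and s in confidence and confidence[s] >= threshold)

print(confident_conditions(graph, "http://example.org/patient/42", 0.5))
```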
Discussion
Qualified Relations are similar to N-Ary relationships: both require us to model a relationship rather than rely on a simple property to relate two resources. In the N-Ary relationship pattern we do this to describe relationships between more than two resources; in the Qualified Relations pattern our motivation is to add some qualification to a relationship, usually between two resources. The two patterns aren’t mutually exclusive and are often applied together.
Whenever we need to qualify a specific relationship between two resources, e.g. to annotate it with information on its strength, when it was created (or ended), or with details about who stated the relationship, then we need to apply the Qualified Relation pattern. If the qualification is more general, e.g. if it applies to all instances of that relationship, then we don’t need to apply this pattern. We can instead just annotate the RDF property in our schema.
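As a sketch of the property-level alternative, suppose every `ex:spouse` relationship (a hypothetical property) is symmetric. That qualification applies to all instances, so the example states it once in the schema with `owl:SymmetricProperty` and then applies it to every matching triple. The `apply_symmetry` function is a hand-written stand-in for the inference an OWL reasoner would perform, not a reasoner itself.

```python
# Sketch: a qualification that holds for every instance is stated once on the
# property in the schema, not on each relationship. ex:spouse is hypothetical.
from typing import Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/schema/"  # hypothetical vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SYMMETRIC = "http://www.w3.org/2002/07/owl#SymmetricProperty"

schema: Set[Triple] = {(EX + "spouse", RDF_TYPE, OWL_SYMMETRIC)}  # stated once
data: Set[Triple] = {
    ("http://example.org/person/bob", EX + "spouse", "http://example.org/person/mary"),
    ("http://example.org/person/ann", EX + "spouse", "http://example.org/person/tom"),
}

def apply_symmetry(schema: Set[Triple], data: Set[Triple]) -> Set[Triple]:
    """Add the reverse of every triple whose property is declared symmetric."""
    symmetric = {s for (s, p, o) in schema if p == RDF_TYPE and o == OWL_SYMMETRIC}
    return data | {(o, p, s) for (s, p, o) in data if p in symmetric}

inferred = apply_symmetry(schema, data)
print(len(inferred))
```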
As before, applying this pattern results in a larger dataset and a more complex model. Understanding when the extra flexibility is required is an important part of creating a good RDF model.
Conclusion
In this article we’ve looked at three basic design patterns that can help us enrich our RDF models. When we first learn about RDF, we focus on the simplicity of the RDF triple and how to relate two resources. But new users then wonder how to go about expressing more complex relationships. Two of the patterns introduced in this article illustrate how to describe relationships between more than two resources and how relationships can be qualified to capture extra context.
Both of those patterns can be seen as an extension of the first (“Link Not Label”). By ensuring that we have a rich graph model we gain more flexibility when describing our data, no matter how much detail we need to capture. Teasing out the entities in a dataset and deciding how to model relationships are important first steps in creating a good RDF model.
In the next tutorial in this series we’ll look at some data management patterns that can help us organize a triple store and make our datasets more manageable.