Issue 391: Harmonizing Space Primitive

Starting Date: 
2018-08-02
Working Group: 
1
Status: 
Open
Background: 

Posted by Martin on 2/8/2018

Dear All,
I have just finished a draft of the section "recording space" of the guideline "Expressing the CIDOC CRM in RDF" (https://docs.google.com/document/d/1zCGZ4iBzekcEYo4Dy0hI8CrZ7dTkMD2rJaxa...):

The recommended datatypes of RDF 1.1 do not include datatypes for describing geometric entities on the surface of the Earth. On the other hand, such datatypes are becoming increasingly important, and the CIDOC CRM, from version 6.2 on, defines E94 Space Primitive, subclass of E59 Primitive Value, as:

“This class comprises instances of E59 Primitive Value for space that should be implemented with appropriate validation, precision and references to spatial coordinate systems to express geometries on or relative to earth, or any other stable constellations of matter, relevant to cultural and scientific documentation.

An E94 Space Primitive defines an E53 Place in the sense of a declarative place as elaborated in CRMgeo (Doerr and Hiebel 2013), which means that the identity of the place is derived from its geometric definition. This declarative place allows for the application of all place properties to relate phenomenal places to their approximations expressed with geometries.

Definitions of instances of E53 Place using different spatial reference systems always result in definitions of different instances of E53 place approximating each other. It is possible for a place to be defined by phenomena causal to it, such as a settlement or a riverbed, or other forms of identification rather than by an instance of E94 Space Primitive. Any geometric approximation of such a place by an instance of E94 Space Primitive constitutes an instance of E53 Place in its own right, i.e., the approximating one. 

Instances of E94 Space Primitive provide the ability to link CRM encoded data to the kinds of geometries used in maps or Geoinformation systems. They may be used for visualisation of the instances of E53 Place they define, in their geographic context and for computing topological relations between places based on these geometries.

E94 Space Primitive is not further elaborated upon within this model. Compatibility with OGC standards are recommended.”

These standards currently do not have a common form comprising all others. Further, geometries defined with respect to particular object shapes, such as rotationally symmetric ones, are possibly open ended.

Therefore, in the CRM RDFS we define the range of the properties that use E94 Space Primitive in the definition of the CRM as rdfs:Literal, and recommend that users instantiate it with adequate XML datatypes. For the surface of the Earth, these are “ogc:gmlLiteral” or “geo:wktLiteral”.

In the current version of the CIDOC CRM, only the property “P168 place is defined by (defines place)” has range E94 Space Primitive.
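
As a minimal, non-normative sketch of this recommendation, the Turtle below declares the range of P168 as rdfs:Literal and instantiates it with a typed WKT literal. The crm: namespace URI, the ex: resource and the coordinates are assumptions made for illustration only; geo: is taken to be the GeoSPARQL 1.0 namespace, which defines wktLiteral.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .
@prefix ex:   <http://example.org/> .

# Range declaration as described above: any literal is admissible.
crm:P168_place_is_defined_by rdfs:range rdfs:Literal .

# A place defined by a WKT literal (illustrative values; WKT literals
# without an explicit CRS are read as longitude latitude in CRS84).
ex:some_place a crm:E53_Place ;
    crm:P168_place_is_defined_by "POINT(172.59 -43.30)"^^geo:wktLiteral .
```

Because the datatype travels with the literal, a consumer can tell the encoding apart without any intermediate resource.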

Since any instance of E94 Space Primitive unambiguously identifies an instance of E53 Place by a symbolic expression, E94 Space Primitive must logically be regarded as a subclass of E41 Appellation, regardless of whether this can be expressed in RDFS or OWL. See below for the relationship between datatypes and E41 Appellation.
In a footnote I make the argument that:

"The concepts E47 Spatial Coordinates, crmgeo:SP5 Geometric Place Expression, crmgeo:Q10 defines place and P168 place is defined by (defines place) need to be revised soon.
E94 Space Primitive should replace E47 Spatial Coordinates and SP5 Geometric Place Expression. P168 place is defined by (defines place) should replace crmgeo:Q10 defines place. It may be useful in the CRM RDFS to specify two subproperties of P168, one having as range “geo:wktLiteral” and another “ogc:gmlLiteral”".
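
The two subproperties suggested in the footnote could look roughly as follows. This is only a sketch: the property names in the ex: namespace are hypothetical and appear in no published CRM RDFS, and both datatypes are assumed to live in the GeoSPARQL 1.0 namespace.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .
@prefix ex:   <http://example.org/> .

# Hypothetical subproperty for WKT-encoded geometries.
ex:P168a_place_is_defined_by_wkt
    rdfs:subPropertyOf crm:P168_place_is_defined_by ;
    rdfs:range geo:wktLiteral .

# Hypothetical subproperty for GML-encoded geometries.
ex:P168b_place_is_defined_by_gml
    rdfs:subPropertyOf crm:P168_place_is_defined_by ;
    rdfs:range geo:gmlLiteral .
```

Under RDFS entailment, a statement made with either subproperty also holds for P168 itself, so generic queries against P168 would still find the geometry.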

Current Proposal: 

Posted by Robert Sanderson on 2/8/2018

Martin, all,

I feel that the implications of your footnote are somewhat problematic. I agree overall with the clarifications, but SP4/SP5 add extra value.

In particular:

·         Use of literals prevents the association of additional information with the value, other than the custom datatype, especially:

o    Associating P2_has_type is enormously useful to give guidance on the usage for the particular geometry.  Types might distinguish simple bounding boxes for user interfaces from very accurate geo-political boundary data that would be useful for calculations, or coastlines from other boundaries, or preferred geometries from alternative ones.

o    The source / provenance of the data is very important.  Is this a bounding box that someone threw together, or data that is provided by an established authority?

o    There are more formats than just WKT and GML. GeoJSON and KML are very frequently used, and there are many more besides those. Not all formats have the capacity to embed the reference system within the literal.

o    Relationships between geometries are also useful, such as partitioning.

·         Literals can only be embedded within the serialized graph, rather than referenced externally. This means that the coastline of New Zealand (a 100+ MB file) would need to be embedded within the description of the E53 Place, rather than being referenced. Conversely, a resource can have a URI and optionally a value, providing flexibility within a single model.

·         Relying on subproperties to manage the data type runs into the extensibility problem above. We would need to continually create new properties when there are new data types.

A cognate situation is rdfs:label vs E41 Appellation – label is great if you have very simple data, but E41 provides clear advantages when you want to do more than just display a string to a user. Having a single literal (be it a label or geometry) is great for the simple cases, but rdfs:label does not obviate the need for E41. Nor should P168 obviate the need for a richer spatial system.
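
To make the contrast concrete, here is a rough sketch of the two styles. The crmgeo: namespace URI, the class and property local names, the ex: resources and the coordinates are all assumptions for illustration; the second style simply gives the geometry its own URI so that a type can be attached and the (possibly very large) value can be fetched separately.

```turtle
@prefix crm:    <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix crmgeo: <http://www.ics.forth.gr/isl/CRMgeo/> .
@prefix geo:    <http://www.opengis.net/ont/geosparql#> .
@prefix ex:     <http://example.org/> .

# Style 1: the geometry is a literal embedded in the description of the place.
ex:nz_rough_extent a crm:E53_Place ;
    crm:P168_place_is_defined_by
        "POLYGON((166 -48, 179 -48, 179 -34, 166 -34, 166 -48))"^^geo:wktLiteral .

# Style 2: the geometry is a resource in its own right. It carries a type,
# and its URI can point to an external file instead of embedding the value.
ex:nz_coastline_geometry a crmgeo:SP5_Geometric_Place_Expression ;
    crm:P2_has_type ex:Authoritative_Boundary ;
    crmgeo:Q10_defines_place ex:nz_coastline_place .

ex:nz_coastline_place a crmgeo:SP6_Declarative_Place .
```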

Posted by Martin on 3/8/2018

Dear Robert,

Thank you for your quick comments!
Your comments are well taken. I agree with all the needs you specify, but I would like to point out a confusion between a place defined by a geometry and a place defined by natural features, which is approximated by places defined by geometry. It was the particular achievement of Gerald Hiebel's analysis to make this distinction. I kindly ask you to read his papers, if my explanations here are not clear enough:

    Hiebel, G.H., & Doerr, M. (2013). Aspects of integrating geoinformation in Digital Libraries (Session S32, 669). Computer Applications and Quantitative Methods in Archaeology (CAA) 2013, Perth, Australia, 25th-28th March 2013.
    Hiebel, G.H., Doerr, M., & Eide, Ø. (2013). Integration of CIDOC CRM with OGC Standards to model spatial information (Session 5, 522). Computer Applications and Quantitative Methods in Archaeology (CAA) 2013, Perth, Australia, 25th-28th March 2013.
    Doerr, M., & Hiebel, G.H. (2013). Where did the Varus battle take place? - A spatial refinement for the CIDOC CRM ontology (ID:760). Seventh World Archaeological Congress, The Dead Sea, Jordan, January 13th-18th 2013.
    Doerr, M., & Hiebel, G.H. (2013). CRMgeo: Linking the CIDOC CRM to GeoSPARQL through a Spatiotemporal Refinement. 2013.TR435_CRMgeo_CIDOC_CRM_GeoSPARQL.pdf.


In more detail:

On 8/2/2018 8:50 PM, Robert Sanderson wrote:
> Martin, all,
>
> I feel that the implications of your footnote are somewhat problematic. I agree overall with the clarifications, but SP4/SP5 add extra value.
>
> In particular:
>
> ·         Use of literals prevents the association of additional information with the value, other than the custom datatype, especially:
>
> o    Associating P2_has_type is enormously useful to give guidance on the usage for the particular geometry.  Types might distinguish simple bounding boxes for user interfaces from very accurate geo-political boundary data that would be useful for calculations. Or coastlines from other boundaries. Preferred from alternative.

All these are examples of "declarative places". The simple bounding box, the centroid, the representation of a coastline: all are places. The coastline itself is another, a phenomenal place. Therefore, the distinctions you are making here are about the quality of approximation between a phenomenal and a declarative place (Q11 approximates). They are not properties of the geometric place expression. In my opinion, the only property geometric place expressions have is the type of encoding. Exactly the same geometry can be defined with different encoding types. The encoding type, however, is already embedded in the XML datatype, so there is no need to create an intermediate URI. We had cases in which it was recorded which encoding a GPS device created, and which encoding was a translation of the former. In both cases, the device measured the same Place. I'd argue that this is not enough reason to reify the encoded string itself.
>
> o    The source / provenance of the data is very important.  Is this a bounding box that someone threw together, or data that is provided by an established authority?
Again, these are properties of the declarative place: how was it defined, and why? If what you would attribute to an instance of SP5 you instead attribute to an instance of E53 Place.P168...E94, we are talking about exactly the same information, aren't we?
>
> o    There are more formats than just WKT and GML. GeoJSON and KML are very frequently used, and there are many more besides those. Not all formats have the capacity to embed the reference system within the literal.

No problem: use "P157 is at rest relative to (provides reference space for)" for the declarative place, or a suitable type.
>
> o    Relationships between geometries are also useful, such as partitioning.

Right, these are topological relations, and not relations between encodings. They hold for the mathematical space defined, and do not differ from encoding to encoding. So, they are relations between E53 Places, and we have a lot of them in CRMbase and CRMgeo.
>
> ·         Literals can only be embedded within the serialized graph, rather than referenced externally. This means that the coastline of New Zealand (a 100+ mb file) would need to be embedded within the description of the E53 Place, rather than being referenced.

Again: The coastline of New Zealand is a fuzzy, rough thing of infinite length. Any representation is a Place in its own right, related by Q11 to the real coastline. All properties you require should be there.

If you feel my text (not the footnote) does not make it clear enough that each geometry expression defines a Place in its own right, distinct from the Place it was made to approximate, please propose additional wording.
>
> Conversely a resource can have a URI and optionally a value, providing flexibility within a single model.
>
> ·         Relying on subproperties to manage the data type runs into the extensibility problem above. We would need to continually create new properties when there are new data types.

Right. They are redundant, because the XML datatypes identify themselves. Querying for a distinct property may, however, sometimes be more convenient than querying for the datatype found in the literal.
>
> A cognate situation is rdfs:label vs E41 Appellation – label is great if you have very simple data, but E41 provides clear advantages when you want to do more than just display a string to a user. Having a single literal (be it a label or geometry) is great for the simple cases, but rdfs:label does not obviate the need for E41.

I agree!!
>
> Nor should P168 obviate the need for a richer spatial system.

I argue that this is different. Geometric Place Expressions do not have a rich cultural history as names do (Martinus, Marty, Μαρτίνος, Αριανός, ...), and the Place is not the Expression.

The devil is in the detail: OWL does not like classes that have both URIs and data values as instances. Therefore I argue against creating more of these constructs.
The problem with Appellation is already big enough.

Opinions?
 

Posted by Robert Sanderson on 3/8/2018

Thanks Martin!

To make certain I understand: the notion of SP6 Declarative Place would remain, along with Q11 approximates.  E53 would become the Phenomenal Place, and SP5 Geometric Place Expression goes away in favor of P168 to a literal.

Thus:

"Rob was born in Rangiora, New Zealand" could be:

_:rob a E21_Person ;
  rdfs:label "Rob" ;
  p98i_was_born [
    a E67_Birth ;
    p7_took_place_at [
      a E53_Place ;
      rdfs:label "Rangiora" ;
      q11i_approximated_by [
        a SP6_Declarative_Place ;
        p2_has_type <xxx:Geospatial_Bounding_Box> ;
        rdfs:label "Bounding Box for Rangiora" ;
        P168_place_is_defined_by "POLYGON((172.565456 -43.285409, 172.622116 -43.285409, 172.622116 -43.323697, 172.565456 -43.323697, 172.565456 -43.285409))"
      ]
    ]
  ] .

And further SP6s could be introduced for other approximations, such as centroids, points, exact boundaries, different coordinate systems, etc.
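
For readers who want to load the sketch above into a triple store, a fuller version with explicit prefixes might look like this. It is only an illustration: the namespace URIs, the ex: resources and the type ex:Centroid are assumptions, the forward direction crmgeo:Q11_approximates is used instead of the inverse, and the centroid is simply the midpoint of the bounding box given above.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix crm:    <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix crmgeo: <http://www.ics.forth.gr/isl/CRMgeo/> .
@prefix geo:    <http://www.opengis.net/ont/geosparql#> .
@prefix ex:     <http://example.org/> .

ex:rob a crm:E21_Person ;
    rdfs:label "Rob" ;
    crm:P98i_was_born [
        a crm:E67_Birth ;
        crm:P7_took_place_at ex:rangiora
    ] .

# The place that is referred to by name.
ex:rangiora a crm:E53_Place ;
    rdfs:label "Rangiora" .

# First approximation: the bounding box from the example above.
ex:rangiora_bbox a crmgeo:SP6_Declarative_Place ;
    rdfs:label "Bounding Box for Rangiora" ;
    crm:P2_has_type ex:Geospatial_Bounding_Box ;
    crmgeo:Q11_approximates ex:rangiora ;
    crm:P168_place_is_defined_by "POLYGON((172.565456 -43.285409, 172.622116 -43.285409, 172.622116 -43.323697, 172.565456 -43.323697, 172.565456 -43.285409))"^^geo:wktLiteral .

# Second approximation: the midpoint of that bounding box.
ex:rangiora_centroid a crmgeo:SP6_Declarative_Place ;
    rdfs:label "Centroid of the bounding box for Rangiora" ;
    crm:P2_has_type ex:Centroid ;
    crmgeo:Q11_approximates ex:rangiora ;
    crm:P168_place_is_defined_by "POINT(172.593786 -43.304553)"^^geo:wktLiteral .
```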

I had interpreted the footnote that SP6 would also be collapsed into Place, which I understand not to be the case now.

Given that I was only born at one location, the E53 provides the unique reference, and SP6 provides the ability to have different approximations of that location.  If only one approximation was needed, then E53 and SP6 could be collapsed, as SP6 is a subclass of E53. (Though that doesn’t seem like a good idea…)

Is that all correct?

Posted by Martin on 4/8/2018

On 8/3/2018 8:56 PM, Robert Sanderson wrote:

> Thanks Martin!


> To make certain I understand, the notion of SP6 Declarative Place would remain, along with Q11 approximates.  E53 would become the Phenomenal Place, and SP 5 Geometric Place Expression goes away in favor of P168 to a literal.
Nearly. SP6 Declarative Place should remain. E53 is and will remain the superclass of SP6 Declarative Place and SP2 Phenomenal Place. Q11 approximates may be generalized to E53 domain and range, because, e.g., the place of a building may approximate the place of a meeting. SP5 is already declared as subclass of E94.
Mapping E94 to Literal is a question of CRM RDFS, not of CRMbase or CRMgeo. In CRMgeo, SP5 appears to be redundant to E94, so it could go away in CRMgeo.


> Thus:
> Rob was born in Rangiora, New Zealand could be:

> _:rob a E21_Person ;
>   rdfs:label "Rob" ;
>   p98i_was_born [
>     a E67_Birth ;
>     p7_took_place_at [
>       a E53_Place ;
>       rdfs:label "Rangiora" ;
>       q11i_approximated_by [
>         a SP6_Declarative_Place ;
>         p2_has_type <xxx:Geospatial_Bounding_Box> ;
>         rdfs:label "Bounding Box for Rangiora" ;
>         P168_place_is_defined_by "POLYGON((172.565456 -43.285409, 172.622116 -43.285409, 172.622116 -43.323697, 172.565456 -43.323697, 172.565456 -43.285409))"
>       ]
>     ]
>   ] .

Exactly!
> And further SP6s could be introduced for other approximations, such as centroids, points, exact boundaries, different coordinate systems, etc.
> I had interpreted the footnote that SP6 would also be collapsed into Place, which I understand not to be the case now.

The question is what to do with Q10, and whether SP6 is needed as a distinct class, because using P168 or Q10 implies that the instance of E53 defined by a geometric expression is, in particular, a declarative place. For methodological reasons, we avoid defining, in a core ontology, a class and a property that imply each other, because this creates a priority conflict when ontological distinctions begin to differ. Currently, explicitly naming SP6 appears to be more didactically useful. It appears that Q10 is causal to SP6, rather than a consequence of it.
> Given that I was only born at one location, the E53 provides the unique reference, and SP6 provides the ability to have different approximations of that location.  If only one approximation was needed, then E53 and SP6 could be collapsed, as SP6 is a subclass of E53. (Though that doesn’t seem like a good idea…)
This is not correct. Although SP6 is a subclass of E53, deleting the subclass does not mean that two different places become one. Even if we do not distinguish at the class level between SP2 and SP6, and even if there is only one approximation, the instances of the phenomenal and the approximating place are distinct, and will have different types. If a place is defined by P168, it can only be declarative.

If we wanted to describe a phenomenal place directly by a geometric expression, for reasons of disambiguation etc., we would need a shortcut of SP2 - Q11 - SP6 - P168 - E94, or, abandoning SP2 and SP6 explicitly, of E53 ("phenomenal") - Q11 - E53 ("declarative") - P168 - E94.

Another reason why I tend to avoid SP6 in CRMbase is that an E53 Place need not be either phenomenal or declarative. There are mixed forms, which we have not yet discussed in CRMgeo, such as borderlines defined partly by declaration and partly by physical boundaries, and we need containers for them.

Posted by Gerald Hiebel on 6/8/2018

Dear Martin, Rob and All,
Thanks very much for elaborating on the issues related to Space Primitives.
I would like to add to and emphasise some of the points Rob and Martin made:
Properties and Provenance of Declarative Places:
I believe Martin's approach of attaching these properties to the E53 Place (defined via P168 by an E94), and not to the E94, is a good choice, as recording the properties and provenance of Declarative Places becomes increasingly important when multiple geometries (coming from multiple sources, and thus created by multiple methods) approximate one phenomenal place.
For different applications and kinds of reasoning I need to know more about the declarative place (geometry), its provenance and its type.

An example: Right now I am working on the integration of several different gazetteers, and I need to record the provenance and type of each geometry in order to decide which geometry to use as the preferred one or for a specific purpose.
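
A gazetteer scenario of this kind could be sketched as below. Everything here is hypothetical: the ex: resources, the type terms, the coordinates, and the choice of crm:P70i_is_documented_in as one possible way to point at a source. The point is only that type and provenance sit on each declarative place, not on the literal.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix crm:    <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix crmgeo: <http://www.ics.forth.gr/isl/CRMgeo/> .
@prefix geo:    <http://www.opengis.net/ont/geosparql#> .
@prefix ex:     <http://example.org/> .

# The phenomenal place that both gazetteers describe.
ex:town a crm:E53_Place ;
    rdfs:label "A town known by name" .

ex:gazetteer_a a crm:E31_Document ; rdfs:label "Gazetteer A" .
ex:gazetteer_b a crm:E31_Document ; rdfs:label "Gazetteer B" .

# Gazetteer A supplies a representative point.
ex:town_point_a a crmgeo:SP6_Declarative_Place ;
    crm:P2_has_type ex:Representative_Point ;
    crm:P70i_is_documented_in ex:gazetteer_a ;
    crmgeo:Q11_approximates ex:town ;
    crm:P168_place_is_defined_by "POINT(10.0 50.0)"^^geo:wktLiteral .

# Gazetteer B supplies a bounding box.
ex:town_bbox_b a crmgeo:SP6_Declarative_Place ;
    crm:P2_has_type ex:Geospatial_Bounding_Box ;
    crm:P70i_is_documented_in ex:gazetteer_b ;
    crmgeo:Q11_approximates ex:town ;
    crm:P168_place_is_defined_by "POLYGON((9.9 49.9, 10.1 49.9, 10.1 50.1, 9.9 50.1, 9.9 49.9))"^^geo:wktLiteral .
```

An application can then select a geometry by its type or source before using it.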

Formats of serialisation:
One goal of CRMgeo was to relate CIDOC CRM to OGC GeoSPARQL and thus make use of the developments and standards of OGC. In OGC GeoSPARQL, one goal for further work was to extend the set of specific serialisation formats, explicitly naming KML and GeoJSON. Unfortunately GeoSPARQL did not evolve quickly, although it is still being discussed (https://www.w3.org/2015/spatial/wiki/Further_development_of_GeoSPARQL) and serialisation is a major issue.
Nevertheless, GeoSPARQL offers a general property GeoSPARQL:#hasSerialization that allows for encoding in serialisations other than WKT or GML. The type of the encoding would then probably need to be stated in a P2_has_type of the E53.
Another option may be to create specific subproperties of GeoSPARQL:#hasSerialization in a new version of CRMgeo.
(please comment)
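
Both options might be sketched as follows. This is an illustration under assumptions: the ex: names (including the GeoJSON type, property and datatype) are hypothetical, and the declarative place is assumed to double as a geo:Geometry so that geo:hasSerialization can be applied to it.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix crm:    <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix crmgeo: <http://www.ics.forth.gr/isl/CRMgeo/> .
@prefix geo:    <http://www.opengis.net/ont/geosparql#> .
@prefix ex:     <http://example.org/> .

# Option 1: general serialisation property, encoding named via P2_has_type.
ex:place_1 a crmgeo:SP6_Declarative_Place, geo:Geometry ;
    crm:P2_has_type ex:GeoJSON_Encoding ;
    geo:hasSerialization '{"type": "Point", "coordinates": [172.59, -43.30]}' .

# Option 2: a hypothetical format-specific subproperty with its own datatype.
ex:asGeoJSON
    rdfs:subPropertyOf geo:hasSerialization ;
    rdfs:range ex:geoJSONLiteral .

ex:place_2 a crmgeo:SP6_Declarative_Place, geo:Geometry ;
    ex:asGeoJSON '{"type": "Point", "coordinates": [172.59, -43.30]}'^^ex:geoJSONLiteral .
```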
Relationships between geometries:
When geometries are treated as declarative Places, and in CRMcore as E53, the spatial relationships of CRM are available.
Through the linking to GeoSPARQL, the topological relations of GeoSPARQL are available as well.
In the paper "CRMgeo: A spatiotemporal extension of CIDOC-CRM" (attached, https://link.springer.com/article/10.1007/s00799-016-0192-4) we provided some graphics showing the relations of CRMgeo and GeoSPARQL, and in figures 4 and 5 you can see that the topological properties of GeoSPARQL are available to CRM Places if you need richer topological relations.

A comment on the example of Rob:

> _:rob a E21_Person ;
>   rdfs:label "Rob" ;
>   p98i_was_born [
>     a E67_Birth ;
>     p7_took_place_at [
>       a E53_Place ;
>       rdfs:label "Rangiora" ;
>       q11i_approximated_by [
>         a SP6_Declarative_Place ;
>         p2_has_type <xxx:Geospatial_Bounding_Box> ;
>         rdfs:label "Bounding Box for Rangiora" ;
>         P168_place_is_defined_by "POLYGON((172.565456 -43.285409, 172.622116 -43.285409, 172.622116 -43.323697, 172.565456 -43.323697, 172.565456 -43.285409))"
>       ]
>     ]
>   ] .
>
> And further SP6s could be introduced for other approximations, such as centroids, points, exact boundaries, different coordinate systems, etc.
>
> I had interpreted the footnote that SP6 would also be collapsed into Place, which I understand not to be the case now.
>
> Given that I was only born at one location, the E53 provides the unique reference, and SP6 provides the ability to have different approximations of that location.  If only one approximation was needed, then E53 and SP6 could be collapsed, as SP6 is a subclass of E53. (Though that doesn’t seem like a good idea…)

E53 provides the unique reference:

I would interpret the E53_Place in the example as Rangiora the town, and not as the spatial projection (P161) of the spacetime volume (E92) of the birth event (E67), which is a much smaller place and unique.

The birth place and Rangiora are two distinct places, with the topological relation that one falls within the other.
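
This reading could be written down roughly as follows. It is a sketch only: the ex: resources are hypothetical, and it assumes that the birth event can carry P161 directly, E67 Birth being a spacetime volume through its superclasses in the CRM version discussed here.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix ex:   <http://example.org/> .

ex:birth a crm:E67_Birth ;
    crm:P7_took_place_at ex:rangiora ;               # the town
    crm:P161_has_spatial_projection ex:birth_place . # the extent of the event itself

# The much smaller, unique place of the event falls within the town.
ex:birth_place a crm:E53_Place ;
    crm:P89_falls_within ex:rangiora .

ex:rangiora a crm:E53_Place ;
    rdfs:label "Rangiora" .
```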

Posted by Robert Sanderson on 6/8/2018

Thank you for the clarifications :) I agree that Q10 is causal rather than consequential, and that Q10 / P168 have identical semantics when SP5 G.P.E. is no longer in the picture.  I also strongly agree with the design principle of not defining class-specific properties, for just the reason you cite.

One minor point…

I agree that not all E53s are Declarative [exclusive] or Phenomenal, otherwise there would be no need for the subclassing. However, you say …

> If a place is defined by P168, it can only be declarative.

Do you mean that P168 should have a domain of SP6 and no longer E53? Meaning that SP6 would need to be pulled into core, or P168 moved out to CRMgeo?

Similarly, Q11 crosses the core/geo boundary by relating E53 and SP6. If SP6 moves to core, then so should Q11?

Posted by Martin on 6/8/2018

Dear Robert,

On 8/6/2018 7:34 PM, Robert Sanderson wrote:

> Thank you for the clarifications :) I agree that Q10 is causal rather than consequential, and that Q10 / P168 have identical semantics when SP5 G.P.E. is no longer in the picture.  I also strongly agree with the design principle of not defining class-specific properties, for just the reason you cite.
> One minor point…
> I agree that not all E53s are Declarative [exclusive] or Phenomenal, otherwise there would be no need for the subclassing. However, you say …
> > If a place is defined by P168, it can only be declarative.
> Do you mean that P168 should have a domain of SP6 and no longer E53?  Meaning that SP6 would need to be pulled into core, or P168 moved out to CRMGeo?

No, I mean that P168 can stay with domain E53 in CRMbase, and CRMgeo can declare SP6(x) == E53(x) AND exists y: P168(x,y), or something similar.
CRMgeo will still need SP6 because it matches the OpenGIS "geometry". One purpose of CRMgeo is to link the CRM with OpenGIS.
>
> Similarly, Q11 crosses the core/geo boundary by relating E53 and SP6. If SP6 moves to core, then so should Q11?

I mean Q11 should have E53 domain and range even in CRMgeo, because any kind of place can approximate any kind of place.
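
In OWL, written as Turtle, the two statements above might be approximated as follows. This is a sketch, not a proposal for the official CRMgeo file: it assumes the namespace URIs shown, and it treats P168 as a datatype property so that the existential restriction can range over rdfs:Literal.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix crm:    <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix crmgeo: <http://www.ics.forth.gr/isl/CRMgeo/> .

crm:P168_place_is_defined_by a owl:DatatypeProperty ;
    rdfs:domain crm:E53_Place .

# SP6(x) == E53(x) AND exists y: P168(x, y)
crmgeo:SP6_Declarative_Place owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        crm:E53_Place
        [ a owl:Restriction ;
          owl:onProperty crm:P168_place_is_defined_by ;
          owl:someValuesFrom rdfs:Literal ]
    )
] .

# Q11 with E53 as both domain and range: any place may approximate any place.
crmgeo:Q11_approximates a owl:ObjectProperty ;
    rdfs:domain crm:E53_Place ;
    rdfs:range  crm:E53_Place .
```

With such an axiom, a reasoner would classify any E53 Place that has a P168 value as an SP6 Declarative Place, without P168 needing SP6 as its domain.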

Posted by Robert Sanderson on 7/8/2018

Gotcha, thanks Martin! That all sounds good.

 

Posted by Martin on 13/8/2018

Thank you very much for your comment! How would you link GeoSPARQL into CRM RDFS? Would it make sense then to recommend GeoSPARQL as THE form to encode geometry expressions?

By the way, the meaning of "p7_took_place_at [ a E53_Place ;" is the actual place of the event OR any wider one.
So "Rangiora, the town" is a good range here, as is the actual place of birth.
If we want to refer only to the phenomenal place itself, we would use the spatial projection, "P161 has spatial projection (is spatial projection of)".

Posted by Gerald Hiebel on 17/8/2018

Dear Martin and All,
In CRMgeo we specified the relations between CRMgeo classes and GeoSPARQL classes, such as crmgeo:SP1_Phenomenal_Spacetime_Volume as a subclass of geosparql:Feature.
I believe the value of CRMgeo and GeoSPARQL lies in making explicit the relations between CRM concepts and OGC concepts.
GeoSPARQL is probably the most elaborate formalised conceptual model among the OGC standards, and more complex than many other spatial vocabularies/ontologies.
It offers quite sophisticated topological relations between its concepts, based on models of the GIS community.

With regard to encoding geometry expressions, it only explicitly offers specific properties for GML (geosparql:asGML) and WKT (geosparql:asWKT), both of which are used as exchange formats for geometries rather than directly in GIS systems.
For practical reasons it may make sense to store geometries (if they are points, which is often the case) just as latitude/longitude in WGS 84, as specified in the "Basic Geo (WGS84 lat/long) Vocabulary" (https://www.w3.org/2003/01/geo/) or schema.org (https://schema.org/GeoCoordinates). Also, for more complex geometries, encodings other than GML and WKT may be more at hand, practical and useful.
As GeoSPARQL is slow to incorporate other serialisations for geometric expressions, I believe we need a recommendation/mechanism to specify additional encodings.
The recommendation should follow the GeoSPARQL examples/practice, so that we have no problems with the existing GeoSPARQL concepts and stay in line with further GeoSPARQL developments.
Should we put it as an issue for the next CRM-meeting?

Posted by Nicola Carboni on 22/8/2018

Dear all,

> For practical reasons it may make sense to store the geometries (if they are points, what is often the case) just in latitude/longitude in WGS 84  like specified in the "Basic Geo (WGS84 lat/long) Vocabulary” (https://www.w3.org/2003/01/geo/) or
> schema.org (https://schema.org/GeoCoordinates). Also for more complex geometries other encodings than GML and WKT may be more at hand, practical and useful.

While WKT is great for many reasons (compact, case insensitive, cross-domain support, ...), it seems to me that quite a large set of heritage institutions encode geographical coordinates using only latitude/longitude, so Basic Geo would better fit their needs. What seems to me a viable option is to recommend the use of WKT or GML (as in "please use at least one of these in combination with other encodings"), while still being able to accommodate other formats such as GeoJSON, KML, Geohash etc.

In our case we use the ISA Programme Location Core Vocabulary and Basic Geo to express the latitude and longitude:

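@prefix crm:   <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix geo:   <http://www.opengis.net/ont/geosparql#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix locn:  <http://www.w3.org/ns/locn#> .
# wgs84: is the W3C Basic Geo namespace, distinct from GeoSPARQL's geo:
<https://collection.example.com/resource/P00006626> a crm:E53_Place ;
    crm:P168_place_is_defined_by "POINT (9.1232696 45.2503146)"^^geo:wktLiteral ;
    locn:geometry [
      wgs84:lat "51.477811" ;
      wgs84:long "-0.001475"
    ] .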


but it would be great to be able to define both the WKT and lat/long directly with CRM.