Gellish
Gellish is an ontology language for data storage and communication, designed and developed by Andries van Renssen since the mid-1990s.[1] It started out as an engineering modeling language ("Generic Engineering Language", giving it the name "Gellish") but evolved into a universal and extendable conceptual data modeling language with general applications. Because it includes domain-specific terminology and definitions, it is also a semantic data modeling language, and the Gellish modeling methodology is a member of the family of semantic modeling methodologies.
Although its concepts have 'names' and definitions in various natural languages, Gellish is a natural-language-independent formal language. Any natural language variant, such as Gellish Formal English, is a controlled natural language. Information and knowledge can be expressed in a way that is computer-interpretable, as well as system-independent and natural-language-independent. Each natural language variant is a structured subset of that natural language and is suitable for information modeling and knowledge representation in that particular language. All expressions, concepts and individual things are represented in Gellish by (numeric) unique identifiers (Gellish UIDs). This enables software to translate expressions from one formal natural language into any other formal natural language.
Overview
Gellish is intended for the expression of facts (statements), queries, answers, etc. It can be used, for example, for the complete and unambiguous specification of business processes, products, facilities and physical processes; for information about their purchasing, fabrication, installation, operation and maintenance; and for the exchange of such information between systems in a system-independent, computer-interpretable and language-independent way. It is also intended for the expression of knowledge and requirements about such things.
The definition of Gellish can be derived from the definition of Gellish Formal English by considering 'expressions' as relations between the unique identifiers only. The definition of Gellish Formal English is provided in the Gellish English Dictionary-Taxonomy, a large 'smart dictionary' of concepts with relations between those concepts (earlier it was called STEPlib). The Dictionary-Taxonomy is called a 'smart dictionary' because its concepts are arranged in a subtype-supertype hierarchy, which makes it a taxonomy that supports inheritance of properties from supertype concepts to subtype concepts. Furthermore, the other relations between the concepts extend the smart dictionary into an ontology. Gellish basically uses an extended object-relation-object structure to express facts as relations, where each fact may be accompanied by a number of auxiliary facts about the main fact, such as its author, date and status. To enable an unambiguous interpretation, Gellish defines a large number (more than 650) of standard relation types, which give the language its rich semantic expressiveness.
In principle, every natural language has its own Gellish variant, for example Gellish Dutch (Gellish Nederlands), Gellish Italian, Gellish English and Gellish Russian. Unlike Esperanto, Gellish does not invent its own terminology but uses the terms of natural languages. Thus, the Gellish English Dictionary-Taxonomy is like an ordinary (electronic) dictionary that is extended with additional concepts and with relations between the concepts.
For example, the Gellish Dictionary-Taxonomy contains definitions of many concepts that also appear in ordinary dictionaries, such as kinds of physical objects like building, airplane, car, pump and pipe, properties such as mass and color, scales such as kg and bar, and activities and processes such as repairing and heating. In addition, the dictionary contains concepts with composed names, such as 'hairpin heat exchanger', which do not appear in ordinary dictionaries. The main difference from ordinary dictionaries is that the Gellish dictionary also includes definitions of standard kinds of relations (relation types), which are denoted by standard Gellish English phrases. For example, it defines relation types such as ⟨is a subtype of⟩, ⟨is classified as a⟩, ⟨has as aspect⟩, ⟨is quantified as⟩, ⟨can be a performer of a⟩ and ⟨shall have as part a⟩. These standard relation types and concept definitions enable Gellish-powered software to interpret Gellish expressions correctly and unambiguously.
Gellish expressions may be represented in any suitable format, such as SQL, RDF or OWL, or even as spreadsheet tables, provided that their content is equivalent to the tabular form of Gellish Naming Tables (which define the vocabulary) and Fact Tables (which together define the content of a Gellish Database), or to Gellish Message Tables (for data exchange). An example of the core of a Message Table is the following:
Left-hand term | Relation type | Right-hand term |
---|---|---|
centrifugal pump | is a subtype of | pump |
P-123 | is classified as a | centrifugal pump |
P-123 | has as aspect | the mass of P-123 |
the mass of P-123 | is classified as a | mass |
the mass of P-123 | is qualified as | 50 kg |
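Here is a minimal sketch, assuming Python 3.9+, of how software could read the three-column Gellish Light form above into (left, relation, right) triples and select every fact about P-123. The `Fact` type and `parse_light_table` function are hypothetical illustrations, not part of any Gellish tool. Because this form identifies things only by their terms, it cannot tell homonyms apart. That is the limitation of Gellish Light that the UID columns of full Gellish address.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One row of a Gellish Light table (hypothetical representation)."""
    left: str
    relation: str
    right: str


# Rows of the example table above, pipe-separated with a trailing pipe.
TABLE = """\
centrifugal pump | is a subtype of | pump |
P-123 | is classified as a | centrifugal pump |
P-123 | has as aspect | the mass of P-123 |
the mass of P-123 | is classified as a | mass |
the mass of P-123 | is qualified as | 50 kg |
"""


def parse_light_table(text: str) -> list[Fact]:
    facts = []
    for line in text.splitlines():
        # Drop the trailing pipe, then split into the three cells.
        cells = [cell.strip() for cell in line.rstrip().rstrip("|").split("|")]
        if len(cells) != 3:
            raise ValueError(f"expected 3 columns, got {len(cells)}: {line!r}")
        facts.append(Fact(*cells))
    return facts


facts = parse_light_table(TABLE)
# All facts with P-123 as left-hand term (matched by name only).
about_p123 = [f for f in facts if f.left == "P-123"]
```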
A full Gellish Message Table requires additional columns for unique identifiers, the intention of the expression, the language of the expression, cardinalities, unit of measure, the validity context, status, creation date, author, references and various other aspects. Gellish Light requires only the three columns above, but it then lacks, for example, the capabilities to distinguish homonyms, to translate automatically and to manage versions. Those capabilities and several others are supported by Full Gellish. The following example illustrates the use of some additional columns in a Gellish Message Table, where UoM stands for 'unit of measure'.
Fact UID | Intention | Left UID | Left term | Relation UID | Relation type | Right UID | Right term | UID of UoM | UoM | Status |
---|---|---|---|---|---|---|---|---|---|---|
201 | statement | 130058 | centrifugal pump | 1146 | is a subtype of | 130206 | pump | | | accepted |
202 | statement | 102 | P-123 | 1225 | is classified as a | 130058 | centrifugal pump | | | proposed |
203 | statement | 102 | P-123 | 1727 | has as aspect | 103 | mass of P-123 | | | proposed |
204 | statement | 103 | mass of P-123 | 1225 | is classified as a | 550020 | mass | | | proposed |
205 | statement | 103 | mass of P-123 | 5020 | is qualified as | 920303 | 50 | 570039 | kg | proposed |
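The sketch below is an illustrative assumption and not a Gellish library. It shows how the rows of the extended table could be loaded so that software selects facts by UID rather than by term. The `MessageRow` field names are hypothetical. Empty UoM cells become `None`, and the numeric value (50) is kept apart from its unit (kg, UID 570039), as in row 205.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MessageRow:
    """Hypothetical in-memory form of one row of the table above."""
    fact_uid: int
    intention: str
    left_uid: int
    left_term: str
    relation_uid: int
    relation_type: str
    right_uid: int
    right_term: str
    uom_uid: Optional[int]  # empty cell -> None
    uom: Optional[str]
    status: str


ROWS = [
    "201 | statement | 130058 | centrifugal pump | 1146 | is a subtype of | 130206 | pump | | | accepted",
    "202 | statement | 102 | P-123 | 1225 | is classified as a | 130058 | centrifugal pump | | | proposed",
    "203 | statement | 102 | P-123 | 1727 | has as aspect | 103 | mass of P-123 | | | proposed",
    "204 | statement | 103 | mass of P-123 | 1225 | is classified as a | 550020 | mass | | | proposed",
    "205 | statement | 103 | mass of P-123 | 5020 | is qualified as | 920303 | 50 | 570039 | kg | proposed",
]


def parse_row(line: str) -> MessageRow:
    c = [cell.strip() for cell in line.split("|")]
    if len(c) != 11:
        raise ValueError(f"expected 11 columns, got {len(c)}")
    return MessageRow(
        fact_uid=int(c[0]), intention=c[1],
        left_uid=int(c[2]), left_term=c[3],
        relation_uid=int(c[4]), relation_type=c[5],
        right_uid=int(c[6]), right_term=c[7],
        uom_uid=int(c[8]) if c[8] else None,
        uom=c[9] or None,
        status=c[10],
    )


rows = [parse_row(r) for r in ROWS]
# Select by UIDs (1146 = subtype relation, 130206 = pump), not by names.
subtypes_of_pump = [r.left_term for r in rows if r.relation_uid == 1146 and r.right_uid == 130206]
mass = next(r for r in rows if r.fact_uid == 205)
print(mass.right_term, mass.uom)  # 50 kg
```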
The collection of standard relation types defines the kinds of facts that can be expressed in Gellish, although anybody can create a proprietary extension of the dictionary and thus add concepts and relation types as and when required.
As Gellish is a formal language, any Gellish expression may only use concepts that are defined in a Gellish dictionary or that are defined ad hoc within the collection of Gellish expressions itself. Knowledge bases can be created by using the Gellish language and its concept definitions in a Gellish Dictionary. Example applications of a Gellish dictionary include its use as a source of classes for the classification of equipment, documents, etc., as standard terminology (metadata), to harmonize data across computer systems, or as a thesaurus or taxonomy in a search engine.
Because every concept has a unique, natural-language-independent identifier (UID), Gellish enables automatic translation and the use of synonyms, abbreviations and codes as well as homonyms. For example, 130206 denotes pump and 1225 denotes ⟨is classified as a⟩. Various Gellish dictionaries therefore use the same UIDs for the same concepts, which means that together they provide translations of the names of the objects as well as of the standard relation types. As a result, information and knowledge expressed in one language variant of Gellish can be automatically translated and presented by Gellish-powered software in any other language variant for which a Gellish dictionary is available. For example, the phrases ⟨is classified as a⟩ and ⟨ist klassifiziert als⟩ both denote the same UID 1225.
For example, a computer can automatically express the second line in the above example in German as follows:
Left-hand term | Relation type | Right-hand term |
---|---|---|
P-123 | ist klassifiziert als | Zentrifugalpumpe |
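This is a minimal sketch, assuming a fact is stored only as a triple of UIDs, of how the German line above can be produced. Rendering a fact in another language is then just a lookup of each UID in that language's dictionary. The `NAMES` mapping and the `render` function are hypothetical. The UIDs 102, 1225 and 130058 come from the extended example table.

```python
# Per-language names for the UIDs used in the example (hypothetical structure).
NAMES = {
    "English": {102: "P-123", 1225: "is classified as a", 130058: "centrifugal pump"},
    "German": {102: "P-123", 1225: "ist klassifiziert als", 130058: "Zentrifugalpumpe"},
}

# The stored fact: P-123 <is classified as a> centrifugal pump, as UIDs only.
FACT = (102, 1225, 130058)


def render(fact: tuple[int, int, int], language: str) -> str:
    """Express a UID triple in the given language variant."""
    names = NAMES[language]
    return " | ".join(names[uid] for uid in fact)


print(render(FACT, "German"))  # P-123 | ist klassifiziert als | Zentrifugalpumpe
```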
Questions (queries) can be expressed as well. Queries are facilitated by standardized terms such as what, which, where and when, which can be combined with reserved UIDs for unknowns in the range 1-100. This enables Gellish query expressions such as:
- query: what <is located in> Paris
Gellish-powered software should be able to provide the correct answer to this query by comparing the expression with the facts in the database, and should respond with:
- answer: The Eiffel Tower <is located in> Paris
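Here is an illustrative sketch, not the behavior of any particular Gellish product, of how such a query could be matched against stored UID triples. Any UID in the reserved range 1-100 is treated as an unknown that matches anything. When the same unknown appears twice, it must bind to the same value. The fact UIDs (101, 102, 5138, 1225, 40903, 700008) are taken from the Message Table example later in this article. The `answer` function is hypothetical.

```python
UNKNOWN = range(1, 101)  # UIDs 1-100 are reserved for unknowns

FACTS = [
    (101, 5138, 102),     # The Eiffel Tower <is located in> Paris
    (101, 1225, 40903),   # The Eiffel Tower <is classified as a> tower
    (102, 1225, 700008),  # Paris <is classified as a> city
]
NAMES = {101: "The Eiffel Tower", 102: "Paris", 5138: "is located in",
         1225: "is classified as a", 40903: "tower", 700008: "city"}


def answer(query, facts):
    """Return the facts that match the query; unknowns bind consistently."""
    results = []
    for fact in facts:
        bindings = {}
        for q, f in zip(query, fact):
            if q in UNKNOWN:
                if bindings.setdefault(q, f) != f:
                    break  # same unknown already bound to something else
            elif q != f:
                break      # a known UID does not match
        else:
            results.append(fact)
    return results


query = (1, 5138, 102)  # what <is located in> Paris
for left, rel, right in answer(query, FACTS):
    print(f"answer: {NAMES[left]} <{NAMES[rel]}> {NAMES[right]}")
```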
Note that the automatic translation capability implies that a query/question that is expressed in a particular language, say English, can be used to search in a Gellish database in another language (say Chinese), whereas the answer can be presented in English.
Information models in Gellish
Information models can be divided into two main categories:
- Models about individual things. These models may be about individual physical objects as well as about activities, processes and events, or a combination of them. An information model about an individual physical object (and possibly also about its operation and maintenance), such as a process plant, a ship, an airplane, an infrastructural facility or a typical design (e.g. of a car or of a component), is called a Facility Information Model or a Product Model; for a building it is called a Building Information Model (BIM). These models about individual things are characterized by their composition hierarchy, which specifies (all) their parts, and by the fact that the assemblies as well as the parts are classified by kinds or types of things.
- Models about kinds of things. These models are expressed as collections of relations of particular kinds between kinds of things. They can be further subdivided into the following sub-categories:
  - Knowledge models, which are collections of expressions of facts about what can be the case (modeled knowledge).
  - Requirements models, which are collections of expressions of facts about what shall be the case in a particular validity context (modeled requirements). This may include modeled versions of the content of requirements documents, such as standard specifications and standard types of components (e.g. as in component and equipment catalogs).
  - Definition models, each of which consists of a semantic frame. A definition model is a collection of expressions about what is by definition the case for all things of a kind. The Gellish electronic smart dictionary-taxonomy or ontology is an example of a collection of definition models.
  - Models that are collections that include a combination of expressions of the above kinds.
All these categories of models can include drawings and other documents as well as 3D shape information (the core of 3D models). They all can be expressed and integrated in Gellish.
The classification relation between individual things and kinds of things makes the definitions, knowledge and requirements about kinds of things available for the individual things. Furthermore, the subtype-supertype hierarchy in a Gellish Dictionary-Taxonomy implies that the knowledge and requirements specified for a kind of thing are inherited by all of its subtypes. As a consequence, when somebody designs an individual item and classifies it by a particular kind, all the knowledge and requirements that are known for that kind and its supertypes are recognized and can be made available automatically.
Each category of information model requires its own semantics. Expressing the individual fact that something real is the case requires other kinds of relations than expressing the general fact that something can be the case. That, in turn, differs from a fact expressing that something shall be the case in a particular context, or that something is by definition always the case. Because of these semantic differences, the various categories of information models require their own subsets of standard relation types. Therefore, Gellish distinguishes the following categories of relation types:
- Relation types for relations between kinds of things (classes). They are intended for the expression of knowledge, requirements and definitions. These sub-categories are modeled by different kinds of relations: relation types for things that can be the case, for things that shall be the case and for things that are by definition the case, all three within applicable cardinality constraints. For example, the specialization relation on the first line of the example above is used to define a concept (centrifugal pump). The relation types <can have as part a> and <shall have as part a> are examples of relation types used to specify knowledge and requirements, respectively.
- Relation types for relations between individual things. They are intended for the expression of information about individual things, for example the possession-of-an-aspect relation on the third line of the example above.
- Relation types for relations between individual things and kinds of things. They are intended for links between individual things and general concepts in the dictionary (or in private extensions of that dictionary), for example the classification and qualification relations above.
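Below is a minimal sketch of the inheritance mechanism described above. It walks the subtype-supertype hierarchy upward from the kind an individual is classified by and collects the knowledge attached to any of those kinds. The hierarchy entry "pump is a subtype of physical object" and the knowledge fact "pump can have as part a impeller" are hypothetical example data. So are the functions `supertypes` and `applicable_facts`, which are not taken from the Gellish dictionary.

```python
# Subtype -> direct supertypes ("physical object" entry is hypothetical).
SUPERTYPES = {
    "centrifugal pump": ["pump"],
    "pump": ["physical object"],
}
# Knowledge expressed on kinds of things (hypothetical example fact).
KNOWLEDGE = {
    "pump": [("can have as part a", "impeller")],
}
# Classification of individual things.
CLASSIFICATION = {"P-123": "centrifugal pump"}


def supertypes(kind: str) -> list[str]:
    """Return the kind followed by all its transitive supertypes, without duplicates."""
    order, seen, stack = [], set(), [kind]
    while stack:
        k = stack.pop()
        if k in seen:
            continue
        seen.add(k)
        order.append(k)
        stack.extend(SUPERTYPES.get(k, []))
    return order


def applicable_facts(individual: str) -> list[tuple[str, str, str]]:
    """Knowledge inherited by an individual via its classification."""
    kind = CLASSIFICATION[individual]
    return [(k, rel, obj) for k in supertypes(kind) for rel, obj in KNOWLEDGE.get(k, [])]


print(applicable_facts("P-123"))  # [('pump', 'can have as part a', 'impeller')]
```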
- Relation types for relations between collections and for relations between a collection and an element in the collection or a common aspect of all elements.
Gellish databases and data exchange messages
Gellish is typically expressed in the form of Gellish Data Tables. There are three categories of Data Tables:
- Naming Tables, which contain the vocabulary of the dictionary and the proprietary terms that are used in the expressions.
- Fact Tables, which contain the expressions of facts in the form of relations between UID´s, together with a number of auxiliary facts.
- Message Tables, which combine the content of Naming Tables and Fact Tables into merged tables. Message Tables are intended for the exchange of data between systems and parties. A Message Table is a single standard table for the expression of any facts. It includes the unique identifiers (UIDs) of the facts, the relation types and the related objects, as well as their names (terms) and a number of auxiliary facts, all combined in one table. Multiple Message Tables in different locations can be combined into one distributed database.
A Gellish Database typically consists of one or more Naming Tables together with one or more Fact Tables. Together, Naming Tables and Fact Tables are one-to-one equivalent to Message Tables.
All table columns are standardized, so that each Gellish data table of a category contains the same standard columns, or a subset of them. This provides standard interfaces for the exchange of data between application systems. The content of data tables may also include constraints and requirements (data models) that specify the kind of data that should and may be provided for particular applications. Such requirements models make dedicated database designs superfluous. Gellish Data Tables can be used as part of a central database or can form distributed databases, but tables can also be exchanged in data exchange files or as the body of Gellish Messages.
A Naming Table relates terms in a language and language community ('speech community') to a unique identifier. This enables the unambiguous use of synonyms, abbreviations and codes as well as homonyms in multiple languages. The following table is an example of a Naming Table:
UID of language | UID of language community | UID for term | Inverse indicator | Term (name) | Comment |
---|---|---|---|---|---|
910036 | 193263 | 130206 | 0 | pump | English engineering term for concept 130206 |
910038 | 193263 | 130206 | 0 | Pumpe | German |
910037 | 193263 | 130206 | 0 | pomp | Dutch |
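Here is a minimal sketch, assuming the Naming Table is held as a list of rows, of the two lookups this table supports. The first maps a term in a given language and community to its UIDs. A set is used there, so a homonym would yield several UIDs. The second maps a UID to its terms in another language, where several terms would be synonyms. The index structure and the `build_index` function are hypothetical. The UIDs come from the example rows.

```python
from collections import defaultdict

# (UID of language, UID of language community, UID for term, inverse indicator, term)
NAMING_TABLE = [
    (910036, 193263, 130206, 0, "pump"),
    (910038, 193263, 130206, 0, "Pumpe"),
    (910037, 193263, 130206, 0, "pomp"),
]
ENGLISH, DUTCH, GERMAN = 910036, 910037, 910038


def build_index(rows):
    by_term = defaultdict(set)   # (language, community, term) -> UIDs; >1 UID = homonym
    by_uid = defaultdict(list)   # (UID, language) -> terms; >1 term = synonyms
    for language, community, uid, _inverse, term in rows:
        by_term[(language, community, term)].add(uid)
        by_uid[(uid, language)].append(term)
    return by_term, by_uid


by_term, by_uid = build_index(NAMING_TABLE)
uids = by_term[(GERMAN, 193263, "Pumpe")]     # {130206}
dutch_terms = by_uid[(130206, DUTCH)]         # ['pomp']
```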
The inverse indicator is only relevant when phrases are used to denote relation types, because each standard relation type is denoted by at least one standard phrase as well as at least one standard inverse phrase. For example, the phrase <is a part of> has the inverse phrase <has as part>. Both phrases denote the same kind of relation (a composition relation). However, when the inverse phrase is used to express a fact, the left-hand and right-hand objects swap positions. Thus, the following expressions are recognized as two equally valid expressions of the same fact (with the same Fact UID):
- A <is a part of> B
- B <has as part> A
So, for relation types the inverse indicator shows whether a phrase is a base phrase (1) or an inverse phrase (2).
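This is a minimal sketch of how software could use the inverse indicator to reduce both expressions above to one normalized fact. If the phrase is an inverse phrase (2), the left-hand and right-hand objects are swapped. The relation is identified here by the placeholder key "composition", since this article does not give the UID of the composition relation. The `PHRASES` table and the `normalize` function are hypothetical.

```python
BASE, INVERSE = 1, 2

# Phrase -> (relation type, inverse indicator); "composition" stands in for the relation's UID.
PHRASES = {
    "is a part of": ("composition", BASE),
    "has as part": ("composition", INVERSE),
}


def normalize(left: str, phrase: str, right: str) -> tuple[str, str, str]:
    """Rewrite an expression into base-phrase order."""
    relation, indicator = PHRASES[phrase]
    if indicator == INVERSE:
        left, right = right, left  # inverse phrase: objects swap positions
    return (left, relation, right)


# Both expressions denote the same fact.
assert normalize("A", "is a part of", "B") == normalize("B", "has as part", "A")
```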
A Fact Table contains expressions of any facts, each of which is accompanied by a number of auxiliary facts that provide additional information relevant for the main facts. Examples of auxiliary facts are: the intention, status, author, creation date, etc.
A Gellish Fact Table consists of columns for the main fact and a number of columns for auxiliary facts. The auxiliary facts make it possible to specify things such as roles, cardinalities, validity contexts, units of measure, date of latest change, author, references, etc.
The columns for the main fact in a Fact Table are:
- a UID of the fact that is expressed on this row in the table
- a UID of the intention with which the fact is communicated or stored (e.g. as a statement, a query, etc.)
- a UID of a left-hand object
- a UID of a relation type
- a UID of a right-hand object
- a UID of a unit of measure (optional)
- a string that forms a description (textual definition) of the left hand object.
These columns also appear in a Message Table as shown below.
A full Gellish Message Table is in fact a combination of a Naming Table and a Fact Table. It contains not only columns for the expression of facts, but also columns for the names of the related objects and additional columns to express auxiliary facts. This enables the use of a single table, also for the specification and use of synonyms and homonyms, multiple languages, etc. The core of a Message Table is illustrated in the following table:
Language | UID of left-hand object | Name of left-hand object | UID of fact | UID of relation type | Name of relation type | UID of right-hand object | Name of right-hand object | Status |
---|---|---|---|---|---|---|---|---|
English | 101 | The Eiffel tower | 201 | 5138 | is located in | 102 | Paris | accepted |
English | 101 | The Eiffel tower | 202 | 1225 | is classified as a | 40903 | tower | accepted |
English | 102 | Paris | 203 | 1225 | is classified as a | 700008 | city | accepted |
In the above example, the named concepts as well as the (standard) relation types are selected, with their UIDs, from the Gellish English Dictionary.
A Gellish Database table can be implemented in any tabular format. For example, it can be implemented in an SQL-based database or otherwise, as a STEP file (according to ISO 10303-21), or as a simple spreadsheet table (for example in Excel), as is done for the Gellish Dictionary itself.
Gellish database tables can also be described in an equivalent form using RDF/Notation3 or XML. A representation of “Gellish in XML” is defined in a standard XML Schema. An XML file with data according to that XML Schema should preferably have the file extension GML, whereas GMZ stands for “Gellish in XML zipped”.
One of the differences between Gellish and RDF, XML or OWL is that Gellish English includes an extensive English dictionary of concepts, including a large (and extendable) set of standard relation types for making computer-interpretable expressions (in a form that is also readable by non-IT professionals). By contrast, 'languages' such as RDF, XML and OWL define only a few basic concepts, which leaves their users much freedom to define their own 'domain language' concepts.
This attractive freedom has the disadvantage that users of 'languages' such as RDF, XML or OWL still do not share a common language and still cannot integrate data that stem from different sources. Gellish is designed to provide a real common language, or at least one that is shared to a much larger extent, and therefore provides much more standardization and commonality in terminology and expressions.
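Here is a minimal sketch, assuming a Fact Table holds only UIDs and a Naming Table supplies the English names, of how the Message Table rows above could be produced by combining the two. The tuple layouts and the `to_message_rows` function are hypothetical. The column order follows the Message Table example.

```python
# UID -> English name (the Naming Table side, simplified).
NAMES_EN = {101: "The Eiffel tower", 102: "Paris", 5138: "is located in",
            1225: "is classified as a", 40903: "tower", 700008: "city"}

# Fact Table side: (UID of fact, left UID, relation type UID, right UID, status).
FACT_TABLE = [
    (201, 101, 5138, 102, "accepted"),
    (202, 101, 1225, 40903, "accepted"),
    (203, 102, 1225, 700008, "accepted"),
]


def to_message_rows(facts, names, language="English"):
    """Merge facts and names into rows in Message Table column order."""
    rows = []
    for fact_uid, left, rel, right, status in facts:
        rows.append((language, left, names[left], fact_uid,
                     rel, names[rel], right, names[right], status))
    return rows


for row in to_message_rows(FACT_TABLE, NAMES_EN):
    print(" | ".join(str(cell) for cell in row))
# First line: English | 101 | The Eiffel tower | 201 | 5138 | is located in | 102 | Paris | accepted
```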
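As one example of the SQL option, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. It stores a Naming Table and a Fact Table and answers "what is classified as a tower?" by joining on UIDs. The table and column names are assumptions chosen for readability, not the standardized Gellish column names. The language and community UIDs (910036, 193263) are copied from the Naming Table example.

```python
import sqlite3

ENGLISH, COMMUNITY = 910036, 193263  # from the Naming Table example

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE naming_table (          -- hypothetical column names
        language_uid  INTEGER NOT NULL,
        community_uid INTEGER NOT NULL,
        uid           INTEGER NOT NULL,
        inverse_ind   INTEGER NOT NULL DEFAULT 0,
        term          TEXT    NOT NULL
    );
    CREATE TABLE fact_table (
        fact_uid     INTEGER PRIMARY KEY,
        intention    TEXT    NOT NULL,
        left_uid     INTEGER NOT NULL,
        relation_uid INTEGER NOT NULL,
        right_uid    INTEGER NOT NULL,
        uom_uid      INTEGER,
        status       TEXT
    );
""")
conn.executemany(
    "INSERT INTO naming_table (language_uid, community_uid, uid, term) VALUES (?, ?, ?, ?)",
    [(ENGLISH, COMMUNITY, uid, term) for uid, term in [
        (101, "The Eiffel tower"), (102, "Paris"), (5138, "is located in"),
        (1225, "is classified as a"), (40903, "tower"), (700008, "city")]],
)
conn.executemany(
    "INSERT INTO fact_table (fact_uid, intention, left_uid, relation_uid, right_uid, status) "
    "VALUES (?, 'statement', ?, ?, ?, 'accepted')",
    [(201, 101, 5138, 102), (202, 101, 1225, 40903), (203, 102, 1225, 700008)],
)

# What is classified as a tower? Match on UIDs; take the name from the naming table.
towers = conn.execute(
    """SELECT n.term FROM fact_table AS f
       JOIN naming_table AS n ON n.uid = f.left_uid AND n.language_uid = ?
       WHERE f.relation_uid = 1225 AND f.right_uid = 40903""",
    (ENGLISH,),
).fetchall()
print(towers)  # [('The Eiffel tower',)]
```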
Gellish compared with OWL
OWL (Web Ontology Language/Ontological Web Language) and Gellish are both meant for use on the semantic web. Gellish can be used in combination with OWL, or on its own. There are many similarities between the two languages, such as the use of unique identifiers (Gellish UIDs, OWL URIs)[2] but also important differences. The main differences are as follows:
Target audience and meta level
OWL is a metalanguage, including a basic grammar, but without a dictionary. OWL is meant to be used by computer system developers and ontology developers to create ontologies. Gellish is a language that includes a grammar as well as a dictionary-taxonomy and ontology. Gellish is meant to be used by computer system developers as well as by end-users and can also be used by ontology developers when they want to extend the Gellish ontology or build their own domain ontology. Gellish does not make a distinction between a meta-language and a user language; the concepts from both 'worlds' are integrated in one language. So, the Gellish English dictionary contains concepts that are equivalent to the OWL concepts, but also contains the concepts from an ordinary English dictionary.
Vocabularies and ontologies
OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. In other words, it can be used for the definition of taxonomies or ontologies. The terms in such a vocabulary do not become part of the OWL language. So OWL does not include definitions of the terms in a natural language, such as road, car, bolt or length. However, it can be used to define them and to build an ontology.
The upper ontology part of Gellish can also be used to define terms and the relations between them. However, many of such natural language terms are already defined in the lower part of the Gellish dictionary-taxonomy itself. So in Gellish, terms such as road, car, bolt or length are part of the Gellish language. Therefore, Gellish English is a subset of natural English.
Synonyms and multi-language capabilities
Gellish makes a distinction between concepts and the various terms that are used as names (synonyms, abbreviations and translations) to refer to those concepts in different contexts and languages. Every concept is identified by a unique identifier that is natural-language-independent and can have many different terms in different languages to denote the concept. This enables automatic translation between different natural language versions of Gellish. In OWL, the various terms in different languages and the synonyms are in principle different concepts that need to be declared the same by explicit equivalence relations (unless the alternatives are expressed using alternative-label annotation properties).[3] The OWL approach is simpler, but it makes expressions ambiguous and makes data integration and automated translation significantly more complicated.
Upper ontology
OWL can be regarded as an upper ontology that consists of 54 'language constructs' (constructors or concepts).[4] The upper ontology part of Gellish currently consists of more than 1500 concepts of which about 650 are standard relation types. In addition to that the Gellish Dictionary-Taxonomy contains more than 40,000 concepts. This indicates the large semantic richness and expression capabilities of Gellish. Furthermore, Gellish contains definitions of many facts about the defined concepts that are expressed as relationships between those concepts.
Extensibility
OWL has a fixed set of concepts (terms) that are only extended when the OWL standard is extended. Gellish is extensible by any user, under open source conditions.
History
Gellish is a further development of ISO 10303-221 (AP221) and ISO 15926, integrating and extending the concepts that are defined in both standards. The main differences from both ISO standards are that Gellish is easier to implement, has more precise semantic expression capabilities and is also suitable for expressing queries and answers. The specific philosophy of spatio-temporal parts that ISO 15926 uses to represent time by discrete time periods can also be used in Gellish. However, the recommended representation of time in Gellish is the more intuitive method of specifying that facts have a specified validity duration. For example, each property can have multiple numeric values on a scale, expressed as multiple facts, and for each of those facts an (optional) specification can be added of the moment or time period during which that fact is valid.
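Here is a minimal sketch of the recommended validity-duration approach. Each value of a property is a separate fact with an optional validity period, and a lookup returns the facts valid at a given moment. The `ValueFact` type, the `value_at` function and the dates are hypothetical, and so is the second mass value of 52 kg. The choice of an inclusive start and an exclusive end is also an assumption made here so that consecutive periods do not overlap.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ValueFact:
    """One value of a property, valid during an optional period (hypothetical)."""
    aspect: str
    value: float
    unit: str
    valid_from: Optional[date] = None  # None: no start bound; otherwise inclusive
    valid_to: Optional[date] = None    # None: open-ended; otherwise exclusive

    def valid_at(self, moment: date) -> bool:
        return ((self.valid_from is None or self.valid_from <= moment)
                and (self.valid_to is None or moment < self.valid_to))


# Hypothetical history of the mass of P-123, expressed as two facts.
FACTS = [
    ValueFact("mass of P-123", 50.0, "kg", date(2020, 1, 1), date(2022, 1, 1)),
    ValueFact("mass of P-123", 52.0, "kg", date(2022, 1, 1)),
]


def value_at(aspect: str, moment: date, facts: list[ValueFact]) -> list[ValueFact]:
    """Facts about the aspect that are valid at the given moment."""
    return [f for f in facts if f.aspect == aspect and f.valid_at(moment)]


print([f.value for f in value_at("mass of P-123", date(2021, 6, 1), FACTS)])  # [50.0]
```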
A subset of the Gellish Dictionary (Taxonomy) is used to create ISO 15926-4. Gellish in RDF is being standardized as ISO 15926-11.
References
- ^ Van Renssen (2005).
- ^ RFC 3986 (2005).
- ^ Stevens & Lord (2012).
- ^ "OWL Web Ontology Language Overview". W3C. Retrieved 5 April 2019.
Bibliography
- Van Renssen, Andries (2005). Gellish: A Generic Extensible Ontological Language. Delft University Press. ISBN 90-407-2597-7.
- Henrichs, Michael Rudi (1 June 2009). A conceptual framework for constructing distributed object libraries using Gellish (Master of Science in Computer Science thesis). Netherlands: Delft University of Technology.
- Berners-Lee, Tim; Fielding, Roy; Masinter, Larry (January 2005). Uniform Resource Identifiers (URI): Generic Syntax. Internet Engineering Task Force. doi:10.17487/RFC3986. RFC 3986. Retrieved 31 August 2015.
- Stevens, Robert; Lord, Phillip (2012). "Managing synonomy in OWL". Ontogenesis. Retrieved 26 June 2019.