Data model

From Assothink Wiki

Intro

[Figure: AssothinkModel.png]

The Assothink data model is the underlying formal description of all Assothink components. It has been compared to various formal models (WordNet, Freebase, Wikipedia, Wikidata, BabelNet, ontologies), and it is claimed here that the Assothink model is better because:

  • it is built on clearly separated layers (which current models often or always mix, with disappointing effects on the algorithms that operate on them)
  • it is easily extensible
  • it is suited to advanced, performance-oriented applications, such as the Assothink engine

It is therefore strongly suggested as the target formal model for the future evolution of the initiatives mentioned above.

APIs based on this data model are also strongly suggested, for the same reasons.
An animated, mouse-reactive visual version of this page is available online here.

Five subsets

The Assothink data model includes five well-separated subsets:

  • Concepts
  • KeySets
  • Concept links (passive jelly)
  • Language anchors
  • Active jelly

Concepts

Concepts exist without names and without keySet references; this is what makes them difficult to handle at both the formal and the programming level.

Very little can be said about concepts.

The only relevant point at this stage is that they are distinct, categorized entities.

The categorization typically includes:

  • categories themselves
  • properties
  • nouns
  • verbs (action descriptors)
  • adjectives (noun qualifiers)
  • adverbs (verb qualifiers)
  • quantities
  • intrinsic concepts
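To make the "no name, no key" constraint concrete, here is a minimal Python sketch, assuming concepts are plain in-memory objects whose only state is a category. The `Category` and `Concept` names are hypothetical, not part of any published Assothink API; a concept's identity is simply object identity.

```python
from enum import Enum, auto


class Category(Enum):
    """Concept categories from the list above (member names are illustrative)."""
    CATEGORY = auto()
    PROPERTY = auto()
    NOUN = auto()
    VERB = auto()
    ADJECTIVE = auto()
    ADVERB = auto()
    QUANTITY = auto()
    INTRINSIC = auto()


class Concept:
    """An unnamed concept: it carries a category and nothing else.

    __slots__ prevents adding a name, label or key attribute later,
    so the only way to tell two concepts apart is object identity.
    """
    __slots__ = ("category",)

    def __init__(self, category: Category) -> None:
        self.category = category

    def __repr__(self) -> str:
        # Only the category and an opaque memory id are visible.
        return f"<Concept {self.category.name} at {id(self):#x}>"
```

Two noun concepts created this way are distinct even though nothing readable tells them apart; that gap is what keySets fill.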

Concepts arguably also exist in non-human brains, i.e. in brains without verbal communication capabilities.

KeySets

A keySet is any system allowing an injective connection between 'unnameable and unreadable' concepts and readable references.

The readable references might be numbers, byte sequences or character strings.

KeySet references are not tied to any language: they are language-independent.

Injective connection means an unambiguous mapping with:

  • one and only one concept per key
  • at most one key per concept (but some concepts may be unknown in a keySet)
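A minimal sketch of a keySet enforcing the two injectivity rules above, assuming concepts are any hashable Python objects and keys are strings. The `KeySet` class and its `assign`, `concept` and `key` methods are illustrative names only.

```python
from typing import Dict, Hashable, Optional


class KeySet:
    """Injective mapping between opaque concepts and readable keys.

    - each key designates exactly one concept;
    - each concept has at most one key (it may have none).
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._by_key: Dict[str, Hashable] = {}
        self._by_concept: Dict[Hashable, str] = {}

    def assign(self, concept: Hashable, key: str) -> None:
        """Register `key` for `concept`, refusing any non-injective assignment."""
        if key in self._by_key and self._by_key[key] is not concept:
            raise ValueError(f"key {key!r} already designates another concept")
        if concept in self._by_concept and self._by_concept[concept] != key:
            raise ValueError("concept already has a different key in this keySet")
        self._by_key[key] = concept
        self._by_concept[concept] = key

    def concept(self, key: str) -> Hashable:
        """Key -> concept; raises KeyError for an unknown key."""
        return self._by_key[key]

    def key(self, concept: Hashable) -> Optional[str]:
        """Concept -> key; None means the concept is unknown to this keySet."""
        return self._by_concept.get(concept)
```

Because both directions are stored, lookups are constant-time either way, and a conflicting assignment is rejected before any state changes.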

KeySets already exist and are widely used: Freebase, Wikidata, WordNet, BabelNet, etc. each use their own keySet.

The main use (and benefit) of a keySet is the representation of links between concepts.

The second use of a keySet is the concept-side anchoring of language anchors (see below).

Concept links (passive jelly)

Links are connections between concepts.

Typically, a link involves three concepts: a property concept, a source concept and a destination concept.

In the Assothink model, the links form the "passive jelly": a set of connections that evolve slowly or not at all.

The links are the critical components of an active brain.

Links have nothing to do with language.

Qualified and fuzzy links

Links may be qualified or fuzzy. Hypernymy, hyponymy, etc. are typical properties of qualified links.

However, in associative intelligence the vast majority of links are unqualified and fuzzy. A fuzzy link description also includes a scalar permeability measure. The formal model of a link therefore integrates three concepts (identified through keySet references) and one scalar value in the range [0.0, 1.0].
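The link structure described above can be sketched as follows. This assumes keySet references are plain strings and represents a qualified link by a missing permeability (`None`); that encoding, the `Link` and `PassiveJelly` names, the `"assoc"` property key and the source-indexed storage are illustrative choices, not the Assothink implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Link:
    """One passive-jelly link: three concept keys plus an optional permeability.

    Keys are keySet references (plain strings here). Assumption:
    `permeability` is None for a qualified link and a value in
    [0.0, 1.0] for a fuzzy one.
    """
    prop: str
    source: str
    destination: str
    permeability: Optional[float] = None

    def __post_init__(self) -> None:
        p = self.permeability
        if p is not None and not 0.0 <= p <= 1.0:
            raise ValueError(f"permeability {p} outside [0.0, 1.0]")

    @property
    def is_fuzzy(self) -> bool:
        return self.permeability is not None


class PassiveJelly:
    """Slowly evolving link store, indexed by source key for fast traversal."""

    def __init__(self) -> None:
        self._out: Dict[str, List[Link]] = {}

    def add(self, link: Link) -> None:
        self._out.setdefault(link.source, []).append(link)

    def outgoing(self, source: str) -> List[Link]:
        return self._out.get(source, [])
```

Indexing by source makes "which links leave this concept?" a single dictionary lookup, which suits the propagation work of the active jelly.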

Languages

Language is probably a distinctively human attribute, but all known languages are clearly poorly evolved tools:

  • poor in terms of data rate
  • poor in terms of ambiguity
  • poor in terms of descriptive power

However, language is the most widely used communication tool, and not only by the writer and the readers of this document.

In our formal model, a language is a complex set of 'language anchors'.

A language anchor is the meeting point of a concept with a human language.

A language anchor typically contains (for the concept involved):

  • definition(s)
  • reference words and aliases, forming a synset
  • usage examples
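A language anchor could be held in a small record like the one below. This is a sketch under assumptions: the `LanguageAnchor` name, its field names, the use of a language code such as `"en"`, and the convention that the first synset entry is the reference word are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LanguageAnchor:
    """Meeting point of one concept with one human language.

    The concept side is a keySet reference; the language side holds
    definitions, a synset (reference word first, then aliases) and
    usage examples.
    """
    concept_key: str   # keySet reference of the concept
    language: str      # language code, e.g. "en"
    definitions: List[str] = field(default_factory=list)
    synset: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)

    @property
    def reference_word(self) -> str:
        # Assumption: the first synset entry is the preferred word.
        if not self.synset:
            raise ValueError("anchor has no words")
        return self.synset[0]
```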

Besides these, a language involves various components that are not a critical part of this data model:

  • grammar and syntax rules
  • variant form words

The structure of language anchors is not an important component of the Assothink data model. Neither the active nor the passive jelly needs any language anchoring (remember that non-verbal intelligences perform well as associative intelligences). Language and language anchors are peripheral components.

Any design may be accepted as long as:

  • it allows incoming concept identification (from words to concepts) with minimal errors
  • it allows outgoing concept expression (from concepts to words) with minimal ambiguity
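The two acceptance criteria can be illustrated with a hypothetical per-language channel built from anchors: incoming identification returns every candidate concept for a word (several candidates signal ambiguity), and outgoing expression picks the synset word that points to the fewest concepts. The `LanguageChannel` class, the tuple form of anchors and the "fewest concepts" heuristic are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical minimal anchor: (concept key, language code, synset words).
Anchor = Tuple[str, str, List[str]]


class LanguageChannel:
    """Input/output channel for one language, built from its anchors."""

    def __init__(self, language: str, anchors: Iterable[Anchor]) -> None:
        self.language = language
        self._word_to_concepts: Dict[str, Set[str]] = defaultdict(set)
        self._concept_to_words: Dict[str, List[str]] = {}
        for concept_key, lang, synset in anchors:
            if lang != language:
                continue  # anchors of other languages are ignored
            self._concept_to_words[concept_key] = list(synset)
            for word in synset:
                self._word_to_concepts[word.lower()].add(concept_key)

    def identify(self, word: str) -> Set[str]:
        """Incoming: word -> candidate concept keys (more than one = ambiguous)."""
        return set(self._word_to_concepts.get(word.lower(), set()))

    def express(self, concept_key: str) -> str:
        """Outgoing: concept key -> least ambiguous word of its synset."""
        words = self._concept_to_words[concept_key]
        # Fewest candidate concepts wins; ties keep synset order.
        return min(words, key=lambda w: len(self._word_to_concepts[w.lower()]))
```

With illustrative English anchors where "rose" names both rose(color) and rose(plant), `identify("rose")` returns both keys, while `express` avoids "rose" whenever the synset offers a less ambiguous word.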

Active Jelly

The active jelly is a set of excitation states that evolve quickly over time.

An excitation state is attached to each individual concept (possibly several excitation states, with different time reactivities: short-term excitation, long-term excitation, and possibly a continuous spectral description).
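As a sketch of one concept's excitation state with two time reactivities, the following assumes each component decays exponentially with its own time constant and that a stimulus raises both components equally. The `Excitation` class and the constants `tau_short` and `tau_long` (in seconds) are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Excitation:
    """Excitation state of one concept with two time reactivities.

    Assumption: each component decays exponentially with its own
    time constant; the default constants are illustrative only.
    """
    short: float = 0.0
    long: float = 0.0
    tau_short: float = 0.5   # seconds, fast-reacting component
    tau_long: float = 60.0   # seconds, slow-reacting component

    def stimulate(self, amount: float) -> None:
        # A stimulus raises both components; only their decay differs.
        self.short += amount
        self.long += amount

    def decay(self, dt: float) -> None:
        # Exponential decay over a time step of dt seconds.
        self.short *= math.exp(-dt / self.tau_short)
        self.long *= math.exp(-dt / self.tau_long)
```

After one stimulus and one second of decay, the short-term component has almost vanished while the long-term one is nearly intact.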

The active jelly is directly connected to the concept layer and to the link layer.

The active jelly has nothing to do with either keySets or languages.

The active jelly is a model of what 'moves' in a human brain. It is also the basis of artificial associative intelligence systems. It is the most critical component of Assothink, which targets software engines able to generate ~10^7 excitation variations per second on standard sequential computers, using efficient software implementations of concepts and links. This kind of engine would be better implemented on parallel computers or parallel analog systems (similar to biological systems).
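The text does not specify how excitation moves through the passive jelly. The sketch below assumes a simple rule: on each step every concept keeps a fraction `decay` of its excitation and sends `excitation * permeability` along each outgoing fuzzy link. Concepts are opaque string identifiers here, and `propagate`, its `decay` parameter and its count of "variations" (one per individual value update) are hypothetical; the loop is sequential, as on the standard computers mentioned above.

```python
from typing import Dict, List, Tuple

# Hypothetical fuzzy links: source -> list of (destination, permeability).
Links = Dict[str, List[Tuple[str, float]]]


def propagate(excitation: Dict[str, float], links: Links,
              decay: float = 0.5) -> Tuple[Dict[str, float], int]:
    """One sequential update step of the active jelly (assumed rule).

    Returns the new excitation state and the number of individual
    excitation updates performed during the step.
    """
    # Every currently known concept keeps part of its excitation.
    new: Dict[str, float] = {c: decay * x for c, x in excitation.items()}
    variations = len(new)
    for source, x in excitation.items():
        if x == 0.0:
            continue  # nothing to transmit
        for destination, permeability in links.get(source, []):
            new[destination] = new.get(destination, 0.0) + x * permeability
            variations += 1
    return new, variations
```

Counting updates per step gives a rough way to compare an implementation against a throughput target such as the one stated above.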

Additional notes on data model

Model Complexity

The data model introduced here is rather simple.

In practice, every part of the model is somewhat more complex.

However, the overall split of the model into five parts always remains relevant.

Synonymy & Co

The formal model described here allows links only between concepts.

Links between words are rejected, but is this acceptable?

The human brain widely uses word links, and the most trivial example is synonymy.

The word "rose" mainly refers to two concepts: rose(color) and rose(plant). Should the links record the fact that two concepts are accessible through the same word? In other words, should a property concept called synonymy be included in the concept universe, and should the links include something like {rose(color) - synonymy - rose(plant)}?

This question is relevant not only for synonymy but also for phonetic equivalence, phonetic convergence, written proximity, etc.

The answer is no, because this would open the way to a never-ending list of language-based connections between concepts. It is never-ending because the number of languages likely to contribute to it is unlimited.

Language-level connections have to be handled in the input/output language channels and kept out of the concept and link layers.
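As an illustration of handling such relations in a language channel, the hypothetical function below derives, on demand and for one language, the pairs of concepts that share a word. Nothing is stored in the concept or link layers, and the result depends on the language chosen. The anchor tuple format, the `shared_word_pairs` name and the example data are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical anchors: (concept key, language code, synset words).
Anchor = Tuple[str, str, List[str]]


def shared_word_pairs(anchors: Iterable[Anchor],
                      language: str) -> Set[Tuple[str, str]]:
    """Concept pairs reachable through the same word in `language`.

    Computed on demand in the language channel; nothing is written
    into the concept or link layers.
    """
    by_word: Dict[str, Set[str]] = defaultdict(set)
    for concept_key, lang, synset in anchors:
        if lang == language:
            for word in synset:
                by_word[word.lower()].add(concept_key)
    pairs: Set[Tuple[str, str]] = set()
    for concepts in by_word.values():
        # Sorting makes each unordered pair appear in one canonical order.
        pairs.update(combinations(sorted(concepts), 2))
    return pairs
```

With illustrative anchors in which English uses "rose" for both concepts but French uses "rose" and "rosier", the English channel reports the pair and the French one does not, so a stored "synonymy" link would be valid in one language only.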

This may seem strange to many people (knowledge experts, linguists, psychiatrists, ...), but it is a must for building a concept-oriented, language-independent model, and a must for Assothink.