En3ple

From Assothink Wiki

Revision as of 21 May 2013, 10:05

Introduction

The en3ple format is one possible way to formalize the full knowledge held in the passive jelly of Assothink.

Following several experiments with data from DBpedia and Freebase, the Assothink development team is considering this format (May 2013).

The new class would be mscp.structure.en3.

Benefits

The potential benefits center on:

  • readability
  • standards
  • extensibility
  • integration of external resources

Limitations

The en3ple format is not optimal in terms of:

  • memory consumption
  • startup time
  • runtime efficiency

Definitions: en3, en3ple, en3file, en3format

An en3 object may be almost any Assothink entity:

  • concepts of any category
  • percepts
  • variants
  • keys
  • languages
  • keysets
  • link predicates
  • definitions
  • textual examples

...

An en3ple is a triple (subject/predicate/object) that links three en3. En3ples may be used to represent:

  • all language data
  • all data imported from any LACS
  • qualified links of the passive jelly
  • fuzzy links of the passive jelly

Any given en3 is defined only by:

  • the set of en3ples in which it is involved
  • its key
  • an optional content

An en3File is a file containing en3ples, one per line. The file is human-readable and uses the en3Format defined below.

Usage

The resource building process may be organized in two steps.

Step 1: accumulation/production of en3Files (and nothing else).

Step 2: transformation of the generated en3Files into pk files, according to the existing data management rules within Assothink.

En3Format specification summary

The lines of an en3File are ordered.

An en3 key is a string containing only the following characters: alphabetic (including accented characters), numeric, and "#:_".

The '#' in a key is reserved for random-unique generated keys.
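
The two key rules above can be checked mechanically. The following is a minimal sketch, not Assothink code: the class name En3Keys and both method names are hypothetical, "alphabetic" is read as any Unicode letter (so precomposed accented characters pass), and "numeric" is read as the ASCII digits 0-9.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the key rules; not part of Assothink.
final class En3Keys {
    // Assumption: letters are any Unicode letter, digits are 0-9, plus '#', ':' and '_'.
    private static final Pattern ALLOWED = Pattern.compile("[\\p{L}0-9#:_]+");

    private En3Keys() {}

    /** True if the string is non-empty and uses only characters permitted in an en3 key. */
    static boolean isValidKey(String key) {
        return key != null && ALLOWED.matcher(key).matches();
    }

    /** True if the key is valid and avoids '#', which is reserved for generated keys. */
    static boolean isUserKey(String key) {
        return isValidKey(key) && key.indexOf('#') < 0;
    }
}
```

Separating isValidKey from isUserKey lets a loader accept generated keys such as #12 while still rejecting a hand-written key that contains '#'.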

The line format is subject|predicate|object.

The field separator is '|'.

The three en3 specifiers (subject, predicate and object) are non-empty strings, interpreted as follows:

  • =! : a new en3 with a random-unique key (the key format is #nnn)
  • == : the same en3 as in the previous en3ple line (for any of the 3 fields)
  • =?=p=o= , or =s=?=o= , or =s=p=?= : the one and only en3 matching the named rule in previously defined en3ples. If there is no match, or more than one, an exception is thrown.
  • =!content : a new en3 with a random-unique key and with a content equal to the content part of the specifier. Lengthy content is welcome.
  • anything not starting with '=' : an en3 (created if not yet defined) with the specifier used as its key.
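
The specifier rules above can be sketched as a small line resolver. This is an illustration under stated assumptions, not the Assothink implementation: the class En3LineReader and its members are hypothetical, each en3 is represented only by its key, generated keys come from a simple counter rather than a random source, content is assumed not to contain '|', and the s, p and o parts of a lookup specifier are assumed to be keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Assothink: resolves en3File lines to triples of keys.
final class En3LineReader {
    final List<String[]> triples = new ArrayList<>();       // resolved en3ples, in file order
    final Map<String, String> contents = new HashMap<>();   // generated key -> content
    private int nextId = 1;                                  // assumption: a counter stands in for "random-unique"

    /** Resolves one "subject|predicate|object" line and records the resulting en3ple. */
    String[] readLine(String line) {
        String[] specs = line.split("\\|", -1);
        if (specs.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        String[] keys = new String[3];
        for (int i = 0; i < 3; i++) {
            keys[i] = resolve(specs[i], i);
        }
        triples.add(keys);
        return keys;
    }

    private String resolve(String spec, int position) {
        if (spec.isEmpty()) throw new IllegalArgumentException("empty specifier");
        if (!spec.startsWith("=")) return spec;              // plain key
        if (spec.equals("=!")) return "#" + nextId++;        // new en3 without content
        if (spec.equals("==")) {                             // same field as on the previous line
            if (triples.isEmpty()) throw new IllegalStateException("'==' on the first line");
            return triples.get(triples.size() - 1)[position];
        }
        if (spec.startsWith("=!")) {                         // new en3 carrying content
            String key = "#" + nextId++;
            contents.put(key, spec.substring(2));
            return key;
        }
        return lookup(spec);
    }

    /** Resolves "=?=p=o=", "=s=?=o=" or "=s=p=?=" to its single match, or throws. */
    private String lookup(String spec) {
        String[] parts = spec.split("=", -1);                // "=s=p=?=" -> ["", "s", "p", "?", ""]
        if (parts.length != 5 || !parts[0].isEmpty() || !parts[4].isEmpty()) {
            throw new IllegalArgumentException("malformed specifier: " + spec);
        }
        int hole = -1;
        for (int i = 0; i < 3; i++) {
            if (parts[i + 1].equals("?")) {
                if (hole >= 0) throw new IllegalArgumentException("more than one '?': " + spec);
                hole = i;
            }
        }
        if (hole < 0) throw new IllegalArgumentException("no '?' in: " + spec);
        String found = null;
        for (String[] t : triples) {
            boolean matches = true;
            for (int i = 0; i < 3; i++) {
                if (i != hole && !t[i].equals(parts[i + 1])) matches = false;
            }
            if (!matches) continue;
            if (found != null && !found.equals(t[hole])) {
                throw new IllegalStateException("multiple matches for " + spec);
            }
            found = t[hole];
        }
        if (found == null) throw new IllegalStateException("no match for " + spec);
        return found;
    }
}
```

Because the current line is stored only after all three fields are resolved, both == and the lookup forms see only previously defined en3ples, which matches the ordering requirement of the format. Repeated en3ples that yield the same en3 are treated here as a single match; that choice is an assumption.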

Data model

The data model used in the mscp.structure.entry class includes:

  • a global HashMap<String,en3> linking keys to en3 objects
  • a key (one per en3)
  • a content (for some en3 only)
  • many HashMap<en3,HashSet<en3>> (probably 6 per en3, but none for en3 that have content).
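
One way to picture this data model is sketched below. It is a guess at the layout, not the actual Assothink class: the names En3, En3Store and all field names are hypothetical, and the reading of "6 per en3" as one index per role (subject, predicate, object) and per remaining field is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an en3 object: a key, an optional content, and six indexes.
final class En3 {
    final String key;
    final String content;   // null when the en3 has no content
    // Assumption: e.g. asSubjectByPredicate maps a predicate to the objects of all
    // en3ples in which this en3 is the subject. All six are null when content is set.
    final Map<En3, Set<En3>> asSubjectByPredicate, asSubjectByObject;
    final Map<En3, Set<En3>> asPredicateBySubject, asPredicateByObject;
    final Map<En3, Set<En3>> asObjectBySubject, asObjectByPredicate;

    En3(String key, String content) {
        this.key = key;
        this.content = content;
        boolean indexed = (content == null);
        asSubjectByPredicate = newIndex(indexed);
        asSubjectByObject = newIndex(indexed);
        asPredicateBySubject = newIndex(indexed);
        asPredicateByObject = newIndex(indexed);
        asObjectBySubject = newIndex(indexed);
        asObjectByPredicate = newIndex(indexed);
    }

    private static Map<En3, Set<En3>> newIndex(boolean wanted) {
        return wanted ? new HashMap<En3, Set<En3>>() : null;
    }
}

// Hypothetical store holding the global key -> en3 map.
final class En3Store {
    final Map<String, En3> byKey = new HashMap<>();

    /** Returns the en3 registered under this key, creating it without content if needed. */
    En3 getOrCreate(String key) {
        return byKey.computeIfAbsent(key, k -> new En3(k, null));
    }

    /** Registers a new en3 that carries content and therefore has no indexes. */
    En3 createWithContent(String key, String content) {
        En3 e = new En3(key, content);
        byKey.put(key, e);
        return e;
    }

    /** Records one en3ple in the indexes of each participating en3. */
    void link(En3 s, En3 p, En3 o) {
        put(s.asSubjectByPredicate, p, o);
        put(s.asSubjectByObject, o, p);
        put(p.asPredicateBySubject, s, o);
        put(p.asPredicateByObject, o, s);
        put(o.asObjectBySubject, s, p);
        put(o.asObjectByPredicate, p, s);
    }

    private static void put(Map<En3, Set<En3>> index, En3 k, En3 v) {
        if (index != null) index.computeIfAbsent(k, x -> new HashSet<En3>()).add(v);
    }
}
```

Holding six maps per en3 makes every lookup direction a direct hash access, at a memory cost consistent with the limitations listed earlier; skipping the maps for content-bearing en3 limits that cost for the objects that are typically leaf values.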