"En3ple": difference between revisions

From Assothink Wiki

Revision as of 21 May 2013, 10:04

Introduction

The en3uple format is a possible way to formalize the full knowledge present in the passive jelly of Assothink.

After several projects based on data produced by DBpedia and Freebase, the Assothink development team is considering this format (May 2013).

The new class would be mscp.structure.en3

Benefits

The possible benefits center on:

  • readability
  • standards
  • extensibility
  • integrability of external resources

Limitations

The en3uple format is not optimal in terms of:

  • memory consumption
  • startup time
  • runtime efficiency

Definitions: en3, en3ple, en3file, en3format

An en3 object may be almost any Assothink entity:

  • concepts of any category
  • percepts
  • variants
  • keys
  • languages
  • keysets
  • link predicates
  • definitions
  • textual examples

...

An en3ple is a triple (subject/predicate/object) definition linking three en3. En3ples may be used to represent:

  • all language data
  • all data imported from any LACS
  • qualified links of the passive jelly
  • fuzzy links of the passive jelly

Any given en3 is defined only by:

  • the set of en3ples in which it is involved
  • its key
  • an optional content
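The two definitions above can be sketched as plain Java types. This is only an illustration: the class names En3Sketch and En3pleSketch are hypothetical (the page only names mscp.structure.en3), and treating two en3 as equal when their keys match is an assumption, not something the page states.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical sketch of an en3: a key plus an optional content. */
final class En3Sketch {
    private final String key;
    private final String content; // null when the en3 has no content

    En3Sketch(String key, String content) {
        this.key = Objects.requireNonNull(key, "key");
        this.content = content;
    }

    String key() { return key; }

    Optional<String> content() { return Optional.ofNullable(content); }

    // Assumption: two en3 denote the same entity when their keys are equal.
    @Override public boolean equals(Object other) {
        return other instanceof En3Sketch && ((En3Sketch) other).key.equals(key);
    }

    @Override public int hashCode() { return key.hashCode(); }

    @Override public String toString() { return key; }
}

/** Hypothetical sketch of an en3ple: a subject, a predicate and an object, all en3. */
final class En3pleSketch {
    final En3Sketch subject;
    final En3Sketch predicate;
    final En3Sketch object;

    En3pleSketch(En3Sketch subject, En3Sketch predicate, En3Sketch object) {
        this.subject = Objects.requireNonNull(subject, "subject");
        this.predicate = Objects.requireNonNull(predicate, "predicate");
        this.object = Objects.requireNonNull(object, "object");
    }

    /** Renders the en3ple with '|' between the three keys. */
    String toLine() {
        return subject.key() + "|" + predicate.key() + "|" + object.key();
    }
}
```

The sets of en3ples in which an en3 is involved are deliberately left out of En3Sketch here; they belong to the index structures rather than to the value itself.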

An en3File is a file containing en3ples (1 per line). The file is human-readable, and the format used is the en3Format defined below.

Usage

The resource building process may be organized in 2 steps.

Step 1: accumulation/production of en3Files (and nothing else)

Step 2: transformation of the generated en3Files into pk files according to the previous data management rules within Assothink.

En3Format

The lines of an en3File are ordered.

An en3 key is a string containing only the following characters: alphabetic (including accented characters), numeric, and "#:_".

The '#' in a key is reserved for random-unique generated keys.
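The key rules above could be checked with a regular expression. The sketch below is one possible reading: the class En3KeyCheck and its method names are hypothetical, "alphabetic" is interpreted as any Unicode letter, "numeric" as any decimal digit, and a generated key is assumed to be '#' followed by digits.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for en3 keys. */
final class En3KeyCheck {
    // Letters of any script (so accented letters pass), decimal digits, and '#', ':', '_'.
    private static final Pattern ALLOWED = Pattern.compile("[\\p{L}\\p{Nd}#:_]+");

    // Assumed shape of a random-unique generated key: '#' followed by digits.
    private static final Pattern GENERATED = Pattern.compile("#\\d+");

    /** True when the key is non-empty and uses only the permitted characters. */
    static boolean isValidKey(String key) {
        return key != null && ALLOWED.matcher(key).matches();
    }

    /** True when the key looks like a generated key. */
    static boolean isGeneratedKey(String key) {
        return key != null && GENERATED.matcher(key).matches();
    }

    /** True for a valid key that stays clear of the reserved '#'. */
    static boolean isValidUserKey(String key) {
        return isValidKey(key) && key.indexOf('#') < 0;
    }
}
```

Note that an accented letter written as a base letter plus a combining mark would be rejected by this pattern; keys would need to be normalized to composed form first.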

The line format is subject|predicate|object.

The field separator is '|'.

The three en3 specifiers (subject, predicate, and object) are non-empty strings with the following interpretation:

  • =! : a new en3 with a random-unique key (the key format is #nnn)
  • == : the same en3 as in the previous en3ple line (for any of the 3 fields)
  • =?=p=o= , =s=?=o= , or =s=p=?= : the one and only en3 matching the named rule in previously defined en3ples. If there is no match, or more than one, an exception is thrown.
  • =!content : a new en3 with a random-unique key and with a content equal to the content part of the specifier. Lengthy content is welcome.
  • anything not starting with '=' : an en3 (created if not yet defined) with the specifier used as its key.
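A minimal sketch of how a reader might resolve these specifiers line by line follows. It is not the Assothink implementation: the class En3FileReaderSketch is hypothetical, en3 are represented by their keys only, generated keys are assumed to come from a simple counter, and content is assumed never to contain the '|' separator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical reader that resolves en3 specifiers; each en3 is represented by its key. */
final class En3FileReaderSketch {
    final Map<String, String> contents = new HashMap<>(); // key -> content, for en3 with content
    final Set<String> keys = new LinkedHashSet<>();        // every en3 key created so far
    final List<String[]> en3ples = new ArrayList<>();      // resolved {subject, predicate, object}
    private String[] previous;                             // last resolved en3ple, used by "=="
    private int counter;                                   // assumed source of "#nnn" keys

    /** Reads one subject|predicate|object line and stores the resolved en3ple. */
    String[] readLine(String line) {
        String[] specs = line.split("\\|", -1); // -1 keeps empty trailing fields visible
        if (specs.length != 3) throw new IllegalArgumentException("expected 3 fields: " + line);
        String[] resolved = new String[3];
        for (int i = 0; i < 3; i++) resolved[i] = resolve(specs[i], i);
        en3ples.add(resolved);
        previous = resolved;
        return resolved;
    }

    private String resolve(String spec, int field) {
        if (spec.isEmpty()) throw new IllegalArgumentException("empty specifier");
        if (spec.equals("==")) {                 // same en3 as in the previous line
            if (previous == null) throw new IllegalStateException("no previous en3ple");
            return previous[field];
        }
        if (spec.startsWith("=!")) {             // new en3, optionally with content
            String key = "#" + (++counter);
            keys.add(key);
            if (spec.length() > 2) contents.put(key, spec.substring(2));
            return key;
        }
        if (spec.startsWith("=")) return lookup(spec);
        keys.add(spec);                          // plain key: created on first use
        return spec;
    }

    /** Resolves =?=p=o=, =s=?=o= or =s=p=?= to the single matching en3. */
    private String lookup(String spec) {
        if (spec.length() < 2 || !spec.endsWith("=")) {
            throw new IllegalArgumentException("bad query: " + spec);
        }
        String[] query = spec.substring(1, spec.length() - 1).split("=", -1);
        if (query.length != 3) throw new IllegalArgumentException("bad query: " + spec);
        int hole = -1;
        for (int i = 0; i < 3; i++) {
            if (query[i].equals("?")) {
                if (hole >= 0) throw new IllegalArgumentException("bad query: " + spec);
                hole = i;
            }
        }
        if (hole < 0) throw new IllegalArgumentException("bad query: " + spec);
        Set<String> matches = new LinkedHashSet<>();
        for (String[] triple : en3ples) {
            boolean ok = true;
            for (int i = 0; i < 3; i++) {
                if (i != hole && !triple[i].equals(query[i])) ok = false;
            }
            if (ok) matches.add(triple[hole]);
        }
        if (matches.size() != 1) {
            throw new IllegalStateException(matches.size() + " matches for " + spec);
        }
        return matches.iterator().next();
    }
}
```

With this sketch, reading the lines cat|is_a|animal and then ==|has_definition|=!a small domesticated feline attaches a new content-bearing en3 to cat, and a later =?=is_a=animal= specifier resolves back to cat as long as cat is the only subject of that rule.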

Data model

The data model used in the mscp.structure.entry class includes

  • a global HashMap<String,en3> linking keys to en3 objects
  • key (1 per en3)
  • contents (for some en3)
  • many HashMap<en3,HashSet<en3>> (probably 6 per en3, but none for an en3 that has content).
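The data model listed above might look roughly like the following. This is a sketch under stated assumptions: En3StoreSketch and Node are hypothetical names, and the reading of "6 per en3" as two lookup maps for each of the three roles (subject, predicate, object) is a guess rather than something the page specifies.

```java
import java.util.HashMap;
import java.util.HashSet;

/** Hypothetical in-memory store mirroring the data model listed above. */
final class En3StoreSketch {
    static final class Node {
        final String key;
        final String content; // null for an en3 without content
        // Assumed meaning of "6 per en3": two maps per role the en3 can play.
        // All six stay null when the en3 carries content.
        HashMap<Node, HashSet<Node>> objectsByPredicate, predicatesByObject;   // as subject
        HashMap<Node, HashSet<Node>> objectsBySubject, subjectsByObject;       // as predicate
        HashMap<Node, HashSet<Node>> predicatesBySubject, subjectsByPredicate; // as object

        Node(String key, String content) {
            this.key = key;
            this.content = content;
            if (content == null) {
                objectsByPredicate = new HashMap<>();
                predicatesByObject = new HashMap<>();
                objectsBySubject = new HashMap<>();
                subjectsByObject = new HashMap<>();
                predicatesBySubject = new HashMap<>();
                subjectsByPredicate = new HashMap<>();
            }
        }
    }

    /** Global map linking keys to en3 objects. */
    final HashMap<String, Node> byKey = new HashMap<>();

    /** Returns the en3 for this key, creating a content-free one if needed. */
    Node get(String key) {
        return byKey.computeIfAbsent(key, k -> new Node(k, null));
    }

    /** Registers an en3 that carries content (and therefore no link maps). */
    Node putContent(String key, String content) {
        Node node = new Node(key, content);
        byKey.put(key, node);
        return node;
    }

    /** Records one en3ple in the maps of each participant that has maps. */
    void add(Node s, Node p, Node o) {
        link(s.objectsByPredicate, p, o);
        link(s.predicatesByObject, o, p);
        link(p.objectsBySubject, s, o);
        link(p.subjectsByObject, o, s);
        link(o.predicatesBySubject, s, p);
        link(o.subjectsByPredicate, p, s);
    }

    private static void link(HashMap<Node, HashSet<Node>> map, Node k, Node v) {
        if (map != null) map.computeIfAbsent(k, x -> new HashSet<>()).add(v);
    }
}
```

Keeping six maps per en3 makes every lookup direction a direct hash access, which matches the memory and startup costs listed under Limitations.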