LACS building: difference between revisions
Latest revision as of 12 May 2013, 07:59
Introduction
Context
This page is related to the construction of the Assothink passive jelly. It does not deal with the active jelly.
This page describes a theoretical approach, not the practical software processes used to build the Assothink concept universe.
This page is about:
- LACS (language-anchored concept space)
- LAC (language-anchored concept)
- Assothink concept universe building
Concepts and percepts
The passive jelly includes concepts, percepts and variants. Variants will not be discussed here.
As explained elsewhere, concepts
- are intrinsically unnamed
- are universal (most of them)
- are primary objects (in the Assothink model)
- exist mainly through the links they have with other concepts
On the other hand, percepts
- are language words
- are used to represent concepts
- are not universal (because they are language-anchored)
- are secondary (coming as perceptions of concepts)
A human being uses his brain to
- manipulate concepts (link, excite, focus on...)
- communicate through percepts (speak, read, write...)
Concept categories
Concepts are organized in categories.
There are many ways to structure concepts into categories.
The Assothink model handles 8 main categories.
On this page, however, only the 4 most common categories are considered:
- nouns (N) (things)
- verbs (V) (actions)
- adjectives (P) (qualifiers for things)
- adverbs (D) (qualifiers for actions)
LACS
LACS definition
A LACS is a Language-Anchored Concept Space.
A LACS is a (big) set of LAC.
It is a set of connected words and concepts.
Examples of LACS (reviewed in more detail below) include:
- Wordnet
- Wikipedia (and its sibling DBpedia)
- Wiktionary
- Freebase
And of course Assothink uses its own LACS, the Assothink LACS.
MoLACS and MuLACS
A MoLACS is a Mono-Language-ACS.
A MuLACS is a Multi-Language-ACS.
LACS compared
LACS may be described using various criteria:
- categories handled
- MoLACS or MuLACS
- concept-centric or percept-centric
- size
The following table summarizes the properties of the best-known LACS, compared to the Assothink LACS.
LACS | Categories | Mu or Mo | Size | Centric | Remarks
Wordnet | NPVD | MoLACS (English) | average | Concept-centric | The brilliant precursor; concepts = synsets; weak multi-language attempts; weak extensibility; see wordnet.princeton.edu
Wikipedia | N... | MuLACS | big | Word-centric (mostly) | Brilliant and rich; weak organization; good growth process; see www.wikipedia.org and www.dbpedia.org
Wiktionary | NPVD | MuLACS | big | Word-centric | Brilliant and rich; weak organization; good growth process; see www.wiktionary.org
Freebase | N(pvd)xxx | MuLACS | huge | Hybrid, heterogeneous | Anarchic linking; remarkably exhaustive; nice extensibility; uncontrolled growth; see www.freebase.com
Assothink | NPVD | MuLACS | small | Concept-centric | The best is coming! Only 10K to 30K concepts; see www.assothink.com
The Assothink LACS is certainly not the biggest, but it is the most demanding in terms of coherence and strength. This is necessary given the global goals of the Assothink project.
The Assothink LACS would be the first concept-centric full-NPVD MuLACS, which would make it a pioneer.
LAC
LAC definition
The LAC is a Language-Anchored Concept.
A LACS is a (big) set of LAC.
LAC importance
LAC are organized differently in each LACS.
But the matching (convergence) of a LAC in LACS A with another LAC in LACS B is a critical process.
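This matching process is performed easily and frequently by human beings. It may be a typical and major feat of the human brain. In fact, something similar happens whenever a word is perceived by a human. Consider the French sentence "La tour prend le fou" ("the rook takes the bishop"): "tour" can also mean "tower" and "fou" can also mean "madman", yet a reader matches each word to the chess concept without effort.
This matching process is also the central part of building the Assothink LACS.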
LAC content
A LAC unit is defined by
- the LACS it is part of
- a LAC identifier (an abstract, non-interpretable key)
- a set of LA (Language Anchor), one per language (1 in a MoLACS, many in a MuLACS)
Each LA contains whatever may link a concept to a given language:
- a 'main' word
- synonym words
- definition(s)
- example(s)
- optional hyper concept(s) (in concept-centric LACS)
- optional hyper word(s) (in percept-centric LACS)
- optional anto concept(s) (in concept-centric LACS)
- optional anto word(s) (in percept-centric LACS)
- optional context concept(s) (in concept-centric LACS)
- optional context word(s) (in percept-centric LACS)
- etc.
LA contents differ widely across the known LACS.
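The LAC and LA descriptions above can be sketched as a pair of small Python data classes. This is only an illustration under stated assumptions: the class names, field names, language codes and the sample identifier are hypothetical and are not taken from the Assothink software.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageAnchor:
    """One LA: whatever links a concept to a single language."""
    language: str                      # e.g. "en", "fr" (codes are an assumption)
    main_word: str
    synonyms: List[str] = field(default_factory=list)
    definitions: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    # Optional links: LAC identifiers in a concept-centric LACS,
    # plain words in a percept-centric LACS.
    hyper: List[str] = field(default_factory=list)
    anto: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)


@dataclass
class LAC:
    """A language-anchored concept inside one LACS."""
    lacs: str                          # name of the LACS this LAC is part of
    lac_id: str                        # abstract, non-interpretable key
    anchors: Dict[str, LanguageAnchor] = field(default_factory=dict)

    def add_anchor(self, anchor: LanguageAnchor) -> None:
        # At most one LA per language.
        if anchor.language in self.anchors:
            raise ValueError("language already anchored: " + anchor.language)
        self.anchors[anchor.language] = anchor

    def is_multilingual(self) -> bool:
        # 1 anchor in a MoLACS, many in a MuLACS.
        return len(self.anchors) > 1


# Hypothetical sample: the chess-piece concept anchored in two languages.
rook = LAC(lacs="demo", lac_id="c-000123")
rook.add_anchor(LanguageAnchor(language="fr", main_word="tour"))
rook.add_anchor(LanguageAnchor(language="en", main_word="rook", synonyms=["castle"]))
```

Keeping the identifier separate from every word reflects the idea that concepts are intrinsically unnamed: the words live only inside the per-language anchors.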
Building the Assothink LACS
The Assothink LACS is not built per se.
The main building tasks are integration and matching.
Integration
Integration is the process by which a given LACS interfaces with, analyses, decodes, classifies, selects and filters data from other LACS. This process is heavy in terms of data volume (the full Wikipedia dumps are huge, and Freebase is bigger still).
A valuable example of integration is the integration of Wordnet by Freebase. It is not perfect, but it provides an excellent starting point for nouns (though not for any other category of concepts).
Matching
Matching is the process by which a given LACS matches concepts present in other LACS, in order to create a richer or more homogeneous set of concepts.
The Wordnet matching in Freebase aims at exhaustiveness.
The global matching in Assothink aims at homogeneity.
Fuzzy Logic
A matching result is necessarily binary: 2 LAC from 2 distinct LACS are declared to describe the same concept, or not.
But the results obtained in a matching process are far from perfect because
- the source LACS use broadly different approaches and different concept granularities. This implies that 1 LAC in LACS A may match (cover, include) many LAC in LACS B.
- the matching process produces errors
Thus the binary results should be obtained not through binary logic but through fuzzy logic. This implies likelihood measures, scoring systems, acceptance thresholds, etc.
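As a minimal sketch of what "scoring plus acceptance threshold" could look like, the following Python snippet scores candidate LAC pairs and turns the score into a binary decision. Everything here is an assumption made for illustration: the Jaccard overlap measure, the 0.7 word weight, the 0.3 threshold, the dictionary layout and the sample identifiers are hypothetical and do not describe the actual Assothink matching process.

```python
def jaccard(a, b):
    """Overlap of two collections: |a & b| / |a | b|, 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tokens(text):
    """Lower-cased whitespace tokens of a definition."""
    return set(text.lower().split())


def match_score(lac_a, lac_b, word_weight=0.7):
    """Likelihood-style score in [0, 1] mixing word and definition overlap."""
    word_sim = jaccard(lac_a["words"], lac_b["words"])
    def_sim = jaccard(tokens(lac_a["definition"]), tokens(lac_b["definition"]))
    return word_weight * word_sim + (1.0 - word_weight) * def_sim


def accepted_matches(lac, candidates, threshold=0.5):
    """Binary outcome from a fuzzy score: keep candidates at or above the threshold.

    Several candidates may pass, which covers the case where 1 LAC in
    LACS A matches many LAC in LACS B.
    """
    scored = [(match_score(lac, c), c["id"]) for c in candidates]
    return sorted([(s, i) for s, i in scored if s >= threshold], reverse=True)


# Hypothetical LAC from LACS A and two candidates from LACS B.
a = {"id": "A:1", "words": {"rook", "castle"},
     "definition": "chess piece moving along ranks and files"}
b1 = {"id": "B:7", "words": {"rook"},
      "definition": "chess piece shaped like a tower"}
b2 = {"id": "B:9", "words": {"rook", "crow"},
      "definition": "black bird of the crow family"}

matches = accepted_matches(a, [b1, b2], threshold=0.3)
```

With these sample values the chess candidate passes the threshold and the bird candidate does not. Raising the threshold trades missed matches for fewer false ones, which is the kind of tuning the paragraph above alludes to.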
Constraints for the Assothink LACS
The Assothink LACS aims to be a square cross-referencing universe.
It is also a concept-centric full-NPVD MuLACS.
This implies:
- NPVD category coverage
- concept-centric hyper, anto, context... links
- bijective cross-references with other LACS (wherever a counterpart exists in those LACS)
- multi-language anchors for all concepts
The Assothink LACS integrates selected parts of all LACS listed above.
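The bijectivity constraint listed above can be checked mechanically. The sketch below assumes cross-references are stored as (Assothink identifier, external identifier) pairs for one external LACS; this storage layout and the sample identifiers are hypothetical.

```python
from collections import Counter


def non_bijective_links(pairs):
    """Return the cross-reference pairs that break one-to-one matching.

    A pair is reported when its left identifier or its right identifier
    appears in more than one pair.
    """
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return [(a, b) for a, b in pairs if left[a] > 1 or right[b] > 1]


# Hypothetical cross-references between Assothink LAC and Wordnet synsets.
links = [("AT:1", "WN:10"), ("AT:2", "WN:11"), ("AT:3", "WN:11")]
violations = non_bijective_links(links)
```

Here two Assothink concepts point at the same external concept, so both pairs are reported; resolving them means either merging the two concepts or dropping one link.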
Practical building of the Assothink LACS
In practice, the integration process used to build the noun Assothink LACS relies mainly on Wordnet, Wikipedia and Freebase (and language thesauri). Freebase delivers good matching data.
The integration process used to build the verb/adverb/adjective Assothink LACS, by contrast, relies mainly on Wordnet and Wiktionary.
The process details and the software description are not covered on this page. It is a wide, complex and evolving subject.