Target architecture


Summary 

The target architecture is one of the possible Assothink architectures. Like the others, it mainly aims at implementing the excitation propagation model (EPM).

The hardware suited to adequately run Assothink is very simple but massively parallel, with millions of basic, highly interconnected active components.

This kind of hardware architecture is not currently available.

The IPSE board architecture is suited to this need.

It is described below and may be considered as a basic set of board specifications. It is also a set of specifications for possible software emulators.

This might be suggested to some company producing wide-scale IC (integrated circuit) products. An expensive project! This would be the most complete and best-performing version of Assothink.

Context and purpose

This is part of the Assothink project.

It is strongly linked to concepts of emergence in natural and artificial intelligence.

The target architecture is designed to perform tasks that conventional CPU and computers do not handle efficiently.

The 'computing' model used here is quite different from that of classical computing systems.

The target architecture aims at associative computing.

Through associative computing, this kind of structure is likely to achieve or mimic tasks performed by biological brains.

Integration

The target architecture describes a board to be added to conventional computers, working with them as an "integrated programmable synaptic engine" (IPSE).

It could also be the heart of a standalone new kind of computer (with specific devices, operating system,...), but this ambitious option is not considered here. 

Components

The IPSE board mainly contains two components (a possible emulation layout is sketched after the lists below):

  • a synaptic shared memory (SM)  
  • a set of synaptic micro-engines (working in parallel)

Each of the synaptic micro-engines includes:

  • a synaptic micro-engine processor
  • a local memory (LM) 
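
For emulation purposes, these components can be mapped onto a simple data structure. The C sketch below shows one possible layout, using the sizes Nsm, Ne and Nlm defined further down; all names are illustrative and not part of the board specification.

#include <stdint.h>

/* One possible emulation layout of the IPSE board (illustrative names). */
typedef struct {
    int32_t  *sm;    /* synaptic shared memory: Nsm 32-bit registers                  */
    int32_t **lm;    /* one local memory per micro-engine: Ne arrays of Nlm registers */
    uint32_t  nsm;   /* Nsm: number of SM registers                                   */
    uint32_t  ne;    /* Ne : number of synaptic micro-engines                         */
    uint32_t  nlm;   /* Nlm: number of registers per local memory                     */
} IpseBoard;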

Interaction with hosting computer

The hosting computer controls the IPSE through:

  • a set of C functions
  • a Java package offering the same functions

The needed (low-level) functions are (a possible C binding is sketched after this list):

  • read from shared memory addresses
  • write into shared memory addresses
  • read from local memory of micro-engines
  • write into local memory of micro-engines
  • run 1 processing cycle of all micro-engines
  • start continuous cycling of all micro-engines
  • stop continuous cycling of all micro-engines
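
As a concrete illustration, a possible C binding for these low-level functions could look as follows; the names and signatures are assumptions made for this sketch, not a fixed interface.

#include <stdint.h>

/* Hypothetical host-side control interface (illustrative names and signatures). */
int32_t ipse_sm_read(uint32_t sm_addr);                                    /* read one SM register       */
void    ipse_sm_write(uint32_t sm_addr, int32_t value);                    /* write one SM register      */
int32_t ipse_lm_read(uint32_t engine, uint32_t lm_addr);                   /* read one LM register       */
void    ipse_lm_write(uint32_t engine, uint32_t lm_addr, int32_t value);   /* write one LM register      */
void    ipse_run_cycle(void);                                              /* run 1 cycle of all engines */
void    ipse_start(void);                                                  /* start continuous cycling   */
void    ipse_stop(void);                                                   /* stop continuous cycling    */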

Other functions may be available as nice-to-haves; they are not strictly needed.

Interaction functions do NOT need to be as fast as micro-engine computing cycles.

Component description

Synaptic Shared Memory (SM)

The synaptic shared memory is a set of 32-bit addressable memory registers (32-bit integers).

It is a kind of RAM.

The number of registers in the SM is Nsm. Nsm is always smaller than 2^32, so a 32-bit integer is sufficient to identify any of the registers.

Each register is individually readable/writable by all micro-engines.

Local Memory (LM)

There is a local memory for each synaptic micro-engine. 

The local memory is a set of 32-bit addressable memory registers (32-bit integers).

It is a kind of RAM.

The number of registers in one LM is Nlm. Nlm is always (much) smaller than 2^32, so a 32-bit integer is more than sufficient to identify any of the local memory registers.

Each register is readable/writable by the micro-engine itself, and when the IPSE is not active, it is accessible (read/write) by the hosting machine.

Micro-engine

The number of micro-engines is Ne. Ne is always (much) smaller than 2^32, so a 32-bit integer is more than sufficient to identify any of the micro-engines.

Each micro-engine is able to read and write in its own LM, and in the SM.

The micro-engines perform cycling operations, see below.

Working cycle of the micro-engine processor

The micro-engines operate in cycles.

They work in parallel, synchronously (see the discussion on synchronicity below).

They work on the SM and on their individual LM, and perform some very simple calculations (add, subtract, multiply, absolute value).

Shared memory organization

The shared memory registers are written SM[i]. The index value ranges from 0 (included) to Nsm (excluded).  Logical and bitwise operators are not needed. 

Local memory organization

Assuming Nlm integers per LM, let us define K = Nlm/5.

Each LM contains (a possible C mapping is sketched after this list):

  • K input addresses, written IN[i]
  • K output addresses, written OUT[i]
  • K output values, written SIG[i]
  • K permeability values, written PER[i]
  • 1 indexing value (identifying the local LM excitation publication address), integer, IDXEXC
  • 1 indexing value (identifying the local LM external excitation address), integer, IDXIN
  • 1 excitation value, EXC
  • 1 signal factor, FAC
  • 1 maximum excitation value, EXCMAX
  • 1 input sizing value, INSIZE (INSIZE is smaller than K)  
  • 1 output sizing value, OUTSIZE (OUTSIZE is smaller than K)  
  • possibly other values - less important
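
The layout above can be expressed as a C structure for an emulator. The sketch below is one possible mapping; the K-sized arrays are held as pointers, and all field names simply mirror the list above.

#include <stdint.h>

/* One possible C mapping of a local memory (illustrative, for emulation only). */
typedef struct {
    uint32_t *in;       /* IN[]  : K input SM addresses (INSIZE entries used)    */
    uint32_t *out;      /* OUT[] : K output SM addresses (OUTSIZE entries used)  */
    int32_t  *sig;      /* SIG[] : K output values                               */
    int32_t  *per;      /* PER[] : K permeability values                         */
    uint32_t  idxexc;   /* IDXEXC: SM address where EXC is published             */
    uint32_t  idxin;    /* IDXIN : SM address of the external excitation input   */
    int32_t   exc;      /* EXC   : local excitation value                        */
    int32_t   fac;      /* FAC   : signal factor                                 */
    int32_t   excmax;   /* EXCMAX: maximum excitation value                      */
    uint32_t  insize;   /* INSIZE : number of IN[] entries actually used (< K)   */
    uint32_t  outsize;  /* OUTSIZE: number of OUT[] entries actually used (< K)  */
} LocalMemory;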

Steps  

Step 1 - Values are transferred and summed from SM to EXC in LM

EXC += SM[IDXIN] 
EXC += SM[IN[i]]         (for 0<=i<INSIZE)
EXC=min(EXC,EXCMAX)      

Note that this last operation may be computed without an explicit branch using r = y ^ ((x ^ y) & -(x < y)) or a += ((b-a) - |b-a|)/2 (with a = EXC and b = EXCMAX).

So the micro-engine processor should be able either to execute a test, or to use bitwise operators, or to use an abs() operator.
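
For illustration, here are three equivalent ways to perform the clamp of step 1, written in C on plain int values; the bitwise form avoids a branch but still evaluates a comparison, while the abs() form needs neither a test nor bitwise operators. This is only a sketch (overflow on extreme values is ignored).

#include <stdlib.h>   /* abs() */

int clamp_test(int exc, int excmax)    { return exc < excmax ? exc : excmax; }                     /* explicit test   */
int clamp_bitwise(int exc, int excmax) { return excmax ^ ((exc ^ excmax) & -(exc < excmax)); }     /* branchless form */
int clamp_abs(int exc, int excmax)     { return exc + ((excmax - exc) - abs(excmax - exc)) / 2; }  /* abs() form      */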

Step 2 - computation of output values (this needs to be updated according to EPM!!!)

SIG[i] = (EXC*PER[i])/FAC     (for 0<=i<OUTSIZE) 

Step 3 - Decrease of local excitation

EXC -= SIG[i]                (for 0<=i<OUTSIZE) 

Step 4 - Values are transferred from LM to SM

SM[IDXEXC]=EXC
SM[OUT[i]] = SIG[i]           (for 0<=i<OUTSIZE)
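
Putting the four steps together, a minimal C sketch of one synchronous cycle of a single micro-engine could look as follows. It operates on plain arrays so that it stands alone; in an emulator the parameters would come from the LM layout sketched above, and the step 2 formula is taken as written (pending its EPM update).

#include <stdint.h>

/* One synchronous cycle (steps 1-4) of a single micro-engine, as a standalone sketch. */
static void engine_cycle(int32_t *sm,                                /* shared memory SM[] */
                         const uint32_t *in, uint32_t insize,        /* IN[], INSIZE       */
                         const uint32_t *out, uint32_t outsize,      /* OUT[], OUTSIZE     */
                         int32_t *sig, const int32_t *per,           /* SIG[], PER[]       */
                         uint32_t idxin, uint32_t idxexc,            /* IDXIN, IDXEXC      */
                         int32_t *exc, int32_t fac, int32_t excmax)  /* EXC, FAC, EXCMAX   */
{
    uint32_t i;

    /* Step 1: transfer and sum excitation from SM, then clamp to EXCMAX. */
    *exc += sm[idxin];
    for (i = 0; i < insize; i++)
        *exc += sm[in[i]];
    if (*exc > excmax)
        *exc = excmax;

    /* Step 2: compute output values. */
    for (i = 0; i < outsize; i++)
        sig[i] = (*exc * per[i]) / fac;

    /* Step 3: decrease local excitation. */
    for (i = 0; i < outsize; i++)
        *exc -= sig[i];

    /* Step 4: publish excitation and output values back to SM. */
    sm[idxexc] = *exc;
    for (i = 0; i < outsize; i++)
        sm[out[i]] = sig[i];
}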

Potential and practical workload, synchronicity discussion

Assuming the worst case OUTSIZE=INSIZE=K=Nlm/5, the number of atomic computations per cycle in one micro-engine is probably close to 3*Nlm.

But this calls for a paradoxical and pragmatic comment. Most SM register values contain 0. Most EXC values are 0, or close to 0, which implies that only about 1 percent of the micro-engines are numerically active. The average INSIZE and OUTSIZE values are much lower than K. So, like a brain, an IPSE board contains a very high number of components and connections, but at any moment most of them are idle!

So it might be possible to let most micro-engines sleep most of the time, when their EXC level is low. But what would be the benefit of this?

A variant is then:

  • allow micro-engines to work asynchronously
  • check the EXC level against an EXCMIN level at the end of step 1, and terminate the cycle immediately if EXC < EXCMIN.

This variant is not obviously better than the synchronous (parallel) version. It certainly decreases the number of SM write operations. It probably reduces power consumption. But it does not much accelerate the global speed of the most active micro-engines. And it requires completely asynchronous read/write access to the SM. And asynchronicity certainly raises serious technical problems.
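
As a sketch of this variant, step 1 could return a flag telling the engine whether to continue the cycle; EXCMIN is assumed here as an extra per-engine value, not part of the LM layout listed above.

#include <stdint.h>

/* Variant of step 1: terminate the cycle early when the clamped excitation is below EXCMIN. */
static int step1_with_early_exit(const int32_t *sm,
                                 const uint32_t *in, uint32_t insize,
                                 uint32_t idxin, int32_t *exc,
                                 int32_t excmax, int32_t excmin)
{
    uint32_t i;

    *exc += sm[idxin];
    for (i = 0; i < insize; i++)
        *exc += sm[in[i]];
    if (*exc > excmax)
        *exc = excmax;

    return *exc >= excmin;   /* 0: engine stays idle this cycle (skip steps 2-4) */
}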

Non overlapping contract

  • The step cycle shows that the IN[] and OUT[] registers are set by the hosting computer; they are never changed by the computing cycles.

The contract says:

  • A given SM address may not be present in the IN[] set of more than 1 of the LMs. 
  • A given SM address may not be present in the OUT[] set of more than 1 of the LMs. 

In other words, any shared memory address is writable by at most one micro-engine, and readable by at most one micro-engine.

As a result, the read and write operations of all micro-engines may operate simultaneously without conflict.

The non-overlapping contract is checked through the functions of the hosting computer, not by the IPSE board.
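
A possible host-side check of this contract is sketched below: each SM address claimed by an IN[] set (or, in a second pass, by an OUT[] set) is marked in a scratch array, and a second claim is reported as a violation. The function name and the scratch-array approach are assumptions of the sketch.

#include <stdint.h>

/* Returns 0 if any address in addrs[0..count-1] is out of range or already claimed by another LM. */
/* 'used' is a scratch array of nsm bytes, cleared by the caller before processing the first LM.   */
static int claim_addresses(uint8_t *used, uint32_t nsm,
                           const uint32_t *addrs, uint32_t count)
{
    uint32_t i;
    for (i = 0; i < count; i++) {
        if (addrs[i] >= nsm || used[addrs[i]])
            return 0;                      /* overlap with another LM, or bad address */
        used[addrs[i]] = 1;
    }
    return 1;
}

The host would call claim_addresses() once per LM with its IN[] set, then repeat with a freshly cleared scratch array for the OUT[] sets, so that read ownership and write ownership are checked independently.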

Critical points

Two critical points are identified in the IPSE project.

  • The feasibility of a very large number of very simple processing units.
  • The parallel (but non-overlapping) simultaneous read/write operations by all processing units in the SM.

Annexes

Sizing and performance figures   

Here are the targeted numbers:

  • 10^6 cycles per second
  • Nsm = 2^24 (~16 000 000)

This implies 2^24 x 32 bits, thus 2^29 bits, thus 2^26 bytes, thus the equivalent of 64 MB of RAM.

  • Nlm = 2^12 (~4 000)

This implies for each local memory 2^12 x 16 bits, thus 2^16 bits, thus 2^13 bytes, thus the equivalent of 8 KB of RAM.

  • Ne = 2^18 (~256 000)

This implies for the sum of all local memories 2^30 x 16 bits, thus 2^34 bits, thus 2^31 bytes, thus the equivalent of 2 GB of RAM.

Higher figures are welcome, but these are reasonably ambitious figures for a first target (quite small compared to the neuron and synapse counts in biological systems, such as mammal brains!).
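
The arithmetic above can be checked with a few lines of C; the 16-bit width for local memory registers is taken from the figures as stated in this annex.

#include <stdio.h>

int main(void)
{
    unsigned long long nsm = 1ULL << 24;   /* Nsm */
    unsigned long long nlm = 1ULL << 12;   /* Nlm */
    unsigned long long ne  = 1ULL << 18;   /* Ne  */

    printf("SM      : %llu MB\n", nsm * 32 / 8 / (1ULL << 20));       /* 64 MB */
    printf("one LM  : %llu KB\n", nlm * 16 / 8 / (1ULL << 10));       /* 8 KB  */
    printf("all LMs : %llu GB\n", ne * nlm * 16 / 8 / (1ULL << 30));  /* 2 GB  */
    return 0;
}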

Emulation of the IPSE with classical programming

PG has written two simple emulation programs.

The first program emulates the IPSE board in Java (running on the JVM).

The second program emulates the IPSE board in ANSI C.

The figures handled / allowed by the emulators (to be compared with the target figures) are currently (end of 2012):

  • ~10^3 cycles / second
  • Nsm ~ 2^19
  • Nlm ~ 2^5 (average value permitted by dynamic allocation)
  • Nlm < 2^8 (limit value)
  • Ne ~ 2^16 (working sequentially, not in parallel)