DPDM Architecture

From the Assothink Wiki

DPDM among three possible architectures

DPDM stands for "Distributed Processing/Dispatching/Monitoring".

The 3 possible architectures are:

  • Java emulation of Assothink on 1 machine, 1 process, multiple threads. This is the first tested architecture of Assothink; it has run since 2010, on the Matscape machines.
  • The dream: a dedicated set of millions of microchips, each working autonomously. This might be proposed to a company producing wide-scale IC (integrated circuit) products. It would be the most accomplished and best-performing version of Assothink.
  • Intermediate: DPDM. Many machines connected in a TCP/IP network, with one or several small processes per machine, as a Java emulation.

It is important to note that the passive jelly construction has nothing to do with these 3 architectures: the construction of the passive jelly is a prior step.

Why DPDM?

The idea of DPDM comes from the new availability of very small and very cheap (less than $50) Linux computers. These small computers are the size of a credit card or less. They mainly have a network interface and some USB ports. They run Ubuntu and have 0.5 GB of RAM. (This is written in October 2012!)

It then becomes possible to organize .. 8 ... 256 ... connected small machines to emulate Assothink at a reasonable price.

Besides the very small computers, there is also the possibility of using obsolete cheap computers with Linux & a JVM installed. However, this would have 2 drawbacks: (1) heterogeneity in the network; (2) power consumption possibly significantly more expensive than the hardware itself.

Conclusion: DPDM offers reasonable pricing and scaling capabilities as an intermediate solution.

DPDM components

The DPDM components are

  • 1 ADU (associative dispatching unit)
  • n (... 8 ... 256 ...) APUs (associative processing units)
  • 0 or more AMUs (associative monitoring units)

All components are able to send messages through a message delivery system (MDS).

MDS (message delivery system)

Instead of the expected request-answer scheme, a message delivery system was chosen. Request-answer exchanges are avoided because in most cases the time required to produce the answer would leave the requester blocked and waiting, or force it to use multi-threading to handle all the concurrent dialogs.

In practice the MDS is realized this way:

  • all APUs and the ADU are socket servers (each has a thread dedicated to this)
  • all answers given by the socket servers are immediate and minimal
  • the full answer formally arrives in a delayed message made available (and sent) later
  • message transmission is asynchronous

A specific class, netdaemon.java (the Assothink network daemon), is the parent of the ADU and APU classes. Instances of this class have 2 threads, an abstract run method, a message input queue, etc. The listening thread reads and processes incoming messages, and uses a ServerSocket. The writing thread performs most computations and sends messages using a client Socket. Most of the technical documentation is available in the java.net package documentation (http://docs.oracle.com/javase/7/docs/api/java/net/package-summary.html).
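The two-thread pattern described above might look like the following sketch (class and method names are hypothetical; the actual netdaemon.java sources are not reproduced here):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the netdaemon two-thread pattern: a listening
// thread sends a minimal immediate answer and queues the message; a worker
// thread processes messages later and would send full answers as a client.
abstract class NetDaemonSketch {
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final int port;

    NetDaemonSketch(int port) { this.port = port; }

    // Listening thread, built on a ServerSocket.
    void startListener() {
        Thread t = new Thread(() -> {
            try (ServerSocket server = new ServerSocket(port)) {
                while (true) {
                    try (Socket s = server.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(s.getInputStream()));
                         PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                        String msg = in.readLine();
                        out.println("ACK"); // immediate, minimal answer
                        if (msg != null) inbox.put(msg);
                    }
                }
            } catch (Exception e) { e.printStackTrace(); }
        }, "listener");
        t.setDaemon(true);
        t.start();
    }

    // Worker thread: computations and delayed full answers (via client Sockets).
    void startWorker() {
        Thread t = new Thread(() -> {
            try {
                while (true) process(inbox.take());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, "worker");
        t.setDaemon(true);
        t.start();
    }

    abstract void process(String message);
}
```

The key design point is the same as in the MDS description: the socket server never computes while a peer is waiting; it acknowledges immediately and does the real work on the other thread.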

ADU

The ADU receives, sends, and dispatches excitation signals between all the APUs.

It does not compute much.

It keeps in memory a copy of

  • the signals between nodes
  • the excitation levels
  • the list of APUs (network address, node set)
  • the global variables

The associative dispatching unit requires

  • significant memory
  • very efficient network I/O
  • average CPU capability

APU

The APU computes the excitation levels of its nodes, and receives and sends excitation signals from and to the ADU.

The APU requires

  • some memory, proportional to the number of nodes (maybe not so much)
  • efficient socket networking as a server
  • intensive CPU resources: integer and floating-point computations as fast as possible

The APU process runs 2 threads.

  • The listening thread receives and processes all messages.
  • The computation thread actively updates excitation levels and sends excitation output. 

The messages received by the APUs are various:

  • input excitation
  • permeability updates (possibly)
  • directives 
  • a set of monitoring data requests to be reported (speed measures, memory measures, cycle counts...)

The messages sent by the APUs include

  • output excitation: the excitation signals to be propagated
  • the monitor data to be delivered according to the request

The typical directives are

  • halt (force)
  • restart (force)
  • hibernate (force)
  • resume from hibernation (force)
  • updates of technical parameters interactively set in the UI of the AMU (for instance the number of computing cycles to process between message sendings, smoothing constants, etc.)
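As an illustration, the directive set could be encoded as a simple enum (a sketch; the names and the wire format are assumptions, not the actual Assothink message protocol):

```java
// Hypothetical encoding of the APU directives as an enum. The wire format
// "DIRECTIVE:<name>" is an illustrative assumption.
enum Directive {
    HALT, RESTART, HIBERNATE, RESUME;

    // Parse a directive from a message line such as "DIRECTIVE:HALT".
    static Directive parse(String message) {
        return valueOf(message.substring(message.indexOf(':') + 1).trim());
    }
}
```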

The only message partner of any APU is the ADU.

The APU is I/O oriented, much more than computation oriented!

AMU

The AMU is the only interactive component of the architecture.

Its main purpose is the graphical display of the DPDM Assothink components.

The AMU sends messages to, and receives messages from, the ADU.

It is also used to

  • start, stop, and configure the parameters
  • view excitation levels and signals
  • monitor all kinds of stats for the DPDM components
  • create specific excitation inputs
  • update permeability figures

Overview of DPDM component relationships

To summarize the DPDM relationships:

  • APUs and the ADU are required daemons
  • AMUs are optional, intermittent interactive programs
  • APUs and the ADU act as servers for the MDS
  • the ADU speaks to all APUs
  • an APU only speaks to the ADU
  • the AMU has a user interface and dialogs with the ADU

Hardware

All computers are on the same LAN, preferably on the same switch.

The ADU and the AMU(s) run on standard computers. The ADU should receive significant resources (network I/O and memory).

The ADU machine also runs an Apache server.

The AMUs run within browsers (GWT app).

The numerous APUs run on very small computers without screens, accessible through remote shells for administration, and through sockets. Regarding software, these machines just need a Linux OS, a JVM, network connectivity, and a mount of a shared disk hosted on the ADU (the DPDM shared disk).

Network bandwidth analysis

Two options have been considered for the transmission of excitation:

  • during any cycle, every APU sends excitation levels to all other APUs (all-to-all). With this option the number of packets sent per cycle is roughly Nu².
  • during any cycle, every APU receives excitation levels from the ADU and sends excitation levels to the ADU (all-to-1-to-all). With this option the number of packets sent per cycle is roughly 2 × Nu.

The second option has been selected. It is probably slower for small installations, but as Nu increases it performs better (considering also that the per-unit computing time decreases inversely with Nu).
But a critical point with all-to-1-to-all is: when should the ADU send the excitation data back to the APUs? The answer is complex and depends on many factors (Nu, Nn, unit CPU capacity, switch and network capacity...). It will be settled later. In the meantime, a first answer is: as quickly and as frequently as possible, as long as the network bandwidth is not saturated.
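The packet-count trade-off between the two options can be checked with a couple of lines (the formulas are the ones given above; note the two options break even at Nu = 3):

```java
// Packets per cycle under the two dispatch options discussed above.
class PacketCount {
    // all-to-all: every APU sends to every other APU (roughly Nu² packets).
    static long allToAll(long nu) { return nu * (nu - 1); }

    // all-to-1-to-all: every APU sends to the ADU, the ADU sends back to each.
    static long allToOneToAll(long nu) { return 2 * nu; }
}
```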

Network controller requirement

The network controller of the APUs and the ADU is critical.
A 100 Mbit controller is not enough (it sends and receives 100 Mbit/sec, i.e. 100 Kbit/msec, or about 12 KB/msec).
A 1 Gbit controller is good (it sends and receives 1000 Mbit/sec, i.e. 1000 Kbit/msec, or about 120 KB/msec).

Power consumption

The power consumption of a typical APU (Raspberry Pi model B) is 3.5 watts. With a price of electricity at 0.20 €/kWh, the daily cost of an Assothink DPDM set (APU part) is:
3.5 × 24 × Nu × 0.20 / 1000 €/day = 0.0168 × Nu €/day
[As a side consideration, note that for a device priced at, say, $35, the cumulative power cost equals the investment cost after 35 / 0.0168 days, thus around 6 years.]
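The cost formula above can be written as a small helper (a sketch; the parameter names are illustrative):

```java
// Daily electricity cost of the APU set, following the formula above:
// watts * 24 h * Nu * price-per-kWh / 1000.
class ApuPowerCost {
    static double euroPerDay(int nu, double wattsPerApu, double euroPerKwh) {
        return wattsPerApu * 24 * nu * euroPerKwh / 1000.0;
    }
}
```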

APU memory requirement

We assume that:

  • the emulator should handle Nn Assothink concept nodes
  • the DPDM structure includes Nu APU computers
  • there is an average of Nln links per node (Nln is close to ... 30 ...)
  • the process requires much more memory for the nodes than for the code and the working variables (code + working variables < 10 MB)
  • the OS consumption is below 20 MB

The number of nodes handled by one APU is Nnu = Nn / Nu.
The input memory (excitations) requires 4 bytes per local node, thus 4 × Nnu bytes.
The excitation state also requires 4 bytes per local node, thus 4 × Nnu bytes.
The output channels require 8 bytes per link, thus 8 × Nln × Nnu bytes.
The output memory (outgoing excitation levels) requires 4 bytes per global node, thus 4 × Nn bytes.

So globally the required memory (in bytes) is M = 3×10⁷ + 8 × Nnu + 8 × Nln × Nnu + 4 × Nn.
Taking 30 as the value of Nln, this becomes M = 3×10⁷ + 248 × Nnu + 4 × Nn,
or equivalently
M = 3×10⁷ + (248 / Nu + 4) × Nn
or
M = 3×10⁷ + (248 + 4 × Nu) × Nnu.

For a small installation (Nn = 10⁶, Nu = 10, Nnu = 10⁵), the memory consumption is around 60 MB.
For an average installation (Nn = 10⁷, Nu = 10², Nnu = 10⁵), it is around 100 MB.
For a wider installation, the last term is critical: if Nn reaches 10⁸, the memory consumption exceeds 450 MB, for any number of APUs.
So in practice 512 MB should be comfortable up to 10⁷ nodes, and the memory size of the available small computers is OK.
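The memory estimate can be coded directly from the derivation above (a sketch; the 3×10⁷ constant bundles the OS and code/working-variable budgets):

```java
// Per-APU memory estimate from the derivation above:
// M = 3e7 (OS + code) + 8*Nnu + 8*Nln*Nnu + 4*Nn bytes.
class ApuMemory {
    static long requiredBytes(long nn, long nu, long nln) {
        long nnu = nn / nu;                 // nodes handled by one APU
        return 30_000_000L                  // OS (< 20 MB) + code and vars (< 10 MB)
             + 8 * nnu                      // input buffer + excitation state
             + 8 * nln * nnu                // output channels (links)
             + 4 * nn;                      // outgoing excitation, per global node
    }
}
```

Evaluating it at the "small" and "average" configurations reproduces the ~60 MB and ~100 MB figures quoted above.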

Disks

The APUs do not need significant disk usage. The ADU needs at least the /DPDM disk.

Shared disk /DPDM

The shared disk (physically on the ADU, mounted on the APUs) contains at least:

  • /DPDM/class : all class files
  • /DPDM/wake : wake-up, wake-down, and pid files
  • /DPDM/cfg : the DPDM config file
  • /DPDM/pkFile : the pk-formatted Assothink data
  • /DPDM/pkWorks ...
  • /DPDM/trace ...

APUs: suggested hardware

A good candidate (but neither the only nor the best one) is the Raspberry Pi model B:

  • Linux, but NOT Ubuntu
  • 512 MB RAM, not more
  • 100 Mbit Ethernet (NO gigabit!)
  • various JVMs available, the best from Oracle
  • $35 (!)

More details on the Raspberry Pi are provided by Wikipedia.

Alternatives: BeagleBoard, BeagleBone, PandaBoard... (generally more expensive).

Wake-up daemon

The Java classes used by the APUs are present on the DPDM disk.

The APUs permanently run a wake-up daemon, which performs quite simple tasks (every second):

  • check whether the local APU process runs (with a minimal alive request)
  • check the DPDM disk for the presence of a wake-up signal file (/DPDM/wake/<host>.up) or a wake-down signal file (/DPDM/wake/<host>.down). Exactly one of the 2 files should exist at any time; they are created and deleted by the ADU.
  • in case of a discrepancy between the APU status and the file status, launch the APU or kill it.

(The need to separate the APU from the wake-up daemon comes from the fact that in development mode the Java classes are frequently updated, and the updated APU daemon must frequently restart with the modified sources.)

When the APU starts, it creates a pid file (/DPDM/wake/<host>.pid) on the shared disk to announce its PID. This is necessary to allow the wake-up daemon to kill it when needed.

The wake up daemon should be designed to consume minimal resources.
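One pass of the daemon's per-second check could be sketched as follows (the decision logic follows the description above; the actual launch and kill commands are installation-specific and therefore omitted):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of one pass of the wake-up daemon's check, run once per second.
// File names follow the /DPDM/wake layout described above.
class WakeCheck {
    // Decide the action from the APU state and the two signal files.
    static String decide(boolean apuRunning, boolean upFileExists, boolean downFileExists) {
        if (upFileExists && !apuRunning) return "launch";
        if (downFileExists && apuRunning) return "kill";
        return "nothing";
    }

    // Same decision, reading the actual signal files for a given host.
    static String decideFor(String host, boolean apuRunning) {
        Path wake = Paths.get("/DPDM/wake");
        return decide(apuRunning,
                Files.exists(wake.resolve(host + ".up")),
                Files.exists(wake.resolve(host + ".down")));
    }
}
```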

DPDM config file

This file contains a set of Java properties.

The most important properties are:

  • the ADU host name & IP address
  • the set of APU host names
  • the node range of each APU
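Since the config file is a set of Java properties, loading it is straightforward with java.util.Properties; in this sketch the property keys (dpdm.adu.host, dpdm.apu.hosts) are assumptions, as the actual key names are not specified:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Sketch of reading the DPDM config file as java.util.Properties.
// The key names are hypothetical.
class DpdmConfig {
    final String aduHost;
    final String[] apuHosts;

    DpdmConfig(Properties p) {
        aduHost = p.getProperty("dpdm.adu.host");
        apuHosts = p.getProperty("dpdm.apu.hosts", "").split(",");
    }

    // Convenience parser for a config given as text (e.g. read from /DPDM/cfg).
    static DpdmConfig fromString(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return new DpdmConfig(p);
    }
}
```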

DPDM cycle speed

A DPDM cycle consists of several computations performed on the APUs, and numerous IP packets sent between the various machines working together.

During one cycle, more than 2 × Nu data packets are transmitted.

The DPDM architecture is designed to achieve improved efficiency, i.e. 1 cycle per msec (1000 cycles per second) with 10 APUs, and hopefully .. 5 .. 10 .. cycles per msec (up to 10 000 cycles per second) with 100 APUs.

This is to be compared with the basic architecture, which delivers (after string optimization) a computation cycle of around ... 10 ... msec.

But this question is critical: would network bandwidth and capabilities be a limiting factor for the DPDM architecture (more than the APUs' CPU speed)?

It is assumed here that the LAN is able to propagate 2 × Nu packets per msec. For 100 APUs, it should thus handle 200 000 packets per second. See for instance the Gigabit controller performance figures at http://wiki.networksecuritytoolkit.org/nstwiki/index.php/LAN_Ethernet_Maximum_Rates,_Generation,_Capturing_%26_Monitoring (this is still to be explored!).

If network bandwidth becomes a limiting factor, several optimizations might be considered:

  • organize the APUs into K subsets, with K switches and K network controllers on the ADU (traffic division)
  • rework and optimize the emission process of the APUs to send fewer (but bigger) packets; this in turn has a negative effect on signal propagation speed.

Network : packet number and packet size analysis

The key question is: is it necessary to use APUs with gigabit Ethernet ports?

The table below summarizes the figures (symbols as defined above).

| Quantity | Symbol / formula | Typical target | Remark |
|---|---|---|---|
| Number of nodes | Nn | 80 000 | Later 10 times more? |
| Number of APUs | Nu | 40 | Depends on unit cost and CPU performance |
| Frequency | F | 1000 Hz | 1 msec / cycle |
| Active node ratio | Ar | 0.005 | Active (excited) node number / total node number |
| Active node total number | Nn × Ar | 400 | |
| Nodes per APU | Nnu = Nn / Nu | 2000 | |
| Links per node | Nln | 20 | Average value |
| Memory: signal-out buffer | 4 × Nn bytes | 320 KB | 1 integer per target |
| Memory: signal-in buffer | 4 × Nnu bytes | 8 KB | 1 integer per local node |
| Active nodes / APU | Nau = Nnu × Ar | 10 | |
| Signals out generated / cycle | Sout = Nau × Nln | 200 | Actually less (redundancy is possible) |
| Bytes / signal | | 4 | 32 bits (20 to identify the target, 10 to specify signal strength) |
| Bytes-signal-out / cycle | Bout = Sout × 4 | 0.8 KB | |
| Packets-out (on 1 APU) / sec | F | 1000 / sec | |
| Bytes-out (on 1 APU) / sec | Bsec = F × Bout | 0.8 MB (6.4 Mbit) | of data |
| Bytes-in (on 1 APU) / sec | 4 × Nnu × F × Ar | 0.016 MB (0.128 Mbit) | Maybe optimistic? |
| Total data bandwidth (per APU) | Bapu | < 8 Mbit | Should be covered by a 100 Mbit connection for the APUs (but data framing affects performance) |
| Total data bandwidth (per ADU) | Badu = Bapu × Nu | ≈ 100 Mbit | |
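The per-APU outgoing figures in the table can be recomputed from the symbols above (a sketch of the arithmetic, not production code):

```java
// Per-APU outgoing bandwidth from the table above:
// per cycle, each APU emits Nau * Nln signals of bytesPerSignal bytes,
// and there are F cycles per second.
class ApuBandwidth {
    static long bytesOutPerSecond(long nau, long nln, long bytesPerSignal, long hz) {
        return nau * nln * bytesPerSignal * hz;
    }
}
```

With Nau = 10, Nln = 20, 4 bytes per signal, and F = 1000 Hz this gives 0.8 MB/s, i.e. 6.4 Mbit/s, matching the table.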

Limits

The DPDM architecture works within a LAN. Using distant computers through the web would produce poor results, because the transmission delay would be too long (a cycle lasting less than 1 msec). However, the AMU may work through the web.

Cost analysis

The goal of the DPDM architecture is to deploy Assothink in a cost-effective way.

  • With the basic architecture, one full computer ($1000) delivers around 100 cycles per second. And besides the computing time, there is UI drawing time (cf. the performance figures in the browser page). The price ratio is $10 per Hz.
  • With the DPDM architecture, 10 APUs (10 × $35 + $500 for switches and monitoring equipment) would provide at least 1000 cycles per second, and possibly 10 000 cycles per second with 100 APUs. This is based on (a) a slower individual CPU, but also (b) a smaller number of nodes to process per unit. The price ratio is less than $1 per Hz, thus 10 times more cost-effective than the basic architecture.

In conclusion, if the cost analysis and the technical analysis are correct, DPDM certainly deserves development effort.

Development

PG assumes that 2 months of intensive work would produce the conversion from the basic architecture to the DPDM architecture. That is costly!

APU software summary

The APU software is written in Java (or in C???).

The APU software includes 2 threads.

One thread is mainly a socket server (built on ServerSocket), answering various kinds of requests (see above) and idle most of the time.

The other thread is CPU intensive and works on excitations and signals. 

The APU memory mainly contains Nnu nodes (mainly an excitation level each), Nnu × Nln half-links, Nn output values, and Nnu input values; also various parameters, and various variables to be reported to the ADU and AMU. Realistic values are computed above.

During the initialization phase, the link structure is loaded, and possibly previously saved excitation states.

The computing cycle performs simple tasks:

  • injection of the input signals into the excitations (the signals are then reset to 0)
  • computation of the output signal values
  • sending (as a socket client) of the output messages
  • reading of the socket answers, loading of the input signals
  • excitation decrease (exponential decay)

The code is very simple and very small. Probably the I/O consumes much more time than the computations (about Nln × Nnu × Ar {multiplication, addition} operations per cycle, thus much less than a msec).
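The steps of the cycle can be sketched on local node state as follows (the message I/O steps are omitted; the array layout and the decay constant are illustrative assumptions):

```java
// Sketch of the APU computing cycle on local node state, following the
// five steps listed above. Steps 2-4 (output computation and socket I/O)
// are omitted; sizes and the decay factor are illustrative.
class ApuCycleSketch {
    final double[] excitation;   // one excitation level per local node
    final double[] inputSignals; // incoming signals, reset after injection
    final double decay;          // exponential decay factor per cycle

    ApuCycleSketch(int nnu, double decay) {
        excitation = new double[nnu];
        inputSignals = new double[nnu];
        this.decay = decay;
    }

    void cycle() {
        // 1. inject the input signals into the excitations, then reset them to 0
        for (int i = 0; i < excitation.length; i++) {
            excitation[i] += inputSignals[i];
            inputSignals[i] = 0;
        }
        // 2-4. computing, sending, and reading output signals would happen here
        // 5. exponential decrease of the excitation levels
        for (int i = 0; i < excitation.length; i++) excitation[i] *= decay;
    }
}
```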

Interesting Links

http://www.southampton.ac.uk/~sjc/raspberrypi/pi_supercomputer_southampton.htm

http://westcoastlabs.blogspot.co.uk/2012/06/parallel-processing-on-pi-bramble.html

http://en.wikipedia.org/wiki/Raspberry_pi