Architectural overview

The (new) CastleCompiler(s) share a common “pipeline” architecture, which is flexible in both functionality and implementation, as their components share the AIGR (see below) as input and/or output.

AIGR pipeline

The pipeline starts with source, in text format, that is read by the Readers [ToDo] and translated into the Abstract Intermediate Graph Representation (AIGR). All following components read this format, like the Transformers, which transform it into a “better” form (see later). Last, the AIGR is converted into a binary by (one of) the Backends [ToDo].
Most Backends [ToDo] consist of two main parts: a Writer [ToDo] (which is part of CCastle), and a Translator: an external compiler that translates/compiles the generated intermediate code into a binary.

@startuml
skin rose
!include AR_skins.inc

' it starts with txt-src
file "*.Castle" as f1
file "*.Moat" as f2

' those files are input
() "files" as txt
f1 --> txt
f2 --> txt

package "CCastle Compiler" as CCC {
txt -> CCC
() "AIGR" as a1
() "AIGR" as a2
[Readers]
[Transformers]
Folder Backends {
[Writers]
[Translators]
Backends #->Writers
() "files" as txt2
Writers -( txt2
txt2 )-> Translators
}
Readers -( a1
a1 )-> Transformers
Transformers -( a2
a2 )-> Backends
}
Translators 0)-> bin
@enduml

As the AIGR is a format (not a call-interface!), this architecture gives flexibility in the deployment of the components. A simple (Castle) compiler can hold them as (plug-in) libraries in one process.
Alternatively, each component can be a process, where the several processes communicate via e.g. unix-pipes or network-connections. And there are many more options, separated not only in space, but also in time: as the AIGR can be serialised [1], it is possible to save it in a file, and load it in the next component, later …
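
The in-process variant can be sketched in a few lines of Python. This is only an illustration of "every stage exchanges the same format": the names `Aigr`, `mock_reader`, `upper_transformer`, `text_backend` and `run_pipeline` are hypothetical and not part of CCastle.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Aigr:
    """Hypothetical stand-in for the AIGR: a bag of named nodes."""
    nodes: dict = field(default_factory=dict)


# Every stage is a plain callable; the AIGR is the only thing they share.
Reader = Callable[[], Aigr]
Transformer = Callable[[Aigr], Aigr]
Backend = Callable[[Aigr], str]


def mock_reader() -> Aigr:
    # Stands in for a Reader: produces an AIGR without parsing any source.
    return Aigr(nodes={"main": "component"})


def upper_transformer(aigr: Aigr) -> Aigr:
    # A toy Transformer: AIGR in, AIGR out.
    return Aigr(nodes={name: kind.upper() for name, kind in aigr.nodes.items()})


def text_backend(aigr: Aigr) -> str:
    # A toy Backend: renders the AIGR to text instead of to a binary.
    return "\n".join(f"{name}: {kind}" for name, kind in sorted(aigr.nodes.items()))


def run_pipeline(reader: Reader, transformers: Iterable[Transformer], backend: Backend) -> str:
    aigr = reader()
    for transform in transformers:
        aigr = transform(aigr)
    return backend(aigr)


result = run_pipeline(mock_reader, [upper_transformer], text_backend)
```

Because the stages only agree on the data they exchange, the same three functions could equally run as separate processes that pass a serialised AIGR along.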

Important

Although it is possible to save a (pickled) AIGR, that component (or action) is NOT a Writer!

One should typically speak about “saving” the AIGR, and “loading” the (AIGR) file. It is a feature of The AIGR auxiliary component (see below).
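
A minimal sketch of such saving and loading, assuming Python's standard `pickle` module. The function names `save_aigr` and `load_aigr`, and the dict-based stand-in for the AIGR, are assumptions made for this example only.

```python
import pickle
import tempfile
from pathlib import Path


def save_aigr(aigr, path):
    """Save a (pickled) AIGR to a file."""
    with open(path, "wb") as f:
        pickle.dump(aigr, f)


def load_aigr(path):
    """Load an AIGR file again; only do this with files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A tiny graph: two events refer to the very same type node.
int_type = {"kind": "Type", "name": "int"}
aigr = {
    "events": [
        {"name": "runTo", "type": int_type},
        {"name": "newMax", "type": int_type},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sieve.aigr"
    save_aigr(aigr, path)
    restored = load_aigr(path)

# pickle keeps shared references intact, so the graph shape survives.
shared = restored["events"][0]["type"] is restored["events"][1]["type"]
```

The point of the last line is that a graph, unlike a tree, has nodes with several incoming references, and a serialiser for the AIGR has to preserve them.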

The Reader(s)

A typical reader reads (some) source-files and then translates them, in a few steps, into the Abstract Intermediate Graph Representation, as shown below.
The mockReader [ToDo] is different: it does output the (needed) ‘AIGR’, and so can act as the start of a pipeline (and is therefore considered a ‘Reader’), but it has no input.

@startuml
skin rose
!include AR_skins.inc

frame "CCastle Compiler" as CCC #c0c0c0 {
package Readers #white {
node typicalReader {
[parser]
[analyse\n(ast)] as ast_ana
[AST 2 AIGR] as AST2AIGR
() "AIGR" as aigr1
[analyse\n(aigr)] as aigr_ana
() "AIGR" as aigr2
parser -> ast_ana : AST
ast_ana -> AST2AIGR : AST
AST2AIGR -> aigr1
aigr1 -> aigr_ana
aigr_ana -( aigr2
}
TXT -> typicalReader
node mockReader {
[TestDoubles/\nAIGR] as mock
() "AIGR" as aigr3
mock -( aigr3
}
typicalReader -[hidden]down-> mockReader
}
}
@enduml

Some sub-components in the ‘Reader’ may also work on the ‘AIGR’, as shown. The difference (to a ‘Transformer’) is simple: the ‘Reader’ should do all error-checking, etc., to make sure the input (so the code of the developer) is valid. A normal Transformer (or the ‘Backend’) should never find errors.
When that (Reader) functionality is more convenient to implement after converting the AST into the AIGR, an “AIGR-analyser” is built.
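
The steps of a typical reader, and the input-less mockReader, can be sketched as follows. The toy source language (`<kind> <name>` or `<kind> <name> : <base>` per line) and all names (`SourceError`, `parse`, `analyse_ast`, `ast_to_aigr`, `analyse_aigr`, `typical_reader`, `mock_reader`) are invented for this example; they only show where the error-checking lives.

```python
from dataclasses import dataclass, field


class SourceError(Exception):
    """Raised by the Reader only; later stages should never need it."""


@dataclass
class Aigr:
    nodes: dict = field(default_factory=dict)   # name -> kind
    edges: list = field(default_factory=list)   # (name, base) pairs


def parse(text):
    # Toy parser: text -> AST, a list of (kind, name, base) tuples.
    ast = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        words = line.split()
        if not words:
            continue
        if len(words) == 2:
            ast.append((words[0], words[1], None))
        elif len(words) == 4 and words[2] == ":":
            ast.append((words[0], words[1], words[3]))
        else:
            raise SourceError(f"line {lineno}: cannot parse {line!r}")
    return ast


def analyse_ast(ast):
    # A check that is easy on the AST: only known kinds are allowed.
    for kind, name, _ in ast:
        if kind not in {"protocol", "component"}:
            raise SourceError(f"unknown kind {kind!r} for {name!r}")
    return ast


def ast_to_aigr(ast):
    aigr = Aigr()
    for kind, name, base in ast:
        aigr.nodes[name] = kind
        if base is not None:
            aigr.edges.append((name, base))
    return aigr


def analyse_aigr(aigr):
    # A check that is easier on the graph: every edge must hit a real node.
    for name, base in aigr.edges:
        if base not in aigr.nodes:
            raise SourceError(f"{name!r} is based on undefined {base!r}")
    return aigr


def typical_reader(text):
    return analyse_aigr(ast_to_aigr(analyse_ast(parse(text))))


def mock_reader():
    # No input at all: it just hands out a prepared AIGR.
    return Aigr(nodes={"SlowStart": "protocol"})
```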

Transformers

All ‘Transformers’ receive and post (a “dialect” of) the AIGR, pushing it into a form that can be handled by the backend to create an efficient binary. There can be many Transformers, and typically several of them are run in sequence. Other sets of Transformers exclude each other.

A Transformer is often triggered by one of the Rewriters (TODO).

We show two examples.

@startuml
skin rose
!include AR_skins.inc
left to right direction

''NOTE: Old RTD/plantuml.1.2020.2.jar syntax!

!procedure $comp($name)
''!function $comp($name)
!$in = $name + "in"
!$out = $name + "out"

[$name]
() "AIGR" as $in
() "AIGR" as $out

$in )-- $name
$name --( $out
!endprocedure
''!endfunction

frame "CCastle Compiler" as CCC #c0c0c0 {
folder Transformers #white {
package FSM {
$comp("FSM.NFA_2_FSM")
$comp("FSM.SuperStates")
' $comp("FSM.Epsilon")
$comp("FSM_2_Routine")
}
package Machinery {
$comp("DirectCall")
$comp("LibDispatch")
$comp("DDS")
}
package "more ..." as m {
$comp("...")
}
}
}

@enduml

FSM

In Castle, one can directly describe an FSM (see: FSMs are needed), including advanced/extended variants, like the non-deterministic “NFA”s and the “State-Charts” (known from UML), with orthogonal regions and hierarchical ‘superstates’. See Castle has generic FSM synt... (U_FSM_Syntax) for the demands.
Those FSMs are initially stored “as is” in the AIGR, and rewritten step-by-step by several FSM-Transformers.

The FSM.NFA_2_FSM Transformer reworks an NFA into a (bigger, deterministic) FSM. This is a well-known algorithm, after which the non-deterministic edges and epsilon-transitions are gone.
The resulting AIGR has the same functionality, but is simpler to translate into a binary.
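
The text does not name the algorithm; the classic choice for this job is the subset (powerset) construction, which is what the sketch below assumes. It works on plain dictionaries, not on the real AIGR, and all names (`EPSILON`, `epsilon_closure`, `nfa_to_dfa`, `accepts`) are hypothetical.

```python
from collections import deque

EPSILON = None  # label used in this sketch for epsilon-transitions


def epsilon_closure(states, delta):
    """All states reachable from `states` via epsilon-transitions only."""
    closure = set(states)
    todo = list(states)
    while todo:
        state = todo.pop()
        for target in delta.get((state, EPSILON), ()):
            if target not in closure:
                closure.add(target)
                todo.append(target)
    return frozenset(closure)


def nfa_to_dfa(start, accepting, delta):
    """Subset construction; `delta` maps (state, symbol) -> set of states."""
    symbols = {sym for (_, sym) in delta if sym is not EPSILON}
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta, dfa_accepting = {}, set()
    seen, queue = {dfa_start}, deque([dfa_start])
    while queue:
        current = queue.popleft()
        if current & accepting:
            dfa_accepting.add(current)
        for sym in symbols:
            moved = set()
            for state in current:
                moved |= delta.get((state, sym), set())
            if not moved:
                continue  # no edge: the deterministic FSM rejects here
            target = epsilon_closure(moved, delta)
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_accepting, dfa_delta


def accepts(dfa, word):
    state, accepting, delta = dfa[0], dfa[1], dfa[2]
    for sym in word:
        state = delta.get((state, sym))
        if state is None:
            return False
    return state in accepting


# NFA for "any word over {a, b} that ends in ab", with one epsilon-edge.
nfa_delta = {
    (0, EPSILON): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}
dfa = nfa_to_dfa(start=0, accepting={3}, delta=nfa_delta)
```

Each state of the result is a set of NFA states, which is why the deterministic FSM can be bigger than the NFA it came from.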

Similarly, the FSM.SuperStates Transformer can “flatten” complex hierarchical FSMs into ones that are easier to translate into executable code.
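
One possible reading of “flattening” is sketched below: transitions defined on a superstate are copied down to its leaf states, and a transition defined on an inner state takes priority over one on an enclosing state. This is a simplification made for the example (it ignores initial sub-states, entry/exit actions and orthogonal regions), and the function `flatten` and its data layout are hypothetical.

```python
def flatten(parent, transitions):
    """Copy the transitions of superstates down to their leaf states.

    parent:      state -> enclosing superstate (None at the top level)
    transitions: (state, event) -> target leaf state
    """
    superstates = {p for p in parent.values() if p is not None}
    leaves = [s for s in parent if s not in superstates]
    flat = {}
    for leaf in leaves:
        state = leaf
        while state is not None:          # walk from the leaf outwards
            for (source, event), target in transitions.items():
                if source == state:
                    # setdefault: the innermost definition wins
                    flat.setdefault((leaf, event), target)
            state = parent[state]
    return flat


parent = {"Active": None, "Idle": "Active", "Busy": "Active", "Off": None}
transitions = {
    ("Idle", "start"): "Busy",
    ("Busy", "done"): "Idle",
    ("Busy", "stop"): "Idle",     # overrides the superstate's "stop"
    ("Active", "stop"): "Off",    # valid in every sub-state of Active
    ("Off", "power_on"): "Idle",
}
flat = flatten(parent, transitions)
```

After flattening, no transition starts at the superstate `Active` any more; only leaf states remain as sources.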

This is an example of a set of Transformers that work collectively: first, remove the non-determinism; then handle the SuperStates; and finally transform the (simple) FSM completely into regular routines.

Machinery

Caution

The Machinery part is still in development, so it is not certain that the Machinery will be implemented as a Transformer!

‘The Machinery (ToDo)’ is an abstraction of the technology to connect ports and send data (like events) over them. Several implementations are possible, like direct function-calls, dispatching to concurrent thread-pools, or distributing them over a network.

Typically, one will use only one Machinery: connecting two ports with DDS and ‘sending’ an event by dispatching it, whereas the receiving event-handler expects a traditional call, will not work. By choosing one Machinery-Transformer, all nuts and bolts will fit.

Caution

This is not a requirement!

One can imagine that (eventually) a Mixed-Machinery is used. Ultimately, only the details of each (individual) connection have to be aligned.

Advanced example

The Machinery-kind can be seen as an attribute of the (super)component that holds the connections (and the sub-components, with ports). By using that Machinery for those connections, it will work.
But the (external) ports and connections of that (super)component can use another Machinery, when the super-super-component’s machinery-kind attribute has another value.

Again, this makes it complicated. But it gives flexibility: for deep-down connections we might prefer direct calls, but use concurrent options at a (bit) higher level, and use maximal decoupling for networking-applications.
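
A small sketch of that idea, with the machinery-kind as an attribute on each component. The class `Component`, the function `machinery_for_connections_in`, and the rule that an unset attribute is inherited from the enclosing component are assumptions of this example, not decided CCastle behaviour. The kind names are taken from the diagram above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    name: str
    parent: Optional["Component"] = None
    machinery: Optional[str] = None   # e.g. "DirectCall", "LibDispatch", "DDS"


def machinery_for_connections_in(component, default="DirectCall"):
    """Machinery for the connections that `component` holds.

    Assumption: when the attribute is not set, the value of the
    nearest enclosing component is used.
    """
    node = component
    while node is not None:
        if node.machinery is not None:
            return node.machinery
        node = node.parent
    return default


system = Component("system", machinery="DDS")                    # networked
app = Component("app", parent=system, machinery="LibDispatch")   # concurrent
core = Component("core", parent=app, machinery="DirectCall")     # deep down
helper = Component("helper", parent=core)                        # inherits

kinds = {c.name: machinery_for_connections_in(c) for c in (system, app, core, helper)}
```

Here the connections inside `core` are direct calls, while the connections that `app` holds (including those to the external ports of `core`) are dispatched.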

Note

Here, we see an example of having both the “connected/concurrent components” abstraction and ‘The Machinery (ToDo)’ abstraction.

The CCastle code uses “components, ports and connections” only. Later, when compiling it, the details of the Machinery are added.
And by implementing it in/as a Transformer, we can add more and more advanced options without the need to change the source. Only some (global) “compiler options” have to be adjusted.

Writers & Backends

The ‘Backends’ read the (simplified & optimised) AIGR and transform it into a binary that can be executed. Typically this is a two-step approach: a ‘Writer’ renders the AIGR into a low-level intermediate [2] language [3], and a ‘Translator’ then compiles that into a binary.

The interface between the ‘Writer’ and the ‘Translator’ is typically file-based, and depends heavily on the chosen ‘Translator’, which is not part of CCastle.
As those two depend strongly on each other, there is little commonality between the various Backend variants.

Some examples

The rPY: Use (r)Python as backend backend/writer renders to RPython, such that (PyPy’s) rpython-translator can handle it. The intermediate file-format is fully described by RPython: the rPY: Use (r)Python as backend Writer needs to emit exactly that format.
CC2Cpy (now defunct [4]) generates standard C code, which can be translated into binaries by many (standard C) compilers. So, it is a bit more generic (than rPY), but the writer is still limited to C, and so has to emulate namespaces, as C does not support them.
A possible variant is using C++, both as low-level language (lll) and translator (but as I’m not a fan of it, somebody else has to make it).
Both mentioned writers are implemented in Python, for now (that is the ‘py’ part of CC2Cpy).
Future variants of those Writers will be implemented in Castle itself. This changes neither the input (AIGR) nor the output of the Writers (RPython and C). As such, we can easily upgrade the Backend, as the Translator does not change.
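
To illustrate what “emulating namespaces” in C can mean for a Writer: one common approach is to fold the namespace path into a single flat identifier. The sketch below uses a length-prefixed scheme so that different paths never collide. It is not how CC2Cpy did it (that scheme is not described here); the functions `c_name` and `render_prototype` and the `cc` prefix are made up for this example.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def c_name(*path):
    """Fold a namespace path into one flat C identifier.

    Every part is prefixed with its length, so ("a_b", "c") and
    ("a", "b_c") end up as different names.
    """
    if not path:
        raise ValueError("empty name")
    for part in path:
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"not an identifier: {part!r}")
    return "cc" + "".join(f"_{len(part)}{part}" for part in path)


def render_prototype(path, return_type="void", params="void"):
    # Render one C function prototype for a namespaced routine.
    return f"{return_type} {c_name(*path)}({params});"


line = render_prototype(("sieve", "SimpleSieve", "input"), params="int try_")
```

With this, `line` holds `void cc_5sieve_11SimpleSieve_5input(int try_);`, which any standard C compiler accepts as a declaration.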

The AIGR auxiliary component

The AIGR auxiliary component describes & handles the ‘Abstract Intermediate Graph Representation’ and is used by all regular components. Sometimes it (the AIGR) is called an “intermediate language” (‘IM’), or “intermediate representation” (‘IR’). Many existing IMs are quite “flat”, low-level and very operational, like RTL or SSA; they are great for converting code to assembly.

@startuml
!include AR_skins.inc

folder "AIGR example" #c0c0c0 {
object StartSieve <<EventProtocol>>
object runTo <<Event>>{
max: int
}
object newMax <<Event>> {
max: int
}
StartSieve *-- runTo
StartSieve *-- newMax

object SlowStart << EventProtocol>>
object queue_max <<TypedParameter>> {
:int
}
object setMax <<Event>> {
setMax :int
}
SlowStart *-- setMax
SlowStart *- queue_max

object "SlowStart(1)" as SlowStart_1 <<ProtocolWrapper>> {
queue_max=1
}
SlowStart <-- SlowStart_1: based on

object SimpleSieve <<EventProtocol>>
SlowStart_1 <-- SimpleSieve: based_on

object input <<Event>> {
try :int
}
SimpleSieve *--input
}
@enduml

The *Sieve Protocols*, as an example of (a part of) an AIGR

For the CCastle Workshop Tools, a more abstract representation is chosen, with more structure. Visually, it resembles a tree, but without the need to have a single root (making it a “forest”), and with interconnects (making it a graph). Structurally, it is not dissimilar to the XML/DOM, known from many webpages; but again without the “single document-root”. The DOM also has interconnects, known as “links” (a term not used in the AIGR).
The AIGR also resembles the AST (of CCastle); after all, each language construct is “stored” in the AIGR. Some see it as a semantically parsed AST. A namedID (like a variable, or function) in the source, when in the same namespace, denotes the same artifact, even when it is mentioned in several places (and it can have aliases). In the AIGR it is the same ‘node’, having multiple incoming ‘edges’, and so it violates the tree’s no-cycle rule.
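
The Sieve example above can be mimicked with a few Python dataclasses, to show the “forest with interconnects” idea: several roots, and one node with many incoming edges. The classes below (`TypeNode`, `Event`, `EventProtocol`, `ProtocolWrapper`) only borrow their names from the diagram; they are a sketch and not the actual reference model.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass(eq=False)   # eq=False: nodes are compared by identity, like graph nodes
class TypeNode:
    name: str


@dataclass(eq=False)
class Event:
    name: str
    params: dict = field(default_factory=dict)    # parameter name -> TypeNode


@dataclass(eq=False)
class EventProtocol:
    name: str
    events: list = field(default_factory=list)
    based_on: Optional[Union["EventProtocol", "ProtocolWrapper"]] = None


@dataclass(eq=False)
class ProtocolWrapper:
    based_on: EventProtocol
    arguments: dict = field(default_factory=dict)


int_type = TypeNode("int")   # one node; every use below is an edge to it

start_sieve = EventProtocol("StartSieve", [
    Event("runTo", {"max": int_type}),
    Event("newMax", {"max": int_type}),
])
slow_start = EventProtocol("SlowStart", [Event("setMax", {"setMax": int_type})])
slow_start_1 = ProtocolWrapper(based_on=slow_start, arguments={"queue_max": 1})
simple_sieve = EventProtocol("SimpleSieve", [Event("input", {"try": int_type})],
                             based_on=slow_start_1)

# A forest: there is no single root that owns everything.
roots = [start_sieve, slow_start, simple_sieve]


def count_references(protocols, node):
    """Count the incoming edges of `node` from event parameters."""
    return sum(1
               for protocol in protocols
               for event in protocol.events
               for target in event.params.values()
               if target is node)


incoming = count_references(roots, int_type)
```

In a tree, `int_type` would have to be copied four times; in the graph it exists once and is referenced four times.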

The AIGR-component describes all possible elements and their relations (so it is a bit like an XML DTD or Schema). It also has (will have) general routines to facilitate handling the AIGR.
For example, you can expect routines to “save” an AIGR to file, and “load” it later.

Warning

Although the AIGR is a graph, the AIGR-component will not be able to visualize that graph.

Other workshop tools may do that, and will probably use the AIGR-component to read it. The visualising is part of that tool. For the example above, a manual conversion to PlantUML was made.

Currently, the AIGR is in the design phase, and may change.
For that reason, only a Python dataclass reference model is available (Work-in-Progress). The (unit & behaviour) tests and TestDoubles make it quite understandable. Eventually, it will be fully documented (and versioned), and available for multiple languages (including Castle :-)

Footnotes
