========================
Implementing SYNC Guards
========================

ALTing BARRIERs
---------------

The algorithms and data-structures for managing ALTable BARRIERs are more
expensive than those for committed barriers.  To constrain such costs only to
where they are needed, a distinct primitive type is proposed:

  ALT.BARRIER a, b, c:

These barriers have the same syntactic mechanisms as committed BARRIERs -
i.e. we can SYNC/ENROLL/RESIGN on them.  MOBILE versions are also allowed,
with the same rules for mobility (when communicating, assigning and passing
to FORKed processes).

The additional mechanism is that SYNCing on them can be used as an ALT guard:

  ALT
    SYNC a
      ...  response to completing the `a' barrier
    SYNC b
      ...  response to completing the `b' barrier
    SYNC c
      ...  response to completing the `c' barrier
    in ? x
      ...  ordinary guards are allowed as well
    tim ? AFTER timeout
      ...  ordinary guards are allowed as well

No PRI ALTs on SYNCs
--------------------

What is not allowed is the use of a SYNC guard in a PRI ALT.  This is because
different processes may set different priorities for the same guards and
these may clash.  For example:

  PRI ALT                    PRI ALT
    SYNC a                     SYNC b
      ...       [in PAR          ...
    SYNC b       with]         SYNC a
      ...                        ...

In the above, if the two processes shown were the only ones enrolled on `a'
and `b', complete sets of offers are in place for both barriers.  The left
process should choose `a' and the right `b'.  Both processes must choose the
same barrier, but their preferences are not resolvable.

The above shows a direct conflict between two processes.  More subtle
conflicts can arise where there are no direct conflicts between any two
processes:

  PRI ALT            PRI ALT            PRI ALT
    SYNC a             SYNC b             SYNC c
      ...                ...                ...
    SYNC b             SYNC c             SYNC a
      ...                ...                ...

In the above, each barrier has just two enrolled processes and a complete set
of offers.  Each process wants to choose a different barrier!

Prioritised choice could be allowed if the priorities were set on the
barriers, rather than by the separate processes involved in their separate
ALTs.  Then, a consistent choice could be made (so long as all barriers on
offer had distinct priorities).  [I think Gerald Hilderink once suggested
putting priorities on the channels, rather than processes or ALTs?]  Let's
leave this for now.

Mechanics
---------

These are based on those presented by the serial "oracle" protocol given in:

  mm-sync-oracle-0-1.csp
  mm-sync-oracle-1-0.occ

Each ALT.BARRIER has the same data-structure as our existing BARRIER:

  RECORD
    INT reference.count, enrolled.count, count.down:
    INT queue.front, queue.back:        -- "wait-set" management
  :

Initialising and managing the count fields for declaration/construction and
ENROLL/RESIGN is exactly the same as for normal BARRIERs.  A simple
(non-ALTing) SYNC on an ALT.BARRIER is allowed, but needs to be treated as a
one-choice ALT.

The queue pointers manage a *set* of processes that have made offers to SYNC
on this barrier.  Because a process may make such offers to many barriers, it
may be on many such waiting sets.  Therefore, we cannot use the normal
run-queue process workspace links for this set (as we do for normal
BARRIERs).  Above, we assume this wait-set *is* a queue of nodes, where each
node has just two words: the workspace of a waiting process and a pointer to
the next node.  Nodes can be dynamically allocated and de-allocated very
rapidly, using the normal mobile-space mechanisms.

Note: this waiting set need not be managed as a queue.  A dynamically sized
array may be more efficient - there are lots of possibilities.  See the
section on optimisation later.
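For concreteness, here is a minimal C sketch of that data-structure and the
node-based wait-set.  Everything in it is illustrative, not code from any
existing runtime: the names (alt_barrier, ws_node, wait_set_push) are
invented for this note, a process workspace is stood in for by a void
pointer, and malloc stands in for the mobile-space allocator:

  #include <stdlib.h>

  typedef struct ws_node {          /* one node: just two words            */
    void *wptr;                     /* workspace of a waiting process      */
    struct ws_node *next;           /* pointer to the next node            */
  } ws_node;

  typedef struct alt_barrier {
    int reference_count;            /* references held on this barrier     */
    int enrolled_count;             /* processes currently enrolled        */
    int count_down;                 /* offers still needed to complete     */
    ws_node *queue_front;           /* "wait-set" management ...           */
    ws_node *queue_back;            /* ... here, as a queue of nodes       */
  } alt_barrier;

  /* record an offer: push the offering process's workspace on the front   */
  void wait_set_push (alt_barrier *b, void *wptr) {
    ws_node *n = malloc (sizeof (ws_node));
    n->wptr = wptr;
    n->next = b->queue_front;
    b->queue_front = n;
    if (b->queue_back == NULL) {
      b->queue_back = n;
    }
  }

Pushing on the front is deliberate: since an ENABLE sequence is never
interleaved with another, the only way our workspace can already be in this
wait-set is if we pushed it there earlier in the *same* ALT - so the
"already offered?" test (below) is a single look at queue_front.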
An ALT over `n' guarded processes is compiled into the sequence:

          ALT_BARRIER_START
          IF (ENABLE_<guard> <0>) JUMP D<0>
          IF (ENABLE_<guard> <1>) JUMP D<1>
          ...  etc.
          IF (ENABLE_<guard> <n-1>) JUMP D<n-1>
          ALT_WAIT
  D<n>:   DISABLE_<guard> (n - 1)
          ...  etc.
  D<2>:   DISABLE_<guard> 1
  D<1>:   DISABLE_<guard> 0
  D<0>:   ALT_BARRIER_FINISH
          JUMP <selected branch>

(or some optimisation thereof!).  In the above, ENABLE_<guard> and
DISABLE_<guard> stand for the enable/disable instruction matching each
guard's type, taking the guard's index as operand.

Here is the logic for ALT_BARRIER_START:

  SEQ
    chosen.sync := -1     -- implies a SYNC guard has not (yet) been selected
    ...  save current priority (somewhere in workspace)
    ...  pop into "system-high" priority
    ...  ALT_START logic

where "system-high" priority is some level higher than anything directly
settable by an application process.  There are two reasons for this.  One is
to ensure that this ALT sequence is executed without interruption by anything
that might be operating on the same barrier structures.  The other is to
ensure that, if this process has to block at the ALT_WAIT, it is scheduled by
the releasing process to perform its DISABLE sequence (again uninterrupted)
*before* any other code -- apart from the DISABLE sequences of other
processes released at the same time.  The crucial property is that the
DISABLE sequence of the releasing process and the DISABLE sequences of *all*
the released processes are executed in one uninterrupted sequential block.

But see the section on Serialisation below.  Maybe process priorities are not
the best way to ensure this - they don't work for SMP/multi-core platforms!
So, maybe some kernel lock has to happen?  Again, see the section on
Serialisation.  For a uni-processor platform without priorities, the "crucial
property" can just be arranged through careful scheduling in the release code
(below).

Here is the logic for:

  ENABLE_SYNC b

where `b' is an ALT.BARRIER:

  IF
    -- the following test is in case a barrier is used more than once in the ALT
    ... our workspace is in the wait-set for `b' (e.g. on front of wait-queue)
      ...  return FALSE (`b' is not ready to fire)
    TRUE
      SEQ
        b[count.down] := b[count.down] - 1
        IF
          b[count.down] > 0
            SEQ
              ...  add our workspace to the wait-set (with D<n> as resume address)
              ...  return FALSE (`b' is not ready to fire)
          TRUE
            SEQ
              chosen.sync := <index of this guard>
              b[count.down] := b[enrolled.count]
              ...  schedule all processes in wait-set and remove from wait-set
              ...  return TRUE (`b' is ready to fire)

Here is the logic for:

  DISABLE_SYNC b

where `b' is an ALT.BARRIER:

  IF
    b[count.down] = b[enrolled.count]
      chosen.sync := <index of this guard>  -- at most, one will be chosen by this
    TRUE
      ...  remove our workspace from the wait-set

Note that ENABLE_SYNC/DISABLE_SYNC may set `chosen.sync', while any of the
other ENABLE/DISABLE varieties (channels and timers) set `chosen.guard'.  If
a SYNC is a completed guard (and at most one can be), it takes precedence
over any other ready guards.  This is a legal refinement of the
non-deterministic choice defined for the semantics of the (un-prioritised)
ALT.  This rule ensures that all other processes offering the multiway
synchronisation make the *same* choice.  The rule is implemented by
ALT_BARRIER_FINISH:

  IF
    chosen.sync >= 0
      chosen.guard := chosen.sync
    TRUE
      SKIP

Doing it this way means no change to the existing ENABLE/DISABLE channels and
timers (and zero impact on their performance).
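To pin down that control flow, here is a hedged C sketch of the two SYNC
operations over the structures sketched earlier.  Again, the names are
invented: schedule_process() is a hypothetical scheduler hook, the guard
index is passed in explicitly, and chosen_sync is passed by pointer rather
than living in the ALTer's workspace.  It assumes the whole thing runs under
the atomicity discussed above:

  #include <stdbool.h>
  #include <stdlib.h>

  extern void schedule_process (void *wptr);  /* hypothetical kernel hook  */

  /* ENABLE_SYNC: returns true iff barrier b is ready to fire              */
  bool enable_sync (alt_barrier *b, void *my_wptr,
                    int my_guard_index, int *chosen_sync) {
    if ((b->queue_front != NULL) && (b->queue_front->wptr == my_wptr)) {
      return false;               /* barrier used earlier in this same ALT */
    }
    b->count_down = b->count_down - 1;
    if (b->count_down > 0) {
      wait_set_push (b, my_wptr); /* offer made - but not ready to fire    */
      return false;
    }
    /* our offer completes the barrier: select it and release the others   */
    *chosen_sync = my_guard_index;
    b->count_down = b->enrolled_count;
    while (b->queue_front != NULL) {
      ws_node *n = b->queue_front;
      b->queue_front = n->next;
      schedule_process (n->wptr); /* released: it resumes at its DISABLEs  */
      free (n);
    }
    b->queue_back = NULL;
    return true;
  }

  /* DISABLE_SYNC: accept a completed barrier, or else withdraw our offer  */
  void disable_sync (alt_barrier *b, void *my_wptr,
                     int my_guard_index, int *chosen_sync) {
    if (b->count_down == b->enrolled_count) {
      *chosen_sync = my_guard_index;  /* at most, one is chosen by this    */
    } else {
      ws_node *prev = NULL;           /* the search we would like to lose  */
      ws_node *n = b->queue_front;
      while ((n != NULL) && (n->wptr != my_wptr)) {
        prev = n;
        n = n->next;
      }
      if (n != NULL) {
        if (prev == NULL) {
          b->queue_front = n->next;
        } else {
          prev->next = n->next;
        }
        if (b->queue_back == n) {
          b->queue_back = prev;
        }
        free (n);
      }
    }
  }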
Optimising the Mechanics
------------------------

Most of the above logic is unit-time and trivial.  The exception concerns
some management of the wait-set.

------------------------

The simple implementation of the wait-set as a linked list of nodes holding
workspace pointers is not too bad.  Generating a new node and pushing it on
to the list is (small) unit time.  Before doing that, we must check to see if
our workspace is already there.  Again, that requires no search since, if it
were present, it would be at the front.  Scheduling all processes from the
wait-set is not unit-time (unlike doing the same from a non-altable barrier),
but only adds a small extra to the per-process cost of doing the ALT.
Removing a workspace from a wait-set, which has to be done by each
DISABLE_SYNC, requires a search.  Optimising that away would be good.

------------------------

Note: in the occam model of this protocol (file: mm-sync-oracle-0-0.csp), we
had constant enrolled counts.  Even better, the ALTing processes were
represented by `id' numbers that never changed.  So, we just kept a fixed set
of enrolled processes for each barrier and the ALT logic only needed
count-downs (and no wait-sets).  We can't get away with that here (?) since
the workspace pointer of a serial process changes (as procedures are
entered/exited) and we need the current one when a process offers a barrier.

------------------------

There is one wait-set for each (altable) barrier.  It is empty if no process
is offering to synchronise.  Its maximum size is dynamic but always known -
`(b[enrolled.count] - 1)', for barrier `b'.

Idea: allocate a dynamic array with size large enough to hold a full
wait-set.  The next power-of-two larger than (or equal to) the
`reference.count' will be more than enough.  I've chosen `reference.count'
rather than `enrolled.count' since it is slightly less volatile.  This array
holds either nulls or workspace pointers for SYNC-offering processes, filling
up from index zero.  At the start of an ENABLE_SYNC, the index of the next
free slot (for adding a new workspace) is `enrolled.count - count.down'.  So,
checking and adding a new workspace is fast.  Scheduling wait-set processes
is also fast.  DISABLE_SYNC still needs a search.  What has been saved is the
generation of a new node by a failing ENABLE_SYNC (which is most of them),
the de-allocation of all wait-set nodes by a succeeding ENABLE_SYNC, and the
de-allocation of a node by each DISABLE_SYNC.  That's probably worth saving.

This is not completely free.  If the `reference.count' for a barrier grows
larger than that power-of-two size of the array, a new array (of sufficient
size) must be allocated, and the old one copied over and de-allocated.  This
must be checked every time that `reference.count' grows: PAR enrollment,
FORKing, communication and assignment.  But such events are very rare
compared with the executions of ALTs (and associated
ENABLE_SYNCs/DISABLE_SYNCs).

------------------------

Can we remove the DISABLE_SYNC search?  All processes referencing a barrier
hold, obviously, the same reference.  If they could also hold an *individual*
index that gave them *their* slot in the wait-set array held by the barrier,
removing their workspace from the wait-set no longer needs a search: just zap
the slot to null.  Checking and adding the workspace is equally trivial.
Scheduling the release of wait-set processes is slower though -- a scan
through the array for `(enrolled.count - 1)' non-null workspaces is needed.
Unless there has been a deluge of termination, barrier communication (and
loss) or mass resignation, that will not be a big penalty.

Processes losing ALT.BARRIERs (through termination, communication,
resignation) must zap their wait-set slots to null.  If this happens a lot,
fragmentation of the wait-set array will occur and compression may be needed
(which will need processes to be informed somehow of their new slot
indices?).  Such dynamics should, again, be very rare compared with ALTs.
A sketch of this array-based scheme follows.
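Here is a hedged C sketch combining the power-of-two array with the
per-process slot index.  Once more, all names are invented for this note, and
the policy for finding a free slot is left open:

  #include <stdlib.h>

  typedef struct alt_barrier2 {
    int reference_count, enrolled_count, count_down;
    int slots;                  /* array size: a power of two              */
    void **wait_set;            /* each slot: NULL or an offered workspace */
  } alt_barrier2;

  static int next_pow2 (int n) {
    int p = 1;
    while (p < n) {
      p = p << 1;
    }
    return p;
  }

  /* called on every event that grows reference_count (PAR enrollment,
     FORKing, communication, assignment): regrow the array if needed       */
  void check_grow (alt_barrier2 *b) {
    if (b->reference_count > b->slots) {
      int new_slots = next_pow2 (b->reference_count);
      void **bigger = calloc ((size_t) new_slots, sizeof (void *));
      for (int i = 0; i < b->slots; i++) {
        bigger[i] = b->wait_set[i];
      }
      free (b->wait_set);
      b->wait_set = bigger;
      b->slots = new_slots;
    }
  }

  /* DISABLE_SYNC's withdrawal is no longer a search: just zap our slot    */
  void withdraw_offer (alt_barrier2 *b, int my_slot) {
    b->wait_set[my_slot] = NULL;
  }

  /* releasing a completed barrier scans the array, scheduling the
     (enrolled_count - 1) non-NULL workspaces of the other offerers        */
  void release_waiters (alt_barrier2 *b, void (*schedule) (void *)) {
    for (int i = 0; i < b->slots; i++) {
      if (b->wait_set[i] != NULL) {
        schedule (b->wait_set[i]);
        b->wait_set[i] = NULL;
      }
    }
  }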
The trick is to give each process referencing a barrier its own "shadow"
variable to hold its slot index.  ALT.BARRIERs are now held indirectly.  Each
ALT.BARRIER variable needs two words: one for the reference to the
ALT.BARRIER structure and one for its slot index.  That sounds simple!  Maybe
too simple!!

The slot needs setting whenever the reference count for a barrier increases:
PAR enrollment, FORKing, communication and assignment.  The last three seem
straightforward.  PAR enrollment is tricky!!!  The component processes of an
enrolling PAR see the barrier as a global -- the *same* global.  So, where
are the individual "shadow" indices?  Each component needs its own
ALT.BARRIER variable.

Here's a suggestion.  Passing an (altable) barrier to a FORKed process is OK:
as the reference count increments, allocate a new slot to the index field of
the ALT.BARRIER argument.  So, restrict enrolling components of a PAR to PROC
instances ... and we can do the same!  Such a restriction might have other
benefits ... concerning allowing implicit PAR enrollment ... for which we see
other benefits still ... for simplicity and flexibility.  We'll leave this
for now.
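Before moving on, here is a small sketch of that two-word representation and
of handing a new slot to a FORKed (or PROC-instanced) enrollee.  It builds on
the invented alt_barrier2 sketch above; it assumes the new process is also
enrolled, and how a free slot is found is not shown:

  /* an ALT.BARRIER variable: two words, held indirectly                   */
  typedef struct alt_barrier_ref {
    alt_barrier2 *barrier;      /* reference to the shared barrier         */
    int slot;                   /* this reference's own "shadow" slot      */
  } alt_barrier_ref;

  /* passing the barrier to a FORKed process (assumed also to enroll it):
     bump the counts, regrow the wait-set array if necessary, and give the
     new reference its own slot                                            */
  alt_barrier_ref enroll_new_reference (alt_barrier2 *b, int free_slot) {
    alt_barrier_ref r;
    b->reference_count = b->reference_count + 1;
    b->enrolled_count = b->enrolled_count + 1;
    b->count_down = b->count_down + 1;  /* one more offer now needed       */
    check_grow (b);
    r.barrier = b;
    r.slot = free_slot;                 /* free-slot allocation not shown  */
    return r;
  }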
Relaxing the Serialisation
--------------------------

There's a nagging worry about the dependency on atomicity of the ENABLE and
DISABLE sequences above.  In the Transputer, these sequences were immune to
interruption by high-priority processes -- indeed, without that, interrupt
response times could be delayed by however long it took those sequences to
complete.  In the initial JCSP implementation, those sequences were protected
by a monitor lock so that only one sequence per ALTing object could take
place at any time.  But that implementation had a race hazard that lay
undiscovered for two years!  The corrected (and, now, verified)
implementation removes that lock and allows those sequences to be
interleaved.  So, our dependency here on no interleaving is a worry.

However, some atomicity does seem necessary for these alting barrier
sequences.  If two processes interleave their ENABLE_SYNC sequences (e.g.
because of priority pre-emption or running on an SMP platform), then the
first could lower a barrier `count.down' to one and move on ... while a
second reduces that same count to zero and selects it ... and the first goes
on to reduce a different barrier count to zero and selects that!  Now, there
may be a solution, but we would need to do something different/extra.

Here's something that still requires the ENABLE sequences to be atomic, but
removes the requirement for all the DISABLE sequences to be locked together.
In fact, that property is not enforced by the protocol spelt out by the
occam/CSP "oracle".  The released processes (and others) may be scheduled
before the oracle has restored all its counts.  However, it refuses all
further consultations (i.e. ALT_BARRIER_STARTs) until it has completed this.

All we need is to block new ALT_BARRIER_STARTs if the ENABLE sequence has
made a selection and triggered one or several DISABLE sequences.  That block
lasts until all those DISABLE sequences are complete ... but they no longer
need to be scheduled together.

We need one `alt.barrier.mutex' and `alt.barrier.count'.  The mutex can be an
existing SEMAPHORE type (initialised to one).  For an SMP or multi-core
system, this mutex must be honoured.  The algorithms above are almost
unchanged.

The ALT_BARRIER_START is changed to grab the mutex and initialise the count:

  SEQ
    chosen.sync := -1     -- implies a SYNC guard has not (yet) been selected
    CLAIM alt.barrier.mutex
    alt.barrier.count := 1
    ...  ALT_START logic

The `alt.barrier.count' records the number of DISABLE sequences that must be
completed (unless we block at the ALT_WAIT) before releasing the mutex.

The (ENABLE_SYNC b) logic changes only to reset that count if it finds a zero
`count.down' and schedules the processes in the wait-set (all of which have
DISABLE sequences to complete):

          TRUE                -- b[count.down] = 0
            SEQ
              chosen.sync := <index of this guard>
              b[count.down] := b[enrolled.count]
              ...  schedule all processes in wait-set and remove from wait-set
              alt.barrier.count := b[enrolled.count]
              ...  return TRUE (`b' is ready to fire)

We need an ALT_BARRIER_WAIT (instead of ALT_WAIT) to release the mutex:

  SEQ
    alt.barrier.count := 0    -- tidy, but don't think this is necessary
    RELEASE alt.barrier.mutex
    ...  ALT_WAIT logic

There is no change for (DISABLE_SYNC b).  ALT_BARRIER_FINISH changes to:

  SEQ
    IF
      chosen.sync >= 0
        chosen.guard := chosen.sync
      TRUE
        SKIP
    alt.barrier.count := alt.barrier.count - 1
    IF
      alt.barrier.count = 0
        RELEASE alt.barrier.mutex
      TRUE
        SKIP

With this, an ALT_BARRIER sequence cannot be interleaved with another
ALT_BARRIER sequence until:

  - either it blocks at its ALT_BARRIER_WAIT;

  - or it selects a non-SYNC guard and completes its DISABLE sequence;

  - or it selects a SYNC guard, schedules all other processes waiting on
    that SYNC guard, completes its DISABLE sequence *and* all the released
    processes get scheduled and complete their DISABLE sequences.

In the case of SMP/multi-core platforms, the DISABLE sequences may be
executed in parallel (taking care when removing processes from a barrier's
wait-set).  For uni-processors, they may be scheduled (and interleaved)
arbitrarily.  The disabling processes are on the run-queue.  Any process
attempting a barrier ALT before they complete will be blocked ... the
disabling processes will get scheduled eventually ... and then the barrier
ALT will be scheduled to continue.  There will be no deadlock.  :)

Peter Welch.
(25/11/2005)