========================
Implementing SYNC Guards
========================

ALTing BARRIERs
---------------

The algorithms and data-structures for managing ALTable BARRIERs are more
expensive than those for committed barriers.  To constrain such costs only to
where they are needed, a distinct primitive type is proposed:

  ALT.BARRIER a, b, c:

These barriers have the same syntactic mechanisms as committed BARRIERs -
i.e. we can SYNC/ENROLL/RESIGN on them.  MOBILE versions are also allowed,
with the same rules for mobility (when communicating, assigning and passing
to FORKed processes).

The additional mechanism is that SYNCing on them can be used as an ALT guard:

  ALT
    SYNC a
      ...  response to completing the `a' barrier
    SYNC b
      ...  response to completing the `b' barrier
    SYNC c
      ...  response to completing the `c' barrier
    in ? x
      ...  ordinary guards are allowed as well
    tim ? AFTER timeout
      ...  ordinary guards are allowed as well

No PRI ALTs on SYNCs
--------------------

What is not allowed is the use of a SYNC guard in a PRI ALT.  This is because
different processes may set different priorities for the same guards and
these may clash.  For example:

  PRI ALT                    PRI ALT
    SYNC a                     SYNC b
      ...       [in PAR          ...
    SYNC b       with]         SYNC a
      ...                        ...

In the above, if the two processes shown were the only ones enrolled on `a'
and `b', complete sets of offers are in place for both barriers.  The left
process should choose `a' and the right `b'.  Both processes must choose the
same barrier, but their preferences are not resolvable.

The above shows a direct conflict between two processes.  More subtle
conflicts can arise where there are no direct conflicts between any two
processes:

  PRI ALT            PRI ALT            PRI ALT
    SYNC a             SYNC b             SYNC c
      ...                ...                ...
    SYNC b             SYNC c             SYNC a
      ...                ...                ...

In the above, each barrier has just two enrolled processes and a complete set
of offers.  Each process wants to choose a different barrier!

Prioritised choice could be allowed if the priorities were set on the
barriers, rather than by the separate processes involved in their separate
ALTs.  Then, a consistent choice could be made (so long as all barriers on
offer had distinct priorities).  [I think Gerald Hilderink once suggested
putting priorities on the channels, rather than processes or ALTs?]  Let's
leave this for now.

Mechanics
---------

These are based on those presented by the serial "oracle" protocol given in:

  mm-sync-oracle-0-1.csp
  mm-sync-oracle-1-0.occ

Each ALT.BARRIER has the same data-structure as our existing BARRIER:

  RECORD
    INT reference.count, enrolled.count, count.down:
    INT queue.front, queue.back:        -- "wait-set" management
  :

Initialising and managing the count fields for declaration/construction and
ENROLL/RESIGN is exactly the same as for normal BARRIERs.  A simple
(non-ALTing) SYNC on an ALT.BARRIER is allowed, but needs to be treated as a
one-choice ALT.

The queue pointers manage a *set* of processes that have made offers to SYNC
on this barrier.  Because a process may make such offers to many barriers, it
may be on many such waiting sets.  Therefore, we cannot use the normal
run-queue process workspace links for this set (as we do for normal
BARRIERs).  Above, we assume this wait-set *is* a queue of nodes, where each
node has just two words: the workspace of a waiting process and a pointer to
the next node.  Nodes can be dynamically allocated and de-allocated very
rapidly, using the normal mobile-space mechanisms.

Note: this waiting set need not be managed as a queue.  A dynamically sized
array may be more efficient - there are lots of possibilities.  See the
section on optimisation later.
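For concreteness, here is a minimal C sketch of that data-structure and the
node-based wait-set.  Everything in it is illustrative, not code from any
existing runtime: the names (alt_barrier, ws_node, wait_set_push) are
invented for this note, a process workspace is stood in for by a void
pointer, and malloc stands in for the mobile-space allocator:

  #include <stdlib.h>

  typedef struct ws_node {          /* one node: just two words            */
    void *wptr;                     /* workspace of a waiting process      */
    struct ws_node *next;           /* pointer to the next node            */
  } ws_node;

  typedef struct alt_barrier {
    int reference_count;            /* references held on this barrier     */
    int enrolled_count;             /* processes currently enrolled        */
    int count_down;                 /* offers still needed to complete     */
    ws_node *queue_front;           /* "wait-set" management ...           */
    ws_node *queue_back;            /* ... here, as a queue of nodes       */
  } alt_barrier;

  /* record an offer: push the offering process's workspace on the front   */
  void wait_set_push (alt_barrier *b, void *wptr) {
    ws_node *n = malloc (sizeof (ws_node));
    n->wptr = wptr;
    n->next = b->queue_front;
    b->queue_front = n;
    if (b->queue_back == NULL) {
      b->queue_back = n;
    }
  }

Pushing on the front is deliberate: since an ENABLE sequence is never
interleaved with another, the only way our workspace can already be in this
wait-set is if we pushed it there earlier in the *same* ALT - so the
"already offered?" test (below) is a single look at queue_front.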
An ALT over `n' guarded processes is compiled into the sequence:

          ALT_BARRIER_START
          IF (ENABLE_<guard> <0>) JUMP D<0>
          IF (ENABLE_<guard> <1>) JUMP D<1>
          ...  etc.
          IF (ENABLE_<guard> <n-1>) JUMP D<n-1>
          ALT_WAIT
  D<n>:   DISABLE_<guard> (n - 1)
          ...  etc.
  D<2>:   DISABLE_<guard> 1
  D<1>:   DISABLE_<guard> 0
  D<0>:   ALT_BARRIER_FINISH
          JUMP <selected branch>

(or some optimisation thereof!).  In the above, ENABLE_<guard> and
DISABLE_<guard> stand for the enable/disable instruction matching each
guard's type, taking the guard's index as operand.

Here is the logic for ALT_BARRIER_START:

  SEQ
    chosen.sync := -1     -- implies a SYNC guard has not (yet) been selected
    ...  save current priority (somewhere in workspace)
    ...  pop into "system-high" priority
    ...  ALT_START logic

where "system-high" priority is some level higher than anything directly
settable by an application process.  There are two reasons for this.  One is
to ensure that this ALT sequence is executed without interruption by anything
that might be operating on the same barrier structures.  The other is to
ensure that, if this process has to block at the ALT_WAIT, it is scheduled by
the releasing process to perform its DISABLE sequence (again uninterrupted)
*before* any other code -- apart from the DISABLE sequences of other
processes released at the same time.  The crucial property is that the
DISABLE sequence of the releasing process and the DISABLE sequences of *all*
the released processes are executed in one uninterrupted sequential block.

But see the section on Serialisation below.  Maybe process priorities are not
the best way to ensure this - they don't work for SMP/multi-core platforms!
So, maybe some kernel lock has to happen?  Again, see the section on
Serialisation.  For a uni-processor platform without priorities, the "crucial
property" can just be arranged through careful scheduling in the release code
(below).

Here is the logic for:

  ENABLE_SYNC b

where `b' is an ALT.BARRIER:

  IF
    -- the following test is in case a barrier is used more than once in the ALT
    ... our workspace is in the wait-set for `b' (e.g. on front of wait-queue)
      ...  return FALSE (`b' is not ready to fire)
    TRUE
      SEQ
        b[count.down] := b[count.down] - 1
        IF
          b[count.down] > 0
            SEQ
              ...  add our workspace to the wait-set (with D<n> as resume address)
              ...  return FALSE (`b' is not ready to fire)
          TRUE
            SEQ
              chosen.sync := <index of this guard>
              b[count.down] := b[enrolled.count]
              ...  schedule all processes in wait-set and remove from wait-set
              ...  return TRUE (`b' is ready to fire)

Here is the logic for:

  DISABLE_SYNC b

where `b' is an ALT.BARRIER:

  IF
    b[count.down] = b[enrolled.count]
      chosen.sync := <index of this guard>  -- at most, one will be chosen by this
    TRUE
      ...  remove our workspace from the wait-set

Note that ENABLE_SYNC/DISABLE_SYNC may set `chosen.sync', while any of the
other ENABLE/DISABLE varieties (channels and timers) set `chosen.guard'.  If
a SYNC is a completed guard (and at most one can be), it takes precedence
over any other ready guards.  This is a legal refinement of the
non-deterministic choice defined for the semantics of the (un-prioritised)
ALT.  This rule ensures that all other processes offering the multiway
synchronisation make the *same* choice.  The rule is implemented by
ALT_BARRIER_FINISH:

  IF
    chosen.sync >= 0
      chosen.guard := chosen.sync
    TRUE
      SKIP

Doing it this way means no change to the existing ENABLE/DISABLE channels and
timers (and zero impact on their performance).
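To pin down that control flow, here is a hedged C sketch of the two SYNC
operations over the structures sketched earlier.  Again, the names are
invented: schedule_process() is a hypothetical scheduler hook, the guard
index is passed in explicitly, and chosen_sync is passed by pointer rather
than living in the ALTer's workspace.  It assumes the whole thing runs under
the atomicity discussed above:

  #include <stdbool.h>
  #include <stdlib.h>

  extern void schedule_process (void *wptr);  /* hypothetical kernel hook  */

  /* ENABLE_SYNC: returns true iff barrier b is ready to fire              */
  bool enable_sync (alt_barrier *b, void *my_wptr,
                    int my_guard_index, int *chosen_sync) {
    if ((b->queue_front != NULL) && (b->queue_front->wptr == my_wptr)) {
      return false;               /* barrier used earlier in this same ALT */
    }
    b->count_down = b->count_down - 1;
    if (b->count_down > 0) {
      wait_set_push (b, my_wptr); /* offer made - but not ready to fire    */
      return false;
    }
    /* our offer completes the barrier: select it and release the others   */
    *chosen_sync = my_guard_index;
    b->count_down = b->enrolled_count;
    while (b->queue_front != NULL) {
      ws_node *n = b->queue_front;
      b->queue_front = n->next;
      schedule_process (n->wptr); /* released: it resumes at its DISABLEs  */
      free (n);
    }
    b->queue_back = NULL;
    return true;
  }

  /* DISABLE_SYNC: accept a completed barrier, or else withdraw our offer  */
  void disable_sync (alt_barrier *b, void *my_wptr,
                     int my_guard_index, int *chosen_sync) {
    if (b->count_down == b->enrolled_count) {
      *chosen_sync = my_guard_index;  /* at most, one is chosen by this    */
    } else {
      ws_node *prev = NULL;           /* the search we would like to lose  */
      ws_node *n = b->queue_front;
      while ((n != NULL) && (n->wptr != my_wptr)) {
        prev = n;
        n = n->next;
      }
      if (n != NULL) {
        if (prev == NULL) {
          b->queue_front = n->next;
        } else {
          prev->next = n->next;
        }
        if (b->queue_back == n) {
          b->queue_back = prev;
        }
        free (n);
      }
    }
  }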
Optimising the Mechanics
------------------------

Most of the above logic is unit-time and trivial.  The exception concerns
some management of the wait-set.

------------------------

The simple implementation of the wait-set as a linked list of nodes holding
workspace pointers is not too bad.  Generating a new node and pushing it on
to the list is (small) unit time.  Before doing that, we must check to see if
our workspace is already there.  Again, that requires no search since, if it
were present, it would be at the front.  Scheduling all processes from the
wait-set is not unit-time (unlike doing the same from a non-altable barrier),
but only adds a small extra to the per-process cost of doing the ALT.
Removing a workspace from a wait-set, which has to be done by each
DISABLE_SYNC, requires a search.  Optimising that away would be good.

------------------------

Note: in the occam model of this protocol (file: mm-sync-oracle-0-0.csp), we
had constant enrolled counts.  Even better, the ALTing processes were
represented by `id' numbers that never changed.  So, we just kept a fixed set
of enrolled processes for each barrier and the ALT logic only needed
count-downs (and no wait-sets).  We can't get away with that here (?) since
the workspace pointer of a serial process changes (as procedures are
entered/exited) and we need the current one when a process offers a barrier.

------------------------

There is one wait-set for each (altable) barrier.  It is empty if no process
is offering to synchronise.  Its maximum size is dynamic but always known -
`(b[enrolled.count] - 1)', for barrier `b'.

Idea: allocate a dynamic array with size large enough to hold a full
wait-set.  The next power-of-two larger than (or equal to) the
`reference.count' will be more than enough.  I've chosen `reference.count'
rather than `enrolled.count' since it is slightly less volatile.  This array
holds either nulls or workspace pointers for SYNC-offering processes, filling
up from index zero.  At the start of an ENABLE_SYNC, the index of the next
free slot (for adding a new workspace) is `enrolled.count - count.down'.  So,
checking and adding a new workspace is fast.  Scheduling wait-set processes
is also fast.  DISABLE_SYNC still needs a search.  What has been saved is the
generation of a new node by a failing ENABLE_SYNC (which is most of them),
the de-allocation of all wait-set nodes by a succeeding ENABLE_SYNC, and the
de-allocation of a node by each DISABLE_SYNC.  That's probably worth saving.

This is not completely free.  If the `reference.count' for a barrier grows
larger than that power-of-two size of the array, a new array (of sufficient
size) must be allocated, and the old one copied over and de-allocated.  This
must be checked every time that `reference.count' grows: PAR enrollment,
FORKing, communication and assignment.  But such events are very rare
compared with the executions of ALTs (and associated
ENABLE_SYNCs/DISABLE_SYNCs).

------------------------

Can we remove the DISABLE_SYNC search?  All processes referencing a barrier
hold, obviously, the same reference.  If they could also hold an *individual*
index that gave them *their* slot in the wait-set array held by the barrier,
removing their workspace from the wait-set no longer needs a search: just zap
the slot to null.  Checking and adding the workspace is equally trivial.
Scheduling the release of wait-set processes is slower though -- a scan
through the array for `(enrolled.count - 1)' non-null workspaces is needed.
Unless there has been a deluge of termination, barrier communication (and
loss) or mass resignation, that will not be a big penalty.

Processes losing ALT.BARRIERs (through termination, communication,
resignation) must zap their wait-set slots to null.  If this happens a lot,
fragmentation of the wait-set array will occur and compression may be needed
(which will need processes to be informed somehow of their new slot
indices?).  Such dynamics should, again, be very rare compared with ALTs.
A sketch of this array-based scheme follows.
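Here is a hedged C sketch combining the power-of-two array with the
per-process slot index.  Once more, all names are invented for this note, and
the policy for finding a free slot is left open:

  #include <stdlib.h>

  typedef struct alt_barrier2 {
    int reference_count, enrolled_count, count_down;
    int slots;                  /* array size: a power of two              */
    void **wait_set;            /* each slot: NULL or an offered workspace */
  } alt_barrier2;

  static int next_pow2 (int n) {
    int p = 1;
    while (p < n) {
      p = p << 1;
    }
    return p;
  }

  /* called on every event that grows reference_count (PAR enrollment,
     FORKing, communication, assignment): regrow the array if needed       */
  void check_grow (alt_barrier2 *b) {
    if (b->reference_count > b->slots) {
      int new_slots = next_pow2 (b->reference_count);
      void **bigger = calloc ((size_t) new_slots, sizeof (void *));
      for (int i = 0; i < b->slots; i++) {
        bigger[i] = b->wait_set[i];
      }
      free (b->wait_set);
      b->wait_set = bigger;
      b->slots = new_slots;
    }
  }

  /* DISABLE_SYNC's withdrawal is no longer a search: just zap our slot    */
  void withdraw_offer (alt_barrier2 *b, int my_slot) {
    b->wait_set[my_slot] = NULL;
  }

  /* releasing a completed barrier scans the array, scheduling the
     (enrolled_count - 1) non-NULL workspaces of the other offerers        */
  void release_waiters (alt_barrier2 *b, void (*schedule) (void *)) {
    for (int i = 0; i < b->slots; i++) {
      if (b->wait_set[i] != NULL) {
        schedule (b->wait_set[i]);
        b->wait_set[i] = NULL;
      }
    }
  }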
The trick is to give each process referencing a barrier its own "shadow"
variable to hold its slot index.  ALT.BARRIERs are now held indirectly.  Each
ALT.BARRIER variable needs two words: one for the reference to the
ALT.BARRIER structure and one for its slot index.  That sounds simple!  Maybe
too simple!!

The slot needs setting whenever the reference count for a barrier increases:
PAR enrollment, FORKing, communication and assignment.  The last three seem
straightforward.  PAR enrollment is tricky!!!  The component processes of an
enrolling PAR see the barrier as a global -- the *same* global.  So, where
are the individual "shadow" indices?  Each component needs its own
ALT.BARRIER variable.

Here's a suggestion.  Passing an (altable) barrier to a FORKed process is OK:
as the reference count increments, allocate a new slot to the index field of
the ALT.BARRIER argument.  So, restrict enrolling components of a PAR to PROC
instances ... and we can do the same!  Such a restriction might have other
benefits ... concerning allowing implicit PAR enrollment ... for which we see
other benefits still ... for simplicity and flexibility.  We'll leave this
for now.
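Before moving on, here is a small sketch of that two-word representation and
of handing a new slot to a FORKed (or PROC-instanced) enrollee.  It builds on
the invented alt_barrier2 sketch above; it assumes the new process is also
enrolled, and how a free slot is found is not shown:

  /* an ALT.BARRIER variable: two words, held indirectly                   */
  typedef struct alt_barrier_ref {
    alt_barrier2 *barrier;      /* reference to the shared barrier         */
    int slot;                   /* this reference's own "shadow" slot      */
  } alt_barrier_ref;

  /* passing the barrier to a FORKed process (assumed also to enroll it):
     bump the counts, regrow the wait-set array if necessary, and give the
     new reference its own slot                                            */
  alt_barrier_ref enroll_new_reference (alt_barrier2 *b, int free_slot) {
    alt_barrier_ref r;
    b->reference_count = b->reference_count + 1;
    b->enrolled_count = b->enrolled_count + 1;
    b->count_down = b->count_down + 1;  /* one more offer now needed       */
    check_grow (b);
    r.barrier = b;
    r.slot = free_slot;                 /* free-slot allocation not shown  */
    return r;
  }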
Relaxing the Serialisation
--------------------------

There's a nagging worry about the dependency on atomicity of the ENABLE and
DISABLE sequences above.  In the Transputer, these sequences were immune to
interruption by high-priority processes -- indeed, without that, interrupt
response times could be delayed by however long it took those sequences to
complete.  In the initial JCSP implementation, those sequences were protected
by a monitor lock so that only one sequence per ALTing object could take
place at any time.  But that implementation had a race hazard that lay
undiscovered for two years!  The corrected (and, now, verified)
implementation removes that lock and allows those sequences to be
interleaved.  So, our dependency here on no interleaving is a worry.

However, some atomicity does seem necessary for these alting barrier
sequences.  If two processes interleave their ENABLE_SYNC sequences (e.g.
because of priority pre-emption or running on an SMP platform), then the
first could lower a barrier `count.down' to one and move on ... while a
second reduces that same count to zero and selects it ... and the first goes
on to reduce a different barrier count to zero and selects that!  Now, there
may be a solution, but we would need to do something different/extra.

Here's something that still requires the ENABLE sequences to be atomic, but
removes the requirement for all the DISABLE sequences to be locked together.
In fact, that property is not enforced by the protocol spelt out by the
occam/CSP "oracle".  The released processes (and others) may be scheduled
before the oracle has restored all its counts.  However, it refuses all
further consultations (i.e. ALT_BARRIER_STARTs) until it has completed this.

All we need is to block new ALT_BARRIER_STARTs if the ENABLE sequence has
made a selection and triggered one or several DISABLE sequences.  That block
lasts until all those DISABLE sequences are complete ... but they no longer
need to be scheduled together.

We need one `alt.barrier.mutex' and `alt.barrier.count'.  The mutex can be an
existing SEMAPHORE type (initialised to one).  For an SMP or multi-core
system, this mutex must be honoured.  The algorithms above are almost
unchanged.

The ALT_BARRIER_START is changed to grab the mutex and initialise the count:

  SEQ
    chosen.sync := -1     -- implies a SYNC guard has not (yet) been selected
    CLAIM alt.barrier.mutex
    alt.barrier.count := 1
    ...  ALT_START logic

The `alt.barrier.count' records the number of DISABLE sequences that must be
completed (unless we block at the ALT_WAIT) before releasing the mutex.

The (ENABLE_SYNC b) logic changes only to reset that count if it finds a zero
`count.down' and schedules the processes in the wait-set (all of which have
DISABLE sequences to complete):

          TRUE                -- b[count.down] = 0
            SEQ
              chosen.sync := <index of this guard>
              b[count.down] := b[enrolled.count]
              ...  schedule all processes in wait-set and remove from wait-set
              alt.barrier.count := b[enrolled.count]
              ...  return TRUE (`b' is ready to fire)

We need an ALT_BARRIER_WAIT (instead of ALT_WAIT) to release the mutex:

  SEQ
    alt.barrier.count := 0    -- tidy, but don't think this is necessary
    RELEASE alt.barrier.mutex
    ...  ALT_WAIT logic

There is no change for (DISABLE_SYNC b).  ALT_BARRIER_FINISH changes to:

  SEQ
    IF
      chosen.sync >= 0
        chosen.guard := chosen.sync
      TRUE
        SKIP
    alt.barrier.count := alt.barrier.count - 1
    IF
      alt.barrier.count = 0
        RELEASE alt.barrier.mutex
      TRUE
        SKIP

With this, an ALT_BARRIER sequence cannot be interleaved with another
ALT_BARRIER sequence until:

  - either it blocks at its ALT_BARRIER_WAIT;

  - or it selects a non-SYNC guard and completes its DISABLE sequence;

  - or it selects a SYNC guard, schedules all other processes waiting on
    that SYNC guard, completes its DISABLE sequence *and* all the released
    processes get scheduled and complete their DISABLE sequences.

In the case of SMP/multi-core platforms, the DISABLE sequences may be
executed in parallel (taking care when removing processes from a barrier's
wait-set).  For uni-processors, they may be scheduled (and interleaved)
arbitrarily.  The disabling processes are on the run-queue.  Any process
attempting a barrier ALT before they complete will be blocked ... the
disabling processes will get scheduled eventually ... and then the barrier
ALT will be scheduled to continue.  There will be no deadlock.  :)

Peter Welch.
(25/11/2005)