Design of the PRIMARY_PARTITION protocol ======================================== Author: Bela Ban Version: $Id: PrimaryPartition.txt,v 1.1 2005/07/21 20:33:01 belaban Exp $ Dawid Kurzyniec wrote: > Bela Ban wrote: > >> I think adding a protocol on top of (or below ?) GMS will work. However, there is the question of how you actually determine the primary and secondary partitions ? >> For example, if we have a switch crash, and all 5 members in the group become singleton groups, then the switch is turned back on, which one >> is the primary partition ? There is no majority. Of course, you simply need to take a deterministic decision, e.g. in this case do a lexical sort and take >> the first (A). Is this what you are thinking of doing ? So B, C, D and E would get an EXIT event, would have to leave and possibly re-join ? This >> would be simple to implement. > > > Yes, this is pretty much what we have in mind. We intend in a conflicting case to find the greatest address of all members from candidate groups, then pick as a surviving group the one where this greatest guy belongs to (or should we take the smallest one? It would be nice to bias towards the group containing the current coordinator; does the coordinator has the smallest or the largest address in its group?) It is a lexical sort, e.g. Address extends Comparable, so we sort and take the first member of the resulting merged group as the coordinator > In fact we figured we don't even need a protocol - handling this in MembershipListener should do the job, I guess?... (We are lazy and we want to deal with JGroups at the highest level possible). I think it should be a protocols, PRIMARY_PARTITION, and can be implemented as follows: * Place it somewhere below GMS, but above MERGE2, It probably needs lossless delivery, so it should be ablove UNICAST and NAKACK as well * Handle the MERGE event on the up() method: o Get the subgroups, e.g. {A,B}, {C}, {D,E} and {F} o Consult a *merge policy*, which determines (given the list of subgroups), the primary partition o If we are the coordinator of the primary partition: + Send an exit message to all other coordinators (hmm, you probably can't do that as you are not a member of the subgroups, so probably we have to handle VIEW(MergeView) rather than the MERGE event + The other coordinators forward the EXIT event to everyone else in their group, so all members leave (and possibly rejoin) the group o Else + Send the EXIT event to everyone in my group + Everyone shuts down and possibly rejoins later The MergePolicy implementation needs to be configurable, so devs can specify their own implementation. We would supply a default impl if not specified. Looks relatively straightforward. The only thing I don't really like is that we have to merge *first* before we send the EXIT message to members of the *previous* subgroups. However, this is probably necessary as we cannot send messages to members *not* in our group