Design of the PRIMARY_PARTITION protocol
========================================

Author: Bela Ban
Version: $Id: PrimaryPartition.txt,v 1.1 2005/07/21 20:33:01 belaban Exp $


Dawid Kurzyniec wrote:

> Bela Ban wrote:
>
>> I think adding a protocol on top of (or below?) GMS will work. However, there is the question of how you actually determine the primary and secondary partitions.
>> For example, if we have a switch crash, and all 5 members in the group become singleton groups, then the switch is turned back on, which one
>> is the primary partition? There is no majority. Of course, you simply need to take a deterministic decision, e.g. in this case do a lexical sort and take
>> the first (A). Is this what you are thinking of doing? So B, C, D and E would get an EXIT event, would have to leave and possibly re-join? This
>> would be simple to implement.
>
>
> Yes, this is pretty much what we have in mind. In a conflicting case we intend to find the greatest address of all members from the candidate groups, then pick as the surviving group the one this greatest member belongs to (or should we take the smallest one? It would be nice to bias towards the group containing the current coordinator; does the coordinator have the smallest or the largest address in its group?)


It is a lexical sort: Address extends Comparable, so we sort the members and take the first member of the resulting merged group as the coordinator.
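
A minimal sketch of that selection rule, using plain Strings as stand-ins for JGroups addresses (an assumption made only to keep the example self-contained; `CoordinatorChoice` and `chooseCoordinator` are hypothetical names, not JGroups API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CoordinatorChoice {
    // Sorts a copy of the membership in natural (Comparable) order and
    // returns the first element, which acts as the coordinator.
    static <T extends Comparable<? super T>> T chooseCoordinator(List<T> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("empty membership");
        }
        List<T> sorted = new ArrayList<>(members); // leave the caller's list untouched
        Collections.sort(sorted);
        return sorted.get(0);
    }

    public static void main(String[] args) {
        // Five former singleton groups after the switch comes back.
        List<String> merged = Arrays.asList("D", "B", "A", "E", "C");
        System.out.println(chooseCoordinator(merged)); // prints "A"
    }
}
```

Because the rule depends only on the ordering of the addresses, every member that sees the same merged membership arrives at the same coordinator without exchanging further messages.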

> In fact we figured we don't even need a protocol - handling this in MembershipListener should do the job, I guess?... (We are lazy and we want to deal with JGroups at the highest level possible).


I think it should be a protocol, PRIMARY_PARTITION, and it can be implemented as follows:

    * Place it somewhere below GMS, but above MERGE2. It probably needs lossless delivery, so it should be above UNICAST and NAKACK as well
    * Handle the MERGE event in the up() method:
          o Get the subgroups, e.g. {A,B}, {C}, {D,E} and {F}
          o Consult a *merge policy*, which determines (given the list of subgroups), the primary partition
          o If we are the coordinator of the primary partition:
                + Send an exit message to all other coordinators (hmm, you probably can't do that as you are not a member of the subgroups, so we probably have to handle VIEW(MergeView) rather than the MERGE event)
                + The other coordinators forward the EXIT event to everyone else in their group, so all members leave (and possibly rejoin) the group
          o Else
                + Send the EXIT event to everyone in my group
                + Everyone shuts down and possibly rejoins later
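
The decision at the heart of these steps could be sketched roughly as below. This is not the protocol itself: it models only the outcome once the merged view is known, with Strings standing in for addresses. The class, the `Action` enum and the placeholder policy (the subgroup holding the smallest address wins) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrimaryPartitionSketch {
    enum Action { STAY, EXIT }

    // Placeholder merge policy (an assumption): the subgroup that contains
    // the smallest address in sort order becomes the primary partition.
    // Subgroups are assumed to be non-empty.
    static List<String> pickPrimary(List<List<String>> subgroups) {
        List<String> primary = subgroups.get(0);
        for (List<String> group : subgroups) {
            if (Collections.min(group).compareTo(Collections.min(primary)) < 0) {
                primary = group;
            }
        }
        return primary;
    }

    // Maps every member to what it should do after the merge: members of
    // the primary partition stay, everyone else receives EXIT.
    static Map<String, Action> decide(List<List<String>> subgroups) {
        List<String> primary = pickPrimary(subgroups);
        Map<String, Action> actions = new TreeMap<>();
        for (List<String> group : subgroups) {
            Action action = (group == primary) ? Action.STAY : Action.EXIT;
            for (String member : group) {
                actions.put(member, action);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<List<String>> subgroups = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("C"),
                Arrays.asList("D", "E"), Arrays.asList("F"));
        // prints {A=STAY, B=STAY, C=EXIT, D=EXIT, E=EXIT, F=EXIT}
        System.out.println(decide(subgroups));
    }
}
```

In a real protocol the EXIT entries would become messages sent by each coordinator to its own former subgroup; the sketch leaves message delivery out entirely.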

The MergePolicy implementation needs to be configurable, so devs can specify their own implementation. We would supply a default impl if none is specified.
Looks relatively straightforward. The only thing I don't really like is that we have to merge *first* before we send the EXIT message to members of the *previous* subgroups.
However, this is probably necessary as we cannot send messages to members *not* in our group.
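
One way the configurable MergePolicy might look is sketched below. Nothing here is existing JGroups API: the interface shape, the loader and the default rule (largest subgroup wins, ties go to the subgroup whose smallest address sorts first) are assumptions, and Strings again stand in for addresses.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MergePolicyDemo {
    // Hypothetical interface: given the subgroups, return the primary partition.
    interface MergePolicy {
        List<String> choosePrimary(List<List<String>> subgroups);
    }

    // Assumed default: the largest subgroup wins; on a tie, the subgroup
    // whose smallest address sorts first wins. Subgroups must be non-empty.
    static class DefaultMergePolicy implements MergePolicy {
        public List<String> choosePrimary(List<List<String>> subgroups) {
            List<String> best = null;
            for (List<String> group : subgroups) {
                if (best == null
                        || group.size() > best.size()
                        || (group.size() == best.size()
                            && Collections.min(group).compareTo(Collections.min(best)) < 0)) {
                    best = group;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no subgroups");
            }
            return best;
        }
    }

    // Loads a user-supplied policy by class name, falling back to the
    // default when no name is configured.
    static MergePolicy load(String className) {
        if (className == null || className.isEmpty()) {
            return new DefaultMergePolicy();
        }
        try {
            return Class.forName(className)
                    .asSubclass(MergePolicy.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load merge policy " + className, e);
        }
    }

    public static void main(String[] args) {
        List<List<String>> subgroups = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("C"),
                Arrays.asList("D", "E"), Arrays.asList("F"));
        System.out.println(load(null).choosePrimary(subgroups)); // prints [A, B]
    }
}
```

Whatever rule a custom policy uses, it has to be deterministic, since every coordinator evaluates it independently on the same list of subgroups and all of them must agree on the result.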