$Id: todo.lst,v 1.22 2005/06/06 07:47:57 belaban Exp $ Todo List --------- - Make TUNNEL/GossipRouter use Streamable, plus GossipRouter should use a ConnectionTable - FC: on a fast network, we can still run out of memory because received_msgs/delivered_msgs becomes very big. Instead of basing FC on the speed of the network, we should also take the number of messages processed (a.k.a delivered_msgs/received_msgs) into account, and only send credits when that number falls below a certain threshold. First: look at the number of received/delivered msgs in the perf test ! --> Look into memory-based FC: criteria for sending credits are - available free memory - outstanding retransmission requests (sender sends with each message number of messages sent, receiver keeps track of how many messages received, if diff is > threshold --> pause) - [optional] latency - Performance tests: - Fix TcpTransport: doesn't currently work (connection exception) : make TcpTransport runnable on single machine (define different ports) - Test m-m mode (everyone sends to everyone) with 4 nodes - Expose statistics (e.g. retransmission counter, flow control stats), maybe use AOP to do so - Create common TransportProtocol (subclass of Protocol): handles all message bundling, loopback etc, real transports (UDP, TCP, TUNNEL) have to extend this class and provide a send() and receive() method. Avoids code duplication - Add list of joined, crashed and left members to View - Support for IPv6 addresses - Convert TUNNEL, TCP, GossipRouter to Streamable - Logical IpAddress: use any number of NICs, and fail over between NICs, but always have the same logical address - TCPPING: make generic, e.g. STATIC_PING. Can apply to UDP transport as well. - Add pinging over socket established by FD_SOCK - RpcDispatcherSpeedTest is very slow when using tcp.xml - BSH: use CONFIG event from transport (UDP) to Channel and back for gathering of diagnostics information. 
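Note on the memory-based FC item above (criterion "outstanding retransmission requests"): a minimal sketch, assuming the sender piggybacks its running sent-count on every message. The class name CreditGate and the threshold value are hypothetical; nothing like this exists in the FC protocol yet.

```java
/** Hypothetical receiver-side gate; not part of the existing FC protocol. */
final class CreditGate {
    private final long threshold; // max tolerated gap between sent and received (assumed tunable)
    private long received;        // number of messages this receiver has seen so far

    CreditGate(long threshold) {
        this.threshold = threshold;
    }

    /**
     * Called for every received message.
     * @param sentBySender the sender's running count, assumed to travel with the message
     * @return true if the sender should be asked to pause
     */
    boolean onMessage(long sentBySender) {
        received++;
        return sentBySender - received > threshold;
    }

    public static void main(String[] args) {
        CreditGate gate = new CreditGate(2);
        System.out.println(gate.onMessage(1)); // false: nothing missing
        System.out.println(gate.onMessage(5)); // true: 5 sent, 2 received, gap 3 > 2
    }
}
```

The available-memory and latency criteria from the same item are deliberately left out of the sketch.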
- org.javagroups.tests.Bsh: use Channel with only unicast properties to connect to remote member(s). Advantage is that we can include fragmentation. todo: add connect(Address) to Channel. - SMACK: introduce negative acks (currently positive acks). This should be configurable. - TOTAL protocol: when B mcasts a message M, it will be unicast to coord A first, who then mcasts it on behalf of B. However, when A crashes before mcasting M, B will wait forever for responses (if unicast responses are expected). However, if we can request retransmissions from *any* member, not just the original sender of the message, M can be retransmitted to the members who did not receive the message. - Replace List/Queue/Stack with equivalent classes in java.util or Trove (trove4j.sf.net). Use concurrent.jar - ThreadPool: should release unused threads after some time. --> Replace ThreadPool with Executor in concurrent.util - Modify all building blocks to run on top of PullPushAdapter. Multiple building blocks should be able to run on the same PPA. - Correct implementation of merging protocol for org.javagroups.protocols.GMS: current merge protocol is simplistic and doesn't preserve vsync properties (bugid=556850) - Unification of ./protocols/pbcast and ./protocols directories: merge same functionality where possible or abstract out common functionality and create shared classes for these. There is currently a lot of redundant code in both directories. Also, there is the danger of fixing bugs in one branch, but not the other. Starting point is pbcast.GMS: to add vsync, do the following: - Flush protocol before installing view (ack-based) - Discard messages with view different from current view - LinkingPing Channel (LP) - Channel which acts as a 'linking pin' between 1 parent group and one or more subgroups. The LP always joins the parent channel and all child channels and forwards messages between the parent and all child groups, and vice versa (ie. acting as a router). 
- Provision of a mechanism to join members behind a firewall to a regular (e.g. UDP-based) group. Requires a new MASQ protocol, which hides the real addresses of members behind the firewall and provides the address of the LP instead. Upon reception of such a message, LP will retrieve the real address and forward the message to the correct member. - Multiplexer building block. Allows multiple building blocks on top of the same channel (could extend PullPushAdapter) - Make building blocks work over the same protocol stack (Channel). E.g. register building blocks with PullPushAdapter, mux/demux messages. - Documentation (DocBook ?) - pbcast.STATE_TRANSFER: allow multiple simultaneous state transfers to take place (use state transfer IDs) - pbcast.STATE_TRANSFER: allow for multiple chunks of state data due to large states - Debugging tool: shows layers with number of messages in each layer. Instrumentation of protocol layer should be as non-intrusive as possible (no penalty for not running the stack in debugging mode). Possibility of attaching to running (possibly remote) stacks. Filter views: from only number of messages down to single messages (events) and their contents. Possibility to step (accept a message), drop, inject, modify messages. Possibility to see only certain types of messages. Assign different colors to different kinds of message to easily follow them through the system. - Configurator for protocol stacks: graphical interface, user chooses set of QoS (e.g. reliable transmission, fragmentation, TCP etc). Configurator does dependency checking, asks for parameters (e.g. UDP.mcast_addr etc). Output is a valid properties string. - TimedWriter: replace own thread with ReusableThread - Multiple Routers communicating with IP MCAST; clients connect via TCP. Tunnels through links that don't support IPMCAST. - Replace TimedWriter with TimedExecutor; more generic API for timed executions.
Current implementation is hacked for writing to a file and creating a socket. - Adapter classes making use of Channel assume the channel is already connected when starting. However, if this is not the case, they should automatically connect instead of throwing an exception. Maybe this can be integrated into EnsChannel directly: whenever there is a Send() or Cast(), and the channel is not connected, do a Connect(). - (Ken) Allow objects outside the group to multicast messages to the group SharedSocket (CLIENTSERVER protocol): the coordinator of a group maintains a socket (UDP or TCP) and allows clients to send/mcast messages and read messages sent to the group via the socket. This way, clients do not have to join the group. - Distributed Shell: a number of xterm windows, each on a different machine. The screen is split: the upper half is for commands that are bcast to all members, executed there and the output sent back to the originator (output from each host in a different color). The lower half is local to each host: commands typed in there will be executed on that host and the result sent back to the originator. Done ---- - Add exception chaining (only available in JDK 1.4) - NAKACK: send xmits also to other members if no response from original sender (e.g. because it crashed). All members therefore have to respond to an xmit request by not only searching the sent-table, but also the received-table. Requires that members store *received* messages as well, not just *sent* messages. - Send xmit request to (a) sender (default) (b) random member (c) all members using multicast. Staggered replies, when 1 member sends response, other members suppress reply - 2.2.8: UDP/TCP doesn't work with 127.0.0.1 (works with 2.2.7) - April 2005 (bela) Retain only n messages (bounded storage) in NAKACK/NakReceiverWindow rather than unlimited. 
Discard newest messages when buffer is full - April 2005 (bela) TCP: instead of sending to each receiver in turn, use 1 queue/thread per receiver. This prevents the server from hanging if a receiver hangs - March 2005 (bela) FD: add similar mechanism as for FD_SOCK to accommodate for lost SUSPECT broadcasts - FD_SOCK: use membership to establish logical failure detection ring, rather than multicast - Feb 2005 (bela) NAKACK/NakReceiverWindow: - NakReceiverWindow.Retransmitter.remove(): linear search for removable element plus linear removal: 2 linear access patterns ! Replace with log(n) access pattern, e.g. TreeMap - updateHighestSeqno(): linear search - Feb 2005 (bela) NakReceiverWindow: make storing of *delivered* (delivered_msgs) messages optional. The pbcast version of NAKACK for example requests retransmission only from the sender directly, therefore we don't need the delivered messages. - Feb 2005 (bela) UDP should bind to *all* available network interfaces for receiving messages (JDK 1.4 specific) - Oct 6 2004 (bela) FD_SOCK problem on multi-homed systems - April 2004 (bela) UDP and bundling: error "ERROR 21:15:21,660 [UDP.IncomingPacketHandler thread] - message does not have a UDP header". Only occurs with fc.xml and bundling enabled. - 2003 (bela) ConnectionPool: garbage collect sockets over which no activity occurred for n minutes - Jan/Feb 2004 (bela) UDP: currently there are 3 sockets: 1 mcast in/out, 1 ucast in and 1 ucast out. Combine them into 1 or 2 sockets (e.g. 1 mcast in/out, 1 ucast in/out). The reason for this was that under Linux, because sockets were not BSD-compatible, a ucast send socket has to be closed and re-opened after every send ! This error may have been fixed in the meantime. --> Implemented in UDP and UDP1_4 protocols - 2003 (bela) Add reliable unicast communication, e.g. server channel and client channel.
Add Channel.connect(Address) (overloading Channel.connect(String)) - 2003 (bela) MERGE: as an alternative to periodically mcasting merge packets, we could simply wait for mcasts from a member that has the same group name but is not in the membership. In this case we'd attempt a merge. --> Done (MERGEFAST protocol) - 2003 (bela) Peer-To-Peer protocols: - Simpler GMS, where joiners can ask anyone in the group to join it. That member will add the new member and mcast a ViewChange message. Receivers of the ViewChange message make the new membership the union of the existing membership, plus the newly joined members, minus the removed members. Membership may not be exactly the same on all members, but will converge over time. The advantage of this scheme is that the bottleneck of the single coordinator is eliminated. - Simpler mcast retransmission protocol (SACK=Simple ACK). Ack-based scheme, mcasts with seqnos are sent to the group, acks are expected from each member. Outstanding acks from (potentially) crashed members can be reset by either (a) suspect messages, (b) view changes, or (c) by verifying whether the outstanding member is dead. --> done, added SMACK protocol - 2002/2003 (bela) Look into whether NAKACK/UNICAST/STABLE need to store copies of messages, or whether references are okay. Now that we use hashmaps instead of headers, this might be fine. (Immutable messages) - 2003 (bela) IpAddress: maintain own cache to prevent unnecessary DNS lookups IpAddressFactory: too many instances of the same IpAddress In the future, we may remove this because InetAddress has its own internal cache as well. - Jan 2003 (bela) RpcDispatcher/MessageDispatcher: ship destination list with call and discard msg if local address is not part of destination list.
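Note on the "Simpler GMS" entry above: the membership rule (existing members, plus joined, minus removed) can be sketched as plain set arithmetic. This is only an illustration; MembershipMerge is a hypothetical helper and members are modeled as strings rather than Address objects.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not part of JavaGroups: applies a ViewChange to a local member set. */
final class MembershipMerge {

    /** Returns (current + joined) - left; insertion order is kept only for readable output. */
    static Set<String> apply(Collection<String> current,
                             Collection<String> joined,
                             Collection<String> left) {
        Set<String> next = new LinkedHashSet<>(current);
        next.addAll(joined);
        next.removeAll(left);
        return next;
    }

    public static void main(String[] args) {
        Set<String> view = apply(Arrays.asList("A", "B", "C"),
                                 Arrays.asList("D"),
                                 Arrays.asList("B"));
        System.out.println(view); // prints [A, C, D]
    }
}
```

Because each receiver applies the change to its own local set, two members that saw different ViewChange messages can temporarily disagree, which matches the "will converge over time" remark in the entry.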
- Dec 10 2002 (bela) Protocol/stack initialization: init(), start(), stop(), destroy() - Nov/Dec 2002 (akbollu) FLOW control protocol - Aug 21 2002 (bela) GMS-less reliable message transmission (see protocols/DESIGN) - SMACK and FD_SIMPLE protocols (see smack.xml for a sample protocol spec) - Aug 2002 (bela) Maintain bounded queue of suspects in GroupRequest - Aug 2002 (bela) Test program that tests whether IP multicast can be used - McastReceiverTest and McastSenderTest - June 1 2002 (bela) UNICAST: - UnicastTest has too many retransmissions for 10000 messages even with -loopback ! - Exponential backoff for message retransmission (similar to NakReceiverWindow) - variable retransmission configuration (currently only 1 value (2 secs)) - April 5 2002 (bela) Bug in NAKACK (change to hashmap-based headers): WRAPPED_MSG headers don't work because 2 NakAckHeaders are added, which doesn't work in the hashmap-based case --> Added header for WRAPPED_MSG under a different key ("NAKACK.WRAPPED_HDR") - March 2002 (fhanik) Configuration of protocol stack specified using a configuration file - XML as config file format - March 31 2002 (bela) Message: hashtable for headers. This allows direct access to protocol-owned header. Also prevents removing the wrong header. - pbcast.NAKACK: retransmit in bundles (new feature). Because we xmit multiple messages in one large message, the xmit message might become too big. The assumption was that there would be a FRAG layer below this protocol, however that is not the case in the default setup March 20 2002 (bela) - Nov 1 2001 (bela) Convert GUI demos to Swing for JDK >= 1.3 --> Preliminary port, more work needs to be done - Oct 25 2001 (bela) Revert to Thread.interrupt() bug in Linux JDK 1.3.1 (FD_SOCK) once we use JDK 1.4. SUN bug id=4514257 (new bug) --> Using socket.close instead of Thread.interrupt(), more portable - Oct 24 2001 (bela) The Draw demo triggers a lot of retransmissions if drawing extensively. 
This causes the drawing to block for short intervals. --> Was caused by mcast_{send,recv}_buf_size being too small (8k). If we assume that a message is ca. 0.5k, then 16 messages would already overflow the send buffer size. Therefore the size was increased, which causes fewer messages to be dropped, hence fewer retransmissions. Note that the behavior was always correct; i.e. messages were indeed retransmitted (no msgs lost). See "Setting of recv/send buffer sizes in UDP (FRAG problem)" in JavaStack/Protocols/DESIGN for details. - Oct 2001 (bela) pbcast.GMS.InstallView(): if member is not in view, generate an EXIT event. However, for new joiners, this might be wrong. Either remove or correct -> CheckSelfInclusion() might be wrong - Oct 17 2001 (bela) Start 3 members (using FD_SOCK): kill 3rd: socket connection kill is not noticed under Linux: membership is 3 instead of 2. When other members leave, membership will be correct --> Fixed with FD_SOCK fix (see history for version 1.0) - Oct 12 2001 (bela) Retransmissions: [ERROR] NAKACK.Up(): XMIT_REQ: range of xmit msgs is null --> was caused by a bug in the externalization of pbcast/NakAckHeader (was correct before) - Oct 10 2001 (bela) Problem with Thread.interrupt on Linux/JDK1.3.1: threads waiting on input (e.g. System.in.read()) do *not* get interrupted ! --> Fixed. See history.txt - Oct 10 2001 Check whether is_linux quirk is still needed with Linux JDK 1.3 (UDP.java) --> Removed - May 15 2001 (bela) readFully(): don't allocate a byte buffer for every message, just allocate buffer once and only increase size if too small. (ConnectionPool and Link) - May 15 2001 (bela) Test UNICAST layer: with 15 members there are retransmissions - Rewrite the retransmission thread (use Timer) --> Caused by small UDP send and receive buffers (dropped large messages which were fragmented, retransmission was okay). See JavaGroups/JavaStack/Protocols/DESIGN for details.
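Note on the readFully() entry above (May 15 2001): a minimal sketch of "allocate once, grow only if too small". ReusableBufferReader and its initial capacity are assumptions for illustration and are not the actual ConnectionPool or Link code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical reader: one buffer is kept and replaced only when a message does not fit. */
final class ReusableBufferReader {
    private byte[] buf;

    ReusableBufferReader(int initialCapacity) {
        buf = new byte[initialCapacity];
    }

    /** Reads exactly len bytes into the shared buffer; only the first len bytes are valid. */
    byte[] read(DataInputStream in, int len) throws IOException {
        if (len > buf.length) {
            buf = new byte[len]; // grow only when the message is larger than the buffer
        }
        in.readFully(buf, 0, len);
        return buf;
    }

    int capacity() {
        return buf.length;
    }

    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[150]));
        ReusableBufferReader reader = new ReusableBufferReader(64);
        reader.read(in, 10);
        System.out.println(reader.capacity()); // 64: small message fits, no allocation
        reader.read(in, 100);
        System.out.println(reader.capacity()); // 100: buffer was grown once
        reader.read(in, 5);
        System.out.println(reader.capacity()); // 100: grown buffer is reused
    }
}
```

The caller must consume or copy the valid bytes before the next read, since the same array is handed out every time.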
- May 11 2001 (bela) Change FD.java to use TimeScheduler - May 10 2001 (bela) IpAddress: printing of hostnames in dotted-decimal notation is wrong (e.g. 228.1.2.3 is printed as 228) - May 9 2001 (bela) Test new AckMcastSenderWindow and NakReceiverWindow classes (submitted by John) --> Integrated into JavaGroups - May 8 2001 (bela) Use ProtocolStack.timer (Timer) for recurring non time-critical tasks (saves some threads) - May 8 2001 (bela) Replace SortedList with SortedSet - May 4 2001 (bela) Trace: set flag 'trace' in Trace and initialize via Trace.init() - May 2 2001 (bela) FragTest doesn't work any more: UDP.SendUdpMessage(): java.lang.IOException: message too long --> see ./JavaStack/Protocols/DESIGN (frag problem) for details - April 3 2001 (bela) Fix "last msg lost" feature in pbcast/NAKACK (possibly same solution as for regular NAKACK) --> See pbcast/DESIGN for details - April 2001 (bela) Optimizations (OptimizeIt / JPprobe). Remove some of the down threads - March 30 2001 (bela) Replace all System.err/out() with Trace.print() --> Replaced most (still have some left to do, but not important ones) - March 9 2001 (bela) Property file for tracing: defines all modules/methods to be traced plus the desired tracing level. Will be read by JChannel before starting. The location of the property file needs to be defined via a -Dproperty_file=<location> option at startup. If the file cannot be found, no tracing will be enabled (other than the tracing set programmatically). - March 9 2001 (bela) PBCAST.java: dynamically adjust the 'subset' and 'gossip_interval' variables. E.g. when the group size increases, decrease the variables, and increase them when it decreases - March 2001 (bela) UDP: specify IP address to bind to - March 2001 (jmenard) Tracing for inner classes, e.g. Trace.println("FD.PingerThread.run()", ...) - March 2001 (bela) outOfMemory error when sending 'wrong' messages to a JavaGroups process. Wrong messages could e.g. be a telnet connection. 
- March 2 2001 (bela) FD: suspect member P, followed by UNSUSPECT P. This has to result in P being removed from suspected_mbrs in ParticipantGmsImpl --> Done for ./pbcast/GMS - Feb 15 2001 (bela) PBCAST: bounded buffer. Discard PBCAST-related messages when the buffer is full. This prevents flooding of buffers when subset or pbcast_interval are chosen too high. - Feb 15 2001 (bela) PBCAST: singleton members never garbage-collect their messages - Feb 2001 (i-scream) Gianluca Collot: disconnecting a channel is not very clear (QueueClosed Exceptions, GMS implementation doesn't switch to Client ...) so reconnecting a disconnected channel is impossible. - Feb 15 2001 (bela) Double-check whether suspected members are really dead (e.g. in GMS, before bcasting new view) --> VERIFY_SUSPECT - Feb 13 2001 (bela) FD: create A, B, C, D. Kill A, B and D simultaneously. C will not become new coordinator - Feb 12 2001 (bela) Header as a typed class - Feb 12 2001 (bela) Address as a typed class - Feb 13 2001 (jmenard) Tracing module to enable/disable certain error messages (similar to syslog). - Feb 9 2001 (bela) Gianluca Collot: replace Objects as addresses with typed equivalent (e.g. Address). Subclasses are IpAddress, ATMAdress etc. - Feb 7 2001 (bela) FD: if a member is suspected and subsequently receives a view in which it is not a member, currently that member remains in the group. However, it should leave the group and possibly rejoin it (EXIT event). --> Implemented in FD_SHUN - Feb 2001 (jmenard) Use javac instead of jikes in Makefile: use jikes when available, else javac --> Added configure program - Feb 7 2001 (bela) Find out why UNICAST over TCP does not work --> Problem was that UNICAST did not remove members once they were excluded from the group. Usually, this does not matter as new members in UDP have different addresses (different ports). However, in TCP members may have the same port, therefore the same address.
When a connection from such a member was received, the connection table returned the entry for the old (stale) member, which of course contained wrong seqnos. Therefore, messages accumulated in UNICAST's up_queue and were not propagated up by the AckReceiverWindow. - Feb 7 2001 (john georgiadis) TOTAL protocol - Feb 5 2001 (bela) PERF protocol: adds header with identity/seqno and removes at receiver. Logs time for each message. Can also be used to measure time for an event in the same protocol stack. - Jan 22 2000 (bela) Problem with TCP/ConnectionPool in PBCAST: start A, start B, kill B, restart B: B times out attempting to reconnect --> Due to UNICAST problem (see UNICAST problem). Removed UNICAST protocol (not needed over TCP anyway) from demo program - Jan 22 2000 (bela) TCP/UDP/TUNNEL: local messages should be sent back up the stack immediately --> Done only for TCP - Jan 22 2000 (bela) UDP: receive(packet): each time a new packet of 65000 bytes is allocated; however, we can reuse the buffer by doing the following:
    for(int i=0; i < buf.length; i++) buf[i]=0; // clear the buffer
    packet.setLength(buf.length);
    sock.receive(packet);
- Dec 10 2000 (bela) MessageDispatcherTest: if using 100ms instead of 2000, the app crashes: problem with ReusableThread / Scheduler ? - use no Sleep(): app hangs ! - MessageDispatcherTest / RpcDispatcherTest: hangs when closing channel (number of threads still running) --> Due to bug in ReusableThread, AckMcastSenderWindow - Dec 11 2000 (bela) NotificationDemo: NotificationBus.Stop() does not release all threads --> Due to bug in ReusableThread - Dec 10 2000 (bela) Replace all occurrences of Thread.stop() with the recommended way of stopping threads. Also replace suspend()/resume().
to be modified: - Scheduler.java - AckMcastSenderWindow.java - AckSenderWindow.java - EnsChannel.java - NakReceiverWindow.java - PullPushAdapter.java - JavaStack/Protocol.java - JavaStack/Protocols/FD.java - JavaStack/Protocols/FD_RAND.java - JavaStack/Protocols/GMS.java - JavaStack/Protocols/MERGE.java - JavaStack/Protocols/PIGGYBACK.java - JavaStack/Protocols/STABLE.java - JavaStack/Protocols/TUNNEL.java - JavaStack/Protocols/UDP.java - Nov 29 2000 (bela) STATE_TRANSFER for PBCAST - Nov 29 2000 (bela) Make timeouts for UNICAST message retransmission user-configurable - Nov 26 2000 (bela) XMIT_REQs in PBCAST currently ask for more messages than is actually necessary. Correct so that only the actual missing messages are requested for retransmission. E.g. my digest for A is +2 -3 +4 +5 +6. If I receive a digest with +5 as the highest seqno for A, then the XMIT_REQ will be [3-6] rather than [3-3]. - Nov 25 2000 (bela) PBCAST.SetDigest(): do we really need to explicitly set the initial digest, or could we just create a new NakReceiverWindow with whatever seqno is sent ? Problem: if P:16 is received first by the new member S, but P:15 was dropped, then we would lose 1 message. I think it would be best to leave it as it is and always set the digest we got from the coordinator as result of the Join() call. --> Currently, setting initial digests is disabled. We just take the *first* seqno sent to us by a new member to be its *initial* seqno (which may or may not be true) --> Therefore, we may lose some message --> However, as long as gossiping is not implemented (to retransmit those messages), we won't change anything --> Currently, I prefer not to block because some member is missing a message, but won't get it since retransmission (gossiping) is not yet implemented --> When gossiping works, remove the comments and put setting initial digests back in --> Therefore, the client currently does not set its digest received as a result of the Join request.
This has to be uncommented once gossip is available ! ==> see ./JavaStack/Protocols/pbcast/DESIGN for an explanation of the solution to this issue - Nov 19 2000 (bela) Draw2Channels.java: 2 channels with the same properties don't work --> Fixed by making the GMS non-singleton - Nov 8 2000 (bela) GMS: CreateInstance() creates a singleton. This prevents multiple channels in the same JVM (GMS cannot be instantiated multiple times) --> Removed singleton - Aug 23 2000 (bela) MessageDispatcher.CastMessage(GET_ALL): if local delivery is turned off, this will hang waiting for the message from the local channel. Solution: CastMessage() needs to check whether local delivery is enabled or not in the channel to determine whether to wait for the local msg or not. Workaround: the first parameter to CastMessage() takes the members to which the message is to be sent; set it explicitly. - July 12 2000 (bela) Bug: DistributedHashtable demo hangs when MembershipListener is set in RpcDispatcher(). This is inside DistributedHashtable.java --> This was due to a bug in the FLUSH protocols: when a client joined, it received FLUSH messages as well (although not yet a server). This blocked the whole process (STOP_QUEUING). Now FLUSH.HandleFlush() contains the set of processes that should be flushed. If the current member is not in it, it will simply discard the message. This bug did not show up when using TCP because there, the FLUSH message is really only sent to the current membership - July 7 2000 (bela) Link: look at problem of 'wrong' peer addresses; these will be rejected. What happens when a connection request is rejected ? --> Peer will be rejected and tries anew (until NIC is okay again) - July 7 2000 (bela) Link: look at CTRL-Z cases. Socket connections can still be made to a process which is CTRL-Z'ed (this is normal). Therefore, the ConnectionEstablisher will think it has re-established connection, while the heartbeat will later fail again.
- July 3 2000 (bela) Bug: TCP between habutterfly and dragonfly: members don't detect each other. Interfaces hme0 are better than hme1 (slower). Timing problem (set timeout differently) ? --> It was a simple timeout problem. Increasing the timeouts in TCPPING, FD and GMS helped. Traffic going across hme0 is very slow (routed via Belfast). - June 30 2000 (bela) TCP: specify IP address to bind to - June 27 2000 (bela) Fix memory bug (FD, STABLE ?). Occurs only with TCP (ConnectionPool) as bottom protocol. --> The problem was in ConnectionPool: socket.GetOutputStream().writeObject() and socket.getInputStream().readObject() wrote/read a whole graph of objects (Message plus all referring objects). Now we just send/receive byte buffers. - June 21 2000 (bela) Removed UNICAST protocol in stack (still have to find out what the problem is with UNICAST). But it is not needed over TCP anyway (neither is FRAG). Bug: UNICAST does not work with TCP as bottom protocol Scenario: start P, start Q, kill Q, start Q: UNICAST does not pass up the HandleJoin() msg to the GMS - June 19 2000 (bela) ConnectionPool / TCP: outgoing connections were only removed when a member failed (SUSPECT). However, they weren't removed when a member left regularly. Therefore incarnations of a member on the same port used the old socket (which failed). Change involved adjusting the outgoing connection table in ConnectionPool when TCP receives a VIEW_CHANGE: remove connections to processes that are not members any more. - June 19 2000 (bela) Bug fixed: when only the coordinator is left, a leave hangs until the leave_timeout has expired. This was due to deadlock waiting for the same mutex (leave_mutex) in CoordGmsImpl. - June 18 2000 (bela) Fixed bug in join/leave protocol (GMS). This bug only showed when using TCP instead of UDP/IPMCAST. The leaving member would not get his 'last view' because the TMP_VIEW was not set correctly before mcasting the view. 
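Note on the ConnectionPool / TCP entry above (June 19 2000): the fix "remove connections to processes that are not members any more" can be sketched as follows. OutgoingConnectionTable is a hypothetical stand-in, members are modeled as host:port strings, and connections as java.io.Closeable; the real ConnectionPool differs.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical outgoing connection table that is pruned on every view change. */
final class OutgoingConnectionTable {
    private final Map<String, Closeable> conns = new HashMap<>();

    void add(String member, Closeable conn) {
        conns.put(member, conn);
    }

    boolean contains(String member) {
        return conns.containsKey(member);
    }

    /** Closes and drops every connection whose peer is not in the new view; returns the dropped peers. */
    List<String> retainMembers(Collection<String> members) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Closeable>> it = conns.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Closeable> e = it.next();
            if (!members.contains(e.getKey())) {
                try {
                    e.getValue().close();
                } catch (IOException ignored) {
                    // the peer is gone anyway; nothing useful to do here
                }
                removed.add(e.getKey());
                it.remove();
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        OutgoingConnectionTable table = new OutgoingConnectionTable();
        table.add("P:7800", () -> { });
        table.add("Q:7800", () -> { });
        // Q left regularly: the new view contains only P
        System.out.println(table.retainMembers(Arrays.asList("P:7800"))); // prints [Q:7800]
    }
}
```

With the stale entry gone, a new incarnation of Q on the same port gets a fresh socket instead of the old, failed one.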
- Dec 14 1999 (bba) Put JavaGroups on Gamelan - Dec 1 1999 (bba) Modify view change protocol (GMS): - When a HandleViewChange() message is sent after the FLUSH protocol, members that receive it install a new view, thus resetting message retransmission. This means, that also the coordinator resets its retransmission table, resulting in the following problem: when a member on a lossy link does not receive the new view, it would usually be retransmitted until he receives it. But since the coordinator installs a new view, message retransmission is stopped, and that member may never receive the new view. Therefore we have to make sure that all non-faulty members have received the new view before resetting retransmission ! - Solution: see ./design/ViewChangeRetransmission.txt - Nov 30 1999 (bba) Complete FLUSH protocol: Resending of outstanding messages has to be modified: we have to wait until all ACKs from all members for all outstanding messages have been received (for REBROADCAST). Otherwise, since we reset message retransmission upon a new view, slow members (or members on a lossy link) may never receive outstanding messages due to stopped retransmission. - Nov 99 (bba) Add objects, ints etc to Message (not as Header, but to message itself !) - Nov 99 (bba) Modify layer FD to *not* send out are-you-alive messages to a member P while other (e.g. data) messages from P are received. Further improvement: use gossip-like failure detection as described in the "GSGC: Efficient Gossip-Style GC Scheme" paper - Nov 99 (bba) Paper on implementation of RpcGMS (synchr. group calls plus state pattern) - Nov 99 (bba) Phase out MessageCorrelator - Nov 99 (bba) Remove classes not needed any longer !!! - Nov 99 (bba) Remove IDs for Messages. 
IDs are added by MNAK/NAK layers - June 9 (bba) Paper on RpcProtocol - May 11 (bba) Paper on state transfer - April 29, 1999 (bba) Synchronous Group RPC (GRPC) and deadlocks: investigate use of concurrency to eliminate the problem while preserving ordering properties (- Implementation of priority scheduler) - April 13, 1999 (bba) Each protocol has certain prerequisites; GMS for example needs PING and FD to be present somewhere below it. Add a sanity check mechanism to the protocol stack (configurator) that allows each protocol to abort stack creation if it cannot find the protocol layers it requires. - April 1 (bba): Paper on protocol stack (how to write your own layer) --> user's guide - Feb 23, 1999 (bba): Wrong parameters in any protocol should cause error messages Null protocol will be created, which causes later abort - Jan 13, 1999 (bba): Change Channel according to Channel interface document - Provide unicast point-to-point channel: Connect(new Address("janet", 3456)) Implemented and then dropped again ! Connection-less group mcast and connection-oriented ucast don't mix very well in the same model. That's why there are DatagramSockets and MulticastSockets... - Dec 98 (bba): IP multicast: problems with GMS (PingMembers) Fixed - Dec 98 (bba): GossipServer and GMS: membership is not correct - Dec 11, 1998 (bba) Added multicast ack (MACK) layer - Aug 19, 1998 (bba) Dispatcher: use a factory to create a channel instead of creating an EnsChannel directly (no hardcoding) -- Introduced ChannelFactory, EnsChannelFactory and JChannelFactory - Jun 18, 1998 (bba) MessageCorrelator: stable messages should be purged periodically -- Changes in ./channel/MessageCorrelator.java - Jun 19, 1998 (bba) Create 1 outboard process per Hot_Ensemble instance. 
Currently, ejava implements 1 outboard process per Java VM -- Changes in ./Ensemble/Hot_Ensemble.java Rejected -------- - May 2005 (bela) Handle MergeViews in DistributedHashtable, ReplicatedHashtable etc --> these classes are deprecated, not maintained anymore - 2003 (bela) Since message IDs are ever increasing with PBCAST, we have to reset them (e.g. when they reach 2 E9; 4E9 is the current max size of a long on Solaris 8). Write a protocol that resets the message IDs in all members at the same logical point in time. Alternative: create new SequenceNumber class, use it instead of long for seqnos --> Java longs are 8 bytes, so we have 2^63 - 1 positive numbers, that is more than enough. - 2003 (bela) EventPool: pool for Event instances. Since a lot of Event instances are used, we should reuse them to avoid constant instance creation. Use OptimizeIt to measure number of Events created. --> JDKs 1.4 and higher now manage object pools internally, this is not needed anymore - 2003/2004 TransactionalHashtable: when async replication, provide option to periodically replicate changes (e.g. using a queue) --> done in JBossCache, currently no plans to backport to JGroups - Aug 2002 Integrate latest version of EJAVA (comes with 0.70 distribution of Ensemble) --> Nobody uses the EnsChannel, so forget about this project - Summer 2001 (bela) Reduce message overhead for small messages by reducing headers (static array of header names). Suggested by Gianluca. --> Reduce IpAddresses as well (use InetAddress.getByName() with dotted-decimal notation for unserialization to avoid reverse DNS lookup) (suggested by John) --> Remove dest and src from Message serialization altogether in UDP: just send payload and headers. Reconstruct dest and src from addresses in DatagramPacket --> Experimented with various schemes, none of them was elegant
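Note on the rejected "reset message IDs" entry above: a Java long is a signed 64-bit value, so seqnos can run up to Long.MAX_VALUE. The sketch below estimates how long that lasts; the rate of one million messages per second is an arbitrary assumption, and SeqnoRange is a hypothetical name.

```java
/** Hypothetical back-of-the-envelope check for seqno exhaustion. */
final class SeqnoRange {

    /** Years until a long seqno starting at 0 overflows at the given constant send rate. */
    static double yearsUntilOverflow(long msgsPerSecond) {
        double seconds = (double) Long.MAX_VALUE / msgsPerSecond;
        return seconds / (365.0 * 24 * 3600);
    }

    public static void main(String[] args) {
        System.out.println(Long.MAX_VALUE); // 9223372036854775807
        System.out.println(yearsUntilOverflow(1_000_000L)); // on the order of hundreds of thousands of years
    }
}
```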