# About This is a follow up to the first MastRework document written by Michael Richardson in 2006. Everything in this document is in the brainstorming phase. # Current setup The current deployment of mast mode uses an iptables chain named IPSEC. This chain has an entry per tunnel, which sets the nfmark on a packet with a value that corresponds to the tunnel's SAref. This serves two functions: * an 'ip-route' rule is installed that directs all packets with the special nfmark value towards the mast0 device. * the mast0 device can quickly lookup the SA to use for this packet by using the SAref value decoded from the nfmark. In addition the SAref can be set on a per-socket, or per UDP packet basis using either setsockopt(), or sendmsg() options. # The problems ## Scalability Marking packets using iptables works really well for a small number of tunnels. The current model, however, runs into some scalability issues because it uses an iptables entry per tunnel, and the IPSEC chain is scanned sequentially attempting to find a match. With thousands of clients, we don't want to make this sequential lookup on each packet. ## Unmodified applications When applications can be modified to be SAref aware, setsockopt() or sendmsg() can be used to tag a socket or packet, respectively, with the SAref number that allows mast code to lookup the right SA. If we cannot modify all applications, we need to use iptables to do the packet-to-SAref matching for us. # Solution 1: get rid of IPSEC chain An alternate setup would reverse the meaning of an nfmark set on a packet. First we remove the iptables IPSEC chain. Using 'ip route', the default nfmark (of zero) would direct the traffic at the mast0 device. This means that any outgoing packets would always target the mast device. # ip rule add from all fwmark 0 lookup 50 # ip route add default dev mast0 table 50 Now every packet that has no nfmark flows through the mast0 device. If it came from an application that set the SAref using setsockopt() or sendmsg(), then looking up the SA is very quick. If however, the application is not SAref-aware, the SA needs to be looked up using the old mechanism -- which would require that we either set eroutes again, or invent some lighter mechanism to get from packet header to SAID/SAref. Now that SA is found, we could store the SAref for it in the associated TCP socket so future packets will have the same fate. Once the packet is encapsulated in AH/ESP, klips will reinject the packet back into the stack. By marking it with an nfmark that is non zero, the packet will traverse routing table 0 (or the default table), and be routed out to the physical interface. ## Required changes The following changes are required to make it work: * changes to init scripts - forget about IPSEC chain - use `fwmark 0` to match packets for routing table 50 * changes to updown scripts - forget about IPSEC chain * changes to pluto - klips needs to eroutes to be generated * changes to klips - in mast mode, use `skb->sp->ref` to locate SA - if no ref set, find the SA using `ipsec_findroute()` - if socket is connected (TCP or UDP) and `skb->sp->ref` is not set, we can set the socket's sk->ref # Solution 2: use conntrack It would be possible to reduce the impact of the `IPSEC` chain scan if conntrack was used for storing the nfmark once it was discovered the first time. A new chain would be used for detecting and assigning the nfmark encoded SAref values to connections. This chain would only be used when conntrack didn't already have an appropriate nfmark set. We start off wtih the standard rules to skip over ports 500 and 4500, and those that initiate the saref lookup. -A PREROUTING -j IPSEC -A OUTPUT -p udp -m udp --dport 4500 -j ACCEPT -A OUTPUT -p udp -m udp --sport 4500 -j ACCEPT -A OUTPUT -p udp -m udp --dport 500 -j ACCEPT -A OUTPUT -p udp -m udp --sport 500 -j ACCEPT -A OUTPUT -j IPSEC The `IPSEC` rule will now try to use the conntrack nfmark first, failing that it will jump to the other chain. -A IPSEC -j CONNMARK --restore-mark -A IPSEC -m mark --mark 0x80000000/0x80000000 -j RETURN -A IPSEC -j NEW_IPSEC_CONN -A IPSEC -j CONNMARK --save-mark Finally, the `NEW_IPSEC_CONN` chain is ued to locate the nfmark, and is populated by the *updown.mast* script as before: -A NEW_IPSEC_CONN -s 1.2.3.4 -d 5.6.7.8 -j MARK --set-mark 0x8xxx0000 ## Required changes The following changes are required to make it work: * changes to init scripts - confirm that conntrack is available in the kernel - flush `IPSEC` and `NEW_IPSEC_CONN` tables * changes to updown scripts - initialize IPSEC chain as above - manage per-connection rules in `NEW_IPSEC_CONN` chain * changes to pluto - none * changes to klips - none # Solution 3: use a light-weight lookup The overhead of connection tracking may be too high for assigning nfmark values to packets. It would be possible to create an iptables module that would assign nfmark (or secmark, or skb->sp->ref) given an externally managed list. We'd load in a module: insmod xt_IPSECSAREF Pluto will have an interface into this module and would update it with information about which packets should get what SAref values. In iptables we'd see: -A IPSEC -j IPSECSAREF When the packet reached klips, it would already be marked appropriately. ## Required changes The following changes are required to make it work: * new module `xt_IPSECSAREF` - hash table of what to match, and what mark to assign - needs an iptables lib and a module * changes to init scripts - load `xt_IPSECSAREF` - remove IPSEC chain * changes to updown scripts - remove IPSEC chain * changes to pluto - update the IPSEC SAREF module's hash table * changes to klips - none