Sophie: openswan-doc-2.6.39-3.2.mga4 x86

openswan-doc-2.6.39-3.2.mga4.x86_64.rpm

Xelerance has been working on making transport mode work when there are
two clients behind two NAT's that have the same IP.

It has other benefits as well: we can probably make a transport-mode OE
work through NATs, even when the clients have the same IP addresses.

That means that the clients do not have to be assigned unique IPs, but
we can't do gateway mode OE. Maybe that's okay, if we can do OE for
NAT-ed clients.

Also, BTNS is going towards transport-mode only SAs.

How does this all work.

1) we augment ipsecX with "mastX"

Of the three mastX modes that we discussed years ago, only one has
been implemented. That is the mode where if the nfmark is set to
0x80000000 | (SAref&0x7fff << 16)

then we extract the SAref and use it to lookup an SA in the SA table
that RGB wrote years ago.

Some changes to it: the SAref is now a proper extension. I did this
so that our SA create message was identical to stock PFKEYv2.
This also makes it easier to #include the KAME/NETKEY
liunx/pfkeyv2.h.

It turns out that NETKEY has skb->sp, which is:

struct sec_path
{
atomic_t refcnt;
int len;
struct sec_decap_state x[XFRM_MAX_DEPTH];
};

to this, I have added:

typedef unsigned int xfrm_sec_unique_t;
xfrm_sec_unique_t ref; /*reference to high-level policy*/

I don't know what the sec_decap_state{} structure does yet, but it
involves the xfrm code's policy check. We have a much easier "ref" which
is just an integer.

After checking the nfmark, we then check the skb->sp->ref value, and
if non-null use that instead for the SA lookup.

The other two modes for mastX are (for the record):
b) punt all traffic into a well-defined SA# ("PPP" mode).
This would be useful for building a virtual-leased line,
and would eliminate the GRE layer that is currently required to
make BGP-over-IPsec work.

c) a mode where it has a link-layer value, and for each tunnel
that is up, it populates the neighbour cache with something
is meaningful to it.

2) in ipsec_rcv(), we set up skb->sp to contain a sec_path that
has a ref set up correctly.

3) for UDP sockets, we set up a new IP-level option that permits the
skb->sp to be mapped to a new "ancilliary" data message. This message
is currently pretty primitive, consisting of two "ref"s.

These are called generally "ref" (or refme) and "refhim".
The first is refme, and is the SA on which the packet arrived.
The second is refhim, and is the ref of an SA on which a reply can be
sent.

For TCP sockets, the TCP layer will be taking care of this stuff.
(for SCTP, it will have to be a mix!)

For UDP, the application needs to worry about it.
Although, I guess connected UDP sockets could not care about it, but
that's really not that common a use.

4) In the case of xl2tpd, we use the "refhim" as an additional index into
the call/tunnel list, letting us distinguish two hosts that appear to
have the same outer IP.
Actually, we could probably use *ONLY* refhim.

We don't use "ref", although we do record it the first time, since it
currently changes during rekeys. I am going to be fixing that now
that I've figured exactly how.

The problem is that we have to keep the other (older) SAs around when
we are doing rekeys, since there may still be packets in flight. In
fact, the client could even, say, load balance among the currently
valid SAs... might occur for some HA system...

I'm going to be linking the "older" SAs to the newest SA on a singly
linked list.

Oh... didn't mention that SAs are now all properly reference counted.
(I HOPE!)

5) in pluto, we have to be a bit crafty to get things setup right.

KLIPS can assign SArefs if the passed SAref is IPSEC_SAREF_NULL,
and this is fine for new SAs. But, we need to know the outgoing SA
when we create the incoming SA, so that it can reference it.

Normally, pluto creates the incoming during QUICK_R1 (when it
receives I1, and sends R1). It then creates the outgoing SA when
it gets I2.

Now, if pluto doesn't know the outgoing SA's "refhim", it creates
the outgoing SA immediately. (It doesn't "eroute" it yet)

It then creates the incoming SA.

Then at I2, if it hasn't created the outgoing SA, it creates it.
Why might that happen? well, at a rekey, we already know the refhim
that we want to use, so we can just it normally.

For transport-mode with the MAST kernel driver, we actually don't
eroute ANYTHING.

We can find out the previous refhim by looking at the state
referenced by:
st->st_connection->newest_ipsec_sa

If the gateway is already EXPIRED the state, then we have a
problem. I don't have a solution for this problem yet. I think that
we will have to have some kind of mapping:
{public-key, SA-tuple} => refhim

6) for tunnel mode SAs, we can now use iptables.

We do:
iptables -I PREROUTING 1 -j IPSEC -t mangle
iptables -I IPSEC 1 -s 192.0.1.0/24 -d 192.0.2.0/24 -j MARK --set-mark 0x80120000

Except that it turns out iptabels doesn't grok 0x, and thinks of
set-mark value as being signed... so _updown.mast uses /bin/printf to do
the right thing

We then do:
ip rule add fwmark 0x80000000 fwmarkmask 0x80000000 table 50
ip route add 0.0.0.0/0 dev $PLUTO_INTERFACE table 50

fwmarkmask is a new option. It won't stay that way yet. It has kernel
side code too, which was posted, but needs changes to be accepted.

7) _updown is now a wrapper that basically does:

exec @IPSEC_LIBDIR@/_updown.${PLUTO_STACK} $*

where PLUTO_STACK={mast,klips,netkey}

8) for transport mode UDP sends, when the IPSEC_REFINFO option is
attached using sendmsg(), then basically we fill in the outgoing flow
information. We figure out which mastX device to force things to by
looking up the ref value given using ipsec_sa_getbyid(), and finding
the attached mastX device.

(Oh, yeah, each SA can now indicate which device it should go in or
out of, but only mast0 is supported by pluto)

9) tunnel and OE

we should be able to do the same trick with iptables with a new module:

iptables -I IPSEC 1 -s 192.0.1.0/24 -d 192.0.2.0/24 -j IPSEC --saref 0x012

just have to write this module. A trap will have to become

iptables -I IPSEC 1 -s 192.0.1.0/24 -d 192.0.2.0/24 -j IPSECTRAP --???

and once pluto sees that, it will have to change it to -j IPSEC.
the challenge is going to be programatic interface to iptables...

10) OE

iptables -I IPSEC 1 -s 192.0.1.0/24 -d 192.0.2.0/24 -j IPSECOE

This is different, because it has to use conntrack to look up the
particulars of the stream, and if there is no conntrack, then it
has to generate an ACQUIRE.

I may use the existing eroute code. Or not. I'm not sure yet.

IRC notes:

16:27:39) mcr: no rekeys yet.
(16:27:56) mcr: you want an explanation.... okay.
(16:28:16) mcr: so, the unique value that the l2tpd uses for the remote client is the "refhim" --- i.e. the outgoing SA.
(16:28:17) ***pat braces himself.
(16:28:45) mcr: l2tpd actually doesn't check the incoming SA anymore. This concerns me a bit, but we need to have a way to ask pluto, "is this SA the same as the previous one?"
(16:29:21) mcr: I was originally going to make the kernel keep the refme the same too. I didn't like my method. I've thought of a better method, and I'll be doing that soon.
(16:29:40) mcr: (we'll need it for TCP, and also for NAT-T friendly OE and NAT-T friendly BTNS...)
(16:30:00) mcr: so, the trick is that pluto tries to keep the refhim the same. It's okay to have only one outgoing SA --- because we get to pick which one.
(16:30:10) mcr: But, we need to keep all incoming SAs until the remote machine deletes them.
(16:30:37) mcr: one of the problems is if the gateway expires the IPsec SA soon --- then we have no record of what the SA was (no refhim=) so, we can't maintain it.
(16:30:44) mcr: okay, so fixed that, and then got a different problem.
(16:31:03) mcr: the client would rekey, and we'd try to keep the SA -- but we screwed up in logic.
(16:31:45) mcr: we have to install the outgoing SA before the incoming SA if we don't know what the outgoing SA is going to be (this is a change compared to what we did before. we would still install the "eroute" after we are sure everything is sane).
(16:31:59) mcr: so, we say, "if refhim==NULL" => install outgoing SA.
(16:32:15) mcr: then, when it is time to install outgoing stuff, we do again, "if refhim==NULL" => install outgoing SA.
(16:32:30) mcr: except that during a rekey, refhim is intentionally NOT NULL. (we keep the value).
(16:32:54) mcr: Since we know the value, we can install the outgoing SA whenever we liike (we already know what to reference in the incoming SA), but because the test was the same, we never
(16:33:19) mcr: installed the new outgoing SA, so we just kept using the old outgoing SA. Well, during a *REAL* rekey (due to expire of SA), the old SA on the client is gone.
(16:33:25) mcr: (vs when you simulate a rekey with --up!)
(16:33:29) mcr: which is what my tests did.
(16:33:50) paul: ahh. bad testcase :)
(16:41:41) pat: operator error :-)