Carl,
My personal preference would be for
a) k1 = k3
b) a single feedback loop.
I have yet to see much convincing evidence that the single feedback loop is
going to be a performance problem on the vast majority of platforms. If you're
doing DES in software, it doesn't make any difference. Also, most of the DES
chips I've looked at will do triple-DES in chaining mode. So what you're really
worrying about are those hosts that have obsolete DES chips. I think only a tiny
minority of PEM platforms fall into this category.
[There I go, making rash claims with insufficient experimental evidence. I just
know someone is going to tell me that the CIS countries are full of hundreds
of thousands of PCs with Bulgarian DES chips that only do single DES :-) ]
I am quite prepared to give on the k1 = k3 issue. Even though 112 bits are
plenty, I'm prepared to go to 168 if it'll keep everybody happy.
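To make the two preferences concrete, here is a rough sketch of what they
amount to. It assumes "single feedback loop" means CBC chaining wrapped around
the whole encrypt-decrypt-encrypt operation, and it uses a toy 64-bit block
function as a stand-in for single DES (it is NOT DES and has no security
value). All function names are made up for illustration.

```python
MASK = (1 << 64) - 1          # 64-bit blocks, as in DES
DES_KEY_BITS = 56             # effective bits per single-DES key
TWO_KEY_BITS = 2 * DES_KEY_BITS    # k1 = k3: 112 bits of key material
THREE_KEY_BITS = 3 * DES_KEY_BITS  # independent k1, k2, k3: 168 bits


def toy_encrypt(key, block):
    """Placeholder for single-DES encryption. Invertible, but not a real cipher."""
    x = (block ^ key) & MASK
    x = ((x << 13) | (x >> 51)) & MASK   # rotate left by 13
    return (x + key) & MASK


def toy_decrypt(key, block):
    """Inverse of toy_encrypt."""
    x = (block - key) & MASK
    x = ((x >> 13) | (x << 51)) & MASK   # rotate right by 13
    return x ^ key


def ede_encrypt(k1, k2, k3, block):
    """Triple encryption in encrypt-decrypt-encrypt order."""
    return toy_encrypt(k3, toy_decrypt(k2, toy_encrypt(k1, block)))


def ede_decrypt(k1, k2, k3, block):
    """Inverse of ede_encrypt."""
    return toy_decrypt(k1, toy_encrypt(k2, toy_decrypt(k3, block)))


def cbc_encrypt_single_loop(k1, k2, blocks, iv):
    """One feedback loop: chain around the full EDE operation, with k3 = k1."""
    prev, out = iv, []
    for p in blocks:
        prev = ede_encrypt(k1, k2, k1, p ^ prev)
        out.append(prev)
    return out


def cbc_decrypt_single_loop(k1, k2, blocks, iv):
    """Inverse of cbc_encrypt_single_loop."""
    prev, out = iv, []
    for c in blocks:
        out.append(ede_decrypt(k1, k2, k1, c) ^ prev)
        prev = c
    return out
```

Note that each ciphertext block depends on the previous one having passed
through all three stages. That dependency is the supposed performance problem:
three single-DES chips cannot be pipelined, whereas software or a chip that
does the triple operation internally is unaffected.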
Mike
PS. There is another algorithms issue that will raise such a flame-war I 
dare not even mention it. However, it can be deduced from the MIC-Info of
this message.