Who needs VXLAN? Isn't it just GRE or PBB or MPLS or...?


GRE as standardized is point to point. This is fine with a pair of
vswitches, but how do you scale this to 40,000 vswitches in a big data
center? If you stuck with standard GRE, you'd have to configure a full
mesh of about 800 million tunnels (40,000 x 39,999 / 2). Broadcasts
would then effectively be source replicated across all of a vswitch's
roughly 40,000 tunnels, so you'd need some new protocol to
figure out which tunnels were truly needed for which customer broadcasts
based on customer domain, or else extend GRE to use IP multicast for
broadcast emulation. GRE could be extended to do all this, and then it
would be very similar to VXLAN, and in fact I think that this "extended
GRE" would be a fine alternative to VXLAN. VXLAN is still a little
better because the inner header hash is captured in the outer header UDP
source port, improving hashing within the VXLAN-unaware spine. There is
no similar way to promote header entropy in GRE that I have found.
Eventually, assuming all switches become aware of inner headers and
could hash accordingly, this advantage would disappear. But
"eventually" could be a long time. More generally, I think that GRE
itself would have been better done on top of UDP. Who needs another
IP protocol? With UDP, you can multiplex multiple logical tunnels on
the same pair of IP addresses, using UDP ports to discriminate. In
other words: yay SSL VPNs, boo IPsec. People always do things at lower
levels than they should. Anyway, that's just philosophy. In practical terms,
VXLAN is roughly equivalent to extended GRE (but not GRE as
standardized).
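
To make the two technical points above concrete, here is a minimal Python sketch: the full-mesh tunnel count for n endpoints, and a UDP source port derived from a hash of the inner headers. The function names, the use of CRC32 as the hash, the choice of inner fields, and the 49152-65535 port range are all assumptions for illustration; real VTEPs use their own implementation-specific hashes.

```python
import zlib

def full_mesh_tunnels(n: int) -> int:
    """Point-to-point tunnels needed to fully mesh n endpoints: n(n-1)/2."""
    return n * (n - 1) // 2

def vxlan_source_port(inner_flow: tuple, low: int = 49152, high: int = 65535) -> int:
    """Map a hash of the inner flow's headers into a UDP source port range.

    CRC32 is only a stand-in hash (an assumption, not what any given
    vswitch actually computes).
    """
    key = "|".join(str(field) for field in inner_flow).encode()
    return low + zlib.crc32(key) % (high - low + 1)

print(full_mesh_tunnels(40_000))  # 799980000

# Hypothetical inner flows: (src IP, dst IP, IP protocol, src port, dst port).
# All of them ride the same tunnel between the same pair of vswitches,
# yet each gets its own outer UDP source port.
for i in range(4):
    flow = ("10.0.0.1", "10.0.0.2", 6, 40000 + i, 80)
    print(flow, "->", vxlan_source_port(flow))
```

The point of the second function is that a spine switch that knows nothing about VXLAN still sees a different outer 5-tuple per inner flow, so its ordinary hashing has something to work with.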


MPLS is fine if you have it, but not everyone wants to run an MPLS
backbone. Not many data center switches have robust MPLS support,
particularly at top of rack. If you have a standard L2/L3 core, then
VXLAN is deployable and VPLS etc. is not.

Q-in-Q (or PBB or double-tagging) is all pure layer 2. It does not help
with the problem of spreading load across the data center spine because
of layer 2's lack of multipath support. With VXLAN, the outer
encapsulation is layer 3, where ECMP enables load spreading across a
spine. That's the practical issue. Philosophically, it's at too low a
level: a layer 2 solution to a layer 4 problem. See also QCN (sigh),
another case of plumbers trying to fix the roof. Why people repeat this
mistake over and over is one of the great mysteries of technology.
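
As a rough illustration of why a layer 3 outer header helps, the Python sketch below models an ECMP decision as a hash of the outer headers taken modulo the number of equal-cost spine links. The spine names, addresses, and the CRC32 hash are hypothetical stand-ins; real switches use vendor-specific hash functions and field selections.

```python
import zlib
from collections import Counter

def ecmp_next_hop(outer_headers: tuple, spine_links: list) -> str:
    """Pick one equal-cost link from a hash of the outer headers."""
    key = "|".join(map(str, outer_headers)).encode()
    return spine_links[zlib.crc32(key) % len(spine_links)]

SPINES = ["spine-1", "spine-2", "spine-3", "spine-4"]  # hypothetical names
VTEP_A, VTEP_B = "192.0.2.1", "192.0.2.2"              # documentation addresses

# UDP-encapsulated traffic between one pair of endpoints: only the outer
# UDP source port differs from flow to flow (protocol 17, dst port 4789).
varied = Counter(
    ecmp_next_hop((VTEP_A, VTEP_B, 17, sport, 4789), SPINES)
    for sport in range(49152, 49152 + 1000)
)

# The same pair of endpoints with no per-flow field in the outer header
# (source, destination, IP protocol 47 only): every packet hashes alike.
fixed = Counter(
    ecmp_next_hop((VTEP_A, VTEP_B, 47), SPINES)
    for _ in range(1000)
)

print("varying source port:", dict(varied))
print("fixed outer header: ", dict(fixed))
```

With a fixed outer header all 1,000 packets land on a single link, while the varying source port lets the same hash spread them across the spine.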