In the video Building Scalable Data Centers: BGP is the Better IGP, Petr Lapukhov tells us why BGP is chosen.
Firstly, he states the situation as the problem statement:
- many servers with 1G/10G NICs
- traffic is East-West as well as North-South
The objective:
- simple physical topology
- simple routing protocol
- well supported by vendors
The justification of choosing BGP is given:
- better vendor support
- simple enough
- per-hop traffic engineering. (In this context, he means the next hop route can be easily controlled. Not the traffic engineering in the context of RSVP.)
- easy to troubleshoot
- the event propagation is more constrained. That is to say, link failure is limited to the two routers of the link. In the case of OSPF, all the routers need to do SPF calculations.
- the information is straight forward, just the various RIB’s. For OSPF, the link state database needs to be consulted.
Down sides
- OSPF can let you discover MTU issues easily
- They have total control on the layer 2 switches so that all MTU settings are consistent
- slow converge time
- the network topology is simple and symmetric so the converge time is optimized
- configuration does not scale well
- the configuration is simple, only AS num and peers
- there are configuration automation solutions in place
He presented the limitations of BGP and clos with MLAGS.
There are two issues:
- clos does not scale well
- MLAG is proprietary, layer 2 broadcast issues. At most only two connections per MLAG.
Then he present the new design. There are two parallel clos topologies. Each ToR has one uplink to each clos. This approach scale well. Then layer 3 BGP is deployed. The benefits are:
- no more layer 2 problems
- more scalable
- uniform routing protocol
- BGP AS_PATH allows easier troubleshooting
There are some other considerations.
- BGP AS_PATH multipath relax are used as ECMP is deployed for load-balancing
- Uses 16-bit ASN
- The up side is that removing private ASN will hide BGP topology information
- The down side is that “allow AS in” is used to allow the ToR eBGP sessions to “recycle” the private ASN’s
- Do not use default route or summarization, it will blackhole traffic
It is very special use case for BGP. Indeed, BGP runs the internet.