Why corporate won't let you install that crappy Netgear switch.

I haven’t been in this sort of industry for a while now, so I might be a bit out of touch, but I imagine this hasn’t changed since I left.

I saw a fedi post recently that talked about how corporate wouldn’t let them purchase a little switch for their office to make file transfers quicker. I won’t link it here because what I’m going to dive into isn’t the point of that post, but I do have experience with why corporate don’t want you to plug in that 5-port Netgear switch that everyone buys¹.

BTW, IT is likely doing their best and balancing reliability, cost and supportability, along with a dozen other user issues. Regardless, your company should help you solve the problems you have with infrastructure rather than just saying no.

Today we’ll be focusing specifically on the layer 2 (ethernet) technical details. We won’t be talking about the security, privacy, safety, or physical reasons. There are a lot of legacy reasons behind this, so before we dive in we need some context on why a corporate network has separate network segments and how this impacts the fault conditions we’ll discuss.

A bunch of computers screaming at each other

If you work in a big building like mine, or on a large campus, you could have hundreds to thousands of devices connected to the network. Devices broadcast a lot of traffic. This includes working out what IP address should be assigned (DHCP), what hardware address maps to what IP address (ARP), and things performing network discovery so that icons pop up to indicate there’s a printer or streaming device available. This traffic is all sent to every device on the same segment. Scaling this up to thousands of devices would cause a lot of wasted bandwidth.
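To put rough numbers on the screaming, here is a back-of-the-envelope sketch. The rate of 0.5 broadcasts per device per second is an assumption picked for illustration, not a measurement, and the function names are made up for this example.

```python
def broadcast_frames_per_second(devices: int, per_device_rate: float) -> float:
    """Broadcast frames each device has to receive per second on one segment."""
    # Every broadcast is delivered to every other device on the segment,
    # so each device hears the broadcasts of all the others.
    return (devices - 1) * per_device_rate


def per_device_load_after_split(devices: int, segments: int, per_device_rate: float) -> float:
    """Per-device broadcast load once the devices are split evenly into segments."""
    # A router between segments does not forward layer 2 broadcasts,
    # so a device only hears the devices in its own segment.
    devices_per_segment = devices // segments
    return broadcast_frames_per_second(devices_per_segment, per_device_rate)


# Assumed figures: 4000 devices, each broadcasting once every two seconds.
flat = broadcast_frames_per_second(4000, 0.5)
segmented = per_device_load_after_split(4000, 20, 0.5)
print(f"one flat segment: {flat} broadcasts/s per device")
print(f"20 segments:      {segmented} broadcasts/s per device")
```

With these assumed numbers every device on the flat network hears roughly two thousand broadcasts a second, while splitting into 20 segments cuts that to about a hundred.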

Two network segments with a router in between

Even if we have enough bandwidth to handle all the screaming from devices, we still want to make sure our system is reliable. If some of the incidents I’m going to describe below happen, we want it to impact a smaller set of users, rather than the whole network.

Ok. Let’s start building our network. One switch can’t handle all our users, so let’s go down to our local office supply store and pick up the cheapest network switch we can find.

Two cheap netgear switches plugged into each other

Perfect. Then someone accidentally kicks the switch with their foot, stuff gets unplugged and there’s a rush to plug everything back in. We then end up with this.

The two switches end up plugged into each other. If you bought switches with spanning tree protocol (STP), then this is fine and will work². If you didn’t, we end up with a loop.

Diagram showing packets looping until the switches catch fire

What happens is the ethernet frames get sent back and forwards forever building up until there’s no bandwidth left for any legitimate data. This is why we have spanning tree protocol.
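The build-up can be sketched with a toy simulation. This is a simplified model, not how any real switch is implemented: two dumb switches joined by two cables, one host sending a broadcast each tick, and every frame forwarded once per tick. The key fact it relies on is that ethernet frames carry no TTL, so nothing ever discards a looping frame.

```python
def simulate_loop(ticks: int, new_broadcasts_per_tick: int = 1) -> list:
    """Return how many frames are crossing the two looped cables at each tick."""
    # Ports 0 and 1 on each switch are the two cables between switch A and B.
    # A frame in flight is (switch it is arriving at, port it arrives on).
    in_flight = []
    history = []
    for _ in range(ticks):
        # A host on switch A broadcasts. A floods the frame out both cables,
        # so one copy arrives at B on port 0 and another on port 1.
        for _ in range(new_broadcasts_per_tick):
            in_flight.extend([("B", 0), ("B", 1)])
        history.append(len(in_flight))

        next_round = []
        for switch, port in in_flight:
            # A dumb switch floods a broadcast out of every port except the
            # one it arrived on, so each copy goes back over the other cable.
            other_switch = "A" if switch == "B" else "B"
            next_round.append((other_switch, 1 - port))
        # No TTL means no frame is ever dropped here.
        in_flight = next_round
    return history


print(simulate_loop(5))
```

Every new broadcast adds two frames that circulate forever, so the count only ever goes up. Real switches also deliver each lap to every attached host, which the model leaves out.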

Demonstration of spanning tree blocking ports between switches

Problem solved! Just buy switches with spanning tree…. except… it’s a little more difficult.

Consider this example. We add our switch into this complex network.

We plug in our pirate network switch. And suddenly the whole network stops for a minute or two. What just happened is that spanning tree had to reconfigure itself. This is because your switch happened to be configured with a lower priority than others and became the root switch.

Network with tree of switches however blocking ports are now moved to make best path to pirate switch

You’ll notice that not only did we get hit with spanning tree reconfiguring, but also our pirate switch has forced the algorithm to select slower links than we would otherwise have available.
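The root election behind this can be sketched in a few lines. Spanning tree picks the switch with the lowest bridge ID, which is the priority followed by the MAC address as a tie-breaker, and 32768 is the standard default priority. The switch names and MAC addresses below are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int
    mac: str

    def bridge_id(self) -> tuple:
        # Priority is compared first, then the MAC address as a number.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges: list) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=Bridge.bridge_id)


# Hypothetical network where nobody changed the default priority.
core = Bridge("core", 32768, "00:1a:2b:00:00:10")
access = Bridge("access", 32768, "00:1a:2b:00:00:20")
pirate = Bridge("pirate", 32768, "00:09:5b:00:00:01")

print(elect_root([core, access, pirate]).name)  # pirate wins on lowest MAC

# Same network, but the core switch has been given a low priority.
tuned_core = Bridge("core", 4096, "00:1a:2b:00:00:10")
print(elect_root([tuned_core, access, pirate]).name)  # core stays root
```

A pirate switch shipped with a lower priority value than the core would win the same way, which is why the priorities on the real core switches get set deliberately low.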

To make matters worse, there are multiple different types of spanning tree: STP, R(apid)STP, M(ulti)ST, P(er)V(lan)ST.

Let’s go back to a dumb switch, that seems easier. What if we were to install it like above, but someone accidentally connects the dumb switch to two different switches on the network at the same time?

What happens here is that the spanning tree enabled switches can see a link to another spanning tree switch via the dumb switch. They aren’t aware of the dumb switch itself. This can cause issues like a large amount of traffic going via your tiny switch.

Ok. Maybe we are careful and we don’t connect another switch to ours. Someone finds a loose cable and accidentally plugs the switch into itself.

Of course we have a loop. However since spanning tree thinks everything is ok, that traffic is also transmitted to the rest of the network. From the point of view of spanning tree, there isn’t a network switch there. Even if the wired network can handle the bandwidth, the WiFi access points might not be able to.

Another little quirk is that many switches are configured with a system called “port fast”. Usually spanning tree waits a period of time to figure out if there is a network switch on the other end. Port fast assumes the port is meant for a device and skips that learning/listening phase. This means that loops can exist for some time before a loop is detected. Port fast exists so that computers don’t have to wait forever to get a DHCP lease to get going.
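The trade-off can be put in numbers. This sketch assumes classic STP with the default 15 second forward delay, where a port spends one forward delay listening and another learning before it forwards. Rapid spanning tree behaves differently, and the function name is made up for the example.

```python
FORWARD_DELAY_SECONDS = 15  # default forward delay timer in classic STP


def seconds_until_forwarding(port_fast: bool,
                             forward_delay: int = FORWARD_DELAY_SECONDS) -> int:
    """How long a freshly connected port waits before it passes traffic."""
    if port_fast:
        # Port fast skips listening and learning and forwards straight away.
        return 0
    # Listening, then learning, each lasting one forward delay.
    return 2 * forward_delay


print(seconds_until_forwarding(port_fast=False))  # 30
print(seconds_until_forwarding(port_fast=True))   # 0
```

Thirty seconds is long enough for an impatient DHCP client to give up, and zero seconds is how a loop gets to run before anything notices it.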

To summarise all of this

All switches need to be spanning tree enabled for spanning tree to be effective
All switches need to be configured correctly so that the suitable paths are selected
For a stable network, switches need to be configured to prevent pirate switches

Preventing pirate switches

A number of configuration options exist to prevent issues when switches are connected:

BPDU Guard : BPDUs are the messages sent by spanning tree. If a switch detects this on a port that has been designated as a user port it will disable the network port and require a manual reset.
Root guard : This flags which ports on the switch we expect to find the spanning tree root. We disable ports that would have resulted in a root we didn’t expect
Loop guard : Detects packet loops and disables the port
Setting low spanning tree priorities
Mac address limit : We can detect dumb switches by counting how many devices a switch port can see
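Two of the protections above, BPDU guard and the MAC address limit, can be sketched as a toy model of a user port. The class, its method and the limit of two addresses are all hypothetical and chosen for illustration. Real switches expose these as vendor-specific configuration rather than anything like this API.

```python
from typing import Optional


class AccessPort:
    """Toy model of a user-facing switch port with two protections enabled."""

    def __init__(self, max_macs: int = 2, bpdu_guard: bool = True):
        self.max_macs = max_macs
        self.bpdu_guard = bpdu_guard
        self.seen_macs = set()
        # None while the port is up, otherwise the reason it was shut down.
        self.disabled_reason: Optional[str] = None

    def receive(self, src_mac: str, is_bpdu: bool = False) -> bool:
        """Return True if the frame is accepted, False if it is dropped."""
        if self.disabled_reason is not None:
            # A disabled port stays down until someone resets it manually.
            return False
        if is_bpdu and self.bpdu_guard:
            # A BPDU on a user port means a spanning tree switch was plugged in.
            self.disabled_reason = "bpdu-guard"
            return False
        self.seen_macs.add(src_mac)
        if len(self.seen_macs) > self.max_macs:
            # Too many source addresses suggests a dumb switch behind the port.
            self.disabled_reason = "mac-limit"
            return False
        return True


port = AccessPort(max_macs=2)
for mac in ["aa:aa:aa:00:00:01", "aa:aa:aa:00:00:02", "aa:aa:aa:00:00:03"]:
    port.receive(mac)
print(port.disabled_reason)  # mac-limit
```

A desk phone with a PC plugged into it legitimately shows two addresses on one port, which is why the assumed limit here is two rather than one.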

Bonus 0: VLANs

How your network is configured to handle spanning tree and VLANs could be one of many many many configurations. The network might have VLANs that cover only some of the switches. Spanning tree could be running per VLAN, or per group of VLANs. This means connecting a spanning tree switch might only impact one spanning tree instance, leaving loops possible in other VLANs.

Bonus 1: Unidirectional Link Detection

Unidirectional what? We like to think of network links as working or not. But there’s a secret third option - working only in a single direction. This is especially common with fibre optics and media converters.

From spanning tree’s point of view, it can’t see a switch on the other side and will start forwarding packets towards it, thus causing a loop. We use UDLD (Unidirectional Link Detection) to prevent this.
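The idea behind UDLD can be sketched as an echo check. Each side sends probes listing the neighbours it has heard from, and a side that hears its peer but is missing from the peer’s list knows its own transmit direction is broken. This is a heavily simplified model with made-up names. The real protocol also has timers and takes action on the port.

```python
def udld_check(a_to_b_works: bool, b_to_a_works: bool) -> dict:
    """Return what switch A and switch B each conclude about the link."""
    # Each side remembers which neighbour it has heard probes from.
    heard_by_a = {"B"} if b_to_a_works else set()
    heard_by_b = {"A"} if a_to_b_works else set()

    def conclude(me: str, heard_by_me: set, heard_by_peer: set) -> str:
        if not heard_by_me:
            # No probes arrive at all, so there is no echo to inspect.
            return "no probes received"
        # The peer's probe echoes the neighbours it has heard. If we are
        # missing from that list, our transmit direction is broken.
        return "bidirectional" if me in heard_by_peer else "unidirectional"

    return {
        "A": conclude("A", heard_by_a, heard_by_b),
        "B": conclude("B", heard_by_b, heard_by_a),
    }


print(udld_check(a_to_b_works=True, b_to_a_works=True))
print(udld_check(a_to_b_works=False, b_to_a_works=True))
```

In the second case A still hears B, but B’s probes no longer mention A, so A can tell the link only works in one direction.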

Bonus 2: Virtual machines

Virtual machine systems, especially complex ones, can introduce their own switching and bridging to the equation which can cause loops when trying to configure redundant links or port aggregation. They also pose other possible threats such as duplicate virtual mac addresses. Typically these will trigger the mac address limits on ports.

Bonus 999999: what about TRILL? SPB?

Network vendors don’t have your best interests in mind and decided to fuck the standards for their own vendor lock in needs.

Bonus 1000000: UniFi have a web interface to make configuring this stuff simple

So did Cisco in like the 90s. Much like UniFi it also sucked at enterprise scale.

Cisco smartport configuration web interface - cisco express

Other reasons

So while we discussed just one technical aspect as to why just yeeting random switches into a network is a bad idea, there’s many more.

Ongoing maintenance - firmware updates/patching
Security - if it’s a managed switch (common for STP support) then ensuring it’s configured securely
Privacy - we don’t want to open the network up to sniffing of traffic
Safety - Testing and tagging, cable tripping hazards
More technical - Sometimes what people think are switches are actually routers, and provide a rogue DHCP server

¹ I’m not sure if it’s just because they were super common or because they failed so often, but these were often found at the centre of network issues.
² Unless configured otherwise, this does not give you twice the amount of bandwidth.
