This points to firmware or hardware issues. Sure, it could be a configuration issue, but you can usually find the box that caused it by rebooting each box, one at a time, until the system comes back.
The really good news is that fixing it isn't all on you: you can call support for that box and ask whether there is new firmware or any other ideas about how to fix it.
About six months ago I accepted a job as a mid-level engineer and, with it, inherited a very troublesome LAN environment. Before this I worked as a junior engineer/administrator for a company with 12 sites, MPLS/Metro-Ethernet, and a full Cisco shop all the way around.
I'm a bit stumped. This new company is using two Juniper SRX firewalls in an HA config, each with a link to two new-ish TP-Link managed switches. We have a production LAN on 192.168.4.0/24 serving about 25 people, and the VoIP system rides on this subnet as well. Another subnet, 192.168.5.0/24, is a maintenance subnet for our iSCSI NAS/SAN and a couple of servers. The TP-Link switches support VLANs and trunking, but none of that has been configured. Two Cat5e cables run from the datacenter TP-Link switches: one feeds a Dell PoE switch for voice, and the other feeds two Dell PowerConnect switches in a stacked config for PCs, printers, etc.
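To make the "VLAN support exists but isn't configured" point concrete, here is a small model of why two IP subnets on unconfigured switch ports still share one broadcast domain. It is a sketch under stated assumptions: the port names, the port-to-subnet assignments, and the VLAN IDs 4 and 5 in the segmented layout are all hypothetical; only the two /24 subnets come from the post. A broadcast reaches every port in the same access VLAN, regardless of which IP subnet the attached host uses.

```python
import ipaddress

# Subnets from the post.
PRODUCTION = ipaddress.ip_network("192.168.4.0/24")
MAINTENANCE = ipaddress.ip_network("192.168.5.0/24")

# Hypothetical port map: port -> (attached subnet, access VLAN).
# With no VLANs configured, every port sits in the default VLAN 1.
flat = {
    "port1": (PRODUCTION, 1),
    "port2": (PRODUCTION, 1),
    "port3": (MAINTENANCE, 1),
    "port4": (MAINTENANCE, 1),
}

# Hypothetical segmented layout: one VLAN per subnet (IDs chosen for illustration).
segmented = {
    "port1": (PRODUCTION, 4),
    "port2": (PRODUCTION, 4),
    "port3": (MAINTENANCE, 5),
    "port4": (MAINTENANCE, 5),
}


def broadcast_domain(port_map, port):
    """Return the ports that receive a broadcast sent from `port`,
    i.e. every port in the same access VLAN."""
    vlan = port_map[port][1]
    return sorted(p for p, (_, v) in port_map.items() if v == vlan)


# The two IP subnets never overlap...
assert not PRODUCTION.overlaps(MAINTENANCE)

# ...but on flat ports a maintenance-subnet broadcast still reaches production ports.
print(broadcast_domain(flat, "port3"))       # ['port1', 'port2', 'port3', 'port4']
print(broadcast_domain(segmented, "port3"))  # ['port3', 'port4']
```

The IP subnet only decides which hosts answer; the VLAN decides which ports have to carry the frame.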
The first outage, a few weeks ago, was resolved by removing the redundant links from each Juniper firewall to the TP-Link switches (three connections to each switch, dropped down to one). Today's outage began while I was troubleshooting an IP conflict for one of our VMware servers on the 192.168.5.0/24 subnet. None of our switch ports are VLAN'd off, so both subnets share a single broadcast domain on the same switch ports. I've never been bitten by running a subnet scan, so I added a 192.168.5.x address to my NIC and ran a scan of the maintenance subnet. About two minutes later, the LAN was down on the office side. The datacenter could still reach the internet, but couldn't reach the switches that provide connectivity to our office PCs, etc.
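For a sense of scale, the arithmetic below estimates how many broadcast frames a simple sweep of 192.168.5.0/24 puts on the wire. It is a sketch under assumptions, not a description of any particular scanner: it assumes the tool ARPs each host address once (plus optional retries), and the scanner address 192.168.5.250 and the 48-port domain size are hypothetical. It illustrates how far the traffic reaches on a flat network; it does not by itself show that the scan caused the outage.

```python
import ipaddress


def arp_sweep_broadcasts(network, own_ip, attempts=1):
    """Broadcast ARP requests a simple sweep sends: one per usable host
    address other than the scanner's own, times the number of attempts."""
    net = ipaddress.ip_network(network)
    own = ipaddress.ip_address(own_ip)
    targets = [h for h in net.hosts() if h != own]
    return len(targets) * attempts


def frames_delivered(broadcasts, ports_in_domain):
    """A broadcast is flooded out every port in the domain except the one
    it arrived on, so each request is copied ports_in_domain - 1 times."""
    return broadcasts * (ports_in_domain - 1)


# Hypothetical scanner address on the maintenance subnet.
sent = arp_sweep_broadcasts("192.168.5.0/24", "192.168.5.250")
print(sent)                        # 253
print(frames_delivered(sent, 48))  # 11891 (hypothetical 48-port flat domain)
```

A few hundred ARP requests are a small burst for healthy switches; the point is that without VLANs every one of them is also flooded toward the voice and office switches.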
Sorry for the long post, but my reporting capabilities are limited on these TP-Link switches, and the Dells are even worse. I've never killed a LAN by scanning a subnet, so I'm wondering whether this was the breaking point of an overwhelmed, congested LAN or whether I should scrap the TP-Links and Dells for some Cisco gear.
Thanks in advance.