Ring Ring, all circuits are busy, phones down (Spoiler, it was Verizon’s fault)

For my birthday (the 13th), Vitelity (now Sinch) (now Inteliquent) decided it would be fun if all the inbound and outbound calls we serve stopped working. In fact, they said “hold my beer” and took 100% of their customers nationwide offline for phone service.

We opened an emergency ticket with their support engineers; they were so slammed that the emergency website ticket system crashed. “This is going to be fun,” I thought to myself. Vitelity hadn’t had a breaking outage on their SBC platform in all the years we have been on it.

Ten very long hours later, Vitelity engineers came up with a workaround: change the IP addresses we use to peer with them. This meant we had to go in and change the voice trunk configuration on every customer phone system and on our larger aggregation servers before traffic would resume. This was a breaking change; nothing would work until we made the changes.
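
For anyone facing the same kind of bulk change, here is a minimal sketch of how the peer IP swap could be scripted rather than typed by hand on every box. It is not the tooling we actually used. The `host=` config layout, the `PEER_MAP` name, and the addresses (taken from the RFC 5737 documentation ranges) are all assumptions for illustration, not Vitelity's real values.

```python
import re

# Hypothetical old -> new peer addresses (RFC 5737 documentation ranges,
# not the provider's real IPs).
PEER_MAP = {
    "192.0.2.10": "198.51.100.10",
    "192.0.2.11": "198.51.100.11",
}


def repoint_trunks(config_text, peer_map=PEER_MAP):
    """Return (new_text, replacement_count) with old peer IPs swapped for new ones."""
    alternatives = "|".join(re.escape(ip) for ip in peer_map)
    # The lookbehind/lookahead make sure we only match whole addresses,
    # so 192.0.2.10 is never rewritten inside 192.0.2.100.
    pattern = re.compile(r"(?<![\d.])(" + alternatives + r")(?!\.?\d)")
    return pattern.subn(lambda m: peer_map[m.group(1)], config_text)


if __name__ == "__main__":
    # Hypothetical trunk config in a generic key=value layout.
    sample = "[trunk-in]\nhost=192.0.2.10\n[trunk-out]\nhost=192.0.2.11\n"
    new_text, count = repoint_trunks(sample)
    print(f"{count} peer address(es) updated")
    print(new_text)
```

Returning the replacement count makes it easy to flag any system where nothing changed, which usually means that trunk was configured differently and needs a manual look.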

Today, in doing a post-mortem analysis, we discovered what went down. The original IP block was allocated to “Verizon Business”. About 8 days ago, someone upstream of Vitelity changed the routing to only use a discount transit provider (Zayo) rather than the other Tier-1s (Lumen/NTT). This eventually unravelled into an “our links are all up but our traffic dies somewhere in the internet” situation, which is any network engineer’s worst nightmare.

We are now on a new set of IP addresses that are “cleanly owned” by the organizations we are using. I have verified this with ARIN (the number registry). I have also requested Vitelity look at backup peering on their AS number so that even if their upstream owners mess up again, they will have a backup.
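
The ARIN check can also be done programmatically. The sketch below is one way to do it, assuming ARIN's public RDAP service at `https://rdap.arin.net/registry/ip/` and the standard RDAP JSON layout (entities carrying `roles` and a jCard `vcardArray`). The function names are made up for this example, and it is not the exact lookup I ran.

```python
import json
import urllib.request

# ARIN's public RDAP endpoint for IP lookups.
ARIN_RDAP = "https://rdap.arin.net/registry/ip/"


def fetch_rdap(ip):
    """Fetch the RDAP record for an IP address (requires network access)."""
    req = urllib.request.Request(
        ARIN_RDAP + ip, headers={"Accept": "application/rdap+json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def registrant_names(record):
    """Collect the 'fn' (full name) of every entity holding the registrant role."""
    names = []
    for entity in record.get("entities", []):
        if "registrant" not in entity.get("roles", []):
            continue
        # A jCard is ["vcard", [[name, params, type, value], ...]].
        vcard = entity.get("vcardArray", ["vcard", []])
        for prop in vcard[1]:
            if prop[0] == "fn":
                names.append(prop[3])
    return names
```

Calling `registrant_names(fetch_rdap(ip))` for each new peer address and comparing the result with the organization you expect gives a quick sanity check that the block is registered to the party actually announcing it.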

Curiously, one of our technicians got a job offer posting on their phone this afternoon for a “network technician, level II” for Zayo networks. Hmmm.

