It was a Monday in every sense of the word.
Murphy is only a few weeks into his new position as IT Director and is still wrapping his head around the environment he inherited. Overall, it’s not a terrible network to walk into: a little dated, sure, but it seems to have been pretty well thought out.
The company has an MPLS network with broadband internet circuits at every site for failover. They’ve virtualized most of their environment, and are 100% cloud-based from an application standpoint. The company is embracing cloud technology, and willing to invest in deploying it correctly.
Murphy has some work to do, but he has an agreeable leadership team, some good people on his support team and a seemingly stable network.
At least that’s the story he would have told you when he wrapped up work last Friday.
This is Monday…
This particular morning starts at 6 a.m. with a frantic call from one of the branches saying they can’t access their primary ticketing application, which drives their production process and keeps them fulfilling orders. Production is screeching to a halt, people are frustrated, and the company is losing money fast.
Not a great start to the week.
By 7 a.m. the issue has become significantly worse: site by site, Murphy discovers that every location has internet access but can’t reach this application. Previously profitable employees are now watching squirrels water ski on YouTube because they have nothing better to do.
Time to troubleshoot.
A quick check of his monitoring system confirms that the MPLS network has gone down—it’s a core carrier outage. Could his luck be any worse? But the internet circuits are up and running. At least those bored employees can continue to enjoy videos online to pass the time, but it isn’t helping the larger problem.
It’s now 10 a.m. on a Monday and Murphy already feels like he’s done an entire week’s worth of work. His team is busy running down all possible causes on their side of the network while he opens a ticket with the application provider requesting every possible escalation to get his team back up and running. In the meantime, all he can do is sort through notes on the network turn-up to look for clues.
At first glance, all looks good.
MPLS circuits come up seamlessly and the broadband is no more painful than normal when working with cable providers. Every site makes notes of unplugging the MPLS circuit to test failover, and in every case they are still able to get to the internet and make VoIP calls.
So far this seems consistent with what Murphy is seeing today. The only thing he doesn’t see a record of is application testing. The notes verify only internet traffic and VoIP, which is configured to choose the best available path anyway.
Murphy calls the application provider again to see if failover has been configured for their solution, and if they have any record of the broadband circuits and relevant information on them (default gateway, WAN IPs, etc.).
To his surprise, there’s a failover strategy in place—just not the one he expected.
The provider has information on the internet circuits coming out of the data centers, and during an outage application traffic is supposed to flow through the data center. Not a terrible strategy, but there is more to follow up on. To make this solution work, there would need to be a VPN tunnel (or similar) between each branch and the data center.
Next step is to check the status of those tunnels and see if there’s a problem there. Murphy calls one of his engineers who was around during the deployment to start the process and is met with the last question he wants to hear at a time like this: What are VPN tunnels?
As the saying goes, the best-laid plans of mice and men often go awry, and this one fell apart at the last possible step.
After developing a plan and doing a fairly good job of covering all of their bases, the team had missed the final step: building the underlying VPN tunnels for failover.
Problem isolated, but now the work has just begun.
It’s only mid-afternoon on a Monday, but Murphy has already dealt with a full month’s worth of stress. We see a glass of whiskey in his near future after this catastrophe gets resolved.
- Always test ALL business-critical applications on primary and backup network connections.
- Have a system of checks and balances to make sure all steps of a plan are followed, because it only takes one to throw the plan off.
- Mondays are always going to be Mondays; respect the power of Monday.
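The first two lessons can be turned into a simple checklist audit. The Python below is only a minimal sketch, not anything Murphy's team actually ran: the path labels, the site and application names, and the `tcp_reachable` and `coverage_gaps` helpers are all hypothetical. The idea is to record a pass or fail for every site, application, and path combination during turn-up, then list anything that failed or was never tested at all.

```python
import socket
from itertools import product

# Hypothetical path labels: "primary" for MPLS, "backup" for broadband.
PATHS = ("primary", "backup")


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The probe uses whatever path the OS is currently routing over, so to
    fill in the "backup" column it must be run while the primary circuit
    is actually down or unplugged.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def coverage_gaps(sites, apps, results):
    """List every (site, app, path) combination that failed or was never tested.

    results maps (site, app, path) -> True/False as recorded during turn-up.
    A missing key means nobody tested that combination.
    """
    gaps = []
    for site, app, path in product(sites, apps, PATHS):
        status = results.get((site, app, path))
        if status is None:
            gaps.append((site, app, path, "untested"))
        elif not status:
            gaps.append((site, app, path, "failed"))
    return gaps


if __name__ == "__main__":
    # Illustrative data only: internet and VoIP were tested on both paths,
    # but the ticketing application was checked on the primary path alone.
    sites = ["branch-1"]
    apps = ["internet", "voip", "ticketing"]
    results = {
        ("branch-1", "internet", "primary"): True,
        ("branch-1", "internet", "backup"): True,
        ("branch-1", "voip", "primary"): True,
        ("branch-1", "voip", "backup"): True,
        ("branch-1", "ticketing", "primary"): True,
    }
    for gap in coverage_gaps(sites, apps, results):
        print(gap)
```

With the illustrative data above, the only gap reported is the ticketing application on the backup path, which is exactly the hole that bit Murphy. Treating "untested" as a gap, rather than assuming it passed, is the design choice that matters here.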
EnableIP: Helping Businesses Beat Murphy’s Law
Poor Murphy. Sooner or later, your network is bound to face challenges and obstacles too. Partner with EnableIP consultants to ensure that you’re fully prepared when they arrive. We’re so confident you’ll love doing business with us that we will put our money where our mouth is and guarantee it, or you can walk away. Experience the difference for yourself!