On Wednesday, 13th March, at around midday EST, Facebook went down for users around the globe in one of its longest outages to date. Users had trouble logging into and using essential Facebook services, including the platform itself and its popular sister apps, Instagram and WhatsApp.
While it didn’t affect all users, it certainly caused issues for those involved. Facebook’s tweet about the outage, “We’re aware that some people are currently having trouble accessing the Facebook family of apps. We’re working to resolve the issue as soon as possible,” and its follow-up, “We’re focused on working to resolve the issue as soon as possible, but can confirm that the issue is not related to a DDoS attack,” attracted 13,000 comments. These included complaints from business owners and from people scared of real-world human interaction, plenty of love for Twitter (which is not part of the Facebook family of apps), and a cavalcade of amusing comments and memes.
Now, while we’re all fans of a witty follow-up in the back offices here, we were definitely interested in the real story behind the outages. After all, these three platforms are essential to many people’s day-to-day interactions, and indeed to many businesses.
What was the root issue? Facebook says server configuration
Facebook has blamed the outages on server configuration issues. Server configuration changes can have wide-ranging, unforeseen consequences if they are not done “just so.”
Facebook’s server infrastructure would be a massive, complex beast, due to the sheer size and volume of the data that they need to house and process. Their data centre infrastructure would be some of the most complex in the world.
Data centres are large, secure, cooled buildings dotted around the world. They house racks and racks and racks of servers, each of which may host many virtual servers and containers.
All of these need to be expertly managed, configured and monitored for issues. New physical and virtual machines, and even new data centres, also need to be brought into the infrastructure as they come online.
This infrastructure management is performed using a number of tools and configuration processes, often by writing infrastructure as code, where the setup is described in files that tools then apply. While we are unaware of the specifics of the particular server configuration that caused the Facebook outage, it’s safe to say that outages caused by server configuration issues could happen to pretty much any business.
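To make the idea of infrastructure as code a little more concrete, here is a minimal Python sketch. It is not tied to any real tool and says nothing about how Facebook manages its servers; the server names and settings are made up for illustration. It compares the setup you have declared with what is actually running and lists the changes needed to bring the two into line.

```python
# Hypothetical example: server names and settings are invented for illustration.

def plan_changes(desired, actual):
    """Return the actions needed to turn the `actual` setup into the `desired` one."""
    actions = []
    for name, settings in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))   # declared but not running yet
        elif actual[name] != settings:
            actions.append(("update", name))   # running with different settings
    for name in sorted(actual):
        if name not in desired:
            actions.append(("remove", name))   # running but no longer declared
    return actions


# The setup we want, written down as data (the "code" in infrastructure as code).
desired = {
    "web-1": {"cpus": 2, "image": "web:1.4"},
    "web-2": {"cpus": 2, "image": "web:1.4"},
}

# What is actually running right now.
actual = {
    "web-1": {"cpus": 2, "image": "web:1.3"},
    "old-db": {"cpus": 4, "image": "db:0.9"},
}

for action, name in plan_changes(desired, actual):
    print(action, name)
```

Reviewing a plan like this before anything is applied is one way a risky change can be caught early.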
Your server configuration is just as at risk
Your data and processing power need to live somewhere, whether that’s on-site, in the cloud, or a combination of both. How you manage and maintain that infrastructure determines how well you could cope with a server configuration change that disables critical systems.
For instance, if you swap around physical servers, or add a new one, and it breaks your systems, you want to be able to roll back quickly. Ideally you have a diagram of the physical configuration and a copy of the previous software connections so you can restore them, or backup systems in place to take over while you fix the issue.
When it comes to virtual infrastructure, servers and containers, the cause may be a mistake in your infrastructure as code configuration. Again, you want to isolate the part that’s causing the issue so you can rectify it quickly, or have backup systems in place that can handle the load.
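As a rough illustration of rolling back a bad change, here is a minimal Python sketch. The `ConfigStore` class, the `health_check` rule and the settings are all hypothetical, and a real setup would check far more than this. The idea is simply to keep the last known good configuration, apply the change, test it, and restore the old version if the test fails.

```python
import copy


class ConfigStore:
    """Keeps the live configuration plus a history of earlier versions."""

    def __init__(self, initial):
        self.live = copy.deepcopy(initial)
        self.history = []

    def apply(self, new_config, health_check):
        """Apply new_config; restore the previous version if the check fails."""
        self.history.append(self.live)          # remember the last good version
        self.live = copy.deepcopy(new_config)
        if health_check(self.live):
            return True
        # The change broke something, so roll back to the last good version.
        self.live = self.history.pop()
        return False


def health_check(config):
    # Hypothetical rule: at least one web server listed and a valid port number.
    has_servers = len(config.get("web_servers", [])) > 0
    valid_port = 0 < config.get("port", 0) < 65536
    return has_servers and valid_port


store = ConfigStore({"web_servers": ["web-1", "web-2"], "port": 443})

# A faulty change that removes every web server is rejected and rolled back.
ok = store.apply({"web_servers": [], "port": 443}, health_check)
print(ok, store.live)
```

The same pattern applies whether the configuration lives in a file, a database or an infrastructure as code repository: never lose the version that was working.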
What’s the best way to manage all this?
If you handle your infrastructure on-site, then you had better have good systems in place to begin with, and savvy engineers on hand who can step in and get things back up and running quickly if an incident happens.
These days, lots of businesses are using, or looking to move to, cloud infrastructure instead of hosting on-site. Cloud infrastructure can be far easier to manage, as it does not require physical, on-site equipment or on-site engineers to look after it.
There are two options for managing cloud infrastructure: hiring cloud experts onto your own team, or getting an expert partner (like A1 Technologies) to manage your cloud equipment and configurations.
We can help look after all your server needs, whether it’s on-site assessments of configuration and management best practices, migration to cloud infrastructure, or management and support of cloud services, including backup and fallback systems should you experience a catastrophic outage yourself.
Give us a call to chat about your current infrastructure and the options open to you for managing your systems.