Automating to Create a Transparent Culture
Inside Facebook Inc.’s operations: Automating to create a transparent culture
When a company as large as Facebook sketches a blueprint for the future of its operations, it thinks mainly about how those operations will change. That vision of automation rarely leaves untouched the work that humans do today.
Running the operations of Facebook Inc. is no easy feat. First, there is the sheer scale of its global networks and the premium it places on reliable service and satisfying user experiences. Then there is the need to create new flexibility and capacity for the business to pursue its broader ambitions, including Facebook’s most recent initiatives: the Connectivity Lab, AI and deep learning, and virtual reality as a next-generation computing platform.
To cope with all this, Facebook is turning to automation, because the team has a huge amount of complexity to manage. Its infrastructure supports hundreds of thousands of computers all around the world. Facebook serves 2 billion people on its main app, and millions more on its other apps. Its engineers are constantly improving features, optimizing the platform, and rolling out newer and better services. In short, Facebook has a lot on its plate, and much of it can be handled through automation if the company is to keep up with the scale and pace of its product development.
Recently, Facebook built a system called FBAR, for Facebook Auto Remediation, to handle basic hardware remediation tasks. Previously, if a server had a hard drive failure or some other hardware error, an alarm would go off and a Facebook worker would have to log in, or walk over to the machine, and try to fix it. The worker would have to fix the software and reboot the machine before it worked properly again. Today, that remediation and debugging is fully automated, with no human involved in the process. The system detects the error, whether it is a disk drive, CPU, or network card failure, recognizes it, and simply fixes it.
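To make that detect-and-fix loop concrete, here is a minimal Python sketch of a dispatch-table remediator. It is not FBAR’s actual code, which the article does not describe: the `Alarm` record, the component names, and the `handle`, `replace_disk`, and `reboot_host` functions are all hypothetical, and the actions only return strings rather than touching real hardware.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Alarm:
    """A hardware alert raised for one server (hypothetical schema)."""
    host: str
    component: str  # e.g. "disk", "cpu", "nic"


def replace_disk(alarm: Alarm) -> str:
    # Placeholder action: a real system might drain the host and file a repair job.
    return f"{alarm.host}: disk taken out of service, replacement scheduled"


def reboot_host(alarm: Alarm) -> str:
    # Placeholder action: a real system would issue the reboot and re-check health.
    return f"{alarm.host}: rebooted after {alarm.component} error"


# Map each recognized failure type to an automated remediation.
REMEDIATIONS: Dict[str, Callable[[Alarm], str]] = {
    "disk": replace_disk,
    "cpu": reboot_host,
    "nic": reboot_host,
}


def handle(alarm: Alarm) -> str:
    """Run the matching remediation; unrecognized failures go to a human.

    The escalation branch is an assumption of this sketch, not something
    the article attributes to FBAR.
    """
    action = REMEDIATIONS.get(alarm.component)
    if action is None:
        return f"{alarm.host}: unrecognized {alarm.component} failure, escalated to on-call"
    return action(alarm)


if __name__ == "__main__":
    for a in [Alarm("web042", "disk"), Alarm("db007", "psu")]:
        print(handle(a))
```

Keeping the mapping in a table means supporting a new failure type takes one new entry rather than a new runbook for a human to follow.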
Admittedly, these are trivial tasks, hardly glamorous candidates for automation. But automating them frees Facebook engineers to work on higher-level problems.
Facebook has a long list of menial tasks it is looking to automate in the months to come. Take clusters, for example: groups of servers that each perform a certain function in the infrastructure. Clusters require a lot of configuration, from installing software to making sure the right components are connected to one another. Back in 2009, this process was done manually; Facebook literally wrote on a whiteboard to assign cluster jobs to particular groups of engineers. The process was time-consuming and, more importantly, error-prone. With automation, that worry goes away, since the work gets done the same way every single time.
In short, automation frees Facebook engineers from simple but time-consuming tasks, letting them think about future projects rather than what they have already built. That matters because most tech companies invest the bulk of their time in pushing technology forward. If employees spend long stretches on humdrum tasks, they stop learning, which leads to burnout and dissatisfaction. Companies therefore need to keep automating their systems so that work does not become menial, boring, or repetitive for their employees.
Companies must organize their people so that the group focused on continual improvement and the group focused on cost-effectiveness stay aligned. Consider Facebook’s product organization, which spans the middleware and backend engineering teams, the operations teams, the security team, and the IT team. Back in 2009, it was fairly normal to have so many separate teams. Facebook eventually realized, however, that the arrangement was creating inefficiencies and slowing people down. Teams could not make the best decisions: some optimized for short-term cost, others for the long term, and many operations had no automation at all. Facebook had to rethink its entire operational structure and the kind of people it wanted to hire, and break down every possible wall between teams. Separate teams are difficult to manage, and a steady stream of daily interruptions distracts from long-term goals. You need the right team, with the expertise and skills the work requires.
Automation has many advantages. To begin with, it makes work much more transparent. Nobody likes knowing that a team is working behind closed doors on something that might one day replace their product and their team. Transparency ends those worries, and the core business team can go back to focusing on more important issues.