CNET News Video
Scaling fast-growing Facebook
Jonathan Heiliger, vice president of technical operations at Facebook, talks with CNET News.com Editor in Chief Dan Farber about devising the infrastructure to support the social network's hypergrowth.
[ background music ]
>> I'm Dan Farber, editor in chief of CNET News.com, and I'm joined by Jonathan Heiliger, who is the vice president of technical operations at Facebook. Facebook is up to about eighty million users right now, and I think you're adding two hundred and fifty thousand users per day. How are you keeping the wheels on the train here?
>> Well, we're adding a lot of infrastructure and adding a lot of servers, constantly looking at how we can improve the user experience, and the time it takes to generate pages and send those pages down to our users. So it's a never-ending game.
>> So how many servers are you up to now?
>> We're up above ten thousand servers today.
>> And what's the basic architecture for keeping those servers delivering the information at, you know, low latency to your user base?
>> So our site is similar to a typical three-tier web architecture. We run a collapsed web and [inaudible] on the top, which is Apache and PHP, both tremendous open source projects. The mid tier is memcached, which is an open source in-memory distributed cache, and then the data is persisted and stored in MySQL databases. Again, open source technology. Then around that we have a number of other applications that we've developed in house. Chat is one of the ones we launched recently, and we have some search functionality and all of our data analysis that runs alongside as well.
>> Are you mostly using open source?
>> Mostly using open source. We are tremendous believers in the open source community and in a number of open source projects, and we endeavor to contribute back any number of enhancements to those projects. We've also released some of our own technology into the open source community. Last year, in 2007, we released Thrift, which is a set of RPC, it's basically a language-independent network stack that people can use.
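[ Editor's note: the three tiers Heiliger describes above (a web tier, memcached in the middle, MySQL for persistence) are commonly wired together with a cache-aside read path. The sketch below is only an illustration under stated assumptions, not Facebook's code: plain Python dictionaries stand in for memcached and MySQL, and the class name `ThreeTierStore`, its methods, and the `db_reads` counter are hypothetical. ]

```python
from typing import Dict, Optional


class ThreeTierStore:
    """Toy cache-aside read path; all names here are hypothetical."""

    def __init__(self, database: Dict[str, str]) -> None:
        self.cache: Dict[str, str] = {}  # stand-in for the memcached tier
        self.database = database         # stand-in for the MySQL tier
        self.db_reads = 0                # counts how often the database is hit

    def get(self, key: str) -> Optional[str]:
        # 1. Try the in-memory cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. On a miss, fall back to the persistent store.
        self.db_reads += 1
        value = self.database.get(key)
        # 3. Populate the cache so the next read skips the database.
        if value is not None:
            self.cache[key] = value
        return value

    def set(self, key: str, value: str) -> None:
        # Write to the persistent store, then invalidate the cached copy
        # so a later read cannot return stale data.
        self.database[key] = value
        self.cache.pop(key, None)


if __name__ == "__main__":
    store = ThreeTierStore({"user:1": "Dan"})
    print(store.get("user:1"), store.db_reads)  # first read goes to the database
    print(store.get("user:1"), store.db_reads)  # second read is served from cache
```

In this sketch, repeated reads of the same key reach the database only once, which is the reason a cache tier helps with page-generation time.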
>> Now what kind of challenges are you running into in terms of scaling this fast, in terms of growing at this hyper speed?
>> Well, it's almost a new challenge every day. When I joined the company late last year, we were looking at space challenges and power challenges. Earlier this year we were looking at the challenges of growing and scaling CPU time, the amount of processor time it was literally taking for us to generate all of these pages. When I go into the office later today, I'm sure I'll have another new challenge to look at.
>> Now you're doing about, as I said, fifty thousand transactions per second, is that right?
>> That's right.
>> So what are some of the issues you're having in terms of making sure that the site performs at the kind of user expectation level, which is that it's not too slow? I mean, that's always the problem I used to have with Facebook, that it was really kind of slow. So what have you done to really speed it up?
>> We've done a couple of things. Mostly we've done optimizations in our application, basically just to make that application perform better. One of the most recent examples of that, which I think even you've observed, is that we've Ajaxed (you know, creating new verbs here in the web world) a fair portion of the site, which has turned synchronous calls and synchronous page loads into a set of asynchronous calls, basically things that can happen in the background. So for example, when you're looking at your friend's photo album, we'll load those photos asynchronously and pre-emptively. Or when you're going to add a new friend, we'll process that friend request and refresh the page to the user, while we stitch the friend relationships together in the background.
>> Now you've got something like three hundred and twenty million dollars in venture capital, and you're generating hundreds of millions in revenue.
I also understand you took out a hundred million dollar loan for some reason, to go help build data centers. Are you going to build your own data centers, or depend on these external providers that you're using today?
>> So we've looked at building our own data centers, actually, and it's one of the things we're still considering doing. We haven't completely formulated our plan there yet, but we're constantly in the market looking at space, primarily on the West Coast and on the East Coast. And we think building a data center could make a lot of sense as our demand continues to increase. And you know, the notion there is just that the fewer people you have in the pipeline or the supply chain, the lower the cost ends up being.
>> Now you're having a lot of growth outside the U.S. Are you planning to build any data centers, or equip any data centers, outside the U.S.?
>> So we haven't yet decided to build any data centers outside the U.S. We do work with CDN providers, content distribution networks, to distribute a tremendous amount of static content out everywhere around the world. And we're also looking at perhaps extending our network outside of the U.S. to [inaudible] the performance.
>> Now some of your, I guess they wouldn't be competitors, but people who are doing similar things to you, which would be like a Google or Amazon, are opening up their infrastructure to third parties. So, come and run your applications on our cloud. Are you interested, or do you have any plans to do a similar kind of service?
>> Again, that's one of the things we've looked at and talked extensively about. We think Amazon has done a tremendous job with S3 and EC2, and most recently with their online database that they've opened up for people, which is based on an internal project they developed at Amazon called Dynamo. It's one of the things we've yet to decide.
>> Now most of your applications, you know, tend to spit out a lot of data.
In other words, if you have one person, and they have a hundred friends, those hundred friends have, you know, a hundred objects or something. So how do you keep up with just this growing amount of data that has to be actually served pretty much on a dynamic basis?
>> Yeah, it not only has to be served dynamically, but as you said, it has to be served very quickly in order to delight a user. And that's really inherent in the architecture we've chosen, and we continue to evolve that architecture of our infrastructure, which is that all of that data is very distributed. We use memcached and MySQL predominantly, and have thousands of MySQL instances storing all this data in a very distributed fashion, so that as you're looking at the data for your hundred friends, and I'm looking at the data for my hundred friends, we don't collide and all try to access the same disk or the same database at the same time.
>> Great. Well Jonathan, thanks for speaking with me.
>> It was a pleasure, thanks for having me, Dan.
>> I've been speaking with Jonathan Heiliger, who is the vice president of technical operations at Facebook. For CNET News, I'm Dan Farber. Thanks for watching.
[ music ]
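[ Editor's note: Heiliger's final answer describes spreading data across thousands of MySQL instances so that different users' requests rarely land on the same database. The interview does not say how the data is partitioned; the sketch below assumes simple modulo sharding on a numeric user ID, purely as an illustration. `NUM_SHARDS`, `shard_for`, and `group_by_shard` are hypothetical names. ]

```python
from typing import Dict, List

NUM_SHARDS = 4  # hypothetical; the interview mentions thousands of instances


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to the index of the database instance holding its data."""
    return user_id % num_shards


def group_by_shard(user_ids: List[int],
                   num_shards: int = NUM_SHARDS) -> Dict[int, List[int]]:
    """Group a list of friend IDs by shard, so each database is queried once."""
    groups: Dict[int, List[int]] = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid, num_shards), []).append(uid)
    return groups


if __name__ == "__main__":
    # Two viewers loading different friend lists touch different shards.
    print(group_by_shard([1, 5, 9]))
    print(group_by_shard([2, 6, 10]))
```

Under this assumption, a page that needs data for a hundred friends fans out into at most one lookup per shard, and two users with disjoint friend sets tend to hit different databases.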