Facebook uses a NoSQL graph API called TAO that runs on sharded MySQL. Facebook started off with MySQL databases, but its data requirements grew too large to query these databases directly. TAO turns the existing sharded MySQL master-slave pairs into a scalable, geo-distributed database cluster. Objects and associations are stored persistently in the same MySQL instance and cached on the same set of servers.
Earlier, Facebook used the InnoDB storage engine to handle social activities and RocksDB, its own custom-built database, for certain data storage needs.
Facebook, the largest social media platform, has reported over 2.91 billion monthly active users. Such a huge user base requires extensive data storage.
I’ll discuss the various databases Facebook uses to store data. Keep reading!
TAO
Facebook developed TAO, a data model and API, to manage their social network connections efficiently. This geographically dispersed database can rapidly process data requests, handling Facebook’s demanding workloads. TAO operates on thousands of machines, stores petabytes of data, and manages a billion reads and millions of writes every second. It even replaces memcache in some scenarios.
You can read the TAO paper on the Facebook research website. It covers the system architecture.
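To make the idea of objects and associations concrete, here is a minimal in-memory sketch. It is not Facebook's implementation: the class name TinyGraphStore and the object methods are hypothetical, and the association method names only loosely echo the operations described in the TAO paper. Sharding, caching, and persistence are left out entirely.

```python
import time
from collections import defaultdict


class TinyGraphStore:
    """Toy store of objects (nodes) and associations (typed, directed edges)."""

    def __init__(self):
        self._next_id = 1
        self._objects = {}                # id -> (otype, data)
        self._assocs = defaultdict(dict)  # (id1, atype) -> {id2: (time, data)}

    def obj_add(self, otype, data):
        # Hypothetical helper: allocate an id and store the object.
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = (otype, dict(data))
        return oid

    def obj_get(self, oid):
        return self._objects.get(oid)

    def assoc_add(self, id1, atype, id2, ts=None, data=None):
        # Adding the same edge twice overwrites the earlier entry.
        stamp = time.time() if ts is None else ts
        self._assocs[(id1, atype)][id2] = (stamp, dict(data or {}))

    def assoc_count(self, id1, atype):
        return len(self._assocs.get((id1, atype), {}))

    def assoc_range(self, id1, atype, pos, limit):
        """Return up to `limit` edges starting at offset `pos`, newest first."""
        edges = self._assocs.get((id1, atype), {})
        ordered = sorted(edges.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(id2, ts) for id2, (ts, _) in ordered[pos:pos + limit]]


store = TinyGraphStore()
alice = store.obj_add("user", {"name": "Alice"})
bob = store.obj_add("user", {"name": "Bob"})
post = store.obj_add("post", {"text": "Hello"})
store.assoc_add(post, "liked_by", alice, ts=100)
store.assoc_add(post, "liked_by", bob, ts=200)

like_count = store.assoc_count(post, "liked_by")
latest_liker = store.assoc_range(post, "liked_by", 0, 1)
```

Queries such as "how many likes does this post have?" or "who liked it most recently?" become a count or a range lookup on a single (object, association type) pair, which is the kind of read a social feed issues constantly.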
MySQL
MySQL is a free, open-source database management system that’s popular for its speed, reliability, and ease of use. Facebook uses MySQL as its primary database to store user data.
MySQL is a relational database system that arranges data into tables and columns. It supports many features like transactions, foreign keys, views, triggers, and stored procedures. MySQL also offers a rich set of APIs for accessing its functionality from your code.
Facebook uses MySQL for its high performance and mature feature set. It suits large, interactive applications that need rapid response times.
- Primary database: MySQL
- Known for speed & reliability
- Supports large, interactive applications
InnoDB
Facebook used the InnoDB storage engine for managing social activities like likes and comments. InnoDB is a free, open-source storage engine for MySQL, originally created by Innobase Oy, a company later acquired by Oracle Corporation. It has been MySQL's default storage engine since version 5.5.
InnoDB supports transactions and row-level locking, so many users can read and write the same table at once without corrupting data. It also recovers from crashes more reliably than MySQL's older MyISAM engine.
Facebook chose InnoDB for its stability and reliability in handling large data volumes. InnoDB supports foreign keys, which enforce referential integrity between tables. This is crucial for Facebook to maintain data consistency across tables.
- InnoDB for social activities
- Supports transactions & row-level locking
- Handles large data & foreign keys
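The transaction and foreign-key behavior described above can be sketched in a few lines. This example uses Python's built-in sqlite3 module as a stand-in so it runs without a database server. It is not InnoDB, and the posts and likes tables are hypothetical. SQLite also has no row-level locking, so that part is not shown.

```python
import sqlite3

# SQLite stands in for MySQL/InnoDB here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE likes ("
    " post_id INTEGER NOT NULL REFERENCES posts(id),"
    " user_id INTEGER NOT NULL,"
    " PRIMARY KEY (post_id, user_id))"
)
conn.execute("INSERT INTO posts (id, body) VALUES (1, 'Hello')")
conn.commit()

# A like on an existing post is accepted and committed.
with conn:
    conn.execute("INSERT INTO likes (post_id, user_id) VALUES (1, 42)")

# A transaction that fails part-way is rolled back as a whole.
fk_rejected = False
try:
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO likes (post_id, user_id) VALUES (1, 43)")
        conn.execute("INSERT INTO likes (post_id, user_id) VALUES (999, 43)")  # no such post
except sqlite3.IntegrityError:
    fk_rejected = True

like_rows = conn.execute(
    "SELECT post_id, user_id FROM likes ORDER BY user_id"
).fetchall()
```

The second transaction contains one valid insert and one that points at a missing post. Because the foreign key rejects the bad row and the transaction rolls back, neither insert survives, so the likes table never refers to a post that does not exist.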
RocksDB
RocksDB is a fast and efficient open-source database developed by Facebook (Meta). Its main advantage over InnoDB is space efficiency: it compresses data by default and stores it in a log-structured merge-tree (LSM), which reduces random disk writes.
Facebook needed a solution that could manage its vast data. So, they created RocksDB. Some benefits of RocksDB are:
- RocksDB uses compression by default, which means it takes up less disk space.
- It is designed to be fast and efficient. It uses a log-structured merge-tree (LSM) data structure, which buffers writes in memory and flushes them to disk sequentially, reducing random disk writes.
- RocksDB is highly configurable. You can tune it to your specific needs.
- It is high-performing and adaptable, supporting basic operations (get, put, delete) as well as advanced ones such as merge operators and snapshots.
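The LSM idea from the list above can be illustrated with a toy sketch. This is not how RocksDB is implemented: the TinyLSM class and its methods are hypothetical, the "disk" is a Python list, and keys and values are assumed to be strings without tabs or newlines. It only shows the pattern of buffering writes, flushing them as sorted compressed runs, and merging runs later.

```python
import zlib


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable, compressed sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.runs = []  # newest run last; each run is a zlib-compressed blob

    @staticmethod
    def _write_run(items):
        lines = [f"{k}\t{v}" for k, v in sorted(items.items())]
        return zlib.compress("\n".join(lines).encode())

    @staticmethod
    def _read_run(blob):
        text = zlib.decompress(blob).decode()
        return dict(line.split("\t", 1) for line in text.split("\n") if line)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # One sequential write of sorted data instead of many in-place updates.
            self.runs.append(self._write_run(self.memtable))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for blob in reversed(self.runs):  # newest run wins
            run = self._read_run(blob)
            if key in run:
                return run[key]
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest value per key."""
        if not self.runs:
            return
        merged = {}
        for blob in self.runs:  # oldest first, so newer values overwrite older ones
            merged.update(self._read_run(blob))
        self.runs = [self._write_run(merged)]


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")  # memtable is full, so it is flushed as run 1
db.put("a", "3")
db.put("c", "4")  # flushed as run 2
runs_before = len(db.runs)
db.compact()
runs_after = len(db.runs)
value_a = db.get("a")
```

Writes never modify an existing run, which keeps them sequential. The trade-off is that a read may have to check several runs until compaction merges them, which is why compaction matters so much for LSM-based stores.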
Although RocksDB has many benefits, it lacks features such as replication support and an SQL layer, both crucial for Facebook. To address this, Facebook built MyRocks, a MySQL storage engine based on RocksDB. MyRocks helped cut storage use by about 50% compared with their InnoDB setup.
Some benefits of MyRocks include:
- MyRocks compresses data very efficiently, so it takes up less disk space.
- It offers faster replication because it does not need random reads when applying changes, a feature known as read-free replication.
- When bulk loading, it can write data directly to the bottommost level of the LSM tree and skip compaction, which makes data loading faster.