Twitter to Replace MySQL with Cassandra
The news that Twitter and Digg are replacing MySQL with Cassandra has become very popular.
However, for me it was also interesting to find out how Twitter's engineers conducted their research and what other solutions they considered.
For example, these are some of the questions they asked when evaluating a potential tool:
- How will we add new machines?
- Are there any single points of failure?
- Do the writes scale as well?
- How much administration will the system require?
- If it's open source, is there a healthy community?
- How much time and effort would we have to expend to deploy and integrate it?
- Does it use technology which we know we can work with?
… and so on.
They evaluated a range of databases against these questions: HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable.
Another interesting fact is that Twitter's engineers actually test in production.
For instance, to roll out the new data store they:
- Write code that can write to Cassandra in parallel to MySQL, but keep it disabled
- Slowly turn up the writes to Cassandra (it can be done by user groups like “turn this feature on for employees only” or by percentages “turn this feature on for 1.2% of users”)
- Find a bug 🙂
- Turn the feature off
- Fix the bug and deploy
- GOTO #2 (slowly turn up the writes again)
Eventually 100% of writes are doubled, going to both MySQL and Cassandra.
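The rollout loop above can be sketched roughly as follows. This is only an illustration of the general idea, not Twitter's code: the names `DualWriteRollout` and `save_tweet` are made up, plain Python lists stand in for MySQL and Cassandra, and hashing the user ID into a percentage bucket is an assumption about how "turn this feature on for N% of users" might be done.

```python
import hashlib


class DualWriteRollout:
    """Hypothetical feature flag: an allow-list of groups plus a percentage dial."""

    def __init__(self, percent=0.0, allowed_groups=()):
        self.percent = percent                    # 0.0 keeps the feature disabled
        self.allowed_groups = set(allowed_groups)

    def _bucket(self, user_id):
        # Stable hash, so a given user always lands in the same bucket.
        digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
        return int(digest, 16) % 10000 / 100.0    # value in [0.0, 100.0)

    def enabled_for(self, user_id, group=None):
        if group is not None and group in self.allowed_groups:
            return True
        return self._bucket(user_id) < self.percent


def save_tweet(tweet, primary, shadow, rollout, group=None):
    """Always write to the primary store; mirror the write if the flag is on."""
    primary.append(tweet)                         # stands in for the MySQL write
    if rollout.enabled_for(tweet["user_id"], group):
        try:
            shadow.append(tweet)                  # stands in for the Cassandra write
        except Exception:
            # A failing shadow write must not break the primary path.
            pass


mysql, cassandra = [], []
rollout = DualWriteRollout(percent=0.0, allowed_groups={"employees"})
save_tweet({"user_id": 1, "text": "hello"}, mysql, cassandra, rollout)
save_tweet({"user_id": 2, "text": "hi"}, mysql, cassandra, rollout, group="employees")
rollout.percent = 100.0                           # dial turned all the way up
save_tweet({"user_id": 3, "text": "hey"}, mysql, cassandra, rollout)
```

Turning the feature off after a bug is found then amounts to setting `percent` back to `0.0` and emptying the allow-list, while the primary write path stays untouched throughout.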
Read more details in the interview where Twitter Storage Team Lead Ryan King talks about the reasons for the switch and how the team plans to migrate tweets from MySQL to Cassandra.