Database sharding



1   Overview
2   Initial server setup
3   Enabling remote MySQL access
4   Spock Proxy
5   MySQL Proxy
6   Shard Query
  Related pages

1. Overview.

This document is intended for Linux and BSD developers. If you notice mistakes, write to:

“Sharding” is an approach to database organization that involves splitting database tables in such a way that they can be stored, transparently from the perspective of access or at least with reduced effort, across multiple database servers.

MySQL-based ventures may be interested in sharding techniques as it is said that MySQL doesn't scale well. However, MySQL sharding, and possibly sharding per se, may be useful primarily for niche applications:

  1. MySQL works reasonably well for most datasets below very large sizes.

  2. Ventures with very large datasets are moving to non-MySQL frameworks that are designed from the start to scale well.

  3. Sharding has some drawbacks: Higher I/O. “join” operations may not be practical. Backups, server updates, etc., must be coordinated across shards. Changes to application level software may be required.

I started to look at FOSS-based approaches to sharding in late 2012 with a focus on MySQL-compatible approaches. This document discusses steps taken through the most recent pause on December 2, 2012.

Conclusions to date: The FOSS components examined so far don't provide production-ready sharding solutions. However, these components and others that I've read about will probably be useful after some additional work was done.

Shard Query, in particular, is promising. The framework has apparently been tested in a few contexts. I used it myself to set up a virtual MySQL server that consisted of two actual MySQL servers running on different hosts; in fact, in different countries. In a smoke test, the virtual MySQL server seemed to operate correctly.

2. Initial server setup.

The first step for sharding experiments was to prepare a Linux development system, which I'll refer to as Bacon. The goal was to produce a reasonably up to date Ubuntu server that had a LEMP stack and packages suitable for general development.

For the procedure that I used, or a reasonably compatible procedure, click here.

3. Enabling remote MySQL access.

After doing some research, I decided that I'd run tests that involved MySQL servers located on at least two hosts: the development system set up previously, referred to as Bacon, and a second system that would act primarily as a database server. I'll refer to the second system as Eggs.

The idea was that the Bacon system would hold one set of shards and the Eggs system would hold another set.

I installed a MySQL server on the Eggs system and configured it to allow access by remote MySQL clients located on the Bacon system and/or on my personal machine.

For the procedure that I used, or a reasonably compatible procedure, click here.

4. Spock Proxy.

Spock Proxy is a “fork” (or descendant) of another sharding-related package known as MySQL Proxy that I'll discuss shortly.

Spock Proxy has an advantage over MySQL Proxy from a sharding perspective: Spock Proxy is designed specifically to support sharding. MySQL Proxy is not.

I tried to build Spock Proxy on the Bacon system. Both the latest release version and the latest Subversion sources produced multiple “build” errors related to issues such as outdated format, missing header files, and invalid data type conversions.

The errors appeared to be patchable. However, they weren't a positive sign. Additionally, Spock Proxy appeared, in general, to be a single-developer project that had been abandoned.

I concluded that Spock Proxy might need significant effort before it was useful for serious work. I've put the software aside for now but I may revisit it at a later date.

5. MySQL Proxy.

MySQL Proxy is a software interface layer that runs between a MySQL client and one or more MySQL “back end” servers. It can be configured so as to act as a single virtual MySQL server that happens to distribute its data internally across multiple back ends. This makes it a possible candidate for the basis of a sharding framework.

I decided to install MySQL Proxy on the Bacon system. For the procedure that I used, or a reasonably compatible procedure, click here.

6. Shard Query.

Shard Query is PHP-based software that supports MySQL Proxy-based sharding. I decided to install the software on the Bacon system. For the procedure that I used, or a reasonably compatible procedure, click here.

For acknowledgments related to CSS and other code used, click here.

Linked or embedded works (software, books, articles, etc.) are presented under their own licenses.

Other unique content on this page, excluding screenshots, is distributed under the following license: C.C. Attribution NonCommercial ShareAlike 3.0. In some cases, screenshots are distributed under the same license. In other cases, the associated program's license applies.

image Valid XHTML 1.0