Deprecated: preg_replace(): The /e modifier is deprecated, use preg_replace_callback instead in /share/CACHEDEV1_DATA/Web/www/libraries/UBBcode/text_parser.class.php on line 228

Deprecated: preg_replace(): The /e modifier is deprecated, use preg_replace_callback instead in /share/CACHEDEV1_DATA/Web/www/libraries/UBBcode/text_parser.class.php on line 228

Deprecated: preg_replace(): The /e modifier is deprecated, use preg_replace_callback instead in /share/CACHEDEV1_DATA/Web/www/libraries/UBBcode/text_parser.class.php on line 228

Deprecated: preg_replace(): The /e modifier is deprecated, use preg_replace_callback instead in /share/CACHEDEV1_DATA/Web/www/libraries/UBBcode/text_parser.class.php on line 228
Benchmarking filestorage in MySQL

Comments Blog About Development Research Sites

Benchmarking filestorage in MySQL

Jul 16, 2010
It is generally considered bad practice to store binary data in a database. There are exceptions, of course, but usually files should stay on a filesystem. In high performance scenarios it pays to put them on a static host with a special lightweight webserver and all the client caching directives you can find activated.

Since there is a lot of speculation going on, I figured I would run some benchmarks and get some hard data on how long it actually takes to retrieve binary data through various methods.

Test Environment
The test-data consists of 9 images ranging in size from 891 bytes to 55.242 bytes, with the largest image repeated to make a total of 10 files. These are taken at random from a website to keep the dataset as real as possible.

Each request is made through the webserver, running Apache2.2 with PHP5.2 installed as module. Both MySQL (5.1.40) and Memcached (1.2.1) interfaces are installed as Apache modules as well. The operating system used is Windows Server 2003, on a Duron1800+ CPU with 2Gb memory and a RAID1 SATA disk array.

The files are stored as ID's in the same folder as the PHP scripts. The MySQL table uses innoDB as storage engine and contains an 'id' column (primary key index), for memcache data is stored as id => bindata pairs.

Tests are performed using a PHP script that requests each of the ten images in turn using file_get_contents, for 100 loops. Thus a total of 1000 file requests are made per retrieval method. Results are averaged over several runs to reduce noise in the data.

Non-persistent Database
A connection is opened to the database, the correct database is selected and a very simple query is used to retrieve the image data and output it. The connection is closed at the end of each run: this should also happen automatically at termination but this ensures no other connection caching mechanisms interact. Furthermore, the example scripts on php.net always close the connection as well.
Code (php) (nieuw venster):
1
2
3
4
5
6
$link   = mysql_connect('127.0.0.1', 'user', 'pass');
          mysql_select_db('media', $link);
$result = mysql_query('SELECT * FROM `images` WHERE `id` = ' . (int) $_GET['id'], $link);
$row    = mysql_fetch_assoc($result);
echo
$row['data'];
mysql_close();

Average run time: 19.692s, 19.438s, 19.429s.

Persistent Database
Similar to the normal database retrieval scenario, using persistent database connections instead of non-persistent ones. Mostly included to give a measure of performance difference between normal and persistent connections in high-load scenarios.
Code (php) (nieuw venster):
1
2
3
4
5
$link   = mysql_pconnect('127.0.0.1', 'user', 'pass');
          mysql_select_db('media', $link);
$result = mysql_query('SELECT * FROM `images` WHERE `id` = ' . (int) $_GET['id'], $link);
$row    = mysql_fetch_assoc($result);
echo
$row['data'];

Average run time: 18.281s, 17.841s, 17.906s.

Webserver direct filesystem access
Using no PHP script, the files are requested directly through the webserver. This emulates a scenario where a fileserver is used for static content, though with Apache instead of a more lightweight webserver (such as LightTPD) and no client-cache or content compression.

Average run time: 14.238s, 14.088s, 14.102s.

Filesystem through PHP
To test the effect of using a PHP script to load and output a file, a simple script is used to load and output a file's contents. The readfile method directly outputs the filecontents to the output buffer.
Code (php) (nieuw venster):
1
readfile((int) $_GET['id']);

Average run time: 15.659s, 15.883s, 15.647s.

Memcache
Minimum overhead can be achieved by loading data directly from the servers memory. Memcache allows us to do just that, though of course there is a cost in opening a connection to the memcache daemon. The script opens a connection and retrieves a (previously) stored value containing the binary data.
Code (php) (nieuw venster):
1
2
3
$memcache      =  new Memcache;
$memcache      -> addServer('127.0.0.1', 11211);
echo
$memcache -> get((int) $_GET['id']);

Average run time: 16.290s, 16.354s, 16.300s.

Summary
Taking the average for each retrieval method we obtain the following result:



Interesting points to note:

Opening and closing a thousand database connections takes about 1.5 seconds.
Yes, persistent connections are consistently faster in multiple-request situations, but the difference is only about 8% in the most ideal situation (one very simple query per request), or 0.0015 second per request on average.
Invoking a PHP script is quite costly.
It takes 11% longer to retrieve a file through PHP instead of using Apache directly.
Somewhat surprisingly: filesystem access is about 4% faster than memcache.
Though this might be due to connection overhead, it would be interesting to see what happens with several file retrievals per request.


It should be clear from this that in very simple situations, the added cost of invoking a database connection (or even using an already established connection) results in a significant delay in response time. What is perhaps even more important, there appears to be no way to beat filesystem access. Even loading files directly from memory is slower than reading them in through PHP! Suffice to say, using a database for (simple) file storage is probably not the best idea ever if you want speed.

Update
Just to clarify: these results are limited to a single DB architecture on a single system. I'll be interested to know if anyone can get different results - so please post if you do! The test scripts I used, including image data, can be found here.

FragFrog out!

Jul 19, 2010 Vynsan

Thats actually quite interesting, how would you speed things up a little more though?

For example, say if your sever was required to work under a heavier load, what would be the ideal setup to use, well maintaining secuirty and such?

I always thought, php would be the best for requesting files, but as you've shown, it isn't, wouldnt ftp be a little faster? Or maybe even running the request location from a small ram disk, after all ram is the fastest storage around at the moment (apart from ssd ofcourse)

Regards,
Vynsan

New comment

Your name:
Comment: