Sujay Koduri's Blog: June 2007

Vic (CTO, Vendio), John (NOC Engineer, Vendio) and I have attended the Silicon Valley MySQL meet up that happened at Google’s office yesterday. It was organized by Jeremy Cole who is regarded very high in the MySQL community. (I guess most of you might have heard about him before). The session was mostly a Q&A kind of thing where Jeremy answered all the questions. (Lot of questions must be repetitions for him as we (I think some more guys also!!) are attending it for the first time but he was very generous in sharing things)

Below are the things which I found interesting from the session.

a) Question 1: How good is MySQL to store images? (Same as, how good is MySQL for blobs)

Though there is no straight forward answer on how many images (size??) will MySQL hold comfortably, he is saying that we can safely rely on MySQL if the size of blob is in KB’s If the size of blob is in MB’s, then MySQL may not be right choice. One of the main reasons behind this is that MySQL doesn’t use streaming to send blobs across the network. So if there is a blob of size 1G, then it requires 1G (or more?) on server and 1G (or more?) on client too to effectively transfer the blobs. MySQL is trying to address this problem by implementing streaming in its future releases.

For storing big blobs, the other open source tool that we can really look at is MogileFS. This is a very good system for Blobs it seems.

http://danga.com/mogilefs/

b) Question 2: How good is MyISAM compared to INNODB for applications where the data is loaded only once and don't change for ever?

The one thing that MyISAM suffered from all the time is the regular data corruption. But if we have a backup of the data in the file system which we can use at any time to load it back to the DB w/o taking much of time, we can safely use MyISAM. Also one other way to cope up with data corruption easily is, at the time when we are loading the data to Research DB, try to load the same data into a backup MySQL DB and incase of any corruption we can simply override the corrupted files from the backup.

c) Question 3: How effective using Query Cache would be?

There is one very interesting point that Jeremy has made here. Query cache flushes out a query if the table on which that query operates changes. It doesn't matter whether some thing has changed with respect to the rows it has stored or not. This clearly tells that Query Cache is not an option for any application that does decent writes to the DB. It just adds an additional overhead of searching in Query Cache for every query unnecessarily. Please turn off Query Cache or add the SQL_NO_CACHE hint to all the required select queries, if you still have some compelling reasons to go for a Query Cache.

d) Question 4: What's the best way to go about backups?

Jeremy has suggested using LVM snapshots to take backups from INNODB. From his experience, INNODB hot back tool (Commercial tool from INNODB) is not recommended. From my experience, don't go for Zmanda as it is just a script on top of the things offered by MySQL. It is definitely not worth paying 500$ per license every year for this very basic s/w.

e) Question 5: What's the best way to setup MySQL Replication? (Dual-Master or Master-Slave?)

The straight answer is to go for a Dual Master setup. Two other important things he has made is that, always set read-only=1 and skip-slave-mater=1 on master. We have to this once the master comes up. The advantage of doing this is that, if for some reason master fails and we have switched to slave, and when the master comes back again, it should not accept any connections for write. And other very important point he has made here is that, in case of failure on master, never try to apply the lost transactions automatically rather do it manually by comparing the sequence numbers on slave and master. Its better to spend couple of minutes looking at what went wrong rather than spending sleepless nights if the automated script really screws something. :)

f) General Discussion:

a. Size of bin-logs can be around 768M. (Not too high or too low). From his experience, bin-logs of this size would take around 5-10mins for recovery.

b. Always give minimal space for MyISAM (64M ??) as MySQL internally uses MyISAM to store some system tables and also for creating any temp table, it uses MyISAM.

c. The o/p of show status in 4.* versions gives all global values for variables where as in 5.* to get the same o/p we have to use "show global status". "show status" now gives info only for that session.

d. Always set no-auto-rehash under the client section of the my.cnf. This makes sure that MySQL don’t try to create tab completion indexes for every new client connection.

e. There is a way to look at the life cycle of a query on MySQL. This works from 5.0.37 community and 5.0.42 enterprise onwards. Please go through the below link for extra information on this, http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html .

f. Always try to keep number of files in a directory in hundreds and not more than that as many FS’s including ext3 do a linear search of files inside a directory.

I am really impressed by the way Jeremy has handled the whole session and I am looking for the future events.

Hmm..Have to go back to work. :)

Sujay Koduri's Blog

Wednesday, June 13, 2007

Key points from June MySQL Meetup (MV chapter)

Proud MySQL Meetup member

Blog Archive

About Me