Date-storing tests in MySQL; Part 2

The results are in. First, there was a little thing I forgot about when describing the test to you earlier. Namely, storage size. The amount of storage required is also another thing to think about.

The relevant sizes of MySQL data types:

  • DATE: 3 bytes
  • INT: 4 bytes

Looking at my 3 ideas, one should be thrown out right away. Having three integers, even reducing them to 2 TINYINT’s (1 byte) and 1 SMALLINT (2 bytes), we are still talking 4 bytes, and thats with a lot of extra complexity. So that was out.

I generated a dataset with a little over 7 million rows, with data looking like:

[sourcecode language=”sql”]
mysql> SELECT * FROM crunchme LIMIT 10;
+————+————+———-+——–+———+——-+
| idcrunchme | a_date | tim_date | a_year | a_month | a_day |
+————+————+———-+——–+———+——-+
| 12111 | 2009-01-01 | 20090101 | 2009 | 1 | 1 |
| 12112 | 2011-03-03 | 20110303 | 2011 | 3 | 3 |
| 12113 | 2008-11-19 | 20081119 | 2008 | 11 | 19 |
| 12114 | 2009-07-16 | 20090716 | 2009 | 7 | 16 |
| 12115 | 2008-10-19 | 20081019 | 2008 | 10 | 19 |
| 12116 | 2009-04-22 | 20090422 | 2009 | 4 | 22 |
| 12117 | 2009-04-03 | 20090403 | 2009 | 4 | 3 |
| 12118 | 2011-06-10 | 20110610 | 2011 | 6 | 10 |
| 12119 | 2009-06-01 | 20090601 | 2009 | 6 | 1 |
| 12120 | 2008-08-24 | 20080824 | 2008 | 8 | 24 |
+————+————+———-+——–+———+——-+
[/sourcecode]

Let’s see what happens when we run some queries!

[sourcecode language=”sql”]
mysql>SELECT SQL_NO_CACHE count(idcrunchme) FROM crunchme WHERE tim_date >= 20100101 AND tim_date <= 20110514;

+——————-+
| count(idcrunchme) |
+——————-+
| 2388242 |
+——————-+
1 row in set (2.96 sec)

mysql>SELECT SQL_NO_CACHE count(idcrunchme) FROM crunchme WHERE a_date >= ‘2010-01-01′ AND a_date<=’2011-05-14’;
+——————-+
| count(idcrunchme) |
+——————-+
| 2388242 |
+——————-+
1 row in set (3.61 sec)
[/sourcecode]

So, the integer solution is faster, by roughly 22% at this level. However, here is where it gets fun. It is larger, by one byte. Also, it will mean having to add extra code when I’m working with dates in my CodeIgniter app.

After all this, I am going to go with the MySQL DATE type. I was surprised to see the time difference being so low.

I hope this helps someone else out!

Date-storing tests in MySQL; Part 1

Data, what fun it is!

My issue is this, I’m working on a web app, which uses CodeIgniter, with a MySQL DB. My question was, since all I am storing for a date time is the year, month and day, what is the best way to do this? I’m going to be making a lot of queries like “select all entries from user bob from March to May of this year”.

As I see it, there are several options:

  1. Use the DATE data type in MySQL, which stores as “YYYY-MM-DD”.
  2. Generate 3 attributes, each an integer, one “year”, one “month”, and one “day”.
  3. My new idea, to store it as an integer, in a “YYYYMMDD” format. It will allow for decent queries, since the hierarchy of time still works. (May still comes before March, 20110301 < 20110501).
  4. Use the time function in php, which returns the number of seconds since 1970. A lot of people go this route, but I think it’s crap because in 2038 it will restart, causing issues . Y2K38 people! I’m going to pretend you didn’t even bring this up.

I’m writing a script that will generate about 10.000.000 rows of data, each with every date recording option I mentioned above, then I will make some pretty bar graphs and we shall see what we shall see.

My guestimate is that my integer idea (“YYYYMMDD”) will be the fastest, but let’s see.

Maybe I don’t need to rush into the Cloud

The Cloud is everywhere it seems. And this is okay. There are times where it is so useful! For example, I love Rackspace.

Recently though, I’ve been thinking about where everything is going, how everything “needs” to be in the cloud. Specifically, music and files. The problem is, these services are not guarantees, so to “be safe” a user would want to have there data in several places. Have pictures on Flickr? Use the API to mirror it to a privately hosted site. Data? Better schedule to pull it down occasionally in case of an issue with the company.

Then, I had an “ah ha!” moment. The cloud is great, but if you want access to files,pictures, music, etc everywhere, isn’t it easier to use a USB flash drive? I realize it isn’t the new hotness, but it kind of does the job pretty well. I’m not saying the Cloud doesn’t have it’s uses. But maybe remember the modest USB flash drive, and what it can do.

Network connectivity issues, a baby, and books

I have spent all day at work troubleshooting a network issue on my machine. Finally have it working, though my method of fixing it isn’t exactly the best.

So frustrating to lose an entire day to a problem like this.

And tomorrow, a large portion of my day will be spent moving to a new cubicle. Doh! Well, maybe next week I can get some things done here at work.

Regarding personal projects, they are at a standstill at a moment. The arrival of our little boy A has been wonderful. It also requires a lot of work! It’s a delight to be with him, and watch him observe more of the world around him. The pooey diapers are just a bonus.

I have been doing a little reading. I finally got around to reading “I, Robot” by Isaac Asminov, and I enjoyed it. Encourages me to work on my short stories, since that is mostly what that book is comprised of. Just because it’s short doesn’t mean it isn’t worth doing!

Now I started “Atlas Shrugged” by Ayn Rand. About 60 pages in, and it’s making me want to play Transport Tycoon. Also, there are portions of her philosophy that I can agree with.