Monday, January 06, 2014

Time intervals and other ranges should be half-open

It is a good practice to treat time intervals as half-open inequalities: start <= x < end. Note the asymmetry.  This is how the Joda-Time API implements time intervals, and also how the SQL "overlaps" keyword works.  Also note that SQL "between" does not behave the same way.

The main reason this is important is to allow adjacent time intervals like 10:00-11:00 and 11:00-12:00, such that the instant of 11:00:00 falls in the second interval, but not the first.

It is a mistake to try turning the intervals into 10:00-10:59, etc. An instant like 10:59:01 would fall through the cracks between intervals, and end minus start would be 59 minutes rather than an hour.

With date or timestamp ranges, the same logic applies.  But there's a key difference: while end-users tend to think of time ranges intuitively as half-open, they often tend to think of date ranges as closed.  That is, users expect 10:00-11:00 and 11:00-12:00 to be adjacent and non-overlapping even though 11:00 is the end of one range and the start of another.  For dates, though, users tend to think of ranges like Jan 1-Dec 31 inclusive of the last day of the month/year.

So what to do?  There are two workable solutions.  One is to apply the same half-open inequality as before, adding a day to the user-specified end date.  So you would have something like 1/1/2013 <= x < 1/1/2014. Another is to truncate the input under test: 1/1/2013 <= trunc(x) <= 12/31/2013.  This gives the same results, because any instant up to and excluding 1/1/2014 will be included in the range.  The first approach is better because you don't have to remember to strip the time portion off of x before testing.

A wrong answer is to do something like 1/1/2013@midnight <= x <= 12/31/2013@23:59:59.  This assumes you know the exact precision of x, and there's a risk of a moment of time falling through the cracks again.  It's also just icky.

Integer ranges where the input under test is a decimal behave like dates and times, with the same options. Currency is a good example: you can use half-open ranges such that $100-200 and $200-300 are adjacent and non-overlapping and $200.00 falls in only the second range.  You could also have the user specify closed ranges like $100-199 and $200-299 and do the same thing with floor(x) or start <= x <= end + 1.  What you can't do is let $199.01 fall between the two adjacent intervals or let $200.00 match both ranges.

Incidentally, it seems like even when we're dealing with pure integer ranges, the common practice in Java is to use half-open intervals for things like substrings, and Python does the same for array slicing.  This seems to be the best practice for programming in general, for readability and minimizing chance of mistakes (see http://stackoverflow.com/questions/8441749/representing-intervals-or-ranges).

No comments: