Date and Time


The Clusterpoint Database can handle date-time information. For JavaScript use a special CP_Date object with specific behavior is constructed when necessary and global functions operating on such objects (and/or strings that can be parsed to form similar objects) are defined.

More specific details below.

Insertion

Upon document insertion every field that is a string gets examined for whether it looks like a date-time info. This can be turned off by adding a remove-type override for the data type DATETIME. Doing so will save time on text fields that are not dates or are not supposed to be treated as such.

The default parsing uses the ISO8601 extended format (hyphens, a 'T' and colons in between the digit groups) and checks that the string forms a legal Gregorian (or proleptic Gregorian) calendar date. Other formats can be specified by modifying the data model---each field can have its own format for parsing the dates it contains. For formats that do not specify the offset from the UTC time explicitly, timezone information may be important.  It, too, can be specified in the data model for each field, or else each node will default to whatever it finds in its /etc/timezone file.
More information about this will be provided below.

If the field has been modified and contains an add-type override for DATETIME, the checking for Gregorian calendar correctness is not performed – the date-time information found in the field (if it parses as a date-time at all, of course) is simply "normalized" (for example, February 31 will be understood as March 2 or March 3, depending on the year).  Internally, the date-time information is converted to "microseconds from the epoch (1970-01-01T00:00:00Z)". This allows expressing years from roughly -291000 to 294000.  (If the date-time information is treated by JavaScript, a conversion to floating point format is performed which may lessen the microsecond precision for years before 1700 or after 2250.  This does not happen for well-optimized queries.)
Fractional digits beyond microsecond are simply dropped from the strings upon parsing (rounding would require information on the best rounding method and is therefore not performed).

ISO8601 extended format

The default format for date-time information parsing is the extended ISO8601 format.  This section explains it through examples, rather than definitions or regular expressions.

The typical date-time information in ISO8601 extended format looks like:
'2015-12-25T11:30:05+02:00' or '+2015-12-25T09:30:05.012345Z'.  

Thus, it obviously contains a 4-digit year, 2-digit month, 2-digit day within a month to denote the date. All these are separated by hyphens. Time components are a 2-digit hour, a 2-digit minute and a 2-digit second (plus up to 6 fractional digits after a decimal dot), separated by colons. The time-part is separated from the date-part by a capital letter 'T'. The date-time parts (where the time part is not even necessary) are usually followed by the offset-from UTC information that typically looks like '+02:00' or '-05:00'. For the UTC time itself, '+00:00' can be used but 'Z' is a common shorter alternative (and is called the "Zulu" time).
Almost all of the components can be dropped (starting from the right) and the field will still be considered valid. 
The last remaining time component (i.e., hour or minute, not just the second) can have a fractional part.  Fractional parts get truncated to 6 digits for the second, 8 digits for the minute and 10 digits for the hour. Only one fractional part can be present---it must be in the last time component used (e.g., no second can follow a minute specification that involves a fraction).
Any missing components will be taken as '00' for time components and '01' for the month or the day-within-a-month.
Although technically legal (when fractional time components are used), strings longer than 44 symbols will not get matched against this format.

Examples:

Dropped offset minutes: '2015-12-25T11:30:05+02'.
Dropped offset (parsing will use the timezone for the field or the /etc/timezone default): '2015-12-25T11:30:05'.
Dropped offset and second: '2015-12-25T11:30'.
Dropped offset and second but minute has a fractional part: '2015-12-25T11:30.25'.
Dropped offset and minute: '2015-12-25T11'.
Dropped offset and minute but hour has a fractional part: '2015-12-25T11.5'.
Dropped offset and time part: '2015-12-25'.
Dropped offset and day of month: '2015-12'.
Dropped offset and month: '2015'.  This would be accepted only if a DATETIME override is added for the field.
For years that cannot be expressed by exactly 4 decimal digits,  a '+' or a '-' sign must be used and the year must contain between 4 to 6 decimal digits, e.g., '-290307'.

Offset information is independent from the date and time parts, thus, the following are also legal:
Dropped second: '2015-12-25T11:30+02:00'.
Dropped second and offset minutes: '2015-12-25T11:30+02'.
Dropped minute: '2015-12-25T11'.
Dropped minute and offset minutes: '2015-12-25T11+02'.
Dropped time part: '2015-12-25'.
Dropped time part and offset minutes: '2015-12-25+02'.
Dropped day of month: '2015-12'.
Dropped day of month and offset minutes: '2015-12+02'.
Dropped month: '2015'.  (Only with a DATETIME add-override.)
Dropped month and offset minutes: '2015+02'. (Only with a DATETIME add-override.)

Specifying a format and timezone

More flexibility with parsing and printing dates can be obtained by specifying a format string. Parsing and printing is linked, i.e., there is one format for both. Likewise for timezone---the same zone information is used for both parsing and printing. This information is first looked up in the data model. The simplest way to add it is to export the data model from the Clusterpoint DB GUI, add a few lines to the necessary field(s) and import it back (for getting rid of this info, one would have to export, edit, import and then restart the DB).
The lines to be added should look similar to these:

<datetime_timezone>America/Los_Angeles</datetime_timezone>

<datetime_format>%a %b %d %H:%M:%E*S %Z %Y</datetime_format>

Please make sure to add them within the <field>...</field> tags but not inside <statistics>...</statistics> or some other sub-tags of field.

The format string may contain everything that strftime() allows---please consult the UNIX manual page for details.
Because parsing and formatting date-time info is performed by the cctz library, the extensions provided by it are also allowed (a quote from its documentation):

%Ez  --- RFC3339-compatible numeric time zone (+hh:mm or -hh:mm)
%E#S --- Seconds with # digits of fractional precision
%E*S --- Seconds with full fractional precision (a literal '*')
%E#f --- Fractional seconds with # digits of precision
%E*f --- Fractional seconds with full precision (a literal '*')
%E4Y --- Four-character years (-999 ... -001, 0000, 0001 ... 9999)

Please be careful with timezone information in the format. It may be useful for printing dates but should not at the moment be used for parsing, as the underlying library ignores it. The proper timezone should be provided explicitly by specifying it for the field (or it will be taken from /etc/timezone). This is only an issue for dates that do not contain the offset-from-UTC information.

By the way, specifying "%s" for the format allows interpreting a string containing a UNIX timestamp as a datetime value. For sub-second precision use "%s.%E*f".

The timezone specification must be unambiguous, for example, "America/New_York" or "Europe/Berlin". Common timezone abbreviations such as, e.g., "CST" are not unambiguous (excerpt from Wikipedia.org):

CST Central Standard Time (North America)  UTC−06
CST China Standard Time UTC+08
CST Central Standard Time (Australia) UTC+09:30
CST Central Summer Time (Australia)             UTC+10:30
CST Cuba Standard Time                                   UTC−05

As a rough guide, anything that can be constructed taking a directory under /usr/share/zoneinfo and a file under that directory will likely pass as a usable timezone.

Using the stored date-time information

Using the fields that have been auto-detected or add-overridden to have the type DATETIME is a little tricky. Depending on the context, either their "microseconds-since-the-epoch" or "formatted-to-string" or even "original string as in the document" value can be used.

For printing, obviously the preference is for a string representation. In a SELECT clause where an '*' overrules the previous mention of a DATETIME field (if any), the string from the original document is used. The same is currently true for the fields containing arrays. Otherwise, the timezone and format specified in the data model would be used for properly formatting the field. If not given, they default to whatever is in /etc/timezone and to ISO8601 extended format, respectively. The conversion of a date-time to a string can also be explicitly requested by using an appropriate function. Information on functions will be provided below. The functions may allow to specify yet another timezone and format, BTW. One can print the numeric value in the SELECT clause by prefixing the date-time field with a '+' sign, or by calling an appropriate function.

In other contexts, for example, in a WHERE clause, a numeric value is usually preferred. I.e., by writing 'WHERE (date > 0)' one can filter the results down to all those documents where a date-time field named "date" contains dates later than the epoch (the New Year of 1970 in London, basically). This is a great way to filter out the documents that don't contain a "date" at all and those where "date" turns out to not be a date-time. However, there is a caveat. JavaScript object comparison rules are complicated and the preference for numeric values is in effect for the '>', '>=', '<' and '<=' comparisons. Testing for equivalence (or its negation) works differently and will typically result in the string value being used (formatted as described above (unless taken from the original document altogether)). Using a unary '+' to force a numeric value is a good method, however, equivalence tests might fail on documents not containing the field or where the field is not a date-time anyway.

Finally, while various tricks can be used to make the JavaScript use one or another representation of date-time-containing fields, they will likely create problems for the query optimizer. Therefore, the use of special functions that do conversions, comparisons and date-time component extraction is strongly suggested.

Functions

The date-time objects have built-in valueOf(), toString() and toJSON() functions.  The first one returns the numeric representation as seconds (including the fractional part) since the epoch, but is usually unnecessary and thus best avoided.  The latter two both end up calling the DATE_TO_JSON() function, which is preferred.

NOW(optional_timezone_string, optional_format_string) --- creates a new CP_Date (Clusterpoint date-time) object corresponding to the current moment in time. The optional parameters merely tie the provided timezone and format to it, so that it can be printed properly.

DATE(optional date_time_number_or_string, optional_timezone_string, optional_format_string) --- creates a new CP_Date (Clusterpoint date-time) object at run time.  If the first argument is a number it is interpreted as the timestamp from the epoch in seconds (including the fractional part, if any). If the first argument is a string, date parsing is performed.  Not really necessary because date_time comparison functions can parse strings and use timezone and format information themselves and date_time comparisons against timestamps are supported. If there are no arguments, it is equivalent to NOW().

DATE_TO_JSON(date, optional_timezone_string, optional_format_string) --- formats the CP_Date object as a string using the provided timezone and format, if any. (Otherwise uses the data model or /etc/timezone and the ISO8601 extended format.)  This will be called automatically when a string representation of a date is needed, so usually unnecessary.

DATE_CMP(date1, date2, optional_timezone_string, optional_format_string) --- compares the two given dates, either of which can be a CP_Date object (bound from a field, for example) or a string. The timezone and format information is for parsing the first two parameters if any of them is a string; it will be of no use for CP_Date objects. It is optional, however, as it can be taken from the other non-string parameter or perhaps is not necessary (e.g., ISO8601 with an explicit UTC-offset).  The return value is the difference between date1 and date2 in seconds (including a possible fractional part); it will be less than 0 if date1 is earlier than date2, it will be greater than 0 if date1 is later than date2, and will be exactly 0 when both dates are equal.  Usually it is more convenient to use the other comparison functions given below.

LOCALTIME(date, optional_timezone) --- interprets the date object according to the provided timezone (taking it from the data model or /etc/timezone, if not provided) and extracts its year, month, day_of_month, hour, minute and second components.  The return value is an array containing these 6 values.  Useful when several components need to be extracted.  Convenience functions for just one particular component are given below.

YQQWW(date, optional_timezone) ---  interprets the date object according to the provided timezone (taking it from the data model or /etc/timezone, if not provided) and computes its day_of_year, quarter, day_of_quarter, week_of_year, weekday values.  The return value is an array containing these 5 values. 
Useful when several of these values are needed.  Convenience functions for just one particular component are given below.

YEAR(date, optional_timezone) --- based on LOCALTIME(), returns just the year.

MONTH(date, optional_timezone) --- based on LOCALTIME(), returns just the month (1--12).

DAY(date, optional_timezone) --- based on LOCALTIME(), returns just the day of the month (1--31).

HOUR(date, optional_timezone) --- based on LOCALTIME(), returns just the hour (0--23).

MINUTE(date, optional_timezone) --- based on LOCALTIME(), returns just the minute.

SECOND(date, optional_timezone) --- based on LOCALTIME(), returns just the second, including the fractional part.

YEARDAY(date, optional_timezone) --- based on YQQWW(), returns just the day of the year (1--366).

QUARTER(date, optional_timezone) --- based on YQQWW(), returns just the quarter (1--3).

QUARTERDAY(date, optional_timezone) --- based on YQQWW(), returns just the day of the quarter (1--92).

YEARWEEK(date, optional_timezone) --- based on YQQWW(), returns just the week of the year (0--53).

The user must note that 0 means that the date is considered as belonging to the last week of the previous year (52 or 53), and that 53 may mean that the date is considered as belonging to the first week of the next year. 
Additional checks are necessary if this is an issue.

WEEKDAY(date, optional_timezone) --- based on YQQWW(), returns just the day of the week (1--7, where 1 stands for Monday and 7 for Sunday).

DATE_EQ(date1, date2, optional_timezone_string, optional_format_string) --- based on DATE_CMP(), returns true if the two dates are equal, otherwise false.

DATE_NE(date1, date2, optional_timezone_string, optional_format_string) --- based on DATE_CMP(), returns true if the two dates are not equal, otherwise false.

DATE_GT(date1, date2, optional_timezone_string, optional_format_string) --- based on DATE_CMP(), returns true if date1 is later than date2, otherwise false.

DATE_GE(date1, date2, optional_timezone_string, optional_format_string) --- based on DATE_CMP(), returns true if date1 is the same or later than date2, otherwise false.

DATE_LT(date1, date2, optional_timezone_string, optional_format_string) --- based on DATE_CMP(), returns true if date1 is earlier than date2, otherwise false.

DATE_LE(date1, date2, optional_timezone_string, optional_format_string) --- based on DATE_CMP(), returns true if date1 is the same or earlier than date2, otherwise false.

SELECT examples with DATE functions

An example of SELECT query with datetime function "YEAR":

SELECT username, reg_date, YEAR(reg_date)
FROM collection
WHERE username.CONTAINS("Kate")

An example of SELECT query with datetime function "DATE_GE" in WHERE part:

SELECT username, reg_date
FROM collection
WHERE DATE_GE(reg_date, "2016/05/16")
ORDER BY reg_date DESC