Tcl Source Code

View Ticket
Login
Ticket UUID: a1bd37b7198f7bf51e8f1f554435045bf6cc8b68
Title: clock (free)scan of ISO 8601 timestamp with literal T behaves strange
Type: Bug Version: any
Submitter: sebres Created on: 2020-05-20 13:17:00
Subsystem: - New Builtin Commands Assigned To: sebres
Priority: 5 Medium Severity: Minor
Status: Open Last Modified: 2020-07-08 15:55:02
Resolution: None Closed By: nobody
    Closed on:
Description:

It looks like clock free scan recognizes literal "T" in (undocumented?) ISO 8601 timestamp incorrectly, I guess as a legacy (military?) timezone "T".

# ISO:
% clock format [clock scan "20200520 14:40:00" -gmt 1] -gmt 1
Wed May 20 14:40:00 GMT 2020
% clock format [clock scan "20200520T14:40:00" -gmt 1] -gmt 1
Wed May 20 14:40:00 GMT 2020

# ISO-similar format: % clock format [clock scan "2020-05-20 14:40:00" -gmt 1] -gmt 1 Wed May 20 14:40:00 GMT 2020 % clock format [clock scan "2020-05-20T14:40:00" -gmt 1] -gmt 1 Wed May 20 21:40:00 GMT 2020 % clock format [clock scan "2020-05-20 14:40:00 T" ] -gmt 1 Wed May 20 21:40:00 GMT 2020

This seems not correct in my opinion (despite the documentation says about literal "T" by ISO without "-" separator in date only, recognize "T" as timezone looks weird to me).

I think either we must aggressively strict the timezones bison/yacc rules (must consider word boundaries or something similar) or ISO format must be extended with YYYY-MM-DD date.

User Comments: sebres added on 2020-07-08 15:55:02:

Sure, I think to have a better solution, but since it is based on my newest improved branch, this were a bit "larger" change (but as already said - better)... I'm still unsure about the fix using a cherry-pick from clock-speedup branch, because that would also imply several things (e. g. pre-requirements for positive time zone fix [5019748c73], etc). Surely I was also planning to prepare clock-speedup to merge it at some point (I still hope to fulfill that), but not yet now, since as discussed in tclchat, TCT (Kevin) waiting for a description about all the changes. But such partial cherry-pick could indeed have some pros for me (for instance free-scan would have few conflicts by scanning as well as few differences to speedup-branch in code hereafter).
So if I'd not provide my variant this week, feel free to merge Christian's solution.


jan.nijtmans added on 2020-07-08 14:46:17:

I would like to see this fix in Tcl 8.6.11. Any objections? Otherwise I'll do the merge in a few days!

@sebres, if you have additional suggestions, don't wait too long. I like chw's solution.


jan.nijtmans added on 2020-06-23 06:38:20:
> by rebase I noticed that several changes, you've made at some point directly in "tclDate.c", are inconsistent with current "tclGetDate.y"

Indeed, Thanks!  I think I corrected that now.

sebres added on 2020-06-22 18:52:20:

I rebased my tclclockmod with fix for that [4f064d306d] to the tcl-core.
I'll try to provide a fix (cherry-pick that) for 8.6 branch.

@Jan: by rebase I noticed that several changes, you've made at some point directly in "tclDate.c", are inconsistent with current "tclGetDate.y" (and so will be lost if you rebuilt it with yacc/bison, e. g. using make gendate). I guess they must be back-ported to .y-file.


chw added on 2020-05-27 16:03:51:
Or alternatively, use my variant which deals with "T" vs other +7 timezones
by numerically classifying military tz names vs others in the second attachment,
which is a patch relative to bug-a1bd37b719 branch.

sebres added on 2020-05-27 15:38:05:

Well, in-between I found basically more things there, that must be fixed now:

  • mistakenly recognizing of other literals (not T only) between digits inside of some digits groups (without space or word-boundaries); also other timezones (like MST/WAST) have 7 as GMT offset, so then will be wrong considered as ISO in this case.
  • parsing of integer (in lexer) as tNUMBER or tISOBASE is not restricted by the max length, so it can overflow to "valid" integer (and so can cause "correct" parse by totally incorrect string input), etc pp;
  • missing tests covering failing conversion for both above mentioned cases;
  • yacc/bison conflicts affected by issue must be reduced;
I tried to fix it in my re-implemented tclclockmod firstly (where it is already more strict as I already have some lookahead's to properly recognize a timezones) and I could better debug there, see comparison in branch iso-TZ-bug-21.
I'll try to rebase the patch to tcl-core, if I'd get it ready there.


jan.nijtmans added on 2020-05-27 14:01:16:

Proposed fix committed in a branch here: [f0ea6d62c12fcc40].

With test-case and documentation, what more do we want... I like it!


chw added on 2020-05-27 09:34:19:
sebres, you're right.

What about adding a 4th free scan format

  CCyy-mm-ddThh:mm:ss

like in the attached patch against core-8-6-branch?

oehhar added on 2020-05-26 06:25:00:

Sebres, Christian, thank you for constantly caring about TCL.

I can say that the format "2020-05-20T14:40:00" is a format I have often to deal with, as some SQL servers return this format for timestamps. I always scanned it with a detailed pattern and I am happy that there is a native support.

I would call it a bug, when the T is interpreted as a time zone indicator and not as the fix ISO separator.

Thank you, Harald


sebres added on 2020-05-25 10:11:40:

> Citing the "FREE FORM SCAN" section in clock(n) from core-8-6-branch

What you're trying to tell me with?

I wrote already that despite it is documented, one can not say that the behavior by format "YYYY-MM-DDThh:mm:ss" (that is also described as part of ISO 8601) is correct.

There are only two possibility as I suggested above, either to accept it as ISO (to extend the format), or to restrict TZ recognition rules (like word boundaries), so scan would reject it as invalid input.

> it seems to be some undefined behavior you're observing here regarding the presence of the -gmt option.

No, gmt option is only to illustrate that normally no TZ conversions are used by format and scan (and it is ignored if TZ specified in input string). And it has basically nothing with this bug. As already said the free scan considering T literal as a military time zone.


chw added on 2020-05-21 11:18:14:
Citing the "FREE FORM SCAN" section in clock(n) from core-8-6-branch

   ISO 8601 point-in-time
      An  ISO  8601 point-in-time specification, such as “CCyymmddThh‐
      mmss,” where T is the literal “T”, “CCyymmdd hhmmss”, or “CCyym‐
      mddThh:mm:ss”.  Note that only these three formats are accepted.
      The command does not accept  the  full  range  of  point-in-time
      specifications  specified in ISO8601.  Other formats can be rec‐
      ognized by giving an explicit -format option to the  clock  scan
      command.

it seems to be some undefined behavior you're observing here regarding
the presence of the -gmt option.

Attachments: