Ensure bu_scan_fastf_t() supports % as a delimiterBRL-CAD
Status: ClosedTime to complete: 100 hrs Mentors: Hardeep Singh Rai, HarmanpreetTags: c, code

This task is a follow-on to https://www.google-melange.com/gci/task/view/google/gci2014/5048599469621248

There is not a bu_scan_fastf_t() function in BRL-CAD but it needs some cleanup work. In particular, the behavior of the delimiter argument needs to be consistent with the behavior of strtok() in that any delimiter character(s) encountered are merely skipped between fastf_t values. This will likely entail removing the current scanf/sscanf logic that uses %n to get over the delimiter as a string.

Modify the implementation to match the header documentation. See the implementation file's TODO comment for more details too. Submit your changes as a patch file.

References:
  • include/bu/log.h
  • src/libbu/scan.c
Modify:
  • src/libbu/scan.c
Uploaded Work
Comments
Andromeda Galaxyon January 8 2015 15:19 UTCTask Claimed

I would like to work on this task.

Ch3ck on January 8 2015 15:22 UTCTask Assigned

This task has been assigned to Andromeda Galaxy. You have 100 hours to complete this task, good luck!

Andromeda Galaxyon January 8 2015 15:25 UTCReady for review

The work on this task is ready to be reviewed.

Sean on January 8 2015 15:53 UTCtested?

It looks like you nibble off and lose the first character of the next fastf_t if src is stdin?  It also looks like this won't handle interleaved delimiters like delim=" ," parsing "1.0 , 2.0  ,  3.0".

Sean on January 8 2015 15:54 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Andromeda Galaxyon January 8 2015 15:57 UTCtestedness

I've tested it with strings, let me quickly re-test it with stdin... what do you mean by interleaved delimiters?

Andromeda Galaxyon January 8 2015 16:03 UTCinterleaved delimiters

I thought that the original intention for the code (and what happened when using scanf() for the delim scanning) was that if the code didn't find a delim that was *exactly* the string passed in, it should fail...

Andromeda Galaxyon January 8 2015 16:50 UTCBacktracking

This version implements a balance between matching exactly the spaces in the delimiter and allowing extra spaces to surround it (post-delimiter spaces are always allowed because scanf() with '%ld' skips spaces): if no spaces are specified, leading spaces are allowed, otherwise exactly as many spaces as are specified must be present.  I'm finishing up another patch that backtracks to allow any number of spaces regardless of the number in the string that should be ready in a couple of minutes but I wanted to put this one up first since i do think that it implements a nice balance in allowing certain delims to specify how many leading spaces are allowed.

Andromeda Galaxyon January 8 2015 17:07 UTCReady for review

The work on this task is ready to be reviewed.

Andromeda Galaxyon January 8 2015 17:09 UTCExtra leading spaces

The -v2-allow-extra-leading-spaces.patch file includes scanning extra leading spaces by stripping spaces off of the delimiter string and then later checking that there are at least as many as necessary.

Sean on January 8 2015 18:10 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Sean on January 8 2015 18:27 UTCwrong direction... ;)

You had it closer the first time.  The problem isn't just spaces, it's with any combination of delimiter characters.  The intention is to match strtok()'s behavior **, not strsep() or strstr().  What you originally implemented was *similar* to strstr() as the tokenizer behavior but you even had some differences there as you noted.


Forgetting for a moment how it was or even is, just consider the header documentation and it tells me that I should be able to construct a function call where delim is set to "!@#$%^*(){}[]:;," for example, to match most punctuation characters.  This would let me parse numbers that are delimited by any of those (which you handle), e.g. input string: "1.0,2.0;3.0$4.0" but also other forms (which aren't handled), e.g., input string: "1.0**,$2.0!%,%$3.0".


 Basically, you need a loop within in a loop.  You want to look at the next char, check if it's any of the delimiter chars and if it is, skip it.  If it's not and you reach end of input, you're done.  If it's not but you're not at end of input (i.e., a non-delimiter char), then you need to keep that around for the next main loop iteration.


This is going to be really complicated if you use getchar().  You really want to read input from stdin at the beginning and use that as if it were passed in as the string, so you can peek ahead without worrying about blocking I/O, SIGPIPE, and other ways parsing can go catastrophically wrong.


** Recommend reading their man pages (strtok, strsep, strstr) if you haven't yet... they are ancient fundamental string handling functions in C with well-defined behavior.


 

Sean on January 8 2015 18:31 UTCa better example

A better (and real) parsing example might be reading numbers from a fixed-width column data format that pads fields.  Example:


|column1|column2|column3|
|____1.0|__+12.0|__123.0|
|___45.0|____2.0|1234567|

After the header column, I should be able to extract the latter rows with delim="|_+".  Make sense?

Andromeda Galaxyon January 8 2015 18:33 UTCInput format

While that does make sense, it isn't the pattern that I originally identified in the sources and originally wrote scan_fastf_t for; that assumed that the code had some strict delimiter string (often ", ") that needed to be found between fastf_ts.

Andromeda Galaxyon January 8 2015 18:39 UTCOriginal API specification

Here's the original header comment that shows what the function was *supposed* to do:


/**


 * Scans a sequence of fastf_t numbers from a string or stdin


 *


 * Scanning fastf_t numbers with bu_sscanf() is difficult, because


 * doing so requires scanning to some intermediate type like double


 * and then assigning to the fastf_t variable to convert the value to


 * whatever type fastf_t really is.  This function makes it possible


 * to scan a series of fastf_t numbers separated by some delimiter


 * easily, by doing the required conversion internally to the


 * functions.


 *


 * @param[out] c The number of characters scanned by the function


 * @param[in] src A string to scan from, or NULL to read from stdin.


 * @param[in] delim The delimiter between values on the input


 * @param[in] n The number of fastf_t values to scan for


 *


 * Returns the total number of characters consumed while scanning the


 * values.                                                            


 */

Andromeda Galaxyon January 8 2015 18:40 UTCReady for review

The work on this task is ready to be reviewed.

Andromeda Galaxyon January 8 2015 18:51 UTCLiteral string rationale

The rationale for doing the parsing this way is that most of the uses of *scanf() scan numbers separated by some specific delimiter string (often just " " or ",") as a single scanf() format string (e.g. "%lf %lf %f" or "%lf,%lf,%lf", so this behavior makes it seem transparently as if bu_scan_fastf_t is actually doing the same thing (building up a scanf() format string of n %lf specs with delim in between each).

Andromeda Galaxyon January 8 2015 19:55 UTCstrtok behavior patch

I've added a patch that reads in a line of input at the beginning and uses more strtok()-like behavior in reading delimiter chars (-strtok-behavior.patch).  Since I wasn't entirely sure what you meant by "read input from stdin at the beginning", I just read in a whole line, which seems like it should work for all the current use cases in the code to me.

Sean on January 9 2015 06:52 UTCgot it

Yes, I fully understood your original write-up and rationale. :)


When creating new public API, we have to consider more than just what the current code is doing, but how future code will be able to utilize the function.


While you identified a common pattern, having a function that only looks for a specific substring between values is somewhat limited in general applicability.  There are instances throughout our code where the format is definitely not rigid.  There are also cases where we currently scan them rigidly, but they should/need not.  Even the existing scanf instances aren't technically rigid due to how it handles whitespace, but then your later version aimed to address that by nearly mimicking scanf behavior (which is a fair bit more complex).


So knowing that there are other instances (even if they're minority, but numerous), we would conceivably end up with two API calls (one rigid and one not), which is bad for a utility library API that is already several hundred function calls too many (for this exact same reason, missing design forethought and only reacting to patterns).  The question becomes whether having a generalized delimiting function would harm any that are assuming a rigid format.


My assertion -- and reason thus for editing the objective to define it as a sequence of chars to skip ala strtok() -- is that any rigid scanning will not be adversely affected.  If any code needs to be rigid, this function can even accommodate that (probably by scanning one at a time sans delimiters, then strncmp the delimiter string).


We can talk more about this on IRC later and the interface will almost certainly change as it gets applied to more scanning cases.  The goal is to encapsulate all fastf_t scanning in just one function, which I'm sure will get trickier than this still.


Your latest -strtok-behavior.patch looks spot on.  I like how you strchr the next chars against the delimiters.  Clever.  (So clever, it probably warrants a one-line comment to say what that loop is doing!)


Nicely done!

Sean on January 9 2015 06:52 UTCTask Closed

Congratulations, this task has been completed successfully.

Andromeda Galaxyon January 9 2015 15:18 UTCmisunderstanding

That makes more sense. Unfortunately, I didn't realize at first that you'd changed the header comment intended the behavior to be different from what we layout out as the header comment for the original task, which is why I kept trying to do it the original way... this makes more sense.  I can also definitely add a quick comment about the strchr() call before committing, glad you like it!