Quote:
The unescaped vertical bars represent the separator for each action phase instance. The escaped vertical bars within each instance represent the separator for the multiple lookup values.
Just so we're on the same page.
I'm getting the escaped vertical bars for attributes with multiple lookup values
within an action section whether there is a single occurrence of the action
phase instance or multiple occurrences.
I'm not sure whether it would be the right thing for you or other MyStuff2
users, but what if, when an action is present with only a single action phase
instance, any multi-valued attributes didn't have their vertical bars
escaped?
Quote:
I'm open to suggestions.
I've been thinking about possible work-arounds since first posting; as yet I
don't have any good ones :(.
It's a hairy problem and, to be sure, this is really a CSV problem, not a MyStuff2
problem; it's just that MyStuff2 so happens to be exposing the ugly corners of
the CSV format.
Quote:
I freely admit that exporting multiple action phase
instances into a single CSV row isn't ideal and when combined with a
multi-select lookup list value it's even more cumbersome.
This bug report is really about the overloading of the multi-valued
delimiter in a nested context.
So, for example, I have a function which reads a line of CSV from an open input
stream (a source) and writes a transformation of that line to an open output
stream (a sink). Each column-value of the CSV line read is split on the field
delimiter (in my case the default `,'), the CSV column names are normalized, and
each pair is then pushed to an intermediate structure resembling this:
(
(column-quux "A-UNIQUE-NAME") ; text
(column-blarg "42.22") ; decimal
(column-baz "$42.42") ; currency
(column-dink NIL) ; text with null value
(column-foo "100") ; integer
(column-fuzz " ") ; text empty
(column-bar "lookup-val-A|lookup-val-B") ; lookup with multiple values
(column-phase "2012-05-24 22:24:23|2012-05-24 22:23:27") ; timestamp
(column-foo "100|100") ; integer
(column-baz "$42.42|$13") ; currency
(column-fuzz "not empty") ; text
(column-bar "lookup-val-A\|lookup-val-B|lookup-val-C\|lookup-val-D") ; 2x lookups with multiple values
...
)
Each column/column-value pair of the intermediate structure is written to the
sink. First the normalized column-name. Next, a pivot occurs around the
normalized column-name which invokes other functions associated with the
normalized column-name where each associated function is designed to transform
the column-value's intermediate representation (above either a string or NIL)
from its MyStuff2 "type" to a native type my application understands.
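The pivot described above can be sketched as a dispatch table keyed on the normalized column-name. This is only an illustration, written in Python for convenience: the converter functions, the `CONVERTERS` table, and the use of `None` for NIL are all assumptions, and the sketch covers only the single-valued item-level pairs from the listing.

```python
from decimal import Decimal

def to_text(raw):
    # NIL in the listing above is modelled here as None.
    return raw

def to_integer(raw):
    return None if raw is None else int(raw)

def to_decimal(raw):
    return None if raw is None else Decimal(raw)

def to_currency(raw):
    # Strips a leading "$" only; other currency symbols are not handled.
    return None if raw is None else Decimal(raw.lstrip("$"))

# Hypothetical dispatch table: normalized column-name -> converter.
CONVERTERS = {
    "column-quux": to_text,
    "column-blarg": to_decimal,
    "column-baz": to_currency,
    "column-dink": to_text,
    "column-foo": to_integer,
}

def transform_pairs(pairs):
    """Yield (column-name, native value) for each (column-name, raw) pair."""
    for name, raw in pairs:
        convert = CONVERTERS.get(name, to_text)  # unknown columns stay text
        yield name, convert(raw)

rows = [
    ("column-quux", "A-UNIQUE-NAME"),
    ("column-blarg", "42.22"),
    ("column-baz", "$42.42"),
    ("column-dink", None),
    ("column-foo", "100"),
]
result = list(transform_pairs(rows))
```

A converter written this way knows nothing about where its value came from, which is why the multi-valued phase-level pairs need extra handling before they reach it.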
Note, in the above intermediate structure (which is abridged as indicated by the
ellipsis) the following columns each occur twice, once as an item attribute, and a
second time as an action phase attribute:
column-foo column-fuzz column-baz and column-bar
In a full non-abridged intermediate structure there may be multiple action
phases with attributes sharing a common column-name with the "item level" attributes.
By defining my actions with attribute names common to the items they are
typically attached to, it is possible to maintain a common interface for "types"
of attributes which appear commonly across many of the MyStuff2 categories we use.
As an example, I have an action "clothing-measurement" with three phases which (once
normalized) have the colon prefixed column-names:
:top-measurement-phase :bottom-measurement-phase :dress-measurement-phase
While each of these phases has attributes defined specifically for its purpose
they each share the following colon-prefixed attribute names:
:length-garment
:width-hip :width-waist
:measurement-note
Additionally, two of the phases :dress-measurement-phase and
:top-measurement-phase also share these colon-prefixed attribute names:
:length-sleeve
:width-shoulder :width-bust
This said, not all categories we define require the granular specificity gained
by attaching an action to record a single attribute value.
As an example, we have an action for recording the types of material contained in an article of
clothing.
This particular action has a phase with the normalized colon-prefixed name :clothing-material-phase and the following attributes:
:clothing-material-fabric ; lookup
:clothing-material-fabric-type ; lookup
:clothing-material-ratio ; text
:clothing-material-note ; text
The lookup-list for the ":clothing-material-fabric" attribute allows
multi-valued selections (e.g. a shirt may be of a cotton polyester blend).
It so happens that the values of this "attribute type" are so common that it is
often desired to use it as a lookup-list at an item level independently of an
action.
When this occurs, it is possible to have both an item level occurrence and one
or more action phase occurrences, e.g.:
item A
(
...
(:clothing-material-fabric "cotton|polyester") ;; item level occurrence
...
(:clothing-material-fabric "cotton\|polyester") ;; phase level occurrence with a single occurrence of the phase
)
item B
(
...
(:clothing-material-fabric "cotton") ;; item level occurrence
...
(:clothing-material-fabric "cotton\|polyester|rayon\|polyester") ;; multiple phase occurrences of multi-valued lookups
)
Detecting and accounting for the presence/absence of `|' and `\|' is not
difficult. However, detecting whether `|' was seen in conjunction with `\|' and
reliably knowing what to do when this happens is.
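To make the ambiguity concrete, here is a minimal Python sketch of a splitter for the two delimiters. The function name `split_phases` is hypothetical, and the sketch assumes a backslash only ever appears as the escape for a vertical bar (an escaped backslash is not handled).

```python
import re

# A bar NOT preceded by a backslash separates action phase instances.
PHASE_SEPARATOR = re.compile(r"(?<!\\)\|")

def split_phases(raw):
    """Split a raw value into phase instances, each a list of lookup values."""
    return [instance.split("\\|") for instance in PHASE_SEPARATOR.split(raw)]

# Phase level, two instances, each with two lookup values (item B above).
nested = split_phases("cotton\\|polyester|rayon\\|polyester")

# Item level, one occurrence with two lookup values (item A above).
flat = split_phases("cotton|polyester")
```

Here `nested` comes out as `[["cotton", "polyester"], ["rayon", "polyester"]]`, which is right. But `flat` comes out as `[["cotton"], ["polyester"]]`, i.e. it is indistinguishable from two phase instances with one value each. The string alone does not say which reading is meant; the caller has to know whether it is at item level or inside an action phase.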
My particular use case requires that each lookup-list value be normalized and
interned in a hash-table which I use to record all unique occurrences of a
lookup-list value used for a given lookup-list. I do this (in part) to keep
tabs on how and in what context new lookup list values have been added to a
lookup list.
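Roughly, the interning step looks like the following Python sketch. The class name `LookupRegistry`, the `normalize` rule (trim and lowercase), and the "item"/"phase" context labels are assumptions made for illustration, not a description of the actual code.

```python
from collections import defaultdict

def normalize(value):
    # Hypothetical normalization: trim whitespace and lowercase.
    return value.strip().lower()

class LookupRegistry:
    def __init__(self):
        # lookup-list name -> {normalized value -> set of contexts seen}
        self._seen = defaultdict(dict)

    def intern(self, list_name, value, context):
        """Record value for list_name; return True if it is new to that list."""
        key = normalize(value)
        contexts = self._seen[list_name]
        is_new = key not in contexts
        contexts.setdefault(key, set()).add(context)
        return is_new

    def values(self, list_name):
        """Return the unique normalized values recorded for list_name, sorted."""
        return sorted(self._seen[list_name])

registry = LookupRegistry()
first = registry.intern(":clothing-material-fabric", "cotton", "item")
again = registry.intern(":clothing-material-fabric", " Cotton ", "phase")
```

Note that `intern` needs a context argument to be useful for tracking where a value was added, and that context is exactly what is hard to know at the point the value is split.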
A difficulty arises because, although transformation and recording are atomic, the
transformation of lookup-list value(s) transpires prior to recording the column
value (multi-valued or otherwise) to the sink. The recording portion is similar
to that of SAX (e.g. reporting each parsing event as it happens) and having
written the normalized column-value to the stream it isn't easy (i.e. it is
difficult to provide a generalizable atomic abstraction) to account for whether
an attribute occurred inside an action phase or at the item level without:
A) providing a backtracking mechanism;
B) setting a global variable indicating that we're inside a multi phase
occurrence;
C) making a second pass over the intermediate structure to check for the
presence of `\|' occurring in an attribute at the action phase level prior
to writing to the sink;
A is ugly because it requires storing the intermediate structure longer
and writing accessors to interrogate that structure.
B is ugly because there is no good way to ascertain when we've left an action
phase (e.g. in SAX an XML event ends when the element is closed).
C is most easily accomplished by making a second pass over the output written to
stream. Currently this output is a file on a networked disk and I'd prefer to
avoid the unnecessary i/o of dumping and immediately rereading the dumped file
just to check for nested multi-valued delimiters.
As it is I'm using option C.
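For comparison, option C as listed (a pass over the intermediate structure in memory, before anything reaches the sink) might look like this Python sketch. The function names and the tab-separated output with a level tag are hypothetical. Positions are tracked by index rather than by column-name because the same name can occur at both item and phase level.

```python
import io

def nested_positions(pairs):
    """Pass 1: indices of pairs whose raw value contains an escaped bar."""
    return {
        i for i, (_, raw) in enumerate(pairs)
        if raw is not None and "\\|" in raw
    }

def write_pairs(pairs, sink):
    """Pass 2: write each pair, tagging those found to be nested in pass 1."""
    nested = nested_positions(pairs)
    for i, (name, raw) in enumerate(pairs):
        # Without an escaped bar the level cannot be inferred from the value.
        level = "phase-nested" if i in nested else "unknown"
        sink.write(f"{name}\t{level}\t{raw}\n")

pairs = [
    (":clothing-material-fabric", "cotton|polyester"),
    (":clothing-material-fabric", "cotton\\|polyester"),
]
sink = io.StringIO()
write_pairs(pairs, sink)
```

Even here only the presence of `\|` is a reliable signal. A value without it stays "unknown", since it may be an item level attribute or a single-valued phase attribute.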
Quote:
The current implementation does support round-trip export/import from/to MyStuff2
but I realize it's not ideal for other sources.
Yes, it is most important that CSV round-trip export/import from/to MyStuff2 works.
Thank you for providing a mechanism which does so effectively and
transparently.
It is understandable that MyStuff2's use of CSV does not always easily accommodate other sources/applications.
It should _not_ be a priority esp. if making it one would mean slower rollout of a better solution such as XML or JSON exports.
Quote:
Until another format (such as XML or JSON) is supported, I'm open to ideas on
how to make the CSV format more useful.
I will try to give this "problem" some careful consideration from a user
perspective and let you know if I come up with some reasonable suggestions.