-------------------------------------------------------------------------------
Why are there so many data file representations?

The problem is that many people decide that some data format (call it XYZZY)
is too complicated, so they make a replacement, only to find out why data
files are so complicated.

Specifically, a data file usually needs...
  * A "magic number" at the top of the file to indicate format
  * Character encoding (ASCII, Unicode, etc)
  * Extensibility, for different data types
  * Escaping Rules
  * Accommodate multi-line data
  * Parsing and Syntax Rules (schema)
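
As an illustration of the "magic number" item, here is a minimal Python sketch
that identifies a file from its leading bytes. The signatures are the
well-known ones for these formats; the names MAGIC and sniff_format are
hypothetical, and a real tool (such as "file") knows far more signatures.

```python
# Well-known signatures: the leading bytes of each file type.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
}

def sniff_format(head):
    """Return a format name for the leading bytes 'head', or None."""
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return None

print(sniff_format(b"%PDF-1.7\n"))    # a PDF header
print(sniff_format(b"plain text"))    # no known signature
```

In practice you would pass in the first few bytes read from the file opened
in binary mode.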

The best idea is for the file to be self-describing instead of looking like
line noise.

But in the end you wind up with something as complex as XML or JSON all over
again.

-------------------------------------------------------------------------------
Types of data files by complexity

  Key-Value pairs
    The data forms specific key and value pairs of simple data types

    This covers the many different configuration file formats,
    for example those using '=', ':', or spaces between the key and the value.

    As complexity grows you can add...
      Grouping with '[section]' lines.
      Allow the creation of arrays

    Simple forms are basically python, perl, or bash scripts defining
    variables.  But as these are executed, such data formats can become
    dangerous if the file, or the data used in the file, comes from external
    sources.
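
    A minimal Python sketch of reading such a file as data, rather than
    executing it.  The function name parse_config and the sample text are
    hypothetical.  It only handles '=' or ':' separators, full-line '#'
    comments, and '[section]' lines.

```python
def parse_config(text):
    """Parse 'key = value' or 'key: value' lines, grouped by [section]."""
    config = {"": {}}          # keys before any [section] go under ""
    section = ""
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue           # skip blank lines and full-line comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            config.setdefault(section, {})
            continue
        # Split on whichever separator comes first in the line.
        positions = [p for p in (line.find("="), line.find(":")) if p >= 0]
        if not positions:
            raise ValueError(f"line {lineno}: no '=' or ':' separator")
        pos = min(positions)
        key, value = line[:pos].strip(), line[pos + 1:].strip()
        config[section][key] = value   # values stay strings, never executed
    return config

sample = """\
# hypothetical sample file
name = demo
[server]
host: example.org
port = 8080
"""
result = parse_config(sample)
print(result)
```

    Because every value stays a plain string, nothing in the file can run
    code.  Python's standard configparser module covers much the same ground.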

  Array or Table based
    The data forms distinct rows (records) of fixed columns of data

    These MUST be interpreted, usually by some library,
    and can become very application dependent.

    Ex:  CSV FLIRT
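
    A short Python sketch of why a library is the usual choice for CSV.
    Quoted fields may contain commas, doubled quotes, and even newlines, all
    of which a naive split(',') gets wrong.  The sample data is made up.

```python
import csv
import io

# Two records; the second has a quoted field spanning two lines.
sample = (
    'name,comment\n'
    'alice,"said ""hi"", then left"\n'
    'bob,"line one\nline two"\n'
)

# DictReader uses the first row as the column names.
rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["name"], repr(row["comment"]))
```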

  Hierarchical Tree

    You can embed complex freeform data structures within data structures.
    Some elements may be missing or as yet undefined.

    These are well known, with standard libraries to read the data.
    But the data being defined can itself become complex.

    Ex:  XML JSON
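
    A small Python sketch of reading hierarchical data where elements may be
    missing.  The JSON text and its key names are invented for illustration.

```python
import json

text = '{"host": {"name": "demo", "ports": [22, 80]}, "owner": null}'
data = json.loads(text)

# Nested structures are plain dicts and lists after parsing.
name = data["host"]["name"]
ports = data.get("host", {}).get("ports", [])

# A missing element needs a default; a null element comes back as None.
contact = data.get("contact", "unknown")
owner = data.get("owner")

print(name, ports, contact, owner)
```

    The library handles the syntax, but the application still has to decide
    what a missing or null element means.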

-------------------------------------------------------------------------------
Parsing freeform records...

For techniques for reading/handling ASCII records (mostly perl)...
See "Record Reading and separation" in "multiline_records.txt"

-------------------------------------------------------------------------------
General UNIX 'text' file conventions

As per "The Art of UNIX Programming" by Eric S. Raymond

  * One record per line (if possible, easier to edit)
  * Less than 80 characters per line (if possible, easier to edit)
  * Use # as an introducer for comments (full line or end of line)
  * Support a backslash convention (for escapes and continued lines)
  * Use colon (passwd), or a run of white space as field separators
  * Do not make distinctions between tab and whitespace!!!
  * Favor Hexadecimal over Octal
  * Do not make compression or binary encoding part of the format.
    (it just makes it hard to read and edit)
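
Several of these conventions can be sketched together in a few lines of
Python. This is a simplified illustration, not a complete parser: the name
read_records is hypothetical, it uses colon separators only, and it does not
handle an escaped backslash directly before a '#' or ':'.

```python
import re

def read_records(text):
    """Yield one list of fields per logical line."""
    # A backslash at end of line continues the record on the next line.
    text = text.replace("\\\n", "")
    for line in text.splitlines():
        # Drop a comment: '#' not preceded by a backslash, to end of line.
        line = re.sub(r"(?<!\\)#.*", "", line).rstrip()
        if not line:
            continue
        # Split on ':' not preceded by a backslash, then remove the escapes.
        fields = re.split(r"(?<!\\):", line)
        yield [re.sub(r"\\(.)", r"\1", f) for f in fields]

sample = (
    "# name:uid:comment\n"
    "root:0:Super User   # end-of-line comment\n"
    "demo:1000:a very long \\\n"
    "comment with a \\: colon\n"
)
records = list(read_records(sample))
for fields in records:
    print(fields)
```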

'Stanza' Format (more formal UNIX text file format)
  * Multi-line records
  * Use of % or %% on their own as record separators
    The %% can also act as a comment by ignoring any following text
    This means of course that only inter-record comments can be used.
    Empty records are ignored.
  * One field per line using colon between key and value (as per email)
  * Support some form of line continuation
    Either: \ at end of line   OR  white space at start of line (as one space)
  * Ignore trailing white space
  * Include a version number or self-describing chunks (future proof it)
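
A minimal Python sketch of a reader for this stanza layout, assuming the
leading-white-space form of line continuation. The name parse_stanzas and the
sample records are hypothetical.

```python
def parse_stanzas(text):
    """Return a list of dicts, one per non-empty record."""
    records, current, last_key = [], {}, None
    for raw in text.splitlines():
        line = raw.rstrip()                 # ignore trailing white space
        if line == "%" or line.startswith("%%"):
            # Record separator; any text after %% is a comment.
            if current:                     # empty records are ignored
                records.append(current)
            current, last_key = {}, None
            continue
        if not line:
            continue
        if line[0] in " \t" and last_key is not None:
            # Continuation line: leading white space counts as one space.
            current[last_key] += " " + line.strip()
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        last_key = key.strip()
        current[last_key] = value.strip()
    if current:                             # last record needs no separator
        records.append(current)
    return records

sample = "\n".join([
    "%% hypothetical sample (this text is a comment)",
    "Planet: Mercury",
    "Note: closest to the sun,",
    "  smallest planet   ",
    "%",
    "%",
    "Planet: Venus",
    "%%",
])
stanzas = parse_stanzas(sample)
print(stanzas)
```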

Finally, beware of floating-point round-off problems, especially when
converting between a string representation and its numerical binary form.
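
A short Python illustration of the round-off issue. A string with too few
digits does not convert back to the same binary value, while Python's repr()
and the hexadecimal float form both round-trip exactly.

```python
x = 0.1 + 0.2

short = "%.2f" % x        # '0.30'
exact = repr(x)           # '0.30000000000000004'
hex_text = x.hex()        # exact hexadecimal text form of the same value

print(short, float(short) == x)                    # 0.30 False
print(exact, float(exact) == x)                    # 0.30000000000000004 True
print(hex_text, float.fromhex(hex_text) == x)
```

If a data file must preserve floating-point values exactly, write them with
enough digits (as repr() does) or in the hexadecimal form.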

-------------------------------------------------------------------------------