lsp: lists the species and selected parameters defined in PHREEQC-format database files.

Usage: lsp [-a] [-b=] [-c[=]] [-co=] [-d] [-e] [-f=] [-f1[=]] [-f2[=]] [-f3[=]] [-g=] [-h] [-i] [-k[=]] [-l] [-n[=]] [-o] [-p] [-pre=] [-q] [-r] [-s] [-t] [-v] [-w=] [-x=] "<dir|file1>" ["<file2>"] [> "<file3>"]

Options are mostly lowercase letters/numbers preceded by a hyphen, sometimes with an argument
following an = sign. Use double quotes to enclose all arguments - the quotes will be removed.
Single quotes are treated literally. The options are case-sensitive.

Switches controlling the file selection and file output:

 -a     analyze all text files found (other than those specifically excluded with -x).
        Otherwise, the default is to focus on normal database files by ignoring any files
        with the extensions .pqi, .pqo, and lsp.* (or its custom extension).
        Only text files containing a SOLUTION_MASTER_SPECIES block will ever be analysed.

 -c[=]  line-by-line comparison of <file1> and <file2>. The default is to reduce to -f2 (filtered)
        format for the comparison but if -f1 or -f3 options are included then these file formats
        will be used for the comparison and the files saved. Two files only: no wildcards allowed.
        Use -o for direct comparison in the original file format.
        The default level of detail used is -d. Specified options, if any, apply to both files.
        By default, 'WinMerge' is used if installed or else the Windows 'fc' app is used.
        For a full installation of WinMerge and documentation, see https://winmerge.org/.
        'fc' is fairly similar to the Linux 'diff' app that also comes with this distribution.
        See -c=<exe> and the Examples below for specifying the file comparison app to use.
        -c=<exe> where <exe> is the path to any file comparison executable which has a command line:
           <exe> <options> file1 file2
        -co=<options> where <options> are a string of options passed to the file comparison app.
        Enclose in double quotes if necessary. Reminder: only compares proper database files.
        Illegal options with -c: -a, -k, -r, -v, -w, -x.

 -f1[=<ext>]
        create original <file1> in raw format after removing comments, extra spaces etc.
        Optionally include -f1=<ext> and -pre=<prefix> to redefine the extension and output
        location: <prefix>filename<ext>. Default <ext> is "lsp1" and <prefix> is the current
        directory.
 -f2[=<ext>]
        create a filtered database file from <file1>.
        Optionally include -f2=<ext> and -pre=<prefix> to redefine the extension and output
        location: <prefix>filename<ext>. Default <ext> is "lsp2" and <prefix> is the current
        directory. Use the -f and -b filters for selecting the species and blocks to include,
        e.g. -f="Alkalinity|H|O|E|Na|K|Ca|Mg|N|S|Cl|C". Add -t to tidy the equations and 
        remove non-significant digits from numbers. Use -k to preserve comments (see -k).
        The database produced is checked for errors.

 -f3[=<idents>]
        create spreadsheet-style file where <idents> is optionally a comma-separated
        list of column names given by their identifiers, e.g. "line,logk,ae,dh,eq".
        "*" means all remaining identifiers in alphabetic order.
        All <idents> are converted to lowercase; case is therefore not significant.
        No <idents> is the equivalent of "*" and will list all identifiers.
        The level of detail must be at least -s for output.
        The block number and 'species' columns are always prepended to the list.
        The comma-separated output file location is: <prefix><filename>.lsp.csv where
        <prefix> is derived from the -pre= option and <filename> is the name of
        the original db file (without extension). The normal filters for selecting
        the species and blocks apply, e.g.
              lsp -f3="line,logk,ae,dh,eq,*" -b=3-6 -e -t "wateq4f.dat"
        or for a full listing,
              lsp -f3 -e -t "*"

 -o     with -c, compare the two files line by line in their original format rather than
        in lsp format.

 -pre=  prefix. Directory for the -f1, -f2 and -f3 output files. The trailing file
        separator is optional. The directory must already exist. Default is the same directory
        as the file being analysed.

 -r     recursively searches subfolders of <dir|file1>. Default is off.

 -x=    -x="<dir|file>" exclude these files from analysis using regex patterns (Windows) or
        wildcards (Linux).

        Regular expression quick reference:
        .        a single character
        *        zero or more occurrences of previous character or class
        ?        match zero or one of the previous character
        ^        beginning of line
        $        end of line
        .*       match any string of characters
        []       contains a class or set of possible characters

        Wildcard quick reference:
        ?        a single character
        *        zero or more occurrences of previous character or class

        If more than one pattern is wanted (OR), separate by a comma, e.g. -x=".doc.*,.csv$"
        to exclude files ending with .doc, .docx or .csv extension.
        .pqi and .pqo files are automatically excluded unless the -a option is used.
        The files created with the -f1 and -f2 options are also automatically excluded.
        The default is that no other files are excluded from consideration.
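
        The comma-separated OR behaviour of -x patterns can be illustrated with a short
        Python sketch. It is not lsp's own code: the function name is hypothetical and
        Python's 're' module is assumed to behave like the Windows regex matching for
        these simple patterns. It also shows why an unescaped dot can exclude more
        files than intended.

```python
import re

def is_excluded(filename, spec):
    """Return True if any comma-separated pattern in spec matches filename."""
    patterns = [p for p in spec.split(",") if p]
    return any(re.search(p, filename) for p in patterns)

names = ["notes.doc", "notes.docx", "table.csv", "wateq4f.dat", "mydoc.dat"]

# Pattern from the text above: '.' matches ANY character, so "mydoc.dat"
# is excluded as well because "ydoc" satisfies ".doc".
print([n for n in names if is_excluded(n, r".doc.*,.csv$")])

# Escaping the dots restricts the match to a literal '.' before the extension.
print([n for n in names if is_excluded(n, r"\.doc.*,\.csv$")])
```

        Escaping the dot (\.) is the safer habit whenever a pattern is meant to match
        a file extension only.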

Switches controlling the output:

 -b[=]  only print species from specified keyword blocks where
        <block_number> ranges from 1-20:

        1  = SOLUTION_MASTER_SPECIES_(primary)
        2  = SOLUTION_MASTER_SPECIES_(secondary)
        3  = SOLUTION_SPECIES
        4  = PHASES_(non-gas)
        5  = PHASES_(gas)
        6  = SURFACE_MASTER_SPECIES
        7  = SURFACE_SPECIES
        8  = EXCHANGE_MASTER_SPECIES
        9  = EXCHANGE_SPECIES
        10 = SOLID_SOLUTIONS
        11 = PITZER
        12 = SIT
        13 = ISOTOPES
        14 = ISOTOPE_ALPHAS
        15 = ISOTOPE_RATIOS
        16 = RATES
        17 = NAMED_EXPRESSIONS
        18 = CALCULATE_VALUES
        19 = LLNL_AQUEOUS_MODEL_PARAMETERS
        20 = END

        Multiple blocks can be given as a list of integers separated by commas (no spaces), with
        ranges indicated by hyphens (no spaces). Enclose in double quotes, e.g. -b="1-6,10,12". "*" = all.
        Default is to print all populated blocks. -b="0" outputs a list of the files analyzed. -b alone
        prints a list of the block names.
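
        As an illustration of the list syntax only, the following Python sketch expands
        a -b argument into block numbers. The function name and the error handling are
        assumptions made for this example; lsp's own parsing may differ.

```python
def parse_blocks(spec, max_block=20):
    """Expand a -b style list such as "1-6,10,12" into a sorted list of ints."""
    if spec == "*":                      # "*" selects every block
        return list(range(1, max_block + 1))
    selected = set()
    for part in spec.split(","):
        if "-" in part:                  # a range such as "1-6"
            low, high = (int(x) for x in part.split("-"))
            selected.update(range(low, high + 1))
        else:                            # a single block number
            selected.add(int(part))
    if any(n < 0 or n > max_block for n in selected):
        raise ValueError(f"block numbers must lie between 0 and {max_block}")
    return sorted(selected)

print(parse_blocks("1-6,10,12"))
```

        With the numbering above, "1-6,10,12" therefore covers the master species,
        solution species, both PHASES groups, SURFACE_MASTER_SPECIES, SOLID_SOLUTIONS
        and SIT.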

 -d     adds more detailed information to the -s option. If available, this includes the log_k 
        value and an indication of whether a non-default analytical expression (a_e) or delta_h (dh)
        value has been defined.

 -e     adds the defining equation for species formation to the -l option.
        -t also tidies the equation by removing redundant digits.

 -f=    -f=<filter> filters the 'species' name (case sensitive unless switch -i is set).
        A simple form of regular expression matching is supported:
        .        Dot, matches any character
        ^        Start anchor, matches beginning of string
        $        End anchor, matches end of string
        ?        Question mark, match zero or one (non-greedy)
        *        Asterisk, match zero or more (greedy)
        +        Plus, match one or more (greedy)
        [a-zA-Z] Character ranges, the character set of the ranges { a-z | A-Z }
        [abc]    Character class, match if one of {'a', 'b', 'c'}
        [^abc]   Inverted class, match if NOT one of {'a', 'b', 'c'}
        \d       Digits, [0-9]
        \D       Non-digits
        \w       Alphanumeric, [a-zA-Z0-9_]
        \W       Non-alphanumeric
        \s       Whitespace, \t \f \r \n \v and spaces
        \S       Non-whitespace

        e.g. -f="U.*CO3" for all uranyl carbonate species defined in that order.
        The pattern can specify all or part of the target string.
        e- is known as E-. Default without -f is no filtering. If the filter is to apply 
        to PHASES and you want to filter on the phase formula rather than the phase name,
        use -p. You can use multiple filters separated by a separator: ',' means 'or', '&'
        means 'and' while '|' defines a set of available 'elements', i.e. any of these but
        no others. The last of these finds all species that can be derived from a given set of
        'elements'. It does not support regular expressions. It also ignores the following characters:
        ()[].:+-_0123456789 as well as terminal descriptors between parentheses such as ...(s),
        ...(aq) etc. '*' relaxes the 'no others' condition to 'and any others' but
        requires all of the explicitly specified 'elements' to be present. So B|F|* will
        select all species containing both B and F, with or without any other 'element(s)'.
        The | filter applies to species/phase formulae not to their defining equations
        or to phase names. So if the species list is: NaCl, KCl, KOH, NaOH, NaOCl, NaClO4
        -f="Na,OH"     selects NaCl, KOH, NaOH, NaOCl, NaClO4
        -f="Na..."     selects NaClO4 and NaOCl
        -f="Na&OH"     selects NaOH
        -f="Na,Cl,O"   selects KCl, KOH, NaCl, NaClO4, NaOCl, NaOH
        -f="Na|Cl|O"   selects NaCl, NaClO4 and NaOCl
        -f="O|H|*"     selects KOH and NaOH
        -f="Na|"       selects nothing
        The escape (\) character is also supported. No mixing of separators is allowed in
        the filter string. Always enclose the filter string in double quotes to avoid
        special interpretation by the OS or shell. Symbols such as |&*^ can have a 
        special meaning if the string is not quoted.
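
        The three separators can be mimicked with a small Python sketch that reproduces
        the selections listed above. It is an illustration under stated assumptions, not
        lsp's implementation: the function names are hypothetical, an 'element' is taken
        to be a capital letter followed by lowercase letters, a terminal descriptor is
        taken to be a trailing lowercase-only group in parentheses, and escaped
        separators are not handled.

```python
import re

IGNORED = "()[].:+-_0123456789"          # characters the '|' filter ignores

def elements(formula):
    """Return the set of 'elements' in a formula, e.g. NaClO4 -> {Na, Cl, O}."""
    formula = re.sub(r"\([a-z]+\)$", "", formula)      # drop (s), (aq), ...
    cleaned = "".join(ch for ch in formula if ch not in IGNORED)
    return set(re.findall(r"[A-Z][a-z]*", cleaned))

def select(species, flt):
    """Apply a ',' (or), '&' (and) or '|' (element set) filter to species names."""
    if sum(sep in flt for sep in ",&|") > 1:
        raise ValueError("separators cannot be mixed")
    if "|" in flt:
        wanted = {p for p in flt.split("|") if p}
        if "*" in wanted:                # all named elements, plus any others
            required = wanted - {"*"}
            return [s for s in species if required <= elements(s)]
        return [s for s in species if elements(s) <= wanted]   # no others
    if "&" in flt:
        parts = [p for p in flt.split("&") if p]
        return [s for s in species if all(re.search(p, s) for p in parts)]
    parts = [p for p in flt.split(",") if p]
    return [s for s in species if any(re.search(p, s) for p in parts)]

species = ["NaCl", "KCl", "KOH", "NaOH", "NaOCl", "NaClO4"]
for flt in ["Na,OH", "Na...", "Na&OH", "Na|Cl|O", "O|H|*", "Na|"]:
    print(f'-f="{flt}"', select(species, flt))
```

        Note how "Na|" selects nothing here: no species in the list consists of Na alone.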

 -g=    temporarily changes the code page to the given value (Windows only).

 -h     comprehensive help (ignored unless given simply as lsp -h).

 -i     makes the -f filter case insensitive.

 -k[="above"|"below"]
        include the original comments in the f1 and f2 output files.
        By default (no -k), all comments and blank lines are removed. 
        All lines following the first END are considered as comments.
        Because of the reordering of keyword/species blocks in f2 files,
        there is a choice as to which keyword/species block the comments lying
        in between two keyword/species blocks should be attached to. There
        are two options which normally apply throughout a file: (i) comments
        are placed 'above' a keyword/species (the default), or (ii)
        comments are placed 'below' a keyword/species. Mid-block comments
        are not affected. The following demonstrates these two options:

        original                  -k[="above"]            -k="below"

        # comment 1               # comment 6               SOLUTION_MASTER_SPECIES
        SOLUTION_SPECIES          # comment 7               F F- 0 F 18.9984
        #comment 2                SOLUTION_MASTER_SPECIES   # comment 9
        Na+ + F- = NaF            # comment 8               # comment 10
        -log_k -0.24              F F- 0 F 18.9984          Na Na+ 0 Na 22.9898
                                  Na Na+ 0 Na 22.9898       # comment 8
        # comment 3
        Na+ + CO3-2 = NaCO3-      # comment 1               SOLUTION_SPECIES
        # comment 4               SOLUTION_SPECIES          #comment 2
        -log_k 1.27 # comment 5   # comment 3               Na+ + CO3-2 = NaCO3-
        # comment 6               Na+ + CO3-2 = NaCO3-      # comment 4
        # comment 7               # comment 4               -log_k 1.27 # comment 5
                                  -log_k 1.27 # comment 5   # comment 6
        SOLUTION_MASTER_SPECIES                             # comment 7
        Na Na+ 0 Na 22.9898       #comment 2
        # comment 8               Na+ + F- = NaF            Na+ + F- = NaF
        F F- 0 F 18.9984          -log_k -0.24              -log_k -0.24
                                                            # comment 3
        # comment 9               # comment 9
        # comment 10              # comment 10              PHASES
        PHASES                    PHASES                    Halite
        Halite                    Halite                    # comment 11
        # comment 11              # comment 11              NaCl = Cl- + Na+
        NaCl  =  Cl- + Na+        NaCl = Cl- + Na+          log_k 1.570
        log_k   1.570             log_k 1.570               -delta_h 1.37
        -delta_h  1.37            -delta_h 1.37             # comment 12
        # comment 12

        Adjacent comments are treated as one comment block and are moved together.
        'Orphan' comments such as comment 1 with "below", and comment 12 with
        "above" are omitted. Use '#' alone to insert blank comment lines.
        This default behaviour can be overridden for individual comment blocks
        by adding a special symbol immediately after the # in any of the
        comments of that block. The special symbols are: '<' for 'above' and
        '>' for 'below' signalling that the comment is above or below the
        adjacent keyword/species, e.g. if -k="above" then

        ...
        SOLUTION_MASTER_SPECIES
        #> Comment 1
        # Comment 2
        SURFACE_MASTER_SPECIES
        ...

        will place this comment block below SOLUTION_MASTER_SPECIES.

 -l     adds source line numbers and a_e data to the details (-d) output.
        Line numbers refer to the first line of the definition of the item.

 -n[=]  display species by their canonical formula (ordered and parenthesis-free),
        'elements' are ordered according to various schemes (see below). Neutral species are
        normalized to a coefficient of one for the first 'element', e.g. UO2 and U2O4 both become
        UO2. Phase formulae are NOT canonicalized if the primary display is the phase name, i.e. -p
        has not been set. The [] order in e.g. isotope species will be lost and so may degenerate.
        -n="e" master species sorted by electronegativity (default).
        -n="c" master species sorted by the 'Standard order of arrangement' (CODATA).
        -n="a" master species sorted alphabetically.

 -p     use PHASE formulae as the primary output (and filter) rather than the phase names.
        (name<formula).

 -q     quiet mode - minimal additional output other than the normal block output.
        Use redirection (> <file3>), possibly to the null device, for zero output to screen.

 -s     individual species names are printed.

 -t     tidy - removes non-significant digits from chemical equations and
        many numbers including for f2 & f3 (but not f1) output. Also converts all
        charge formats to the M+n notation, e.g. Th++++ to Th+4 and SO4-- to SO4-2.
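
        The charge conversion can be pictured with the following Python sketch. It is a
        minimal illustration with a hypothetical function name, assuming that only runs
        of two or more identical signs are rewritten; it does not attempt the digit
        tidying and is not lsp's own code.

```python
import re

def tidy_charges(text):
    """Rewrite repeated charge signs: Th++++ -> Th+4, SO4-- -> SO4-2."""
    def repl(match):
        run = match.group(0)             # e.g. "++++" or "--"
        return run[0] + str(len(run))    # sign followed by the count
    return re.sub(r"\+{2,}|-{2,}", repl, text)

print(tidy_charges("Th++++"))
print(tidy_charges("Ca++ + SO4-- = CaSO4"))
```

        Single charges such as Na+ and the ' + ' separators of an equation are left
        untouched because they are not runs of repeated signs.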

 -v     version date.

 -w=    maximum width of output. However, if a single entry is longer than this,
        it is NOT wrapped to the next line. -w=0 forces one entry per line.
        Default is -w=120.
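
        The wrapping rule can be sketched in Python as follows. The function name and
        the single-space separator between entries are assumptions for illustration;
        the actual output layout of lsp may differ.

```python
def wrap_entries(entries, width=120, sep=" "):
    """Pack entries into lines no wider than width; never split an entry."""
    if width == 0:                       # -w=0: one entry per line
        return list(entries)
    lines, current = [], ""
    for entry in entries:
        candidate = entry if not current else current + sep + entry
        if current and len(candidate) > width:
            lines.append(current)        # line is full, start a new one
            current = entry
        else:
            current = candidate          # an over-long single entry stays whole
    if current:
        lines.append(current)
    return lines

for line in wrap_entries(["NaCl", "KCl", "KOH", "Na+ + CO3-2 = NaCO3-"], width=8):
    print(line)
```

        An entry longer than the width simply occupies a line of its own.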

 <dir|file1>
        file mask for file(s) to analyse based on normal OS wildcards.
 <file2>
        is only used when comparing two files. This file is compared with file1.

 > <file3>
        redirects all screen output to <file3>. Optional.

        No spaces are allowed before or after an = sign in arguments. Case is not sensitive
        unless indicated (e.g. -f). File names and arguments can be quoted with double quotes to
        preserve spaces and to avoid shell processing. These double quotes are discarded;
        single quotes are not and should normally not be used. If binary files are detected,
        they are not analysed. If no <dir|file1> is given but just one option is given, the help
        for that option is displayed (e.g. lsp -f).

        There is increasing information going from:
        lsp *
        lsp -s *
        lsp -d *
        lsp -l *
        lsp -e *

        Files analyzed   = files analysed after discarding binary files and excluded files.
        Files populated  = files found with at least one PHREEQC keyword data block of interest.

Examples
========

 lsp wateq4f.dat                   outputs a summary of the major PHREEQC database-related
                                   keywords found in the file wateq4f.dat in the current
                                   directory.
 lsp -s wateq4f.dat                as above but includes a list of all species.
 lsp -d wateq4f.dat                as above but appends details including key parameter
                                   values (e.g. log_k's).
 lsp -l wateq4f.dat                as above but appends line numbers and analytical
                                   expressions.
 lsp -e -w=0 wateq4f.dat           comprehensive species output including defining
                                   equations, all in one species per line format.
 lsp -s -b=4,5 wateq4f.dat         outputs all the PHASES (solids and gases) in wateq4f.dat.
 lsp -r -s "*.dat" >lsp.csv        recursively outputs a list of all species in all *.dat 
                                   files in the current directory and all sub-directories
                                   and sends results to lsp.csv.
 lsp -r -p -s -f="U" "*"           recursively find all U species in all files.
 lsp -d -f="U.*CO3" "*.dat"        outputs details for all U-CO3 species in .dat files
                                   in the current directory.
 lsp -d -f="C" "*.dat"             all species containing C (C, Ca, Cr, Cs, Cu....).
 lsp -d -p -f="C[^a-z],C$" "*.dat" all carbon species.
 lsp -d -f="C|*" "*.dat"           all carbon species.
 lsp -r -s -x="\.pqo$,\.doc$" "*"  analyse all files in the current directory and
                                   sub-directories excluding all files with extensions .pqo
                                   and .doc.
 lsp -d -c new.dat old.dat         compare line-by-line after reducing both files to lsp -d
                                   format. Then send the output to 'WinMerge' if found else
                                   'fc' or if set, an explicit -c=<app>.
 lsp -c="diff" -co="-wB -C 0" -o "new.dat" "old.dat"
                                   compare new.dat and old.dat in their original formats using
                                   the 'diff' program. Ignore white space and blank lines;
                                   output no extra context lines with diff.
 lsp -c="winmergeu.exe" -co="/cfg DiffContextV2=0" file1 file2
                                   output no extra context lines with WinMerge.
 lsp -c="diff" -f3 "new.dat" "old.dat"|awk -F, "/^</ {print \"changed\", $2,\"in block\" substr($1,2,8)}"
                                   comparison that uses diff & awk to post-process 
                                   spreadsheet-style output to something simple.
 lsp -f2 -f="Alkalinity|H|O|E|Na|K|Mg|Ca|Cl|C|S|N" llnl.dat
                                   makes a filtered llnl.dat database file containing just the
                                   major element species. Use phase formulae.
 lsp -f3 -e -t "wateq4f.dat"       makes a comma-separated spreadsheet format file including
                                   tidied equations and numbers.
 lsp -f                            help for option -f.
 lsp -h | more                     full help with paging.
 lsp -h > lsp.hlp                  save help to file lsp.hlp.