PARSER.EXE

Version 1.7 of June 2002
COPYRIGHT (C) 1996-2002 Norbert H. Doerry


1. Description

PARSER.EXE is a utility for processing both fixed-length and field-delimited ASCII flat-file representations of databases. Record lengths and field positions are defined in a Profile File, as are the operations to perform on the fields. Fields can be rearranged, truncated, padded, and reordered, and characters within them can be deleted, replaced, or inserted. The Profile File also specifies text delimiters, field delimiters, and record delimiters. Through careful crafting of the Profile File, PARSER.EXE can be used to create comma-delimited files or other fixed-length ASCII flat files. PARSER.EXE can also be used to create a file for each record, using either one of the fields from the original record or a default name as the filename. Output files can be plain text files or HTML files.


2. Syntax

      PARSER.EXE INFILE.TXT [-oOUTFILE.TXT] [-pPROFILE.INI] [-?]

Where: 
         INFILE.TXT = Name of the input ASCII flat file (fixed length or delimited).
      -oOUTFILE.TXT = Results are written to the file OUTFILE.TXT,
                      uses the standard output if not specified. 
      -pPROFILE.INI = PROFILE.INI is the name of Profile File. 
      -?            = Provides the Syntax for using Parser. 


3. Profile File Format

The Profile File can be created using any text editor. Its general format is as follows:

! Any line starting with ! is a comment line
!
! Everything before the line starting with [parser] is ignored. 
! All command arguments are space delimited.
! 
[parser]
!
! The first section defines parameters needed to write the
! OUTFILE.TXT
!
record_per_file    0
! filename_field Field1    ! Not used if record_per_file is set to 0
fixed_field                ! Optional indicates source file has fixed field lengths
! delimited_field $, $"    ! Indicates comma is field delimiter, " is text delimiter
!                            Not used if fixed_field is used.
print_column_names 0
field_delimiter    $\ 
record_delimiter   $\n
start_of_text_char $"
end_of_text_char   $"
record_length      62
output_text                ! Output file is pure text (default)
! output_html              ! Output file is in HTML
!
! The following lines define the fields in INFILE.TXT
! Note: Field names may not have embedded spaces
!
field Field1  1 10 TEXT 1
field Field2 11 10 NUMBER 2
field Field3 21 10 TEXT 3
field Field4 31 10 TEXT 4
field Field5 41 10 TEXT 5
field Field6 51 10 TEXT 6
field Field7 61  1 TEXT 0
!
field_text    Field8 Text\ inserted\ in\ Field8\ with\ print_order\ 8 8
field_counter Field9                       9
field_filename_field Field10              10
field_lookup Field11 Field1 lookup.txt 2  11
field_date   Field12                      12
field_time   Field13                      13
!
! The following lines define the operations to perform on the
! fields defined above.  The operations are performed on the
! fields in the order they are entered in the PROFILE.
!
delete_alpha      Field1
delete_nonalpha   Field2
delete_numeric    Field3
delete_nonnumeric Field4
delete_left       Field5 1
delete_right      Field6 1
keep_left         Field1 5
keep_right        Field2 5
insert_left       Field3 <<<
insert_right      Field3 >>>
replace_char      Field7 $. $+
replace_text      Field7 Original\ Text   Replacement\ Text
strip             Field6
reorder           Field6 3 4 $/ 5 6 $/ 1 2
delete_char       Field5 $-
pad_left          Field1 10
pad_right         Field2 10
to_upper_case     Field3
to_lower_case     Field4
!
[eof]
!
! All lines after the line starting with [eof] are ignored.

3.1 record_per_file 0
If record_per_file is set to anything but 0, this command results in a file being written for each record. The filename is either produced based on a field specified by filename_field, or is automatically generated. If OUTFILE.TXT is specified on the command line, it is ignored. If record_per_file is set to 0, this command prints all of the records to the OUTFILE.TXT specified on the command line.

3.2 filename_field Field1
This command is only active if record_per_file is set to anything but 0. It specifies which field should be used to generate the filenames for each record. The contents of this field should be less than 8 characters and be limited to characters allowable for filenames. The field for each record must be unique, or records will be lost. The filename is generated by taking the contents of the specified field and appending .txt to it.

3.3 fixed_field
This command indicates that the input file consists of fixed length fields and records. It should not be used if delimited_field is also defined. If neither delimited_field nor fixed_field is defined, then the input file is assumed to be fixed_field.

3.4 delimited_field $, $"
This command indicates that the input file consists of delimited fields. The first character argument is the field delimiter and the second character argument is the text delimiter. It should not be specified in addition to fixed_field.

3.5 print_column_names 0
This command enables the printing of the field names as the first record of OUTFILE.TXT if record_per_file is set to 0. If record_per_file is set to 1, the field name is printed before each field with a leading @ character attached. Set print_column_names to 1 to enable printing of field names, or to 0 (the default) to disable it.

3.6 field_delimiter $\
This command specifies the character or characters (max 31) to use for separating fields in OUTFILE.TXT. The default character is the comma. In this example, the field delimiter is being set to a space. If record_per_file is set to 1, field_delimiter should normally be set to the character $\n. If multiple characters are specified, they should be separated by a space.

3.7 record_delimiter $\n
This command specifies the character or characters (max 31) to use for separating records in OUTFILE.TXT. The default character is the newline character. If record_per_file is set to 1, record_delimiter specifies the character(s) used to terminate the final line of each file. If multiple characters are specified, they should be separated by a space.

3.8 start_of_text_char $"
This command specifies the character or characters (max 31) to use for starting text records in OUTFILE.TXT. The default character is the " character. If multiple characters are specified, they should be separated by a space.

3.9 end_of_text_char $"
This command specifies the character or characters (max 31) to use for ending text records in OUTFILE.TXT. The default character is the " character. If multiple characters are specified, they should be separated by a space.

3.10 record_length 62
This command specifies the length in characters of each record in INFILE.TXT. The length should include the record delimiter in INFILE.TXT if it exists. If a length of 0 is specified, then each record is assumed to be terminated with a newline character.
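
The two record_length modes can be illustrated with a short Python sketch. It is not taken from PARSER.EXE itself; the function name split_records is hypothetical, and the handling of a short trailing record (kept as-is) is an assumption.

```python
def split_records(data: str, record_length: int) -> list[str]:
    """Split the contents of an input file into records.

    record_length > 0: records are fixed-size chunks; any record
    delimiter in the file is counted as part of the length.
    record_length == 0: each record ends at a newline.
    """
    if record_length < 0:
        raise ValueError("record_length must be 0 or greater")
    if record_length == 0:
        return data.splitlines()
    # Assumption: a short final chunk is kept rather than discarded.
    return [data[i:i + record_length]
            for i in range(0, len(data), record_length)]
```

With record_length set to 62, as in the sample profile, every 62 characters of the input (including any trailing newline) would form one record.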

3.11 output_text
This command specifies that the output file should consist only of text with no headers or footers. output_text should not be specified in addition to output_html.

3.12 output_html
This command specifies that the output file should be in the HTML format. output_html should not be specified in addition to output_text.

3.13 field Field1 1 10 TEXT 1
This command specifies a field in INFILE.TXT. In this example, Field1 is the name of a field that starts with the first character of a record and is 10 characters long. The TEXT argument means that the field, if printed in OUTFILE.TXT, will be preceded by the start_of_text_char and followed by the end_of_text_char. The final argument, 1 in this case, is an integer that determines the print order of the fields. The fields are printed to OUTFILE.TXT in numerical order of the print order argument. A print order argument less than or equal to zero suppresses that field from being printed.

field Field2 11 10 NUMBER 2
This example defines Field2 as a field that starts at character 11 of a record and is 10 characters long. The print order of this field is 2.
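
The interaction of start position, length, field type, and print order can be sketched in Python as follows. This is an illustration of the behavior described above, not the program's actual code. The names Field and format_record are hypothetical, and it is assumed that NUMBER fields are written without text delimiters.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    start: int        # 1-based position of the first character
    length: int       # number of characters in the field
    kind: str         # "TEXT" or "NUMBER"
    print_order: int  # values <= 0 suppress the field


def format_record(record: str, fields: list[Field],
                  field_delimiter: str = ",",
                  start_text: str = '"', end_text: str = '"') -> str:
    """Extract fixed-position fields and join them in print order."""
    printed = sorted((f for f in fields if f.print_order > 0),
                     key=lambda f: f.print_order)
    parts = []
    for f in printed:
        value = record[f.start - 1:f.start - 1 + f.length]
        if f.kind == "TEXT":
            # TEXT fields are wrapped in the text delimiters.
            value = start_text + value + end_text
        parts.append(value)
    return field_delimiter.join(parts)
```

For a record "ABCDEFGHIJ0123456789" with Field1 (TEXT, print order 1) and Field2 (NUMBER, print order 2) defined as above, this sketch would produce "ABCDEFGHIJ",0123456789 using the default delimiters.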

3.14 field_text Field10 [text_to_print] 2
This command creates a field containing the specified text. In this example, Field10 is set to the string [text_to_print] for every record. The final number, 2 in this case, indicates the print order. The fields are printed in numerical order of the print order argument. A print order argument less than or equal to zero suppresses that field from being printed.

3.15 field_counter Field9 4
This command creates a field containing the number of the record, starting with 1 for the first record. In this example, Field9 is set to 000001 for the first record, 000002 for the second record, 000003 for the third record, and so on. The final number, 4 in this case, indicates the print order.
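
As a small Python illustration of the counter format described above (a six-digit, zero-padded record number starting at 1), with the hypothetical function name counter_values:

```python
def counter_values(record_count: int) -> list[str]:
    """Return the counter field for records 1..record_count."""
    # Six digits wide, padded with leading zeros, starting at 1.
    return [f"{n:06d}" for n in range(1, record_count + 1)]
```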

3.16 field_filename_field Field8 5
This command creates a field containing the default name for the file created if record_per_file is set to 1 and filename_field is not specified. The final number, 5 in this case, indicates the print order.

3.17 field_lookup field8 field1 lookup.txt 2 9
This command creates a field containing text from another file. In this case, field8 is created by finding the line in lookup.txt that begins with the text contained in field1. The 2 after lookup.txt indicates that the second word of the matched line is inserted into field8. The final number, 9 in this case, indicates the print order.
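
The lookup can be pictured with the Python sketch below. It is only an approximation of the behavior described above. The function name lookup is hypothetical, and several details are assumptions: words are separated by whitespace, the first matching line wins, the key is compared exactly as it appears in the field, and an empty string is returned when nothing matches.

```python
def lookup(key: str, lines: list[str], word_number: int) -> str:
    """Return word `word_number` (1-based) of the first line
    that begins with `key`, or "" if there is no such line."""
    for line in lines:
        if line.startswith(key):
            words = line.split()
            if 1 <= word_number <= len(words):
                return words[word_number - 1]
            return ""  # assumption: matched line has too few words
    return ""  # assumption: no line matched
```

If lookup.txt contained the lines "NY New_York" and "CA California", a record whose field1 holds CA would receive California in field8.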

3.18 field_date field2 5
This command creates a field containing the date PARSER.EXE is executed in the format 12/31/1997. The final number, 5 in this case, indicates the print order.

3.19 field_time field3 6
This command creates a field containing the time PARSER.EXE is executed in the format 14:32:59. The final number, 6 in this case, indicates the print order.
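
The date and time formats shown for field_date and field_time correspond to the following Python format strings. This is a sketch for illustration; the function name is hypothetical, and the date is assumed to be month/day/year, as the example 12/31/1997 suggests.

```python
from datetime import datetime


def date_and_time_fields(now: datetime) -> tuple[str, str]:
    """Return (date, time) strings in the formats shown above."""
    date_text = now.strftime("%m/%d/%Y")  # e.g. 12/31/1997
    time_text = now.strftime("%H:%M:%S")  # e.g. 14:32:59 (24-hour)
    return date_text, time_text
```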

3.20 delete_alpha Field1
This command deletes all alphabetic characters (a-z and A-Z) from the specified field (Field1 in this example) before printing to OUTFILE.TXT.

3.21 delete_nonalpha Field2
This command deletes all non-alphabetic characters (all characters other than a-z and A-Z) from the specified field (Field2 in this example) before printing to OUTFILE.TXT.

3.22 delete_numeric Field3
This command deletes all numeric characters (0-9) from the specified field (Field3 in this example) before printing to OUTFILE.TXT.

3.23 delete_nonnumeric Field4
This command deletes all non-numeric characters (all characters other than 0-9) from the specified field (Field4 in this example) before printing to OUTFILE.TXT.
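
The four character-class deletions in sections 3.20 through 3.23 can be sketched in Python as below. The function names mirror the profile commands, but the code is an illustration under the stated definitions (a-z and A-Z are alphabetic, 0-9 are numeric), not the program's implementation.

```python
import string

ALPHA = set(string.ascii_letters)  # a-z and A-Z
DIGITS = set(string.digits)        # 0-9


def delete_alpha(value: str) -> str:
    return "".join(c for c in value if c not in ALPHA)


def delete_nonalpha(value: str) -> str:
    return "".join(c for c in value if c in ALPHA)


def delete_numeric(value: str) -> str:
    return "".join(c for c in value if c not in DIGITS)


def delete_nonnumeric(value: str) -> str:
    return "".join(c for c in value if c in DIGITS)
```

Note that spaces and punctuation are neither alphabetic nor numeric, so delete_nonalpha and delete_nonnumeric remove them as well.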

3.24 delete_left Field5 1
This command deletes from the field specified (Field5) the number of characters specified (1) from the beginning of the field.

3.25 delete_right Field6 1
This command deletes from the field specified (Field6) the number of characters specified (1) from the end of the field.

3.26 keep_left Field1 5
This command deletes from the field specified (Field1) all of the characters except the number of characters specified (5) starting with the beginning of the field.

3.27 keep_right Field2 5
This command deletes from the field specified (Field2) all of the characters except the number of characters specified (5) starting with the end of the field and working back.
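
The four truncation commands in sections 3.24 through 3.27 map onto simple string slices. The Python sketch below is illustrative only; it assumes the count is never negative and that a count larger than the field simply affects the whole field.

```python
def delete_left(value: str, n: int) -> str:
    """Drop the first n characters."""
    return value[n:]


def delete_right(value: str, n: int) -> str:
    """Drop the last n characters."""
    return value[:max(len(value) - n, 0)]


def keep_left(value: str, n: int) -> str:
    """Keep only the first n characters."""
    return value[:n]


def keep_right(value: str, n: int) -> str:
    """Keep only the last n characters."""
    return value[max(len(value) - n, 0):]
```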

3.28 insert_left Field3 <<<
This command inserts at the beginning of the field specified (Field3) the text specified (<<<).

3.29 insert_right Field3 >>>
This command inserts at the end of the field specified (Field3) the text specified (>>>).

3.30 replace_char Field7 $. $+
This command replaces, in the field specified (Field7), all occurrences of the first character specified (a period or $.) with the second character specified (a plus sign or $+).

3.31 replace_text Field7 original\ text replacement\ text
This command replaces, in the field specified (Field7), all occurrences of the first text argument (original\ text) with the second text argument (replacement\ text). Note that embedded spaces must be preceded with a \.

3.32 strip Field6
This command deletes all leading and trailing spaces, tabs, newlines, and carriage returns from the specified field (Field6).

3.33 reorder Field6 3 4 $/ 5 6 $/ 1 2
This command reorders the characters of the field specified (Field6) and can insert new characters as well. In this case, the third character of Field6 is printed first, followed by the fourth character, followed by the / character, followed by the fifth character, followed by the sixth character, followed by the / character, followed by the first character and finally followed with the second character of Field6. This command is very useful for rearranging date fields.
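
To show why this is useful for dates, here is a Python sketch of the reorder logic as described above. It is not the program's code. The function name reorder is hypothetical, only plain $c literals are handled (not escapes such as $\n), and skipping positions beyond the end of the field is an assumption.

```python
def reorder(value: str, args: list[str]) -> str:
    """Build a new field from 1-based character positions
    and $-prefixed literal characters."""
    out = []
    for arg in args:
        if arg.startswith("$") and len(arg) > 1:
            out.append(arg[1:])          # literal character, e.g. $/
        else:
            pos = int(arg)               # 1-based position in the field
            if 1 <= pos <= len(value):   # assumption: ignore bad positions
                out.append(value[pos - 1])
    return "".join(out)
```

Applied to a YYMMDD value such as 021231, the arguments 3 4 $/ 5 6 $/ 1 2 yield 12/31/02.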

3.34 delete_char Field5 $-
This command deletes from the field specified (Field5) all occurrences of the character specified (a dash or $-).

3.35 pad_left Field1 10
This command inserts spaces at the beginning of the specified field (Field1) so that the specified field has the specified number of characters (10). If the specified field has more than the specified number of characters, the field is left unaltered. See keep_left and keep_right to truncate fields.

3.36 pad_right Field2 10
This command inserts spaces at the end of the specified field (Field2) so that the specified field has the specified number of characters (10). If the specified field has more than the specified number of characters, the field is left unaltered. See keep_left and keep_right to truncate fields.
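
In Python terms, the padding behavior described in sections 3.35 and 3.36 matches the standard string methods rjust and ljust, which also leave longer strings unaltered. The sketch below is an illustration, not the program's code.

```python
def pad_left(value: str, width: int) -> str:
    """Insert spaces at the beginning until the field is `width` long."""
    return value.rjust(width)


def pad_right(value: str, width: int) -> str:
    """Insert spaces at the end until the field is `width` long."""
    return value.ljust(width)
```

Combining keep_left with pad_right in a profile is one way to force a field to an exact width for fixed-length output.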

3.37 to_upper_case Field3
This command converts all lower case characters in the specified field (Field3) to upper case.

3.38 to_lower_case Field4
This command converts all upper case characters in the specified field (Field4) to lower case.


4. Comments

4.1 Additional Command Line Arguments
Two additional command line arguments are unadvertised features for the advanced user:

PARSER.EXE INFILE.TXT [-oOUTFILE.TXT] [-pPROFILE.INI] [-d] [-tFIELD]

Where:
        -d        = Set the Debug Flag to print out internal 
                    variables of the program. Used to find 
                    programming errors in PARSER.EXE
        -tFIELD   = Results in only this one FIELD as defined 
                    in PROFILE.INI being displayed on the screen. 
                    Used to ensure the record_length command in
                    PROFILE.INI is set properly

4.2 Format of Characters

Within a PROFILE FILE, whenever a character is specified (with the exception of the reorder line) it can be specified by either its ASCII code in hexadecimal, or by the character itself preceded by the $ character. Additionally, several special characters are also defined. Examples include:

$A  = the character 'A'
41  = the character 'A' as well, this is the hexadecimal code
20  = the space character
$   = the space character too (This may not always work)
$\  = the space character
$\a = the Bell character
$\b = the backspace character
$\f = the form feed character
$\n = the newline character
$\r = the carriage return character
$\t = the horizontal tab character
$\v = the vertical tab character
$\\ = the \ character

The reorder line does not allow the use of hexadecimal codes.
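
The character rules in this section can be summarized with a Python sketch. It is an interpretation of the list above, not the program's parser. The name parse_char is hypothetical, and treating a bare $\ (whose trailing space was consumed when the line was split into arguments) as a space is an assumption.

```python
ESCAPES = {
    " ": " ", "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v", "\\": "\\",
}


def parse_char(token: str) -> str:
    """Convert a character argument to the character it names."""
    if token.startswith("$"):
        body = token[1:]
        if body == "":
            return " "                   # bare $ means space
        if body[0] == "\\":
            key = body[1:2] or " "       # "$\" alone is taken as space
            return ESCAPES[key]          # KeyError for unknown escapes
        return body[0]                   # $A means the character A
    return chr(int(token, 16))           # otherwise a hexadecimal code
```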

4.3 Format of Text

Within a PROFILE FILE, whenever text is specified, it should be specified using ASCII characters and the following special characters:

\ = the space character
\a = the Bell character
\b = the backspace character
\f = the form feed character
\n = the newline character
\r = the carriage return character
\t = the horizontal tab character
\v = the vertical tab character
\\ = the \ character

Embedded spaces should have a leading \ since text arguments are space delimited.
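
The text escapes listed above can be decoded as in the following Python sketch. It is an illustration only. The name parse_text is hypothetical, and passing an unrecognized escaped character through unchanged is an assumption.

```python
TEXT_ESCAPES = {
    " ": " ", "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v", "\\": "\\",
}


def parse_text(arg: str) -> str:
    """Decode the backslash escapes in one text argument."""
    out = []
    i = 0
    while i < len(arg):
        c = arg[i]
        if c == "\\" and i + 1 < len(arg):
            nxt = arg[i + 1]
            # Assumption: unknown escapes yield the character itself.
            out.append(TEXT_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

For example, the argument Original\ Text decodes to the two words Original Text as a single text value.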


5. Example

     parser.exe test.txt -otestout.txt -ptest.ini

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

If you discover any bugs, or have any questions concerning these programs, please send me an email (doerry@aol.com)