Hi Carlos,
No apologies are necessary; I fully understand the sometimes unreasonable demands of work. Please find attached the completed second attempt to solve your problem. I resolved all the issues I had with your sample data except one, which is described below. Since the sample data I have is limited, I expect several new anomalies to surface when the software is tested with a larger sample of e-mail data.
Anomalies and/or Ambiguities:
1. '96H' is defined as 'Price', but in the Ideal output for the 'MOMS' line (2nd data line in the first e-mail) it is in the 'Level' column.
2. Some Id Numbers had 10 digits. The e-mail was edited to reduce the number of digits to 9.
Suggestions for formatting e-mails (if possible):
The following suggestions may improve performance (i.e. reduce errors):
a. Have at least two spaces between different items.
b. Items that are part of the same field (e.g. Name) should not be separated by more than one space.
How the software works:
E-mails MUST be in Column 'G' starting at cell 'G2' on sheet 'Main'.
The following items can be changed by modifying a constant at the top of 'ModParse':
a. Sheet Name
b. Starting Cell (including the column containing the e-mails)
c. Last row cleared before start of processing (initially set to row 30)
d. Columns used for output data items
1. Each line in an e-mail is treated as a separate 'data line'.
2. Each 'data line' is scanned for key words (e.g. to indicate e-mail sender or e-mail send date).
Other key words indicate a line should be excluded from processing (e.g. 'Reserves Apply').
3. If a 'data line' is NOT EXCLUDED, then it is processed.
4. Before a 'data line' is processed, certain additional 'key words' in the 'data line' are removed (e.g. 'Buying').
This is to speed further processing and to prevent FALSELY identifying a keyword as ACTUAL DATA.
5. Each processed 'data line' is separated into 'tokens'.
A 'token' is a series of characters whose parts are separated by no more than one space (in most cases); two or more spaces mark the boundary between tokens.
Occasionally 'special processing' will be done to identify 'data items' that are separated by only one space.
6. Each 'token' is evaluated from left to right with a special set of rules for each data item,
to determine if the token is one of:
a. Name
b. Id
c. Lag
d. Rating
e. Price
f. Level
g. Lot
h. None of the above
7. Occasionally a token will meet the criteria for more than one 'data item'. An additional
set of rules is applied to determine which 'data item' the token PROBABLY belongs to.
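For illustration only, steps 2 through 5 can be sketched in a few lines of Python. This is not the code in 'ModParse'. The function name `tokenize`, the two keyword tuples (only 'Reserves Apply' and 'Buying' come from the description above) and the sample line are assumptions made up for the example. The rules for steps 6 and 7 are not reproduced here.

```python
import re

# Hypothetical keyword lists; the real lists live in the attached software.
EXCLUDE_KEYWORDS = ("Reserves Apply",)   # a line containing one of these is skipped
NOISE_KEYWORDS = ("Buying",)             # removed from a line before tokenizing


def tokenize(data_line):
    """Return the tokens of one data line, or None if the line is excluded."""
    lowered = data_line.lower()
    if any(keyword.lower() in lowered for keyword in EXCLUDE_KEYWORDS):
        return None
    for keyword in NOISE_KEYWORDS:
        # Replace the keyword with a space so its neighbours stay separated.
        data_line = re.sub(re.escape(keyword), " ", data_line, flags=re.IGNORECASE)
    # Two or more whitespace characters mark the boundary between tokens;
    # a single space stays inside a token (e.g. a two-word Name).
    return [token for token in re.split(r"\s{2,}", data_line.strip()) if token]


# Made-up sample line, not taken from the real e-mails.
print(tokenize("Buying  ACME CORP  123456789  96H"))
print(tokenize("Reserves Apply"))
```

With this approach a two-word name survives as one token only if its words are separated by a single space, which is why the formatting suggestions above ask for at least two spaces between different items.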
NOTE: All items in Sheet 'Main' on or below row 35 CAN BE DELETED.
Lewis