Hi IcyBricks,
Does anyone know how to do part-of-speech tagging and grammatical parsing in excel or can anyone otherwise point me in the right direction?
I'm not a word person, so I took a slightly different approach. I looked at KEYWORDS in an attempt to match 'Part Names', and 'Failure Modes' for the 'Part Names'.
Algorithm used:
a. Copy the 'RawData' Sheet Contents to the 'Scratch' Sheet ('Data Area').
Column 'A' contains a sentence to be processed on each line (.e.g. 'The 150pf capacitor was leaking.').
b. Create a Dictionary (list) of unique 'Part Names' from the sheet containing the MASTER 'Part Name' list.
c. Remove extraneous words (.e.g. 'the') and remove punctuation from the 'Data Area'.
There is a list of extraneous words and extraneous symbols on the sheet containing the MASTER 'Useless Items' list.
d. Replace abbreviations (and other items) in the 'Data Area' with synonyms (e.g. replace 'cap' with 'capacitor').
Some words are manipulated to be able to match 'Part Name' or 'Failure Mode' keywords (e.g. 'power cord' becomes 'Power Cord/Converter').
There is a list of synonyms on the sheet containing the MASTER 'Synonym' list.
e. Process One Raw Data sentence (one per line) at a time.
(1) Find a matching 'Part Name'.
(2) Get potential 'Failure Mode' Text for that Part Name.
(3) STOP with a MATCH, If the 'Failure Mode' Text is contained VERBATIM in the 'Raw Data' Sentence.
(4) STOP with a MATCH, If all the 'Failure Mode' Text WORDS are contained in the 'Raw Data' Sentence.
(5) Count the number of 'Failure Mode' Text WORDS contained in the 'Raw Data' Sentence.
(6) Repeat steps (2) thru (5) until a MATCH has been found, or there is no more 'Failure Mode' Text for the Part Name.
(7) Declare a MATCH for the 'Failure Mode' Text that contains the MOST words in the 'Raw Data' Sentence.
Lewis
See the attached file and code that follows and is continued in the next post:
Bookmarks