Ron Rosenfeld wrote...
....
>For the "0 lost cuts"
>
>B1: =REGEX.MID(seq,"(\w+?([^FP]K|$)){"&COLUMN()-1&"}",ROW())
>
>ROW() resolves to a '1' which means take the 'first' sequence that matches the
>pattern. As you copy/drag the formula down, ROW() will resolve to '2', '3',
>etc. which means match the 2nd, 3rd, etc sequence that matches the pattern.
>
>The basic pattern is defined by "(\w+?([^FP]K|$)){" which means look for a
>sequence of letters that ends with a K that is not preceded by an FP, or that
>is at the end of the string.
....
Wrong. [^FP] means any characters other than F or P, not the sequence
FP. While your formula may have produced the OP's expected results, if
there were an "FK" in the sequence, it wouldn't have. You need to use a
negative look-ahead asertion.
B2:
=REGEX.MID($A$2,"([^K]?|.*?((?!FP)..))K",ROWS(B$2:B2))
Note that if 0, 1, 2, etc. cuts are always wanted, the 1, 2, etc. cuts
could all be generated from the 0 cuts by concatenating each group of
2, 3, etc. 0-cut items, respectively.
If there could be multiple prohibited sequences, e.g., never cut before
FPK or DK (note the different lengths), it actually gets simpler if you
use an inner REGEX.SUBSTITUTE call to mask the cut character (K) ending
prohibited sequences, then unmask it after the REGEX.MID calls.
B2:
=SUBSTITUTE(REGEX.MID(REGEX.SUBSTITUTE($A$2,"(FP|D)K","[1]_"),
".*?K",ROWS(B$10:B10)),"_","K")
If the cut character were named CC, the list of prohibited sequences -
not including the ending cut character - were named PS, then this could
be rewritten as
B2:
=SUBSTITUTE(REGEX.MID(REGEX.SUBSTITUTE($A$2,"("&MCONCAT(PS,"|")&")"&CC,
"[1]#"),".*?"&CC,ROWS(B$10:B10)),"#",CC)
making use if MOREFUNC.XLL's MCONCAT function as well.
If you REALLY want to learn regular expressions, read
comp.lang.perl.misc, comp.editors or comp.unix.shell. Then when you
THINK you know them, reply to threads in those ngs. You'll find out
VERY QUICKLY whether you know them or not.
Bookmarks