+ Reply to Thread
Results 1 to 37 of 37

Text manipulation

  1. #1
    paulinoluciano
    Guest

    Text manipulation

    "Is there some VBA code which could delete all first, second or third
    characters of a text? Could it be done to the three last characters
    from this same text and these be displayed on reverse order?"

    Example:
    AAAASAHDASK
    AAASAHDASK
    AASAHDASK
    ASAHDASK
    SADHASAAAA
    ADHASAAAA
    DHASAAAA


  2. #2
    Barb Reinhardt
    Guest

    Re: Text manipulation

    I'm guessing you could use the LEFT, MID and RIGHT functions, but I've not
    used them in VBA.

    "paulinoluciano" <paulinoluciano@zipmail.com.br> wrote in message
    news:1135761865.111822.80740@g47g2000cwa.googlegroups.com...
    > "Is there some VBA code which could delete all first, second or third
    > characters of a text? Could it be done to the three last characters
    > from this same text and these be displayed on reverse order?"
    >
    > Example:
    > AAAASAHDASK
    > AAASAHDASK
    > AASAHDASK
    > ASAHDASK
    > SADHASAAAA
    > ADHASAAAA
    > DHASAAAA
    >




  3. #3
    paulinoluciano
    Guest

    Re: Text manipulation

    You are right Barb Reinhardt... But in this case I would need to use
    some kind of VBA code because I would not like to have to put each
    sequence o characters and functions for each cell independently. I
    would like to perform an automatic approach because I have a lot of
    files to do that.
    Thank you anyway!
    Luciano


  4. #4
    Bob Phillips
    Guest

    Re: Text manipulation

    Sub Test()
    Dim iLastRow As Long
    Dim i As Long, j As Long
    Dim temp
    Const nIndex As Long = 3 'every third, change to suit

    iLastRow = Cells(Rows.Count, "A").End(xlUp).Row
    For i = 1 To iLastRow
    If nIndex = 1 Then
    Cells(i, "A").Value = Right(Cells(i, "A").Value, _
    Len(Cells(i, "A").Value) - 1)
    Else
    Cells(i, "A").Value = Left(Cells(i, "A").Value, nIndex - 1) & _
    Right(Cells(i, "A").Value, Len(Cells(i,
    "A").Value) - nIndex)
    End If
    Next i

    End Sub


    --
    HTH

    Bob Phillips

    (remove nothere from email address if mailing direct)

    "paulinoluciano" <paulinoluciano@zipmail.com.br> wrote in message
    news:1135761865.111822.80740@g47g2000cwa.googlegroups.com...
    > "Is there some VBA code which could delete all first, second or third
    > characters of a text? Could it be done to the three last characters
    > from this same text and these be displayed on reverse order?"
    >
    > Example:
    > AAAASAHDASK
    > AAASAHDASK
    > AASAHDASK
    > ASAHDASK
    > SADHASAAAA
    > ADHASAAAA
    > DHASAAAA
    >




  5. #5
    paulinoluciano
    Guest

    Re: Text manipulation

    Thank you Bob Phillips! It solve my problem in part. However the major
    question is just a few more complex. I explained it better in topic
    Text subsequences. In the present topic instead subtract one letter any
    time I need a macro capable to let the first cell (e.g. A1) as it is
    and remove the characters only in the next row (A2) that it would be
    identical to the previous without the first letter.
    Luciano


  6. #6
    Bob Phillips
    Guest

    Re: Text manipulation

    I didn't understand the other one!

    --
    HTH

    Bob Phillips

    (remove nothere from email address if mailing direct)

    "paulinoluciano" <paulinoluciano@zipmail.com.br> wrote in message
    news:1135780063.756751.135410@z14g2000cwz.googlegroups.com...
    > Thank you Bob Phillips! It solve my problem in part. However the major
    > question is just a few more complex. I explained it better in topic
    > Text subsequences. In the present topic instead subtract one letter any
    > time I need a macro capable to let the first cell (e.g. A1) as it is
    > and remove the characters only in the next row (A2) that it would be
    > identical to the previous without the first letter.
    > Luciano
    >




  7. #7
    paulinoluciano
    Guest

    Re: Text manipulation

    In fact, the other topic is just a few more complex until to explain.
    Let me try explain better. I have a sequence of characters like:

    AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ

    This sequence must be put in cell A2.
    Thus, I have to perform some specific operations in this text:

    Example 1:
    Rules:
    a) Fragment the sequence before K but not always (you could have lost
    cut).
    b) Sequence is not cut if K is found before FP

    Results:

    AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ

    0 lost cut = Cutting the sequence all the time in which K is present
    (The subsequences of this process should be put in B column:
    AASSASDK
    ASASDASFAFSASASADK
    ASASAFPKQREWEAQEOK
    SPADAOEK
    OQPPDAOPSK
    AEPQ

    1 lost cut = Cutting the sequence after the first K present in the
    sequence (The subsequences of this process should be put in C column::
    AASSASDKASASDASFAFSASASADK
    ASASAFPKQREWEAQEOKSPADAOEK
    OQPPDAOPSKAEPQ
    AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    SPADAOEKOQPPDAOPSKAEPQ

    2 lost cut = = Cutting the sequence after the second K (just for the
    third and following) present in the sequence (The subsequences of this
    process should be put in D column:
    AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    SPADAOEKOQPPDAOPSKAEPQ

    Repair that in some cases I need lost cuts in which you cut after 1, 2,
    3, 4,... specific characters.
    I have to specify such rules in some place of the sheet containing the
    precursor text.
    The rules are:

    Cut after "XXX" (In this example I have put K but the some cell in the
    sheet must contain what is the character in which the sequence will be
    fragmented). In some cases it could be more than only one character
    (e.g. K and R; nor necessarily together)
    Cut before "XXX" (The cut may be after like previous example or before
    the character)

    Never before "XXX" (In some cases I have prohibitive situations; e.g.
    It must not cut a sequence in K if K is preceeded by P or by RP)
    Never after "XXX" (Same for after)

    Number of times that the character could be missed prior cut "XXX" (In
    some place of the sheet I must explicit how many characters could be
    "lost" prior cut (see example).


  8. #8
    Niek Otten
    Guest

    Re: Text manipulation

    Just out of curiosity, what not-Excel, real world problem is this?

    --
    Kind regards,

    Niek Otten

    "paulinoluciano" <paulinoluciano@zipmail.com.br> wrote in message
    news:1135798777.456374.90110@f14g2000cwb.googlegroups.com...
    > In fact, the other topic is just a few more complex until to explain.
    > Let me try explain better. I have a sequence of characters like:
    >
    > AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ
    >
    > This sequence must be put in cell A2.
    > Thus, I have to perform some specific operations in this text:
    >
    > Example 1:
    > Rules:
    > a) Fragment the sequence before K but not always (you could have lost
    > cut).
    > b) Sequence is not cut if K is found before FP
    >
    > Results:
    >
    > AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ
    >
    > 0 lost cut = Cutting the sequence all the time in which K is present
    > (The subsequences of this process should be put in B column:
    > AASSASDK
    > ASASDASFAFSASASADK
    > ASASAFPKQREWEAQEOK
    > SPADAOEK
    > OQPPDAOPSK
    > AEPQ
    >
    > 1 lost cut = Cutting the sequence after the first K present in the
    > sequence (The subsequences of this process should be put in C column::
    > AASSASDKASASDASFAFSASASADK
    > ASASAFPKQREWEAQEOKSPADAOEK
    > OQPPDAOPSKAEPQ
    > AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    > SPADAOEKOQPPDAOPSKAEPQ
    >
    > 2 lost cut = = Cutting the sequence after the second K (just for the
    > third and following) present in the sequence (The subsequences of this
    > process should be put in D column:
    > AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    > SPADAOEKOQPPDAOPSKAEPQ
    >
    > Repair that in some cases I need lost cuts in which you cut after 1, 2,
    > 3, 4,... specific characters.
    > I have to specify such rules in some place of the sheet containing the
    > precursor text.
    > The rules are:
    >
    > Cut after "XXX" (In this example I have put K but the some cell in the
    > sheet must contain what is the character in which the sequence will be
    > fragmented). In some cases it could be more than only one character
    > (e.g. K and R; nor necessarily together)
    > Cut before "XXX" (The cut may be after like previous example or before
    > the character)
    >
    > Never before "XXX" (In some cases I have prohibitive situations; e.g.
    > It must not cut a sequence in K if K is preceeded by P or by RP)
    > Never after "XXX" (Same for after)
    >
    > Number of times that the character could be missed prior cut "XXX" (In
    > some place of the sheet I must explicit how many characters could be
    > "lost" prior cut (see example).
    >




  9. #9
    paulinoluciano
    Guest

    Re: Text manipulation

    Oh, Sorry! This is applied to proteomics reserch (biology). In that
    case, amino acid sequences are fragmented in small parts by proteases.
    There are a lot of non-Excel softwares devoted to do that but it would
    be easier and nore roboust for my current applications if could I use
    excel devoted to this end.
    Best regards,
    Luciano


  10. #10
    Niek Otten
    Guest

    Re: Text manipulation

    Thanks for the info, Luciano

    --
    Kind regards,

    Niek Otten

    "paulinoluciano" <paulinoluciano@zipmail.com.br> wrote in message
    news:1135801518.206757.220650@g44g2000cwa.googlegroups.com...
    > Oh, Sorry! This is applied to proteomics reserch (biology). In that
    > case, amino acid sequences are fragmented in small parts by proteases.
    > There are a lot of non-Excel softwares devoted to do that but it would
    > be easier and nore roboust for my current applications if could I use
    > excel devoted to this end.
    > Best regards,
    > Luciano
    >




  11. #11
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 28 Dec 2005 11:39:37 -0800, "paulinoluciano" <paulinoluciano@zipmail.com.br>
    wrote:

    >In fact, the other topic is just a few more complex until to explain.
    >Let me try explain better. I have a sequence of characters like:
    >
    >AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ
    >
    >This sequence must be put in cell A2.
    >Thus, I have to perform some specific operations in this text:
    >
    >Example 1:
    >Rules:
    >a) Fragment the sequence before K but not always (you could have lost
    >cut).
    >b) Sequence is not cut if K is found before FP
    >
    >Results:
    >
    >AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ
    >
    >0 lost cut = Cutting the sequence all the time in which K is present
    >(The subsequences of this process should be put in B column:
    >AASSASDK
    >ASASDASFAFSASASADK
    >ASASAFPKQREWEAQEOK
    >SPADAOEK
    >OQPPDAOPSK
    >AEPQ
    >
    >1 lost cut = Cutting the sequence after the first K present in the
    >sequence (The subsequences of this process should be put in C column::
    >AASSASDKASASDASFAFSASASADK
    >ASASAFPKQREWEAQEOKSPADAOEK
    >OQPPDAOPSKAEPQ
    >AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    >SPADAOEKOQPPDAOPSKAEPQ
    >
    >2 lost cut = = Cutting the sequence after the second K (just for the
    >third and following) present in the sequence (The subsequences of this
    >process should be put in D column:
    >AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    >SPADAOEKOQPPDAOPSKAEPQ
    >
    >Repair that in some cases I need lost cuts in which you cut after 1, 2,
    >3, 4,... specific characters.
    >I have to specify such rules in some place of the sheet containing the
    >precursor text.
    >The rules are:
    >
    >Cut after "XXX" (In this example I have put K but the some cell in the
    >sheet must contain what is the character in which the sequence will be
    >fragmented). In some cases it could be more than only one character
    >(e.g. K and R; nor necessarily together)
    >Cut before "XXX" (The cut may be after like previous example or before
    >the character)
    >
    >Never before "XXX" (In some cases I have prohibitive situations; e.g.
    >It must not cut a sequence in K if K is preceeded by P or by RP)
    >Never after "XXX" (Same for after)
    >
    >Number of times that the character could be missed prior cut "XXX" (In
    >some place of the sheet I must explicit how many characters could be
    >"lost" prior cut (see example).


    You may want to look into "regular expressions" to do what you are trying to
    describe. If you download and install Longre's free morefunc.xll add-in from
    http://xcell05.free.fr/ you will see that you can use them as worksheet
    functions and also call them from a VBA module.

    What you write is a bit confusing. For example, one rule you give is:
    "Sequence is not cut if K is found before FP" but in your example you seem to
    be acting as if the rule applies if K is found AFTER FP.

    I am assuming the output starts in B1; if it starts in a different row, then
    adjust the ROW() function to result in a 1 as the output:

    seq is the character sequence (Insert/Name/Define and set seq = "your string")

    For the "0 lost cuts"

    B1: =REGEX.MID(seq,"(\w+?([^FP]K|$)){"&COLUMN()-1&"}",ROW())

    ROW() resolves to a '1' which means take the 'first' sequence that matches the
    pattern. As you copy/drag the formula down, ROW() will resolve to '2', '3',
    etc. which means match the 2nd, 3rd, etc sequence that matches the pattern.

    The basic pattern is defined by "(\w+?([^FP]K|$)){" which means look for a
    sequence of letters that ends with a K that is not preceded by an FP, or that
    is at the end of the string.

    The {"&COLUMN()-1&"}" resolves, in Column B, to {1} which means look for one
    occurrence of the preceding pattern.

    If you copy/drag the formula down until you get blanks for the results, you
    will see what you posted in your previous message.

    If you copy/drag across to column D, you will see the results of "1 lost cut"
    or "2 lost cuts".

    I think once you understand the formula construction and the regular
    expressions, it will be simple to use this for your other rules.

    Without the COLUMN and ROW functions, the formulas would look like:

    B1: =REGEX.MID(seq,"(\w+?([^FP]K|$)){1}",1)
    B2: =REGEX.MID(seq,"(\w+?([^FP]K|$)){1}",2)

    C1: =REGEX.MID(seq,"(\w+?([^FP]K|$)){2}",1)
    C2: =REGEX.MID(seq,"(\w+?([^FP]K|$)){2}",2)

    To use this in VBA, you would use the RUN method which is outlined in HELP for
    morefunc.xll
    --ron

  12. #12
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 28 Dec 2005 11:39:37 -0800, "paulinoluciano" <paulinoluciano@zipmail.com.br>
    wrote:

    >1 lost cut = Cutting the sequence after the first K present in the
    >sequence (The subsequences of this process should be put in C column::
    >AASSASDKASASDASFAFSASASADK
    >ASASAFPKQREWEAQEOKSPADAOEK
    >OQPPDAOPSKAEPQ
    >AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    >SPADAOEKOQPPDAOPSKAEPQ


    See my other answer. But I did not understand how you obtained the last two
    lines in the "1 lost cut" sequence.

    They are identical to the two lines in the "2 lost cut" sequence, so I thought
    this might be a typo. But perhaps I am missing something?


    --ron

  13. #13
    paulinoluciano
    Guest

    Re: Text manipulation

    Hi Ron Rosenfeld,
    Thank you very much for your help.
    Could I use your first function as a VBA code?
    In this second case, is is possible that we have a typo. In this case
    speak about 1 lost cut means that you will cut the sequence only after
    the first K appear, never to the first one. But you will have
    intermediates in such process because you never now where will be
    performed the first cut.
    Luciano


  14. #14
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 29 Dec 2005 01:26:31 -0800, "paulinoluciano" <paulinoluciano@zipmail.com.br>
    wrote:

    >Hi Ron Rosenfeld,
    >Thank you very much for your help.
    >Could I use your first function as a VBA code?


    Yes, you can. The morefunc.xll add-in functions can be used in VBA by using
    the RUN method. See HELP for those add-ins for more details.


    >In this second case, is is possible that we have a typo. In this case
    >speak about 1 lost cut means that you will cut the sequence only after
    >the first K appear, never to the first one. But you will have
    >intermediates in such process because you never now where will be
    >performed the first cut.


    How do you determine, if you specify ONE lost cut, whether the first cut will
    occur after the SECOND 'K', or after the THIRD 'K', or ???

    The formulas assumed that with ONE lost cut, the first cut would occur after
    the SECOND 'K'.

    Here is a UDF written in VBA to do the same thing, using the REGEX.MID function
    from the morefunc.xll add-in. The variables should be self-explanatory. The
    return value is an array, and the individual components can be obtained using
    the INDEX worksheet function.

    e.g. with the sequence stored in A1:

    =INDEX(SplitK($A$1,0),1) would return the first item in the '0 lost cuts'
    splitting function).

    ===================================
    Option Explicit
    Function SplitK(ByVal seq As String, LostCut As Long) As Variant
    Dim i As Long, j As Long
    Dim KCount As Long
    Dim Temp() As String

    If LostCut < 0 Then
    SplitK = CVErr(xlErrNum)
    Exit Function
    End If

    KCount = Len(seq) - Len(Replace(seq, "K", ""))
    ReDim Temp(1 To KCount)

    For i = 1 To KCount
    Temp(i) = Run([regex.mid], seq, "(\w+?([^FP]K|$)){" & LostCut + 1 & "}", i)
    Next i

    SplitK = Temp
    End Function
    =========================

    This could also be written as a SUB to automatically place the results into
    specified cells, but it would be less flexible.

    To write results into columns B, C, D:

    =====================================
    Option Explicit
    Sub SplitK()
    Const seq As String =
    "AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ"
    Const MaxLostCuts As Long = 2
    Const ResultColumn As Long = 2 'Column B

    Dim i As Long
    Dim LostCut As Long
    Dim KCount As Long
    Dim Temp() As String


    KCount = Len(seq) - Len(Replace(seq, "K", ""))
    ReDim Temp(1 To KCount)

    For LostCut = 0 To MaxLostCuts
    For i = 1 To KCount
    Cells(i, ResultColumn + LostCut) = _
    Run([regex.mid], seq, "(\w+?([^FP]K|$)){" & LostCut + 1 & "}", i)
    Next i
    Next LostCut

    End Sub
    ====================================


    --ron

  15. #15
    paulinoluciano
    Guest

    Re: Text manipulation

    Hi Ron,
    When we are talking about "lost cut" it means that inside the sequence
    will be present 1 "K" or 2 "K" or 3 "K" that will not detected in order
    to be cut.
    Do you understand?
    Luciano


  16. #16
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 29 Dec 2005 05:42:15 -0800, "paulinoluciano" <paulinoluciano@zipmail.com.br>
    wrote:

    >Hi Ron,
    >When we are talking about "lost cut" it means that inside the sequence
    >will be present 1 "K" or 2 "K" or 3 "K" that will not detected in order
    >to be cut.
    >Do you understand?
    >Luciano


    I understood that to mean that if there is
    ZERO lost cuts then
    cut after every K (that is not preceded by an FP)

    if there is ONE lost cut then
    cut after every second K that is not preceded by an FP

    if there are TWO lost cuts then
    cut after every third K that is not preceded by an FP




    --ron

  17. #17
    paulinoluciano
    Guest

    Re: Text manipulation

    Yes, it is almost this. However, since some cut is performed the
    sequence to be considered to serach the next possible cut is the
    remained subsequence. In such case, it could be expected any place in
    the text sequence displayng two (or three) K being that the second or
    third should be at the end of the sequence.
    Luciano


  18. #18
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 29 Dec 2005 06:34:48 -0800, "paulinoluciano" <paulinoluciano@zipmail.com.br>
    wrote:

    >Yes, it is almost this. However, since some cut is performed the
    >sequence to be considered to serach the next possible cut is the
    >remained subsequence. In such case, it could be expected any place in
    >the text sequence displayng two (or three) K being that the second or
    >third should be at the end of the sequence.
    >Luciano


    I don't understand how what you are writing is different from the results that
    my algorithm produces.

    Perhaps if you gave some examples of the results of my formula on a text string
    vs what you expect to have as a result.

    For example, with ONE lost cut, and using your original seq, I get:

    AASSASDKASASDASFAFSASASADK
    ASASAFPKQREWEAQEOKSPADAOEK
    OQPPDAOPSKAEPQ

    There as been ONE cut missed in each string:

    AASSASDKASASDASFAFSASASADK
    ^
    ASASAFPKQREWEAQEOKSPADAOEK
    ^^^ ^
    OQPPDAOPSKAEPQ
    ^

    The FPK sequence in the second string is also not cut based on your initial
    specifications.

    What kind of output are you expecting from this, and why??


    --ron

  19. #19
    paulinoluciano
    Guest

    Re: Text manipulation

    It is exactly this I`m waiting for. However I could not apply your
    algorithm yet. It is telling me that are an error: "Run-time error
    '1004': Method 'Run' of object'_Global' failed".
    I did not understand what is happening.
    Regards,
    Luciano


  20. #20
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 29 Dec 2005 11:03:06 -0800, "paulinoluciano" <paulinoluciano@zipmail.com.br>
    wrote:

    >It is exactly this I`m waiting for. However I could not apply your
    >algorithm yet. It is telling me that are an error: "Run-time error
    >'1004': Method 'Run' of object'_Global' failed".
    >I did not understand what is happening.
    >Regards,
    >Luciano


    1. Perhaps you did not follow all of the instructions.
    2. Perhaps there is a problem with line-wrapping in your newsgroup reader, so
    that coding is not copied precisely as I posted.

    Suggestion:

    Write me exactly what you did to reproduce what I recommended.
    Copy/Paste the code you are using into your post.
    --ron

  21. #21
    Harlan Grove
    Guest

    Re: Text manipulation

    Bob Phillips wrote...
    ....
    > If nIndex = 1 Then
    > Cells(i, "A").Value = Right(Cells(i, "A").Value, _
    > Len(Cells(i, "A").Value) - 1)
    > Else
    > Cells(i, "A").Value = Left(Cells(i, "A").Value, nIndex - 1) & _
    > Right(Cells(i, "A").Value, Len(Cells(i, "A").Value) - nIndex)
    > End If

    ....

    Could simplify: Right(x, Len(x) - y) == Mid(x, y + 1)

    Then again, the whole If block could be replaced by

    Cells(i, "A").Value = _
    Application.WorksheetFunction.Replace(Cells(i, "A").Value, nIndex,
    1, "")


  22. #22
    Harlan Grove
    Guest

    Re: Text manipulation

    paulinoluciano wrote...
    ....
    >AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ
    >
    >This sequence must be put in cell A2.

    ....
    >Rules:
    >a) Fragment the sequence before K but not always (you could have lost cut).
    >b) Sequence is not cut if K is found before FP
    >
    >Results:
    >
    >ASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAOEKOQPPDAOPSKAEPQ
    >
    >0 lost cut = Cutting the sequence all the time in which K is present
    >(The subsequences of this process should be put in B column:
    >AASSASDK
    >ASASDASFAFSASASADK
    >ASASAFPKQREWEAQEOK
    >SPADAOEK
    >OQPPDAOPSK
    >AEPQ


    You could use formulas.

    B2:
    =LEFT($A$2,FIND("K",SUBSTITUTE($A$2,"FPK","###")))

    B3:
    =REPLACE(LEFT($A$2,FIND("K",SUBSTITUTE($A$2,"FPK","###")&"K",
    SUMPRODUCT(LEN(B$2:B2))+1)),1,SUMPRODUCT(LEN(B$2:B2)),"")

    Fill B3 down as needed. Filling it into B4:B8 (one cell more than
    needed) gives B2:B8

    AASSASDK
    ASASDASFAFSASASADK
    ASASAFPKQREWEAQEOK
    SPADAOEK
    OQPPDAOPSK
    AEPQ
    <blank>

    >1 lost cut = Cutting the sequence after the first K present in the
    >sequence (The subsequences of this process should be put in C column::
    >AASSASDKASASDASFAFSASASADK
    >ASASAFPKQREWEAQEOKSPADAOEK
    >OQPPDAOPSKAEPQ
    >AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    >SPADAOEKOQPPDAOPSKAEPQ


    You have all the information you need for this in column B.

    C2:
    =INDEX(B$2:B$99,2*ROWS(C$2:C2)-1)&INDEX(B$2:B$99,2*ROWS(C$2:C2))

    Fill C2 down as needed. Filling it into C3:C5 (one more than needed)
    gives C2:C5

    AASSASDKASASDASFAFSASASADK
    ASASAFPKQREWEAQEOKSPADAOEK
    OQPPDAOPSKAEPQ
    <blank>

    >2 lost cut = = Cutting the sequence after the second K (just for the
    >third and following) present in the sequence (The subsequences of this
    >process should be put in D column:
    >AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    >SPADAOEKOQPPDAOPSKAEPQ


    D2:
    =INDEX(B$2:B$99,3*ROWS(D$2:D2)-2)&INDEX(B$2:B$99,3*ROWS(D$2:D2)-1)
    &INDEX(B$2:B$99,3*ROWS(D$2:D2))

    Fill D2 down as needed. Filling it into D3:D4 (one more than needed)
    gives D2:D4

    AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
    SPADAOEKOQPPDAOPSKAEPQ
    <blank>

    >Repair that in some cases I need lost cuts in which you cut after 1, 2,
    >3, 4,... specific characters.
    >I have to specify such rules in some place of the sheet containing the
    >precursor text.
    >The rules are:
    >
    >Cut after "XXX" (In this example I have put K but the some cell in the
    >sheet must contain what is the character in which the sequence will be
    >fragmented). In some cases it could be more than only one character
    >(e.g. K and R; nor necessarily together)
    >Cut before "XXX" (The cut may be after like previous example or before
    >the character)
    >
    >Never before "XXX" (In some cases I have prohibitive situations; e.g.
    >It must not cut a sequence in K if K is preceeded by P or by RP)


    The RP preceding K is redundant if P alone preceding K indicates a
    prohibited situation. You'd only need to check for PK.

    >Never after "XXX" (Same for after)
    >
    >Number of times that the character could be missed prior cut "XXX" (In
    >some place of the sheet I must explicit how many characters could be
    >"lost" prior cut (see example).


    Generalizing the formulas above with the character to match in a cell
    named CC and the prohibited sequence (in this case FPK) in a cell named
    PS,

    B2:
    =LEFT($A$2,FIND(CC,SUBSTITUTE($A$2,PS,REPT("#",LEN(PS)))))

    B3:
    =REPLACE(LEFT($A$2,FIND(CC,SUBSTITUTE($A$2,PS,REPT("#",LEN(PS)))&CC,
    SUMPRODUCT(LEN(B$2:B2))+1)),1,SUMPRODUCT(LEN(B$2:B2)),"")

    The column C, D, etc. formulas wouldn't need to change.

    If you have multiple prohibited sequences, then regular expressions
    would be MUCH BETTER tools for doing this. Symbolic processing like
    this is reducible to text processing, but Excel provides poor built-in
    tools for text processing, but since it was meant to calculate numbers
    this shouldn't be surprising.


  23. #23
    paulinoluciano
    Guest

    Re: Text manipulation

    Dear Harlan Grove,
    Thank you very much for your attention. I have tried apply your
    formulas but my excel related some problems. The message is that: "The
    formula you typed contains an error". In addition, do you think that
    would it be possible to perform such formulas by using VBA?
    Luciano


  24. #24
    Harlan Grove
    Guest

    Re: Text manipulation

    Ron Rosenfeld wrote...
    ....
    >For the "0 lost cuts"
    >
    >B1: =REGEX.MID(seq,"(\w+?([^FP]K|$)){"&COLUMN()-1&"}",ROW())
    >
    >ROW() resolves to a '1' which means take the 'first' sequence that matches the
    >pattern. As you copy/drag the formula down, ROW() will resolve to '2', '3',
    >etc. which means match the 2nd, 3rd, etc sequence that matches the pattern.
    >
    >The basic pattern is defined by "(\w+?([^FP]K|$)){" which means look for a
    >sequence of letters that ends with a K that is not preceded by an FP, or that
    >is at the end of the string.

    ....

    Wrong. [^FP] means any characters other than F or P, not the sequence
    FP. While your formula may have produced the OP's expected results, if
    there were an "FK" in the sequence, it wouldn't have. You need to use a
    negative look-ahead asertion.

    B2:
    =REGEX.MID($A$2,"([^K]?|.*?((?!FP)..))K",ROWS(B$2:B2))

    Note that if 0, 1, 2, etc. cuts are always wanted, the 1, 2, etc. cuts
    could all be generated from the 0 cuts by concatenating each group of
    2, 3, etc. 0-cut items, respectively.

    If there could be multiple prohibited sequences, e.g., never cut before
    FPK or DK (note the different lengths), it actually gets simpler if you
    use an inner REGEX.SUBSTITUTE call to mask the cut character (K) ending
    prohibited sequences, then unmask it after the REGEX.MID calls.

    B2:
    =SUBSTITUTE(REGEX.MID(REGEX.SUBSTITUTE($A$2,"(FP|D)K","[1]_"),
    ".*?K",ROWS(B$10:B10)),"_","K")

    If the cut character were named CC, the list of prohibited sequences -
    not including the ending cut character - were named PS, then this could
    be rewritten as

    B2:
    =SUBSTITUTE(REGEX.MID(REGEX.SUBSTITUTE($A$2,"("&MCONCAT(PS,"|")&")"&CC,
    "[1]#"),".*?"&CC,ROWS(B$10:B10)),"#",CC)

    making use if MOREFUNC.XLL's MCONCAT function as well.

    If you REALLY want to learn regular expressions, read
    comp.lang.perl.misc, comp.editors or comp.unix.shell. Then when you
    THINK you know them, reply to threads in those ngs. You'll find out
    VERY QUICKLY whether you know them or not.


  25. #25
    paulinoluciano
    Guest

    Re: Text manipulation

    Thank you very much Harlan Grove!
    In fact, I just realize that it was just a language problem. My excel
    has formula in portuguese; different from yours. I`m now trying put
    your formula to run as VBA codes but in this case I think it will be
    necessary to define each variable because some people will use thes
    worksheet will not have morefunc.xll.
    Luciano


  26. #26
    paulinoluciano
    Guest

    Re: Text manipulation


    paulinoluciano wrote:
    > Thank you very much Harlan Grove!
    > In fact, I just realize that it was just a language problem. My excel
    > has formula in portuguese; different from yours. I`m now trying put
    > your formula to run as VBA codes but in this case I think it will be
    > necessary to define each variable because some people will use thes
    > worksheet will not have morefunc.xll.
    > Luciano



  27. #27
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 29 Dec 2005 14:08:41 -0800, "Harlan Grove" <hrlngrv@aol.com> wrote:

    >Ron Rosenfeld wrote...
    >...
    >>For the "0 lost cuts"
    >>
    >>B1: =REGEX.MID(seq,"(\w+?([^FP]K|$)){"&COLUMN()-1&"}",ROW())
    >>
    >>ROW() resolves to a '1' which means take the 'first' sequence that matches the
    >>pattern. As you copy/drag the formula down, ROW() will resolve to '2', '3',
    >>etc. which means match the 2nd, 3rd, etc sequence that matches the pattern.
    >>
    >>The basic pattern is defined by "(\w+?([^FP]K|$)){" which means look for a
    >>sequence of letters that ends with a K that is not preceded by an FP, or that
    >>is at the end of the string.

    >...
    >
    >Wrong. [^FP] means any characters other than F or P, not the sequence
    >FP. While your formula may have produced the OP's expected results, if
    >there were an "FK" in the sequence, it wouldn't have. You need to use a
    >negative look-ahead asertion.
    >
    >B2:
    >=REGEX.MID($A$2,"([^K]?|.*?((?!FP)..))K",ROWS(B$2:B2))
    >
    >Note that if 0, 1, 2, etc. cuts are always wanted, the 1, 2, etc. cuts
    >could all be generated from the 0 cuts by concatenating each group of
    >2, 3, etc. 0-cut items, respectively.
    >
    >If there could be multiple prohibited sequences, e.g., never cut before
    >FPK or DK (note the different lengths), it actually gets simpler if you
    >use an inner REGEX.SUBSTITUTE call to mask the cut character (K) ending
    >prohibited sequences, then unmask it after the REGEX.MID calls.
    >
    >B2:
    >=SUBSTITUTE(REGEX.MID(REGEX.SUBSTITUTE($A$2,"(FP|D)K","[1]_"),
    >".*?K",ROWS(B$10:B10)),"_","K")
    >
    >If the cut character were named CC, the list of prohibited sequences -
    >not including the ending cut character - were named PS, then this could
    >be rewritten as
    >
    >B2:
    >=SUBSTITUTE(REGEX.MID(REGEX.SUBSTITUTE($A$2,"("&MCONCAT(PS,"|")&")"&CC,
    >"[1]#"),".*?"&CC,ROWS(B$10:B10)),"#",CC)
    >
    >making use if MOREFUNC.XLL's MCONCAT function as well.
    >
    >If you REALLY want to learn regular expressions, read
    >comp.lang.perl.misc, comp.editors or comp.unix.shell. Then when you
    >THINK you know them, reply to threads in those ngs. You'll find out
    >VERY QUICKLY whether you know them or not.



    Thank you for pointing that out. And also for the references to those other
    NG's.


    --ron

  28. #28
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 29 Dec 2005 14:08:41 -0800, "Harlan Grove" <hrlngrv@aol.com> wrote:

    >B2:
    >=REGEX.MID($A$2,"([^K]?|.*?((?!FP)..))K",ROWS(B$2:B2))


    By the way, I think in keeping with the OP's specifications, this should be:

    =REGEX.MID(seq,"([^K]?|.*?((?!FP)..))(K|$)",ROWS(B$2:B2))

    I think he always wants the end of the string, even if it doesn't end with a
    'K'.


    --ron

  29. #29
    Harlan Grove
    Guest

    Re: Text manipulation

    Ron Rosenfeld wrote...
    ....
    >By the way, I think in keeping with the OP's specifications, this should be:
    >
    > =REGEX.MID(seq,"([^K]?|.*?((?!FP)..))(K|$)",ROWS(B$2:B2))
    >
    >I think he always wants the end of the string, even if it doesn't end with a
    >'K'.


    Good point, but it begs the question whether any symbols after the last
    K would be considered valid data.


  30. #30
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 30 Dec 2005 11:04:25 -0800, "Harlan Grove" <hrlngrv@aol.com> wrote:

    >Ron Rosenfeld wrote...
    >...
    >>By the way, I think in keeping with the OP's specifications, this should be:
    >>
    >> =REGEX.MID(seq,"([^K]?|.*?((?!FP)..))(K|$)",ROWS(B$2:B2))
    >>
    >>I think he always wants the end of the string, even if it doesn't end with a
    >>'K'.

    >
    >Good point, but it begs the question whether any symbols after the last
    >K would be considered valid data.


    It's been a long time since I was fragmenting amino acid sequences ...


    --ron

  31. #31
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 29 Dec 2005 14:08:41 -0800, "Harlan Grove" <hrlngrv@aol.com> wrote:

    >Wrong. [^FP] means any characters other than F or P, not the sequence
    >FP. While your formula may have produced the OP's expected results, if
    >there were an "FK" in the sequence, it wouldn't have. You need to use a
    >negative look-ahead asertion.
    >
    >B2:
    >=REGEX.MID($A$2,"([^K]?|.*?((?!FP)..))K",ROWS(B$2:B2))


    Harlan,

    How about a negative look-behind assertion?

    =REGEX.MID($A$1,".*?(?<!FP)(K|$)",ROWS($B$2:B2))


    --ron

  32. #32
    Harlan Grove
    Guest

    Re: Text manipulation

    "Ron Rosenfeld" <ronrosenfeld@nospam.org> wrote...
    ....
    >How about a negative look-behind assertion?
    >
    >=REGEX.MID($A$1,".*?(?<!FP)(K|$)",ROWS($B$2:B2))


    I keep forgetting Longre uses PCRE rather than VBScript regexp syntax. Too
    many regexp varieties.

    That should work for one literal substring in the assertion, but multiple
    options, e.g., ".+?(?<!(FP|XYZ|D))K" [note: it's more efficient to append a
    K to A1 and use a simple K at the end of the regexp than to use (K|$) - to
    me at least that's a clearer indicator that trailing symbols after the last
    K should be included] could cause backtracking problems. Assertions with
    alternation subexpressions with closures (not the case here, but generally)
    can become a big PITA. If there were multiple prohibited sequences, it may
    be more efficient to mask immediately following cut characters and replace
    them in the results. This is the regexp analog to using ancillary cells for
    intermediate calculations rather than single huge formulas in spreadsheets.
    Just as 2 or 3 formulas may recalc more quickly than a single equivalent
    formula, 2 or 3 simpler regexp operations may recalc more quickly than a
    single, more complex one. Assertions may work well in this case, but I tend
    to avoid them when possible myself.



  33. #33
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On Mon, 2 Jan 2006 14:26:16 -0800, "Harlan Grove" <hrlngrv@aol.com> wrote:

    >"Ron Rosenfeld" <ronrosenfeld@nospam.org> wrote...
    >...
    >>How about a negative look-behind assertion?
    >>
    >>=REGEX.MID($A$1,".*?(?<!FP)(K|$)",ROWS($B$2:B2))

    >
    >I keep forgetting Longre uses PCRE rather than VBScript regexp syntax. Too
    >many regexp varieties.
    >
    >That should work for one literal substring in the assertion, but multiple
    >options, e.g., ".+?(?<!(FP|XYZ|D))K" [note: it's more efficient to append a
    >K to A1 and use a simple K at the end of the regexp than to use (K|$) - to
    >me at least that's a clearer indicator that trailing symbols after the last
    >K should be included] could cause backtracking problems. Assertions with
    >alternation subexpressions with closures (not the case here, but generally)
    >can become a big PITA. If there were multiple prohibited sequences, it may
    >be more efficient to mask immediately following cut characters and replace
    >them in the results. This is the regexp analog to using ancillary cells for
    >intermediate calculations rather than single huge formulas in spreadsheets.
    >Just as 2 or 3 formulas may recalc more quickly than a single equivalent
    >formula, 2 or 3 simpler regexp operations may recalc more quickly than a
    >single, more complex one. Assertions may work well in this case, but I tend
    >to avoid them when possible myself.
    >


    I will need to keep that in mind. Clearly I have insufficient experience with
    regular expressions to have run into some of the issues you have mentioned.

    I will need to think more about the backtracking issues in order to better
    understand what you write. As well as the pros and cons of trying to avoid
    assertions.

    I have been trying to work out some of the questions posed on the NG's you
    referred me to. Educational. Thanks.

    Best wishes and Happy New Year,

    --ron
    --ron

  34. #34
    Harlan Grove
    Guest

    Re: Text manipulation

    Ron Rosenfeld wrote...
    ....
    >I will need to keep that in mind. Clearly I have insufficient experience with
    >regular expressions to have run into some of the issues you have mentioned.
    >
    >I will need to think more about the backtracking issues in order to better
    >understand what you write. As well as the pros and cons of trying to avoid
    >assertions.

    ....

    If you want to get a real grounding in regular expressions, read
    Jeffrey Friedl's book 'Mastering Regular Expressions, 2nd Ed.', ISBN
    0-596-00289-0. Lots of discussion about backtracking and optimizing
    regular expessions.


  35. #35
    Ron Rosenfeld
    Guest

    Re: Text manipulation

    On 2 Jan 2006 19:05:16 -0800, "Harlan Grove" <hrlngrv@aol.com> wrote:

    >Ron Rosenfeld wrote...
    >...
    >>I will need to keep that in mind. Clearly I have insufficient experience with
    >>regular expressions to have run into some of the issues you have mentioned.
    >>
    >>I will need to think more about the backtracking issues in order to better
    >>understand what you write. As well as the pros and cons of trying to avoid
    >>assertions.

    >...
    >
    >If you want to get a real grounding in regular expressions, read
    >Jeffrey Friedl's book 'Mastering Regular Expressions, 2nd Ed.', ISBN
    >0-596-00289-0. Lots of discussion about backtracking and optimizing
    >regular expessions.


    Thanks. Just ordered a copy from Amazon.


    --ron

  36. #36
    paulinoluciano
    Guest

    Re: Text manipulation

    Thank you Harlan Grove!
    In fact, this solve my first "problem" related to this kind of text
    manipulation. However, now I have to specify in such VBA code that the
    sequence could be cut after or before two or more letters
    simultaneously.
    Example:

    The sequence is

    IADASFDTYEREPWQNMSDFGHKEASADSASSASADRASERAS

    cut after K or R

    0 lost:
    IADASFDTYER
    EPWQNMSDFGHK
    EASADSASSASADR
    SER
    AS

    1 lost cut:
    IADASFDTYEREPWQNMSDFGHK
    EPWQNMSDFGHKEASADSASSASADR
    EASADSASSASADRSER
    SERAS
    ......


    Harlan Grove wrote:
    > Bob Phillips wrote...
    > ...
    > > If nIndex = 1 Then
    > > Cells(i, "A").Value = Right(Cells(i, "A").Value, _
    > > Len(Cells(i, "A").Value) - 1)
    > > Else
    > > Cells(i, "A").Value = Left(Cells(i, "A").Value, nIndex - 1) & _
    > > Right(Cells(i, "A").Value, Len(Cells(i, "A").Value) - nIndex)
    > > End If

    > ...
    >
    > Could simplify: Right(x, Len(x) - y) == Mid(x, y + 1)
    >
    > Then again, the whole If block could be replaced by
    >
    > Cells(i, "A").Value = _
    > Application.WorksheetFunction.Replace(Cells(i, "A").Value, nIndex,
    > 1, "")



  37. #37
    paulinoluciano
    Guest

    Re: Text manipulation


    paulinoluciano wrote:
    > Thank you Harlan Grove!
    > In fact, this solve my first "problem" related to this kind of text
    > manipulation. However, now I have to specify in such VBA code that the
    > sequence could be cut after or before two or more letters
    > simultaneously.
    > Example:
    >
    > The sequence is
    >
    > IADASFDTYEREPWQNMSDFGHKEASADSASSASADRASERAS
    >
    > cut after K or R
    >
    > 0 lost:
    > IADASFDTYER
    > EPWQNMSDFGHK
    > EASADSASSASADR
    > SER
    > AS
    >
    > 1 lost cut:
    > IADASFDTYEREPWQNMSDFGHK
    > EPWQNMSDFGHKEASADSASSASADR
    > EASADSASSASADRSER
    > SERAS
    > .....
    >
    >
    > Harlan Grove wrote:
    > > Bob Phillips wrote...
    > > ...
    > > > If nIndex = 1 Then
    > > > Cells(i, "A").Value = Right(Cells(i, "A").Value, _
    > > > Len(Cells(i, "A").Value) - 1)
    > > > Else
    > > > Cells(i, "A").Value = Left(Cells(i, "A").Value, nIndex - 1) & _
    > > > Right(Cells(i, "A").Value, Len(Cells(i, "A").Value) - nIndex)
    > > > End If

    > > ...
    > >
    > > Could simplify: Right(x, Len(x) - y) == Mid(x, y + 1)
    > >
    > > Then again, the whole If block could be replaced by
    > >
    > > Cells(i, "A").Value = _
    > > Application.WorksheetFunction.Replace(Cells(i, "A").Value, nIndex,
    > > 1, "")



+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1