+ Reply to Thread
Results 1 to 13 of 13

Extract text strings

Hybrid View

  1. #1
    Forum Expert Paul's Avatar
    Join Date
    02-05-2007
    Location
    Wisconsin
    MS-Off Ver
    2016/365
    Posts
    6,887

    Re: Macro/VBS to extract text strings...please help!

    Hi Wazing, welcome to the forum.

    Perhaps this code will help you out. It loops through the cells in column A on Sheet1 and creates a list of the values following the href tags into column A on Sheet2.

    Sub extract()
    Dim delim As String, nextRow As Long, i As Long
    Dim ws1 As Worksheet, ws2 As Worksheet, tmpArr As Variant
    
    Set ws1 = Sheets("Sheet1")
    Set ws2 = Sheets("Sheet2")
    delim = "href="""
    nextRow = Sheets("Sheet2").Range("A" & ws2.Rows.Count).End(xlUp).Row + 1
    
    For i = 1 To Sheets("Sheet1").Range("A" & ws1.Rows.Count).End(xlUp).Row
        tmpArr = Split(Cells(i, 1).Value, delim)
        For j = 1 To UBound(tmpArr)
            tmpArr(j - 1) = Left(tmpArr(j), InStr(1, tmpArr(j), """") - 1)
        Next j
        ws2.Range("A" & nextRow).Resize(UBound(tmpArr), 1).Value = Application.Transpose(tmpArr)
        nextRow = nextRow + UBound(tmpArr)
    Next i
    End Sub
    May need some tweaking, but I don't have any sample data to work with...

  2. #2
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: Extract text strings

    sub snb()
      sq=sheets(1).usedrange.columns(1)
      for j=1 to ubound(sq)
        c01 =c01 & "|" & join(filter(split(sq(j,1),chr(34)),"HREF"),"|")
      next
      sheets(1).cells(1,1) resize(ubound(filter(split(c01,"|"),"HREF")))=application.transpose(filter(split(c01,"|"),"HREF"))
    End Sub

  3. #3
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Macro/VBS to extract text strings...please help!

    Paul....Thanks for the code. It did it flawlessly! I really appreciate the help!

  4. #4
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Macro/VBS to extract text strings...please help!

    Hey Paul, quick followup question: One variable I had not included.....in my source worksheet1 column (b) has the sourcepage url for the html in column (A) (could be different source url for each cell). Is there a quick addition to the code that will write this value to the column next to each href link from that cell in worksheet2? This would be a huge help in extracting the data....

    Thanks!

  5. #5
    Forum Expert Paul's Avatar
    Join Date
    02-05-2007
    Location
    Wisconsin
    MS-Off Ver
    2016/365
    Posts
    6,887

    Re: Macro/VBS to extract text strings...please help!

    Can you provide a worksheet with some sample data that I can test on? Doesn't sound difficult, but want to make sure I get it right the first time.

  6. #6
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Macro/VBS to extract text strings...please help!

    Here is some "sample" data in the same format to test on......Thanks!
    Attached Files Attached Files

  7. #7
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Macro/VBS to extract text strings...please help!

    You may notice that in each raw html cell there are typically duplicates of each href link....and in cells from the same source URL in column 2, I have to manually remove duplicates. So typically we have to take the output file (which may be thousands of lines long) and sort by column 2, then take that html and remove the href links/remove duplicates for all the href links from the same source url, then copy the source url to the final list from that link and put it in the final worksheet for analysis. It would save a multiple steps if it removed duplicates where column 2 is the same as well....not sure if that is possible but thought I would ask. Maybe it is a separate script I run on the results where it looks for duplicates where they have the same source url in column 2......Thanks!

  8. #8
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: Extract text strings

    I can't detect a connection between herefs in column A and URL in column B.

    Based on your example: all the unique hreff in column A
    from cell A20 downward.
    Sub unique_hrefs_snb()
      sq = Cells(1, 1).CurrentRegion
      For j = 1 To UBound(sq)
        sn = Filter(Split(Replace(sq(j, 1), ">", " ")), "href")
        For jj = 0 To UBound(sn)
           If InStr("|" & c01 & "|", "|" & sn(jj) & "|") = 0 Then c01 = c01 & "|" & sn(jj)
        Next
      Next
      sq = Split(c01, "|")
      Cells(20, 1).Resize(UBound(sq)) = Application.Transpose(sq)
    End Sub

  9. #9
    Forum Expert Paul's Avatar
    Join Date
    02-05-2007
    Location
    Wisconsin
    MS-Off Ver
    2016/365
    Posts
    6,887

    Re: Extract text strings

    Hi Wazing, I think this amended code does what you're after.

    Sub extract()
    Dim delim As String, nextRow As Long, i As Long, j As Long
    Dim ws1 As Worksheet, ws2 As Worksheet, tmpArr As Variant, tmpStr As String
    
    Set ws1 = Sheets("Sheet1")
    Set ws2 = Sheets("Sheet2")
    delim = "href="""
    nextRow = ws2.Range("A" & ws2.Rows.Count).End(xlUp).Row + 1
    
    For i = 1 To ws1.Range("A" & ws1.Rows.Count).End(xlUp).Row
        tmpArr = Split(Cells(i, 1).Value, delim)
        tmpStr = Cells(i, 1).Offset(0, 1).Value
        For j = 1 To UBound(tmpArr)
            tmpArr(j - 1) = Left(tmpArr(j), InStr(1, tmpArr(j), """") - 1)
        Next j
        With ws2.Range("A" & nextRow).Resize(UBound(tmpArr), 1)
            .Value = Application.Transpose(tmpArr)
            .Offset(0, 1).Value = tmpStr
        End With
        nextRow = nextRow + UBound(tmpArr)
    Next i
    ws2.Range("$A:$B").RemoveDuplicates Columns:=Array(1, 2), Header:=xlNo
    End Sub

  10. #10
    Registered User
    Join Date
    07-16-2010
    Location
    USA
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Extract text strings

    Quote Originally Posted by snb View Post
    I can't detect a connection between herefs in column A and URL in column B.
    Well, the data has been obviously altered to keep it confidential....but a great way to get an example of what I am referring to is to take the sample text and repeat it several times (so you have triple the rows...with the data repeating). I'll give this code a run...thanks for the help! I'll post a response shortly....
    Last edited by Paul; 07-22-2010 at 12:49 PM.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1