+ Reply to Thread
Results 1 to 2 of 2

Parse Data from HTML Code / Website to Excel

Hybrid View

rkymtns Parse Data from HTML Code /... 04-01-2013, 01:01 PM
rkymtns Re: Parse Data from HTML Code... 04-01-2013, 10:54 PM
  1. #1
    Registered User
    Join Date
    04-01-2013
    Location
    Colorado
    MS-Off Ver
    Excel 2010
    Posts
    2

    Lightbulb Parse Data from HTML Code / Website to Excel

    Good morning!

    I have been working on parsing data from a website into a VBA array from Excel and I am absolutely stuck.

    Objective: Parse out specific data located between certain html tags from a website and load the parsed data into a VBA array

    Here is a sample copy of the html code from the website (website address for the data below is: http://cosprings.craigslist.org/sear...tor&catAbb=sss)


    <blockquote id="toc_rows">
    	<div class="sortby">
    		sort by:
    		<strong>most recent</strong>
    		<a href="/search/?sort=rel&areaID=210&catAbb=sss&query=tractor">best match</a>
    		<a href="/search/?sort=priceasc&areaID=210&catAbb=sss&query=tractor">low price</a>
    		<a href="/search/?sort=pricedsc&areaID=210&catAbb=sss&query=tractor">high price</a>
    	</div>
    	<h4 class="ban resultshdr">
    		<span class="nplink prev">« Prev</span>
    		<span class="nplink next">
    			<a href="http://cosprings.craigslist.org/search/?areaID=210&catAbb=sss&query=tractor&s=100">Next »</a>
    		</span>
    		Found: 192 Displaying: 1 - 100
    		<br>
    	<span class="pagelinks">
    			<b>1</b>
    			|
    			<a href="http://cosprings.craigslist.org/search/?areaID=210&catAbb=sss&query=tractor&s=100">2</a>
    		</span>
    	</h4>
    	class="srch row" data-pid="3647539519" data-longitude="-105.076448916774" data-latitude="40.611477699806">
            <span class="i" data-id="3Ee3K73Mc5I85N75E2d2rd1b1b16a5f981f75.jpg"> </span>
    		<span class="pl">
    			<span class="itemdate">Mar 29</span>
    			<a href="http://cosprings.craigslist.org/grq/3647539519.html">2012 TYM T723 CAB Tractor 74HP With Loader & Bucket</a>
    		</span>
    		<span class="itempnr">
    			<span class="itempp"> $40532</span>
    			<font size="-1"> (Fort Collins, CO)</font>
    		</span>
    		<small class="gc">
    			<a href="/grq/">farm & garden - by dealer</a>
    		</small>
    		<span class="itempx">
    			<span class="p">
    				pic 
    				<span class="maptag" data-pid="3647539519">map</span>
    			</span>
    		</span>
    	<br class="c">
    	</p>
    
    ...code continues on in a repeating pattern of class "srch row"
    What I need loaded into the VBA array is in BOLD RED. So from the example code above, I would need the array to have the elements:

    Mar 29
    http://cosprings.craigslist.org/grq/3647539519.html
    2012 TYM T723 CAB Tractor 74HP With Loader & Bucket
    $40532
    (Fort Collins, CO)
    farm & garden - by dealer


    For the data contained within every class "srch row" in the HTML, I need the one row (each separate line of data above should be put into a separate column of the array, on the same row) of the VBA array to be filled with all of the data in the example above and then move on to the next array row entry (for each unique data set contained within the HTML class "srch row") until there are no more "srch row" data in the HTML code.

    I hope that makes sense - I have tried to make my way through the coding and I cannot get it to work right. Here is the code I have now:

     With CreateObject("MSXML2.XMLHTTP")
              .Open "GET", "http://cosprings.craigslist.org/search/?areaID=210&subAreaID=&query=tractor&catAbb=sss"
               .send
             Do: DoEvents: Loop Until .readystate = 4
             Set htmlFile = New HTMLDocument
                htmlFile.body.innerHTML = .responseText
                .abort
          End With
    
    
          With htmlFile
          rowCount = .getElementsByClassName("srch row").Length
          On Error Resume Next
          ReDim data(1 To rowCount, 1 To 5)
          On Error Resume Next 'in case the search produces 0 results
          x = 1
        
         For Each orow In .getElementsByClassName("srch row")
             data(x, 1) = orow.Children(0).innerText
             data(x, 2) = orow.Children(1).href
             data(x, 3) = orow.Children(2).innerText
                 data(x, 4) = orow.Children(3).innerText
              data(x, 5) = orow.Children(3).innerText
               
              x = x + 1
         Next orow
    
          
          
        End With

    I am thinking that the above code is not working because the class "srch row" is nested within the HTML code?


    Thank you in advance for the help!

  2. #2
    Registered User
    Join Date
    04-01-2013
    Location
    Colorado
    MS-Off Ver
    Excel 2010
    Posts
    2

    Re: Parse Data from HTML Code / Website to Excel

    May have it figured out... still need a little help though

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1