+ Reply to Thread
Results 1 to 5 of 5

Querying websites -beginner question

  1. #1
    Shiny
    Guest

    Querying websites -beginner question

    Hi,

    I am trying to compile a database of restaurant inspection records that
    are posted on the web.

    The reports are generated through a search engine, but the results just
    pull up static websites.

    The URL Of the website results look typically like this:

    http://www.region.waterloo.on.ca/web...gent&pid=03102

    The number at the end of the URL changes for each record that is pulled
    up. There are hundreds (maybe thousands) of records, each with their
    own URL. I can't seem to identify the range of report numbers, they
    seem to have no set order. However, even if if you type in a report
    number that doesn't exist, it still brings up a webpage.

    Is there any way for me to run a macro in Excel that will repeatedly
    query the website, each time changing the 5-digit number at the end of
    the URL.

    The front page of the website is here:
    http://www.region.waterloo.on.ca/web...nAgent&valid=y

    Thanks,
    Tamsin


  2. #2
    Dick Kusleika
    Guest

    Re: Querying websites -beginner question

    Shiny wrote:
    > The URL Of the website results look typically like this:
    >
    > http://www.region.waterloo.on.ca/web...gent&pid=03102
    >
    > The number at the end of the URL changes for each record that is
    > pulled up. There are hundreds (maybe thousands) of records, each with
    > their own URL. I can't seem to identify the range of report numbers,
    > they seem to have no set order. However, even if if you type in a
    > report number that doesn't exist, it still brings up a webpage.
    >
    > Is there any way for me to run a macro in Excel that will repeatedly
    > query the website, each time changing the 5-digit number at the end of
    > the URL.
    >
    > The front page of the website is here:
    > http://www.region.waterloo.on.ca/web...nAgent&valid=y


    Tamsin:

    The "beginner" answer is no. The best way to get at that data is to get
    access to the database that's behind that site. Usually, that's not
    possible unless you work there. You can automate Internet Explorer and try
    to scrape the information off of the site. This macro loads each page in
    turn and reads through all the hyperlinks until it gets to one that look
    like a restaurant. Paste it in a new workbook and set a reference (tools -
    references) to Microsoft Internet Controls. Sheet1 should then contain a
    list of all the five digit page id's that exist on that site.

    Sub ListPageIDs()

    Dim appIE As InternetExplorer
    Dim docIE As Object
    Dim vaCityCode As Variant
    Dim i As Long, j As Long, k As Long
    Dim idxLinks As Long
    Dim cResults As Collection
    Dim lStart As Long
    Dim sPid As String
    Dim vaTemp As Variant
    Dim dTimeOut As Date

    'starting url doesn't change
    Const sURL As String = _
    "http://www.region.waterloo.on.ca/web/foodinspection.nsf/Index?OpenAgent"
    'this will identify links we want
    Const sJAVALINK As String =
    "onclick=""popUp('aaShowDetails?openagent&pid="

    'the city codes used in the url
    vaCityCode = Array("k", "c", "w", "x")

    Set appIE = New InternetExplorer
    Set cResults = New Collection

    For i = LBound(vaCityCode) To UBound(vaCityCode) 'loop through cities
    For j = 97 To 122 'loop a to z
    For k = 1 To 1000 Step 15 'loop through pages
    appIE.Navigate sURL & "&city=" & vaCityCode(i) & _
    "&prefix=" & Chr$(j) & _
    "&start=" & k & _
    "&valid=y"

    'Loop until the web page is fully loaded
    dTimeOut = Timer
    Do
    DoEvents
    If Timer - dTimeOut > 120 Then
    'if the page won't load, put in column C
    ThisWorkbook.Sheets(1).Range("C65000").End(xlUp).Offset(1,
    0).Value = _
    appIE.Document.URL
    Exit For
    End If
    Loop Until appIE.ReadyState = READYSTATE_COMPLETE

    Set docIE = appIE.Document

    'if there are no records, skip to the next letter
    If InStr(1, docIE.body.innertext, "No Record Found") > 0
    Then
    Exit For
    Else
    'loop through all the links looking for SJAVALINK
    For idxLinks = 1 To docIE.links.Length - 1
    lStart = InStr(1, docIE.links(idxLinks).outerhtml,
    sJAVALINK)
    Debug.Print docIE.links(idxLinks).innertext
    If lStart > 0 Then
    sPid = Mid(docIE.links(idxLinks).outerhtml,
    lStart + Len(sJAVALINK), 5)
    'add link to the collection with sPid as key to
    avoid duplicates
    On Error Resume Next
    cResults.Add sPid & "-" &
    docIE.links(idxLinks).innertext, CStr(sPid)
    On Error GoTo 0
    End If
    Next idxLinks
    End If
    Next k
    Next j
    Next i

    'write the results to sheet1
    For i = 1 To cResults.Count
    vaTemp = Split(cResults(i), "-")
    With ThisWorkbook.Sheets(1)
    .Cells(i, 1).Value = Format(vaTemp(0), "00000")
    .Cells(i, 2).Value = vaTemp(1)
    End With
    Next i

    appIE.Quit

    End Sub

    --
    **** Kusleika
    MVP - Excel
    Daily Dose of Excel
    http://www.*****-blog.com



  3. #3
    Shiny
    Guest

    Re: Querying websites -beginner question

    Hi ****,

    Thanks for taking the time to create that coding. When I run it, I get
    this error: "Compile error: syntax error"
    It highlights the line:
    " Const sJAVALINK As String ="

    Any suggestions on what I can do?

    This is all, currently, Greek to me...


  4. #4
    Dick Kusleika
    Guest

    Re: Querying websites -beginner question

    Shiny wrote:
    > Thanks for taking the time to create that coding. When I run it, I get
    > this error: "Compile error: syntax error"
    > It highlights the line:
    > " Const sJAVALINK As String ="
    >
    > Any suggestions on what I can do?


    I'd guess that it's a word wrapping problem. That line and the line below
    it should all be on the same line. You may run into that again later in the
    code once you fix that one.


    --
    **** Kusleika
    MVP - Excel
    Daily Dose of Excel
    http://www.*****-blog.com



  5. #5
    Shiny
    Guest

    Re: Querying websites -beginner question

    Fixed it and it worked, thanks!


+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1