
Is it possible to scrape header and meta tags from a site without a browser?

  1. #1
    Registered User
    Join Date
    11-01-2009
    Location
    California
    MS-Off Ver
    Excel 2013
    Posts
    46

    Is it possible to scrape header and meta tags from a site without a browser?

    Greetings and Happy New Year to all!

    I want to scrape some sites and I have most of the data being pulled in, but not all that I am looking to get.
    I am trying to stay with QueryTables as it is MUCH faster, but I am not seeing how (if it is possible at all) to get the HEADER data. I am mainly looking for the meta tags.


    If not with QueryTables, is it possible with other tools without loading the page?
    If not, what is the fastest tool to use?


    I have tried Selenium, but I seem to have a botched install that I have not been able to correct, and it seems that the two computers I use need different flavors (even though both use Chrome). This was several years ago and I got so fed up that I stopped, so I do not remember the install difference, but it seemed to me it was a different library. On this computer it is the Selenium Type Library.

    Thanks
    Mc

  2. #2
    Forum Moderator alansidman's Avatar
    Join Date
    02-02-2010
    Location
    Steamboat Springs, CO
    MS-Off Ver
    MS Office 365 Version 2407 Win 11 Home 64 Bit
    Posts
    24,067

    Re: Is it possible to scrape header and meta tags from a site without a browser?

    1. Are you still using XL2007? If you are using a later version, please update your profile to reflect the version you are currently using.
    2. Please provide a URL for the website you are trying to scrape so that we can see specifically what you are trying to do.
    3. State specifically which information on the website you wish to load into an Excel sheet.
    Alan עַם יִשְׂרָאֵל חַי


  3. #3
    Registered User
    Join Date
    11-01-2009
    Location
    California
    MS-Off Ver
    Excel 2013
    Posts
    46

    Re: Is it possible to scrape header and meta tags from a site without a browser?

    Hello Alan and all others,

    Sorry about my old profile info! I can see how it could be important, and I should actually have included it in my request!

    As a general rule, I do not include links to sites, but since you have asked, I am providing the details.
    I have a website (timberturners.com) and am an authorized reseller for Penn State Industries (pen/pencil making kits and other woodworking kits), and I want to promote my business.
    I have asked them for the information and they said I could get it from the website, which I have been doing for several years via VBA and IE. I am just looking for a faster way.
    Here is an example of a URL I want to get info from: https://www.pennstateind.com/store/PKY03.html

    I am looking to get things like: <meta property="og:title" content="Padauk Yo-Yo Blanks"> or <meta name="description" content="These Padauk yo-yo blanks will make 2 beautiful yo-yos">

    If I use VBA with IE, I can get this data, but looping through these pages takes a very long time.
    I am trying to see if there is a way to access this type of data without loading the page, using something like a query or MSXML2.XMLHTTP60. I currently have versions that scrape the site for most of the data with both methods. The query method is the fastest and I would prefer it, but I am not sure it exposes the header data.

    Thanks!
    Mc
    Last edited by mctabish; 01-14-2023 at 03:08 PM. Reason: spelling and grammer

  4. #4
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Re: Is it possible to scrape header and meta tags from a site without a browser?


    Hello,

    since you already have a VBA web request procedure, you must already know that you can grab the data via the element ID,
    which you can check with the web browser's inspector tool …

  5. #5
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Re: Is it possible to scrape header and meta tags from a site without a browser?


    … Or you can load the page's initial source code via a VBA web request procedure,
    then use the standard VBA text functions to find what you are really looking for …

  6. #6
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Try this!


    Following up on my previous post, here is a starter VBA demonstration that reads the page's initial source code:

    VBA Code: 
    Sub DemoReq1()
        Dim T$
        With CreateObject("WinHttp.WinHttpRequest.5.1")
            .Open "GET", "https://www.pennstateind.com/store/PKY03.html", False
            .setRequestHeader "DNT", "1"
            ' Trap request errors so that a failed request simply leaves T empty
            On Error Resume Next
            .send
            If Err.Number = 0 Then If .Status = 200 Then T$ = .responseText
            On Error GoTo 0
        End With
        ' Inner Split keeps the text after the opening marker, outer Split cuts it at the closing ">
        If T = "" Then Beep Else MsgBox Split(Split(T, "<meta name=""description"" content=""")(1), """>")(0)
    End Sub

  7. #7
    Registered User
    Join Date
    11-01-2009
    Location
    California
    MS-Off Ver
    Excel 2013
    Posts
    46

    Re: Is it possible to scrape header and meta tags from a site without a browser?

    Hello Marc,
    WOW! Just AMAZING!

    I was a little confused at first, as I did not see that the Split actually contained another Split! I have not used Split very often and should probably get much more familiar with it!

    But the MAGIC is in the .setRequestHeader "DNT". I am pretty new to using WinHttpRequest, as I wrote all of my code with IE about 5 years ago. It is just SO SLOW and now obsolete. With IE, I have several classes that I relied on and it could do everything I wanted, but again, just WOW!

    At first I had a small issue, as your sample used only the ">" terminator, but for my other tags I needed to use "/>" (which is totally logical based on the DOM).
    It took 30 minutes to run through my 2500 products (including the overhead of writing to the spreadsheet and updating the progress bar).

    Again, a BIG thanks!
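
    The ">" versus "/>" issue above can be sidestepped by stopping at the closing quote of the content value instead of at the end of the tag. A minimal sketch, not taken from the thread: the function name MetaContent is hypothetical, and it assumes the attributes are double-quoted and that name= or property= comes immediately before content=, as in the two sample tags quoted earlier.

    ```vba
    ' Hypothetical helper: returns the content="..." value of the first <meta> tag
    ' whose name= or property= attribute equals Key, or "" when nothing matches.
    ' Assumes double-quoted attributes with name/property directly before content.
    Function MetaContent(ByVal Html As String, ByVal Key As String) As String
        Dim p As Long, q As Long, marker As Variant
        For Each marker In Array("name=""" & Key & """ content=""", _
                                 "property=""" & Key & """ content=""")
            p = InStr(1, Html, marker, vbTextCompare)
            If p > 0 Then
                p = p + Len(marker)              ' first character of the value
                q = InStr(p, Html, """")         ' closing quote of the value
                If q > 0 Then MetaContent = Mid$(Html, p, q - p)
                Exit Function
            End If
        Next
    End Function
    ```

    With T holding the responseText, MetaContent(T, "description") or MetaContent(T, "og:title") would then work whether the tag ends in ">" or "/>". HTML entities such as &quot; inside the value are not decoded.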

  8. #8
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Re: Is it possible to scrape header and meta tags from a site without a browser?


    A badly coded progress bar can dramatically slow down the execution, so check the time needed without it …

  9. #9
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Re: Is it possible to scrape header and meta tags from a site without a browser?


    Another way to reduce the execution time:

    If you use my demonstration within a loop, a new request object is created each time the code executes the With line.
    It is better to put the With line outside the loop, i.e. the loop must be within the With block,
    so that only a single object is used, which is far faster.
    Alternatively, you may use an Object variable, which needs to be set to Nothing before the procedure ends …
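
    A minimal sketch of the loop placed inside the With block, so that one request object serves every page. It is only an illustration under assumptions that are not confirmed anywhere in the thread: full URLs sit in column A of the first worksheet starting at row 2, the description is written to column B, and the procedure name DemoLoop is made up.

    ```vba
    Sub DemoLoop()
        Const MARKER As String = "<meta name=""description"" content="""
        Dim ws As Worksheet, r As Long, lastRow As Long
        Dim T As String, p As Long, q As Long

        Set ws = ThisWorkbook.Worksheets(1)                  ' assumption: URLs are on the first sheet
        lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

        With CreateObject("WinHttp.WinHttpRequest.5.1")      ' created once, reused for every row
            For r = 2 To lastRow                             ' assumption: row 1 is a header row
                T = ""
                On Error Resume Next
                .Open "GET", ws.Cells(r, "A").Value, False
                .setRequestHeader "DNT", "1"
                .send
                If Err.Number = 0 Then If .Status = 200 Then T = .responseText
                On Error GoTo 0

                p = InStr(1, T, MARKER, vbTextCompare)
                If p > 0 Then
                    p = p + Len(MARKER)
                    q = InStr(p, T, """")                    ' closing quote of the content value
                    If q > p Then ws.Cells(r, "B").Value = Mid$(T, p, q - p)
                End If
            Next r
        End With
    End Sub
    ```

    Because the object is created by the With line and never assigned to a variable, it is released at End With and there is nothing to set to Nothing.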

  10. #10
    Registered User
    Join Date
    11-01-2009
    Location
    California
    MS-Off Ver
    Excel 2013
    Posts
    46

    Re: Is it possible to scrape header and meta tags from a site without a browser?

    Quote Originally Posted by Marc L

    Another way to reduce the execution time:

    If you use my demonstration within a loop, a new request object is created each time the code executes the With line.
    It is better to put the With line outside the loop, i.e. the loop must be within the With block,
    so that only a single object is used, which is far faster.
    Alternatively, you may use an Object variable, which needs to be set to Nothing before the procedure ends …

    Hello Marc,
    I am very happy with the results!
    I will take your recommendations and compare them with my current code. The previous code using IE, scraping the whole page (this is just the meta tags), took up to 4 x 12 hour runs.
    I am sure that scraping the rest of the data will NOT increase the time much, as the loading of IE was the true bane!

    Again, I am very happy and will make the changes you recommend.

    For a run of 2500 records doing this type of task, I do like to know what is going on (I could just put the record number in the status bar). At this point I have NOT done ANY performance tweaking, so I think I can eke out much better performance.
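
    The status bar idea can be sketched as follows. This is an illustration only: the procedure name and the update interval of 50 records are arbitrary assumptions, and the per-record scraping work is left as a placeholder comment.

    ```vba
    Sub DemoStatusBar()
        Const TOTAL As Long = 2500        ' record count mentioned above
        Const EVERY As Long = 50          ' assumption: refresh the display every 50 records
        Dim i As Long

        For i = 1 To TOTAL
            ' ... per-record work goes here ...
            If i Mod EVERY = 0 Then
                Application.StatusBar = "Record " & i & " of " & TOTAL
                DoEvents                  ' let Excel repaint the status bar
            End If
        Next i

        Application.StatusBar = False     ' hand the status bar back to Excel
    End Sub
    ```

    Updating only every few dozen records keeps the display overhead small compared with refreshing on each iteration.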

  11. #11
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Re: Is it possible to scrape header and meta tags from a site without a browser?


    Progress bars are often carelessly coded, as you can see in this thread if you read it to the end …

  12. #12
    Registered User
    Join Date
    11-01-2009
    Location
    California
    MS-Off Ver
    Excel 2013
    Posts
    46

    Re: Is it possible to scrape header and meta tags from a site without a browser?

    Hello Marc,

    I am back... I have been really busy because our shop got flooded in January...

    I thought I could figure out how to do this:
    If you use my demonstration within a loop, a new request object is created each time the code executes the With line.
    It is better to put the With line outside the loop, i.e. the loop must be within the With block,
    so that only a single object is used, which is far faster.
    Alternatively, you may use an Object variable, which needs to be set to Nothing before the procedure ends


    But I have not figured it out.
    I am pretty good at certain things in VBA, but many times I have just used what I know, not always expanding my skills until I have a NEED (or a heavy desire...). I have even gone so far as manipulating other EXE programs using things like FindWindow(), submit and click rather than SendKeys (too unreliable!).
    The items I need to scrape are located in column A of the first worksheet, if this info is needed.

    I am also looking at expanding this to scrape the main pages of the items (example item: https://www.pennstateind.com/store/PKSLFUNSG.html). I do have this working in the browser version, so I have all the field names (and know how to find things that DON'T have names).

    Would I need to make a separate WinHttpRequest call, or can I use the same WinHttpRequest?

    One sample of the things I would like is the PRICE.
    I can use getElementsByClassName with "tab-price" for this.

    Please Login or Register  to view this content.
    And the "product number". The product number is not a labeled item, so I would need to open the previous DIVCLASS, then parse for the "#", and then get the next item. It is basically
    Please Login or Register  to view this content.
    Is it possible to use something like the full XPath? It would help avoid the recursive drilling down.
    For example, for the item at https://www.pennstateind.com/store/PKSLFUNSG.html, to find the first accessory, the XPath would be:
    Please Login or Register  to view this content.
    For about 2000 items, would it be best to separate the HEADER scrape from the main body scrape?


    Thanks a lot!

  13. #13
    Forum Expert
    Join Date
    11-24-2013
    Location
    Paris, France
    MS-Off Ver
    Excel 2003 / 2010
    Posts
    9,831

    Re: Is it possible to scrape header and meta tags from a site without a browser?


    If the price is within the main page code, then yes, you can extract it directly with VBA text functions like Split …
    So do you see the price in the responseText?

    As for the product number, it seems to be already in the URL …
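
    One way to check this is sketched below. A single GET returns the whole HTML document, so the meta tags in the head and whatever body markup the server sends arrive in the same responseText and no second request is needed for the same page. The sketch only tests whether the "tab-price" class name mentioned earlier appears in the raw source. It does not assume anything about the markup around the price, which is not shown in this thread. If the price is injected by script after the page loads, it will not be there. The names TextBetween and DemoHeadAndBody are made up for illustration.

    ```vba
    ' Hypothetical helper: text between the first StartMark and the next EndMark, or "".
    Function TextBetween(ByVal Src As String, ByVal StartMark As String, _
                         ByVal EndMark As String) As String
        Dim p As Long, q As Long
        p = InStr(1, Src, StartMark, vbTextCompare)
        If p = 0 Then Exit Function
        p = p + Len(StartMark)
        q = InStr(p, Src, EndMark, vbTextCompare)
        If q > 0 Then TextBetween = Mid$(Src, p, q - p)
    End Function

    Sub DemoHeadAndBody()
        Dim T As String
        With CreateObject("WinHttp.WinHttpRequest.5.1")
            .Open "GET", "https://www.pennstateind.com/store/PKSLFUNSG.html", False
            .setRequestHeader "DNT", "1"
            On Error Resume Next
            .send
            If Err.Number = 0 Then If .Status = 200 Then T = .responseText
            On Error GoTo 0
        End With
        If T = "" Then Beep: Exit Sub

        ' Head data and body data are read from the same response text
        Debug.Print "Description: "; TextBetween(T, "<meta name=""description"" content=""", """")
        Debug.Print "tab-price present: "; (InStr(1, T, "tab-price", vbTextCompare) > 0)
    End Sub
    ```

    If the second line prints True, the text around that class name can be inspected in the Immediate window and cut out with TextBetween or Split in the same way as the meta tags.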
