# Microsoft Office Application Help - Excel Help forum > Excel Programming / VBA / Macros >  >  Getting Web page source (HTML or XML) in VBA?

## Riorin

I would like to retrieve contents of a web page, be it HTML or XML, into VBA variable!

Later, I would chop, cut, parse or extract the data I need.

Both importing as XML or WebQueries is unsatisfactory for a certain number of pages I need. XML has bad schema, WebQuery tells me it can't find any data.

I tried with WinHTTPRequest, but Excel gives me back error "undefined user type" in other words it doesn't recognize that object.

Basically I want the source of web page to become a string in my VBA code. In other words that would be replication of funcionallity of 


```
Please Login or Register  to view this content.
```


 which is provided by AutoHotKey scripting language (if anyone used it, very simple and effective for many things).

Can it be done in Excel's VBA?

----------


## Riorin

I get no takers on this one?

Come on, this is chalenging, it's not like "How do I display a message box before deleting something?" or "How do I make cell A4 red?". Not that I don't think these don't deserve replies, they just have standard answers.

----------


## SuitedAces

I've done some related stuff ,but I never found a way to download the source file.

I did end up of having success downloading tables where I could reference individual elements in tables , count how many tables were on a page etc.

Prior to that I was also seeking to download the source file, but abandoned that route once I able to selectively pick my data directly off the webpage.

If you think it can be any help I can probably dig up an example.

----------


## Riorin

Thanks SuitedAces (are they red or black?)!

The very reason I'm looking to download source is because using WebQuery to retrieve data fails to retrieve data from pages I need. I don't know the reason why is that so.

Example:
http://www.wagerline.com/Scores-Odds...ontID=9361&t=0

Site displaying various betting and sports info. In this particular example we have a boxscore which has perfectly formatted HTML as source. 

You can verify that tables are really present by looking at HTML source or by using WebDeveloper add-on for Firefox which can display information on just about any part of the web page! This particular page has 5 of them: login box, two team statistics, technical fouls and team statistics.

Why Excel's WebQuery can't pick any of them - I have no idea (it gives something like "connection is there, but there are no data"). That's why I need the source to parse out tables myself.

----------


## SuitedAces

> Thanks SuitedAces *(are they red or black?)!*
> 
> The very reason I'm looking to download source is because using WebQuery to retrieve data fails to retrieve data from pages I need. I don't know the reason why is that so.
> 
> Example:
> http://www.wagerline.com/Scores-Odds...ontID=9361&t=0
> 
> Site displaying various betting and sports info. In this particular example we have a boxscore which has perfectly formatted HTML as source. 
> 
> ...



I don't understand the question , red or black ?
Please explain , it's probably an expression I'm not familiar with.

I encountered identical problems with WebQuery , it's very limited and problematic and my first thought was the same as yours ...download the source code and parse it.
I haven't been able to find a way to do that .

But hacking my way through the properties of the html doc (I don't know a damn thing about html) I was able to write procedures that enabled me to extract everything I need.

In fact I believe I have a closely related example to yours...downloading hockey scores.
But with that link you gave me, that might be a whole different animal because I wasn't able to look at the source file from the menu.

----------


## SuitedAces

Ok, check that , I WAS able to open the source file the second time I tried.
That was strange.

----------


## Riorin

> Ok, check that , I WAS able to open the source file the second time I tried.



How did you managed that?





> because I wasn't able to look at the source file from the menu.



Mozilla Firefox (unlike IE) with its extensions or add-ons (WebDeveloper, FireBug, View Source Chart, TableTools etc.) is your friend. You'll find anything you ever wanted to know or extract from a particular web page and much more.





> I don't understand the question , red or black ?



I thought your screenname came from poker terms, "suited connectors" 76s meaning 7 and 6 of the same suit like spades for example!
Aces being cards of the same rank cannot in fact be suited, you can only have two red (hearts and diamonds) or two black (clubs and spades) aces in hole cards.

But same terms have different meanings for different people. You might have meant a "dressed up (suited) champion in his field (Ace)"?

----------


## SuitedAces

> I thought your screenname came from poker terms, "suited connectors" 76s meaning 7 and 6 of the same suit like spades for example!
> Aces being cards of the same rank cannot in fact be suited, you can only have two red (hearts and diamonds) or two black (clubs and spades) aces in hole cards.



LOL Ok I guess my mind is slow today.

Here's an example file.
The procedure will navigate to the number of pages specified by the StartDate and EndDate, I have it set at 3 dates (It can download the entire season).
The code is rough draft but it's functional for an example.

----------


## Riorin

Good stuff, SuitedAces!

I guess if you can instantiate IE object you should be able to instantiate WinHTTPRequest object, however I fail at that!

The simplest solution that I found to actualy getting "your hands on the source of web page" is this (in pseudocode):



```
Please Login or Register  to view this content.
```


Haven't implemented it yet, but it should work.

I was just hoping for a neat all-Excel solution!

----------


## SuitedAces

To be honest I'm not aware of what a WinHTTPRequest object is.

As far as the source code goes, at this point I'm not convinced that parsing it would be any less time consuming than working with the HTML Document Object .
But I think if you go that route you could simplify it by just copying to the clipboard.
Because I know I have seen code to paste the contents of the clipboard in VBA , if you can get the sourcecode to the clipboard I can find you code to then just paste the clipboard .
This way you can eliminate the use of an external file.

----------


## SuitedAces

http://word.mvps.org/FAQS/MacrosVBA/...eClipboard.htm

----------


## Riorin

> As far as the source code goes, at this point I'm not convinced that parsing it would be any less time consuming than working with the HTML Document Object .



It wouldn't, parsing it would be certainly more time-consuming. I just don't know how to access HTML DOM programtically.

----------


## SuitedAces

Give me a liitle time and I will put together a sub that will demonstrate the htlm doc properties that I am familiar with .
With these I was able to write the procedure in the example I uploaded.
Also I will find some links for the HTML Document Object Model (HTML DOM) reference.

----------


## SuitedAces

Riorin here is a link to the HTML reference that I was using to determine what properties I could get from the html doc.
http://msdn2.microsoft.com/en-us/lib...9(VS.85).aspx#

I attached a procedure that I am working on to break  Html docs down into elements after downloading.

Placing the URL in B2 on the sheet will not work.( bugs)
So for now I have the URL of the webpage that you gave me  hardcoded in the procedure (you'll see it in the sheet module)

So just run it and see that what it does is break out the htlm doc into various properties.
Those are the properties that I used for downloading the scores in the other workbook I uploaded.

The idea being that you look down the worksheet and match those elements to the webpage.

From there you can start to write a procedure that extracts the information you need from the html doc.


You aren't limited to the properties you see me using . 
But those are some of the properties I was able find success with.

It might the case that not all of the properties that are contained in the HTLM reference are available in Excel.

What I did was go through the reference and play around with the properties of the html doc by trial and error.

----------


## Riorin

Great stuff SuitedAces. Thanks a ton!

----------


## SuitedAces

Also be aware of properies like length which will give you counts  of collection.

If you break out the elements from the webpage that the hockey download uses and then compare that to the finished code you will get a very good idea on how you can use the properties of the html doc.

I had a problem downloading the second file I gave you when setting IE.visible = false , so there still are a few minor bugs in that code.

----------


## delveneto

Yes, very nice stuff, thank you!  I was looking for the solution to the same problem (grabbing an html page source code since web query won't work in the page that I want to parse), _Almighty Google_  :Smilie:  directed me to this page, and I register just to take a look at your code and found something amazing.

----------


## ne0phyte

This stuff is just what I was looking for but i can't seem to download the attachments. Can someone advise. Thanks

----------


## londolozi

I am trying to automate a Web query that pastes the data on a spreadsheet.
It seems I am 95% there however yet an inch is as good as a mile.

I need to step thru several web pages. I have a cell on the worksheet that creates the new wed address but a just need a command that will use the web address on the worksheet.

As an example the cell P10 formula = http://www.thenewwebaddress.com./1950.xml

Every new web address changes the _195?.xml to say _1951.xml then 1952.xml etc.

I'm using this code I found on this site.

_____________________
Const MyUrl As String = Activesheet.cells("P10").String <<problem child

  With ActiveSheet.QueryTables.Add(Connection:= _
        "URL;" & MyUrl, Destination:=Cells(1, 1))
_____________________

Can anyone advise the code I need to use to grab the changed wed address from cell P10 on the worksheet?

Thanks

----------


## londolozi

Making progress however every time I run the query again it pastes the new date X rows to the right. Is there a true/false I need to change?

.FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = False
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False

----------


## tonykau

Brilliant code, SuitedAces!

Working off your code, I was able to turn it into an automated share count extractor from Yahoo Finance (soon to port it to Google Finance as well).  Sheet2 now has a list of ~500 stock symbols in column A, and column B is being filled in one by one (~4 seconds/stock symbol) with the respective share count.  It's anything but robust at this point, but it's handling the job flawlessly.  

It uses two loops - one to check each of the cells of coded output for the correct piece of info, and one to run through the list of symbols, changing the URL each time.

If anyone would like this code, it's attached...

----------


## prabhubreaker

how to get the data from a particular tag in website?

----------


## cmb80

Great coding and kind of along the lines of what i've been looking for - but is there a fix regarding the URL - ie - can this be called from cell A2 as originally designed?

----------


## arlu1201

cmb80,

This is an old thread.  Its best if you discuss this in a new thread or in your own thread.

----------


## cmb80

I have started a new thread but no responses

----------


## arlu1201

You can bump it if you do not get a reply after 12 hrs of starting the thread.

----------

