So I'm new to VBA, but I'm also completely hooked on it. I've learned Quite a bit, and right now I'm trying to learn how to pull Data from IE. Web Scraping.
so wit m monster of a project that I've been posting all over this site i would like to populate a sheet from IMBD
Web Queries are much too slow for this, I'm searching the site using a Variant from Cell value, then pulling specific values from the site to the same row next column again dependent on Cell Value, then moving to the next row.
I have no code for this, and anything pointing to the right direction will give me something to build off. I'm just not sure where to start.
There are two locations I'm trying to read the source Data from:
The Main Site: with "tt1520211" being the Variable
HTML Code:
and: Same as the Main Site, without "combined" and "episodes?season=" and the number "1" also being the Variable
then Matching Ep1 to a line on the workbook
HTML Code:
and the Sheet that the code will read from: Series_V1.xlsx
on the sheet the variables will come from Sheets("Series"), range being A3:A for hte title, but most importantly B3:B for the "tt1520211"
then sheets("The Walking Dead") etc will contain also the "tt1520211" as well as the number Variable, and the ,EP1 to match.
the code i'm trying to scrape looks like this:
PHP Code:
<div class="clear" itemscope itemtype="http://schema.org/TVSeason">
<meta itemprop="numberofEpisodes" content="6"/>
<div class="sort">
<button data-direction="asc" class="small sort_direction btn sort_asc" title="Reverse the order"> </button>
</div>
<h3 id="episode_top" itemprop="name">Season 1</h3>
<div class="list detail eplist">
<div class="list_item odd">
<div class="image">
<a
onclick="(new Image()).src='/rg/episodes/image-1/images/b.gif?link=/title/tt1589921/';"
href="/title/tt1589921/"
title="Days Gone Bye"
itemprop="url"> <div data-const="tt1589921" class="hover-over-image zero-z-index ">
<img width="120" class="zero-z-index" alt="Days Gone Bye" src="http://ia.media-imdb.com/images/M/MV5BMTM5NDkxNDM0Nl5BMl5BanBnXkFtZTcwNDI3MDQwNA@@._V1_SX120_CR0,0,120,180_.jpg">
<div>S1, Ep1</div>
</div>
</a> </div>
<div class="info" itemprop="episodes" itemscope itemtype="http://schema.org/TVEpisode">
<meta itemprop="episodeNumber" content="1"/>
<div class="airdate">
Oct. 31, 2010
</div>
<strong><a
onclick="(new Image()).src='/rg/episodes/episode-1/images/b.gif?link=/title/tt1589921/';"
href="/title/tt1589921/"
title="Days Gone Bye"
itemprop="name">Days Gone Bye</a></strong>
<div class="item_description" itemprop="description">Sheriff Deputy, Rick Grimes, wakes up in the hospital, after being shot, to find his town overrun by flesh-eating zombies. After making friends with survivor Morgan Jones and his son Duane, Rick sets out to find his wife and son.</div>
<div class="popoverContainer">
<div class="watchInfo">
<label><span>Watch now</span></label>
<div class="watchIcon"><img src="http://ia.media-imdb.com/images/G/01/imdb/images/video/provider_logos/amazon-1815097792._V149567120_.gif" /></div>
<div class="watchIcon"><img src="http://ia.media-imdb.com/images/G/01/imdb/images/video/provider_logos/amazon-1815097792._V149567120_.gif" /></div>
</div>
<div class="htwPopover">
<div class="popoverClose touch"></div>
<h5>Watch now</h5>
<div class="watchOptionList">
<div class="watchOption">
<a
onclick="(new Image()).src='/rg/episodes-watch-now/on-demand-amazon/images/b.gif?link=/video/amazon/vi3316753177/offsite';"
href="/video/amazon/vi3316753177/offsite"
target="_blank"> <div class="watchIcon"><img src="http://ia.media-imdb.com/images/G/01/imdb/images/video/provider_logos/amazon-1815097792._V149567120_.gif" /></div>
Watch on
Amazon
</a> </div>
<div class="sep"></div>
<div class="watchOption">
<a
<div>S1, Ep1</div> for the Matching using ("C6")
This would Definitely get me started in the right directcion, I would be able to make the loops, and things of such nature, but the IE interaction, and pointing to the correct items for the .getElementsbytagname type items, I'm not at all clear on. Again, any help would be apprciated, and i will of course run with what ever gets me started.
Thanks Again for the time
itemprop="name">Days Gone Bye</a></strong> for Cell.Value("D6")
<div class="airdate">
Oct. 31, 2010
</div> for Cell.Value("H6")
<div class="item_description" itemprop="description">Sheriff Deputy, Rick Grimes, wakes up in the hospital, after being shot, to find his town overrun by flesh-eating zombies. After making friends with survivor Morgan Jones and his son Duane, Rick sets out to find his wife and son.</div>
for ("E6")
'having alot of issues here
Set sht = Sheets("Sheet1")
RowCount = 1
Set title = .document.getElementsByName("title=")
If Not title = "" Then
sht.Range("M" & RowCount) = title.Item(0).Value
End If
Right now I'm just dumping code into sheet1 to connect the dots, but anyone willing, i would still very much appreciate a clue
Sub Need_A_Clue()
Dim objIE As InternetExplorer
Dim lRow As Long
Dim abc As String
Dim striEst As String
Dim ele As Object
Dim Airdate As Variant
Dim title As Object
Dim i As Long
Set objIE = CreateObject("InternetExplorer.Application")
With objIE
.Visible = False 'true
.Navigate "http://www.imdb.com/title/tt1520211/episodes?season=1"
For lRow = 2 To 2
Do While .Busy Or _
.ReadyState <> 4
DoEvents
Loop
Set sht = Sheets("Sheet1")
RowCount = 1
For Each itm In .document.all
'If itm.tagname = "DIV" And itm.classname = "airdate" Then sht.Range("C" & RowCount) = Left(itm.innerText, 1024)
'If itm.tagname = "DIV" And itm.classname = "" And itm.ID = "" Then sht.Range("A" & RowCount) = Left(itm.innerText, 1024) 'And Left(itm.innertext, 2) = "S1"
'If itm.tagname = "STRONG" And itm.classname = "" And itm.ID = "" Then sht.Range("B" & RowCount) = Left(itm.innerText, 1024)
'If itm.tagname = "DIV" And itm.classname = "item_description" And itm.ID = "" Then sht.Range("D" & RowCount) = Left(itm.innerText, 1024)
sht.Range("A" & RowCount) = itm.tagname
sht.Range("B" & RowCount) = itm.classname
sht.Range("C" & RowCount) = itm.ID
sht.Range("D" & RowCount) = Left(itm.innerText, 1024)
RowCount = RowCount + 1
'having alot of issues here
Set sht = Sheets("Sheet1")
RowCount = 1
Set title = .document.getElementsByName("title=")
If Not title = "" Then
sht.Range("M" & RowCount) = title.Item(0).Value
End If
RowCount = RowCount + 1
'Set sht = Sheets("Sheet2")
RowCount = 1
For Each ele In .document.all
sht.Range("H" & RowCount) = ele.tagname
sht.Range("I" & RowCount) = ele.classname
sht.Range("J" & RowCount) = ele.ID
sht.Range("K" & RowCount) = ele.innerText
RowCount = RowCount + 1
Next ele
RowCount = RowCount + 1
Next lRow
End With
Set objIE = Nothing
MsgBox "finished"
End Sub
to learn it, for lack of better discription. I'm actually building the same thing in paralell using the xml as we did with the movies, but theres alot of projects i can do if i learn the IE automation.. Though i'm finding out that its a great deal slower and difficult to write.. least for me
Automation of IE should always only be a last resort, it's extremely slow as you have found. In the majority of times it's not necessary since you can run GET and POST requests directly from Excel using WinHTTP and XML objects, once you understand how web-servers send and receive information, it isn't too difficult to emulate the browser calls from Excel, try using the developer tools in Chrome/IE to see what data is actually sent/received.
So My Frustration has hit it's peak, and will be putting the IE automation on the back burner for the time being..
this is a link to the same project using the XML search, rather than the IE approach.. which is Rather Irritating XML Series Search
Bookmarks