I have to visit a lot of websites and download data from them. The data is mostly pattern based. Sometimes it is in a row/column table, but most often it follows a repeating pattern that is not a simple table (such as http://www.parischevrolet.com/invent.../New/Special1/). I have routines that read a URL from a spreadsheet, go to that site, and capture the page source. From there, I can process the page source and parse out the data as needed.
The problems that I have are:
1. I'm not being very efficient. I download page source and then parse through the source looking for key start and stop characters. I know there are tools (like Mozenda) where you can visually tap on similar data fields on a page and it knows how to match the pattern and deliver results quickly. I don't know HTML very well so I think I'm missing all kinds of ways to more efficiently find the patterns.
2. Most importantly, most of the sites have similar data and fields, but each site is somewhat different, so I need a routine that is flexible enough to first find the pattern, then download the data, even if the routine has never seen the exact pattern before.
3. I have seen some VBA code that makes specific table queries, such as:
ActiveSheet.QueryTables.Add(Connection:= _
"TEXT;http://ichart.finance.yahoo.com/table.csv?s=VMW&a=" & myMonthStart & "&b=" & myDayStart & "&c=" & myYear & "&d=" & myMonthEnd & "&e=" & myDayEnd & "&f=" & myYear & "&g=d&ignore=.csv" _
, Destination:=ActiveCell). . . . .
but I don't understand that code at all, and I'm not sure whether the data in the samples below is a "table" in the sense that a table query would even work on it.
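For reference, the start/stop approach described in item 1 can be sketched roughly as follows. This is an illustrative sketch only, not the actual routine: the function name ExtractBetween, its parameters, and the sample markers in the usage note are all hypothetical.

```vba
' Hypothetical helper: return every substring of pageSource that sits
' between startMarker and stopMarker, in order of appearance.
Function ExtractBetween(ByVal pageSource As String, _
                        ByVal startMarker As String, _
                        ByVal stopMarker As String) As Collection
    Dim results As New Collection
    Dim startPos As Long, stopPos As Long, searchFrom As Long

    Set ExtractBetween = results
    ' Empty markers would match everywhere, so return an empty collection.
    If Len(startMarker) = 0 Or Len(stopMarker) = 0 Then Exit Function

    searchFrom = 1
    Do
        startPos = InStr(searchFrom, pageSource, startMarker, vbTextCompare)
        If startPos = 0 Then Exit Do                  ' no more start markers
        startPos = startPos + Len(startMarker)        ' first character after the start marker
        stopPos = InStr(startPos, pageSource, stopMarker, vbTextCompare)
        If stopPos = 0 Then Exit Do                   ' start marker without a stop marker
        results.Add Mid$(pageSource, startPos, stopPos - startPos)
        searchFrom = stopPos + Len(stopMarker)        ' continue after this match
    Loop
End Function
```

With made-up markers, a call such as ExtractBetween(src, "<b>Price:</b> ", "<br>") would collect the text after each price label on the page. The weakness is that the markers have to be worked out by hand for every site, which is exactly the effort I am trying to avoid.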
Here are some examples of the type of data that I need to download:
http://www.dickbrantmeier.com/invent...h/New/Special1
http://www.parischevrolet.com/invent.../New/Special1/
http://www.santaclaracountyhousesfor...FQSCQgodAH8AsQ
I know there are many ways to do this, and I know that I can painstakingly write a script for each specific pattern, but I'm looking for a more efficient way.
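One possible starting point for the "find the pattern first" idea in item 2, offered as an assumption rather than a known solution, is to count how often each class attribute value appears in the page source. Values that repeat many times may mark the element that wraps one record (one vehicle, one listing). The function name CountClassNames is hypothetical, and the sketch only handles class attributes written with double quotes.

```vba
' Hypothetical helper: tally how often each class attribute value occurs
' in the page source. Returns a Scripting.Dictionary of value -> count.
Function CountClassNames(ByVal pageSource As String) As Object
    Dim re As Object, counts As Object, m As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                       ' find every match, not just the first
    re.IgnoreCase = True
    re.Pattern = "class\s*=\s*""([^""]+)"""  ' captures the text inside class="..."

    Set counts = CreateObject("Scripting.Dictionary")
    For Each m In re.Execute(pageSource)
        ' Reading a missing key yields Empty, which counts as 0 here.
        counts(m.SubMatches(0)) = counts(m.SubMatches(0)) + 1
    Next m

    Set CountClassNames = counts
End Function
```

Looping over the dictionary's Keys and printing those with a high count would give a short list of candidate containers to inspect. It would not identify the individual fields inside each record, so it is only a first step.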
Can someone guide me please?
Thanks.