I'm working on a project where I need to reference a large number of URLs (19,344). These URLs are stored in A1:A19344. A sample of those URLs:
http://www.basketball-reference.com/...1/gamelog/1979
http://www.basketball-reference.com/...1/gamelog/1980
http://www.basketball-reference.com/..../gamelog/1957
http://www.basketball-reference.com/...1/gamelog/1947
http://www.basketball-reference.com/...1/gamelog/1948
http://www.basketball-reference.com/...1/gamelog/2006
http://www.basketball-reference.com/...1/gamelog/2007
http://www.basketball-reference.com/...1/gamelog/2008
This is for a basketball project. Each URL goes to a page that shows the data for every game a player played in a season. As you can see, the first part of each URL is the same, but the /x/yyyyyzz02/ and /XXXX parts change with the player and year. I need the same table from each URL: .WebTables = "pgl_basic". Each table has exactly 29 columns, but the number of rows varies from a minimum of 2 to a maximum of 87. I need the entire table.
The closest thing I've found on the web was a query that looped through 4,100 URLs, pulling data from a single cell (another URL) on each page. I found that here. The code they used:
I would like to do something similar: a looping program that starts putting data in the next unused row and continues until there are no more hyperlinks. The questions I have:
1. What would be the best way to import large tables with a variable number of rows?
2. What can I do to make sure that it cycles through the hyperlinks row by row but doesn't overwrite any data from previous cycles?
3. What resources can I use to learn more about this type of problem (large web queries)?
4. Which parts of the code above are inapplicable to my project? Which should I adjust or get rid of? Which are vital?
5. This project will likely take 700,000-800,000 rows of data. What methods can I use to make sure it continues the web queries in a new Sheet starting back at A1?
Alternatively, if this is too complex, what would I need to do to set up a macro/button to do this one at a time?
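To make the goal concrete, here is a rough, untested sketch of the kind of loop I have in mind. It is not the code from the thread I mentioned. The sheet name "URLs", the macro name ImportGameLogs, and the MAX_TABLE_ROWS safety margin are my own placeholders, and I'm assuming the cells in column A hold the full URL as text and that a web query can actually see the pgl_basic table on each page.

```vb
Option Explicit

' Sketch only: pulls the "pgl_basic" table for every URL in column A of a
' sheet named "URLs" (placeholder name) and stacks the results on new sheets.
Sub ImportGameLogs()
    ' Assumed upper bound per table; the largest table I expect has 87 rows.
    Const MAX_TABLE_ROWS As Long = 100

    Dim wsUrls As Worksheet, wsOut As Worksheet
    Dim qt As QueryTable
    Dim lastUrlRow As Long, nextRow As Long, i As Long
    Dim url As String

    Set wsUrls = ThisWorkbook.Worksheets("URLs")
    lastUrlRow = wsUrls.Cells(wsUrls.Rows.Count, "A").End(xlUp).Row

    ' Results go on a fresh sheet, starting at A1.
    With ThisWorkbook.Worksheets
        Set wsOut = .Add(After:=.Item(.Count))
    End With
    nextRow = 1

    For i = 1 To lastUrlRow
        url = Trim$(CStr(wsUrls.Cells(i, "A").Value))
        If Len(url) > 0 Then
            ' Roll over to a new sheet if the next table might not fit.
            If nextRow + MAX_TABLE_ROWS > wsOut.Rows.Count Then
                Set wsOut = ThisWorkbook.Worksheets.Add(After:=wsOut)
                nextRow = 1
            End If

            Set qt = wsOut.QueryTables.Add( _
                Connection:="URL;" & url, _
                Destination:=wsOut.Cells(nextRow, 1))
            With qt
                .WebSelectionType = xlSpecifiedTables
                .WebTables = "pgl_basic"          ' table name as given above
                .WebFormatting = xlWebFormattingNone
                .RefreshStyle = xlOverwriteCells  ' write in place, no inserted rows
                .AdjustColumnWidth = False

                On Error Resume Next
                .Refresh BackgroundQuery:=False   ' wait for the page to load
                If Err.Number = 0 Then
                    ' Next table starts on the row below this result.
                    nextRow = .ResultRange.Row + .ResultRange.Rows.Count
                Else
                    Debug.Print "Failed at row " & i & ": " & url
                    Err.Clear
                End If
                On Error GoTo 0

                .Delete   ' drop the query definition, keep the imported cells
            End With
        End If
    Next i
End Sub
```

The idea is that nextRow only moves forward by the number of rows each query returns, so earlier data should never be overwritten, and the rollover check compares against wsOut.Rows.Count rather than a hard-coded sheet limit. For the one-at-a-time version, I imagine the same body could run for just the selected URL and be assigned to a button.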