I have an Excel spreadsheet that has to make a ton of web queries (around 9,000 at the moment). It uses two websites to filter each result page down to plain text before searching it for the data. (If you are reading this, thanks again, JasperD, for your help.)
Those two text extractors are boilerpipe (http://boilerpipe-web.appspot.com/ex...ctImages=&url=) and Google (http://www.google.com/gwt/x?noimg=1&u=).
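Both extractors work the same way: the page you want filtered is appended to a fixed query prefix. Here is a minimal sketch of building such a request URL, assuming the target URL should be percent-encoded so that its own `?` and `&` aren't read as extra extractor parameters. Only the Google prefix is used because the boilerpipe one is truncated above. `build_extractor_url` is a hypothetical helper name, and the Google service itself may no longer respond.

```python
from urllib.parse import quote

# Prefix copied from the post; whether the service still answers is not assumed.
GOOGLE_GWT_PREFIX = "http://www.google.com/gwt/x?noimg=1&u="


def build_extractor_url(prefix: str, target_url: str) -> str:
    """Hypothetical helper: append the percent-encoded target URL to the prefix."""
    # safe="" makes quote() also encode ':', '/', '?', '&' and '=', so the
    # target's own query string cannot be mistaken for extractor parameters.
    return prefix + quote(target_url, safe="")


if __name__ == "__main__":
    print(build_extractor_url(GOOGLE_GWT_PREFIX, "http://example.com/page?id=42&lang=en"))
```

In a spreadsheet, the same idea means URL-encoding the cell value before concatenating it onto the prefix.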
Boilerpipe is super fast at returning results, but it has a low daily usage cap and stops working after only a short while. Google, as expected, is more robust and doesn't seem to have such a limit, but it's roughly 3 times slower to return a result, and compounded over 9,000 queries the difference becomes substantial.
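To put a rough number on "compounded over 9,000 queries": if the queries run one after another, the total time is simply the query count times the per-query latency, so a 3x slower service adds 2 × 9,000 × t seconds. The 0.5 s latency below is purely a hypothetical figure, not a measurement.

```python
def batch_seconds(n_queries: int, seconds_per_query: float) -> float:
    """Total wall-clock time when queries run sequentially, one at a time."""
    return n_queries * seconds_per_query


N_QUERIES = 9000                         # query count from the post
BOILERPIPE_LATENCY = 0.5                 # hypothetical seconds per query
GOOGLE_LATENCY = 3 * BOILERPIPE_LATENCY  # "roughly 3 times slower"

if __name__ == "__main__":
    fast = batch_seconds(N_QUERIES, BOILERPIPE_LATENCY)
    slow = batch_seconds(N_QUERIES, GOOGLE_LATENCY)
    # With the hypothetical 0.5 s figure: 75 min vs 225 min, i.e. 150 min extra.
    print(f"boilerpipe: {fast / 60:.0f} min, google: {slow / 60:.0f} min, "
          f"extra: {(slow - fast) / 60:.0f} min")
```

Under that assumption the slower extractor turns a run of about an hour and a quarter into nearly four hours.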
Does anyone know of better text extractors that will spit out a text-only version of a URL you give them, do it fast (ideally as fast as boilerpipe or faster), and have no daily usage limit (or at least one well above my needs)? If you do, please post away; the help would be much appreciated.