I needed to return to this topic, so here are my notes, starting with some products. The related pages widget has also been tuned.
- http://docs.seleniumhq.org/projects/ide/
- https://www.seleniumhq.org/projects/webdriver/
- http://wwwsearch.sourceforge.net/mechanize/
- http://maxq.tigris.org/
- http://twill.idyll.org/
I found these articles in the summer of 2020:
- https://thenextweb.com/syndication/2020/07/22/how-to-use-python-and-selenium-to-scrape-websites/ : Web scraping has been used to extract data from websites almost from the time the World Wide Web was born. In the early days, scraping was mainly done on static pages – those with known elements, tags, and data.
- https://towardsdatascience.com/web-scraping-a-less-brief-overview-of-scrapy-and-selenium-part-ii-3ad290ce7ba1 : The first rule of web crawling is you do not harm the website. The second rule of web crawling is you do NOT harm the website.
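One way to act on the "do not harm the website" rule is to check the site's robots.txt before fetching and to pause between requests. The following is a minimal sketch using Python's standard `urllib.robotparser`, not anything taken from the articles above. The robots.txt content, the `notes-bot` user agent, the example.com URLs, and the helper names `build_parser` and `fetch_plan` are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_plan(parser, user_agent, urls, default_delay=1.0):
    """Return (url, delay) pairs for the URLs the site allows us to fetch.

    The delay is the site's Crawl-delay if it declares one, otherwise
    default_delay (an arbitrary fallback chosen for this sketch).
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    return [(url, delay) for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    parser = build_parser(EXAMPLE_ROBOTS_TXT)
    candidates = [
        "https://example.com/index.html",
        "https://example.com/private/data",
    ]
    for url, delay in fetch_plan(parser, "notes-bot", candidates):
        print(f"would fetch {url}, then wait {delay}s")
```

In a real crawl the robots.txt would be loaded with `set_url()` and `read()` instead of an inlined string, and the loop would call `time.sleep(delay)` after each request.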
Also on this wiki, I have added some content about Scrapy and more recent insights into web scraping.