I agree, but that does not mean you should make excessive requests, scrape unnecessarily, or overload the servers to access the files. The files should be mirrored instead. Some are better copied in other ways: a git repository, for example, can be cloned and mirrored directly, with no need to crawl its web pages.
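
To illustrate the git case, here is a rough sketch of how a mirror could be made and kept up to date without touching the web interface. It assumes `git` is installed and on the PATH; the helper names (`clone_command`, `update_command`, `mirror_repo`) are made up for this example and are not from any existing tool.

```python
import subprocess
from pathlib import Path


def clone_command(url: str, dest: str) -> list[str]:
    # --mirror creates a bare repository containing all refs (branches, tags, etc.)
    return ["git", "clone", "--mirror", url, dest]


def update_command(dest: str) -> list[str]:
    # Refresh an existing mirror and prune refs that were deleted upstream
    return ["git", "-C", dest, "remote", "update", "--prune"]


def mirror_repo(url: str, dest: str) -> None:
    """Clone the repository on the first run, then only fetch updates afterwards."""
    cmd = update_command(dest) if Path(dest).exists() else clone_command(url, dest)
    # check=True raises CalledProcessError if git exits with an error
    subprocess.run(cmd, check=True)
```

Something like `mirror_repo("https://example.org/project.git", "project.git")` (a placeholder URL and path) would then do a full clone once and only incremental fetches on later runs, which is far lighter on the server than crawling every page.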