June 23rd, 2018, 02:12 PM   #359
deepsepia
Moderator

Quote:
Originally Posted by shasta
Could you provide me with a template you use? I've taken a look at HTTrack in the past, but I found it too complex. I'm trying a program called Cyotek WebCopy at the moment, but I think I might have to spend a lot of time figuring it out. I did a test run on one page and it downloaded way too much, so I had to stop it. There are lots of things that would have to be excluded from the crawl.
I don't use that particular program, so I can't give any specific advice about it, but here are the general issues with web spiders:

1) How deep will they search?
2) Will they stay on the same server, or follow links to other servers?

Each spider will have some kind of config panel where you set these options, and they're important. Additionally, most spiders default to "polite" behavior, obeying robots exclusion rules; you usually have to override robots exclusion to rip a whole site. The sketch below shows all three of these knobs in one place.
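To make that concrete, here's a minimal sketch in Python (standard library only) of what any spider is doing under the hood. The URL, depth limit, and names here are my own illustration, not anything from Cyotek WebCopy or HTTrack; a real mirroring tool also saves the files and rewrites links so the copy browses offline.

Code:
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

MAX_DEPTH = 2          # knob 1: how deep to search
SAME_HOST_ONLY = True  # knob 2: stay on the starting server
OBEY_ROBOTS = True     # knob 3: set False to override robots exclusion

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url):
    start_host = urlparse(start_url).netloc
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(urljoin(start_url, "/robots.txt"))
    robots.read()
    seen = {start_url}
    queue = [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        if OBEY_ROBOTS and not robots.can_fetch("*", url):
            continue  # the "polite" default: skip disallowed URLs
        try:
            with urllib.request.urlopen(url) as resp:
                if "html" not in resp.headers.get("Content-Type", ""):
                    continue  # only parse HTML pages for links
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        print("  " * depth + url)
        if depth >= MAX_DEPTH:
            continue  # knob 1: don't follow links any deeper
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            if urlparse(link).scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, etc.
            if SAME_HOST_ONLY and urlparse(link).netloc != start_host:
                continue  # knob 2: ignore links to other servers
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))

crawl("https://example.com/")

In a GUI tool those same three settings are usually a "depth" or "levels" box, a "stay on domain" checkbox, and a robots.txt option buried in the project settings; finding them is most of the learning curve.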

As web technology has become more sophisticated, sites have deployed all sorts of countermeasures against scraping (JavaScript-rendered pages, login walls, rate limits, and so on). If you can see it in your browser, it can almost certainly be downloaded somehow, but it may take some doing, and you'll have to understand your particular application.
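As one hedged illustration of the "some doing": a very common first hurdle is a server that refuses requests whose User-Agent doesn't look like a browser. The URL below is a placeholder; which trick a given site actually needs is something you discover case by case.

Code:
import urllib.request

# Placeholder URL; the User-Agent string mimics a browser so servers
# that reject script-looking clients will answer. This is only the most
# common countermeasure; each site may need different handling.
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": "Mozilla/5.0"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, "->", len(resp.read()), "bytes")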