May 30th, 2018, 03:03 AM   #9
deepsepia
Moderator

Quote:
Originally Posted by saint825xtc View Post
That is super helpful Halvar. I appreciate it.

Do you have any ideas on how to pull the image files from these two sites? I couldn't find the source location of the images.

http://magzus.com/read/penthouse_let...pril_2017_usa/
This one has an option for "reading online" which opens up a frame and allows you to flip through the pages.
Oh, these are fun. Not all of them work the same way, but this one can be cracked.

What you're going to want to do is turn on the Developer Tools (I'm using Firefox) and then look at the GET requests listed under the "cached media" tab; there you can see the plain URLs of the images. These tools (Firefox's are built in, and other browsers have similar ones that work much the same way) are incredibly powerful: they let you watch as a particular webpage goes back to a server for graphics resources. Since the designers' aim is to make this hard to do, there are a lot of wrinkles.

[screenshot: Firefox Developer Tools with the "Storage" tab selected]
So that's what I'm seeing when I load that page. It may be hard to see, but I've selected the "Storage" tab (it's in blue because it's selected), so what I'm looking at are the GET requests this page makes to store media locally on my machine, which include the URLs of all the actual JPGs of pages in the magazine.

Notice that I've got the URLs to the images, like this:

Code:
http://image.issuu.com/170228092429-a44baae32e0c0ec0323085902a9faef1/jpg/page_17.jpg

and notice that the pattern

Code:
hXXp://image.issuu.com/170228092429-a44baae32e0c0ec0323085902a9faef1/jpg/page_[somenumber].jpg

is repeated for all the pages, so you can use a curl loop, as Halvar illustrated above, and grab all the pages by iterating [somenumber] from 1 up to the highest page number.
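As a sketch, here's what "iterating the page number" looks like as a dry run — it just prints the URL for each page, using the issuu hash captured above (I've only gone to 5 here; use the real page count):

Code:
# Dry run: print the URL for each page number, without downloading anything
for x in $(seq 1 5); do
  echo "http://image.issuu.com/170228092429-a44baae32e0c0ec0323085902a9faef1/jpg/page_${x}.jpg"
done

Once the printed URLs look right, you can feed the same loop to curl.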

Copy and paste this into Terminal on a Mac (curl ships with macOS):

Code:
for x in $(seq 1 148); do curl -o $x.jpg http://image.issuu.com/170228092429-a44baae32e0c0ec0323085902a9faef1/jpg/page_$x.jpg; done
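One wrinkle with that loop: files named 1.jpg … 148.jpg don't sort in page order in a file listing. A variant (shown as a dry run — remove the leading "echo" to actually download) zero-pads the local filenames and adds --fail so curl exits non-zero when the pages run out:

Code:
# Dry run: print the curl commands with zero-padded output names.
# Remove the leading "echo" to actually download; --fail makes curl
# error out on a 404 instead of saving the error page as a .jpg.
for x in $(seq 1 148); do
  fname=$(printf '%03d.jpg' "$x")
  echo curl --fail -o "$fname" "http://image.issuu.com/170228092429-a44baae32e0c0ec0323085902a9faef1/jpg/page_${x}.jpg"
done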

Last edited by deepsepia; May 30th, 2018 at 03:59 AM..