November 4th, 2017, 04:01 PM   #171
halvar
Future Plans for forum-backup

I will continue to improve it gradually. If there is something that would make things easier for you, drop me a note. Often little things achieve a lot.
Of course, bugs have priority: if you find one, drop me a line and I will look into it.

Long term topics are:
Duplicate Detection
  • adding an MD5 hash for each image to the database
  • try image fingerprinting to find duplicates that have the same content but a different resolution or quality. This is something that interests me from a mathematical point of view. Maybe http://phash.org/ could be of use here.
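A minimal sketch of both ideas in Java (the tool's own language), assuming the image bytes are already in memory and, for the fingerprint, that the picture has already been scaled down to an 8x8 grayscale array elsewhere. `ImageHashes` and its methods are hypothetical names. The fingerprint shown is a simple average hash, a cheaper relative of the DCT-based pHash, not pHash itself:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of forum-backup: exact and near-duplicate image hashes.
final class ImageHashes {
    private ImageHashes() {}

    // MD5 of the raw file bytes as 32 lowercase hex digits.
    // Equal values mean byte-identical files (MD5 collisions aside).
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Average hash ("aHash") of an 8x8 grayscale thumbnail: bit i is set when
    // pixel i (row by row) is brighter than the mean of all 64 pixels.
    // A simpler relative of pHash, not pHash itself.
    static long averageHash(int[][] gray8x8) {
        long sum = 0;
        for (int[] row : gray8x8) {
            for (int v : row) {
                sum += v;
            }
        }
        double mean = sum / 64.0;
        long hash = 0L;
        int bit = 0;
        for (int[] row : gray8x8) {
            for (int v : row) {
                if (v > mean) {
                    hash |= 1L << bit;
                }
                bit++;
            }
        }
        return hash;
    }

    // Number of differing bits; a small distance suggests the same picture
    // at a different resolution or quality.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

Two files with the same MD5 are byte-identical; two images whose average hashes differ in only a few bits are likely the same picture at another size or quality. What counts as "a few bits" would have to be found by experiment.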
Dealing with edited/updated posts
I do not know of a way to specifically search for or get a list of updated posts. So one way to deal with it could be:
  • rename the old thread page files with a date suffix, like 't1234-p3-x.html.20171104'
  • re-download all thread pages (this is something that should be avoided)
  • parse the posts again and compare the result.
  • if the result (image links) differs, then
  • add the links to the database, but with an offset on the index to distinguish them from the original links, e.g. 1001, 1002, ...
  • an alternative to the offset could be storing the thread edit date with the links
  • download the new links. This would probably lead to duplicates, but I want to keep the original files.
  • I could do an MD5 or content comparison of old and new files and delete the new files if they are identical. But what if the new files have nicer names? So I would rather keep all the files. Having is better than needing ;-)
  • since this is a lengthy process that also causes load on the server, it should be started rarely (once a year per thread?), or only manually if one needs it.
  • Do edits matter anyway?
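The compare-and-offset step could look roughly like this. It is a sketch only, assuming the stored and the re-parsed image links of one post are available as lists; the class name and the offset constant of 1000 are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "re-parse and compare" for an edited post.
final class EditedPostLinks {
    // Assumed offset that separates links found after an edit from the original ones.
    static final int EDIT_OFFSET = 1000;

    private EditedPostLinks() {}

    // Returns the links to add, keyed by their new index. The stored links keep
    // their indices 1, 2, ...; links that appear only in the re-parsed post get
    // 1001, 1002, ... Links removed by the edit are ignored, so old files stay.
    static Map<Integer, String> addedLinks(List<String> storedLinks, List<String> reparsedLinks) {
        Map<Integer, String> added = new LinkedHashMap<>();
        int nextIndex = EDIT_OFFSET + 1;
        for (String link : reparsedLinks) {
            if (!storedLinks.contains(link) && !added.containsValue(link)) {
                added.put(nextIndex++, link);
            }
        }
        return added;
    }
}
```

A second edit of the same post would need another offset block (2001, ...) or the edit-date alternative from the list.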
Dealing with Deleted Posts
Last week a couple of pages were deleted from a thread due to a DMCA takedown:
http://vintage-erotica-forum.com/sho...&postcount=718
http://vintage-erotica-forum.com/t58...n-phoenix.html

So locally there were pages 1 to 9, but on the server only 1 to 6. If new posts were added, pages 6 to 8 would not have been downloaded again.

A workaround for this is:
  • renaming page files 6 to 9 (appending a date suffix, like t1234-p6-x.html.20171104)
  • forcing the processing of the thread during the next run. This can be achieved by updating the thread in the web console without changing the page number. This deletes the thread's last process date, so the thread is processed during the next run.
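The renaming part of the workaround, sketched in Java. The file name pattern t<thread>-p<page>-x.html is taken from the example above; the class and method names are hypothetical, and choosing which files to pass in (pages 6 to 9 here) is left to the caller:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of the manual workaround: set aside local page files
// by appending a date suffix, so the pages count as not downloaded next run.
final class PageArchiver {
    private PageArchiver() {}

    // "t1234-p6-x.html" on 2017-11-04 becomes "t1234-p6-x.html.20171104".
    static String archivedName(String fileName, LocalDate date) {
        return fileName + "." + date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Renames every existing file in place. Files.move without REPLACE_EXISTING
    // throws FileAlreadyExistsException instead of overwriting an earlier archive.
    static void archive(Iterable<Path> pageFiles, LocalDate date) throws IOException {
        for (Path file : pageFiles) {
            if (Files.exists(file)) {
                Path target = file.resolveSibling(archivedName(file.getFileName().toString(), date));
                Files.move(file, target);
            }
        }
    }
}
```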


I am planning to do some of this in my week off work in January.
November 4th, 2017, 10:06 PM   #172
deezer

Does the forum-backup detect merges?

E.g., this thread was originally started by SIT (now post #4).
Later, three posts by Jmailman were merged into the thread, because
they had been posted in the LKMM thread but belong in this one.
As Jmailman's posts have lower post IDs, they are now at the top of the thread.

If you parsed the thread after the merge, no update would be detected,
because forum-backup only checks the thread for its latest update.
Therefore the newly added/merged posts wouldn't be saved, right?

If the above is correct, wouldn't it make sense to compare the whole thread using the
post IDs to detect updates, or would this be too time-consuming?

Wouldn't this procedure solve the problem with deleted posts too?
November 5th, 2017, 03:04 PM   #173
halvar

I hesitate to parse whole threads. For a few threads it could be done, but it is too much if there are hundreds of threads configured with tens or hundreds of pages each.

A solution could be:
  • compare the post numbers on the last local page with the last thread page on the server
  • if the page does not exist on the server, or if local posts are missing from the server's version of the page:
    move backwards to the previous page and repeat the comparison until a page is found where all posts match
    download all pages from this page forward.
Posts and links are stored in the database using the post number as key. A post would not be overwritten if it is found on another page.
But the page number, which is stored with the post, should be updated.
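A sketch of that backwards search, assuming the local post numbers per page are known from the database and that fetching a server page is wrapped in a function returning its post numbers (an empty set if the page is gone). All names are hypothetical, and each comparison costs one page request:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.function.IntFunction;

// Hypothetical sketch of the backwards comparison; names are illustrative.
final class ResyncPlanner {
    private ResyncPlanner() {}

    // localPosts:  page number -> post numbers stored for that local page
    //              (assumed to contain at least one page)
    // serverPosts: fetches the post numbers on a server page (one request per call),
    //              returning an empty set if the page no longer exists
    // Returns the page from which all pages should be downloaded again.
    static int firstPageToDownload(Map<Integer, Set<Integer>> localPosts,
                                   IntFunction<Set<Integer>> serverPosts) {
        int page = Collections.max(localPosts.keySet());
        while (page > 1) {
            Set<Integer> onServer = serverPosts.apply(page);
            Set<Integer> local = localPosts.getOrDefault(page, Collections.<Integer>emptySet());
            if (!onServer.isEmpty() && onServer.containsAll(local)) {
                return page; // every local post is still on this page
            }
            page--; // page gone or posts missing: step back one page
        }
        return 1; // nothing matched, so start over from the first page
    }
}
```

Because posts are stored with the post number as key, re-downloading from the returned page onwards does not duplicate posts; only the stored page number changes.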

addendum:
But one problem persists: a thread is only processed if the change results in an updated time stamp on the thread list.

Last edited by halvar; November 5th, 2017 at 03:13 PM. Reason: fixed typo, addendum
January 24th, 2018, 08:15 AM   #174
halvar

I have created a new version of the forum backup tool:
Version 0.0.18, https://1fichier.com/?p5izqj8leqw5m3c3jqjt
  • usability is improved; editing files by hand is no longer needed
  • a bug was fixed that prevented the generation of reports under Windows
  • there are three download modes supported: single post, threads, backup
I do not know if anybody is still interested. If not, I am not disappointed.
The tool is mainly created for my personal use. It would be nice if somebody else finds it useful, but if not, then not. I know that it is not very easy to use.

What it does
  • download pages from a forum, threads or posts
  • store the information about these in an embedded database
  • parse posts for links to image hosts
  • download linked images
About
  • this is a Java program. It requires a Java 8 runtime environment
  • the distributed zip contains a JRE for Windows
  • it is developed on Linux, but it also runs on Windows
  • I personally use it on Linux and only test it briefly under Windows
  • configuration values are stored by default under <user home>/forum-backup/forum-backup.properties
  • there are log files in <user home>/forum-backup/logs
  • downloaded files and the embedded database are stored in the configured location
  • do not alter your thread display options like display mode and number of posts per page in your forum settings. It would mess up the downloaded pages.
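For anyone scripting around the tool, this is how the default locations above resolve in Java. It is a hypothetical helper: it assumes the file is a standard Java properties file and says nothing about which keys forum-backup writes into it:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper that resolves the default locations listed above.
final class ConfigLocation {
    private ConfigLocation() {}

    // <user home>/forum-backup
    static Path baseDir() {
        return Paths.get(System.getProperty("user.home"), "forum-backup");
    }

    // <user home>/forum-backup/forum-backup.properties
    static Path configFile() {
        return baseDir().resolve("forum-backup.properties");
    }

    // <user home>/forum-backup/logs
    static Path logDir() {
        return baseDir().resolve("logs");
    }

    // Reads the properties file if present; a missing file yields empty Properties.
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }
}
```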
Upgrade
New versions pick up settings and data from previous versions.
  • unzip zip file
  • start program
  • go to settings and check the configuration values
  • run 'Check Configuration'

New Installation
unzip the zip file anywhere

start forum-backup.bat, e.g. by double-clicking it. A console window opens.

open the url http://localhost:3137/ with your browser

click on the settings page link and enter your values

click on 'Check Configuration'. In the output, every line should start with 'OK'.

click on 'Save Configuration' to save the configuration. The configuration values are saved to <user-home>/forum-backup/forum-backup.properties.
e.g. 'C:\Users\<your login name>\forum-backup\forum-backup.properties' depending on your operating system.

Click on the 'Back to main view' link and start with downloading a single post

There are three ways to download:
posts
Only posts and linked images are downloaded. Nothing is stored in the database.
Enter one or more post numbers and click on 'Download Post'

The images are saved to <storage-path>/adhoc-posts/<post-folder>, e.g. 'C:\data\forum-storage\adhoc-posts\Judith_Ramirez-post-2230617'


threads

One or more threads are downloaded. For each thread
  • thread pages are downloaded if they are not downloaded yet
  • the last page of a thread is always downloaded, because it could contain new posts.
  • the downloaded pages are parsed for posts and image links
  • this information is stored in the database
  • image links that have not been downloaded yet are downloaded
  • failed image downloads are marked for retrying them later
  • previously failed image downloads are retried
enter one or more threads with an optional start page and click 'Download Threads'
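The page and retry rules listed above boil down to two selections. A sketch under the assumption that the server's last page number and the set of locally stored pages are known; `ThreadPagePlan` and its status values are made-up names, not the tool's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the per-thread selection rules.
final class ThreadPagePlan {
    enum ImageStatus { NEW, DOWNLOADED, FAILED }

    private ThreadPagePlan() {}

    // Pages from startPage to the server's last page that are not stored locally,
    // plus the last page itself, which is always fetched because it may have new posts.
    static SortedSet<Integer> pagesToFetch(int startPage, int lastServerPage, Set<Integer> localPages) {
        SortedSet<Integer> pages = new TreeSet<>();
        for (int page = startPage; page <= lastServerPage; page++) {
            if (!localPages.contains(page)) {
                pages.add(page);
            }
        }
        if (lastServerPage >= startPage) {
            pages.add(lastServerPage);
        }
        return pages;
    }

    // Image links still to try: never downloaded, or marked as failed on an earlier run.
    static List<String> imagesToFetch(Map<String, ImageStatus> statusByLink) {
        List<String> links = new ArrayList<>();
        for (Map.Entry<String, ImageStatus> entry : statusByLink.entrySet()) {
            if (entry.getValue() != ImageStatus.DOWNLOADED) {
                links.add(entry.getKey());
            }
        }
        return links;
    }
}
```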

during the download a simple progress bar is shown

downloaded files are stored at the configured storage location.


thread backup
  • for all active threads configured in the database the forum is determined
  • forum pages are downloaded for these until the last forum scan date is reached (maximum 14 days)
  • if a thread was modified since that date, it is downloaded
  • due to the overhead of checking for modified threads, this is only economical if you have many threads (> 100) configured and if the process is started regularly, e.g. once a day.
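A sketch of the date logic described above, assuming the forum listing gives a last-change time per thread and that a missing scan date falls back to the 14-day limit. Class and method names are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the backup-mode selection; names are illustrative.
final class BackupScan {
    static final Duration MAX_LOOKBACK = Duration.ofDays(14);

    private BackupScan() {}

    // Forum listing pages are read back to the last scan date, but never further
    // than 14 days; a missing last scan date also means 14 days.
    static Instant scanCutoff(Instant lastScan, Instant now) {
        Instant limit = now.minus(MAX_LOOKBACK);
        return (lastScan == null || lastScan.isBefore(limit)) ? limit : lastScan;
    }

    // lastChangeByThread: thread id -> last-change time shown in the forum listing,
    // for the active threads configured in the database.
    static List<Integer> threadsToDownload(Map<Integer, Instant> lastChangeByThread, Instant cutoff) {
        List<Integer> threads = new ArrayList<>();
        for (Map.Entry<Integer, Instant> entry : lastChangeByThread.entrySet()) {
            if (entry.getValue().isAfter(cutoff)) {
                threads.add(entry.getKey());
            }
        }
        return threads;
    }
}
```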

Last edited by halvar; May 10th, 2020 at 08:54 AM. Reason: updated link to 0.0.18
March 31st, 2018, 07:47 AM   #175
halvar
0.0.7 with pixhost.to support

The new version 0.0.7 was released. It contains:
* support for pixhost.to
* pixhost.org links are converted to pixhost.to when downloading. This should not be necessary anymore since the links were fixed on the forum
* a new report to show downloads of the last n days, with optional filtering for failed downloads.
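The new report is essentially a filter over the download records. A hypothetical sketch in plain Java; the real tool reads from its embedded database, and the `Download` fields here are assumptions:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "last n days" report filter; the embedded
// database the real tool reads from is not modelled here.
final class DownloadReport {
    static final class Download {
        final String url;
        final Instant time;
        final boolean failed;

        Download(String url, Instant time, boolean failed) {
            this.url = url;
            this.time = time;
            this.failed = failed;
        }
    }

    private DownloadReport() {}

    // Downloads from the last `days` days; with onlyFailed, just the failed ones.
    static List<Download> lastDays(List<Download> all, int days, boolean onlyFailed, Instant now) {
        Instant since = now.minus(Duration.ofDays(days));
        List<Download> rows = new ArrayList<>();
        for (Download d : all) {
            boolean recent = !d.time.isBefore(since);
            if (recent && (!onlyFailed || d.failed)) {
                rows.add(d);
            }
        }
        return rows;
    }
}
```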

Full install (including Java runtime)
https://1fichier.com/?d4gkjyjy94 (66MB)

Only the jar:
https://1fichier.com/?lwymlqpmoa (7MB)
If you have a working installation just replace your forum-backup-0.0.x.jar with the new forum-backup-0.0.7.jar
April 1st, 2018, 02:57 PM   #176
The Old Hacker

Halvar: These GitLab links all seem to be dead. It's the same for the Teknik Git links posted earlier in this thread. Are you still hosting on Git?

Last edited by The Old Hacker; April 1st, 2018 at 03:14 PM. Reason: Edited text.
April 1st, 2018, 03:39 PM   #177
halvar

Quote:
Originally Posted by The Old Hacker
Halvar: These GitLab links all seem to be dead. It's the same for the Teknik Git links posted earlier in this thread. Are you still hosting on Git?
No, it is not in a public repo. First I moved to GitLab because I had technical problems with teknik.io. Then I had doubts about having it in a public repo. I was paranoid and wanted to remove most of the stuff I had online.

Also, I did not want to have the documentation, with screenshots showing VEF, in a public repo, even though issue tracking and a wiki would be nice.

The source is included in the zip download. It is a Java project using Gradle. If you have a JDK installed, you can build it with Gradle, e.g. './gradlew build'.

Please drop me a line if you have suggestions for improvements or have found errors. (I know about the text errors on the settings page.)

Sadly I cannot make this my official side project, since "Downloader for erotic material" does not look that good on a resume. So I have another official side project on which I spend most of my free programming time.
April 15th, 2018, 06:15 PM   #178
halvar
forum-backup 0.0.8 with improved handling of deleted posts/pages

The new version 0.0.8 was released. It contains:
  • improved logging of exceptions (now with stack trace)
  • improved handling of deleted posts/pages

handling of deleted posts/pages
situation:
A thread had pages from 1 to 10 and those were previously downloaded. Now multiple posts are deleted and only pages 1 to 7 exist. A new post is added to page 7.

previous behavior:
Page 7 is not downloaded because locally page 10 exists. The new post is missed.

new behavior:
Previously downloaded pages 8 to 10 are renamed by appending the date to the filename.
Page 7 is downloaded.
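The new behavior reduces to picking the local pages that lie beyond the server's current last page. A hypothetical sketch, assuming the server's current last page number is already known; the class name is made up:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of the 0.0.8 decision; the real implementation is not shown in the thread.
final class DeletedPages {
    private DeletedPages() {}

    // Local pages beyond the server's current last page, in ascending order:
    // these are the files that get the date suffix.
    static List<Integer> pagesToArchive(Collection<Integer> localPages, int serverLastPage) {
        List<Integer> result = new ArrayList<>();
        for (int page : new TreeSet<>(localPages)) {
            if (page > serverLastPage) {
                result.add(page);
            }
        }
        return result;
    }
}
```

For the situation above, pages 8, 9 and 10 are returned for renaming, and page 7, now the last page on the server, is downloaded again.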

There are still scenarios where new posts are missed, but this fixes a major one. I expect more deletions in the future (because of DMCA takedowns, cleanups and newly banned content).

Full install (including Java runtime)
https://1fichier.com/?lb6qavanj1 (66MB)

Only the jar:
https://1fichier.com/?bpf29h2561 (7MB)
If you have a working installation just replace your forum-backup-0.0.x.jar with the new forum-backup-0.0.8.jar
May 11th, 2018, 10:14 AM   #179
deezer

I haven't tested your new tool yet.


It seems that p&h changed their BB code long ago (the post where I noticed it is from 2008).
The images are still up, but the thumbnails don't work.
The image URL has changed but redirects to the new URL.

I can't show you the post, as its old BB code has since been updated manually. But here is an example:


Old BB-code:
[URL=http://image.pimpandhost.com/guest/879324_x.html][IMG]http://pimpandhost.com/media/simple/1/thumbs/da498bdb2dda_1.jpg[/IMG][/URL]

New BB-code:
[URL=http://pimpandhost.com/image/879324][IMG]http://ist1-3.filesor.com/media/image/1/_/_/_/1/d/a/4/9/thumbs%2Fda498bdb2dda_0.jpg[/IMG][/URL]
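Both versions have the same shape: a [URL=...] link to the image page wrapped around an [IMG] thumbnail; only the URLs differ. A hypothetical Java sketch that pulls out both parts regardless of the URL style (`BbCodeLinks` is a made-up name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract linked thumbnails from BB code.
final class BbCodeLinks {
    // [URL=<image page>][IMG]<thumbnail>[/IMG][/URL], tags matched case-insensitively
    private static final Pattern LINKED_THUMB = Pattern.compile(
            "\\[URL=([^\\]]+)\\]\\s*\\[IMG\\]([^\\[]+)\\[/IMG\\]\\s*\\[/URL\\]",
            Pattern.CASE_INSENSITIVE);

    private BbCodeLinks() {}

    // Returns {imagePageUrl, thumbnailUrl} pairs in order of appearance.
    static List<String[]> linkedThumbnails(String bbCode) {
        List<String[]> result = new ArrayList<>();
        Matcher m = LINKED_THUMB.matcher(bbCode);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2)});
        }
        return result;
    }
}
```

A downloader would follow the first URL (the image page), not the thumbnail.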

My question is: does your tool also download images posted with the old BB code?
IHG doesn't work on these.
May 11th, 2018, 12:07 PM   #180
halvar

Old or new should not matter. The HTML page behind that URL is downloaded, no matter what the URL looks like. The challenge is usually to find the image and the original filename on that page.
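To illustrate the "find the image on that page" step: a sketch that ports the pimpandhost search pattern from the IHG host file quoted below to java.util.regex. It assumes the host page contains an `img` tag with class `normal`; forum-backup's own parser may work differently, and the class name is hypothetical:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the regex is ported from the IHG pimpandhost search
// pattern, not taken from forum-backup's own parser.
final class HostPageParser {
    // <img class="normal" src="..."> with an optional "_l" before the extension;
    // group 3 is the URL stem, group 5 the extension.
    private static final Pattern NORMAL_IMG = Pattern.compile(
            "img class=('|\")normal\\1 src=('|\")(https?://.+?)(_l)?(\\.(gif|jpe?g|png|GIF|JPE?G|PNG))\\2");

    private HostPageParser() {}

    // Full-size image URL, with the "_l" suffix dropped as in the IHG pattern.
    static Optional<String> imageUrl(String html) {
        Matcher m = NORMAL_IMG.matcher(html);
        return m.find() ? Optional.of(m.group(3) + m.group(5)) : Optional.<String>empty();
    }

    // A guess at the original file name: the last path segment of the image URL.
    static String fileName(String imageUrl) {
        return imageUrl.substring(imageUrl.lastIndexOf('/') + 1);
    }
}
```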

I just downloaded post
Quote:
Originally Posted by chrisnewcombe
some more very hot shots of the incredible Gillian
and got 11 images.
Is that the kind of post you are referring to? It seems to have old-style URLs:
HTML Code:
<a href="http://image.pimpandhost.com/guest/498275_x.html" target="_blank"><img src="http://pimpandhost.com/media/simple/1/thumbs/7d156ac06bf7_1.jpg" border="0" alt="" onload="..." /></a>
I think the problem is the pimpandhost urlpattern in the IHG host file:
HTML Code:
<host id="pimpandhost"><urlpattern>^https?:\/\/(?:image\.|www\.)?pimpandhost\.com\/image\/.+?$</urlpattern><searchpattern>function(pageData, pageUrl) {
	var iUrl = pageData.match(/img class=('|")normal\1 src=('|")(https?:\/\/.+?)(_l)?(\.(gif|jpe?g|png|GIF|JPE?G|PNG))\2/);
	return iUrl ? {imgUrl:  iUrl[3] + iUrl[5], status: "OK"} : {imgUrl: null, status: "ABORT"}
}</searchpattern></host>
It expects pimpandhost URLs to have a path beginning with /image, but the old URLs begin with /guest.
This urlpattern could work:
HTML Code:
^https?:\/\/(?:image\.|www\.)?pimpandhost\.com\/(image|guest)\/.+?$
I do not have IHG, so I cannot test it.
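The regex itself can at least be checked outside IHG. This Java snippet compiles the current and the proposed urlpattern verbatim (java.util.regex accepts the escaped slashes) and checks them against the old and new URLs from deezer's example. It only tests the expressions, not how IHG applies them; the class name is hypothetical:

```java
import java.util.regex.Pattern;

// Checks the two urlpattern regexes from the IHG host file with java.util.regex.
final class IhgUrlPatterns {
    // Current pattern: the path must start with /image
    static final Pattern CURRENT = Pattern.compile(
            "^https?:\\/\\/(?:image\\.|www\\.)?pimpandhost\\.com\\/image\\/.+?$");
    // Proposed pattern: /image or the old /guest
    static final Pattern PROPOSED = Pattern.compile(
            "^https?:\\/\\/(?:image\\.|www\\.)?pimpandhost\\.com\\/(image|guest)\\/.+?$");

    private IhgUrlPatterns() {}

    // find() mirrors a JavaScript regex test; the ^ and $ anchors make it a full match.
    static boolean accepts(Pattern pattern, String url) {
        return pattern.matcher(url).find();
    }
}
```

With these, the current pattern rejects http://image.pimpandhost.com/guest/879324_x.html, while the proposed one accepts both that URL and http://pimpandhost.com/image/879324.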

Last edited by halvar; May 11th, 2018 at 12:26 PM. Reason: grammar