Friday, October 13, 2023

HOW TO - Scrape an old htm / html website

q: A customer called asking for a way to download their .htm-based website to their local computer.

a: We recommended HTTrack. We tried a few different applications, and none of the others got every single site file; some missed image files because they were configured as background images. HTTrack got every single file and folder. Download here: https://www.httrack.com/page/2/en/index.html
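
To illustrate why background images are easy to miss: a downloader that only follows <img src="..."> never sees images referenced from a background attribute or from CSS url(...) values. The sketch below is not how HTTrack or any of the tools we tried actually works; it is a minimal Python example, using only the standard library, of collecting all three kinds of reference from a page. The names ImageRefParser and find_image_refs are made up for this example.

```python
import re
from html.parser import HTMLParser

# Matches url(...) references in CSS, with or without quotes.
CSS_URL = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""", re.IGNORECASE)


class ImageRefParser(HTMLParser):
    """Collects image references from <img src>, background attributes, and CSS."""

    def __init__(self):
        super().__init__()
        self.refs = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Ordinary images: the case every downloader handles.
        if tag == "img" and attrs.get("src"):
            self.refs.append(attrs["src"])
        # Legacy <body background="..."> / <td background="..."> attributes.
        if attrs.get("background"):
            self.refs.append(attrs["background"])
        # Inline CSS such as style="background: url(hero.jpg)".
        if attrs.get("style"):
            self.refs.extend(CSS_URL.findall(attrs["style"]))
        if tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # CSS rules inside a <style> block.
        if self._in_style:
            self.refs.extend(CSS_URL.findall(data))


def find_image_refs(html):
    """Return every image reference found in the given HTML text."""
    parser = ImageRefParser()
    parser.feed(html)
    parser.close()
    return parser.refs
```

Running find_image_refs over a page and comparing the result with the files in the downloaded folder is one quick way to check whether a tool skipped any background images. Images referenced from external .css files would need the same url(...) scan applied to those files.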
notes:
The application rewrites an .htm site to .html, renaming the files and updating the hyperlinks to match.
The application injects a line at the top and bottom of each .html file explaining that it was downloaded by HTTrack from the original URL.
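
If the customer wants those injected lines removed, something like the following Python sketch could clean a downloaded page. It assumes the injected lines are HTML comments that mention "HTTrack"; check a downloaded file first to confirm that matches what your version writes. The function name strip_httrack_comments is made up for this example.

```python
import re

# Assumption: the injected lines are HTML comments containing the word "HTTrack".
# The pattern never crosses a closing "-->", so unrelated comments are left alone.
HTTRACK_COMMENT = re.compile(
    r"<!--(?:(?!-->).)*?HTTrack(?:(?!-->).)*?-->\s*",
    re.IGNORECASE | re.DOTALL,
)


def strip_httrack_comments(html):
    """Return the HTML text with every comment that mentions HTTrack removed."""
    return HTTRACK_COMMENT.sub("", html)
```

To clean a whole download, read each .html file in the mirror folder, pass its text through strip_httrack_comments, and write the result back. Work on a copy of the folder until you are happy with the output.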

Wikipedia: HTTrack is a free and open-source Web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3. HTTrack allows users to download World Wide Web sites from the Internet to a local computer.
