How do I completely download a web page, while preserving its functionality? [duplicate]
I've been trying to save this webpage using all of the methods that I know, but none of them have worked so far. The website itself has some great functionality: it is able to render MathJax in real time, without any noticeable lag. I want to be able to use it offline, so I wanted to save it. I haven't been very successful. I'm on macOS. Here is what I have tried so far:
- Save as on Safari as a Web Archive (.webarchive) – doesn't preserve the page's functionality
- Save as on Safari as Page Source (.html) – completely messes the page up
- HTTrack – doesn't preserve the webpage's functionality
- Save as on Chrome as Webpage, Complete (.html) – messes up layout and functionality
- WebDumper – gives me a "Forbidden" error
- itsucks – messes webpage up
- SiteSucker – messes webpage up
- ScrapBook (firefox) – messes up
- A couple of other things that I can't remember anymore.
I just want to save the website and be able to use it offline. I noticed something interesting, however. When I'm in Safari and I go offline, the webpage performs fine. This undoubtedly means that the webpage can run offline with no problem – I just need a way to save it properly. I suppose I could create a virtual machine, load up the site on it and then save it as a snapshot and use it whenever I want to offline, but that seems like quite a disproportionate solution for such a seemingly simple problem.
On a side note: would it be possible to save a webpage like this (iPhone 6S page) with all of the scrolling animations, embedded pictures and videos and all the rest? I've only tried creating a Web Archive using Safari, but it only saved the nice scrolling animation – not the embedded pictures and such.
144 Answers
It's not possible to do this with many websites these days. And for the sites where it does seem possible, it still requires some JavaScript experience to reverse-engineer and "fix" the scripts that are saved to your computer. There is no single method that works for all websites; you have to work through each unique problem for every site you try to save.
A lot of websites are no longer just static files that are sent from the server to your computer. They have become 2-way interactive applications, where the web browser is running code that continuously interacts with the web server from the same page.
When you load a website in a browser, you are seeing the "front end" of the entire system that makes up the website. This "front end" (including the HTML, images, CSS, and JavaScript) can even be dynamically generated by code on their end! That means there is code executing on the server side that is never sent to your web browser, and that code may be critical to supporting the code that is sent to your web browser.
There is simply no way to "download" that server-side code, which is why many websites don't work properly when you save them.
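To make the point concrete, here is a minimal sketch (not HostMath's actual back-end, just an illustration) of a server that generates its HTML per request. Saving the page captures one snapshot of `render_page()`'s output; the function itself never leaves the server:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import datetime

def render_page() -> str:
    # This code runs only on the server. A "Save Page As" in the browser
    # captures one snapshot of its output, never the generator itself.
    return f"<html><body>Rendered at {datetime.datetime.now():%H:%M:%S}</body></html>"

class DynamicPage(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(render_page().encode())

# To try it locally:
# HTTPServer(("localhost", 8000), DynamicPage).serve_forever()
```

Every request produces different HTML, so no single saved copy can reproduce the site's behavior.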
The most common cause of breakage is that websites use JavaScript to load content after the initial page response is sent to your browser. The HostMath site you are trying to save offline definitely uses a back-end to serve JavaScript files that are critical to the site's functionality. In Firefox I get this error for several different JavaScript files when I try to open the site locally:
    Loading failed for the <script> with source “file:///D:/Home/Downloads/hostmath/HostMath%20-%20Online%20LaTeX%20formula%20editor%20and%20browser-based%20math%20equation%20editor_files/extensions/asciimath2jax.js?rev=2.6.0”

See that ?rev=2.6.0 after the filename? That is a parameter that is passed to the back-end (web server) to determine which asciimath2jax.js file should be sent to your web browser. My D: drive isn't a web server, so when Firefox tries to load a file with a URL parameter, it fails.
You could try downloading the file from HostMath manually and saving it in the right location without the ?rev=2.6.0, though. Then you would need to change the site's scripts and HTML to load the file from your drive without a URL parameter. This would have to be done for every script that failed to load.
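That manual rewrite can be partially automated. Here is a minimal sketch that strips URL parameters like ?rev=2.6.0 from local <script> src attributes in a saved HTML file, assuming double-quoted attributes (a real saved page may need more cases handled):

```python
import re

def strip_script_params(html: str) -> str:
    """Remove URL parameters (e.g. ?rev=2.6.0) from <script> src attributes
    so the files resolve on disk instead of requiring a web server."""
    return re.sub(r'(<script[^>]*\bsrc="[^"?]+)\?[^"]*(")', r'\1\2', html)

snippet = '<script src="extensions/asciimath2jax.js?rev=2.6.0"></script>'
print(strip_script_params(snippet))
# <script src="extensions/asciimath2jax.js"></script>
```

You would still have to download each referenced file manually and place it at the path the cleaned src attribute points to.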
You will hit a dead end if any JavaScript makes requests to a web service (an API) on the host website, though. Sites do this to off-load computation that isn't done locally in the web browser, which means the back-end is essential to running the front-end.
Chrome extension and standalone app: WebRecorder
You might have luck with WebRecorder, a 'system for high-fidelity web archiving'. It's an open-source project that offers a free Electron-based desktop app and a Chrome extension. Scroll down to Desktop Tools for the app download links. (You don't need to create an account to use the desktop app.)
Note: it seems to work fine with medium-complexity sites, but it did not work with one specific site that was heavy on JavaScript.
Firefox add-on: Save Page WE
A Firefox add-on that is much lighter than WebRecorder above and worked well in some specific cases. It is configurable and flexible, and can optionally scroll pages to retrieve lazy-loaded content, embed images and scripts as data URLs, and so on.
- Open the website that you want to save. Any web browser can quickly save the site you are currently visiting. ...
- Open the "Save web page as" window. ...
- Give the saved page a name. ...
- Select a location to store the page. ...
- Select whether you want the complete web page or just the HTML. ...
- Open the saved webpage.
1. Clear the cache of the browser you are using.
2. Open the browser and go to the site you want to download.
3. Open the browser's cache folder.
4. Download all files into the same folder where the page is.
5. Go offline and test the downloaded page.