Keeping in mind browser support and ease of editing the page, what do you think is the best way to save web pages in a single archive? What would be best as a "standard"? Or should I just buckle down and deal with the HTML file and separate folder? For the sake of my project, I could support that, but I'd rather avoid it.
asked Nov 3, 2008 at 21:40
Thanks for the responses! It really stinks that there's no standard, and one should really be developed. PDF comes closest as it's a widely supported format -- but ZIP's a good choice for its superior editability. Browsers really should support ZIPs imo, but until then, I may use both solutions!
Commented Nov 4, 2008 at 5:58
MAFF is based on the ordinary ZIP format, with index.html as the entry point so browsers can recognize the start page. Check this out: maf.mozdev.org/maff-file-format.html
Commented Feb 23, 2015 at 21:27
It looks like the situation is still the same. I want to migrate from office documents to HTML documents where interactivity is necessary, and I'm looking for a way to keep everything in a single file. Since HTML5 allows offline web apps, that could be a solution here as well. What do you think? Offline HTML: "Creating HTML5 Offline Web Applications" and "Tutorial: How to make an offline HTML5 web app, FT style"
Commented Sep 6, 2018 at 9:53
My favourite is the ZIP format. Because:
The alternatives all have some flaw:
PDFs are supported on nearly all browsers on nearly all platforms and store content and images in a single file. They can be edited with the right tools. This is almost definitely not ideal, but it's an option to consider.
answered Nov 3, 2008 at 21:51 Joel Anair
DUH! Why didn't I think of that? Yeah, PDF is used by everyone and their mother to share documents. It's not easy to edit without tools, but what's more important is the browser support. Especially if I couple PDF with another solution, it turns out ideal. Thanks!
Commented Nov 4, 2008 at 5:46
It is not only a question of file format. Another crucial question is: what exactly do you want to store? Is it:
Most current "save page as" functionality in browsers, be it to MAF, MHTML or file+dir, attempts the first way. This is an ultimately flawed approach.
Don't forget that web pages these days are local applications rather than static documents you can easily store. Potential issues:
And many many more issues.
Check the Chrome SingleFile extension. It stores a web page in one HTML file with images inlined using the already-mentioned data URIs. I haven't tested it much, so I cannot say how well it handles "volatile" AJAX pages.
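To make the data URI idea concrete, here is a minimal Python sketch of inlining local images into one HTML file. It is not how SingleFile itself works; the helper names `to_data_uri` and `inline_images` are made up for illustration, and the regex only handles double-quoted `src` attributes on `<img>` tags, so treat it as a rough starting point rather than a real page saver.

```python
import base64
import mimetypes
import re
from pathlib import Path


def to_data_uri(path: Path) -> str:
    """Encode a local file as a base64 data URI."""
    mime, _ = mimetypes.guess_type(path.name)
    mime = mime or "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> references with data URIs (hypothetical helper)."""
    pattern = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)

    def replace(match: re.Match) -> str:
        src = match.group(2)
        target = base_dir / src
        # Leave remote URLs, existing data URIs and missing files untouched.
        has_scheme = re.match(r"^[a-z][a-z0-9+.-]*:", src, re.IGNORECASE)
        if has_scheme or not target.is_file():
            return match.group(0)
        return match.group(1) + to_data_uri(target) + match.group(3)

    return pattern.sub(replace, html)
```

Stylesheets, scripts and anything loaded later by JavaScript are not covered here, which is exactly where the "volatile" pages get tricky.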
answered Apr 21, 2013 at 18:26
You could always make a program/script that extracts the zip file to a temp directory and loads the index.html file in your browser. You could even use an index.ini/txt file to specify the file that should be loaded when extracting.
Basically, you want something like the Mozilla Archive format, but without the unnecessary rdf crap just to specify what file to load.
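A rough sketch of such a script in Python, using only the standard library. The function names and the `webarchive_` temp prefix are my own choices for illustration, and the entry page is assumed to be `index.html` unless you pass something else (for example a name read from an index.txt).

```python
import tempfile
import webbrowser
import zipfile
from pathlib import Path


def extract_archive(archive: str, entry: str = "index.html") -> Path:
    """Extract the ZIP to a fresh temp directory and return the entry page path."""
    # Note: the temp directory is not cleaned up here; that is left to the caller.
    target = Path(tempfile.mkdtemp(prefix="webarchive_"))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    page = target / entry
    if not page.is_file():
        raise FileNotFoundError(f"{entry} not found in {archive}")
    return page


def open_archive(archive: str, entry: str = "index.html") -> None:
    """Open the archive's entry page in the default browser."""
    webbrowser.open(extract_archive(archive, entry).as_uri())
```

Usage would be something like `open_archive("saved_page.zip")`, where the archive name is just a placeholder.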
MHT files are good, but they usually use base64 to embed files, which makes the file bigger than it should be (data URIs have the same problem). You can add attachments as binary, but you'll have to do that manually with a hex editor or create a tool, and client support for it might not be as good.
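To put a number on the base64 cost: every 3 bytes of input become 4 output characters, so the payload grows by about a third. A quick Python check, using random bytes as a stand-in for an image (the 30,000-byte size is arbitrary):

```python
import base64
import os

raw = os.urandom(30_000)  # stand-in for a 30 kB binary image
encoded = base64.b64encode(raw)

overhead = len(encoded) / len(raw) - 1
print(f"raw: {len(raw)} bytes, base64: {len(encoded)} bytes, overhead: {overhead:.0%}")
# prints "raw: 30000 bytes, base64: 40000 bytes, overhead: 33%"
```

MIME-style encoding also inserts line breaks, so the size inside an actual MHT file would be slightly larger than this.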
Of course, if you want to use what browsers generate, MHT (Opera and IE at least) might be better.
answered Nov 3, 2008 at 22:09 Shadow2531
A very creative answer. You're very right in using a ZIP file and then extracting to a temp dir for my project. I might end up doing that. Good advice on the other formats as well. Thanks!
Commented Nov 4, 2008 at 5:48
Depending on the implementation, you may not even have to extract it to a temp directory. I know that in PHP I can read the contents of a ZIP directly on the fly, so I would not have to extract to a temp file; however, this will increase CPU load a bit.
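The same on-the-fly idea works outside PHP too. As a small sketch in Python (the `read_member` name and the default `index.html` entry are assumptions for illustration), one member can be read straight from the archive without writing anything to disk:

```python
import zipfile


def read_member(archive: str, member: str = "index.html") -> bytes:
    """Return one file's bytes straight from the ZIP, without extracting to disk."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member)
```

Each read decompresses the member again, which is where the extra CPU load comes from.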
Commented Dec 2, 2008 at 6:23
I see no excuse to use anything other than a zipfile
answered Nov 3, 2008 at 21:51
I agree, and I like the way you put it ;-)
Commented Nov 3, 2008 at 21:55
Well, if browser support and ease of editing are the biggest concerns, I think you are stuck with the file+directory approach, unless you are willing to provide an editor for the single-file format and live with not very good support in browsers.
You can create a single file by compressing the contents. You can also create a parent directory to ease handling.
answered Nov 3, 2008 at 21:54 Vinko Vrsalovic
The problem is that HTML is bottom-up, not top-down. Look at your file name, which was saved on my box as "What's the best "file format" for saving complete web pages (images, etc.) in a single archive? - Stack Overflow.html"
Just add a '|' and one has trouble doing copy-and-paste backups to a spare drive. In the end you end up chopping the file name in order to save it. Dozens, perhaps hundreds, of identical index.html or index.php files are cluttering my drives.
The partial solution is to write your own CMS and use scripts to map all relevant files to a flat-file database, then use fileName, size, mtime and md5 to get a unique ID for each file. Create a flat-file index permitting 100k or 1000k records. The goal is to write once and use many times. So you need a real CMS, and you need a unique ID based on content (e.g. index8765432.html) that goes in your files_archive. Ditto for the others. Then you can non-destructively symlink from the saved original HTML to the files_archive and just recreate the file using a PHP or alternative script if need be. I don't know if it will work, as I'm at the same point you're at; maybe in a week I will know for sure.
The more useful approach is to have a top-down structure based on your business or personal wants and related tasks. So your files might be organized top-down, but external ones bottom-up to preserve the original content. My interest is in Web 3.0 services, and the closer you get to machine-to-machine interaction, the greater the need to structure the information. Maybe it's time to rethink the idea of bundling everything into a single file. If you have hundreds of main.css files, why bundle when a top-down solution might let you modify one file instead of hundreds?
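For the content-based ID part of this answer, a small Python sketch of what an index record and archive name could look like. This is only one possible reading: the answer does not say how the ID is built from fileName, size, mtime and md5, so the record layout and the "stem plus first 8 hex digits of the md5" naming are assumptions, as are the function names.

```python
import hashlib
from pathlib import Path


def file_record(path: Path) -> dict:
    """Collect the fields mentioned above for one file: name, size, mtime, md5."""
    stat = path.stat()
    md5 = hashlib.md5(path.read_bytes()).hexdigest()
    return {
        "name": path.name,
        "size": stat.st_size,
        "mtime": int(stat.st_mtime),
        "md5": md5,
    }


def archive_name(path: Path) -> str:
    """Build a content-based name, e.g. index-<md5 prefix>.html (assumed scheme)."""
    record = file_record(path)
    return f"{path.stem}-{record['md5'][:8]}{path.suffix}"
```

With a scheme like this, identical copies of index.html map to the same archive name while different pages with the same file name do not, which is what would make the symlink step possible.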