Reviews for WebScrapBook
WebScrapBook by Danny Lin
Review by 14802265
Rated 2 out of 5
by 14802265, 6 years ago
I tried this app to save webpages completely and accurately. On some pages, like ghacks.net, it works perfectly with the scripted single HTML option. On other pages, like nytimes.com, it captures the page out of sync even though all of the content seems to be there (large gaps, enlarged photos, etc.). Save Page WE has the same issue. On washingtonpost.com, WebScrapBook was almost perfect, but there is a bug that adds incorrect characters wherever there is an apostrophe in the text (which in a news article there undoubtedly will be). I used the scripted single HTML option there as well. I do have specific scripts for the Times and WPost running, but they are not the issue, since Mozilla Archive Format and SingleFile always work perfectly on the same sites with the same scripts running. But since MAF doesn't work in current browsers and SingleFile works somewhat inconsistently (it stalls a lot), I was hoping WebScrapBook would work, but no go.
Also, I haven't seen an option to save the original page URL, either in the title or in the .html file, for reference, as MAF, SingleFile, and Save Page WE can.
I noticed that the nytimes.com icon was used in the tab for the saved page, but WebScrapBook couldn't find the icon for the washingtonpost.com tab. If the developer wants to see the output files, just tell me where to forward them.
This app might be able to save websites, but if it can't do it accurately, what's the point of using it?
Developer response
posted 6 years ago
Thank you for the feedback.
The issue on nytimes.com is the same as the one with styled components, and we are working on it (https://github.com/danny0838/webscrapbook/issues/109). It's a complicated issue, as there are many things behind the scenes to deal with. We almost have the solution but still need some time to implement it, perhaps in the next one or two revisions.
I can't see an issue on washingtonpost.com; it may be related to the scripts you've mentioned. Could you confirm this (by disabling your scripts and seeing whether the issue is still there) and provide the scripts you are using for further investigation?
The source page URL is recorded in the source code of the saved page but not shown directly. You'll be able to see it in the metadata if the backend server is used; otherwise you can see it in the source code. We are still investigating an appropriate way to present such metadata without altering the document too explicitly.
As this add-on site doesn't allow discussion, you can report issues to the source code repo (like the link provided above) so that we can discuss and trace them better :)
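For readers who want to look up the recorded source URL without opening the saved file by hand, here is a minimal sketch. It assumes the URL is stored in an attribute named `data-scrapbook-source` on the root `<html>` element; the developer response above only says the URL is recorded in the source code, so the attribute name is an assumption, and `find_source_url` is a hypothetical helper, not part of WebScrapBook.

```python
from html.parser import HTMLParser


class SourceURLParser(HTMLParser):
    """Collects the assumed source-URL attribute from the root <html> tag."""

    # Assumption: the attribute name is not confirmed by this page.
    ATTRIBUTE = "data-scrapbook-source"

    def __init__(self):
        super().__init__()
        self.source_url = None

    def handle_starttag(self, tag, attrs):
        # Only inspect the first <html> start tag that is encountered.
        if tag == "html" and self.source_url is None:
            self.source_url = dict(attrs).get(self.ATTRIBUTE)


def find_source_url(html_text):
    """Return the recorded source URL, or None if the attribute is absent."""
    parser = SourceURLParser()
    parser.feed(html_text)
    return parser.source_url
```

To use it, read a saved page as text and pass the contents to `find_source_url`; a result of `None` means the assumed attribute was not found and the file should be inspected manually.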
135 reviews
- Rated 1 out of 5 by ehobby, 9 days ago
Unable to get this to work on Firefox 133 and Fedora 40. Installed the backend and the browser extension. Could not find any configuration for the backend that would work. There needs to be a more specific set of instructions written by someone who has successfully installed this extension in Linux, unless this only works in Windows. Perhaps this is a great extension but it is worthless if it cannot be installed.
Developer response
posted 9 days ago
Have you read the documentation: https://github.com/danny0838/webscrapbook/wiki/Basic#3-browser-sidebar-approach ?
- Rated 5 out of 5 by OM_RA, a month ago
- Rated 5 out of 5 by Silopolis, 3 months ago
- Rated 5 out of 5 by Firefox user 18235051, a year ago
I needed to backup a website that used form login, making a simple scraping not possible. This extension worked like a charm after figuring out some of the configuration options.
- Rated 5 out of 5 by Avater, a year ago
- Rated 5 out of 5 by Firefox user 14643647, a year ago
- Rated 4 out of 5 by Supriyadi, a year ago
- Rated 5 out of 5 by Yaliang, 2 years ago
Thanks for developing this plugin. It makes it extremely easy to achieve and save web pages.
- Rated 5 out of 5 by texsd, 2 years ago
- Rated 1 out of 5 by Firefox user 13058149, 2 years ago
I choose the WebScrapBook/data option but it wants me to configure Backend server... ? ? ?
Useless.
I miss the old Scrapbook Extension.
Developer response
posted 2 years ago
Please consult the documentation about different approaches to capture a page: https://github.com/danny0838/webscrapbook/wiki/Basic. Raise an issue with more details (e.g. a screenshot illustrating where you are asked for backend server configuration) in the source repository if you still don't get it.
- Rated 5 out of 5 by Alexander, 3 years ago
- Rated 5 out of 5 by Firefox user 12472805, 3 years ago
- Rated 5 out of 5 by azzone, 3 years ago
- Rated 1 out of 5 by Firefox user 13474132, 3 years ago
So useful before, now completely unusable because it is excessively cumbersome.
- Rated 5 out of 5 by mike1985, 3 years ago
Amazing Tool, would be nice to set also a min. size for pictures...
- Rated 5 out of 5 by Firefox user 15902721, 3 years ago
- Rated 3 out of 5 by Hann, 3 years ago
- Rated 5 out of 5 by Firefox user 13637550, 3 years ago
This is incredible and the work that is being put into it is admirable. The documentation isn't the best but the creator was very quick to answer my questions and I was able to create a JSON batch capture that captured an entire site for me. It's not the easiest thing to use but read the documentation, look at the examples, and ask questions on the github if you need help. This is a very powerful plugin and I hope the developer continues to develop it.
- Rated 5 out of 5 by DrakeFromFrance, 4 years ago
Thank you for your work, Danny. I use WebScrapBook in Firefox, and Firefox is my default browser on Windows 10. I already set up a new task using the .pyw extension to start WSB with the console hidden; however, it starts Firefox and opens a new tab. Is there a way to start WebScrapBook without opening a new tab?
Just a remark: when I click on the WSB icon in the Firefox toolbar, it always takes me a few seconds to find "Open Scrapbook". It would be nice if you emphasised it, or at least put it at the top of the menu. Thank you.
Where do I set server.browse to false??
Developer response
posted 4 years ago
1. Configure PyWebScrapBook and set server.browse to false. We'll consider using false as the default in the future.
2. The primary feature of WebScrapBook is capturing, not management, so putting "Open ScrapBook" as the first command doesn't make sense. Emphasizing any particular command is likely controversial for the same reason. We need a thorough evaluation/discussion before doing this. FYI: you can configure a hotkey using the browser extension hotkey manager for a faster way to open the sidebar.
- Rated 4 out of 5 by noone, 4 years ago
I just wanted to know how I can stop auto-capture from making a new folder every time for the same URL and instead skip or overwrite the same files. I'm sorry, I searched a lot through the settings and wiki but couldn't find anything (also, I'm not that knowledgeable about this stuff, so I didn't understand a lot). It's not just auto-capture: capturing a page twice still creates two folders.
Developer response
posted 4 years ago
This is probably something not yet implemented. If you'd like to provide a feature request, it'd be better to raise an issue in the source repo (https://github.com/danny0838/webscrapbook/issues) or send an email, and provide more details (e.g. Do you use a backend server? What is your use case and intention?) as it's not easy to discuss in this Addon site (we won't receive any notification from an update of a comment).
cf. thread of the original request: https://github.com/danny0838/webscrapbook/issues/8