Feature Request: One-Click Deploy to hosting providers #531
Comments
Some managed hosting options have popped up in the last few months; they might be worth checking out if you're willing to pay for hosting: https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-Community#managed-archivebox-hosting
Heroku button support would be awesome indeed.
@olimart The biggest issue with doing this is the filesystem. Heroku & DO's App Platform both provide ephemeral filesystems per deploy, so they're wiped on restart/redeploy. We'd need to either configure those platforms for block storage (something DO's App Platform doesn't support yet; not sure about Heroku) or provide a swappable storage implementation that saves things to S3 or some other object storage (e.g. DO's Spaces, which is S3-compatible). I haven't dug into this much, but it's definitely not a trivial effort.
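A swappable storage layer along those lines might look like the minimal Python sketch below. `ArchiveStorage`, `LocalDiskStorage`, `InMemoryObjectStorage`, and `save_snapshot_index` are hypothetical names, not ArchiveBox APIs; the in-memory class only stands in for an S3-compatible bucket such as DO Spaces, where a real backend would call an S3 client instead. It also only covers files ArchiveBox writes itself, not files written directly by the external tools it shells out to.

```python
from pathlib import Path
from typing import Dict, Protocol


class ArchiveStorage(Protocol):
    """Hypothetical interface: anything that can store and fetch archive files by key."""

    def save(self, key: str, data: bytes) -> None: ...

    def load(self, key: str) -> bytes: ...


class LocalDiskStorage:
    """Stores files under a root directory (the current, disk-based behaviour)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def save(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def load(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryObjectStorage:
    """Stand-in for an S3-compatible bucket: whole objects keyed by name."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def save(self, key: str, data: bytes) -> None:
        # Object stores replace the whole object; there is no in-place append.
        self.objects[key] = data

    def load(self, key: str) -> bytes:
        return self.objects[key]


def save_snapshot_index(storage: ArchiveStorage, timestamp: str, html: str) -> str:
    """Write a snapshot's index.html through whichever backend is configured."""
    key = f"archive/{timestamp}/index.html"
    storage.save(key, html.encode("utf-8"))
    return key
```

Because `save_snapshot_index` only depends on the protocol, switching from disk to object storage becomes a configuration choice rather than a code change.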
Thanks @mAAdhaTTah
Here's a WIP DigitalOcean "one-click" deploy template, but as @mAAdhaTTah mentioned, it's broken because disk storage is not supported by DO apps yet: https://github.com/ArchiveBox/ArchiveBox/blob/digitalocean/.do/deploy.template.yaml
@pirate Yeah, and swapping out for S3 would be tough/impossible with the SQLite db (plus if the tools we use write their own files, that makes it even more difficult).
I think it's still feasible, though: we can write to local disk / RAM disk and then sync to S3 or other storage backends every few seconds. It'll have a second or two of lag, but I think that's an acceptable trade-off.
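The write-locally-then-sync loop could be sketched like this, assuming a hypothetical `sync_once` callable that pushes the local archive directory to S3 or another backend (for example by shelling out to `rsync` or an S3 sync tool). The interval is what sets the "second or two of lag".

```python
import threading
from typing import Callable


def run_periodic_sync(
    sync_once: Callable[[], None],
    interval_s: float,
    stop: threading.Event,
) -> threading.Thread:
    """Call sync_once() every interval_s seconds in a background thread until stop is set.

    sync_once is a hypothetical hook, e.g. a function that pushes the local
    archive directory to S3 / DO Spaces.
    """

    def loop() -> None:
        # Event.wait returns True as soon as stop is set, so shutdown is prompt;
        # it returns False on timeout, which triggers the next sync pass.
        while not stop.wait(interval_s):
            try:
                sync_once()
            except Exception as exc:  # keep the loop alive if one pass fails
                print(f"sync failed, will retry: {exc}")
        # One last pass so writes made just before shutdown are not lost.
        sync_once()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

For example, `run_periodic_sync(push_to_spaces, 2.0, stop)` (with `push_to_spaces` hypothetical) keeps the remote copy roughly two seconds behind, plus however long each pass takes.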
@pirate How would you handle the db in that instance? Sync it down on boot?
Nah, just rsync it every few seconds like all the other files. I think S3 supports byte-range requests, so you can just sync the diffs instead of the whole thing each time.
I would also like this feature.
Alternatively, use DigitalOcean's Postgres server (or is ArchiveBox SQLite-only?).
Additionally, it might be possible to use s3fuse to treat DO Spaces as a local filesystem. This might be kinda gross, since you have to overwrite the whole file each time; you can't modify or append to it. That could cause issues.
@turian The big issue, as I understand it, is the external binaries write files directly to disk.
Yeah, but @pirate's suggestion is just to rsync very frequently to S3. On startup, you rsync back from S3. (I guess this can get expensive if you are not in AWS, since S3 downloads are costly.) (BTW, DigitalOcean Spaces are S3-compatible.) The only real issue I can think of is durability: if the process breaks for some reason and you end up with a corrupted copy, you have to roll back S3, which could be a pain.
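On the corruption point: copying a live SQLite file while it is being written can capture a half-finished transaction. One way to reduce that risk, sketched here under the assumption that the sync step uploads a snapshot rather than the live database (e.g. the collection's `index.sqlite3`), is Python's built-in `sqlite3.Connection.backup`, which produces a consistent copy. `snapshot_sqlite` is a hypothetical helper name.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(live_db: Path, snapshot_path: Path) -> Path:
    """Write a transactionally consistent copy of live_db to snapshot_path.

    Uses SQLite's online backup API, so the copy is never a half-written
    file even if the app is writing to live_db at the same time.
    The snapshot (not the live file) is what would get pushed to object storage.
    """
    tmp = snapshot_path.with_suffix(snapshot_path.suffix + ".tmp")
    src = sqlite3.connect(str(live_db))
    dst = sqlite3.connect(str(tmp))
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    # Rename into place so an uploader never picks up a partially written snapshot.
    tmp.replace(snapshot_path)
    return snapshot_path
```

The sync job would then upload the snapshot file, and a restore on startup would copy it back into place before the app opens the database.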
rsync'ing back & forth seems rough for an archive of any serious size. I believe my archive is several GBs at this point, and if I had to resync it down on startup and rsync it up after archiving, that would be pretty slow.
@mAAdhaTTah So, I don't know the internals of ArchiveBox, but:
I believe rsyncing bidirectionally on startup can be made reasonably fast/efficient even for large archives, as there are advanced rsync options that let you store a sync cache file for faster diffing.
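Without committing to particular rsync flags, the idea of a persisted sync cache can be sketched in Python: keep a small manifest of each file's size and modification time from the last pass (similar in spirit to rsync's default quick check on size and mtime) and only upload the files whose entry changed. `changed_files` and the `.sync-manifest.json` filename are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

Manifest = Dict[str, Tuple[int, int]]  # relative path -> (size, mtime_ns)


def scan(root: Path, manifest_name: str) -> Manifest:
    """Record size and modification time for every file under root."""
    result: Manifest = {}
    for path in root.rglob("*"):
        if path.is_file() and path.name != manifest_name:
            st = path.stat()
            result[path.relative_to(root).as_posix()] = (st.st_size, st.st_mtime_ns)
    return result


def changed_files(root: Path, manifest_name: str = ".sync-manifest.json") -> List[str]:
    """Return files that are new or modified since the last call, then update the cache.

    Only these files would need uploading; unchanged files are skipped
    without reading their contents or querying the remote bucket.
    """
    manifest_path = root / manifest_name
    previous: Manifest = {}
    if manifest_path.exists():
        # JSON stores tuples as lists, so convert back before comparing.
        raw = json.loads(manifest_path.read_text())
        previous = {k: (v[0], v[1]) for k, v in raw.items()}
    current = scan(root, manifest_name)
    changed = sorted(p for p, sig in current.items() if previous.get(p) != sig)
    manifest_path.write_text(json.dumps(current))
    return changed
```

Deleted files are not tracked here; a fuller version would also look for keys that disappeared from the manifest so they can be removed remotely.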
@mAAdhaTTah Also, if you want a one-click deploy of ArchiveBox, you can get one on PikaPods. It costs a few bucks a month. I think they are running 0.6.2. Unfortunately, this means you will still hit crashes from the UTF-8 bug and the youtube-dl bugs, which stop archiving; there are PRs for these, but they haven't been merged yet. I think PikaPods builds all their one-click app stuff in-house (not open source), so there's no way to customize it. Another option is YunoHost. Their apps are all open source, so in principle there could be a bleeding-edge ArchiveBox app there too.
I'm going to close this for now because realistically the only two options I foresee for the future are:
For what it's worth, I did a Railway deploy; this is a link to it. I think new users get $5 in credit, and once that is used you get $5 of credit with a $5 subscription. ArchiveBox uses about $1 of credit per month. Edit: here it is deployed: https://box.boehs.org/archive/1714976395.796772/index.html
@pirate I just spent the better part of two days trying to write an Ansible playbook that sets up ArchiveBox on Hetzner with Caddy and decent security, and it still doesn't work. So I would love it if you launched a managed hosting option. I would pay at least double what your server / PaaS rental costs, just so you have a sense of possible pricing. Indeed, I would venture to say that MANY, MANY more people are interested in USING ArchiveBox than in maintaining it. See how popular pinboard.in is? This could be the next one, particularly considering that the pinboard.in dev goes dark for extended periods of time. "I turn ArchiveBox into a for-profit enterprise and offer paid ArchiveBox hosting (in which case I have no interest in supporting competing paid deployment solutions for free)": YES PLEASE. I think that is probably the most sustainable path to recurring revenue. Feel free to email me at lastname at gmail's email service if you want feedback.
DigitalOcean is launching a one-click deploy for its App Platform. This won't work for us yet because we would need to attach a Volume, which App Platform doesn't support, but the linked documentation suggests it will soon/eventually. Alternatively, we could look into configuring it for Heroku.
I'm happy to take the lead on this as well, but wanted to open an issue for visibility/discussion.
What is the problem that your feature request solves
I think it would be helpful for new users to be able to spin up an ArchiveBox instance in the cloud with minimal work. Running it on Docker in the first place is really helpful, but it would be nice to simplify it even further.
Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
It should be feasible for a new user
What hacks or alternative solutions have you tried to solve the problem?
I'm still considering how I'm going to host my archive. I initially spun it up on a home server, which works but doesn't help if I want to expose the in-progress REST API to my website. I then put it on a DO droplet, which I'm still fiddling with. I've also considered writing Ansible roles for this, although that's a bit more involved for less technical users.
The main issue with something like App Platform & Heroku is that you don't get CLI access, so everything needs to function via the UI. Downloading sites can take several minutes, which may cause requests to time out on App Platform (I haven't tested it there, but it's definitely been happening on my droplet). It may be worth considering how we could run archiving as background tasks. Or maybe deploy to App Platform as a worker?
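As a rough illustration of the background-task idea, a request handler could enqueue a URL and return immediately while a worker thread does the slow archiving. `ArchiveWorker` and `archive_fn` are hypothetical; an in-process queue like this loses pending jobs on restart, so on App Platform the more robust variant would be a separate worker component reading from a persistent queue.

```python
import queue
import threading
from typing import Callable, Dict


class ArchiveWorker:
    """Minimal in-process job queue: the web request returns immediately
    while slow archiving runs on a background thread.

    archive_fn is a hypothetical hook standing in for the real archiving step.
    """

    def __init__(self, archive_fn: Callable[[str], None]) -> None:
        self.archive_fn = archive_fn
        self.jobs: "queue.Queue[str]" = queue.Queue()
        self.status: Dict[str, str] = {}
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, url: str) -> None:
        """Called from the request handler; never waits on archiving."""
        self._set(url, "queued")
        self.jobs.put(url)

    def _set(self, url: str, state: str) -> None:
        with self._lock:
            self.status[url] = state

    def _run(self) -> None:
        while True:
            url = self.jobs.get()
            self._set(url, "running")
            try:
                self.archive_fn(url)
                self._set(url, "done")
            except Exception as exc:  # record the failure, keep serving the queue
                self._set(url, f"failed: {exc}")
            finally:
                self.jobs.task_done()
```

A status endpoint could then read `worker.status[url]` so the UI can poll for progress instead of holding a request open for minutes.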
How badly do you want this new feature?