Question: what's the current status of the REST API? #496
Comments
✨ Edit as of v0.8.0 (2024-05): The new REST API is now available! ⬇️ |
@pirate I would be interested in working on this. I shot you an email a week or so ago cuz I think the underlying data model needs to be solidified and would love to help move this along. Let me know how I can help. |
@mAAdhaTTah that is great! Currently, on master, we have the sqlite database working. We can now start working with |
@cdvv7788 Generally, I think the split/transformation between Link <-> Snapshot is a bit weird. Snapshot seems to be db-only (it's transformed into Links as it's fetched out of the db for most of the operations I was looking at). I also think the double duty of
(For context, I'd like to use ArchiveBox as a reading list, which I would then pull into my website, hence needing a REST API to pull that from. That's the reference to the "benefit for me" line.) |
@mAAdhaTTah We have discussed those topics before. I think that @pirate has some progress on the timestamp issue, and it will be changed once we come up with a good solution. |
So my thinking/proposal is to actually remove the Does that make sense? Happy to elaborate and/or provide some code to explain.
Not sure I understand this. Could you provide some background here? |
At this moment we only have the means to represent a single download per website. I understand what you propose, and that does make sense. At this point we already migrated the
Snapshot is a Django model. We cannot use that model in a place where Django has not been initialized yet. If you try to do that, it will complain because the module will try to use some Django internals. This is the only reason we have not gotten rid of Link as we know it. I am going to spend some time figuring out alternatives to make Snapshot usable in the whole application. You are welcome to help us pursue this. As I mentioned earlier, this is a blocker, and the other stuff cannot be worked on until it is resolved (the REST API could actually be implemented now, but once we fix this, we would need to refactor it in a big way... I think it is better to solve this layer first). |
All of this makes sense so far. I can do some investigating and see what I can come up with. Just to clarify, when you say "use that model", is that "interacting with it" or is importing it enough to make it fail? |
Importing it is enough to make it fail. There is a method that you will find around named |
I don't believe we need Link or Snapshot anywhere that Django is not initialized, so that is a non-issue. If you're worried about |
@pirate Does that change if the idea is to turn Link & Snapshot into db models? |
This pulls in DRF to configure our API. Pretty straightforward binding of a view to a serializer & a model and making the data available. For this first pass, we're using the model even though it's currently unstable. From a feature standpoint, we get a lot for free from DRF with very little code, including pagination. The `list_links` method loads all of the snapshots, which would require pagination to be implemented manually on the entire list of snapshots, which won't work well on large databases.

Because archivebox is a CLI first and a web application second, the way Exceptions are thrown and errors logged doesn't always make those methods conducive to integrating w/ an API.

On the testing side, this shows up in how we're configuring things. The `setup_django` function doesn't fully work when passing `out_path`; some variables in the Django settings aren't updated or configured correctly. Instead, we use `subprocess` the same way the other tests do to start up the server and hit it with `requests`.

# Summary

This is obviously a work in progress but wanted to get some feedback on the direction. It would be helpful if the API functions exposed by archivebox were more decoupled from the CLI context specifically, but I think we're going to want to bind the Models directly (at least for querying).

# Related issues

ArchiveBox#496

# Changes these areas

- [ ] Bugfixes
- [X] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk
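
To illustrate the pagination concern in the PR description above, here is a minimal standard-library sketch of slicing one page out of a lazy source instead of loading every snapshot into a list first. It is not ArchiveBox or DRF code: `paginate` and `fake_snapshots` are hypothetical names, and the dict shape is made up for the example. A real database-backed API would normally push the slice down into the query (LIMIT/OFFSET or a cursor) rather than skip rows in Python.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(items: Iterable[T], page: int, page_size: int) -> List[T]:
    """Return one 1-indexed page without building a list of all items."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    # islice skips the first `start` items and stops after `page_size` more,
    # so only the requested page is ever held in memory.
    return list(islice(items, start, start + page_size))


def fake_snapshots(n: int) -> Iterator[Dict[str, object]]:
    """Hypothetical stand-in for a lazy database cursor over snapshots."""
    for i in range(n):
        yield {"id": i, "url": f"https://example.com/{i}"}


if __name__ == "__main__":
    for snapshot in paginate(fake_snapshots(1000), page=3, page_size=10):
        print(snapshot["id"], snapshot["url"])
```

The skipped rows are still iterated here, which is why frameworks that translate the page into the query itself scale better on large archives.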
Hello! One of the linked tasks seems to mention it's available in 'dev' - is that an available docker tag? Thank you! |
I have not made any additional progress since opening my PR here: #529 I don't think we will be continuing down that path, as we were considering using Django Ninja instead of DRF as well. Eventually, I'd like to pick this back up again but haven't had the time. |
Copying over my earlier message here from the API discussion related to the ArchiveBox browser extension #577: I think a minimal API can be worked on before the Huey refactor, as the user-facing API is going to be relatively stable even with the change to the internals. These endpoints are already partially available through the Django Admin:
and this bonus escape hatch endpoint is planned to be added to do everything else not possible with the above ^:
I'm leaning towards using FastAPI for the API instead of DRF. I like the pydantic type-based API definitions better than DRF's serializers but I could be convinced either way. |
Thanks for the update. Looking forward to this. Though I'm not sure I read those correctly. For instance, what is the difference between a GET and a POST to And which endpoint should be used for 'return the archive URL for this input URL, if it exists'? |
@pirate hey there are you still working on this / need help? I'm thinking this is possibly something I could put together with FastAPI and the CLI hopefully next weekend. let me know! cheers
|
Definitely open to contribution on the API front! I'm more focused on internals refactoring at the moment but as mentioned in that quoted comment I believe my changes can be kept insulated from anything external facing. If you want to share gists or a fork with your work I can leave progress on your mock-up as you go to save time on PR review later. |
I would use an API like this. |
hi, if anyone is following this issue and could give me some guidance please see this issue: #1030 |
It's still on the list but slow going. I haven't had a lot of big blocks of coding time to work on ArchiveBox over the last year, so I've mostly been devoting my time to support and docs. On the plus side, I have interest from a big multinational org to use ArchiveBox, and may be able to turn that into a consulting contract to fund some work towards the API. They are a slow-moving org so it may take 6~12 months, but it's exciting news nonetheless. |
Hope this will be implemented. In my case, I want to scrape and store websites in my local network and then be able to process this with AI and then put it in my personal knowledge management system. The AI and PKM stuff is on my side, I just need to have an API 🙏 |
hello! what's the current state of this? It's kinda confusing since it says it's in Alpha, but reading the comments I don't know if it's possible to use it on Docker. I'm interested in building an alternative front end for this application and the REST API would help me a lot |
Alpha = There are a few POST/GET etc. endpoints exposed by the admin UI and the /add page that allow quick things to be hacked together, but it's not a proper REST API by any means. I'm working on a Can I ask why you're going in the direction of an alternative frontend vs contributing changes to AB directly? I'd definitely be open to PRs improving our existing frontend! See the discussion here too: #1126 |
@pirate my main issue with contributing to the existing frontend is that the current version is far from what I think would be useful for me, so my changes would probably be too disruptive to include in a PR without previous discussion. If you still think this project could benefit from a total rework of the frontend (which I do), I can think about making some proposals so we can reach an agreement |
I'm down to add a new frontend to the existing app as long as we keep the Django admin one available as well in parallel. I was considering using htmx to do this myself (it plays well with Django templates) but haven't gotten around to it. One of the core principles is that we should rely on JS as little as possible because I want ArchiveBox views to be extremely durable long term and viewable across many different types of devices. I'm ok with some of the UI requiring JS but ideally the most critical parts should fall back to working with old school plain html. If that design direction sounds compatible with your ideas then I'm down to work together to add your UI changes to AB directly, otherwise maybe an independent app/mod may be better. |
@pirate sure, that sounds nice. I don't want to include a JavaScript framework either. Regarding htmx, we can give it a try if we need it; I already did some work with it on a side project and it's great. About the CSS, I saw the current implementation uses Bootstrap. I wonder if we can move to Tailwind, which I think fits better for an open source project these days; that way we don't need to implement custom classes and it's easier for external contributions |
Nice! I also prefer tailwind to bootstrap, happy to move to that. If you want to open a new issue for your UI ideas as they come up I think we should move frontend discussion away from the REST API thread so we don't spam everyone. |
If you do create a new thread for that, can you please @ me? Thanks. |
Hey everyone, check out the new REST API on For users who want to try it out, get v0.8.0-rc (unstable) or later, start It also supports sending webhooks to external servers whenever archiving events happen. |
Currently can't make a backup of my archive, so I can't switch to |
I can't wait for this to make it to stable. |
I'd like to add new pages by sending an HTTP request to an endpoint. I saw it mentioned in issues such as #339 and items linked in that thread.
There seemed to be commit names that mentioned adding a REST API, but I haven't been able to find whether those are already implemented and released.
Are they? If so, how do I call them?
I've tried just capturing a request to the "Add" method when I click it in the browser, but it looks like there is some CSRF protection, so I can't just copy-paste some bearer token and re-issue requests. I'm asking here before I spend time reverse-engineering something just because I missed an already existing API. :)
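
For readers hitting the same CSRF wall, here is a rough sketch of what a scripted form POST to a Django site has to carry. It relies only on Django's default CSRF conventions (a `csrftoken` cookie plus a matching `csrfmiddlewaretoken` form field). Everything ArchiveBox-specific is an assumption: the `/add/` path is taken from the "/add page" mentioned elsewhere in this thread, while the `url` field name and the `localhost:8000` address are hypothetical. The helper `build_csrf_post` is not part of any library. The token itself would first have to be read from the `csrftoken` cookie set by a prior GET of the form page, and a login session cookie may also be required.

```python
import urllib.parse
import urllib.request
from typing import Mapping


def build_csrf_post(
    base_url: str, path: str, csrf_token: str, fields: Mapping[str, str]
) -> urllib.request.Request:
    """Build (but do not send) a form POST that satisfies Django's CSRF check."""
    body = dict(fields)
    # Django's default CSRF middleware expects this form field to match
    # the token held in the csrftoken cookie.
    body["csrfmiddlewaretoken"] = csrf_token
    data = urllib.parse.urlencode(body).encode("utf-8")

    url = urllib.parse.urljoin(base_url, path)
    request = urllib.request.Request(url, data=data, method="POST")
    request.add_header("Cookie", f"csrftoken={csrf_token}")
    # Django also checks the Referer header on HTTPS requests.
    request.add_header("Referer", url)
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


if __name__ == "__main__":
    # Hypothetical values: the token must come from a real GET of the form page.
    req = build_csrf_post(
        "http://localhost:8000", "/add/", "token-from-cookie",
        {"url": "https://example.com"},  # field name is an assumption
    )
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

This only works around the form protection; a token-authenticated REST endpoint avoids the dance entirely.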