Question: what's the current status of the REST API? #496

zblesk · 2020-10-01T08:38:00Z

I'd like to add new pages by sending an HTTP request to an endpoint. I saw it mentioned in issues such as #339 and items linked in that thread.

There seemed to be commit names that mentioned adding a REST API, but I haven't been able to find whether those are already implemented and released.

Are they? If so, how do I call them?

I've tried just capturing a request to the "Add" method when I click it in the browser, but it looks like there is some csrf protection, so I can't just copy-paste some bearer token and re-issue requests. I'm asking here before I spend time reverse-engineering something just because I missed an already existing API. :)

pirate · 2020-10-02T19:26:19Z

The current status of the API is "unstable" I'd say. Reverse engineering the UI is the way to go for now, but we have plans to stabilize it more in future versions and split out a proper API with django-rest-framework or something so that external tools don't have to shoehorn their needs into requests used by the UI.

✨ Edit as of v0.8.0 (2024-05): The new REST API is now available! ⬇️

mAAdhaTTah · 2020-10-19T19:04:55Z

@pirate I would be interested in working on this. I shot you an email a week or so ago cuz I think the underlying data model needs to be solidified and would love to help move this along. Let me know how I can help.

cdvv7788 · 2020-10-19T19:09:21Z

@mAAdhaTTah that is great! Currently, on master, we have the sqlite database working. We can now start working with django-rest-framework to enable a proper API (Like @pirate mentioned).
What are the issues that you are finding with the data model? Something that needs to be improved? We can start the discussion here, so we can all have the proper context, and find a way to get started soon.

mAAdhaTTah · 2020-10-19T19:31:51Z

@cdvv7788 Generally, I think the split/transformation between Link <-> Snapshot is a bit weird. Snapshot seems to be db-only (it's transformed into Links as it's fetched out of the db for most of the operations I was looking at). I also think the double duty of timestamp being "the time it was bookmarked" as well as "the path in the archive" is a bit of an issue. From my email:

> I believe you're currently looking to move from timestamp -> sha for the Snapshots and their relationship to the on-disk archive. If we want to eventually allow multiple snapshots per link (to avoid the hash hack), reifying the Link model into the database and making the Snapshot a single download of a Link seems like a good way to do it. Part of the benefit for me of moving away from timestamps is I want to track when an article was read so I can group them by read day, and manipulating the timestamp for this seems a bit fragile if it can break the relationship to the archive. Having added, updated, etc. properties for that purpose seems a lot clearer.

(For context, I'd like to use ArchiveBox as a reading list, which I would then pull into my website, hence needing a REST API to pull that from. That's the reference to the "benefit for me" line.)

cdvv7788 · 2020-10-19T19:48:09Z

@mAAdhaTTah We have discussed those topics before. I think that @pirate has some progress on the timestamp issue, and it will be changed once we come up with a good solution.
The Link <-> Snapshot stuff is a leftover of the recent migration. In the latest release (v4.x), Link was generated from the index.json, and Snapshot was updated on a best effort basis. After the refactor, this has changed, and we definitely want to get rid of this relationship, leaving everything directly in Snapshot if possible. Supporting multiple snapshots for the same url is not supported at this moment, but after we remove the dependency on the Link schema, it should not be hard to add if we decide to go that way.
The main blocker at this moment is that Snapshot requires django, so it cannot be used on its own. We need to find a way to circumvent that (@pirate do you know if this is possible?) or we need to get more creative initializing django. Some research on this specific topic would be of great help (this is one of our short-term objectives).

mAAdhaTTah · 2020-10-19T20:13:21Z

> Supporting multiple snapshots for the same url is not supported at this moment, but after we remove the dependency on the Link schema, it should not be hard to add if we decide to go that way.

So my thinking/proposal is to actually remove the Link schema, migrate what is currently considered a Snapshot to be a Link instead (mostly as a naming convention change), then add Snapshot that represents a single download of a website. Based on your explanation, I think we'd need to include a migration in v0.5 that migrates the index.json into the db, then once we're solely dependent on the db, performing the above migrations, splitting the existing Snapshot into 2 models: Snapshot & Link, with a one-to-many relationship (plus whatever UI updates are needed to account for this).

Does that make sense? Happy to elaborate and/or provide some code to explain.
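
To make the proposed split concrete, here is a minimal sketch in plain Python dataclasses rather than Django models. All field and method names (`archive_path`, `downloaded`, `add_snapshot`) are hypothetical and chosen only for illustration; `added` and `updated` follow the properties mentioned in the email excerpt above. It shows one Link owning many Snapshots, with the bookmark time kept separate from the archive path.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    """One download of a Link's URL (hypothetical fields)."""
    archive_path: str      # where this capture lives on disk
    downloaded: datetime   # when this capture was taken


@dataclass
class Link:
    """A bookmarked URL that owns zero or more Snapshots (one-to-many)."""
    url: str
    added: datetime        # when the URL was bookmarked; never rewritten
    updated: datetime      # last time anything about the Link changed
    snapshots: list[Snapshot] = field(default_factory=list)

    def add_snapshot(self, archive_path: str, when: datetime) -> Snapshot:
        """Record a new capture without touching the `added` timestamp."""
        snap = Snapshot(archive_path=archive_path, downloaded=when)
        self.snapshots.append(snap)
        self.updated = when
        return snap
```

Because the archive location lives on each Snapshot, changing `added` or `updated` on the Link cannot break the relationship to the files on disk.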

> The main blocker at this moment is that Snapshot requires django, so it cannot be used on its own.

Not sure I understand this. Could you provide some background here?

cdvv7788 · 2020-10-19T20:25:32Z

> So my thinking/proposal is to actually remove the Link schema, migrate what is currently considered a Snapshot to be a Link instead (mostly as a naming convention change), then add Snapshot that represents a single download of a website. Based on your explanation, I think we'd need to include a migration in v0.5 that migrates the index.json into the db, then once we're solely dependent on the db, performing the above migrations, splitting the existing Snapshot into 2 models: Snapshot & Link, with a one-to-many relationship (plus whatever UI updates are needed to account for this).

At this moment we only have the means to represent a single download per website. I understand what you propose, and that does make sense. At this point we already migrated the index.json into the sqlite database. In fact, if you check #502, we are already removing the automatic generation of those indexes completely. This, however, cannot be done without first solving the other issue, which takes me to:

> The main blocker at this moment is that Snapshot requires django, so it cannot be used on its own.

Snapshot is a django model. We cannot use that model in a place where django has not been initialized yet. If you try to do that, it will complain because the module will try to use some django internal stuff. This is the only reason we have not gotten rid of Link as we know it. I am going to spend some time figuring out alternatives to make Snapshot usable in the whole application. You are welcome to help us pursue this. As I mentioned earlier, this is a blocker, and the other stuff cannot be worked on until it is resolved (the REST API could actually be implemented, but once we fix this, we would need to refactor it in a big way... I think it is better to solve this layer first).

mAAdhaTTah · 2020-10-19T20:32:28Z

> We cannot use that model in a place where django has not been initialized yet.

All of this makes sense so far. I can do some investigating and see what I can come up with. Just to clarify, when you say "use that model", is that "interacting with it" or is importing it enough to make it fail?

cdvv7788 · 2020-10-19T20:44:04Z

Importing it is enough to make it fail. There is a function named `setup_django` that initializes what is required.

pirate · 2020-10-21T21:22:47Z

I don't believe we need Link or Snapshot anywhere that Django is not initialized, so that is a non-issue. If you're worried about oneshot I have an idea to fix that (we can discuss more in Zulip).

mAAdhaTTah · 2020-10-23T13:26:40Z

@pirate Does that change if the idea is to turn Link & Snapshot into db models?

This pulls in DRF to configure our API. Pretty straightforward binding of a view to a serializer & a model and making the data available. For this first pass, we're using the model even though it's currently unstable.

From a feature standpoint, we get a lot for free from DRF with very little code, including pagination. The `list_links` method loads all of the snapshots, which would require pagination to be implemented manually on the entire list of snapshots, which won't work well on large databases. Because archivebox is a CLI first and a web application second, the way Exceptions are thrown and errors logged doesn't always make those methods conducive to integrating w/ an API.

On the testing side, this shows up in how we're configuring things. The `setup_django` function doesn't fully work when passing `out_path`; Some variables in the Django settings aren't updated or configured correctly. Instead, we use `subprocess` the same way the other tests do to start up the server and hit it with `requests`.

# Summary

This is obviously a work in progress but wanted to get some feedback on the direction. It would be helpful if the API functions exposed by archivebox were more decoupled from the CLI context specifically, but I think we're going to want to bind the Models directly (at least for querying).

# Related issues

ArchiveBox#496

# Changes these areas

- [ ] Bugfixes
- [X] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk
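
The pagination concern in the description above can be sketched as follows. This assumes a hypothetical `snapshot` table with `id` and `url` columns (not ArchiveBox's real schema) and uses plain `sqlite3` instead of DRF. Letting the database apply `LIMIT`/`OFFSET` means only one page of rows is ever loaded, rather than slicing a list of every snapshot in memory.

```python
import sqlite3


def fetch_page(conn, page, page_size=50):
    """Return one page of (id, url) rows; `page` is 1-based."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    cur = conn.execute(
        "SELECT id, url FROM snapshot ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()


# Demo data in an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO snapshot (url) VALUES (?)",
    [(f"https://example.com/{i}",) for i in range(1, 8)],
)

first_page = fetch_page(conn, page=1, page_size=3)
last_page = fetch_page(conn, page=3, page_size=3)
```

On very large tables `OFFSET` still walks past the skipped rows, so keyset pagination (`WHERE id > ?`) may scale better; the sketch keeps to limit/offset because it is the simplest form.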

zblesk · 2021-09-14T20:06:34Z

Hello!
I see there's been some progress here.
What's the current status? Is the api available yet?

One of the linked tasks seems to mention it's available in 'dev' - is that an available docker tag?
Is it safe to use?
To be more specific: I understand the API is still in alpha, and I can accept that. However, I don't understand what else can be unstable in dev - I don't want to risk my instance and my data.

Thank you!

mAAdhaTTah · 2021-09-14T21:18:38Z

I have not made any additional progress since opening my PR here: #529. I don't think we will be continuing down that path, as we were also considering using Django Ninja instead of DRF. Eventually, I'd like to pick this back up again but haven't had the time.

pirate · 2022-04-12T22:51:37Z

Copying over my earlier message here from the API discussion related to the ArchiveBox browser extension #577:

I think a minimal API can be worked on before the Huey refactor, as the user-facing API is going to be relatively stable even with the change to the internals. These endpoints are already partially available through the Django Admin:

/add GET, POST (CSRF exempt, usable as an API from external origins and is used by the browser extension)
/api/core/snapshot/ GET, POST, PUT
/api/core/snapshot/<id> GET, PATCH, DELETE
/api/core/archiveresult/ GET, POST
/api/core/archiveresult/<id> GET, PATCH, DELETE
/api/core/tag/ GET, POST, PUT
/api/core/tag/<id> GET, PATCH, DELETE

and this bonus escape hatch endpoint is planned to be added to do everything else not possible with the above ^:

/api/cli/<command> POST (simulate running any archivebox CLI command with a given dict of args and kwargs to populate the CLI flags and args)
e.g. /api/cli/add POST {urls: 'https://example.com', depth: 1, extractors: ['wget', 'media', 'screenshot'], ...}
or /api/cli/schedule POST {urls: 'https://example.com', depth: 1, every: 'day', ...}
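
As a rough sketch of how a client might call the planned escape-hatch endpoint, the snippet below builds (but does not send) the `/api/cli/add` request from the example above using only the standard library. The base URL `http://127.0.0.1:8000`, the JSON content type, and the helper name `build_cli_request` are assumptions; the endpoint is only planned at this point, so the final request shape may differ.

```python
import json
import urllib.request


def build_cli_request(base_url, command, payload):
    """Build (but do not send) a JSON POST to /api/cli/<command>."""
    url = f"{base_url.rstrip('/')}/api/cli/{command}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_cli_request(
    "http://127.0.0.1:8000",  # assumed local server address
    "add",
    {
        "urls": "https://example.com",
        "depth": 1,
        "extractors": ["wget", "media", "screenshot"],
    },
)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```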

I'm leaning towards using FastAPI for the API instead of DRF. I like the pydantic type-based API definitions better than DRF's serializers but I could be convinced either way.

zblesk · 2022-04-23T11:57:34Z

Thanks for the update. Looking forward to this.

Though I'm not sure I read those correctly. For instance, what is the difference between a GET and a POST to /add?
Will it support adding many links at once, as well?

And which endpoint should be used for 'return the archive URL for this input URL, if it exists'?

djkemmet · 2022-09-12T00:26:39Z

@pirate hey there are you still working on this / need help? I'm thinking this is possibly something I could put together with FastAPI and the CLI hopefully next weekend. let me know! cheers

> Copying over my earlier message here from the API discussion related to the ArchiveBox browser extension #577:
>
> I think a minimal API can be worked on before the Huey refactor, as the user-facing API is going to be relatively stable even with the change to the internals. These endpoints are already partially available through the Django Admin:
>
> /add GET, POST (CSRF exempt, usable as an API from external origins and is used by the browser extension)
>
> /api/core/snapshot/ GET, POST, PUT
>
> /api/core/snapshot/<id> GET, PATCH, DELETE
>
> /api/core/archiveresult/ GET, POST
>
> /api/core/archiveresult/<id> GET, PATCH, DELETE
>
> /api/core/tag/ GET, POST, PUT
>
> /api/core/tag/<id> GET, PATCH, DELETE
>
> and this bonus escape hatch endpoint is planned to be added to do everything else not possible with the above ^:
>
> /api/cli/<command> POST (simulate running any archivebox CLI command with a given dict of args and kwargs to populate the CLI flags and args)
> e.g. /api/cli/add POST {urls: 'https://example.com', depth: 1, extractors: ['wget', 'media', 'screenshot'], ...}
> or /api/cli/schedule POST {urls: 'https://example.com', depth: 1, every: 'day', ...}
>
> I'm leaning towards using FastAPI for the API instead of DRF. I like the pydantic type-based API definitions better than DRF's serializers but I could be convinced either way.

https://fastapi.tiangolo.com/features/

https://www.stavros.io/posts/fastapi-with-django/

https://fastapi.tiangolo.com/advanced/wsgi/

pirate · 2022-09-15T01:59:15Z

Definitely open to contribution on the API front! I'm more focused on internals refactoring at the moment but as mentioned in that quoted comment I believe my changes can be kept insulated from anything external facing.

If you want to share gists or a fork with your work, I can leave feedback on your mock-up as you go to save time on PR review later.

joedavison · 2022-09-17T01:10:11Z

I would use an API like this.

djkemmet · 2022-09-20T22:44:58Z

hi, if anyone is following this issue and could give me some guidance please see this issue: #1030

pirate · 2023-02-19T22:20:48Z

It's still on the list but slow going, I haven't had a lot of big blocks of coding time to work on ArchiveBox over the last year, so I've mostly been devoting my time to support and docs.

On the plus side, I have interest from a big multinational org in using ArchiveBox, and may be able to turn that into a consulting contract to fund some work towards the API. They are a slow-moving org so it may take 6~12 months, but it's exciting news nonetheless.

cogscides · 2023-06-05T10:31:49Z

Hope this will be implemented. In my case, I want to scrape and store websites on my local network, then process them with AI and put the results in my personal knowledge management system. The AI and PKM stuff is on my side, I just need to have an API 🙏

aitorllj93 · 2023-11-17T12:07:46Z

hello! what's the current state of this? It's kinda confusing since it says it's in alpha, but reading the comments I don't know if it's possible to use it on Docker. I'm interested in building an alternative front end for this application and the REST API would help me a lot

pirate · 2023-11-18T11:15:17Z

Alpha = There are a few POST/GET etc. endpoints exposed by the admin UI and the /add page that allow quick things to be hacked together, but it's not a proper REST API by any means. I'm working on a django-huey-monitor refactor to add an event-driven queue system in the backend, and the new REST API I'm planning will insert messages into this queue to manage extractor jobs and snapshots.
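
To illustrate the general shape of that design (not the actual implementation), here is a minimal sketch using the standard library's `queue` and `threading` in place of Huey. The message format, `enqueue_snapshot`, and `worker` are all hypothetical. The point is only that the API handler puts a message on the queue and returns immediately, while a background worker does the slow archiving work.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def enqueue_snapshot(url):
    """What an API handler might do: queue the job and return at once."""
    jobs.put({"action": "snapshot", "url": url})
    return {"status": "queued", "url": url}


def worker():
    """Background consumer standing in for the extractor pipeline."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        results.append(f"archived {job['url']}")
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = enqueue_snapshot("https://example.com")
jobs.put(None)   # ask the worker to shut down
jobs.join()      # wait until every queued item has been processed
thread.join()
```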

Can I ask why you're going in the direction of an alternative frontend vs contributing changes to AB directly? I'd definitely be open to PRs improving our existing frontend!

See the discussion here too: #1126

aitorllj93 · 2023-11-19T02:33:44Z

> Alpha = There are a few POST/GET etc. endpoints exposed by the admin UI and the /add page that allow quick things to be hacked together, but it's not a proper REST API by any means. I'm working on a django-huey-monitor refactor to add an event-driven queue system in the backend, and the new REST API I'm planning will insert messages into this queue to manage extractor jobs and snapshots.
>
> Can I ask why you're going in the direction of an alternative frontend vs contributing changes to AB directly? I'd definitely be open to PRs improving our existing frontend!
>
> See the discussion here too: #1126

@pirate my main issue with contributing to the existing frontend is that the current version is far from what I think would be useful for me, so my changes would probably be too disruptive to include in a PR without prior discussion. If you still think this project could benefit from a total rework of the frontend (which I do), I can think about making some proposals so we can reach an agreement

pirate · 2023-11-20T00:37:10Z

I'm down to add a new frontend to the existing app as long as we keep the Django admin one available as well in parallel. I was considering using htmx to do this myself (it plays well with Django templates) but haven't gotten around to it.

One of the core principles is that we should rely on JS as little as possible because I want ArchiveBox views to be extremely durable long term and viewable across many different types of devices.

I'm ok with some of the UI requiring JS but ideally the most critical parts should fall back to working with old school plain html.

If that design direction sounds compatible with your ideas then I'm down to work together to add your UI changes to AB directly, otherwise maybe an independent app/mod may be better.

aitorllj93 · 2023-11-20T09:14:02Z

@pirate sure, that sounds nice. I don't want to include a JavaScript framework either. Regarding htmx, we can give it a try if we need it; I already did some work with it on a side project and it's great. About the CSS, I saw the current implementation uses Bootstrap. I wonder if we can move to Tailwind, which I think is a better fit for an open source project these days: that way we don't need to implement custom classes and it's easier for external contributors

pirate · 2023-11-21T01:04:33Z

Nice! I also prefer tailwind to bootstrap, happy to move to that.

If you want to open a new issue for your UI ideas as they come up I think we should move frontend discussion away from the REST API thread so we don't spam everyone.

zblesk · 2023-11-21T12:53:19Z

If you do create a new thread for that, can you please @ me? Thanks.

pirate · 2024-04-26T22:03:31Z

Hey everyone, check out the new REST API on dev! Big thanks to @Brandl for the first PR that kickstarted it!

For users who want to try it out, get v0.8.0-rc (unstable) or later, start `archivebox server`, then visit http://127.0.0.1:8000/api and /api/v1/docs to get started with the interactive Swagger API docs/test page ➡️
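
For anyone scripting against a local instance, a small sketch like the one below could check that the docs page is up before doing anything else. It uses only the standard library; the helper names `docs_url` and `is_reachable` are made up for illustration, and whether the page needs a login on a given instance is not something this checks.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def docs_url(base_url):
    """Build the Swagger docs URL (/api/v1/docs) for a server base URL."""
    return urljoin(base_url.rstrip("/") + "/", "api/v1/docs")


def is_reachable(url, timeout=5.0):
    """Return True if a GET to `url` answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


url = docs_url("http://127.0.0.1:8000")
# With `archivebox server` running locally: is_reachable(url)
```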

It also supports sending webhooks to external servers whenever archiving events happen.

zblesk · 2024-05-05T18:25:04Z

Currently can't make a backup of my archive, so I can't switch to dev; but I'm really looking forward to trying this. Thanks.

rcarmo · 2024-05-11T12:25:55Z

I can't wait for this to make it to stable.

pirate added the question label Oct 2, 2020

cdvv7788 mentioned this issue Oct 22, 2020

setup_django changes #510

Closed

6 tasks

cdvv7788 mentioned this issue Oct 26, 2020

Poc setup django on init #515

Merged

6 tasks

mAAdhaTTah mentioned this issue Oct 30, 2020

Split Snapshot into Link & Snapshot + migrate #520

Closed

6 tasks

mAAdhaTTah mentioned this issue Nov 8, 2020

Draft of new REST API using DRF #529

Closed

6 tasks

pirate mentioned this issue May 12, 2021

Question: Using the CLI to reach the API of a hosted instance #743

Closed

pirate mentioned this issue May 31, 2021

How can I use archive box as a service ? #756

Closed

pirate mentioned this issue Jul 17, 2021

Create an archivebox server page with a UI & REST API endpoint to add links to the archive #221

Closed

5 tasks

dgtlmoon mentioned this issue Dec 29, 2021

[feature] Execute a command on change detected dgtlmoon/changedetection.io#329

Open

Kovah mentioned this issue Sep 7, 2022

Local Archive Option Kovah/LinkAce#496

Open

pirate mentioned this issue Mar 28, 2023

Discussion: Sponsoring the REST API development + applying for grants to fund ArchiveBox development #1126

Closed

pirate mentioned this issue Apr 19, 2023

Feature Request: Allow locally run ArchiveBox CLI commands to control a separate remote ArchiveBox backend #786

Open

9 tasks

waybackarchiver mentioned this issue May 4, 2023

Bridging ArchiveBox wabarc/wayback#388

Open

pirate mentioned this issue Jun 13, 2023

Question: Why am I getting permission errors when building the dev container AND proposed implementation for the API. #1030

Closed

aitorllj93 mentioned this issue Nov 21, 2023

Feature Request: UI Rework #1273

Open

Brandl mentioned this issue Apr 9, 2024

REST API v1 using django-ninja #1397

Merged

7 tasks

pirate added this to the v0.8.0 milestone May 6, 2024

This was linked to pull requests May 6, 2024

REST API v1 using django-ninja #1397

Merged

Add Webhooks support to new REST API #1418

Merged

pirate removed the type: enhancement label Oct 23, 2024
