8000 Add SQLite backend by hannahwhy · Pull Request #12 · rustodon/rustodon · GitHub
[go: up one dir, main page]

Skip to content

Conversation

hannahwhy
Copy link
Contributor
@hannahwhy hannahwhy commented Jan 13, 2018

I'm interested in using SQLite as a database backend for rustodon. Truly database-agnostic code is difficult to do, and it might end up being impractical, but:

  • I like to think that the set of necessary queries won't push us into territory that can only be satisfied by Postgres features
  • I'm kinda banking on WWPD being more than SQLite marketing

Tasks:

  • Get rustodon booting with SQLite
  • Switch connection to WAL mode
    • make sure concurrent writers work concurrent write isn't allowed, but WAL should at least result in fewer exclusive locks
  • Generalize connection pool and associated functions to work with either SQLite or Postgres
  • Come up with some way to select SQLite or Postgres (feature flag?)
  • Add .travis.yml to run CI in SQLite and Postgres configurations
  • Get migrations_sqlite in sync with migrations
  • Find a way to get i64 IDs working with INTEGER PRIMARY KEY NOT NULL (should be possible, because INTEGER columns in SQLite can hold signed 64-bit integers. this might end up as a Diesel patch if I can prove it's a Diesel concern)
  • Get updated_at trigger working on tables that use it
  • Make sure all of this works with an in-memory database

@hannahwhy hannahwhy force-pushed the feature/sqlite branch 4 times, most recently from afa34dc to 287aed7 Compare January 13, 2018 09:29
@iliana
Copy link
Contributor
iliana commented Jan 13, 2018

Another advantage of the SQLite backend is we can write smaller unit tests for the models that start with a fresh in-memory database.

We'll still want the same tests to run on the Postgres backend as part of CI, but this will help speed up development, hopefully...

Cargo.toml Outdated
[features]
default = ["postgres"]
postgres = []
sqlite = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be

postgres = ["diesel/postgres", "diesel_infer_schema/postgres"]
sqlite = ["diesel/sqlite", "diesel_infer_schema/sqlite"]

Then remove those features from diesel and diesel_infer_schema below, and diesel will be compiled with or without the database engines depending on how you want to compile rustodon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip! I've applied that in 4ac4311.

You probably don't want me accidentally committing IDE-specific stuff or
databases in here.
Compatibility notes:

1. TIMESTAMP WITH TIME ZONE is not supported by SQLite.  I opted to use
   DATETIME instead.  This is probably not a big deal if the application
   reads and writes times in UTC (and only converts to/from a local time
   zone in the UI).
2. In SQLite, BIGINT and INTEGER have the same storage class.  However:

   a. Diesel won't allow i64 fields to be used with a column having
      type "INTEGER"
   b. SQLite will interpret a column of type INTEGER PRIMARY KEY NOT
      NULL to be a rowid alias, but will not do so for BIGINT PRIMARY
      KEY NOT NULL

   The short version is that we're currently using BIGINT in tables, but
   we lose autoincrement.  This is not a problem if IDs are assigned by
   the application (e.g. with the Snowflake algorithm), but if we're
   expecting autoincrement (and it looks like the Postgres schema is)
   then we'll have to find some way around (a).
This should be made generic at some point.  Soon.
rustfmt was applied to master; doing it here (with the same
rustfmt.toml) will hopefully avoid the nastier merge conflicts.
We can use Cargo's feature dependencies feature (ahem) to reduce
duplication in the diesel and diesel_infer_schema dependency specs.
This switch is permanent until the journal mode is changed again; see

    https://sqlite.org/wal.html#persistence_of_wal_mode
    https://web.archive.org/web/20180114031518/https://sqlite.org/wal.html#persistence_of_wal_mode

Another application reading the database *could* change the journal mode
back to DELETE, which would have performance implications (at best);
however, I think we can probably get away with saying "well don't do
that".  A more sophisticated solution might switch to WAL mode when a
connection is first acquired.
@hannahwhy
Copy link
Contributor Author

re: i64 for autoincrementing IDs in SQLite, this has been raised before:

diesel-rs/diesel#1116
diesel-rs/diesel#852 (comment)

It looks like we will have to do manual override.

@barzamin
Copy link
Member

@yipdw i really like the idea you posted in discord:

we could also conditionally define type Serial = i32 or type Serial = i64 I guess, which be less churn than ripping out infer_schema!

@hannahwhy
Copy link
Contributor Author

It could be; I'm not yet sure if it'll work or what the code would end up looking like.

One thing that will happen is that we'd assume the burden of remembering that ID fields (both autoincremented PK and FK) must be Serial, not i32 or i64; I don't know if that's a good thing or not. (The build won't pass if they're mixed up, which is nice, but the resulting error doesn't immediately give away the cause.)

@barzamin
Copy link
Member

@yipdw i had another thought: what if we forego BIGSERIAL columns even in postgres, and just move everything to i32s? Like, do we really need 64 bits of post ID?

🤷‍♀️ I initially pushed for using BIGSERIAL, but if it's too much work, I guess we could move away from that.

@hannahwhy
Copy link
Contributor Author
hannahwhy commented Jan 15, 2018

@barzamin I think sticking with 64-bit IDs is fine in principle. Also, if e.g. you eventually want to switch away from autoincrementation to (say) a Snowflake-like ID generation algorithm, then infer_schema!'s interpretation of INTEGER PRIMARY KEY NOT NULL in SQLite is no longer relevant -- we'll re-declare the ID fields BIGINT and have i64 inferred for both databases.

I'll give the type Serial = ... thing a try and we'll see how it goes, I guess?

@hannahwhy
Copy link
Contributor Author
hannahwhy commented Jan 15, 2018

Just to be clear, the issue with SQLite is an unfortunate clash of circumstances:

  • Autoincrementing row IDs in SQLite can be done in a couple ways (see https://sqlite.org/autoinc.html). The recommended approach is marking the column exactly as INTEGER PRIMARY KEY, in which case the column becomes an alias for the signed 64-bit integer rowid.
  • In most databases, however, INTEGER does not denote a 64-bit quantity but rather a 32-bit quantity, and infer_schema! infers such columns as i32. The SQLite behavior is the exception.

@iliana
Copy link
Contributor
iliana commented Jan 16, 2018

I think we should probably do snowflake IDs from the get-go.

@hannahwhy
Copy link
Contributor Author

Yeah, if we're going to do Snowflake-ish IDs, it is much easier to do that starting out. I guess that'd be a separate issue if @barzamin is down with the idea.

@barzamin
Copy link
Member

I am very down with snowflakes. Opened #17 to track this.

@hannahwhy
Copy link
Contributor Author

I'll update this to reflect the changes made in #21 in a little bit. (Turns out git rename detection works in my favor here.)

#17 is currently blocking this issue. I can look at that if it's unassigned.

@barzamin
Copy link
Member

@yipdw yes, that would be much appreciated!

diesel::insert_into(accounts)
.values(&self)
.get_result(&**conn)
let inserted = diesel::insert_into(accounts).values(&self).execute(&**conn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I could do this all over again, I would have done it with a lot more ?


set -euo pipefail

exec scripts/run
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shim script is here mainly to allow cargo watch to continue to do the right thing while allowing us to infer the right build features and set them at runtime.

fi

if [ ! -f "database/src/schema.rs" ]; then
echo "Database schema does not exist, run scripts/setup before attempting to run the tests." >&2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning is correct and scripts/setup should likely continue to do the schema generation, but in retrospect build.rs could also just do the schema generation as well, and avoid having the user forced to take action.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated this further, and because db stuff is broken out into its own crate, its own build.rs would need to run first in order to get the crate to build. Unfortunately, SQLite paths would likely be qualified in the root crate, not the dependent crate, so any DATABASE_URL that exists that would allow the schema generation to proceed would need to be requalified within the context of the database subdirectory, at which point this would all start to feel like it's doing too much work / assuming too much on what's inside that env var.

Just leaving things as is for now.

@@ -1 +0,0 @@
infer_schema!("dotenv:DATABASE_URL");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removal ( and replacement via manual schema generation ) was due to SQLite throwing several errors using infer_schema in the tests; since Diesel seems to be deprecating infer_schema anyways, it seemed a positive direction.

@netshade
Copy link
Contributor

Tests passing again, there were some issues w/ things on master and the CI setup that were causing tests to fail.

@netshade netshade mentioned this pull request Jul 1, 2018
@hannahwhy
Copy link
Contributor Author

Find a way to get i64 IDs working with INTEGER PRIMARY KEY NOT NULL

This is no longer relevant -- we're using i64s generated via flaken.

@netshade
Copy link
Contributor
netshade commented Jul 8, 2018

There was an out of band discussion on this branch regarding handling SQLITE_BUSY correctly. The default behavior of the SQLite bindings ( and SQLite proper ) is to return SQLITE_BUSY when access to the database is not possible and the caller should try again. This is a "common case" state. However, rusqlite ( which is used by r2d2, which in turn manages our connections in Diesel ) manually sets a busy timeout of 5 seconds, which seemed like it was generous enough to simply allow the default behavior.

Primary folks in that discussion were @yipdw and @nightpool . Comment left here just to note that if that behavior changes in rusqlite, we'd need to make changes to adjust. ( Also for posterity for the plume folks who linked to this PR )

@hannahwhy
Copy link
Contributor Author

Whew, I need to look at this again -- sorry for falling off the world. It looks like all we need now is an updated_at trigger?

@@ -1,4 +1,5 @@
use db;
use db::datetime::Rfc339able;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should add another 3 to this module name. db::datetime::Rfc3339able strikes me as something slightly more recognisable and usable in a Datetime context than "MLTNET - A "MULTI-TELNET" SUBSYSTEM FOR TENEX"

@@ -1,3 +1,4 @@
[print_schema]
file = "database/src/schema.rs"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is the schema file, then how come we're deleting it, i'm a bit confused…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database/src/schema.rs is generated by diesel print-schema inside scripts/setup. (This branch adds some scripts to setup and run the server; if/when this is merged, those scripts will probably become part of the standard setup/run procedure.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc it's diesel best practice to check schema.rs into source control (so you don't have to print-schema when deploying the app For Realz). don't quote me on that though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although that gets done as part of migrations so extremely 🤷‍♀️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense in situations where you're pushing source code to a bunch of app servers (which is the classic Web application deployment model), but I don't think it matters much when you're building a binary and deploying that to a bunch of app servers or (more common in the SQLite case) building a binary and installing it on many machines, since you won't be recompiling on the destination machine in either case.

That said, if there arises some benefit to checking in schema.rs, it doesn't seem like it would be a difficult change to implement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we have it checked in right now and i think that's the only reason to keep it around atm. Pulling it out is totally fine 💜

pub mod validators;

#[cfg(all(feature = "sqlite", feature = "postgres"))]
compile_error!("sqlite and postgres features cannot be simultaneously selected");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

38BA

i wish there was an easier (and prettier) way to do this… after all diesel doesn't use feature flags and can have all of its databases enabled at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diesel (or at least the version of Diesel we're using) has both postgres and sqlite feature flags. The postgres and sqlite feature flags on the database crate toggle feature flags on diesel and diesel_infer_schema; we use them here as well.

I think there is a case to build in support for multiple servers for e.g. pre-built Rustodon binary distributions, but that seems to be some way off. In the meantime, switching using types seemed like the mechanism that required the fewest code changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i opened an issue for this in diesel diesel-rs/diesel#1853

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A: Backend Anything related to the backend S: Work In Progress ⚠️ Things that aren't done yet.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

0