10000 proxy: deduplicate protocol processing code by dormando · Pull Request #1206 · memcached/memcached · GitHub
[go: up one dir, main page]

Skip to content

proxy: deduplicate protocol processing code #1206

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 37 commits into
base: next
Choose a base branch
from

Conversation

dormando
Copy link
Member
@dormando dormando commented Feb 19, 2025

This commit is step one of... a few... for standardizing the code paths for protocol handling. We started with three:

  • mcmc response parsing
  • proxy_request.c request parsing
  • proto_text.c (original tokenizer)

Both requests and responses can now be tokenized by the same code from mcmc. This commit series does this in stages, which will result in removal of duplicated code and fixing many long standing issues.

In this particular commit the duplicated tokenizer is removed from proxy_request.c, and mcmc functions are now used in various places.

All of the mcmc related functions have actual C unit tests behind it which should help them be more trustworthy.

GOALS:

  • Get as close as possible to one code path for parsing request and response lines and protocol tokens (flags + tokens)
  • Thorough C level unit testing of all the protocol handling code (now centralized into mcmc)
  • Remove redundant protocol handling code from proxy_internal.c
  • Remove double parsing of protocol in certain situations
  • Do not modify request/response strings when parsed (speed, correctness, mobility)
  • Replace "C string strto*'s" for calls which are not C strings. (new length-based numeric converters from mcmc)

TODO:

This work is being done in stages because there's too much code to hold in my brain at once. Not necessarily in order, but:

  • use more convenience functions in proxy_internal.c instead of directly accessing tokens all the time
  • remove duplicated parsing code entirely from proxy_internal.c
  • (big task) evaluate requirements for converting proxy_text.c to new tokenizer
  • evaluate strictness in final text proto code (number of tokens in a command)
  • check t->cur_sfd usage when combined with mcp.internal
  • restore missing logging and verbose prints
  • restore stats_detail or remove it (lets slate for removal?) [wasn't actually removed]
  • remove double-parsing request from proxy fallback code (maybe optional? this fallback code isn't really used)
  • check if noreply is being dropped in any of the new command handlers
  • check that noreply is being properly handled in the proxy code
  • test multikey cancellation with extstore requests (this was tested and functional but hard to trace)
  • fix compilation with proxy disabled
  • fix compilation when extstore not compiled in (storage callbacks?)
  • check on/fix binary key handling within proxy (it's partially fixed already)

TESTING:

  • A/B in-memory perf test to ensure no regression.
  • examine coverage of C level unit tests
  • audit gcov reports of the new combined parser code file and text proto file

attempting to delay further cleanups:

  • moving request struct and parser code into mcmc to centralize all protocol code
  • fix double parsing flags during meta_flag_preparse. we have the bitflags now.
  • lots more strto usecases which are currently safe
  • add some missing mcmc interfaces around token length and looping flags

Got another month or two of work on this including extra testing that should be done.

@dormando dormando added WIP proxy worklogs and issues related to proxy labels Feb 19, 2025
@dormando
Copy link
Member Author

I added a lot of generic parsing code to mcmc while writing the inspector and mutator last year, which I've been able to use mostly without modification for this first big change.

If I can get mcp_parser_t moved from proxy.h to mcmc that would be a huge win: I could move the entire request parser into mcmc and thoroughly test it, then reuse that code for proto_text.

@dormando
Copy link
Member Author

still want to revise the mcmc_ API's... but the plan is to probably keep using them by converting code and then update and re-test everything.

@dormando
Copy link
Member Author

Trying to get the request/response struct's merged and fix some awkwardness with how mcmc's types/codes are handled. Taking more time than I hoped.

@dormando
Copy link
Member Author
dormando commented Feb 21, 2025

trying to de-scope "single struct for req/res" from this PR. going to stick with:

  • audit/simplify random bools
  • see about de-scoping lifting mcp_parser_s into mcmc (might just temporarily add a parser code unit)
  • big protocol conversion stage (put storage stuff into the parser code unit?)

then in separate PR's do more mcmc reworking...

edit: maybe... or maybe just a longer series of commits in this change :/ trying to land with high test coverage but don't want the change to be too massive.

@dormando
Copy link
Member Author
< 8000 strong> dormando commented Feb 21, 2025

cleaning up the ascii_multiget hack was time consuming but helps a shitton in reducing code fragility...

@dormando
Copy link
Member Author
dormando commented Feb 22, 2025

aight got a few hacks cleaned up and did planning work for the big merge:

  • pull all the protocol handling out into its own code file (including the request parser code, for now)
  • convert proto_text.c to use this new file (ow)
  • commit all of that once working again...

... then will re-evaluate if it's a good time to continue doing mcmc work or not. the mcmc response parser is still using strtol's too, which sucks.

I kiiiinda feel like I could get something big out of finishing the mcmc work... since I could write some wrapper code and get an "instant mc-speaking server" out of it. which would allow simpler experiments with funny backends by just having them be proxy backends.

@dormando dormando force-pushed the parser_merge branch 2 times, most recently from 6c46dea to a726074 Compare March 4, 2025 17:35
@dormando
Copy link
Member Author

got things wired up so I could convert one command at a time. still need another round of parser work after this stage though. All of the proxy_internal.c code is converted and done now, so that file is nice and small!

down to the last meta command for proto_proxy.c: meta set. which is complicated due to the network read stage splitting the processing. I'm going to try to get it on the common code and un-splitting the command handler so it's more straightforward.

@dormando
Copy link
Member Author

figuring out mset without duplicating code was a pain but I think it's close now.

all of the base meta commands are ported.

next is to port the base text protocol... get commands are going to be the annoying one since I'll need to write a wrapper.

@dormando
Copy link
Member Author

migrating get is moderately painful because of the multikey syntax. attempting to lift the recent hack cleanup code from proxy's version of this.

@dormando
Copy link
Member Author

all the storage commands are done. need to hype myself up for converting the maintenance commands to use the new tokenizer and an array/hash table instead of an if/else tree.

then... delete the old tokenizer code and re-evaluate for what's next.

@dormando dormando force-pushed the parser_merge branch 2 times, most recently from 31c6c53 to d988902 Compare March 20, 2025 22:34
@dormando
Copy link
Member Author

All the commands are converted.

try_read_command_asciiauth needs to be converted next.. pretty sure it's the final user of tokenize_command

after that need to start on a few cleanup passes for some stuff that got dropped during the conversion (verbose output/etc)

@dormando
Copy link
Member Author

been having trouble getting time to sit down on this lately.

just finished the asciiauth and the old tokenizer is now completely gone. input command buffers are now const! (or can be made const everywhere...)

the next big step is a lot of small cleanups. then extended testing can begin.

@dormando dormando force-pushed the parser_merge branch 2 times, most recently from bc8e196 to 94729a1 Compare April 2, 2025 00:42
@dormando
Copy link
Member Author

sigh. wtf was I doing....

@dormando
Copy link
Member Author

fixing the command length issue took way too long to unwind... looks great now though.

@dormando
Copy link
Member Author

updated the TODO list.

@dormando
Copy link
Member Author

maybe one more day to complete this? these are short days right now though :(

@dormando
Copy link
Member Author

oh right.. need to think about testing again..

dormando added 29 commits May 23, 2025 09:36
need to check for an opportunity to reduce the storage_get callback
sizes later.
execute via the new parser, else fall back to orig tokenizer.

so I can migrate these one by one...
hopefully the rest go down easier...
forgot to commit after doing md. doh.

mg feels slightly sketchy but looks right.

also gives what might be mild binary key support for proxy when used
with mcp.internal.
still some temporary code to clean up later, but this is the last meta
command to move.
got stuck debugging a while. odd failure in udp.t

removes an old workaround for an ancient form of the delete command.
easier since we get to reuse the pattern from mset.
abstracted multi-key parsing out of the actual get command code. now all
the main parsing is finally shared.
shutdown, me, quit, version, stats, watch

didn't want to do all that at once but meh.
needs another pass for min/max tokens and some conversion consistency.
... so we can use them.
was adding or removing 1 to the length randomly. fixed this and added
some assertions to build confidence... which found a legitimate bug in
mcp.request()
need to audit t->cur_sfd with mcp.internal.
this is the only place I could find it...
didn't think this would work but it does (since other parts are compiled
out, it's safe)
would be nice to retire these soon...
was double-checking for noreply since we now grab it during the initial
parsing. defaults to allowing noreply and maybe some functions should
force-disable it?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proxy worklogs and issues related to proxy WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant
0