Introduce solution for services with a high resource count · Issue #3411 · serverless/serverless · GitHub

Closed

pmuens opened this issue Mar 28, 2017 · 31 comments

Comments

@pmuens
Contributor
pmuens commented Mar 28, 2017

This is a Feature Proposal

Description

These are different implementation approaches we can take to resolve issues with a high resource count (such as #2387).

Refs #2995


1. Auto-detect Swagger / OpenAPI files

The user simply drops their own Swagger / OpenAPI file into the root of the service.
Serverless picks up this file automatically and uses it for the CloudFormation AWS::ApiGateway::RestApi resource.

Note: Serverless will fall back to the old CloudFormation approach if no swagger.yml / openapi.yml file is given.

This way Serverless only generates the resources necessary to use OpenAPI for the API definition.
The high resource count caused by generating APIs out of individual CloudFormation resources will be gone.
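
For illustration, a minimal sketch of such a drop-in file using API Gateway's x-amazon-apigateway-integration extension (path, region, account id and function name are placeholders, not part of this proposal):

# openapi.yml (sketch; all names and ARNs are placeholders)

swagger: "2.0"
info:
  title: my-service
  version: "1.0"
paths:
  /users:
    get:
      responses:
        "200":
          description: OK
      # Lambda proxy integration pointing at the deployed function
      x-amazon-apigateway-integration:
        type: aws_proxy
        httpMethod: POST
        uri: arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:123456789012:function:my-service-dev-getUsers/invocations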

Upsides

  • Reduces the resource problem for endpoint-heavy services
  • Uses the more feature-rich OpenAPI standard
  • More freedom for users, since they have full control over the openapi file

Downsides

  • Only applies to APIs and won't help if the serverless.yml file is bloated because of other, non-API-related resources
  • The user has to write the openapi.yml file on their own

2. Compile http events to Swagger / OpenAPI definitions

Introduce the opt-in switch useOpenApi: true in the serverless.yml file to enable auto-compilation of http events to corresponding OpenAPI resources:

# serverless.yml

provider:
  useOpenApi: true

The opt-in is used to prevent breaking changes. We can deprecate this later on and always compile to OpenAPI definitions.

Serverless will auto-generate an openapi.yml file which is also pushed to the deployment S3 bucket. Additionally, we can easily implement a way for users to drop in their own openapi.yml file later on.
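
For reference, the input to that compilation step would be the existing http events, e.g.:

# serverless.yml

functions:
  getUsers:
    handler: handler.getUsers
    events:
      - http:
          path: users
          method: get

Instead of compiling this into individual AWS::ApiGateway::Resource and AWS::ApiGateway::Method resources, Serverless would emit the corresponding paths entries of the generated openapi.yml.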

Upsides

  • Will reduce the resource count for services with many endpoints
  • Uses the more feature-rich OpenAPI standard (we could e.g. introduce binary support)

Downsides

  • Harder to overwrite / extend this file since Serverless auto-generates it for the user
  • More complex to implement
  • Only applies to APIs and won't help if the serverless.yml file is bloated because of other, non-API-related resources

3. Use nested stacks

Introduce the opt-in switch useNestedStacks: true in the serverless.yml file to automatically let Serverless split overly large stacks into logical units of nested stacks.

# serverless.yml

provider:
  useNestedStacks: true

The opt-in is used to prevent breaking changes.

Serverless will automatically upload all the different parts of the nested stacks to S3.
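
As a rough sketch (logical IDs and S3 keys are illustrative only), the compiled root template would then mostly consist of AWS::CloudFormation::Stack resources pointing at the uploaded parts:

# compiled root template (sketch; logical IDs and S3 keys are placeholders)

Resources:
  ApiNestedStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      # template part uploaded to the deployment bucket
      TemplateURL: https://s3.amazonaws.com/my-deployment-bucket/my-service/dev/api-nested-stack.json
  FunctionsNestedStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-deployment-bucket/my-service/dev/functions-nested-stack.json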

Upsides

  • Can be applied to all kinds of resources
  • Can always be applied (services can be as big as needed)
  • Faster deployments (not 100% sure if this really holds true)

Downsides

  • The logic for where to split the stacks is hard to implement (it should be deterministic to reduce problems during re-deployments)
  • Having multiple stacks to deal with and keep track of can be unintuitive
  • Atomic deployments can be problematic when e.g. one stack gets into a weird state while another succeeds

4. Use cross stack references

WIP
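
While this option is still being worked out, here is a minimal sketch of the underlying mechanism (all names are placeholders): one stack exports a value via Outputs / Export, another stack consumes it with Fn::ImportValue.

# exporting stack (sketch; names are placeholders)

Resources:
  AssetsBucket:
    Type: AWS::S3::Bucket

Outputs:
  AssetsBucketName:
    Value:
      Ref: AssetsBucket
    Export:
      Name: my-service-dev-AssetsBucketName

# importing stack (sketch) - consumes the exported value via Fn::ImportValue

Resources:
  AssetsBucketNameParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: /my-service/dev/assets-bucket-name
      Type: String
      Value:
        Fn::ImportValue: my-service-dev-AssetsBucketName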

Upsides

  • TBD

Downsides

  • TBD

5. Use AWS::Include Transform

WIP

Use the AWS::Include transform to split up the template and reference other CloudFormation templates.
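
For orientation, a minimal sketch of how the transform is referenced in a template (bucket and key are placeholders); the included snippet file would contain the resources to be inserted at that location:

# parent template (sketch; bucket and key are placeholders)

Resources:
  Fn::Transform:
    Name: AWS::Include
    Parameters:
      Location: s3://my-deployment-bucket/my-service/dev/api-resources.yml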

Upsides

  • TBD

Downsides


Useful resources


/cc @brianneisler @eahefnawy @serverless/vip

@HyperBrain
Contributor

How would the integrity of the system be guaranteed in case the API specs are provided by an "external" OpenAPI definition file? Does the user have to make sure themselves that the API endpoint <-> Lambda function associations remain intact?

Wouldn't it be enough to create dependencies with DependsOn from the API method resources to the API path resources and proceed with the current CF-based approach? With the injected dependencies the complete APIG deployment could be more or less serialized.
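
One way to read that suggestion, as a minimal sketch (logical IDs are placeholders and this is not the current compiled output): adding explicit DependsOn entries so that creation of the API Gateway pieces is chained rather than fully parallel.

# compiled CF template excerpt (sketch; logical IDs are placeholders)

Resources:
  ApiGatewayRestApi:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: my-service-dev
  ApiGatewayResourceUsers:
    Type: AWS::ApiGateway::Resource
    Properties:
      RestApiId:
        Ref: ApiGatewayRestApi
      ParentId:
        Fn::GetAtt: [ApiGatewayRestApi, RootResourceId]
      PathPart: users
  ApiGatewayMethodUsersGet:
    Type: AWS::ApiGateway::Method
    # explicit dependency; chaining more of these would serialize the APIG deployment
    DependsOn: ApiGatewayResourceUsers
    Properties:
      RestApiId:
        Ref: ApiGatewayRestApi
      ResourceId:
        Ref: ApiGatewayResourceUsers
      HttpMethod: GET
      AuthorizationType: NONE
      Integration:
        Type: MOCK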

@dwolfand

In my opinion options 1 and 2 don't actually solve the problem but instead just kick the can down the road. As you mentioned in the downsides, we will still have the same problem if our project has many functions (this is our situation) which each have multiple resources.

I think focusing on how to implement the nested stack approach in a deterministic way is the ideal path forward. Maybe for the initial rollout we allow users to manually specify the stack grouping in the serverless.yml file, allowing time for feedback from the community on the best way to automatically group them in a future release. In my mind this will reduce the level of effort a bit and allow a fix to go out sooner, while also communicating that future changes will be coming.

@dschep
Contributor
dschep commented Mar 28, 2017

Agreed with @dwolfand regarding the use of Swagger. It would definitely help, but AFAICT there are at least 4 resources per HTTP endpoint Lambda (the Lambda function, the version, the CloudWatch log group and the API Gateway resources). That caps you at (ignoring other resources used like IAM roles etc.) ~50 Lambdas with HTTP endpoints. Switching to Swagger/OpenAPI specifications for API Gateway would only increase that cap to ~66, not a huge improvement.

@rowanu
Contributor
rowanu commented Mar 29, 2017

My vote is NOT nested stacks...

While they allow a large number of resources, they open up a can of worms.
I have yet to meet anyone who said "gee, I'm sure glad we used nested stacks!"

From discussions here and in the forum I know a lot of people want to be able to define their APIs in Swagger/OpenAPI format for other benefits, but if @dschep's numbers are correct then it may not be worth it (for this purpose).

I think cross-stack references are also a valid way forward, and I have been using them with success. I haven't tried to use them at scale yet, so I'm not sure if there are any gotchas.

I'm not sure using AWS::Include would address the resource limit, but if it did (i.e. resources created in includes don't count towards the total resource count) then that would be fantastic. It would also allow the built-in templates to be nice and modular (something which may be worth doing anyway).

@arabold
arabold commented Mar 29, 2017

IMO Swagger support should be a feature independent of any solution to the CloudFormation resource limitations. If someone wants to use Swagger over the built-in CloudFormation definition, I suggest writing a dedicated Swagger plugin. The upcoming 1.10 release will offer a lot more plugin hooks for packaging and deployment, which will make it easy to write such a plugin. @HyperBrain did something similar for Serverless 0.5 a few months back with the serverless-models-plugin (which still needs porting to 1.x, but that's another story). So I'd like to encourage anybody to write a serverless-swagger-plugin that allows devs to use a Swagger definition for their API needs.

But that doesn't solve a fundamental problem with CloudFormation - the limits: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html. The solution to this (according to Amazon) is to use nested stacks.

Limitation: 100 mappings

To specify more mappings, separate your template into multiple templates by using, for example, nested stacks.

Limitation: 200 resources

To specify more resources, separate your template into multiple templates by using, for example, nested stacks.

Limitation: 460,800 bytes template body size

To use a larger template body, separate your template into multiple templates by using, for example, nested stacks.

Hence I'm 100% with @dwolfand: a way of manually splitting the serverless.yml into multiple nested stacks should be a good workaround for this problem. For rather small setups this won't be needed and Serverless can work as is. For larger projects the engineers need to think about how to structure their templates in a meaningful way. Nested stacks are the only viable, permanent solution to the resource limitations.

@kennu
Contributor
kennu commented Mar 29, 2017

I would like the nested stack solution to also deal with the problem of permanent resources which cannot be deleted (e.g. a DynamoDB table or an S3 bucket that holds master data). This solution will eventually be needed when upgrading to Serverless 2.x, which probably involves deleting the old CF stack and creating a new one that needs to access the existing resources.

My own solution is to keep a separate stack for permanent resources (https://github.com/SC5/serverless-plugin-additional-stacks) so that it can stay untouched when removing and redeploying Serverless services.

If some kind of built-in nested or cross stack support is added to Serverless, I wish it could address this need, too. The need is to have at least one stack that won't be deleted by sls remove.
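
For illustration, a sketch of the kind of reference this enables today when the permanent resources live in their own stack and export values; the export name below is a placeholder, and the intrinsic is passed through to CloudFormation to resolve:

# serverless.yml (sketch; the export name is a placeholder)

provider:
  name: aws
  environment:
    USERS_TABLE:
      Fn::ImportValue: my-permanent-stack-UsersTableName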

@erikerikson
Contributor

Are alternatives to CloudFormation templates off the table?

CloudFormation gives us a lot but costs much as well.

@doapp-ryanp
Contributor
doapp-ryanp commented Mar 29, 2017

My vote is swagger over CloudFormation [nested] stacks as well for this scenario.

@arabold I started on the plugin you mention late last year: serverless-plugin-swag. I just have not had the time to work on it since then. If someone wants to pick it up or fork that would be really cool.

My hope is AWS supports Swagger 3.0 soon as there are some improvements that will allow Serverless to generate a much more concise swagger file.

Also of note, API Gateway has a soft limit of 300 resources per API.

@HyperBrain
Contributor
HyperBrain commented Mar 29, 2017

@erikerikson Theoretically there is the direct API access that was the base concept in SLS 0.5 - it has no limits, but, honestly, I wouldn't trade it for all the inconsistency issues that are now gone with the controlled CF approach.

@kennu's idea of being able to declare resources that live in a permanent stack, together with a nested stack solution, should solve the problem completely and keep everything consistent. A Swagger/OpenAPI approach solves a different problem, but not the CloudFormation limit problem!

Nested stacks could be configurable, e.g. Serverless could allow you to specify that all API stuff goes into stack1, everything about functions into stack2, etc. And nested stacks keep the consistency, as everything stays managed by CloudFormation.

Also, from an implementation/architecture point of view it is the least destructive approach, as the stack split can be done with the emitted (compiled) CF template as the only input, splitting it up into nested stacks just by distributing the resources and creating the correct references, without interfering with any SLS semantics.

@HyperBrain
Contributor

@doapp-ryanp I would not use soft limits for evaluating a solution, as they can easily be increased by sending a request to AWS (like other soft limits, e.g. concurrent Lambda invocations).
Only hard limits should be taken into account, because there is no way to overcome them.

@doapp-ryanp
Contributor

@HyperBrain yup, agree. That APIG soft limit has no influence on my opinion for this topic. Just something to be aware of. We submitted an APIG resource increase request last week and it was not "easy". Not only do they require justification, they urge you to use a custom sub-domain and reverse proxy traffic to multiple APIG instances.

@erikerikson
Contributor

@HyperBrain no kidding, right? That's a good example of the "gives us a lot" bit.

The original implementation enacted no mechanism for identifying the current state and rectifying current conditions. It's not an unsolvable problem, though not a simple one.

One of the other "gives us a lot" bits was the avoidance of maintenance due to AWS API changes or the addition of services (unless, of course, the framework community decided to give first-class treatment to a service). The cost is waiting for CFT support or writing custom, to-be-thrown-away plugins, which isn't as conducive to contributions.

It wouldn't be a light undertaking, and given the distraction level that VC funding has introduced into the team, I'd rate the probability of execution on such an option as low, but the question seemed worth consideration.

Personally, I think AWS should be providing some staffing or support to improve this, given that it addresses a major counterexample to their "customer obsession". SAM is a fine clone offering, except for the rest of it all.

Personally, I'm still pondering this on the "slow path".

@pmuens
Contributor Author
pmuens commented Mar 31, 2017

Just read through the whole thread.

Really great and valuable feedback! Thanks everyone! 👏 💯

In conclusion the two valid options are:

  1. Nested stacks
  2. Cross stack references

Swagger / OpenAPI support is still something we're considering implementing, but it won't solve our main problem of high resource counts (as @dwolfand correctly pointed out).

@rowanu you've got some more experience with cross stack references and don't like nested stacks that much. Could you go into a little more detail on why nested stacks are bad / provide an example of how we can solve this problem with nested stacks?


After some more investigation we'd propose to go with nested stacks for now.

Here's why ("Please don't burn it down with fire" 😄 ):

  • Nested stacks are the (first) suggested solution by AWS when you face problems with a high resource count (as pointed out by @arabold)
  • Nested stacks introduce a dependency on their parent stacks, so the whole deployment will be atomic and when one sub-stack fails the whole deployment will be rolled back
  • Cross stack references are a more viable solution when dealing with different "microservices". In our scenario we only want to split up one microservice because of the CloudFormation limitation. Each Serverless service should still implement only one service (e.g. user management) and not the whole application
  • Cross stack references make it hard to delete stacks that export values if those values are being imported by another stack, making it harder to manage the whole infrastructure as one large stack

The implementation proposal would look something like this:

Users can specify if they want to opt in to this feature. This ensures that it's non-breaking:

# serverless.yml

provider:
  useNestedStacks: true

or

# serverless.yml

provider:
  useStackSplitting: true

We hook into a lifecycle event before the actual deployment, read the compiled CloudFormation template from memory and split it up into separate stacks (as proposed by @HyperBrain). Each stack can hold a maximum of X resources.

The main stack does not contain anything but the nested stacks and gets the "normal" stack name Serverless also uses when not splitting the stack. Each nested stack gets a sequential number appended and is referenced in the "main" stack.
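
As a sketch of that structure (names and S3 keys are illustrative only), the "main" stack would then only reference the numbered parts:

# main stack template (sketch; names and S3 keys are placeholders)

Resources:
  NestedStack1:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-deployment-bucket/my-service/dev/nested-stack-1.json
  NestedStack2:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-deployment-bucket/my-service/dev/nested-stack-2.json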

The first naive implementation would keep all the functions in one stack and then split off a new stack after every X additional resources (no grouping involved yet). We can implement a better algorithm which will check for resource groups along the way (maybe we need to do this right from the beginning, because splitting it up in the naive way may be impossible due to dependencies. We'll see).

Anyway, that's just the current status quo of how we could implement this.

@rowanu
Contributor
rowanu commented Mar 31, 2017

Hey, if that's what you want to do, go for it.

Just to clarify my position:

Nested stacks introduce a dependency to their parent stacks so the whole deployment will be atomic and when one sub-stack fails the whole deployment will be rolled back

This is the kicker. Maybe it's less bad these days, because CFN seems to be able to recover from more situations than it did in the past. Having a whole stack fail into an unrecoverable state due to a child template is a thing that happens, and it sucks.

Nested stacks result in a high degree of coupling between the stacks, to the point where a failure in a separate stack results in the failure of the parent stack.
Cross-stack references encourage loose coupling, with clearly defined interfaces (i.e. only what you export/import). In programming terms, these are desirable things.

Since they are modular, stacks connected by cross-stack references can be updated completely independently. If there is no change to a stack, then no update needs to take place. This is not the case with nested stacks. From the docs:

If the template includes one or more nested stacks, AWS CloudFormation also initiates an update for every nested stack. This is necessary to determine whether the nested stacks have been modified. AWS CloudFormation updates only those resources in the nested stacks that have changes specified in corresponding templates.

This means every update will require an update operation of every nested stack, regardless of whether there is a change or not. The child template resources will not be changed if not required, but this will not be fast (update-related issues are being discussed in #3364 too).

Of the people in this thread, who have actually used nested stacks? I hear a lot of people in favour of the idea, but not a lot of personal experiences. As mentioned before, I still haven't met anyone who has used nested stacks and loved them. I have used cross-stack references, and they're really good.

Update: Added link to blog about nested stack updates.

@dougmoscrop
Contributor

I don't think it's either/or. I think persistent resources such as storage (S3, Dynamo, etc.) should be in their own 'data tier' stack, possibly created from outside of Serverless (larger organizations might have Terraform modules or other mechanisms by which they give you an approved system, e.g. one that has encryption-at-rest turned on). Sometimes these systems are even managed by other teams. These should be cross-stack references (or a plugin that understands Terraform outputs). Then there's infrastructure stuff - IAM roles, CloudWatch metrics, dead letter queues - that can go into a number of nested stacks above that. Then there's the app layer, which is the functions themselves.

It seems like there's some logic that needs to be sussed out to support deterministic bucketing of resources (query CF for a given stage, see if it exists, attempt to match resources to existing stacks, etc)

@HyperBrain
Contributor

@dougmoscrop The creation of separately maintained resource stacks outside the ownership of Serverless is imo a completely independent topic. Even now you can create them and reference the resources with Fn::ImportValue or pure ARN references. For the sake of consistency the automatically generated resources (like functions, log groups, APIs, etc.) should be within nested stacks, as otherwise the framework itself would have to take care of all possible conditions that could happen during deployment (e.g. rollback of a base stack if a dependent stack update would fail).
The proposal somewhere above, having an optional persistent stack that contains user-selected resources, seems like a proper solution to me.

@nicka
Contributor
nicka commented Apr 3, 2017

@rowanu still haven't met anyone who has used nested stacks and loved them. I have used cross-stack references, and they're really good.

Have used nested stacks in the past and NO I wasn't enjoying it. Fn::ImportValue I LOVE and haven't had any issues with it so far! Even before Fn::ImportValue existed we created custom Lambda backed resources to do the same thing.

Nested Stacks:
You have less-to-no control over the ordering of the deployment.

@pmuens The first naive implementation would keep all the functions in one stack and then split off a new stack after every X additional resources (no grouping involved yet). We can implement a better algorithm which will check for resource groups along the way (maybe we need to do this right from the beginning, because splitting it up in the naive way may be impossible due to dependencies. We'll see).

Resources switching from one stack to another during a deployment could lead to failures within a live environment (something that should be researched way before implementing anything).

I really feel some sls users are packing too much into one service without knowing it's a bad practice. For example, the framework could easily inform developers about the resource/function count before it becomes critical (200 resources). Examples of big projects could really help in this area.

Conclusion:
I don't have a perfect answer on what is best here or what would work best for all users. One thing I do know is that I will probably never use the nested stack feature and will keep on abstracting with cross stack referencing. The framework allowing for resource splitting sounds good, but as mentioned earlier I think users should understand that forcing a lot of functions into one service might not be the best thing.

@arabold
arabold commented Apr 3, 2017

I second @nicka's opinion here to encourage devs to think for themselves instead of having Serverless try to solve every possible scenario automatically. Large stacks with hundreds of resources are difficult to maintain, and not only because of resource limits. This should be avoided, and in a lot of cases stacks can be split to reduce their complexity, resources can be combined, etc.

We have an environment consisting of more than two dozen Serverless stacks with 150+ Lambda functions, various DynamoDB tables, SQS queues, etc. Some stacks are still 0.x while others are 1.x. Fn::ImportValue is great and we make daily use of it to interconnect these stacks and reference Lambda functions in a convenient and portable way.

That said, Serverless should give developers the choice and the tools to decide how stacks can be split. For some scenarios distinct stacks that are deployed independently are perfectly valid. For other scenarios you might want to go with nested ones to ensure a (semi-)atomic operation. But in no case do I need Serverless to make this decision for me. Just give me the tools to do it myself.

A possible and naive solution would be to allow the definition of sub-stacks with their own functions and resources sections. E.g. something like this:

service: my-service
provider:
  name: aws
  runtime: node6.10
  stage: dev
  region: us-east-1
functions:
  # functions that will end up in the main stack
resources:
  # resources that will end up in the main stack

nested:
  my-nested-stack:
    functions:
      # functions that will end up in the nested stack
    resources:
      # resources that will end up in the nested stack

  my-other-nested-stack: ${file(./serverless-nested.yml)}

@ryansb
Contributor
ryansb commented Apr 4, 2017

IMO the simplest first pass would be to put all the log groups in another stack, since as of v1.10 of the framework nothing depends on them directly. And it reduces the number of resources in the stack by [whatever the number of functions is].

@nicka
Contributor
nicka commented Apr 4, 2017

@ryansb Agreed!

I also like the suggestion provided by @arabold, where multiple stacks should be defined by the user.

@pmuens the useNestedStacks: true and useStackSplitting: true options feel too "obvious" and might cause problems in the future. Keep in mind that this feature is requested for "bigger" projects, meaning the impact when stuff breaks will also be a lot higher for these "bigger" projects. Notice the "bigger" here, because I feel our current services might actually be bigger, but they don't all live in one serverless.yml.

The framework should solve the issue of a high Lambda function + APIGW path count for "big" APIs. This could be done by creating the "main" resources (IAM role, APIGW, etc.) in the top-level stack; functions could/should then be distributed over nested stacks / cross stacks based on function names (or by a calculated hash value).

@pmuens
Contributor Author
pmuens commented Apr 5, 2017

Thanks again for the healthy discussion! Really helpful to have so many different opinions about it.

I played around with nested stacks the other day and I can share and feel the pain. I believe you must go through this process in order to understand the pain points.

However, we've started work on a stack splitting feature in #3441. Currently it's WIP and we're experimenting to see if it could be a potential solution.

Other than that we've opened up #3442 which is a brain dump for a native Serverless import / export feature so that you can share values between two independently deployed services. Happy to have your feedback over there as well!

Anyway, those issues have a really high priority right now since they are blockers for larger projects. There are ways to resolve this, but native, Serverless-like solutions should be in place to help you with that.

@pmuens pmuens modified the milestones: 1.11, 1.12 Apr 6, 2017
@joseSantacruz

Hi @pmuens, any update on this feature? Right now my current project has 202 resources, so I'm receiving the error about the CF limit.

@pmuens
Contributor Author
pmuens commented Sep 4, 2017

@joseSantacruz thanks for commenting 👍

Right now the best solution is to split your large service into multiple services or use @dougmoscrop's "Split Stacks" plugin. You can find it here: https://github.com/dougmoscrop/serverless-plugin-split-stacks
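
For anyone landing here, wiring the plugin in is a one-liner in serverless.yml (see the plugin's README for its grouping options):

# serverless.yml

plugins:
  - serverless-plugin-split-stacks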

@emilioponce

Hi guys, I'm not an expert on this issue, but is the CF template limitation something we can ask Amazon to remove? It seems critical to me. Thanks.

@pmuens
Contributor Author
pmuens commented Oct 10, 2017

Hey @emilioponce thanks for commenting 👍

It looks like the recommended way to handle this issue is to split the stack up into multiple, smaller stacks (so it looks like an increase via AWS support is not possible): http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html

@emilioponce
emilioponce commented Oct 10, 2017

OK @pmuens, I've read about this strategy, but as you may know it's a "workaround" for a limitation that doesn't have to be there. Thanks anyway.

@medikoo
Contributor
medikoo commented Dec 4, 2017

FYI, in the split-stacks plugin there's a PR which provides a solution that distributes resources to nested stacks per Lambda function (so each function gets its own dedicated nested stack, and common resources stay in the root stack).

This solution allowed us to successfully work around the 200-resource limit in a project I worked on.

Still, the culprit is that migrating to such a solution (from the default handling) requires a one-time removal of the stack (stage) and re-creating it with the split-stacks plugin in place.
This is due to a CloudFormation limitation, as there's no way to move resources between nested stacks during an update. However, as I learned, support for that is on the AWS roadmap, but with an uncertain delivery date.
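
For reference, a sketch of what enabling that behaviour could look like once the PR is in; the splitStacks / perFunction option names below are my assumption and should be checked against the plugin's README:

# serverless.yml (sketch; option names are assumptions - check the split-stacks README)

plugins:
  - serverless-plugin-split-stacks

custom:
  splitStacks:
    perFunction: true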

@medikoo
Contributor
medikoo commented Oct 7, 2020

Closing, as it feels like a duplicate of #2387, which was closed for the reasons outlined here: #2387 (comment)

@medikoo medikoo closed this as completed Oct 7, 2020