RFC: Buffer low-level logs and flush on high-level log #3410
Closed · zirkelc started this conversation in RFCs (Request for Comments)
-
Note
The RFC is now closed, the latest spec for the feature can be found here, while implementation can be tracked here (#519) for TypeScript and here (aws-powertools/powertools-lambda-python#6060) for Python.
Is this related to an existing feature request or issue?
#519
#3142
Which area does this RFC relate to?
Logger
Summary
Log items which are below the configured log level should be buffered in memory and flushed to CloudWatch on error.
Use case
The Logger log level controls which log items are written to stdout and which are discarded.
There are five log levels: TRACE, DEBUG, INFO, WARN, ERROR.
The log level is set on instantiation, for example:
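For example, a minimal sketch (the level can also come from the `POWERTOOLS_LOG_LEVEL` environment variable):

```typescript
import { Logger } from '@aws-lambda-powertools/logger';

const logger = new Logger({ serviceName: 'orders', logLevel: 'INFO' });
```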
The Logger has methods for each log level to create log items:
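For instance, with the `logger` instance created above:

```typescript
// One method per level; extra keys are appended to the structured log item.
logger.debug('input payload', { payload: { id: '123' } });
logger.info('request received');
logger.warn('retrying request', { attempt: 2 });
logger.error('request failed', { errorCode: 500 });
```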
Log items can be categorized into low-level and high-level logs.
Low-level log items are those which are below the configured log level (log < log level).
High-level log items are those which are at or above the configured log level (log >= log level).
Low-level logs are discarded while high-level logs are written to stdout:
In a development environment, we typically set the log level to DEBUG or INFO since we use these logs for debugging purposes.
On the other hand, in production, we usually set the log level to WARN or ERROR, because we have much more traffic and want to avoid the noise in the CloudWatch logs.
However, when an error occurs, we want to have as much context as possible to understand and diagnose the root cause. Therefore, we want to write all low-level logs to stdout, regardless of their log level.
For example, a public Lambda Function URL serving as backend for a web application:
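A sketch of such a function, assuming a logger configured at ERROR for production (handler shape and names are illustrative):

```typescript
import { Logger } from '@aws-lambda-powertools/logger';
import type { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from 'aws-lambda';

const logger = new Logger({ logLevel: 'ERROR' });

export const handler = async (
  event: APIGatewayProxyEventV2
): Promise<APIGatewayProxyResultV2> => {
  // Discarded in production (INFO < ERROR), yet exactly the context we want when something fails.
  logger.info('incoming request', { event });
  try {
    const body = JSON.parse(event.body ?? '{}');
    logger.debug('parsed body', { body }); // also discarded
    return { statusCode: 200, body: JSON.stringify({ ok: true }) };
  } catch (error) {
    logger.error('request failed', { error }); // written to stdout (ERROR >= ERROR)
    return { statusCode: 500, body: JSON.stringify({ ok: false }) };
  }
};
```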
Since such a Lambda function is invoked frequently, it should avoid writing unnecessary logs.
However, in case of an error we want to log everything like the request and response.
This includes the original `event` that was not logged because of the INFO level.
Proposal
The following is a high-level design proposal, with pseudo-TypeScript code, showing how it could be implemented. It consists of:
- Extend the `Logger.processLogItem()` method to buffer and flush low-level logs
- Add a `Buffer` class to store buffered log items on the `Logger` instance
- Add `Logger.writeBuffer()` to write a log item to the buffer
- Add `Logger.flushBuffer()` to flush the buffered log items to stdout
- Change the `Logger.printLog()` method to accept a serialized log item from the buffer
Logger.processLogItem()
The `Logger.processLogItem` method is called by the different public log methods (`info`, `warn`, `error`, etc.). Therefore it would make sense to extend this method to implement the proposed behavior:
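For illustration, a self-contained sketch of the idea (names and numeric level values are illustrative, not the actual Powertools implementation):

```typescript
type LogLevel = 'DEBUG' | 'INFO' | 'WARN' | 'ERROR';
const LEVELS: Record<LogLevel, number> = { DEBUG: 8, INFO: 12, WARN: 16, ERROR: 20 };

class BufferedLogger {
  // Serialized low-level log items; serialized at write time so later
  // mutations of extraInput objects don't change what gets flushed.
  #buffer: string[] = [];

  constructor(private readonly logLevel: LogLevel) {}

  private processLogItem(level: LogLevel, message: string, extraInput?: object): void {
    const item = JSON.stringify({ level, message, timestamp: new Date().toISOString(), ...extraInput });
    if (LEVELS[level] >= LEVELS[this.logLevel]) {
      this.flushBuffer(); // emit buffered low-level logs first, preserving chronological order
      this.printLog(item);
    } else {
      this.writeBuffer(item); // below the threshold: buffer instead of discarding
    }
  }

  private writeBuffer(serializedItem: string): void {
    this.#buffer.push(serializedItem);
  }

  private flushBuffer(): void {
    for (const serializedItem of this.#buffer) this.printLog(serializedItem);
    this.#buffer = [];
  }

  private printLog(serializedItem: string): void {
    console.log(serializedItem);
  }

  public debug(message: string, extraInput?: object): void { this.processLogItem('DEBUG', message, extraInput); }
  public info(message: string, extraInput?: object): void { this.processLogItem('INFO', message, extraInput); }
  public warn(message: string, extraInput?: object): void { this.processLogItem('WARN', message, extraInput); }
  public error(message: string, extraInput?: object): void { this.processLogItem('ERROR', message, extraInput); }
}
```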
Buffer class
The `Buffer` class stores buffered log items on the `Logger` instance. The exact implementation is yet to be determined.
Logger.writeBuffer()
The `Logger.writeBuffer()` method writes a log item to the buffer. Since log items contain `extraInput`, which is an object, we need to make sure that the object is serialized to a string when it is buffered, in case it is mutated by the application.
Logger.flushBuffer()
The `Logger.flushBuffer()` method flushes all buffered log items to stdout. Since the log items were serialized when they were buffered, we can write them directly to stdout without the need to serialize them again.
Logger.printLog()
The `Logger.printLog()` method is changed to accept a serialized log item as input. In this case, the log item is written directly to stdout.
Out of scope
Code outside of this project is out of scope, like Lambda Runtime APIs, etc.
Potential challenges
The following challenges could be identified:
Cleanup buffer after successful execution
The Lambda execution environment will be re-used for multiple invocations. If the `Logger` instance is created outside of the handler function, the buffer could contain stale log items from previous invocations. Ideally, the buffer will be cleared after each invocation.
Possible solutions
- Use a middleware like `middy` to clear the buffer after each invocation.
Flush buffer on runtime errors
If the Lambda function throws an unhandled error, the buffer could contain important log items that were not written to stdout.
Possible solutions
- Use a middleware like `middy` to flush the buffer on error.
Limit buffer to avoid memory issues
Since the buffer stores serialized log items, it could grow very large and consume a lot of memory, potentially causing an out of memory error. Therefore, the buffer should be limited depending on certain metrics like number of entries and total buffer size.
Limit buffer length
The buffer could be implemented as a sliding window of fixed length.
New items are added to the buffer until its length reaches the threshold. Then, every new item drops the oldest item from the buffer, therefore the buffer length remains constant.
Limit buffer size
The buffer keeps track of the total size of serialized log items.
If the total size of serialized log items reaches the threshold, the buffer drops the oldest log items until the total size is below the threshold.
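A sketch combining both limits, dropping the oldest entries first (illustrative only):

```typescript
// Drops the oldest entries when either the entry count or the total size
// in bytes would exceed the configured thresholds.
class SlidingLogBuffer {
  #items: string[] = [];
  #totalBytes = 0;

  constructor(private readonly maxItems: number, private readonly maxBytes: number) {}

  add(serializedItem: string): void {
    this.#items.push(serializedItem);
    this.#totalBytes += Buffer.byteLength(serializedItem, 'utf8');
    while (this.#items.length > this.maxItems || this.#totalBytes > this.maxBytes) {
      const evicted = this.#items.shift();
      if (evicted === undefined) break;
      this.#totalBytes -= Buffer.byteLength(evicted, 'utf8');
    }
  }

  flush(print: (item: string) => void): void {
    for (const item of this.#items) print(item);
    this.#items = [];
    this.#totalBytes = 0;
  }
}
```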
Dependencies and Integrations
This feature could be integrated with a middleware like Middy.js to clear the buffer after each invocation and to flush it on error.
Alternative solutions
No response
Acknowledgment
Future readers
Please react with 👍 and your use case to help us understand customer demand.
Replies: 8 comments · 9 replies
-
Hi @zirkelc, thank you so much for taking the time to open this RFC.
I'll make sure to review it properly by early next week at the latest.
-
Hi @zirkelc! Thank you very much for kickstarting this RFC and helping us design the idea of implementing this buffering logic, this will be beneficial for all versions of Powertools, not just TypeScript. In this initial feedback, I will not focus on TypeScript (and code is just pseudocode), but on this feature as a whole, because we need to find a way to standardize this across all runtimes.
New methods
I'm not sure if creating new methods like `Logger.writeBuffer()` and `Logger.flushBuffer()` is the best option. I think whether logs are buffered or not should be a behavior of the logger instance and not of specific methods. I may be wrong, but it seems very confusing for customers to mix the two situations, like flushing logs immediately and buffering them, and it's even hard for us to control this. I can imagine some situations here where the order of logs could be messed up because of the flushing mechanism and create side effects when querying these logs in some tool. I would prefer to make buffering the default behavior of the constructor (see the sketch below).
This code might work better for TypeScript and Java/dotnet rather than Python, where we probably need to configure this in the Lambda handler decorator. I need to test it, but the concept is the same: make this a default behavior instead of having different functions and different flush methods.
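Reconstructing the constructor idea as pseudo-code (option names are hypothetical, not a final API):

```typescript
// Buffering becomes a property of the Logger instance, configured once.
const logger = new Logger({
  logLevel: 'ERROR',
  bufferLogs: true, // hypothetical option: buffer everything below the configured level
});

// No special methods at call sites: low-level logs are buffered,
// high-level logs are printed (and may trigger a flush) transparently.
logger.debug('only buffered');
logger.error('printed, and buffered logs are flushed');
```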
Limit buffer size
I like the idea of using a fixed default circular buffer size and allowing the customer to change that default size based on their needs. This highlights the shared responsibility model, where we provide secure default values as a guardrail, but the customer can change them and understand that it is their own responsibility. Also, the buffer size should be in bytes and not the number of log lines.
What we need to understand better: inspect the log lines across all runtimes and see an average size when we log different events that Lambda gets.
Flush buffer on runtime errors
This is one of the hardest parts of this implementation. Due to Lambda's execution model, and in cases where the customer is not using extensions - and we can't assume that customers always use extensions - we don't have access to the `SIGTERM` event when Lambda terminates/kills the container, which means we don't have a standardized method to flush the logs before Lambda kills the container in case of an exception.
I'm not sure I agree that we should rely on a dependency to do this. Correct me if I'm wrong, but it seems that not everyone is using Middy in their Lambda and this way we will force customers to install Middy to have this feature, which may be challenging for adoption. An alternative is to wrap the Lambda handler in a decorator (is this a possibility in TS and dotnet?) the same as we do with Python when the customer uses inject_lambda_context.
In my opinion, if we can't solve this problem, I don't see customers adopting this feature for reasons such as:
1 - Storing DEBUG/ERROR logs in a buffer and only flushing them in case of an exception is the best benefit of this feature, in my opinion. This way, customers will have logs only for debugging in case of exceptions and will not pollute their logs with unnecessary debug logs for a normal execution.
2 - Customers can store information logs in a buffer, for example, and only release them at the end of the execution, but what is the benefit? Even though printing to stdout is a blocking operation, we have no reports that this affects Lambda execution in normal situations.
Clearing buffer after successful execution
`_X_AMZN_TRACE_ID` is not available in the `provided` runtime family and Java 17+, and this may be a limitation. We could use `aws_request_id` from the Lambda context, but this is a unique id per execution only in request/response integrations like APIGW, and may be repeated in stream/async invocations where reprocessing happens for a specific item that failed. If we design this feature to use a decorator on the Lambda handler, then we don't need to worry about this because we can clean up the class after the execution; if not, we will need to find a way to do this and I don't have a proper answer right now.
We need your opinion here on some aspects for dotnet, @hjgraca.
Again, thanks so much for starting this RFC @zirkelc and please let me know what you think.
-
Thank you @leandrodamascena for your feedback! Here are my thoughts:
I agree with your reasoning. The two suggested methods were only intended for demo purposes and to make it easier to follow the idea of buffering and flushing.
I think Middy is just one of the options that could be provided as an example to automatically flush the buffer, basically as it is now for Metrics where you can use the `logMetrics` middleware, use a class decorator, or call `publishStoredMetrics()`.
I partly disagree, as I think this feature will be useful even if it does not work automatically on unhandled errors (i.e. `SIGTERM`). In my opinion, unhandled errors that actually kill the runtime are rare situations. Every time I encounter an unhandled error, I will try to isolate the error, put it inside try-catch and `log.error()` the message and context. I can still re-throw the original error, but the `log.error()` before will cause the buffer to flush.
I think the main benefit is actually reduced costs. In my case, we are a small startup and CloudWatch is already our 3rd most expensive AWS service.
I had the same idea of using `aws_request_id` as unique identifier. I don't like it, but maybe this is something where the Logger implementation of this feature deviates from runtime to runtime? For example, the unique identifier for log buffering is configurable at the Logger constructor and defaults to `_X_AMZN_TRACE_ID` if available. Other runtimes can require the context/`aws_request_id` or require some user-generated id.
But even in the worst case, where you have no unique id available to distinguish the log items from subsequent invocations, you will only print too many items, which are easily detectable because each log contains the datetime of its original creation. That means you can spot if a log was created before the Lambda request started. That could be somehow mitigated with a warning that prints `Flushing N buffered logs. May contain logs from previous invocations, check the log datetime and start of request`. Not optimal, but acceptable as an opt-in feature.
-
Got it, thanks for clarifying.
I agree. We can implement it in several ways, and Middy + decorator are two options. In Python and other runtimes, middleware is not a real solution, but a class decorator is definitely the way to go.
You have a strong point here and I agree with you, customers can benefit from this feature by controlling their exception flow. My main thought here was about helping customers to flush debug/error logs when uncaught exceptions happen, and in this case it seems we don't have much to do. Probably with the class decorator we can do something, but we need to investigate more.
We can agree here to make it clear in the documentation that this is a mechanism that must be controlled by the customer and not by Powertools.
I understand the cost concerns, but I don't get how the idea of buffering logs and flushing them after execution/exception helps. Do you mean that no matter the log level (INFO, DEBUG, ERROR), it will only be flushed explicitly and not automatically? Can you expand on the idea here a bit more?
I don't have a strong opinion on this yet, let me think about it a bit more. If we flush previous executions then we may add unnecessary costs to the customers.
Hello @zirkelc! Thanks for clarifying your points and I think we are starting to get a clearer picture of where we can go with this feature. The other Powertools members are on annual leave and should be back on Monday 06/01, we will probably have more traction on this discussion next week.
-
With log buffering, we have a decision boundary along the `LOG_LEVEL`: every log below `LOG_LEVEL` (aka low-level logs) goes into the buffer, and every log at or above the `LOG_LEVEL` (aka high-level logs) gets printed to stdout. Now there are two cases where the buffer should be flushed: 1) explicitly, via a dedicated method, and 2) implicitly, via the `logger.debug/info/warn/error` functions.
Here's a bit of pseudocode to demonstrate 2):
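For illustration, a sketch of that pseudocode (assuming a hypothetical `bufferLogs` option and a log level of ERROR):

```typescript
const logger = new Logger({ logLevel: 'ERROR', bufferLogs: true }); // hypothetical option

export const handler = async () => {
  logger.debug('fetching user');  // ERROR > DEBUG  -> buffered
  logger.info('user fetched');    // ERROR > INFO   -> buffered
  logger.error('payment failed'); // ERROR >= ERROR -> printed, buffer flushed first
};
```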
Output:
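With the sketch above, the output would look roughly like this (simplified, illustrative format):

```
{"level":"DEBUG","message":"fetching user", ...}   <- flushed from the buffer
{"level":"INFO","message":"user fetched", ...}     <- flushed from the buffer
{"level":"ERROR","message":"payment failed", ...}
```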
Previously I said unhandled errors that actually kill the runtime are rare situations, because I would usually try to convert the unhandled into a handled error. Here's an example (even though we're still throwing):
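A sketch of that pattern (assuming `logger` was created with buffering enabled and `processOrder` is the application's business logic):

```typescript
export const handler = async (event: unknown) => {
  logger.debug('incoming event', { event }); // buffered
  try {
    return await processOrder(event);
  } catch (error) {
    // Isolate the error, log it with context, then re-throw: the
    // logger.error() call implicitly flushes the buffered debug log above.
    logger.error('failed to process order', { error });
    throw error;
  }
};
```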
In this case, I still re-throw the original error and thereby kill the runtime, but I logged the error and `logger.error()` implicitly flushed the buffer with the debug statement at the beginning.
I hope it's more clear what I meant. If I misunderstood it, please let me know.
-
Thanks for the interesting discussion and clarifications.
I agree that the new methods to flush & write to the buffer should be internal only. I don't know if this was the case when Leandro first commented, but at the time of me reading they're marked as `private`, so this is clear and we are on the same page.
Regarding how the feature would work, aka when logs are flushed, I think there are three things that are all at play here: 1/ at this time, and for the foreseeable future, we don't have a way of intercepting unhandled exceptions and flushing automatically; 2/ because of this, customers will need to flush the buffer manually; 3/ with that said, the Logger still needs to automatically empty the buffer after a request completes to avoid leaking data across invocations.
For now I'd like to exclude the first point from the discussion since it requires a separate discussion and has upstream dependencies that we need to address - if interested to read more or chime in, here's the discussion aws-powertools/powertools-lambda-python#1798.
Regarding the other two points I'd like to first focus on how/when logs are flushed by customers.
How do customers flush the buffer
In my opinion overloading the `logger.error()` method offers less flexibility and makes the feature non-backwards compatible. I could see myself not wanting to flush a buffer every time I log an error, but wanting to do so at specific points of my code.
For example, if I have multiple `try/catch` blocks, either in a sequence or nested, I might want to still log the errors but reserve the option to flush the buffer either only in the outer `try/catch` or when a truly critical error happens. For example:
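A sketch of that scenario (the helper functions and the dedicated flush method are assumptions, not existing APIs; it also assumes `logger.error()` itself does not flush):

```typescript
export const handler = async () => {
  try {
    try {
      await chargeCustomer();
    } catch (error) {
      // Recoverable: log it, keep the buffer intact.
      logger.error('charge failed, falling back to retry queue', { error });
      await enqueueRetry();
    }
    await sendReceipt();
  } catch (error) {
    // Truly critical: log it and flush everything buffered for this request.
    logger.error('unrecoverable error', { error });
    logger.flushBufferredLogs(); // hypothetical dedicated method, name as used in this thread
    throw error;
  }
};
```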
Alternatively we could instead use the `logger.critical()` method, but to be perfectly honest I am quite inclined to just have a dedicated method. This method would also be needed anyway if we were ever to do some auto-flushing mechanism.
Additionally, using a logging method would mean that customers adopting this feature might have to refactor their code, since now every `logger.error()`/`logger.critical()` call also flushes buffered logs.
Basically I would want to decouple the flushing from emitting regular error logs, since this would make the feature more incrementally adoptable and also clearer to predict how it works.
How does the Logger empty the buffer between requests/invocations
The second point is how do we clean the buffer.
Clearly using a method that wraps the handler is an option, be it Middy.js middleware or class method decorator in TypeScript or regular function decorators in Python. This would be quite easy since we control the flow and we can run code after the handler returns but before the invocation finishes.
The problem with this is what Leandro hinted at: not everyone uses these methods and there's a non-zero number of customers who use neither Middy.js nor class method decorators (example).
Because of this, before introducing a potential foot-gun I'd like this RFC to have a compelling answer to "how do we make sure all/most customers can use this feature without the risk of leaking buffered logs in between requests".
-
Just so I don't forget, I'm linking this issue from Python repository here: aws-powertools/powertools-lambda-python#4432
-
Hello everyone! Just a quick heads up, I'm running a PoC with some scenarios and validating critical points before we move forward with this RFC. We'll have some updates in the coming weeks.
-
Hi @zirkelc - thank you for your patience. It took me longer than I would have liked to get back to you on this.
First of all thank you for putting time and effort into writing this thought out proposal.
Here are a few comments that I have after a couple reads:
Feature should be opt-in
My first comment is that the feature should be completely opt-in. I am unsure if this was already the intention behind the RFC, but since I expect this feature to have some kind of impact on either performance or resource consumption I would want only customers who are actually interested in it to have this overhead. Doing this would also allow us to release the feature in a minor release rather than having to wait for a major one.
Log level changed at runtime
I'd like us to discuss what happens when customers change the log level at runtime. Specifically, the Logger allows customers to set a new log level after the logger has been initialized via `logger.setLogLevel('DEBUG');`.
With this in mind, I'd like the RFC to have an agreement on what happens to buffered logs when the log level is changed to one that is above (aka more verbose than) the original one.
For example:
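A sketch of the scenario in question (the `bufferLogs` option name is hypothetical):

```typescript
const logger = new Logger({ logLevel: 'WARN', bufferLogs: true }); // hypothetical option

logger.info('buffered, because INFO < WARN');

logger.setLogLevel('DEBUG'); // the log level is now more verbose than before

// Open question: should the previously buffered INFO log now be emitted,
// stay buffered until a flush, or be dropped?
logger.info('printed, because INFO >= DEBUG');
```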
Log Sampling
I'd also like to discuss how this new feature integrates with the existing log sampling feature. For context, this existing feature allows customers to dynamically change the log level to `DEBUG` for a percentage of executions.
This addresses a similar concern to this proposed feature, albeit from a different angle, since for log sampling the assumption is that if there's a consistent issue in my workload it'll show up in a statistically relevant sample of my requests.
With this in mind, what happens when a request is sampled (aka the log level is set to `DEBUG`) and buffering is also enabled? Since this is a runtime log level change tied to a request, this case is somewhat related to the section above, so I expect that whatever we decide above would apply here as well.
Data structure
The proposal suggests we should implement our own `Buffer` class. I know this is an implementation detail, but I think we might want to evaluate if we can use existing data structures rather than building an unoptimized one ourselves.
For example, in this case I could see us using an `Array`, a `Set`, or even better a `Map` (see comments in the section below titled Cleanup buffer after successful execution as for why).
When to flush logs
If I understand the proposal well, especially in the section that details the `Logger.processLogItem()` method, I think it's suggesting that logs are flushed dynamically based on the level of the log being emitted. If this is the case, then I am not sure I understand how that would work.
From that section I understand the `else` block buffers logs below the current log level, but I don't understand why the buffer is flushed whenever I emit a higher level log. This way, if I do something like the example below, the logs will be flushed at the third log (see the sketch at the end of this section).
Since the original assumption behind this feature is that as a customer I want to have all the logs to diagnose an issue, I would rather instead do one of the following two options:
- flush the buffer automatically whenever `logger.error()` and/or `logger.critical()` are called
- expose a method like `logger.flushBufferredLogs()` to be called, most likely in a `try/catch` block (personally I prefer this one)
This would cover most of the cases where as a developer I want to get extra info because I know there was an error. The only case not covered by this mechanism is unhandled exceptions, which can't be addressed today and will be addressed by a separate feature.
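The "flushed at the third log" example referenced above, as a sketch (assuming log level WARN, buffering enabled, and the originally proposed flush-on-any-higher-level-log behavior):

```typescript
logger.debug('first');  // buffered (DEBUG < WARN)
logger.info('second');  // buffered (INFO < WARN)
logger.warn('third');   // printed, and the two buffered logs are flushed
                        // even though nothing actually went wrong
```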
Cleanup buffer after successful execution
I don’t think we should rely on a Middy.js middleware or class method decorator for this purpose since not everyone uses these ways of wrapping a Lambda handler.
I think we should instead modify the data structure we use to hold the buffered logs to be namespaced so that log entries from one request can never interfere with others. This would also make the cleaning process more straightforward.
For example, if we had a structure like a Map where the keys are a request identifier (TBD) and the value is an Array/Set of logs, then the `writeBuffer()` and `flushBuffer()`
internal methods would always only touch the key with the request identifier that they’re being called in.The main issue I see with this approach is that we’d need to find a request identifier that is widely available and easy to access.
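A sketch of that namespaced structure (illustrative):

```typescript
// Buffered logs namespaced by a per-invocation identifier, e.g. the value
// of the _X_AMZN_TRACE_ID environment variable.
const buffer = new Map<string, string[]>();

const writeBuffer = (requestId: string, serializedItem: string): void => {
  const items = buffer.get(requestId) ?? [];
  items.push(serializedItem);
  buffer.set(requestId, items);
};

const flushBuffer = (requestId: string): void => {
  for (const item of buffer.get(requestId) ?? []) console.log(item);
  buffer.delete(requestId); // entries from other invocations are untouched
};
```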
The
_X_AMZN_TRACE_ID
env variable seems like a good candidate, but for custom runtimes it requires customers to set it, and would not work for local environments unless the customer mocks it. Additionally, I don't think this value is populated while outside of the Lambda handler so we'd need to decide what to do with logs that might be buffered during the INIT phase.The request ID could be another candidate but just like the previous one, it's also not present outside the request cycle and it also requires customers to explicitly pass a reference of the
context
to the Logger to access it. This is not a huge deal since we already have methods to grab it when customers calllogger.addContext(context)
or when they use the Middy.js middleware/decorator.Something I mentioned during our initial discussion in the linked issue would be to leverage language features like
AsyncLocalContext
but in practice I am not sure how would that work since I am not overly familiar with this API. Likewise I don't know if there are equivalent in other languages.Limit buffer size
I share @leandrodamascena's sentiment about having a circular buffer with configurable size in bytes. In addition to his comment about analyzing average log item size to define a default size I would also like to understand the performance impact of actually inspecting the log size and whether there's anything we can do to limit this impact.
In terms of questions from my side, I have mainly one:
Should we buffer logs emitted outside the Lambda handler if buffering is enabled? I can see us wanting to do this, but doing so complicates even further the problem of "how do we clean up after each request" since these logs are inherently not linked to any invocation.
-
Yes, definitely opt-in!
If we only flush logs on error (see When to flush logs), I think changing the log level should not affect the buffer. That means in your example, the buffered `info` log should remain in the buffer until flushed on error.
Alternatively, we could say changing the log level via `logger.setLevel()` is similar to flushing the buffer via `logger.flushBufferredLogs()`. In the first case, we have a filtered flush that only emits `INFO` logs, while the second case is an unfiltered flush that emits all logs (including `DEBUG`).
If a request is sampled and all logs are being emitted directly, I think we can simply pause the buffering for this request. I didn't look into the implementation of sampling, but if the log level for sampled requests is temporarily changed to `DEBUG`, it should disable the buffer implicitly.
I agree that the logs should be collected in an optimised data structure. The idea for a `Buffer` class is simply to co-locate the necessary functions for insert/remove/flush; for example, every append into the buffer should potentially drop an element at the other end. We can implement this mechanism in the `Logger` itself or as a standalone function/class.
Yes, the main feature of this proposal is to collect logs (debug, info, ...) and emit them on error (handled or unhandled), either implicitly via `logger.error()` or explicitly via `logger.flushBufferredLogs()`. I think this should even be a configuration option.
I went a bit further in my proposal, because I wanted to generalize this behavior into a simple condition like `if $logItemLevel > $configLogLevel, then flush logs`.
For example, I personally use `INFO` as the configured log level for production, and everything above this level (`WARN` and `ERROR`) are exceptional cases that I want to investigate further. That means every `logger.warn()` should flush the log because it is above the threshold. But it is fair to say that not everybody uses log levels that way, so let's stick to the main idea of flushing logs on error.
This is how I use it at the moment:
I use the X-Ray Trace ID as key in the
#buffer
object and collect all buffered items in an array.I think using
_X_AMZN_TRACE_ID
is the best option we have. If_X_AMZN_TRACE_ID
is not available butbufferLogs
config istrue
, we should print a warning similar to publishing metrics if none are available. As fallback, we could introduce alogger.setBufferId()
or something like that to manually set/override the identifier with acontext.requestId
or a user-generated UUID that is being set at the beginning of each function handler.Could you elaborate how/when you would use
AsyncLocalContext
? Would you like to use it to store the buffered logs or some kind of identifier?Depending how we implement the buffer, we could also opt for a
BufferInterface
that defines two methods:buffer()
andflush()
. We could provide a default buffer implementation likeCircularBuffer
, but every customer can implement their own buffering mechanism based on buffer-length, buffer-size or a timestamp.See my comments at Cleanup buffer after successful execution, I would say buffering outside of the Lambda handler requires setting an identifier manually via
logger.setBufferId()
, or we append all logs to the same identifier (empty string) and accept the fact that logs outside of the handler are mixed from multiple invocations.
-
Thanks for the thoughtful answer.
I think at least for now, taking into consideration `AsyncLocalContext`, or other contexts really, would complicate the implementation and API significantly and also stall the RFC.
After some additional discussion with the team, I think we settled on an API that should accommodate all the requirements that we discussed.
Below are the decision points that we can use to guide the implementation:
When do we buffer
With this in mind, I think my final position would be to follow these rules:
- use `_X_AMZN_TRACE_ID` (as you suggested) as key to group the logs that belong to an invocation and buffer them
- for logs emitted during the `INIT` phase (aka outside of the scope of an invocation), these logs are never buffered; this is because `_X_AMZN_TRACE_ID` is not yet present there
Regarding the second point, the experience would look like this:
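A sketch of that experience (option names are illustrative, not the final API):

```typescript
const logger = new Logger({
  logLevel: 'INFO',
  logBuffer: { enabled: true }, // hypothetical option name
});

// INIT phase: _X_AMZN_TRACE_ID is not set yet, so this is never buffered
// (and, being below INFO, it is simply dropped).
logger.debug('loading configuration');

export const handler = async () => {
  // Inside an invocation: buffered under the current _X_AMZN_TRACE_ID.
  logger.debug('processing request');
  logger.info('printed as usual');
};
```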
which translates, in your logs to this:
Request 1
Request 2
What do we buffer
By default, when buffering is enabled, we'll buffer only log levels `DEBUG` and `TRACE`. We settled on this default because we want to nudge customers to make better design decisions and emit logs at the correct log level. In principle, diagnostic data that might be superfluous unless there's an error should be emitted at one of these levels, while less verbose levels like `INFO` and `WARN` should be used for data that should be logged also during regular operations.
With this in mind, logger would work like this when buffering is enabled:
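A sketch of that behaviour (illustrative; assumes a `trace()` method for the TRACE level mentioned above):

```typescript
// Buffering enabled, log level INFO, default buffering scope TRACE and DEBUG.
logger.trace('buffered');
logger.debug('buffered');
logger.info('printed');
logger.warn('printed');
logger.error('printed, and the buffered TRACE/DEBUG logs are flushed');
```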
We however understand that some customers might want to make different tradeoffs, either because they don't fully control the logs or because they want to reduce costs at minimum and really only emit certain logs, but still have them in a buffer for when things go wrong.
To accommodate this use case, we're also giving the flexibility to change the level at which the logger starts buffering. For example, if I want to buffer up to `WARN`, I can do so by setting an option in the buffer configuration:
How do we empty the buffer
When it comes to flushing the log buffer, after a lot of discussion with @leandrodamascena I think we'd like to settle on three APIs:
- manual flushing
- flushing when `logger.error()` is called (enabled by default when buffering is active, can be opted-out)
- flushing on unhandled errors
manual flushing
The first one is pretty straightforward, customers should be able to flush the buffer at any time by calling a method:
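For example (method name illustrative; `doWork()` stands in for application logic):

```typescript
try {
  await doWork();
} catch (error) {
  logger.error('something went wrong', { error });
  logger.flushBuffer(); // proposed manual flushing API
  throw error;
}
```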
Flush on error log
When buffering is enabled, we'll flush the buffer whenever customers call `logger.error()`. This allows customers to get all the data about an invocation when they need it the most, aka when there's an error.
Customers can also opt-out from this if they want to have tight control over the buffering mechanism:
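A sketch of that opt-out (option names illustrative):

```typescript
const logger = new Logger({
  logLevel: 'INFO',
  logBuffer: {
    enabled: true,
    flushOnErrorLog: false, // hypothetical option: logger.error() no longer flushes the buffer
  },
});
```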
flush on unhandled error
When buffering is enabled, and customers are using our
injectLambdaContext
Middy.js middleware or class method decorator, we can also intercept unhandled errors and flush the buffer before re-throwing the customer error (we'll never swallow customer errors).Since this requires us wrapping the handler, we can only do this with either the middleware or the decorator. Similar to the previous method, customers can opt-out of this by setting an option on the decorator or middleware:
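A sketch with the Middy.js middleware (the opt-out option name is illustrative; `logger` is assumed to be a Logger instance with buffering enabled):

```typescript
import middy from '@middy/core';
import { injectLambdaContext } from '@aws-lambda-powertools/logger/middleware';

export const handler = middy(async () => {
  // ... business logic ...
}).use(
  injectLambdaContext(logger, {
    flushBufferOnUncaughtError: false, // hypothetical opt-out
  })
);
```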
What happens when the buffer is full
As discussed above/in other threads, we'll use a circular buffer structure. This means that depending on the `maxBytes` setting and how many logs are actually buffered, some of the logs previously buffered might be evicted before the buffer is flushed.
To avoid customer confusion, when this happens we'll emit a warning log when the buffer is being flushed. This allows us to respect the customer buffering level while still informing them about the missing logs.
For example:
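A sketch of that situation (option names illustrative, sizes exaggerated for effect):

```typescript
const logger = new Logger({
  logLevel: 'INFO',
  logBuffer: { enabled: true, maxBytes: 1024 }, // hypothetical options
});

export const handler = async () => {
  for (let i = 0; i < 1000; i++) {
    logger.debug(`step ${i}`); // older entries are evicted once 1024 bytes are exceeded
  }
  logger.error('boom'); // flush: emits the remaining buffered logs plus a warning about evicted ones
};
```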
which results in:
How do you optimize for buffer storage
By default we'll set the buffer to a relatively low amount (TBD) and while customers can configure this amount, certain use cases might require additional optimization. For these cases we'll allow customers to enable buffer compression which optimizes for space, at the cost of a CPU overhead (its actual impact will depend on the Lambda function allocated resources).
When this setting is enabled, we'll use the `zlib` module from the standard library to compress the logs as they are added to the buffer, and decompress them if/when the buffer is flushed (a sketch follows below).
This should be all, if there's no additional comment/unresolved point we'll begin implementing this sometime next week.
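A minimal sketch of that compression idea using Node's built-in `zlib` (illustrative, not the actual implementation):

```typescript
import { deflateSync, inflateSync } from 'node:zlib';

// Compress each serialized log item as it enters the buffer...
const compressed: Buffer[] = [];
const writeBuffer = (serializedItem: string): void => {
  compressed.push(deflateSync(serializedItem));
};

// ...and decompress only if/when the buffer is flushed.
const flushBuffer = (): void => {
  for (const item of compressed) console.log(inflateSync(item).toString('utf8'));
  compressed.length = 0;
};
```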
-
Thank you, I'm very happy with the design decisions! 👍
-
Hey @zirkelc, thanks for the RFC, Leandro and Andrea already dived deep into most critical questions I had on my mind. One thing I noticed is missing (or maybe I missed it) is information about buffer size overflow. If the buffer is fixed and we replace logs, I would suggest adding a warning to indicate that this happened. This will help customers see that they might need to increase the buffer size and that some logs are missing.
-
Hi @am29d
yes that's a good point. It would be kind of a buffer overflow warning.
-
I have a PoC running in Python with most of the things that we discussed here. The experience is something like this:
I'm reviewing the RFC again and I have 2 final questions:
1 - Should we allow customers to set max_size and/or max_lines, discarding the oldest entries when either limit is reached?
2 - Does it make sense to compress the buffer? I ran a test with 1k log entries in debug mode and they are 100kb without compression and 45kb with compression. But I'm not sure whether someone buffering 1k log entries will want to flush all of them in case of 1 error, what do you think? The CPU consumption doesn't change much, but I still need to do some more detailed tests.
I'm just waiting for @dreamorosi's final considerations to submit the PR in Python.
Thanks
-
I left a long comment here with a recap of all the points we discussed.
-
Just wanted to update the RFC sharing that the feature was released in v2.16.0 - let us know what you think!