-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Flush error messages incrementally after processing a file #4396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I mostly left minor notes, but I'd like to see some more rigorous testing. Also, have you verified manually that streaming can produce errors faster using a significant codebase (such as mypy itself)?
mypy/errors.py
Outdated
|
||
Use a form suitable for displaying to the user. | ||
""" | ||
self.new_messages() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment about new_messages()
storing new messages as a side effect (or potentially rename the method to something that makes this more explicit).
mypy/test/testerrorstream.py
Outdated
if msgs: | ||
a.append('==== Errors flushed ====') | ||
a += msgs | ||
plugin = ChainedPlugin(options, [LoggingPlugin(options, flush_errors), DefaultPlugin(options)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: Add empty line after nested function for clarity.
test-data/unit/errorstream.test
Outdated
-- starting with "----" that are not ignored. The first two dashes of these | ||
-- lines are interpreted as escapes and removed. | ||
|
||
[case testErrorStream] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add test case with an import cycle? You can perhaps use "deferred nodes" to interleave messages from two modules (like 'error from module a', 'error from module b', 'error from module a') and then you can test that the errors are grouped by file correctly.
mypy/main.py
Outdated
try: | ||
res = type_check_only(sources, bin_dir, options) | ||
res = type_check_only(sources, bin_dir, options, flush_errors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if the main body of test cases (mypy/test/testscheck.py
) would use streaming errors (without testing that flushing happens as expected, i.e. we'd still only test that the sequence of errors in the output is as expected). Not sure if this is feasible without many changes to test cases. If not, at least it would be good to do a one-off manual check that streaming produces the same errors as not streaming in test cases.
test-data/unit/errorstream.test
Outdated
[out] | ||
==== Errors flushed ==== | ||
a.py:1: error: Unsupported operand types for + ("int" and "str") | ||
==== Blocking error ==== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about also adding a third error that would be get reported after the blocking error (but doesn't get reported, because there was a blocking error I assume)?
Time to report an error in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may take my comments about the brittleness of the API with a grain of salt -- as we discussed off-line there isn't a great alternative. If you see ways to make it less brittle without having to change various call sites I'm all for it though!
mypy/build.py
Outdated
|
||
|
||
def flush_compile_errors(f: F) -> F: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest not to make this a decorator at all, but a wrapper function. You could rename build
-> _build
and then create a new build
function with the same signature that calls _build
and catches CompileError
etc. You wouldn't need a type variable and the code smell would be less -- no higher-order function, no functools
, no TypeVar
, no cast
, no kwargs.get()
. (The price would be more repetition in the argument lists but that's all KISS. :-)
mypy/main.py
Outdated
for m in a: | ||
f.write(m + '\n') | ||
except BrokenPipeError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd sys.exit(1)
here.
mypy/errors.py
Outdated
# files were processed. | ||
error_info_map = None # type: Dict[str, List[ErrorInfo]] | ||
|
||
# The size of error_info the last time that error messages were flushed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no error_info
any more. :-) More seriously, this design feels a bit brittle, but I'm not sure how to remove that feeling -- I was thinking of having two maps, one with flushed errors and one with "new" errors, where the new_*
method transfers from the latter to the former, but that would require a whole bunch of updates to e.g. num_errors
etc.
But perhaps the flushed errors are not used any more (other than being counted, and that's just used as a Boolean)? That might simplify things a bit? (It seems messages()
just returns formatted_messages
so it doesn't need the flushed errors either.)
mypy/errors.py
Outdated
@@ -302,40 +317,41 @@ def add_error_info(self, info: ErrorInfo) -> None: | |||
if info.message in self.only_once_messages: | |||
return | |||
self.only_once_messages.add(info.message) | |||
self.error_info.append(info) | |||
self._add_error_info(info) | |||
self.error_files.add(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole variable error_files
is redundant, it should always be equal to set(self.error_info_map.keys())
. (But see my comment about the design of the latter.)
mypy/errors.py
Outdated
super().__init__('\n'.join(messages)) | ||
self.messages = messages | ||
self.use_stdout = use_stdout | ||
self.module_with_blocker = module_with_blocker | ||
self.num_already_seen = num_already_seen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This adds another ugly wart to the API (see the code in build.py that uses this variable). It might be prettier to have another List[str]
attribute unflushed_messages
that's set here. (Yet another place where I don't like indices pointing into arrays. :-)
test-data/unit/errorstream.test
Outdated
@@ -0,0 +1,72 @@ | |||
-- Test cases for incremental error streaming. Each test case consists of two | |||
-- sections. | |||
-- The first section contains [case NAME] followed by the input code, while |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is redundant, since all .test files work that way.
test-data/unit/errorstream.test
Outdated
-- a plugin when a call to it is checked, which can be used to verify that | ||
-- error messages are printed before doing later typechecking work. | ||
-- | ||
-- The input file name in errors is "file". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean?
test-data/unit/errorstream.test
Outdated
-- | ||
-- The input file name in errors is "file". | ||
-- | ||
-- Comments starting with "--" in this file will be ignored, except for lines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also redundant.
mypy/test/testerrorstream.py
Outdated
if msgs: | ||
logged_messages.append('==== Errors flushed ====') | ||
logged_messages.extend(msgs) | ||
if is_real: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block can also be indented.
mypy/test/testerrorstream.py
Outdated
logged_messages = [] # type: List[str] | ||
real_messages = [] # type: List[str] | ||
|
||
def flush_errors(msgs: List[str], serious: bool, is_real: bool=True) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be spaces around the =
(it's a special case in PEP 8 :-).
Another thought: let BuildResult only collect messages that haven't been flushed yet. This means that in general there are three ways you can get messages: (1) via the flush_errors callback, if set; (2) via BuildResult, if no flush_errors callback is set; (3) via CompileError. This way you can keep the old way of keeping track of messages, and you don't need to cache formatted messages. The tests simply pass flush_errors=None and get the message from BuildResult. |
The benefit of the caching scheme and returning the messages even when streaming is on is that it makes it easy to ensure that the streaming and the fixed interfaces return the same messages in the same order and also to test that. If we think that is important, ditching the caching would I think require some other machinery. |
I'm pretty confident about the ordering regardless, so I'd rather do without the cache scheme etc. |
Getting to the ordering to match without caching will require some new machinery not in the current patches: the errors are streamed out in the order that SCCs are processed, but this might not match the order in the |
In order to avoid duplicate error messages for errors produced in both load_graph() and process_graph() and to prevent misordered error messages in a number of places, lists of error messages are now tracked per-file. These lists are collected and printed out when a file is complete. To maintain consistency with clients that use .messages() (namely, tests), messages are generated file-at-a-time even when not printing them out incrementally. Fixes #1294
Nevermind, I have what I think is a good plan. |
Aren't parse-time errors always fatal (blocking)?
|
Actual parse errors are blocking but the first pass of semantic analysis is done immediately after parsing, and those can be nonblocking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this version a lot better! But I still have a bunch of questions and suggestions.
mypy/build.py
Outdated
@@ -25,6 +25,7 @@ | |||
import time | |||
from os.path import dirname, basename | |||
import errno | |||
from functools import wraps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this any more.
mypy/build.py
Outdated
@@ -703,6 +743,9 @@ def add_stats(self, **kwds: Any) -> None: | |||
def stats_summary(self) -> Mapping[str, object]: | |||
return self.stats | |||
|
|||
def error_flush(self, msgs: List[str], serious: bool=False) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider getting rid of this and just inlining the only call site to add serious=False?
mypy/errors.py
Outdated
@@ -554,7 +596,8 @@ def report_internal_error(err: Exception, file: Optional[str], line: int, | |||
# Dump out errors so far, they often provide a clue. | |||
# But catch unexpected errors rendering them. | |||
try: | |||
for msg in errors.messages(): | |||
errors.flushed_files = set() # Print out already flushed messages too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried to provoke this? ISTM that in the only use case where it matters (real users running into a crash) this will just print the entire list of errors twice, potentially just confusing everyone. Or is there a unit test that needs this?
Also, flushed_files
feels like an internal attribute of the Errors
class -- if you really need this consider making it a flag to new_messages()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was that I wanted to always make sure that all of the messages printed, even in the cases where the messages were being buffered in build.build. But I think you are right and we would rather lose some messages while running tests than confuse matters by printing duplicate messages in actual use.
mypy/main.py
Outdated
|
||
messages = [] | ||
|
||
def flush_errors(a: List[str], serious: bool) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename a
to something longer?
@@ -1973,6 +2016,10 @@ def write_cache(self) -> None: | |||
def dependency_priorities(self) -> List[int]: | |||
return [self.priorities.get(dep, PRI_HIGH) for dep in self.dependencies] | |||
|
|||
def generate_unused_ignore_notes(self) -> None: | |||
if self.options.warn_unused_ignores: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you've made this effectively into a per-module option, please add it to the list of such in options.py.
@@ -90,15 +93,17 @@ class Errors: | |||
current error context (nested imports). | |||
""" | |||
|
|||
# List of generated error messages. | |||
error_info = None # type: List[ErrorInfo] | |||
# Map from files to generated error messages. Is an OrderedDict so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be safer to use the module ID rather than the file as the key? Because add_error_info()
doesn't call remove_path_prefix()
. And IIRC sometimes different passes have different ideas about the filename (normalized or not). However the old code makes the same assumption about error_files
I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all error infos have a module, unfortunately.
mypy/errors.py
Outdated
F438
|
||
def raise_error(self) -> None: | ||
"""Raise a CompileError with the generated messages. | ||
|
||
Render the messages suitable for displaying. | ||
""" | ||
raise CompileError(self.messages(), | ||
# self.new_messages() will format all messages that haven't already | ||
# been returned from a new_module_messages() call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/new_module_messages/???/
@@ -511,6 +547,12 @@ class CompileError(Exception): | |||
|
|||
It can be a parse, semantic analysis, type check or other | |||
compilation-related error. | |||
|
|||
CompileErrors raised from an errors object carry all of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is very helpful. But perhaps a form of it would also be useful in the except
clause in build.build
, where the logic had me baffled for a bit.
mypy/test/testerrorstream.py
Outdated
alt_lib_path=test_temp_dir, | ||
flush_errors=flush_errors) | ||
except CompileError as e: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you at least assert that there are no messages in the error object?
test-data/unit/check-kwargs.test
Outdated
pass | ||
|
||
[out] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a comment (--
) explaining why the errors appear out of order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I am happy with the code now. Can you update the docs per my suggestion?
@@ -34,6 +34,7 @@ class Options: | |||
"show_none_errors", | |||
"warn_no_return", | |||
"warn_return_any", | |||
"warn_unused_ignores", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now the docs also need to be updated (it has separate sections for global and per-module flags).
Congrats! A nice piece of work. I'll merge the other thing too as soon as I remember what it was. |
This is breaking pytest integration, any chance of a bugfix release any time soon? |
@marcintustin Please file a separate bug report with more details. |
@gvanrossum I'm not saying that this PR is causing a problem. I'm saying that the lack of a release including this PR is causing a problem. You still want a separate bug report? |
There has been a release since this was merged, so I'm not sure what you're referring to. |
@JelleZijlstra @gvanrossum Well I suspect then that I've misunderstood the chatter on the issue where this is linked as the cause of the pytest integration problem. :( Apologies; I'll chase up on the integration first. |
@marcintustin this PR broke things as filed in #4681. That issue is missing that it causes errors in pytest-mypy. |
@ethanhs You're quite right. Thanks for clarifying. |
In order to avoid duplicate error messages for errors produced in both
load_graph() and process_graph() and to prevent misordered error
messages in a number of places, lists of error messages are now
tracked per-file.
These lists are collected and printed out when a file is complete. To
maintain consistency with clients that use .messages() (namely,
tests), messages are generated file-at-a-time even when not printing
them out incrementally.
Fixes #1294