-
Notifications
You must be signed in to change notification settings - Fork 1.3k
【Need Update csv.rs】Update csv.py and test_csv.py from CPython v3.12 #5176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Please, go ahead and update the behavior of |
Does the current csv implementation not incorporate the dialect feature? |
I don't remember the exact state. Since test_csv didn't exist, I guess our csv implementation was very immature. |
I have done some work to make the csv module compatible with test_csv.py, but due to my unfamiliarity with the API, the current version contains too much template code and redundant operations. It looks like the implementation is a bit messy. I will try to optimize the code and submit a new commit. I will close the current commit because I have a few more commits to make. Here are the test results.
|
Currently, I have adapted most of the behaviors specified by the Python CSV module, but there are still 18 test cases failing, which I have marked as 'failed.' The functionality of the module should be working as expected at the moment. By the way, I have used modules from std in the CSV module, such as Hashmap and Mutex, and initialized using Onecell. I'm not sure if this is feasible, or if I should switch to using PyMutex instead? |
@youknowone have a review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for many efforts to add a new test and implementing a many features in rust native module!
I'm not sure if this is feasible, or if I should switch to using PyMutex instead?
PyMutex is single/multi thread support helpers. You can easilty turns parking_lot::Mutex to PyMutex. (but not std Mutex) I thinkPyMutex
fits here. Otherwise we prefer to useparking_lot::PyMutex
rather than std Mutex
The most of changes looks really good, but I have a bit of concerns about using intern_str
. Could you check if they are intended to be actually interned or not?
stdlib/src/csv.rs
Outdated
fn doublequote(&self, vm: &VirtualMachine) -> PyRef<PyInt> { | ||
vm.ctx.new_bool(self.doublequote).to_owned() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type automatically be able to be turned into PyObjectRef.
So this is possible:
fn doublequote(&self, vm: &VirtualMachine) -> PyRef<PyInt> { | |
vm.ctx.new_bool(self.doublequote).to_owned() | |
} | |
fn doublequote(&self, vm: &VirtualMachine) -> bool { | |
self.doublequote | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
stdlib/src/csv.rs
Outdated
fn lineterminator(&self, vm: &VirtualMachine) -> PyRef<PyStr> { | ||
match self.lineterminator { | ||
Terminator::CRLF => vm.ctx.intern_str("\r\n".to_string()).to_owned(), | ||
Terminator::Any(t) => vm.ctx.intern_str(format!("{}", t as char)).to_owned(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this t
static value or dynamic value?
intern_str
interns static strings. If this is a dynamic value,
Terminator::Any(t) => vm.ctx.intern_str(format!("{}", t as char)).to_owned(), | |
Terminator::Any(t) => vm.ctx.new_str(format!("{}", t as char)).to_owned(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, I didn't notice this API when I was looking through the API documentation. I was still wondering why there wasn't a basic API for creating dynamic strings.
stdlib/src/csv.rs
Outdated
s @ PyStr => { | ||
Ok(if s.as_str().bytes().eq(b"\r\n".iter().copied()) { | ||
csv_core::Terminator::CRLF | ||
} else if let Some(t) = s.as_str().bytes().next() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else if let Some(t) = s.as_str().bytes().next() { | |
} else if let Some(t) = s.as_str().as_bytes().first() { |
Does this check require to ensure s.len() == 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this modification may be unnecessary. When s.len()<1, this function will throw an error, propagating the error to the upper layer, which conforms to the expected behavior in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I need to get ownership of the first character, and because it is of type u8 I believe that copying it is inexpensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I am sorry to make confusion. The suggestion and question were not related. I thought creating an iterator was not necessary here (for suggestion), and worried if it requires to raise error when s.len() > 1 but it is missed (comment). *t
will be a copy for Copy types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I am sorry to make confusion. The suggestion and question were not related. I thought creating an iterator was not necessary here (for suggestion), and worried if it requires to raise error when s.len() > 1 but it is missed (comment).
*t
will be a copy for Copy types.
Oh, I see. Due to limitations in the current implementation within csv_core, the support for multiple characters in lineterminator is not complete.
Therefore, I intentionally ignored the case where s.len() > 1. I will add comments to explain this and thank you for your suggestion. I have already optimized the iterator part.
stdlib/src/csv.rs
Outdated
})?; | ||
let input = string.as_str().as_bytes(); | ||
8000 |
|
|
if input.is_empty() || input.starts_with(&[b'\n']) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if input.is_empty() || input.starts_with(&[b'\n']) { | |
if input.is_empty() || input.starts_with(b"\n") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review
Haha, I didn't notice this API when I was looking through the API documentation. I was still wondering why there wasn't a basic API for creating strings. I have modified part of the code, keeping the old code for multiple type returns and operations on one of the bytes. Now I have another question: is |
PyMutex doesn't seem to implement sync, so I switched back to parking_lot. |
Currently we don't support no-std. So that is ok while preparing it is still a good idea. |
Are there any requested changes that I need to address? |
e8c6c61
to
05b4ec4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I attached a commit with minor fixes
I accidentally made changes to |
Co-authored-by: Jeong, YunWon <jeong@youknowone.org>
Thank you so much! |
You're welcome. Actually, there's one more thing I'd like to discuss. Regarding |
We probably chose it as harvesting low-hanging fruit. The first csv module writers might not know about those limitation. Any reasonable suggestion will be appreciated. |
I've updated
csv.py
andtest_csv.py
, but version 3.12 has changed the import options, and__version__
now needs to be imported from_csv.c
(which is nowcsv.rs
). It seems thatcsv.rs
has not yet aligned with the behavior ofcpython
upstream. Should we consider updatingcsv.rs
or postponing this commit? Due to this reason, I'm still unable to testtest_csv
.