8000 Fix problems with DataFrame WriteCsv when quotes are present in data by dakersnar · Pull Request #6340 · dotnet/machinelearning · GitHub
[go: up one dir, main page]

Skip to content

Fix problems with DataFrame WriteCsv when quotes are present in data #6340

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Oct 4, 2022

Conversation

dakersnar
Copy link
Contributor

Fixes #6238, continuation of #6303

There were no issues with Load, but Writing back to CSV didn't properly escape quotation marks within the data, preventing the CSV from being correctly loaded again.

@codecov
Copy link
codecov bot commented Sep 28, 2022

Codecov Report

Merging #6340 (d7dac25) into main (09bbb19) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6340      +/-   ##
==========================================
+ Coverage   68.58%   68.59%   +0.01%     
==========================================
  Files        1171     1171              
  Lines      247916   248036     +120     
  Branches    25742    25736       -6     
==========================================
+ Hits       170022   170151     +129     
+ Misses      71134    71125       -9     
  Partials     6760     6760              
Flag Coverage Δ
Debug 68.59% <100.00%> (+0.01%) ⬆️
production 62.97% <100.00%> (+<0.01%) ⬆️
test 89.13% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/Microsoft.Data.Analysis/DataFrame.IO.cs 81.26% <100.00%> (+0.10%) ⬆️
...Microsoft.Data.Analysis.Tests/DataFrame.IOTests.cs 98.89% <100.00%> (+0.08%) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0.00%> (-20.52%) ⬇️
...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs 85.40% <0.00%> (-0.26%) ⬇️
src/Microsoft.Data.Analysis/TextFieldParser.cs 73.44% <0.00%> (+0.50%) ⬆️
...soft.ML.Transforms/Text/WordEmbeddingsExtractor.cs 86.87% <0.00%> (+1.13%) ⬆️
src/Microsoft.ML.Data/TrainCatalog.cs 82.70% <0.00%> (+2.53%) ⬆️
test/Microsoft.ML.AutoML.Tests/AutoFitTests.cs 82.48% <0.00%> (+2.91%) ⬆️

Copy link
Member
@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just one stray using statement that I think can be removed.

@dakersnar
Copy link
Contributor Author

@eerhardt @michaelgsharp Looks like build is failing because the raw string literals are a "preview feature". Should we add <LangVersion>preview</LangVersion> to a csproj file to resolve this, or should I revert back to the previous version of the unit tests? See a2de77d for the commit in question.

@eerhardt
Copy link
Member
eerhardt commented Oct 3, 2022

Should we add <LangVersion>preview</LangVersion> to a csproj file to resolve this

This would be my preference. It is just to the test project, so it shouldn't affect anything else.

@dakersnar dakersnar merged commit 7d764bb into dotnet:main Oct 4, 2022
@dakersnar dakersnar deleted the fix-6238 branch October 4, 2022 21:02
@dakersnar dakersnar added the Microsoft.Data.Analysis All DataFrame related issues and PRs label Oct 10, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Microsoft.Data.Analysis All DataFrame related issues and PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DataFrame] Allow quote escaping
2 participants
0