DataFrame merge results in column datatype problem · Issue #6127 · dotnet/machinelearning · GitHub

DataFrame merge results in column datatype problem #6127


Closed
Tracked by #6144
olavt opened this issue Mar 14, 2022 · 1 comment · Fixed by #6677
Labels: Microsoft.Data.Analysis (All DataFrame related issues and PRs), P2 (Priority of the issue for triage purpose: Needs to be fixed at some point.)

Comments

@olavt
olavt commented Mar 14, 2022

.NET Core 3.1
Microsoft.Data.Analysis NuGet package version: 0.19.1

The last line of the following program crashes with the exception:

System.ArgumentException: 'Cannot cast column holding System.Double values to type System.Double'

```csharp
using Microsoft.Data.Analysis;
using System;
using System.Linq;

namespace TestDataFRame
{
    internal class Program
    {
        static void Main(string[] args)
        {
            DateTime?[] dates1 = { new DateTime(2022, 03, 01), new DateTime(2022, 03, 02), new DateTime(2022, 03, 03) };
            double?[] closePrices = { 10.5, 12.4, 11.3 };

            DateTime?[] dates2 = { new DateTime(2022, 03, 01), new DateTime(2022, 03, 02), new DateTime(2022, 03, 03) };
            double[] shortPercentages = { 2.34, 2.36, 3.01 };

            DataFrame dataFrame1 = new DataFrame();
            dataFrame1.Columns.Add(new PrimitiveDataFrameColumn<DateTime>("Date", dates1));
            dataFrame1.Columns.Add(new DoubleDataFrameColumn("ClosePrice", closePrices));

            // Works: before the merge, "ClosePrice" is a DoubleDataFrameColumn.
            var numbers1 = dataFrame1.Columns.GetDoubleColumn("ClosePrice").ToArray();

            DataFrame dataFrame2 = new DataFrame();
            dataFrame2.Columns.Add(new PrimitiveDataFrameColumn<DateTime>("Date", dates2));
            dataFrame2.Columns.Add(new DoubleDataFrameColumn("ShortPercentage", shortPercentages));

            var numbers2 = dataFrame2.Columns.GetDoubleColumn("ShortPercentage").ToArray();

            // Left join of the two frames on their "Date" columns.
            DataFrame dataFrame = dataFrame1.Merge<DateTime>(dataFrame2, "Date", "Date", joinAlgorithm: JoinAlgorithm.Left);

            // Throws System.ArgumentException with Microsoft.Data.Analysis 0.19.1.
            var numbers = dataFrame.Columns.GetDoubleColumn("ClosePrice").ToArray();
        }
    }
}
```

@luisquintanilla added the Microsoft.Data.Analysis label (All DataFrame related issues and PRs) on Mar 14, 2022
@michaelgsharp
Contributor

Alright, the issue is that Merge calls Clone on the columns, but Clone returns a slightly different type. For example, instead of returning a DoubleDataFrameColumn, it returns a PrimitiveDataFrameColumn<double>. DoubleDataFrameColumn does derive from PrimitiveDataFrameColumn<double>, but the two are not the same type. The problem is that GetDoubleColumn checks if (column is DoubleDataFrameColumn ret), and that check fails because the merged column is no longer a DoubleDataFrameColumn. This also explains the confusing error message: the column really does hold System.Double values, it just is not an instance of DoubleDataFrameColumn.

I'm not sure of the best way to fix this from a code standpoint; it probably needs a bit more investigation. @luisquintanilla for visibility.
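To make the type mismatch concrete, here is a minimal, self-contained C# sketch that does not reference Microsoft.Data.Analysis at all. The classes PrimitiveColumn<T>, DoubleColumn, and ColumnLookup are hypothetical stand-ins for PrimitiveDataFrameColumn<T>, DoubleDataFrameColumn, and the GetDoubleColumn type check; the sketch only assumes the behavior described above, namely that Clone builds the generic base type rather than the derived one.

```csharp
using System;

// Hypothetical stand-in for PrimitiveDataFrameColumn<T>; NOT the library's API.
class PrimitiveColumn<T> where T : struct
{
    public string Name { get; }
    public T?[] Values { get; }

    public PrimitiveColumn(string name, T?[] values)
    {
        Name = name;
        Values = values;
    }

    // Mirrors the behavior described in the issue: Clone constructs the
    // generic base type, so a derived column loses its concrete type.
    public virtual PrimitiveColumn<T> Clone() =>
        new PrimitiveColumn<T>(Name, (T?[])Values.Clone());
}

// Hypothetical stand-in for DoubleDataFrameColumn. It does not override
// Clone, so cloning it yields a plain PrimitiveColumn<double>.
class DoubleColumn : PrimitiveColumn<double>
{
    public DoubleColumn(string name, double?[] values) : base(name, values) { }
}

static class ColumnLookup
{
    // Strict check, like "column is DoubleDataFrameColumn": only the exact
    // derived type (or something derived from it) matches.
    public static bool IsDoubleColumn(object column) => column is DoubleColumn;

    // Looser check: any column holding doubles matches, including clones
    // that fell back to the generic base type.
    public static bool HoldsDoubles(object column) => column is PrimitiveColumn<double>;
}

static class Program
{
    static void Main()
    {
        var original = new DoubleColumn("ClosePrice", new double?[] { 10.5, 12.4, 11.3 });
        PrimitiveColumn<double> cloned = original.Clone();

        Console.WriteLine(ColumnLookup.IsDoubleColumn(original)); // True
        Console.WriteLine(ColumnLookup.IsDoubleColumn(cloned));   // False
        Console.WriteLine(ColumnLookup.HoldsDoubles(cloned));     // True
    }
}
```

Matching on the base type (HoldsDoubles) accepts the clone, which suggests a caller-side workaround in the real library: look the merged column up by name and cast it to PrimitiveDataFrameColumn<double> instead of calling GetDoubleColumn. Having each derived column override Clone so that it returns its own type would address the problem at the source; whether #6677 took that route is not stated in this thread.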

@michaelgsharp added the P2 label (Priority of the issue for triage purpose: Needs to be fixed at some point.) on Mar 18, 2022
@michaelgsharp added this to the ML.NET Future milestone on Apr 11, 2022
@luisquintanilla removed this from the ML.NET Future milestone on Dec 1, 2022
@ghost added the untriaged label (New issue has not been triaged) on Dec 1, 2022
@luisquintanilla added this to the ML.NET 3.0 milestone on Dec 1, 2022
@ghost removed the untriaged label on Dec 1, 2022
@ghost added the in-pr label on May 14, 2023
@ghost removed the in-pr label on Jun 13, 2023
@ghost locked as resolved and limited conversation to collaborators on Jul 14, 2023