Feature/decimal support by GoEddie · Pull Request #982 · dotnet/spark · GitHub

Feature/decimal support #982


Open

GoEddie wants to merge 19 commits into main

Changes from 1 commit
test
GOEddieUK committed Oct 14, 2021
commit 4c88ece7f3d86b2e5bdae00ca1bbc936bacc615c
64 changes: 64 additions & 0 deletions src/csharp/Microsoft.Spark.E2ETest/IpcTests/Sql/DataTypesTests.cs
@@ -0,0 +1,64 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using Xunit;

namespace Microsoft.Spark.E2ETest.IpcTests
{

[Collection("Spark E2E Tests")]
public class DataTypesTests
{
private readonly SparkSession _spark;

public DataTypesTests(SparkFixture fixture)
{
_spark = fixture.Spark;
}

/// <summary>
/// Verifies that decimal values, including arrays of decimals, round-trip
/// between .NET and the JVM via CreateDataFrame and Collect.
/// </summary>
[Fact]
public void TestDecimalType()
{
var df = _spark.CreateDataFrame(
new List<GenericRow>
{
new GenericRow(
new object[]
{
decimal.MinValue, decimal.MaxValue, decimal.Zero, decimal.MinusOne,
new object[]
{
decimal.MinValue, decimal.MaxValue, decimal.Zero, decimal.MinusOne
}
}),
},
new StructType(
new List<StructField>()
{
new StructField("min", new DecimalType(38, 0)),
new StructField("max", new DecimalType(38, 0)),
new StructField("zero", new DecimalType(38, 0)),
new StructField("minusOne", new DecimalType(38, 0)),
new StructField("array", new ArrayType(new DecimalType(38,0)))
}));

Row row = df.Collect().First();
Assert.Equal(decimal.MinValue, row[0]);
Assert.Equal(decimal.MaxValue, row[1]);
Assert.Equal(decimal.Zero, row[2]);
@cutecycle (Contributor) commented on May 10, 2022:
I haven't had a chance to dig into whether this is actually an issue yet, but I want to flag it just in case.

At one point we were comparing SQL Server output to Spark SQL output while migrating a pipeline to Synapse, and when diffing two tables we hit a discrepancy involving a double.

SQL Server presumably follows C#'s (and JavaScript's, which the Python notebook table preview in Synapse uses) notion of floating-point equality, where -0.0 == 0.0; but the JVM/Spark in some cases compares bit patterns and treats the values as different because of the sign bit, so -0.0 != 0.0.

This is resolved in later versions of Spark's DataFrames, and it may not apply in the [decimal]String case, so it may not be a problem here.
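
As an aside, here is a minimal standalone C# sketch (not part of this PR's diff) of the distinction being described: IEEE 754 equality ignores the sign bit, while a bit-level comparison of the encodings does not.

using System;

class NegativeZeroDemo
{
    static void Main()
    {
        double posZero = 0.0;
        double negZero = -0.0;

        // IEEE 754 equality ignores the sign bit, so this prints True.
        Console.WriteLine(posZero == negZero);

        // A bit-level comparison (what a bitwise diff would see) does not:
        // the sign bit differs, so the two encodings are not identical.
        long posBits = BitConverter.DoubleToInt64Bits(posZero);
        long negBits = BitConverter.DoubleToInt64Bits(negZero);
        Console.WriteLine(posBits == negBits);            // False
        Console.WriteLine($"{posBits:X16} vs {negBits:X16}");
        // 0000000000000000 vs 8000000000000000
    }
}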

@cutecycle (Contributor) commented on Jun 2, 2022:
This is because BigDecimal internally uses BigInteger, and BigInteger has only a single concept of zero: a BigInteger behaves as a two's-complement integer, and two's-complement has only one zero.
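
For illustration, a small C# analogue (not from this PR; it assumes System.Numerics.BigInteger, which behaves like the JVM's BigInteger on this point):

using System;
using System.Numerics;

class SingleZeroDemo
{
    static void Main()
    {
        // A double keeps a sign bit even for zero, so the encodings differ.
        Console.WriteLine(BitConverter.DoubleToInt64Bits(-0.0) ==
                          BitConverter.DoubleToInt64Bits(0.0));    // False

        // Converting to BigInteger collapses both values to the single
        // two's-complement zero, so the sign-of-zero distinction disappears.
        BigInteger fromNegZero = new BigInteger(-0.0);
        Console.WriteLine(fromNegZero == BigInteger.Zero);          // True
        Console.WriteLine(fromNegZero.Sign);                        // 0
    }
}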

Assert.Equal(decimal.MinusOne, row[3]);
Assert.Equal(new object[] { decimal.MinValue, decimal.MaxValue, decimal.Zero, decimal.MinusOne },
row[4]);
}

}
}