10000 Backend Architecture Rework by Nucs · Pull Request #299 · SciSharp/NumSharp · GitHub
[go: up one dir, main page]

Skip to content

Backend Architecture Rework #299

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 30, 2019
Merged

Backend Architecture Rework #299

merged 11 commits into from
Jun 30, 2019

Conversation

Nucs
Copy link
Member
@Nucs Nucs commented Jun 28, 2019

This is a draft PR or a work in progress and is here for you all to review.
The changes are for the most part non-breaking.
Unfortunately all changes are in a single commit because the designing progress was more random than one-by-one making it difficult to commit changes as I go.
The current failing tests are expected and are part of the todo. All tests before this PR are passing.

I've spent a good amount of time micro-optimizing what I can - benchmarks were also added and many more were done locally.

All the changes were reviewed by me for at-least 6 times before this PR but it doesn't mean it is all perfect - Fresh eyes are needed.

These are most of the changes I've documented - I'm more than sure I've missed some.
Even though all unit-tests are passing, there are still problems which can be found by searching for "//todo!"

There was a lot of confusion and I dare to even say chaos regarding how every function affects the current object's state.
I'm refering to mainly NDArray, IStorage/TypedArrayStorage and their methods / interfaces.
To combat that, I've renamed SetData(Array) that intends to replace the internal array to ReplaceData.
And more importantly I've added documentations to every method that tells you exactly how it'll affect the object. (e.g. change shape or replace internal array or cause to copy data and so on).

Creating regen templating was a good choice
I ended up generating over at-least 15k lines of repetitive code using it.
Search for _REGEN for the templates that were used for generation.
The templates are visible in the code along with the generated code - you can easily make small changes and regenerate.

The Changes

  • Rewritten Most of TypedArrayStorage.

  • Rewritten some of ViewStorage.

    • Any non-supported methods were changed to implicit implementation.
  • SetData(Array) has been renamed to ReplaceData(Array) which is more explicit about what it does.
    SetData still has other overloads to set index-specific data.

  • Rewritten NDArray.Scalar(object value, Type dtype) to use Arrays

  • NDArray has been refactored and changes were made in the constructors but it was mostly untouched - rewriting it is next.

  • Added enum NPTypeCode which represents all possible types on numpy.

  • Added ReplaceData(arr) that replaces contents on an array.
    SetData is used for setting

  • Changed cast of NDArray to Array from Implicit to explicit.

  • Added array element type unpacked to NPTypeCode.GetTypeCode(Type)

  • Added Utils/ArrayConvert.cs - Allows fast conversion of non-generic arrays - support NPTypeCode.

  • Added Utils/Arrays.cs - "Extension" of Array static methods - support NPTypeCode.

  • Added Utils/NonGenericConvert.cs - Ano 8000 nymous (by object) convert - supports NPTypeCode.

  • Added Utils/TypelessConvert.cs

  • Replaced all usages of Array.CreateInstance(...) with Arrays.Create(...) that uses NPTypeCode (faster)

  • Added NDArray.AsGeneric() that will return NDArray only if T == dtype, otherwise null.

  • Fixed NdArray.MakeGeneric to also copy TensorEngine.

  • Removed NdArray.ARange, NdArray.Ones (Part of transferring all functions to an np class, more to come in the next PR)

  • NDArray.Array was changed to internal, Making it public will cause breaking code when user uses a different backend that'll most likely use byte[] in pinned unmanaged memory instead of expected dtype[].

  • Added Shape.Clone().

  • Added Assembly/Properties.cs with assembly: InternalsVisibleTo attributes to allow unit testing and benchmarking internal properties.

  • Added BackendType.Default

  • All as casts were changed to direct cast (after benchmarking).

  • Added NDArray.Scalar(NPTypeCode).

  • Added various benchmarks that helped to make design decisions.

  • DefaultEngine was changed from abstract class to regular class - virtuals should be avoided due to performance impact if one of the methods are for many iterations.

  • All new TypedArrayStorage(...) were changed to use ITensorEngine to create it.

  • Removed ArrayStorage

  • Removed StorageType.Array enum's value

  • BackendFactory.GetStorage has been moved to ITensorEngine.

  • BackendFactory.GetEngine argument's default was changed to: (BackendType backendType = BackendType.Default)

  • ITensorEngine (in NDArray, is transparent for the user and cannot be set outside of the library.

  • Removed NPTypeCode.Object - decided to drop off support for dtype of Object due to performance impact.

  • Added nuget package of FluentAssertions to NumSharp.UnitTest.

  • Added missing np dtypes (e.g. np.complex_ and more)

Breaking Changes

  • Changed NDArray implicit cast from T[] to explicit.
  • Shape.ReShape was renamed to Shape.Reshape
  • NDArray.Array was changed to internal, Making it public will cause breaking code when user uses a different backend that'll most likely use byte[] in pinned unmanaged memory instead of expected dtype[].
  • Changed cast of NDArray to Array from Implicit to explicit.
  • SetData(Array) has been renamed to ReplaceData(Array) which is more explicit about what it does.
    SetData still has other overloads to set index-specific data.

TODO for this PR

  • [V] Rewrite NDArray (Just the main class for now).
  • [V] Handle the many //todo! comments.
  • [V] Ensure related issues have unit test and are passing.

Nucs added 5 commits June 26, 2019 14:54
Which is a an equivalent of Type.TypeCode but also contains the rest of supported features by numpy
- Rewritten Most of TypedArrayStorage.
- Rewritten some of ViewStorage.
	- Any non-supported methods were changed to implicit implementation.
- SetData(Array) has been renamed to 
8000
ReplaceData(Array) which is more explicit about what it does.
	SetData still has other overloads to set index-specific data.
- Rewritten NDArray.Scalar(object value, Type dtype) to use Arrays
- NDArray has been refactored and changes were made in the constructors but it was mostly untouched - rewriting it is next.
- Added enum NPTypeCode which represents all possible types on numpy.
- Added ReplaceData(arr) that replaces contents on an array.
    SetData is used for setting
- Changed cast of NDArray to Array from Implicit to explicit.

- Added array element type unpacked to NPTypeCode.GetTypeCode(Type)

- Added Utils/ArrayConvert.cs - Allows fast conversion of non-generic arrays - support NPTypeCode.
- Added Utils/Arrays.cs - "Extension" of `Array` static methods - support NPTypeCode.
- Added Utils/NonGenericConvert.cs - Anonymous (by object) convert - supports NPTypeCode.
- Added Utils/TypelessConvert.cs
- Replaced all usages of Array.CreateInstance(...) with Arrays.Create(...) that uses NPTypeCode (faster)

- Added NDArray.AsGeneric<T>() that will return NDArray<T> only if T == dtype, otherwise null.
- Fixed NdArray.MakeGeneric to also copy TensorEngine.
- Removed NdArray.ARange, NdArray.Ones (Part of transferring all functions to an np class, more to come in the next PR)
- NDArray.Array was changed to internal, Making it public will cause breaking code when user uses a different backend that'll most likely use byte[] in pinned unmanaged memory instead of expected dtype[].
- Added Shape.Clone().
- Added Assembly/Properties.cs with `assembly: InternalsVisibleTo` attributes to allow unit testing and benchmarking internal properties.

- Added BackendType.Default
- All `as` casts were changed to direct cast (after benchmarking).
- Added NDArray.Scalar(NPTypeCode).
- Added various benchmarks that helped to make design decisions.

- DefaultEngine was changed from abstract class to regular class - virtuals should be avoided due to performance impact if one of the methods are for many iterations.
- All new TypedArrayStorage(...) were changed to use ITensorEngine to create it.

- Removed ArrayStorage
- Removed StorageType.Array enum's value
- BackendFactory.GetStorage has been moved to ITensorEngine.
- BackendFactory.GetEngine argument's default was changed to: (BackendType backendType = BackendType.Default)
- ITensorEngine (in NDArray, is transparent for the user and cannot be set outside of the library.
- Removed NPTypeCode.Object - decided to drop off support for dtype of Object due to performance impact.

- Added nuget package of FluentAssertions to NumSharp.UnitTest.
@Nucs
Copy link
Member Author
Nucs commented Jun 28, 2019

Related issues that are either fixed or are WIP that'll will be fixed at the end of this PR:
#293, #290, #271, #129, #75, #11, #294,

Related Discussions:
#284

@Oceania2018
Copy link
Member

@Nucs
image

@Oceania2018 Oceania2018 marked this pull request as ready for review June 29, 2019 00:05
@Oceania2018
Copy link
Member

Test in TensorFlow.NET

image

Copy link
Member
@Oceania2018 Oceania2018 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test it in TensorFlow.NET. Should be same result as before.

image

Should be:
image

@Nucs
8000 Copy link
Member Author
Nucs commented Jun 29, 2019

@Oceania2018, this is due to the following change:

- SetData(Array) has been renamed to ReplaceData(Array) which is more explicit about what it does.
     SetData still has other overloads to set index-specific data."

It doesn't break your code because there is still a SetData(Array, params int[] indicies) which causes it to throw at the length check of indicies.

This is a breaking change. How do you want to handle it?
I've also added a section of all breaking changes I've encountered by now.

image

@Nucs
Copy link
Member Author
Nucs commented Jun 29, 2019

Some issues are related to NDArray with dtype of NDArray.
There are no unit tests in NumSharp related to their use case but they are used in Tensorflow.NET and cause exceptions.
I'll review how they are used in TF.NET and apply changes to NumSharp based on that and add unit tests.

@Oceania2018
Copy link
Member

image

@Oceania2018 Oceania2018 requested review from Oceania2018 and removed request for kerryjiang June 29, 2019 02:15
@Oceania2018 Oceania2018 removed the request for review from deepakkumar1984 June 29, 2019 02:18
Nucs added 2 commits June 29, 2019 17:34
Copy link
Contributor
@henon henon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sure is orders of magnitudes better than it was before in terms of comments, consistency and coding style. Heads up!

@Oceania2018 Oceania2018 merged commit 3a56181 into master Jun 30, 2019
@Nucs Nucs changed the title [WIP] Backend Architecture Rework Backend Architecture Rework Jun 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0