feat(python): add bfloat16 and bfloat16_array support#3329
asadjan4611 wants to merge 24 commits into apache:main from asadjan4611:feat/python-bfloat16-support
- Add BFloat16 Cython type with IEEE 754 compliant conversions
- Add BFloat16Array class backed by array.array('H')
- Implement serializers for scalar and array types
- Register types in type resolver (TypeId.BFLOAT16 = 18, TypeId.BFLOAT16_ARRAY = 54)
- Add buffer read/write methods for bfloat16
- Add codegen support for bfloat16
- Add row format support (with temporary float16 mapping until C++ support)
- Add comprehensive test suite with 11 test cases covering all edge cases
- Follow existing float16 implementation patterns
Fixes apache#3289
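For reviewers unfamiliar with the storage layout, here is a minimal pure-Python sketch of the idea behind an array backed by `array.array('H')`: each element is kept as its 16 raw bfloat16 bits. The helper names `float_to_bf16_bits` and `bf16_bits_to_float` are hypothetical and are not the PR's API; this version truncates instead of rounding, purely to keep it short.

```python
import struct
from array import array


def float_to_bf16_bits(value: float) -> int:
    """Keep the upper 16 bits of the float32 encoding (truncation, no rounding)."""
    (f32_bits,) = struct.unpack("<I", struct.pack("<f", value))
    return f32_bits >> 16


def bf16_bits_to_float(bits: int) -> float:
    """Widen 16 bfloat16 bits to float32 by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return value


# Hypothetical stand-in for the array type: one unsigned 16-bit slot per element.
packed = array("H", (float_to_bf16_bits(v) for v in [1.0, -2.5, 0.15625]))
restored = [bf16_bits_to_float(b) for b in packed]
```

The three sample values are exactly representable in bfloat16, so they round-trip unchanged; values that need more than 8 significant bits would lose precision.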
- Change single quotes to double quotes (ruff format requirement)
- Remove trailing whitespace
- Add blank lines after imports (PEP 8)
- Remove unused import (pyfory)
- Fix closing parenthesis alignment
- Remove invalid Cython type casts (<BFloat16>) in serialization.pyx and primitive.pxi
- Use isinstance() check instead of type casting for Python classes
- Fix bfloat16() function to use float16() as temporary workaround until C++ support is added
- Comment out bfloat16() declaration in libformat.pxd with TODO for future C++ implementation
Replace unsafe pointer casts with memcpy to ensure cross-platform compatibility across all OS versions (Windows, Linux, macOS) and architectures (x86_64, ARM). This fixes strict aliasing violations that cause compilation failures on ARM and newer compilers.
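As a rough Python analogue of the memcpy approach described above (the actual change is in Cython, which this does not reproduce), the sketch below copies the 4 bytes of a float32 into an unsigned 32-bit integer with `ctypes.memmove` instead of reinterpreting a pointer. The function name `float32_bits` is hypothetical.

```python
import ctypes


def float32_bits(value: float) -> int:
    """Return the raw IEEE 754 bits of a float32 by copying its 4 bytes."""
    src = ctypes.c_float(value)
    dst = ctypes.c_uint32()
    # Byte-wise copy, the same idea as memcpy(&f32_bits, &value, 4).
    ctypes.memmove(ctypes.addressof(dst), ctypes.addressof(src), 4)
    return dst.value


one_bits = float32_bits(1.0)
neg_zero_bits = float32_bits(-0.0)
```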
Replace sizeof(float) with explicit constant 4 in memcpy calls to ensure cross-platform compatibility, especially on ARM architectures where sizeof() may cause compilation issues. This matches the project's pattern of using explicit size constants (as seen in types.py). Fixes build failures on:
- ubuntu-24.04-arm (aarch64)
- macos-arm64 (Apple Silicon)
- ubuntu-24.04-arm with Python 3.13
@chaokunyang please review my PR. This is a very interesting project and I learned a lot from this issue.
@komamitsu @chaokunyang
python/pyfory/registry.py
Outdated
| ) | ||
| register(float, type_id=TypeId.FLOAT64, serializer=Float64Serializer) | ||
| # BFloat16 is optional if the extension module is unavailable. | ||
| try: |
This should always be available. Could you remove the try/except clause?
python/pyfory/registry.py
Outdated
| serializer=PyArraySerializer(self.fory, ftype, typeid), | ||
| ) | ||
| # BFloat16Array is optional if the extension module is unavailable. | ||
| try: |
ditto
python/pyfory/serialization.pyx
Outdated
| cpdef inline read_nullable_bfloat16(Buffer buffer): | ||
| if buffer.read_int8() == NOT_NULL_VALUE_FLAG: | ||
| from pyfory.bfloat16 import BFloat16 | ||
| return BFloat16.from_bits(buffer.read_bfloat16()) |
Do you need to create a bfloat.pxd, so we can import it in buffer.pyx and make buffer.read_bfloat16() return BFloat16 directly?
And could you rename BFloat16 to bfloat16? This is a primitive type; a lowercase name makes it look like a builtin.
python/pyfory/serialization.pyx
Outdated
| return False | ||
| | ||
| cdef class XlangCompatibleSerializer(Serializer): |
Could you merge the main branch? We've removed the xwrite/xread API and unified the API in #3348.
This XlangCompatibleSerializer is not needed anymore
python/pyfory/serializer.py
Outdated
| self.type_id = type_id | ||
| self.itemsize = 2 | ||
| | ||
| def xwrite(self, buffer, value): |
Ditto for xwrite/xread, we don't have such an API anymore.
python/pyfory/_serializer.py
Outdated
| return False | ||
| | ||
| class XlangCompatibleSerializer(Serializer): |
Please remove this, we've removed it in #3348
…llection.pxi
Cython cpdef functions do not support keyword arguments when called from C code. Changed all read_no_ref(buffer, serializer=...) calls to use positional arguments read_no_ref(buffer, serializer) instead.
…ruct.py and collection.py
Cython cpdef functions do not support keyword arguments when called from C code. Changed all xwrite_ref, xread_ref, write_no_ref, and read_no_ref calls to use positional arguments instead of keyword arguments (serializer=...).
@chaokunyang I am facing an issue: some checks are not passing. How can I handle it?
Hey @asadjan4611, went through […]. NaN becomes Infinity for a signaling NaN like […]. Fix: add a NaN passthrough at the top (also note: […]):

    cdef inline uint16_t float32_to_bfloat16_bits(float value) nogil:
        cdef uint32_t f32_bits
        cdef uint16_t bf16_bits
        cdef uint16_t truncated
        memcpy(&f32_bits, &value, 4)
        # NaN passthrough: keep sign and upper payload bits, force the quiet bit
        if (f32_bits & 0x7FFFFFFF) > 0x7F800000:
            return (<uint16_t>(f32_bits >> 16)) | 0x0040
        bf16_bits = <uint16_t>(f32_bits >> 16)
        truncated = <uint16_t>(f32_bits & 0xFFFF)
        # Round to nearest, ties to even
        if truncated > 0x8000:
            bf16_bits += 1
        elif truncated == 0x8000 and (bf16_bits & 1):
            bf16_bits += 1
        return bf16_bits

    def __hash__(self):
        # +0.0 and -0.0 hash to the same value
        if (self._bits & 0x7FFF) == 0:
            return hash(0)
        return hash(self._bits)

Happy to discuss if I'm misreading the flow here.
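To make the NaN concern easy to check, here is a small pure-Python sketch that works directly on float32 bit patterns. The function names and the sample pattern 0x7F800001 (a signaling NaN whose payload sits only in the low 16 bits) are illustrative assumptions, not taken from the PR.

```python
def bf16_round_nearest_even(f32_bits: int) -> int:
    """Round float32 bits to bfloat16 bits with no NaN special case."""
    bf16_bits = f32_bits >> 16
    truncated = f32_bits & 0xFFFF
    # Round to nearest, ties to even.
    if truncated > 0x8000 or (truncated == 0x8000 and (bf16_bits & 1)):
        bf16_bits += 1
    return bf16_bits & 0xFFFF


def bf16_with_nan_passthrough(f32_bits: int) -> int:
    """Same rounding, but NaN inputs keep their sign and get the quiet bit set."""
    if (f32_bits & 0x7FFFFFFF) > 0x7F800000:
        return ((f32_bits >> 16) | 0x0040) & 0xFFFF
    return bf16_round_nearest_even(f32_bits)


SIGNALING_NAN = 0x7F800001  # illustrative input; payload only in the low bits
naive = bf16_round_nearest_even(SIGNALING_NAN)    # payload is dropped
fixed = bf16_with_nan_passthrough(SIGNALING_NAN)  # stays a NaN
```

Without the passthrough the payload bits are discarded and the result is 0x7F80, the encoding of +Infinity; with it the result is 0x7FC0, a quiet NaN.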
Why?
This PR implements bfloat16 (Brain Float 16) and bfloat16_array support for Fory Python runtime and codegen, addressing issue #3289. This enables using bfloat16 in FDL to reduce payload size while keeping a wide exponent range, which is common in ML/AI workflows.
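A quick way to see the exponent-range point, sketched with the standard library only: a value such as 1e38 cannot be packed as IEEE float16 (`struct` format `e`), but survives a trip through the upper 16 bits of its float32 encoding. The helper name `to_bf16_and_back` is hypothetical, and it truncates instead of rounding for brevity.

```python
import struct


def to_bf16_and_back(value: float) -> float:
    """Keep only the upper 16 bits of the float32 encoding, then widen again."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return out


big = 1e38
try:
    struct.pack("<e", big)  # IEEE float16: 5 exponent bits
    fits_float16 = True
except (OverflowError, struct.error):
    fits_float16 = False

approx = to_bf16_and_back(big)  # bfloat16: 8 exponent bits, same as float32
relative_error = abs(approx - big) / big
```

Both formats use 2 bytes per value; bfloat16 trades mantissa bits for the float32 exponent range, so the result is coarse (relative error on the order of 1%) but finite.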
What does this PR do?
This PR adds comprehensive bfloat16 support to Fory Python:
Core Implementation
- BFloat16Array backed by array.array('H') for packed contiguous storage
- Scalar (BFloat16Serializer) and array (BFloat16ArraySerializer) serializers
Integration Points
- Buffer write_bfloat16() and read_bfloat16() methods
- bfloat16() factory function (temporarily maps to float16 until C++ row format supports it)
Testing
Code Quality
- Follows existing float16 implementation patterns
Related issues
Does this PR introduce any user-facing change?
- BFloat16, BFloat16Array types and bfloat16() factory function
Implementation Details
Wire Format
Type System
Performance
array.array('H')