Insights: apache/datafusion
Overview
42 Pull requests merged by 22 people
-
Fix Partial Sort Get Slice Point Between Batches
#16881 merged
Jul 27, 2025 -
Implement Helpers for ScopedTimerGuard and Time Structs
#16911 merged
Jul 27, 2025 -
feat: Add `ScalarValue::{new_one,new_zero,new_ten,distance}` support for `Decimal128` and `Decimal256`
#16831 merged
Jul 27, 2025 -
minor: Rename add_spm_on_top as add_merge_on_top
#16913 merged
Jul 27, 2025 -
minor: implement with_new_expressions for AggregateFunctionExpr
#16897 merged
Jul 26, 2025 -
minor: add is_superset() method for Interval's
#16895 merged
Jul 26, 2025 -
fix: `PlaceholderRowExec::partition_statistics`
#16851 merged
Jul 26, 2025 -
ScalarValue Default + Min + Max
#16891 merged
Jul 26, 2025 -
fix(datafusion-proto): support serializing/deserializing ArrowFormat tables
#16875 merged
Jul 26, 2025 -
feat(spark): implement Spark datetime function last_day
#16828 merged
Jul 26, 2025 -
Mutable Join Unwind
#16883 merged
Jul 25, 2025 -
Ensure Substrait consumer can handle expressions in VirtualTable
#16857 merged
Jul 25, 2025 -
Derive UDF equality from PartialEq, Hash
#16842 merged
Jul 25, 2025 -
chore(deps): bump aws-config from 1.8.2 to 1.8.3
#16912 merged
Jul 25, 2025 -
feat(spark): Implement Spark `string` function `luhn_check`
#16848 merged
Jul 25, 2025 -
Fixes 3 bugs during serialization and deserialization of physical plans
#16858 merged
Jul 25, 2025 -
Support utf8view for spark hex
#16885 merged
Jul 25, 2025 -
remove deprecated methods from FileScanConfig / DataSourceExec
#16901 merged
Jul 24, 2025 -
MINOR: add unit tests for chr function
#16856 merged
Jul 24, 2025 -
Improve async_udf example and docs
#16846 merged
Jul 24, 2025 -
test: fix more flaky join tests
#16880 merged
Jul 24, 2025 -
Report error when `SessionState::sql_to_expr_with_alias` does not consume all input
#16811 merged
Jul 24, 2025 -
chore(deps): bump rand from 0.9.1 to 0.9.2
#16882 merged
Jul 24, 2025 -
test: Fix flaky join tests
#16860 merged
Jul 24, 2025 -
chore(deps): bump aws-config from 1.8.1 to 1.8.2
#16864 merged
Jul 23, 2025 -
chore(deps): bump serde_json from 1.0.140 to 1.0.141
#16863 merged
Jul 23, 2025 -
fix(build-wasm): put `arrow-ipc/zstd` dep under `compression` feature
#16844 merged
Jul 23, 2025 -
[main] Update version to 49.0.0, add 49.0.0 changelog
#16855 merged
Jul 23, 2025 -
feat: improve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)`
#16762 merged
Jul 23, 2025 -
feat: Allow tree explain format width to be customizable
#16827 merged
Jul 22, 2025 -
chore(deps): bump aws-credential-types from 1.2.3 to 1.2.4
#16815 merged
Jul 22, 2025 -
[branch-49] Final Changelog Tweaks
#16852 merged
Jul 22, 2025 -
chore(deps): bump sysinfo from 0.35.2 to 0.36.1
#16850 merged
Jul 22, 2025 -
Fix flaky test case in joins.slt
#16849 merged
Jul 22, 2025 -
[branch-49] chore: use `equals_datatype` for `BinaryExpr`. Cherry pick to DF 49.0
#16847 merged
Jul 22, 2025 -
Add note to upgrade guide about MSRV update
#16845 merged
Jul 22, 2025 -
Simplify try cast expr evaluation
#16834 merged
Jul 22, 2025 -
docs: Fix broken links
#16839 merged
Jul 21, 2025 -
Add benchmark for ByteViewGroupValueBuilder
#16826 merged
Jul 21, 2025 -
refactor(examples): remove redundant call to create directory in `parquet_embedded_index.rs`
#16825 merged
Jul 21, 2025 -
fix(docs): Update broken links to `TableProvider` docs
#16830 merged
Jul 21, 2025 -
cache generation of dictionary keys and null arrays for ScalarValue
#16789 merged
Jul 20, 2025
27 Pull requests opened by 16 people
-
Fix integration tests not running
#16835 opened
Jul 21, 2025 -
Docs: Update Upgrading.md to reflect 49.0.0 is released
#16853 opened
Jul 22, 2025 -
Fix: unnest with alias reports error
#16854 opened
Jul 22, 2025 -
speedup `date_trunc` (~7x faster) in some cases
#16859 opened
Jul 22, 2025 -
disallow pushdown of volatile PhysicalExprs
#16861 opened
Jul 23, 2025 -
Minor: Optimize byte view benchmark to add more groups and more testing cases.
#16862 opened
Jul 23, 2025 -
Perf: Optimize vectorized append function
#16876 opened
Jul 23, 2025 -
refactor `character_length` impl by unifying null handling logic
#16877 opened
Jul 23, 2025 -
optimize `initcap` function by avoiding memory allocation
#16878 opened
Jul 23, 2025 -
WIP: Rewrite NestedLoopJoin to limit intermediate size (up to 2X faster)
#16889 opened
Jul 24, 2025 -
fix: regex bench
#16890 opened
Jul 24, 2025 -
Add Fetch Property to OutputRequirementExec
#16892 opened
Jul 24, 2025 -
minor: Improve equivalence handling of joins
#16893 opened
Jul 24, 2025 -
minor format
#16898 opened
Jul 24, 2025 -
Add partial_sort.slt test for partially sorted data
#16900 opened
Jul 24, 2025 -
Make `AsyncScalarUDFImpl::invoke_async_with_args` consistent with `ScalarUDFImpl::invoke_with_args`
#16902 opened
Jul 24, 2025 -
feat: Use PartialSortExec when input data is sorted on prefix columns
#16905 opened
Jul 25, 2025 -
Fix create table by values with string, which doesn't respect `string_to_utf8view` config
#16906 opened
Jul 25, 2025 -
Equivalence::fields_eq() and PhysicalSortExpr Update for Projections
#16910 opened
Jul 25, 2025 -
We have now the CI ensure all doc strings remain formatted
#16916 opened
Jul 25, 2025 -
fix: `TrivialValueAccumulators` to ignore null states for `ignore nulls`
#16918 opened
Jul 25, 2025 -
feat(spark): implement Spark math function rint
#16924 opened
Jul 26, 2025 -
feat: support distinct for window
#16925 opened
Jul 26, 2025 -
Add ExplainMemory Trait for Memory Usage Reporting and Enhance Memory Debugging Tools
#16926 opened
Jul 26, 2025 -
Added Example for `Statistical Functions` in Docs
#16927 opened
Jul 26, 2025 -
Update release process
#16929 opened
Jul 27, 2025 -
fix error result in execute&pre_selection
#16930 opened
Jul 27, 2025
12 Issues closed by 5 people
-
Enhance support for types in ScalarValue
#16832 closed
Jul 27, 2025 -
Unable to serialize and deserialize scans using ArrowFormat
#16874 closed
Jul 26, 2025 -
[datafusion-spark] Implement Spark `datetime` function `last_day`
#16774 closed
Jul 26, 2025 -
Ensure Substrait consumer can handle expressions in VirtualTable
#16363 closed
Jul 25, 2025 -
Derive UDF (`ScalarUDFImpl`) equality from PartialEq, Hash
#16865 closed
Jul 25, 2025 -
[datafusion-spark] Implement Spark `string` function `luhn_check`
#16612 closed
Jul 25, 2025 -
Only 4 tpc-h queries have matching physical plans before serialization and after deserialization
#16772 closed
Jul 25, 2025 -
`SessionState::sql_to_expr` does not report unconsumed input
#16810 closed
Jul 24, 2025 -
Clang requirement when building for WebAssembly, `cc-rs` through `zstd`, fails
#16843 closed
Jul 23, 2025 -
Bloom filters are unused for certain where clause patterns (improve LiteralGuarantee)
#16697 closed
Jul 23, 2025 -
[Proposal] Support User-Defined Types (UDT)
#7923 closed
Jul 21, 2025 -
Code clean for new datafusion-cli streaming printing logic
#14886 closed
Jul 21, 2025
35 Issues opened by 16 people
-
Incorrect results from pre_selection_scatter when RHS is scalar
#16928 opened
Jul 26, 2025 -
[Doc Fix] : Missing Example for `Statistical Functions` in the Docs
#16923 opened
Jul 26, 2025 -
Make the max temp directory size (for spills) configurable through configuration API
#16922 opened
Jul 26, 2025 -
Make the temporary directory (for spills) configurable through configuration API
#16921 opened
Jul 26, 2025 -
Streaming Aggregate operator not being used in deduplication of pre-sorted Parquet files
#16919 opened
Jul 25, 2025 -
Consider deprecate or remove some physical expr helper functions
#16917 opened
Jul 25, 2025 -
Chore: format documentation examples
#16915 opened
Jul 25, 2025 -
Add a "col_case_preserved" helper function for creating Columns with the case preserved
#16914 opened
Jul 25, 2025 -
Validate the memory consumption in `SortPreservingMergeStream`
#16909 opened
Jul 25, 2025 -
Limit the max merge degree during re-spill in external sort
#16908 opened
Jul 25, 2025 -
Combine utilities in `SpillManager`
#16907 opened
Jul 25, 2025 -
Add a way to get what takes memory
#16904 opened
Jul 24, 2025 -
[DISCUSSION] Conditional Utf8View support for downstream projects
#16903 opened
Jul 24, 2025 -
Entire input is resorted when the data is partially sorted (not using `PartialSortExec`)
#16899 opened
Jul 24, 2025 -
Make `AsyncScalarUDFImpl::invoke_async_with_args` consistent with `ScalarUDFImpl::invoke_with_args`
#16896 opened
Jul 24, 2025 -
Remove `__unnest_placeholder` from result projection on queries with struct unnest.
#16894 opened
Jul 24, 2025 -
Window aggregates output order broken due to hash repartitioning
#16888 opened
Jul 24, 2025 -
`COUNT(DISTINCT ..)` is ignored in Window Aggregates
#16887 opened
Jul 24, 2025 -
Discussion: DataFusion Improvement Proposal (DIPs) Process?
#16886 opened
Jul 24, 2025 -
Question about string to utf8view when creating table
#16884 opened
Jul 24, 2025 -
panic when running `regx` benchmark
#16879 opened
Jul 24, 2025 -
Replace `WindowUDFImpl::{equals,hash_value}` with `UdfHash`, `UdfEq` traits
#16873 opened
Jul 23, 2025 -
Replace `AggregateUDFImpl::{equals,hash_value}` with `UdfHash`, `UdfEq` traits
#16872 opened
Jul 23, 2025 -
Replace `ScalarUDFImpl::{equals,hash_value}` with `UdfHash`, `UdfEq` traits
#16871 opened
Jul 23, 2025 -
Implement PartialEq, Hash for all UDWFs (`WindowUDFImpl`)
#16870 opened
Jul 23, 2025 -
Implement PartialEq, Hash for all UDAFs (`AggregateUDFImpl`)
#16869 opened
Jul 23, 2025 -
Implement PartialEq, Hash for all UDFs (`ScalarUDFImpl`)
#16868 opened
Jul 23, 2025 -
Derive UDWF (`WindowUDFImpl`) equality from PartialEq, Hash
#16867 opened
Jul 23, 2025 -
Derive UDAF (`AggregateUDFImpl`) equality from PartialEq, Hash
#16866 opened
Jul 23, 2025 -
[DISCUSSION] Memory accounting model discussion
#16841 opened
Jul 21, 2025 -
CI: Check broken links in src doc comments
#16840 opened
Jul 21, 2025 -
Optimize concatenation of complex data type, such as list, struct
#16838 opened
Jul 21, 2025 -
[Bug] Aggregate + TopK fails when asc = false
#16837 opened
Jul 21, 2025 -
Missing data when inserting into MemTable
#16836 opened
Jul 21, 2025 -
datafusion seems to be single threaded regardless of the number of cores
#16833 opened
Jul 21, 2025
61 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
feat: add multi level merge sort that will always fit in memory
#15700 commented on
Jul 25, 2025 • 27 new comments -
Address memory over-accounting in array_agg
#16816 commented on
Jul 22, 2025 • 4 new comments -
Add benchmark utility to profile peak memory usage
#16814 commented on
Jul 27, 2025 • 4 new comments -
Support multiple ordered `array_agg` aggregations
#16625 commented on
Jul 24, 2025 • 4 new comments -
feat: [datafusion-spark] Implement `next_day` function
#16780 commented on
Jul 27, 2025 • 3 new comments -
Adds script to detect breaking API changes/ semver
#16541 commented on
Jul 22, 2025 • 1 new comment -
POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4)
#16711 commented on
Jul 21, 2025 • 0 new comments -
Enable Projection Pushdown Optimization for Recursive CTEs
#16696 commented on
Jul 25, 2025 • 0 new comments -
Partially implement MATCH_RECOGNIZE for Advanced Pattern Matching
#16685 commented on
Jul 25, 2025 • 0 new comments -
chore(deps): Update sqlparser to 0.56
#16456 commented on
Jul 24, 2025 • 0 new comments -
Transform scalar correlated subqueries in Where to DependentJoin
#16174 commented on
Jul 25, 2025 • 0 new comments -
Semver-checks for all crate on merge and push
#16078 commented on
Jul 27, 2025 • 0 new comments -
Fix: common_sub_expression_eliminate optimizer rule failed
#16066 commented on
Jul 23, 2025 • 0 new comments -
Fix Correlated Subquery With Depth Larger Than One
#16060 commented on
Jul 23, 2025 • 0 new comments -
feat: add macros for DataFusionError variants
#15946 commented on
Jul 23, 2025 • 0 new comments -
pipe column orderings into pruning predicate creation
#15821 commented on
Jul 20, 2025 • 0 new comments -
Treat truncated parquet stats as inexact
#15976 commented on
Jul 21, 2025 • 0 new comments -
SQL logic tests for Run-End Encoded (REE)
#16715 commented on
Jul 22, 2025 • 0 new comments -
feat: Optimize `collect_left_input` processing
#16727 commented on
Jul 20, 2025 • 0 new comments -
Fix `next_up` and `next_down` behavior for zero float values
#16745 commented on
Jul 27, 2025 • 0 new comments -
Use tokio::task::coop::poll_proceed by default in CooperativeStream
#16748 commented on
Jul 27, 2025 • 0 new comments -
48.0.1
#16755 commented on
Jul 21, 2025 • 0 new comments -
feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size
#16771 commented on
Jul 23, 2025 • 0 new comments -
fix: skip predicates on struct unnest in PushDownFilter
#16790 commented on
Jul 24, 2025 • 0 new comments -
Add support for Float16 type in substrait
#16793 commented on
Jul 26, 2025 • 0 new comments -
chore(deps): bump the proto group with 2 updates
#16806 commented on
Jul 25, 2025 • 0 new comments -
chore(deps): bump substrait from 0.58.0 to 0.59.0
#16808 commented on
Jul 25, 2025 • 0 new comments -
chore(deps): bump on-headers and compression in /datafusion/wasmtest/datafusion-wasm-app
#16812 commented on
Jul 23, 2025 • 0 new comments -
Benchmark: Add micro-benchmark for Nested Loop Join operator
#16819 commented on
Jul 25, 2025 • 0 new comments -
Replace π-related bound constants with next_up/next_down
#16823 commented on
Jul 22, 2025 • 0 new comments -
feat(spark): implement Spark math function mod/pmod
#16829 commented on
Jul 21, 2025 • 0 new comments -
Advanced Interval Analysis
#14515 commented on
Jul 21, 2025 • 0 new comments -
Unnest struct expression can't be aliased
#12794 commented on
Jul 21, 2025 • 0 new comments -
Building project takes a *long* time (esp compilation time for `datafusion` core crate)
#13814 commented on
Jul 21, 2025 • 0 new comments -
Dependency conflict with rquest due to async-compression and xz2 linking to lzma
#15342 commented on
Jul 21, 2025 • 0 new comments -
Regression: `DataFrameWriteOptions::with_single_file_output` produces a directory
#13323 commented on
Jul 21, 2025 • 0 new comments -
Move code in `user_defined_plan.rs` to the `extending-operators` doc
#15774 commented on
Jul 21, 2025 • 0 new comments -
QUALIFY clause
#15485 commented on
Jul 22, 2025 • 0 new comments -
Physical plan pushdown for volatile predicates
#16545 commented on
Jul 22, 2025 • 0 new comments -
ScalarUDFImpl::equals default implementation is error-prone
#16677 commented on
Jul 23, 2025 • 0 new comments -
Upgrade to sqlparser 0.56.0
#16405 commented on
Jul 23, 2025 • 0 new comments -
Document the SQL dialect DataFusion attempts to follow
#13704 commented on
Jul 23, 2025 • 0 new comments -
Plan to replace `SchemaAdapter` with `PhysicalExprAdapter`
#16800 commented on
Jul 24, 2025 • 0 new comments -
Panic happens when adding a decimal256 to a float (SQLancer)
#16689 commented on
Jul 24, 2025 • 0 new comments -
RFC: What table provider features would be helpful in an example?
#16821 commented on
Jul 24, 2025 • 0 new comments -
`DataFusionError` leaks inner types to the user
#16805 commented on
Jul 24, 2025 • 0 new comments -
Support Extension Types / User Defined Types in DataFusion
#12644 commented on
Jul 24, 2025 • 0 new comments -
[Blog] Async Scalar User Defined Functions
#16525 commented on
Jul 25, 2025 • 0 new comments -
[Epic]: Google Summer of Code 2025 Correlated Subquery Support
#16059 commented on
Jul 25, 2025 • 0 new comments -
[BLOG] Blog post about writing your own SQL dialect / extending SQL with DataFusion
#16756 commented on
Jul 25, 2025 • 0 new comments -
[EPIC] Complete `datafusion-spark` Spark Compatible Functions
#15914 commented on
Jul 26, 2025 • 0 new comments -
Release DataFusion `50.0.0` (Aug/Sep 2025)
#16799 commented on
Jul 26, 2025 • 0 new comments -
Implement `partition_statistics` API for more operators
#15873 commented on
Jul 26, 2025 • 0 new comments -
Add a "Gentle Introduction to Arrow / Record Batches"
#11336 commented on
Jul 26, 2025 • 0 new comments -
Release DataFusion `49.0.0` (July 2025)
#16235 commented on
Jul 27, 2025 • 0 new comments -
Integration tests are not being run
#16801 commented on
Jul 27, 2025 • 0 new comments -
feat: metadata columns
#14057 commented on
Jul 27, 2025 • 0 new comments -
feat: Support On-Demand Repartition
#14411 commented on
Jul 24, 2025 • 0 new comments -
[POC] feat: Add datafusion-storage
#15018 commented on
Jul 24, 2025 • 0 new comments -
feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl
#15022 commented on
Jul 25, 2025 • 0 new comments -
Shell script to collect benchmarks for multiple versions
#15144 commented on
Jul 27, 2025 • 0 new comments