Overview
Multiple people have expressed interest in working with DataFusion Python for a Google Summer of Code project. We are excited to have this level of interest and we always welcome contributions from the community.
The goal of this issue is to provide a coordination point for people interested in working on or mentoring this project for GSoC. We would like to collect specific project ideas here so that applicants can pick something interesting that is achievable within the GSoC time frame.
This is a subproject under the greater Apache DataFusion GSoC: apache/datafusion#14478
Related Issues
These issues have all been identified as candidates for inclusion in a GSoC project. Feel free to suggest others. Ideally we would combine some of these issues into a larger unified goal.
- Create more user friendly aliases from col #754
- Add remaining non-wrapped functions #767
- Expose additional regexp functions #803
- Add udf / udaf decorators #806
- RFC: Ideas for what to include in a user tutorial #842
- Change naming of rust exposed structs to ease debugging #853
- Make all read methods available on DataFusion module #918
- how to use datafusion-contrib through the python bindings #920
- Add DataFrame fill_nan/fill_null #922
Additionally, I know there is interest in working on integrations with Iceberg and other Table Providers. That work is not directly within this repository, but it does fall within the purview of the GSoC project in my opinion.