Insights: apache/spark
Overview
- 0 Active issues
- 0 Merged pull requests
- 76 Open pull requests
- 0 Closed issues
- 0 New issues
76 Pull requests opened by 50 people
- [SPARK-52601][SQL] Support primitive types in TransformingEncoder (#51313, opened Jun 28, 2025)
- [SPARK-46912][CORE] Using correct environment variables on workers of StandAlone cluster (#51314, opened Jun 28, 2025)
- [MINOR][DOCS] Updated the docstring of DataStreamWriter.foreach() method (#51316, opened Jun 29, 2025)
- [SPARK-52614][SQL] Support RowEncoder inside Product Encoder (#51319, opened Jun 30, 2025)
- [WIP][SPARK-52622][PS] Avoid CAST_INVALID_INPUT of `DataFrame.melt` in ANSI mode (#51326, opened Jul 1, 2025)
- [WIP][SQL][TESTS] Disable stable column aliases in tests if assumed (#51337, opened Jul 1, 2025)
- [SPARK-52638][SQL] Allow preserving Hive-style column order to be configurable (#51342, opened Jul 1, 2025)
- [SPARK-52640][SDP] Propagate Python Source Code Location (#51344, opened Jul 1, 2025)
- [SPARK-52669][PySpark]Improvement PySpark choose pythonExec in cluster yarn client mode (#51357, opened Jul 3, 2025)
- [SPARK-52646][PS] Avoid CAST_INVALID_INPUT of `__eq__` in ANSI mode (#51370, opened Jul 4, 2025)
- [SPARK-52689][SQL] Send DML Metrics to V2Write (#51377, opened Jul 4, 2025)
- [SPARK-52659][SQL]Misleading modulo error message in ansi mode (#51378, opened Jul 5, 2025)
- [SPARK-52545][SQL][DOCS] Update string literal docs for quote escaping rules (#51379, opened Jul 5, 2025)
- [SPARK-52617][SQL] Cast TIME to/from TIMESTAMP_NTZ (#51381, opened Jul 5, 2025)
- approx_top_k_combine (#51393, opened Jul 7, 2025)
- [IN PROGRESS] Support getting pod state using Informers/Listers (#51396, opened Jul 8, 2025)
- [DRAFT] Parameter markers in DDL. (#51410, opened Jul 9, 2025)
- [SPARK-52729][SQL] Add GENERAL_TABLE v2 table capacity (#51419, opened Jul 9, 2025)
- [SPARK-52741][SQL] RemoveFiles ShuffleCleanup mode doesnt work with non-adaptive execution (#51432, opened Jul 9, 2025)
- [WIP][SQL] Clarify schema mismatch types in insertInto error (#51446, opened Jul 10, 2025)
- [SPARK-52767][SQL] Optimize the performance of maxRows for join and union (#51451, opened Jul 11, 2025)
- [SPARK-52769][SQL] InjectRuntimeFilter should take into account join type and hints (#51453, opened Jul 11, 2025)
- [SPARK-52777][SQL] Enable shuffle cleanup mode configuration in Spark SQL (#51458, opened Jul 12, 2025)
- [SPARK-52457][SQL]ParseToDate/ParseToTimestamp can return incorrect value for TimestampNTZ (#51465, opened Jul 13, 2025)
- [SPARK-52772][SQL] Inconsistent table attribute handling during updates (#51466, opened Jul 13, 2025)
- [WIP][SQL] Incapsulate type operations (#51467, opened Jul 13, 2025)
- [SPARK-52449][CONNECT][PYTHON][ML] Make datatypes for Expression.Literal.Map/Array optional (#51473, opened Jul 14, 2025)
- [SPARK-52790][CORE] Introduce new grid testing method in SparkFunSuite (#51474, opened Jul 14, 2025)
- [DO NOT REVIEW] temp (#51501, opened Jul 15, 2025)
- [SPARK-52807][SDP] Proto changes to support analysis inside Declarative Pipelines query functions (#51502, opened Jul 15, 2025)
- [SPARK-52798] [SQL] Add function approx_top_k_combine (#51505, opened Jul 15, 2025)
- [SPARK-52813][CONNECT] Allow DAGs in Spark Connect (#51516, opened Jul 16, 2025)
- [SPARK-52828][SQL] Make hashing for collated strings collation agnostic (#51521, opened Jul 16, 2025)
- Add client env proto to spark connect client requests (#51529, opened Jul 17, 2025)
- [WIP][SPARK-51169] Set up a daily job for Python 3.14 (#51532, opened Jul 17, 2025)
- [WIP][SPARK-52764][PYTHON][ML][CONNECT][TESTS] Retry flaky tests in `test_parity_classification` (#51535, opened Jul 17, 2025)
- [SPARK-51415][SQL] Support the time type by make_timestamp() (#51543, opened Jul 18, 2025)
- [SPARK-51554][SQL] Add the time_trunc() function (#51547, opened Jul 18, 2025)
- Keep coverage data when running pip tests (#51552, opened Jul 18, 2025)
- [SPARK-52621][SQL] Cast TIME to/from VARIANT (#51553, opened Jul 18, 2025)
- [SPARK-52865][SQL]Remove usage of deprecated FileCommitProtocol.newTaskTempFile method (#51554, opened Jul 18, 2025)
- [SPARK-52867][SQL] Remove redundant GetTimestamp (#51556, opened Jul 18, 2025)
- [SPARK-52868][SQL] CBO: OOM-risky stats underestimation for some filters and sources (#51558, opened Jul 18, 2025)
- added the file in readme (#51571, opened Jul 19, 2025)
- [SPARK-52888][PYTHON] Implement the make_time function in PySpark (#51582, opened Jul 20, 2025)
- [DRAFT][DO-NOT-REVIEW][SPARK-51XXX][SQL] Enable implicit cast from STRING to TIME type (#51583, opened Jul 20, 2025)
- [SPARK-52889][PYTHON] Implement the current_time function in PySpark (#51584, opened Jul 20, 2025)
- [SPARK-52890][SPARK-52891][PYTHON] Implement the to_time and try_to_time functions in PySpark (#51585, opened Jul 20, 2025)
- [SPARK-51920][SS][PYTHON] Fix composite/nested type in value state for python (#51621, opened Jul 22, 2025)
- [SPARK-52146][SQL] Detect cyclic function references in SQL UDFs (#51626, opened Jul 23, 2025)
- [SPARK-52922][PS] Avoid CAST_INVALID_INPUT of "astype" in ANSI mode (#51627, opened Jul 23, 2025)
- [SPARK-52923][CORE] Allow ShuffleManager to control push merge during shuffle registration (#51629, opened Jul 23, 2025)
- [SPARK-52926][SQL] Added SQLMetric for remote schema fetching time duration. (#51633, opened Jul 23, 2025)
- [SPARK-52943][PYTHON] Enable arrow_cast for all pandas UDF eval types (#51635, opened Jul 23, 2025)
- [SPARK-52844][PYTHON][TESTS] Update Python test dependencies (#51636, opened Jul 23, 2025)
- [SPARK-52936][INFRA][TESTS] Benchmark result update automation (#51643, opened Jul 24, 2025)
- [SPARK-52937][SDP] Sinks (#51644, opened Jul 24, 2025)
- [SPARK-52942][YARN][BUILD] YARN External Shuffle Service jar should include `scala-library` (#51650, opened Jul 24, 2025)
- [SPARK-52930][CONNECT] Use DataType.Array/Map for Array/Map Literals (#51653, opened Jul 24, 2025)
- [SPARK-52952][PYTHON] Add PySpark UDF Type Coercion Dev Script (#51663, opened Jul 25, 2025)
- [SPARK-52953][SQL] Incorrect parameter order in some ExpressionEvalHelper.checkResult() method invocations (#51664, opened Jul 25, 2025)
- [SPARK-52954][PYTHON] Arrow UDF support return type coercion (#51665, opened Jul 25, 2025)
- [SPARK-52956][SQL] Preserve alias metadata when collapsing projects (#51667, opened Jul 25, 2025)
- [SPARK-52931][Core] Restrict declare variable naming (#51669, opened Jul 25, 2025)
- [WIP] Exclude pandas numpy operand in __eq__ (#51670, opened Jul 25, 2025)
- [SPARK-52960][SQL] Show subtree string in LogicalQueryStage toString (#51674, opened Jul 26, 2025)
- [SPARK-52967][BUILD] Upgrade ORC to 2.2.0 (#51676, opened Jul 26, 2025)
- [SPARK-52944][CORE][SQL][YARN][TESTS][3.5] Fix invalid assertions in tests (#51677, opened Jul 27, 2025)
- [SPARK-52968][SS] Emit additional state store metrics (#51679, opened Jul 28, 2025)
- [SPARK-52853][TESTS][FOLLOW-UP] Import SDP module when connect dependencies are available (#51680, opened Jul 28, 2025)
31 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [SPARK-52582][SQL] Improve the memory usage of XML parser (#51287, commented on Jul 13, 2025, 57 new comments)
- [SPARK-52495][SQL] Allow including partition columns in the single variant column (#51206, commented on Jul 10, 2025, 15 new comments)
- [SPARK-52407][SQL] Add support for Theta Sketch (#51298, commented on Jul 28, 2025, 9 new comments)
- [SPARK-48359][SQL] Built-in functions for Zstd compression and decompression (#46672, commented on Jul 8, 2025, 8 new comments)
- [SPARK-52593][PS] Avoid CAST_INVALID_INPUT of `Series.dot` and `DataFrame.dot` in ANSI mode (#51310, commented on Jul 25, 2025, 6 new comments)
- [SPARK-42746][SQL] Fix optimizer failure for SortOrder in the LISTAGG function (#51117, commented on Jul 17, 2025, 6 new comments)
- Enable -Xsource:3 compiler flag (#50474, commented on Jul 10, 2025, 5 new comments)
- [SPARK-52444][SQL][CONNECT] Add support for Variant/Char/Varchar Literal (#51215, commented on Jul 26, 2025, 5 new comments)
- [SPARK-51400] Replace ArrayContains nodes to InSet (#50170, commented on Jul 21, 2025, 4 new comments)
- [SPARK-52858][INFRA] Retry SBT compilation when OOM (#51149, commented on Jul 21, 2025, 3 new comments)
- [SPARK-52598][DOCS] Reorganize Spark Connect programming guide (#51305, commented on Jul 15, 2025, 1 new comment)
- [SPARK-22876][YARN] Respect YARN AM failure validity interval (#42570, commented on Jul 3, 2025, 0 new comments)
- [SPARK-50292] Add MapStatus RowCount optimize skewed job (#48825, commented on Jun 30, 2025, 0 new comments)
- [SPARK-52580][PS] Avoid CAST_INVALID_INPUT of `replace` in ANSI mode (#51297, commented on Jul 24, 2025, 0 new comments)
- [SPARK-49547][SQL][PYTHON] Add iterator of `RecordBatch` API to `applyInArrow` (#49005, commented on Jul 4, 2025, 0 new comments)
- [SPARK-52864] [CORE] [Tests] Let LocalSparkContext clear active context in beforeAll (#51284, commented on Jul 18, 2025, 0 new comments)
- [SPARK-51035][BUILD] Upgrade Janino to 3.1.12 (#51239, commented on Jul 1, 2025, 0 new comments)
- [SPARK-52544][SQL] Allow configuring Json datasource string length limit through SQLConf (#51235, commented on Jul 25, 2025, 0 new comments)
- [WIP][SPARK-51224][BUILD] Test Maven 4 (#51230, commented on Jul 9, 2025, 0 new comments)
- [SPARK-52486][SQL] Fix Spark Driver Planning OOM issue due to unworthwhile dpp expression before Execution when enabling AQE (#51184, commented on Jul 21, 2025, 0 new comments)
- [SPARK-52439][SQL] Support creating check constraint with NULL (#51146, commented on Jul 18, 2025, 0 new comments)
- [SPARK-51168][BUILD] Test Hadoop 3.4.2 (#51127, commented on Jul 4, 2025, 0 new comments)
- [SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files (#50215, commented on Jul 24, 2025, 0 new comments)
- Increase report interval of spaming logs to 10 seconds (#51012, commented on Jul 25, 2025, 0 new comments)
- [SPARK-52012][CORE][SQL] Restore IDE Index with type annotations (#50798, commented on Jul 8, 2025, 0 new comments)
- [WIP][SPARK-52011][SQL] Reduce HDFS NameNode RPC on vectorized Parquet reader (#50765, commented on Jul 4, 2025, 0 new comments)
- [SPARK-51883][DOCS][PYTHON] Python Data Source user guide for filter pushdown (#50684, commented on Jul 9, 2025, 0 new comments)
- [SPARK-51554][SQL] Add the time_trunc() function for TIME datatype (#50607, commented on Jul 22, 2025, 0 new comments)
- [SPARK-51728][SQL] Add SELECT EXCEPT Support (#50536, commented on Jul 15, 2025, 0 new comments)
- [SPARK-51585][SQL] Oracle dialect supports pushdown datetime functions (#50353, commented on Jul 28, 2025, 0 new comments)
- [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters (#50230, commented on Jul 23, 2025, 0 new comments)