[SPARK-53598][SQL] Check the existence of numParts before reading large table property
### What changes were proposed in this pull request?
This PR proposes to fix a regression caused by SPARK-33812 (de234ee, first released in Spark 3.2.0).
We hit this error when upgrading Spark from 3.1 to 3.3. The affected table is a Hive SerDe Parquet table whose TBLPROPERTIES appear to be malformed (it is unclear how this happened); the table can be read and written normally with Spark 3.1, but fails on read with Spark 3.3.
```
-- Hive DDL
CREATE EXTERNAL TABLE `foo`.`bar`(
`id` bigint,
...
) PARTITIONED BY (`dt` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://...'
TBLPROPERTIES (
...
'spark.sql.sources.schema.partCol.0'='dt',
'transient_lastDdlTime'='1727333678')
```
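Note that `spark.sql.sources.schema.partCol.0` is present but `spark.sql.sources.schema.numParts` (and the `part.N` payload properties) are missing. For comparison, a table whose schema was written by Spark normally carries properties roughly shaped like the following (values illustrative):
```
TBLPROPERTIES (
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[...]}',
  'spark.sql.sources.schema.partCol.0'='dt',
  ...)
```
Reading the malformed table with Spark 3.3 fails with: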
```
org.apache.spark.sql.AnalysisException: Cannot read table property 'spark.sql.sources.schema' as it's corrupted.
at org.apache.spark.sql.errors.QueryCompilationErrors$.cannotReadCorruptedTablePropertyError(QueryCompilationErrors.scala:840) ~[spark-catalyst_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.catalyst.catalog.CatalogTable$.$anonfun$readLargeTableProp$1(interface.scala:502) ~[spark-catalyst_2.12-3.3.4.104.jar:3.3.4.104]
at scala.Option.orElse(Option.scala:447) ~[scala-library-2.12.18.jar:?]
at org.apache.spark.sql.catalyst.catalog.CatalogTable$.readLargeTableProp(interface.scala:497) ~[spark-catalyst_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.hive.HiveExternalCatalog.getSchemaFromTableProperties(HiveExternalCatalog.scala:839) ~[spark-hive_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.hive.HiveExternalCatalog.restoreHiveSerdeTable(HiveExternalCatalog.scala:809) ~[spark-hive_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.hive.HiveExternalCatalog.restoreTableMetadata(HiveExternalCatalog.scala:765) ~[spark-hive_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$getTable$1(HiveExternalCatalog.scala:734) ~[spark-hive_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101) ~[spark-hive_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:734) ~[spark-hive_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:138) ~[spark-catalyst_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:544) ~[spark-catalyst_2.12-3.3.4.104.jar:3.3.4.104]
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:529) ~[spark-catalyst_2.12-3.3.4.104.jar:3.3.4.104]
...
```
Before SPARK-33812, Spark skipped reading the schema from table properties if `spark.sql.sources.schema.numParts` was missing for a Hive SerDe table, and fell back to the schema stored in the Hive metastore.
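Below is a minimal Scala sketch of the guarded lookup this PR restores. The helper shape is illustrative, not the exact `CatalogTable.readLargeTableProp` code, and it assumes the `<key>.numParts` / `<key>.part.<i>` property layout shown above:
```
// Illustrative sketch only: check that "<key>.numParts" exists before
// assembling the parts; if it is absent, treat the large property as
// unset instead of raising a "corrupted" error.
def readLargeTableProp(props: Map[String, String], key: String): Option[String] = {
  props.get(key).orElse {
    props.get(s"$key.numParts").map { numParts =>
      (0 until numParts.toInt).map { i =>
        // A missing part while numParts is present is genuinely corrupted metadata.
        props.getOrElse(s"$key.part.$i",
          sys.error(s"Cannot read table property '$key' as it's corrupted."))
      }.mkString
    }
  }
}
```
With this guard, a table like the one above falls back to the schema stored in the Hive metastore (the pre-SPARK-33812 behavior) instead of failing in `getSchemaFromTableProperties`.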
### Why are the changes needed?
Restore the pre-Spark 3.2 behavior.
### Does this PR introduce _any_ user-facing change?
Yes, it restores the pre-Spark 3.2 behavior: reading such tables succeeds again.
### How was this patch tested?
A unit test is added.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52355 from pan3793/SPARK-53598.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>