Description
Is your feature request related to a problem or challenge?
Currently, in our implementation of 'Create External Table', the configuration options are not systematically organized, leading to potential confusion and complexity for the users. This is especially evident when we compare our configuration pattern with other systems like Apache Flink and Apache Spark.
For instance, in our current setup, CSV format options and AWS credential settings are wired in the same context. This approach lacks the clarity and structure found in similar systems. Examples of more systematic configurations can be seen in the table options of Flink and Spark.
Our table-creation syntax is currently closer to Spark's approach, even though that approach could be perceived as confusing when applied to our 'Create External Table' method.
However, one aspect where our system shines is our session context configuration. It uses a more intuitive dot-delimited (.) pattern, like datafusion.execution.parquet.statistics_enabled, which is more user-friendly and logically structured.
Describe the solution you'd like
I propose we adopt a more structured and systematic approach in defining table options, similar to our session context configuration. For example, instead of the current format:
CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 's3://boo/foo.csv'
OPTIONS ('AWS_ACCESS_KEY_ID' 'asdasd',
'AWS_SECRET_ACCESS_KEY' 'asdasd',
'timestamp_format' 'asdasd',
'date_format' 'asdasd')
We could structure it more clearly:
CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 's3://boo/foo.csv'
OPTIONS ('aws.credentials.basic.accesskeyid' 'asdasd',
'aws.credentials.basic.secretkey' 'asdasd',
'format.csv.sink.timestamp_format' 'asdasd',
'format.csv.sink.date_format' 'asdasd')
Or even more detailed:
CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 's3://boo/foo.csv'
OPTIONS ('aws.credentials.basic.accesskeyid' 'asdasd',
'aws.credentials.basic.secretkey' 'asdasd',
'format.csv.scan.datetime_regex' 'asdasd',
'format.csv.sink.timestamp_format' 'asdasd',
'format.csv.sink.date_format' 'asdasd')
This approach would separate AWS credentials from CSV format options and further delineate options for scanning and sinking, enhancing clarity and ease of use.
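To make the separation concrete, here is a minimal sketch of how dot-delimited keys could be routed to their consumers by prefix. It is written in Python for brevity rather than in DataFusion's Rust, and it is not an existing DataFusion API: the function name `group_options` and the `KNOWN_PREFIXES` tuple are hypothetical, and the prefixes are simply the ones used in the proposed examples.

```python
from collections import defaultdict

# Hypothetical namespaces taken from the proposed keys above;
# the final set of prefixes is an open design question.
KNOWN_PREFIXES = ("aws.credentials.basic", "format.csv.scan", "format.csv.sink")


def group_options(options):
    """Split flat 'a.b.c' -> value pairs into one dict per known prefix."""
    grouped = defaultdict(dict)
    for key, value in options.items():
        for prefix in KNOWN_PREFIXES:
            if key.startswith(prefix + "."):
                # Strip the namespace so each consumer sees only its own option name.
                grouped[prefix][key[len(prefix) + 1:]] = value
                break
        else:
            # Fail loudly on unknown namespaces instead of silently ignoring them.
            raise ValueError(f"unrecognized table option: {key!r}")
    return dict(grouped)


options = {
    "aws.credentials.basic.accesskeyid": "asdasd",
    "aws.credentials.basic.secretkey": "asdasd",
    "format.csv.scan.datetime_regex": "asdasd",
    "format.csv.sink.timestamp_format": "asdasd",
    "format.csv.sink.date_format": "asdasd",
}
grouped = group_options(options)
```

With this layout the object store would only ever see the `aws.credentials.basic` group, while the CSV reader and writer would each receive their own `scan` or `sink` group, so a misspelled or misplaced key can be rejected up front.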
Impact:
- User Experience: This change will improve user experience by making configuration more intuitive and easier to understand.
- Documentation: Accompanying documentation will be necessary to guide users through the new configuration pattern.
- Compatibility: It’s important to note that this change will introduce breaking changes. Thus, a clear migration path needs to be provided for existing users.
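One possible shape for the migration path is a compatibility shim that rewrites legacy flat keys to the new pattern and warns about each one. The sketch below is only an illustration in Python: `LEGACY_KEY_MAP` and `migrate_options` are hypothetical names, the mapping covers just the four keys from the example above, and the case-insensitive matching is an assumption rather than documented behavior.

```python
import warnings

# Hypothetical mapping from the flat keys in the current example to the
# proposed namespaced keys; the real table would be defined by the project.
LEGACY_KEY_MAP = {
    "aws_access_key_id": "aws.credentials.basic.accesskeyid",
    "aws_secret_access_key": "aws.credentials.basic.secretkey",
    "timestamp_format": "format.csv.sink.timestamp_format",
    "date_format": "format.csv.sink.date_format",
}


def migrate_options(options):
    """Return a copy of options with legacy keys rewritten to the new pattern."""
    migrated = {}
    for key, value in options.items():
        # Assumption: legacy keys are matched case-insensitively.
        new_key = LEGACY_KEY_MAP.get(key.lower())
        if new_key is None:
            # Already namespaced (or unknown): pass through unchanged.
            migrated[key] = value
            continue
        warnings.warn(
            f"option {key!r} is deprecated; use {new_key!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        migrated[new_key] = value
    return migrated
```

A shim like this could be kept for a release or two so that existing statements continue to work while users see which keys to update.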
Describe alternatives you've considered
No response
Additional context
No response