Discussion for a Systematic Configuration in 'Create External Table' Options · Issue #8994 · apache/datafusion · GitHub
@metesynnada

Description

Is your feature request related to a problem or challenge?

Currently, the configuration options for 'Create External Table' are not systematically organized, which can confuse users and makes the option set harder to reason about. The gap is especially evident when we compare our configuration pattern with other systems like Apache Flink and Apache Spark.

For instance, in our current setup, CSV format options and AWS credential settings are specified together in the same flat OPTIONS list. This approach lacks the clarity and structure found in similar systems. Apache Flink and Apache Spark offer examples of more systematic configuration.

Although Spark's approach could be confusing if applied to our 'Create External Table' statement, our current method most closely resembles Spark's in terms of table creation.

However, one area where our system shines is session context configuration. We use an intuitive dot-separated pattern, such as datafusion.execution.parquet.statistics_enabled, which is user-friendly and logically structured.

Describe the solution you'd like

I propose we adopt a more structured and systematic approach in defining table options, similar to our session context configuration. For example, instead of the current format:

CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 's3://boo/foo.csv'
OPTIONS ('AWS_ACCESS_KEY_ID' 'asdasd',
         'AWS_SECRET_ACCESS_KEY' 'asdasd',
         'timestamp_format' 'asdasd',
         'date_format' 'asdasd')

We could structure it more clearly:

CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 's3://boo/foo.csv'
OPTIONS ('aws.credentials.basic.accesskeyid' 'asdasd',
         'aws.credentials.basic.secretkey' 'asdasd',
         'format.csv.sink.timestamp_format' 'asdasd',
         'format.csv.sink.date_format' 'asdasd')

Or even more detailed:

CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV LOCATION 's3://boo/foo.csv'
OPTIONS ('aws.credentials.basic.accesskeyid' 'asdasd',
         'aws.credentials.basic.secretkey' 'asdasd',
         'format.csv.scan.datetime_regex' 'asdasd',
         'format.csv.sink.timestamp_format' 'asdasd',
         'format.csv.sink.date_format' 'asdasd')

This approach would separate AWS credentials from CSV format options and further delineate options for scanning and sinking, enhancing clarity and ease of use.
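To make the separation concrete, here is a minimal Python sketch of how flat dotted keys could be routed to the component that owns them, using the first segment as the namespace. The helper name `group_options` and the rule of splitting on the first dot are assumptions for illustration only; this is not how DataFusion parses options.

```python
from collections import defaultdict
from typing import Dict


def group_options(options: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group flat dotted keys by their first segment (hypothetical helper)."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for key, value in options.items():
        # "format.csv.sink.date_format" -> ("format", ".", "csv.sink.date_format")
        namespace, sep, rest = key.partition(".")
        if not sep or not rest:
            # A key without a namespace prefix cannot be routed to a component.
            raise ValueError(f"option key {key!r} has no namespace prefix")
        grouped[namespace][rest] = value
    return dict(grouped)


# Option keys copied from the more detailed example in this issue.
options = {
    "aws.credentials.basic.accesskeyid": "asdasd",
    "aws.credentials.basic.secretkey": "asdasd",
    "format.csv.scan.datetime_regex": "asdasd",
    "format.csv.sink.timestamp_format": "asdasd",
    "format.csv.sink.date_format": "asdasd",
}
grouped = group_options(options)
```

With this grouping, an object store layer would only ever see the `aws` entries and a CSV reader or writer would only see the `format` entries, which is the kind of isolation the flat option list cannot offer.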

Impact:

  • User Experience: This change will significantly improve user experience by making configuration more intuitive and easy to understand.
  • Documentation: Accompanying documentation will be necessary to guide users through the new configuration pattern.
  • Compatibility: This is a breaking change, so existing users will need a clear migration path.
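One possible shape for such a migration path is a rename table that translates legacy keys to their dotted equivalents and reports what it changed. The Python sketch below is illustrative only: the mapping is taken from the two examples in this issue, while the names `LEGACY_TO_DOTTED` and `migrate_options` and the case-insensitive matching of legacy keys are assumptions, not an existing DataFusion API.

```python
from typing import Dict, List, Tuple

# Legacy key (lowercased) -> proposed dotted key, per the examples in this issue.
LEGACY_TO_DOTTED = {
    "aws_access_key_id": "aws.credentials.basic.accesskeyid",
    "aws_secret_access_key": "aws.credentials.basic.secretkey",
    "timestamp_format": "format.csv.sink.timestamp_format",
    "date_format": "format.csv.sink.date_format",
}


def migrate_options(
    options: Dict[str, str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Rename legacy keys; return the new options and the (old, new) renames."""
    migrated: Dict[str, str] = {}
    renamed: List[Tuple[str, str]] = []
    for key, value in options.items():
        # Assumption: legacy keys are matched case-insensitively.
        new_key = LEGACY_TO_DOTTED.get(key.lower(), key)
        if new_key in migrated:
            # Both the legacy and the dotted spelling were supplied.
            raise ValueError(f"option {new_key!r} is specified more than once")
        if new_key != key:
            renamed.append((key, new_key))
        migrated[new_key] = value
    return migrated, renamed
```

The list of renames could drive deprecation warnings during a transition period, so that existing statements keep working while users are told which keys to update.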

Describe alternatives you've considered

No response

Additional context

No response


Labels: enhancement (New feature or request)
