8000 "Smart mode" for schema creation · Issue #54 · csv2db/csv2db · GitHub
[go: up one dir, main page]

Skip to content

"Smart mode" for schema creation #54

@andygrunwald

Description

@andygrunwald

What would you like to be added:

When csv2db creates a schema, every column is created as VARCHAR(1000).
This default type can be changed by using

  -c COLUMN_TYPE, --column-type COLUMN_TYPE
                        The column type to use for the table generation.

A smart mode would be quite useful.
When this mode is enabled, csv2db would scan the content of the CSV file first and tries to determine the right type per column.

It doesn't need to be super smart.
Maybe beginning with three different types:

  • strings by VARCHAR(1000)
  • integers via a column that supports integer 64 bit
  • floats via a column that supports decimals

A CSV like this

id,name,unit_name,nutrient_nbr,rank
1001,Solids,G,201,200
1002,Nitrogen,G,202,500
1003,Protein,G,203,600
1004,Total lipid (fat),G,204,800
1005,"Carbohydrate, by difference",G,205,1110

would lead to something like

CREATE TABLE `nutrient`
(
 `ID` int,
 `NAME` VARCHAR(1000),
 `UNIT_NAME` VARCHAR(1000),
 `NUTRIENT_NBR` int,
 `RANK` int
);

Above is pseudocode, database types for the particular database systems need to be determined).

Why is this needed:

  • The --column-type can only change the type for all columns, not for specific ones
  • Queries on the imported data might be more efficient (especially for numbers)
  • We would leverage the functionality of the database "a bit more"

Anything else we need to know?:

  • We assume CSVs that follow a valid structure
  • Database types for the particular database systems need to be determined
  • A current work-a-round would be to ALTER the schema after the import

Feedback:

Let me know what you think about this.
If this would be a welcoming feature, I might be able to spent some time on it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0