
BigQuery Storage: Add more in-depth system tests covering all data formats and field data types #8983

@tswast


Simple functional tests

  • Simple correctness: create a table with some initial data, create a read session over the table, and verify that the expected number of rows is returned.
  • Filtering: create a table with some initial data, create a read session over the table with a push-down filter which excludes some data, and verify that the expected number of rows is returned. (Avro-only)
  • Column selection: create a table with some initial data, create a read session over the table with a list of columns specified, and verify that the expected columns and rows are returned. (A sketch covering these first three cases follows this list.)
  • Snapshot test: create a table with some initial data, load some additional data as a separate step, create a read session using the timestamp of the initial load, read the data, and verify that the initial data is returned and the additional data is not returned. (Avro-only; see the snapshot sketch below.)
  • Column-partitioned table test: create a column-partitioned table, read from the table with a partition filter specified, and verify that only the expected data is returned. (Avro-only)
  • Naturally-partitioned table test: create a date-partitioned table, read from the table with a partition filter specified, and verify that only the expected data is returned.
  • Data decoding: create a table with at least one field of each type supported by BigQuery -- including numeric, geographic, etc. -- and verify that the fields are decoded successfully.
  • Resuming a read at an offset: for example, a test using the Shakespeare samples table that reads half-way through a stream and then resumes at the recorded offset (see the sketch below).

Many of these tests can be implemented using a pre-created sample table if appropriate -- the correctness, filtering, and column-selection tests use the Shakespeare samples table in our internal tests, for example.
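
As a rough illustration, the first three cases can share one read session. This is a minimal sketch, assuming the v1 google-cloud-bigquery-storage client with fastavro installed for Avro decoding; the billing project ("my-test-project") and the asserted counts are placeholders, not real fixture values.

    # Sketch: simple correctness, push-down filtering, and column selection
    # exercised in one read session. Placeholder project; assumes v1 client.
    from google.cloud import bigquery_storage

    client = bigquery_storage.BigQueryReadClient()

    requested_session = bigquery_storage.types.ReadSession(
        table="projects/bigquery-public-data/datasets/samples/tables/shakespeare",
        data_format=bigquery_storage.types.DataFormat.AVRO,
        read_options=bigquery_storage.types.ReadSession.TableReadOptions(
            # Column selection: only these fields should come back.
            selected_fields=["word", "word_count"],
            # Push-down filter: non-matching rows are dropped server-side.
            row_restriction='corpus = "hamlet"',
        ),
    )
    session = client.create_read_session(
        parent="projects/my-test-project",  # placeholder billing project
        read_session=requested_session,
        max_stream_count=1,
    )

    reader = client.read_rows(session.streams[0].name)
    rows = list(reader.rows(session))
    assert len(rows) > 0  # a real test asserts the known fixture row count
    assert set(rows[0].keys()) == {"word", "word_count"}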
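
The snapshot test can pin the read to the timestamp captured after the initial load via the session's table modifiers. A minimal sketch under the same assumptions, with placeholder table name and timestamp; the snapshot time must fall within the table's time-travel window:

    # Sketch: snapshot read as of the initial load, so rows appended later
    # must not appear. All names and the timestamp are placeholders.
    import datetime

    from google.cloud import bigquery_storage

    client = bigquery_storage.BigQueryReadClient()
    snapshot_time = datetime.datetime(2019, 8, 1, tzinfo=datetime.timezone.utc)

    session = client.create_read_session(
        parent="projects/my-test-project",  # placeholder billing project
        read_session=bigquery_storage.types.ReadSession(
            table="projects/my-test-project/datasets/my_dataset/tables/snapshot_test",
            data_format=bigquery_storage.types.DataFormat.AVRO,
            # Read the table as of this timestamp.
            table_modifiers=bigquery_storage.types.ReadSession.TableModifiers(
                snapshot_time=snapshot_time,
            ),
        ),
        max_stream_count=1,
    )
    rows = list(client.read_rows(session.streams[0].name).rows(session))
    assert len(rows) > 0  # a real test asserts only the initial-load rows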
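
Offset-based resumption can be exercised by stopping mid-stream and re-opening the same stream at the recorded row offset. Again a minimal sketch, assuming the v1 client; the cutoff value is arbitrary:

    # Sketch: read part of a stream, then resume at the recorded offset;
    # the two passes together should cover the table exactly once.
    from google.cloud import bigquery_storage

    client = bigquery_storage.BigQueryReadClient()
    session = client.create_read_session(
        parent="projects/my-test-project",  # placeholder billing project
        read_session=bigquery_storage.types.ReadSession(
            table="projects/bigquery-public-data/datasets/samples/tables/shakespeare",
            data_format=bigquery_storage.types.DataFormat.AVRO,
        ),
        max_stream_count=1,
    )
    stream = session.streams[0].name

    resume_at = 5000  # arbitrary cutoff for the sketch
    count = 0
    for _row in client.read_rows(stream).rows(session):
        count += 1
        if count == resume_at:
            break

    # Re-open the same stream, skipping the rows already consumed.
    remainder = sum(
        1 for _ in client.read_rows(stream, offset=resume_at).rows(session)
    )
    assert count + remainder > 0  # a real test compares to the table row count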

Long-running tests

  • Open a set of streams in parallel and read the full contents of a large table -- our google3 internal equivalent uses the Wikipedia sample table (about 35 GB) and runs for 10 to 20 minutes. This test should detect issues with long-running streams and should eventually add coverage for transparent stream resumption in Java and Python; a sketch of the parallel read follows. [swast] I don't think we want to block client presubmits with a 10-to-20-minute system test. I'll work with @shollyman and the backend team to find a more appropriate home for these long-running tests.
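
A minimal sketch of the parallel read, under the same v1-client assumptions; the stream count, billing project, and final assertion are placeholders:

    # Sketch: drain several streams in parallel, one thread per stream, and
    # check that the per-stream counts sum to the table's total row count.
    from concurrent import futures

    from google.cloud import bigquery_storage

    client = bigquery_storage.BigQueryReadClient()
    session = client.create_read_session(
        parent="projects/my-test-project",  # placeholder billing project
        read_session=bigquery_storage.types.ReadSession(
            table="projects/bigquery-public-data/datasets/samples/tables/wikipedia",
            data_format=bigquery_storage.types.DataFormat.AVRO,
        ),
        max_stream_count=16,  # upper bound; the backend may return fewer
    )

    def drain(stream_name):
        """Read one stream to completion and return its row count."""
        return sum(1 for _ in client.read_rows(stream_name).rows(session))

    with futures.ThreadPoolExecutor(max_workers=len(session.streams)) as pool:
        counts = list(pool.map(drain, (s.name for s in session.streams)))

    assert sum(counts) > 0  # a real test asserts the table's known row count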

This was originally filed internally as bug 133243219.

Labels

  • api: bigquerystorage (Issues related to the BigQuery Storage API.)
  • type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)
