
BigQuery Storage: Add more in-depth system tests covering all data formats and field data types #8983

@tswast


Simple functional tests

  • Simple correctness: create a table with some initial data, create a read session over the table, and verify that the expected number of rows is returned.
  • Filtering: create a table with some initial data, create a read session over the table with a push-down filter which excludes some data, and verify that the expected number of rows is returned. (Avro-only)
  • Column selection: create a table with some initial data, create a read session over the table with a list of columns specified, and verify that the expected columns and rows are returned. (A sketch covering these first three cases follows this list.)
  • Snapshot test: create a table with some initial data, load some additional data as a separate step, create a read session using the timestamp of the initial load, read the data, and verify that the initial data is returned and the additional data is not returned. (Avro-only; see the snapshot sketch below.)
  • Column-partitioned table test: create a column-partitioned table, read from the table with a partition filter specified, and verify that only the expected data is returned. (Avro-only)
  • Naturally-partitioned table test: create a date-partitioned table, read from the table with a partition filter specified, and verify that only the expected data is returned.
  • Data decoding: create a table with at least one field of each type supported by BigQuery -- including numeric, geographic, etc. -- and verify that the fields are decoded successfully.
  • Resuming a read at an offset: for example, a test using the Shakespeare samples table that reads half-way through a stream and then resumes at the recorded offset (see the sketch below).

Many of these tests can be implemented using a pre-created sample table if appropriate -- the correctness, filtering, and column-selection tests use the Shakespeare samples table in our internal tests, for example.
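
As a rough illustration, the first three cases can share one read session. This is a minimal sketch, assuming the v1 google-cloud-bigquery-storage client with fastavro installed for Avro decoding; the billing project ("my-test-project") and the asserted counts are placeholders, not real fixture values.

    # Sketch: simple correctness, push-down filtering, and column selection
    # exercised in one read session. Placeholder project; assumes v1 client.
    from google.cloud import bigquery_storage

    client = bigquery_storage.BigQueryReadClient()

    requested_session = bigquery_storage.types.ReadSession(
        table="projects/bigquery-public-data/datasets/samples/tables/shakespeare",
        data_format=bigquery_storage.types.DataFormat.AVRO,
        read_options=bigquery_storage.types.ReadSession.TableReadOptions(
            # Column selection: only these fields should come back.
            selected_fields=["word", "word_count"],
            # Push-down filter: non-matching rows are dropped server-side.
            row_restriction='corpus = "hamlet"',
        ),
    )
    session = client.create_read_session(
        parent="projects/my-test-project",  # placeholder billing project
        read_session=requested_session,
        max_stream_count=1,
    )

    reader = client.read_rows(session.streams[0].name)
    rows = list(reader.rows(session))
    assert len(rows) > 0  # a real test asserts the known fixture row count
    assert set(rows[0].keys()) == {"word", "word_count"}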
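
The snapshot test can pin the read to the timestamp captured after the initial load via the session's table modifiers. A minimal sketch under the same assumptions, with placeholder table name and timestamp; the snapshot time must fall within the table's time-travel window:

    # Sketch: snapshot read as of the initial load, so rows appended later
    # must not appear. All names and the timestamp are placeholders.
    import datetime

    from google.cloud import bigquery_storage

    client = bigquery_storage.BigQueryReadClient()
    snapshot_time = datetime.datetime(2019, 8, 1, tzinfo=datetime.timezone.utc)

    session = client.create_read_session(
        parent="projects/my-test-project",  # placeholder billing project
        read_session=bigquery_storage.types.ReadSession(
            table="projects/my-test-project/datasets/my_dataset/tables/snapshot_test",
            data_format=bigquery_storage.types.DataFormat.AVRO,
            # Read the table as of this timestamp.
            table_modifiers=bigquery_storage.types.ReadSession.TableModifiers(
                snapshot_time=snapshot_time,
            ),
        ),
        max_stream_count=1,
    )
    rows = list(client.read_rows(session.streams[0].name).rows(session))
    assert len(rows) > 0  # a real test asserts only the initial-load rows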
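
Offset-based resumption can be exercised by stopping mid-stream and re-opening the same stream at the recorded row offset. Again a minimal sketch, assuming the v1 client; the cutoff value is arbitrary:

    # Sketch: read part of a stream, then resume at the recorded offset;
    # the two passes together should cover the table exactly once.
    from google.cloud import bigquery_storage

    client = bigquery_storage.BigQueryReadClient()
    session = client.create_read_session(
        parent="projects/my-test-project",  # placeholder billing project
        read_session=bigquery_storage.types.ReadSession(
            table="projects/bigquery-public-data/datasets/samples/tables/shakespeare",
            data_format=bigquery_storage.types.DataFormat.AVRO,
        ),
        max_stream_count=1,
    )
    stream = session.streams[0].name

    resume_at = 5000  # arbitrary cutoff for the sketch
    count = 0
    for _row in client.read_rows(stream).rows(session):
        count += 1
        if count == resume_at:
            break

    # Re-open the same stream, skipping the rows already consumed.
    remainder = sum(
        1 for _ in client.read_rows(stream, offset=resume_at).rows(session)
    )
    assert count + remainder > 0  # a real test compares to the table row count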

Long-running tests

  • Open a set of streams in parallel and read the full contents of a large table -- our google3 internal equivalent uses the Wikipedia sample table (about 35 GB) and runs for 10 to 20 minutes. This test should detect issues with long-running streams and should eventually add coverage for transparent stream resumption in Java and Python; a sketch of the parallel read follows. [swast] I don't think we want to block client presubmits with a 10-to-20-minute system test. I'll work with @shollyman and the backend team to find a more appropriate home for these long-running tests.
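
A minimal sketch of the parallel read, under the same v1-client assumptions; the stream count, billing project, and final assertion are placeholders:

    # Sketch: drain several streams in parallel, one thread per stream, and
    # check that the per-stream counts sum to the table's total row count.
    from concurrent import futures

    from google.cloud import bigquery_storage

    client = bigquery_storage.BigQueryReadClient()
    session = client.create_read_session(
        parent="projects/my-test-project",  # placeholder billing project
        read_session=bigquery_storage.types.ReadSession(
            table="projects/bigquery-public-data/datasets/samples/tables/wikipedia",
            data_format=bigquery_storage.types.DataFormat.AVRO,
        ),
        max_stream_count=16,  # upper bound; the backend may return fewer
    )

    def drain(stream_name):
        """Read one stream to completion and return its row count."""
        return sum(1 for _ in client.read_rows(stream_name).rows(session))

    with futures.ThreadPoolExecutor(max_workers=len(session.streams)) as pool:
        counts = list(pool.map(drain, (s.name for s in session.streams)))

    assert sum(counts) > 0  # a real test asserts the table's known row count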

This was originally filed internally as bug 133243219.

Labels

  • api: bigquerystorage (Issues related to the BigQuery Storage API.)
  • type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)
