BigQuery upload_from_file unicode file-like must be opened in binary-mode if it's more than RESUMABLE_UPLOAD_THRESHOLD, otherwise str-mode. · Issue #1760 · googleapis/google-cloud-python · GitHub

@joar

Description

Steps to reproduce

import sys
from gcloud.bigquery import Client, SchemaField
from gcloud.bigquery.job import CreateDisposition, WriteDisposition

csv_filename = 'sandwiches.csv'

if len(sys.argv) > 1:
    csv_filename = sys.argv[1]

bq = Client()

ds = bq.dataset('test_unicode')
ds.location = 'EU'


def test_unicode_upload(filename, mode):

    if not ds.exists():
        print('Creating dataset: {}'.format(ds.name))
        ds.create()

    fields = [
        SchemaField('name', 'STRING'),
        SchemaField('main_ingredient', 'STRING'),
    ]

    table = ds.table('sandwiches', fields)

    print('Uploading CSV: {}, mode={!r}'.format(filename, mode))
    table.upload_from_file(
        open(filename, mode),
        encoding='UTF-8',
        source_format='CSV',
        write_disposition=WriteDisposition.WRITE_TRUNCATE,
        create_disposition=CreateDisposition.CREATE_IF_NEEDED)

test_unicode_upload(csv_filename, 'r') # Works

test_unicode_upload(csv_filename, 'rb')
# Fails in http.client.HTTPConnection()._send_request
#
# /usr/lib/python3.4/http/client.py in _send_request(self, method, url,
# body, headers)
#    1178         if isinstance(body, str):
#    1179             # RFC 2616 Section 3.7.1 says that text default has a
#    1180             # default charset of iso-8859-1.
# -> 1181             body = body.encode('iso-8859-1')
#    1182         self.endheaders(body)
#
# UnicodeEncodeError: 'latin-1' codec can't encode characters in position
# 649-650: ordinal not in range(256)

sandwiches.csv

name,main_ingredient
Räksmörgås,Räkor
Baguette,Bröd

Expected behavior

When I pass in a binary-mode file-like object, I expect upload_from_file to pass the data through to BigQuery as-is, and I expect the BigQuery load job to decode it using the encoding= argument.
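
To make the failure mode in the traceback concrete, here is a minimal sketch of the str handling that http.client performs on a request body. The helper name to_request_body is hypothetical and only mirrors the two lines quoted from _send_request; it is not gcloud's code path, and it does not explain how a str body came to be built from a file opened with 'rb'.

```python
def to_request_body(body):
    """Mirror the str handling quoted from http.client._send_request.

    Hypothetical helper for illustration only; not part of gcloud.
    """
    if isinstance(body, str):
        # str bodies are encoded as ISO-8859-1 (Latin-1) before sending.
        return body.encode('iso-8859-1')
    # bytes bodies are sent unchanged.
    return body


row = 'Räksmörgås,Räkor\n'
utf8_bytes = row.encode('utf-8')

# Bytes pass through untouched, so the load job can decode them as UTF-8.
assert to_request_body(utf8_bytes) == utf8_bytes

# A str limited to Latin-1 characters is encoded without error, but the
# resulting bytes are Latin-1, not the UTF-8 the load job was told to expect.
assert to_request_body(row) != utf8_bytes

# A str containing any character above U+00FF cannot be encoded at all.
try:
    to_request_body('Sm\u00f6rg\u00e5s \u2603\n')  # U+2603 is outside Latin-1
except UnicodeEncodeError as exc:
    print('encode failed:', exc.reason)
```

Under these assumptions, the expected behavior amounts to the body staying bytes all the way to the HTTP layer, so that no implicit Latin-1 encoding step runs.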

Environment

$ python --version
Python 3.4.3+
$ pip freeze | egrep 'httplib2|gcloud'
gcloud==0.13.0
httplib2==0.9.2

Labels: api: bigquery, api: storage