python-docs-samples/dlp at master · michaelawyu/python-docs-samples · GitHub

Google Data Loss Prevention Python Samples

This directory contains samples for Google Data Loss Prevention, which provides programmatic access to a detection engine for personally identifiable information and other privacy-sensitive data in unstructured data streams. This API is currently in beta.

Setup

Authentication

This sample requires you to have authentication set up. Refer to the Authentication Getting Started Guide for instructions on setting up credentials for applications.
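
As a quick pre-flight check before running the samples, the sketch below looks for a service-account key path in the environment. It assumes credentials are supplied through the GOOGLE_APPLICATION_CREDENTIALS environment variable, which Application Default Credentials consult; this README does not name the variable, so treat the guide as authoritative. The helper name credentials_path is hypothetical.

```python
import os


def credentials_path(environ=None):
    """Return the configured key-file path, or None if it is not set.

    Assumption: credentials are provided via GOOGLE_APPLICATION_CREDENTIALS.
    Other credential sources (for example gcloud user credentials) are not
    detected by this check.
    """
    environ = os.environ if environ is None else environ
    # An empty string is treated the same as an unset variable.
    return environ.get("GOOGLE_APPLICATION_CREDENTIALS") or None


if __name__ == "__main__":
    path = credentials_path()
    if path is None:
        print("GOOGLE_APPLICATION_CREDENTIALS is not set.")
    else:
        print("Using credentials file: {}".format(path))
```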

Install Dependencies

  1. Install pip and virtualenv if you do not already have them. You may want to refer to the Python Development Environment Setup Guide for Google Cloud Platform for instructions.
  2. Create a virtualenv. Samples are compatible with Python 2.7 and 3.4+.

    $ virtualenv env
    $ source env/bin/activate
  3. Install the dependencies needed to run the samples.

    $ pip install -r requirements.txt

Samples

Quickstart

To run this sample:

$ python quickstart.py

Inspect Content

To run this sample:

$ python inspect_content.py

usage: inspect_content.py [-h] {string,file,gcs,datastore,bigquery} ...

Sample app that uses the Data Loss Prevention API to inspect a string, a local
file or a file on Google Cloud Storage.

positional arguments:
  {string,file,gcs,datastore,bigquery}
                        Select how to submit content to the API.
    string              Inspect a string.
    file                Inspect a local file.
    gcs                 Inspect files on Google Cloud Storage.
    datastore           Inspect files on Google Datastore.
    bigquery            Inspect files on Google BigQuery.

optional arguments:
  -h, --help            show this help message and exit
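
The usage text above is the kind of output argparse produces for a parser with subcommands. The following is a minimal sketch of such a layout, reconstructed only from the help text; it is not the sample's actual source, the function name build_parser and the dest name "content" are hypothetical, and the per-subcommand arguments (elided as "..." in the usage line) are left out.

```python
import argparse

# Subcommand names and help strings are copied from the usage text above.
SUBCOMMANDS = [
    ("string", "Inspect a string."),
    ("file", "Inspect a local file."),
    ("gcs", "Inspect files on Google Cloud Storage."),
    ("datastore", "Inspect files on Google Datastore."),
    ("bigquery", "Inspect files on Google BigQuery."),
]


def build_parser():
    """Build a parser whose first positional argument selects a subcommand."""
    parser = argparse.ArgumentParser(prog="inspect_content.py")
    subparsers = parser.add_subparsers(
        dest="content",  # hypothetical attribute name for the chosen subcommand
        help="Select how to submit content to the API.",
    )
    # Setting the attribute makes the subcommand mandatory.
    subparsers.required = True
    for name, help_text in SUBCOMMANDS:
        subparsers.add_parser(name, help=help_text)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Selected content source: {}".format(args.content))
```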

Redact Content

To run this sample:

$ python redact.py

usage: redact.py [-h] [--project PROJECT] [--info_types INFO_TYPES]
                 [--min_likelihood {LIKELIHOOD_UNSPECIFIED,VERY_UNLIKELY,UNLIKELY,POSSIBLE,LIKELY,VERY_LIKELY}]
                 [--mime_type MIME_TYPE]
                 filename output_filename

Sample app that uses the Data Loss Prevent API to redact the contents of a
string or an image file.

positional arguments:
  filename              The path to the file to inspect.
  output_filename       The path to which the redacted image will be written.

optional arguments:
  -h, --help            show this help message and exit
  --project PROJECT     The Google Cloud project id to use as a parent
                        resource.
  --info_types INFO_TYPES
                        Strings representing info types to look for. A full
                        list of info categories and types is available from
                        the API. Examples include "FIRST_NAME", "LAST_NAME",
                        "EMAIL_ADDRESS". If unspecified, the three above
                        examples will be used.
  --min_likelihood {LIKELIHOOD_UNSPECIFIED,VERY_UNLIKELY,UNLIKELY,POSSIBLE,LIKELY,VERY_LIKELY}
                        A string representing the minimum likelihood threshold
                        that constitutes a match.
  --mime_type MIME_TYPE
                        The MIME type of the file. If not specified, the type
                        is inferred via the Python standard library's
                        mimetypes module.
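
The --mime_type help text says the type is inferred with the standard library's mimetypes module when the flag is omitted. A minimal sketch of that fallback, assuming the inference is based on the file name, might look like the following. The helper name resolve_mime_type is hypothetical and is not taken from redact.py.

```python
import mimetypes


def resolve_mime_type(filename, mime_type=None):
    """Return the explicit MIME type if given, else guess it from the name.

    mimetypes.guess_type returns a (type, encoding) tuple; the type is None
    when the extension is not recognized.
    """
    if mime_type is not None:
        return mime_type
    guessed_type, _encoding = mimetypes.guess_type(filename)
    return guessed_type


if __name__ == "__main__":
    print(resolve_mime_type("photo.png"))
    print(resolve_mime_type("upload.bin", mime_type="image/png"))
```

Because the guess depends only on the file extension, passing --mime_type explicitly is the safer choice for files with missing or unusual extensions.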

Display Metadata

To run this sample:

$ python metadata.py

usage: metadata.py [-h] [--language_code LANGUAGE_CODE] [--filter FILTER]

Sample app that queries the Data Loss Prevention API for supported categories
and info types.

optional arguments:
  -h, --help            show this help message and exit
  --language_code LANGUAGE_CODE
                        The BCP-47 language code to use, e.g. 'en-US'.
  --filter FILTER       An optional filter to only return info types supported
                        by certain parts of the API. Defaults to
                        "supported_by=INSPECT".

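The optional arguments above can be modelled with a small argparse parser. This sketch is reconstructed from the help text only and is not the source of metadata.py. The default for --filter comes from the help text; the default of None for --language_code is an assumption, since the help text does not state one.

```python
import argparse


def build_parser():
    """Build a parser mirroring the two optional flags shown above."""
    parser = argparse.ArgumentParser(prog="metadata.py")
    parser.add_argument(
        "--language_code",
        default=None,  # assumption: the help text does not state a default
        help="The BCP-47 language code to use, e.g. 'en-US'.",
    )
    parser.add_argument(
        "--filter",
        default="supported_by=INSPECT",  # default stated in the help text
        help="An optional filter to only return info types supported by "
             "certain parts of the API.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.language_code, args.filter)
```
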
The client library

This sample uses the Google Cloud Client Library for Python. You can read the documentation for more details on API usage and use GitHub to browse the source and report issues.
