> ```sh
> # (Optional) Enable to use Kaniko cache by default.
> gcloud config set builds/use_kaniko True
> ```

Cloud Build allows you to
[build a Docker image using a `Dockerfile`](https://cloud.google.com/cloud-build/docs/quickstart-docker#build_using_dockerfile)
and save it into
[Container Registry](https://cloud.google.com/container-registry/),
where the image is accessible to other Google Cloud products.

```sh
export TEMPLATE_IMAGE="gcr.io/$PROJECT/samples/dataflow/streaming-beam:latest"

# Build the image into Container Registry. This is roughly equivalent to:
#   gcloud auth configure-docker
#   docker image build -t $TEMPLATE_IMAGE .
#   docker push $TEMPLATE_IMAGE
gcloud builds submit --tag "$TEMPLATE_IMAGE" .
```

Images whose names start with `gcr.io/PROJECT/` are saved into your
project's Container Registry.

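If you want to confirm the image was pushed, you can list its tags.
This is just an optional sanity check, not part of the walkthrough itself:

```sh
# List the tags of the template image to verify the build succeeded.
gcloud container images list-tags "gcr.io/$PROJECT/samples/dataflow/streaming-beam"
```
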
### Creating a Flex Template

To run a template, you need to create a *template spec* file containing all the
necessary information to run the job, such as the SDK information and metadata.

The [`metadata.json`](metadata.json) file contains additional information for
the template, such as its "name", "description", and input "parameters" fields.

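As a rough sketch, a metadata file for this pipeline could look like the
following; the field values here are illustrative only, so refer to the
actual [`metadata.json`](metadata.json) in this sample for the real contents:

```json
{
  "name": "Streaming Beam",
  "description": "Streaming Beam example for Dataflow Flex Templates.",
  "parameters": [
    {
      "name": "input_subscription",
      "label": "Input Pub/Sub subscription.",
      "helpText": "Name of the input Pub/Sub subscription to consume from."
    },
    {
      "name": "output_table",
      "label": "BigQuery output table.",
      "helpText": "Output BigQuery table, in the form of project:dataset.table."
    }
  ]
}
```
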
The template spec file must be created in a Cloud Storage location,
and is used to run a new Dataflow job.

```sh
export TEMPLATE_PATH="gs://$BUCKET/samples/dataflow/templates/streaming-beam.json"

# Build the Flex Template.
gcloud beta dataflow flex-template build "$TEMPLATE_PATH" \
    --image "$TEMPLATE_IMAGE" \
    --sdk-language "PYTHON" \
    --metadata-file "metadata.json"
```

The template is now available through the template file in the Cloud Storage
location that you specified.

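If you are curious what the generated spec contains, you can print it
directly from Cloud Storage; this is an optional check:

```sh
# Print the generated template spec file.
gsutil cat "$TEMPLATE_PATH"
```
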
### Running a Dataflow Flex Template pipeline

You can now run the Apache Beam pipeline in Dataflow by referring to the
template file and passing the template
[parameters](https://cloud.google.com/dataflow/docs/guides/specifying-exec-params#setting-other-cloud-dataflow-pipeline-options)
required by the pipeline.

```sh
# Run the Flex Template.
gcloud beta dataflow flex-template run "streaming-beam-$(date +%Y%m%d-%H%M%S)" \
    --template-file-gcs-location "$TEMPLATE_PATH" \
    --parameters "input_subscription=$SUBSCRIPTION,output_table=$PROJECT:$DATASET.$TABLE"
```

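To confirm the job launched, you can list the active Dataflow jobs; this
check is an optional addition to the walkthrough:

```sh
# List active Dataflow jobs; the new streaming-beam job should show up here.
gcloud dataflow jobs list --status active
```
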
Check the results in BigQuery by running the following query:

```sh
bq query --use_legacy_sql=false 'SELECT * FROM `'"$PROJECT.$DATASET.$TABLE"'`'
```

While this pipeline is running, you can see new rows appended to the BigQuery
table every minute.

You can manually publish more messages from the
[Cloud Scheduler page](https://console.cloud.google.com/cloudscheduler)
to see how that affects the page review scores.

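If you prefer not to use the console, you can also force-run the scheduler
jobs from the command line; this assumes the jobs are named as in the
cleanup section below:

```sh
# Force-run a Cloud Scheduler job to publish a message immediately.
gcloud scheduler jobs run positive-ratings-publisher
```
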
You can also publish messages directly to a topic through the
[Pub/Sub topics page](https://console.cloud.google.com/cloudpubsub/topic/list)
by selecting the topic you want to publish to
and then clicking the "Publish message" button at the top.
This way you can test your pipeline with different URLs;
just make sure you pass valid JSON data, since this sample does no
error handling for the sake of code simplicity.

Try sending the following message, and check the BigQuery table again about
a minute later.

```json
{"url": "https://cloud.google.com/bigquery/", "review": "positive"}
```

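If you prefer the command line, the same message can be published with
`gcloud`, assuming `$TOPIC` is still set from the setup steps:

```sh
# Publish a test message directly to the Pub/Sub topic.
gcloud pubsub topics publish "$TOPIC" \
    --message '{"url": "https://cloud.google.com/bigquery/", "review": "positive"}'
```
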
### Cleaning up

After you've finished this tutorial, you can clean up the resources you created
on Google Cloud so you won't be billed for them in the future.
The following sections describe how to delete or turn off these resources.

#### Clean up the Flex template resources

1. Stop the Dataflow pipeline.

    ```sh
    gcloud dataflow jobs list \
        --filter 'NAME:streaming-beam AND STATE=Running' \
        --format 'value(JOB_ID)' \
      | xargs gcloud dataflow jobs cancel
    ```

1. Delete the template spec file from Cloud Storage.

    ```sh
    gsutil rm "$TEMPLATE_PATH"
    ```

1. Delete the Flex Template container image from Container Registry.

    ```sh
    gcloud container images delete "$TEMPLATE_IMAGE" --force-delete-tags
    ```

#### Clean up Google Cloud project resources

1. Delete the Cloud Scheduler jobs.

    ```sh
    gcloud scheduler jobs delete negative-ratings-publisher
    gcloud scheduler jobs delete positive-ratings-publisher
    ```

1. Delete the Pub/Sub subscription and topic.

    ```sh
    gcloud pubsub subscriptions delete "$SUBSCRIPTION"
    gcloud pubsub topics delete "$TOPIC"
    ```

| 253 | + |
| 254 | +1. Delete the BigQuery table. |
| 255 | + |
| 256 | + ```sh |
| 257 | + bq rm -f -t $PROJECT:$DATASET.$TABLE |
| 258 | + ``` |
| 259 | + |
1. Delete the BigQuery dataset. This alone does not incur any charges.

    > ⚠️ The following command also deletes all tables in the dataset.
    > The tables and data cannot be recovered.
    >
    > ```sh
    > bq rm -r -f -d "$PROJECT:$DATASET"
    > ```

1. Delete the Cloud Storage bucket. This alone does not incur any charges.

    > ⚠️ The following command also deletes all objects in the bucket.
    > These objects cannot be recovered.
    >
    > ```sh
    > gsutil rm -r "gs://$BUCKET"
    > ```

## Limitations

* You must use a Google-provided base image to package your containers using Docker.
* You cannot update streaming jobs that use Flex Templates.
* You cannot use FlexRS for Flex Template jobs.

📝 Docs: [Using Flex Templates](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates)