Recommendations on GCP with TensorFlow and WALS with Cloud Composer


This lab is adapted from the original solution created by lukmanr

This project deploys a solution for a recommendation service on GCP, using the WALS algorithm in TensorFlow. Components include:

  • Recommendation model code, and scripts to train and tune the model on ML Engine
  • A REST endpoint using Google Cloud Endpoints for serving recommendations
  • An Airflow server managed by Cloud Composer for running scheduled model training

Confirm Prerequisites

Create a Cloud Composer Instance

  • Create a Cloud Composer instance
    1. Specify 'composer' for name
    2. Choose a location
    3. Keep the remaining settings at their defaults
    4. Select Create

This takes 15 - 20 minutes. Continue with the rest of the lab as you will be using Cloud Composer near the end.

In [1]:
%%bash
pip install sh --upgrade pip # needed to execute shell scripts later
Requirement already up-to-date: sh in /opt/conda/lib/python3.7/site-packages (1.13.1)
Requirement already up-to-date: pip in /opt/conda/lib/python3.7/site-packages (20.1)

Setup environment variables

__Replace the below settings with your own.__ Note: you can leave AIRFLOW_BUCKET blank and come back to it after your Composer instance is created which automatically will create an Airflow bucket for you.

1. Make a GCS bucket with the name recserve_[YOUR-PROJECT-ID]:

In [2]:
import os
PROJECT = 'my-project' # REPLACE WITH YOUR PROJECT ID
REGION = 'us-central1' # REPLACE WITH YOUR REGION e.g. us-central1

# do not change these
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = 'recserve_' + PROJECT
os.environ['REGION'] = REGION
In [3]:
%%bash

gcloud config set project $PROJECT
gcloud config set compute/region $REGION
Updated property [core/project].
Updated property [compute/region].
In [4]:
%%bash

# create GCS bucket with recserve_PROJECT_NAME if not exists
exists=$(gsutil ls -d | grep -w gs://${BUCKET}/)
if [ -n "$exists" ]; then
   echo "Not creating recserve_bucket since it already exists."
else
   echo "Creating recserve_bucket"
   gsutil mb -l ${REGION} gs://${BUCKET}
fi
Not creating recserve_bucket since it already exists.

Setup Google App Engine permissions

  1. In IAM, change permissions for "Compute Engine default service account" from Editor to Owner. This is required so you can create and deploy App Engine versions from within Cloud Datalab. Note: the alternative is to run all app engine commands directly in Cloud Shell instead of from within Cloud Datalab.

  2. Create an App Engine instance if you have not already by uncommenting and running the below code

In [ ]:
# %%bash
# run app engine creation commands
# gcloud app create --region ${REGION} # see: https://cloud.google.com/compute/docs/regions-zones/
# gcloud app update --no-split-health-checks

Part One: Setup and Train the WALS Model

Upload sample data to BigQuery

This tutorial comes with a sample Google Analytics data set, containing page tracking events from the Austrian news site Kurier.at. The schema file '''ga_sessions_sample_schema.json''' is located in the folder data in the tutorial code, and the data file '''ga_sessions_sample.json.gz''' is located in a public Cloud Storage bucket associated with this tutorial. To upload this data set to BigQuery:

Copy sample data files into our bucket

In [5]:
%%bash

gsutil -m cp gs://cloud-training-demos/courses/machine_learning/deepdive/10_recommendation/endtoend/data/ga_sessions_sample.json.gz gs://${BUCKET}/data/ga_sessions_sample.json.gz
gsutil -m cp gs://cloud-training-demos/courses/machine_learning/deepdive/10_recommendation/endtoend/data/recommendation_events.csv data/recommendation_events.csv
gsutil -m cp gs://cloud-training-demos/courses/machine_learning/deepdive/10_recommendation/endtoend/data/recommendation_events.csv gs://${BUCKET}/data/recommendation_events.csv
Copying gs://cloud-training-demos/courses/machine_learning/deepdive/10_recommendation/endtoend/data/ga_sessions_sample.json.gz [Content-Type=application/json]...
- [1/1 files][121.3 MiB/121.3 MiB] 100% Done                                    
Operation completed over 1 objects/121.3 MiB.                                    
Copying gs://cloud-training-demos/courses/machine_learning/deepdive/10_recommendation/endtoend/data/recommendation_events.csv...
/ [1/1 files][ 10.0 MiB/ 10.0 MiB] 100% Done                                    
Operation completed over 1 objects/10.0 MiB.                                     
Copying gs://cloud-training-demos/courses/machine_learning/deepdive/10_recommendation/endtoend/data/recommendation_events.csv [Content-Type=text/csv]...
- [1/1 files][ 10.0 MiB/ 10.0 MiB] 100% Done                                    
Operation completed over 1 objects/10.0 MiB.                                     

2. Create empty BigQuery dataset and load sample JSON data

Note: Ingesting the 400K rows of sample data. This usually takes 5-7 minutes.

In [6]:
%%bash

# create BigQuery dataset if it doesn't already exist
exists=$(bq ls -d | grep -w GA360_test)
if [ -n "$exists" ]; then
   echo "Not creating GA360_test since it already exists."
else
   echo "Creating GA360_test dataset."
   bq --project_id=${PROJECT} mk GA360_test 
fi

# create the schema and load our sample Google Analytics session data
bq load --source_format=NEWLINE_DELIMITED_JSON \
 GA360_test.ga_sessions_sample \
 gs://${BUCKET}/data/ga_sessions_sample.json.gz \
 data/ga_sessions_sample_schema.json # can't load schema files from GCS
Not creating GA360_test since it already exists.
Waiting on bqjob_r3f8842b47668a331_00000171e5d3e90c_1 ... (194s) Current status: DONE   

Install WALS model training package and model data

1. Create a distributable package. Copy the package up to the code folder in the bucket you created previously.

In [7]:
%%bash

cd wals_ml_engine

echo "creating distributable package"
python setup.py sdist

echo "copying ML package to bucket"
gsutil cp dist/wals_ml_engine-0.1.tar.gz gs://${BUCKET}/code/
creating distributable package
running sdist
running egg_info
creating wals_ml_engine.egg-info
writing wals_ml_engine.egg-info/PKG-INFO
writing dependency_links to wals_ml_engine.egg-info/dependency_links.txt
writing requirements to wals_ml_engine.egg-info/requires.txt
writing top-level names to wals_ml_engine.egg-info/top_level.txt
writing manifest file 'wals_ml_engine.egg-info/SOURCES.txt'
reading manifest file 'wals_ml_engine.egg-info/SOURCES.txt'
writing manifest file 'wals_ml_engine.egg-info/SOURCES.txt'
running check
creating wals_ml_engine-0.1
creating wals_ml_engine-0.1/trainer
creating wals_ml_engine-0.1/wals_ml_engine.egg-info
copying files to wals_ml_engine-0.1...
copying README.md -> wals_ml_engine-0.1
copying setup.py -> wals_ml_engine-0.1
copying trainer/__init__.py -> wals_ml_engine-0.1/trainer
copying trainer/model.py -> wals_ml_engine-0.1/trainer
copying trainer/task.py -> wals_ml_engine-0.1/trainer
copying trainer/util.py -> wals_ml_engine-0.1/trainer
copying trainer/wals.py -> wals_ml_engine-0.1/trainer
copying wals_ml_engine.egg-info/PKG-INFO -> wals_ml_engine-0.1/wals_ml_engine.egg-info
copying wals_ml_engine.egg-info/SOURCES.txt -> wals_ml_engine-0.1/wals_ml_engine.egg-info
copying wals_ml_engine.egg-info/dependency_links.txt -> wals_ml_engine-0.1/wals_ml_engine.egg-info
copying wals_ml_engine.egg-info/requires.txt -> wals_ml_engine-0.1/wals_ml_engine.egg-info
copying wals_ml_engine.egg-info/top_level.txt -> wals_ml_engine-0.1/wals_ml_engine.egg-info
Writing wals_ml_engine-0.1/setup.cfg
creating dist
Creating tar archive
removing 'wals_ml_engine-0.1' (and everything under it)
copying ML package to bucket
warning: check: missing required meta-data: url

warning: check: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) must be supplied

Copying file://dist/wals_ml_engine-0.1.tar.gz [Content-Type=application/x-tar]...
/ [1 files][  8.4 KiB/  8.4 KiB]                                                
Operation completed over 1 objects/8.4 KiB.                                      

2. Run the WALS model on the sample data set:

In [8]:
%%bash

# view the ML train local script before running
cat wals_ml_engine/mltrain.sh
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


usage () {
  echo "usage: mltrain.sh [local | train | tune] [gs://]<input_file>.csv
                  [--data-type ratings|web_views]
                  [--delimiter <delim>]
                  [--use-optimized]
                  [--headers]

Use 'local' to train locally with a local data file, and 'train' and 'tune' to
run on ML Engine.  For ML Engine jobs the input file must reside on GCS.

Optional args:
  --data-type:      Default to 'ratings', meaning MovieLens ratings from 0-5.
                    Set to 'web_views' for Google Analytics data.
  --delimiter:      CSV delimiter, default to '\t'.
  --use-optimized:  Use optimized hyperparamters, default False.
  --headers:        Default False for 'ratings', True for 'web_views'.

Examples:

# train locally with unoptimized hyperparams
./mltrain.sh local ../data/recommendation_events.csv --data-type web_views

# train on ML Engine with optimized hyperparams
./mltrain.sh train gs://rec_serve/data/recommendation_events.csv --data-type web_views --use-optimized

# tune hyperparams on ML Engine:
./mltrain.sh tune gs://rec_serve/data/recommendation_events.csv --data-type web_views
"

}

date

TIME=`date +"%Y%m%d_%H%M%S"`

# CHANGE TO YOUR BUCKET
BUCKET="gs://rec_serve"

if [[ $# < 2 ]]; then
  usage
  exit 1
fi

# set job vars
TRAIN_JOB="$1"
TRAIN_FILE="$2"
JOB_NAME=wals_ml_${TRAIN_JOB}_${TIME}
REGION=us-central1

# add additional args
shift; shift
ARGS="--train-files ${TRAIN_FILE} --verbose-logging $@"

if [[ ${TRAIN_JOB} == "local" ]]; then

  mkdir -p jobs/${JOB_NAME}

  gcloud ml-engine local train \
    --module-name trainer.task \
    --package-path trainer \
    -- \
    --job-dir jobs/${JOB_NAME} \
    ${ARGS}

elif [[ ${TRAIN_JOB} == "train" ]]; then

  gcloud ml-engine jobs submit training ${JOB_NAME} \
    --region $REGION \
    --scale-tier=CUSTOM \
    --job-dir ${BUCKET}/jobs/${JOB_NAME} \
    --module-name trainer.task \
    --package-path trainer \
    --config trainer/config/config_train.json \
    -- \
    ${ARGS}

elif [[ $TRAIN_JOB == "tune" ]]; then

  # set configuration for tuning
  CONFIG_TUNE="trainer/config/config_tune.json"
  for i in $ARGS ; do
    if [[ "$i" == "web_views" ]]; then
      CONFIG_TUNE="trainer/config/config_tune_web.json"
      break
    fi
  done

  gcloud ml-engine jobs submit training ${JOB_NAME} \
    --region ${REGION} \
    --scale-tier=CUSTOM \
    --job-dir ${BUCKET}/jobs/${JOB_NAME} \
    --module-name trainer.task \
    --package-path trainer \
    --config ${CONFIG_TUNE} \
    -- \
    --hypertune \
    ${ARGS}

else
  usage
fi

date
In [9]:
%%bash

cd wals_ml_engine

# train locally with unoptimized hyperparams
./mltrain.sh local ../data/recommendation_events.csv --data-type web_views --use-optimized

# Options if we wanted to train on CMLE. We will do this with Cloud Composer later
# train on ML Engine with optimized hyperparams
# ./mltrain.sh train ../data/recommendation_events.csv --data-type web_views --use-optimized

# tune hyperparams on ML Engine:
# ./mltrain.sh tune ../data/recommendation_events.csv --data-type web_views
Tue May  5 17:17:35 UTC 2020
Tue May  5 17:17:43 UTC 2020
WARNING: The `gcloud ml-engine` commands have been renamed and will soon be removed. Please use `gcloud ai-platform` instead.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jupyter/training-data-analyst/courses/machine_learning/deepdive/10_recommend/endtoend/wals_ml_engine/trainer/task.py", line 22, in <module>
    import model
ModuleNotFoundError: No module named 'model'

This will take a couple minutes, and create a job directory under wals_ml_engine/jobs like "wals_ml_local_20180102_012345/model", containing the model files saved as numpy arrays.

View the locally trained model directory

In [10]:
ls wals_ml_engine/jobs
wals_ml_local_20200505_171735/

3. Copy the model files from this directory to the model folder in the project bucket:

In the case of multiple models, take the most recent (tail -1)

In [11]:
%%bash
export JOB_MODEL=$(find wals_ml_engine/jobs -name "model" | tail -1)
gsutil cp ${JOB_MODEL}/* gs://${BUCKET}/model/
  
echo "Recommendation model file numpy arrays in bucket:"  
gsutil ls gs://${BUCKET}/model/
Recommendation model file numpy arrays in bucket:
gs://recserve_qwiklabs-gcp-01-3486430ce6e3/model/initrd.img
gs://recserve_qwiklabs-gcp-01-3486430ce6e3/model/initrd.img.old
gs://recserve_qwiklabs-gcp-01-3486430ce6e3/model/vmlinuz
gs://recserve_qwiklabs-gcp-01-3486430ce6e3/model/vmlinuz.old
Omitting directory "file:///bin". (Did you mean to do cp -r?)
Omitting directory "file:///boot". (Did you mean to do cp -r?)
Omitting directory "file:///dev". (Did you mean to do cp -r?)
Omitting directory "file:///etc". (Did you mean to do cp -r?)
Omitting directory "file:///home". (Did you mean to do cp -r?)
Copying file:///initrd.img [Content-Type=application/octet-stream]...
Copying file:///initrd.img.old [Content-Type=application/octet-stream]...       
Omitting directory "file:///lib". (Did you mean to do cp -r?)                   
Omitting directory "file:///lib64". (Did you mean to do cp -r?)
Omitting directory "file:///lost+found". (Did you mean to do cp -r?)
Omitting directory "file:///media". (Did you mean to do cp -r?)
Omitting directory "file:///mnt". (Did you mean to do cp -r?)
Omitting directory "file:///opt". (Did you mean to do cp -r?)
Omitting directory "file:///proc". (Did you mean to do cp -r?)
Omitting directory "file:///root". (Did you mean to do cp -r?)
Omitting directory "file:///run". (Did you mean to do cp -r?)
Omitting directory "file:///sbin". (Did you mean to do cp -r?)
Omitting directory "file:///srv". (Did you mean to do cp -r?)
Omitting directory "file:///sys". (Did you mean to do cp -r?)
Omitting directory "file:///tmp". (Did you mean to do cp -r?)
Omitting directory "file:///usr". (Did you mean to do cp -r?)
Omitting directory "file:///var". (Did you mean to do cp -r?)
Copying file:///vmlinuz [Content-Type=application/octet-stream]...
Copying file:///vmlinuz.old [Content-Type=application/octet-stream]...          
| [4 files][ 42.9 MiB/ 42.9 MiB]                                                
Operation completed over 4 objects/42.9 MiB.                                     

Install the recserve endpoint

1. Prepare the deploy template for the Cloud Endpoint API:

In [12]:
%%bash
cd scripts
cat prepare_deploy_api.sh
#!/bin/bash
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -euo pipefail

source util.sh

main() {
  # Get our working project, or exit if it's not set.
  local project_id=$(get_project_id)
  if [[ -z "$project_id" ]]; then
    exit 1
  fi
  local temp_file=$(mktemp)
  export TEMP_FILE="${temp_file}.yaml"
  mv "$temp_file" "$TEMP_FILE"

  # Because the included API is a template, we have to do some string
  # substitution before we can deploy it. Sed does this nicely.
  < "$API_FILE" sed -E "s/YOUR-PROJECT-ID/${project_id}/g" > "$TEMP_FILE"
  echo "Preparing config for deploying service in $API_FILE..."
  echo "To deploy:  gcloud endpoints services deploy $TEMP_FILE"
}

# Defaults.
API_FILE="../app/openapi.yaml"

if [[ "$#" == 0 ]]; then
  : # Use defaults.
elif [[ "$#" == 1 ]]; then
  API_FILE="$1"
else
  echo "Wrong number of arguments specified."
  echo "Usage: deploy_api.sh [api-file]"
  exit 1
fi

main "$@"
In [13]:
%%bash
printf "\nCopy and run the deploy script generated below:\n"
cd scripts
./prepare_deploy_api.sh                         # Prepare config file for the API.
Copy and run the deploy script generated below:
Preparing config for deploying service in ../app/openapi.yaml...
To deploy:  gcloud endpoints services deploy /tmp/tmp.2QluUYw9AH.yaml

This will output somthing like:

To deploy: gcloud endpoints services deploy /var/folders/1m/r3slmhp92074pzdhhfjvnw0m00dhhl/T/tmp.n6QVl5hO.yaml

2. Run the endpoints deploy command output above:

Be sure to __replace the below [FILE_NAME]__ with the results from above before running.

In [14]:
%%bash
#gcloud endpoints services deploy [REPLACE_WITH_TEMP_FILE_NAME.yaml]
gcloud endpoints services deploy /tmp/tmp.2QluUYw9AH.yaml
Waiting for async operation operations/serviceConfigs.qwiklabs-gcp-01-3486430ce6e3.appspot.com:f8eaa1b9-24a1-4cc1-9146-bec2b0c47e23 to complete...
Operation finished successfully. The following command can describe the Operation details:
 gcloud endpoints operations describe operations/serviceConfigs.qwiklabs-gcp-01-3486430ce6e3.appspot.com:f8eaa1b9-24a1-4cc1-9146-bec2b0c47e23

Waiting for async operation operations/rollouts.qwiklabs-gcp-01-3486430ce6e3.appspot.com:1c2d855a-bc39-4a31-8f80-f9c5545db778 to complete...
Operation finished successfully. The following command can describe the Operation details:
 gcloud endpoints operations describe operations/rollouts.qwiklabs-gcp-01-3486430ce6e3.appspot.com:1c2d855a-bc39-4a31-8f80-f9c5545db778

Service Configuration [2020-05-05r1] uploaded for service [qwiklabs-gcp-01-3486430ce6e3.appspot.com]

To manage your API, go to: https://console.cloud.google.com/endpoints/api/qwiklabs-gcp-01-3486430ce6e3.appspot.com/overview?project=qwiklabs-gcp-01-3486430ce6e3

3. Prepare the deploy template for the App Engine App:

In [15]:
%%bash
# view the app deployment script
cat scripts/prepare_deploy_app.sh
#!/bin/bash
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -euo pipefail

source util.sh

main() {
  # Get our working project, or exit if it's not set.
  local project_id="$(get_project_id)"
  if [[ -z "$project_id" ]]; then
    exit 1
  fi
  # Try to create an App Engine project in our selected region.
  # If it already exists, return a success ("|| true").
  echo "gcloud app create --region=$REGION"
  gcloud app create --region="$REGION" || true

  # Prepare the necessary variables for substitution in our app configuration
  # template, and create a temporary file to hold the templatized version.
  local service_name="${project_id}.appspot.com"
  local config_id=$(get_latest_config_id "$service_name")
  export TEMP_FILE="${APP}_deploy.yaml"
  < "$APP" \
    sed -E "s/SERVICE_NAME/${service_name}/g" \
    | sed -E "s/SERVICE_CONFIG_ID/${config_id}/g" \
    > "$TEMP_FILE"

  echo "To deploy:  gcloud -q app deploy $TEMP_FILE"
}

# Defaults.
APP="../app/app_template.yaml"
REGION="us-east1"
SERVICE_NAME="default"

if [[ "$#" == 0 ]]; then
  : # Use defaults.
elif [[ "$#" == 1 ]]; then
  APP="$1"
elif [[ "$#" == 2 ]]; then
  APP="$1"
  REGION="$2"
else
  echo "Wrong number of arguments specified."
  echo "Usage: deploy_app.sh [app-template] [region]"
  exit 1
fi

main "$@"
In [16]:
%%bash
# prepare to deploy 
cd scripts

./prepare_deploy_app.sh
gcloud app create --region=us-east1
To deploy:  gcloud -q app deploy ../app/app_template.yaml_deploy.yaml
You are creating an app for project [qwiklabs-gcp-01-3486430ce6e3].
WARNING: Creating an App Engine application for a project is irreversible and the region
cannot be changed. More information about regions is at
<https://cloud.google.com/appengine/docs/locations>.

ERROR: (gcloud.app.create) PERMISSION_DENIED: The caller does not have permission

You can ignore the script output "ERROR: (gcloud.app.create) The project [...] already contains an App Engine application. You can deploy your application using gcloud app deploy." This is expected.

The script will output something like:

To deploy: gcloud -q app deploy app/app_template.yaml_deploy.yaml

4. Run the command above:

In [17]:
%%bash
gcloud -q app deploy app/app_template.yaml_deploy.yaml
Services to deploy:

descriptor:      [/home/jupyter/training-data-analyst/courses/machine_learning/deepdive/10_recommend/endtoend/app/app_template.yaml_deploy.yaml]
source:          [/home/jupyter/training-data-analyst/courses/machine_learning/deepdive/10_recommend/endtoend/app]
target project:  [qwiklabs-gcp-01-3486430ce6e3]
target service:  [default]
target version:  [20200505t172051]
target url:      [https://qwiklabs-gcp-01-3486430ce6e3.uc.r.appspot.com]


Beginning deployment of service [default]...
Building and pushing image for service [default]
Started cloud build [a8180df3-6aec-4acd-a276-2caafd1c8899].
To see logs in the Cloud Console: https://console.cloud.google.com/cloud-build/builds/a8180df3-6aec-4acd-a276-2caafd1c8899?project=889358714747
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "a8180df3-6aec-4acd-a276-2caafd1c8899"

FETCHSOURCE
Fetching storage object: gs://staging.qwiklabs-gcp-01-3486430ce6e3.appspot.com/us.gcr.io/qwiklabs-gcp-01-3486430ce6e3/appengine/default.20200505t172051:latest#1588699252387403
Copying gs://staging.qwiklabs-gcp-01-3486430ce6e3.appspot.com/us.gcr.io/qwiklabs-gcp-01-3486430ce6e3/appengine/default.20200505t172051:latest#1588699252387403...
/ [1 files][  3.4 KiB/  3.4 KiB]                                                
Operation completed over 1 objects/3.4 KiB.                                      
BUILD
Starting Step #0
Step #0: Pulling image: gcr.io/gcp-runtimes/python/gen-dockerfile@sha256:76e4c7c235d5acb3ea66227a4ecd6f5b1bff2c53c2d832765af65cf612368db1
Step #0: sha256:76e4c7c235d5acb3ea66227a4ecd6f5b1bff2c53c2d832765af65cf612368db1: Pulling from gcp-runtimes/python/gen-dockerfile
Step #0: d04e4a159eb2: Pulling fs layer
Step #0: 05c79f8d94d2: Pulling fs layer
Step #0: 3c2cba919283: Pulling fs layer
Step #0: 165a366b7bd9: Pulling fs layer
Step #0: 8b12381fd082: Pulling fs layer
Step #0: 5f2d9b8a6c61: Pulling fs layer
Step #0: a1705fbc031f: Pulling fs layer
Step #0: 290eedeabcf5: Pulling fs layer
Step #0: 8f4534289921: Pulling fs layer
Step #0: e8e8949443e6: Pulling fs layer
Step #0: 3048093b82ce: Pulling fs layer
Step #0: c2c220036633: Pulling fs layer
Step #0: 8a87305d5c5c: Pulling fs layer
Step #0: 17c1dd8ad78b: Pulling fs layer
Step #0: 0cdb23aa3f21: Pulling fs layer
Step #0: 33a1afb0142c: Pulling fs layer
Step #0: 165a366b7bd9: Waiting
Step #0: 8b12381fd082: Waiting
Step #0: 5f2d9b8a6c61: Waiting
Step #0: a1705fbc031f: Waiting
Step #0: 290eedeabcf5: Waiting
Step #0: 8f4534289921: Waiting
Step #0: e8e8949443e6: Waiting
Step #0: 3048093b82ce: Waiting
Step #0: c2c220036633: Waiting
Step #0: 8a87305d5c5c: Waiting
Step #0: 17c1dd8ad78b: Waiting
Step #0: 0cdb23aa3f21: Waiting
Step #0: 33a1afb0142c: Waiting
Step #0: 3c2cba919283: Verifying Checksum
Step #0: 3c2cba919283: Download complete
Step #0: 165a366b7bd9: Verifying Checksum
Step #0: 165a366b7bd9: Download complete
Step #0: 8b12381fd082: Verifying Checksum
Step #0: 8b12381fd082: Download complete
Step #0: 05c79f8d94d2: Verifying Checksum
Step #0: 05c79f8d94d2: Download complete
Step #0: d04e4a159eb2: Verifying Checksum
Step #0: d04e4a159eb2: Download complete
Step #0: 290eedeabcf5: Verifying Checksum
Step #0: 290eedeabcf5: Download complete
Step #0: 8f4534289921: Verifying Checksum
Step #0: 8f4534289921: Download complete
Step #0: a1705fbc031f: Verifying Checksum
Step #0: a1705fbc031f: Download complete
Step #0: e8e8949443e6: Verifying Checksum
Step #0: e8e8949443e6: Download complete
Step #0: 3048093b82ce: Verifying Checksum
Step #0: 3048093b82ce: Download complete
Step #0: 5f2d9b8a6c61: Verifying Checksum
Step #0: 5f2d9b8a6c61: Download complete
Step #0: c2c220036633: Verifying Checksum
Step #0: c2c220036633: Download complete
Step #0: 17c1dd8ad78b: Verifying Checksum
Step #0: 17c1dd8ad78b: Download complete
Step #0: 8a87305d5c5c: Verifying Checksum
Step #0: 8a87305d5c5c: Download complete
Step #0: 0cdb23aa3f21: Verifying Checksum
Step #0: 0cdb23aa3f21: Download complete
Step #0: 33a1afb0142c: Verifying Checksum
Step #0: 33a1afb0142c: Download complete
Step #0: d04e4a159eb2: Pull complete
Step #0: 05c79f8d94d2: Pull complete
Step #0: 3c2cba919283: Pull complete
Step #0: 165a366b7bd9: Pull complete
Step #0: 8b12381fd082: Pull complete
Step #0: 5f2d9b8a6c61: Pull complete
Step #0: a1705fbc031f: Pull complete
Step #0: 290eedeabcf5: Pull complete
Step #0: 8f4534289921: Pull complete
Step #0: e8e8949443e6: Pull complete
Step #0: 3048093b82ce: Pull complete
Step #0: c2c220036633: Pull complete
Step #0: 8a87305d5c5c: Pull complete
Step #0: 17c1dd8ad78b: Pull complete
Step #0: 0cdb23aa3f21: Pull complete
Step #0: 33a1afb0142c: Pull complete
Step #0: Digest: sha256:76e4c7c235d5acb3ea66227a4ecd6f5b1bff2c53c2d832765af65cf612368db1
Step #0: Status: Downloaded newer image for gcr.io/gcp-runtimes/python/gen-dockerfile@sha256:76e4c7c235d5acb3ea66227a4ecd6f5b1bff2c53c2d832765af65cf612368db1
Step #0: gcr.io/gcp-runtimes/python/gen-dockerfile@sha256:76e4c7c235d5acb3ea66227a4ecd6f5b1bff2c53c2d832765af65cf612368db1
Finished Step #0
Starting Step #1
Step #1: Pulling image: gcr.io/cloud-builders/docker@sha256:461bb53c226048a2f5eabebe1d8b4367a02d3a484a8cc7455a21377702bbf4f6
Step #1: sha256:461bb53c226048a2f5eabebe1d8b4367a02d3a484a8cc7455a21377702bbf4f6: Pulling from cloud-builders/docker
Step #1: 75f546e73d8b: Already exists
Step #1: 0f3bb76fc390: Already exists
Step #1: 3c2cba919283: Already exists
Step #1: 8944ea7fb66c: Pulling fs layer
Step #1: 8944ea7fb66c: Verifying Checksum
Step #1: 8944ea7fb66c: Download complete
Step #1: 8944ea7fb66c: Pull complete
Step #1: Digest: sha256:461bb53c226048a2f5eabebe1d8b4367a02d3a484a8cc7455a21377702bbf4f6
Step #1: Status: Downloaded newer image for gcr.io/cloud-builders/docker@sha256:461bb53c226048a2f5eabebe1d8b4367a02d3a484a8cc7455a21377702bbf4f6
Step #1: gcr.io/cloud-builders/docker@sha256:461bb53c226048a2f5eabebe1d8b4367a02d3a484a8cc7455a21377702bbf4f6
Step #1: Sending build context to Docker daemon  20.48kB
Step #1: Step 1/9 : FROM gcr.io/google-appengine/python@sha256:55096029b76bcc83e6ddff5e2dc4198df657a912982920f12b9977863eae7173
Step #1: sha256:55096029b76bcc83e6ddff5e2dc4198df657a912982920f12b9977863eae7173: Pulling from google-appengine/python
Step #1: 40a5c2875f88: Pulling fs layer
Step #1: 72be9390242a: Pulling fs layer
Step #1: 3c2cba919283: Pulling fs layer
Step #1: 91d77be5c6ea: Pulling fs layer
Step #1: 29a75d8abe7e: Pulling fs layer
Step #1: 177dc5a458f7: Pulling fs layer
Step #1: 2c2c0146fdfe: Pulling fs layer
Step #1: f148de29d703: Pulling fs layer
Step #1: 1908d2d66a44: Pulling fs layer
Step #1: 87b7849d11e1: Pulling fs layer
Step #1: a63988796a14: Pulling fs layer
Step #1: 91d77be5c6ea: Waiting
Step #1: 29a75d8abe7e: Waiting
Step #1: 177dc5a458f7: Waiting
Step #1: 2c2c0146fdfe: Waiting
Step #1: f148de29d703: Waiting
Step #1: 1908d2d66a44: Waiting
Step #1: 87b7849d11e1: Waiting
Step #1: a63988796a14: Waiting
Step #1: 3c2cba919283: Verifying Checksum
Step #1: 3c2cba919283: Download complete
Step #1: 72be9390242a: Verifying Checksum
Step #1: 72be9390242a: Download complete
Step #1: 91d77be5c6ea: Verifying Checksum
Step #1: 91d77be5c6ea: Download complete
Step #1: 40a5c2875f88: Verifying Checksum
Step #1: 40a5c2875f88: Download complete
Step #1: 29a75d8abe7e: Download complete
Step #1: f148de29d703: Verifying Checksum
Step #1: f148de29d703: Download complete
Step #1: 1908d2d66a44: Verifying Checksum
Step #1: 1908d2d66a44: Download complete
Step #1: 87b7849d11e1: Verifying Checksum
Step #1: 87b7849d11e1: Download complete
Step #1: 2c2c0146fdfe: Verifying Checksum
Step #1: 2c2c0146fdfe: Download complete
Step #1: a63988796a14: Verifying Checksum
Step #1: a63988796a14: Download complete
Step #1: 177dc5a458f7: Verifying Checksum
Step #1: 177dc5a458f7: Download complete
Step #1: 40a5c2875f88: Pull complete
Step #1: 72be9390242a: Pull complete
Step #1: 3c2cba919283: Pull complete
Step #1: 91d77be5c6ea: Pull complete
Step #1: 29a75d8abe7e: Pull complete
Step #1: 177dc5a458f7: Pull complete
Step #1: 2c2c0146fdfe: Pull complete
Step #1: f148de29d703: Pull complete
Step #1: 1908d2d66a44: Pull complete
Step #1: 87b7849d11e1: Pull complete
Step #1: a63988796a14: Pull complete
Step #1: Digest: sha256:55096029b76bcc83e6ddff5e2dc4198df657a912982920f12b9977863eae7173
Step #1: Status: Downloaded newer image for gcr.io/google-appengine/python@sha256:55096029b76bcc83e6ddff5e2dc4198df657a912982920f12b9977863eae7173
Step #1:  ---> f186f86e42ea
Step #1: Step 2/9 : LABEL python_version=python3.6
Step #1:  ---> Running in a929fa16f02b
Step #1: Removing intermediate container a929fa16f02b
Step #1:  ---> a340706da94f
Step #1: Step 3/9 : RUN virtualenv --no-download /env -p python3.6
Step #1:  ---> Running in d4a17810114d
Step #1: Running virtualenv with interpreter /opt/python3.6/bin/python3.6
Step #1: Using base prefix '/opt/python3.6'
Step #1: New python executable in /env/bin/python3.6
Step #1: Also creating executable in /env/bin/python
Step #1: Installing setuptools, pip, wheel...done.
Step #1: Removing intermediate container d4a17810114d
Step #1:  ---> 052f8df7dc38
Step #1: Step 4/9 : ENV VIRTUAL_ENV /env
Step #1:  ---> Running in a33bf81fc3ea
Step #1: Removing intermediate container a33bf81fc3ea
Step #1:  ---> ba82531c8046
Step #1: Step 5/9 : ENV PATH /env/bin:$PATH
Step #1:  ---> Running in 7cb5a97f19cc
Step #1: Removing intermediate container 7cb5a97f19cc
Step #1:  ---> 8a0125e1f737
Step #1: Step 6/9 : ADD requirements.txt /app/
Step #1:  ---> eceee3a05d2e
Step #1: Step 7/9 : RUN pip install -r requirements.txt
Step #1:  ---> Running in 0edd8ce8a950
Step #1: Collecting flask (from -r requirements.txt (line 1))
Step #1:   Downloading https://files.pythonhosted.org/packages/f2/28/2a03252dfb9ebf377f40fba6a7841b47083260bf8bd8e737b0c6952df83f/Flask-1.1.2-py2.py3-none-any.whl (94kB)
Step #1: Collecting gunicorn (from -r requirements.txt (line 2))
Step #1:   Downloading https://files.pythonhosted.org/packages/69/ca/926f7cd3a2014b16870086b2d0fdc84a9e49473c68a8dff8b57f7c156f43/gunicorn-20.0.4-py2.py3-none-any.whl (77kB)
Step #1: Collecting pandas (from -r requirements.txt (line 3))
Step #1:   Downloading https://files.pythonhosted.org/packages/bb/71/8f53bdbcbc67c912b888b40def255767e475402e9df64050019149b1a943/pandas-1.0.3-cp36-cp36m-manylinux1_x86_64.whl (10.0MB)
Step #1: Collecting numpy (from -r requirements.txt (line 4))
Step #1:   Downloading https://files.pythonhosted.org/packages/03/27/e35e7c6e6a52fab9fcc64fc2b20c6b516eba930bb02b10ace3b38200d3ab/numpy-1.18.4-cp36-cp36m-manylinux1_x86_64.whl (20.2MB)
Step #1: Collecting google-cloud-storage==1.6.0 (from -r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/c8/13/131c4d6b72411bcd56ab82a70a256d961e8d87e7b6356c12791c0003765d/google_cloud_storage-1.6.0-py2.py3-none-any.whl (51kB)
Step #1: Collecting Jinja2>=2.10.1 (from flask->-r requirements.txt (line 1))
Step #1:   Downloading https://files.pythonhosted.org/packages/30/9e/f663a2aa66a09d838042ae1a2c5659828bb9b41ea3a6efa20a20fd92b121/Jinja2-2.11.2-py2.py3-none-any.whl (125kB)
Step #1: Collecting Werkzeug>=0.15 (from flask->-r requirements.txt (line 1))
Step #1:   Downloading https://files.pythonhosted.org/packages/cc/94/5f7079a0e00bd6863ef8f1da638721e9da21e5bacee597595b318f71d62e/Werkzeug-1.0.1-py2.py3-none-any.whl (298kB)
Step #1: Collecting click>=5.1 (from flask->-r requirements.txt (line 1))
Step #1:   Downloading https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl (82kB)
Step #1: Collecting itsdangerous>=0.24 (from flask->-r requirements.txt (line 1))
Step #1:   Downloading https://files.pythonhosted.org/packages/76/ae/44b03b253d6fade317f32c24d100b3b35c2239807046a4c953c7b89fa49e/itsdangerous-1.1.0-py2.py3-none-any.whl
Step #1: Requirement already satisfied: setuptools>=3.0 in /env/lib/python3.6/site-packages (from gunicorn->-r requirements.txt (line 2)) (39.1.0)
Step #1: Collecting python-dateutil>=2.6.1 (from pandas->-r requirements.txt (line 3))
Step #1:   Downloading https://files.pythonhosted.org/packages/d4/70/d60450c3dd48ef87586924207ae8907090de0b306af2bce5d134d78615cb/python_dateutil-2.8.1-py2.py3-none-any.whl (227kB)
Step #1: Collecting pytz>=2017.2 (from pandas->-r requirements.txt (line 3))
Step #1:   Downloading https://files.pythonhosted.org/packages/4f/a4/879454d49688e2fad93e59d7d4efda580b783c745fd2ec2a3adf87b0808d/pytz-2020.1-py2.py3-none-any.whl (510kB)
Step #1: Collecting google-resumable-media>=0.3.1 (from google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/35/9e/f73325d0466ce5bdc36333f1aeb2892ead7b76e79bdb5c8b0493961fa098/google_resumable_media-0.5.0-py2.py3-none-any.whl
Step #1: Collecting google-cloud-core<0.29dev,>=0.28.0 (from google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/0f/41/ae2418b4003a14cf21c1c46d61d1b044bf02cf0f8f91598af572b9216515/google_cloud_core-0.28.1-py2.py3-none-any.whl
Step #1: Collecting google-auth>=1.0.0 (from google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/d2/f8/1623d69e5de22e499b68a0cb5e5d02cd6a2843e55acc19f314f48fe04299/google_auth-1.14.1-py2.py3-none-any.whl (89kB)
Step #1: Collecting requests>=2.18.0 (from google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/1a/70/1935c770cb3be6e3a8b78ced23d7e0f3b187f5cbfab4749523ed65d7c9b1/requests-2.23.0-py2.py3-none-any.whl (58kB)
Step #1: Collecting google-api-core<0.2.0dev,>=0.1.1 (from google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/10/65/6237293db4fbf6f0bcf7c2b67c63e4dc4837c631f194064ae84957cd0313/google_api_core-0.1.4-py2.py3-none-any.whl (50kB)
Step #1: Collecting MarkupSafe>=0.23 (from Jinja2>=2.10.1->flask->-r requirements.txt (line 1))
Step #1:   Downloading https://files.pythonhosted.org/packages/b2/5f/23e0023be6bb885d00ffbefad2942bc51a620328ee910f64abe5a8d18dd1/MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl
Step #1: Collecting six>=1.5 (from python-dateutil>=2.6.1->pandas->-r requirements.txt (line 3))
Step #1:   Downloading https://files.pythonhosted.org/packages/65/eb/1f97cb97bfc2390a276969c6fae16075da282f5058082d4cb10c6c5c1dba/six-1.14.0-py2.py3-none-any.whl
Step #1: Collecting rsa<4.1,>=3.1.4 (from google-auth>=1.0.0->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/02/e5/38518af393f7c214357079ce67a317307936896e961e35450b70fad2a9cf/rsa-4.0-py2.py3-none-any.whl
Step #1: Collecting pyasn1-modules>=0.2.1 (from google-auth>=1.0.0->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/95/de/214830a981892a3e286c3794f41ae67a4495df1108c3da8a9f62159b9a9d/pyasn1_modules-0.2.8-py2.py3-none-any.whl (155kB)
Step #1: Collecting cachetools<5.0,>=2.0.0 (from google-auth>=1.0.0->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/b3/59/524ffb454d05001e2be74c14745b485681c6ed5f2e625f71d135704c0909/cachetools-4.1.0-py3-none-any.whl
Step #1: Collecting idna<3,>=2.5 (from requests>=2.18.0->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/89/e3/afebe61c546d18fb1709a61bee788254b40e736cff7271c7de5de2dc4128/idna-2.9-py2.py3-none-any.whl (58kB)
Step #1: Collecting certifi>=2017.4.17 (from requests>=2.18.0->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/57/2b/26e37a4b034800c960a00c4e1b3d9ca5d7014e983e6e729e33ea2f36426c/certifi-2020.4.5.1-py2.py3-none-any.whl (157kB)
Step #1: Collecting chardet<4,>=3.0.2 (from requests>=2.18.0->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
Step #1: Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests>=2.18.0->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/e1/e5/df302e8017440f111c11cc41a6b432838672f5a70aa29227bf58149dc72f/urllib3-1.25.9-py2.py3-none-any.whl (126kB)
Step #1: Collecting protobuf>=3.0.0 (from google-api-core<0.2.0dev,>=0.1.1->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/57/02/5432412c162989260fab61fa65e0a490c1872739eb91a659896e4d554b26/protobuf-3.11.3-cp36-cp36m-manylinux1_x86_64.whl (1.3MB)
Step #1: Collecting googleapis-common-protos<2.0dev,>=1.5.3 (from google-api-core<0.2.0dev,>=0.1.1->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/05/46/168fd780f594a4d61122f7f3dc0561686084319ad73b4febbf02ae8b32cf/googleapis-common-protos-1.51.0.tar.gz
Step #1: Collecting pyasn1>=0.1.3 (from rsa<4.1,>=3.1.4->google-auth>=1.0.0->google-cloud-storage==1.6.0->-r requirements.txt (line 5))
Step #1:   Downloading https://files.pythonhosted.org/packages/62/1e/a94a8d635fa3ce4cfc7f506003548d0a2447ae76fd5ca53932970fe3053f/pyasn1-0.4.8-py2.py3-none-any.whl (77kB)
Step #1: Building wheels for collected packages: googleapis-common-protos
Step #1:   Running setup.py bdist_wheel for googleapis-common-protos: started
Step #1:   Running setup.py bdist_wheel for googleapis-common-protos: finished with status 'done'
Step #1:   Stored in directory: /root/.cache/pip/wheels/2c/f9/7f/6eb87e636072bf467e25348bbeb96849333e6a080dca78f706
Step #1: Successfully built googleapis-common-protos
Step #1: google-auth 1.14.1 has requirement setuptools>=40.3.0, but you'll have setuptools 39.1.0 which is incompatible.
Step #1: Installing collected packages: MarkupSafe, Jinja2, Werkzeug, click, itsdangerous, flask, gunicorn, six, python-dateutil, numpy, pytz, pandas, google-resumable-media, protobuf, idna, certifi, chardet, urllib3, requests, googleapis-common-protos, pyasn1, rsa, pyasn1-modules, cachetools, google-auth, google-api-core, google-cloud-core, google-cloud-storage
Step #1: Successfully installed Jinja2-2.11.2 MarkupSafe-1.1.1 Werkzeug-1.0.1 cachetools-4.1.0 certifi-2020.4.5.1 chardet-3.0.4 click-7.1.2 flask-1.1.2 google-api-core-0.1.4 google-auth-1.14.1 google-cloud-core-0.28.1 google-cloud-storage-1.6.0 google-resumable-media-0.5.0 googleapis-common-protos-1.51.0 gunicorn-20.0.4 idna-2.9 itsdangerous-1.1.0 numpy-1.18.4 pandas-1.0.3 protobuf-3.11.3 pyasn1-0.4.8 pyasn1-modules-0.2.8 python-dateutil-2.8.1 pytz-2020.1 requests-2.23.0 rsa-4.0 six-1.14.0 urllib3-1.25.9
Step #1: You are using pip version 10.0.1, however version 20.1 is available.
Step #1: You should consider upgrading via the 'pip install --upgrade pip' command.
Step #1: Removing intermediate container 0edd8ce8a950
Step #1:  ---> ace7c5e7db32
Step #1: Step 8/9 : ADD . /app/
Step #1:  ---> 1a174250c28a
Step #1: Step 9/9 : CMD exec gunicorn -b :$PORT main:app
Step #1:  ---> Running in 1e0cb098375b
Step #1: Removing intermediate container 1e0cb098375b
Step #1:  ---> 4b7301ba7cc7
Step #1: Successfully built 4b7301ba7cc7
Step #1: Successfully tagged us.gcr.io/qwiklabs-gcp-01-3486430ce6e3/appengine/default.20200505t172051:latest
Finished Step #1
PUSH
Pushing us.gcr.io/qwiklabs-gcp-01-3486430ce6e3/appengine/default.20200505t172051:latest
The push refers to repository [us.gcr.io/qwiklabs-gcp-01-3486430ce6e3/appengine/default.20200505t172051]
43019477e31c: Preparing
a4b874eb63b3: Preparing
b034c2c7e982: Preparing
77cdda64b647: Preparing
01e356cb36de: Preparing
56629965b158: Preparing
ad78b4a1f5bb: Preparing
a61a95908009: Preparing
09528eeaacbd: Preparing
6f65ef95da50: Preparing
494abc772a0f: Preparing
7c2e502fada8: Preparing
84ff92691f90: Preparing
15784232b592: Preparing
f66b9865f45c: Preparing
56629965b158: Waiting
ad78b4a1f5bb: Waiting
a61a95908009: Waiting
09528eeaacbd: Waiting
6f65ef95da50: Waiting
494abc772a0f: Waiting
7c2e502fada8: Waiting
84ff92691f90: Waiting
15784232b592: Waiting
f66b9865f45c: Waiting
01e356cb36de: Layer already exists
56629965b158: Layer already exists
ad78b4a1f5bb: Layer already exists
a61a95908009: Layer already exists
09528eeaacbd: Layer already exists
43019477e31c: Pushed
6f65ef95da50: Layer already exists
b034c2c7e982: Pushed
494abc772a0f: Layer already exists
84ff92691f90: Layer already exists
7c2e502fada8: Layer already exists
15784232b592: Layer already exists
f66b9865f45c: Layer already exists
77cdda64b647: Pushed
a4b874eb63b3: Pushed
latest: digest: sha256:64ae763cd47fa807c18d2c00e38182e268e1d5862c5ebc280722aeac2469caf9 size: 3459
DONE
--------------------------------------------------------------------------------

ERROR: (gcloud.app.deploy) INVALID_ARGUMENT: Legacy health checks are no longer supported for the App Engine Flexible environment. Please remove the 'health_check' section from your app.yaml and configure updated health checks. For instructions on migrating to split health checks see https://cloud.google.com/appengine/docs/flexible/java/migrating-to-split-health-checks

This will take 7 - 10 minutes to deploy the app. While you wait, consider starting on Part Two below and completing the Cloud Composer DAG file.

Query the API for Article Recommendations

Lastly, you are able to test the recommendation model API by submitting a query request. Note the example userId passed and numRecs desired as the URL parameters for the model input.

In [18]:
%%bash
cd scripts
./query_api.sh          # Query the API.
#./generate_traffic.sh   # Send traffic to the API.
curl "https://qwiklabs-gcp-01-3486430ce6e3.appspot.com/recommendation?userId=5448543647176335931&numRecs=5"
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Page not found)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>404.</b> <ins>That’s an error.</ins>
  <p>The requested URL was not found on this server.  <ins>That’s all we know.</ins>

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1551  100  1551    0     0  91235      0 --:--:-- --:--:-- --:--:-- 91235

If the call is successful, you will see the article IDs recommended for that specific user by the WALS ML model
(Example: curl "https://qwiklabs-gcp-12345.appspot.com/recommendation?userId=5448543647176335931&numRecs=5" {"articles":["299824032","1701682","299935287","299959410","298157062"]} )

Part One is done! You have successfully created the back-end architecture for serving your ML recommendation system. But we're not done yet, we still need to automatically retrain and redeploy our model once new data comes in. For that we will use Cloud Composer and Apache Airflow.


Part Two: Setup a scheduled workflow with Cloud Composer

In this section you will complete a partially written training.py DAG file and copy it to the DAGS folder in your Composer instance.

Copy your Airflow bucket name

  1. Navigate to your Cloud Composer instance

  2. Select DAGs Folder

  3. You will be taken to the Google Cloud Storage bucket that Cloud Composer has created automatically for your Airflow instance

  4. Copy the bucket name into the variable below (example: us-central1-composer-08f6edeb-bucket)
In [19]:
AIRFLOW_BUCKET = 'us-central1-mlcomposer-6e2cceb4-bucket' # REPLACE WITH AIRFLOW BUCKET NAME
os.environ['AIRFLOW_BUCKET'] = AIRFLOW_BUCKET

Complete the training.py DAG file

Apache Airflow orchestrates tasks out to other services through a DAG (Directed Acyclic Graph) file which specifies what services to call, what to do, and when to run these tasks. DAG files are written in python and are loaded automatically into Airflow once present in the Airflow/dags/ folder in your Cloud Composer bucket.

Your task is to complete the partially written DAG file below which will enable the automatic retraining and redeployment of our WALS recommendation model.

Complete the #TODOs in the Airflow DAG file below and execute the code block to save the file

In [20]:
%%writefile airflow/dags/training.py

# Copyright 2018 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""DAG definition for recserv model training."""

import airflow
from airflow import DAG

# Reference for all available airflow operators: 
# https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/operators
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator
from airflow.hooks.base_hook import BaseHook
# from airflow.contrib.operators.mlengine_operator import MLEngineTrainingOperator
# above mlengine_operator currently doesnt support custom MasterType so we import our own plugins:

# custom plugins
from airflow.operators.app_engine_admin_plugin import AppEngineVersionOperator
from airflow.operators.ml_engine_plugin import MLEngineTrainingOperator


import datetime

def _get_project_id():
  """Get project ID from default GCP connection."""

  extras = BaseHook.get_connection('google_cloud_default').extra_dejson
  key = 'extra__google_cloud_platform__project'
  if key in extras:
    project_id = extras[key]
  else:
    raise ('Must configure project_id in google_cloud_default '
           'connection from Airflow Console')
  return project_id

PROJECT_ID = _get_project_id()

# Data set constants, used in BigQuery tasks.  You can change these
# to conform to your data.

# TODO: Specify your BigQuery dataset name and table name
DATASET = 'GA360_test'
TABLE_NAME = 'ga_sessions_sample'
ARTICLE_CUSTOM_DIMENSION = '10'

# TODO: Confirm bucket name and region
# GCS bucket names and region, can also be changed.
BUCKET = 'gs://recserve_' + PROJECT_ID
REGION = 'us-east1'

# The code package name comes from the model code in the wals_ml_engine
# directory of the solution code base.
PACKAGE_URI = BUCKET + '/code/wals_ml_engine-0.1.tar.gz'
JOB_DIR = BUCKET + '/jobs'

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['airflow@example.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 5,
    'retry_delay': datetime.timedelta(minutes=5)
}

# Default schedule interval using cronjob syntax - can be customized here
# or in the Airflow console.

# TODO: Specify a schedule interval in CRON syntax to run once a day at 2100 hours (9pm)
# Reference: https://airflow.apache.org/scheduler.html
schedule_interval = '00 21 * * *'

# TODO: Title your DAG to be recommendations_training_v1
dag = DAG('recommendations_training_v1', 
          default_args=default_args,
          schedule_interval=schedule_interval)

dag.doc_md = __doc__


#
#
# Task Definition
#
#

# BigQuery training data query

bql='''
#legacySql
SELECT
 fullVisitorId as clientId,
 ArticleID as contentId,
 (nextTime - hits.time) as timeOnPage,
FROM(
  SELECT
    fullVisitorId,
    hits.time,
    MAX(IF(hits.customDimensions.index={0},
           hits.customDimensions.value,NULL)) WITHIN hits AS ArticleID,
    LEAD(hits.time, 1) OVER (PARTITION BY fullVisitorId, visitNumber
                             ORDER BY hits.time ASC) as nextTime
  FROM [{1}.{2}.{3}]
  WHERE hits.type = "PAGE"
) HAVING timeOnPage is not null and contentId is not null;
'''

bql = bql.format(ARTICLE_CUSTOM_DIMENSION, PROJECT_ID, DATASET, TABLE_NAME)

# TODO: Complete the BigQueryOperator task to truncate the table if it already exists before writing
# Reference: https://airflow.apache.org/integration.html#bigqueryoperator
t1 = BigQueryOperator(
    task_id='bq_rec_training_data',
    bql=bql,
    destination_dataset_table='%s.recommendation_events' % DATASET,
    write_disposition='WRITE_TRUNCATE', # specify to truncate on writes
    dag=dag)

# BigQuery training data export to GCS

# TODO: Fill in the missing operator name for task #2 which
# takes a BigQuery dataset and table as input and exports it to GCS as a CSV
training_file = BUCKET + '/data/recommendation_events.csv'
t2 = BigQueryToCloudStorageOperator(
    task_id='bq_export_op',
    source_project_dataset_table='%s.recommendation_events' % DATASET,
    destination_cloud_storage_uris=[training_file],
    export_format='CSV',
    dag=dag
)


# ML Engine training job

job_id = 'recserve_{0}'.format(datetime.datetime.now().strftime('%Y%m%d%H%M'))
job_dir = BUCKET + '/jobs/' + job_id
output_dir = BUCKET
training_args = ['--job-dir', job_dir,
                 '--train-files', training_file,
                 '--output-dir', output_dir,
                 '--data-type', 'web_views',
                 '--use-optimized']

# TODO: Fill in the missing operator name for task #3 which will
# start a new training job to Cloud ML Engine
# Reference: https://airflow.apache.org/integration.html#cloud-ml-engine
# https://cloud.google.com/ml-engine/docs/tensorflow/machine-types
t3 = MLEngineTrainingOperator(
    task_id='ml_engine_training_op',
    project_id=PROJECT_ID,
    job_id=job_id,
    package_uris=[PACKAGE_URI],
    training_python_module='trainer.task',
    training_args=training_args,
    region=REGION,
    scale_tier='CUSTOM',
    master_type='complex_model_m_gpu',
    dag=dag
)

# App Engine deploy new version

t4 = AppEngineVersionOperator(
    task_id='app_engine_deploy_version',
    project_id=PROJECT_ID,
    service_id='default',
    region=REGION,
    service_spec=None,
    dag=dag
)

# TODO: Be sure to set_upstream dependencies for all tasks
t2.set_upstream(t1)
t3.set_upstream(t2)
t4.set_upstream(t3)
Overwriting airflow/dags/training.py

Copy local Airflow DAG file and plugins into the DAGs folder

In [21]:
%%bash
gsutil cp airflow/dags/training.py gs://${AIRFLOW_BUCKET}/dags # overwrite if it exists
gsutil cp -r airflow/plugins gs://${AIRFLOW_BUCKET} # copy custom plugins
Copying file://airflow/dags/training.py [Content-Type=text/x-python]...
/ [1 files][  5.9 KiB/  5.9 KiB]                                                
Operation completed over 1 objects/5.9 KiB.                                      
Copying file://airflow/plugins/gae_admin_plugin.py [Content-Type=text/x-python]...
Copying file://airflow/plugins/ml_engine_plugin.py [Content-Type=text/x-python]...
/ [2 files][ 17.9 KiB/ 17.9 KiB]                                                
Operation completed over 2 objects/17.9 KiB.                                     
  1. Navigate to your Cloud Composer instance

  2. Trigger a manual run of your DAG for testing

  3. Ensure your DAG runs successfully (all nodes outlined in dark green and 'success' tag shows)

Successful Airflow DAG run

Troubleshooting your DAG

DAG not executing successfully? Follow these below steps to troubleshoot.

Click on the name of a DAG to view a run (ex: recommendations_training_v1)

  1. Select a node in the DAG (red or yellow borders mean failed nodes)
  2. Select View Log
  3. Scroll to the bottom of the log to diagnose
  4. X Option: Clear and immediately restart the DAG after diagnosing the issue

Tips:

  • If bq_rec_training_data immediately fails without logs, your DAG file is missing key parts and is not compiling
  • ml_engine_training_op will take 9 - 12 minutes to run. Monitor the training job in ML Engine
  • Lastly, check the solution endtoend.ipynb to compare your lab answers

Viewing Airflow logs

Congratulations!

You have made it to the end of the end-to-end recommendation system lab. You have successfully setup an automated workflow to retrain and redeploy your recommendation model.


Challenges

Looking to solidify your Cloud Composer skills even more? Complete the optional challenges below

Challenge 1

Use either the BigQueryCheckOperator or the BigQueryValueCheckOperator to create a new task in your DAG that ensures the SQL query for training data is returning valid results before it is passed to Cloud ML Engine for training.

Hint: Check for COUNT() = 0 or other health check


Challenge 2

Create a Cloud Function to automatically trigger your DAG when a new recommendation_events.csv file is loaded into your Google Cloud Storage Bucket.

Hint: Check the composer_gcf_trigger.ipynb lab for inspiration


Challenge 3

Modify the BigQuery query in the DAG to only train on a portion of the data available in the dataset using a WHERE clause filtering on date. Next, parameterize the WHERE clause to be based on when the Airflow DAG is run

Hint: Make use of prebuilt Airflow macros like the below:

constants or can be dynamic based on Airflow macros
max_query_date = '2018-02-01' # {{ macros.ds_add(ds, -7) }}
min_query_date = '2018-01-01' # {{ macros.ds_add(ds, -1) }}

Additional Resources