Flowers Image Classification with TensorFlow on Cloud ML Engine

This notebook demonstrates how to do image classification from scratch on a flowers dataset using the Estimator API.

In [4]:
import os
PROJECT = "cloud-training-demos" # REPLACE WITH YOUR PROJECT ID
BUCKET = "cloud-training-demos-ml" # REPLACE WITH YOUR BUCKET NAME
REGION = "us-central1" # REPLACE WITH YOUR BUCKET REGION e.g. us-central1
MODEL_TYPE = "cnn"

# do not change these
os.environ["PROJECT"] = PROJECT
os.environ["BUCKET"] = BUCKET
os.environ["REGION"] = REGION
os.environ["MODEL_TYPE"] = MODEL_TYPE
os.environ["TFVERSION"] = "1.13"  # Tensorflow version
In [5]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION
Updated property [core/project].
Updated property [compute/region].

Input functions to read JPEG images

The key difference between this notebook and the MNIST one is in the input function. In the input function here, we are doing the following:

  • Reading JPEG images, rather than 2D integer arrays.
  • Reading in batches of batch_size images rather than slicing our in-memory structure to be batch_size images.
  • Resizing the images to the expected HEIGHT, WIDTH. Because this is a real-world dataset, the images are of different sizes. We need to preprocess the data to, at the very least, resize them to constant size.

Run as a Python module

Let's first run it locally for a short while to test the code works. Note the --model parameter

In [6]:
%%bash
rm -rf flowersmodel.tar.gz flowers_trained
gcloud ml-engine local train \
    --module-name=flowersmodel.task \
    --package-path=${PWD}/flowersmodel \
    -- \
    --output_dir=${PWD}/flowers_trained \
    --train_steps=5 \
    --learning_rate=0.01 \
    --batch_size=2 \
    --model=$MODEL_TYPE \
    --augment \
    --train_data_path=gs://cloud-ml-data/img/flower_photos/train_set.csv \
    --eval_data_path=gs://cloud-ml-data/img/flower_photos/eval_set.csv
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

INFO:tensorflow:TF_CONFIG environment variable: {u'environment': u'cloud', u'cluster': {}, u'job': {u'args': [u'--output_dir=/home/jupyter/training-data-analyst/courses/machine_learning/deepdive/08_image/flowers_trained', u'--train_steps=5', u'--learning_rate=0.01', u'--batch_size=2', u'--model=cnn', u'--augment', u'--train_data_path=gs://cloud-ml-data/img/flower_photos/train_set.csv', u'--eval_data_path=gs://cloud-ml-data/img/flower_photos/eval_set.csv'], u'job_name': u'flowersmodel.task'}, u'task': {}}
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 300, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f093cc35050>, '_model_dir': '/home/jupyter/training-data-analyst/courses/machine_learning/deepdive/08_image/flowers_trained/', '_protocol': None, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_device_fn': None, '_experimental_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_master': ''}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 300.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From flowersmodel/model.py:64: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
WARNING:tensorflow:From flowersmodel/model.py:66: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
WARNING:tensorflow:From flowersmodel/model.py:83: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
WARNING:tensorflow:From flowersmodel/model.py:86: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/lookup_ops.py:1137: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2019-04-11 02:02:38.285029: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-04-11 02:02:38.303195: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000185000 Hz
2019-04-11 02:02:38.313982: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x562a215c1440 executing computations on platform Host. Devices:
2019-04-11 02:02:38.314023: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-11 02:02:38.324266: I tensorflow/core/common_runtime/process_util.cc:71] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /home/jupyter/training-data-analyst/courses/machine_learning/deepdive/08_image/flowers_trained/model.ckpt.
INFO:tensorflow:loss = 1.6782651, step = 1
INFO:tensorflow:Saving checkpoints for 5 into /home/jupyter/training-data-analyst/courses/machine_learning/deepdive/08_image/flowers_trained/model.ckpt.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/metrics_impl.py:455: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-04-11T02:02:47Z
INFO:tensorflow:Graph was finalized.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /home/jupyter/training-data-analyst/courses/machine_learning/deepdive/08_image/flowers_trained/model.ckpt-5
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-04-11-02:03:56
INFO:tensorflow:Saving dict for global step 5: accuracy = 0.1891892, global_step = 5, loss = 1.6156957
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5: /home/jupyter/training-data-analyst/courses/machine_learning/deepdive/08_image/flowers_trained/model.ckpt-5
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:205: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default', 'classes']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Restoring parameters from /home/jupyter/training-data-analyst/courses/machine_learning/deepdive/08_image/flowers_trained/model.ckpt-5
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: /home/jupyter/training-data-analyst/courses/machine_learning/deepdive/08_image/flowers_trained/export/exporter/temp-1554948236/saved_model.pb
INFO:tensorflow:Loss for final step: 1.6121318.

Now, let's do it on ML Engine. Note the --model parameter

In [ ]:
%%bash
OUTDIR=gs://${BUCKET}/flowers/trained_${MODEL_TYPE}
JOBNAME=flowers_${MODEL_TYPE}_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
    --region=$REGION \
    --module-name=flowersmodel.task \
    --package-path=${PWD}/flowersmodel \
    --job-dir=$OUTDIR \
    --staging-bucket=gs://$BUCKET \
    --scale-tier=BASIC_GPU \
    --runtime-version=$TFVERSION \
    -- \
    --output_dir=$OUTDIR \
    --train_steps=1000 \
    --learning_rate=0.01 \
    --batch_size=40 \
    --model=$MODEL_TYPE \
    --augment \
    --batch_norm \
    --train_data_path=gs://cloud-ml-data/img/flower_photos/train_set.csv \
    --eval_data_path=gs://cloud-ml-data/img/flower_photos/eval_set.csv

Monitoring training with TensorBoard

Use this cell to launch tensorboard

In [ ]:
from google.datalab.ml import TensorBoard
TensorBoard().start("gs://{}/flowers/trained_{}".format(BUCKET, MODEL_TYPE))
In [ ]:
for pid in TensorBoard.list()["pid"]:
    TensorBoard().stop(pid)
    print("Stopped TensorBoard with pid {}".format(pid))

Here are my results:

Model Accuracy Time taken Run time parameters
cnn with batch-norm 0.582 47 min 1000 steps, LR=0.01, Batch=40
as above, plus augment 0.615 3 hr 5000 steps, LR=0.01, Batch=40

Deploying and predicting with model

Deploy the model:

In [ ]:
%%bash
MODEL_NAME="flowers"
MODEL_VERSION=${MODEL_TYPE}
MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/flowers/trained_${MODEL_TYPE}/export/exporter | tail -1)
echo "Deleting and deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... this will take a few minutes"
#gcloud ml-engine versions delete --quiet ${MODEL_VERSION} --model ${MODEL_NAME}
#gcloud ml-engine models delete ${MODEL_NAME}
gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version=$TFVERSION

To predict with the model, let's take one of the example images that is available on Google Cloud Storage

The online prediction service expects images to be base64 encoded as described here.

In [12]:
%%bash
IMAGE_URL=gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg

# Copy the image to local disk.
gsutil cp $IMAGE_URL flower.jpg

# Base64 encode and create request message in json format.
python -c 'import base64, sys, json; img = base64.b64encode(open("flower.jpg", "rb").read()).decode(); print(json.dumps({"image_bytes":{"b64": img}}))' &> request.json
Copying gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg...
/ [1 files][ 41.7 KiB/ 41.7 KiB]                                                
Operation completed over 1 objects/41.7 KiB.                                     

Send it to the prediction service

In [13]:
%%bash
gcloud ml-engine predict \
    --model=flowers \
    --version=${MODEL_TYPE} \
    --json-instances=./request.json
CLASS       CLASSID  PROBABILITIES
sunflowers  3        [0.0026037374045699835, 0.33217525482177734, 0.0010772582609206438, 0.6588829159736633, 0.005260893143713474]
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.