111 Tf Serving Vision

This notebook shows how to deploy a vision model in TensorFlow from 🤗 Transformers with TensorFlow Serving. It uses this blog post as a reference.

Setup

[ ]

Imports

[ ]
[ ]
4.20.1

Save the ViT model and investigate its inputs

[ ]
All model checkpoint layers were used when initializing TFViTForImageClassification.

All the layers of TFViTForImageClassification were initialized from the model checkpoint at google/vit-base-patch16-224.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFViTForImageClassification for predictions without further training.
WARNING:absl:Found untraced functions such as embeddings_layer_call_fn, embeddings_layer_call_and_return_conditional_losses, encoder_layer_call_fn, encoder_layer_call_and_return_conditional_losses, layernorm_layer_call_fn while saving (showing 5 of 421). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: resnet/saved_model/1/assets
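The save cell itself is elided above; a minimal sketch of what it likely contains, assuming the `google/vit-base-patch16-224` checkpoint and the `resnet/saved_model/1` export path that appear in the logs (note the directory is named `resnet` even though the model is a ViT):

```python
from transformers import TFViTForImageClassification

# Downloads ~330 MB of TF weights on first run.
model = TFViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

# saved_model=True additionally exports a TF SavedModel to
# resnet/saved_model/1, which is the format TF Serving consumes.
model.save_pretrained("resnet", saved_model=True)
```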
[ ]
The given SavedModel SignatureDef contains the following input(s):
  inputs['pixel_values'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, -1)
      name: serving_default_pixel_values:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['logits'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1000)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

Save the model with the pre-processing and post-processing ops embedded

[ ]
ViTFeatureExtractor {
  "do_normalize": true,
  "do_resize": true,
  "feature_extractor_type": "ViTFeatureExtractor",
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "size": 224
}
[ ]
[ ]

Notes on making the model accept string inputs:

When dealing with images via REST or gRPC requests, the size of the request payload can easily balloon depending on the resolution of the images being passed. This is why it is good practice to compress the images reliably and only then prepare the request payload.
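As a back-of-the-envelope illustration of why compression matters (the 50 KB JPEG size below is a hypothetical figure, not measured in this notebook):

```python
import base64

# A 224x224 RGB image sent as raw float values:
h, w, c = 224, 224, 3
n_values = h * w * c                 # 150,528 values per image
raw_float32_bytes = n_values * 4     # 602,112 bytes (~588 KiB), before any
                                     # JSON text overhead on top of that

# The same image as a compressed JPEG is typically a few tens of KiB;
# base64 encoding (needed to embed binary data in a JSON payload)
# inflates it by only ~4/3:
jpeg_bytes = 50_000                  # hypothetical JPEG size
b64_size = len(base64.b64encode(b"\x00" * jpeg_bytes))
print(raw_float32_bytes, b64_size)   # 602112 66668
```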

[ ]
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py:458: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with back_prop=False is deprecated and will be removed in a future version.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.map_fn(fn, elems, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.map_fn(fn, elems))
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/deprecation.py:629: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
WARNING:absl:Found untraced functions such as embeddings_layer_call_fn, embeddings_layer_call_and_return_conditional_losses, encoder_layer_call_fn, encoder_layer_call_and_return_conditional_losses, layernorm_layer_call_fn while saving (showing 5 of 421). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /tmp/1/assets
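The elided export cell typically wraps the feature-extractor steps in a `tf.function` that accepts a batch of serialized image strings. A minimal sketch of that pattern, assuming the strings are web-safe base64-encoded JPEGs (matching the `-`/`_` characters visible in the REST payload later) and reusing the normalization constants from the `ViTFeatureExtractor` config above; the helper names are made up:

```python
import tensorflow as tf

def _preprocess_one(b64_string):
    # Decode web-safe base64 -> JPEG bytes -> uint8 image tensor.
    image_bytes = tf.io.decode_base64(b64_string)
    img = tf.io.decode_jpeg(image_bytes, channels=3)
    # Resize and normalize as ViTFeatureExtractor does:
    # (x / 255 - 0.5) / 0.5 with size=224.
    img = tf.image.resize(img, (224, 224))
    img = (img / 255.0 - 0.5) / 0.5
    # Transformers TF vision models expect channels-first pixel_values.
    return tf.transpose(img, (2, 0, 1))

@tf.function(input_signature=[tf.TensorSpec([None], tf.string, name="string_input")])
def preprocess_fn(string_input):
    pixel_values = tf.map_fn(_preprocess_one, string_input,
                             fn_output_signature=tf.float32)
    return {"pixel_values": pixel_values}
```

The actual export also runs the model on `pixel_values` and maps the argmax logit to a human-readable label and confidence before calling `tf.saved_model.save`, which is what yields the `label`/`confidence` outputs in the signature shown next.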

Investigate the SavedModel once again.

[ ]
The given SavedModel SignatureDef contains the following input(s):
  inputs['string_input'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: serving_default_string_input:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['confidence'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1)
      name: StatefulPartitionedCall:0
  outputs['label'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: StatefulPartitionedCall:1
Method name is: tensorflow/serving/predict

Install TensorFlow Model Server

[ ]
--2022-07-15 04:11:11--  http://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-universal-2.8.0/t/tensorflow-model-server-universal/tensorflow-model-server-universal_2.8.0_all.deb
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.8.128, 74.125.23.128, 74.125.203.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.8.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 335421916 (320M) [application/x-debian-package]
Saving to: ‘tensorflow-model-server-universal_2.8.0_all.deb’

tensorflow-model-se 100%[===================>] 319.88M  68.7MB/s    in 4.7s    

2022-07-15 04:11:16 (68.7 MB/s) - ‘tensorflow-model-server-universal_2.8.0_all.deb’ saved [335421916/335421916]

Selecting previously unselected package tensorflow-model-server-universal.
(Reading database ... 155653 files and directories currently installed.)
Preparing to unpack tensorflow-model-server-universal_2.8.0_all.deb ...
Unpacking tensorflow-model-server-universal (2.8.0) ...
Setting up tensorflow-model-server-universal (2.8.0) ...

Deploy the model

By default, TF Serving exposes two APIs: REST and gRPC. Each has its own pros and cons; we will see how to run inference with both.

[ ]
Starting job # 0 in a separate thread.
[ ]
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
[ ]
node        8 root   21u  IPv6  26436      0t0  TCP *:8080 (LISTEN)
colab-fil  30 root    5u  IPv6  26409      0t0  TCP *:3453 (LISTEN)
colab-fil  30 root    6u  IPv4  26410      0t0  TCP *:3453 (LISTEN)
jupyter-n  43 root    6u  IPv4  27130      0t0  TCP 172.28.0.2:9000 (LISTEN)
python3    60 root   15u  IPv4  30327      0t0  TCP 127.0.0.1:46129 (LISTEN)
python3    60 root   18u  IPv4  30331      0t0  TCP 127.0.0.1:58207 (LISTEN)
python3    60 root   21u  IPv4  30335      0t0  TCP 127.0.0.1:44103 (LISTEN)
python3    60 root   24u  IPv4  30339      0t0  TCP 127.0.0.1:53393 (LISTEN)
python3    60 root   30u  IPv4  30345      0t0  TCP 127.0.0.1:46873 (LISTEN)
python3    60 root   43u  IPv4  31046      0t0  TCP 127.0.0.1:59625 (LISTEN)
python3    80 root    3u  IPv4  31602      0t0  TCP 127.0.0.1:20352 (LISTEN)
python3    80 root    4u  IPv4  31603      0t0  TCP 127.0.0.1:34417 (LISTEN)
python3    80 root    9u  IPv4  32828      0t0  TCP 127.0.0.1:36819 (LISTEN)
tensorflo 259 root    5u  IPv4  93837      0t0  TCP *:8500 (LISTEN)
tensorflo 259 root   12u  IPv4  92983      0t0  TCP *:8501 (LISTEN)

REST API

[ ]
Downloading data from http://images.cocodataset.org/val2017/000000039769.jpg
173131/173131 [==============================] - 1s 3us/step
Data: {"signature_name": "serving_default", "instances": ... TRmYgEHbbrYWv0A6b4o2n1HZgYLq91nP-o7O2pcNa6r__2Q=="]}
[ ]
{'predictions': [{'label': 'Egyptian cat', 'confidence': 0.896659195}]}
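A hedged sketch of how such a REST call can be issued from Python (the model name `resnet` and REST port 8501 come from the logs above; `build_rest_payload` and `predict_rest` are made-up helper names):

```python
import base64
import json
from urllib.request import Request, urlopen

def build_rest_payload(image_bytes: bytes) -> str:
    # Web-safe base64, matching the "-" and "_" characters visible in the
    # payload printed above.
    b64 = base64.urlsafe_b64encode(image_bytes).decode("utf-8")
    return json.dumps({"signature_name": "serving_default", "instances": [b64]})

def predict_rest(image_bytes: bytes,
                 url: str = "http://localhost:8501/v1/models/resnet:predict"):
    req = Request(url,
                  data=build_rest_payload(image_bytes).encode("utf-8"),
                  headers={"content-type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["predictions"]
```

Calling `predict_rest(jpeg_bytes)` against the running server would return a list like the `predictions` output shown above.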

gRPC

[ ]
[ ]
[ ]
Serving function input: string_input
[ ]
[ ]
outputs {
  key: "confidence"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
    }
    float_val: 0.8966591954231262
  }
}
outputs {
  key: "label"
  value {
    dtype: DT_STRING
    tensor_shape {
      dim {
        size: 1
      }
    }
    string_val: "Egyptian cat"
  }
}
model_spec {
  name: "resnet"
  version {
    value: 1
  }
  signature_name: "serving_default"
}
[ ]
([b'Egyptian cat'], [0.8966591954231262])

Next steps

  • Deploy the SavedModel to Vertex AI
  • Deploy with TF Serving + Kubernetes (via GKE)