@RaphaelMeudec
Created July 18, 2019 15:11
Grad CAM implementation with Tensorflow 2
import cv2
import numpy as np
import tensorflow as tf

IMAGE_PATH = './cat.jpg'
LAYER_NAME = 'block5_conv3'
CAT_CLASS_INDEX = 281

# Load the image at the 224x224 input size used by VGG16
img = tf.keras.preprocessing.image.load_img(IMAGE_PATH, target_size=(224, 224))
img = tf.keras.preprocessing.image.img_to_array(img)

model = tf.keras.applications.vgg16.VGG16(weights='imagenet', include_top=True)

# Model returning both the target conv layer activations and the predictions
grad_model = tf.keras.models.Model([model.inputs], [model.get_layer(LAYER_NAME).output, model.output])

# Record the forward pass so the class score can be differentiated
with tf.GradientTape() as tape:
    conv_outputs, predictions = grad_model(np.array([img]))
    loss = predictions[:, CAT_CLASS_INDEX]

output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]

# Guided gradients: keep gradients only where activation and gradient are both positive
gate_f = tf.cast(output > 0, 'float32')
gate_r = tf.cast(grads > 0, 'float32')
guided_grads = tf.cast(output > 0, 'float32') * tf.cast(grads > 0, 'float32') * grads

# One weight per feature map: average the guided gradients over height and width
weights = tf.reduce_mean(guided_grads, axis=(0, 1))

# Weighted sum of the feature maps (note that the map starts from ones)
cam = np.ones(output.shape[0: 2], dtype = np.float32)
for i, w in enumerate(weights):
    cam += w * output[:, :, i]

# Resize to the input size, clip negatives, and normalise to [0, 1]
cam = cv2.resize(cam.numpy(), (224, 224))
cam = np.maximum(cam, 0)
heatmap = (cam - cam.min()) / (cam.max() - cam.min())

# Colourise the heatmap and blend it with the original image
cam = cv2.applyColorMap(np.uint8(255*heatmap), cv2.COLORMAP_JET)
output_image = cv2.addWeighted(cv2.cvtColor(img.astype('uint8'), cv2.COLOR_RGB2BGR), 0.5, cam, 1, 0)
cv2.imwrite('cam.png', output_image)
@cordeirojoao

> @cordeirojoao Yes, indeed. Elements of the array with the largest values (hence green on the plot) are the most important (according to the Grad-CAM method).

Thank you for the quick reply :)
So I guess I did the code the right way, I was not sure.

I have several plots in the code. When you say to look for the green points (largest values), it's not clear to me which plot you mean. I'm a little confused.
Should it be the one below the comment "Apply guided backpropagation"?

Image

@cordeirojoao

@RaphaelMeud Sorry for bothering you again, but I'm still a little bit confused and I would like to hear your comments on my last comment.
Thank you very much

@RaphaelMeudec
Author

@cordeirojoao yes indeed, I'm referring to this one!

@cordeirojoao

> @cordeirojoao yes indeed, I'm referring to this one!

@RaphaelMeud thank you very much for that. Really appreciate it :)

@sravanth99

Thank you so much! you saved my day : ).

@sdbonte

sdbonte commented Apr 16, 2020

Thank you for this elegant implementation. I have a small question: is it possible that gate_f and gate_r are never used? Is this an error?

@RaphaelMeudec
Author

@sdbonte Indeed, they could have been used on line 25 but for some reason I decided to rebuild them.
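
To illustrate the point about gate_f and gate_r, here is a minimal NumPy sketch of the same guided-gradient step with the two gates actually reused. The helper name guided_gradients and the small arrays are made up for illustration; the gist itself does this with TensorFlow tensors.

```python
import numpy as np

def guided_gradients(output, grads):
    """Keep gradients only where both the activation and the gradient are positive."""
    gate_f = (output > 0).astype(np.float32)  # forward gate: units that fired
    gate_r = (grads > 0).astype(np.float32)   # backward gate: positive gradients
    return gate_f * gate_r * grads

# Tiny made-up example: a 2x2 "feature map" and its gradients
output = np.array([[1.0, -2.0], [0.5, 3.0]], dtype=np.float32)
grads = np.array([[0.2, 0.4], [-0.1, 0.3]], dtype=np.float32)
guided = guided_gradients(output, grads)
```

Only the positions where both arrays are positive keep their gradient; every other entry becomes zero.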

@Jeremynadal33

First of all, thank you very much for sharing! Nevertheless, I have a question if you have time.
I am dealing with multi-variate time series so it is 1D convolutions but I guess the method is the same as with 2D.

How would you retrieve the heat maps/activation maps for each of the different inputs of your network? Because if the same filters are used for all inputs, I feel like the heat map will be identical for every feature. (I am quite new to this field, so I apologize if anything I say is incorrect.)
If so, would you recommend treating each input individually to use Grad Cam like method?

Thanks in advance!

@RaphaelMeudec
Author

@Jeremynadal33 Could you provide a minimal example of your data/model (ideally with np.random.random so I can have an idea of the input shapes)?

@Jeremynadal33

Jeremynadal33 commented Apr 17, 2020

> @Jeremynadal33 Could you provide a minimal example of your data/model (ideally with np.random.random so I can have an idea of the input shapes)?

[Image: screenshot, 2020-04-17 at 11:49:59]

Attached is a snapshot of the data (it is a synthetic dataset; there are three features of the same length in the form of time series, and one time-series target of a specific length). You can see some specific events in time series 2 and 3, and I would like to know whether the network's attention is focused on those events. (There might also be an event in time series 1, but that is not the case in this sample.)
I built several different models (mostly functional, like the one below, possibly with more convolutional layers).
[Image: screenshot, 2020-04-17 at 11:52:46]

And lastly, how does cv2 resize the output back to the image proportions?
Thank you very much; I will put more code into a GitHub repo if you need more.
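
For the 1D case, a rough sketch of how the same computation could look is given below. It assumes the activations and gradients of a Conv1D layer are already available as arrays of shape (time_steps, filters); the function name grad_cam_1d and the shapes (75 steps, 8 filters, input length 300) are made up for illustration. Under this assumption the result is a single importance value per time step, shared by all input features, since an ordinary convolution mixes the input channels in each filter. On the resize question: cv2.resize uses bilinear interpolation by default, and np.interp plays an analogous (not identical) role for one dimension here.

```python
import numpy as np

def grad_cam_1d(conv_outputs, grads, input_length):
    """Grad-CAM over the time axis for one sample.

    conv_outputs, grads: arrays of shape (time_steps, filters).
    """
    weights = grads.mean(axis=0)                 # one weight per filter
    cam = np.maximum(conv_outputs @ weights, 0)  # (time_steps,), negatives clipped
    # Stretch the coarse map back to the input length by linear interpolation
    src = np.linspace(0.0, 1.0, num=cam.shape[0])
    dst = np.linspace(0.0, 1.0, num=input_length)
    return np.interp(dst, src, cam)

# Synthetic stand-ins for real layer activations and gradients
rng = np.random.default_rng(0)
conv_outputs = rng.random((75, 8))
grads = rng.random((75, 8)) - 0.5
cam = grad_cam_1d(conv_outputs, grads, input_length=300)
```

If per-feature attributions are needed, a different technique (or one model branch per input) would be required; this sketch does not provide them.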

@SelComputas

Hi, I was wondering if you had any input as to how one would go about implementing this with a pre-trained (frozen) .pb model?

@kdh-awraw1019

Hello.

I want to run your code with my dataset (images).

The width is 224, the height is 112, and my model is MobileNetV2.

So I adapted your code for my dataset:

LAYER_NAME = 'block5_conv3' -> 'Conv_1' (the last convolution layer in MobileNetV2)
CAT_CLASS_INDEX = 281 -> 456 (number of classes)

I load a random image from my dataset.

After running the code, this error occurred:

Traceback (most recent call last):
  File "C:/Users/BV_WorkStation/PycharmProjects/MobileNetV3-TF-master/BV_Denom_Classification_test_Grad-CAM.py", line 31, in <module>
    loss = predictions[:, CAT_CLASS_INDEX]
  File "C:\Users\BV_WorkStation\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "C:\Users\BV_WorkStation\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1042, in _slice_helper
    name=name)
  File "C:\Users\BV_WorkStation\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "C:\Users\BV_WorkStation\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1214, in strided_slice
    shrink_axis_mask=shrink_axis_mask)
  File "C:\Users\BV_WorkStation\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 10320, in strided_slice
    _ops.raise_from_not_ok_status(e, name)
  File "C:\Users\BV_WorkStation\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 6921, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index 456 of dimension 1 out of bounds. [Op:StridedSlice] name: strided_slice/

Process finished with exit code 1

and full code

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import tensorflow as tf
from tensorflow import keras
import numpy as np
import random
import cv2

data_dir = "G:\\ATEC_AP\\KDH\\Fitness\\DB\\"
predict_dir = "G:\\ATEC_AP\\KDH\\Fitness\\DB\\predict_img\\"
random_img_path = random.choice(os.listdir(predict_dir))

LAYER_NAME = 'Conv_1'
CAT_CLASS_INDEX = 456
input_width = 224
input_height = 112

img = tf.keras.preprocessing.image.load_img(predict_dir + random_img_path, target_size=(input_height, input_width),
                                            color_mode='grayscale')
img = tf.keras.preprocessing.image.img_to_array(img)

model = tf.keras.models.load_model(data_dir + "MobileNetV2_20_112_224_08_25_17_55" + ".h5")
model.summary()
grad_model = tf.keras.models.Model([model.inputs], [model.get_layer(LAYER_NAME).output, model.output])

with tf.GradientTape() as tape:
    conv_outputs, predictions = grad_model(np.array([img]))
    loss = predictions[:, CAT_CLASS_INDEX]

output = conv_outputs[0]
grads = tape.gradient(loss, conv_outputs)[0]

gate_f = tf.cast(output > 0, 'float32')
gate_r = tf.cast(grads > 0, 'float32')
guided_grads = tf.cast(output > 0, 'float32') * tf.cast(grads > 0, 'float32') * grads

weights = tf.reduce_mean(guided_grads, axis=(0, 1))

cam = np.ones(output.shape[0: 2], dtype=np.float32)

for i, w in enumerate(weights):
    cam += w * output[:, :, i]

cam = cv2.resize(cam.numpy(), (input_width, input_height))
cam = np.maximum(cam, 0)
heatmap = (cam - cam.min()) / (cam.max() - cam.min()) ##

cam = cv2.applyColorMap(np.uint8(255*heatmap), cv2.COLORMAP_JET)

output_image = cv2.addWeighted(cv2.cvtColor(img.astype('uint8'), cv2.COLOR_RGB2BGR), 0.5, cam, 1, 0)

cv2.imwrite('cam.png', output_image)

Please help me

Tensorflow Version is 2.1.0

@limbachia

I have a question similar to that of @Jeremynadal33. Even my data has several features.

@RizwanMunawar

@KDHAP the above code works with an image size of (224, 224), while in your code input_height = 112; you need to change it to 224.
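
Independently of the image size, the traceback above reports "slice index 456 of dimension 1 out of bounds", which points at the class index. If the model has 456 output classes, as "number of classes" suggests, the valid indices run from 0 to 455. A minimal sketch of a guard follows; check_class_index is a hypothetical helper and the zero-filled array merely stands in for the model's predictions.

```python
import numpy as np

def check_class_index(predictions, class_index):
    """Raise early if class_index is not a valid column of predictions."""
    num_classes = predictions.shape[-1]
    if not 0 <= class_index < num_classes:
        raise IndexError(
            f"class index {class_index} is outside the valid range 0..{num_classes - 1}"
        )
    return class_index

# Stand-in for a (batch, num_classes) model output with 456 classes
predictions = np.zeros((1, 456), dtype=np.float32)
predictions[0, 42] = 1.0

# A common choice is to explain the top predicted class
top_class = int(np.argmax(predictions[0]))
check_class_index(predictions, top_class)
```

Calling check_class_index(predictions, 456) on this array raises an IndexError, mirroring the out-of-bounds slice in the traceback.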

@Jimut123

Jimut123 commented Mar 5, 2021

Hi Raphael,

Firstly, thank you for the code.
I am having some problem in doing GRAD-CAM for functional layers in keras, any leads on this?

So here is my model, I want to visualise the activation on the layer conv_64, and I am getting some error.

Keras Model

last_conv_layer_name = "conv_64"
classifier_layer_names = [      
    "concatenate_17", 
    "conv_64_2",              
    "max_pool3",                  
    "conv_64_3",                
    "conv_64_31",                
    "concatenate_18",                  
    "conv_64_32",             
    "max_pool4",                
    "conv_64_4",                  
    "conv_64_41",                  
    "concatenate_19",                  
    "conv_64_42",             
    "max_pool5", 
    "conv_64_5", 
    "max_pool6", 
    "flatten", 
    "dropout_3", 
    "dense_64", 
    "output_layer", 
]


# Generate class activation heatmap
heatmap = make_gradcam_heatmap(
    img_array, model, last_conv_layer_name, classifier_layer_names
)

This is what I am doing, but I get the following error when running it. The problem seems to arise when two different layers are concatenated, which confuses the classifier part of the Grad-CAM module...

(1, 360, 360, 3)
preds :  [[5.7032428e-16 1.0479534e-33 0.0000000e+00 1.0243782e-23 1.0000000e+00
  0.0000000e+00 0.0000000e+00 0.0000000e+00]]
Model: "model_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_9 (InputLayer)         [(None, 360, 360, 3)]     0         
_________________________________________________________________
conv_32 (Conv2D)             (None, 360, 360, 16)      448       
_________________________________________________________________
conv_64 (Conv2D)             (None, 360, 360, 1)       145       
=================================================================
Total params: 593
Trainable params: 593
Non-trainable params: 0
_________________________________________________________________

------------------------------------------------------------------------

/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/merge.py in call(self, inputs)
    120   def call(self, inputs):
    121     if not isinstance(inputs, (list, tuple)):
--> 122       raise ValueError('A merge layer should be called on a list of inputs.')
    123     if self._reshape_required:
    124       reshaped_inputs = []

ValueError: A merge layer should be called on a list of inputs.

Here is the minimal version for reproducing the error.
https://colab.research.google.com/drive/1nnnHlyGbuOgNGEDvF1DY9l0vlVzgu291?usp=sharing

Looks like it will be tough when we are using concatenate layer and focusing on an intermediate layer...

Since I have a series of same structure of layer, I can use the deeper layers too.
The only problem is I am not able to do it in a functional model.

@JoanaNRocha

JoanaNRocha commented Jun 1, 2021

Hi Raphael, thank you for sharing your code. Is it normal for cam.min() and cam.max() to have very close values? I'm getting, e.g., min 1.0001216 and max 1.003609. After I rescale the heatmap variable, I get a valid map, but I just want to check whether those are plausible cam values.

@RaphaelMeudec
Author

If your last layer is a softmax activation, it is very plausible. I think in the original paper, they remove the last softmax. This would lead to a larger gap but should produce a highly similar activation map.
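
One detail visible in the gist itself also fits values just above 1: cam is initialised with np.ones, so every entry starts at 1 before the weighted feature maps are added. When the weights are tiny, the map stays close to 1. The sketch below uses synthetic activations and made-up weights (not real model outputs) to show that the min-max rescaling removes this constant offset as long as nothing is clipped by np.maximum(cam, 0).

```python
import numpy as np

rng = np.random.default_rng(0)
output = rng.random((14, 14, 4))                # synthetic activations in [0, 1)
weights = np.array([1e-3, 2e-3, 5e-4, 1e-3])    # made-up, very small weights

# Weighted sum of feature maps, starting from zero and from one
raw = np.tensordot(output, weights, axes=([2], [0]))
cam_ones = 1.0 + raw

def min_max(x):
    """Rescale an array to the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

# The constant offset cancels out in the rescaling
same = np.allclose(min_max(raw), min_max(cam_ones))
```

The two initialisations can differ when the weighted sum has negative entries: starting from zero, np.maximum(cam, 0) clips them, whereas starting from one, entries above -1 survive the clipping.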

@JoanaNRocha

I'm using a MobileNet with a final sigmoid activation layer. Here is the architecture:
0 input_2 - True
1 mobilenet_1.00_224 - True
2 global_average_pooling2d - True
3 dense - True
4 dense_1 - True

The mobilenet_1.00_224 includes the MobileNet feature extractor:
0 input_1 - True
1 conv1_pad - False
2 conv1 - False
3 conv1_bn - False
...
70 conv_dw_11_relu - True
71 conv_pw_11 - True
72 conv_pw_11_bn - False
73 conv_pw_11_relu - True
74 conv_pad_12 - True
75 conv_dw_12 - True
76 conv_dw_12_bn - False
77 conv_dw_12_relu - True
78 conv_pw_12 - True
79 conv_pw_12_bn - False
80 conv_pw_12_relu - True
81 conv_dw_13 - True
82 conv_dw_13_bn - False
83 conv_dw_13_relu - True
84 conv_pw_13 - True
85 conv_pw_13_bn - False
86 conv_pw_13_relu - True

I'm accessing the output of the final (86th) layer, with ReLU to get my results. Should I opt for the 84th? Or is it ok to use after the ReLU?

@HakanKARASU

Could you give some information on the example above: does (1, 300, 20) correspond to only one target value?

@HakanKARASU

it would be great to provide some information on that

@ErolCitak

Hi, may I ask for a hint on how to compute the Grad-CAM output not just for a single image at a time but for multiple images at once?

Thanks

@HakanKARASU

HakanKARASU commented Sep 1, 2021 via email

@JoanaNRocha

> Hi, may I ask for a hint on how to compute the Grad-CAM output not just for a single image at a time but for multiple images at once?
>
> Thanks

The easiest way would be to put this code into a function, using the image path as argument (if you want to test different models, add them as argument as well). Then, simply create a 'for' loop for all the image paths you want to analyze.

@ErolCitak

>> Hi, may I ask for a hint on how to compute the Grad-CAM output not just for a single image at a time but for multiple images at once?
>> Thanks
>
> The easiest way would be to put this code into a function, using the image path as argument (if you want to test different models, add them as argument as well). Then, simply create a 'for' loop for all the image paths you want to analyze.

Thank you for your response. Actually, I was wondering how I should approach this if I want to use my GPU to process, for example, 20 images at once; with the for-loop solution I have to feed the images one by one.

@JoanaNRocha

>>> Hi, may I ask for a hint on how to compute the Grad-CAM output not just for a single image at a time but for multiple images at once?
>>> Thanks
>>
>> The easiest way would be to put this code into a function, using the image path as argument (if you want to test different models, add them as argument as well). Then, simply create a 'for' loop for all the image paths you want to analyze.
>
> Thank you for your response. Actually, I was wondering how I should approach this if I want to use my GPU to process, for example, 20 images at once; with the for-loop solution I have to feed the images one by one.

Create a list containing the image paths for the 20 images you want to analyze, and then iterate on that list using the 'for' loop.
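
As an alternative to looping, the weighting step can be vectorised over a batch. The sketch below assumes the conv-layer activations and their gradients for all images are already available as arrays of shape (batch, height, width, channels), for example from a single call of grad_model on a stacked batch, with gradients taken of each image's own class score. It implements plain Grad-CAM weights rather than the guided variant of the gist, and the function name grad_cam_batch and the random inputs are made up for illustration.

```python
import numpy as np

def grad_cam_batch(conv_outputs, grads):
    """Normalised Grad-CAM maps for a whole batch.

    conv_outputs, grads: arrays of shape (batch, height, width, channels).
    Returns an array of shape (batch, height, width) with values in [0, 1].
    """
    weights = grads.mean(axis=(1, 2))                        # (batch, channels)
    cams = np.einsum('bhwc,bc->bhw', conv_outputs, weights)  # weighted sum per image
    cams = np.maximum(cams, 0)                               # clip negatives
    flat = cams.reshape(len(cams), -1)
    lo = flat.min(axis=1)[:, None, None]
    hi = flat.max(axis=1)[:, None, None]
    return (cams - lo) / np.maximum(hi - lo, 1e-8)           # per-image min-max

# Synthetic stand-ins for a batch of 20 images
rng = np.random.default_rng(0)
conv_outputs = rng.random((20, 14, 14, 8))
grads = rng.random((20, 14, 14, 8)) - 0.5
heatmaps = grad_cam_batch(conv_outputs, grads)
```

Each heatmap can then be resized and colourised individually, as in the single-image code.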

@Tixi3

Tixi3 commented Jan 25, 2022

Hello and thank you very much for this code, it really saved me :)
I have a question regarding CAT_CLASS_INDEX = 281. What does it mean? How do you get that value? And what changes would I have to make if I want to apply this code to dog pictures or, for example, medical images?

@RaphaelMeudec
Author

The model is a pretrained VGG that has been trained on imagenet. Imagenet has 1000 classes (the last layer of the VGG is a dense one with 1000 outputs), and the index that has been used during training for cat is the 281-th. The list is available here.

For your case, you just need to know which index is used for "dog" and then pass the corresponding index.
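
For a custom model, the index is simply the position of the class in the model's output vector, which follows the label ordering used during training. A small sketch, with a hypothetical three-class ordering and a made-up prediction array:

```python
import numpy as np

# Hypothetical class ordering; use the ordering from your own training setup
class_names = ['cat', 'dog', 'horse']

def class_index_for(label, names):
    """Return the output index that corresponds to a class label."""
    return names.index(label)

dog_index = class_index_for('dog', class_names)

# Made-up (batch, num_classes) predictions; the score to differentiate
# is the column for the chosen class
predictions = np.array([[0.1, 0.7, 0.2]])
dog_score = predictions[:, dog_index]
```

The resulting index plays the role of CAT_CLASS_INDEX in the code at the top.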

@Tixi3

Tixi3 commented Jan 25, 2022

I see! Thank you very much :)

@Corne173

How would you find the feature importance in terms of the RGB colour channels? Here you get pixel importance, which is a combination of the RGB inputs. I'm very interested in the answer, as it relates to a problem I have where I want to compute Grad-CAM for a multivariate time series.

@sneh-debug

@RaphaelMeudec Hi. What if the input size is (192, 152, 4), where 4 is the number of 2D slices? How can we obtain the map for each image?
