Today you will see how the convolutional layers of a CNN transform an image. Moreover, you'll see that as we go up the stack of conv layers, the activations become more and more abstract.
To do this, I created a CNN from scratch and trained it on the 'cats_vs_dogs' dataset from the TensorFlow Datasets catalog.
Here you will find the most useful parts of the code. For the complete Jupyter notebook, take a look at the link at the bottom of the page.
Ok, let’s start!
Import and pre-processing
import tensorflow as tf
import tensorflow_datasets as tfd
Get the data
# load the dataset: 80% for training, 20% for validation
(train, validation), metadata = tfd.load(
    'cats_vs_dogs',
    split=["train[:80%]", "train[80%:]"],
    as_supervised=True,
    with_info=True,
)
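To quickly check what was loaded, here is a minimal sketch (using the metadata object returned above) that prints the class names and the size of the original split:
# inspect the dataset info returned by tfd.load
print(metadata.features['label'].names)       # e.g. ['cat', 'dog']
print(metadata.splits['train'].num_examples)  # total number of images in 'train'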
Pre-process the data
width, height = 150, 150

def preprocess(image, label):
    # cast to float, scale pixel values to [0, 1] and resize to a fixed size
    img = tf.cast(image, tf.float32)
    img = img / 255
    img = tf.image.resize(img, (width, height))
    return img, label

train = train.map(preprocess)
validation = validation.map(preprocess)

# create the batched datasets for training and validation
train_dataset = train.shuffle(100).batch(64)
validation_dataset = validation.shuffle(100).batch(64)
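As a sanity check on the input pipeline, a minimal sketch that takes one batch from train_dataset and prints its shape:
# one batch should be (64, 150, 150, 3) for the images and (64,) for the labels,
# given the batch size and the resize above
for images, labels in train_dataset.take(1):
    print(images.shape, labels.shape)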
Model definition and training
There are 4 conv layers (each followed by max pooling) and, on top of them, a Flatten layer, dropout and two dense layers.
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(width, height, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
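To see how the spatial size shrinks while the number of feature maps grows as we go deeper, a minimal sketch that walks through the layers of the model defined above (we will reuse the layer names later to select the conv layers):
# print each layer's name and output shape
for layer in model.layers:
    print(layer.name, layer.output_shape)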
Training the model
epochs = 10
# the last Dense layer already applies a sigmoid, so the loss must not expect logits
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])
history = model.fit(train_dataset, epochs=epochs,
                    validation_data=validation_dataset)
model.save("cat_vs_dog.h5")
Result of training after 10 epochs (we are not interested in performance here; we just want to see the activations of the conv layers):
Epoch 10/10
291/291 [==============================] - 82s 280ms/step - loss: 0.3577 - accuracy: 0.8445 - val_loss: 0.4525 - val_accuracy: 0.7896
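If you want the full training curves rather than just the last epoch, a minimal sketch (using the history object returned by model.fit above and matplotlib):
import matplotlib.pyplot as plt
# plot training and validation accuracy per epoch from the History object
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()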
Intermediate output of the conv layers
Now the interesting part! We build a model that returns the output of each of the first layers of the network and feed it a dog image.
I chose to try with a photo of my dog, Rocky.
import numpy as np
from tensorflow.keras import models
from tensorflow.keras.preprocessing import image

model_l = tf.keras.models.load_model('cat_vs_dog.h5')

# load the image of Rocky and turn it into a batch of one tensor
rocky = image.load_img('rocky.jpg', target_size=(width, height))
rocky_as_tensor = image.img_to_array(rocky)
rocky_as_tensor = np.expand_dims(rocky_as_tensor, axis=0)
rocky_as_tensor /= 255  # normalize, as done during training
print("dog shape", rocky_as_tensor.shape)

# build a model that returns the output of the first 8 layers
layer_outputs = [layer.output for layer in model_l.layers[:8]]
activation_model = models.Model(inputs=model_l.input, outputs=layer_outputs)

# feed the model with the image
activations = activation_model.predict(rocky_as_tensor)
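Before looking at the intermediate activations, a quick sketch to check what the full model thinks of the image (the sigmoid output is the predicted probability of label 1, which is 'dog' in this dataset):
# the sigmoid output of the full model is the predicted probability of 'dog'
prob_dog = model_l.predict(rocky_as_tensor)[0][0]
print("P(dog) =", prob_dog)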
To get the 4 conv layers we selected model_l.layers[:8]: these first 8 layers are the 4 conv layers and the 4 max-pooling layers. The code below filters out the max-pooling layers and keeps only the indexes of the conv layers.
# take only the conv layers (we filter out the max pool layers)
conv_indexes = []
for i in range(len(activations)):
    if "conv2d" in model_l.layers[i].name:
        conv_indexes.append(i)
        print("Layer: ", model_l.layers[i].name, " Shape: ", activations[i].shape)
We create a function, plot_layer, that displays the feature maps of a layer as a grid of images. Then we iterate over the conv layers and call this function on each of them.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
# https://matplotlib.org/stable/gallery/axes_grid1/simple_axesgrid.html

def plot_layer(name, activation):
    print("Processing {} layer...".format(name))
    how_many_features_map = activation.shape[3]
    figure_size = how_many_features_map * 2
    fig = plt.figure(figsize=(figure_size, figure_size))
    grid = ImageGrid(fig, 111,
                     nrows_ncols=(how_many_features_map // 16, 16),
                     axes_pad=0.1,  # pad between axes in inch
                     )
    # one image per feature map of the layer
    images = [activation[0, :, :, i] for i in range(how_many_features_map)]
    for ax, img in zip(grid, images):
        # iterating over the grid returns the Axes
        ax.matshow(img)
    plt.show()
# for each conv2d layer plot the feature maps
for conv_ix in conv_indexes:
    plot_layer(model_l.layers[conv_ix].name, activations[conv_ix])
Result
There are a few things to note here. The first layer is the most visually interpretable: we can say that it acts as an edge detector. As we go deeper into the network, the feature maps become less visually interpretable and more abstract, and the last conv layer is the most abstract of all.
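If you want to compare the two extremes directly, a small sketch that plots one feature map of the first conv layer next to one of the last (using the conv_indexes and activations computed above):
import matplotlib.pyplot as plt
# first feature map of the first and of the last conv layer, side by side
first, last = conv_indexes[0], conv_indexes[-1]
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.matshow(activations[first][0, :, :, 0])
ax1.set_title(model_l.layers[first].name)
ax2.matshow(activations[last][0, :, :, 0])
ax2.set_title(model_l.layers[last].name)
plt.show()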
Complete code
Here is the Colab file