r"""Batched Jacobians
=====================
In PyTorch, you can easily compute derivatives of a **scalar-valued** variable
:code:`f` w.r.t. a variable :code:`param` by calling :code:`f.backward()`.
This computes the Jacobian :code:`∂(f) / ∂(param)`, which has shape
:code:`[1, *param.shape]`.

If :code:`f` is a reduction of a **batched scalar** :code:`fs` of shape
:code:`[N]`, then BackPACK can compute the individual gradients for each
scalar with its :code:`BatchGrad` extension. This yields the Jacobian
:code:`∂(fs) / ∂(param)` of shape :code:`[N, *param.shape]`.

**This example** demonstrates how to compute the Jacobian of a tensor-valued
variable :code:`fs`, here for the example of a **batched vector** of shape
:code:`[N, C]`, whose Jacobian has shape :code:`[N, C, *param.shape]`.
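
For intuition, here is a minimal sketch of these shape conventions (the
variable names are illustrative only, and each :code:`backward` call assumes a
freshly built graph):

.. code-block:: python

    f = fs.sum()  # fs is a batched scalar of shape [N]

    # plain autograd: gradient of the reduced scalar
    f.backward()  # param.grad has shape param.shape

    # with BackPACK: one gradient per sample
    with backpack(extensions.BatchGrad()):
        f.backward()  # param.grad_batch has shape [N, *param.shape]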

Setup
-----

We will use the batched vector-valued output of a simple MLP as the tensor
:code:`fs` to be differentiated w.r.t. the model parameters
:code:`param_1, param_2, ...`. For :code:`param_i`, this leads to a Jacobian
:code:`∂(fs) / ∂(param_i)` of shape :code:`[N, C, *param_i.shape]`.

Let's start by importing the required functionality and writing a setup
function to create our synthetic data.
"""
import itertools
from math import sqrt
from typing import List, Tuple
import matplotlib.pyplot as plt
from torch import Tensor, allclose, cat, manual_seed, rand, zeros, zeros_like
from torch.autograd import grad
from torch.nn import Linear, MSELoss, ReLU, Sequential
from backpack import backpack, extend, extensions
# architecture specifications
N = 15  # batch size
D_in = 10  # input dimension
D_hidden = 7  # hidden dimension
C = 5  # output dimension
def setup() -> Tuple[Sequential, Tensor]:
    """Create a simple MLP with ReLU activations and its synthetic input.

    Returns:
        A simple MLP and a tensor that can be fed to it.
    """
    X = rand(N, D_in)
    model = Sequential(Linear(D_in, D_hidden), ReLU(), Linear(D_hidden, C))
    return model, X
# %%
# With autograd
# -------------
#
# First, let's compute the Jacobians with PyTorch's :code:`autograd`. They
# will serve as ground truth to verify our BackPACK results against.
#
# To do that, we need to differentiate per component of :code:`fs`. Since this
# means differentiating through its graph multiple times, we need to set
# :code:`retain_graph=True`.
manual_seed(0)
model, X = setup()
fs = model(X)
autograd_jacobians = [zeros(fs.shape + param.shape) for param in model.parameters()]
for n, c in itertools.product(range(N), range(C)):
    grads_n_c = grad(fs[n, c], model.parameters(), retain_graph=True)
    for param_idx, param_grad in enumerate(grads_n_c):
        autograd_jacobians[param_idx][n, c, :] = param_grad
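# %%
# As an aside (a sketch, assuming a PyTorch version that provides
# :code:`torch.func`): the same Jacobians can be computed without the explicit
# double loop by differentiating the parameter-to-output map:
#
# .. code-block:: python
#
#     from torch.func import functional_call, jacrev
#
#     params = dict(model.named_parameters())
#     jac_dict = jacrev(lambda p: functional_call(model, p, (X,)))(params)
#     # jac_dict[name] has shape [N, C, *param.shape]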
# %%
#
# Let's visualize the Jacobians by flattening the dimensions stemming from
# :code:`fs` and from :code:`param_i`, and by concatenating the results along
# the flattened parameter dimension:
plt.figure()
plt.title(r"Batched Jacobian")
image = plt.imshow(
    cat(
        [
            jac.flatten(end_dim=fs.dim() - 1).flatten(start_dim=1)
            for jac in autograd_jacobians
        ],
        dim=1,
    )
)
plt.colorbar(image, shrink=0.7)
# %%
#
# In the following, we will compute the same Jacobian tensor lists with
# BackPACK. To compare our results, we will use the following helper function:
def compare_tensor_lists(
    tensor_list1: List[Tensor], tensor_list2: List[Tensor]
) -> None:
    """Check two tensor lists for equality.

    Args:
        tensor_list1: First tensor list.
        tensor_list2: Second tensor list.

    Raises:
        ValueError: If the two tensor lists don't match.
    """
    if len(tensor_list1) != len(tensor_list2):
        raise ValueError("Tensor lists have different lengths.")
    for tensor1, tensor2 in zip(tensor_list1, tensor_list2):
        if tensor1.shape != tensor2.shape:
            raise ValueError("Tensors have different shapes.")
        if not allclose(tensor1, tensor2):
            raise ValueError("Tensors have different values.")
    print("Both tensor lists match.")
# %%
# Next, we will present two approaches to compute such Jacobians with BackPACK.
#
# You can imagine the first one as carrying out the for-loop over :code:`N` in
# parallel, and the second one as carrying out both for-loops over
# :code:`N, C` in parallel. The first approach relies on a first-order
# extension, the second one on a second-order extension. This means that the
# first approach works on quite general graphs, whereas the second one
# requires the graph to be fully BackPACK-compatible.
#
# With BackPACK's :code:`BatchGrad`
# ---------------------------------
#
# As described in the introduction, BackPACK's :code:`BatchGrad` extension can
# compute Jacobians of batched scalars. We can therefore compute the
# derivatives for :code:`fs[:, c]` in one iteration, parallelizing the Jacobian
# computation over the batch axis. For the full Jacobian, this requires
# :code:`C` backpropagations, hence we need to tell both :code:`autograd` and
# BackPACK to retain the graph.
#
# Let's do that in code, and check the result:
manual_seed(0)
model, X = setup()
model = extend(model)
fs = model(X)
backpack_first_jacobians = [zeros(fs.shape + p.shape) for p in model.parameters()]
for c in range(C):
    with backpack(extensions.BatchGrad(), retain_graph=True):
        f = fs[:, c].sum()
        f.backward(retain_graph=True)

    for param_idx, param in enumerate(model.parameters()):
        backpack_first_jacobians[param_idx][:, c, :] = param.grad_batch
print("Comparing batched Jacobian from autograd with BackPACK (via BatchGrad):")
compare_tensor_lists(autograd_jacobians, backpack_first_jacobians)
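# %%
# One optional housekeeping note (a suggestion, not part of the comparison
# above): BackPACK stores its results as attributes on the parameters, so once
# the Jacobian has been copied you can free that memory:
#
# .. code-block:: python
#
#     for param in model.parameters():
#         del param.grad_batch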
# %%
# With BackPACK's :code:`SqrtGGNExact`
# ------------------------------------
#
# The second approach uses BackPACK's :code:`SqrtGGNExact` second-order
# extension. It computes the matrix square root of the GGN/Fisher.
#
# This approach uses the fact that after feeding :code:`fs` through a square
# loss with :code:`reduction='sum'`, the GGN's matrix square root is the
# desired Jacobian, up to a normalization factor of √2 (to find out more, read
# Section 2 of `[Dangel, 2021] <https://arxiv.org/abs/2106.02624>`_) and a
# transposition due to BackPACK's internals.
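#
# To see where the √2 comes from: for the square loss with sum reduction,
# :code:`loss = Σ (fs - y)²`, the Hessian of the loss w.r.t. :code:`fs` is
# twice the identity. Hence the GGN w.r.t. a parameter with Jacobian :code:`J`
# is :code:`Jᵀ (2 I) J = (√2 J)ᵀ (√2 J)`, so the matrix square root computed
# by BackPACK is :code:`√2 J`, and dividing by √2 recovers :code:`J`.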
#
# This way, we get the Jacobian in a single backward pass and don't have to
# retain the graph:
manual_seed(0)
model, X = setup()
model = extend(model)
loss_func = extend(MSELoss(reduction="sum"))
fs = model(X)
fs_labels = zeros_like(fs) # can contain arbitrary values.
loss = loss_func(fs, fs_labels)
with backpack(extensions.SqrtGGNExact()):
    loss.backward()

backpack_second_jacobians = [
    param.sqrt_ggn_exact.transpose(0, 1) / sqrt(2) for param in model.parameters()
]
print("Comparing batched Jacobian from autograd with BackPACK (via SqrtGGNExact):")
compare_tensor_lists(autograd_jacobians, backpack_second_jacobians)