You're right, translation is not relevant in this context: it wouldn't affect the normals.
Consider a non-uniformly scaled object:

We want a normal matrix that transforms the normals stored in the vertex buffer into world-space normals. It is derived from the world matrix M, which encodes the translation, rotation and scaling (and shearing) of the model object.
If you were to use the upper-left 3x3 of M directly in the vertex shader, you would be applying the rotation with the scaling baked in. If the world matrix contains non-uniform scaling, your normal vectors are not just rescaled but skewed: they are no longer perpendicular to the surface, and you'll get the wrong results.
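To make the problem concrete, here is a minimal sketch in plain Python (no libraries). The scale factors, the tangent and the normal are made-up illustration values, not taken from any particular model. It transforms a surface tangent with a non-uniform scale, then compares a normal transformed with the same matrix against one transformed with the inverse-transpose.

```python
# Minimal sketch: a non-uniform scale breaks normals when they are
# transformed with the same matrix as positions. All values are illustrative.

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

# Non-uniform scale: stretch x by 2, leave y and z alone.
scale = [[2.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

# For a diagonal matrix the inverse-transpose is the reciprocal diagonal.
scale_inv_t = [[0.5, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]]

tangent = [1.0, -1.0, 0.0]  # lies in the surface
normal = [1.0, 1.0, 0.0]    # perpendicular to the tangent (left unnormalized)

tangent_world = mat_vec(scale, tangent)      # tangents transform like positions
naive_normal = mat_vec(scale, normal)        # reuses the world matrix: wrong
fixed_normal = mat_vec(scale_inv_t, normal)  # inverse-transpose: correct

naive_dot = dot(tangent_world, naive_normal)  # non-zero: not perpendicular
fixed_dot = dot(tangent_world, fixed_normal)  # zero: still perpendicular

print(naive_dot, fixed_dot)
```

The naive normal is no longer at a right angle to the transformed surface, while the inverse-transposed one is. In a real shader you would also renormalize the result.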
The reason we take the transpose is that the rotation matrix used to build M = TranslationMatrix x RotationMatrix x ScaleMatrix is an orthogonal matrix, which means that its row vectors (and its column vectors) are orthonormal, i.e. they are mutually perpendicular and their length is one.
Matrices of this type have the following property:
Q^T * Q = Q * Q^T = I
If you remember the inverse rule:
Q^-1 * Q = Q * Q^-1 = I
Comparing the two, this means the following for orthogonal matrices:
Q^T = Q^-1
This allows us to apply the inverse of the rotations by transposing the matrix.
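The orthogonality property is easy to check numerically. The sketch below uses an arbitrary 30 degree rotation about the z axis (an assumed example value) and verifies that multiplying the matrix by its transpose gives the identity, up to floating-point tolerance.

```python
import math

def transpose(m):
    """Return the transpose of a 3x3 matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[r][k] * b[k][col] for k in range(3)) for col in range(3)]
            for r in range(3)]

def is_identity(m, tol=1e-12):
    """True if m equals the 3x3 identity within the given tolerance."""
    return all(abs(m[r][col] - (1.0 if r == col else 0.0)) <= tol
               for r in range(3) for col in range(3))

# Rotation about the z axis; the angle is an arbitrary illustration value.
angle = math.radians(30.0)
cos_a, sin_a = math.cos(angle), math.sin(angle)
q = [[cos_a, -sin_a, 0.0],
     [sin_a,  cos_a, 0.0],
     [0.0,    0.0,   1.0]]

qt = transpose(q)
qt_q_is_identity = is_identity(mat_mul(qt, q))  # Q^T * Q = I
q_qt_is_identity = is_identity(mat_mul(q, qt))  # Q * Q^T = I

print(qt_q_is_identity, q_qt_is_identity)
```

Since the transpose behaves exactly like the inverse here, no general matrix inversion is needed to undo a pure rotation.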
In summary, taking the inverse-transpose of the world matrix does 2 important things:
- Undo the transformation (Inverse)
  - undo the scaling (and shearing)
  - invert the rotations
  - translate the object back to the world center (irrelevant for normals)

- Un-undo the rotations (Transpose)
  - Transposing the inverted matrix brings the rotations back to their original state, because of the orthogonal property of the rotation matrix. The inverted scaling is kept, since a diagonal scale matrix is unchanged by transposition.

We now have our normal matrix ready to transform model-space normals defined in the vertex buffer into world-space normals in the vertex shader.
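As a rough end-to-end check, the sketch below builds the upper-left 3x3 of a world matrix as Rotation x Scale (translation never reaches the 3x3 part, so it is left out), takes its inverse-transpose, and compares it with Rotation x Scale^-1. The helper names, the 45 degree angle and the scale factors are assumptions chosen for illustration, and shearing is not covered.

```python
import math

def mat_mul(a, b):
    """Multiply two 3x3 matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][col] for k in range(3)) for col in range(3)]
            for r in range(3)]

def transpose(m):
    """Return the transpose of a 3x3 matrix."""
    return [list(row) for row in zip(*m)]

def inverse3(m):
    """Invert a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    s = 1.0 / det
    return [[(e * i - f * h) * s, (c * h - b * i) * s, (b * f - c * e) * s],
            [(f * g - d * i) * s, (a * i - c * g) * s, (c * d - a * f) * s],
            [(d * h - e * g) * s, (b * g - a * h) * s, (a * e - b * d) * s]]

# Illustrative rotation (45 degrees about z) and non-uniform scale.
angle = math.radians(45.0)
cos_a, sin_a = math.cos(angle), math.sin(angle)
rotation = [[cos_a, -sin_a, 0.0],
            [sin_a,  cos_a, 0.0],
            [0.0,    0.0,   1.0]]
scale = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
scale_inv = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]

world3 = mat_mul(rotation, scale)             # upper-left 3x3 of M
normal_matrix = transpose(inverse3(world3))   # inverse-transpose
expected = mat_mul(rotation, scale_inv)       # rotation kept, scale inverted

max_error = max(abs(normal_matrix[r][col] - expected[r][col])
                for r in range(3) for col in range(3))
print(max_error)
```

The two matrices agree up to rounding error: the rotation survives the inverse-transpose unchanged, and only the scaling ends up inverted.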