Yes, you do need to transform the fetch direction into the space of the cubemap. If you could somehow compute the fetch direction in the vertex shader, you could do the transformation there instead, but the result would then be interpolated across the triangle rather than computed per pixel, which would produce worse lighting.
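As a rough illustration of the per-pixel work, here is a minimal sketch in Python standing in for shader code. It assumes the cubemap lives in world space, that the fetch direction is a reflection of the view vector about a normal-mapped normal, and that the interpolated tangent, bitangent and normal are already in world space. All function names are made up for the example.

```python
# CPU-side sketch of the per-pixel math; in practice this would be pixel shader code.
# Assumptions (not from the original post): world-space cubemap, reflection-based fetch.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = dot(v, v) ** 0.5
    return tuple(x / length for x in v)

def tangent_to_world(tangent, bitangent, normal, v):
    """Multiply v by the matrix whose columns are the world-space T, B and N."""
    return tuple(v[0] * t + v[1] * b + v[2] * n
                 for t, b, n in zip(tangent, bitangent, normal))

def reflect(incident, n):
    """Same convention as the shader intrinsic: incident points toward the surface."""
    d = dot(incident, n)
    return tuple(i - 2.0 * d * c for i, c in zip(incident, n))

def cubemap_fetch_direction(tangent, bitangent, normal, normal_ts, view_dir_ws):
    """Move the normal-map sample into cubemap (world) space, then reflect."""
    n_ws = normalize(tangent_to_world(tangent, bitangent, normal, normal_ts))
    return reflect(view_dir_ws, n_ws)
```

The point is that `normal_ts` comes from a texture sample, so the transformation cannot be hoisted into the vertex shader without losing the per-pixel detail.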
It may also be worth optimizing for fewer interpolants between the vertex and pixel shaders, rather than for fewer matrix-vector multiplications in the pixel shader. As soon as you have one transformed view vector and two transformed light vectors that must be interpolated (which would otherwise be constant), you've matched the size of the tangent/bitangent/normal matrix that you would otherwise be interpolating: three 3-component vectors either way. The interpolant count has the potential to affect performance more than the extra matrix multiplications in the shader.
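The break-even point can be checked with some quick arithmetic. The sketch below counts only the floats for the vectors discussed here, and ignores how a particular GPU packs interpolants as well as anything else being interpolated (texture coordinates, for example). The function names are hypothetical.

```python
FLOATS_PER_VEC3 = 3

def interpolants_per_vertex_transform(num_lights, num_view_vectors=1):
    """Floats interpolated if view and light vectors are transformed per vertex."""
    return (num_view_vectors + num_lights) * FLOATS_PER_VEC3

def interpolants_tbn_matrix():
    """Floats interpolated if the tangent/bitangent/normal matrix is passed instead."""
    return 3 * FLOATS_PER_VEC3
```

With one view vector and two lights, both approaches interpolate 9 floats. A third light pushes the per-vertex approach to 12, while the matrix stays at 9 no matter how many lights there are.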