I've read that in real-time graphics, blurring is done by blurring along one axis and then along the other.

I've done some convolution in 1D in the past, but I'm not very comfortable with it, and I don't know exactly which convolutions to perform in this case.

Can anyone explain 2D Gaussian blurring of an image in a simple way?

I've also heard that the blur radius affects performance. Is that because a larger convolution has to be performed?


---

In general, a convolution is performed by taking the integral of the product of two functions in a sliding window, but if you're not from a math background, that's not a very helpful explanation, and certainly won't give you a useful intuition for it. More intuitively, a convolution allows multiple points in an input signal to affect a single point on an output signal.

Since you're not super comfortable with convolutions, let's first review what a convolution means in a discrete context like this, and then go over a simpler blur.

In our discrete context, we can multiply our two signals by simply multiplying each corresponding sample. The integral is also simple to do discretely: we just add up each sample in the interval we're integrating over. One simple discrete convolution is computing a moving average. If you want to take the moving average of 10 samples, this can be thought of as convolving your signal with a distribution 10 samples long and 0.1 tall: each sample in the window first gets multiplied by 0.1, then all 10 are added together to produce the average. This also reveals an interesting and important distinction: when you're blurring with a convolution, the distribution that you use should sum to 1.0 over all its samples; otherwise it will increase or decrease the overall brightness of the image when you apply it. If the distribution for our average had been 1 over its whole interval, the total signal would be 10x brighter after the convolution.
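The moving-average idea can be sketched in a few lines of Python (the function name and the clamp-at-the-border behavior are my own choices for illustration, not something from the answer above):

```python
# Illustrative sketch: a moving average as a discrete 1D convolution.
# The kernel is `width` samples long, each 1/width tall, so it sums to 1.0
# and preserves the overall brightness of the signal.
def moving_average(signal, width=10):
    kernel = [1.0 / width] * width          # box distribution, sums to 1.0
    half = width // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k in range(width):
            # clamp at the borders so the window never leaves the signal
            j = min(max(i + k - half, 0), len(signal) - 1)
            acc += signal[j] * kernel[k]
        out.append(acc)
    return out
```

Because the kernel sums to 1.0, a constant signal passes through unchanged, which is an easy way to sanity-check any blur kernel.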

Now that we've looked at convolutions, we can move on to blurs. A Gaussian blur is implemented by convolving an image by a Gaussian distribution. Other blurs are generally implemented by convolving the image by other distributions. The simplest blur is the box blur, and it uses the same distribution we described above, a box with unit area. If we want to blur a 10x10 area, then we multiply each sample in the box by 0.01, and then sum them all together to produce the center pixel. We still need to ensure that the total sum of all the samples in our blur distribution is 1.0 to make sure the image doesn't get brighter or darker.
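A box blur following that description can be sketched like this (a toy illustration with my own function name; the `n x n` window is weighted by `1/(n*n)` so the weights sum to 1.0):

```python
# Illustrative box blur: every sample in an n x n window is weighted by
# 1/(n*n), so the kernel sums to 1.0 and brightness is preserved.
def box_blur(img, n=3):
    h, w = len(img), len(img[0])
    half = n // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    # clamp coordinates at the image border
                    sy = min(max(y + dy, 0), h - 1)
                    sx = min(max(x + dx, 0), w - 1)
                    acc += img[sy][sx] / (n * n)
            out[y][x] = acc
    return out
```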

A Gaussian blur follows the same broad procedure as a box blur, but it uses a more complex formula to determine the weights. The weight can be computed from the distance from the center $r$ by evaluating $$\frac{e^{-r^2/2}}{\sqrt{2\pi}}$$ The sum of all the samples in a Gaussian will only be approximately 1.0 if you sample every single pixel, and the fact that a Gaussian has infinite support (it has nonzero values everywhere) means that in practice you need to use a slightly modified version that sums to 1.0 using only a few values.
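The usual modification is simply to sample the Gaussian over a finite window and renormalize. A minimal sketch (the function name and the `sigma` parameter are assumptions for illustration):

```python
import math

# Sample the Gaussian at integer offsets in a finite window, then divide
# by the total so the truncated kernel sums to exactly 1.0.
def gaussian_kernel_1d(radius, sigma=1.0):
    weights = [
        math.exp(-(r * r) / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))
        for r in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]   # renormalized: sums to 1.0
```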

Of course both of these processes can be very expensive if you perform them on a very large radius, since you need to sample a lot of pixels in order to compute the blur. This is where the final trick comes in: both a Gaussian blur and a box blur are what's called a "separable" blur. This means that if you perform the blur along one axis, and then perform it along the other axis, it produces the exact same result as if you'd performed it along both axes at the same time. This can be tremendously important. If your blur is 10px across, it requires 100 samples in the naive form, but only 20 when separated. The difference only gets bigger, since the combined blur is $O(n^2)$, while the separated form is $O(n)$.
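Separability can be demonstrated directly: a horizontal pass followed by a vertical pass with a 1D kernel gives exactly the same result as one pass with the equivalent 2D (outer-product) kernel. A sketch (kernel values and function names are mine, chosen for illustration):

```python
K1 = [0.25, 0.5, 0.25]                      # example 1D kernel, sums to 1.0
K2 = [[a * b for b in K1] for a in K1]      # equivalent 2D kernel (9 weights)

def clamp(v, lo, hi):
    return min(max(v, lo), hi)

def blur_h(img, k):
    # horizontal pass: each output pixel mixes len(k) horizontal neighbors
    h, w, r = len(img), len(img[0]), len(k) // 2
    return [[sum(img[y][clamp(x + i - r, 0, w - 1)] * k[i] for i in range(len(k)))
             for x in range(w)] for y in range(h)]

def blur_v(img, k):
    # vertical pass: the same thing along the y axis
    h, w, r = len(img), len(img[0]), len(k) // 2
    return [[sum(img[clamp(y + i - r, 0, h - 1)][x] * k[i] for i in range(len(k)))
             for x in range(w)] for y in range(h)]

def blur_2d(img, k2):
    # naive 2D pass: len(k2)**2 samples per pixel instead of 2 * len(k2)
    h, w, r = len(img), len(img[0]), len(k2) // 2
    return [[sum(img[clamp(y + dy - r, 0, h - 1)][clamp(x + dx - r, 0, w - 1)] * k2[dy][dx]
                 for dy in range(len(k2)) for dx in range(len(k2)))
             for x in range(w)] for y in range(h)]
```

With clamped borders the per-axis clamping is independent, so the two-pass result matches the one-pass result exactly, not just approximately.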

---

In convolution, two mathematical functions are combined to produce a third function. In image processing, these functions are usually called kernels. A kernel is nothing more than a (square) array of pixels (a small image, so to speak). Usually, the values in the kernel add up to one. This is to make sure no energy is added to or removed from the image by the operation.

Specifically, a Gaussian kernel (used for Gaussian blur) is a square array of pixels where the pixel values correspond to the values of a Gaussian curve (in 2D).

Each pixel in the image gets multiplied by the Gaussian kernel. This is done by placing the center pixel of the kernel on the image pixel and multiplying the values in the original image with the pixels in the kernel that overlap. The values resulting from these multiplications are added up, and that result is used for the value at the destination pixel. For example, to compute the value at (1,1) in the output image, you would multiply the value at (0,0) in the input array by the kernel weight that overlaps it, the value at (1,0) by the next overlapping weight, and so on, and then add all these products together.
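Here is a small numeric version of that step (the image and kernel values are hypothetical, chosen so the arithmetic is easy to follow):

```python
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1/16, 2/16, 1/16],   # 3x3 binomial kernel, weights sum to 1.0
          [2/16, 4/16, 2/16],
          [1/16, 2/16, 1/16]]

# Center the kernel on image pixel (1, 1), multiply each overlapping pair
# of values, and sum the products to get the output value at (1, 1).
out_11 = sum(image[1 + dy][1 + dx] * kernel[1 + dy][1 + dx]
             for dy in (-1, 0, 1) for dx in (-1, 0, 1))
```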

To answer your second question first, the larger the kernel, the more expensive the operation. So, the larger the radius of the blur, the longer the operation will take.

To answer your first question, as explained above, convolution can be done by multiplying each input pixel with the entire kernel. However, if the kernel is separable (which a Gaussian kernel is), you can instead filter along each axis (x and y) independently, which decreases the total number of multiplications. In proper mathematical terms, a matrix is separable if it can be decomposed into an (M×1) and a (1×N) matrix. For the Gaussian kernel above, this means you can also use the following kernels:

$$\frac1{256}\cdot\begin{bmatrix} 1&4&6&4&1\\ 4&16&24&16&4\\ 6&24&36&24&6\\ 4&16&24&16&4\\ 1&4&6&4&1 \end{bmatrix} = \frac1{256}\cdot\begin{bmatrix} 1\\4\\6\\4\\1 \end{bmatrix}\cdot\begin{bmatrix} 1&4&6&4&1 \end{bmatrix} $$

You would now filter the input image with one of these 1D kernels, and then filter that intermediate result with the other, to get the values for the output pixels.
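You can check the decomposition above numerically: the outer product of the 1D binomial kernel [1 4 6 4 1] with itself reproduces the 5x5 kernel, and the factor 1/256 is just 1/16 · 1/16 (a quick sketch):

```python
# Outer product of the 1D binomial row with itself gives the 2D kernel.
row = [1, 4, 6, 4, 1]                         # sums to 16
kernel_2d = [[a * b for b in row] for a in row]  # sums to 16 * 16 = 256
```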

For more information on how to see if a kernel is separable, follow this link.

Edit: the two kernels shown above use slightly different values. This is because the (sigma) parameter used for the Gaussian curve to create these kernels was slightly different in the two cases. For an explanation of which parameters influence the shape of the Gaussian curve, and thus the values in the kernel, follow this link

Edit: in the second image above, it says the kernel that is used is flipped. This, of course, only makes a difference if the kernel you use is not symmetric. The reason you need to flip the kernel has to do with the mathematical properties of the convolution operation (see link for a more in-depth explanation of convolution). Simply put: if you do not flip the kernel, the result of the convolution operation will be flipped. By flipping the kernel, you get the correct result.

---

Here is the best article I've read on the topic:
*Efficient Gaussian blur with linear sampling*. It addresses all your questions and is really accessible.

For the layman, the very short explanation: a Gaussian is a function with the nice property of being separable, which means that a 2D Gaussian function can be computed by combining two 1D Gaussian functions.

So for an $n \times n$ kernel ($O(n^2)$), you only need to evaluate $2 \times n$ values ($O(n)$), which is significantly less. If your operation consists of reading a texture element (commonly called a *"tap"*), that is good news: fewer taps are cheaper, because each texture fetch has a cost.

That's why blurring algorithms use that property by doing two passes, one to blur horizontally by gathering the $n$ horizontal pixels, and one to blur vertically by gathering the $n$ vertical pixels. The result is the final blurred pixel color.

---

The most important thing to consider when implementing the Gaussian blur is, as others have pointed out, to separate the 2D convolution filter into two 1D convolutions because it brings the complexity down from $O(n^2)$ to $O(n)$.

But there are two more tricks you might want to consider in an actual implementation:

The filter has a certain radius, and because of that, at the very borders you will need to sample pixels that fall outside the image. In that case, you could try one of the following: for the outside pixels, simply take the last possible value (i.e. clamp to the pixel at the very border, as in `max(x, 0)`). Or you could "reflect" the image towards the outside (as in `x < 0 ? -x : x`). Or you could simply stop at the border, but then you need to adjust the denominator of the convolution filter so that the remaining weights still sum up to 1. For example:

$$ \operatorname{sum} \frac{1}{256} \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \\ \end{bmatrix} = \operatorname{sum} \frac{1}{225} \begin{bmatrix} 1 & 4 & 6 & 4 & 0 \\ 4 & 16 & 24 & 16 & 0 \\ 6 & 24 & 36 & 24 & 0 \\ 4 & 16 & 24 & 16 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \end{bmatrix} = 1. $$ Another trick concerns how to calculate the actual coefficients of the kernel. Obviously you could evaluate the Gaussian function directly, but a much faster way is to observe that the 1D kernel resembles a row of Pascal's triangle. For example:

```
1
1 1
1 2 1
1 3 3 1
[1 4 6 4 1]
```
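Building such a row programmatically is straightforward (a sketch; `pascal_row` is my own helper name, and dividing by the row's sum, which is $2^n$, normalizes the kernel to 1.0):

```python
# The n-th row of Pascal's triangle gives the unnormalized 1D binomial
# kernel; each new row is the previous row added to a shifted copy of itself.
def pascal_row(n):
    row = [1]
    for _ in range(n):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

kernel = [v / sum(pascal_row(4)) for v in pascal_row(4)]  # normalized [1 4 6 4 1]
```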