How to prevent moiré artifacts in this light casting algorithm?



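I'm writing a 2D light casting algorithm with OpenGL compute shaders. The algorithm is simple: for each light source, rays are cast from the light, each starting at the light's coordinates and ending at a pixel on the circle around the light (the light's range). As a ray is sampled, the pixels underneath it modulate the light's color/alpha. The algorithm works well, but it misses some pixels, which stay black and produce moiré-like artifacts. The rays are computed in parallel in a compute shader, with one thread per pixel on the light's bounding circle. For a light of radius 256, for example, this means 2 * 256 * Pi ≈ 1608 threads on the GPU, each computing one ray.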

Here is the relevant compute shader code:

#define LOCAL_WG_SIZE 128u
const float PI = 3.1415926535897932384626433832795;
const float RAY_LENGTH = 256.0f;

// get the number of the current thread (0...1608)
uint renderNodeNum = local_coords;
// get the endpoint of the ray, it will be on a circle
endPoint.x = int(RAY_LENGTH * sin(float(renderNodeNum) * PI / 1024.0f)) + lightPos.x;
endPoint.y = int(RAY_LENGTH * cos(float(renderNodeNum) * PI / 1024.0f)) + lightPos.y;

// vector approximation. Works, but has moire artifacts.
// I've also tried Bresenham's line algorithm, but it leaves a cross shape as the light fades which looks ugly.
vec2 dt = normalize(vec2(endPoint - lightPos));
vec2 t = vec2(lightPos);
for (int k = 0; k < RAY_LENGTH; k++) {
    coords.x = int(t.x);
    coords.y = int(t.y);

    // calculate transparency
    transpPixel = imageLoad(transpTex, coords);   
    currentAlpha = (transpPixel.b + transpPixel.g * 10.0f + transpPixel.r * 100.0f) / 111.0f;
    // calculate color
    colorPixel = imageLoad(colorTex, coords);
    lightRay.rgb = min(colorPixel.rgb, lightRay.rgb) - (1.0f - currentAlpha) - transmit; 
    currentOutPixel = imageLoad(img_output, coords);
    currentOutPixel.rgb = max(currentOutPixel.rgb, lightRay.rgb);
    currentOutPixel.a = lightRay.a;
    // write color
    imageStore(img_output, coords, currentOutPixel);

    t += dt;
}

Here is how it looks:

Moire artifacts

Another example, with a colored background, showing the light propagation and the same artifacts:

same Moire artifacts here too

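So I need a better algorithm for drawing these rays (or any other approach that parallelizes well on compute shaders) that doesn't miss pixels (the black dots in the pictures). I could oversample the rays (for example, cast twice as many), but that would be very expensive; there must be a cheaper solution.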


In principle, you should avoid scatter (casting) behavior on a GPU. Writes to arbitrary output coordinates have only been available since shader model 5, and they are meant for extreme situations. As a general rule, write your GPU code in a "gather" fashion.

The difference: in a gather, each hardware thread is logically tied to one output position in the render target. The scheduler decides which rectangle (or cube) of the target buffers a launched thread group writes its results to.

So you should start from the designated destination and work back to the source, instead of starting from some source and computing a dynamic destination.

This way you not only please the hardware by avoiding contention and race conditions entirely, you also avoid holes.
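As a rough illustration, here is a minimal sketch of what a gather version of the shader in the question could look like: one invocation per output pixel in the light's bounding square, marching from the light to that pixel and writing only its own pixel. The names `transpTex`, `colorTex`, `img_output`, `lightPos`, `transmit` and `RAY_LENGTH` come from the question; the `#version`, binding points, `rgba8` formats, the 16x16 work group size and the `lightColor` uniform are assumptions made so the sketch is complete.

```glsl
#version 430
// Gather-style sketch: each invocation owns exactly ONE output pixel.
// Assumed (not from the question): bindings, rgba8 formats, 16x16 groups, lightColor.
layout(local_size_x = 16, local_size_y = 16) in;

layout(rgba8, binding = 0) readonly uniform image2D transpTex;
layout(rgba8, binding = 1) readonly uniform image2D colorTex;
layout(rgba8, binding = 2) uniform image2D img_output;

uniform ivec2 lightPos;
uniform vec4  lightColor;  // hypothetical: initial color/alpha of the ray
uniform float transmit;    // per-step falloff, as in the question

const float RAY_LENGTH = 256.0;

void main() {
    // Map the invocation to a pixel in the square around the light.
    ivec2 pixel = lightPos + ivec2(gl_GlobalInvocationID.xy) - ivec2(int(RAY_LENGTH));
    ivec2 size = imageSize(img_output);
    if (any(lessThan(pixel, ivec2(0))) || any(greaterThanEqual(pixel, size))) return;

    // Skip pixels outside the light's circle.
    vec2 toPixel = vec2(pixel - lightPos);
    float dist = length(toPixel);
    if (dist > RAY_LENGTH) return;

    // March from the light's pixel center to this pixel's center.
    int steps = int(ceil(dist));
    vec2 dt = steps > 0 ? toPixel / float(steps) : vec2(0.0);
    vec2 t = vec2(lightPos) + 0.5;
    vec4 lightRay = lightColor;
    for (int k = 0; k <= steps; k++) {
        ivec2 coords = ivec2(floor(t));
        // Same attenuation as in the question, applied at every sample.
        vec4 transpPixel = imageLoad(transpTex, coords);
        float currentAlpha = (transpPixel.b + transpPixel.g * 10.0 + transpPixel.r * 100.0) / 111.0;
        vec4 colorPixel = imageLoad(colorTex, coords);
        lightRay.rgb = min(colorPixel.rgb, lightRay.rgb) - (1.0 - currentAlpha) - transmit;
        t += dt;
    }

    // The only write: this invocation's own pixel, so no holes and no write races.
    vec4 outPixel = imageLoad(img_output, pixel);
    outPixel.rgb = max(outPixel.rgb, lightRay.rgb);
    outPixel.a = lightRay.a;
    imageStore(img_output, pixel, outPixel);
}
```

With these assumptions, a radius of 256 gives a 513x513 pixel square, so the dispatch would be 33x33 work groups per light. Every pixel inside the circle is visited by construction, but the trade-off is cost: this does far more total work than one ray per rim pixel, since each pixel walks its own ray. If several lights are accumulated into `img_output` with the `max`, the dispatches would also need a memory barrier between them.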