
## Loose Grids

Spatial data structures are an important part of game engines, with uses in systems such as collision detection, rendering and gameplay. Octrees, BSP trees and bounding volume hierarchies are just a few of the options available. However, for many applications these complicated hierarchical structures aren’t a good choice. Tree structures often have poor cache performance and aren’t well suited to SIMD or other parallel processing techniques. They can also have considerable cost when adding, removing or updating the positions of objects. Instead, setting up your data in a way that allows brute-force processing with highly optimized code may be faster, even if you typically end up processing many more objects.

The most cache friendly data structure is an array (or possibly a structure of arrays). It also allows trivial updating of objects, but for some data sets, putting all objects into a single array may be too slow. Most likely, we can process a lot of items in an array very efficiently, but the total number of objects is an order of magnitude too high to make this practical.

The next simplest method is a regular grid, where each cell stores an array of the objects inside it. Typically you can make the cells pretty big (to amortize cache misses and maintain efficient processing over the contents), and for decently distributed objects we end up processing far fewer objects. However, regular grids have some drawbacks. The first drawback is that objects might straddle multiple grid cells and need special handling. A typical solution is to store the object in every cell it overlaps, but this makes the book-keeping more complicated and can affect your ability to process cells in parallel. The second drawback is that grids have finite size, so you need to either prevent any objects from moving outside the bounds of the grid, or resize the grid. Resizing can be ugly since it involves touching everything in the grid.

The first drawback can be solved by borrowing an idea from Thatcher Ulrich’s loose octree data structure. Like the loose octree, we expand each cell of our grid by a fixed amount so that each cell overlaps its neighbors. When inserting an object into the grid, we place the object into any one of the cells it is contained within. As long as the object is smaller than the overlap size, we only need to store it in a single cell.

This overlap scheme affects how we traverse the grid when performing operations on the objects. If we want to find all of the objects within a sphere, we need to test all of the grid cells whose expanded versions intersect the sphere. This means we’ll end up testing more cells than in a regular grid, but remember we are assuming that processing the objects is relatively fast due to our efficient array layout.

This grid works nicely if all of the objects are smaller than the overlap size, but if an object exceeds it, it can still fall into multiple cells. In most applications it will probably be reasonable to impose some sort of maximum size on objects. However, this hard limit can be transformed into a “soft” one by creating an additional cell where any objects above the maximum size are placed. In the example of querying for objects in a sphere, after we test all of the cells overlapping the sphere, we also test this “overflow” cell.

The overflow cell also allows us to deal with objects moving outside the grid. Like with the soft limit of object size, we assume that most objects will be within our predefined grid bounds, and put any objects which violate that constraint in the overflow cell. Since the number of outliers stored in that cell is small, the additional testing should be negligible.
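To make this concrete, here is a minimal sketch in C++. The 2D layout, the names, and the use of an object radius are all my own illustration, not code from the engine; the point is just the single-cell insertion, the expanded-cell query, and the shared overflow cell:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Minimal 2D loose-grid sketch. Each cell owns a flat array of object ids.
// Cells are conceptually expanded by `overlap` on every side, so an object
// whose radius is at most `overlap` always fits inside the expanded bounds
// of the cell containing its center. Oversized or out-of-bounds objects go
// into a single shared overflow cell (the "soft" limit).
struct LooseGrid {
    float cellSize, overlap;
    int   width, height;
    std::vector<std::vector<int>> cells;
    std::vector<int>              overflow;

    LooseGrid(int w, int h, float size, float ov)
        : cellSize(size), overlap(ov), width(w), height(h), cells(w * h) {}

    void insert(int id, float x, float y, float radius) {
        int cx = (int)std::floor(x / cellSize);
        int cy = (int)std::floor(y / cellSize);
        bool tooBig  = radius > overlap;
        bool outside = cx < 0 || cy < 0 || cx >= width || cy >= height;
        if (tooBig || outside)
            overflow.push_back(id);              // outliers go here
        else
            cells[cy * width + cx].push_back(id);
    }

    // Gather candidates from every cell whose *expanded* bounds can touch
    // the circle, then append the overflow cell; callers then test the
    // objects themselves with tight per-object checks.
    std::vector<int> query(float x, float y, float radius) const {
        std::vector<int> out;
        float r = radius + overlap;
        int x0 = std::max((int)std::floor((x - r) / cellSize), 0);
        int y0 = std::max((int)std::floor((y - r) / cellSize), 0);
        int x1 = std::min((int)std::floor((x + r) / cellSize), width - 1);
        int y1 = std::min((int)std::floor((y + r) / cellSize), height - 1);
        for (int cy = y0; cy <= y1; ++cy)
            for (int cx = x0; cx <= x1; ++cx)
                out.insert(out.end(), cells[cy * width + cx].begin(),
                                      cells[cy * width + cx].end());
        out.insert(out.end(), overflow.begin(), overflow.end());
        return out;
    }
};
```

Note that the per-cell arrays are exactly the kind of contiguous data that can be processed with the brute-force, cache-friendly loops described above.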

Loose grids may not be the best candidate for all situations, but if you are looking for an easy way to spatially partition a data set into manageable parts, it can be a good option. We’ve had success with this setup for frustum/occlusion culling, broadphase collision detection and gameplay queries in the engine for Natural Selection 2.

## First person view model rendering

In a first-person game, the player’s hand or weapon is often represented by a model called the “view model” in the terminology established by first-person shooters. One important consideration is that this view model does not intersect the world geometry when the player is close to a wall. It’s not always possible to limit the player’s movement so that this doesn’t happen, so special rendering methods are used to draw the view model.

The most obvious method is to draw the world first, clear the z-buffer and then draw the view model. This has a few disadvantages. The biggest one is that at the end of the drawing process you don’t have a single z-buffer describing the entire scene.

This is important if you are using a deferred rendering process where the light volumes will need to be tested against the z-buffer. This method also requires an additional clear of the z-buffer and doesn’t take advantage of any early z-rejection for pixels blocked by the view model.

Fortunately there’s a different method which is very simple and avoids these problems. By manipulating the projection matrix, we can map the view model and the world into mutually exclusive parts of the z-buffer. First a little refresher on projection and the z-buffer (assuming Direct3D conventions, OpenGL is slightly different):

The result of transforming a point (x, y, z, 1) by the projection matrix is a homogenous vector (x’, y’, z’, w’). The z-buffer stores the value z’/w’, which is in the range 0 to 1 after clipping. For your run-of-the-mill projection matrix, w’ = z, which is what you’re probably saving to the G-buffer if you’re doing deferred rendering. Our manipulation of the projection matrix should not affect x’, y’ or w’, so that everything else continues to work properly.

To map to a sub-region of the z-buffer, we want to apply another transformation that remaps z’/w’ from the range [0, 1] to the range [a, b]:

$z''/w'' = z'/w' \cdot (b - a) + a$

or, since we require that w” = w’:

$z'' = z' \cdot (b - a) + w' \cdot a$

Finally, this can be easily expressed as a matrix multiplication:

$\left[ {\begin{array}{c} x'' \\ y'' \\ z'' \\ w'' \\ \end{array} } \right] = \left[ {\begin{array}{c} x' \\ y' \\ z' \cdot (b - a) + w' \cdot a \\ w' \\ \end{array} } \right] = \left[ {\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & b - a & a \\ 0 & 0 & 0 & 1 \\ \end{array} } \right] \times \left[ {\begin{array}{c} x' \\ y' \\ z' \\ w' \\ \end{array} } \right]$

To compute the final projection matrix, just pre-multiply the original projection matrix P with this adjustment matrix M:

$\left[ {\begin{array}{c} x'' \\ y'' \\ z'' \\ w'' \\ \end{array} } \right] = M \times \left[ {\begin{array}{c} x' \\ y' \\ z' \\ w' \\ \end{array} } \right] = M \times P \times \left[ {\begin{array}{c} x \\ y \\ z \\ 1 \\ \end{array} } \right]$

Using a matrix that maps to [0, ε] for the view model and a matrix that maps to [ε, 1] for the world allows you to ensure they never overlap each other. Just keep in mind that you will be reducing the precision of your z-buffer this way, so try to make ε as small as possible.
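As a concrete sketch, here is the adjustment matrix in plain C++ with a minimal column-vector matrix type (the types and function names are my own illustration, not from any particular engine or API):

```cpp
#include <cassert>
#include <cmath>

// Column-vector convention, matching the derivation above: the adjustment
// matrix M remaps post-projection depth z'/w' from [0, 1] to [a, b] while
// leaving x', y' and w' untouched. Pre-multiply the projection matrix by M
// (i.e. use M * P) to get the final matrix.
struct Mat4 { float m[4][4]; };
struct Vec4 { float x, y, z, w; };

Mat4 makeDepthRangeMatrix(float a, float b) {
    Mat4 M = {};               // zero-initialize all entries
    M.m[0][0] = 1.0f;
    M.m[1][1] = 1.0f;
    M.m[2][2] = b - a;         // z'' = z' * (b - a) + w' * a
    M.m[2][3] = a;
    M.m[3][3] = 1.0f;
    return M;
}

Vec4 transform(const Mat4& M, Vec4 v) {
    float in[4] = { v.x, v.y, v.z, v.w }, out[4];
    for (int r = 0; r < 4; ++r) {
        out[r] = 0.0f;
        for (int c = 0; c < 4; ++c)
            out[r] += M.m[r][c] * in[c];
    }
    return { out[0], out[1], out[2], out[3] };
}
```

With `makeDepthRangeMatrix(0, eps)` for the view model and `makeDepthRangeMatrix(eps, 1)` for the world, a point with post-projection depth d ends up at depth d * (b - a) + a, so the two ranges never overlap.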

## Text pre-processing

Text pre-processing is a useful technique for adding features to systems that use text files as input. Probably the most common example of pre-processing is the C pre-processor. This is the mechanism that handles comments, #includes and #defines before C/C++ code is fed into the compiler. These are features that could be useful in many different types of text files, and Boost Wave is one readily available C pre-processor implementation.

In addition to using a “generic” pre-processor we can also create very domain specific pre-processing functions. Here are two examples.

We use the D3DX FX system for our shaders. One problem you can run into with shaders is needing all sorts of different combinations of parameters: does this shader use normal mapping, skinning, and so on. Writing all of these combinations out by hand can be tedious, and the FX system doesn’t directly support the concept of shader permutations. Our pre-processor adds an enhanced syntax for declaring them:

technique WriteDeferred[AlphaTest][NormalMap][Skinned]
{
    pass p0
    {
        PixelShader     = compile ps_2_0 WriteDeferredPS(AlphaTest, NormalMap);
#if AlphaTest
        AlphaTestEnable = True;
        AlphaFunc       = Greater;
        AlphaRef        = 128;
#endif
    }
}

With this enhanced syntax, we instruct the pre-processor to generate all combinations of the WriteDeferred technique with the AlphaTest, NormalMap, and Skinned options. This results in 2^3 = 8 combinations:

WriteDeferred
WriteDeferredAlphaTest
WriteDeferredNormalMap
WriteDeferredAlphaTestNormalMap
WriteDeferredSkinned
WriteDeferredAlphaTestSkinned
WriteDeferredNormalMapSkinned
etc.

Here’s what it expands to after our pre-processor runs on it:

technique WriteDeferredNormalMap
#define AlphaTest 0
#define NormalMap 1
#define Skinned 0
{
    pass p0
    {
        PixelShader     = compile ps_2_0 WriteDeferredPS(AlphaTest, NormalMap);
#if AlphaTest
        AlphaTestEnable = True;
        AlphaFunc       = Greater;
        AlphaRef        = 128;
#endif
    }
}
#undef AlphaTest
#undef NormalMap
#undef Skinned

Note that our custom pre-processor generates instructions for the standard C pre-processor that the D3DX FX system understands. Generating pre-processor #defines based on the permutation allows us to customize the body of the technique for each configuration. We also use it to alter the compilation of vertex and pixel shaders using compile-time constants.
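The expansion itself is straightforward to drive with a bitmask. Here is a hypothetical sketch of the core loop (the function name and string-based output are my own illustration of the idea, not the actual tool):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Enumerate every on/off combination of the option names with a bitmask.
// For each combination, build the expanded technique name (base name plus
// the enabled option names) and the #define/#undef block that the standard
// C pre-processor inside the FX system will consume.
std::vector<std::string> expandTechnique(
        const std::string& base,
        const std::vector<std::string>& options,
        const std::string& body) {
    std::vector<std::string> out;
    for (unsigned mask = 0; mask < (1u << options.size()); ++mask) {
        std::string name = base, defines, undefs;
        for (size_t i = 0; i < options.size(); ++i) {
            bool on = (mask >> i) & 1u;
            if (on) name += options[i];
            defines += "#define " + options[i] + (on ? " 1\n" : " 0\n");
            undefs  += "#undef " + options[i] + "\n";
        }
        out.push_back("technique " + name + "\n" + defines + body + undefs);
    }
    return out;
}
```

Three options produce the 2^3 = 8 techniques listed above, each with its own set of compile-time constants.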

This is a simple example, where each of the “parameters” is either on or off, but since we have complete control in our pre-processor, we can make things as complex as we like. Here’s another example from our system, which allows us to specify a set of values to choose from (parentheses mean the parameter always has one of the values; brackets mean the parameter is optional).

{
// ...
}

One caveat to modifying the input to the FX compiler this way is that the line numbers will be incorrect if it reports an error. This is easily solved by maintaining a mapping from the new line numbers back to the original ones. Using this, any error messages that reference line numbers can be fixed up before logging them.
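The fix-up can be as simple as recording, for each emitted line, which input line produced it. A hypothetical sketch (the names are mine, not from the actual tool):

```cpp
#include <cassert>
#include <vector>

// While writing the expanded file, call emitted() once per output line with
// the input line that produced it. When the FX compiler reports an error on
// an output line, toSource() maps it back to the original file for logging.
struct LineMap {
    std::vector<int> sourceLine;   // index = output line - 1

    void emitted(int inputLine) { sourceLine.push_back(inputLine); }

    int toSource(int outputLine) const {   // 1-based, as compilers report
        int i = outputLine - 1;
        if (i >= 0 && i < (int)sourceLine.size())
            return sourceLine[i];
        return outputLine;                 // unknown line: pass through
    }
};
```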

## Lua Profiling

Here’s a piece of Lua code that’s marked up for profiling:

function Actor:UpdateBoneCoords()

    PROFILE("Actor:UpdateBoneCoords")

    local model = Shared.GetModel(self.modelIndex)

    if (model ~= nil) then
        local poses = PosesArray()
        self:BuildPose(poses)
        return
    end

    self:UpdatePhysicsModelCoords()

end

We use this same method for instrumenting our C++ code for profiling. In C++, PROFILE is a #define which creates a C++ object on the stack. When this object is constructed profiling begins, and when it’s destroyed (because it goes out of scope) profiling ends. In a ship build, the PROFILEs are #defined to nothing to eliminate any overhead from the final game.
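The C++ side described above can be sketched like this (the class name and the toy accumulator are illustrative; a real profiler would record samples into per-thread buffers):

```cpp
#include <cassert>
#include <chrono>

// PROFILE expands to a stack object: its constructor starts a sample and
// its destructor, run automatically when the scope exits, stops it. Here
// the "profiler" is just a global nanosecond accumulator for illustration.
struct ProfileScope {
    const char* name;
    std::chrono::steady_clock::time_point start;
    static long long totalNanos;   // toy stand-in for real sample storage

    explicit ProfileScope(const char* n)
        : name(n), start(std::chrono::steady_clock::now()) {}

    ~ProfileScope() {
        totalNanos += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start).count();
    }
};
long long ProfileScope::totalNanos = 0;

// In a ship build this would be: #define PROFILE(name)
#define PROFILE(name) ProfileScope _profileScope(name)
```

A function then just contains `PROFILE("Actor:UpdateBoneCoords");` as its first statement, and every exit path, including early returns, ends the sample.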

In Lua, there’s no way to directly implement this because we don’t have control over an object’s lifetime. There’s also no built-in pre-processor, so there’s no way to completely eliminate the PROFILE markers. Both of these problems can be solved with a custom pre-processing step. Our pre-processor searches for the PROFILE token and inserts function calls at its location, and at every location where we exit the Lua block that contains it.

Here’s what the pre-processed output looks like:

function Actor:UpdateBoneCoords()

    __profile_start(10)

    local model = Shared.GetModel(self.modelIndex)

    if (model ~= nil) then
        local poses = PosesArray()
        self:BuildPose(poses)
        __profile_end(10); return
    end

    self:UpdatePhysicsModelCoords()

__profile_end(10); end

In addition to inserting the __profile_start and __profile_end calls, the string “Actor:UpdateBoneCoords” was inserted into a string table and its index was substituted. This allows the profiler to run with as little overhead as possible. Handling all of the different syntax cases in Lua is a little complex, but it takes only around 150 lines of code.

With the Lua pre-processor, it was possible to maintain the line numbers using Lua’s ability to have multiple statements on a single line. Keeping the line numbers identical eases error reporting and Decoda integration.

## Specular Color in Light Pre-pass Renderer

Light pre-pass and inferred lighting have some nice properties that make them popular alternatives to traditional deferred rendering. Like a deferred renderer, a light pre-pass renderer generates a G-buffer and then accumulates the lighting using screen space operations. A light pre-pass renderer differs from a deferred renderer by computing only part of the lighting equation in the per-light passes. The full equation is then completed in a final pass.

Here is the basic equation for computing the lighting for a surface using the Blinn-Phong reflection model:

$D \sum C_i (N \cdot L_i) + S \sum C_i (N \cdot H_i)^G$

D is the diffuse reflectance of the surface, S is the specular reflectance, G is the glossiness and N is the surface normal. These are all properties of the material and are constant over all of the light passes. The light-dependent parameters are L (the direction to the light source), C (the color of the light source) and H (the half vector).

With a light pre-pass renderer, the calculations are performed as they are organized in this equation. First the renderer loops over all of the lights computing the summed terms. Then a final pass is performed in which the lighting results are combined with the material parameters D and S.

By doing the lighting this way you can reduce the width of the G-buffer, since you don’t need to store the diffuse or specular components (we do however need the glossiness). In the case of inferred lighting, you can also compute the lighting at a lower resolution than the final image while still having full resolution textures.

A light pre-pass renderer is typically implemented using an RGBA render target; each pass accumulates the diffuse part of the equation in the RGB channels and the specular part in the A channel. While we would ideally like to store the diffuse and specular components in 3 channels each, this requires writing into multiple render targets. Currently, support for alpha blending into multiple render targets is uncommon, and the result would require more bandwidth and memory. Since the specular component in our original equation is modulated by the light color, we lose information when we store it in a single channel of the render target.

Given this single-channel constraint, here’s what we’re computing in the lighting passes:

$\begin{tabular}{ l c l } $${target}_{RGB}$$ & $$=$$ & $$\sum C_i (N \cdot L_i)$$ \\ $${target}_A$$ & $$=$$ & $$\sum {intensity}(C_i) (N \cdot H_i)^G$$ \end{tabular}$

One addition is the intensity function. This is used to convert the light color into a monochrome value so that dim lights will have less pronounced specular highlights. A photometric function could be used, or we could treat the color as a vector and compute the length; we’ll return to this later.

One way to reconstruct the lighting in the final pass is to scale the diffuse component by the material’s diffuse reflectance and the specular component by the specular reflectance and add them together:

$D * {target}_{RGB} + S * {target}_A$

Because the specular lighting component is monochrome, we’re computing the lighting as if the light sources were all white; the highlight will pick up the color of the material, but it won’t be affected by the original color of the light. If all of your lights are white — like in an office building perhaps — this is fine. However, when you have intense colored lights the results can look a little strange:

Fortunately, there’s a simple approximation we can apply to get much better results. Since the diffuse lighting includes the color of the light source encoded in it, with a little manipulation we can extract it and apply it to the specular component.

In the case of a single light source, the RGB portion of the render target simply contains the light color multiplied by the diffuse attenuation term. If we treat the color as a vector, we can decompose it into a unit vector and a magnitude:

$\begin{tabular}{ l c l } $${target}_{RGB}$$ & $$=$$ & $$C_0 (N \cdot L_0)$$ \\ & $$=$$ & $$\hat{C_0}\left \| C_0 \right \| (N \cdot L_0)$$ \\ $$\hat{C_0}$$ & $$=$$ & $${normalize}({target}_{RGB}), \mbox{for} \; (N \cdot L_0) \neq 0$$ \end{tabular}$

The final equation shows the normalized light color extracted from the render target. Note that the required condition will always be true where we want the specular highlight to appear. If we use the length of the light color as our intensity function during the lighting passes, then multiplying by the normalized color in the final pass gives us the correct result in the case of a single light source.

$D * {target}_{RGB} + S * {normalize}({target}_{RGB}) * {target}_A$
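In practice this is a few instructions in the final-pass pixel shader; as a toy CPU-side illustration of the math (my own minimal vector type, not engine code):

```cpp
#include <cassert>
#include <cmath>

// targetRGB holds sum C_i (N.L_i) from the lighting passes; targetA holds
// the monochrome specular sum, using intensity(C) = length(C). Normalizing
// targetRGB extracts the (approximate) light color and re-tints the
// specular term, which is exact for a single light source.
struct Vec3 { float x, y, z; };

Vec3  mul(Vec3 a, Vec3 b)     { return { a.x * b.x, a.y * b.y, a.z * b.z }; }
Vec3  add(Vec3 a, Vec3 b)     { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3  scale(Vec3 a, float s)  { return { a.x * s, a.y * s, a.z * s }; }
float length(Vec3 a)          { return std::sqrt(a.x*a.x + a.y*a.y + a.z*a.z); }
Vec3  normalize(Vec3 a)       { return scale(a, 1.0f / length(a)); }

// D and S are the material's diffuse and specular reflectance.
Vec3 finalPass(Vec3 D, Vec3 S, Vec3 targetRGB, float targetA) {
    Vec3 lightColor = normalize(targetRGB);   // extracted light color
    return add(mul(D, targetRGB), mul(S, scale(lightColor, targetA)));
}
```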

Here’s what the result looks like when we extract the light color from the diffuse and apply it to the specular:

As mentioned, this only holds for a single light source. When there are multiple light sources our calculations become an approximation. Thankfully, the results look quite good in real world scenarios with this method and there’s very little cost to including it in a light pre-pass or inferred renderer. Comparing with a reference image, it’s easy to see the discrepancies caused by this approximation, but when viewed on its own the artifacts are not very noticeable.

## Blending Terrain Textures

One of the most common methods for texturing a height map terrain is with multiple tiling layers.  These texture layers are stacked on top of each other and blended together to create the final textured look. In addition to the tiling texture, each layer also has an opacity map which controls how much of the texture is blended in at any point on the terrain. Unlike the tiling texture this opacity map is stretched over the entire terrain and is therefore very low detail.

Textured Terrain in Titan Quest

In a simple example, our terrain could have two layers: sand and grass. Anywhere the opacity map for the grass layer was painted 1 (opaque) we would just see grass. Where the opacity map was 0 we would see sand, and where it was 0.5 we’d see a mix of sand and grass. Note that the bottom layer doesn’t have an opacity map since the base layer must appear everywhere. This blending operation for two textures is implemented like this:

float4 blend(float4 texture0, float4 texture1, float layerOpacity)
{
    return lerp(texture0, texture1, layerOpacity);
}

The problem with this basic technique is that alpha blending a grass texture at 50% opacity over a sand texture doesn’t really look very good when you look at it closely:

The blended result on the bottom resembles green sand more than a patch of sandy grass.  If the textures were more closely related this would work better, so the problem can be addressed by adding artist-crafted transition textures.  This, however, creates more work for the artists and requires us to render an additional texture layer. Finally, for every combination of textures that we’re going to layer we’d need a different transition texture, and ultimately we might need several.

There is, however, what I consider to be a better and easier solution, which we successfully used on Titan Quest. The idea is that instead of drawing 50% of each texture at every pixel, we draw one texture at 50% of the pixels and the second texture at the other 50%. This is what it looks like when we do this:

To implement this, we store a mask in the alpha channel of each of the tiling textures. This mask uses the entire range of values and determines whether or not a texel will be displayed when the layer has a particular opacity. Specifically, a texel is displayed if the mask is less than the opacity value.  This new, splotchy blending technique is simply implemented like this:

float4 blend(float4 texture0, float4 texture1, float layerOpacity)
{
    if (texture1.a < layerOpacity)
    {
        return texture1;
    }
    else
    {
        return texture0;
    }
}

The result has a more natural look, and the splotching pattern can be tailored for each type of texture individually. While this grass texture uses a noisy pattern, textures with more structure — such as stones — could be set up so that the blending follows the structure.  With stones, the alpha mask can be created with high values in the cracks and low values on the flat surfaces.  When this is blended on top of a grass texture, the grass will first appear only in the cracks as the layer opacity is increased.

Because this is a binary decision, the result has hard edges and suffers from aliasing.  It looks best if we smooth out the edges between the textures by doing a small amount of regular alpha blending between the textures. Here’s one way to implement it, although other methods can certainly be used (for Titan Quest we used a simpler method to fit within the instruction limits of ps1.0):

float4 blend(float4 texture0, float4 texture1, float layerOpacity)
{
    const float blendRange = 0.1;

    if (texture1.a < layerOpacity - blendRange)
    {
        return texture1;
    }
    else if (texture1.a > layerOpacity + blendRange)
    {
        return texture0;
    }
    else
    {
        float f = (texture1.a - layerOpacity + blendRange) / (2 * blendRange);
        return lerp(texture1, texture0, f);
    }
}

Here’s what the final result looks like:

Although these examples show two textures, any number of layers can be blended this way, just like with the basic technique. This type of blending also isn’t limited to terrain textures, and interesting effects can be produced when the layer opacity is animated. For example, you can see the same type of blending at work in this prototype of dynamic infestation for Natural Selection 2: