An enhanced OpenGL renderer for Unreal Tournament


Last updated July 11, 2005


I added some enhancements to the OpenGL renderer for Unreal Tournament. A Win32/x86 binary and the source code are available on this page.

The settings page documents some of the options.

Latest news
I built an updated D3D8 renderer. It has the experimental ZRangeHack option, which is designed to avoid problems with flickering decals in the distance. As with the OpenGL renderer, it would have been better if this issue were dealt with in higher level engine code, since it could be done there more efficiently and with less risk of side effects. ZRangeHack had to be implemented a little differently in this renderer. I ran some tests, but there may still be some risk of it failing in certain cases. There are also various other, mostly minor, updates in this build. The file is utd3d8r11.zip.

7-3-2005
Version 3.1 is released. I added a function that uses SSE for buffering fogged actor vertices. There's also a fix for a minor detail that probably never breaks anything but is still good to take care of. Some very minor optimizations were added in a few places.

6-12-2005

Rune renderer
I built a new renderer for Rune. It's the 3.0 code built to work with Rune. It also contains minor additional optimizations to the Rune-specific fog code. The file is runeglr10.zip (works with Rune version 1.07).

Deus Ex renderer
I built a new renderer for Deus Ex. It's the 3.0 code built to work with Deus Ex. The file is dxglr14.zip (works with Deus Ex version 1112fm). OneXBlending is enabled by default in this renderer. If the brightness looks off, adjust GammaOffset, and also make sure OneXBlending=True is set in the [OpenGLDrv.OpenGLRenderDevice] section of your DeusEx.ini file. A new option called SceneNodeHack was added in the previous version. Enabling it may work around some minor problems, but it wasn't tested extensively, so there's a chance it might cause other problems.

Old files
I removed a few old files. This includes the old versions of the Deus Ex renderer, dxglr10.zip and dxglr11.zip, and the one renderer I built for Unreal Gold, u1glr10.zip. For Unreal Gold and other versions of Unreal, use the newer renderer from OldUnreal.

6-4-2005
Version 3.0 is released. It adds a few new major features.

Single pass fog
Support for single pass fog mode is now present in the OpenGL renderer. It requires at least 3 texture units and support for either the GL_ATI_texture_env_combine3 extension or the GL_NV_texture_env_combine4 extension. The bad news is that if you don't have an ATI or NVIDIA video card, you probably can't use single pass fog mode: there is no standard extension that provides access to the required texture blending mode (for fixed function), and other vendors appear to have decided not to support either of these extensions. The SinglePassFog option enables single pass fog mode if it is supported.
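The requirement check above can be sketched as a small helper. This is hypothetical code, not taken from the renderer: it assumes the extension string comes from glGetString(GL_EXTENSIONS) and the unit count from glGetIntegerv(GL_MAX_TEXTURE_UNITS_ARB, ...), and it matches whole extension names because a bare strstr() can be fooled when one name is a prefix of another.

```cpp
#include <cstring>

// Returns true if `name` appears as a complete, space-delimited token in an
// OpenGL extension string.
bool HasExtension(const char* extensions, const char* name) {
    if (extensions == nullptr || name == nullptr || *name == '\0') return false;
    const std::size_t len = std::strlen(name);
    const char* p = extensions;
    while ((p = std::strstr(p, name)) != nullptr) {
        const bool startOk = (p == extensions) || (p[-1] == ' ');
        const bool endOk = (p[len] == ' ') || (p[len] == '\0');
        if (startOk && endOk) return true;
        p += len;  // keep searching after this partial match
    }
    return false;
}

// Hypothetical check mirroring the stated requirements: at least 3 texture
// units plus one of the two vendor-specific combiner extensions.
bool SinglePassFogSupported(const char* extensions, int textureUnits) {
    if (textureUnits < 3) return false;
    return HasExtension(extensions, "GL_ATI_texture_env_combine3") ||
           HasExtension(extensions, "GL_NV_texture_env_combine4");
}
```

Which of the two extensions matched would then decide which of the two vendor code paths to set up.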

DetailMax
Support for an extra detail texture layer was added. This feature is already present in some OpenGL renderers from OldUnreal and some Direct3D renderers. The extra detail texture layer, which adds even more detail at short distances, can be enabled by setting DetailMax to 2. If it is set to 0 or 1, standard single layer detail texturing will be used when detail textures are enabled. I limited DetailMax to 2 because my testing showed that a third layer only starts to blend in when right up against a wall, at which point it makes almost no difference. Also, the second detail texture layer will not show up unless SinglePassDetail is disabled.
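The rules above for how many detail layers end up being drawn can be summarized in a few lines. This is a hypothetical sketch of the option logic only; DetailOptions and DetailLayerCount are made-up names, and the renderer's actual code is organized differently.

```cpp
#include <algorithm>

// Hypothetical bundle of the relevant settings (names follow the .ini options).
struct DetailOptions {
    bool detailTextures;    // detail textures enabled at all
    int detailMax;          // DetailMax
    bool singlePassDetail;  // SinglePassDetail
};

// Returns how many detail texture layers would be drawn: 0, 1 or 2.
int DetailLayerCount(const DetailOptions& opt) {
    if (!opt.detailTextures) return 0;
    // 0 or 1 means the standard single layer; anything above 2 is capped at 2.
    int layers = std::max(1, std::min(opt.detailMax, 2));
    // The second layer only appears when SinglePassDetail is disabled.
    if (opt.singlePassDetail) layers = 1;
    return layers;
}
```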

Masked S3TC texture support
Masked S3TC textures are now given an RGBA format that supports single bit masking, instead of an RGB format that does not, as in previous renderers. This is enabled by default and hopefully does not break any existing uses of textures. The new debug option NoMaskedS3TC can be used to disable this feature (though it really should be considered a fix for a bug that's always been there). If you are creating new masked S3TC textures, this option can also be used to see how they would look with an older renderer that does not support masked S3TC textures. Also note that this change most likely will not fix existing problems with black areas in the older original S3TC skybox textures, since these do not appear to have been encoded with proper masking information.
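For context, S3TC DXT1 has a 1-bit transparency mode built into each 4x4 block, and whether it is honored depends on the upload format. The sketch below decodes only that transparency bit and picks between the two DXT1 formats from EXT_texture_compression_s3tc. The helper names are hypothetical; the block layout and the two enum values come from the S3TC format and extension, but this is not the renderer's code.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned kCompressedRgbDxt1 = 0x83F0;   // GL_COMPRESSED_RGB_S3TC_DXT1_EXT
constexpr unsigned kCompressedRgbaDxt1 = 0x83F1;  // GL_COMPRESSED_RGBA_S3TC_DXT1_EXT

// Hypothetical format choice: masked textures get the RGBA variant unless the
// NoMaskedS3TC debug option is set.
unsigned Dxt1InternalFormat(bool masked, bool noMaskedS3TC) {
    return (masked && !noMaskedS3TC) ? kCompressedRgbaDxt1 : kCompressedRgbDxt1;
}

// Decodes which texels of one 8-byte DXT1 block are transparent.
// Layout: color0 (RGB565, little-endian), color1, then 32 bits of 2-bit
// indices with texel 0 in the lowest bits. When color0 <= color1 the block is
// in 3-color mode and index 3 means transparent black; otherwise all texels
// are opaque. With the RGB format, those texels simply show up as black.
std::array<bool, 16> Dxt1TransparentTexels(const std::uint8_t block[8]) {
    const unsigned c0 = block[0] | (block[1] << 8);
    const unsigned c1 = block[2] | (block[3] << 8);
    const std::uint32_t indices = static_cast<std::uint32_t>(block[4]) |
                                  (static_cast<std::uint32_t>(block[5]) << 8) |
                                  (static_cast<std::uint32_t>(block[6]) << 16) |
                                  (static_cast<std::uint32_t>(block[7]) << 24);
    std::array<bool, 16> transparent{};
    if (c0 > c1) return transparent;  // 4-color mode: fully opaque block
    for (int i = 0; i < 16; ++i) {
        transparent[i] = ((indices >> (2 * i)) & 0x3u) == 3u;
    }
    return transparent;
}
```

This also shows why old textures encoded without the 3-color mode cannot be fixed by the format change alone: their blocks never mark any texel as transparent.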

ZRangeHack
I activated the experimental ZRangeHack code, fixed a few things, and adjusted it a little for safety. There's still some risk, so there is no guarantee it works in all cases. This hack isn't enough to make a 16-bit z-buffer work. It can, however, fix the problem with decals flickering in the distance with 24-bit z-buffers. It also fixes the issue with the Redeemer covering up part of the HUD. It may have the same effect on other similar weapon models used in various mods, but I didn't test any of these cases, so I can't say for sure whether it helps or causes problems there. Set ZRangeHack to True to enable this experimental option.
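To see why 24 bits can still be marginal for distant decals, here is a rough estimate of depth buffer resolution. It assumes a standard perspective projection whose window depth is d = f/(f - n) * (1 - n/z). The near and far values used below are made up for illustration; they are not UT's actual clip planes.

```cpp
#include <cmath>

// Approximate size, in world units, of one depth-buffer step at eye distance z
// for near plane n, far plane f and a depth buffer with `bits` of precision.
// Since dd/dz = f*n / ((f - n) * z^2), one step of 1/(2^bits - 1) in d spans
// about z^2 * (f - n) / (f * n * (2^bits - 1)).
double DepthStepAt(double z, double n, double f, int bits) {
    const double steps = std::ldexp(1.0, bits) - 1.0;  // 2^bits - 1
    return (z * z * (f - n)) / (f * n * steps);
}
```

With n = 1 and f = 65536, DepthStepAt(8192.0, 1.0, 65536.0, 24) is about 4 units, while the 16-bit version is about 1024 units. The step grows with z² and shrinks by roughly 256x for every extra 8 bits, which is why distant, nearly coplanar decals are the first to flicker. A w-buffer stores depth linearly instead, so its step is roughly (f - n) / 2^bits everywhere.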

Notes and other minor changes
Single pass fog support was difficult to add to the OpenGL renderer. NVIDIA, ATI, and others involved with OpenGL really made a mess of things in this area. NVIDIA has one extension, GL_NV_texture_env_combine4, that provides access to great functionality that should always have been standard at this level, but the extension itself does a lot of things badly. The ATI extension, GL_ATI_texture_env_combine3, doesn't seem too bad, though there are still things I don't like about it and about how texture environment functionality in general ended up in OpenGL. Unfortunately, no other vendors appear to support the simpler ATI extension, which adds only minor functionality that their hardware most likely supports. So developers like me who want to use this functionality get to write and test two different code paths to support NVIDIA and ATI, and end up without access to certain features on other video cards unless they use fragment programs or Direct3D. I pulled out a previous optimization to implement single pass fog mode, though the overall effect should be very minor. This could be fixed, but it's even more extra work, including testing the updates on two different video cards.

Enabling ZRangeHack will slow things down a little, but the difference will probably still be very small. It would be better if the issues this hack addresses could be dealt with in higher level code, but it has been a while, and with not much else to add, I decided to try adding it, mainly to get rid of the flickering decals with 24-bit z-buffers, which it seems will be the maximum available on most video cards for quite a while.

I added some minor optimizations and one or two minor fixes. There's some new inline assembly for the various gouraud polygon buffering functions. It helps fogged triangles on all systems, the colored triangles path on systems without SSE support, and the buffer additional triangles path on all systems. I also ran some benchmarks on the new code that buffers 3 vertices on my P4 2.8C. For colored triangles, execution time is reduced by 7.5%, and for fogged triangles by 15%. The existing SSE colored function reduces execution time by an additional 25% compared to the new colored function with the improved inline assembly. However, the architecture of the renderer, which I can't change, isn't really designed for moving lots of polygons quickly, so due to high overhead, these changes will make very little difference overall.
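Reading the colored-path percentages above as successive reductions (an assumption; they could have been measured against different baselines), the SSE colored path compares with the original colored code roughly as follows:

```latex
\begin{aligned}
t_{\text{new}} &= (1 - 0.075)\, t_{\text{old}} = 0.925\, t_{\text{old}} \\
t_{\text{SSE}} &= (1 - 0.25)\, t_{\text{new}} = 0.75 \times 0.925\, t_{\text{old}} \approx 0.694\, t_{\text{old}}
\end{aligned}
```

That is about a 31% reduction for the 3-vertex colored function itself; as noted, the effect on overall frame time is far smaller.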

5-22-2005
I built an OpenGL renderer for Rune, which is based on the 2.9 code. Consider it a beta, since I didn't run extensive tests on all areas of it. There is a chance that some things might have been missed, since I didn't merge all modifications into this one. It's difficult to tell whether all of the modifications are necessary, though I'm fairly sure that some are old code that isn't used. The file is runeglr09.zip. If you notice anything that might be broken, check whether it behaves the same way in the original OpenGL renderer. If it does, it could still be a bug, but it may not be one I can fix.

5-15-2005
I ran a few tests on the D3D8 renderer built for D3D9, which only required minor modifications. With D3D9, V Sync control in windowed mode is available, and the renderer has access to a more rational z-bias implementation. It also tends to run slightly slower.

5-7-2005
I built a new version of the D3D8 renderer. This one is a bit faster thanks to interleaved vertex/color data, larger vertex buffers, and the added BufferTileQuads code. BufferTileQuads is enabled by default in this renderer, since not having it hurts D3D a lot more than OpenGL, and I don't have to be concerned about any backwards compatibility issues. I also added a few more features and some minor optimizations. The file is utd3d8r10.zip.

I didn't add paletted texture support to this renderer, so if you have a GeForce 1-4 series video card, make sure to use the OpenGL renderer and enable the settings that tell it to use paletted textures (these are disabled by default). Also, on other video cards with good enough OpenGL driver support, the OpenGL renderer may be better.
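As a rough illustration of what paletted texture support saves, here is what a renderer has to do when it cannot upload 8-bit paletted data directly: expand every texel to 32-bit RGBA, which takes four times the texture memory. This is a generic sketch rather than code from either renderer; PaletteEntry and ExpandPaletted are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// One palette entry (hypothetical layout: 8 bits each of red, green, blue, alpha).
struct PaletteEntry {
    std::uint8_t r, g, b, a;
};

// Expands 8-bit palette indices into RGBA texels using a 256-entry palette.
// The source uses 1 byte per texel; the result uses 4.
std::vector<PaletteEntry> ExpandPaletted(const std::vector<std::uint8_t>& indices,
                                         const PaletteEntry* palette) {
    std::vector<PaletteEntry> texels;
    texels.reserve(indices.size());
    for (std::uint8_t index : indices) {
        texels.push_back(palette[index]);  // index is 0..255, always in range
    }
    return texels;
}
```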

Performance differences between this renderer and the OpenGL renderer are fairly small on my system, though it does tend to be a little slower. It may be possible to improve this in some cases by interleaving the texture arrays, but this is a lot of extra work, so I may not try it. It doesn't help that D3D seems to have poor small batch performance in general due to intrinsic design/implementation characteristics. There's no avoiding this after a certain point, since UT has fairly low geometric complexity.

So D3D is far simpler than OpenGL in the feature set it supports on the API side, and yet it ends up with far worse small batch performance. In various places in the renderer, it's possible to get moderate performance with a minor amount of work using OpenGL, but with D3D, it takes extra work just to make things work at all, and the result still performs poorly. With either API, it's possible to get higher performance by adding more advanced buffering schemes such as actor triangle buffering, clipped actor triangle buffering, BufferTileQuads, etc. This D3D renderer will be far slower than the OpenGL renderer for line drawing, since it lacks advanced buffering in this area. This shouldn't be a problem with the editor, because I don't support this renderer in the editor anyway, since selection support is not implemented. Hopefully line drawing isn't used too heavily, or at all, outside the editor.

z-buffer issues
Like the OpenGL renderer, this D3D renderer may have problems with far away decals flickering due to z-buffer precision issues if only a 24-bit z-buffer is available. It doesn't support w-buffering either, though it looks like a lot of newer video cards don't support this feature anyway. It's probably possible to work around this problem in the renderer, though it may not be anything I'll add. Of course, if all these new GPUs/VPUs hadn't dropped support for 32-bit z-buffers, this wouldn't be a problem.

4-26-2005
I finally decided to learn Direct3D in case knowing it would be useful for a future job, so I tried building a version of the renderer that uses D3D. Porting the renderer only took a few extra days, and a lot of that time was spent dealing with things D3D makes difficult. D3D has gotten better in recent versions, but some areas are still problematic. I'm sure glad I never used D3D7 or earlier.

This renderer will most likely be slower than the OpenGL one on ATI, NVIDIA, or other graphics cards that have at least reasonably good OpenGL drivers. I also left out a few likely significant optimizations in the current build that may limit its performance. I guess I'll find out later whether fixing these can bring it up to the speed of the OpenGL renderer on my system. It uses D3D8, and since it uses certain advanced features, it will not work on various older video cards. Also, due to certain SDK complications, I think it ends up requiring at least DirectX 8.1, which I believe means it will not support Win95. You can download the initial beta renderer here. It comes with a .int file that also goes in the UnrealTournament\System directory. To use it, change video drivers in UT, select show all devices, and then select the new entry that says Experimental Direct3D8 Support. Remember, it probably isn't worth trying this beta build if the OpenGL renderer runs okay on your system.

I added single pass fog mode to this one, since it happened to be easy with D3D. On the OpenGL side, the required blend mode needs one extension for NVIDIA and another for ATI, and probably just isn't available on various other video cards, since a standard way to access it on the fixed function side seems to have been forgotten. It's too bad some of the other vendors didn't at least add support for the ATI version of the extension, since it doesn't really add much and their hardware probably supports all of it. I'll check the standard extensions again sometime, but I don't think the functionality required for single pass fog in UT is there.

I'm checking a large number of caps bits/values in this build, but a few checks are still missing. I'll probably add some of these later, but may leave out a few of the more complicated ones.

Windowed mode, windowed mode resizing, and surviving various mode switches should work, but some things in this area get awfully difficult to support and test when using D3D. Windowed mode screenshots hopefully work okay, including not crashing in various special cases when the window isn't fully within the screen. D3D still makes something as basic as grabbing a copy of what got rendered far too difficult in cases like this.

This initial build of this renderer supports a large number of features, but some are missing at this time:
- Selection support for UnrealEd isn't there. I may never add it, so don't use this renderer with the editor (other functionality should work, but it's not really usable there without this feature).
- S3TC support is there.
- 16-bit texture support is there, but I did the conversions using simple clipping instead of proper rounding.
- Texture aspect ratio restrictions are not checked yet, so if a card has specific requirements here, it may just crash when trying to load certain textures (though there's a good chance this isn't an issue on any card new enough to run this renderer).
- V Sync on/off requests only work in full screen mode. D3D8 doesn't allow something as basic as V Sync on or off to be requested in windowed mode. I believe this was fixed in D3D9.
- All the texture filtering modes and LOD bias should work.
- No paletted texture support, and I'm not sure I'll ever add it to this one.
- Lots of other features are supported, but a few others are not.
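The difference between clipping and rounding in the 16-bit conversions can be illustrated with a small sketch. It assumes "simple clipping" means dropping the low bits of each 8-bit channel; the function names are hypothetical, and the renderer's actual conversion code may differ.

```cpp
#include <cstdint>

// Reduces an 8-bit channel to 5 bits by dropping the low 3 bits.
std::uint8_t To5BitTruncate(std::uint8_t c) {
    return static_cast<std::uint8_t>(c >> 3);
}

// Reduces an 8-bit channel to the nearest of the 32 levels of a 5-bit channel.
std::uint8_t To5BitRound(std::uint8_t c) {
    return static_cast<std::uint8_t>((c * 31 + 127) / 255);
}

// Packs an RGB888 color into RGB565 using rounding for every channel.
std::uint16_t PackRgb565Rounded(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const unsigned r5 = (r * 31u + 127u) / 255u;
    const unsigned g6 = (g * 63u + 127u) / 255u;
    const unsigned b5 = (b * 31u + 127u) / 255u;
    return static_cast<std::uint16_t>((r5 << 11) | (g6 << 5) | b5);
}
```

For example, an 8-bit value of 7 truncates to 0 but rounds to 1, since 7 is closer to the 5-bit level 1 (about 8.2 in 8-bit terms) than to 0. Truncation therefore tends to darken colors slightly, while rounding spreads the error evenly.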

4-20-2005
Version 2.9 is released. It has a new option called BufferTileQuads, which should speed up text rendering. It was easy to add and seems to work well in various cases. It's disabled by default because it's new, but it should be fairly safe to enable. UseSSE was changed to a True/False option, and it now supports dynamic updates.
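The general idea behind buffering tile quads can be sketched as follows. This is a guess at the technique, not the renderer's code: TileQuadBatcher and its methods are hypothetical, textures are reduced to an integer ID, and the actual draw call is replaced by a counter.

```cpp
#include <cstddef>
#include <vector>

struct TileVertex { float x, y, u, v; };

// Collects screen-aligned tiles (e.g. font glyphs) that share a texture and
// draws them together. Assumes maxQuads >= 1.
class TileQuadBatcher {
public:
    explicit TileQuadBatcher(std::size_t maxQuads) : maxQuads_(maxQuads) {}

    void AddTile(int textureId, float x, float y, float w, float h,
                 float u0, float v0, float u1, float v1) {
        // A texture change or a full buffer forces the pending batch out first.
        if (textureId != currentTexture_ || quads_ == maxQuads_) Flush();
        currentTexture_ = textureId;
        verts_.push_back({x, y, u0, v0});
        verts_.push_back({x + w, y, u1, v0});
        verts_.push_back({x + w, y + h, u1, v1});
        verts_.push_back({x, y + h, u0, v1});
        ++quads_;
    }

    // In an OpenGL renderer this is where a single
    // glDrawArrays(GL_QUADS, 0, 4 * quads) would be issued; here we only count.
    void Flush() {
        if (quads_ == 0) return;
        ++drawCalls_;
        verts_.clear();
        quads_ = 0;
    }

    std::size_t DrawCalls() const { return drawCalls_; }

private:
    std::size_t maxQuads_;
    std::size_t quads_ = 0;
    std::size_t drawCalls_ = 0;
    int currentTexture_ = -1;
    std::vector<TileVertex> verts_;
};
```

Ten glyphs from one font texture then cost one draw call instead of ten, which is where the savings for text rendering would come from.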

A few other changes are also present:
- Improved alignment on main buffering arrays.
- Minor optimizations in multipass detail texture code.
- Removed a few instructions from the 3-vertex gouraud polygon SSE fast path.
- Minor optimizations in the buffer additional clipped vertices function.
- Fixed a bug in the buffer size remaining calculation for clipped gouraud polygons. Note that this could not actually cause a failure in version 2.8 because of other factors.
- Other minor optimizations.
- Other minor changes.
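The kind of bookkeeping behind a "buffer size remaining" check can be sketched as below. This is not the actual 2.9 fix (the notes above don't describe the bug in detail); it is a hypothetical buffer with a hypothetical interleaved vertex layout, showing that a clipped n-sided polygon drawn as a triangle fan needs 3 * (n - 2) vertices of space, not n.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interleaved vertex: position plus packed RGBA color.
struct BufferedVertex {
    float x, y, z;
    std::uint32_t color;
};

class TriangleBuffer {
public:
    // Assumes capacity is at least as large as the biggest polygon's fan.
    explicit TriangleBuffer(std::size_t capacity) : capacity_(capacity) {
        verts_.reserve(capacity);
    }

    // Buffers a convex polygon with n >= 3 vertices as a triangle fan.
    void AddPolygon(const BufferedVertex* v, std::size_t n) {
        if (n < 3) return;
        const std::size_t needed = 3 * (n - 2);  // vertices after fan expansion
        const std::size_t remaining =
            verts_.size() < capacity_ ? capacity_ - verts_.size() : 0;
        if (remaining < needed) Flush();
        for (std::size_t i = 1; i + 1 < n; ++i) {
            verts_.push_back(v[0]);
            verts_.push_back(v[i]);
            verts_.push_back(v[i + 1]);
        }
    }

    // Stand-in for glDrawArrays(GL_TRIANGLES, 0, count) on the buffered data.
    void Flush() {
        if (verts_.empty()) return;
        ++flushes_;
        verts_.clear();
    }

    std::size_t Buffered() const { return verts_.size(); }
    std::size_t Flushes() const { return flushes_; }

private:
    std::size_t capacity_;
    std::size_t flushes_ = 0;
    std::vector<BufferedVertex> verts_;
};
```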

3-16-2005
Various things.

Broken TruForm support in the renderer
TruForm support in the renderer is broken for a few reasons. Consider it an experimental and incomplete feature right now. There is no easy fix for the problem of player models not looking good with it enabled. Two other outstanding issues could be corrected by fixes to other parts of the game engine code.

Higher level rendering code clips actor polygons to the edge of the screen. This destroys information contained in the normals that is needed to implement TruForm correctly, so polygons that cross the edge of the screen end up with minor to potentially severe graphical corruption. This is trivial to fix, but the code that does it isn't in the part of the renderer that was open sourced, so it's nothing I can fix right now. Also, with many video card/driver combinations, letting the driver or hardware deal with polygons that are partially clipped by the edge of the screen should speed things up a little.

Once actor triangles make it to DrawGouraudPolygon in the renderer, there's no good way to tell whether they're from a player actor that should have TruForm applied or from some other actor that should not. It would probably be fairly easy for higher level rendering code to use a new PolyFlags bit to tag triangles from objects that should have TruForm applied if enabled. This could reliably eliminate problems with weapons and other objects that look bad with TruForm applied. Note that the TruFormMinVertices setting attempts to solve these problems, but cannot do so reliably, and while it can fix some cases, it will also break others.
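A hypothetical sketch of the PolyFlags approach described above, next to a vertex-count heuristic. PF_HypotheticalTruForm is a made-up bit, not a real UT PolyFlags value, and the threshold comparison is only an assumption about how a setting like TruFormMinVertices might be applied.

```cpp
#include <cstdint>

// Made-up flag bit for illustration only; not an actual UT PolyFlags value.
constexpr std::uint32_t PF_HypotheticalTruForm = 0x80000000u;

// Decides whether TruForm (N-patch) tessellation should be enabled.
// With a tag set by higher level code, the decision is exact. Without it, the
// renderer can only guess from a vertex count threshold, which can both miss
// player models and catch weapons or other objects that look bad.
bool ShouldApplyTruForm(std::uint32_t polyFlags, bool haveTag,
                        int meshVertexCount, int truFormMinVertices) {
    if (haveTag) return (polyFlags & PF_HypotheticalTruForm) != 0;
    return meshVertexCount >= truFormMinVertices;
}
```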

Linux builds
I never seem to hear any good news about attempts to build the updated renderer on Linux. Unfortunately, I can only provide limited help in this area. I do try to keep the code cross-platform friendly, but it's unlikely that I will be able to attempt to build it on other platforms anytime in the near future. I know the current code won't compile as is with gcc, but I expect that only minor syntax fixups and the removal of some non-essential features will be needed to make the updates I added both compile cleanly and work correctly.

The first major step in attempting to build the updated renderer on Linux is to make sure you can build the original renderer code, before I added any updates to it. This ensures that there aren't any major existing issues before you move on to the new code. If any problems come up at this stage, there's not much I can help with, because I don't have a local build environment for this platform, didn't write this code, etc. If there's a problem such as the ut432 header files not quite matching the current Linux version of UT, that falls into the "I didn't break it and I can't fix it" category.

You'll need to use a compatible version of gcc. Unfortunately, ABI changes mean you will almost certainly not be able to use a newer version of gcc (unless the rest of the game was compiled with it, of course).

You can easily ifdef out the SSE code I added, since whatever compiler you use probably won't support it. This is not a major loss, since the SSE code I added only provides very minor speedups.
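One way to keep SSE code optional is to guard it with a preprocessor check and keep a plain C++ fallback that produces the same results. The renderer's own SSE code is x86 inline assembly; this sketch uses compiler intrinsics from xmmintrin.h instead, and UTGLR_SKETCH_USE_SSE and ScaleVec4s are hypothetical names.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#define UTGLR_SKETCH_USE_SSE 1
#include <xmmintrin.h>
#endif

// Multiplies `count` packed 4-float vectors (for example RGBA colors) by
// `scale` and writes them to dst. Both paths compute the same products.
void ScaleVec4s(float* dst, const float* src, std::size_t count, float scale) {
#ifdef UTGLR_SKETCH_USE_SSE
    const __m128 s = _mm_set1_ps(scale);
    for (std::size_t i = 0; i < count; ++i) {
        // Unaligned load/store, so the arrays need no 16-byte alignment.
        _mm_storeu_ps(dst + 4 * i, _mm_mul_ps(_mm_loadu_ps(src + 4 * i), s));
    }
#else
    // Portable fallback used when SSE is unavailable or compiled out.
    for (std::size_t i = 0; i < 4 * count; ++i) {
        dst[i] = src[i] * scale;
    }
#endif
}
```

Removing the SSE path is then just a matter of not defining the macro, and callers don't change.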

There's a chance there will be problems with the sstream code I used for the debug stream when using an older gcc and/or older libraries. Although removing it requires numerous changes, this code is non-essential, and the changes should be simple.

There are good reasons to try to get an updated renderer working on Linux if you run UT natively there. Besides being very obsolete at this point, the original OpenGL renderer code contains a couple of more major design/implementation issues that would be good to have fixed.

Other news (technical = details)
I may not release any updates to the renderer beyond utglr 2.8 anytime soon. It's getting difficult to find places that can be improved much at all, can be improved easily, or can be improved significantly without a lot of extra work.

For a long time now, a lot of the updated renderer code has been much more optimized than similar code in the rest of the game engine. This means diminishing returns kick in sooner when applying various general optimizations. I've still found a few simple things, and maybe one or two more complicated things, that I might experiment with if I have the time.

9-15-2004
I may have fixed the problem with UnrealEd not restoring gamma on exit, but I still need to review the changes to make sure they have little risk of causing new problems. It doesn't help that ATI's drivers still do odd things with gamma correction; they've been breaking things in this area to various degrees for close to a year. There's something odd about their installer for 4.9 too, as I had to temporarily rename my bin directory to keep it from failing.

8-15-2004
I don't think there's any easy way to fix the 16-bit z-buffer problems without using a w-buffer. I can sort of half fix it, but that's not really good enough to be of much use, so I'll either leave the new code ifdef'ed out for a while or just delete it. W-buffers are supported through D3D on some cards, but I've never seen them supported through OpenGL.

I found the cause of the excess code padding in all of my renderer builds: incremental linking was turned on for release builds in the project file. Turning it off cut the uncompressed DLL size by over 40 KB. Although it won't significantly speed things up, it'll still save a little memory and perhaps increase code cache and TLB hit rates by a very small amount.

Based on a few other files, it looks like all of UT 436 and 451 on Win32 might also have been built with incremental linking enabled. Although unfortunate, it's another case where the difference might be too small to show up in typical benchmarks by itself.

5-3-2004
I built a new version of SetGamma that fixes various minor problems. It's a simple command line utility that adjusts the hardware gamma ramp on the primary display adapter. A shortcut that passes it the -reset option can be used to reset the hardware gamma ramp to 1.0 after a crash that prevents it from being restored.
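For reference, a hardware gamma ramp on Win32 is a table of 3 x 256 16-bit entries, the shape that SetDeviceGammaRamp takes. The sketch below builds one channel of such a ramp. BuildGammaRamp is a hypothetical name, and the curve (out = in^(1/gamma), so gamma above 1 brightens) is an assumed common convention, not necessarily the one SetGamma uses.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Builds one 256-entry channel of a hardware gamma ramp with 16-bit entries.
// Assumes gamma > 0. A gamma of 1.0 gives the identity ramp (i * 257), which
// is what resetting the ramp to 1.0 restores.
std::array<std::uint16_t, 256> BuildGammaRamp(double gamma) {
    std::array<std::uint16_t, 256> ramp{};
    for (int i = 0; i < 256; ++i) {
        const double in = i / 255.0;
        const double out = std::pow(in, 1.0 / gamma);
        ramp[i] = static_cast<std::uint16_t>(std::lround(out * 65535.0));
    }
    return ramp;
}
```

The same channel would typically be copied into the red, green and blue rows before handing the table to the display driver.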

Some of the old news gets moved to the News Archive page.

Notes
- Additional options are documented in the [New options] section.

Things that could be = added
- Patching the ComputeOutcode function with some SSE code.

OpenGLDrv.dll for Win32/x86: utglr31.zip (48 KB)

OpenGLDrv.dll for Win32/x86: utglr30.zip (48 KB)
OpenGLDrv.dll for Win32/x86: utglr29.zip (48 KB)
OpenGLDrv.dll for Win32/x86: utglr28.zip (46 KB)
OpenGLDrv.dll for Win32/x86: utglr27.zip (46 KB)
OpenGLDrv.dll for Win32/x86: utglr26.zip (48 KB)
OpenGLDrv.dll for Win32/x86: utglr25.zip (50 KB)

Installation instructions
Go to your UnrealTournament\System directory. Make a backup of your old OpenGLDrv.dll in case the new one doesn't work. Then put the new OpenGLDrv.dll in your UnrealTournament\System directory. It contains a number of optimizations that should improve performance over the base UT 4.36 OpenGL renderer, as well as a number of new options, which are described further down on this page.

Source code: utglr31src.zip (110 KB)

Source code: utglr30src.zip (110 KB)
Source code: utglr29src.zip (95 KB)
Source code: utglr28src.zip (94 KB)
Source code: utglr27src.zip (93 KB)
Source code: utglr26src.zip (92 KB)
Source code: utglr25src.zip (88 KB)

Notes about the source code
The source code has been modified extensively. Although I did not try to break Linux support completely, I did add some Windows-specific code. Feel free to email me at smpdev@mindspring.com if you need any help getting it to build on Linux. Make sure to add the NO_UNICODE_OS_SUPPORT define when building it on Win32.

The source code package only contains .cpp and .h files from the OpenGL\Src subdirectory, which is where my changes are. You will need to get the 432 headers from Epic to be able to build it. You can download these from the Unreal Technology Downloads page.

For version 1.2 and up, I had to remove the operator new and delete overrides to make the new C++ debug functions work. I included a copy of the modified UnFile.h with the proper ifdefs. I just have it pass things through to malloc and free instead. I believe the problem may be that the overrides don't handle 0-byte allocations the way malloc and new do.
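A minimal sketch of pass-through global operator new/delete replacements that also handle 0-byte requests. This is not the contents of the modified UnFile.h; it only illustrates the rule that operator new must return a unique non-null pointer even for size 0, while malloc(0) is allowed to return NULL, so the size is bumped to 1 first.

```cpp
#include <cstdlib>
#include <new>

// Pass-through replacements: allocate with malloc, release with free.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // malloc(0) may return NULL; operator new may not
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

void* operator new[](std::size_t size) {
    return ::operator new(size);
}

void operator delete(void* p) noexcept {
    std::free(p);  // free(NULL) is a no-op, matching delete on a null pointer
}

void operator delete[](void* p) noexcept {
    std::free(p);
}
```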

Feedback
Email: smpdev@mindspring.com

New options
This enhanced UT OpenGL renderer supports some new options. They go in the [OpenGLDrv.OpenGLRenderDevice] section of your UnrealTournament.ini file. Most options are documented on the settings page.

Credits
I'd like to thank Epic Games for releasing the source code to the UT OpenGL renderer, which made adding these updates to it possible.

NitroGL for the original TruForm renderer modification. The initial experimental TruForm code is based on these modifications.

Leonhard Gruenschloss for help with implementing and testing additional TruForm-related updates and new Deus Ex specific code.


Copyright 2002-2005 Chris Dohnal