===============================================================================
   About UT, multicore processors, power management and timing problems

   Version 2 - 05/07/2010
   by AnthraX - stidzjene a.t. hotmail d.o.t. com
===============================================================================

In this small article I will try to explain what is wrong with the method UT
uses to do its time measurements, why it causes problems on modern systems
with multiple cores or with power management features such as SpeedStep and
Cool'n'Quiet, and why it makes speedhacking so easy. I sadly don't have
access to the UT source code, so all of the information in this article is
based on reverse engineering and personal experiments. If there are major
errors in this article, I would appreciate it if you contacted me about them.

1) Introduction
---------------

Before I get straight to the point, it is important that you know what the
"UT Main Loop" is and what it does. A main loop is basically a sequence of
actions that repeats over and over again. UT's main loop runs from the
moment the game is loaded until the moment you close the game again. Every
iteration of the main loop is called a tick and the tickrate is the number
of ticks your client performs per second. In every tick, the following
actions are performed (roughly):

- 1:  The tick starts. UT stores the time of this event
- 2:  All client input is processed
- 3:  All clientside actors are ticked (updated)
- 4:  All network links are updated
- 5:  The audio device is updated
- 6:  The HUD is prerendered
- 7:  A frame is rendered. This will display the game world, but without
      the HUD
- 8:  The HUD is postrendered. During the postrendering of the HUD all
      UScript rendering is done (text, scoreboards, windows, console, ...)
- 9:  UT calculates the maximum tickrate your client should run at
- 10: The tick ends. UT stores the time of this event
- 11: The main loop is temporarily suspended. The engine now waits until
      it is time to perform the next tick

As you can see, nothing out of the ordinary is going on here. There are a
few things worth noting though:

* Tickrate = Framerate!
  Exactly ONE frame is rendered to the screen during every tick. Your
  tickrate (the number of ticks per second) is therefore equal to your
  framerate.

* Maximum Tickrate:
  In step 9 UT calculates the maximum client tickrate. This tickrate is
  governed by your current netspeed and is calculated as follows:

  Maximum Tickrate = Netspeed / 64

  The reasoning behind this is that the engine attempts to send a position
  update during every tick. A position update is sent in a 64 byte packet.
  The number of position updates you can send is therefore limited by your
  netspeed, so your tickrate should be limited by it as well.

* Time measurements:
  UT has to measure the time at several points during every tick. The game
  needs to know exactly how long it took to perform the tick. Without an
  exact measurement of this time interval, the game cannot know how long it
  should pause for during step 11. If the time measurements are not exact,
  the pauses won't be exact either. This means that the main loop will often
  pause for too long or not long enough, which in turn can make your
  framerate terribly unstable.

* Updating objects:
  UT tells every object it updates during a tick (this includes actors,
  network links, the audio device, the render device, ...) how much time has
  passed since the last tick ended. These objects will act accordingly.

* Player movement:
  One example of an object that is updated during every tick is your own
  playerpawn object. Every time you move your mouse or press a movement key
  on your keyboard, the engine will update your position and your camera
  angle (your camera angle is also called the viewrotation). To make these
  calculations accurate, the engine needs to know how long you've pressed
  that key and how long you've moved your mouse for. The longer you have
  pressed a key, the further you will have moved! Therefore, if the timing
  measurements your game performs are not accurate, your movement won't be
  accurate either. If your game, for example, always overestimates the time
  intervals (e.g. if it keeps telling your playerpawn that 50 milliseconds
  have passed since the last tick while in reality only 45 milliseconds have
  passed), you will achieve a speedhack effect: you will consistently move
  faster than the other players on the server.
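Before moving on, here is a minimal sketch of the loop described above, to
make its structure concrete. This is NOT the actual UT code: the function
name UpdateWorld is made up and stubbed out, and the timing is done with
timeGetTime for brevity (UT itself uses RDTSC, as section 2 explains):

  #include <windows.h>
  #pragma comment(lib, "winmm.lib")   /* timeGetTime */

  /* Hypothetical stand-in for the engine work done in steps 2-8 */
  static void UpdateWorld(double DeltaSeconds) { (void)DeltaSeconds; }

  int main(void)
  {
      int    Netspeed = 20000;                /* example netspeed value   */
      double OldTime  = timeGetTime() / 1000.0;

      for (;;)                                /* the "main loop"          */
      {
          /* Step 1: the tick starts, store the time
             (32 bit wraparound of timeGetTime ignored for brevity) */
          double NewTime   = timeGetTime() / 1000.0;
          double DeltaTime = NewTime - OldTime;
          OldTime = NewTime;

          /* Steps 2-8: input, actors, network, audio, rendering */
          UpdateWorld(DeltaTime);

          /* Step 9: maximum tickrate = netspeed / 64 */
          double MaxTickRate = Netspeed / 64.0;

          /* Steps 10-11: store the time, then wait until the next
             tick is due */
          double TickDuration = timeGetTime() / 1000.0 - NewTime;
          double WaitTime     = 1.0 / MaxTickRate - TickDuration;
          if (WaitTime > 0)
              Sleep((DWORD)(WaitTime * 1000.0));
      }
  }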
Experienced gamers or UScript coders will undoubtedly have noticed that the
above information about player movement isn't entirely true. Indeed, if your
client could just tell the server where you have moved without any control,
you would be able to move at any speed you like. Moving 10 times faster than
all the other players would be possible if the above information were
entirely accurate, but obviously that is not the case. Your client DOES
indeed tell the server where you have moved and how much your camera angle
has changed, but ultimately the server has the authority to change these
values if they are not acceptable. The server will always compare your new
position with your old one and it will reject the new position if it is
technically impossible to have moved from the old to the new position in
the time that has passed since you sent the server your old position.

Because this may all sound a bit confusing, consider the following kick-ass
graphical example. Suppose that your client tells the server where your
playerpawn is once a second and that playerpawns can move at a maximum speed
of 5 meters per second. In that case, the following situation would be
impossible:

     Position 1:                                     Position 2:
     1 sec                                           2 sec

        o                                               o
       /|\                                             /|\
       / \                                             / \
  +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
  0     1     2     3     4     5     6     7     8     9     10    11    12

The server will detect that this situation is impossible (8 meters traveled
in 1 second) and it will move you back to your last known position
immediately. The following situation on the other hand IS allowed:

     Position 1:             Position 2:
     1 sec                   2 sec

        o                       o
       /|\                     /|\
       / \                     / \
  +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
  0     1     2     3     4     5     6     7     8     9     10    11    12

In this situation the server will not interfere but instead accept your new
position and replicate it to the other players.

This behavior also explains why players with connection problems "warp
around" on the server. These players will continue to send updates about
their current position once a second, but if packets are lost, the following
can happen:

     Position 1:       Position 2:       Position 3:       Position 4:
     1 sec             2 sec             3 sec             4 sec
     PKT SENT          PKT LOST          PKT LOST          PKT SENT

        o                 o                 o                 o
       /|\               /|\               /|\               /|\
       / \               / \               / \               / \
  +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
  0     1     2     3     4     5     6     7     8     9     10    11    12

As you can see, the player starts at position 1 and this position is also
successfully sent to and accepted by the server. The next two position
updates are not received by the server because the packets got lost. The
fourth position IS received by the server, but the server will see that the
player has moved 9 meters between 2 position updates and, because this
situation is impossible, it will move the player back to position 1.
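To make the server's check concrete, here is a small, purely illustrative C
sketch of this kind of plausibility test. The names, the vector type and the
5 m/s limit are all made up for the example; this is not the actual UT
server code (which also allows a small error margin, see the first note
below):

  #include <math.h>
  #include <stdio.h>

  typedef struct { double X, Y, Z; } FVector;

  static double Distance(FVector A, FVector B)
  {
      double DX = B.X - A.X, DY = B.Y - A.Y, DZ = B.Z - A.Z;
      return sqrt(DX * DX + DY * DY + DZ * DZ);
  }

  /* Returns 1 if the move from OldPos to NewPos is possible within
     DeltaSeconds at MaxSpeed (plus a 3% error margin), 0 if the
     server should reject it and move the player back */
  static int IsMovePlausible(FVector OldPos, FVector NewPos,
                             double DeltaSeconds, double MaxSpeed)
  {
      double MaxDistance = MaxSpeed * DeltaSeconds * 1.03;
      return Distance(OldPos, NewPos) <= MaxDistance;
  }

  int main(void)
  {
      FVector Old = {  1.0, 0.0, 0.0 };
      FVector New = { 10.0, 0.0, 0.0 };

      /* 9 meters in 1 second at 5 m/s max: rejected, just like the
         packet loss example above */
      printf("%s\n", IsMovePlausible(Old, New, 1.0, 5.0)
                         ? "accepted" : "rejected");
      return 0;
  }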
Two important notes before I continue to the next section:

- To compensate for minor connection issues, the server allows a small error
  margin on the distance you have moved between two position updates. This
  error margin is usually 3%. If, for example, the server allows you to move
  5 meters per position update but you consistently tell the server that you
  have moved 5.15 meters (= 5 + 3%), the server will NOT reject your new
  positions. Players with a stable connection can abuse this mechanism to
  achieve a minor speedhack effect.

- MANY objects are updated in a similar way. If time measurements are very
  inaccurate, the game will therefore seem very buggy.

2) Timing issues
----------------

Now that you have a good overview of what the main loop does and how time
measurements affect the way the game works, I can move on to HOW UT measures
the time.

2.1: The issues
- - - - - - - -

* As you may have read before, UT uses the RDTSC instruction to measure
  time. This is an assembler instruction that reads the "Cycle Counter"
  value from your CPU. When you power up your PC, this cycle counter starts
  at 0 and during every clocktick it is incremented by 1. If you had a CPU
  with a clock speed of 1Hz, your CPU clock would tick once a second, so
  your cycle counter would increase by 1 every second. If you have a 2GHz
  CPU, your CPU clock will tick 2 000 000 000 times per second. This means
  that your cycle counter will also increase by 2 000 000 000 per second.
  In general this means that you can read the CPU cycle counter, divide it
  by your CPU clock speed (in Hz), and you will find the number of seconds
  that have passed since you powered on your computer. Because this counter
  is updated during every clocktick, you can also measure much smaller
  intervals.

  On the surface this seems like an excellent way to measure the time and
  indeed, for a long time it WAS a good way to do so. Every game released in
  the UT99 era used RDTSC for all time measurements. These days, however,
  RDTSC is almost completely deprecated because of the great number of
  issues that affect this technique. Here's a short overview of the problems
  with RDTSC:

  - The CPU clock speed is not constant. The clock speed generally
    fluctuates around the speed it's supposed to run at. This means that
    RDTSC is still reasonably accurate for measuring the time your PC has
    been powered up for, but for measuring small intervals (such as those
    needed by games) it is not a good choice anymore.

  - Many CPUs have some sort of dynamic frequency scaling. Dynamic
    overclocking or power management features such as Cool'n'Quiet or
    SpeedStep can drastically change the speed your CPU runs at. If your PC
    is using any of these techniques, RDTSC is inaccurate overall, even for
    measuring greater time intervals.

  - Every CPU core has its own cycle counter and, to make things worse,
    there are several situations in which the cycle counter on one CPU core
    can have a different value than the cycle counter on another core. This
    means that if UT measures the start of an interval on one CPU core and
    the end of the interval on another core, magic stuff can happen. In such
    cases the time can make big forward or backward leaps. This will cause
    MAJOR problems. Many people will have experienced that if they run UT on
    a multicore system without any kind of multicore fix, they will warp
    around really badly and in some cases their sound will even break up or
    their input will be completely broken. Using RDTSC is therefore NOT
    multicore safe. If you wish to run an RDTSC based game, you will almost
    always have to restrict the game to run on one core.
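For reference, here is what RDTSC based timing boils down to in code. This
is a bare-bones sketch, not UT's implementation: it uses MSVC's __rdtsc
intrinsic and simply assumes a fixed 2GHz clock speed, which is exactly the
assumption that dynamic frequency scaling breaks:

  #include <stdio.h>
  #include <intrin.h>          /* __rdtsc (MSVC) */

  /* Assumed, fixed clock speed in Hz. A real implementation would have
     to measure this - and keep remeasuring it, see section 2.2 */
  #define CPU_HZ 2000000000.0

  /* Seconds since power-up: cycle counter divided by clock speed */
  static double SecondsSincePowerUp(void)
  {
      return (double)__rdtsc() / CPU_HZ;
  }

  int main(void)
  {
      double Start = SecondsSincePowerUp();
      /* ... do some work here ... */
      double Elapsed = SecondsSincePowerUp() - Start;
      printf("interval: %f seconds\n", Elapsed);
      return 0;
  }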
* The use of RDTSC for timing is not the only issue in UT. A second major
  issue is that all timing info (time intervals as well as absolute time) is
  stored in FLOAT format. The FLOAT format (also known as single precision
  float) stores floating point numbers using 4 bytes (or 32 bits).
  Unfortunately UT uses its timing functions primarily to measure small time
  intervals. At high framerates (anything above 80 frames per second,
  really) the single float format becomes a real problem, as it does not
  allow floating point numbers to be stored with great enough precision. At
  really high framerates (200+) the single float format is not just a
  problem anymore, it is in fact a game breaker because it causes floating
  point overflows. Some players may have noticed that if they play at really
  high framerates (200-300+) in singleplayer mode, their game starts to
  speed up and down for no apparent reason. This is caused by the fact that
  everything is stored as single floats.

  Instead, UT should use double floats to store timing info. This allows the
  floating point numbers to use 64 bits instead of 32 and will of course
  greatly increase the precision with which these numbers can be stored. I'm
  not sure why single floats are used. Ever since the introduction of the
  486DX CPU in the early 90s, every CPU has been equipped with a floating
  point unit, which internally converts every floating point number to a
  special 80 bit format. Using double floats is therefore just as fast as
  using single floats and, because they allow much greater precision, they
  are the only logical choice for storing timing info.

* The last issue with the UT main loop is fortunately the least severe. As
  you may remember, step 11 of the main loop is a pause. The game waits at
  the end of every tick until it is time to start the next tick. This way
  the game can ensure a stable tickrate (= framerate!), which is of course a
  very desirable feature for every game. Unfortunately, the function UT uses
  to "pause" is also quite imprecise. Consider the situation in which you
  want your game to run at 100 frames per second. This means that a frame
  should be rendered every 10 milliseconds (1000 milliseconds per second /
  100 frames per second = 10 milliseconds per frame). Suppose then that
  steps 1 through 10 of the main loop consistently take 4 milliseconds. This
  means that at the end of every tick you would want to pause for 6
  milliseconds before moving on to the next tick. To pause for 6
  milliseconds, the game uses the following function call:

  Sleep(6);

  The Sleep function is implemented by the operating system (in most cases
  Windows), but it is not always accurate. I will spare you the details of
  how kernel interrupts and interrupt timers work. It suffices to know that
  calling Sleep(6) will not always make your application pause for 6ms.
  Usually your game will pause for 7ms, sometimes 8ms (provided that you
  have requested the interrupt timer to run at its greatest precision). This
  means that the game will often pause for too long, causing a minor
  decrease in framerate.

2.2: The solutions
- - - - - - - - - -

The previous subsection presented a small overview of the 3 issues with the
UT main loop. Now it's time to look at some of the possible solutions:

* The single float problem.
  The fix consists of using the double float format instead. Nothing more,
  nothing less (see the demonstration below).
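To see why this matters, here is a small self-contained demonstration. The
numbers are illustrative, not taken from UT: it stores the absolute time
since power-up in both a single and a double float, then advances both by
200 frames of 5 milliseconds each (one second of play at 200fps):

  #include <stdio.h>

  int main(void)
  {
      /* Absolute time after one day of uptime. At this magnitude a
         single float can only change in steps of 1/128th of a second
         (~7.8ms) - more than an entire frame at 200fps! */
      float  TimeSingle = 86400.0f;   /* 1 day, in seconds */
      double TimeDouble = 86400.0;

      /* Advance both clocks by 200 frames of 5 milliseconds each */
      for (int i = 0; i < 200; i++)
      {
          TimeSingle += 0.005f;
          TimeDouble += 0.005;
      }

      /* The double clock advances by a full second; every 5ms step
         added to the float clock is rounded UP to the nearest
         representable value, so it runs noticeably fast */
      printf("single: %f\n", TimeSingle);   /* 86401.562500 */
      printf("double: %f\n", TimeDouble);   /* 86401.000000 */
      return 0;
  }

In this scenario the single float clock runs more than 50% fast, which is
exactly the kind of speed-up effect described above; at other magnitudes
the rounding goes the other way and the clock slows down instead.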
* The sleeping problem.
  As you may remember, the Sleep function often pauses for a bit longer than
  requested. Although this is not a major problem, it can still be fixed
  easily. Here's a piece of code that will do the trick (if you have no
  coding experience, you may want to skip this part):

  #include <windows.h>                  // Sleep, DWORD, DOUBLE
  #pragma comment(lib, "winmm.lib")     // timeBeginPeriod, timeGetTime

  // Call this when UT starts
  void InitTimer()
  {
      // This will increase the precision of the kernel interrupt
      // timer. Although this will slightly increase resource usage,
      // it will also increase the precision of sleep calls and
      // this will in turn increase the stability of the framerate
      timeBeginPeriod(1);
  }

  // Call this when UT shuts down
  void UninitTimer()
  {
      // Restore the kernel timer config
      timeEndPeriod(1);
  }

  // Get the elapsed time in seconds (counted from the first call)
  DOUBLE appGetTime()
  {
      static DWORD OldTime = timeGetTime();
      static DOUBLE TimeInSec = 0.0;
      DWORD NewTime = timeGetTime();

      // Fix for buggy timer hardware
      // Prevents backward time leaps
      if (NewTime > OldTime)
      {
          TimeInSec += (NewTime - OldTime) / 1000.0;
          OldTime = NewTime;
      }
      // Most likely a wraparound of the 32 bit millisecond counter
      else if (OldTime > 0xF0000000 && NewTime < 0x10000000)
          OldTime = NewTime;

      return TimeInSec;
  }

  // Sleep until appGetTime() reaches endSleep (in seconds)
  void ImprovedSleep(DOUBLE endSleep)
  {
      DOUBLE timeLeft = endSleep - appGetTime();

      // Sleep until roughly 2ms before the deadline...
      if (timeLeft > 0.002)
          Sleep((DWORD)((timeLeft - 0.002) * 1000.0));

      // ...then busy-wait (yielding) for the remainder
      while ((timeLeft = endSleep - appGetTime()) > 0)
          Sleep(0);
  }

* The timing problem.
  There are several possible solutions and I've tested them all extensively.
  Most of them consist of giving up RDTSC for a more accurate timing method.

  - 1: Using GetTickCount instead of RDTSC:
       GetTickCount is a very lightweight function that retrieves the number
       of milliseconds since the pc was started. This function reads the
       kernel tick counter. Unfortunately this tick counter is often not
       accurate. In some cases it can be several milliseconds off. Even
       though it is the most resource friendly method to measure time, it is
       not advisable for short time intervals and therefore not the
       preferred method for games.

       Advantages:
       + Only timing method that is more resource friendly than RDTSC
       + Multicore safe. There's only one tick counter
       Disadvantages:
       - Poor accuracy

  - 2: Using timeGetTime instead of RDTSC:
       This is the function used by virtually every media player. It also
       returns the number of milliseconds since the pc was started, but it
       retrieves the info from the interrupt hardware and is therefore
       incredibly accurate.

       Advantages:
       + Only slightly less resource friendly than RDTSC
       + Very accurate
       + Multicore safe

  - 3: Using QueryPerformanceCounter instead of RDTSC:
       This function is similar to RDTSC. It also reads a counter that is
       updated during every clocktick. The function is implemented in the
       windows kernel and has certain advantages over RDTSC, the most
       important one being that, if used right, the kernel guarantees that
       the results of this function are monotonic (see the sketch after
       this list).

       Advantages:
       + Most accurate method if used right
       Disadvantages:
       - Not always multicore safe
       - Least resource friendly
       - Bugged on some systems

  - 4: Sticking to RDTSC:
       It is still possible to get UT to work properly with RDTSC based
       timing. The following fixes are needed to do so though:
       + The game needs to be restricted to one core. RDTSC is NOT
         multicore safe
       + The cpu clock speed needs to be remeasured constantly (to
         compensate for dynamic frequency scaling due to overclocking or
         power management features)

       Remeasuring the clock speed is not trivial though. The best way to
       do it is to use a "calibration thread" that remeasures the clock
       speed every X seconds and updates the game's timing variables
       accordingly. Using a calibration thread brings some new problems of
       its own though: it increases the resource usage of your game and it
       requires some synchronization with the main UT thread to update the
       timing values safely.
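As an illustration of option 3, here is a minimal sketch of
QueryPerformanceCounter based timing. The function names mirror the
appGetTime example above but are otherwise made up; pinning the thread to
one core with SetThreadAffinityMask is one common way to work around the
"not always multicore safe" caveat:

  #include <windows.h>

  static LARGE_INTEGER Frequency;   // counts per second, fixed at boot
  static LARGE_INTEGER StartCount;

  // Call this once, from the thread that will do the timing
  void InitQPCTimer()
  {
      // Restricting the timing thread to one core avoids the
      // cross-core counter mismatches mentioned earlier
      SetThreadAffinityMask(GetCurrentThread(), 1);

      QueryPerformanceFrequency(&Frequency);
      QueryPerformanceCounter(&StartCount);
  }

  // Seconds since InitQPCTimer was called
  DOUBLE appGetTimeQPC()
  {
      LARGE_INTEGER Now;
      QueryPerformanceCounter(&Now);
      return (DOUBLE)(Now.QuadPart - StartCount.QuadPart) /
             (DOUBLE)Frequency.QuadPart;
  }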
* As I said, all of these options have been tested extensively (in the ACE
  private beta tests). My findings are that for high end systems the best
  performance and the fewest problems are achieved by using
  QueryPerformanceCounter to measure the time and by using a variant of the
  ImprovedSleep method to do the pausing. For mid range and low end systems
  the best performance was achieved with a combination of timeGetTime based
  timing and normal sleeps. For any system it was possible to completely
  eliminate speedhacking abuse and to greatly improve the "smoothness" of
  the game.

3) AntiCheatEngine
------------------

My goal for this project was twofold:

* Providing a new publicly available cheat protection system that works with
  all modern operating systems and that uses heuristic methods to either
  block or detect cheats.

* Fixing all major bugs and shortcomings in UT that could not be fixed
  before without either having UT source code access or interfering with a
  native cheat protection. Among these shortcomings are the various problems
  with the UT main loop. ACE gives clients the option to fix these using
  either console commands or User.ini settings.

Up until ACE v0.6q BETA, ACE used the following settings by default:

- For single core systems:
  + timeGetTime based timing
  + normal accuracy sleeps
  + double float precision
  + ACE tickrate capping

- For slow dual core systems:
  + timeGetTime based timing
  + high accuracy sleeps (ImprovedSleep method)
  + double float precision
  + ACE tickrate capping

- For dual core systems (> 1GHz) or systems with more than 2 cores:
  + QueryPerformanceCounter based timing
  + high accuracy sleeps (a variant of the ImprovedSleep method)
  + double float precision
  + ACE tickrate capping

- In "Compatibility Mode" (mutate ace compattoggle):
  + RDTSC based timing
  + normal accuracy sleeps
  + double float precision
  + UT standard tickrate capping (limited to 200fps)

From ACE v0.7 BETA onward, ACE will use the following settings by default:

- With standard settings:
  + timeGetTime based timing
  + normal accuracy sleeps
  + double float precision
  + UT standard tickrate capping (limited to 200fps)

- In "High Performance Mode" (mutate ace highperftoggle):
  + QueryPerformanceCounter based timing
  + high accuracy sleeps (a variant of the ImprovedSleep method)
  + double float precision
  + ACE tickrate capping

- In "Compatibility Mode" (mutate ace compattoggle):
  + RDTSC based timing
  + normal accuracy sleeps
  + double float precision
  + UT standard tickrate capping (limited to 200fps)

* ACE tickrate capping:
  This is nothing fancy. It's the function that is used in step 9 of the UT
  main loop. ACE calculates the maximum tickrate as follows (see the sketch
  below):

  ACE MaximumTickrate = Netspeed / 64, limited to ACE MaximumFramerate

  ACE MaximumFramerate is defined as follows:
  + If you are using a UTGLR renderer and you have FrameRateLimit set to a
    value higher than 20, ACE MaximumFramerate = UTGLR FrameRateLimit
  + If you are not using a UTGLR renderer, ACE MaximumFramerate = 200
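In code, the capping rule described above comes down to the following (the
function name and parameters are made up for illustration; this is not the
actual ACE source):

  // Maximum tickrate: netspeed / 64, limited to the ACE maximum
  // framerate (hypothetical helper mirroring the rule above)
  int GetACEMaxTickrate(int Netspeed, int bUsingUTGLR, int FrameRateLimit)
  {
      // ACE MaximumFramerate
      int MaxFramerate = (bUsingUTGLR && FrameRateLimit > 20)
                             ? FrameRateLimit : 200;

      // ACE MaximumTickrate
      int MaxTickrate = Netspeed / 64;
      return (MaxTickrate < MaxFramerate) ? MaxTickrate : MaxFramerate;
  }

For example, with a netspeed of 10000 this yields 156 ticks per second,
while a netspeed of 20000 yields 312, capped to 200 without a UTGLR
FrameRateLimit.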
* Configuring it yourself:
  You can manually select your timing, sleeping and tickrate calculation
  method. To do this, you can add the following settings to your User.ini:

  [ACEv07_C.ACENative]  << Change this to whatever version you're using
  TimingMode=0
  SleepMode=0
  bNonStrictCapping=False
  bForceHighPerf=False

  Possible settings:

  - TimingMode:
    + 0 = automatic
    + 1 = GetTickCount
    + 2 = timeGetTime
    + 3 = RDTSC with calibration
    + 4 = QueryPerformanceCounter

  - SleepMode:
    + 0 = automatic
    + 1 = normal accuracy sleeps
    + 2 = high accuracy sleeps (ImprovedSleep variant)

  - bNonStrictCapping:
    + False = ACE tickrate calculation
    + True  = UT standard tickrate calculation

  - bForceHighPerf: (added in v0.7)
    + False = uses TimingMode 0, SleepMode 0, bNonStrictCapping True
    + True  = uses TimingMode 4, SleepMode 3, bNonStrictCapping False

  Note: SleepMode 3 is a third version of ImprovedSleep that cannot be set
  manually.

===============================================================================
                                   The End
===============================================================================