===============================================================================
   About UT, multicore processors, power management and timing problems

   Version 2 - 05/07/2010
   by AnthraX - stidzjene a.t. hotmail d.o.t. com
===============================================================================

In this small article I will try to explain what is wrong with the method UT
uses to do its time measurements, why it causes problems on modern systems
with multiple cores or with power management features such as SpeedStep and
Cool'n'Quiet, and why it makes speedhacking so easy. I sadly don't have
access to the UT source code, so all of the information in this article is
based on reverse engineering and personal experiments. If there are major
errors in this article, I would appreciate it if you contacted me about them.

1) Introduction
---------------

Before I get straight to the point, it is important that you know what the
"UT Main Loop" is and what it does. A main loop is basically a sequence of
actions that repeats over and over again. UT's main loop runs from the
moment the game is loaded until the moment you close the game again. Every
iteration of the main loop is called a tick and the tickrate is the number
of ticks your client performs per second. In every tick, the following
actions are performed (roughly):

- 1:  The tick starts. UT stores the time of this event
- 2:  All client input is processed
- 3:  All clientside actors are ticked (updated)
- 4:  All network links are updated
- 5:  The audio device is updated
- 6:  The HUD is prerendered
- 7:  A frame is rendered. This will display the game world, but without
      the HUD
- 8:  The HUD is postrendered. During the postrendering of the HUD all
      UScript rendering is done (text, scoreboards, windows, console, ...)
- 9:  UT calculates the maximum tickrate your client should run at
- 10: The tick ends. UT stores the time of this event
- 11: The main loop is temporarily suspended. The engine now waits until
      it is time to perform the next tick

As you can see, nothing out of the ordinary is going on here. There are a
few things worth noting though:

* Tickrate = Framerate!
  Exactly ONE frame is rendered to the screen during every tick. Your
  tickrate (the number of ticks per second) is therefore equal to your
  framerate.

* Maximum Tickrate:
  In step 9 UT calculates the maximum client tickrate. This tickrate is
  governed by your current netspeed and is calculated as follows:

  Maximum Tickrate = Netspeed / 64

  The reasoning behind this is that the engine attempts to send a position
  update during every tick. A position update is sent in a 64 byte packet.
  The number of position updates you can send is therefore limited by your
  netspeed, so your tickrate should be limited by it as well.

* Time measurements:
  UT has to measure the time at several points during every tick. The game
  needs to know exactly how long it took to perform the tick. Without an
  exact measurement of this time interval, the game cannot know how long it
  should pause for during step 11. If the time measurements are not exact,
  the pauses won't be exact either. This means that the main loop will often
  pause for too long or not long enough, which in turn can make your
  framerate terribly unstable.

* Updating objects:
  UT tells every object it updates during a tick (this includes actors,
  network links, the audio device, the render device, ...) how much time has
  passed since the last tick ended. These objects will act accordingly.

* Player movement:
  One example of an object that is updated during every tick is your own
  playerpawn object. Every time you move your mouse or press a movement key
  on your keyboard, the engine will update your position and your camera
  angle (your camera angle is also called the viewrotation). To make these
  calculations accurate, the engine needs to know how long you've pressed
  that key and how long you've moved your mouse for. The longer you have
  pressed a key, the further you will have moved! Therefore, if the timing
  measurements your game performs are not accurate, your movement won't be
  accurate either. If your game, for example, always overestimates the time
  intervals (e.g. if it keeps telling your playerpawn that 50 milliseconds
  have passed since the last tick while in reality only 45 milliseconds have
  passed), you will achieve a speedhack effect: you will consistently move
  faster than the other players on the server.
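Before moving on, here is a minimal sketch of the loop described above, to
make its structure concrete. This is NOT the actual UT code: the function
name UpdateWorld is made up and stubbed out, and the timing is done with
timeGetTime for brevity (UT itself uses RDTSC, as section 2 explains):

  #include <windows.h>
  #pragma comment(lib, "winmm.lib")   /* timeGetTime */

  /* Hypothetical stand-in for the engine work done in steps 2-8 */
  static void UpdateWorld(double DeltaSeconds) { (void)DeltaSeconds; }

  int main(void)
  {
      int    Netspeed = 20000;                /* example netspeed value   */
      double OldTime  = timeGetTime() / 1000.0;

      for (;;)                                /* the "main loop"          */
      {
          /* Step 1: the tick starts, store the time
             (32 bit wraparound of timeGetTime ignored for brevity) */
          double NewTime   = timeGetTime() / 1000.0;
          double DeltaTime = NewTime - OldTime;
          OldTime = NewTime;

          /* Steps 2-8: input, actors, network, audio, rendering */
          UpdateWorld(DeltaTime);

          /* Step 9: maximum tickrate = netspeed / 64 */
          double MaxTickRate = Netspeed / 64.0;

          /* Steps 10-11: store the time, then wait until the next
             tick is due */
          double TickDuration = timeGetTime() / 1000.0 - NewTime;
          double WaitTime     = 1.0 / MaxTickRate - TickDuration;
          if (WaitTime > 0)
              Sleep((DWORD)(WaitTime * 1000.0));
      }
  }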
Experienced gamers or UScript coders will undoubtedly have noticed that the
above information about player movement isn't entirely true. Indeed, if your
client could just tell the server where you have moved without any control,
you would be able to move at any speed you like. Moving 10 times faster than
all the other players would be possible if the above information were
entirely accurate, but obviously that is not the case. Your client DOES
indeed tell the server where you have moved and how much your camera angle
has changed, but ultimately the server has the authority to change these
values if they are not acceptable. The server will always compare your new
position with your old one and it will reject the new position if it is
technically impossible to have moved from the old to the new position in
the time that has passed since you sent the server your old position.

Because this may all sound a bit confusing, consider the following kick-ass
graphical example. Suppose that your client tells the server where your
playerpawn is once a second and that playerpawns can move at a maximum speed
of 5 meters per second. In that case, the following situation would be
impossible:

     Position 1:                                     Position 2:
     1 sec                                           2 sec

        o                                               o
       /|\                                             /|\
       / \                                             / \
  +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
  0     1     2     3     4     5     6     7     8     9     10    11    12

The server will detect that this situation is impossible (8 meters traveled
in 1 second) and it will move you back to your last known position
immediately. The following situation on the other hand IS allowed:

     Position 1:             Position 2:
     1 sec                   2 sec

        o                       o
       /|\                     /|\
       / \                     / \
  +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
  0     1     2     3     4     5     6     7     8     9     10    11    12

In this situation the server will not interfere but instead accept your new
position and replicate it to the other players.

This behavior also explains why players with connection problems "warp
around" on the server. These players will continue to send updates about
their current position once a second, but if packets are lost, the following
can happen:

     Position 1:       Position 2:       Position 3:       Position 4:
     1 sec             2 sec             3 sec             4 sec
     PKT SENT          PKT LOST          PKT LOST          PKT SENT

        o                 o                 o                 o
       /|\               /|\               /|\               /|\
       / \               / \               / \               / \
  +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
  0     1     2     3     4     5     6     7     8     9     10    11    12

As you can see, the player starts at position 1 and this position is also
successfully sent to and accepted by the server. The next two position
updates are not received by the server because the packets got lost. The
fourth position IS received by the server, but the server will see that the
player has moved 9 meters between 2 position updates and, because this
situation is impossible, it will move the player back to position 1.
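To make the server's check concrete, here is a small, purely illustrative C
sketch of this kind of plausibility test. The names, the vector type and the
5 m/s limit are all made up for the example; this is not the actual UT
server code (which also allows a small error margin, see the first note
below):

  #include <math.h>
  #include <stdio.h>

  typedef struct { double X, Y, Z; } FVector;

  static double Distance(FVector A, FVector B)
  {
      double DX = B.X - A.X, DY = B.Y - A.Y, DZ = B.Z - A.Z;
      return sqrt(DX * DX + DY * DY + DZ * DZ);
  }

  /* Returns 1 if the move from OldPos to NewPos is possible within
     DeltaSeconds at MaxSpeed (plus a 3% error margin), 0 if the
     server should reject it and move the player back */
  static int IsMovePlausible(FVector OldPos, FVector NewPos,
                             double DeltaSeconds, double MaxSpeed)
  {
      double MaxDistance = MaxSpeed * DeltaSeconds * 1.03;
      return Distance(OldPos, NewPos) <= MaxDistance;
  }

  int main(void)
  {
      FVector Old = {  1.0, 0.0, 0.0 };
      FVector New = { 10.0, 0.0, 0.0 };

      /* 9 meters in 1 second at 5 m/s max: rejected, just like the
         packet loss example above */
      printf("%s\n", IsMovePlausible(Old, New, 1.0, 5.0)
                         ? "accepted" : "rejected");
      return 0;
  }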
Two important notes before I continue to the next section:

- To compensate for minor connection issues, the server allows a small error
  margin on the distance you have moved between two position updates. This
  error margin is usually 3%. If, for example, the server allows you to move
  5 meters per position update but you consistently tell the server that you
  have moved 5.15 meters (= 5 + 3%), the server will NOT reject your new
  positions. Players with a stable connection can abuse this mechanism to
  achieve a minor speedhack effect.

- MANY objects are updated in a similar way. If time measurements are very
  inaccurate, the game will therefore seem very buggy.

2) Timing issues
----------------

Now that you have a good overview of what the main loop does and how time
measurements affect the way the game works, I can move on to HOW UT measures
the time.

2.1: The issues
- - - - - - - -

* As you may have read before, UT uses the RDTSC instruction to measure
  time. This is an assembler instruction that reads the "Cycle Counter"
  value from your CPU. When you power up your PC, this cycle counter starts
  at 0 and during every clocktick it is incremented by 1. If you had a CPU
  with a clock speed of 1Hz, your CPU clock would tick once a second, so
  your cycle counter would increase by 1 every second. If you have a 2GHz
  CPU, your CPU clock will tick 2 000 000 000 times per second. This means
  that your cycle counter will also increase by 2 000 000 000 per second.
  In general this means that you can read the CPU cycle counter, divide it
  by your CPU clock speed (in Hz), and you will find the number of seconds
  that have passed since you powered on your computer. Because this counter
  is updated during every clocktick, you can also measure much smaller
  intervals.

  On the surface this seems like an excellent way to measure the time and
  indeed, for a long time it WAS a good way to do so. Every game released in
  the UT99 era used RDTSC for all time measurements. These days, however,
  RDTSC is almost completely deprecated because of the great number of
  issues that affect this technique. Here's a short overview of the problems
  with RDTSC:

  - The CPU clock speed is not constant. The clock speed generally
    fluctuates around the speed it's supposed to run at. This means that
    RDTSC is still reasonably accurate for measuring the time your PC has
    been powered up for, but for measuring small intervals (such as those
    needed by games) it is not a good choice anymore.

  - Many CPUs have some sort of dynamic frequency scaling. Dynamic
    overclocking or power management features such as Cool'n'Quiet or
    SpeedStep can drastically change the speed your CPU runs at. If your PC
    is using any of these techniques, RDTSC is inaccurate overall, even for
    measuring greater time intervals.

  - Every CPU core has its own cycle counter and, to make things worse,
    there are several situations in which the cycle counter on one CPU core
    can have a different value than the cycle counter on another core. This
    means that if UT measures the start of an interval on one CPU core and
    the end of the interval on another core, magic stuff can happen. In such
    cases the time can make big forward or backward leaps. This will cause
    MAJOR problems. Many people will have experienced that if they run UT on
    a multicore system without any kind of multicore fix, they will warp
    around really badly and in some cases their sound will even break up or
    their input will be completely broken. Using RDTSC is therefore NOT
    multicore safe. If you wish to run an RDTSC based game, you will almost
    always have to restrict the game to run on one core.
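For reference, here is what RDTSC based timing boils down to in code. This
is a bare-bones sketch, not UT's implementation: it uses MSVC's __rdtsc
intrinsic and simply assumes a fixed 2GHz clock speed, which is exactly the
assumption that dynamic frequency scaling breaks:

  #include <stdio.h>
  #include <intrin.h>          /* __rdtsc (MSVC) */

  /* Assumed, fixed clock speed in Hz. A real implementation would have
     to measure this - and keep remeasuring it, see section 2.2 */
  #define CPU_HZ 2000000000.0

  /* Seconds since power-up: cycle counter divided by clock speed */
  static double SecondsSincePowerUp(void)
  {
      return (double)__rdtsc() / CPU_HZ;
  }

  int main(void)
  {
      double Start = SecondsSincePowerUp();
      /* ... do some work here ... */
      double Elapsed = SecondsSincePowerUp() - Start;
      printf("interval: %f seconds\n", Elapsed);
      return 0;
  }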
* The use of RDTSC for timing is not the only issue in UT. A second major
  issue is that all timing info (time intervals as well as absolute time) is
  stored in FLOAT format. The FLOAT format (also known as single precision
  float) stores floating point numbers using 4 bytes (or 32 bits).
  Unfortunately UT uses its timing functions primarily to measure small time
  intervals. At high framerates (anything above 80 frames per second,
  really) the single float format becomes a real problem, as it does not
  allow floating point numbers to be stored with great enough precision. At
  really high framerates (200+) the single float format is not just a
  problem anymore, it is in fact a game breaker because it causes floating
  point overflows. Some players may have noticed that if they play at really
  high framerates (200-300+) in singleplayer mode, their game starts to
  speed up and down for no apparent reason. This is caused by the fact that
  everything is stored as single floats.

  Instead, UT should use double floats to store timing info. This allows the
  floating point numbers to use 64 bits instead of 32 and will of course
  greatly increase the precision with which these numbers can be stored. I'm
  not sure why single floats are used. Ever since the introduction of the
  486DX CPU in the early 90s, every CPU has been equipped with a floating
  point unit, which internally converts every floating point number to a
  special 80 bit format. Using double floats is therefore just as fast as
  using single floats and, because they allow much greater precision, they
  are the only logical choice for storing timing info.

* The last issue with the UT main loop is fortunately the least severe. As
  you may remember, step 11 of the main loop is a pause. The game waits at
  the end of every tick until it is time to start the next tick. This way
  the game can ensure a stable tickrate (= framerate!), which is of course a
  very desirable feature for every game. Unfortunately, the function UT uses
  to "pause" is also quite imprecise. Consider the situation in which you
  want your game to run at 100 frames per second. This means that a frame
  should be rendered every 10 milliseconds (1000 milliseconds per second /
  100 frames per second = 10 milliseconds per frame). Suppose then that
  steps 1 through 10 of the main loop consistently take 4 milliseconds. This
  means that at the end of every tick you would want to pause for 6
  milliseconds before moving on to the next tick. To pause for 6
  milliseconds, the game uses the following function call:

  Sleep(6);

  The Sleep function is implemented by the operating system (in most cases
  Windows), but it is not always accurate. I will spare you the details of
  how kernel interrupts and interrupt timers work. It suffices to know that
  calling Sleep(6) will not always make your application pause for 6ms.
  Usually your game will pause for 7ms, sometimes 8ms (provided that you
  have requested the interrupt timer to run at its greatest precision). This
  means that the game will often pause for too long, causing a minor
  decrease in framerate.

2.2: The solutions
- - - - - - - - - -

The previous subsection presented a small overview of the 3 issues with the
UT main loop. Now it's time to look at some of the possible solutions:

* The single float problem.
  The fix consists of using the double float format instead. Nothing more,
  nothing less (see the demonstration below).
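To see why this matters, here is a small self-contained demonstration. The
numbers are illustrative, not taken from UT: it stores the absolute time
since power-up in both a single and a double float, then advances both by
200 frames of 5 milliseconds each (one second of play at 200fps):

  #include <stdio.h>

  int main(void)
  {
      /* Absolute time after one day of uptime. At this magnitude a
         single float can only change in steps of 1/128th of a second
         (~7.8ms) - more than an entire frame at 200fps! */
      float  TimeSingle = 86400.0f;   /* 1 day, in seconds */
      double TimeDouble = 86400.0;

      /* Advance both clocks by 200 frames of 5 milliseconds each */
      for (int i = 0; i < 200; i++)
      {
          TimeSingle += 0.005f;
          TimeDouble += 0.005;
      }

      /* The double clock advances by a full second; every 5ms step
         added to the float clock is rounded UP to the nearest
         representable value, so it runs noticeably fast */
      printf("single: %f\n", TimeSingle);   /* 86401.562500 */
      printf("double: %f\n", TimeDouble);   /* 86401.000000 */
      return 0;
  }

In this scenario the single float clock runs more than 50% fast, which is
exactly the kind of speed-up effect described above; at other magnitudes
the rounding goes the other way and the clock slows down instead.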
* The sleeping problem.
  As you may remember, the Sleep function often pauses for a bit longer than
  requested. Although this is not a major problem, it can still be fixed
  easily. Here's a piece of code that will do the trick (if you have no
  coding experience, you may want to skip this part):

  #include <windows.h>                  // Sleep, DWORD, DOUBLE
  #pragma comment(lib, "winmm.lib")     // timeBeginPeriod, timeGetTime

  // Call this when UT starts
  void InitTimer()
  {
      // This will increase the precision of the kernel interrupt
      // timer. Although this will slightly increase resource usage,
      // it will also increase the precision of sleep calls and
      // this will in turn increase the stability of the framerate
      timeBeginPeriod(1);
  }

  // Call this when UT shuts down
  void UninitTimer()
  {
      // Restore the kernel timer config
      timeEndPeriod(1);
  }

  // Get the elapsed time in seconds (counted from the first call)
  DOUBLE appGetTime()
  {
      static DWORD OldTime = timeGetTime();
      static DOUBLE TimeInSec = 0.0;
      DWORD NewTime = timeGetTime();

      // Fix for buggy timer hardware
      // Prevents backward time leaps
      if (NewTime > OldTime)
      {
          TimeInSec += (NewTime - OldTime) / 1000.0;
          OldTime = NewTime;
      }
      // Most likely a wraparound of the 32 bit millisecond counter
      else if (OldTime > 0xF0000000 && NewTime < 0x10000000)
          OldTime = NewTime;

      return TimeInSec;
  }

  // Sleep until appGetTime() reaches endSleep (in seconds)
  void ImprovedSleep(DOUBLE endSleep)
  {
      DOUBLE timeLeft = endSleep - appGetTime();

      // Sleep until roughly 2ms before the deadline...
      if (timeLeft > 0.002)
          Sleep((DWORD)((timeLeft - 0.002) * 1000.0));

      // ...then busy-wait (yielding) for the remainder
      while ((timeLeft = endSleep - appGetTime()) > 0)
          Sleep(0);
  }

* The timing problem.
  There are several possible solutions and I've tested them all extensively.
  Most of them consist of giving up RDTSC for a more accurate timing method.

  - 1: Using GetTickCount instead of RDTSC:
       GetTickCount is a very lightweight function that retrieves the number
       of milliseconds since the pc was started. This function reads the
       kernel tick counter. Unfortunately this tick counter is often not
       accurate. In some cases it can be several milliseconds off. Even
       though it is the most resource friendly method to measure time, it is
       not advisable for short time intervals and therefore not the
       preferred method for games.

       Advantages:
       + Only timing method that is more resource friendly than RDTSC
       + Multicore safe. There's only one tick counter
       Disadvantages:
       - Poor accuracy

  - 2: Using timeGetTime instead of RDTSC:
       This is the function used by virtually every media player. It also
       returns the number of milliseconds since the pc was started, but it
       retrieves the info from the interrupt hardware and is therefore
       incredibly accurate.

       Advantages:
       + Only slightly less resource friendly than RDTSC
       + Very accurate
       + Multicore safe

  - 3: Using QueryPerformanceCounter instead of RDTSC:
       This function is similar to RDTSC. It also reads a counter that is
       updated during every clocktick. The function is implemented in the
       windows kernel and has certain advantages over RDTSC, the most
       important one being that, if used right, the kernel guarantees that
       the results of this function are monotonic (see the sketch after
       this list).

       Advantages:
       + Most accurate method if used right
       Disadvantages:
       - Not always multicore safe
       - Least resource friendly
       - Bugged on some systems

  - 4: Sticking to RDTSC:
       It is still possible to get UT to work properly with RDTSC based
       timing. The following fixes are needed to do so though:
       + The game needs to be restricted to one core. RDTSC is NOT
         multicore safe
       + The cpu clock speed needs to be remeasured constantly (to
         compensate for dynamic frequency scaling due to overclocking or
         power management features)

       Remeasuring the clock speed is not trivial though. The best way to
       do it is to use a "calibration thread" that remeasures the clock
       speed every X seconds and updates the game's timing variables
       accordingly. Using a calibration thread brings some new problems of
       its own though: it increases the resource usage of your game and it
       requires some synchronization with the main UT thread to update the
       timing values safely.
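As an illustration of option 3, here is a minimal sketch of
QueryPerformanceCounter based timing. The function names mirror the
appGetTime example above but are otherwise made up; pinning the thread to
one core with SetThreadAffinityMask is one common way to work around the
"not always multicore safe" caveat:

  #include <windows.h>

  static LARGE_INTEGER Frequency;   // counts per second, fixed at boot
  static LARGE_INTEGER StartCount;

  // Call this once, from the thread that will do the timing
  void InitQPCTimer()
  {
      // Restricting the timing thread to one core avoids the
      // cross-core counter mismatches mentioned earlier
      SetThreadAffinityMask(GetCurrentThread(), 1);

      QueryPerformanceFrequency(&Frequency);
      QueryPerformanceCounter(&StartCount);
  }

  // Seconds since InitQPCTimer was called
  DOUBLE appGetTimeQPC()
  {
      LARGE_INTEGER Now;
      QueryPerformanceCounter(&Now);
      return (DOUBLE)(Now.QuadPart - StartCount.QuadPart) /
             (DOUBLE)Frequency.QuadPart;
  }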
* As I said, all of these options have been tested extensively (in the ACE
  private beta tests). My findings are that for high end systems the best
  performance and the fewest problems are achieved by using
  QueryPerformanceCounter to measure the time and by using a variant of the
  ImprovedSleep method to do the pausing. For mid range and low end systems
  the best performance was achieved with a combination of timeGetTime based
  timing and normal sleeps. For any system it was possible to completely
  eliminate speedhacking abuse and to greatly improve the "smoothness" of
  the game.

3) AntiCheatEngine
------------------

My goal for this project was twofold:

* Providing a new publicly available cheat protection system that works with
  all modern operating systems and that uses heuristic methods to either
  block or detect cheats.

* Fixing all major bugs and shortcomings in UT that could not be fixed
  before without either having UT source code access or interfering with a
  native cheat protection. Among these shortcomings are the various problems
  with the UT main loop. ACE gives clients the option to fix these using
  either console commands or User.ini settings.

Up until ACE v0.6q BETA, ACE used the following settings by default:

- For single core systems:
  + timeGetTime based timing
  + normal accuracy sleeps
  + double float precision
  + ACE tickrate capping

- For slow dual core systems:
  + timeGetTime based timing
  + high accuracy sleeps (ImprovedSleep method)
  + double float precision
  + ACE tickrate capping

- For dual core systems (> 1GHz) or systems with more than 2 cores:
  + QueryPerformanceCounter based timing
  + high accuracy sleeps (a variant of the ImprovedSleep method)
  + double float precision
  + ACE tickrate capping

- In "Compatibility Mode" (mutate ace compattoggle):
  + RDTSC based timing
  + normal accuracy sleeps
  + double float precision
  + UT standard tickrate capping (limited to 200fps)

From ACE v0.7 BETA onward, ACE will use the following settings by default:

- With standard settings:
  + timeGetTime based timing
  + normal accuracy sleeps
  + double float precision
  + UT standard tickrate capping (limited to 200fps)

- In "High Performance Mode" (mutate ace highperftoggle):
  + QueryPerformanceCounter based timing
  + high accuracy sleeps (a variant of the ImprovedSleep method)
  + double float precision
  + ACE tickrate capping

- In "Compatibility Mode" (mutate ace compattoggle):
  + RDTSC based timing
  + normal accuracy sleeps
  + double float precision
  + UT standard tickrate capping (limited to 200fps)

* ACE tickrate capping:
  This is nothing fancy. It's the function that is used in step 9 of the UT
  main loop. ACE calculates the maximum tickrate as follows (see the sketch
  below):

  ACE MaximumTickrate = Netspeed / 64, limited to ACE MaximumFramerate

  ACE MaximumFramerate is defined as follows:
  + If you are using a UTGLR renderer and you have FrameRateLimit set to a
    value higher than 20, ACE MaximumFramerate = UTGLR FrameRateLimit
  + If you are not using a UTGLR renderer, ACE MaximumFramerate = 200
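In code, the capping rule described above comes down to the following (the
function name and parameters are made up for illustration; this is not the
actual ACE source):

  // Maximum tickrate: netspeed / 64, limited to the ACE maximum
  // framerate (hypothetical helper mirroring the rule above)
  int GetACEMaxTickrate(int Netspeed, int bUsingUTGLR, int FrameRateLimit)
  {
      // ACE MaximumFramerate
      int MaxFramerate = (bUsingUTGLR && FrameRateLimit > 20)
                             ? FrameRateLimit : 200;

      // ACE MaximumTickrate
      int MaxTickrate = Netspeed / 64;
      return (MaxTickrate < MaxFramerate) ? MaxTickrate : MaxFramerate;
  }

For example, with a netspeed of 10000 this yields 156 ticks per second,
while a netspeed of 20000 yields 312, capped to 200 without a UTGLR
FrameRateLimit.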
* Configuring it yourself:
  You can manually select your timing, sleeping and tickrate calculation
  method. To do this, you can add the following settings to your User.ini:

  [ACEv07_C.ACENative]  << Change this to whatever version you're using
  TimingMode=0
  SleepMode=0
  bNonStrictCapping=False
  bForceHighPerf=False

  Possible settings:

  - TimingMode:
    + 0 = automatic
    + 1 = GetTickCount
    + 2 = timeGetTime
    + 3 = RDTSC with calibration
    + 4 = QueryPerformanceCounter

  - SleepMode:
    + 0 = automatic
    + 1 = normal accuracy sleeps
    + 2 = high accuracy sleeps (ImprovedSleep variant)

  - bNonStrictCapping:
    + False = ACE tickrate calculation
    + True  = UT standard tickrate calculation

  - bForceHighPerf: (added in v0.7)
    + False = uses TimingMode 0, SleepMode 0, bNonStrictCapping True
    + True  = uses TimingMode 4, SleepMode 3, bNonStrictCapping False

  Note: SleepMode 3 is a third version of ImprovedSleep that cannot be set
  manually.

===============================================================================
                                   The End
===============================================================================