------------ Benchmarking ------------ There is a new but growing set a benchmarking programs in the "tests" directory. These should be runnable using the following command-lines corresponding to each of the possible library builds: MSVC: nmake clean VC-bench nmake clean VCE-bench nmake clean VSE-bench Mingw32: make clean GC-bench make clean GCE-bench UWIN: The benchtests are run as part of the testsuite. Mutex benchtests ---------------- benchtest1 - Lock plus unlock on an unlocked mutex. benchtest2 - Lock plus unlock on a locked mutex. benchtest3 - Trylock on a locked mutex. benchtest4 - Trylock plus unlock on an unlocked mutex. Each test times up to three alternate synchronisation implementations as a reference, and then times each of the four mutex types provided by the library. Each is described below: Simple Critical Section - uses a simple Win32 critical section. There is no additional overhead for this case as there is in the remaining cases. POSIX mutex implemented using a Critical Section - The old implementation which uses runtime adaptation depending on the Windows variant being run on. When the pthreads DLL was run on WinNT or higher then POSIX mutexes would use Win32 Critical Sections. POSIX mutex implemented using a Win32 Mutex - The old implementation which uses runtime adaptation depending on the Windows variant being run on. When the pthreads DLL was run on Win9x then POSIX mutexes would use Win32 Mutexes (because TryEnterCriticalSection is not implemented on Win9x). PTHREAD_MUTEX_DEFAULT PTHREAD_MUTEX_NORMAL PTHREAD_MUTEX_ERRORCHECK PTHREAD_MUTEX_RECURSIVE - The current implementation supports these mutex types. The underlying basis of POSIX mutexes is now the same irrespective of the Windows variant, and should therefore have consistent performance. In all benchtests, the operation is repeated a large number of times and an average is calculated. Loop overhead is measured and subtracted from all test times. Comment on the results ---------------------- The gain in performance for Win9x systems is enormous - up to 40 times faster for unlocked mutexes (2 times faster for locked mutexes). Pthread_mutex_trylock also appears to be faster for locked mutexes. The price for the new consistency between WinNT and Win9x is slower performance (up to twice as long) across a lock/unlock sequence. It is difficult to get a good split timing for lock and unlock operations, but by code inspection, it is the unlock operation that is slowing the pair down in comparison with the old-style CS mutexes, even for the fast PTHREAD_MUTEX_NORMAL mutex type with no other waiting threads. However, comparitive performance for operations on already locked mutexes is very close. When this is translated to real-world applications, the overall camparitive performance should be almost identical on NT class systems. That is, applications with heavy mutex contention should have almost equal performance, while applications with only light mutex contention should also have almost equal performance because the most critical operation in this case is the lock operation. Overall, the newer pthreads-win32 mutex routines are only slower (on NT class systems) where and when it is least critical. Thanks go to Thomas Pfaff for the current implementation of mutex routines.