------------
Benchmarking
------------
There is a new but growing set of benchmarking programs in the
"tests" directory. These should be runnable using the
following command lines, corresponding to each of the possible
library builds:

MSVC:
nmake clean VC-bench
nmake clean VCE-bench
nmake clean VSE-bench

Mingw32:
make clean GC-bench
make clean GCE-bench

UWIN:
The benchtests are run as part of the testsuite.


Mutex benchtests
----------------

benchtest1 - Lock plus unlock on an unlocked mutex.
benchtest2 - Lock plus unlock on a locked mutex.
benchtest3 - Trylock on a locked mutex.
benchtest4 - Trylock plus unlock on an unlocked mutex.


Each test times up to three alternative synchronisation
implementations as a reference, and then times each of
the four mutex types provided by the library. Each is
described below:

Simple Critical Section
- uses a simple Win32 critical section. Unlike the
remaining cases, there is no additional overhead in
this case.

POSIX mutex implemented using a Critical Section
- The old implementation, which used runtime adaptation
depending on the Windows variant being run on. When
the pthreads DLL was run on WinNT or higher, POSIX
mutexes would use Win32 Critical Sections.

POSIX mutex implemented using a Win32 Mutex
- The old implementation, which used runtime adaptation
depending on the Windows variant being run on. When
the pthreads DLL was run on Win9x, POSIX mutexes
would use Win32 Mutexes (because TryEnterCriticalSection
is not implemented on Win9x).

PTHREAD_MUTEX_DEFAULT
PTHREAD_MUTEX_NORMAL
PTHREAD_MUTEX_ERRORCHECK
PTHREAD_MUTEX_RECURSIVE
- The current implementation supports these mutex types.
The underlying basis of POSIX mutexes is now the same
irrespective of the Windows variant, and should therefore
have consistent performance.


In all benchtests, the operation is repeated a large
number of times and an average is calculated. Loop
overhead is measured and subtracted from all test times.

Comment on the results
----------------------
The gain in performance for Win9x systems is enormous - up to
40 times faster for unlocked mutexes (2 times faster for locked
mutexes).

pthread_mutex_trylock also appears to be faster for locked mutexes.

The price for the new consistency between WinNT and Win9x is
slower performance (up to twice as long) across a lock/unlock
sequence. It is difficult to get a good split timing for lock
and unlock operations, but by code inspection, it is the unlock
operation that is slowing the pair down in comparison with the
old-style CS mutexes, even for the fast PTHREAD_MUTEX_NORMAL mutex
type with no other waiting threads. However, comparative
performance for operations on already locked mutexes is very close.

When this is translated to real-world applications, the overall
comparative performance should be almost identical on NT class
systems. That is, applications with heavy mutex contention should
have almost equal performance, while applications with only light
mutex contention should also have almost equal performance because
the most critical operation in this case is the lock operation.

Overall, the newer pthreads-win32 mutex routines are only slower
(on NT class systems) where and when it is least critical.

Thanks go to Thomas Pfaff for the current implementation of mutex
routines.