From 8c8bcc5d1737002a9d153105c16b262d2e201efa Mon Sep 17 00:00:00 2001
From: rpj
Date: Tue, 19 Oct 2004 13:24:40 +0000
Subject: Semaphore speedups - alpha, but passes testsuite

---
 ChangeLog | 107 +++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 61 insertions(+), 46 deletions(-)

(limited to 'ChangeLog')

diff --git a/ChangeLog b/ChangeLog
index c1fe46f..88756a2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,40 +1,55 @@
-2004-10-15 Ross Johnson
+2004-10-19 Ross Johnson
 
-	* implement.h (othread_mutex_t_): Use an event in place of
-	the POSIX semaphore.
-	* pthread_mutex_init.c: Create the event; remove semaphore init.
-	* pthread_mutex_destroy.c: Delete the event.
-	* pthread_mutex_lock.c: Replace the semaphore wait with the event wait.
-	* pthread_mutex_trylock.c: Likewise.
-	* pthread_mutex_timedlock.c: Likewise.
-	* pthread_mutex_unlock.c: Set the event.
-
+	* sem_init.c (sem_init): New semaphore model based on the same idea
+	as mutexes, i.e. user space interlocked check to avoid
+	unnecessarily entering kernel space. Wraps the Win32 semaphore and
+	keeps its own counter.
+	* sem_wait.c (sem_wait): Implemented user space check model.
+	* sem_post.c (sem_post): Likewise.
+	* sem_trywait.c (sem_trywait): Likewise.
+	* sem_timedwait.c (sem_timedwait): Likewise.
+	* sem_post_multiple.c (sem_post_multiple): Likewise.
+	* sem_getvalue.c (sem_getvalue): Likewise.
+	* ptw32_semwait.c (ptw32_semwait): Likewise.
+	* implement.h (sem_t_): Add counter element.
+
+2004-10-15 Ross Johnson
+
+	* implement.h (othread_mutex_t_): Use an event in place of
+	the POSIX semaphore.
+	* pthread_mutex_init.c: Create the event; remove semaphore init.
+	* pthread_mutex_destroy.c: Delete the event.
+	* pthread_mutex_lock.c: Replace the semaphore wait with the event wait.
+	* pthread_mutex_trylock.c: Likewise.
+	* pthread_mutex_timedlock.c: Likewise.
+	* pthread_mutex_unlock.c: Set the event.
+
 2004-10-14 Ross Johnson
 
-	* pthread_mutex_lock.c (pthread_mutex_lock): New algorithm using
-	Terekhov's xchg based variation of Drepper's cmpxchg model.
-	Theoretically, xchg uses fewer clock cycles than cmpxchg (using IA-32
-	as a reference), however, in my opinion bus locking dominates the
-	equation on smp systems, so the model with the least number of bus
-	lock operations in the execution path should win, which is Terekhov's
-	variant. On IA-32 uni-processor systems, it's faster to use the
-	CMPXCHG instruction without locking the bus than to use the XCHG
-	instruction, which always locks the bus. This makes the two variants
-	equal for the non-contended lock (fast lane) execution path on up
-	IA-32. Testing shows that the xchg variant is faster on up IA-32 as
-	well if the test forces higher lock contention frequency, even though
-	kernel calls should be dominating the times (on up IA-32, both
-	variants used CMPXCHG instructions and neither locked the bus).
+	* pthread_mutex_lock.c (pthread_mutex_lock): New algorithm using
+	Terekhov's xchg based variation of Drepper's cmpxchg model.
+	Theoretically, xchg uses fewer clock cycles than cmpxchg (using IA-32
+	as a reference), however, in my opinion bus locking dominates the
+	equation on smp systems, so the model with the least number of bus
+	lock operations in the execution path should win, which is Terekhov's
+	variant. On IA-32 uni-processor systems, it's faster to use the
+	CMPXCHG instruction without locking the bus than to use the XCHG
+	instruction, which always locks the bus. This makes the two variants
+	equal for the non-contended lock (fast lane) execution path on up
+	IA-32. Testing shows that the xchg variant is faster on up IA-32 as
+	well if the test forces higher lock contention frequency, even though
+	kernel calls should be dominating the times (on up IA-32, both
+	variants used CMPXCHG instructions and neither locked the bus).
 	* pthread_mutex_timedlock.c pthread_mutex_timedlock(): Similarly.
 	* pthread_mutex_trylock.c (pthread_mutex_trylock): Similarly.
 	* pthread_mutex_unlock.c (pthread_mutex_unlock): Similarly.
-	* ptw32_InterlockedCompareExchange.c (ptw32_InterlockExchange): New
-	function.
-	(PTW32_INTERLOCKED_EXCHANGE): Sets up macro to use inlined
+	* ptw32_InterlockedCompareExchange.c (ptw32_InterlockExchange): New
+	function.
+	(PTW32_INTERLOCKED_EXCHANGE): Sets up macro to use inlined
 	ptw32_InterlockedExchange.
-	* implement.h (PTW32_INTERLOCKED_EXCHANGE): Set default to
+	* implement.h (PTW32_INTERLOCKED_EXCHANGE): Set default to
 	InterlockedExchange().
-	* Makefile: Building using /Ob2 so that asm sections within inline
+	* Makefile: Building using /Ob2 so that asm sections within inline
 	functions are inlined.
 
 2004-10-08 Ross Johnson
@@ -42,16 +57,16 @@
 	* pthread_mutex_destroy.c (pthread_mutex_destroy): Critical Section
 	element is no longer required.
 	* pthread_mutex_init.c (pthread_mutex_init): Likewise.
-	* pthread_mutex_lock.c (pthread_mutex_lock): New algorithm following
-	Drepper's paper at http://people.redhat.com/drepper/futex.pdf, but
-	using the existing semaphore in place of the futex described in the
+	* pthread_mutex_lock.c (pthread_mutex_lock): New algorithm following
+	Drepper's paper at http://people.redhat.com/drepper/futex.pdf, but
+	using the existing semaphore in place of the futex described in the
 	paper. Idea suggested by Alexander Terekhov - see:
 	http://sources.redhat.com/ml/pthreads-win32/2003/msg00108.html
 	* pthread_mutex_timedlock.c pthread_mutex_timedlock(): Similarly.
 	* pthread_mutex_trylock.c (pthread_mutex_trylock): Similarly.
 	* pthread_mutex_unlock.c (pthread_mutex_unlock): Similarly.
-	* pthread_barrier_wait.c (pthread_barrier_wait): Use inlined version
-	of InterlockedCompareExchange() if possible - determined at
+	* pthread_barrier_wait.c (pthread_barrier_wait): Use inlined version
+	of InterlockedCompareExchange() if possible - determined at
 	build-time.
 	* pthread_spin_destroy.c pthread_spin_destroy(): Likewise.
 	* pthread_spin_lock.c pthread_spin_lock():Likewise.
@@ -59,29 +74,29 @@
 	* pthread_spin_unlock.c (pthread_spin_unlock):Likewise.
 	* ptw32_InterlockedCompareExchange.c: Sets up macro for inlined use.
 	* implement.h (pthread_mutex_t_): Remove Critical Section element.
-	(PTW32_INTERLOCKED_COMPARE_EXCHANGE): Set to default non-inlined
+	(PTW32_INTERLOCKED_COMPARE_EXCHANGE): Set to default non-inlined
 	version of InterlockedCompareExchange().
-	* private.c: Include ptw32_InterlockedCompareExchange.c first for
+	* private.c: Include ptw32_InterlockedCompareExchange.c first for
 	inlining.
-	* GNUmakefile: Add commandline option to use inlined
+	* GNUmakefile: Add commandline option to use inlined
 	InterlockedCompareExchange().
 	* Makefile: Likewise.
 
 2004-09-27 Ross Johnson
 
-	* pthread_mutex_lock.c (pthread_mutex_lock): Separate
-	PTHREAD_MUTEX_NORMAL logic since we do not need to keep or check some
-	state required by other mutex types; do not check mutex pointer arg
-	for validity - leave this to the system since we are only checking
-	for NULL pointers. This should improve speed of NORMAL mutexes and
+	* pthread_mutex_lock.c (pthread_mutex_lock): Separate
+	PTHREAD_MUTEX_NORMAL logic since we do not need to keep or check some
+	state required by other mutex types; do not check mutex pointer arg
+	for validity - leave this to the system since we are only checking
+	for NULL pointers. This should improve speed of NORMAL mutexes and
 	marginally improve speed of other type.
 	* pthread_mutex_trylock.c (pthread_mutex_trylock): Likewise.
 	* pthread_mutex_unlock.c (pthread_mutex_unlock): Likewise; also avoid
-	entering the critical section for the no-waiters case, with approx.
+	entering the critical section for the no-waiters case, with approx.
 	30% reduction in lock/unlock overhead for this case.
 	* pthread_mutex_timedlock.c (pthread_mutex_timedlock): Likewise; also
-	no longer keeps mutex if post-timeout second attempt succeeds - this
-	will assist applications that wish to impose strict lock deadlines,
+	no longer keeps mutex if post-timeout second attempt succeeds - this
+	will assist applications that wish to impose strict lock deadlines,
 	rather than simply to escape from frozen locks.
 
 2004-09-09 Tristan Savatier
@@ -92,7 +107,7 @@
 	[Maintainer's note: the race condition is harmless on SPU systems
 	and only a problem on MPU systems if concurrent access results in an
 	exception (presumably generated by a hardware interrupt). There are
-	other instances of similar harmless race conditions that have not
+	other instances of similar harmless race conditions that have not
 	been identified as issues.]
 
 2004-09-09 Ross Johnson
-- 
cgit v1.2.3