[firebase-br] Problems with HT (was: 100% PEAK)

Eduardo Jedliczka edujed at gmail.com
Wed Oct 5 12:51:02 -03 2005


---- Message from Jim Starkey ----
Tuesday, July 19, 2005 15:07
Does anyone actually have a good handle on what's happening?

 From the data given, it appears that the hyperthreading performance
degradation happens only with multiple Firebird threads but doesn't
happen with the same load on SMP.  By inference, this suggests that the
problem is interaction between Firebird threads and not between Firebird
and the operating system.  The obvious suspect is spinlocks.  The idea
behind hyperthreading is that the processors will change threads when
the running thread has a cache miss.  Spinlocks are long loops testing a
variable for a change, and by their nature, never result in a cache
miss.  Perhaps the "pause" instruction will do the trick and force a
thread switch.  In my (always) humble opinion, however, spinlocks are
highly dubious in SMP and a potential disaster in hyperthreading.

If it hasn't been tried already, I suggest that all spinlocks be
replaced with simple mutexes and see what happens.  I have already
eliminated all spinlocks from Vulcan other than nbak, which I'm afraid
to screw around with.

Spinlocks can be justified inside an OS microkernel, but don't belong in
high level code.  If you can't run, let somebody else run.
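
The contrast between spinning and blocking can be sketched in C++ as follows. This is an illustration only, not Firebird or Vulcan code: `SpinLock`, `cpu_relax` and `contended_count` are hypothetical names. `_mm_pause` is the compiler intrinsic for the x86 "pause" instruction mentioned above; other targets fall back to `std::this_thread::yield()`. Whether "pause" actually cures the hyperthreading slowdown is the open question in this thread, so the sketch only shows where it would go and how a plain mutex drops into the same place.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  // _mm_pause
#endif

// Tell the core we are busy-waiting so the sibling hyperthread can use it.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__)
    _mm_pause();
#else
    std::this_thread::yield();
#endif
}

// Hypothetical spinlock: loops on a flag until it manages to set it.
class SpinLock {
public:
    void lock() {
        while (locked_.exchange(true, std::memory_order_acquire)) {
            cpu_relax();  // without this, the loop never gives way
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Each of `threads` workers increments a shared counter `iterations`
// times while holding a lock of type Lock; returns the final count.
template <typename Lock>
long contended_count(int threads, int iterations) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<Lock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling `contended_count<SpinLock>(4, 10000)` and `contended_count<std::mutex>(4, 10000)` yields the same count; only the waiting strategy differs. With `std::mutex`, a thread that cannot take the lock is put to sleep by the OS instead of burning cycles on a shared core.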

-- 

Jim Starkey
Netfrastructure, Inc.
978 526-1376

----  Message from Geoff Worboys ----
Tuesday, July 19, 2005 19:52
> I'd be happier if I knew what the race condition was and how
> it was "solved".  It rather sounds like a problem papered over
> than fixed.

I asked for details back in February when I saw the description
in the FB2 alpha release notes.  Following are Nickolay's
responses to my questions.  I don't have any more details, but
hopefully this may be at least part of what you were wanting to
see.

-- 
Geoff Worboys
Telesis Computing


> This leads to two questions...
>
> 1. Was this "fix" specifically targeted at the problem I was describing, 
> or is it something else?  (Just wondering whether I am being overly 
> hopeful that the problem has actually been found and fixed.)

The Firebird 1.X SS problem was that if two requests are queued for
execution at the same time, only one executes and the second waits until
the next request arrives from somebody.
My fix dispatches requests to SS threads correctly.

But the whole situation is unlikely to appear during normal multi-user
workloads because users constantly access the server, which keeps
pushing the queue.
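
The stall described above can be reproduced in a small single-threaded model. The message does not say how the SS dispatcher signals its threads, so this sketch assumes wake-ups behave like a binary event (two signals sent before the worker runs collapse into one); `DispatchModel`, `submit` and `worker_step` are hypothetical names, not Firebird code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model of a request queue served by one sleeping worker.
struct DispatchModel {
    std::deque<int> queue;      // pending request ids
    bool wake_pending = false;  // binary wake-up flag: repeated sets coalesce

    void submit(int request_id) {
        queue.push_back(request_id);
        wake_pending = true;  // a second submit cannot make this "more true"
    }

    // One scheduling opportunity for the worker; returns the ids it ran.
    // drain_queue == false: run one request per wake-up (the stalling case).
    // drain_queue == true:  keep running until the queue is empty.
    std::vector<int> worker_step(bool drain_queue) {
        std::vector<int> done;
        if (!wake_pending) return done;  // nobody woke us, stay asleep
        wake_pending = false;
        do {
            if (queue.empty()) break;
            done.push_back(queue.front());
            queue.pop_front();
        } while (drain_queue);
        return done;
    }
};
```

With `drain_queue == false`, submitting requests 1 and 2 back to back runs only request 1; request 2 sits in the queue until a third `submit` sets the flag again. Draining the queue on every wake-up is one simple way to avoid that in this model; the actual FB2 fix may work differently.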

> 2. Does the fix rely on other FB2 features, or is it a fix that can be 
> back-ported to FB1.5 ?

The problem is that, once fixed, it exposes a race condition in the lock
manager, and the same workload locks up in the lock manager with a
similar wake-up problem, which can again be pushed along by an
additional request.

The lock manager problem was also addressed in FB2, but the first fix
uses a new synchronization construct "semaphore with timeout". This
construct is not natively present on oldish Solaris and has to be
implemented via something else (such as SYSV semaphore and wakeup
signals/timers). In short, backporting it is likely to hurt FB1.5
portability for a little while.
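
As a rough idea of what a "semaphore with timeout" provides, here is a minimal sketch built from a standard C++11 mutex and condition variable. It is not the FB2 implementation and does not use SYSV semaphores or signals; `TimedSemaphore` and its member names are hypothetical. The point is that a waiter gets control back after the timeout even if a wake-up was missed, so it can re-check the shared state instead of hanging until another request arrives.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical counting semaphore whose wait gives up after a timeout.
class TimedSemaphore {
public:
    explicit TimedSemaphore(int initial = 0) : count_(initial) {}

    // Add one permit and wake a single waiter, if any.
    void release() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

    // Returns true if a permit was taken, false if the timeout expired.
    bool try_acquire_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for re-checks the predicate, so spurious wake-ups are harmless.
        if (!cv_.wait_for(lock, timeout, [this] { return count_ > 0; })) {
            return false;
        }
        --count_;
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};
```

A caller would typically loop: on `false`, inspect the lock table or queue again, then wait once more. On platforms with a native timed wait the wrapper is thin; where none exists it has to be emulated, which is the portability cost mentioned above.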

> I am definitely not the only one seeing this problem, and the more FB gets 
> out there the more the problem will be seen.  It would be great if this 
> fix (if it is about the problem I
> described) could be backported.

A lock manager dump and process dumps at the lock-up point may shed some
light on the nature of the problem.

> Geoff Worboys

Nickolay

----- Original Message ----- 
From: "Eduardo Jedliczka (TeamFB)" <jedyfb at gmail.com>
To: "FireBase" <lista at firebase.com.br>
Sent: Wednesday, October 05, 2005 12:37 PM
Subject: Re: [firebase-br] 100% PEAK


> I'm not sure whether I still have anything on this... I need to search my 
> e-mails to see if I still have the original message. But I can assure you 
> that it was posted on the Firebird development list 
> (firebird-devel at lists.sourceforge.net) and, just like FireBase, you have 
> to subscribe to receive the e-mails (I signed up so long ago that I no 
> longer remember how it works...)
>
> The topic was a possible resolution of the problem on HT machines in 
> FB 2. They discussed that, although there was already some progress on 
> it, the different behavior of Lock (a low-level instruction) on HT and 
> dual-processor machines meant it would not be feasible to implement this 
> in FB 2 because of schedule constraints.
>
> Best of luck,
>
> =========================
> Eduardo Jedliczka
> Member of TeamFB - FireBase
> Apucarana - Pr
> =========================
> ----- Original Message ----- 
> From: "Cristiano Joaquim - CPD" <cristiano.joaquim at auroraalimentos.com.br>
> To: "FireBase" <lista at firebase.com.br>
> Sent: Wednesday, October 05, 2005 9:13 AM
> Subject: RE: [firebase-br] 100% PEAK
>
>
> Eduardo, good morning.
>
> Could you point me, in private, to some material containing this
> documentation?
>
> Thanks,
>
> Cristiano Joaquim
> PROGRAMMER ANALYST
> AURORA ALIMENTOS
>
> -----Original Message-----
> From: lista-bounces at firebase.com.br
> [mailto:lista-bounces at firebase.com.br] On behalf of Eduardo Jedliczka
> Sent: Tuesday, October 4, 2005 12:46
> To: Regis Sebastiani; FireBase
> Subject: Re: [firebase-br] 100% PEAK
>
>
> The spin problem that occurs with FB on HT machines only happens when
> two simultaneous requests arrive and each is routed to a different
> virtual CPU. One of them is fully processed and the other stays
> blocked until a third request arrives, which unblocks the one that was
> stuck. Besides this, there is the FB problem with dual-processor
> machines.
>
> =========================
> Eduardo Jedliczka
> (Member of TeamFB)
> Apucarana - Pr
> ========================= 





More information about the lista mailing list