Discussion:
imr issue
Mark Richardson
2009-07-21 01:03:15 UTC
Hi all,
I've been playing with the imr again (for those who have seen my earlier posts), and I found an interesting bug. The REALLY interesting part is that MICO 2.3.7 doesn't have this problem, while 2.3.11, 2.3.12, and 2.3.13 do (I'm currently using 2.3.13)!
 
I have a simple client & server that I use and they work just fine.  When creating the imr entry I use the following command...
imr create ImrTestInterfaceServer poa "/mylocation/ImrTestServer" IDL:ImrTestInterface:1.0#ImrTestInterfaceServer -ORBImpRepoAddr inet:myMachine:2345
 
Then, I create multiple instances on the imr...
imr create ImrTestInterfaceServer0 poa "/mylocation/ImrTestServer 0" IDL:ImrTestInterface:1.0#ImrTestInterfaceServer0 -ORBImpRepoAddr inet:myMachine:2345
imr create ImrTestInterfaceServer1 poa "/mylocation/ImrTestServer 1" IDL:ImrTestInterface:1.0#ImrTestInterfaceServer1 -ORBImpRepoAddr inet:myMachine:2345
 
If I use the client to communicate with just one of them everything is fine.  But if I do this...
1. new client connects to the original (ImrTestServer)
2. disconnect & exit client (ImrTestServer still running)
3. new client connects to ImrTestServer0
4. disconnect & exit client (ImrTestServer0 still running)
5. new client connects to ImrTestServer1
6. disconnect & exit client (ImrTestServer1 still running)
------ everything up till now is good -----
7. new client connects to the original (ImrTestServer)
 
Then sometimes it works fine (if I restarted micod just before this), but most of the time ImrTestServer0 crashes.  Sometimes I even get ImrTestServer1 to crash.
 
The error that I get from the micod is...
ImrTestServer: mt_dispatcher.cc:124: virtual void MICO::MTDispatcher::process(MICO::seg_type*): Assertion '_msg->conn->state() == MICOMT::StateRefCnt::Terminated' failed
 
I always get this if I repeat steps 1-6 enough times (I've never gotten all the way through twice).  About 80% of the time it is the other 2 servers, the ones not being accessed by the client, that crash.
 
I've just started debugging, but it's going to take me a while because I'm not that familiar with the MTDispatcher.  Does anyone out there have any ideas?
Karel Gardas
2009-07-21 18:25:14 UTC
It would be nice to see a stack dump of all threads of the crashed
process, and also of micod. Without it, it's hard to give any hints
for further debugging.

Cheers,
Karel
--
Karel Gardas ***@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
Mark Richardson
2009-08-05 01:26:27 UTC
I tracked down the bug.  It's in poamediator.cc, line 677.
Replace that line...
if (inf.pstate != Active && inf.pstate != Stopped &&
 
with this
--------------------------
if (inf.pstate == Active) {
  CORBA::Address *addr = (CORBA::Address*) inf.ior.addr();
 

Mark Richardson
2009-08-05 01:33:54 UTC
Oops, sorry about that (dang email editor).
Anyway, the bug is in poamediator.cc, line 677.
 
replace the line...
if (inf.pstate != Active && inf.pstate != Stopped &&
 
with this...
---------------------
if (inf.pstate == Active) {
  // Server is already running: bind to its own address only.
  CORBA::Address *addr = (CORBA::Address *) inf.ior.addr();
  assert (addr);

  CORBA::ORBMsgId tempOrbMsgId = orb->new_orbid(orb->new_msgid());
  requests[tempOrbMsgId] = id;
  CORBA::ORBMsgId orbid = orb->bind_async(repoid, tag, addr, this, tempOrbMsgId);
  assert (orbid != 0);
  return TRUE;
}
else if (inf.pstate != Active && inf.pstate != Stopped &&
-------------------------------------
 
I know this isn't a complete fix (the other values of inf.pstate are not considered), but it stops the problem.  It seems that if the server is active and running, this section is skipped; multiple messages are then generated and sent to all the object adapters in svmap, which is why the other adapters crashed while the desired server continued to work.
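
To make the fall-through described above concrete, here is a minimal toy model in standalone C++. It does not use MICO at all: State, Server, Registry, make_registry, forward_unpatched, and forward_patched are hypothetical names invented for illustration, and the Registry map only loosely plays the role of svmap. The model assumes, as the post suggests, that a request for an already active server skips the guarded section and ends up being delivered to every registered server.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-ins for the process states named in the thread (hypothetical).
enum class State { Inactive, Active, Stopped };

struct Server {
    State state = State::Active;
    int received = 0;  // how many requests were delivered to this server
};

// Hypothetical registry; loosely plays the role of svmap.
using Registry = std::map<std::string, Server>;

// Three running servers, named like the imr entries in the thread.
Registry make_registry() {
    Registry reg;
    reg["ImrTestInterfaceServer"] = Server{};
    reg["ImrTestInterfaceServer0"] = Server{};
    reg["ImrTestInterfaceServer1"] = Server{};
    return reg;
}

// Assumed pre-patch behaviour: an Active target skips the guarded branch,
// and the request is then delivered to every entry in the registry.
bool forward_unpatched(Registry& reg, const std::string& target) {
    auto it = reg.find(target);
    assert(it != reg.end());
    if (it->second.state != State::Active && it->second.state != State::Stopped) {
        return false;  // start-up path, not modelled here
    }
    for (auto& entry : reg) {
        ++entry.second.received;  // fall-through: everyone gets the request
    }
    return true;
}

// Patched behaviour in the same toy model: an Active target is served
// directly and the function returns before the fall-through.
bool forward_patched(Registry& reg, const std::string& target) {
    auto it = reg.find(target);
    assert(it != reg.end());
    if (it->second.state == State::Active) {
        ++it->second.received;  // only the requested server is contacted
        return true;
    }
    return forward_unpatched(reg, target);  // other states are left unchanged
}
```

With three active servers, calling forward_unpatched(reg, "ImrTestInterfaceServer") increments received on all three entries, while forward_patched increments it only on the requested one. This mirrors the symptom in the thread, where the servers that were not being accessed were the ones that crashed.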
 
I'd love it if this fix (or something similar) were included in the next MICO release, but I don't know how to go about that (I'm also not an official developer, so I don't think I can submit a bug fix).
Mark

Mark Richardson
2009-07-21 21:01:25 UTC
Here's what I did...
1. Client connects to ImrTestInterfaceServer
2. Client disconnects and exits (ImrTestInterfaceServer still Running)
3. Client connects to ImrTestInterfaceServer0
4. Client disconnects and exits (ImrTestInterfaceServer0 still Running)
5. Client connects to ImrTestInterfaceServer1
6. Client disconnects and exits (ImrTestInterfaceServer1 still Running)
7. Client connects to ImrTestInterfaceServer
8. Client disconnects and exits
--- up till here, all is good ----
9. Client connects to ImrTestInterfaceServer0
 
At this point, ImrTestInterfaceServer1 crashes.  Here's the stack..
thread 1
#0  0x002187a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00a9e411 in ___newselect_nocancel () from /lib/tls/libc.so.6
#2  0x081d69b0 in MICO::SelectDispatcher::run (this=0x865e260, infinite=0 '\0') at dispatch.cc:447
#3  0x080a06dc in CORBA::ORB::run (this=0x865e060) at orb.cc:1773
#4  0x0804e5ff in main (argc=2, argv=0xbfe0caa4) at imrTestServer.cxx:193 (this is orb->run)
 
thread 2
#0  0x002187a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00c31a10 in ***@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#2  0x08663a3c in ?? ()
#3  0x081d1481 in MICO::WorkerThread::_run (this=0x0, arg=0x86639e8) at pthreads.h:458
#4  0x081cb7c1 in MICOMT::Thread::_thr_startup (this=0x86639e8, arg=0x86639e8) at pthreads.cc:169
#5  0x081cb8c4 in MICOMT::Thread::ThreadWrapper (arg=0xfffffffc) at pthreads.cc:149
#6  0x00c2d371 in start_thread () from /lib/tls/libpthread.so.0
#7  0x00aa59be in clone () from /lib/tls/libc.so.6
 
thread 3
#0  0x002187a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00a9e411 in ___newselect_nocancel () from /lib/tls/libc.so.6
#2  0x081d69b0 in MICO::SelectDispatcher::run (this=0x8662538, infinite=0 '\0') at dispatch.cc:447
#3  0x080b9cc0 in MICO::GIOPConnReader::_run (this=0x8662ae8, arg=0x8662ae8) at iop.h:523
#4  0x081cb7c1 in MICOMT::Thread::_thr_startup (this=0x8662ae8, arg=0x8662ae8) at pthreads.cc:169
#5  0x081cb8c4 in MICOMT::Thread::ThreadWrapper (arg=0xfffffdfe) at pthreads.cc:149
#6  0x00c2d371 in start_thread () from /lib/tls/libpthread.so.0
#7  0x00aa59be in clone () from /lib/tls/libc.so.6
 
thread 4
#0  0x002187a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00c31a10 in ***@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#2  0x08663f0c in ?? ()
#3  0x081d1481 in MICO::WorkerThread::_run (this=0x0, arg=0x8663eb8) at pthreads.h:458
#4  0x081cb7c1 in MICOMT::Thread::_thr_startup (this=0x8663eb8, arg=0x8663eb8) at pthreads.cc:169
#5  0x081cb8c4 in MICOMT::Thread::ThreadWrapper (arg=0xfffffffc) at pthreads.cc:149
#6  0x00c2d371 in start_thread () from /lib/tls/libpthread.so.0
#7  0x00aa59be in clone () from /lib/tls/libc.so.6
 
thread 5 - here's where SIGABRT was received
#0  0x002187a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00a057f5 in raise () from /lib/tls/libc.so.6
#2  0x00a07199 in abort () from /lib/tls/libc.so.6
#3  0x009fedd1 in __assert_fail () from /lib/tls/libc.so.6
#4  0x081e72bc in MICO::MTDispatcher::process (this=0x8663d40, msg=0x86615c8) at util.h:81
#5  0x081cc717 in MICO::PassiveOperation::_run (this=0x8663d40) at operation.cc:236
#6  0x081d14bc in MICO::WorkerThread::_run (this=0x8663e28, arg=0x8663e28) at mt_manager.h:342
#7  0x081cb7c1 in MICOMT::Thread::_thr_startup (this=0x8663e28, arg=0x8663e28) at pthreads.cc:169
#8  0x081cb8c4 in MICOMT::Thread::ThreadWrapper (arg=0x0) at pthreads.cc:149
#9  0x00c2d371 in start_thread () from /lib/tls/libpthread.so.0
#10 0x00aa59be in clone () from /lib/tls/libc.so.6
 
Thanks for your help!
