Discussion:
Problems with Python's default dlopen flags
David Abrahams
2002-05-04 01:58:32 UTC
Permalink
Hi,

I'm hoping I can raise some interest in resolving this problem:

By default, Python does not use the RTLD_GLOBAL flag when opening
extension modules. Unfortunately, this breaks many C++ features when
used across modules (http://gcc.gnu.org/faq.html#dso). It also causes
these features to fail across the boundary between modules and any
shared library they might be linked to. This is a key arrangement for
Boost.Python: its extension modules all make use of a common shared
library.

I realize that we can change how modules are loaded using
sys.setdlopenflags(), imputils, etc., but all that puts knowledge in the
wrong place: the importer needs to know the special way to import a
given module. It seems to me that extension modules themselves should
have a way to report to Python that they need to be loaded with
RTLD_GLOBAL.

It appears that Boost.Python is not the only project that is having a
problem with the way this works
(http://aspn.activestate.com/ASPN/Mail/Message/xml-sig/1040230), so
perhaps there's a good reason to think about alternatives?

I am not by any means an expert in GNU dynamic loading, so I only have
what are probably crackpot ideas about how to address this (if, indeed,
Python is using the correct default). To make a show of constructive
suggestion:

Have Python look for a special symbol, say init<module>_dlopenflags().
If it's found, it's called. If the result doesn't match the current
dlopenflags the module is dlclose()d and re-opened with the requested
flags.

Thoughts?

-Dave

+---------------------------------------------------------------+
David Abrahams
C++ Booster (http://www.boost.org) O__ ==
Pythonista (http://www.python.org) c/ /'_ ==
resume: http://users.rcn.com/abrahams/resume.html (*) \(*) ==
email: ***@rcn.com
+---------------------------------------------------------------+
Martin v. Loewis
2002-05-04 06:37:56 UTC
Permalink
It seems to me that extension modules themselves should have a way
to report to Python that they need to be loaded with RTLD_GLOBAL.
No way; to change Python in this way would be extremely
foolish. Python was using RTLD_GLOBAL until 1.5.1, then this was
changed in 1.5.2 due to bug reports by users. Redhat decided to revert
this change, and consequently people run into problems with the Redhat
Python 1.5.2 installation.

Here is the original problem: A Python application was using both
Oracle and sockets, so it had the Oracle and socket modules
loaded. Unfortunately, both provided an initsocket function (at that
time; today the socket module provides init_socket). It so happened
that the dynamic linker chose the initsocket definition from the
socket module. When Oracle called its own initsocket function, the
call ended up in the Python module, and the application crashed; this
is painful to analyse.

Now, people apparently want to share symbols across modules. Let me
say that I find this desire misguided: Python extension modules are
*not* shared libraries, they *only* interface with the Python
interpreter. If you want to share symbols, use shared libraries: If
modules A.so and B.so have symbols in common, create a shared library
C.so that provides those symbols, and link both A.so and B.so with
this shared library.

Now, people still want to share symbols across modules. For that, you
can use CObjects: Export a CObject with an array of function pointers
in module A (e.g. as A.API), and import that C object in module B's
initialization code. See cStringIO and Numeric for examples.

Now, people still want to share symbols across modules. For that, they
can use sys.setdlopenflags.
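For illustration, the importer-side incantation is roughly the following. The imported module name is a placeholder; at the time the RTLD_GLOBAL constant came from the dl module, while later Pythons expose it as os.RTLD_GLOBAL:

```python
import os
import sys

# Widen the dlopen flags just for the import that needs symbol
# sharing, then restore the default so later imports are unaffected.
old_flags = sys.getdlopenflags()
sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
try:
    import math  # stand-in for the extension module that wants RTLD_GLOBAL
finally:
    sys.setdlopenflags(old_flags)
```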

It seems that this is a lose-lose situation: you can't please
everybody. In the current state, people that want to share symbols
can, if they really want to. With your proposed change, symbols that
accidentally clash between unrelated extensions cause problems, and
users of those modules can do nothing about it. Hence, the current
state is preferable.

HTH,
Martin
David Abrahams
2002-05-04 13:14:06 UTC
Permalink
----- Original Message -----
Post by Martin v. Loewis
It seems to me that extension modules themselves should have a way
to report to Python that they need to be loaded with RTLD_GLOBAL.
No way; to change Python in this way would be extremely
foolish. Python was using RTLD_GLOBAL until 1.5.1, then this was
changed in 1.5.2 due to bug reports by users. Redhat decided to revert
this change, and consequently people run into problems with the Redhat
Python 1.5.2 installation.
Did you misread my suggestion? I didn't say that RTLD_GLOBAL should be
the default way to load an extension module, only that there should be a
way for the module itself to determine how it's loaded.
Post by Martin v. Loewis
Here is the original problem: A Python application was using both
Oracle and sockets, so it had the Oracle and socket modules
loaded. Unfortunately, both provided an initsocket function (at that
time; today the socket module provides init_socket). It so happened
that the dynamic linker chose the initsocket definition from the
socket module. When Oracle called its own initsocket function, the
call ended up in the Python module, and the application crashed; this
is painful to analyse.
Yes, that must have been painful. I can also imagine that it causes problems for
identically-named (sub)modules in packages (though my lack of expertise
should be apparent here again: maybe dlsym() will always grab the symbol
from the newly opened library).
Post by Martin v. Loewis
Now, people apparently want to share symbols across modules. Let me
say that I find this desire misguided: Python extension modules are
*not* shared libraries, they *only* interface with the Python
interpreter.
It surprised me as well when I started developing Boost.Python, but it
turns out that people really think it's important to be able to do
component-based development on their Python extensions and occasionally
they need to be able to register things like exception translators from
one module which will be used by another module. However, as you can see
below, nothing that fancy is in play in this case...
Post by Martin v. Loewis
If you want to share symbols, use shared libraries: If
modules A.so and B.so have symbols in common, create a shared library
C.so that provides those symbols, and link both A.so and B.so with
this shared library.
Guess what? That's what Boost.Python does! In fact, in the cases we're
seeing that are fixed by using RTLD_GLOBAL **there's no need for sharing
of symbols across A.so and B.so**! The arrangement looks like
this:

            python
           /      \
      (dlopen)  (dlopen)
         |          |
         V          V
      ext1.so    ext2.so
          \        /
         (ld)   (ld)
           \      /
            V    V
      libboost_python.so

And in fact, I expect to ask users to do something special, like
explicitly linking between extension modules, if they want to share
exception/RTTI information between ext1 and ext2 directly. However, this
is what I didn't expect: the lack of RTLD_GLOBAL flags interferes with
the ability for ext1.so to catch C++ exceptions thrown by
libboost_python.so!

Are you suggesting that in order to do this, my users need to add yet
another .so, a thin layer between Python and the guts of their extension
modules?
Post by Martin v. Loewis
Now, people still want to share symbols across modules. For that, you
can use CObjects: Export a CObject with an array of function pointers
in module A (e.g. as A.API), and import that C object in module B's
initialization code. See cStringIO and Numeric for examples.
Of course you realize that won't help with C++ exception tables...
Post by Martin v. Loewis
Now, people still want to share symbols across modules. For that, they
can use sys.setdlopenflags.
...which leads us back to the fact that the smarts are in the wrong
place. The extension module writer knows that this particular extension
needs to share symbols, and once the module is loaded it's too late.
Post by Martin v. Loewis
It seems that this is a lose-lose situation: you can't please
everybody. In the current state, people that want to share symbols
can, if they really want to. With your proposed change, symbols that
accidentally clash between unrelated extensions cause problems, and
users of those modules can do nothing about it. Hence, the current
state is preferable.
So give setdlopenflags a "force" option which overrides the setting
designated by the extension module. I realize it's messy (probably too
messy). If I could think of some non-messy advice for my users that
avoids a language change, I'd like that just as well.

-Dave
David Abrahams
2002-05-04 13:53:28 UTC
Permalink
[remembering to delete the ----- Original Message ----- line for Guido
this time <1E-15 wink>]
Post by David Abrahams
Post by Martin v. Loewis
It seems that this is a lose-lose situation: you can't please
everybody. In the current state, people that want to share symbols
can, if they really want to. With your proposed change, symbols that
accidentally clash between unrelated extensions cause problems, and
users of those modules can do nothing about it. Hence, the current
state is preferable.
So give setdlopenflags a "force" option which overrides the setting
designated by the extension module. I realize it's messy (probably too
messy). If I could think of some non-messy advice for my users that
avoids a language change, I'd like that just as well.
Come to think of it, no override is needed. If the module won't work
without sharing symbols, there's no point in overriding its desire for
RTLD_GLOBAL, because it still won't work. So I'm back to suggesting that
the module ought to tell python how to load it, period.

-Dave
Andrew MacIntyre
2002-05-05 00:40:09 UTC
Permalink
Post by David Abrahams
Did you misread my suggestion? I didn't say that RTLD_GLOBAL should be
the default way to load an extension module, only that there should be a
way for the module itself to determine how it's loaded.
I don't want to rain on your parade, but some research into dynamic
linking loaders proves distressing:

- some always link globally (Windows, OS/2 as I understand);
- the others require explicit specification of the method (global/local).

The ones that give you the choice require you to specify the method when
the first reference to the shared object is made. If you want to change
the mode, you have to dereference all entry points and unload the SO
before having another go. This turns out to be nightmarish.

In addition to the shim module elsewhere referred to, I think that you
might also be able to leverage a pure Python wrapper to import the other
modules (in effect a specialised version of the import hook wrappers
available).

Another approach is to use a "core" module which links all the necessary
bits and provides a vector of entry points which can be retrieved via a
Python API call (I forget which relatively common module I've seen which
does this).

--
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: ***@bullseye.apana.org.au | Snail: PO Box 370
***@pcug.org.au | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia
David Abrahams
2002-05-05 04:25:54 UTC
Permalink
----- Original Message -----
Post by Andrew MacIntyre
Post by David Abrahams
Did you misread my suggestion? I didn't say that RTLD_GLOBAL should be
the default way to load an extension module, only that there should be a
way for the module itself to determine how it's loaded.
I don't want to rain on your parade, but some research into dynamic
linking loaders proves distressing:
- some always link globally (Windows, OS/2 as I understand);
- the others require explicit specification of the method
(global/local).

This information is not new to me; I'm aware of the differences. I'm
quite familiar with the Windows model, and it's not quite global linking
in the same sense as on Unix: you explicitly specify which symbols are
designated for sharing.
Post by Andrew MacIntyre
The ones that give you the choice require you to specify the method when
the first reference to the shared object is made. If you want to change
the mode, you have to dereference all entry points and unload the SO
before having another go. This turns out to be nightmarish.
AFAICT that would not be a nightmare in the scenario I'm suggesting. The
initial dlopen wouldn't use RTLD_GLOBAL, so there would only be one or
two entry points to dereference before unloading the module and trying
again.
Post by Andrew MacIntyre
In addition to the shim module elsewhere referred to, I think that you
might also be able to leverage a pure Python wrapper to import the other
modules (in effect a specialised version of the import hook wrappers
available).
Yes, (as I'll repeat once again) this approach puts the burden on users
of the extension module to know that it needs to be imported in a
special way. That's just wrong.
Post by Andrew MacIntyre
Another approach is to use a "core" module which links all the necessary
bits and provides a vector of entry points which can be retrieved via a
Python API call (I forget which relatively common module I've seen which
does this).
That won't work for C++, and even if it could, it doesn't conform to my
users' needs. The symbols (not just entry points) which the compiler
generates to support RTTI and exception-handling aren't normally
available to users.

-Dave
Martin v. Loewis
2002-05-05 07:26:58 UTC
Permalink
Post by David Abrahams
Did you misread my suggestion? I didn't say that RTLD_GLOBAL should be
the default way to load an extension module, only that there should be a
way for the module itself to determine how it's loaded.
I dismissed your suggestion as being too complex. There are a number
of questions involved which I cannot answer that may affect the usability
of this approach; most of them have to do with dlclosing the library:

1. If the extension module is C++ code, dlopening the module will run
constructors for global objects. dlclosing it will run destructors.
So the dlopen/dlclose/dlopen cycle might have side effects; that
might be confusing.
2. Again, with C++ code, on Linux, with gcc 2.95.x, a block-local
static object will register its destructor with atexit(3). When
the module is dlclosed, the code to be called at exit goes away;
then the program crashes atexit. This is undesirable.
3. If the module is also used as a library that some other module
links against, I'm not sure what the semantics of dlclose is. I'd
feel uncomfortable with such a feature if I don't know precisely
how it acts in boundary cases.
Post by David Abrahams
And in fact, I expect to ask users to do something special, like
explicitly linking between extension modules
Indeed, that should also work fine - if users explicitly link
extension modules against each other, they should be able to share
symbols. The need for RTLD_GLOBAL only occurs when they want to share
symbols, but don't want to link the modules against each other.
Post by David Abrahams
However, this is what I didn't expect: the lack of RTLD_GLOBAL flags
interferes with the ability for ext1.so to catch C++ exceptions
thrown by libboost_python.so!
That is surprising indeed, and hard to believe. Can you demonstrate
that in a small example?
Post by David Abrahams
Are you suggesting that in order to do this, my users need to add
yet another .so, a thin layer between Python and the guts of their
extension modules?
Originally, that's what I suggested. I now think that, for symbol
sharing, linking the modules against each other should be sufficient.
Post by David Abrahams
Post by Martin v. Loewis
Now, people still want to share symbols across modules. For that, you
can use CObjects: Export a CObject with an array of function pointers
in module A (e.g. as A.API), and import that C object in module B's
initialization code. See cStringIO and Numeric for examples.
Of course you realize that won't help with C++ exception tables...
Actually, I don't: I can't see what C++ exception tables have to do
with it - the exception regions are local in any case.
Post by David Abrahams
...which leads us back to the fact that the smarts are in the wrong
place. The extension module writer knows that this particular
extension needs to share symbols, and once the module is loaded it's
too late.
The extension module writer can't possibly have this knowledge - to
know whether it is _safe_ to share symbols, you have to know the
complete set of extension modules in the application. If a single
module uses your proposed feature, it would export its symbols to all
other extensions - whether they want those symbols or not. Hence you
might still end up with a situation where you can't use two extensions
in a single application because of module clashes.
Post by David Abrahams
So give setdlopenflags a "force" option which overrides the setting
designated by the extension module. I realize it's messy (probably too
messy). If I could think of some non-messy advice for my users that
avoids a language change, I'd like that just as well.
For that, I'd need to understand the problem of your users first. I'm
unhappy to introduce work-arounds for incompletely-understood
problems.

Regards,
Martin
David Abrahams
2002-05-05 13:33:29 UTC
Permalink
Post by Martin v. Loewis
Post by David Abrahams
Did you misread my suggestion? I didn't say that RTLD_GLOBAL should be
the default way to load an extension module, only that there should be a
way for the module itself to determine how it's loaded.
I dismissed your suggestion as being too complex.
Fair enough [explicit dismissal helps to reduce confusion].
Post by Martin v. Loewis
There are a number
of questions involved which I cannot answer that may affect the usability
1. If the extension module is C++ code, dlopening the module will run
constructors for global objects. dlclosing it will run destructors.
So the dlopen/dlclose/dlopen cycle might have side effects; that
might be confusing.
and quite possibly unacceptable to users; granted. I hadn't thought of
that.
Post by Martin v. Loewis
2. Again, with C++ code, on Linux, with gcc 2.95.x, a block-local
static object will register its destructor with atexit(3). When
the module is dlclosed, the code to be called at exit goes away;
then the program crashes atexit. This is undesirable.
and absolutely unacceptable to me, so I guess that approach is out.
Post by Martin v. Loewis
3. If the module is also used as a library that some other module
links against, I'm not sure what the semantics of dlclose is. I'd
feel uncomfortable with such a feature if I don't know precisely
how it acts in boundary cases.
I would hope that it would be nicely reference-counted, but if *you*
don't know that answer I'm guessing it's undocumented and I wouldn't
want to count on that either.
Post by Martin v. Loewis
Post by David Abrahams
And in fact, I expect to ask users to do something special, like
explicitly linking between extension modules
Indeed, that should also work fine - if users explicitly link
extension modules against each other, they should be able to share
symbols. The need for RTLD_GLOBAL only occurs when they want to share
symbols, but don't want to link the modules against each other.
Heh, that's what I'd have thought, too.
Post by Martin v. Loewis
Post by David Abrahams
However, this is what I didn't expect: the lack of RTLD_GLOBAL flags
interferes with the ability for ext1.so to catch C++ exceptions
thrown by libboost_python.so!
That is surprising indeed, and hard to believe. Can you demonstrate
that in a small example?
Unfortunately, the only reproducible case we have is not exactly small.
However, we can give anyone interested full access to the machine and
test case where it's occurring [details appended at the bottom of this
message].
Post by Martin v. Loewis
Post by David Abrahams
Are you suggesting that in order to do this, my users need to add
yet another .so, a thin layer between Python and the guts of their
extension modules?
Originally, that's what I suggested. I now think that, for symbol
sharing, linking the modules against each other should be sufficient.
Post by David Abrahams
Post by Martin v. Loewis
Now, people still want to share symbols across modules. For that, you
can use CObjects: Export a CObject with an array of function pointers
in module A (e.g. as A.API), and import that C object in module B's
initialization code. See cStringIO and Numeric for examples.
Of course you realize that won't help with C++ exception tables...
Actually, I don't: I can't see what C++ exception tables have to do
with it - the exception regions are local in any case.
You're right, I misspoke: it has nothing to do with exception tables.
It's an RTTI problem: the type-specific catch clause exception-handler
doesn't catch the thrown exception.
Post by Martin v. Loewis
Post by David Abrahams
...which leads us back to the fact that the smarts are in the wrong
place. The extension module writer knows that this particular
extension needs to share symbols, and once the module is loaded it's
too late.
The extension module writer can't possibly have this knowledge - to
know whether it is _safe_ to share symbols, you have to know the
complete set of extension modules in the application.
Yes, but if the module /requires/ symbol sharing it sort of doesn't
matter whether it's safe. If you don't share symbols, the module won't
work (and will probably crash), just as when you share symbols and
there's a clash you'll probably get a crash.
Post by Martin v. Loewis
If a single
module uses your proposed feature, it would export its symbols to all
other extensions - whether they want those symbols or not. Hence you
might still end up with a situation where you can't use two extensions
in a single application because of module clashes.
Yep. The existence of namespaces in the C++ case may mitigate the
situation somewhat, but I understand the problem.
Post by Martin v. Loewis
Post by David Abrahams
So give setdlopenflags a "force" option which overrides the setting
designated by the extension module. I realize it's messy (probably too
messy). If I could think of some non-messy advice for my users that
avoids a language change, I'd like that just as well.
For that, I'd need to understand the problem of your users first. I'm
unhappy to introduce work-arounds for incompletely-understood
problems.
I agree with that attitude. And after all, my suggestion was really just
a straw-man.

---------

Environment:
RedHat 7.1 on Intel
gcc 3.0.4 used for everything, incl. Python compilation.
All our code (but not Python) is compiled with -g.
Example compile and link lines:


g++ -fPIC -ftemplate-depth-50 -DNDEBUG -w -g \
    -I"/net/cci/rwgk/phenix/include" -I"/net/cci/rwgk/cctbx/include" \
    -I"/net/cci/rwgk/boost" \
    -I/usr/local_cci/Python-2.2.1_gcc304/include/python2.2 \
    -c refinementtbxmodule.cpp
g++ -shared -Wl,-E -o refinementtbx.so refinementtbxmodule.o error.o \
    -L/net/taipan/scratch1/rwgk/py221gcc304/lib -lboost_python -lm

Here is the core of where things go wrong: (part of the Boost.Python
library)

try
{
    PyObject* const result = f->do_call(args, keywords);
    if (result != 0)
        return result;
}
catch (const argument_error&)
{
    // SHOULD COME BACK HERE
}
catch (...) // this catch clause inserted for debugging
{
    // BUT COMES BACK HERE
    throw;
}

The exact same source works with Compaq cxx on Alpha (and also Visual
C++ 6 & 7, Win32 CodeWarrior 7.0). I put in print statements and
compared the output from the Linux machine with the output from the
Alpha. This shows that under Linux the identity of the exception is
somehow lost: Alpha comes back where it says "SHOULD COME BACK HERE",
Linux comes back where it says "BUT COMES BACK HERE". This is after
many successful passes through "SHOULD COME BACK HERE".

The problem is very elusive. Most of the time the exception handling
works correctly. For example, if we change small details about the order
in which exceptions are thrown gcc works just fine
(execution flow still passes repeatedly through "SHOULD COME BACK
HERE"). Also, changing the dlopen flags used when loading extension
modules to include RTLD_GLOBAL makes the problem disappear.

It is very difficult to generate a simple test case (our application
requires both Python and Boost). I have spent days
trying to isolate what is going wrong, unfortunately to no avail.

Suggestion:

I (Ralf) could set up an account for you on our machines. Using this
account you would have direct access to all the source code files and
compiled binaries that are needed to reproduce our problem. You could
directly enter gdb to investigate.
Tim Peters
2002-05-05 17:07:21 UTC
Permalink
[David Abrahams, presumably quoting Ralf W. Grosse-Kunstleve]
Post by David Abrahams
...
The exact same source works with Compaq cxx on Alpha (and also Visual
C++ 6 & 7, Win32 CodeWarrior 7.0). I put in print statements and
compared the output from the Linux machine with the output from the
Alpha. This shows that under Linux the identity of the exception is
somehow lost: Alpha comes back where it says "SHOULD COME BACK HERE",
Linux comes back where it says "BUT COMES BACK HERE". This is after
many successful passes through "SHOULD COME BACK HERE".
The problem is very elusive. Most of the time the exception handling
works correctly. For example, if we change small details about the order
in which exceptions are thrown gcc works just fine
(execution flow still passes repeatedly through "SHOULD COME BACK
HERE"). Also, changing the dlopen flags used when loading extension
modules to include RTLD_GLOBAL makes the problem disappear.
Whoa -- this shows all the signs of a wild store and consequent memory
corruption. If it were truly a problem with resolving symbols, it would
fail every time. Instead it sometimes works, sometimes doesn't, and all
depending on "stuff that shouldn't matter". Try efence?
Gordon McMillan
2002-05-05 19:33:16 UTC
Permalink
Post by Tim Peters
[David Abrahams, presumably quoting Ralf W. Grosse-Kunstleve]
[stuff about exceptions on Linux...]
Post by Tim Peters
Whoa -- this shows all the signs of a wild store
and consequent memory corruption. If it were truly
a problem with resolving symbols, it would fail
every time. Instead it sometimes works, sometimes
doesn't, and all depending on "stuff that shouldn't
matter".
There is some strangeness to exceptions, Linux, gcc
and linking. In scxx (my minimalist C++ / Python
interface), there's no separate .so involved - the
scxx code is compiled in with the extension. There
are no statics involved, so C linkage works (you don't
need a relinked Python). At a certain gcc release,
exceptions thrown and caught at the top level stopped
working (abort). "eric" of scipy fame had a
similar (but not identical) experience.

I think scipy's fix was to require Python be built and
linked by g++. Mine was to stop doing that (throwing
and catching at the same level).

So we all have gcc and C++ exceptions and linkage in
common. Leg 4 of the elephant is out there someplace.

-- Gordon
http://www.mcmillan-inc.com/
David Abrahams
2002-05-05 21:01:54 UTC
Permalink
Post by Gordon McMillan
There is some strangeness to exceptions, Linux, gcc
and linking. In scxx (my minimalist C++ / Python
interface), there's no separate .so involved - the
scxx code is compiled in with the extension. There
are no statics involved, so C linkage works (you don't
need a relinked Python). At a certain gcc release,
exceptions thrown and caught at the top level
What does "at the top level" mean?
Post by Gordon McMillan
stopped
working (abort). "eric" of scipy fame had a
similar (but not identical) experience.
I think scipy's fix was to require Python be built and
linked by g++. Mine was to stop doing that (throwing
and catching at the same level).
Same question, I guess: what is a "level"?

-Dave
Gordon McMillan
2002-05-05 23:02:58 UTC
Permalink
Post by David Abrahams
Post by Gordon McMillan
There is some strangeness to exceptions, Linux, gcc
and linking. In scxx (my minimalist C++ / Python
interface), there's no separate .so involved - the
scxx code is compiled in with the extension. There
are no statics involved, so C linkage works (you
don't need a relinked Python). At a certain gcc
release, exceptions thrown and caught at the top
level
What does "at the top level" mean?
The function is an entry point. I think eric diagnosed
it as simply throw / catch at the same level. Throwing
in a called function & catching in the caller worked
fine for both of us.

-- Gordon
http://www.mcmillan-inc.com/
David Abrahams
2002-05-05 23:46:00 UTC
Permalink
----- Original Message -----
Post by Gordon McMillan
Post by David Abrahams
Post by Gordon McMillan
There is some strangeness to exceptions, Linux, gcc
and linking. In scxx (my minimalist C++ / Python
interface), there's no separate .so involved - the
scxx code is compiled in with the extension. There
are no statics involved, so C linkage works (you
don't need a relinked Python). At a certain gcc
release, exceptions thrown and caught at the top
level
What does "at the top level" mean?
The function is an entry point. I think eric diagnosed
it as simply throw / catch at the same level. Throwing
in a called function & catching in the caller worked
fine for both of us.
Are you saying that the following prints "fail"?

#include <iostream>

void init_mymodule()
{
    try {
        throw "hello";
    }
    catch (char const*) {}
    catch (...) {
        std::cout << "fail";
    }
}

but that this does not?

#include <iostream>
void throw_hi() { throw "hello"; }

void init_mymodule()
{
    try {
        throw_hi();
    }
    catch (char const*) {}
    catch (...) {
        std::cout << "fail";
    }
}
Gordon McMillan
2002-05-06 14:10:39 UTC
Permalink
On 5 May 2002 at 18:46, David Abrahams wrote:

[C++ exceptions and gcc 2.95(?)]
Post by David Abrahams
Are you saying that the following prints "fail"?
#include <iostream>
void init_mymodule()
{
    try {
        throw "hello";
    }
    catch (char const*) {}
    catch (...) {
        std::cout << "fail";
    }
}
but that this does not?
#include <iostream>
void throw_hi() { throw "hello"; }

void init_mymodule()
{
    try {
        throw_hi();
    }
    catch (char const*) {}
    catch (...) {
        std::cout << "fail";
    }
}
Yes, if you change "hello" to a non-primitive
type (and Python is not linked by g++).

-- Gordon
http://www.mcmillan-inc.com/
Martin v. Loewis
2002-05-05 21:45:39 UTC
Permalink
Post by Gordon McMillan
There is some strangeness to exceptions, Linux, gcc
and linking. In scxx (my minimalist C++ / Python
interface), there's no separate .so involved - the
scxx code is compiled in with the extension. There
are no statics involved, so C linkage works (you don't
need a relinked Python). At a certain gcc release,
exceptions thrown and caught at the top level stopped
working (abort).
I think Tim's analysis is right: If it fails every time, there is some
conceptual problem somewhere - this one sounds like a bug in gcc,
which might fail to emit exception regions correctly or some such (I
think I recall one such bug in g++).

If it sometimes fails, sometimes "succeeds", it rather sounds like
memory corruption. I can well imagine how memory corruption could
affect RTTI: either the vtable pointer in an object gets overwritten,
or the RTTI objects themselves get overwritten.

Regards,
Martin
David Abrahams
2002-05-05 22:00:00 UTC
Permalink
----- Original Message -----
Post by Martin v. Loewis
I think Tim's analysis is right: If it fails every time, there is some
conceptual problem somewhere - this one sounds like a bug in gcc,
which might fail to emit exception regions correctly or some such (I
think I recall one such bug in g++).
Yes, 2.95.x definitely has EH bugs. 3.0.x is much better, AFAICT.

-Dave
Jack Jansen
2002-05-05 20:20:21 UTC
Permalink
Post by Martin v. Loewis
Post by David Abrahams
And in fact, I expect to ask users to do something special, like
explicitly linking between extension modules
Indeed, that should also work fine - if users explicitly link
extension modules against each other, they should be able to share
symbols.
From experience I can say this is not a good idea. MacOS
extension modules used to do this until Python 1.5.2, and it can
have undesired side effects (similar to the C++ initializer
problems you noted). If you need to communicate symbols between
module A and B it's better to make a normal shared library
ABglue.so which contains the glue code, and link both A and B
against it. The only question is where to put the ABglue.so
file, on Mac and some (most?) unixen it's probably okay to drop
it in lib-dynload, but it could be you need some fiddling of the
flags that control the shared library search path for A and B. I
think on Windows you're stuck with putting it in the system
directory (at least, I assume that that's why PyWinTypes is
there).
--
- Jack Jansen <***@oratrix.com>
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution --
Emma Goldman -
David Abrahams
2002-05-05 21:08:48 UTC
Permalink
Post by Jack Jansen
Post by Martin v. Loewis
Post by David Abrahams
And in fact, I expect to ask users to do something special, like
explicitly linking between extension modules
Indeed, that should also work fine - if users explicitly link
extension modules against each other, they should be able to share
symbols.
From experience I can say this is not a good idea. MacOS
extension modules used to do this until Python 1.5.2, and it can
have undesired side effects (similar to the C++ initializer
problems you noted). If you need to communicate symbols between
module A and B it's better to make a normal shared library
ABglue.so which contains the glue code, and link both A and B
against it. The only question is where to put the ABglue.so
file,
That's hardly the only question:

If A needs to throw an exception which is caught in B, what is "the glue
code" that you put in ABglue.so?

If a class with inlined virtual functions is used by A with subclasses
defined in B, what is the "glue code"?

What about templates? ;-) (vague, but relevant)

-Dave
Jack Jansen
2002-05-05 22:09:33 UTC
Permalink
Post by David Abrahams
Post by Jack Jansen
Post by Martin v. Loewis
Post by David Abrahams
And in fact, I expect to ask users to do something special, like
explicitly linking between extension modules
Indeed, that should also work fine - if users explicitly link
extension modules against each other, they should be able to share
symbols.
From experience I can say this is not a good idea. MacOS
extension modules used to do this until Python 1.5.2, and it can
have undesired side effects (similar to the C++ initializer
problems you noted). If you need to communicate symbols between
module A and B it's better to make a normal shared library
ABglue.so which contains the glue code, and link both A and B
against it. The only question is where to put the ABglue.so
file,
If A needs to throw an exception which is caught in B, what is
"the glue
code" that you put in ABglue.so?
If a class with inlined virtual functions is used by A with subclasses
defined in B, what is the "glue code"?
What about templates? ;-) (vague, but relevant)
Well, you're now going into territory where I'm not really at
home: C++, and more precisely its runtime system on various
platforms (especially the stuff with zero-overhead runtimes and
such tends to make my head explode), but I would guess that if
you take the code from A that declares symbols used by B and
vice versa and move that to a common ancestor you should be done.

If all else fails you could even take the whole A module except
the init routine and stuff it in Acore.so, similarly for B, link
Acore.so against Bcore.so and vice versa and have the A and B
modules be skeletons with just the init routine. That way you
know that your Acore and Bcore will be loaded via the normal
shared library loading method, which probably handles
initializers and cross-segment exceptions correctly,
--
- Jack Jansen <***@oratrix.com>
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution --
Emma Goldman -
David Abrahams
2002-05-05 22:16:06 UTC
Permalink
Post by Jack Jansen
Post by David Abrahams
If A needs to throw an exception which is caught in B, what is
"the glue
code" that you put in ABglue.so?
If a class with inlined virtual functions is used by A with
subclasses
Post by Jack Jansen
Post by David Abrahams
defined in B, what is the "glue code"?
What about templates? ;-) (vague, but relevant)
Well, you're now going into territory where I'm not really at
home: C++, and more precisely its runtime system on various
platforms (especially the stuff with zero-overhead runtimes and
such tends to make my head explode), but I would guess that if
you take the code from A that declares symbols used by B and
vice versa and move that to a common ancestor you should be done.
Speculation aside, the RTTI info which allows the compiler to identify
exceptions is never explicitly declared by the user. In some
implementations it's generated automatically wherever the first declared
virtual function is implemented. I don't know what the rules are when
there are no virtual functions, or the virtual functions are inlined,
or the thrown type is a template which is instantiated implicitly... of
course this is OT for Python-dev but I'm hoping now that I've got the
attention of MvL (a gcc developer) he'll answer these questions or
direct me to someone who can ;-)
Post by Jack Jansen
If all else fails you could even take the whole A module except
the init routine and stuff it in Acore.so, similarly for B, link
Acore.so against Bcore.so and vice versa and have the A and B
modules be skeletons with just the init routine. That way you
know that your Acore and Bcore will be loaded via the normal
shared library loading method, which probably handles
initializers and cross-segment exceptions correctly,
Yes, now we're back to MvL's suggestion to use shim libraries, which
provides very certain behavior but also complicates building and
deployment.

-Dave
Martin v. Loewis
2002-05-06 06:42:37 UTC
Permalink
I don't know what the rules are when there are no virtual functions,
or the virtual functions are inlined, or the thrown type is a
template which is instantiated implicitly... of course this is OT
for Python-dev but I'm hoping now that I've got the attention of MvL
(a gcc developer) he'll answer these questions or direct me to
someone who can ;-)
:-) In these cases, the compiler emits RTTI whenever it is "used",
which essentially means when a constructor or destructor is emitted
(since those explicitly reference the vtable, which explicitly
references the type_info object).

Thus, you may end up with multiple copies of the RTTI at the object
file level. When combining them into shared objects or executables,
when using the GNU linker, the linker will eliminate duplicates as
part of the gnu.linkonce processing; other linkers will pick an
arbitrary copy as part of the weak symbol processing.

At run-time, multiple copies across different DSOs are eliminated by
the dynamic loader (ld.so) *if* all those copies are in the global
symbol space (RTLD_GLOBAL).

During the development of the standard C++ ABI, people thought that
those mechanisms would guarantee that symbols can be resolved uniquely
at run-time, thus allowing address comparisons for typeinfo object
equality. It turned out that this won't work even in cases that are
meant to be supported, so the C++ runtime is now back to comparing
typeinfo objects' .name() strings to establish equality.

Regards,
Martin
Martin v. Loewis
2002-05-05 21:50:21 UTC
Permalink
From experience I can say this is not a good idea. MacOS extension
modules used to do this until Python 1.5.2, and it can have undesired
side effects (similar to the C++ initializer problems you noted). If
you need to communicate symbols between module A and B it's better to
make a normal shared library ABglue.so which contains the glue code,
and link both A and B against it.
I completely agree that this is the best thing to do, as it is most
portable. On Linux (and probably all other SysV ELF systems), you can
get away with linking against an extension module.
The only question is where to put the ABglue.so file, on Mac and
some (most?) unixen it's probably okay to drop it in lib-dynload,
but it could be you need some fiddling of the flags that control the
shared library search path for A and B.
On Unix, you'll need to set LD_LIBRARY_PATH, or put the library into a
directory where ld.so automatically searches for libraries; that is
a deployment problem.
I think on Windows you're stuck with putting it in the system
directory (at least, I assume that that's why PyWinTypes is there).
Not necessarily: Any directory on PATH will do, plus the directory
where the executable is (i.e. c:\python22). For some reason,
c:\python22\DLLs is also searched, although I'm not sure what magic
arranges that.

Regards,
Martin
Gustavo Niemeyer
2002-05-06 13:45:44 UTC
Permalink
Hello Martin!
Post by Martin v. Loewis
On Unix, you'll need to set LD_LIBRARY_PATH, or put the library into a
directory where ld.so automatically searches for libraries; that is
a deployment problem.
That's solvable. He could create a custom distutils class to include a
parameter like -Wl,-rpath,<install-dir> at compile time.
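Gustavo's suggestion might look roughly like this in a setup script (a sketch only: the module name, sources, and install path are hypothetical, and the setuptools spelling of the same distutils API is used):

```python
from setuptools import Extension

# Hypothetical extension linking against a shared Boost.Python library.
# The -rpath flag bakes the library's install directory into the module's
# runtime search path, so ld.so finds it without LD_LIBRARY_PATH.
ext = Extension(
    "ext1",
    sources=["ext1.cpp"],
    libraries=["boost_python"],
    extra_link_args=["-Wl,-rpath,/usr/local/lib"],
)
```

Passing this to setup(ext_modules=[ext]) would then build the module with the embedded rpath.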
--
Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gisle Aas
2002-05-04 15:32:11 UTC
Permalink
You should take a look at how pyperl does it. It uses a stub loader
that Python itself ends up loading on 'import perl'; this stub
loader then loads the real module with RTLD_GLOBAL.

The stub loader is 'dlhack.c'. Get the source from
http://downloads.activestate.com/Zope-Perl/pyperl-1.0.1.tar.gz
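The flag dance such a stub loader performs can also be sketched at the Python level with sys.setdlopenflags() (a hypothetical illustration, not pyperl's actual dlhack.c; 'math' is just a runnable stand-in for the real extension module, and the RTLD_* constants assume a Unix Python):

```python
import importlib
import os
import sys

def import_with_global_symbols(name):
    # Temporarily switch to RTLD_GLOBAL for one import, then restore
    # the interpreter's previous dlopen flags.
    old = sys.getdlopenflags()
    sys.setdlopenflags(os.RTLD_NOW | os.RTLD_GLOBAL)
    try:
        return importlib.import_module(name)
    finally:
        sys.setdlopenflags(old)

mod = import_with_global_symbols("math")
```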
--
Gisle Aas,
ActiveState
David Abrahams
2002-05-04 15:51:35 UTC
Permalink
----- Original Message -----
Post by Gisle Aas
You should take a look at how pyperl does it. It uses a stub loader
that python itself end up loading on 'import perl' and then this stub
loader loads the real module with RTLD_GLOBAL.
The stub loader is 'dlhack.c'. Get the source from
http://downloads.activestate.com/Zope-Perl/pyperl-1.0.1.tar.gz
AFAICT, this is what I meant when I wrote:

"Are you suggesting that in order to do this, my users need to add yet
another .so, a thin layer between Python and the guts of their extension
modules?"

The biggest problem with this arrangement is that my users are
creating lots of extension modules. Each one would require a separate
stub loader, thereby doubling the number of shared objects they need to
create and complicating distribution and the build process.

I'm not familiar with pyperl, but from the source it looks like there's
just one Python extension module in this case (perl.so), so the same
problems may not apply.

-Dave
Mark Hammond
2002-05-05 04:08:44 UTC
Permalink
Or at least documenting it well :)

I don't understand the issues at all, other than my belief that Linux makes
it harder than it needs to be. Eg, in the recent PyXPCOM changes, I found
that I needed to add the following line of code:

// *sob* - seems necessary to open the .so as RTLD_GLOBAL
dlopen(PYTHON_SO,RTLD_NOW | RTLD_GLOBAL);

Where PYTHON_SO is the base-name of the Python shared library. I won't
pretend to understand the issues, but without this flag *my* code worked
fine, and I could call Python fine. However, as soon as Python attempted to
import a module in a dynamic library, it would fail with "_PyNone_Struct" as
unresolved. This includes standard Python extension modules (cStringIO,
new, etc) which obviously import fine normally.

Now, in this case, "my code" above is actually itself implemented in a .so
(and has had the Python .so linked against it, as I can call and use Python
itself fine without the extra dlopen()). This .so has been previously
dynamically loaded by Mozilla. Presumably this impacts the situation as
otherwise my code is doing a bog-standard Python initialization that works
perfectly well in its own executable.

As I said, I won't even pretend to understand the issues, but I do know that:
* The line above made my code work.
* Windows works perfectly in this situation, and always has.
* It shouldn't be this hard.

In the interests of moving this forward, I would be happy to build some test
or example code that implements some "interesting but reasonable" solutions
using dynamic linking. Eg, Martin's mail in this thread documents how
symbol sharing should be implemented using shims etc - however, in my
particular situation, I have no special symbol sharing requirements - all I
have is a dynamically loaded .so that itself needs to use Python in a .so,
but I am having problems with Python itself using other .so modules. Again,
note that my .so works fine, and can use Python itself (linked as a .so)
just fine, but Python itself is having problems loading .so modules. All
this is ActivePython, as Python doesn't (didn't?) support .so creation by
default.

<aside>
On the plus-side, that was *nothing* compared to the thread-state hacks I
had to pull to get this working. In my particular situation, I may either
be faced with Python already having been loaded, initialized, and
thread-state released, or Python never been initialized at all.

PRBool bDidInitPython = !Py_IsInitialized(); // well, I will next line, anyway :-)
if (bDidInitPython) {
    Py_Initialize();
    if (!Py_IsInitialized()) {
        LogError("Python initialization failed!\n");
        return NS_ERROR_FAILURE;
    }
    PyEval_InitThreads();
}
// Get the Python interpreter state
PyThreadState *threadStateCreated = NULL;
PyThreadState *threadState = PyThreadState_Swap(NULL);
if (threadState == NULL) {
    // no thread-state - set one up.
    // *sigh* - what I consider a bug is that Python
    // will deadlock unless we own the lock before creating
    // a new interpreter (it appears Py_NewInterpreter has
    // really only been tested/used with no thread lock)
    PyEval_AcquireLock();
    threadState = threadStateCreated = Py_NewInterpreter();
    PyThreadState_Swap(NULL);
}
PyEval_ReleaseLock();
PyEval_AcquireThread(threadState);
// finally I can now reliably use Python.

// cleanup
// Abandon the thread-lock, as the first thing Python does
// is re-establish the lock (the Python thread-state story SUCKS!!!)
if (threadStateCreated) {
    Py_EndInterpreter(threadStateCreated);
    PyEval_ReleaseLock(); // see Py_NewInterpreter call above
} else {
    PyEval_ReleaseThread(threadState);
    PyThreadState *threadStateSave = PyThreadState_Swap(NULL);
    if (threadStateSave)
        PyThreadState_Delete(threadStateSave);
}

I discovered that if no Python thread-state is current, there is no
reasonable way to get at a PyInterpreterState :( PyInterpreterState_Head()
is too recent to rely on. Further, if you call PyInterpreterState_New()
there is no reasonable way to set it up (eg, populate builtins etc) - this
logic is in Py_Initialize(), which relies on a global "initialized"
variable, rather than checking a value in PyInterpreterState. Thus,
PyInterpreterState_New, as a public symbol, appears useless.

I hope to make some specific thread-state cleanup proposals soon <wink>
</aside>

Mark.
Martin v. Loewis
2002-05-05 07:39:39 UTC
Permalink
Post by Mark Hammond
I don't understand the issues at all, other than my belief that Linux makes
it harder than it needs to be. Eg, in the recent PyXPCOM changes, I found
// *sob* - seems necessary to open the .so as RTLD_GLOBAL
dlopen(PYTHON_SO,RTLD_NOW | RTLD_GLOBAL);
Where PYTHON_SO is the base-name of the Python shared library.
I have a number of problems following your description, so let me try
to guess. In no released version of Python is libpython built as a
shared library. However, I'll assume that you have built
libpythonxy.so as a shared library, and that this library provides the
symbols that the Python executable normally provides.

I further assume that the dlopen call is not in the code of the Python
interpreter itself, but somewhere inside Mozilla (I don't know what
PyXPCOM is).

In that case, using RTLD_GLOBAL is one option. It is quite safe to use
here, since it clutters the global namespace primarily with symbols
that all start with Py; those are unlikely to clash with other
symbols. Using RTLD_GLOBAL is needed since extension modules search
the Py symbols in the global namespace; they normally come from the
executable.
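Mark's dlopen() call can be mimicked from Python itself via ctypes, which exposes the same mode flags (a sketch: libm stands in for the Python shared library here, and the library name is platform-dependent):

```python
import ctypes
import ctypes.util

# Re-open an already-available library with RTLD_GLOBAL, putting its
# exported symbols into the global namespace where later dlopen()ed
# objects can resolve them.
libname = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libname, mode=ctypes.RTLD_GLOBAL)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
```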

The other option is to explicitly link all extension modules against
your libpythonxy.so. If that is done, you can drop the RTLD_GLOBAL.
Of course, the standard build process won't link extension modules
with libpythonxy.so, since that library does not even exist in the
standard build process.
Post by Mark Hammond
* The line above made my code work.
Yes, see above.
Post by Mark Hammond
* Windows works perfectly in this situation, and always has.
This is because on Windows, every extension module is linked against
pythonxy.dll, to find the extension symbols.
Post by Mark Hammond
* It shouldn't be this hard.
I don't think it is hard. I feel that the situation on Windows is much
worse - to build an extension module, you need the import library, and
you need to link against it, and you thus hard-code the version of the
interpreter into the extension module, and you thus cannot use
extension modules across Python releases. There are trade-offs on both
sides.

Regards,
Martin
Mark Hammond
2002-05-05 09:05:24 UTC
Permalink
[Martin]
Post by Martin v. Loewis
I have a number of problems following your description, so let me try
to guess. In no released version of Python is libpython built as a
shared library. However, I'll assume that you have built
libpythonxy.so as a shared library, and that this library provides the
symbols that the Python executable normally provides.
Thanks for the note! I now realize I left critical information deep in the
body - I was using ActivePython as it is the only released version of Python
that comes with a version built as a shared library (along with a
traditional static one)
Post by Martin v. Loewis
In that case, using RTLD_GLOBAL is one option. It is quite safe to use
here, since it clutters the global namespace primarily with symbols
that all start with Py; those are unlikely to clash with other
symbols. Using RTLD_GLOBAL is needed since extension modules search
the Py symbols in the global namespace; they normally come from the
executable.
OK - cool. So you are saying that what I have done is reasonable (whereas I
assumed it was a nasty hack :)
Post by Martin v. Loewis
The other option is to explicitly link all extension modules against
your libpythonxy.so. If that is done, you can drop the RTLD_GLOBAL.
Of course, the standard build process won't link extension modules
with libpythonxy.so, since that library does not even exist in the
standard build process.
Right. Notwithstanding that the distribution I used *does* have a
libpython, I see that the same extension modules are used by the static and
dynamic versions of the core, and therefore can't carry that reference.

Thanks for the info - strangely enough, it appears it is not a "problem" at
all ;)

Mark.
Jack Jansen
2002-05-05 20:10:32 UTC
Permalink
Post by Mark Hammond
Or at least documenting it well :)
I don't understand the issues at all, other than my belief that Linux makes
it harder than it needs to be.
If I understand correctly it isn't Linux, it's the Python way of
dynamic loading which makes it more difficult. The way Python
does dynamic loading is modeled after how old Unix dynamic
loaders worked (which was mainly a dirty hack without much
design): you load a .so and at load time it will invoke a
special version of the "ld" link editor which will at that time
do symbol resolving, etc.
As this "magic resolving of symbols from what happens to be the
loading application" is so ubiquitous in Unix, most current Unix
variants support it, one way or another.

This is a whole different situation from Windows and MacOS,
which are in this respect more modern, and have a conceptually
simpler model: you always link against a stub library, and
at runtime the symbols are always taken from the corresponding
dynamic library. This is a lot safer (no more symbols that are
accidentally satisfied from a completely unexpected library,
version checking, etc) but it can make a few things more
difficult, such as dynamic modules that refer to each other, or
loading extension modules into a statically linked program.


--
- Jack Jansen <***@oratrix.com>
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution --
Emma Goldman -
Ralf W. Grosse-Kunstleve
2002-05-05 19:45:13 UTC
Permalink
Post by David Abrahams
Unfortunately, the only reproducible case we have is not exactly small.
However, we can give anyone interested full access to the machine and
test case where it's occurring [details appended at the bottom of this
message].
That message is referring to a slightly different situation:
I am using a statically linked libboost_python.a.

When I wrote the message that David attached I only had a very
complex example. Now I have a significantly simpler one, and this
is what I think is happening:

python -> dlopen ext1.so with statically linked libboost_python.a
python -> dlopen ext2.so with statically linked libboost_python.a
obj1 = ext1.a_type()
obj1 has a member function which will dispatch to one
of two alternative C++ functions. In the course of
resolving which C++ function to use, an exception is
raised in ext2.so that is meant to be caught in ext1.so.
If both extension modules are imported with RTLD_LOCAL
the exception is not caught correctly. However, when
using RTLD_GLOBAL this works properly.

Considering the discussion in this thread it is now pretty clear
that this behavior is expected.

So... I have just done the step that I should apparently have done
months ago (that's how long the problem has been bugging me):
Compiling our extension modules against a libboost_python.so.
With that my example works even without changing setdlopenflags.

I am obviously learning the hard way here. At least things
are starting to make sense now. And it seems as if RTLD_LOCAL
will work for a shared Boost.Python library.

Thanks for sharing your valuable knowledge about dynamic linking.

Ralf
David Abrahams
2002-05-05 20:59:07 UTC
Permalink
----- Original Message -----
Post by Ralf W. Grosse-Kunstleve
Thanks for sharing your valuable knowledge about dynamic linking.
...and my apologies for wasting bandwidth here on python-dev. I had
asked Ralf specific questions that would have led to my understanding
what he was doing wrong, but there was apparently a miscommunication
somewhere. I guess the lesson is not to bring bugs forward unless I can
reproduce them myself.

re-lurking-ly y'rs,
dave
Michael Hudson
2002-05-06 12:26:54 UTC
Permalink
Post by David Abrahams
----- Original Message -----
Post by Ralf W. Grosse-Kunstleve
Thanks for sharing your valuable knowledge about dynamic linking.
...and my apologies for wasting bandwidth here on python-dev.
I don't mind -- I've learnt a lot!

Cheers,
M.
--
I really hope there's a catastrophic bug in some future e-mail
program where if you try and send an attachment it cancels your
ISP account, deletes your harddrive, and pisses in your coffee
-- Adam Rixey
Martin v. Loewis
2002-05-05 22:20:12 UTC
Permalink
Post by Ralf W. Grosse-Kunstleve
python -> dlopen ext1.so with statically linked libboost_python.a
python -> dlopen ext2.so with statically linked libboost_python.a
That explains a lot of things indeed. It doesn't explain why the
exception handling on Linux fails (that should still work fine even
with two separate copies of each typeinfo object, IMO), but it gives a
clue as to what may have broken: you get two copies of each
global object, with access to both. Depending on the exact
interaction sequence, it is quite possible that all kinds of
corruption occur.

Regards,
Martin
David Abrahams
2002-05-05 22:29:36 UTC
Permalink
Post by Martin v. Loewis
Post by Ralf W. Grosse-Kunstleve
python -> dlopen ext1.so with statically linked
libboost_python.a
Post by Martin v. Loewis
Post by Ralf W. Grosse-Kunstleve
python -> dlopen ext2.so with statically linked
libboost_python.a
Post by Martin v. Loewis
That explains a lot of things indeed. It doesn't explain why the
exception handling on Linux fails (that should still work fine even
with two separate copies of each typeinfo object, IMO),
Not the way I read http://gcc.gnu.org/faq.html#dso.
Am I missing something? If an exception is thrown from ext1 -> ext2 and
they're not sharing symbols, there will be distinct copies of all
typeinfo objects used in the two modules, and the address comparisons
used to determine whether a catch clause matches ought to fail, no?
Post by Martin v. Loewis
but it gives a clue as to what may have broken: you get two
copies of each global object, with access to both. Depending on
the exact interaction sequence, it is quite possible that all
kinds of corruption occur.
I don't think Ralf is explicitly using any non-const global objects or
explicitly relying on object identity across extension modules, so
it's hard to imagine that this is at play.

-Dave
Martin v. Loewis
2002-05-06 06:31:03 UTC
Permalink
Post by David Abrahams
Post by Martin v. Loewis
That explains a lot of things indeed. It doesn't explain why the
exception handling on Linux fails (that should still work fine even
with two separate copies of each typeinfo object, IMO),
Not the way I read http://gcc.gnu.org/faq.html#dso.
Am I missing something? If an exception is thrown from ext1 -> ext2 and
they're not sharing symbols, there will be distinct copies of all
typeinfo objects used in the two modules, and the address comparisons
used to determine whether a catch clause matches ought to fail, no?
Right. However, address comparisons are used only in gcc 3.0; gcc 2.95
and earlier used string comparisons, and gcc 3.1 will use string
comparisons again. I was assuming that Ralf used gcc 2.95 to build the
binaries.

As Tim explains, if that was the cause, you'd have a systematic error
(failure every time). If the failure is only intermittent, it must be
something else.
Post by David Abrahams
I don't think Ralf is explicitly using any non-const global objects or
explicitly relying on object identity across extension modules, so
it's hard to imagine that this is at play.
Are you sure there are no objects of static storage duration in Boost?
It doesn't matter whether he uses them "explicitly": if he calls
functions that use them, the object being used depends on which DSO
the caller is in. With RTLD_GLOBAL, all calls go to one of the DSOs
(independent from caller), thus a single and consistent set of global
objects is used.

Regards,
Martin
David Abrahams
2002-05-06 10:09:08 UTC
Permalink
----- Original Message -----
Post by Martin v. Loewis
Post by David Abrahams
Post by Martin v. Loewis
That explains a lot of things indeed. It doesn't explain why the
exception handling on Linux fails (that should still work fine even
with two separate copies of each typeinfo object, IMO),
Not the way I read http://gcc.gnu.org/faq.html#dso.
Am I missing something? If an exception is thrown from ext1 -> ext2 and
they're not sharing symbols, there will be distinct copies of all
typeinfo objects used in the two modules, and the address comparisons
used to determine whether a catch clause matches ought to fail, no?
Right. However, address comparisons are used only in gcc 3.0; gcc 2.95
and earlier used string comparisons, and gcc 3.1 will use string
comparisons again.
That's interesting information! I wouldn't have guessed, myself, that
address comparison was a huge speed benefit in real applications anyway.
Post by Martin v. Loewis
I was assuming that Ralf used gcc 2.95 to build the
binaries.
No, he gave up on 2.95.x a while back because the EH was too buggy.
Boost.Python v1 (which he's been using) relies more heavily than it
probably should on the EH mechanism.
Post by Martin v. Loewis
As Tim explains, if that was the cause, you'd have a systematic error
(failure every time). If the failure is only intermittent, it must be
something else.
Post by David Abrahams
I don't think Ralf is explicitly using any non-const global objects or
explicitly relying on on object identity across extension modules, so
it's hard to imagine that this is at play.
Are you sure there are no objects of static storage duration in Boost?
No, only that there are none which need to be shared across modules.
Boost.Python v1 can also be built as a static library (even on Windows,
where sharing is explicit), and it works just fine.
Post by Martin v. Loewis
It doesn't matter whether he uses them "explicitly": if he calls
functions that use them, the object being used depends on which DSO
the caller is in. With RTLD_GLOBAL, all calls go to one of the DSOs
(independent from caller), thus a single and consistent set of global
objects is used.
Yes, I understand. Having a sense of Ralf's application, it seems possible
but unlikely to me that this is the case.

-Dave
Ralf W. Grosse-Kunstleve
2002-05-06 09:59:04 UTC
Permalink
The discussion in this thread is going in a direction that makes it
seem more suitable for the Python C++ SIG. Therefore I have just posted
my new findings here:

http://mail.python.org/pipermail/c++-sig/2002-May/001021.html

Thanks for all the feedback,
Ralf