Discussion:
Does Zip Importer have to be Special?
Phil Thompson
2014-07-24 16:55:15 UTC
I have an importer for use in applications that embed an interpreter
that does a similar job to the Zip importer (except that the storage is
a C data structure rather than a .zip file). Just like the Zip importer
I need to import my importer and add it to sys.path_hooks. However the
earliest opportunity I have to do this is after the Py_Initialize() call
returns - but this is too late because some parts of the standard
library have already needed to be imported.
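
For readers unfamiliar with path hooks, here is a minimal Python sketch of the kind of importer being described, assuming the "storage" is just an in-memory dict instead of a C data structure. The names EMBEDDED_SOURCES, EMBEDDED_PATH_ENTRY, EmbeddedLoader, EmbeddedFinder and embedded_path_hook are hypothetical and are not taken from the actual importer discussed in this thread. Registering the hook this way only works once the interpreter is up, which is exactly the timing problem raised above.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory store standing in for the C data structure.
EMBEDDED_SOURCES = {
    "hello_embedded": "GREETING = 'hi from memory'\n",
}

# Hypothetical marker: a sys.path entry with this value is claimed by the hook.
EMBEDDED_PATH_ENTRY = "<embedded>"


class EmbeddedLoader(importlib.abc.Loader):
    """Executes source text held in memory."""

    def __init__(self, source):
        self._source = source

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        code = compile(self._source, module.__spec__.origin, "exec")
        exec(code, module.__dict__)


class EmbeddedFinder(importlib.abc.PathEntryFinder):
    """Path entry finder that PathFinder consults for EMBEDDED_PATH_ENTRY."""

    def find_spec(self, fullname, target=None):
        source = EMBEDDED_SOURCES.get(fullname)
        if source is None:
            return None
        origin = EMBEDDED_PATH_ENTRY + "/" + fullname
        return importlib.util.spec_from_loader(
            fullname, EmbeddedLoader(source), origin=origin)


def embedded_path_hook(path_entry):
    # A path hook signals "not mine" by raising ImportError.
    if path_entry != EMBEDDED_PATH_ENTRY:
        raise ImportError("not an embedded path entry")
    return EmbeddedFinder()


# Register the hook and give PathFinder a sys.path entry that it claims.
sys.path_hooks.insert(0, embedded_path_hook)
sys.path.insert(0, EMBEDDED_PATH_ENTRY)
sys.path_importer_cache.pop(EMBEDDED_PATH_ENTRY, None)

import hello_embedded

print(hello_embedded.GREETING)
```

Because the finder runs under PathFinder, it needs both a sys.path_hooks entry and a matching sys.path entry, and neither can be put in place before Py_Initialize() has already imported parts of the standard library.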

My current workaround is to include a modified version of _bootstrap.py
as a frozen module that has the necessary steps added to the end of its
_install() function.

The Zip importer doesn't have this problem because it gets special
treatment - the call to its equivalent code is hard-coded and happens
exactly when needed.

What would help is a table of functions that were called where
_PyImportZip_Init() is currently called. By default the only entry in
the table would be _PyImportZip_Init. There would be a way of modifying
the table, either like how PyImport_FrozenModules is handled or how
Inittab is handled.
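
To make the proposal concrete, here is a rough Python model of such a table. The real change would be in C; everything here (IMPORTER_INIT_TABLE, extend_init_table, run_importer_inits, init_embedded_importer, embedded_hook) is hypothetical and nothing like it exists in CPython. The sketch works on a scratch list rather than the real sys.path_hooks.

```python
import zipimport


def init_zip_importer(path_hooks):
    """Stand-in for _PyImportZip_Init: register the zip importer hook."""
    path_hooks.insert(0, zipimport.zipimporter)


# Hypothetical table; by default its only entry is the zip importer.
IMPORTER_INIT_TABLE = [init_zip_importer]


def extend_init_table(init_func):
    """Hypothetical embedder API for adding an entry before start-up."""
    IMPORTER_INIT_TABLE.append(init_func)


def run_importer_inits(path_hooks):
    """Called at the point where _PyImportZip_Init() is called today."""
    for init in IMPORTER_INIT_TABLE:
        init(path_hooks)
    return path_hooks


def embedded_hook(path_entry):
    """Hypothetical custom path hook; always declines in this sketch."""
    raise ImportError("not handled in this sketch")


def init_embedded_importer(path_hooks):
    path_hooks.insert(0, embedded_hook)


# The embedding application extends the table before initialization runs.
extend_init_table(init_embedded_importer)
hooks = run_importer_inits([])
print(hooks)
```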

...or if there is a better solution that I have missed that doesn't
require a modified _bootstrap.py.

Thanks,
Phil
Brett Cannon
2014-07-24 17:48:59 UTC
Basically you want a way to pass arguments to
importlib._bootstrap._install() so that sys.path_hooks and sys.meta_path
are configurable instead of hard-coded (it could also be done just after
importlib is installed, but that's a minor detail). Either way there is
technically no reason not to allow it, just a lack of motivation, since
this only comes up for people who embed the interpreter AND have a
custom importer that affects loading the stdlib as well (any reason you
can't freeze the stdlib as a solution?).

We could go the route of a static array that people could modify.
Another option would be to allow a single function to be specified
which is called just prior to importing the rest of the stdlib.

The problem with all of this is that you are essentially asking for a hook
that gives your code access to the interpreter state before it is fully
initialized. Zipimport and the various bits of code that get loaded during
startup are special since they are coded to avoid touching anything that
isn't ready to be used. So if we expose something that allows access prior
to full initialization, it would have to be documented as having no
guarantees about interpreter state, etc., so we are not held to some API
that makes future improvements difficult.

IOW allowing for easy patching of Python is probably the best option I can
think of. Would tweaking importlib._bootstrap._install() to accept
specified values for sys.meta_path and sys.path_hooks be enough so that you
can change the call site for those functions?
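
A rough sketch of what a parameterized install step could look like, written as a standalone function so it can be run outside the bootstrap. The function name install_import_state and its defaults are illustrative assumptions, not the actual signature or behaviour of importlib._bootstrap._install(), and it returns the lists instead of touching sys.

```python
import zipimport
from importlib.machinery import BuiltinImporter, FrozenImporter, PathFinder


def install_import_state(meta_path=None, path_hooks=None):
    """Hypothetical stand-in for an _install() that accepts its values.

    Returns the (meta_path, path_hooks) lists it would install, using
    illustrative defaults when the caller supplies nothing.
    """
    if meta_path is None:
        meta_path = [BuiltinImporter, FrozenImporter, PathFinder]
    if path_hooks is None:
        path_hooks = [zipimport.zipimporter]
    return list(meta_path), list(path_hooks)


def my_hook(path_entry):
    """Hypothetical embedder-supplied path hook."""
    raise ImportError("not handled in this sketch")


# A patched call site would pass its own hooks ahead of the defaults.
meta_path, path_hooks = install_import_state(
    path_hooks=[my_hook, zipimport.zipimporter])
print(meta_path, path_hooks)
```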
Phil Thompson
2014-07-24 18:12:13 UTC
Post by Brett Cannon
Basically you want a way to specify arguments into
importlib._bootstrap._install() so that sys.path_hooks and sys.meta_path
were configurable instead of hard-coded (it could also be done just past
importlib being installed, but that's a minor detail). Either way there is
technically no reason not to allow for it, just lack of motivation since
this would only come up for people who embed the interpreter AND have a
custom importer which affects loading the stdlib as well (any reason you
can't freeze the stdlib as a solution?).
Not really. I'd lose the compression my importer implements.

(Are there any problems with freezing packages rather than simple
modules?)
Post by Brett Cannon
IOW allowing for easy patching of Python is probably the best option I can
think of. Would tweaking importlib._bootstrap._install() to accept
specified values for sys.meta_path and sys.path_hooks be enough so that you
can change the call site for those functions?
My importer runs under PathFinder so it needs sys.path as well (and
doesn't need sys.meta_path).

Phil
Brett Cannon
2014-07-24 18:26:21 UTC
Post by Phil Thompson
(Are there any problems with freezing packages rather than simple
modules?)
Nope, modules and packages are both supported.
Post by Phil Thompson
My importer runs under PathFinder so it needs sys.path as well (and
doesn't need sys.meta_path).
sys.path can be set via PYTHONPATH, etc. so that shouldn't be as much of an
issue.
Phil Thompson
2014-07-25 09:36:18 UTC
On Thu Jul 24 2014 at 2:12:20 PM, Phil Thompson
Post by Phil Thompson
Post by Brett Cannon
IOW allowing for easy patching of Python is probably the best option I can
think of. Would tweaking importlib._bootstrap._install() to accept
specified values for sys.meta_path and sys.path_hooks be enough so that you
can change the call site for those functions?
My importer runs under PathFinder so it needs sys.path as well (and
doesn't need sys.meta_path).
sys.path can be set via PYTHONPATH, etc. so that shouldn't be as much of an
issue.
I prefer to have Py_IgnoreEnvironmentFlag set.
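
For context, Py_IgnoreEnvironmentFlag corresponds to the -E command-line option, and with it set PYTHONPATH is not consulted. The following sketch shows the effect from Python by launching child interpreters; the helper name child_sys_path and the marker directory name are made up for the illustration.

```python
import json
import os
import subprocess
import sys

# Hypothetical directory name used only to spot the PYTHONPATH entry.
MARKER = "embedded_importer_marker_dir"


def child_sys_path(extra_args):
    """Return sys.path as seen by a child interpreter with PYTHONPATH set."""
    env = dict(os.environ, PYTHONPATH=os.path.abspath(MARKER))
    result = subprocess.run(
        [sys.executable, *extra_args, "-c",
         "import json, sys; print(json.dumps(sys.path))"],
        env=env, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def has_marker(path_list):
    return any(entry.endswith(MARKER) for entry in path_list)


with_env = child_sys_path([])        # PYTHONPATH is honoured
ignored = child_sys_path(["-E"])     # -E: environment variables ignored

print(has_marker(with_env), has_marker(ignored))
```

So an embedder who sets the flag has to populate sys.path some other way, which is part of why PYTHONPATH is not a complete answer here.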

Also, I'm not clear at what point I would import my custom importer.

Phil
Nick Coghlan
2014-07-24 20:42:39 UTC
Post by Brett Cannon
The problem with all of this is you are essentially asking for a hook to
let you have code have access to the interpreter state before it is fully
initialized. Zipimport and the various bits of code that get loaded during
startup are special since they are coded to avoid touching anything that
isn't ready to be used. So if we expose something that allows access prior
to full initialization it would have to be documented as having no
guarantees of interpreter state, etc. so we are not held to some API that
makes future improvements difficult.

Note that this is *exactly* the problem PEP 432 is designed to handle:
separating the configuration of the core interpreter from the configuration
of the operating system interfaces, so the latter can run relatively
normally (at least compared to today).

As you say, though, it's a niche problem compared to something like
packaging, which is why it got bumped down my personal priority list. I
haven't even got back to the first preparatory step I identified, which is
to move our main functions into a separate "Programs" directory so
it's easier to distinguish the "embeds Python" sections of the code from
the more typical "is part of Python" and "extends Python" code.
Post by Brett Cannon
IOW allowing for easy patching of Python is probably the best option I
can think of.

Yeah, that sounds reasonable - IIRC, Christian ended up going with a
similar "make it patch friendly" approach for the hashing changes, rather
than going overboard with configuration options.

Cheers,
Nick.
Phil Thompson
2014-07-25 09:33:41 UTC
Post by Nick Coghlan
Note that this is *exactly* the problem PEP 432 is designed to handle:
separating the configuration of the core interpreter from the configuration
of the operating system interfaces, so the latter can run relatively
normally (at least compared to today).
The implementation of PEP 432 would be great.
Post by Nick Coghlan
As you say, though it's a niche problem compared to something like
packaging, which is why it got bumped down my personal priority list. I
haven't even got back to the first preparatory step I identified which is
to separate out our main functions to a separate "Programs" directory so
it's easier to distinguish "embeds Python" sections of the code from the
more typical "is part of Python" and "extends Python" code.
Is there any way for somebody you don't trust :) to be able to help move
it forward?

Phil
Nick Coghlan
2014-07-25 12:30:54 UTC