Discussion:
PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Victor Stinner
2014-07-01 07:44:02 UTC
Permalink
Hi,

IMO we must decide if scandir() must support or not file descriptor.
It's an important decision which has an important impact on the API.


To support scandir(fd), the minimum is to store dir_fd in DirEntry:
dir_fd would be None for scandir(str).


scandir(fd) must not close the file descriptor, it should be done by
the caller. Handling the lifetime of the file descriptor is a
difficult problem, it's better to let the user decide how to handle
it.

There is the problem of the limit of open file descriptors, usually
1024 but it can be lower. It *can* be an issue for very deep file
hierarchy.

If we choose to support scandir(fd), it's probably safer to not use
scandir(fd) by default in os.walk() (use scandir(str) instead), wait
until the feature is well tested, corner cases are well known, etc.


The second step is to enhance pathlib.Path to support an optional file
descriptor. Path already has methods on filenames like chmod(),
exists(), rename(), etc.


Example:

fd = os.open(path, os.O_DIRECTORY)
try:
for entry in os.scandir(fd):
# ... use entry to benefit of entry cache: is_dir(), lstat_result ...
path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
# ... use path which uses dir_fd ...
finally:
os.close(fd)

Problem: if the path object is stored somewhere and use after the
loop, Path methods will fail because dir_fd was closed. It's even
worse if a new directory uses the same file descriptor :-/ (security
issue, or at least tricky bugs!)

Victor
Ben Hoyt
2014-07-01 12:26:15 UTC
Permalink
Thanks, Victor.

I don't have any experience with dir_fd handling, so unfortunately
can't really comment here.

What advantages does it bring? I notice that even os.listdir() on
Python 3.4 doesn't have anything related to file descriptors, so I'd
be in favour of not including support. We can always add it later.

-Ben
Post by Victor Stinner
Hi,
IMO we must decide if scandir() must support or not file descriptor.
It's an important decision which has an important impact on the API.
dir_fd would be None for scandir(str).
scandir(fd) must not close the file descriptor, it should be done by
the caller. Handling the lifetime of the file descriptor is a
difficult problem, it's better to let the user decide how to handle
it.
There is the problem of the limit of open file descriptors, usually
1024 but it can be lower. It *can* be an issue for very deep file
hierarchy.
If we choose to support scandir(fd), it's probably safer to not use
scandir(fd) by default in os.walk() (use scandir(str) instead), wait
until the feature is well tested, corner cases are well known, etc.
The second step is to enhance pathlib.Path to support an optional file
descriptor. Path already has methods on filenames like chmod(),
exists(), rename(), etc.
fd = os.open(path, os.O_DIRECTORY)
# ... use entry to benefit of entry cache: is_dir(), lstat_result ...
path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
# ... use path which uses dir_fd ...
os.close(fd)
Problem: if the path object is stored somewhere and use after the
loop, Path methods will fail because dir_fd was closed. It's even
worse if a new directory uses the same file descriptor :-/ (security
issue, or at least tricky bugs!)
Victor
_______________________________________________
Python-Dev mailing list
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/benhoyt%40gmail.com
Victor Stinner
2014-07-01 13:01:26 UTC
Permalink
Post by Ben Hoyt
Thanks, Victor.
I don't have any experience with dir_fd handling, so unfortunately
can't really comment here.
What advantages does it bring? I notice that even os.listdir() on
Python 3.4 doesn't have anything related to file descriptors, so I'd
be in favour of not including support.
See https://docs.python.org/dev/library/os.html#dir-fd

The idea is to make sure that you get files from the same directory.
Problems occur when a directory is moved or a symlink is modified.
Example:

- you're browsing /tmp/test/x as root (!), /tmp/copy/passwd is owned
by www user (website)
- you would like to remove the file "x": call unlink("/tmp/copy/passwd")
- ... but just before that, an attacker replaces the /tmp/copy
directory with a symlink to /etc
- you will remove /etc/passwd instead of /tmp/copy/passwd, oh oh

Using unlink("passwd", dir_fd=tmp_copy_fd), you don't have this issue.
You are sure that you are working in /tmp/copy directory.

You can imagine a lot of other scenarios to override files and read
sensitive files.

Hopefully, the Linux rm commands knows unlinkat() sycall ;-)

***@selma$ mkdir -p a/b/c
***@selma$ strace -e unlinkat rm -rf a
unlinkat(5, "c", AT_REMOVEDIR) = 0
unlinkat(4, "b", AT_REMOVEDIR) = 0
unlinkat(AT_FDCWD, "a", AT_REMOVEDIR) = 0
+++ exited with 0 +++

We should implement a similar think in shutil.rmtree().

See also os.fwalk() which is a version of os.walk() providing dir_fd.
Post by Ben Hoyt
We can always add it later.
I would prefer to discuss that right now. My proposition is to accept
an int for scandir() and copy the int into DirEntry.dir_fd. It's not
that complex :-)

The enhancement of the pathlib module can be done later. By the way, I
know that Antoine Pitrou wanted to implemented file descriptors in
pathlib, but the feature was rejected or at least delayed.

Victor
Akira Li
2014-07-01 15:58:03 UTC
Permalink
Post by Ben Hoyt
Thanks, Victor.
I don't have any experience with dir_fd handling, so unfortunately
can't really comment here.
What advantages does it bring? I notice that even os.listdir() on
Python 3.4 doesn't have anything related to file descriptors, so I'd
be in favour of not including support. We can always add it later.
-Ben
Post by Victor Stinner
import os
os.listdir(os.open('.', os.O_RDONLY))
NOTE: os.supports_fd and os.supports_dir_fd are different sets.

See also,
https://mail.python.org/pipermail/python-dev/2014-June/135265.html


--
Akira


P.S. Please, don't put your answer on top of the message you are
replying to.
Post by Ben Hoyt
Post by Victor Stinner
Hi,
IMO we must decide if scandir() must support or not file descriptor.
It's an important decision which has an important impact on the API.
dir_fd would be None for scandir(str).
scandir(fd) must not close the file descriptor, it should be done by
the caller. Handling the lifetime of the file descriptor is a
difficult problem, it's better to let the user decide how to handle
it.
There is the problem of the limit of open file descriptors, usually
1024 but it can be lower. It *can* be an issue for very deep file
hierarchy.
If we choose to support scandir(fd), it's probably safer to not use
scandir(fd) by default in os.walk() (use scandir(str) instead), wait
until the feature is well tested, corner cases are well known, etc.
The second step is to enhance pathlib.Path to support an optional file
descriptor. Path already has methods on filenames like chmod(),
exists(), rename(), etc.
fd = os.open(path, os.O_DIRECTORY)
# ... use entry to benefit of entry cache: is_dir(), lstat_result ...
path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
# ... use path which uses dir_fd ...
os.close(fd)
Problem: if the path object is stored somewhere and use after the
loop, Path methods will fail because dir_fd was closed. It's even
worse if a new directory uses the same file descriptor :-/ (security
issue, or at least tricky bugs!)
Victor
_______________________________________________
Python-Dev mailing list
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/benhoyt%40gmail.com
Charles-François Natali
2014-07-02 10:51:43 UTC
Permalink
Post by Victor Stinner
IMO we must decide if scandir() must support or not file descriptor.
It's an important decision which has an important impact on the API.
I don't think we should support it: it's way too complicated to use,
error-prone, and leads to messy APIs.
Victor Stinner
2014-07-02 11:59:26 UTC
Permalink
Post by Charles-François Natali
I don't think we should support it: it's way too complicated to use,
error-prone, and leads to messy APIs.
Can you please elaborate? Which kind of issue do you see? Handling the
lifetime of the directory file descriptor?

You don't like the dir_fd parameter of os functions?

I don't have an opinion of supporting scandir(int). I asked to discuss
it in the PEP directly.

Victor
Charles-François Natali
2014-07-02 17:20:41 UTC
Permalink
Post by Victor Stinner
Post by Charles-François Natali
I don't think we should support it: it's way too complicated to use,
error-prone, and leads to messy APIs.
Can you please elaborate? Which kind of issue do you see? Handling the
lifetime of the directory file descriptor?
Yes, among other things. You can e.g. have a look at os.fwalk() or
shutil._rmtree_safe_fd() to see that using those *properly* is far
from being trivial.
Post by Victor Stinner
You don't like the dir_fd parameter of os functions?
Exactly, I think it complicates the API for little benefit (FWIW, no
other language I know of exposes them).
Martin v. Löwis
2014-07-07 06:39:07 UTC
Permalink
Post by Victor Stinner
scandir(fd) must not close the file descriptor, it should be done by
the caller. Handling the lifetime of the file descriptor is a
difficult problem, it's better to let the user decide how to handle
it.
This is an open issue still: when is the file descriptor closed.

I think the generator returned from scandir needs to support a
.close method that guarantees to close the file descriptor.
AFAICT, the pure-Python prototype of scandir already does, but
it should be specified in the PEP.

While we are at it: is it intended that the generator will also
support the other generator methods, in particular .send and
.throw?

Regards,
Martin

Loading...