Discussion:
Add a frozendict builtin type
Victor Stinner
2012-02-27 18:53:27 UTC
Rationale
=========

A frozendict type is a common request from users and there are various
implementations. They follow two main approaches in pure Python:

* "blacklist": frozendict inheriting from dict and overriding methods
to raise an exception when trying to modify the frozendict
* "whitelist": frozendict not inheriting from dict and implementing only
some dict methods, or implement all dict methods but raise exceptions
when trying to modify the frozendict

The blacklist implementation has a major issue: it is still possible
to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict,
key, value)).
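A minimal sketch of the blacklist approach, using a hypothetical `BlacklistFrozenDict` class (not one of the linked implementations), shows the loophole: the overridden `__setitem__` raises, but calling the base-class method `dict.__setitem__` directly still mutates the instance.

```python
class BlacklistFrozenDict(dict):
    """Hypothetical blacklist frozendict: a dict subclass whose mutating
    methods are overridden to raise TypeError."""

    def _readonly(self, *args, **kwargs):
        raise TypeError(f"{type(self).__name__} is read-only")

    # Override the write methods named in the proposal.
    __setitem__ = __delitem__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly


fd = BlacklistFrozenDict(a=1)

blocked = False
try:
    fd["b"] = 2  # goes through the override, so it raises
except TypeError:
    blocked = True

# The loophole: the unbound dict method ignores the override entirely.
dict.__setitem__(fd, "b", 2)
print(blocked, fd)  # True {'a': 1, 'b': 2}
```

Every dict subclass shares this weakness, because the C-level dict methods stay reachable through the base class.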

The whitelist implementation has an issue: frozendict and dict are not
"compatible", dict is not a subclass of frozendict (and frozendict is
not a subclass of dict).
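For contrast, a minimal whitelist sketch: a hypothetical `WhitelistFrozenDict` wraps a private dict and implements only the read-only `collections.abc.Mapping` interface. It is a `Mapping`, but it has no inheritance relation with `dict` in either direction, which is the incompatibility described above.

```python
from collections.abc import Mapping


class WhitelistFrozenDict(Mapping):
    """Hypothetical whitelist frozendict: only read-only methods exist."""

    __slots__ = ("_data",)

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)  # private copy of the content

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __repr__(self):
        return f"{type(self).__name__}({self._data!r})"


fd = WhitelistFrozenDict(a=1)
print(isinstance(fd, Mapping))                # True
print(isinstance(fd, dict))                   # False
print(issubclass(dict, WhitelistFrozenDict))  # False
```

Code that checks `isinstance(obj, dict)` rejects such an object, while code written against `Mapping` accepts it.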

I propose to add a new frozendict builtin type and make the dict type
inherit from it. frozendict would not have methods to modify its
content, and its values must be immutable.


Constraints
===========

* frozendict values must be immutable, like dict keys
* frozendict can be used with the C API of the dict object (e.g.
PyDict_GetItem) but write methods (e.g. PyDict_SetItem) would fail
with a TypeError ("expect dict, got frozendict")
* frozendict.__hash__() has to be deterministic
* frozendict does not have the following methods: clear, __delitem__,
pop, popitem, setdefault, __setitem__ and update, just as tuple/frozenset
have fewer methods than list/set.
* issubclass(dict, frozendict) is True, whereas
issubclass(frozendict, dict) is False


Implementation
==============

* Add a hash field to the PyDictObject structure
* Make dict inherit from frozendict
* frozendict values are checked for immutability property by calling
their __hash__ method, with a fast-path for known immutable types
(int, float, bytes, str, tuple, frozenset)
* frozendict.__hash__ computes hash(frozenset(self.items())) and
caches the result in its private hash attribute
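A rough pure-Python approximation of the hashing scheme above, assuming a hypothetical `HashableFrozenDict` built on `collections.abc.Mapping` (the proposed patch does this in C, with a field in `PyDictObject`): the hash is `hash(frozenset(items))`, computed on first use and cached in a private attribute.

```python
from collections.abc import Mapping


class HashableFrozenDict(Mapping):
    """Hypothetical read-only mapping that hashes over its items."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None  # private cache, filled on the first hash() call

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __repr__(self):
        return f"{type(self).__name__}({self._data!r})"

    def __hash__(self):
        if self._hash is None:
            # frozenset ignores insertion order, so equal mappings get equal
            # hashes; this raises TypeError if any value is unhashable.
            self._hash = hash(frozenset(self._data.items()))
        return self._hash


a = HashableFrozenDict(dict.fromkeys("ai"))
b = HashableFrozenDict(dict.fromkeys("ia"))
print(a == b, hash(a) == hash(b))  # True True
```

Equality comes from the `Mapping` mixin (it compares the items as dicts), so it agrees with the order-independent hash.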

Attached patch is a work-in-progress implementation.


TODO
====

* Add a frozendict abstract base class to collections?
* Should frozendict avoid overallocating dictionary buckets?

--

Examples of frozendict implementations:

http://bob.pythonmac.org/archives/2005/03/04/frozendict/
http://code.activestate.com/recipes/498072-implementing-an-immutable-dictionary/
http://code.activestate.com/recipes/414283-frozen-dictionaries/
http://corebio.googlecode.com/svn/trunk/apidocs/corebio.utils.frozendict-class.html
http://code.google.com/p/lingospot/source/browse/trunk/frozendict/frozendict.py
http://cmssdt.cern.ch/SDT/doxygen/CMSSW_4_4_2/doc/html/d6/d2f/classfrozendict_1_1frozendict.html

See also the recent discussion on python-list:

http://mail.python.org/pipermail/python-list/2012-February/1287658.html

--

See also PEP 351.

Victor
Xavier Morel
2012-02-27 21:13:37 UTC
Post by Victor Stinner
Rationale
=========
A frozendict type is a common request from users and there are various
* "blacklist": frozendict inheriting from dict and overriding methods
to raise an exception when trying to modify the frozendict
* "whitelist": frozendict not inheriting from dict and only implement
some dict methods, or implement all dict methods but raise exceptions
when trying to modify the frozendict
The blacklist implementation has a major issue: it is still possible
to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict,
key, value)).
The whitelist implementation has an issue: frozendict and dict are not
"compatible", dict is not a subclass of frozendict (and frozendict is
not a subclass of dict).
This may be an issue at the C level (I'm not sure), but since this would
be a Python 3-only collection, "user" code (in Python) should/would
generally be using abstract base classes, so type-checking would not
be an issue (as in Python code performing `isinstance(a, dict)` checks
naturally failing on `frozendict`)

Plus `frozenset` does not inherit from `set`, it's a whitelist
reimplementation and I've never known anybody to care. So there's
that precedent. And of course there's no inheritance relationship
between lists and tuples.
Post by Victor Stinner
* frozendict has not the following methods: clear, __delitem__, pop,
popitem, setdefault, __setitem__ and update. As tuple/frozenset has
less methods than list/set.
It'd probably be simpler to define that frozendict is a Mapping (where
dict is a MutableMapping). And that's clearer.
Post by Victor Stinner
* Make dict inherits from frozendict
Isn't that the other way around from the statement above? Not that I'd
have an issue with it, it's much cleaner, but there's little gained by
doing so since `isinstance(a, dict)` will still fail if `a` is a
frozendict.
Post by Victor Stinner
* Add a frozendict abstract base class to collections?
Why? There's no `dict` ABC, and there are already a Mapping and a
MutableMapping ABC which fit the bill no?
Victor Stinner
2012-02-27 21:28:22 UTC
Post by Xavier Morel
This may be an issue at the C level (I'm not sure), but since this would
be a Python 3-only collection, "user" code (in Python) should/would
generally be using abstract base classes, so type-checking would not
be an issue (as in Python code performing `isinstance(a, dict)` checks
naturally failing on `frozendict`)
Plus `frozenset` does not inherit from `set`, it's a whitelist
reimplementation and I've never known anybody to care. So there's
that precedent. And of course there's no inheritance relationship
between lists and tuples.
On second thought, I realized that it does not really matter.
frozendict and dict can be "unrelated" (no inheritance relation).

Victor
Jim J. Jewett
2012-02-27 22:50:35 UTC
In http://mail.python.org/pipermail/python-dev/2012-February/116955.html
Post by Victor Stinner
The blacklist implementation has a major issue: it is still possible
to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict,
key, value)).
It is also possible to use ctypes and violate even more invariants.
For most purposes, this falls under "consenting adults".
Post by Victor Stinner
The whitelist implementation has an issue: frozendict and dict are not
"compatible", dict is not a subclass of frozendict (and frozendict is
not a subclass of dict).
And because of Liskov substitutability, they shouldn't be; they should
be sibling children of a basedict that doesn't have the mutating
methods, but also doesn't *promise* not to mutate.
Post by Victor Stinner
* frozendict values must be immutable, as dict keys
Why? That may be useful, but an immutable dict whose values
might mutate is also useful; by forcing that choice, it starts
to feel too specialized for a builtin.
Post by Victor Stinner
* Add an hash field to the PyDictObject structure
That is another indication that it should really be a sibling class;
most of the uses I have had for immutable dicts still didn't need
hashing. It might be a worth adding anyhow, but only to immutable
dicts -- not to every instance dict or keywords parameter.
Post by Victor Stinner
* frozendict.__hash__ computes hash(frozenset(self.items())) and
caches the result is its private hash attribute
Why? hash(frozenset(self.keys())) would still meet the hash contract,
but it would be approximately twice as fast, and I can think of only
one case where it wouldn't work just as well. (That case is wanting
to store a dict of alternative configuration dicts (with no defaulting
of values), but ALSO wanting to use the configurations themselves
(as opposed to their names) as the dict keys.)
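The two options can be compared with two hypothetical helper functions over plain dicts. Both meet the hash contract (equal mappings have equal keys, hence equal key-only hashes), but the key-only variant gives every mapping with the same set of keys the same hash, which is exactly the configuration-dict case described above. (The speed claim is not measured here.)

```python
def items_hash(d):
    """Hypothetical helper: hash over keys and values (the proposal)."""
    return hash(frozenset(d.items()))


def keys_hash(d):
    """Hypothetical helper: hash over keys only (the suggested alternative)."""
    return hash(frozenset(d.keys()))


config_a = {"host": "localhost", "port": 8080}
config_b = {"host": "example.org", "port": 80}

# Same keys, so the key-only hashes always collide.
print(keys_hash(config_a) == keys_hash(config_b))  # True
# The items hash normally tells them apart (collisions stay possible but
# are harmless: dict lookups still compare keys for equality).
print(items_hash(config_a) == items_hash(config_b))
```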

-jJ
--
If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them. -jJ
Victor Stinner
2012-02-27 23:34:08 UTC
Post by Jim J. Jewett
Post by Victor Stinner
The blacklist implementation has a major issue: it is still possible
to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict,
key, value)).
It is also possible to use ctypes and violate even more invariants.
For most purposes, this falls under "consenting adults".
My primary usage of frozendict would be pysandbox, a security module.
Attackers are not consenting adults :-)

Read-only dict would also help optimization, in the CPython peephole
or the PyPy JIT.

In pysandbox, I'm trying to replace __builtins__ (and maybe also
type.__dict__) by a frozendict. These objects rely on the PyDict API and
so expect a type "compatible" with dict. But PyDict_GetItem() and
PyDict_SetItem() may use a test like isinstance(obj, (dict,
frozendict)), especially if the C structure is "compatible". But
pysandbox should not drive the design of frozendict :-)
Post by Jim J. Jewett
Post by Victor Stinner
The whitelist implementation has an issue: frozendict and dict are not
"compatible", dict is not a subclass of frozendict (and frozendict is
not a subclass of dict).
And because of Liskov substitutability, they shouldn't be; they should
be sibling children of a basedict that doesn't have the mutating
methods, but also doesn't *promise* not to mutate.
As I wrote, I realized that it doesn't matter if dict doesn't inherit
from frozendict.
Post by Jim J. Jewett
Post by Victor Stinner
 * frozendict values must be immutable, as dict keys
Why?  That may be useful, but an immutable dict whose values
might mutate is also useful; by forcing that choice, it starts
to feel too specialized for a builtin.
If values are mutable, the frozendict cannot be called "immutable".
tuple and frozenset can only contain immutable values.

All implementations of frozendict that I found expect frozendict to be hashable.
Post by Jim J. Jewett
Post by Victor Stinner
 * frozendict.__hash__ computes hash(frozenset(self.items())) and
caches the result is its private hash attribute
Why?  hash(frozenset(self.keys())) would still meet the hash contract,
but it would be approximately twice as fast, and I can think of only
one case where it wouldn't work just as well.
Yes, it would be faster, but the hash is usually computed over the whole
object content. E.g. the hash of a tuple is not the hash of its items at
odd indexes, even though such a hash function would also meet the "hash
contract".

All implementations of frozendict that I found use items, not
only values or only keys.

Victor
Tres Seaver
2012-02-27 23:42:24 UTC
Post by Victor Stinner
tuple and frozenset can only contain immutables values.
Tuples can contain mutables::

$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> ({},)
({},)
$ python3
Python 3.2 (r32:88445, Mar 10 2011, 10:08:58)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> ({},)
({},)
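The point can be taken one step further in plain Python: a tuple may hold a mutable dict, and that dict can still be changed in place, but the tuple then becomes unhashable because hashing a tuple hashes every item.

```python
t = ({},)          # a tuple holding a mutable dict is allowed
t[0]["key"] = 1    # and the dict inside it can still be mutated
print(t)           # ({'key': 1},)

try:
    hash(t)        # tuple hashing recurses into its items
    hashable = True
except TypeError:  # a dict item is unhashable
    hashable = False
print(hashable)    # False
```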


Tres.
--
===================================================================
Tres Seaver +1 540-429-0999 ***@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
Nick Coghlan
2012-02-28 01:00:08 UTC
On Tue, Feb 28, 2012 at 9:34 AM, Victor Stinner
Post by Victor Stinner
Post by Jim J. Jewett
Post by Victor Stinner
The blacklist implementation has a major issue: it is still possible
to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict,
key, value)).
It is also possible to use ctypes and violate even more invariants.
For most purposes, this falls under "consenting adults".
My primary usage of frozendict would be pysandbox, a security module.
Attackers are not consenting adults :-)
Read-only dict would also help optimization, in the CPython peephole
or the PyPy JIT.
I'm pretty sure the PyPy jit can already pick up and optimise cases
where a dict goes "read-only" (i.e. stops being modified).

I think you need to elaborate on your use cases further, and explain
what *additional* changes would be needed, such as allowing frozendict
instances as __dict__ attributes in order to create truly immutable
objects in pure Python code.

In fact, that may be a better way to pitch the entire PEP. In current
Python, you *can't* create a truly immutable object without dropping
down to C:

>>> from decimal import Decimal
>>> x = Decimal(1)
>>> x
Decimal('1')
>>> hash(x)
1
>>> x._exp = 10
>>> x
Decimal('1E+10')
>>> hash(x)
10000000000
>>> 1.0.imag = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute 'imag' of 'float' objects is not writable

Yes, it's arguably covered by the "consenting adults" rule, but
really, Decimal instances should be just as immutable as int and float
instances. The only reason they aren't is that it's hard enough to set
it up in Python code that the Decimal implementation settles for "near
enough is good enough" and just uses __slots__ to prevent addition of
new attributes, but doesn't introduce the overhead of custom
__setattr__ and __delattr__ implementations to actively *prevent*
modifications.

We don't even need a new container type, we really just need an easy
way to tell the __setattr__ and __delattr__ descriptors for
"__slots__" that the instance initialisation is complete and further
modifications should be disallowed.

For example, if Decimal.__new__ could call "self.__lock_slots__()" at
the end to set a flag on the instance object, then the slot
descriptors could refuse further modifications:

>>> x._exp = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute '_exp' of 'Decimal' objects is not writable

To be clear, all of this is currently *possible* if you use custom
descriptors (such as a property() implementation where setattr and
delattr look for such a flag) or override __setattr__/__delattr__.
However, for a micro-optimised type like Decimal, that's a hard choice
to be asked to make (and the current implementation came down on the
side of speed over enforcing correctness). Given that using __slots__
in the first place is, in and of itself, a micro-optimisation, I
suspect Decimal is far from the only "immutable" type implemented in
pure Python that finds itself having to make that trade-off. (An extra
boolean check in C is a *good* trade-off of speed for correctness.
Python level descriptor implementations or attribute access overrides,
on the other hand... not so much).
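For comparison, a sketch of the Python-level workaround mentioned above: a hypothetical `LockedPoint` class with `__slots__` overrides `__setattr__` and `__delattr__` and checks a private `_locked` flag, set as the last step of `__init__`. (`__lock_slots__` in the message is a hypothetical API and is not used here.) The extra Python-level check runs on every assignment, which is the overhead being weighed.

```python
class LockedPoint:
    """Hypothetical class that becomes read-only once __init__ finishes."""

    __slots__ = ("x", "y", "_locked")

    def __init__(self, x, y):
        self.x = x          # allowed: _locked is not set yet
        self.y = y
        self._locked = True  # last permitted assignment

    def _check_unlocked(self, name):
        # getattr() falls back to False while the _locked slot is empty.
        if getattr(self, "_locked", False):
            raise AttributeError(
                f"attribute {name!r} of {type(self).__name__!r} "
                "objects is not writable")

    def __setattr__(self, name, value):
        self._check_unlocked(name)
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        self._check_unlocked(name)
        object.__delattr__(self, name)


p = LockedPoint(1, 2)
try:
    p.x = 10
    modified = True
except AttributeError:
    modified = False
print(p.x, modified)  # 1 False
```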

Cheers,
Nick.
--
Nick Coghlan   |   ***@gmail.com   |   Brisbane, Australia
Alex Gaynor
2012-02-28 01:20:34 UTC
Post by Nick Coghlan
I'm pretty sure the PyPy jit can already pick up and optimise cases
where a dict goes "read-only" (i.e. stops being modified).
No, it doesn't. We handle cases like a type's dict, or a module's dict,
by having them use a different internal implementation (while, of course,
still being dicts at the Python level). We do *not* handle the case of
trying to figure out whether a Python object is immutable in any way.

Alex
Victor Stinner
2012-02-28 11:45:54 UTC
I think you need to elaborate on your use cases further, ...
A frozendict can be used as a member of a set or as a key in a dictionary.

For example, frozendict is indirectly needed when you want to use an
object as a key of a dict, and one attribute of this object is a
dict. Using a frozendict instead of a dict for this attribute solves
this problem.

frozendict also helps with threading and multiprocessing.

--
... and explain
what *additional* changes would be needed, such as allowing frozendict
instances as __dict__ attributes in order to create truly immutable
objects in pure Python code.
In current Python, you *can't* create a truly immutable object without dropping
Using frozendict for the type dictionary might be a use case, but
please don't focus on this example. There is currently a discussion on
python-ideas about this specific use case. I first proposed to use
frozendict in type.__new__, but then I proposed something completely
different: add a flag to a type to deny any modification of the type.
The flag may be set using "__final__ = True" in the class body, for
example.

Victor
Antoine Pitrou
2012-02-28 11:53:27 UTC
On Tue, 28 Feb 2012 12:45:54 +0100
Post by Victor Stinner
I think you need to elaborate on your use cases further, ...
A frozendict can be used as a member of a set or as a key in a dictionary.
For example, frozendict is indirectly needed when you want to use an
object as a key of a dict, whereas one attribute of this object is a
dict.
It isn't. You just have to define __hash__ correctly.
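A sketch of this suggestion, with a hypothetical `Query` class whose `params` attribute is a plain dict: `__eq__` and `__hash__` are both derived from a frozen snapshot of that dict. It only stays correct as long as nobody mutates `params` after the object has been hashed; nothing enforces that.

```python
class Query:
    """Hypothetical object with a dict attribute, usable as a dict key."""

    def __init__(self, table, params):
        self.table = table
        self.params = dict(params)

    def _key(self):
        # Freeze the dict attribute into a hashable, order-independent form.
        return (self.table, frozenset(self.params.items()))

    def __eq__(self, other):
        if not isinstance(other, Query):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


cache = {Query("users", {"id": 1}): "row 1"}
print(cache[Query("users", {"id": 1})])  # row 1
```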
Post by Victor Stinner
frozendict helps also in threading and multiprocessing.
How so?

Regards

Antoine.
Mark Shannon
2012-02-28 12:07:32 UTC
Post by Antoine Pitrou
On Tue, 28 Feb 2012 12:45:54 +0100
Post by Victor Stinner
I think you need to elaborate on your use cases further, ...
A frozendict can be used as a member of a set or as a key in a dictionary.
For example, frozendict is indirectly needed when you want to use an
object as a key of a dict, whereas one attribute of this object is a
dict.
It isn't. You just have to define __hash__ correctly.
Post by Victor Stinner
frozendict helps also in threading and multiprocessing.
How so?
Inter process/task communication requires copying. Inter/intra thread
communication uses reference semantics. To ensure these are the same,
the objects used in communication must be immutable.
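The difference between the two semantics can be shown with the standard library: `queue.Queue` (threads) hands over the very same object, while data sent between processes is serialized, roughly like the `pickle` round trip below, and arrives as a copy. A mutation by the sender after sending is visible through the shared reference but not in the copy; with an immutable object the two cases cannot be told apart.

```python
import pickle
import queue

message = {"status": "ok"}

# Thread-style communication: the receiver gets the same object.
q = queue.Queue()
q.put(message)
received_by_thread = q.get()

# Process-style communication (approximated): serialize and rebuild.
received_by_process = pickle.loads(pickle.dumps(message))

message["status"] = "changed"          # the sender mutates after sending
print(received_by_thread["status"])    # changed
print(received_by_process["status"])   # ok
```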

Cheers,
Mark.
Paul Moore
2012-02-28 12:13:52 UTC
Post by Mark Shannon
Post by Antoine Pitrou
Post by Victor Stinner
frozendict helps also in threading and multiprocessing.
How so?
Inter process/task communication requires copying. Inter/intra thread
communication uses reference semantics. To ensure these are the same,
the objects used in communication must be immutable.
Does that imply that in a frozendict, the *values* as well as the
*keys* must be immutable?

Isn't that a pretty strong limitation (and hence, does it not make
frozendicts a lot less useful than they might otherwise be)?
Antoine Pitrou
2012-02-28 12:11:08 UTC
On Tue, 28 Feb 2012 12:07:32 +0000
Post by Mark Shannon
Post by Antoine Pitrou
On Tue, 28 Feb 2012 12:45:54 +0100
Post by Victor Stinner
I think you need to elaborate on your use cases further, ...
A frozendict can be used as a member of a set or as a key in a dictionary.
For example, frozendict is indirectly needed when you want to use an
object as a key of a dict, whereas one attribute of this object is a
dict.
It isn't. You just have to define __hash__ correctly.
Post by Victor Stinner
frozendict helps also in threading and multiprocessing.
How so?
Inter process/task communication requires copying. Inter/intra thread
communication uses reference semantics. To ensure these are the same,
the objects used in communication must be immutable.
You just need them to be practically constant. No need for an immutable
type in the first place.

Regards

Antoine.
Victor Stinner
2012-02-28 12:17:47 UTC
Post by Antoine Pitrou
Post by Victor Stinner
A frozendict can be used as a member of a set or as a key in a dictionary.
For example, frozendict is indirectly needed when you want to use an
object as a key of a dict, whereas one attribute of this object is a
dict.
It isn't. You just have to define __hash__ correctly.
Defining __hash__ on a mutable object can be surprising.

Or do you mean that you somehow deny modification of the dict
attribute, and convert the dict to an immutable object before hashing
it?
Post by Antoine Pitrou
Post by Victor Stinner
frozendict helps also in threading and multiprocessing.
How so?
For example, you don't need a lock to read the frozendict content,
because you cannot modify the content.

Victor
Mark Shannon
2012-02-28 12:32:10 UTC
Hi,

I don't know if an implementation of the frozendict actually exists,
but if anyone is planning on writing one then can I suggest that they
take a look at my new dict implementation:
http://bugs.python.org/issue13903
https://bitbucket.org/markshannon/cpython_new_dict/

Making dicts immutable (at the C level) is quite easy with my new
implementation.

Cheers,
Mark.
Mark Shannon
2012-02-28 09:47:59 UTC
Post by Victor Stinner
Post by Jim J. Jewett
Post by Victor Stinner
The blacklist implementation has a major issue: it is still possible
to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict,
key, value)).
It is also possible to use ctypes and violate even more invariants.
For most purposes, this falls under "consenting adults".
My primary usage of frozendict would be pysandbox, a security module.
Attackers are not consenting adults :-)
Read-only dict would also help optimization, in the CPython peephole
or the PyPy JIT.
Not w.r.t. PyPy. It wouldn't do any harm though.

One use of frozendict that you haven't mentioned so far
is communication between concurrent processes/tasks.
These need to be able to copy objects without changing reference
semantics, which demands immutability.

Cheers,
Mark.
Victor Stinner
2012-02-28 17:41:37 UTC
 * frozendict values must be immutable, as dict keys
Why?  That may be useful, but an immutable dict whose values
might mutate is also useful; by forcing that choice, it starts
to feel too specialized for a builtin.
Hum, I realized that calling hash(my_frozendict) on a frozendict
instance is enough to check if a frozendict only contains immutable
objects. And it is also possible to check manually that values are
immutable *before* creating the frozendict.
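The up-front check can be sketched with a hypothetical helper, `all_values_hashable()`, which relies on `hash()` raising `TypeError` for unhashable values (including tuples that contain a list). Hashability is only a proxy for immutability: instances of ordinary user classes are hashable by identity and still mutable.

```python
def all_values_hashable(mapping):
    """Hypothetical pre-check: True if every value in the mapping hashes."""
    for value in mapping.values():
        try:
            hash(value)
        except TypeError:
            return False
    return True


print(all_values_hashable({"a": (1, 2), "b": "x"}))  # True
print(all_values_hashable({"a": [1, 2]}))            # False
print(all_values_hashable({"a": ([],)}))             # False
```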

I also prefer to not check for immutability because it does simplify
the code :-)

$ diffstat frozendict-3.patch
Include/dictobject.h | 9 +
Lib/collections/abc.py | 1
Lib/test/test_dict.py | 59 +++++++++++
Objects/dictobject.c | 256 ++++++++++++++++++++++++++++++++++++++++++-------
Objects/object.c | 3
Python/bltinmodule.c | 1
6 files changed, 295 insertions(+), 34 deletions(-)

The patch is quite small to add a new builtin type. That's because
most of the code is shared with the builtin dict type. (But the patch
doesn't include the documentation; I haven't written it yet.)

Victor
Mark Shannon
2012-02-28 18:13:01 UTC
Post by Victor Stinner
Post by Jim J. Jewett
Post by Victor Stinner
* frozendict values must be immutable, as dict keys
Why? That may be useful, but an immutable dict whose values
might mutate is also useful; by forcing that choice, it starts
to feel too specialized for a builtin.
Hum, I realized that calling hash(my_frozendict) on a frozendict
instance is enough to check if a frozendict only contains immutable
objects. And it is also possible to check manually that values are
immutable *before* creating the frozendict.
I also prefer to not check for immutability because it does simplify
the code :-)
$ diffstat frozendict-3.patch
Include/dictobject.h | 9 +
Lib/collections/abc.py | 1
Lib/test/test_dict.py | 59 +++++++++++
Objects/dictobject.c | 256 ++++++++++++++++++++++++++++++++++++++++++-------
Objects/object.c | 3
Python/bltinmodule.c | 1
6 files changed, 295 insertions(+), 34 deletions(-)
The patch is quite small to add a new builtin type. That's because
most of the code is shared with the builtin dict type. (But the patch
doesn't include the documentation, it didn't write it yet.)
Could you create an issue for this on the tracker, and maybe write a PEP?
I don't think sending patches to this mailing list is the way to do this.

Would you mind taking a look at how your code interacts with PEP 412?

Cheers,
Mark.
Dirkjan Ochtman
2012-02-28 08:25:09 UTC
On Mon, Feb 27, 2012 at 19:53, Victor Stinner
Post by Victor Stinner
A frozendict type is a common request from users and there are various
Perhaps this should also detail why namedtuple is not a viable alternative.

Cheers,

Dirkjan
Victor Stinner
2012-02-28 10:12:02 UTC
Post by Victor Stinner
A frozendict type is a common request from users and there are various
Perhaps this should also detail why namedtuple is not a viable alternative.
It doesn't have the same API. Example: frozendict[key] vs
namedtuple.attr (namedtuple.key). namedtuple has no .keys() or
.items() method.
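To make the API difference concrete (standard-library calls only; `Point` and its fields are made-up names): a `namedtuple` is read by attribute and exposes `_fields` and `_asdict()` rather than `keys()` and `items()`, and its field names are fixed by the class and must be valid identifiers, whereas a mapping accepts any hashable key.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)

print(p.x)                 # 1: attribute access; p["x"] raises TypeError
print(hasattr(p, "keys"))  # False: no keys()/items() mapping API
print(p._fields)           # ('x', 'y'): field names fixed by the class
print(dict(p._asdict()))   # {'x': 1, 'y': 2}

# Field names must be valid identifiers, so arbitrary keys are impossible.
try:
    namedtuple("Bad", ["not valid"])
    rejected = False
except ValueError:
    rejected = True
print(rejected)            # True
```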

Victor
Victor Stinner
2012-02-28 12:14:15 UTC
Updated patch and more justifications.

New patch:
- dict doesn't inherit from frozendict anymore
- frozendict is a subclass of collections.abc.Mapping
- more tests
 * frozendict.__hash__ computes hash(frozenset(self.items())) and
caches the result is its private hash attribute
hash(frozenset(self.items())) is preferred over
hash(tuple(sorted(self.items()))) because keys and values may be unorderable.
frozenset() is also faster than sorted(): O(n) vs O(n*log(n)).
>>> a=frozendict.fromkeys('ai')
>>> a
frozendict({'a': None, 'i': None})
>>> b=frozendict.fromkeys('ia')
>>> b
frozendict({'i': None, 'a': None})
>>> hash(a) == hash(b)
True
>>> a == b
True
>>> tuple(a.items()) == tuple(b.items())
False
>>> hash(frozendict({b'abc': 1, 'abc': 2}))
935669091
>>> hash(frozendict({1: b'abc', 2: 'abc'}))
1319859033
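The unorderable-keys argument can be checked with plain dicts and the same mixed `bytes`/`str` keys as in the session: sorting the items raises `TypeError` on Python 3, while a frozenset only needs the items to be hashable and ignores insertion order. (`sorted()` returns a list, so a sorted-based hash needs `tuple(...)` as well.)

```python
items = {b"abc": 1, "abc": 2}.items()

try:
    hash(tuple(sorted(items)))   # needs an ordering between the keys
    sortable = True
except TypeError:                # bytes and str cannot be compared
    sortable = False
print(sortable)                  # False

frozen_hash = hash(frozenset(items))  # only needs hashable items

a = dict.fromkeys("ai")
b = dict.fromkeys("ia")
print(hash(frozenset(a.items())) == hash(frozenset(b.items())))  # True
print(tuple(a.items()) == tuple(b.items()))                      # False
```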
 * Add a frozendict abstract base class to collections?
I realized that Mapping already exists and so the following patch is enough:

+Mapping.register(frozendict)
See also the PEP 351.
I read the PEP and the email explaining why it was rejected.

Just to be clear: PEP 351 tries to freeze an object, i.e. to
convert a mutable or immutable object to an immutable object. In
contrast, my frozendict proposal doesn't convert anything: it just
raises a TypeError if you use a mutable key or value.

For example, frozendict({'list': ['a', 'b', 'c']}) doesn't create
frozendict({'list': ('a', 'b', 'c')}) but raises a TypeError.
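The distinction can be sketched with two hypothetical helpers. `freeze()` converts mutable containers into immutable equivalents in the spirit of PEP 351 (the PEP itself proposed a `__freeze__` protocol; this recursive list/set conversion is a simplification), while `check_frozen_value()` mirrors the proposed frozendict behaviour and simply refuses.

```python
def freeze(obj):
    """Hypothetical PEP 351-style helper: convert to an immutable equivalent."""
    if isinstance(obj, list):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, set):
        return frozenset(freeze(item) for item in obj)
    return obj  # assume anything else is already immutable


def check_frozen_value(value):
    """Hypothetical frozendict-style check: refuse, don't convert."""
    hash(value)  # raises TypeError for a list, dict, set, ...
    return value


print(freeze(["a", "b", "c"]))  # ('a', 'b', 'c')

try:
    check_frozen_value(["a", "b", "c"])
    rejected = False
except TypeError:
    rejected = True
print(rejected)                 # True
```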

Victor
Antoine Pitrou
2012-02-28 12:22:16 UTC
On Tue, 28 Feb 2012 13:14:15 +0100
Post by Victor Stinner
Post by Victor Stinner
See also the PEP 351.
I read the PEP and the email explaining why it was rejected.
I think you should write a separate PEP and explain the use cases
clearly.

cheers

Antoine.
M.-A. Lemburg
2012-02-28 12:44:20 UTC
Post by Victor Stinner
Post by Victor Stinner
See also the PEP 351.
I read the PEP and the email explaining why it was rejected.
Just to be clear: the PEP 351 tries to freeze an object, try to
convert a mutable or immutable object to an immutable object. Whereas
my frozendict proposition doesn't convert anything: it just raises a
TypeError if you use a mutable key or value.
For example, frozendict({'list': ['a', 'b', 'c']}) doesn't create
frozendict({'list': ('a', 'b', 'c')}) but raises a TypeError.
I fail to see the use case you're trying to address with this
kind of frozendict().

The purpose of frozenset() is to be able to use a set as dictionary
key (and to some extent allow for optimizations and safe
iteration). Your implementation can be used as dictionary key as well,
but why would you want to do that in the first place ?

If you're thinking about disallowing changes to the dictionary
structure, e.g. in order to safely iterate over its keys or items,
"freezing" the keys is enough.

Requiring the value objects not to change is too much of a restriction
to make the type useful in practice, IMHO.
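The difference between freezing the structure and freezing the values can be observed on a plain dict: rebinding the value of an existing key while iterating is allowed, whereas adding a key changes the dict's size and makes the iterator raise `RuntimeError`.

```python
d = {"a": 1, "b": 2}

# Rebinding values of existing keys does not disturb iteration.
for key in d:
    d[key] += 10
print(d)  # {'a': 11, 'b': 12}

# Changing the key structure does.
try:
    for key in d:
        d[key + "_copy"] = d[key]
    structure_change_ok = True
except RuntimeError:  # "dictionary changed size during iteration"
    structure_change_ok = False
print(structure_change_ok)  # False
```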
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Feb 28 2012)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2012-02-13: Released eGenix pyOpenSSL 0.13 http://egenix.com/go26
2012-02-09: Released mxODBC.Zope.DA 2.0.2 http://egenix.com/go25
2012-02-06: Released eGenix mx Base 3.2.3 http://egenix.com/go24

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
Steven D'Aprano
2012-02-28 14:56:52 UTC
Post by M.-A. Lemburg
Post by Victor Stinner
Post by Victor Stinner
See also the PEP 351.
I read the PEP and the email explaining why it was rejected.
Just to be clear: the PEP 351 tries to freeze an object, try to
convert a mutable or immutable object to an immutable object. Whereas
my frozendict proposition doesn't convert anything: it just raises a
TypeError if you use a mutable key or value.
For example, frozendict({'list': ['a', 'b', 'c']}) doesn't create
frozendict({'list': ('a', 'b', 'c')}) but raises a TypeError.
I fail to see the use case you're trying to address with this
kind of frozendict().
The purpose of frozenset() is to be able to use a set as dictionary
key (and to some extent allow for optimizations and safe
iteration). Your implementation can be used as dictionary key as well,
but why would you want to do that in the first place ?
Because you have a mapping, and want to use a dict for speedy, convenient
lookups. Sometimes your mapping involves the key being a string, or an int, or
a tuple, or a set, and Python makes it easy to use that in a dict. Sometimes
the key is itself a mapping, and Python makes it very difficult.

Just google on "python frozendict" or "python immutabledict" and you will find
that this keeps coming up time and time again, e.g.:

http://www.cs.toronto.edu/~tijmen/programming/immutableDictionaries.html
http://code.activestate.com/recipes/498072-implementing-an-immutable-dictionary/
http://code.activestate.com/recipes/414283-frozen-dictionaries/
http://bob.pythonmac.org/archives/2005/03/04/frozendict/
http://python.6.n6.nabble.com/frozendict-td4377791.html
http://www.velocityreviews.com/forums/t648910-does-python3-offer-a-frozendict.html
http://stackoverflow.com/questions/2703599/what-would-be-a-frozen-dict
Post by M.-A. Lemburg
If you're thinking about disallowing changes to the dictionary
structure, e.g. in order to safely iterate over its keys or items,
"freezing" the keys is enough.
Requiring the value objects not to change is too much of a restriction
to make the type useful in practice, IMHO.
It's no more of a limitation than the limitation that strings can't change.

Frozendicts must freeze the value as well as the key. Consider the toy
example, mapping food combinations to calories:


d = { {appetizer => fried fish, main => double burger, drink => cola}: 5000,
{appetizer => None, main => green salad, drink => tea}: 200,
}

(syntax is only for illustration purposes)

Clearly the hash has to take the keys and values into account, which means
that both the keys and values have to be frozen.

(Values may be mutable objects, but then the frozendict can't be hashed --
just like tuples can't be hashed if any item in them is mutable.)
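The food example can be made runnable with a minimal hypothetical `frozendict` stand-in built on `collections.abc.Mapping` (not the proposed builtin). Its hash covers both keys and values, so a lookup succeeds regardless of key order, and a mutable value makes the instance unhashable, as in the parenthetical above.

```python
from collections.abc import Mapping


class frozendict(Mapping):
    """Minimal hypothetical stand-in for the proposed builtin."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Keys *and* values feed the hash, so both must be hashable.
        return hash(frozenset(self._data.items()))


calories = {
    frozendict(appetizer="fried fish", main="double burger", drink="cola"): 5000,
    frozendict(appetizer=None, main="green salad", drink="tea"): 200,
}

meal = frozendict(main="green salad", drink="tea", appetizer=None)
print(calories[meal])  # 200 (key order does not matter)

try:
    hash(frozendict(main=["salad", "dressing"]))
    hashable = True
except TypeError:  # a list value makes the frozendict unhashable
    hashable = False
print(hashable)  # False
```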
--
Steven
M.-A. Lemburg
2012-02-28 20:34:43 UTC
Post by M.-A. Lemburg
Post by Victor Stinner
Post by Victor Stinner
See also the PEP 351.
I read the PEP and the email explaining why it was rejected.
Just to be clear: the PEP 351 tries to freeze an object, try to
convert a mutable or immutable object to an immutable object. Whereas
my frozendict proposition doesn't convert anything: it just raises a
TypeError if you use a mutable key or value.
For example, frozendict({'list': ['a', 'b', 'c']}) doesn't create
frozendict({'list': ('a', 'b', 'c')}) but raises a TypeError.
I fail to see the use case you're trying to address with this
kind of frozendict().
The purpose of frozenset() is to be able to use a set as dictionary
key (and to some extent allow for optimizations and safe
iteration). Your implementation can be used as dictionary key as well,
but why would you want to do that in the first place ?
Because you have a mapping, and want to use a dict for speedy, convenient lookups. Sometimes your
mapping involves the key being a string, or an int, or a tuple, or a set, and Python makes it easy
to use that in a dict. Sometimes the key is itself a mapping, and Python makes it very difficult.
Just google on "python frozendict" or "python immutabledict" and you will find that this keeps coming up, e.g.:
http://www.cs.toronto.edu/~tijmen/programming/immutableDictionaries.html
http://code.activestate.com/recipes/498072-implementing-an-immutable-dictionary/
http://code.activestate.com/recipes/414283-frozen-dictionaries/
http://bob.pythonmac.org/archives/2005/03/04/frozendict/
http://python.6.n6.nabble.com/frozendict-td4377791.html
http://www.velocityreviews.com/forums/t648910-does-python3-offer-a-frozendict.html
http://stackoverflow.com/questions/2703599/what-would-be-a-frozen-dict
Only the first of those links appears to actually discuss reasons for
adding a frozendict, but it fails to provide real world use cases and
only gives theoretical reasons for why this would be nice to have.
From a practical point of view, a frozendict would allow thread-safe iteration
over a dict and enable more optimizations (e.g. using an optimized
lookup function, optimized hash parameters, etc.) to make lookup
in static tables more efficient.
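A frozendict would rule out the following failure mode by construction. With a plain dict, CPython raises RuntimeError when the dict changes size while it is being iterated. This minimal sketch triggers the error from a single thread so the result is deterministic; the function name is hypothetical. With several threads, the same mutation could happen at any point during the loop.

```python
def add_while_iterating(d):
    """Iterate over a dict while inserting into it; return the error message."""
    try:
        for key in d:
            d[key + "_copy"] = d[key]  # changes the dict size mid-iteration
    except RuntimeError as exc:
        return str(exc)
    return None


print(add_while_iterating({"a": 1, "b": 2}))
# dictionary changed size during iteration
```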

OTOH, using a frozendict as key in some other dictionary is, well,
not a very realistic use case - programmers should think twice before
using such a design :-)
Post by M.-A. Lemburg
If you're thinking about disallowing changes to the dictionary
structure, e.g. in order to safely iterate over its keys or items,
"freezing" the keys is enough.
Requiring the value objects not to change is too much of a restriction
to make the type useful in practice, IMHO.
It's no more of a limitation than the limitation that strings can't change.
Frozendicts must freeze the value as well as the key. Consider the toy example, mapping food combinations to calories:
d = { {appetizer => fried fish, main => double burger, drink => cola}: 5000,
{appetizer => None, main => green salad, drink => tea}: 200,
}
(syntax is only for illustration purposes)
Clearly the hash has to take the keys and values into account, which means that both the keys and
values have to be frozen.
(Values may be mutable objects, but then the frozendict can't be hashed -- just like tuples can't be
hashed if any item in them is mutable.)
Right, but that doesn't mean you have to require that values are hashable.

A frozendict could (and probably should) use the same logic as tuples:
if the values are hashable, the frozendict is hashable, otherwise not.
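Tuples follow this rule: a tuple can hold anything, but it only hashes when all its items do. Here is a minimal sketch of the same rule for mappings. It assumes a hypothetical TupleLikeFrozenDict that accepts any values and defers the hashability check to __hash__. The hash is cached after the first successful call, as Victor's original proposal describes.

```python
from collections.abc import Mapping


class TupleLikeFrozenDict(Mapping):
    """Read-only mapping that is hashable only when all its values are.

    Hypothetical sketch of the tuple-like rule: any value is accepted at
    construction time; hashability is checked lazily in __hash__.
    """

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None  # cached after the first successful hash()

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        if self._hash is None:
            # Raises TypeError here, not in __init__, if a value is unhashable.
            self._hash = hash(frozenset(self._data.items()))
        return self._hash


ok = TupleLikeFrozenDict(a=1, b=(2, 3))
print(hash(ok) == hash(TupleLikeFrozenDict(b=(2, 3), a=1)))  # True

holds_list = TupleLikeFrozenDict(a=[1, 2])  # construction succeeds, like (1, [2])
try:
    hash(holds_list)
except TypeError as exc:
    print("not hashable:", exc)
```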
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Feb 28 2012)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2012-02-13: Released eGenix pyOpenSSL 0.13 http://egenix.com/go26
2012-02-09: Released mxODBC.Zope.DA 2.0.2 http://egenix.com/go25
2012-02-06: Released eGenix mx Base 3.2.3 http://egenix.com/go24

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
Raymond Hettinger
2012-02-29 19:17:05 UTC
Permalink
Post by Victor Stinner
A frozendict type is a common request from users and there are various
implementations.
ISTM, this request is never from someone who has a use case.
Instead, it almost always comes from "completers", people
who see that we have a frozenset type and think the core devs
missed the ObviousThingToDo(tm). Frozendicts are trivial to
implement, so that is why there are various implementations
(i.e. the implementations are more fun to write than they are to use).

The frozenset type covers a niche case that is nice-to-have but
*rarely* used. Many experienced Python users simply forget
that we have a frozenset type. We don't get bug reports or
feature requests about the type. When I do Python consulting
work, I never see it in a client's codebase. It does occasionally
get discussed in questions on StackOverflow but rarely gets
offered as an answer (typically on variants of the "how do you
make a set-of-sets" question). If Google's codesearch were still
alive, we could add another datapoint showing how infrequently
this type is used.

I wrote the C implementation for frozensets and the tests that
demonstrate their use in problems involving sets-of-sets, yet
I have *needed* the frozenset once in my career (for an NFA/DFA
conversion algorithm).

From this experience, I conclude that adding a frozendict type
would be a total waste (except that it would inspire more people
to request frozen variants of other containers).


Raymond


P.S. The one advantage I can see for frozensets and frozendicts
is that we have an opportunity to optimize them once they are built
(optimizing insertion order to minimize collisions, increasing or
decreasing density, eliminating dummy entries, etc). That being
said, the same could be accomplished for regular sets and dicts
by the addition of an optimize() method. I'm not really enamoured
of that idea though because it breaks the abstraction and because
people don't seem to need it (i.e. it has never been requested).
Eli Bendersky
2012-02-29 19:33:43 UTC
Permalink
Post by Raymond Hettinger
The frozenset type covers a niche case that is nice-to-have but
*rarely* used. Many experienced Python users simply forget
that we have a frozenset type. We don't get bug reports or
feature requests about the type. When I do Python consulting
work, I never see it in a client's codebase. It does occasionally
get discussed in questions on StackOverflow but rarely gets
offered as an answer (typically on variants of the "how do you
make a set-of-sets" question). If Google's codesearch were still
alive, we could add another datapoint showing how infrequently
this type is used.
<snip>
There are some alternatives to code.google.com, though. For example:

http://www.koders.com/default.aspx?s=frozenset&submit=Search&la=Python&li=*
From a cursory look: quite a bit of the found results are from the various
Python implementations, and there is some duplication of projects, but it
would be unfair to conclude that frozenset is not being used since many of
the results do look legitimate. This is not to argue in favor of or against
frozendict, just stating that there's still a way to search code online :)

Eli
Paul Moore
2012-02-29 21:08:20 UTC
Permalink
On 29 February 2012 19:17, Raymond Hettinger
Post by Raymond Hettinger
From this experience, I conclude that adding a frozendict type
would be a total waste (except that it would inspire more people
to request frozen variants of other containers).
It would (apparently) help Victor to fix issues in his pysandbox
project. I don't know if a secure Python sandbox is an important
enough concept to warrant core changes to make it possible. However,
if Victor was saying that implementing this PEP was all that is needed
to implement a secure sandbox, then that would be a very different
claim, and likely much more compelling (to some, at least - I have no
personal need for a secure sandbox).

Victor quotes 6 implementations. I don't see any rationale (either in
the email that started this thread, or in the PEP) to explain why
these aren't good enough, and in particular why the implementation has
to be in the core. There's the hint in the PEP "If frozendict is used
to harden Python (security purpose), it must be implemented in C". But
why in the core (as opposed to an extension)? And why and how would
frozendict help in hardening Python?

As it stands, I don't find the PEP compelling. The hardening use case
might be significant but Victor needs to spell it out if it's to make
a difference.

Paul.
Nick Coghlan
2012-02-29 21:13:15 UTC
Permalink
Post by Paul Moore
As it stands, I don't find the PEP compelling. The hardening use case
might be significant but Victor needs to spell it out if it's to make
a difference.
+1

Avoiding-usenet-nod-syndrome'ly,
Nick.
--
Nick Coghlan   |   ***@gmail.com   |   Brisbane, Australia
Chris Angelico
2012-02-29 23:13:01 UTC
Permalink
Post by Paul Moore
It would (apparently) help Victor to fix issues in his pysandbox
project. I don't know if a secure Python sandbox is an important
enough concept to warrant core changes to make it possible.
If a secure Python sandbox had been available last year, we would
probably be still using Python at work for end-user scripting, instead
of having had to switch to JavaScript. At least, that would be the
case if this sandbox is what I think it is (we embed a scripting
language in our C++ main engine, and allow end users to customize and
partly drive our code). But features enabling that needn't be core; I
wouldn't object to having to get some third-party add-ons to make it
all work.

Chris Angelico
R. David Murray
2012-03-01 00:02:26 UTC
Permalink
Post by Chris Angelico
Post by Paul Moore
It would (apparently) help Victor to fix issues in his pysandbox
project. I don't know if a secure Python sandbox is an important
enough concept to warrant core changes to make it possible.
If a secure Python sandbox had been available last year, we would
probably be still using Python at work for end-user scripting, instead
of having had to switch to JavaScript. At least, that would be the
case if this sandbox is what I think it is (we embed a scripting
language in our C++ main engine, and allow end users to customize and
partly drive our code). But features enabling that needn't be core; I
wouldn't object to having to get some third-party add-ons to make it
all work.
I likewise am aware of a project where the availability of sandboxing
might be make-or-break for continuing to use Python. In this case
the idea would be sandboxing plugins called from a Python main program.
I *think* that Victor's project would enable that, but I haven't looked at
it closely.

--David
Raymond Hettinger
2012-02-29 23:25:43 UTC
Permalink
Post by Paul Moore
As it stands, I don't find the PEP compelling. The hardening use case
might be significant but Victor needs to spell it out if it's to make
a difference.
If his sandboxing project needs it, the type need not be public.
It can join dictproxy and structseq in our toolkit of internal types.

Adding frozendict() as a new public type is unnecessary
and undesirable -- a proliferation of types makes it harder to
decide which tool is the most appropriate for a given problem.
The itertools module ran into the issue early. Adding a new
itertool tends to make the whole module harder to figure-out.


Raymond

P.S. ISTM that lately Python is growing fatter without growing more
powerful or expressive. Generators, context managers, and decorators
were honking good ideas -- we need more of those rather than
minor variations on things we already have.

Plz forgive the typos -- I'm typing with one hand -- the other is holding
a squiggling baby :-)
Victor Stinner
2012-02-29 23:52:48 UTC
Permalink
Post by Paul Moore
It would (apparently) help Victor to fix issues in his pysandbox
project. I don't know if a secure Python sandbox is an important
enough concept to warrant core changes to make it possible.
Ok, let's talk about sandboxing and security.

The main idea of pysandbox is to reuse most of CPython but hide
"dangerous" functions and run untrusted code in a separate namespace.
The problem is to create the sandbox and ensure that it is not
possible to escape from this sandbox. pysandbox is still a
proof-of-concept, even if it works pretty well for short dummy
scripts. But pysandbox is not ready for real world programs.

pysandbox uses various "tricks" and "hacks" to create a sandbox. But
there is a major issue: the __builtins__ dict (or module) is available
and used everywhere (in module globals, in frames, in functions
globals, etc.), and can be modified. A read-only __builtins__ dict is
required to protect the sandbox. If untrusted code can modify
__builtins__, it can replace core functions like isinstance(), len(),
... and so modify code outside the sandbox.
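The leak described here can be reproduced with exec(). Code that runs in its own globals namespace, but is handed the real builtins dict, can rebind len() for every module. This is a minimal sketch and not pysandbox's mechanism. run_untrusted() is a hypothetical helper, and the original len is restored afterwards.

```python
import builtins


def run_untrusted(source):
    """Hypothetical helper: run source in its own globals dict that shares
    the real builtins dict (a reference, not a copy)."""
    namespace = {"__builtins__": builtins.__dict__}
    exec(source, namespace)


original_len = builtins.len
try:
    # The "untrusted" code only touches its own __builtins__ entry...
    run_untrusted("__builtins__['len'] = lambda obj: 42")
    # ...but that is the interpreter-wide builtins dict, so this module sees it too.
    print(len([1, 2, 3]))  # 42
finally:
    builtins.len = original_len  # undo the damage
print(len([1, 2, 3]))  # 3
```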

To implement a frozendict in Python, pysandbox uses the blacklist
approach: a class inheriting from dict and overriding some methods to
raise an error. The whitelist approach cannot be used for a type
implemented in Python, because the __builtins__ type must inherit from
dict: ceval.c expects a type compatible with PyDict_GetItem and
PyDict_SetItem.

Problem: if you implement a frozendict type inheriting from dict in
Python, it is still possible to call dict methods (e.g.
dict.__setitem__()). To fix this issue, pysandbox removes all dict
methods modifying the dict: __setitem__, __delitem__, pop, etc. This
is a problem because untrusted code can then no longer use these methods
on regular dicts created in the sandbox.
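The weakness of the blacklist approach fits in a few lines. The subclass can refuse d[k] = v, but the unbound method from the base class still writes to the underlying storage. That bypass is why pysandbox has to remove dict.__setitem__ itself. This minimal sketch uses a hypothetical BlacklistFrozenDict that blocks only __setitem__.

```python
class BlacklistFrozenDict(dict):
    """Hypothetical blacklist-style frozendict: only __setitem__ is blocked here."""

    def __setitem__(self, key, value):
        raise TypeError("BlacklistFrozenDict is read-only")


fd = BlacklistFrozenDict(a=1)
try:
    fd["b"] = 2  # goes through the override and is refused
except TypeError as exc:
    print(exc)  # BlacklistFrozenDict is read-only

dict.__setitem__(fd, "b", 2)  # calls the base-class method directly
print(fd)  # {'a': 1, 'b': 2}
```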
Post by Paul Moore
However,
if Victor was saying that implementing this PEP was all that is needed
to implement a secure sandbox, then that would be a very different
claim, and likely much more compelling (to some, at least - I have no
personal need for a secure sandbox).
A builtin frozendict type "compatible" with the PyDict C API is very
convenient for pysandbox because using this type for core features
like builtins requires very few modifications. For example, using
frozendict for __builtins__ only requires modifying 3 lines in
frameobject.c.

I don't see how to solve the pysandbox issue (read-only __builtins__
issue, need to remove dict.__setitem__ & friends) without modifying
CPython (i.e. without adding a frozendict type).
Post by Paul Moore
As it stands, I don't find the PEP compelling. The hardening use case
might be significant but Victor needs to spell it out if it's to make
a difference.
I don't know if hardening Python is a compelling argument to add a new
builtin type.

Victor
Raymond Hettinger
2012-03-01 00:11:58 UTC
Permalink
Post by Victor Stinner
I don't know if hardening Python is a compelling argument to add a new
builtin type.
It isn't.

Builtins are for general purpose use.
It is not something most people should use;
however, if it is a builtin, people will be drawn
to frozendicts like moths to a flame.
The tuple-as-frozenlist anti-pattern shows
what we're up against.

Another thought: if pypy is successful at providing sandboxing,
the need for sandboxing in CPython is substantially abated.


Raymond
Steven D'Aprano
2012-03-01 00:36:15 UTC
Permalink
Post by Antoine Pitrou
Post by Victor Stinner
I don't know if hardening Python is a compelling argument to add a new
builtin type.
It isn't.
Builtins are for general purpose use.
It is not something most people should use;
however, if it is a builtin, people will be drawn
to frozendicts like moths to a flame.
The tuple-as-frozenlist anti-pattern shows
what we're up against.
Perhaps I'm a little slow today, but I don't get this. Could you elaborate on
tuple-as-frozenlist anti-pattern please?

i.e. what it is, why it is an anti-pattern, and examples of it in real life?
--
Steven
Guido van Rossum
2012-03-01 03:05:18 UTC
Permalink
On Wed, Feb 29, 2012 at 3:52 PM, Victor Stinner
Post by Victor Stinner
Post by Paul Moore
It would (apparently) help Victor to fix issues in his pysandbox
project. I don't know if a secure Python sandbox is an important
enough concept to warrant core changes to make it possible.
Ok, let's talk about sandboxing and security.
The main idea of pysandbox is to reuse most of CPython but hide
"dangerous" functions and run untrusted code in a separate namespace.
The problem is to create the sandbox and ensure that it is not
possible to escape from this sandbox. pysandbox is still a
proof-of-concept, even if it works pretty well for short dummy
scripts. But pysandbox is not ready for real world programs.
I hope you have studied (recent) history. Sandboxes in Python
traditionally have not been secure. Read the archives for details.
_______________________________________________
Python-Dev mailing list
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
--
--Guido van Rossum (python.org/~guido)
Victor Stinner
2012-03-01 10:01:07 UTC
Permalink
Post by Guido van Rossum
Post by Victor Stinner
The main idea of pysandbox is to reuse most of CPython but hide
"dangerous" functions and run untrusted code in a separate namespace.
The problem is to create the sandbox and ensure that it is not
possible to escape from this sandbox. pysandbox is still a
proof-of-concept, even if it works pretty well for short dummy
scripts. But pysandbox is not ready for real world programs.
I hope you have studied (recent) history. Sandboxes in Python
traditionally have not been secure. Read the archives for details.
The design of pysandbox makes it difficult to implement. It is mostly
based on a blacklist, so any omission would lead to a vulnerability. I
have read the recent history of sandboxes and looked at other security
modules for Python, and I don't understand your reference to "Sandboxes
in Python traditionally have not been secure." There is no known
vulnerability in pysandbox; did I miss something? (There is only a
limitation on the dict API because of the lack of frozendict.)

Are you talking about rexec/Bastion? (which can hardly be called "recent" :-))

pysandbox limitations are documented in its README file:

<< pysandbox is a sandbox for the Python namespace, not a sandbox between Python
and the operating system. It doesn't protect your system against Python
security vulnerabilities: vulnerabilities in modules/functions available in
your sandbox (depend on your sandbox configuration). By default, only few
functions are exposed to the sandbox namespace which limits the attack surface.

pysandbox is unable to limit the memory of the sandbox process: you have to use
your own protection. >>

Hum, I am also not sure that pysandbox "works" with threads :-) I mean
that enabling pysandbox impacts all running threads, not only one
thread, which can cause issues. It should also be mentioned.

The PyPy sandbox has a different design: it uses a process with no
privileges, and all syscalls are redirected to another process which
applies security checks to each syscall.
http://doc.pypy.org/en/latest/sandbox.html

See also the seccomp-nurse project, a generic sandbox using Linux SECCOMP:
http://chdir.org/~nico/seccomp-nurse/

See also the pysandbox README for a list of other Python security modules.

Victor
Guido van Rossum
2012-03-01 17:00:07 UTC
Permalink
On Thu, Mar 1, 2012 at 2:01 AM, Victor Stinner
Post by Victor Stinner
Post by Guido van Rossum
Post by Victor Stinner
The main idea of pysandbox is to reuse most of CPython but hide
"dangerous" functions and run untrusted code in a separate namespace.
The problem is to create the sandbox and ensure that it is not
possible to escape from this sandbox. pysandbox is still a
proof-of-concept, even if it works pretty well for short dummy
scripts. But pysandbox is not ready for real world programs.
I hope you have studied (recent) history. Sandboxes in Python
traditionally have not been secure. Read the archives for details.
The design of pysandbox makes it difficult to implement. It is mostly
based on a blacklist, so any omission would lead to a vulnerability. I
have read the recent history of sandboxes and looked at other security
modules for Python, and I don't understand your reference to "Sandboxes
in Python traditionally have not been secure." There is no known
vulnerability in pysandbox; did I miss something? (There is only a
limitation on the dict API because of the lack of frozendict.)
Are you talking about rexec/Bastion? (which can hardly be called "recent" :-))
<< pysandbox is a sandbox for the Python namespace, not a sandbox between Python
and the operating system. It doesn't protect your system against Python
security vulnerabilities: vulnerabilities in modules/functions available in
your sandbox (depend on your sandbox configuration). By default, only few
functions are exposed to the sandbox namespace which limits the attack surface.
pysandbox is unable to limit the memory of the sandbox process: you have to use
your own protection. >>
Hum, I am also not sure that pysandbox "works" with threads :-) I mean
that enabling pysandbox impacts all running threads, not only one
thread, which can cause issues. It should also be mentioned.
The PyPy sandbox has a different design: it uses a process with no
privileges, and all syscalls are redirected to another process which
applies security checks to each syscall.
http://doc.pypy.org/en/latest/sandbox.html
http://chdir.org/~nico/seccomp-nurse/
See also the pysandbox README for a list of other Python security modules.
Hm. I can't tell what the purpose of a sandbox is from what you quote
from your own README here (and my cellphone tethering is slow enough
that clicking on the links doesn't work right now).

The sandboxes I'm familiar with (e.g. Google App Engine) are intended
to allow untrusted third parties to execute (more or less) arbitrary
code while strictly controlling which resources they can access. In
App Engine's case, an attacker who broke out of the sandbox would have
access to the inside of Google's datacenter, which would obviously be
bad -- that's why Google has developed its own sandboxing
technologies.

I do know that I don't feel comfortable having a sandbox in the Python
standard library or even recommending a 3rd party sandboxing solution
-- if someone uses the sandbox to protect a critical resource, and a
hacker breaks out of the sandbox, the author of the sandbox may be
held responsible for more than they bargained for when they made it
open source. (Doesn't an open source license limit your
responsibility? Who knows. AFAIK this question has not gotten to court
yet. I wouldn't want to have to go to court over it.)

I wasn't just referring to rexec/Bastion (though that definitely
shaped my thinking about this issue); much more recently someone (Tal,
I think was his name?) tried to come up with a sandbox and every time
he believed he had a perfect solution, somebody found a loophole.
(Hm..., you may have been involved that time yourself. :-)
--
--Guido van Rossum (python.org/~guido)
Victor Stinner
2012-03-01 17:44:53 UTC
Permalink
In App Engine's case, an attacker who broke out of the sandbox would have
access to the inside of Google's datacenter, which would obviously be
bad -- that's why Google has developed its own sandboxing
technologies.
This is not specific to Google: if an attacker breaks a sandbox,
he/she has access to everything. Depending on how the sandbox is
implemented, you have more or less code to audit.

pysandbox disables introspection in Python and creates an empty
namespace to reduce the attack surface as much as possible. You have to
be very careful when you add a new feature or function, and that is complex.
I do know that I don't feel comfortable having a sandbox in the Python
standard library or even recommending a 3rd party sandboxing solution
frozendict would help pysandbox, but also any Python security module,
and not only for security: there are also (many) other use cases ;-)
I wasn't just referring to rexec/Bastion (though that definitely
shaped my thinking about this issue); much more recently someone (Tal,
I think was his name?) tried to come up with a sandbox and every time
he believed he had a perfect solution, somebody found a loophole.
(Hm..., you may have been involved that time yourself. :-)
pysandbox is based on tav's approach, but it is more complete and
implements more protections. It is also more functional (you have more
available functions and features).

I challenge anyone to try to break pysandbox!

Victor
Paul Moore
2012-03-01 18:06:14 UTC
Permalink
Post by Victor Stinner
I challenge anyone to try to break pysandbox!
Can you explain precisely how a frozendict will help pysandbox? Then
I'll be able to beat this challenge :-)

Paul.
Victor Stinner
2012-03-01 18:29:14 UTC
Permalink
Post by Paul Moore
Post by Victor Stinner
I challenge anyone to try to break pysandbox!
Can you explain precisely how a frozendict will help pysandbox? Then
I'll be able to beat this challenge :-)
See this email:
http://mail.python.org/pipermail/python-dev/2012-February/117011.html

Issue #14162 also has two patches: one to make it possible to use
frozendict for __builtins__, and another one to create read-only types
(which is more a proof-of-concept).
http://bugs.python.org/issue14162

Victor
Guido van Rossum
2012-03-01 18:07:20 UTC
Permalink
Post by Victor Stinner
frozendict would help pysandbox, but also any Python security module,
and not only for security: there are also (many) other use cases ;-)
Well, let's focus on the other use cases, because to me the sandbox
use case is too controversial (never mind how confident you are :-).

I like thinking through the cache use case a bit more, since this is a
common pattern. But I think it would be sufficient there to prevent
accidental modification, so it should be sufficient to have a dict
subclass that overrides the various mutating methods: __setitem__,
__delitem__, pop(), popitem(), clear(), setdefault(), update().
Technically also __init__() -- although calling __init__() on an
existing object can hardly be called an accident. As was pointed out
this is easy to circumvent, but (together with a reminder in the
documentation) should be sufficient to avoid mistakes. I imagine
someone who actively wants to mess with the cache can probably also
reach into the cache implementation directly.
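This suggestion can be sketched as a dict subclass that overrides each mutating method listed above. It guards against accidents, not attackers, because dict.__setitem__(obj, k, v) still works and __init__() is left alone. The class name, the _readonly helper, and the cached load_config() function are hypothetical. The |= override is an addition for current Python; it is not in the 2012 list.

```python
import functools


class ReadOnlyDict(dict):
    """dict subclass that refuses the usual mutating methods.

    Advisory only: dict.__setitem__(obj, key, value) still bypasses it, and
    __init__() is deliberately left alone.
    """

    def _readonly(self, *args, **kwargs):
        raise TypeError(f"{type(self).__name__} is read-only")

    __setitem__ = __delitem__ = _readonly
    pop = popitem = clear = setdefault = update = _readonly
    __ior__ = _readonly  # dict gained |= in Python 3.9 (PEP 584), after this thread


@functools.lru_cache(maxsize=None)
def load_config(name):
    """Hypothetical cached loader: every caller gets the same shared object."""
    return ReadOnlyDict(name=name, debug=False)


cfg = load_config("app")
try:
    cfg["debug"] = True  # an accidental write to the shared cached value
except TypeError as exc:
    print(exc)  # ReadOnlyDict is read-only
print(load_config("app")["debug"])  # False
```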

Also don't underestimate the speed of a shallow dict copy.

What other use cases are there? (I have to agree with the folks
pushing back hard. Even demonstrated repeated requests for a certain
feature do not prove a need -- it's quite common for people who are
trying to deal with some problem to go down the wrong rabbit hole in
their quest for a solution, and ending up thinking they need a certain
feature while completely overlooking a much simpler solution.)
--
--Guido van Rossum (python.org/~guido)
André Malo
2012-03-01 20:35:07 UTC
Permalink
Post by Guido van Rossum
Post by Victor Stinner
frozendict would help pysandbox, but also any Python security module,
and not only for security: there are also (many) other use cases ;-)
Well, let's focus on the other use cases, because to me the sandbox
use case is too controversial (never mind how confident you are :-).
I like thinking through the cache use case a bit more, since this is a
common pattern. But I think it would be sufficient there to prevent
accidental modification, so it should be sufficient to have a dict
subclass that overrides the various mutating methods: __setitem__,
__delitem__, pop(), popitem(), clear(), setdefault(), update().
For the caching part, simply making the dictproxy type public would already
help a lot.
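The dictproxy is the type CPython returns for type.__dict__. Since Python 3.3 it has been exposed publicly as types.MappingProxyType. This minimal sketch applies it to the caching case with made-up data. Callers get a read-only view, while whoever holds the underlying dict can still update it, so this is a view rather than a frozen copy.

```python
from types import MappingProxyType

_colors = {"red": 1, "green": 2}
colors = MappingProxyType(_colors)  # read-only view, not a copy

print(colors["red"])  # 1
try:
    colors["blue"] = 3
except TypeError as exc:
    print("rejected:", exc)

_colors["blue"] = 3  # whoever holds the underlying dict can still change it
print(colors["blue"])  # 3
print(type(int.__dict__).__name__)  # mappingproxy
```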
Post by Guido van Rossum
What other use cases are there?
dicts as keys or as set members. I do run into this from time to time and
always get tuple(sorted(items())) or something like that.
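The workaround looks like this in code: convert the dict to a sorted tuple of items before using it as a key or set member. This is a minimal sketch, and the freeze() name is hypothetical. Note that sorting requires mutually comparable keys, whereas frozenset(d.items()) does not. The values must be hashable either way.

```python
def freeze(d):
    """Hashable, order-independent snapshot of a dict whose keys are sortable."""
    return tuple(sorted(d.items()))  # values must also be hashable to hash it


seen = set()
seen.add(freeze({"b": 2, "a": 1}))
print(freeze({"a": 1, "b": 2}) in seen)  # True: insertion order does not matter
print(dict(freeze({"a": 1})))  # {'a': 1}: easy to turn back into a dict
```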

nd
--
s s^saaaaaoaaaoaaaaooooaaoaaaomaaaa a alataa aaoat a a
a maoaa a laoata a oia a o a m a o alaoooat aaool aaoaa
matooololaaatoto aaa o a o ms;s;\s;s;g;y;s;:;s;y#mailto: #
\51/\134\137| http://www.perlig.de #;print;# > ***@perlig.de
Guido van Rossum
2012-03-01 23:11:35 UTC
Permalink
Post by André Malo
Post by Guido van Rossum
Post by Victor Stinner
frozendict would help pysandbox, but also any Python security module,
and not only for security: there are also (many) other use cases ;-)
Well, let's focus on the other use cases, because to me the sandbox
use case is too controversial (never mind how confident you are :-).
I like thinking through the cache use case a bit more, since this is a
common pattern. But I think it would be sufficient there to prevent
accidental modification, so it should be sufficient to have a dict
subclass that overrides the various mutating methods: __setitem__,
__delitem__, pop(), popitem(), clear(), setdefault(), update().
For the caching part, simply making the dictproxy type public would already
help a lot.
Heh, that's a great idea. Can you file a bug for that?
Post by André Malo
Post by Guido van Rossum
What other use cases are there?
dicts as keys or as set members. I do run into this from time to time and
always get tuple(sorted(items())) or something like that.
I know I've done that once or twice in my life too, but it's a pretty
rare use case and as you say the solution is simple enough. An
alternative is frozenset(d.items()) -- someone should compare the
timing of these for large dicts.
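The suggested comparison can be run with the standard timeit module. This is a rough sketch. The dict size and repeat count are arbitrary choices, and no result is claimed here because timings depend on the machine and the Python version.

```python
import timeit


def compare(size=100_000, number=10):
    """Time two ways of turning a dict of `size` items into a hashable key.

    Returns (seconds for tuple(sorted(items())), seconds for frozenset(items())).
    """
    d = {i: str(i) for i in range(size)}
    as_tuple = timeit.timeit(lambda: hash(tuple(sorted(d.items()))), number=number)
    as_frozenset = timeit.timeit(lambda: hash(frozenset(d.items())), number=number)
    return as_tuple, as_frozenset


if __name__ == "__main__":
    t, f = compare()
    print(f"tuple(sorted(items())): {t:.3f}s  frozenset(items()): {f:.3f}s")
```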
--
--Guido van Rossum (python.org/~guido)
Victor Stinner
2012-03-02 00:39:32 UTC
Permalink
Post by Guido van Rossum
What other use cases are there?
frozendict could be used to implement "read-only" types: it is not
possible to add or remove an attribute or set an attribute value, but an
attribute value can be a mutable object. Here is an example of an enum
with my type_final.patch (attached to issue #14162):
>>> class Color:
...     red=1
...     green=2
...     blue=3
...     __final__=True
...
>>> Color.red
1
>>> Color.red=2
TypeError: 'frozendict' object does not support item assignment
>>> Color.yellow=4
TypeError: 'frozendict' object does not support item assignment
>>> Color.__dict__
frozendict({...})

The implementation avoids the private PyDictProxy for read-only types,
type.__dict__ gives directly access to the frozendict (but
type.__dict__=newdict is still blocked).

The "__final__=True" API is just a proposition, it can be anything else,
maybe a metaclass.

Using a frozendict for type.__dict__ is not the only possible solution
to implement read-only types. There are also Python implementations using
properties. Using a frozendict is faster than using properties because
getting an attribute is just a fast dictionary lookup, whereas reading a
property requires executing a Python function. The syntax to declare a
read-only class is also more conventional with the frozendict approach.
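For comparison, the metaclass route can be sketched in pure Python. `FinalMeta` is a hypothetical name and this is not what type_final.patch does: the class dict stays an ordinary mappingproxy, and only attribute assignment and deletion on the class object are refused. Reads remain plain class attribute lookups with no property call.

```python
class FinalMeta(type):
    """Hypothetical metaclass: refuses attribute assignment/deletion on the class."""

    def __setattr__(cls, name, value):
        raise TypeError(f"cannot set {name!r} on final class {cls.__name__}")

    def __delattr__(cls, name):
        raise TypeError(f"cannot delete {name!r} on final class {cls.__name__}")


class Color(metaclass=FinalMeta):
    red = 1
    green = 2
    blue = 3


# Reading is an ordinary class attribute lookup.
assert Color.red == 1

try:
    Color.red = 2
except TypeError as exc:
    print(exc)  # cannot set 'red' on final class Color

# Still advisory: calling type's own __setattr__ explicitly bypasses the check.
type.__setattr__(Color, "red", 2)
assert Color.red == 2
```

The explicit `type.__setattr__` call keeps a deliberate, verbose way around the restriction, which a frozendict-backed `type.__dict__` would not offer.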

Victor
Guido van Rossum
2012-03-02 00:50:06 UTC
Permalink
Post by Guido van Rossum
What other use cases are there?
frozendict could be used to implement "read-only" types: it would not be possible
to add or remove an attribute or set an attribute value, but attribute values
can still be mutable objects. Example of an enum with my type_final.patch
(attached to issue #14162):
>>> class Color:
...     red=1
...     green=2
...     blue=3
...     __final__=True
...
>>> Color.red
1
>>> Color.red=2
TypeError: 'frozendict' object does not support item assignment
>>> Color.yellow=4
TypeError: 'frozendict' object does not support item assignment
>>> Color.__dict__
frozendict({...})
For read-only types, the implementation avoids the private PyDictProxy:
type.__dict__ gives direct access to the frozendict (but
type.__dict__=newdict is still blocked).
The "__final__=True" API is just a proposal; it could be anything else,
maybe a metaclass.
Using a frozendict for type.__dict__ is not the only possible solution to
implement read-only types. There are also Python implementations using
properties. Using a frozendict is faster than using properties because
getting an attribute is just a fast dictionary lookup, whereas reading a
property requires executing a Python function. The syntax to declare a
read-only class is also more conventional with the frozendict approach.
I think you should provide stronger arguments in each case why the
data needs to be truly immutable or read-only, rather than just using
a convention or an "advisory" API (like __private can be circumvented
but clearly indicates intent to the reader).
--
--Guido van Rossum (python.org/~guido)
R. David Murray
2012-03-02 01:50:51 UTC
Permalink
Post by Guido van Rossum
frozendict could be used to implement "read-only" types: it is not possible
to add or remove an attribute or set an attribute value, but attribute value
can be a mutable object. Example of an enum with my type_final.patch
(attached to issue #14162).
[...]
Post by Guido van Rossum
I think you should provide stronger arguments in each case why the
data needs to be truly immutable or read-only, rather than just using
a convention or an "advisory" API (like __private can be circumvented
but clearly indicates intent to the reader).
+1. Except in very limited circumstances (such as a security sandbox)
I would *much* rather have the code I'm interacting with use advisory
means rather than preventing me from being a consenting adult. (Having to
name mangle by hand when someone has used a __ method is painful enough,
thank you...good thing the need to do that doesn't come up often (mostly
only in unit tests)).

--David
Nick Coghlan
2012-03-02 02:06:21 UTC
Permalink
+1.  Except in very limited circumstances (such as a security sandbox)
I would *much* rather have the code I'm interacting with use advisory
means rather than preventing me from being a consenting adult.  (Having to
name mangle by hand when someone has used a __ method is painful enough,
thank you...good thing the need to do that doesn't come up often (mostly
only in unit tests)).
The main argument I'm aware of in favour of this kind of enforcement
is that it means you get exceptions at the point of *error* (trying to
modify the "read-only" dict), rather than having a strange
action-at-a-distance data mutation bug to track down.

However, in that case, it's just fine (and in fact better) if there is
a way around the default enforcement via a more verbose spelling.
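Since Python 3.3 the dictproxy type is public as `types.MappingProxyType`, and it gives roughly this shape of enforcement: writes through the proxy fail immediately, while the code that owns the underlying dict keeps a more explicit way to change it. The `_settings` data below is made up for illustration.

```python
from types import MappingProxyType

# Hypothetical configuration owned by one module.
_settings = {"debug": False, "hosts": ["a.example", "b.example"]}

# Read-only view handed to the rest of the program.
settings = MappingProxyType(_settings)

try:
    settings["debug"] = True  # fails at the point of error
except TypeError as exc:
    print("blocked:", exc)

# The verbose spelling: the owner mutates the underlying dict,
# and the proxy reflects the change immediately.
_settings["debug"] = True
assert settings["debug"] is True

# The protection is shallow: mutable values are still mutable.
settings["hosts"].append("c.example")
assert _settings["hosts"] == ["a.example", "b.example", "c.example"]
```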

Cheers,
Nick.
--
Nick Coghlan   |   ***@gmail.com   |   Brisbane, Australia
Yury Selivanov
2012-03-02 02:13:44 UTC
Permalink
Post by Guido van Rossum
I think you should provide stronger arguments in each case why the
data needs to be truly immutable or read-only, rather than just using
a convention or an "advisory" API (like __private can be circumvented
but clearly indicates intent to the reader).
Here's one more argument to support frozendicts.

For the last several months I've been thinking about prohibiting coroutines
(generators + greenlets in our framework) from modifying the global state.
If there is a guarantee that all coroutines of the whole application,
modules and framework are 100% safe from that, it's possible to do some
interesting stuff. For instance, dynamically balance jobs across all
application processes:

@coroutine
def on_generate_report(context):
    data = yield fetch_financial_data(context)
    ...

In the above example, 'fetch_financial_data' may be executed in a
different process, or even on a different server, if the coroutine
scheduler of the current process decides so (based on its load, or a low
priority of the coroutine being scheduled).

With a built-in frozendict it will be easier to secure modules' or
functions' __globals__ that way, allowing us to play with features closer
to the ones Erlang and other concurrent languages provide.

-
Yury
Guido van Rossum
2012-03-02 02:31:41 UTC
Permalink
Post by Yury Selivanov
Post by Guido van Rossum
I think you should provide stronger arguments in each case why the
data needs to be truly immutable or read-only, rather than just using
a convention or an "advisory" API (like __private can be circumvented
but clearly indicates intent to the reader).
Here's one more argument to support frozendicts.
For the last several months I've been thinking about prohibiting coroutines
(generators + greenlets in our framework) from modifying the global state.
If there is a guarantee that all coroutines of the whole application,
modules and framework are 100% safe from that, it's possible to do some
interesting stuff.  For instance, dynamically balance jobs across all
application processes:
@coroutine
def on_generate_report(context):
   data = yield fetch_financial_data(context)
   ...
In the above example, 'fetch_financial_data' may be executed in a
different process, or even on a different server, if the coroutine
scheduler of the current process decides so (based on its load, or a low
priority of the coroutine being scheduled).
With a built-in frozendict it will be easier to secure modules' or
functions' __globals__ that way, allowing us to play with features closer
to the ones Erlang and other concurrent languages provide.
That sounds *very* far-fetched. You're pretty much designing a new
language variant. It's not an argument for burdening the original
language with a data type it doesn't need for itself.

You should be able to prototype what you want using an advisory
subclass (if you subclass dict and add __slots__=[] to it, it will
cost very little overhead) or using a custom extension that implements
the flavor of frozendict that works best for you -- given that you're
already using greenlets, another extension can't be a big burden.
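A minimal sketch of such an advisory subclass, assuming Python 3. `AdvisoryFrozenDict` is a hypothetical name; `__slots__ = ()` keeps instances free of a per-instance `__dict__`. Because it inherits from dict, calling dict's own methods explicitly still mutates it, which is exactly what makes it advisory rather than enforced.

```python
class AdvisoryFrozenDict(dict):
    """dict subclass whose mutating methods raise TypeError (advisory only)."""

    __slots__ = ()  # no per-instance __dict__, so little overhead over dict

    def _readonly(self, *args, **kwargs):
        raise TypeError(f"{type(self).__name__} is read-only")

    # Block the public mutating API, including the in-place |= operator.
    __setitem__ = __delitem__ = __ior__ = _readonly
    pop = popitem = clear = setdefault = update = _readonly

    def __hash__(self):
        # Only works when every value is hashable.
        return hash(frozenset(self.items()))


fd = AdvisoryFrozenDict(a=1)
try:
    fd["b"] = 2
except TypeError as exc:
    print(exc)  # AdvisoryFrozenDict is read-only

# Calling dict's implementation directly bypasses the check.
dict.__setitem__(fd, "b", 2)
assert fd["b"] == 2
```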
--
--Guido van Rossum (python.org/~guido)
Yury Selivanov
2012-03-02 02:44:25 UTC
Permalink
Post by Guido van Rossum
That sounds *very* far-fetched. You're pretty much designing a new
language variant. It's not an argument for burdening the original
Yeah, that's what we do ;)
Post by Guido van Rossum
You should be able to prototype what you want using an advisory
subclass (if you subclass dict and add __slots__=[] to it, it will
cost very little overhead) or using a custom extension that implements
the flavor of frozendict that works best for you -- given that you're
already using greenlets, another extension can't be a big burden.
I understand. The only reason I wrote about it is to give an idea of
how frozendicts may be used besides just sandboxing. I'm not strongly
advocating for it, though.

-
Yury
Victor Stinner
2012-03-02 11:36:05 UTC
Permalink
Post by Guido van Rossum
I think you should provide stronger arguments in each case why the
data needs to be truly immutable or read-only, rather than just using
a convention or an "advisory" API (like __private can be circumvented
but clearly indicates intent to the reader).
I only know one use case for "truly immutable or read-only" objects
(frozendict, "read-only" type, read-only proxy, etc.): security. I
know three modules using a C extension to implement read-only objects:
zope.proxy, zope.security and mxProxy. pysandbox uses uglier tricks
to implement read-only proxies :-) Such modules are used to secure web
applications, for example. A frozendict type doesn't replace these
modules, but it would help to implement security modules.

http://www.egenix.com/products/python/mxBase/mxProxy/
http://pypi.python.org/pypi/zope.proxy
http://pypi.python.org/pypi/zope.security

Victor

Mark Janssen
2012-03-02 00:25:35 UTC
Permalink
Post by Guido van Rossum
I do know that I don't feel comfortable having a sandbox in the Python
standard library or even recommending a 3rd party sandboxing solution
-- if someone uses the sandbox to protect a critical resource, and a
hacker breaks out of the sandbox, the author of the sandbox may be
held responsible for more than they bargained for when they made it
open source. (Doesn't an open source license limit your
responsibility? Who knows. AFAIK this question has not gotten to court
yet. I wouldn't want to have to go to court over it.)
Since there's no way (even theoretical way) to completely secure anything
(remember the DVD protection wars?), there's no way there should be any
liability if reasonable diligence is performed to provide security where
expected (which is probably calculable to some %-age of assets protected).
It's like putting a lock on the door of your house -- you can't expect to
be held liable if someone has a crowbar.

Open sourcing code could be said to be a disclaimer on any liability as
you're letting people know that you've got nothing you're trying to conceal.
It's like a dog who plays dead: by being totally open you're actually
more secure....

mark
Stephen J. Turnbull
2012-03-02 10:12:17 UTC
Permalink
Post by Mark Janssen
Since there's no way (even theoretical way) to completely secure anything
(remember the DVD protection wars?), there's no way there should be any
liability if reasonable diligence is performed to provide security where
expected (which is probably calculable to some %-age of assets
protected).
That's not how the law works, sorry. Look up "consequential damages,"
"contributory negligence," and "attractive nuisance." I'm not saying
that anybody will lose *in* court, but one can surely be taken *to*
court. If that happens to you, you've already lost (even if the other
side can't win).
Post by Mark Janssen
Open sourcing code could be said to be a disclaimer on any liability as
you're letting people know that you've got nothing you're trying to conceal.
Again, you seem to be revealing your ignorance of the law (not to
mention security -- a safe is supposed to be secure even if the
burglar has the blueprints). A comprehensive and presumably effective
disclaimer is part of the license, but it's not clear that even that
works. AFAIK such disclaimers are not well-tested in court.

Guido is absolutely right. There is a risk here (not in the
frozendict type, of course, but in distributing an allegedly
effective sandbox). I doubt Victor as an individual doing research has
a problem; the PSF is another matter.

BTW, Larry Rosen's book on Open Source Licensing is a good reference.
Andrew St. Laurent also has a book out, I like Larry's better but
YMMV.
Serhiy Storchaka
2012-03-01 07:43:13 UTC
Permalink
Post by Victor Stinner
Problem: if you implement a frozendict type inheriting from dict in
Python, it is still possible to call dict methods (e.g.
dict.__setitem__()). To fix this issue, pysandbox removes all dict
methods modifying the dict: __setitem__, __delitem__, pop, etc. This
is a problem because untrusted code cannot use these methods on valid
dicts created in the sandbox.
You can redefine dict.__setitem__.

oldsetitem = dict.__setitem__
def newsetitem(self, key, value):
    # check if self is not frozendict
    ...
    oldsetitem(self, key, value)
...
dict.__setitem__ = newsetitem
Victor Stinner
2012-03-01 09:11:03 UTC
Permalink
Post by Serhiy Storchaka
Post by Victor Stinner
Problem: if you implement a frozendict type inheriting from dict in
Python, it is still possible to call dict methods (e.g.
dict.__setitem__()). To fix this issue, pysandbox removes all dict
methods modifying the dict: __setitem__, __delitem__, pop, etc. This
is a problem because untrusted code cannot use these methods on valid
dicts created in the sandbox.
You can redefine dict.__setitem__.
Ah? It doesn't work here.
>>> dict.__setitem__=lambda key, value: None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'dict'

Victor
Serhiy Storchaka
2012-03-01 13:44:18 UTC
Permalink
Post by Victor Stinner
Post by Serhiy Storchaka
You can redefine dict.__setitem__.
Ah? It doesn't work here.
>>> dict.__setitem__=lambda key, value: None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'dict'
Hmm, yes, it's true. It was presumptuous of me to believe that you
had not considered such a simple approach.

But I will try to suggest another approach. `frozendict` inherits from
`dict`, but data is not stored in the parent, but in the internal
dictionary. And even if dict.__setitem__ is used, it will have no
visible effect.

class frozendict(dict):
    def __init__(self, values={}):
        self._values = dict(values)
    def __getitem__(self, key):
        return self._values[key]
    def __setitem__(self, key, value):
        raise TypeError ("expect dict, got frozendict")
    ...
>>> a = frozendict({1: 2, 3: 4})
>>> a[1]
2
>>> a[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in __getitem__
KeyError: 5
>>> a[5] = 6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in __setitem__
TypeError: expect dict, got frozendict
>>> dict.__setitem__(a, 5, 6)
>>> a[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in __getitem__
KeyError: 5
Victor Stinner
2012-03-01 14:49:29 UTC
Permalink
Post by Serhiy Storchaka
But I will try to suggest another approach. `frozendict` inherits from
`dict`, but data is not stored in the parent, but in the internal
dictionary. And even if dict.__setitem__ is used, it will have no visible
effect.
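class frozendict(dict):
    def __init__(self, values={}):
        self._values = dict(values)
    def __getitem__(self, key):
        return self._values[key]
    def __setitem__(self, key, value):
        raise TypeError ("expect dict, got frozendict")
    ...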
I would like to implement frozendict in C to be able to pass it to
PyDict_GetItem(), PyDict_SetItem() and PyDict_DelItem(). With such a
Python implementation, you would get a surprising result:

d = frozendict()
# this is what Python does internally when it expects a dict
# (e.g. in ceval.c for __builtins__)
dict.__setitem__(d, 'x', 1)
'x' in d  # => False

(Python is supposed to use PyObject_GetItem/PyObject_SetItem, not the
PyDict API, when the object is a dict subclass.)
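The effect can be reproduced in pure Python with a fuller version of Serhiy's wrapper that also delegates `__contains__`, `__iter__`, `__len__` and `get` to the private dict. `WrappedFrozenDict` is a hypothetical name, and `dict.__setitem__(d, ...)` merely stands in for a C-level `PyDict_SetItem()` call.

```python
class WrappedFrozenDict(dict):
    """Sketch: a dict subclass that keeps its real data in a private dict."""

    def __init__(self, values=()):
        super().__init__()  # leave the inherited dict storage empty
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __contains__(self, key):
        return key in self._values

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def get(self, key, default=None):
        return self._values.get(key, default)

    def __setitem__(self, key, value):
        raise TypeError("expect dict, got frozendict")


d = WrappedFrozenDict({1: 2})
# Writes to the inherited storage, which the overridden methods never consult.
dict.__setitem__(d, "x", 1)
assert "x" not in d
assert dict.__contains__(d, "x")
```

Any dict method the sketch does not override (keys(), items(), `__repr__`, `==`, and so on) still reads the inherited storage rather than `_values`, so each of those would need delegating too.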

Victor
Victor Stinner
2012-03-01 13:00:38 UTC
Permalink
Post by Victor Stinner
A builtin frozendict type "compatible" with the PyDict C API is very
convenient for pysandbox because using this type for core features
like builtins requires very few modifications. For example, using
frozendict for __builtins__ only requires modifying 3 lines in
frameobject.c.
See frozendict_builtins.patch attached to issue #14162. Latest version:
http://bugs.python.org/file24690/frozendict_builtins.patch

Victor
Victor Stinner
2012-03-01 00:23:07 UTC
Permalink
Post by Raymond Hettinger
Post by Victor Stinner
A frozendict type is a common request from users and there are various
implementations.
ISTM, this request is never from someone who has a use case.
One of my colleagues recently implemented his own frozendict class
(with the "frozendict" name ;-)). He tried to implement something
like PEP 351: not a generic freeze() function but a specialized
function for his use case (supporting only list/tuple and dict/frozendict,
if I remember correctly). It reminds me of the question: why does
Python not provide a frozendict type?

Even if it is not possible to write a perfect freeze() function, it
looks like some developers need this sort of function, and I hope that
frozendict would be a first step in the right direction.

Ruby has a freeze method. On a hash (Ruby's dict), it provides the same
behaviour as frozendict: the mapping cannot be modified anymore, but values
are still mutable.
http://ruby-doc.org/core-1.9.3/Object.html#method-i-freeze
Post by Raymond Hettinger
Many experienced Python users simply forget
that we have a frozenset type.  We don't get bug reports or
feature requests about the type.
I used it at my previous job to declare the access control list (ACL)
of services provided by an XML-RPC object. To be honest, a set could also
have been used, but I chose frozenset to ensure that my colleagues don't try
to modify it without understanding the consequences of such a change. It
was not protection against evil hackers from the Internet, but against
my colleagues :-)

Sorry, I didn't find any bug in frozenset :-) My usage was just to
declare a frozenset and then check if an item is in the set, and it
works pretty well!
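The pattern described here, a frozenset declared once and then only used for membership tests, looks roughly like this. `ALLOWED_METHODS` and `check_access` are hypothetical names, not taken from the original code.

```python
# Hypothetical ACL: XML-RPC method names that clients may call.
ALLOWED_METHODS = frozenset({"get_status", "list_jobs"})


def check_access(method_name):
    """Raise PermissionError unless method_name is listed in the ACL."""
    if method_name not in ALLOWED_METHODS:
        raise PermissionError(f"method {method_name!r} is not allowed")


check_access("get_status")  # allowed, returns None
```

A frozenset has no add() or discard(), so the ACL cannot be changed in place by accident; rebinding the module-level name is still possible, so this guards against mistakes rather than attackers.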
Post by Raymond Hettinger
P.S.  The one advantage I can see for frozensets and frozendicts
is that we have an opportunity to optimize them once they are built
(optimizing insertion order to minimize collisions, increasing or
decreasing density, eliminating dummy entries, etc).  That being
said, the same could be accomplished for regular sets and dicts
by the addition of an optimize() method.
You can also implement more optimizations in the Python peephole optimizer
or the PyPy JIT because the mapping is constant, so you can do the lookup
at compile time instead of at runtime.

Dummy example:
---
config = frozendict(debug=False)
if config['debug']:
    enable_debug()
---
config['debug'] is always False, so you can just drop the call to
enable_debug() while compiling this code. It would avoid the need for a
preprocessor in some cases (especially conditional code, like the C
#ifdef).

Victor
Raymond Hettinger
2012-03-01 01:45:06 UTC
Permalink
Post by Victor Stinner
One of my colleagues implemented recently its own frozendict class
(with the "frozendict" name ;-)
I write new collection classes all the time.
That doesn't mean they warrant inclusion in the library or builtins.
There is a use case for ListenableSets and ListenableDicts -- do we
need them in the library? I think not. How about case insensitive variants?
I think not. There are tons of recipes on ASPN and on PyPI.
That doesn't make them worth adding in to the core group of types.

As core developers, we need to place some value on language
compactness and learnability. The language has already gotten
unnecessarily fat -- it is the rare Python programmer who knows
set operations on dict views, new-style formatting, abstract base classes,
contextlib/functools/itertools, how the with-statement works,
how super() works, what properties/staticmethods/classmethods are for,
differences between new and old-style classes, Exception versus BaseException,
weakreferences, __slots__, chained exceptions, etc.

If we were to add another collections type, it would need to be something
that powerfully adds to the expressivity of the language. Minor variants
on what we already have just make the language harder to learn and remember
without providing much of a payoff in return.


Raymond
Georg Brandl
2012-03-01 06:52:20 UTC
Permalink
Post by Raymond Hettinger
Post by Victor Stinner
One of my colleagues implemented recently its own frozendict class
(with the "frozendict" name ;-)
I write new collection classes all the time.
That doesn't mean they warrant inclusion in the library or builtins.
There is a use case for ListenableSets and ListenableDicts -- do we
need them in the library? I think not. How about case insensitive variants?
I think not. There are tons of recipes on ASPN and on PyPI.
That doesn't make them worth adding in to the core group of types.
+1.

Georg
Giampaolo Rodolà
2012-03-01 09:28:46 UTC
Permalink

On 1 March 2012 02:45, Raymond Hettinger wrote:
Post by Victor Stinner
One of my colleagues implemented recently its own frozendict class
(with the "frozendict" name ;-)
I write new collection classes all the time.
That doesn't mean they warrant inclusion in the library or builtins.
There is a use case for ListenableSets and ListenableDicts -- do we
need them in the library?  I think not.  How about case insensitive variants?
I think not.  There are tons of recipes on ASPN and on PyPI.
That doesn't make them worth adding in to the core group of types.
As core developers, we need to place some value on language
compactness and learnability.  The language has already gotten
unnecessarily fat -- it is the rare Python programmer who knows
set operations on dict views, new-style formatting, abstract base classes,
contextlib/functools/itertools, how the with-statement works,
how super() works, what properties/staticmethods/classmethods are for,
differences between new and old-style classes, Exception versus BaseException,
weakreferences, __slots__, chained exceptions, etc.
If we were to add another collections type, it would need to be something
that powerfully adds to the expressivity of the language.  Minor variants
on what we already have just make the language harder to learn and remember
without providing much of a payoff in return.
Raymond
+1

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/
Yury Selivanov
2012-03-01 12:37:28 UTC
Permalink
Actually I find the frozendict concept quite useful. We also have an
implementation in our framework, and we use it, for instance, in the
http request object, for parsed arguments and parsed forms, whose
values should never be modified once parsed.

Of course everybody can live without it, but given how easy it is
to implement, I think it's OK to have it.

+1.
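The framework code isn't shown, but the idea can be sketched with the standard library: `urllib.parse.parse_qs` returns a dict of lists, so freezing it means converting each list to a tuple and wrapping the result in `types.MappingProxyType` (Python 3.3+). `parse_frozen_args` is a hypothetical helper name.

```python
from types import MappingProxyType
from urllib.parse import parse_qs


def parse_frozen_args(query):
    """Parse a query string into a read-only mapping of name -> tuple of values."""
    parsed = parse_qs(query)  # e.g. {'tag': ['a', 'b'], 'page': ['2']}
    # Tuples instead of lists, so the values cannot be appended to either.
    return MappingProxyType({name: tuple(values) for name, values in parsed.items()})


args = parse_frozen_args("tag=a&tag=b&page=2")
assert args["tag"] == ("a", "b")
assert args["page"] == ("2",)
```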
Post by Raymond Hettinger
Post by Victor Stinner
One of my colleagues implemented recently its own frozendict class
(with the "frozendict" name ;-)
I write new collection classes all the time.
That doesn't mean they warrant inclusion in the library or builtins.
There is a use case for ListenableSets and ListenableDicts -- do we
need them in the library? I think not. How about case insensitive variants?
I think not. There are tons of recipes on ASPN and on PyPI.
That doesn't make them worth adding in to the core group of types.
As core developers, we need to place some value on language
compactness and learnability. The language has already gotten
unnecessarily fat -- it is the rare Python programmer who knows
set operations on dict views, new-style formatting, abstract base classes,
contextlib/functools/itertools, how the with-statement works,
how super() works, what properties/staticmethods/classmethods are for,
differences between new and old-style classes, Exception versus BaseException,
weakreferences, __slots__, chained exceptions, etc.
If we were to add another collections type, it would need to be something
that powerfully adds to the expressivity of the language. Minor variants
on what we already have just make the language harder to learn and remember
without providing much of a payoff in return.
Raymond
Paul Moore
2012-03-01 14:08:18 UTC
Permalink
Actually I find the frozendict concept quite useful.  We also have an
implementation in our framework, and we use it, for instance, in the
http request object, for parsed arguments and parsed forms, whose
values should never be modified once parsed.
The question isn't so much whether it's useful, as whether it's of
sufficiently general use to warrant putting it into the core language
(not even the stdlib, but the C core!). The fact that you have an
implementation of your own actually indicates that not having it in
the core didn't cause you any real problems.

Remember - the bar for core acceptance is higher than just "it is
useful". I'm not even sure I see a strong enough case for frozendict
being in the standard library yet, let alone in the core.

Paul.
Steven D'Aprano
2012-03-01 01:28:41 UTC
Permalink
Post by Raymond Hettinger
Post by Victor Stinner
A frozendict type is a common request from users and there are various
implementations.
ISTM, this request is never from someone who has a use case.
Instead, it almost always comes from "completers", people
who see that we have a frozenset type and think the core devs
missed the ObviousThingToDo(tm). Frozendicts are trivial to
implement, so that is why there are various implementations
(i.e. the implementations are more fun to write than they are to use).
They might be trivial for *you*, but the fact that people keep asking for help
writing a frozendict, or stating that their implementation sucks, demonstrates
that for the average Python coder they are not trivial at all. And the
implementations I've seen don't seem to be so much fun as *tedious*.

E.g. google on "python frozendict" and the second link is from somebody who
had tried for "a couple of days" and is still not happy:

http://python.6.n6.nabble.com/frozendict-td4377791.html

You may dismiss him as a "completer", but what is asserted without evidence
can be rejected without evidence, and so we may just as well declare that he
has a brilliantly compelling use-case, if only we knew what it was... <wink>

I see one implementation on ActiveState that has at least one serious problem,
reported by you:

http://code.activestate.com/recipes/414283-frozen-dictionaries/


So I don't think we can dismiss frozendict as "trivial".
--
Steven
André Malo
2012-03-01 09:29:32 UTC
Permalink
Post by Raymond Hettinger
Post by Victor Stinner
A frozendict type is a common request from users and there are various
implementations.
ISTM, this request is never from someone who has a use case.
Instead, it almost always comes from "completers", people
who see that we have a frozenset type and think the core devs
missed the ObviousThingToDo(tm). Frozendicts are trivial to
implement, so that is why there are various implementations
(i.e. the implementations are more fun to write than they are to use).
The frozenset type covers a niche case that is nice-to-have but
*rarely* used. Many experienced Python users simply forget
that we have a frozenset type. We don't get bug reports or
feature requests about the type. When I do Python consulting
work, I never see it in a client's codebase. It does occasionally
get discussed in questions on StackOverflow but rarely gets
offered as an answer (typically on variants of the "how do you
make a set-of-sets" question). If Google's codesearch were still
alive, we could add another datapoint showing how infrequently
this type is used.
Here are my real-world use cases. Not for security, but for safety and
performance reasons (I've built my own RODict and ROList modeled after
dictproxy):

- Global, but immutable containers, e.g. as class members

- Caching. My data container objects (say, resultsets from a db or something)
  usually inherit from list or dict (sometimes also set) and are cached
  heavily. In order to ensure that they are not modified (accidentally), I
  have two choices: deepcopy or immutability. deepcopy is so expensive that
  it's often cheaper to just leave out the cache. So I use immutability. (oh
  well, the objects are further restricted with __slots__)
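The accidental-modification risk is easy to reproduce with `functools.lru_cache`, which hands every caller the same cached object. In the sketch below, `load_row_mutable` and `load_row_readonly` are hypothetical stand-ins for a database lookup; returning a read-only `types.MappingProxyType` avoids both the shared-mutation bug and a `copy.deepcopy` on every cache hit.

```python
import functools
from types import MappingProxyType


@functools.lru_cache(maxsize=128)
def load_row_mutable(row_id):
    # Stand-in for a database lookup (hypothetical).
    return {"id": row_id, "name": f"row{row_id}"}


@functools.lru_cache(maxsize=128)
def load_row_readonly(row_id):
    # Same lookup, but callers get a read-only view of the cached dict.
    return MappingProxyType({"id": row_id, "name": f"row{row_id}"})


# Every caller shares the cached dict, so one accidental write leaks into later calls.
load_row_mutable(1)["name"] = "oops"
assert load_row_mutable(1)["name"] == "oops"

# With a read-only value the same mistake fails at the point of error.
try:
    load_row_readonly(1)["name"] = "oops"
except TypeError:
    pass
assert load_row_readonly(1)["name"] == "row1"
```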

I agree, these are not general purpose issues, but they are not *rare*, I'd
think.

nd
Victor Stinner
2012-03-01 13:07:10 UTC
Permalink
Post by André Malo
Here are my real-world use cases. Not for security, but for safety and
performance reasons (I've built my own RODict and ROList modeled after
- Global, but immutable containers, e.g. as class members
I attached type_final.patch to the issue #14162 to demonstrate how
frozendict can be used to implement a "read-only" type. Last version:
http://bugs.python.org/file24696/type_final.patch
>>> class FinalizedType:
...     __final__=True
...     attr = 10
...     def hello(self):
...         print("hello")
...
>>> FinalizedType.attr=12
TypeError: 'frozendict' object does not support item assignment
>>> FinalizedType.hello=print
TypeError: 'frozendict' object does not support item assignment

(instances still have a mutable dict)

My patch checks for the __final__ class attribute, but the conversion
from dict to frozendict may be done by a function or a type method.
Creating a read-only type is a different issue, it's just another
example of frozendict usage.

Victor
André Malo
2012-03-01 13:26:46 UTC
Permalink
Post by Victor Stinner
Post by André Malo
Here are my real-world use cases. Not for security, but for safety and
performance reasons (I've built my own RODict and ROList modeled after
- Global, but immutable containers, e.g. as class members
I attached type_final.patch to the issue #14162 to demonstrate how
http://bugs.python.org/file24696/type_final.patch
Oh, hmm. I meant something more like this:

"""
class Foo:
some_mapping = frozendict(
blah=1, blub=2
)

or as a variant:

def zonk(some_default=frozendict(...)):
...

or simply a global object:

baz = frozendict(some_immutable_mapping)
"""

I'm not sure about your final types. I'm using __slots__ = () for such things
(?)

nd
Victor Stinner
2012-03-01 14:54:01 UTC
Post by Victor Stinner
Post by André Malo
Here are my real-world use cases. Not for security, but for safety and
performance reasons (I've built my own RODict and ROList modeled after
- Global, but immutable containers, e.g. as class members
I attached type_final.patch to the issue #14162 to demonstrate how
http://bugs.python.org/file24696/type_final.patch
"""
   some_mapping = frozendict(
       blah=1, blub=2
   )
   ...
baz = frozendict(some_immutable_mapping)
"""
Ah yes, frozendict is useful for such cases.
I'm not sure about your final types. I'm using __slots__ = () for such things
>>> class A:
...     __slots__=('x',)
...     x = 1
...
>>> A.x=2
>>> A.x
2

Victor
André Malo
2012-03-01 15:05:27 UTC
Post by Victor Stinner
Post by André Malo
I'm not sure about your final types. I'm using __slots__ = () for such things
>>> class A:
...     __slots__=('x',)
...     x = 1
...
>>> A.x=2
>>> A.x
2
Ah, ok, I missed that. It should be fixable with a metaclass. Not very nicely,
though.
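Here is a minimal sketch of the metaclass approach. It assumes the goal is only to stop rebinding or deleting class attributes; `FinalMeta` is a hypothetical name, not something from the thread or from type_final.patch.

```python
class FinalMeta(type):
    """Hypothetical metaclass: refuses to rebind or delete class attributes."""

    def __setattr__(cls, name, value):
        raise AttributeError(f"cannot set {name!r} on final class {cls.__name__}")

    def __delattr__(cls, name):
        raise AttributeError(f"cannot delete {name!r} on final class {cls.__name__}")


class A(metaclass=FinalMeta):
    __slots__ = ()  # instances get no __dict__ either
    x = 1


try:
    A.x = 2
except AttributeError as exc:
    print(exc)
print(A.x)  # still 1
```

As with the "blacklist" frozendict in the rationale, this is a safety net rather than a guarantee: calling `type.__setattr__(A, "x", 2)` directly bypasses the metaclass hook.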

nd
Nick Coghlan
2012-03-01 13:34:56 UTC
Post by André Malo
- Caching. My data container objects (say, resultsets from a db or something)
 usually inherit from list or dict (sometimes also set) and are cached
 heavily. In order to ensure that they are not modified (accidentally), I
 have two choices: deepcopy or immutability. deepcopy is so expensive that
 it's often cheaper to just leave out the cache. So I use immutability. (oh
 well, the objects are further restricted with __slots__)
Speaking of caching - functools.lru_cache currently has to do a fair
bit of work in order to correctly cache keyword arguments. It's
obviously a *solvable* problem even without frozendict in the
collections module (it just stores the dict contents as a sorted tuple
of 2-tuples), but it would still be interesting to compare the
readability, speed and memory consumption differences of a version of
lru_cache that used frozendict to cache the keyword arguments instead.
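For comparison, here is a simplified sketch of the sorted-tuple approach. It is not the actual functools code: `make_key` and `cached` are hypothetical names, and there is no size limit or LRU eviction. A frozendict-based version could build the keyword part of the key from a hashable frozendict of the kwargs instead of `tuple(sorted(kwargs.items()))`.

```python
import functools


def make_key(args, kwargs, kwd_mark=(object(),)):
    """Hashable cache key: positional args, a marker, then sorted kwarg pairs.

    The marker keeps f(1, ("a", 2)) and f(1, a=2) from colliding.
    """
    key = args
    if kwargs:
        key += kwd_mark + tuple(sorted(kwargs.items()))
    return key


def cached(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = make_key(args, kwargs)
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    wrapper.cache = cache
    return wrapper


@cached
def area(w, h=1):
    return w * h


area(3, h=4)
area(3, h=4)  # second call is served from the cache
```

Sorting makes the key independent of the order in which keyword arguments were passed, at the cost of building a new tuple on every call.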

Cheers,
Nick.
--
Nick Coghlan   |   ***@gmail.com   |   Brisbane, Australia
Serhiy Storchaka
2012-03-01 14:17:35 UTC
Post by André Malo
- Caching. My data container objects (say, resultsets from a db or something)
usually inherit from list or dict (sometimes also set) and are cached
heavily. In order to ensure that they are not modified (accidentally), I
have two choices: deepcopy or immutability. deepcopy is so expensive that
it's often cheaper to just leave out the cache. So I use immutability. (oh
well, the objects are further restricted with __slots__)
This is the first rational use of frozendict that I have seen. However, a deep
copy is still necessary to create the frozendict. For this case, I
believe it would be better to "freeze" the dict in place and then copy-on-write it.
André Malo
2012-03-01 14:47:08 UTC
Post by Serhiy Storchaka
Post by André Malo
- Caching. My data container objects (say, resultsets from a db or
something) usually inherit from list or dict (sometimes also set) and are
cached heavily. In order to ensure that they are not modified
(accidentally), I have two choices: deepcopy or immutability. deepcopy is
so expensive that it's often cheaper to just leave out the cache. So I
use immutability. (oh well, the objects are further restricted with
__slots__)
This is the first rational use of frozendict that I have seen. However, a deep
copy is still necessary to create the frozendict. For this case, I
believe it would be better to "freeze" the dict in place and then copy-on-write it.
In my case it's actually a half one. The data mostly comes from memcache ;)
I populate the object and then I'm done with it. People who want to modify
it need to copy it, yes. OTOH, a shallow copy is usually enough (here).

Funnily enough, my ROList provides a "sorted" method instead of "sort",
which creates a sorted copy of the list.
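Here is a hypothetical sketch of such an ROList. It assumes the "blacklist" style from the rationale: subclass list and make the mutators raise. Only the `sorted` method comes from the message; the rest is an assumption.

```python
class ROList(list):
    """Hypothetical read-only list: inherits from list, mutators raise."""

    def _readonly(self, *args, **kwargs):
        raise TypeError(f"{type(self).__name__} is read-only")

    append = extend = insert = pop = remove = clear = _readonly
    sort = reverse = _readonly
    __setitem__ = __delitem__ = __iadd__ = __imul__ = _readonly

    def sorted(self, *, key=None, reverse=False):
        # Return a new sorted ROList instead of sorting in place; the
        # builtin sorted() is used here, not this method.
        return type(self)(sorted(self, key=key, reverse=reverse))


r = ROList([3, 1, 2])
s = r.sorted()  # r itself is untouched
```

Because it still inherits from list, `list.append(r, 4)` would modify it anyway. That is the same hole the rationale describes for blacklist-style frozendicts.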

nd
Serhiy Storchaka
2012-03-01 17:56:38 UTC
Post by André Malo
Post by Serhiy Storchaka
This is the first rational use of frozendict that I have seen. However, a deep
copy is still necessary to create the frozendict. For this case, I
believe it would be better to "freeze" the dict in place and then copy-on-write it.
In my case it's actually a half one. The data mostly comes from memcache ;)
I populate the object and then I'm done with it. People who want to modify
it need to copy it, yes. OTOH, a shallow copy is usually enough (here).
What if people modify the nested dicts?

a = frozendict({1: {2: 3}})
b = a.copy()
c = a.copy()
assert b[1][2] == 3
c[1][2] = 4
assert b[1][2] == 4

You need to deep-copy the incoming dict.

def frozencopy(value):
    if isinstance(value, list):
        return tuple(frozencopy(x) for x in value)
    if isinstance(value, dict):
        return frozendict((frozencopy(k), frozencopy(v)) for k, v in value.items())
    return value  # I'm lucky
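frozendict does not exist in the standard library, so a runnable version of `frozencopy` needs a stand-in. The sketch below uses a minimal hypothetical `FrozenDict` built on `collections.abc.Mapping` and hashed the way the proposal describes (`hash(frozenset(self.items()))`, cached). It also includes a guess at the "unfrozencopy" inverse that Serhiy mentions. It is an illustration only, not the proposed C type.

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Hypothetical minimal stand-in for the proposed frozendict."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # As in the proposal: hash(frozenset(items)), computed once and cached.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash

    def __repr__(self):
        return f"FrozenDict({self._data!r})"


def frozencopy(value):
    """Recursively turn lists into tuples and dicts into FrozenDicts."""
    if isinstance(value, list):
        return tuple(frozencopy(x) for x in value)
    if isinstance(value, dict):
        return FrozenDict((frozencopy(k), frozencopy(v)) for k, v in value.items())
    return value  # assume anything else is already immutable


def unfrozencopy(value):
    """Hypothetical inverse: tuples become lists, FrozenDicts become dicts.

    Keys are left as they are, because dict keys must stay hashable.
    Tuples that were tuples in the original data also come back as lists.
    """
    if isinstance(value, tuple):
        return [unfrozencopy(x) for x in value]
    if isinstance(value, FrozenDict):
        return {k: unfrozencopy(v) for k, v in value.items()}
    return value
```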

And when a client wants to modify the result in depth, it has to call
"unfrozencopy". Using frozendict pays off only when multiple clients read
the result but don't modify it. Copy-on-write would help in all cases and
would simplify the code. But this is a topic for python-ideas, sorry.
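Here is a rough user-level sketch of the copy-on-write idea. It uses a wrapper class rather than the in-place "freeze" Serhiy has in mind, which would need interpreter support; `CowDict` is a hypothetical name. Copies are shallow, so nested dicts would need the same treatment to be safe against the deep modification shown above.

```python
from collections.abc import MutableMapping


class CowDict(MutableMapping):
    """Hypothetical copy-on-write view over a shared dict.

    All views read the same underlying dict; the first write through a
    view makes a private shallow copy, so other views are unaffected.
    """

    def __init__(self, shared):
        self._data = shared
        self._owned = False

    def _ensure_owned(self):
        if not self._owned:
            self._data = dict(self._data)  # shallow copy on first write
            self._owned = True

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._ensure_owned()
        self._data[key] = value

    def __delitem__(self, key):
        self._ensure_owned()
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


shared = {"a": 1, "b": 2}
v1, v2 = CowDict(shared), CowDict(shared)
v1["a"] = 100  # v1 copies; v2 and shared still see 1
```

Readers that never write pay nothing beyond the wrapper. Only a client that actually modifies the data pays for a copy, which is the case Serhiy says frozendict alone does not handle.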
André Malo
2012-03-01 20:27:17 UTC
Post by Serhiy Storchaka
Post by André Malo
Post by Serhiy Storchaka
This is the first rational use of frozendict that I have seen. However, a
deep copy is still necessary to create the frozendict. For this case,
I believe it would be better to "freeze" the dict in place and then
copy-on-write it.
In my case it's actually a half one. The data mostly comes from
memcache ;) I populate the object and then I'm done with it. People
who want to modify it need to copy it, yes. OTOH, a shallow copy
is usually enough (here).
What if people modify the nested dicts?
That's the "here" part. They can't [1]. These objects are typically ROLists
of RODicts, maybe nested deeper, but always RO* or other immutable types.

I cheated by always deep-copying in the cache, but defining __deepcopy__ for
those RO* objects as "return self".
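Here is a sketch of that trick. It uses a hypothetical `RODict` in the same blacklist style and a plain dict as the cache; `cache_put` and `cache_get` are illustrative names. `copy.deepcopy` consults `__deepcopy__`, so read-only objects come back unchanged at no cost, while ordinary dicts are still copied in full.

```python
import copy


class RODict(dict):
    """Hypothetical read-only dict: inherits from dict, mutators raise."""

    def _readonly(self, *args, **kwargs):
        raise TypeError(f"{type(self).__name__} is read-only")

    __setitem__ = __delitem__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly
    __ior__ = _readonly  # the |= operator on dicts (Python 3.9+)

    def __deepcopy__(self, memo):
        # Contents are assumed to be read-only all the way down,
        # so a "deep copy" can safely be the object itself.
        return self

    def __copy__(self):
        return self


cache = {}


def cache_put(key, value):
    cache[key] = copy.deepcopy(value)  # always deep-copy on the way in


def cache_get(key):
    return copy.deepcopy(cache[key])  # and on the way out


row = RODict(name="x", tags=("a", "b"))
cache_put("row", row)
```

As footnote [1] says, this is a safety net: `dict.__setitem__(row, "name", "y")` still modifies the object.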

nd

[1] Well, an attacker could, because it's still based on regular dicts and
lists. But that's why it's not a security feature but a safety net (here).
--
"Solid and comprehensive book"
-- from a review

<http://pub.perlig.de/books.html#apache2>