Discussion:
Another case for frozendict
Jason R. Coombs
2014-07-13 14:04:17 UTC
Permalink
I repeatedly run into situations where a frozendict would be useful, and every time I do, I go searching and find the (unfortunately rejected) PEP-416. I'd just like to share another case where having a frozendict in the stdlib would be useful to me.
res = [db.cases.remove({'_id': doc['_id']}) for doc in fives]
len(res)
206

I can see that the results are the same for the first two queries.
res[0]
{'n': 1, 'err': None, 'ok': 1.0}
res[1]
{'n': 1, 'err': None, 'ok': 1.0}
set(res)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

I can't do that because dict is unhashable. That's reasonable, and if I had a frozen dict, I could easily work around this limitation and accomplish what I need.
set(map(frozendict, res))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'frozendict' is not defined

PEP-416 mentions a MappingProxyType, but that's no help.
res_ex = list(map(types.MappingProxyType, res))
set(res_ex)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'mappingproxy'

I can achieve what I need by constructing a set on the 'items' of the dict.
set(tuple(doc.items()) for doc in res)
{(('n', 1), ('err', None), ('ok', 1.0))}

But that syntax would be nicer if the result had the same representation as the input (mapping instead of tuple of pairs). A frozendict would have readily enabled the desirable behavior.

Although hashability is mentioned in the PEP under constraints, there are many use-cases that fall out of the ability to hash a dict, such as the one described above, which are not mentioned at all in use-cases for the PEP.

If there's ever any interest in reviving that PEP, I'm in favor of its implementation.
Victor Stinner
2014-07-13 14:13:14 UTC
Permalink
The PEP has been rejected, but the MappingProxyType is now public:

$ ./python
Python 3.5.0a0 (default:5af54ed3af02, Jul 12 2014, 03:13:04)
d={1:2}
import types
d = types.MappingProxyType(d)
d
mappingproxy({1: 2})
d[1]
2
d[1] = 3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'mappingproxy' object does not support item assignment

Victor
Chris Angelico
2014-07-13 14:22:57 UTC
Permalink
I can achieve what I need by constructing a set on the ‘items’ of the dict.
set(tuple(doc.items()) for doc in res)
{(('n', 1), ('err', None), ('ok', 1.0))}
This is flawed; the tuple-of-tuples depends on iteration order, which
may vary. It should be a frozenset of those tuples, not a tuple. Which
strengthens your case; it's that easy to get it wrong in the absence
of an actual frozendict.

ChrisA
Russell E. Owen
2014-07-15 23:48:48 UTC
Permalink
In article
Post by Chris Angelico
I can achieve what I need by constructing a set on the ‘items’ of the dict.
set(tuple(doc.items()) for doc in res)
{(('n', 1), ('err', None), ('ok', 1.0))}
This is flawed; the tuple-of-tuples depends on iteration order, which
may vary. It should be a frozenset of those tuples, not a tuple. Which
strengthens your case; it's that easy to get it wrong in the absence
of an actual frozendict.
I would love to see frozendict in python.

I find myself using dicts for translation tables, usually tables that
should not be modified. Documentation usually suffices to get that idea
across, but it's not ideal.

frozendict would also be handy as a default values for function
arguments. In that case documentation isn't enough and one has to resort
to using a default value of None and then changing it in the function
body.

I like frozendict because I feel it is expressive and adds some safety.

-- Russell
MRAB
2014-07-16 02:27:23 UTC
Permalink
Post by Russell E. Owen
In article
Post by Chris Angelico
I can achieve what I need by constructing a set on the ‘items’ of the dict.
set(tuple(doc.items()) for doc in res)
{(('n', 1), ('err', None), ('ok', 1.0))}
This is flawed; the tuple-of-tuples depends on iteration order, which
may vary. It should be a frozenset of those tuples, not a tuple. Which
strengthens your case; it's that easy to get it wrong in the absence
of an actual frozendict.
I would love to see frozendict in python.
I find myself using dicts for translation tables, usually tables that
should not be modified. Documentation usually suffices to get that idea
across, but it's not ideal.
frozendict would also be handy as a default values for function
arguments. In that case documentation isn't enough and one has to resort
to using a default value of None and then changing it in the function
body.
I like frozendict because I feel it is expressive and adds some safety.
Here's another use-case.
Post by Russell E. Owen
Post by Chris Angelico
import re
# Make a regex.
... p = re.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
Post by Russell E. Owen
Post by Chris Angelico
# What are the named groups?
... p.groupindex
{'first': 1, 'second': 2}
Post by Russell E. Owen
Post by Chris Angelico
# Perform a match.
... m = p.match('FIRST SECOND')
Post by Russell E. Owen
Post by Chris Angelico
m.groupdict()
{'first': 'FIRST', 'second': 'SECOND'}
Post by Russell E. Owen
Post by Chris Angelico
# Try modifying the pattern object.
... p.groupindex['JUNK'] = 'foobar'
Post by Russell E. Owen
Post by Chris Angelico
# What are the named groups now?
... p.groupindex
{'first': 1, 'second': 2, 'JUNK': 'foobar'}
Post by Russell E. Owen
Post by Chris Angelico
# And the match object?
... m.groupdict()
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
IndexError: no such group

It can't find a named group called 'JUNK'.

And with a bit more tinkering it's possible to crash Python. (I'll
leave that as an exercise for the reader! :-))
Post by Russell E. Owen
Post by Chris Angelico
import regex
# Make a regex.
... p = regex.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
Post by Russell E. Owen
Post by Chris Angelico
# What are the named groups?
... p.groupindex
{'second': 2, 'first': 1}
Post by Russell E. Owen
Post by Chris Angelico
# Perform a match.
... m = p.match('FIRST SECOND')
Post by Russell E. Owen
Post by Chris Angelico
m.groupdict()
{'second': 'SECOND', 'first': 'FIRST'}
Post by Russell E. Owen
Post by Chris Angelico
# Try modifying the regex.
... p.groupindex['JUNK'] = 'foobar'
Post by Russell E. Owen
Post by Chris Angelico
# What are the named groups now?
... p.groupindex
{'second': 2, 'first': 1}
Post by Russell E. Owen
Post by Chris Angelico
# And the match object?
... m.groupdict()
{'second': 'SECOND', 'first': 'FIRST'}

Using a frozendict instead would be a nicer solution.
R. David Murray
2014-07-16 13:37:55 UTC
Permalink
Post by MRAB
Here's another use-case.
import re
# Make a regex.
... p = re.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
# What are the named groups?
... p.groupindex
{'first': 1, 'second': 2}
# Perform a match.
... m = p.match('FIRST SECOND')
m.groupdict()
{'first': 'FIRST', 'second': 'SECOND'}
# Try modifying the pattern object.
... p.groupindex['JUNK'] = 'foobar'
# What are the named groups now?
... p.groupindex
{'first': 1, 'second': 2, 'JUNK': 'foobar'}
# And the match object?
... m.groupdict()
File "<stdin>", line 2, in <module>
IndexError: no such group
It can't find a named group called 'JUNK'.
IMO, preventing someone from shooting themselves in the foot by modifying
something they shouldn't modify according to the API is not a Python
use case ("consenting adults").
Post by MRAB
And with a bit more tinkering it's possible to crash Python. (I'll
leave that as an exercise for the reader! :-))
Preventing a Python program from being able to crash the interpreter,
that's a use case :)

--David
Devin Jeanpierre
2014-07-16 17:10:07 UTC
Permalink
Post by R. David Murray
IMO, preventing someone from shooting themselves in the foot by modifying
something they shouldn't modify according to the API is not a Python
use case ("consenting adults").
Then why have immutable objects at all? Why do you have to put tuples
and frozensets inside sets, instead of lists and sets? Compare with
Java, which really is "consenting adults" here -- you can add a
mutable object to a set, just don't mutate it, or you might not be
able to find it in the set again.

Several people seem to act as if the Pythonic way is to not allow for
any sort of immutable types at all. ISTM people are trying to
retroactively claim some standard of Pythonicity that never existed.
Python can and does protect you from shooting yourself in the foot by
making objects immutable. Or do you have another explanation for the
proliferation of immutable types, and the inability to add mutable
types to sets and dicts?

Using a frozendict to protect and enforce an invariant in the re
module is entirely reasonable. So is creating a new dict each time.
The intermediate -- reusing a mutable dict and failing in
incomprehensible ways if you mutate it, and potentially even crashing
due to memory safety issues -- is not Pythonic at all.

-- Devin
R. David Murray
2014-07-16 17:17:11 UTC
Permalink
Received: from localhost (HELO mail.python.org) (127.0.0.1)
by albatross.python.org with SMTP; 16 Jul 2014 19:17:12 +0200
Received: from webabinitio.net (unknown [173.13.127.166])
by mail.python.org (Postfix) with ESMTP
for <python-***@python.org>; Wed, 16 Jul 2014 19:17:12 +0200 (CEST)
Received: from localhost (mailhost.webabinitio.net [127.0.0.1])
by webabinitio.net (Postfix) with ESMTP id 1A9B4250DF6
for <python-***@python.org>; Wed, 16 Jul 2014 13:17:12 -0400 (EDT)
X-Virus-Scanned: amavisd-new at example.com
Received: from webabinitio.net ([127.0.0.1])
by localhost (mailhost.webabinitio.net [127.0.0.1]) (amavisd-new, port 10024)
with LMTP id WK14OhdNB2oB for <python-***@python.org>;
Wed, 16 Jul 2014 13:17:11 -0400 (EDT)
Received: from session.bitdance.com (unknown [192.168.0.103])
by webabinitio.net (Postfix) with ESMTP id BCFDB25017A
for <python-***@python.org>; Wed, 16 Jul 2014 13:17:11 -0400 (EDT)
In-reply-to: <CABicbJKDFdLs1+9M6OJZ=z-***@mail.gmail.com>
X-BeenThere: python-***@python.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Python core developers <python-dev.python.org>
List-Unsubscribe: <https://mail.python.org/mailman/options/python-dev>,
<mailto:python-dev-***@python.org?subject=unsubscribe>
List-Archive: <http://mail.python.org/pipermail/python-dev/>
List-Post: <mailto:python-***@python.org>
List-Help: <mailto:python-dev-***@python.org?subject=help>
List-Subscribe: <https://mail.python.org/mailman/listinfo/python-dev>,
<mailto:python-dev-***@python.org?subject=subscribe>
Errors-To: python-dev-bounces+python-python-dev=***@python.org
Sender: "Python-Dev"
<python-dev-bounces+python-python-dev=***@python.org>
Archived-At: <http://permalink.gmane.org/gmane.comp.python.devel/148716>
Post by Devin Jeanpierre
Post by R. David Murray
IMO, preventing someone from shooting themselves in the foot by modifying
something they shouldn't modify according to the API is not a Python
use case ("consenting adults").
Then why have immutable objects at all? Why do you have to put tuples
and frozensets inside sets, instead of lists and sets? Compare with
Java, which really is "consenting adults" here -- you can add a
mutable object to a set, just don't mutate it, or you might not be
able to find it in the set again.
Several people seem to act as if the Pythonic way is to not allow for
any sort of immutable types at all. ISTM people are trying to
retroactively claim some standard of Pythonicity that never existed.
Python can and does protect you from shooting yourself in the foot by
making objects immutable. Or do you have another explanation for the
proliferation of immutable types, and the inability to add mutable
types to sets and dicts?
Using a frozendict to protect and enforce an invariant in the re
module is entirely reasonable. So is creating a new dict each time.
The intermediate -- reusing a mutable dict and failing in
incomprehensible ways if you mutate it, and potentially even crashing
due to memory safety issues -- is not Pythonic at all.
You'll note I ended up agreeing with you there: when mutation breaks an
invariant of the object it came from, that's an issue. Which would be
the case if you could use mutable objects as keys.

--David

R. David Murray
2014-07-16 13:47:59 UTC
Permalink
Post by MRAB
# Try modifying the pattern object.
... p.groupindex['JUNK'] = 'foobar'
# What are the named groups now?
... p.groupindex
{'first': 1, 'second': 2, 'JUNK': 'foobar'}
# And the match object?
... m.groupdict()
File "<stdin>", line 2, in <module>
IndexError: no such group
It can't find a named group called 'JUNK'.
After I hit send on my previous message, I thought more about your
example. One of the issues here is that modifying the dict breaks an
invariant of the API. I have a similar situation in the email module,
and I used the same solution you did in regex: always return a new dict.
It would be nice to be able to return a frozendict instead of having the
overhead of building a new dict on each call. That by itself might not
be enough reason. But, if the user wants to use the data in modified form
elsewhere, they would then have to construct a new regular dict out of it,
making the decision to vary the data from what matches the state of the
object it came from an explicit one. That seems to fit the Python zen
("explicit is better than implicit").

So I'm changing my mind, and do consider this a valid use case, even
absent the crash.

--David
R. David Murray
2014-07-16 14:24:45 UTC
Permalink
Post by R. David Murray
It would be nice to be able to return a frozendict instead of having the
overhead of building a new dict on each call.
PyDictProxy_New() / types.MappingProxyType. It's a one line change in
Py_INCREF(self->dict)
return self->dict;
To
return PyDictProxy_New(self->dict);
return self.dct
To
return types.MappingProxyType(self.dct)
Which is cheaper than a copy, and avoids having to audit every use of
self->dict to ensure the semantics required for a "frozendict" are
respected, i.e. no mutation occurs after the dict becomes visible to the
user, and potentially has __hash__ called.
Post by R. David Murray
That by itself might not be enough reason. But, if the user wants to
use the data in modified form elsewhere, they would then have to
construct a new regular dict out of it, making the decision to vary
the data from what matches the state of the object it came from an
explicit one. That seems to fit the Python zen ("explicit is better
than implicit").
So I'm changing my mind, and do consider this a valid use case, even
absent the crash.
Avoiding crashes seems a better use for a read-only proxy, rather than a
hashable immutable type.
Good point. MappingProxyType wasn't yet exposed when I wrote that email
code.

--David
Eric Snow
2014-07-16 14:27:51 UTC
Permalink
Post by R. David Murray
After I hit send on my previous message, I thought more about your
example. One of the issues here is that modifying the dict breaks an
invariant of the API. I have a similar situation in the email module,
and I used the same solution you did in regex: always return a new dict.
It would be nice to be able to return a frozendict instead of having the
overhead of building a new dict on each call. That by itself might not
be enough reason. But, if the user wants to use the data in modified form
elsewhere, they would then have to construct a new regular dict out of it,
making the decision to vary the data from what matches the state of the
object it came from an explicit one. That seems to fit the Python zen
("explicit is better than implicit").
So I'm changing my mind, and do consider this a valid use case, even
absent the crash.
+1

A simple implementation is pretty straight-forward:

class FrozenDict(Mapping):
def __init__(self, *args, **kwargs):
self._map = dict(*args, **kwargs)
self._hash = ...
def __hash__(self):
return self._hash
def __len__(self):
return len(self._map)
def __iter__(self):
yield from self._map
def __getitem__(self, key):
return self._map[key]

This is actually something I've used before on a number of occasions.
Having it in the stdlib would be nice (though that alone is not
sufficient for inclusion :)). If there is a valid use case for a
frozen dict type in other stdlib modules, I'd consider that a pretty
good justification for adding it.

Incidentally, collections.abc.Mapping is the only one of the 6
container ABCs that does not have a concrete implementation (not
counting types.MappingProxyType which is only a proxy).

-eric
dw+
2014-07-16 14:04:29 UTC
Permalink
Post by R. David Murray
It would be nice to be able to return a frozendict instead of having the
overhead of building a new dict on each call.
There already is an in-between available both to Python and C:
PyDictProxy_New() / types.MappingProxyType. It's a one line change in
each case to return a temporary intermediary, using something like (C):
Py_INCREF(self->dict)
return self->dict;

To
return PyDictProxy_New(self->dict);

Or Python:
return self.dct

To
return types.MappingProxyType(self.dct)

Which is cheaper than a copy, and avoids having to audit every use of
self->dict to ensure the semantics required for a "frozendict" are
respected, i.e. no mutation occurs after the dict becomes visible to the
user, and potentially has __hash__ called.
Post by R. David Murray
That by itself might not be enough reason. But, if the user wants to
use the data in modified form elsewhere, they would then have to
construct a new regular dict out of it, making the decision to vary
the data from what matches the state of the object it came from an
explicit one. That seems to fit the Python zen ("explicit is better
than implicit").
So I'm changing my mind, and do consider this a valid use case, even
absent the crash.
Avoiding crashes seems a better use for a read-only proxy, rather than a
hashable immutable type.


David
Mark Roberts
2014-07-13 16:50:53 UTC
Permalink
I find it handy to use named tuple as my database mapping type. It allows you to perform this behavior seamlessly.

-Mark
I repeatedly run into situations where a frozendict would be useful, and every time I do, I go searching and find the (unfortunately rejected) PEP-416. I’d just like to share another case where having a frozendict in the stdlib would be useful to me.
res = [db.cases.remove({'_id': doc['_id']}) for doc in fives]
len(res)
206
I can see that the results are the same for the first two queries.
res[0]
{'n': 1, 'err': None, 'ok': 1.0}
res[1]
{'n': 1, 'err': None, 'ok': 1.0}
set(res)
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
I can’t do that because dict is unhashable. That’s reasonable, and if I had a frozen dict, I could easily work around this limitation and accomplish what I need.
set(map(frozendict, res))
File "<stdin>", line 1, in <module>
NameError: name 'frozendict' is not defined
PEP-416 mentions a MappingProxyType, but that’s no help.
res_ex = list(map(types.MappingProxyType, res))
set(res_ex)
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'mappingproxy'
I can achieve what I need by constructing a set on the ‘items’ of the dict.
set(tuple(doc.items()) for doc in res)
{(('n', 1), ('err', None), ('ok', 1.0))}
But that syntax would be nicer if the result had the same representation as the input (mapping instead of tuple of pairs). A frozendict would have readily enabled the desirable behavior.
Although hashability is mentioned in the PEP under constraints, there are many use-cases that fall out of the ability to hash a dict, such as the one described above, which are not mentioned at all in use-cases for the PEP.
If there’s ever any interest in reviving that PEP, I’m in favor of its implementation.
_______________________________________________
Python-Dev mailing list
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/wizzat%40gmail.com
dw+
2014-07-13 18:43:28 UTC
Permalink
PEP-416 mentions a MappingProxyType, but that’s no help.
Well, it kindof is. By combining MappingProxyType and UserDict the
desired effect can be achieved concisely:

import collections
import types

class frozendict(collections.UserDict):
def __init__(self, d, **kw):
if d:
d = d.copy()
d.update(kw)
else:
d = kw
self.data = types.MappingProxyType(d)

_h = None
def __hash__(self):
if self._h is None:
self._h = sum(map(hash, self.data.items()))
return self._h

def __repr__(self):
return repr(dict(self))
Although hashability is mentioned in the PEP under constraints, there are many
use-cases that fall out of the ability to hash a dict, such as the one
described above, which are not mentioned at all in use-cases for the PEP.
If there’s ever any interest in reviving that PEP, I’m in favor of its
implementation.
In its previous form, the PEP seemed more focused on some false
optimization capabilities of a read-only type, rather than as here, the
far more interesting hashability properties. It might warrant a fresh
PEP to more thoroughly investigate this angle.


David
dw+
2014-07-13 18:50:18 UTC
Permalink
Post by dw+
d = d.copy()
To cope with iterables, "d = d.copy()" should have read "d = dict(d)".


David
Nick Coghlan
2014-07-13 19:09:25 UTC
Permalink
Post by dw+
In its previous form, the PEP seemed more focused on some false
optimization capabilities of a read-only type, rather than as here, the
far more interesting hashability properties. It might warrant a fresh
PEP to more thoroughly investigate this angle.
RIght, the use case would be "frozendict as a simple alternative to a
full class definition", but even less structured than namedtuple in
that the keys may vary as well. That difference means that frozendict
applies more cleanly to semi-structured data manipulated as
dictionaries (think stuff deserialised from JSON) than namedtuple
does.

Cheers,
Nick.
--
Nick Coghlan | ***@gmail.com | Brisbane, Australia
Loading...