== on object tests identity in 3.x

Post by Andreas Maier
While discussing Python issue #12067
(http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4
implements '==' and '!=' on the object type such that if no special
equality test operations are implemented in derived classes, there is a
default implementation that tests for identity (as opposed to equality
of the values).
The relevant code is in function do_richcompare() in Objects/object.c.
IMHO, that default implementation contradicts the definition that '=='
and '!=' test for equality of the values of an object.
Python 2.x does not seem to have such a default implementation; == and
!= raise an exception if attempted on objects that don't implement
equality in derived classes.

Why do you think that?

% python
Python 2.7.6 (default, May 29 2014, 22:22:15)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> class x(object): pass
...
>>> class y(object): pass
...
>>> x != y
True
>>> x == y
False

Chris Angelico

2014-07-07 15:22:54 UTC

Post by Benjamin Peterson
Why do you think that?
% python
Python 2.7.6 (default, May 29 2014, 22:22:15)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> class x(object): pass
...
>>> class y(object): pass
...
>>> x != y
True
>>> x == y
False

Your analysis is flawed - you're testing the equality of the types,
not of instances. But your conclusion's correct; testing instances
does work the same way you're implying:

***@sikorsky:~$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> class x(object): pass
...
>>> class y(object): pass
...
>>> x() != y()
True
>>> x() == y()
False
>>> x() == x()
False
>>> z = x()
>>> z == z
True
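
The same check can be repeated on Python 3. What follows is only a minimal sketch; the class names X and Y are hypothetical and not from the session above. Classes that define no __eq__ inherit the identity-based default from object:

```python
# Python 3: classes without __eq__ inherit the identity-based default.
class X:
    pass

class Y:
    pass

a, b = X(), X()
same_instance_equal = (a == a)        # one object compared with itself
distinct_instances_equal = (a == b)   # two distinct objects of the same class
cross_type_equal = (X() == Y())       # objects of different classes
print(same_instance_equal, distinct_instances_equal, cross_type_equal)
```

Only the comparison of an object with itself is expected to be true here.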

ChrisA

Andreas Maier

2014-07-07 15:29:54 UTC

Post by Andreas Maier
Python 2.x does not seem to have such a default implementation; == and
!= raise an exception if attempted on objects that don't implement
equality in derived classes.

Why do you think that?

Because I looked at the source code of try_rich_compare() in object.c of
the 2.7 stream in the repository. Now, looking deeper into that module,
it turns out there is a whole number of variations of comparison
functions, so maybe I looked at the wrong one.

Instead of trying to figure out how they are called, it is probably
easier to just try it out, as you did. Your example certainly shows that
== between instances of type object returns a value.

So the Python 2.7 implementation shows the same discrepancy as Python
3.x regarding the == and != default implementation.

Does anyone know why?

Andy

Ethan Furman

2014-07-07 16:09:28 UTC

So the Python 2.7 implementation shows the same discrepancy as Python 3.x regarding the == and != default implementation.

Why do you see this as a discrepancy?

Just because two instances from the same object have the same value does not mean they are equal. For a real-life
example, look at twins: biologically identical, yet not equal.

looking-forward-to-the-rebuttal-mega-thread'ly yrs,
--
~Ethan~

Andreas Maier

2014-07-08 00:12:14 UTC

Post by Ethan Furman
Just because two instances from the same object have the same value
does not mean they are equal. For a real-life example, look at
twins: biologically identical, yet not equal.

I think they *are* equal in Python if they have the same value, by
definition, because somewhere the Python docs state that equality
compares the object's values.

The reality though is that value is more vague than equality test (as it
was already pointed out in this thread): A class designer can directly
implement what equality means to the class, but he or she cannot
implement an accessor method for the value. The value plays a role only
indirectly as part of equality and ordering tests.

Andy

Ethan Furman

2014-07-08 00:22:16 UTC

Post by Ethan Furman
Just because two instances from the same object have the same value does not mean they are equal. For a real-life
example, look at twins: biologically identical, yet not equal.

I think they *are* equal in Python if they have the same value, by definition, because somewhere the Python docs state
that equality compares the object's values.

And is personality of no value, then?

The reality though is that value is more vague than equality test (as it was already pointed out in this thread): A
class designer can directly implement what equality means to the class, but he or she cannot implement an accessor
method for the value. The value plays a role only indirectly as part of equality and ordering tests.

Not sure what you mean by this.

--
~Ethan~

Andreas Maier

2014-07-08 01:29:34 UTC

Post by Ethan Furman
Just because two instances from the same object have the same value
does not mean they are equal. For a real-life
example, look at twins: biologically identical, yet not equal.

I think they *are* equal in Python if they have the same value, by
definition, because somewhere the Python docs state
that equality compares the object's values.

And is personality of no value, then?

I guess you are pulling my leg, Ethan ... ;-)

But again, for a definition of equality between instances of a Python
class representing twins, one has to decide what attributes of the twins
are supposed to be part of that. If the designer of the class decides
that just the biology attributes are part of equality, fine. If he or she
decides that personality attributes are additionally part of equality,
also fine.

Post by Andreas Maier
The reality though is that value is more vague than equality test (as
it was already pointed out in this thread): A
class designer can directly implement what equality means to the
class, but he or she cannot implement an accessor
method for the value. The value plays a role only indirectly as part
of equality and ordering tests.

Not sure what you mean by this.

Equality has a precise implementation (and hence definition) in Python;
value does not.
So to argue that value and equality can be different, is moot in a way,
because it is not clear in Python what the value of an object is.

Andy

Stephen J. Turnbull

2014-07-08 01:51:51 UTC

A class designer can directly implement what equality means to the
class, but he or she cannot implement an accessor method for the
value.

Of course she can! What you mean to say, I think, is that Python does
not insist on an accessor method for the value. Ie, there is no dunder
method __value__ on instances of class object.
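
As an illustration of that point, here is a rough sketch of a class whose designer does provide an accessor for the value and builds equality on top of it. The class Money and its value() method are hypothetical names chosen for this example; nothing in Python requires or recognises such a method:

```python
class Money:
    """Hypothetical class with an explicit accessor for its 'value'."""

    def __init__(self, cents, currency):
        self._cents = cents
        self._currency = currency

    def value(self):
        # The class designer decides what counts as the value.
        return (self._cents, self._currency)

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # let Python try the other operand
        return self.value() == other.value()

    def __hash__(self):
        # Keep hashing consistent with the equality defined above.
        return hash(self.value())


print(Money(100, "EUR") == Money(100, "EUR"))
print(Money(100, "EUR") == Money(100, "USD"))
```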

Xavier Morel

2014-07-07 15:58:39 UTC

That's incorrect on two levels:

1. What Terry notes in the bug comments is that because all Python 3
types inherit from object this can be done as a default __eq__/__ne__,
in Python 2 the fallback is encoded in the comparison framework
(PyObject_Compare and friends):
http://hg.python.org/cpython/file/01ec8bb7187f/Objects/object.c#l756
2. Unless comparison methods are overloaded and throw an error it will
always return either True or False (for comparison operator), never throw.
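
A small sketch of how this looks from Python code, assuming CPython 3. Called directly, the inherited object.__eq__ only recognises identity and otherwise returns NotImplemented; the == operator then falls back to an identity comparison instead of raising:

```python
a, b = object(), object()

# Called directly, the inherited method only recognises identity ...
direct_same = object.__eq__(a, a)
direct_other = object.__eq__(a, b)

# ... and the operators turn the NotImplemented case into a plain bool.
operator_eq = (a == b)
operator_ne = (a != b)
print(direct_same, direct_other, operator_eq, operator_ne)
```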

-> Can someone please elaborate what the reason for that is?
-> Where is the discrepancy between the documentation of == and its default implementation on object documented?
return True;
return False
raise ValueError("Equality cannot be determined in default implementation")

Why would comparing two objects of different types return False but
comparing two objects of the same type raise an error?
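
For readers following along, the proposal being questioned can be sketched roughly as below. The branch conditions are reconstructed from the question above (identical objects, different types, same type) and are an assumption; the function name proposed_default_eq is hypothetical, and this is not how CPython behaves:

```python
def proposed_default_eq(v, w):
    """Sketch of the proposed default, not CPython's actual behaviour."""
    if v is w:
        # Identical objects are always equal.
        return True
    if type(v) is not type(w):
        # Assumed reading: objects of different types are unequal.
        return False
    # Same type, different objects: the default cannot know the values.
    raise ValueError("Equality cannot be determined in default implementation")


o = object()
print(proposed_default_eq(o, o))
print(proposed_default_eq(o, 1))
```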

Andreas Maier

2014-07-07 16:11:07 UTC

1. What Terry notes in the bug comments is that because all Python 3
types inherit from object this can be done as a default __eq__/__ne__,
in Python 2 the fallback is encoded in the comparison framework
http://hg.python.org/cpython/file/01ec8bb7187f/Objects/object.c#l756
2. Unless comparison methods are overloaded and throw an error it will
always return either True or False (for comparison operator), never throw.

I was incorrect for Python 2.x.

Why would comparing two objects of different types return False

Because I think (but I'm not sure) that the type should play a role for
comparison of values. But maybe that does not embrace duck typing
sufficiently, and the type should be ignored by default for comparing
object values.

Post by Xavier Morel
but comparing two objects of the same type raise an error?

That I'm sure of: Because the default implementation (after having
exhausted all possibilities of calling __eq__ and friends) has no way to
find out whether the values(!!) of the objects are equal.

Andy

Jan Kaliszewski

2014-07-07 21:11:03 UTC

[...]

Post by Andreas Maier
IMHO, that default implementation contradicts the definition that
'==' and '!=' test for equality of the values of an object.

[...]

Post by Andreas Maier
return True;
return False
raise ValueError("Equality cannot be determined in default implementation")

Why would comparing two objects of different types return False

Because I think (but I'm not sure) that the type should play a role
for comparison of values. But maybe that does not embrace duck typing
sufficiently, and the type should be ignored by default for comparing
object values.

Post by Xavier Morel
but comparing two objects of the same type raise an error?

That I'm sure of: Because the default implementation (after having
exhausted all possibilities of calling __eq__ and friends) has no way
to find out whether the values(!!) of the objects are equal.

IMHO, in Python context, "value" is a very vague term. Quite often we
can read it as the very basic (but not the only one) notion of "what
makes objects being equal or not" -- and then saying that "objects are
compared by value" is a tautology.

In other words, what object's "value" is -- is dependent on its nature:
e.g. the value of a list is what are the values of its consecutive
(indexed) items; the value of a set is based on values of all its
elements without notion of order or repetition; the value of a number is
a set of its abstract mathematical properties that determine what makes
objects being equal, greater, lesser, how particular arithmetic
operations work etc...

I think, there is no universal notion of "the value of a Python
object". The notion of identity seems to be most generic (every object
has it, even if it does not have any other property) -- and that's why
by default it is used to define the most basic feature of object's
*value*, i.e. "what makes objects being equal or not" (== and !=).
Another possibility would be to raise TypeError but, as Ethan Furman
wrote, it would be impractical (e.g. key-type-heterogenic dicts or sets
would be practically impossible to work with). On the other hand, the
notion of sorting order (< > <= >=) is a much more specialized object
property.
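
To make the practicality argument concrete, here is a hedged sketch of what happens when an equality test raises for foreign types. The class Strict is hypothetical, and its hash is deliberately made to collide with a plain integer so that the dict is forced to call ==:

```python
class Strict:
    """Hypothetical key type whose __eq__ refuses foreign types."""

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        # Deliberately collide with the hash of the plain integer n.
        return hash(self.n)

    def __eq__(self, other):
        if not isinstance(other, Strict):
            raise TypeError("cannot compare Strict with %s" % type(other).__name__)
        return self.n == other.n


mixed = {1: "int key"}
try:
    mixed[Strict(1)] = "Strict key"   # the hash collision forces an == check
    insert_failed = False
except TypeError:
    insert_failed = True


class Plain:
    """Uses the default identity-based equality."""


# With the default equality, adding a key of another type is harmless.
mixed[Plain()] = "Plain key"
print(insert_failed, len(mixed))
```

If every class behaved like Strict by default, any dict or set with keys of mixed types could fail on an ordinary insert or lookup.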

Cheers.
*j

Rob Cliffe

2014-07-07 21:31:55 UTC

[snip]
IMHO, in Python context, "value" is a very vague term. Quite often we
can read it as the very basic (but not the only one) notion of "what
makes objects being equal or not" -- and then saying that "objects are
compared by value" is a tautology.
In other words, what object's "value" is -- is dependent on its
nature: e.g. the value of a list is what are the values of its
consecutive (indexed) items; the value of a set is based on values of
all its elements without notion of order or repetition; the value of a
number is a set of its abstract mathematical properties that determine
what makes objects being equal, greater, lesser, how particular
arithmetic operations work etc...
I think, there is no universal notion of "the value of a Python
object". The notion of identity seems to be most generic (every
object has it, even if it does not have any other property) -- and
that's why by default it is used to define the most basic feature of
object's *value*, i.e. "what makes objects being equal or not" (== and
!=). Another possibility would be to raise TypeError but, as Ethan
Furman wrote, it would be impractical (e.g. key-type-heterogenic dicts
or sets would be practically impossible to work with). On the other
hand, the notion of sorting order (< > <= >=) is a much more
specialized object property.

Quite so.

x, y = object(), object()
print 'Equal:', ' '.join(attr for attr in dir(x) if
getattr(x,attr)==getattr(y,attr))
print 'Unequal:', ' '.join(attr for attr in dir(x) if
getattr(x,attr)!=getattr(y,attr))

Equal: __class__ __doc__ __new__ __subclasshook__
Unequal: __delattr__ __format__ __getattribute__ __hash__ __init__
__reduce__ __reduce_ex__ __repr__ __setattr__ __sizeof__ __str__

Andreas, what attribute or combination of attributes do you think should
be the "values" of x and y?
Rob Cliffe

Ethan Furman

2014-07-07 15:55:08 UTC

Where is the discrepancy between the documentation of == and its default implementation on object documented?

There seems to be no discrepancy (at least, you have not shown it), but to answer the question about why the default
equals operation is an identity test:

- all objects should be equal to themselves (there is only one that isn't, and it's weird)

- equality tests should not, as a general rule, raise exceptions -- they should return True or False

--
~Ethan~

Andreas Maier

2014-07-07 16:56:10 UTC

Post by Andreas Maier
Where is the discrepancy between the documentation of == and its
default implementation on object documented?

There seems to be no discrepancy (at least, you have not shown it),

The documentation states consistently that == tests the equality of the
value of an object. The default implementation of == in both 2.x and 3.x
tests the object identity. Is that not a discrepancy?

Post by Ethan Furman
but to answer the question about why the default equals operation is an identity test:
- all objects should be equal to themselves (there is only one that
isn't, and it's weird)

I agree. But that is not a reason to conclude that different objects (as
per their identity) should be unequal. Which is what the default
implementation does.

Post by Ethan Furman
- equality tests should not, as a general rule, raise exceptions --
they should return True or False

Why not? Ordering tests also raise exceptions if ordering is not
implemented.

Andy

Ethan Furman

2014-07-07 17:43:34 UTC

Post by Andreas Maier
Where is the discrepancy between the documentation of == and its
default implementation on object documented?

There seems to be no discrepancy (at least, you have not shown it),

The documentation states consistently that == tests the equality of the value of an object. The default implementation
of == in both 2.x and 3.x tests the object identity. Is that not a discrepancy?

One could say that the value of an object is the object itself. Since different objects are different, then they are
not equal.

Post by Ethan Furman
but to answer the question about why the default equals operation is an identity test:
- all objects should be equal to themselves (there is only one that
isn't, and it's weird)

I agree. But that is not a reason to conclude that different objects (as per their identity) should be unequal. Which is
what the default implementation does.

Python cannot know which values are important in an equality test, and which are not. So it refuses to guess.

Think of a chess board, for example. Are any two black pawns equal? All 16 pawns came from the same Pawn class, the
only differences would be in the color and position, but the movement type is the same for all.

So equality for a pawn might mean the same color, or it might mean color and position, or it might mean can move to the
same position... it's up to the programmer to decide which of the possibilities is the correct one. Quite frankly, having
equality mean identity in this case also makes a lot of sense.
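
A rough sketch of the pawn example. The classes Pawn and ColorPawn are hypothetical, and matching on color is only one of the possible definitions mentioned above:

```python
class Pawn:
    """Hypothetical chess piece; inherits identity-based == from object."""

    def __init__(self, color, position):
        self.color = color
        self.position = position


class ColorPawn(Pawn):
    """One possible choice: pawns are equal when their colors match."""

    def __eq__(self, other):
        if not isinstance(other, ColorPawn):
            return NotImplemented
        return self.color == other.color

    def __hash__(self):
        # Keep hashing consistent with the color-based equality.
        return hash(self.color)


p1, p2 = Pawn("black", "a7"), Pawn("black", "b7")
c1, c2 = ColorPawn("black", "a7"), ColorPawn("black", "b7")
print(p1 == p2, c1 == c2)
```

Which of the two answers is right depends on what the program needs, and that is something only the programmer can decide.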

Post by Ethan Furman
- equality tests should not, as a general rule, raise exceptions --
they should return True or False

Why not? Ordering tests also raise exceptions if ordering is not implemented.

Besides the pawn example, this is probably a matter of practicality over purity -- equality tests are used extensively
throughout Python, and having exceptions raised at possibly any moment would be neither a fun nor a productive environment.

Ordering is much less frequent, and since we already tried always ordering things, falling back to type name if
necessary, we have discovered that that is not a good trade-off. So now if one tries to order things without specifying
how it should be done, one gets an exception.

--
~Ethan~

Andreas Maier

2014-07-07 23:36:25 UTC

Post by Andreas Maier
Where is the discrepancy between the documentation of == and its
default implementation on object documented?

There seems to be no discrepancy (at least, you have not shown it),

The documentation states consistently that == tests the equality of
the value of an object. The default implementation
of == in both 2.x and 3.x tests the object identity. Is that not a discrepancy?

One could say that the value of an object is the object itself. Since
different objects are different, then they are not equal.

Post by Ethan Furman
but to answer the question about why the default equals operation is an identity test:
- all objects should be equal to themselves (there is only one that
isn't, and it's weird)

I agree. But that is not a reason to conclude that different objects
(as per their identity) should be unequal. Which is
what the default implementation does.

Python cannot know which values are important in an equality test, and
which are not. So it refuses to guess.

Well, one could argue that using the address of an object for its value
equality test is pretty close to guessing, considering that given a
sensible definition of value equality, objects of different identity can
very well be equal but will always be considered unequal based on the
address.

Think of a chess board, for example. Are any two black pawns equal?
All 16 pawns came from the same Pawn class, the only differences would
be in the color and position, but the movement type is the same for all.
So equality for a pawn might mean the same color, or it might mean
color and position, or it might mean can move to the same position...
it's up to the programmer to decide which of the possibilities is the
correct one. Quite frankly, having equality mean identity in this case
also makes a lot of sense.

That's why I think equality is only defined once the class designer has
defined it. Using the address as a default for equality (that is, in
absence of such a designer's definition) may be an easy-to-implement
default, but not a very logical or sensible one.

Post by Ethan Furman
- equality tests should not, as a general rule, raise exceptions --
they should return True or False

Why not? Ordering tests also raise exceptions if ordering is not implemented.

So we have many cases of classes whose designers thought about whether a
sensible definition of equality was needed, and decided that an
address/identity-based equality definition was just what they needed,
yet they did not want to or could not use the "is" operator?

Can you give me an example for such a class (besides type object)? (I.e.
a class that does not have __eq__() and __ne__() but whose instances are
compared with == or !=)

Ordering is much less frequent, and since we already tried always
ordering things, falling back to type name if necessary, we have
discovered that that is not a good trade-off. So now if one tries to
order things without specifying how it should be done, one gets an
exception.

In Python 2, the default ordering implementation on type object uses the
identity (address) as the basis for ordering. In Python 3, that was
changed to raise an exception. That seems to be in sync with what you
are saying.

Maybe it would have been possible to also change that for the default
equality implementation in Python 3. But it was not changed. As I wrote
in another response, we now need to document this properly.

Benjamin Peterson

2014-07-07 23:49:40 UTC

Post by Andreas Maier
Where is the discrepancy between the documentation of == and its
default implementation on object documented?

There seems to be no discrepancy (at least, you have not shown it),

The documentation states consistently that == tests the equality of
the value of an object. The default implementation
of == in both 2.x and 3.x tests the object identity. Is that not a discrepancy?

One could say that the value of an object is the object itself. Since
different objects are different, then they are not equal.

Post by Ethan Furman
but to answer the question about why the default equals operation is an identity test:
- all objects should be equal to themselves (there is only one that
isn't, and it's weird)

I agree. But that is not a reason to conclude that different objects
(as per their identity) should be unequal. Which is
what the default implementation does.

Python cannot know which values are important in an equality test, and
which are not. So it refuses to guess.

Probably the best argument for the behavior is that "x is y" should
imply "x == y", which precludes raising an exception. No such invariant
is desired for ordering, so default implementations of < and > are not
provided in Python 3.
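
A minimal sketch of both halves of that argument on Python 3: identity implies equality under the default implementation, while ordering plain objects raises:

```python
x = object()
y = x  # a second name for the same object

identity_implies_equality = (x is y) and (x == y)

try:
    object() < object()
    ordering_raised = False
except TypeError:
    # Python 3 provides no default ordering for plain objects.
    ordering_raised = True

print(identity_implies_equality, ordering_raised)
```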

Andreas Maier

2014-07-07 23:55:55 UTC

Post by Andreas Maier
Where is the discrepancy between the documentation of == and its
default implementation on object documented?

There seems to be no discrepancy (at least, you have not shown it),

The documentation states consistently that == tests the equality of
the value of an object. The default implementation
of == in both 2.x and 3.x tests the object identity. Is that not a discrepancy?

One could say that the value of an object is the object itself. Since
different objects are different, then they are not equal.

Post by Ethan Furman
but to answer the question about why the default equals operation is an identity test:
- all objects should be equal to themselves (there is only one that
isn't, and it's weird)

I agree. But that is not a reason to conclude that different objects
(as per their identity) should be unequal. Which is
what the default implementation does.

Python cannot know which values are important in an equality test, and
which are not. So it refuses to guess.

I agree that "x is y" should imply "x == y".
The problem of the default implementation is that "x is not y" implies
"x != y" and that may or may not be true under a sensible definition of
equality.

Stephen J. Turnbull

2014-07-08 01:44:40 UTC

Post by Andreas Maier
The problem of the default implementation is that "x is not y"
implies "x != y" and that may or may not be true under a sensible
definition of equality.

I noticed this a long time ago and just decided it was covered by
"consenting adults". That is, if the "sensible definition" of x == y
is such that it can be true simultaneously with x != y, it's the
programmer's responsibility to notice that, and to provide an
implementation. But there's no issue that lack of an explicit
implementation of comparison causes a program to have ambiguous
meaning.

I also consider that for "every object has a value" to make sense as a
description of Python, that value must be representable by an object.
The obvious default representation for the value of any object is the
object itself!

Now, for this purpose you don't need a "canonical representation" of
an object's value. In particular, equality comparisons need not
explicitly construct a representative object. Some do, some don't, I
would suppose. For example, in comparing an integer with a float, I
would convert the integer to float and compare, but in comparing float
and complex I would check the complex for x.im == 0.0, and if true,
return the value of x.re == y.
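
The float/complex case can be sketched as follows. The helper name float_eq_complex is hypothetical, and Python spells the attributes .imag and .real rather than .im and .re. It only illustrates that no representative object has to be built for the comparison; it makes no claim about how CPython implements it:

```python
def float_eq_complex(y, x):
    """Compare float y with complex x without constructing a new complex."""
    return x.imag == 0.0 and x.real == y


print(float_eq_complex(1.0, complex(1.0, 0.0)))
print(float_eq_complex(1.0, complex(1.0, 2.0)))

# The built-in comparisons agree on these inputs.
print(1 == 1.0, 1.0 == complex(1.0, 0.0), 1.0 == complex(1.0, 2.0))
```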

I'm not sure how you interpret "value" to find the behavior of Python
(the default comparison) problematic. I suspect you'd have a hard
time coming up with an interpretation consistent with Python's object
orientation.

That said, it's probably worth documenting, but I don't know how much
of the above should be introduced into the documentation.

Steve

Ethan Furman

2014-07-07 23:52:17 UTC

Post by Benjamin Peterson
Probably the best argument for the behavior is that "x is y" should
imply "x == y", which precludes raising an exception. No such invariant
is desired for ordering, so default implementations of < and > are not
provided in Python 3.

Nice. This bit should definitely make it into the doc patch if not already in the docs.

--
~Ethan~

Steven D'Aprano

2014-07-08 01:58:33 UTC

Nice. This bit should definitely make it into the doc patch if not already in the docs.

However, saying this should not preclude classes where this is not the
case, e.g. IEEE-754 NANs. I would not like this wording (which otherwise
is very nice) to be used in the future to force reflexivity on object
equality.

https://en.wikipedia.org/wiki/Reflexive_relation

To try to cut off arguments:

- Yes, it is fine to have the default implementation of __eq__
assume reflexivity.

- Yes, it is fine for standard library containers (lists, dicts,
etc.) to assume reflexivity of their items.

- I'm fully aware that some people think the non-reflexivity of
NANs is logically nonsensical and a mistake. I do not agree
with them.

- I'm not looking to change anything here, the current behaviour
is fine, I just want to ensure that an otherwise admirable doc
change does not get interpreted in the future in a way that
prevents classes from defining __eq__ to be non-reflexive.
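
A short sketch of the two behaviours listed above, using a float NaN: the type's own __eq__ is non-reflexive, while the standard containers assume reflexivity by checking identity before equality:

```python
nan = float("nan")

nan_equals_itself = (nan == nan)   # IEEE-754: a NaN is unequal to itself
nan_in_list = (nan in [nan])       # list membership checks identity first
lists_equal = ([nan] == [nan])     # the same object sits in both lists

print(nan_equals_itself, nan_in_list, lists_equal)
```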

--
Steven

Ethan Furman

2014-07-08 02:25:58 UTC

Nice. This bit should definitely make it into the doc patch if not already in the docs.

However, saying this should not preclude classes where this is not the
case, e.g. IEEE-754 NANs. I would not like this wording (which otherwise
is very nice) to be used in the future to force reflexivity on object
equality.
https://en.wikipedia.org/wiki/Reflexive_relation
- Yes, it is fine to have the default implementation of __eq__
assume reflexivity.
- Yes, it is fine for standard library containers (lists, dicts,
etc.) to assume reflexivity of their items.
- I'm fully aware that some people think the non-reflexivity of
NANs is logically nonsensical and a mistake. I do not agree
with them.
- I'm not looking to change anything here, the current behaviour
is fine, I just want to ensure that an otherwise admirable doc
change does not get interpreted in the future in a way that
prevents classes from defining __eq__ to be non-reflexive.

Andreas Maier

2014-07-11 14:23:59 UTC

I like the motivation provided by Benjamin and will work it into the
doc patch for issue #12067. The NaN special case
will also stay in.

Cool -- you should nosy myself, D'Aprano, and Benjamin (at least) on
that issue.

Done.

Plus, I have uploaded a patch (v8) to issue #12067, that reflects
hopefully everything that was said (to the extent it was related to
comparisons).

Andy

Ethan Furman

2014-07-07 23:50:57 UTC

Post by Ethan Furman
Python cannot know which values are important in an equality test, and which are not. So it refuses to guess.

Well, one could argue that using the address of an object for its value equality test is pretty close to guessing,
considering that given a sensible definition of value equality, objects of different identity can very well be equal but
will always be considered unequal based on the address.

And what would be this 'sensible definition'?

1) The address of the object is irrelevant. While that is what CPython uses, it is not what every Python uses.

2) The 'is' operator is specialized, and should only rarely be needed. If equals is what you mean, use '=='.

3) If Python forced us to write our own __eq__ /for every single class/ what would happen? Well, I suspect quite a few
would make their own 'object' to inherit from, and would have the fallback of __eq__ meaning object identity.
Practicality beats purity.
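
Point 3 can be pictured with a sketch like the one below. It shows the kind of home-grown base class that would probably appear in many projects if object provided no default equality. The names IdentityEq and Widget are hypothetical:

```python
class IdentityEq:
    """Hypothetical project-wide base class restoring identity equality."""

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):
        return self is not other

    def __hash__(self):
        # Identity-based hash, consistent with the equality above.
        return id(self)


class Widget(IdentityEq):
    pass


w = Widget()
print(w == w, w == Widget(), w in {w})
```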

Can you give me an example for such a class (besides type object)? (I.e. a class that does not have __eq__() and
__ne__() but whose instances are compared with == or !=)

I never add __eq__ to my classes until I come upon a place where I need to check if two instances of those classes are
'equal', for whatever I need equal to mean in that case.

Post by Ethan Furman
Ordering is much less frequent, and since we already tried always ordering things, falling back to type name if
necessary, we have discovered that that is not a good trade-off. So now if one tries to order things without
specifying how it should be done, one gets an exception.

In Python 2, the default ordering implementation on type object uses the identity (address) as the basis for ordering.
In Python 3, that was changed to raise an exception. That seems to be in sync with what you are saying.
Maybe it would have been possible to also change that for the default equality implementation in Python 3. But it was
not changed. As I wrote in another response, we now need to document this properly.

Doc patches are gratefully accepted. :)

--
~Ethan~

Andreas Maier

2014-07-08 01:18:16 UTC

Post by Ethan Furman
Python cannot know which values are important in an equality test,
and which are not. So it refuses to guess.

Well, one could argue that using the address of an object for its
value equality test is pretty close to guessing,
considering that given a sensible definition of value equality,
objects of different identity can very well be equal but
will always be considered unequal based on the address.

And what would be this 'sensible definition'?

One that only a class designer can define. That's why I argued for
raising an exception if that is not defined.

But as I stated elsewhere in this thread: It is as it is, and we need to
document it.

Post by Andreas Maier
So we have many cases of classes whose designers thought about
whether a sensible definition of equality was needed, and
decided that an address/identity-based equality definition was just
what they needed, yet they did not want to or could
not use the "is" operator?

1) The address of the object is irrelevant. While that is what
CPython uses, it is not what every Python uses.
2) The 'is' operator is specialized, and should only rarely be
needed. If equals is what you mean, use '=='.
3) If Python forced us to write our own __eq__ /for every single
class/ what would happen? Well, I suspect quite a few would make
their own 'object' to inherit from, and would have the fallback of
__eq__ meaning object identity. Practicality beats purity.
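
A minimal sketch of point 3, assuming Python 3. Token and Point are hypothetical classes: the first relies on the inherited identity-based ==, the second is one where the class designer has decided what "equal" means:

```python
class Token:
    """Hypothetical class relying on the default identity-based ==."""
    def __init__(self, text):
        self.text = text


class Point:
    """Hypothetical class whose designer defines value equality."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented  # let Python try the other operand
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Keep hashing consistent with the custom __eq__.
        return hash((self.x, self.y))


t1, t2 = Token("a"), Token("a")
default_equal = (t1 == t2)                    # identity fallback
custom_equal = (Point(1, 2) == Point(1, 2))   # value comparison
```

Only Point needed any extra code; Token stays comparable without it.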

Post by Andreas Maier
Can you give me an example for such a class (besides type object)?
(I.e. a class that does not have __eq__() and
__ne__() but whose instances are compared with == or !=)

I never add __eq__ to my classes until I come upon a place where I
need to check if two instances of those classes are 'equal', for
whatever I need equal to mean in that case.

With that strategy, you would not be hurt if the default implementation
raised an exception in case the two objects are not identical. ;-)

Post by Ethan Furman
Ordering is much less frequent, and since we already tried always
ordering things, falling back to type name if
necessary, we have discovered that that is not a good trade-off. So
now if one tries to order things without
specifying how it should be done, one gets an exception.

In Python 2, the default ordering implementation on type object uses
the identity (address) as the basis for ordering.
In Python 3, that was changed to raise an exception. That seems to be
in sync with what you are saying.
Maybe it would have been possible to also change that for the default
equality implementation in Python 3. But it was
not changed. As I wrote in another response, we now need to document this properly.

Doc patches are gratefully accepted. :)

Understood. I will be working on it. :-)

Andy

Ethan Furman

2014-07-08 02:29:17 UTC

Post by Ethan Furman
I never add __eq__ to my classes until I come upon a place where I need to check if two instances of those classes are
'equal', for whatever I need equal to mean in that case.

With that strategy, you would not be hurt if the default implementation raised an exception in case the two objects are
not identical. ;-)

Yes, I would. Not identical means not equal until I say otherwise. Raising an exception instead of returning False
(for __eq__) would be horrible.

--
~Ethan~

Stephen J. Turnbull

2014-07-08 03:34:33 UTC

And what would be this 'sensible definition' [of value equality]?

I think that's the wrong question. I suppose Andreas's point is that
when the programmer doesn't provide a definition, there is no such
thing as a "sensible definition" to default to. I disagree, but given
that as the point of discussion, asking what the definition is, is moot.

2) The 'is' operator is specialized, and should only rarely be
needed.

Nitpick: Except that it's the preferred way to express identity with
singletons, AFAIK. ("if x is None: ...", not "if x == None: ...".)
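
One reason "is" is preferred for singletons can be sketched as follows, assuming Python 3. AlwaysEqual is a hypothetical class whose __eq__ answers True for everything:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True


x = AlwaysEqual()

equals_none = (x == None)   # __eq__ is consulted and answers True
is_none = (x is None)       # identity cannot be overridden by the class
```

The == test is at the mercy of the operand's __eq__, whereas the identity test is not.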

Ethan Furman

2014-07-08 03:47:23 UTC

And what would be this 'sensible definition' [of value equality]?

He eventually made that point, but until he did I thought he meant that there was such a sensible default definition, he
just wasn't sharing what he thought it might be with us.

2) The 'is' operator is specialized, and should only rarely be
needed.

Nitpick: Except that it's the preferred way to express identity with
singletons, AFAIK. ("if x is None: ...", not "if x == None: ...".)

Not a nit at all, at least in my code -- the number of times I use '==' far outweighs the number of times I use 'is'.
Thus, 'is' is rare.

(Now, of course, I'll have to go measure that assertion and probably find out I am wrong :/ ).

--
~Ethan~

Andreas Maier

2014-07-11 14:10:47 UTC

And what would be this 'sensible definition' [of value equality]?

He eventually made that point, but until he did I thought he meant that
there was such a sensible default definition, he just wasn't sharing
what he thought it might be with us.

My main point is that a sensible definition is up to the class designer,
so I (all freedom at hand) would prefer an exception as default. But that
cannot be changed at this point, and maybe never will. And I don't
intend to stir up that discussion again.

I dropped my other point about a better default comparison (i.e. one
with a result, not an exception). It is not easy to define one unless
one comes to types such as sequences or integral types, and they in fact
have defined their own customizations for comparison.

Bottom line: I'm fine with just a doc patch, and a testcase improvement :-)

Andy

Terry Reedy

2014-07-07 18:20:42 UTC

A discrepancy between code and doc can be solved by changing either the
code or doc. This is a case where the code should not change (for back
compatibility with long standing behavior, if nothing else) and the doc
should.

--
Terry Jan Reedy

Andreas Maier

2014-07-07 23:37:09 UTC

Post by Jan Kaliszewski

[...]

Post by Andreas Maier
IMHO, that default implementation contradicts the definition that
'==' and '!=' test for equality of the values of an object.

[...]

Post by Andreas Maier
To me, a sensible default implementation for == on object would be
return True;
return False
raise ValueError("Equality cannot be determined in default
implementation")

Why would comparing two objects of different types return False

Because I think (but I'm not sure) that the type should play a role
for comparison of values. But maybe that does not embrace duck typing
sufficiently, and the type should be ignored by default for comparing
object values.

Post by Xavier Morel
but comparing two objects of the same type raise an error?

That I'm sure of: Because the default implementation (after having
exhausted all possibilities of calling __eq__ and friends) has no way
to find out whether the values(!!) of the objects are equal.
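
The proposal being debated here can be sketched as follows. The function name strict_default_eq is hypothetical and the branch order is an assumption pieced together from the questions above; it is not what CPython does:

```python
def strict_default_eq(v, w):
    """Hypothetical stand-in for the proposed default ==."""
    if v is w:
        return True    # identical objects are trivially equal
    if type(v) is not type(w):
        return False   # different types are treated as unequal
    # Same type, distinct objects, and no __eq__ left to consult.
    raise ValueError(
        "Equality cannot be determined in default implementation")
```

Under this sketch, comparing two distinct plain object() instances raises, which is exactly the behaviour the replies object to.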

IMHO, in Python context, "value" is a very vague term. Quite often we
can read it as the very basic (but not the only one) notion of "what
makes objects being equal or not" -- and then saying that "objects are
compared by value" is a tautology.
In other words, what object's "value" is -- is dependent on its
nature: e.g. the value of a list is what are the values of its
consecutive (indexed) items; the value of a set is based on values of
all its elements without notion of order or repetition; the value of a
number is a set of its abstract mathematical properties that determine
what makes objects being equal, greater, lesser, how particular
arithmetic operations work etc...
I think, there is no universal notion of "the value of a Python
object". The notion of identity seems to be most generic (every
object has it, even if it does not have any other property) -- and
that's why by default it is used to define the most basic feature of
object's *value*, i.e. "what makes objects being equal or not" (== and
!=). Another possibility would be to raise TypeError but, as Ethan
Furman wrote, it would be impractical (e.g. key-type-heterogenic dicts
or sets would be practically impossible to work with). On the other
hand, the notion of sorting order (< > <= >=) is a much more
specialized object property.
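
A small sketch of why a raising default would hurt heterogeneous containers, assuming Python 3. Collider is a hypothetical class that deliberately hashes like the integer 1, so the dict has to fall back on == to tell the two keys apart:

```python
class Collider:
    """Hypothetical class hashing like the int 1 but defining no __eq__."""
    def __hash__(self):
        return hash(1)


c = Collider()

# Both keys land in the same hash bucket, so the dict compares 1 == c.
# Neither side implements that comparison, and the identity fallback
# quietly answers False instead of raising.
d = {1: "int", c: "collider"}
```

Both keys survive and both lookups work, which would not be the case if the fallback raised.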

On the universal notion of a value in Python: In both 2.x and 3.x, it
reads (in 3.1. Objects, values and types):
- "Every object has an identity, a type and a value."
- "An object's /identity/ never changes once it has been created; ....
The /value/ of some objects can change. Objects whose value can change
are said to be /mutable/; objects whose value is unchangeable once they
are created are called /immutable/."

These are clear indications that there is an intention to have separate
concepts of identity and value in Python. If an instance of type object
can exist but does not have a universal notion of value, it should not
allow operations that need a value.

I do not really buy into the arguments that try to show how identity and
value are somehow the same. They are not, not even in Python.

The argument I can absolutely buy into is that the implementation cannot
be changed within a major release. So the real question is how we
document it.

I'll try to summarize in a separate posting.

Andy

Rob Cliffe

2014-07-08 01:59:30 UTC

Post by Andreas Maier
[...]

Post by Jan Kaliszewski
IMHO, in Python context, "value" is a very vague term. Quite often
we can read it as the very basic (but not the only one) notion of
"what makes objects being equal or not" -- and then saying that
"objects are compared by value" is a tautology.
In other words, what object's "value" is -- is dependent on its
nature: e.g. the value of a list is what are the values of its
consecutive (indexed) items; the value of a set is based on values of
all its elements without notion of order or repetition; the value of
a number is a set of its abstract mathematical properties that
determine what makes objects being equal, greater, lesser, how
particular arithmetic operations work etc...
I think, there is no universal notion of "the value of a Python
object". The notion of identity seems to be most generic (every
object has it, even if it does not have any other property) -- and
that's why by default it is used to define the most basic feature of
object's *value*, i.e. "what makes objects being equal or not" (==
and !=). Another possibility would be to raise TypeError but, as
Ethan Furman wrote, it would be impractical (e.g.
key-type-heterogenic dicts or sets would be practically impossible to
work with). On the other hand, the notion of sorting order (< > <=
>=) is a much more specialized object property.

+1. See below.

Post by Andreas Maier
On the universal notion of a value in Python: In both 2.x and 3.x, it
reads (in 3.1. Objects, values and types):
- "*Every object has an identity, a type and a value.*"

Hm, is that *really* true?
Every object has an identity and a type, sure.
Every *variable* has a value, which is an object (an instance of some
class). (I think? :-) )
But ISTM that the notion of the value of an *object* exists more in our
minds than in Python. We say that number and string objects have a
value because the concepts of number and string, including how to
compare them, are intuitive for us, and these objects by design reflect
our concepts with some degree of fidelity. Ditto for lists,
dictionaries and sets which are only slightly less intuitive.

If I came across an int object and had no concept of what an integer
number was, how would I know what its "value" is supposed to be?
If I'm given an int object, "i", say, and pretend I don't know what an
integer is, I see that
len(dir(i)) == 64 # Python 2.7
(and there may be attributes that dir doesn't show).
How can I know from this bewildering list of 64 attributes (say they
were all written in Swahili) that I can obtain the "real" (pun not
intended) "value" with
i.real
or possibly
i.numerator
or
i.__str__()
or maybe somewhere else? ISTM "value" is a convention between humans,
not something intrinsic to a class definition. Or at best something
that is implied by the implementation of the comparison (or other)
operators in the class.

And can the following *objects* (class instances) be said to have a
(obvious) value?
obj1 = object()
def obj2(): pass
obj3 = (x for x in range(3))
obj4 = xrange(4)
And is there any sensible way of comparing two such similar objects, e.g.
obj3 = (x for x in range(3))
obj3a = (x for x in range(3))
except by id?
Well, possibly in some cases. You might define two functions as equal
if their code objects are identical (I'm outside my competence here, so
please no-one correct me if I've got the technical detail wrong). But I
don't see how you can compare two generators (other than by id) except
by calling them both destructively (possibly an infinite number of
times, and hoping that neither has unpredictable behaviour, side
effects, etc.).
As has already been said (more or less) in this thread, if you want to
be able to compare any two objects of the same type, and not by id, you
probably end up with a circular definition of "value" as "that (function
of an object's attributes) which is compared". Which is ultimately an
implementation decision for each type, not anything intrinsic to the type.
So it makes sense to consistently fall back on id when nothing else
obvious suggests itself.
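
The generator case can be made concrete with a short sketch, assuming Python 3 (where range is already lazy):

```python
gen_a = (x for x in range(3))
gen_b = (x for x in range(3))

# Generators define no __eq__, so == falls back to identity.
equal_as_objects = (gen_a == gen_b)

# Comparing what they produce means consuming both of them.
equal_as_items = (list(gen_a) == list(gen_b))

# After that comparison both generators are exhausted.
leftover = list(gen_a)
```

The item-wise comparison succeeds here only because both generators are finite and free of side effects.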

Post by Andreas Maier
- "An object's /identity/ never changes once it has been created; ....
The /value/ of some objects can change. Objects whose value can change
are said to be /mutable/; objects whose value is unchangeable once
they are created are called /immutable/."

ISTM it needs to be explicitly documented for each class what the
"value" of an instance is intended to be. Oh, I'm being pedantic here,
sure. But I wonder if enforcing it would lead to more clarity of
thought (maybe even the realisation that some objects don't have a
value?? :-) ).

Post by Andreas Maier
These are clear indications that there is an intention to have
separate concepts of identity and value in Python. If an instance of
type object can exist but does not have a universal notion of value,
it should not allow operations that need a value.

As Jan says, this would make comparing container objects a pain.

Apologies if this message is a bit behind the times. There have been
about 10 contributions since I started composing this!
Best wishes,
Rob Cliffe

[...]

Chris Angelico

2014-07-08 02:15:27 UTC

If I came across an int object and had no concept of what an integer number
was, how would I know what its "value" is supposed to be?

The value of an integer is the number it represents. In CPython, it's
entirely possible to have multiple integer objects (ie objects with
unique identities) with the same value, although AIUI there are
Pythons for which that's not the case. The value of a float, Fraction,
Decimal, or complex is also the number it represents, so when you
compare 1==1.0, the answer is that they have the same value. They
can't possibly have the same identity (every object has a single
type), but they have the same value. But what *is* that value? It's
not something that can be independently recognized, because casting to

>>> i = 2**53+1
>>> f = float(i)
>>> i == f
False
>>> f == int(f)
True

Ergo the comparison of a float to an int cannot be done by casting the
int to float, nor by casting the float to int; it has to be done by
comparing the abstract numbers represented. Those are the objects'
values.

But what's the value of a sentinel object?

_SENTINEL = object()
def f(x, y=_SENTINEL):
    do_something_with(x)
    if y is not _SENTINEL: do_something_with(y)

I'd say this is a reasonable argument for the default object value to
be identity.

ChrisA

Steven D'Aprano

2014-07-08 03:12:02 UTC

Post by Andreas Maier
- "*Every object has an identity, a type and a value.*"

Hm, is that *really* true?

Yes. It's pretty much true by definition: objects are *defined* to have
an identity, type and value, even if that value is abstract rather than
concrete.

Post by Rob Cliffe
Every object has an identity and a type, sure.
Every *variable* has a value, which is an object (an instance of some
class). (I think? :-) )

I don't think so. Variables can be undefined, which means they don't
have a value:

py> del x
py> print x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

Post by Rob Cliffe
But ISTM that the notion of the value of an *object* exists more in our
minds than in Python.

Pretty much. How could it be otherwise? Human beings define the
semantics of objects, that is, their value, not Python.

[...]

Post by Rob Cliffe
If I came across an int object and had no concept of what an integer
number was, how would I know what its "value" is supposed to be?

You couldn't, any more than you would know what the value of a Watzit
object was if you knew nothing about Watzits. The value of an object is
intimately tied to its semantics, what the object represents and what it
is intended to be used for. In general, we can say nothing about the
value of an object until we've read the documentation for the object.

But we can be confident that the object has *some* value, otherwise what
would be the point of it? In some cases, that value might be nothing
more than its identity, but that's okay.

I think the problem we're having here is that some people are looking
for a concrete definition of what the value of an object is, but there
isn't one.

[...]

Post by Rob Cliffe
And can the following *objects* (class instances) be said to have a
(obvious) value?
obj1 = object()
def obj2(): pass
obj3 = (x for x in range(3))
obj4 = xrange(4)

The value as understood by a human reader, as opposed to the value as
assumed by Python, is not necessarily the same. As far as Python is
concerned, the value of all four objects is the object itself, i.e. its
identity. (For avoidance of doubt, not its id(), which is just a
number.)

A human reader could infer more than Python:

- the second object is a "do nothing" function;
- the third object is a lazy sequence (0, 1, 2);
- the fourth object is a lazy sequence (0, 1, 2, 3);

but since the class designer didn't deem it important enough, or
practical enough, to implement an __eq__ method that takes those things
into account, *for the purposes of equality* (but perhaps not other
purposes) we say that the value is just the object itself, its identity.

Post by Rob Cliffe
And is there any sensible way of comparing two such similar objects, e.g.
obj3 = (x for x in range(3))
obj3a = (x for x in range(3))
except by id?

In principle, one might peer into the two generators and note that they
perform exactly the same computations on exactly the same input, and
therefore should be deemed to have the same value. But since that's
hard, and "exactly the same" is not always well-defined, Python doesn't
try to be too clever and just uses a simpler idea: the value is the
object itself.

Post by Rob Cliffe
Well, possibly in some cases. You might define two functions as equal
if their code objects are identical (I'm outside my competence here, so
please no-one correct me if I've got the technical detail wrong). But I
don't see how you can compare two generators (other than by id) except
by calling them both destructively (possibly an infinite number of
times, and hoping that neither has unpredictable behaviour, side
effects, etc.).

Generator objects have code objects as well.

py> x = (a for a in (1, 2))
py> x.gi_code
<code object <genexpr> at 0xb7ee39f8, file "<stdin>", line 1>
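
A sketch of what comparing by code object would and would not tell us, assuming Python 3. make_gen is a hypothetical helper; both generators it returns come from the same generator expression:

```python
def make_gen():
    """Hypothetical helper returning a fresh generator on each call."""
    return (x for x in range(3))


g1, g2 = make_gen(), make_gen()

same_code = (g1.gi_code is g2.gi_code)   # one shared code object
same_object = (g1 == g2)                 # still compared by identity
```

Sharing a code object says nothing about how far each generator has advanced, so it would be a weak basis for equality.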

ISTM it needs to be explicitly documented for each class what the
"value" of an instance is intended to be.

Why? What value (pun intended) is there in adding an explicit statement
of value to every single class?

"The value of a str is the str's sequence of characters."
"The value of a list is the list's sequence of items."
"The value of an int is the int's numeric value."
"The value of a float is the float's numeric value, or in the case of
INFs and NANs, that they are an INF or NAN."
"The value of a complex number is the ordered pair of its real and
imaginary components."
"The value of a re MatchObject is the MatchObject itself."

I don't see any benefit to forcing all classes to explicitly document
this sort of thing. It's nearly always redundant and unnecessary.

--
Steven

Chris Angelico

2014-07-08 03:31:46 UTC

Post by Steven D'Aprano
Why? What value (pun intended) is there in adding an explicit statement
of value to every single class?
"The value of a str is the str's sequence of characters."
"The value of a list is the list's sequence of items."
"The value of an int is the int's numeric value."
"The value of a float is the float's numeric value, or in the case of
INFs and NANs, that they are an INF or NAN."
"The value of a complex number is the ordered pair of its real and
imaginary components."
"The value of a re MatchObject is the MatchObject itself."
I don't see any benefit to forcing all classes to explicitly document
this sort of thing. It's nearly always redundant and unnecessary.

It's important where it's not obvious. For instance, two lists with
the same items are equal, two tuples with the same items are equal,
but a list and a tuple with the same items aren't. Doesn't mean it
necessarily has to be documented, though.
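
In code, assuming nothing beyond the built-in list and tuple types:

```python
items = [1, 2, 3]

list_vs_list = (items == [1, 2, 3])
tuple_vs_tuple = (tuple(items) == (1, 2, 3))
list_vs_tuple = (items == tuple(items))

# Converting one side makes the comparison about the items alone.
converted = (list(tuple(items)) == items)
```

Only list_vs_tuple is False; the type takes part in the comparison even though the items match.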

ChrisA

Rob Cliffe

2014-07-08 04:02:39 UTC


Stephen J. Turnbull

2014-07-08 07:01:00 UTC

Post by Steven D'Aprano
Why? What value (pun intended) is there in adding an explicit statement
of value to every single class?

It troubles me a bit that "value" seems to be a fuzzy concept - it has
an obvious meaning for some types (int, float, list etc.) but for
callable objects you tell me that their value is the object itself,

Value is *abstract* and implicit, but not fuzzy: it's what you compare
when you test for equality. It's abstract in the sense that "inside
of Python" an object's value has to be an object (everything is an
object). Now, the question is "do we need a canonical representation
of objects' values?" Ie, do we need a mapping from every object
conceivable within Python to a specific object that is its value?
Since Python generally allows, even prefers, duck-typing, the answer
presumably is "no". (Maybe you can think of Python programs you'd
like to write where the answer is "yes", but I don't have any
examples.) And in fact there is no such mapping in Python.

So the answer I propose is that an object's value needs a
representation in Python, but that representation doesn't need to be
unique. Any object is a representation of its own value, and if you
need two different objects to be equal to each other, you must define
their __eq__ methods to produce that result.

This (the fact that any object represents its value, and so can be
used as "the" standard of comparison for that value) is why it's so
important that equality be reflexive, symmetric, and transitive, and
why we really want to be careful about creating objects like NaN whose
definition is "my value isn't a value", and therefore "a = float('NaN');
a == a" evaluates to False.
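
The reflexivity point can be checked with a few lines, assuming Python 3. is_reflexive is a hypothetical helper that only samples the values it is given, so it can refute reflexivity but never prove it in general:

```python
def is_reflexive(values):
    """Return True if every sample value compares equal to itself."""
    return all(v == v for v in values)


ordinary = is_reflexive([0, 1.5, "a", (1, 2), object()])
with_nan = is_reflexive([0, float("nan")])
```

Here ordinary is True, and the single NaN is enough to make with_nan False.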

I agree with Steven d'A that this rule is not part of the language
definition and shouldn't be, but it's the rule of thumb I find hardest
to imagine *ever* wanting to break in my own code (although I sort of
understand why the IEEE 754 committee found they had to).

Post by Rob Cliffe
How can we say if an object is mutable if we don't know what its value is?

Mutability is a different question. You can define a class whose
instances have mutable attributes but nonetheless all compare
equal regardless of the contents of those attributes.
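
A minimal sketch of such a class, assuming Python 3. Unit is a hypothetical name:

```python
class Unit:
    """Hypothetical class: instances are mutable yet always equal."""
    def __init__(self, note=""):
        self.note = note

    def __eq__(self, other):
        return isinstance(other, Unit)

    def __hash__(self):
        return 0   # equal objects must have equal hashes


u, v = Unit("first"), Unit("second")

before = (u == v)
u.note = "changed"   # the mutation succeeds without raising
after = (u == v)
```

The attribute changes, yet the answer to == does not, so mutability and value are separate questions for this class.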

OTOH, the test for mutability is to try to mutate it. If that doesn't
raise, it's mutable.

Steve

Chris Angelico

2014-07-08 07:09:27 UTC

Post by Stephen J. Turnbull
I agree with Steven d'A that this rule is not part of the language
definition and shouldn't be, but it's the rule of thumb I find hardest
to imagine *ever* wanting to break in my own code (although I sort of
understand why the IEEE 754 committee found they had to).

The reason NaN isn't equal to itself is because there are X bit
patterns representing NaN, but an infinite number of possible
non-numbers that could result from a calculation. Is
float("inf")-float("inf") equal to float("inf")/float("inf")? There
are three ways NaN equality could have been defined:

1) All NaNs are equal, as if NaN is some kind of "special number".
2) NaNs are equal if they have the exact same bit pattern, and unequal else.
3) All NaNs are unequal, even if they have the same bit pattern.

The first option is very dangerous, because it'll mean that "NaN
pollution" can actually result in unexpected equality. The second
looks fine - a NaN is equal to itself, for instance - but it suffers
from the pigeonhole problem, in that eventually you'll have two
numbers which resulted from different calculations and happen to have
the same bit pattern. The third is what IEEE went with. It's the
sanest option.
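
Options 2 and 3 can be contrasted in a short sketch, assuming Python 3 and IEEE 754 doubles. The helpers bits and from_bits are hypothetical, and the two hexadecimal patterns are assumed examples of quiet NaNs that differ only in their payload:

```python
import math
import struct


def bits(x):
    """Return the 64-bit pattern of a float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(n):
    """Build a float from a 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", n))[0]


# Two quiet-NaN patterns with different payload bits (assumed values).
nan_a = from_bits(0x7FF8000000000001)
nan_b = from_bits(0x7FF8000000000002)

both_nan = math.isnan(nan_a) and math.isnan(nan_b)
option_2_self = (bits(nan_a) == bits(nan_a))   # bit-pattern comparison
option_3_self = (nan_a == nan_a)               # what Python's == does
```

A bit-pattern rule would make a NaN equal to itself, while the actual == reports False even for the very same object.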

ChrisA

Stephen J. Turnbull

2014-07-08 07:53:50 UTC

Post by Chris Angelico
The reason NaN isn't equal to itself is because there are X bit
patterns representing NaN, but an infinite number of possible
non-numbers that could result from a calculation.

I understand that. But you're missing at least two alternatives that
involve raising on some calculations involving NaN, as well as the
fact that forcing inequality of two NaNs produced by equivalent
calculations is arguably just as wrong as allowing equality of two
NaNs produced by the different calculations. That's where things get
fuzzy for me -- in Python I would expect that preserving invariants
would be more important than computational efficiency, but evidently
it's not. I assume that I would have a better grasp on why Python
chose to go this way rather than that if I understood IEEE 754 better.

Chris Angelico

2014-07-08 07:59:11 UTC

Post by Stephen J. Turnbull
But you're missing at least two alternatives that
involve raising on some calculations involving NaN, as well as the
fact that forcing inequality of two NaNs produced by equivalent
calculations is arguably just as wrong as allowing equality of two
NaNs produced by the different calculations.

This is off-topic for this thread, but still...

The trouble is that your "arguably just as wrong" is an
indistinguishable case. If you don't want two different calculations'
NaNs to *ever* compare equal, the only solution is to have all NaNs
compare unequal - otherwise, two calculations might happen to produce
the same bitpattern, as there are only a finite number of them
available.

Post by Stephen J. Turnbull
That's where things get
fuzzy for me -- in Python I would expect that preserving invariants
would be more important than computational efficiency, but evidently
it's not.

What invariant is being violated for efficiency? As I see it, it's one
possible invariant (things should be equal to themselves) coming up
against another possible invariant (one way of generating NaN is
unequal to any other way of generating NaN).

Raising an exception is, of course, the purpose of signalling NaNs
rather than quiet NaNs, which is a separate consideration from how
they compare.

ChrisA

Anders J. Munch

2014-07-08 14:58:33 UTC

Post by Chris Angelico
This is off-topic for this thread, but still...
The trouble is that your "arguably just as wrong" is an
indistinguishable case. If you don't want two different calculations'
NaNs to *ever* compare equal, the only solution is to have all NaNs
compare unequal

For two NaNs computed differently to compare equal is no worse than 2+2
comparing equal to 1+3. You're comparing values, not their history.

You've prompted me to get a rant on the subject off my chest, I just posted an
article on NaN comparisons to python-list.

regards, Anders

Steven D'Aprano

2014-07-08 17:00:46 UTC

Post by Anders J. Munch
For two NaNs computed differently to compare equal is no worse than 2+2
comparing equal to 1+3. You're comparing values, not their history.

a = -23
b = -42
if log(a) == log(b):
    print "a == b"

--
Steven

Chris Angelico

2014-07-08 17:13:00 UTC

Post by Anders J. Munch
For two NaNs computed differently to compare equal is no worse than 2+2
comparing equal to 1+3. You're comparing values, not their history.

a = -23
b = -42
if log(a) == log(b):
    print "a == b"

That could also happen from rounding error, though.

>>> a = 2.0**52
>>> b = a+1.0
>>> a == b
False
>>> log(a) == log(b)
True

Any time you do any operation on numbers that are close together but
not equal, you run the risk of getting results that, in
finite-precision floating point, are deemed equal, even though
mathematically they shouldn't be (two unequal numbers MUST have
unequal logarithms).

ChrisA

Steven D'Aprano

2014-07-08 16:57:45 UTC

I don't think so. Floating point == represents *numeric* equality, not
(for example) equality in the sense of "All Men Are Created Equal". Not
even numeric equality in the most general sense, but specifically in the
sense of (approximately) real-valued numbers, so it's an extremely
precise definition of "equal", not fuzzy in any way.

In an early post, you suggested that NANs don't have a value, or that
they have a value which is not a value. I don't think that's a good way
to look at it. I think the obvious way to think of it is that NAN's
value is Not A Number, exactly like it says on the box. Now, if
something is not a number, obviously you cannot compare it numerically:

"Considered as numbers, is the sound of rain on a tin roof
numerically equal to the sight of a baby smiling?"

Some might argue that the only valid answer to this question is "Mu",

https://en.wikipedia.org/wiki/Mu_%28negative%29#.22Unasking.22_the_question

but if we're forced to give a Yes/No True/False answer, then clearly
False is the only sensible answer. No, Virginia, Santa Claus is not the
same number as Santa Claus.

To put it another way, if x is not a number, then x != y for all
possible values of y -- including x.

[Disclaimer: despite the name, IEEE-754 arguably does not intend NANs to
be Not A Number in the sense that Santa Claus is not a number, but more
like "it's some number, but it's impossible to tell which". However,
despite that, the standard specifies behaviour which is best thought of
in terms of the Santa Claus model.]

I'm not sure what you're referring to here. Is it that containers such
as lists and dicts are permitted to optimize equality tests with
identity tests for speed?

py> NAN = float('NAN')
py> a = [1, 2, NAN, 4]
py> NAN in a # identity is checked before equality
True
py> any(x == NAN for x in a)
False

When this came up for discussion last time, the clear consensus was that
this is reasonable behaviour. NANs and other such "weird" objects are
too rare and too specialised for built-in classes to carry the burden of
having to allow for them. If you want a "NAN-aware list", you can make
one yourself.
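
One possible shape for such a list, as a rough sketch assuming Python 3. EqOnlyList is a hypothetical subclass that overrides only the membership test; index(), count() and remove() would need the same treatment:

```python
class EqOnlyList(list):
    """Hypothetical list whose membership test uses == only,
    skipping the identity shortcut of the built-in list."""
    def __contains__(self, item):
        return any(x == item for x in self)


NAN = float("nan")
plain = [1, 2, NAN, 4]
strict = EqOnlyList(plain)

in_plain = (NAN in plain)     # the identity shortcut finds the object
in_strict = (NAN in strict)   # == alone never matches a NAN
```

The price is a Python-level loop on every membership test, which is part of why the built-in containers keep the shortcut.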

Post by Stephen J. Turnbull
I assume that I would have a better grasp on why Python
chose to go this way rather than that if I understood IEEE 754 better.

See the answer by Stephen Canon here:

http://stackoverflow.com/questions/1565164/

[quote]

It is not possible to specify a fixed-size arithmetic type that
satisfies all of the properties of real arithmetic that we know and
love. The 754 committee has to decide to bend or break some of them.
This is guided by some pretty simple principles:

When we can, we match the behavior of real arithmetic.
When we can't, we try to make the violations as predictable and as
easy to diagnose as possible.

[end quote]

In particular, reflexivity for NANs was dropped for a number of reasons,
some stronger than others:

- One of the weaker reasons for NAN non-reflexivity is that it preserved
the identity x == y <=> x - y == 0. Although that is the cornerstone
of real arithmetic, it's violated by IEEE-754 INFs, so violating it
for NANs is not a big deal either.

- Dropping reflexivity preserves the useful property that NANs compare
unequal to everything.

- Practicality beats purity: dropping reflexivity allowed programmers
to identify NANs without waiting years or decades for programming
languages to implement isnan() functions. E.g. before Python had
math.isnan(), I made my own:

def isnan(x):
    return isinstance(x, float) and x != x

- Keeping reflexivity for NANs would have implied some pretty nasty
things, e.g. if log(-3) == log(-5), then -3 == -5.
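
The first and third reasons can be checked directly, assuming Python 3 floats backed by IEEE 754 doubles. The name isnan_by_reflexivity is hypothetical and mirrors the home-made isnan() above:

```python
import math

inf = float("inf")
nan = float("nan")

# x == y holds for infinities, yet x - y is a NAN rather than 0.
inf_equal = (inf == inf)
inf_difference_is_zero = (inf - inf == 0)


def isnan_by_reflexivity(x):
    """Detect a NAN using only the fact that it is unequal to itself."""
    return isinstance(x, float) and x != x
```

So the identity x == y <=> x - y == 0 already fails for INFs, and x != x singles out NANs without any library support.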

Basically, and I realise that many people disagree with their decision
(notably Bertrand Meyer of Eiffel fame, and our own Mark Dickinson), the
IEEE-754 committee led by William Kahan decided that the problems caused
by having NANs compare unequal to themselves were much less than the
problems that would have been caused without it.

--
Steven

MRAB

2014-07-08 17:33:31 UTC

On 2014-07-08 17:57, Steven D'Aprano wrote:
[snip]

Post by Steven D'Aprano
In particular, reflexivity for NANs was dropped for a number of reasons,
- One of the weaker reasons for NAN non-reflexivity is that it preserved
the identity x == y <=> x - y == 0. Although that is the cornerstone
of real arithmetic, it's violated by IEEE-754 INFs, so violating it
for NANs is not a big deal either.
- Dropping reflexivity preserves the useful property that NANs compare
unequal to everything.
- Practicality beats purity: dropping reflexivity allowed programmers
to identify NANs without waiting years or decades for programming
languages to implement isnan() functions. E.g. before Python had
math.isnan(), I made my own:
    def isnan(x):
        return isinstance(x, float) and x != x
- Keeping reflexivity for NANs would have implied some pretty nasty
things, e.g. if log(-3) == log(-5), then -3 == -5.

The log of a negative number is a complex number.

Post by Steven D'Aprano
Basically, and I realise that many people disagree with their decision
(notably Bertrand Meyer of Eiffel fame, and our own Mark Dickinson), the
IEEE-754 committee led by William Kahan decided that the problems caused
by having NANs compare unequal to themselves were much less than the
problems that would have been caused without it.

Steven D'Aprano

2014-07-09 01:22:42 UTC

Post by MRAB
The log of a negative number is a complex number.

Only in complex arithmetic. In real arithmetic, the log of a negative
number isn't a number at all.
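
Both views can be seen side by side in Python itself. This is a small sketch, assuming only the standard math and cmath modules; note that Python's math.log() raises an exception for a negative argument instead of returning a NaN:

```python
import cmath
import math

# Real arithmetic: the math module refuses the operation outright.
try:
    math.log(-3)
    real_outcome = "returned a value"
except ValueError:
    real_outcome = "ValueError"

# Complex arithmetic: the logs exist and are distinct complex numbers,
# so comparing them says nothing odd about -3 and -5.
log_m3 = cmath.log(-3)
log_m5 = cmath.log(-5)

assert real_outcome == "ValueError"
assert log_m3 != log_m5
assert math.isclose(log_m3.imag, math.pi)
assert math.isclose(log_m3.real, math.log(3))
```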

--
Steven

Stephen J. Turnbull

2014-07-09 04:21:11 UTC

Post by Steven D'Aprano
I don't think so. Floating point == represents *numeric* equality,

There is no such thing as floating point == in Python. You can apply
== to two floating point numbers, but == (at the language level)
handles any two numbers, as well as pairs of things that aren't
numbers in the Python language. So it's a design decision to include
NaNs at all, and another design decision to follow IEEE in giving them
behavior that violates the definition of equivalence relation for ==.

Post by Steven D'Aprano
In an early post, you suggested that NANs don't have a value, or that
they have a value which is not a value. I don't think that's a good way
to look at it. I think the obvious way to think of it is that NAN's
value is Not A Number, exactly like it says on the box. Now, if

And if Python can't do something you ask it to do, it raises an
exception. Why should this be different? Obviously, it's a question of
expedience.

Post by Steven D'Aprano
I'm not sure what you're referring to here. Is it that containers such
as lists and dicts are permitted to optimize equality tests with
identity tests for speed?

No, when I say I'm fuzzy I'm referring to the fact that although I
understand the logical rationale for IEEE 754 NaN behavior, I don't
really understand the ins and outs well enough to judge for myself
whether it's a good idea for Python to follow that model and turn ==
into something that is not an equivalence relation.

I'm not going to argue for a change, I just want to know where I stand.

Post by Steven D'Aprano
Basically, and I realise that many people disagree with their decision
(notably Bertrand Meyer of Eiffel fame, and our own Mark
Dickinson),

Indeed. So "it's the standard" does not mean there is a consensus of
experts. I'm willing to delegate to a consensus of expert opinion,
but not when some prominent local expert(s) disagree -- then I'd like
to understand well enough to come to my own conclusions.

Antoine Pitrou

2014-07-09 13:21:26 UTC

Post by Steven D'Aprano
I don't think so. Floating point == represents *numeric* equality,

This is becoming pointless hair-splitting.

>>> float.__eq__(1.0, 2.0)
False
>>> float.__eq__(1.0, 2)
False
>>> float.__eq__(1.0, 1.0+0J)
NotImplemented
>>> float.__eq__(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__eq__' requires a 'float' object but received a 'int'
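
For readers wondering how this relates to the == operator: the following sketch (assuming CPython 3, plain built-in types only) contrasts calling float.__eq__ directly with using the operator, which falls back to the other operand when the first one answers NotImplemented:

```python
# Calling the method directly bypasses the operator's fallback logic.
direct = float.__eq__(1.0, 1.0 + 0j)
assert direct is NotImplemented

# The == operator then tries the reflected comparison on the other
# operand, so complex.__eq__ gets a chance to answer.
assert (1.0 == 1.0 + 0j) is True
assert complex.__eq__(1.0 + 0j, 1.0) is True

# float.__eq__ handles int operands itself.
assert float.__eq__(1.0, 2) is False
assert float.__eq__(1.0, 1) is True

# The unbound method insists that its first argument is a float.
try:
    float.__eq__(1, 2)
    raised = False
except TypeError:
    raised = True
assert raised
```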

Please direct any further discussion of this to python-ideas.

Raymond Hettinger

2014-07-09 01:48:17 UTC

I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python.
The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it.

Once every few years, someone discovers IEEE-754, learns that NaNs
aren't supposed to be equal to themselves and becomes inspired
to open an old debate about whether to wreck Python in an effort
to make the world safe for NaNs. And somewhere along the way,
people forget that practicality beats purity.

Here are a few thoughts on the subject that may or may not add
a little clarity ;-)

* Python already has IEEE-754 compliant NaNs:

assert float('NaN') != float('NaN')

* Python already has the ability to filter-out NaNs:

[x for x in container if not math.isnan(x)]

* In the numeric world, the most common use of NaNs is for
missing data (much like we usually use None). The property
of not being equal to itself is primarily useful in
low level code optimized to run a calculation to completion
without running frequent checks for invalid results
(much like @n/a is used in MS Excel).

* Python also lets containers establish their own invariants
to establish correctness, improve performance, and make it
possible to reason about our programs:

for x in c:
    assert x in c

* Containers like dicts and sets have always used the rule
that identity-implies equality. That is central to their
implementation. In particular, the check of interned
string keys relies on identity to bypass a slow
character-by-character comparison to verify equality.

* Traditionally, a relation R is considered an equivalence
relation if it is reflexive, symmetric, and transitive:

R(x, x) -> True
R(x, y) -> R(y, x)
R(x, y) ^ R(y, z) -> R(x, z)

* Knowingly or not, programs tend to assume that all of those
hold. Test suites in particular assume that if you put
something in a container that assertIn() will pass.

* Here are some examples of cases where non-reflexive objects
would jeopardize the pragmatism of being able to reason
about the correctness of programs:

s = SomeSet()
s.add(x)
assert x in s

s.remove(x) # See collections.abc.MutableSet.remove
assert not s

s.clear() # See collections.abc.MutableSet.clear
assert not s

* What the above code does is up to the implementer of the
container. If you use the Set ABC, you can choose to
implement __contains__() and discard() to use straight
equality or identity-implies equality. Nothing prevents
you from making containers that are hard to reason about.

* The builtin containers make the choice for identity-implies
equality so that it is easier to build fast, correct code.
For the most part, this has worked out great (dictionaries
in particular have had identity checks built in for almost
twenty years).

* Years ago, there was a debate about whether to add an __is__()
method to allow overriding the is-operator. The push for the
change was the "pure" notion that "all operators should be
customizable". However, the idea was rejected based on the
"practical" notions that it would wreck our ability to reason
about code, it would slow down all code that used identity checks,
that library modules (ours and third-party) already made
deep assumptions about what "is" means, and that people would
shoot themselves in the foot with hard to find bugs.

Personally, I see no need to make the same mistake by removing
the identity-implies-equality rule from the built-in containers.
There's no need to upset the apple cart for nearly zero benefit.

IMO, the proposed quest for purity is misguided.
There are many practical reasons to let the builtin
containers continue to work as they do now.
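
The Set ABC point made earlier in this post can be made concrete. The class StrictEqSet below is hypothetical and exists only for illustration: it is a deliberately naive, list-backed container that uses straight equality and never checks identity. The sketch compares it with the built-in set for a NaN element:

```python
from collections.abc import MutableSet


class StrictEqSet(MutableSet):
    """Hypothetical list-backed set that uses straight == only."""

    def __init__(self):
        self._items = []

    def __contains__(self, value):
        return any(item == value for item in self._items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, value):
        if value not in self:
            self._items.append(value)

    def discard(self, value):
        self._items = [item for item in self._items if not (item == value)]


x = float("nan")

# Built-in set: identity implies equality, so the usual invariants hold.
s = set()
s.add(x)
assert x in s
s.remove(x)
assert not s

# Straight-equality set: the same steps break for a non-reflexive element.
t = StrictEqSet()
t.add(x)
assert x not in t          # "x in s" fails right after s.add(x)
try:
    t.remove(x)            # the inherited remove() checks membership first
    removed = True
except KeyError:
    removed = False
assert not removed
assert len(t) == 1         # the NaN is stuck in the container
```

The inherited clear() is intentionally not called on t here: the ABC's default implementation is built on pop() and discard(), so it is not expected to make progress on an element that discard() cannot match.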

Raymond

Andreas Maier

2014-07-11 14:04:35 UTC

Post by Raymond Hettinger

Once every few years, someone discovers IEEE-754, learns that NaNs
aren't supposed to be equal to themselves and becomes inspired
to open an old debate about whether to wreck Python in an effort
to make the world safe for NaNs. And somewhere along the way,
people forget that practicality beats purity.
Here are a few thoughts on the subject that may or may not add
a little clarity ;-)
assert float('NaN') != float('NaN')
[x for x in container if not math.isnan(x)]
* In the numeric world, the most common use of NaNs is for
missing data (much like we usually use None). The property
of not being equal to itself is primarily useful in
low level code optimized to run a calculation to completion
without running frequent checks for invalid results
* Python also lets containers establish their own invariants
to establish correctness, improve performance, and make it
assert x in c
* Containers like dicts and sets have always used the rule
that identity-implies equality. That is central to their
implementation. In particular, the check of interned
string keys relies on identity to bypass a slow
character-by-character comparison to verify equality.
* Traditionally, a relation R is considered an equality
R(x, x) -> True
R(x, y) -> R(y, x)
R(x, y) ^ R(y, z) -> R(x, z)
* Knowingly or not, programs tend to assume that all of those
hold. Test suites in particular assume that if you put
something in a container that assertIn() will pass.
* Here are some examples of cases where non-reflexive objects
would jeopardize the pragmatism of being able to reason
s = SomeSet()
s.add(x)
assert x in s
s.remove(x) # See collections.abc.Set.remove
assert not s
s.clear() # See collections.abc.Set.clear
assert not s
* What the above code does is up to the implementer of the
container. If you use the Set ABC, you can choose to
implement __contains__() and discard() to use straight
equality or identity-implies equality. Nothing prevents
you from making containers that are hard to reason about.
* The builtin containers make the choice for identity-implies
equality so that it is easier to build fast, correct code.
For the most part, this has worked out great (dictionaries
in particular have had identity checks built in for almost
twenty years).
* Years ago, there was a debate about whether to add an __is__()
method to allow overriding the is-operator. The push for the
change was the "pure" notion that "all operators should be
customizable". However, the idea was rejected based on the
"practical" notions that it would wreck our ability to reason
about code, it would slow down all code that used identity checks,
that library modules (ours and third-party) already made
deep assumptions about what "is" means, and that people would
shoot themselves in the foot with hard to find bugs.
Personally, I see no need to make the same mistake by removing
the identity-implies-equality rule from the built-in containers.
There's no need to upset the apple cart for nearly zero benefit.

Containers delegate the equal comparison on the container to their
elements; they do not apply identity-based comparison to their elements.
At least that is the externally visible behavior.

Only the default comparison behavior implemented on type object follows
the identity-implies-equality rule.

As part of my doc patch, I will upload an extension to the
test_compare.py test suite, which tests all built-in containers with
values whose order differs from the identity order, and it shows that the
value order and equality wins over identity, if implemented.

Post by Raymond Hettinger
IMO, the proposed quest for purity is misguided.
There are many practical reasons to let the builtin
containers continue to work as they do now.

As I said, I can accept compatibility reasons. Plus, the argument
brought up by Benjamin about the desire for the
identity-implies-equality rule as a default, with no corresponding rule
for order comparison (and I added both to the doc patch).

Andy

Ethan Furman

2014-07-11 20:54:40 UTC

Post by Raymond Hettinger
Personally, I see no need to make the same mistake by removing
the identity-implies-equality rule from the built-in containers.
There's no need to upset the apple cart for nearly zero benefit.

Containers delegate the equal comparison on the container to their elements; they do not apply identity-based comparison
to their elements. At least that is the externally visible behavior.

If that were true, then [NaN] == [NaN] would be False, and it is not.

Here is the externally visible behavior:

Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
--> NaN = float('nan')
--> NaN == NaN
False
--> [NaN] == [NaN]
True

--
~Ethan~

Andreas Maier

2014-07-13 15:13:20 UTC

Containers delegate the equal comparison on the container to their
elements; they do not apply identity-based comparison
to their elements. At least that is the externally visible behavior.

If that were true, then [NaN] == [NaN] would be False, and it is not.
Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
--> NaN = float('nan')
--> NaN == NaN
False
--> [NaN] == [NaN]
True

Ouch, that hurts ;-)

First, the delegation of sequence equality to element equality is not
something I have come up with during my doc patch. It has always been in
5.9 Comparisons of the Language Reference (copied from Python 3.4):

"Tuples and lists are compared lexicographically using comparison of
corresponding elements. This means that to compare equal, each element
must compare equal and the two sequences must be of the same type and
have the same length."

Second, if not by delegation to equality of its elements, how would the
equality of sequences be defined otherwise?

But your test is definitely worth having a closer look at. I have
broadened the test somewhat and that brings up further questions. Here
is the test output, and a discussion of the results (test program
try_eq.py and its output test_eq.out are attached to issue #12067):

Test #1: Different equal int objects:

obj1: type=<class 'int'>, str=257, id=39305936
obj2: type=<class 'int'>, str=257, id=39306160

a) obj1 is obj2: False
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: True

Discussion:

Case 1.c) can be interpreted that the list delegates its == to the == on
its elements. It cannot be interpreted to delegate to identity
comparison. That is consistent with how everyone (I hope ;-) would
expect int objects to behave, or lists or dicts of them.

The motivation for case f) is explained further down, it has to do with
caching.

Test #2: Same int object:

obj1: type=<class 'int'>, str=257, id=39305936
obj2: type=<class 'int'>, str=257, id=39305936

a) obj1 is obj2: True
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: True

-> No surprises (I hope).

Test #3: Different equal float objects:

obj1: type=<class 'float'>, str=257.0, id=5734664
obj2: type=<class 'float'>, str=257.0, id=5734640

a) obj1 is obj2: False
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: True

Discussion:

I added this test only to show that float NaN is a special case, and
that this test for float objects - that are not NaN - behaves like test
#1 for int objects.

Test #4: Same float object:

obj1: type=<class 'float'>, str=257.0, id=5734664
obj2: type=<class 'float'>, str=257.0, id=5734664

a) obj1 is obj2: True
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: True

-> Same as test #2, hopefully no surprises.

Test #5: Different float NaN objects:

obj1: type=<class 'float'>, str=nan, id=5734784
obj2: type=<class 'float'>, str=nan, id=5734976

a) obj1 is obj2: False
b) obj1 == obj2: False
c) [obj1] == [obj2]: False
d) {obj1:'v'} == {obj2:'v'}: False
e) {'k':obj1} == {'k':obj2}: False
f) obj1 == obj2: False

Discussion:

Here, the list behaves as I would expect under the rule that it
delegates equality to its elements. Case c) allows that interpretation.
However, an interpretation based on identity would also be possible.

Test #6: Same float NaN object:

obj1: type=<class 'float'>, str=nan, id=5734784
obj2: type=<class 'float'>, str=nan, id=5734784

a) obj1 is obj2: True
b) obj1 == obj2: False
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
f) obj1 == obj2: False

Discussion (this is Ethan's example):

Case 6.b) shows the special behavior of float NaN that is documented: a
float NaN object is the same as itself but unequal to itself.

Case 6.c) is the surprising case. It could be interpreted in two ways
(at least that's what I found):

1) The comparison is based on identity of the float objects. But that is
inconsistent with test #4. And why would the list special-case NaN
comparison in such a way that it ends up being inconsistent with the
special definition of NaN (outside of the list)?

2) The list does not always delegate to element equality, but attempts
to optimize if the objects are the same (same identity). We will see
later that that happens. Further, when comparing float NaNs of the same
identity, the list implementation forgot to special-case NaNs. Which
would be a bug, IMHO. I did not analyze the C implementation, so this is
all speculation based upon external visible behavior.
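
For anyone who wants to reproduce tests #5 and #6 without the attached try_eq.py, here is a condensed sketch. It assumes CPython, where two separate float('nan') calls yield two distinct objects; the names nan1 and nan2 are chosen here for illustration:

```python
nan1 = float("nan")
nan2 = float("nan")
assert nan1 is not nan2

# Test #5: different NaN objects compare unequal everywhere.
assert nan1 != nan2
assert [nan1] != [nan2]
assert {nan1: "v"} != {nan2: "v"}
assert {"k": nan1} != {"k": nan2}

# Test #6: the same NaN object is unequal to itself on its own,
# but compares equal once it is wrapped in a list or dict.
assert nan1 != nan1
assert [nan1] == [nan1]
assert {nan1: "v"} == {nan1: "v"}
assert {"k": nan1} == {"k": nan1}
```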

Test #7: Different objects (with equal x) of class C
(C.__eq__() implemented with equality of x,
C.__ne__() returning NotImplemented):

obj1: type=<class '__main__.C'>, str=C(256), id=39406504
obj2: type=<class '__main__.C'>, str=C(256), id=39406616

a) obj1 is obj2: False
C.__eq__(): self=39406504, other=39406616, returning True
b) obj1 == obj2: True
C.__eq__(): self=39406504, other=39406616, returning True
c) [obj1] == [obj2]: True
C.__eq__(): self=39406616, other=39406504, returning True
d) {obj1:'v'} == {obj2:'v'}: True
C.__eq__(): self=39406504, other=39406616, returning True
e) {'k':obj1} == {'k':obj2}: True
C.__eq__(): self=39406504, other=39406616, returning True
f) obj1 == obj2: True

The __eq__() and __ne__() implementations each print a debug message.
The __ne__() is only defined to verify that it is not invoked, and that
the inherited default __ne__() does not chime in.

Discussion:

Here we see that the list equality comparison does invoke the element
equality. However, the picture becomes more complex further down.

Test #8: Same object of class C
(C.__eq__() implemented with equality of x,
C.__ne__() returning NotImplemented):

obj1: type=<class '__main__.C'>, str=C(256), id=39406504
obj2: type=<class '__main__.C'>, str=C(256), id=39406504

a) obj1 is obj2: True
C.__eq__(): self=39406504, other=39406504, returning True
b) obj1 == obj2: True
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
C.__eq__(): self=39406504, other=39406504, returning True
f) obj1 == obj2: True

Discussion:

The == on the class C objects in case 8.b) invokes __eq__(), even though
the objects are the same object. This can be explained by the desire in
Python that classes should be able not to be reflexive, if needed. Like
float NaN, for example.

Now, the list equality in case 8.c) is interesting. The list equality
does not invoke element equality. Even though object equality in case
8.b) did not assume reflexivity and invoked the __eq__() method, the
list seems to assume reflexivity and seems to go by object identity.

The only other potential explanation (that I found) would be that some
aspects of the comparison behavior are cached. That's why I added the
cases f), which show that caching for comparison results does not happen
(the __eq__() method is invoked again).

So we are back to discussing why element equality does not assume
reflexivity, but list equality does. IMHO, that is another bug, or maybe
the same one.
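
The observation from test #8 can be reduced to a few lines. The class below is a hypothetical stand-in for the class C of the test program, with a call counter instead of debug output; the counts are what I would expect on CPython:

```python
class C:
    """Hypothetical stand-in for the class C used in tests #7 and #8."""

    eq_calls = 0

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        C.eq_calls += 1
        return self.x == other.x


obj = C(256)

assert obj == obj              # the == operator calls __eq__, even for the same object
assert C.eq_calls == 1

assert [obj] == [obj]          # same element in both lists: __eq__ is skipped
assert C.eq_calls == 1

assert [obj] == [C(256)]       # different element objects: __eq__ is called
assert C.eq_calls == 2
```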

Test #9: Different objects (with equal x) of class D
(D.__eq__() implemented with inequality of x,
D.__ne__() returning NotImplemented):

obj1: type=<class '__main__.D'>, str=C(256), id=39407064
obj2: type=<class '__main__.D'>, str=C(256), id=39406952

a) obj1 is obj2: False
D.__eq__(): self=39407064, other=39406952, returning False
b) obj1 == obj2: False
D.__eq__(): self=39407064, other=39406952, returning False
c) [obj1] == [obj2]: False
D.__eq__(): self=39406952, other=39407064, returning False
d) {obj1:'v'} == {obj2:'v'}: False
D.__eq__(): self=39407064, other=39406952, returning False
e) {'k':obj1} == {'k':obj2}: False
D.__eq__(): self=39407064, other=39406952, returning False
f) obj1 == obj2: False

Discussion:

Class D implements __eq__() by != on the data attribute. This test does
not really show any surprises, and is consistent with the theory that
list comparison delegates to element comparison. This is really just a
preparation for the next test, that uses the same object of this class.

Test #10: Same object of class D
(D.__eq__() implemented with inequality of x,
D.__ne__() returning NotImplemented):

obj1: type=<class '__main__.D'>, str=C(256), id=39407064
obj2: type=<class '__main__.D'>, str=C(256), id=39407064

a) obj1 is obj2: True
D.__eq__(): self=39407064, other=39407064, returning False
b) obj1 == obj2: False
c) [obj1] == [obj2]: True
d) {obj1:'v'} == {obj2:'v'}: True
e) {'k':obj1} == {'k':obj2}: True
D.__eq__(): self=39407064, other=39407064, returning False
f) obj1 == obj2: False

Discussion:

The inequality-based implementation of __eq__() explains case 10.b). It
is surprising (to me) that the list comparison in case 10.c) returns
True. If one compares that to case 9.c), one could believe that the
identities of the objects are used for both cases. But why would the
list not respect the result of __eq__() if it is implemented?

This behavior seems at least to be consistent with the surprise of case 6.c).

In order to not just rely on the external behavior, I started digging
into the C implementation. For list equality comparison, I started at
list_richcompare() which uses PyObject_RichCompareBool(), which
shortcuts its result based on identity comparison, and thus enforces
reflexivity.

The comment on line 714 in object.c in PyObject_RichCompareBool() also
confirms that:

/* Quick result when objects are the same.
Guarantees that identity implies equality. */
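
As I understand that comment, the shortcut amounts to roughly the following. This is a Python paraphrase for illustration only, not the actual C code; the function name rich_compare_bool and the string operator argument are made up here, and only == and != are modelled:

```python
def rich_compare_bool(v, w, op):
    """Rough paraphrase of the identity shortcut for == and !=."""
    if v is w:
        # Quick result when objects are the same.
        if op == "==":
            return True
        if op == "!=":
            return False
    if op == "==":
        return bool(v == w)
    if op == "!=":
        return bool(v != w)
    raise ValueError("only == and != are modelled in this sketch")


nan = float("nan")
other_nan = float("nan")

# Same object: the element's own __eq__ is never consulted.
assert rich_compare_bool(nan, nan, "==") is True
assert rich_compare_bool(nan, nan, "!=") is False

# Different objects: the result comes from the element comparison.
assert rich_compare_bool(nan, other_nan, "==") is False
assert rich_compare_bool(nan, other_nan, "!=") is True
```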

IMHO, we need to discuss whether we are serious with the direction that
was claimed earlier in this thread, that reflexivity (i.e. identity
implies equality) should be decided upon by the classes and not by the
Python language. As I see it, we have some pieces of code that enforce
reflexivity, and some that don't.

Andy

Steven D'Aprano

2014-07-13 16:23:03 UTC

Post by Andreas Maier
Second, if not by delegation to equality of its elements, how would the
equality of sequences be defined otherwise?

Wow. I'm impressed by the amount of detailed effort you've put into
investigating this. (Too much detail to absorb, I'm afraid.) But perhaps
you might have just asked on the python-***@python.org mailing list, or
here, where we would have told you the answer:

list __eq__ first checks element identity before going on
to check element equality.

If you can read C, you might like to check the list source code:

http://hg.python.org/cpython/file/22e5a85ba840/Objects/listobject.c

but if I'm reading it correctly, list.__eq__ conceptually looks
something like this:

    def __eq__(self, other):
        if not isinstance(other, list):
            return NotImplemented
        if len(other) != len(self):
            return False
        for a, b in zip(self, other):
            if not (a is b or a == b):
                return False
        return True

(The actual code is a bit more complex than that, since there is a
single function, list_richcompare, which handles all the rich
comparisons.)

The critical test is PyObject_RichCompareBool here:

http://hg.python.org/cpython/file/22e5a85ba840/Objects/object.c

which explicitly says:

/* Quick result when objects are the same.
Guarantees that identity implies equality. */

[...]

Post by Andreas Maier
I added this test only to show that float NaN is a special case,

NANs are not a special case. List __eq__ treats all object types
identically (pun intended):

py> class X:
... def __eq__(self, other): return False
...
py> x = X()
py> x == x
False
py> [x] == [X()]
False
py> [x] == [x]
True

[...]

Post by Andreas Maier
Case 6.c) is the surprising case. It could be interpreted in two ways
1) The comparison is based on identity of the float objects. But that is
inconsistent with test #4. And why would the list special-case NaN
comparison in such a way that it ends up being inconsistent with the
special definition of NaN (outside of the list)?

It doesn't. NANs are not special cased in any way.

This was discussed to death some time ago, both on python-dev and
python-ideas. If you're interested, you can start here:

https://mail.python.org/pipermail/python-list/2012-October/633992.html

which is in the middle of one of the threads, but at least it gets you
to the right time period.

Post by Andreas Maier
2) The list does not always delegate to element equality, but attempts
to optimize if the objects are the same (same identity).

Right! It's not just lists -- I believe that tuples, dicts and sets
behave the same way.
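
A quick check of that belief for tuples and sets, as a sketch on CPython 3 (lists and dicts are covered by the tests quoted above):

```python
nan = float("nan")

# Same NaN object on both sides: the containers compare equal.
assert (nan,) == (nan,)
assert {nan} == {nan}
assert frozenset([nan]) == frozenset([nan])

# Membership tests follow the same identity-first rule.
assert nan in (nan,)
assert nan in {nan}

# Distinct NaN objects: no identity match, and == says False.
assert (float("nan"),) != (float("nan"),)
assert float("nan") not in {nan}
```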

Post by Andreas Maier
We will see
later that that happens. Further, when comparing float NaNs of the same
identity, the list implementation forgot to special-case NaNs. Which
would be a bug, IMHO.

"Forgot"? I don't think the behaviour of list comparisons is an
accident.

NAN equality is non-reflexive. Very few other things are the same. It
would be seriously weird if alist == alist could return False. You'll
note that the IEEE-754 standard has nothing to say about the behaviour
of Python lists containing NANs, so we're free to pick whatever
behaviour makes the most sense for Python, and that is to minimise the
"Gotcha!" factor.

NANs are a gotcha to anyone who doesn't know IEEE-754, and possibly even
some who do. I will go to the barricades to fight to keep the
non-reflexivity of NANs *in isolation*, but I believe that Python has
made the right decision to treat lists containing NANs the same as
everything else.

NAN == NAN # obeys IEEE-754 semantics and returns False

[NAN] == [NAN] # obeys standard expectation that equality is reflexive

This behaviour is not a bug, it is a feature. As far as I am concerned,
this only needs documenting. If anyone needs list equality to honour the
special behaviour of NANs, write a subclass or an equal() function.
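
Such an equal() function might look like the sketch below. The name and signature are only a suggestion, and it compares one level deep: nested containers would still apply their own identity shortcut internally.

```python
def equal(seq1, seq2):
    """Compare two sequences element by element using == only."""
    if len(seq1) != len(seq2):
        return False
    # No "a is b" shortcut here, so NANs stay unequal to themselves.
    return all(a == b for a, b in zip(seq1, seq2))


NAN = float("nan")

assert [NAN] == [NAN]              # built-in list equality
assert not equal([NAN], [NAN])     # honours the special behaviour of NANs
assert equal([1, 2.0], [1, 2])
assert not equal([1], [1, 2])
```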

--
Steven

Chris Angelico

2014-07-13 16:34:20 UTC

Post by Andreas Maier
We will see
later that that happens. Further, when comparing float NaNs of the same
identity, the list implementation forgot to special-case NaNs. Which
would be a bug, IMHO.

"Forgot"? I don't think the behaviour of list comparisons is an
accident.

Well, "forgot" is on the basis that the identity check is intended to
be a mere optimization. If that were the case ("don't actually call
__eq__ when you reckon it'll return True"), then yes, failing to
special-case NaN would be a bug. But since it's intended behaviour, as
explained further down, it's not a bug and not the result of
forgetfulness.

ChrisA

Nick Coghlan

2014-07-13 18:11:58 UTC

Post by Chris Angelico

Post by Andreas Maier
We will see
later that that happens. Further, when comparing float NaNs of the same
identity, the list implementation forgot to special-case NaNs. Which
would be a bug, IMHO.

"Forgot"? I don't think the behaviour of list comparisons is an
accident.

Right, it's not a mere optimisation - it's the only way to get
containers to behave sensibly. Otherwise we'd end up with nonsense like:

>>> x = float("nan")
>>> x in [x]
False

That currently returns True because of the identity check - it would
return False if we delegated the check to float.__eq__ because the
defined IEEE754 behaviour for NaN's breaks the mathematical definition
of an equivalence relation as a transitive, reflexive and symmetric
relation. (It breaks it for *good reasons*, but we still need to
figure out a way of dealing with the impedance mismatch between the
definition of floats and the definition of container invariants like
"assert x in [x]")

The current approach means that the lack of reflexivity of NaN's stays
confined to floats and similar types - it doesn't leak out and infect
the behaviour of the container types.
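
To illustrate the difference, here is a small sketch on CPython 3. The any() expression is the equivalent that the language reference gives for "x in y" on the built-in containers; the pure-equality variant is the hypothetical alternative discussed above:

```python
x = float("nan")
items = [x]

# Current behaviour: the identity check comes first.
assert x in items
assert any(x is e or x == e for e in items)

# Hypothetical pure-equality membership would say the opposite.
assert not any(x == e for e in items)

# Other list operations that search by value behave the same way.
assert items.count(x) == 1
assert items.index(x) == 0
items.remove(x)
assert items == []
```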

What we've never figured out is a good place to *document* it. I
thought there was an open bug for that, but I can't find it right now.

Cheers,
Nick.

--
Nick Coghlan | ***@gmail.com | Brisbane, Australia

Chris Angelico

2014-07-13 18:16:11 UTC

Post by Nick Coghlan
What we've never figured out is a good place to *document* it. I
thought there was an open bug for that, but I can't find it right now.

Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found
a parallel explanation of sequence equality.

ChrisA

Nick Coghlan

2014-07-13 18:23:42 UTC

Post by Chris Angelico

Post by Nick Coghlan
What we've never figured out is a good place to *document* it. I
thought there was an open bug for that, but I can't find it right now.

Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found
a parallel explanation of sequence equality.

We might need to expand the tables of sequence operations to cover
equality and inequality checks - those are currently missing.

Cheers,
Nick.


--
Nick Coghlan | ***@gmail.com | Brisbane, Australia

Marko Rauhamaa

2014-07-13 19:54:02 UTC

Post by Nick Coghlan
Right, it's not a mere optimisation - it's the only way to get
containers to behave sensibly. Otherwise we'd end up with nonsense like:

>>> x = float("nan")
>>> x in [x]
False

Why is that nonsense? I mean, why is it any more nonsense than

>>> x == x
False

Anyway, personally, I'm perfectly "happy" to live with the choices of
past generations, regardless of whether they were good or not. What you
absolutely don't want to do is "correct" the choices of past generations.

Marko

Akira Li

2014-07-13 20:05:27 UTC

Nick Coghlan <***@gmail.com> writes:
...

Post by Nick Coghlan
definition of floats and the definition of container invariants like
"assert x in [x]")
The current approach means that the lack of reflexivity of NaN's stays
confined to floats and similar types - it doesn't leak out and infect
the behaviour of the container types.
What we've never figured out is a good place to *document* it. I
thought there was an open bug for that, but I can't find it right now.

There was related issue "Tuple comparisons with NaNs are broken"
http://bugs.python.org/issue21873
but it was closed as "not a bug" even though the corresponding behavior is
*not documented* anywhere.

--
Akira

Andreas Maier

2014-07-16 11:40:03 UTC

Post by Akira Li
...

There was related issue "Tuple comparisons with NaNs are broken"
http://bugs.python.org/issue21873
but it was closed as "not a bug" even though the corresponding behavior is
*not documented* anywhere.

I currently know about these two issues related to fixing the docs:

http://bugs.python.org/11945 - about NaN values in containers
http://bugs.python.org/12067 - comparisons

I am working on the latter, currently. The patch only targets the
comparisons chapter in the Language Reference, there is another
comparisons chapter in the Library Reference, and one in the Tutorial.

I will need to update the patch to issue 12067 as a result of this
discussion.

Andy

Andreas Maier

2014-07-16 15:24:16 UTC

Post by Akira Li
...
There was related issue "Tuple comparisons with NaNs are broken"
http://bugs.python.org/issue21873
but it was closed as "not a bug" even though the corresponding behavior is
*not documented* anywhere.

http://bugs.python.org/11945 - about NaN values in containers
http://bugs.python.org/12067 - comparisons
I am working on the latter, currently. The patch only targets the
comparisons chapter in the Language Reference, there is another
comparisons chapter in the Library Reference, and one in the Tutorial.
I will need to update the patch to issue 12067 as a result of this
discussion.

I have uploaded v9 of the patch to issue 12067; it should address the
recent discussion (plus Mark's review comment on the issue itself).

Please review.

Andy

Andreas Maier

2014-07-16 11:39:55 UTC

Post by Andreas Maier
Second, if not by delegation to equality of its elements, how would the
equality of sequences be defined otherwise?

Wow. I'm impressed by the amount of detailed effort you've put into
investigating this. (Too much detail to absorb, I'm afraid.) But perhaps
list __eq__ first checks element identity before going on
to check element equality.

I apologize for not asking. It seems I was looking at the trees
(behaviors of specific cases) without seeing the wood (identity goes first).
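To make "identity goes first" concrete, here is a rough pure-Python sketch of how I now understand list equality to behave. This is not the actual C code from listobject.c; list_eq_sketch and NeverEqual are names made up for this illustration.

```python
def list_eq_sketch(a, b):
    """Approximate list == list: elements match if identical OR equal."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        # Identity is tested first; __eq__ is only consulted if it fails.
        if not (x is y or x == y):
            return False
    return True


class NeverEqual:
    """A deliberately non-reflexive class (hypothetical example)."""
    def __eq__(self, other):
        return False


obj = NeverEqual()
print(obj == obj)                                      # non-reflexive
print([obj] == [obj], list_eq_sketch([obj], [obj]))    # identity wins
print(list_eq_sketch([NeverEqual()], [NeverEqual()]))  # falls back to __eq__
```

The sketch agrees with the built-in list comparison for these cases: the same object inside two lists makes the lists equal even though the object is not equal to itself.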

Post by Steven D'Aprano
http://hg.python.org/cpython/file/22e5a85ba840/Objects/listobject.c

I can read (and write) C fluently, but (1) I don't have a build
environment on my Windows system so I cannot debug it, and (2) I find it
hard to judge from just looking at the C code which C function is
invoked when the Python code enters the C code.
(Quoting Raymond H. from his blog: "Unless you know where to look,
searching the source for an answer can be a time consuming intellectual
investment.")

So thanks for clarifying this.

I guess I am arriving (slowly and still partly reluctantly, and I'm not
alone with that feeling, it seems ...) at the bottom line of all this,
which is that reflexivity is an important goal in Python, that
self-written non-reflexive classes are neither intended nor well supported,
and that the non-reflexive NaN is considered an exception that cannot be
expected to be treated consistently as non-reflexive.

Post by Steven D'Aprano
This was discussed to death some time ago, both on python-dev and
https://mail.python.org/pipermail/python-list/2012-October/633992.html
which is in the middle of one of the threads, but at least it gets you
to the right time period.

I have read a number of posts in that thread by now. Sorry for not reading it
earlier, but the mailing list archive just does not lend itself to
searching the past. Of course, one can google it ;-)

Andy

Ethan Furman

2014-07-14 02:55:37 UTC

Post by Ethan Furman
Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
--> NaN = float('nan')
--> NaN == NaN
False
--> [NaN] == [NaN]
True

Ouch, that hurts ;-)

Yeah, I've been bitten enough times that now I try to always test code before I post. ;)

Post by Andreas Maier
Test #8: Same object of class C
(C.__eq__() implemented with equality of x,
obj1: type=<class '__main__.C'>, str=C(256), id=39406504
obj2: type=<class '__main__.C'>, str=C(256), id=39406504
a) obj1 is obj2: True
C.__eq__(): self=39406504, other=39406504, returning True

This is interesting/weird/odd -- why is __eq__ being called for an 'is' test?

--- test_eq.py ----------------------------
class TestEqTrue:
    def __eq__(self, other):
        print('Test.__eq__ returning True')
        return True

class TestEqFalse:
    def __eq__(self, other):
        print('Test.__eq__ returning False')
        return False

tet = TestEqTrue()
print(tet is tet)
print(tet in [tet])

tef = TestEqFalse()
print(tef is tef)
print(tef in [tef])
-------------------------------------------

When I run this all I get is four Trues, never any messages about being in __eq__.

How did you get that result?
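For comparison, here is a small variation that does reach __eq__. It is only a sketch; Recorder and the calls list are names invented for this example. The idea is that 'is' never calls __eq__, that 'in' skips it when the list holds the very same object, and that it is consulted only when the list holds a different object.

```python
calls = []  # records every time __eq__ is actually invoked


class Recorder:
    def __eq__(self, other):
        calls.append('__eq__')
        return False


r = Recorder()

same_is = r is r            # pure identity test, no __eq__ involved
same_in = r in [r]          # identity shortcut, __eq__ still not called
calls_after_same = list(calls)

other_in = r in [Recorder()]  # different object, so __eq__ is consulted

print(same_is, same_in, calls_after_same, other_in, calls)
```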

--
~Ethan~

Ethan Furman

2014-07-14 05:51:04 UTC

This is interesting/weird/odd -- why is __eq__ being called for an 'is' test?

The debug messages are printed before the result is printed. So this is the debug message for the next case, 8.b).

Ah, whew! That's a relief.

Sorry for not explaining it.

Had I been reading more closely I would (hopefully) have noticed that, but I was headed out the door at the time.

--
~Ethan~

Andreas Maier

2014-07-14 05:33:46 UTC

This is interesting/weird/odd -- why is __eq__ being called for an 'is' test?

The debug messages are printed before the result is printed. So this is
the debug message for the next case, 8.b).

Sorry for not explaining it.

Andy

Andreas Maier

2014-07-07 23:37:48 UTC