Discussion:
release plan for 2.5 ?
Fredrik Lundh
2006-02-07 23:56:32 UTC
Permalink
a while ago, I wrote

> > Hopefully something can get hammered out so that at least the Python
> > 3 docs can premiere having been developed on by the whole community.
>
> why wait for Python 3 ?
>
> what's the current release plan for Python 2.5, btw? I cannot find a
> relevant PEP, and the "what's new" says "late 2005":
>
> http://www.python.org/dev/doc/devel/whatsnew/contents.html

but I don't think that anyone followed up on this. what's the current
status ?

</F>
Neal Norwitz
2006-02-08 03:03:11 UTC
Permalink
On 2/7/06, Fredrik Lundh <***@pythonware.com> wrote:
> >
> > what's the current release plan for Python 2.5, btw? I cannot find a
> > relevant PEP, and the "what's new" says "late 2005":
> >
> but I don't think that anyone followed up on this. what's the current
> status ?

Guido and I had a brief discussion about this. IIRC, he was thinking
alpha around March and release around summer. I think this is
aggressive with all the things still to do. We really need to get the
ssize_t branch integrated.

There are a bunch of PEPs that have been accepted (or close), but not
implemented. I think these include (please correct me, so we can get
a good list):

http://www.python.org/peps/

SA 308 Conditional Expressions
SA 328 Imports: Multi-Line and Absolute/Relative
SA 342 Coroutines via Enhanced Generators
S 343 The "with" Statement
S 353 Using ssize_t as the index type

This one should be marked as final I believe:

SA 341 Unifying try-except and try-finally
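
For reference, the unified form PEP 341 allows looks like this (a minimal
sketch; "data.txt" is just an example file name) -- previously the except
and finally clauses required two nested try statements:

try:
    f = open("data.txt")
    print f.read()
except IOError, exc:
    print "could not read:", exc
finally:
    print "done"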

n
Jeremy Hylton
2006-02-08 03:26:02 UTC
Permalink
It looks like we need a Python 2.5 Release Schedule PEP.

Jeremy

On 2/7/06, Neal Norwitz <***@gmail.com> wrote:
> On 2/7/06, Fredrik Lundh <***@pythonware.com> wrote:
> > >
> > > what's the current release plan for Python 2.5, btw? I cannot find a
> > > relevant PEP, and the "what's new" says "late 2005":
> > >
> > but I don't think that anyone followed up on this. what's the current
> > status ?
>
> Guido and I had a brief discussion about this. IIRC, he was thinking
> alpha around March and release around summer. I think this is
> aggressive with all the things still to do. We really need to get the
> ssize_t branch integrated.
>
> There are a bunch of PEPs that have been accepted (or close), but not
> implemented. I think these include (please correct me, so we can get
> a good list):
>
> http://www.python.org/peps/
>
> SA 308 Conditional Expressions
> SA 328 Imports: Multi-Line and Absolute/Relative
> SA 342 Coroutines via Enhanced Generators
> S 343 The "with" Statement
> S 353 Using ssize_t as the index type
>
> This one should be marked as final I believe:
>
> SA 341 Unifying try-except and try-finally
>
> n
Neal Norwitz
2006-02-08 06:35:31 UTC
Permalink
On 2/7/06, Jeremy Hylton <***@alum.mit.edu> wrote:
> It looks like we need a Python 2.5 Release Schedule PEP.

Very draft: http://www.python.org/peps/pep-0356.html

Needs lots of work and release managers. Anthony, Martin, Fred, Sean
are all mentioned with TBDs and question marks.

n
Guido van Rossum
2006-02-10 20:21:26 UTC
Permalink
On 2/7/06, Neal Norwitz <***@gmail.com> wrote:
> On 2/7/06, Jeremy Hylton <***@alum.mit.edu> wrote:
> > It looks like we need a Python 2.5 Release Schedule PEP.
>
> Very draft: http://www.python.org/peps/pep-0356.html
>
> Needs lots of work and release managers. Anthony, Martin, Fred, Sean
> are all mentioned with TBDs and question marks.

Before he went off to a boondoggle^Woff-site at a Mexican resort, Neal
made me promise that I'd look at this and try to get the 2.5 release
plan going for real.

First things first: we need a release manager. Anthony, do you want to
do the honors again, or are you ready for retirement?

Next, the schedule. Neal's draft of the schedule has us releasing 2.5
in October. That feels late -- nearly two years after 2.4 (which was
released on Nov 30, 2004). Do people think it's reasonable to strive
for a more aggressive (by a month) schedule, like this:

alpha 1: May 2006
alpha 2: June 2006
beta 1: July 2006
beta 2: August 2006
rc 1: September 2006
final: September 2006

??? Would anyone want to be even more aggressive (e.g. alpha 1 right
after PyCon???). We could always do three alphas.

There's a bunch of sections (some very long) towards the end of the
PEP of questionable use; Neal just copied these from the 2.4 release
schedule (PEP 320):

- Ongoing tasks
- Carryover features from Python 2.4
- Carryover features from Python 2.3 (!)

Can someone go over these and suggest which we should keep, which we
should drop? (I may do this later, but I have other priorities below.)

Then, the list of features that ought to be in 2.5. Quoting Neal's draft:

> PEP 308: Conditional Expressions

Definitely. Don't we have a volunteer doing this now?
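
For readers skimming the thread, the expression form eventually chosen for
PEP 308 is "X if C else Y"; a minimal sketch next to the error-prone and/or
idiom it replaces (describe() is just an illustrative function):

def describe(n):
    parity = "even" if n % 2 == 0 else "odd"     # PEP 308 form
    legacy = (n % 2 == 0) and "even" or "odd"    # pre-2.5 workaround
    return parity, legacy

print describe(3)    # ('odd', 'odd')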

> PEP 328: Absolute/Relative Imports

Yes, please.

> PEP 343: The "with" Statement

Didn't Michael Hudson have a patch?
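
As a reminder of what PEP 343 buys you, a minimal sketch (in 2.5 itself the
future import is required; "data.txt" is just an example file name) -- the
try/finally boilerplate moves into the object's __enter__/__exit__ methods:

from __future__ import with_statement

with open("data.txt") as f:
    for line in f:
        print line.rstrip()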

> PEP 352: Required Superclass for Exceptions

I believe this is pretty much non-controversial; it's a much weaker
version of PEP 348 which was rightfully rejected for being too
radical. I've tweaked some text in this PEP and approved it. Now we
need to make it happen. It might be quite a tricky thing, since
Exception is currently implemented in C as a classic class. If Brett
wants to sprint on this at PyCon I'm there to help (Mon/Tue only).
Fortunately we have MWH's patch 1104669 as a starting point.
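
A short sketch of what the PEP establishes, assuming the hierarchy as
proposed (ConfigError is a made-up user exception):

#   BaseException
#    +-- KeyboardInterrupt
#    +-- SystemExit
#    +-- Exception        <- what user code should subclass and catch

class ConfigError(Exception):
    pass

try:
    raise ConfigError("missing option")
except Exception, exc:       # no longer swallows KeyboardInterrupt/SystemExit
    print "handled:", exc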

> PEP 353: Using ssize_t as the index type

Neal tells me that this is in progress in a branch, but that the code
is not yet flawless (tons of warnings etc.). Martin, can you tell us
more? When do you expect this to land? Maybe aggressively merging into
the HEAD and then releasing it as alpha would be a good way to shake
out the final issues???

Other PEPs I'd like comment on:

PEP 357 (__index__): the patch isn't on SF yet, but otherwise I'm all
for this, and I'd like to accept it ASAP to get it in 2.5. It doesn't
look like it'll cause any problems.
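
A minimal sketch of the protocol as proposed: an object grows an __index__
method and can then be used wherever a sequence index is expected
(Dimension is a made-up example type):

class Dimension(object):
    def __init__(self, value):
        self.value = value
    def __index__(self):
        return self.value

data = range(10)
print data[Dimension(2):Dimension(5)]    # [2, 3, 4]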

PEP 314 (metadata v1.1): this is marked as completed, but there's a
newer PEP available: PEP 345 (metadata v1.2). That PEP has 2.5 as its
target date. Shouldn't we implement it? (This is a topic that I
haven't followed closely.) There's also the question whether 314
should be marked final. Andrew or Richard?

PEP 355 (path module): I still haven't reviewed this, because I'm -0
on adding what appears to me duplicate functionality. But if there's a
consensus building perhaps it should be allowed to go forward (and
then I *will* review it carefully).

I found a few more PEPs slated for 2.5 but that haven't seen much action lately:

PEP 351 - freeze protocol. I'm personally -1; I don't like the idea of
freezing arbitrary mutable data structures. Are there champions who
want to argue this?

PEP 349 - str() may return unicode. Where is this? I'm not at all sure
the PEP is ready. It would probably be a lot of work to make this work
everywhere in the C code, not to mention the stdlib .py code. Perhaps
this should be targeted for 2.6 instead? The consequences seem
potentially huge.

PEP 315 - do while. A simple enough syntax proposal, albeit one
introducing a new keyword (which I'm fine with). I kind of like it but
it doesn't strike me as super important -- if we put this off until
Py3k I'd be fine with that too. Opinions? Champions?

Ouch, a grep produced tons more. Quick rundown:

PEP 246 - adaptation. I'm still as lukewarm as ever; it needs
interfaces, promises to cause a paradigm shift, and the global map
worries me.

PEP 323 - copyable iterators. Seems stalled. Alex, do you care?

PEP 332 - byte vectors. Looks incomplete. Put off until 2.6?

PEP 337 - logging in the stdlib. What of it? This seems a good idea
but potentially disruptive (because backwards incompatible). Also it
could be done piecemeal on an opportunistic basis. Any volunteers?

PEP 338 - support -m for modules in packages. I believe Nick Coghlan
is close to implementing this. I'm fine with accepting it.

PEP 344 - exception chaining. There are deep problems with this due to
circularities; perhaps we should drop this, or revisit it for Py3k.

That's the "pep parade" for now. It would be appropriate to start a
new topic to discuss specific PEPs; a response to this thread
referencing the new thread would be appropriate.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Raymond Hettinger
2006-02-10 21:05:35 UTC
Permalink
[Guido van Rossum]
> PEP 351 - freeze protocol. I'm personally -1; I don't like the idea of
> freezing arbitrary mutable data structures. Are there champions who
> want to argue this?

It has at least one anti-champion. I think it is a horrible idea and would
like to see it rejected in a way that brings finality. If needed, I can
elaborate in a separate thread.



> PEP 315 - do while. A simple enough syntax proposal, albeit one
> introducing a new keyword (which I'm fine with). I kind of like it but
> it doesn't strike me as super important -- if we put this off until
> Py3k I'd be fine with that too. Opinions? Champions?

I helped tweak a few issues with the PEP and got added as a co-author.
I didn't push for it because the syntax is a little odd if nothing appears
in the while suite:

do:
    val = source.read(1)
    process(val)
while val != lastitem:
    pass

I never found a way to improve this. Dropping the final colon and
post-while steps improved the looks but diverged too far away from the
rest of the language:

do:
    val = source.read(1)
    process(val)
while val != lastitem

So, unless another champion arises, putting this off until Py3k is fine with
me.
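
For comparison, Raymond's loop written with the existing idiom that PEP 315
aims to tidy up (source, process and lastitem are stand-ins mirroring his
sketch):

from StringIO import StringIO

source = StringIO("abc.")
lastitem = "."
def process(ch):
    print "read", repr(ch)

while True:
    val = source.read(1)
    process(val)
    if val == lastitem:
        break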



> PEP 323 - copyable iterators. Seems stalled. Alex, do you care?

I installed the underlying mechanism in support of itertools.tee() in Py2.4.

So, if anyone really wants to make xrange() copyable, it is now a trivial
task --
likewise for any other iterator that has a potentially copyable state.

I've yet to find a use case for it, so I never pushed for the rest of
the PEP to be implemented. There's nothing wrong with the idea,
but there doesn't seem to be much interest.
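
A rough illustration of the effect PEP 323 is after, using only the tee()
machinery Raymond mentions rather than the __copy__ protocol the PEP
actually specifies:

from itertools import tee

it = iter(xrange(5))
it.next()                  # consume one item
a, b = tee(it)             # two independent "copies" from this point on
print a.next(), a.next()   # 1 2
print b.next()             # 1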



> PEP 344 - exception chaining. There are deep problems with this due to
> circularities; perhaps we should drop this, or revisit it for Py3k.

I wouldn't hold-up Py2.5 for this.

My original idea for this was somewhat simpler. Essentially, a high-level
function would concatenate extra string information onto the result of an
exception raised at a lower level. That strategy was applied to an existing
problem for type objects and has met with good success.
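
A rough sketch of that simpler alternative (load_config is a made-up
helper): the higher level re-raises the same exception class with extra
context prepended instead of chaining exception objects.

def load_config(path):
    try:
        return open(path).read()
    except IOError, exc:
        raise IOError("while loading config %r: %s" % (path, exc))

try:
    load_config("/no/such/file")
except IOError, exc:
    print exc    # while loading config '/no/such/file': [Errno 2] ...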

IOW, there is a simpler alternative on the table, but resolution won't take
place until we collectively take interest in it again. At this point, it
seems to be low on everyone's priority list (including mine).



Raymond
Alex Martelli
2006-02-11 20:55:10 UTC
Permalink
On Feb 10, 2006, at 1:05 PM, Raymond Hettinger wrote:

> [Guido van Rossum]
>> PEP 351 - freeze protocol. I'm personally -1; I don't like the
>> idea of
>> freezing arbitrary mutable data structures. Are there champions who
>> want to argue this?
>
> It has at least one anti-champion. I think it is a horrible idea
> and would
> like to see it rejected in a way that brings finality. If needed,
> I can
> elaborate in a separate thread.

Could you please do that? I'd like to understand all of your
objections. Thanks!


Alex
Barry Warsaw
2006-02-11 21:18:59 UTC
Permalink
On Feb 11, 2006, at 3:55 PM, Alex Martelli wrote:

>
> On Feb 10, 2006, at 1:05 PM, Raymond Hettinger wrote:
>
>> [Guido van Rossum]
>>> PEP 351 - freeze protocol. I'm personally -1; I don't like the
>>> idea of
>>> freezing arbitrary mutable data structures. Are there champions who
>>> want to argue this?
>>
>> It has at least one anti-champion. I think it is a horrible idea
>> and would
>> like to see it rejected in a way that brings finality. If needed,
>> I can
>> elaborate in a separate thread.
>
> Could you please do that? I'd like to understand all of your
> objections. Thanks!

Better yet, add them to the PEP.

-Barry
Raymond Hettinger
2006-02-11 22:04:43 UTC
Permalink
----- Original Message -----
From: "Alex Martelli" <***@gmail.com>
To: "Raymond Hettinger" <***@rcn.com>
Cc: <python-***@python.org>
Sent: Saturday, February 11, 2006 3:55 PM
Subject: PEP 351


>
> On Feb 10, 2006, at 1:05 PM, Raymond Hettinger wrote:
>
>> [Guido van Rossum]
>>> PEP 351 - freeze protocol. I'm personally -1; I don't like the idea of
>>> freezing arbitrary mutable data structures. Are there champions who
>>> want to argue this?
>>
>> It has at least one anti-champion. I think it is a horrible idea and
>> would
>> like to see it rejected in a way that brings finality. If needed, I can
>> elaborate in a separate thread.
>
> Could you please do that? I'd like to understand all of your objections.
> Thanks!

Here was one email on the subject:
http://mail.python.org/pipermail/python-dev/2005-October/057586.html

I have a number of comp.lang.python posts on the subject also.

The presence of frozenset() tempts this sort of hypergeneralization. The
first stumbling block comes with dictionaries. Even if you skip past the
question of why you would want to freeze a dictionary (do you really want to
use it as a key?), one finds that dicts are not naturally freezable -- dicts
compare using both keys and values; hence, if you want to hash a dict, you
need to hash both the keys and values, which means that the values have to
be hashable, a new and surprising requirement -- also, the values cannot be
mutated or else an equality comparison will fail when searching for a frozen
dict that has been used as a key. One person who experimented with an
implementation dealt with the problem by recursively freezing all the
components (perhaps one of the dict's values is another dict which then
needs to be frozen too). Executive summary: freezing dicts is a can of
worms and not especially useful.
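
To make the hashing point concrete, here is a toy sketch (not the PEP's
imdict) showing how a frozen dict's hash drags the values in:

class frozendict(dict):
    def _blocked(self, *args, **kw):
        raise TypeError("frozendict is immutable")
    __setitem__ = __delitem__ = update = clear = _blocked
    pop = popitem = setdefault = _blocked
    def __hash__(self):
        # every value must itself be hashable for this to work
        return hash(frozenset(self.items()))

print hash(frozendict(a=1, b=2)) == hash(frozendict(b=2, a=1))   # True
hash(frozendict(a=[1, 2]))    # TypeError: the list value is unhashable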

Another thought is that PEP 351 reflects a world view of wanting to treat
all containers polymorphically. I would suggest that they aren't designed
that way (i.e. you use different methods to add elements to lists, dicts,
and sets). Also, it is not especially useful to shovel around mutable
containers without respect to their type. Further, even if they were
polymorphic and freezable, treating them generically is likely to reflect
bad design -- the soul of good programming is the correct choice of
appropriate data structures.

Another PEP 351 world view is that tuples can serve as frozenlists; however,
that view represents a Liskov violation (tuples don't support the same
methods). This idea resurfaces and has been shot down again every few months.

More important than all of the above is the thought that auto-freezing is
like a bad C macro, it makes too much implicit and hides too much -- the
supported methods change, there is an issue keeping in sync with the
non-frozen original, etc.

In my experience with frozensets, I've learned that freezing is not an
incidental downstream effect; instead, it is an intentional, essential part
of the design and needs to be explicit.

If more is needed on the subject, I'll hunt down my old posts and organize
them. I hope we don't offer a freeze() builtin. If it is there, it will be
tempting to use it and I think it will steer people away from good design
and have a net harmful effect.


Raymond

P.S. The word "freezing" is itself misleading because it suggests an
in-place change. However, it really means that a new object is created
(just like tuple(somelist)).
Noam Raphael
2006-02-11 23:15:12 UTC
Permalink
Hello,

I just wanted to say this: you can reject PEP 351, please don't reject
the idea of frozen objects completely. I'm working on an idea similar
to that of the PEP, and I think that it can be done elegantly, without
the concrete problems that Raymond pointed out. I didn't work on it in the
last few weeks, because of my job, but I hope to come back to it soon
and post a PEP and a reference implementation in CPython.

My quick responses, mostly to try to convince that I know a bit about
what I'm talking about:

First about the last point: I suggest that the function will be named
frozen(x), which suggests that nothing happens to x, you only get a
"frozen x". I suggest that this operation won't be called "freezing
x", but "making a frozen copy of x".

Now, on to the original points, in order. Frozen dicts - if you want, you
can decide that dicts aren't frozenable, and that's ok. But if you do
want to make frozen copies of dicts, it isn't really such a problem -
it's similar to hashing a tuple, which requires recursive hashing of
all its elements; for making a frozen copy of a dict, you make a
frozen copy of all its values.

Treating all containers polymorphically - I don't suggest that. In my
suggestion, you may have frozen lists, frozen tuples (which are normal
tuples with frozen elements), frozen sets and frozen dicts.

Treating tuples as frozen lists - I don't suggest to do that. But if
my suggestion is accepted, there would be no need for tuples - frozen
lists would be just as useful.
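
A rough sketch of the recursive "frozen copy" being described (the names
are illustrative, not Noam's code; a real implementation would return a
proper frozen dict type rather than a tuple of items):

def frozen(obj):
    if isinstance(obj, dict):
        # a frozen copy of a dict makes frozen copies of its values too
        return tuple(sorted((k, frozen(v)) for k, v in obj.items()))
    if isinstance(obj, set):
        return frozenset(frozen(x) for x in obj)
    if isinstance(obj, (list, tuple)):
        return tuple(frozen(x) for x in obj)
    hash(obj)    # anything else must already be hashable
    return obj

print frozen({"a": [1, 2], "b": set([3])})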

And about the other concerns:

> More important than all of the above is the thought that auto-freezing is
> like a bad C macro, it makes too much implicit and hides too much -- the
> supported methods change, there is an issue keeping in sync with the
> non-frozen original, etc.
>
> In my experience with frozensets, I've learned that freezing is not an
> incidental downstream effect; instead, it is an intentional, essential part
> of the design and needs to be explicit.

I think these concerns can only be judged given a real suggestion,
along with an implementation. I have already implemented most of my
idea in CPython, and I think it's elegant and doesn't cause problems.
Of course, I may not be objective about the subject, but I only ask to
wait for the real suggestion before dropping it.

To summarize, I see the faults in PEP 351. I think that another,
fairly similar idea might be a good one.

Have a good week,
Noam
Raymond Hettinger
2006-02-12 02:49:47 UTC
Permalink
[Noam]
> I just wanted to say this: you can reject PEP 351, please don't reject
> the idea of frozen objects completely. I'm working on an idea similar
> to that of the PEP,
. . .
> I think these concerns can only be judged given a real suggestion,
> along with an implementation. I have already implemented most of my
> idea in CPython, and I think it's elegant and doesn't cause problems.
> Of course, I may not be objective about the subject, but I only ask to
> wait for the real suggestion before dropping it down

I was afraid of this -- the freezing concept is a poison that will cause
some good minds to waste a good deal of their time. Once frozensets were
introduced, it was like lighting a flame drawing moths to their doom. At
first, it seems like such a natural, obvious extension to generically freeze
anything that is mutable. People exploring it seem to lose sight of
motivating use cases and get progressively turned around. It doesn't take
long to suddenly start thinking it is a good idea to have mutable strings,
to recursively freeze components of a dictionary, to introduce further
list/tuple variants, etc. Perhaps a consistent solution can be found, but
it no longer resembles Python; rather, it is a new language, one that is not
grounded in real-world use cases. Worse, I think a frozen() built-in would
be hazardous to users, drawing them away from better solutions to their
problems.

Expect writing and defending a PEP to consume a month of your life. Before
devoting more of your valuable time, here's a checklist of questions to ask
yourself (sort of a mid-project self-assessment and reality check):

1. It is already possible to turn many objects into key strings -- perhaps
by marshaling, pickling, or making a custom repr such as
repr(sorted(mydict.items())). Have you ever had occasion to use this? IOW,
have you ever really needed to use a dictionary as a key to another
dictionary? Has there been any clamor for a frozendict(), not as a toy
recipe but as a real user need that cannot be met by other Python
techniques? If the answer is no, it should be a hint that a generalized
freezing protocol will rot in the basement.
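
The existing technique being referred to: derive a hashable key string from
the dict instead of freezing the dict itself (settings/cache are made-up
names):

settings = {"colour": "red", "size": 4}
cache = {}
cache[repr(sorted(settings.items()))] = "expensive result"

lookup = {"size": 4, "colour": "red"}         # equal dict, different order
print cache[repr(sorted(lookup.items()))]     # expensive result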

2. Before introducing a generalized freezing protocol, wouldn't it make
sense to write a third-party extension for just frozendicts, just to see if
anyone can possibly make productive use of it? One clue would be to search
for code that exercises the existing code in dict.__eq__(). If you rarely
have occasion to compare dicts, then it is certainly even more rare to want
to be able to hash them. If not, then is this project being pursued because
it is interesting or because there's a burning need that hasn't surfaced
before?

3. Does working out the idea entail recursive freezing of a dictionary?
Does that impose limits on generality (you can freeze some dicts but not
others)? Does working out the idea lead you to mutable strings? If so,
don't count on Guido's support.

4. Leaving reality behind (meaning actual problems that aren't readily
solvable with the existing language), try to contrive some hypothetical use
cases. Are there any that are not readily met by the simple recipe in the
earlier email:
http://mail.python.org/pipermail/python-dev/2005-October/057586.html ?

5. How extensively does the rest of Python have to change to support the new
built-in? If the patch ends up touching many objects and introducing new
rules, then the payoff needs to be pretty darned good. I presume that for
frozen(x) to work a lot of types have to be modified. Python seems to fare
quite well without frozendicts and frozenlists, so do we need to introduce
them just to make the new frozen() built-in work with more than just sets?


Raymond
Guido van Rossum
2006-02-13 21:09:59 UTC
Permalink
I've rejected PEP 351, with a reference to this thread as the rationale.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Bengt Richter
2006-02-12 03:24:17 UTC
Permalink
On Sat, 11 Feb 2006 12:55:10 -0800, Alex Martelli <***@gmail.com> wrote:

>
>On Feb 10, 2006, at 1:05 PM, Raymond Hettinger wrote:
>
>> [Guido van Rossum]
>>> PEP 351 - freeze protocol. I'm personally -1; I don't like the
>>> idea of
>>> freezing arbitrary mutable data structures. Are there champions who
>>> want to argue this?
>>
>> It has at least one anti-champion. I think it is a horrible idea
>> and would
>> like to see it rejected in a way that brings finality. If needed,
>> I can
>> elaborate in a separate thread.
>
>Could you please do that? I'd like to understand all of your
>objections. Thanks!
>
>
PMJI. I just read PEP 351, and had an idea for doing the same without pre-instantiating protected
subclasses, and doing the wrapping on demand instead. Perhaps of interest? (Or if already considered
and rejected, shouldn't this be mentioned in the PEP?)

The idea is to factor out freezing from the objects to be frozen. If it's going to involve copying anyway,
feeding the object to a wrapping class constructor doesn't seem like much extra overhead.

The examples in the PEP were very amenable to this approach, but I don't know how it would apply
to whatever Alex's use cases might be.

Anyhow, why shouldn't you be able to call freeze(an_ordinary_list) and get back freeze(xlist(an_ordinary_list))
automatically, based e.g. on a freeze_registry_dict[type(an_ordinary_list)] => xlist lookup, if plain hash fails?

Common types that might be usefully freezable could be pre-registered, and when a freeze fails
on a user object (presumably inheriting a __hash__ that bombs or because he wants it to) the programmer's
solution would be to define a suitable callable to produce the frozen object, and register that, but not modify his
unwrapped pre-freeze-mods object types and instantiations.

BTW, xlist wouldn't need to exist, since freeze_registry_dict[type(alist)] could just return the tuple type.
Otherwise the programmer would make a wrapper class taking the object as an __init__ (or maybe __new__) arg,
and intercepting the mutating methods etc., and stuff that in the freeze_registry_dict. IWT some metaclass stuff
might make it possible to parameterize a lot of wrapper class aspects, e.g., if you gave it a
__mutator_method_name_list__ to work with.

Perhaps freeze builtin could be a callable object with __call__ for the freeze "function" call
and with e.g. freeze.register(objtype, wrapper_class) as a registry API.

I am +0 on any of this in any case, not having had a use case to date, but I thought taking the
__freeze__ out of the objects (by not forcing them to be pre-instantiated as wrapped instances)
and letting registered freeze wrappers do it on demand instead might be interesting to someone.
If not, or if it's been discussed (no mention on the PEP tho) feel free to ignore ;-)

BTW freeze as just described might be an instance of

class Freezer(object):
    def __init__(self):
        # default freezers for a few common container types
        self._registry_dict = {
            set: frozenset,
            list: tuple,
            dict: imdict}
    def __call__(self, obj):
        try:
            hash(obj)          # already hashable => already "frozen"
            return obj
        except TypeError:
            freezer = self._registry_dict.get(type(obj))
            if freezer:
                return freezer(obj)
            raise TypeError('object is not freezable')
    def register(self, objtype, wrapper):
        self._registry_dict[objtype] = wrapper

(above refers to imdict from PEP 351)
Usage example:

>>> import alt351
>>> freeze = alt351.Freezer()
(well, pretend freeze is builtin)

>>> fr5 = freeze(range(5))
>>> fr5
(0, 1, 2, 3, 4)
>>> d = dict(a=1,b=2)
>>> d
{'a': 1, 'b': 2}
>>> fd = freeze(d)
>>> fd
{'a': 1, 'b': 2}
>>> fd['a']
1
>>> fd['a']=3
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "alt351.py", line 7, in _immutable
raise TypeError('object is immutable')
TypeError: object is immutable
>>> type(fd)
<class 'alt351.imdict'>

+0 ;-)


Regards,
Bengt Richter
Greg Ewing
2006-02-13 06:35:07 UTC
Permalink
Bengt Richter wrote:

> Anyhow, why shouldn't you be able to call freeze(an_ordinary_list) and get back freeze(xlist(an_ordinary_list))
> automatically, based e.g. on a freeze_registry_dict[type(an_ordinary_list)] => xlist lookup, if plain hash fails?

[Cue: sound of loud alarm bells going off in Greg's head]

-1 on having any kind of global freezing registry.

If we need freezing at all, I think it would be quite
sufficient to have a few types around such as
frozenlist(), frozendict(), etc.

I would consider it almost axiomatic that code needing
to freeze something will know what type of thing it is
freezing. If it doesn't, it has no business attempting
to do so.

If you need to freeze something not covered by the
standard frozen types, write your own class or function
to handle it, and invoke it explicitly where appropriate.

Greg
Thomas Wouters
2006-02-10 21:11:31 UTC
Permalink
On Fri, Feb 10, 2006 at 12:21:26PM -0800, Guido van Rossum wrote:

> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.

Well, PyCon might be a nice place to finish any PEP patches. I know I'll be
available to do such work on the sprint days ;) I don't think that means
we'll have a working repository with all 2.5 features right after, though.

> > PEP 308: Conditional Expressions

> Definitely. Don't we have a volunteer doing this now?

There is a volunteer, but he's new at this, so he probably needs a bit of
time to work through the intricacies of the AST, the compiler and the eval
loop.

--
Thomas Wouters <***@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Barry Warsaw
2006-02-10 22:00:23 UTC
Permalink
On Feb 10, 2006, at 3:21 PM, Guido van Rossum wrote:
>
> PEP 351 - freeze protocol. I'm personally -1; I don't like the idea of
> freezing arbitrary mutable data structures. Are there champions who
> want to argue this?

I have no interest in it any longer, and wouldn't shed a tear if it
were rejected.

One other un-PEP'd thing. I'd like to put email 3.1 in Python 2.5
with the new module naming scheme. The old names will still work,
and all the unit tests pass. Do we need a PEP for that?

-Barry
Guido van Rossum
2006-02-10 22:47:01 UTC
Permalink
On 2/10/06, Raymond Hettinger <***@verizon.net> wrote:
> [Barry Warsaw]
> > I'd like to put email 3.1 in Python 2.5
> > with the new module naming scheme. The old names will still work,
> > and all the unit tests pass. Do we need a PEP for that?
>
> +1

I don't know if Raymond meant "we need a PEP" or "go ahead with the
feature" but my own feeling is that this doesn't need a PEP and Barry
can Just Do It.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Barry Warsaw
2006-02-10 23:26:51 UTC
Permalink
On Feb 10, 2006, at 5:47 PM, Guido van Rossum wrote:

> On 2/10/06, Raymond Hettinger <***@verizon.net> wrote:
>> [Barry Warsaw]
>>> I'd like to put email 3.1 in Python 2.5
>>> with the new module naming scheme. The old names will still work,
>>> and all the unit tests pass. Do we need a PEP for that?
>>
>> +1
>
> I don't know if Raymond meant "we need a PEP" or "go ahead with the
> feature" but my own feeling is that this doesn't need a PEP and Barry
> can Just Do It.

I was going to ask the same thing. :)

Cool. So far there have been no objections on the email-sig, so I'll
try to move the sandbox to the trunk this weekend. That should give
us plenty of time to shake out any nastiness.

-Barry
Raymond Hettinger
2006-02-10 23:32:06 UTC
Permalink
Just do it.

----- Original Message -----
From: "Guido van Rossum" <***@python.org>
To: "Raymond Hettinger" <***@rcn.com>
Cc: "Barry Warsaw" <***@python.org>; <python-***@python.org>
Sent: Friday, February 10, 2006 5:47 PM
Subject: Re: [Python-Dev] release plan for 2.5 ?


On 2/10/06, Raymond Hettinger <***@verizon.net> wrote:
> [Barry Warsaw]
> > I'd like to put email 3.1 in Python 2.5
> > with the new module naming scheme. The old names will still work,
> > and all the unit tests pass. Do we need a PEP for that?
>
> +1

I don't know if Raymond meant "we need a PEP" or "go ahead with the
feature" but my own feeling is that this doesn't need a PEP and Barry
can Just Do It.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Raymond Hettinger
2006-02-10 22:45:54 UTC
Permalink
[Barry Warsaw]
> I'd like to put email 3.1 in Python 2.5
> with the new module naming scheme. The old names will still work,
> and all the unit tests pass. Do we need a PEP for that?

+1
M.-A. Lemburg
2006-02-10 22:06:24 UTC
Permalink
Guido van Rossum wrote:
>> PEP 328: Absolute/Relative Imports
>
> Yes, please.

+0 for adding relative imports. -1 for raising errors for
in-package relative imports using the current notation
in Python 2.6.

See:

http://mail.python.org/pipermail/python-dev/2004-September/048695.html

for a previous discussion.

The PEP still doesn't have any mention of the above discussion or
later follow-ups.

The main argument is that the strategy of making absolute imports
mandatory and offering relative imports as a work-around breaks the
possibility of producing packages that work in e.g. Python 2.4 and
2.6, simply because Python 2.4 doesn't support the needed
relative import syntax.

The only strategy left would be to use absolute imports throughout,
which isn't all that bad, except when it comes to relocating a
package or moving a set of misc. modules into a package - which is
not all that uncommon in larger projects, e.g. to group third-party
top-level modules into a package to prevent cluttering up the
top-level namespace or to simply make a clear distinction in
your code that you are relying on a third-party module, e.g.:

from thirdparty import tool

I don't mind having to deal with a warning for these, but don't
want to see this raise an error before Py3k.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Feb 10 2006)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
Thomas Wouters
2006-02-10 22:38:42 UTC
Permalink
On Fri, Feb 10, 2006 at 11:06:24PM +0100, M.-A. Lemburg wrote:
> Guido van Rossum wrote:
> >> PEP 328: Absolute/Relative Imports
> >
> > Yes, please.

> +0 for adding relative imports. -1 for raising errors for
> in-package relative imports using the current notation
> in Python 2.6.

+1/-1 for me. Being able to explicitly demand relative imports is good,
breaking things soon bad. I'll happily shoehorn this in at the sprints after
PyCon ;)

--
Thomas Wouters <***@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Guido van Rossum
2006-02-10 22:45:54 UTC
Permalink
On 2/10/06, Thomas Wouters <***@xs4all.net> wrote:
> On Fri, Feb 10, 2006 at 11:06:24PM +0100, M.-A. Lemburg wrote:
> > Guido van Rossum wrote:
> > >> PEP 328: Absolute/Relative Imports
> > >
> > > Yes, please.
>
> > +0 for adding relative imports. -1 for raising errors for
> > in-package relative imports using the current notation
> > in Python 2.6.
>
> +1/-1 for me. Being able to explicitly demand relative imports is good,
> breaking things soon bad. I'll happily shoehorn this in at the sprints after
> PyCon ;)

The PEP has the following timeline (my interpretation):

2.4: implement new behavior with from __future__ import absolute_import
2.5: deprecate old-style relative import unless future statement present
2.6: disable old-style relative import, future statement no longer necessary
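
The mechanics behind that timeline, inside some package module (package and
module names here are made up):

# pkg/module.py
from __future__ import absolute_import   # opt in during the transition

import string              # always the stdlib module, never pkg/string.py
from . import sibling      # explicit relative import of pkg.sibling
from .subpkg import util   # explicit relative import from a nested package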

Since it wasn't implemented in 2.4, I think all these should be bumped
by one release. Aahz, since you own the PEP, can you do that (and make
any other updates that might result)?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Thomas Wouters
2006-02-10 22:55:36 UTC
Permalink
On Fri, Feb 10, 2006 at 02:45:54PM -0800, Guido van Rossum wrote:

> The PEP has the following timeline (my interpretation):
>
> 2.4: implement new behavior with from __future__ import absolute_import
> 2.5: deprecate old-style relative import unless future statement present
> 2.6: disable old-style relative import, future statement no longer necessary

> Since it wasn't implemented in 2.4, I think all these should be bumped
> by one release. Aahz, since you own the PEP, can you do that (and make
> any other updates that might result)?

Bumping is fine (of course), but I'd like a short discussion on the actual
disabling before it happens (rather than the disabling happening without
anyone noticing until beta2.) There seem to be a lot of users still using
2.3, at the moment, in spite of its age. Hopefully, by the time 2.7 comes
out, everyone will have switched to 2.5, but if not, it could still be a
major annoyance to conscientious module-writers, like MAL.

--
Thomas Wouters <***@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Alex Martelli
2006-02-10 22:54:25 UTC
Permalink
On 2/10/06, Guido van Rossum <***@python.org> wrote:
...
> Next, the schedule. Neal's draft of the schedule has us releasing 2.5
> in October. That feels late -- nearly two years after 2.4 (which was
> released on Nov 30, 2004). Do people think it's reasonable to strive
> for a more aggressive (by a month) schedule, like this:

October would seem to me to be just about right. I don't see that one
month either way should make any big difference, though.

> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.

If I could have a definitive frozen list of features by the first week
of April at the latest, that could make it (as a "2.5 preview") into
the 2nd edition of "Python in a Nutshell". But since alphas are not
feature-frozen, it wouldn't make much of a difference to me, I think.

> Other PEPs I'd like comment on:
>
> PEP 357 (__index__): the patch isn't on SF yet, but otherwise I'm all
> for this, and I'd like to accept it ASAP to get it in 2.5. It doesn't
> look like it'll cause any problems.

It does look great, and by whatever name I support it most heartily.
Do, however, notice that it's "yet another special-purpose adaptation
protocol" and that such specific restricted solutions to the general
problem, with all of their issues, will just keep piling up forever
(and need legacy support ditto) until and unless your temperature wrt
246 (or any variation thereof) should change.

> PEP 355 (path module): I still haven't reviewed this, because I'm -0
> on adding what appears to me duplicate functionality. But if there's a

I feel definitely -0 towards it too.

> PEP 315 - do while. A simple enough syntax proposal, albeit one
> introducing a new keyword (which I'm fine with). I kind of like it but
> it doesn't strike me as super important -- if we put this off until
> Py3k I'd be fine with that too. Opinions? Champions?

Another -0 from me. I suggest we shelve it for now and revisit in 3k
(maybe PEPs in that state, "not in any 2.* but revisit for 3.0", need
a special status value).

> PEP 246 - adaptation. I'm still as lukewarm as ever; it needs
> interfaces, promises to cause a paradigm shift, and the global map
> worries me.

Doesn't _need_ interfaces as a concept -- any unique markers as
"protocol names" would do, even strings, although obviously the
"stronger" the markers the better (classes/types for example would be
just perfect). It was written on the assumption of interfaces just
because they were being proposed just before it. The key "paradigm
shift" is to offer a way to unify what's already being widely done, in
haphazard and dispersed manners. And I'll be quite happy to rewrite
it in terms of a more nuanced hierarchy of maps (e.g. builtin /
per-module / lexically nested, or whatever) if that's what it takes to
warm you to it -- I just think it would be over-engineering it, since
in practice the global-on-all-modules map would cover by far most
usage (both for "blessed" protocols that come with Python, and for the
use of "third party" adapting framework A to consume stuff that
framework B produces, global is the natural "residence"); other uses
are far less important.
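
A bare-bones sketch of the kind of global registry being described here
(illustrative only -- PEP 246 itself specifies a richer __conform__/__adapt__
negotiation, and these names are made up):

_adapters = {}                            # (type, protocol marker) -> adapter

def register(from_type, protocol, adapter):
    _adapters[(from_type, protocol)] = adapter

def adapt(obj, protocol):
    try:
        return _adapters[(type(obj), protocol)](obj)
    except KeyError:
        raise TypeError("cannot adapt %r to %s" % (obj, protocol))

# any unique marker works as the "protocol name" -- here a plain string
register(dict, "sorted-items", lambda d: sorted(d.items()))
print adapt({"b": 2, "a": 1}, "sorted-items")    # [('a', 1), ('b', 2)]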


> PEP 323 - copyable iterators. Seems stalled. Alex, do you care?

Sure, I'd like to make this happen, particularly since Raymond appears
to have already done the hard part. What would you like to see
happening to bless it for 2.5?

> PEP 332 - byte vectors. Looks incomplete. Put off until 2.6?

Ditto -- I'd like at least SOME of it to be in 2.5. What needs to
happen for that?


Alex
Brett Cannon
2006-02-10 23:06:45 UTC
Permalink
On 2/10/06, Guido van Rossum <***@python.org> wrote:
> On 2/7/06, Neal Norwitz <***@gmail.com> wrote:
> > On 2/7/06, Jeremy Hylton <***@alum.mit.edu> wrote:
> > > It looks like we need a Python 2.5 Release Schedule PEP.
> >
> > Very draft: http://www.python.org/peps/pep-0356.html
> >
> > Needs lots of work and release managers. Anthony, Martin, Fred, Sean
> > are all mentioned with TBDs and question marks.
>
> Before he went off to a boondoggle^Woff-site at a Mexican resort, Neal
> made me promise that I'd look at this and try to get the 2.5 release
> plan going for real.
>
> First things first: we need a release manager. Anthony, do you want to
> do the honors again, or are you ready for retirement?
>
> Next, the schedule. Neal's draft of the schedule has us releasing 2.5
> in October. That feels late -- nearly two years after 2.4 (which was
> released on Nov 30, 2004). Do people think it's reasonable to strive
> for a more aggressive (by a month) schedule, like this:
>
> alpha 1: May 2006
> alpha 2: June 2006
> beta 1: July 2006
> beta 2: August 2006
> rc 1: September 2006
> final: September 2006
>
> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.
>

I think that schedule is fine, but going alpha after PyCon is too fast
with the number of PEPs that need implementing.
[SNIP]
> > PEP 352: Required Superclass for Exceptions
>
> I believe this is pretty much non-controversial; it's a much weaker
> version of PEP 348 which was rightfully rejected for being too
> radical. I've tweaked some text in this PEP and approved it. Now we
> need to make it happen. It might be quite a tricky thing, since
> Exception is currently implemented in C as a classic class. If Brett
> wants to sprint on this at PyCon I'm there to help (Mon/Tue only).
> Fortunately we have MWH's patch 1104669 as a starting point.
>

I might sprint on it. It's either this or I will work on the AST
stuff (the PyObject branch is still not finished, and thus it has not
been decided whether that solution or the way it is now will be the
final way of implementing the compiler; I would like to see this
settled).

Either way I take responsibility to make sure the PEP gets implemented
so you can take that question off of the schedule PEP.

[SNIP]
> PEP 351 - freeze protocol. I'm personally -1; I don't like the idea of
> freezing arbitrary mutable data structures. Are there champions who
> want to argue this?
>

If Barry doesn't even care anymore I say kill it.

[SNIP]
> PEP 315 - do while. A simple enough syntax proposal, albeit one
> introducing a new keyword (which I'm fine with). I kind of like it but
> it doesn't strike me as super important -- if we put this off until
> Py3k I'd be fine with that too. Opinions? Champions?
>

Eh, seems okay but I am not jumping up and down for it. Waiting until
Python 3 is fine with me if a discussion is warranted (don't really
remember it coming up before).
[SNIP]
> PEP 332 - byte vectors. Looks incomplete. Put off until 2.6?
>

I say put off. This could be discussed at PyCon since this might be
an important type to get right.

[SNIP]
> PEP 344 - exception chaining. There are deep problems with this due to
> circularities; perhaps we should drop this, or revisit it for Py3k.
>

I say revisit issues later. Raymond says he has an idea for chaining
just the messages which could be enough help for developers. But
either way I don't think this has been hashed out enough to go in
as-is. I suspect a simpler solution will work, such as ditching the
traceback and only keeping either the text that would have been
printed or just the exception instance (and thus also its message).

-Brett
Martin v. Löwis
2006-02-10 20:40:59 UTC
Permalink
Guido van Rossum wrote:
>> PEP 353: Using ssize_t as the index type
>
>
> Neal tells me that this is in progress in a branch, but that the code
> is not yet flawless (tons of warnings etc.). Martin, can you tell us
> more?

"It works", in a way. You only get the tons of warnings with the
right compiler, and you don't actually need to fix them all to get
something useful. Not all modules need to be converted to support
more than 2**31 elements for all containers they operate on, so
this could also be based on user feedback.

Some users (so far, just Marc-Andre) have complained that this
breaks backwards compatibility. Some improvements can be made still,
but for some aspects (tp_as_sequence callbacks), I think the best
we can hope for is compiler warnings about incorrect function
pointer types.

> When do you expect this to land? Maybe aggressively merging into
> the HEAD and then releasing it as alpha would be a good way to shake
> out the final issues???

Sure: I hope to complete this all in March.

Regards,
Martin
Neil Schemenauer
2006-02-11 05:08:09 UTC
Permalink
Guido van Rossum <***@python.org> wrote:
> PEP 349 - str() may return unicode. Where is this?

Does that mean you didn't find and read the PEP or was it written so
badly that it answered none of your questions? The PEP is on
python.org with all the rest. I set the status to "Deferred"
because it seemed that no one was interested in the change.

> I'm not at all sure the PEP is ready. it would probably be a lot
> of work to make this work everywhere in the C code, not to mention
> the stdlib .py code. Perhaps this should be targeted for 2.6
> instead? The consequences seem potentially huge.

The backwards compatibility problems *seem* to be relatively minor.
I only found one instance of breakage in the standard library. Note
that my patch does not change PyObject_Str(); that would break
massive amounts of code. Instead, I introduce a new function:
PyString_New(). I'm not crazy about the name but I couldn't think
of anything better.

Neil
Guido van Rossum
2006-02-11 05:25:21 UTC
Permalink
On 2/10/06, Neil Schemenauer <***@arctrix.com> wrote:
> Guido van Rossum <***@python.org> wrote:
> > PEP 349 - str() may return unicode. Where is this?
>
> Does that mean you didn't find and read the PEP or was it written so
> badly that it answered none of your questions? The PEP is on
> python.org with all the rest. I set the status to "Deferred"
> because it seemed that no one was interested in the change.

Sorry -- it was an awkward way to ask "what's the status"? You've answered that.

> > I'm not at all sure the PEP is ready. it would probably be a lot
> > of work to make this work everywhere in the C code, not to mention
> > the stdlib .py code. Perhaps this should be targeted for 2.6
> > instead? The consequences seem potentially huge.
>
> The backwards compatibility problems *seem* to be relatively minor.
> I only found one instance of breakage in the standard library. Note
> that my patch does not change PyObject_Str(); that would break
> massive amounts of code. Instead, I introduce a new function:
> PyString_New(). I'm not crazy about the name but I couldn't think
> of anything better.

So let's think about this more post 2.5.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Bengt Richter
2006-02-11 05:30:00 UTC
Permalink
On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <***@arctrix.com> wrote:

>Guido van Rossum <***@python.org> wrote:
>> PEP 349 - str() may return unicode. Where is this?
>
>Does that mean you didn't find and read the PEP or was it written so
>badly that it answered none of your questions? The PEP is on
>python.org with all the rest. I set the status to "Deferred"
>because it seemed that no one was interested in the change.
>
>> I'm not at all sure the PEP is ready. it would probably be a lot
>> of work to make this work everywhere in the C code, not to mention
>> the stdlib .py code. Perhaps this should be targeted for 2.6
>> instead? The consequences seem potentially huge.
>
>The backwards compatibility problems *seem* to be relatively minor.
>I only found one instance of breakage in the standard library. Note
>that my patch does not change PyObject_Str(); that would break
>massive amounts of code. Instead, I introduce a new function:
>PyString_New(). I'm not crazy about the name but I couldn't think
>of anything better.
>
Should this not be coordinated with PEP 332?

Regards,
Bengt Richter
Guido van Rossum
2006-02-11 05:35:26 UTC
Permalink
> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <***@arctrix.com> wrote:
> >The backwards compatibility problems *seem* to be relatively minor.
> >I only found one instance of breakage in the standard library. Note
> >that my patch does not change PyObject_Str(); that would break
> >massive amounts of code. Instead, I introduce a new function:
> >PyString_New(). I'm not crazy about the name but I couldn't think
> >of anything better.

On 2/10/06, Bengt Richter <***@oz.net> wrote:
> Should this not be coordinated with PEP 332?

Probably.. But that PEP is rather incomplete. Wanna work on fixing that?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Bengt Richter
2006-02-11 08:20:27 UTC
Permalink
On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum <***@python.org> wrote:

>> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <***@arctrix.com> wrote:
>> >The backwards compatibility problems *seem* to be relatively minor.
>> >I only found one instance of breakage in the standard library. Note
>> >that my patch does not change PyObject_Str(); that would break
>> >massive amounts of code. Instead, I introduce a new function:
>> >PyString_New(). I'm not crazy about the name but I couldn't think
>> >of anything better.
>
>On 2/10/06, Bengt Richter <***@oz.net> wrote:
>> Should this not be coordinated with PEP 332?
>
>Probably.. But that PEP is rather incomplete. Wanna work on fixing that?
>
I'd be glad to add my thoughts, but first of course it's Skip's PEP,
and Martin casts a long shadow when it comes to character coding issues
that I suspect will have to be considered.

(E.g., if there is a b'...' literal for bytes, the actual characters of
the source code itself that the literal is being expressed in could be ascii
or latin-1 or utf-8 or utf16le a la Microsoft, etc. UIAM, I read that the source
is at least temporarily normalized to Unicode, and then re-encoded (except now
for string literals?) per coding cookie or other encoding inference. (I may be
out of date, gotta catch up).

If one way or the other a string literal is in Unicode, then presumably so is
a byte string b'...' literal -- i.e. internally u"b'...'" just before
being turned into bytes.

Should that then be an internal straight u"b'...'".encode('byte') with default ascii + escapes
for non-ascii and non-printables, to define the full 8 bits without encoding error?
Should unicode be encodable into byte via a specific encoding? E.g., u'abc'.encode('byte','latin1'),
to distinguish producing a mutable byte string vs an immutable str type as with u'abc'.encode('latin1').
(but how does this play with str being able to produce unicode? And when do these changes happen?)
I guess I'm getting ahead of myself ;-)

So I would first ask Skip what he'd like to do, and Martin for some hints on reading, to avoid
going down paths he already knows lead to brick walls ;-) And I need to think more about PEP 349.

I would propose to do the reading they suggest, and edit up a new version of pep-0332.txt
that anyone could then improve further. I don't know about an early deadline. I don't want
to over-commit, as time and energies vary. OTOH, as you've noticed, I could be spending my
time more effectively ;-)

I changed the thread title, and will wait for some signs from you, Skip, Martin, Neil, and I don't
know who else might be interested...

Regards,
Bengt Richter
Guido van Rossum
2006-02-13 17:55:56 UTC
Permalink
One recommendation: for starters, I'd much rather see the bytes type
standardized without a literal notation. There should be lots of
ways to create bytes objects from string objects, with specific
explicit encodings, and those should suffice, at least initially.

I also wonder if having a b"..." literal would just add more confusion
-- bytes are not characters, but b"..." makes it appear as if they
are.

--Guido

On 2/11/06, Bengt Richter <***@oz.net> wrote:
> On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum <***@python.org> wrote:
>
> >> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <***@arctrix.com> wrote:
> >> >The backwards compatibility problems *seem* to be relatively minor.
> >> >I only found one instance of breakage in the standard library. Note
> >> >that my patch does not change PyObject_Str(); that would break
> >> >massive amounts of code. Instead, I introduce a new function:
> >> >PyString_New(). I'm not crazy about the name but I couldn't think
> >> >of anything better.
> >
> >On 2/10/06, Bengt Richter <***@oz.net> wrote:
> >> Should this not be coordinated with PEP 332?
> >
> >Probably.. But that PEP is rather incomplete. Wanna work on fixing that?
> >
> I'd be glad to add my thoughts, but first of course it's Skip's PEP,
> and Martin casts a long shadow when it comes to character coding issues
> that I suspect will have to be considered.
>
> (E.g., if there is a b'...' literal for bytes, the actual characters of
> the source code itself that the literal is being expressed in could be ascii
> or latin-1 or utf-8 or utf16le a la Microsoft, etc. UIAM, I read that the source
> is at least temporarily normalized to Unicode, and then re-encoded (except now
> for string literals?) per coding cookie or other encoding inference. (I may be
> out of date, gotta catch up).
>
> If one way or the other a string literal is in Unicode, then presumably so is
> a byte string b'...' literal -- i.e. internally u"b'...'" just before
> being turned into bytes.
>
> Should that then be an internal straight u"b'...'".encode('byte') with default ascii + escapes
> for non-ascii and non-printables, to define the full 8 bits without encoding error?
> Should unicode be encodable into byte via a specific encoding? E.g., u'abc'.encode('byte','latin1'),
> to distinguish producing a mutable byte string vs an immutable str type as with u'abc'.encode('latin1').
> (but how does this play with str being able to produce unicode? And when do these changes happen?)
> I guess I'm getting ahead of myself ;-)
>
> So I would first ask Skip what he'd like to do, and Martin for some hints on reading, to avoid
> going down paths he already knows lead to brick walls ;-) And I need to think more about PEP 349.
>
> I would propose to do the reading they suggest, and edit up a new version of pep-0332.txt
> that anyone could then improve further. I don't know about an early deadline. I don't want
> to over-commit, as time and energies vary. OTOH, as you've noticed, I could be spending my
> time more effectively ;-)
>
> I changed the thread title, and will wait for some signs from you, Skip, Martin, Neil, and I don't
> know who else might be interested...
>
> Regards,
> Bengt Richter
>


--
--Guido van Rossum (home page: http://www.python.org/~guido/)
M.-A. Lemburg
2006-02-13 18:12:18 UTC
Permalink
Guido van Rossum wrote:
> One recommendation: for starters, I'd much rather see the bytes type
> standardized without a literal notation. There should be lots of
> ways to create bytes objects from string objects, with specific
> explicit encodings, and those should suffice, at least initially.
>
> I also wonder if having a b"..." literal would just add more confusion
> -- bytes are not characters, but b"..." makes it appear as if they
> are.

Agreed.

Given that we have a source code encoding which would need
to be honored, b"..." doesn't really make all that much sense
(unless you always use hex escapes).

Note that if we drop the string type, all codecs which currently
return strings will have to return bytes. This gives you a pretty
exhaustive way of defining your binary literals in Python :-)

Here's one:

data = "abc".encode("latin-1")

To simplify things we might want to have

bytes("abc")

do the above encoding by default.

> --Guido
>
> On 2/11/06, Bengt Richter <***@oz.net> wrote:
>> On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum <***@python.org> wrote:
>>
>>>> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <***@arctrix.com> wrote:
>>>>> The backwards compatibility problems *seem* to be relatively minor.
>>>>> I only found one instance of breakage in the standard library. Note
>>>>> that my patch does not change PyObject_Str(); that would break
>>>>> massive amounts of code. Instead, I introduce a new function:
>>>>> PyString_New(). I'm not crazy about the name but I couldn't think
>>>>> of anything better.
>>> On 2/10/06, Bengt Richter <***@oz.net> wrote:
>>>> Should this not be coordinated with PEP 332?
>>> Probably.. But that PEP is rather incomplete. Wanna work on fixing that?
>>>
>> I'd be glad to add my thoughts, but first of course it's Skip's PEP,
>> and Martin casts a long shadow when it comes to character coding issues
>> that I suspect will have to be considered.
>>
>> (E.g., if there is a b'...' literal for bytes, the actual characters of
>> the source code itself that the literal is being expressed in could be ascii
>> or latin-1 or utf-8 or utf16le a la Microsoft, etc. UIAM, I read that the source
>> is at least temporarily normalized to Unicode, and then re-encoded (except now
>> for string literals?) per coding cookie or other encoding inference. (I may be
>> out of date, gotta catch up).
>>
>> If one way or the other a string literal is in Unicode, then presumably so is
>> a byte string b'...' literal -- i.e. internally u"b'...'" just before
>> being turned into bytes.
>>
>> Should that then be an internal straight u"b'...'".encode('byte') with default ascii + escapes
>> for non-ascii and non-printables, to define the full 8 bits without encoding error?
>> Should unicode be encodable into byte via a specific encoding? E.g., u'abc'.encode('byte','latin1'),
>> to distinguish producing a mutable byte string vs an immutable str type as with u'abc'.encode('latin1').
>> (but how does this play with str being able to produce unicode? And when do these changes happen?)
>> I guess I'm getting ahead of myself ;-)
>>
>> So I would first ask Skip what he'd like to do, and Martin for some hints on reading, to avoid
>> going down paths he already knows lead to brick walls ;-) And I need to think more about PEP 349.
>>
>> I would propose to do the reading they suggest, and edit up a new version of pep-0332.txt
>> that anyone could then improve further. I don't know about an early deadline. I don't want
>> to over-commit, as time and energies vary. OTOH, as you've noticed, I could be spending my
>> time more effectively ;-)
>>
>> I changed the thread title, and will wait for some signs from you, Skip, Martin, Neil, and I don't
>> know who else might be interested...
>>
>> Regards,
>> Bengt Richter
>>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

--
Marc-Andre Lemburg
eGenix.com

Greg Ewing
2006-02-14 10:59:04 UTC
Permalink
Guido van Rossum wrote:

> I also wonder if having a b"..." literal would just add more confusion
> -- bytes are not characters, but b"..." makes it appear as if they
> are.

I'm inclined to agree. Bytes objects are more likely to be used
for things which are *not* characters -- if they're characters,
they would be better kept in strings or char arrays.

+1 on any eventual bytes literal looking completely different
from a string literal.

Greg
Phillip J. Eby
2006-02-13 18:19:04 UTC
Permalink
At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
>One recommendation: for starters, I'd much rather see the bytes type
>standardized without a literal notation. There should be lots of
>ways to create bytes objects from string objects, with specific
>explicit encodings, and those should suffice, at least initially.
>
>I also wonder if having a b"..." literal would just add more confusion
>-- bytes are not characters, but b"..." makes it appear as if they
>are.

Why not just have the constructor be:

bytes(initializer [,encoding])

Where initializer must be either an iterable of suitable integers, or a
unicode/string object. If the latter (i.e., it's a basestring), the
encoding argument would then be required. Then, there's no need for
special codec support for the bytes type, since you call bytes on the thing
to be encoded. And of course, no need for a 'b' literal.
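
A rough 2.x sketch of those semantics, modelling the result as a plain
list of ints and using a hypothetical make_bytes() name (mutability and
other details ignored):

def make_bytes(initializer, encoding=None):
    if isinstance(initializer, basestring):
        if encoding is None:
            raise TypeError("encoding required for str/unicode initializers")
        if isinstance(initializer, unicode):
            initializer = initializer.encode(encoding)
        # what the encoding should mean for a plain str is debated further
        # down the thread; here the str's own bytes are simply taken as-is
        return [ord(c) for c in initializer]
    out = []
    for i in initializer:
        if not 0 <= i <= 255:
            raise ValueError("byte value out of range: %r" % (i,))
        out.append(i)
    return out

assert make_bytes([0xf0, 0xff, 0xee]) == [0xf0, 0xff, 0xee]
assert make_bytes(u"abc", "ascii") == [0x61, 0x62, 0x63]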
Guido van Rossum
2006-02-13 20:34:52 UTC
Permalink
On 2/13/06, Phillip J. Eby <***@telecommunity.com> wrote:
> At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
> >One recommendation: for starters, I'd much rather see the bytes type
> >standardized without a literal notation. There should be lots of
> >ways to create bytes objects from string objects, with specific
> >explicit encodings, and those should suffice, at least initially.
> >
> >I also wonder if having a b"..." literal would just add more confusion
> >-- bytes are not characters, but b"..." makes it appear as if they
> >are.
>
> Why not just have the constructor be:
>
> bytes(initializer [,encoding])
>
> Where initializer must be either an iterable of suitable integers, or a
> unicode/string object. If the latter (i.e., it's a basestring), the
> encoding argument would then be required. Then, there's no need for
> special codec support for the bytes type, since you call bytes on the thing
> to be encoded. And of course, no need for a 'b' literal.

It'd be cruel and unusual punishment though to have to write

bytes("abc", "Latin-1")

I propose that the default encoding (for basestring instances) ought
to be "ascii" just like everywhere else. (Meaning, it should really be
the system default encoding, which defaults to "ascii" and is
intentionally hard to change.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
M.-A. Lemburg
2006-02-13 21:55:01 UTC
Permalink
Guido van Rossum wrote:
> On 2/13/06, Phillip J. Eby <***@telecommunity.com> wrote:
>> At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
>>> One recommendation: for starters, I'd much rather see the bytes type
>>> standardized without a literal notation. There should be lots of
>>> ways to create bytes objects from string objects, with specific
>>> explicit encodings, and those should suffice, at least initially.
>>>
>>> I also wonder if having a b"..." literal would just add more confusion
>>> -- bytes are not characters, but b"..." makes it appear as if they
>>> are.
>> Why not just have the constructor be:
>>
>> bytes(initializer [,encoding])
>>
>> Where initializer must be either an iterable of suitable integers, or a
>> unicode/string object. If the latter (i.e., it's a basestring), the
>> encoding argument would then be required. Then, there's no need for
>> special codec support for the bytes type, since you call bytes on the thing
>> to be encoded. And of course, no need for a 'b' literal.
>
> It'd be cruel and unusual punishment though to have to write
>
> bytes("abc", "Latin-1")
>
> I propose that the default encoding (for basestring instances) ought
> to be "ascii" just like everywhere else. (Meaning, it should really be
> the system default encoding, which defaults to "ascii" and is
> intentionally hard to change.)

We're talking about Py3k here: "abc" will be a Unicode string,
so why restrict the conversion to 7 bits when you can have 8 bits
without any conversion problems ?

While we're at it: I'd suggest that we remove the auto-conversion
from bytes to Unicode in Py3k and the default encoding along with
it. In Py3k the standard lib will have to be Unicode compatible
anyway and string parser markers like "s#" will have to go away
as well, so there's not much need for this anymore.

(Maybe a bit radical, but I guess that's what Py3k is meant for.)

--
Marc-Andre Lemburg
eGenix.com

Phillip J. Eby
2006-02-13 22:15:05 UTC
Permalink
At 10:55 PM 2/13/2006 +0100, M.-A. Lemburg wrote:
>Guido van Rossum wrote:
> > On 2/13/06, Phillip J. Eby <***@telecommunity.com> wrote:
> >> At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
> >>> One recommendation: for starters, I'd much rather see the bytes type
> >>> standardized without a literal notation. There should be lots of
> >>> ways to create bytes objects from string objects, with specific
> >>> explicit encodings, and those should suffice, at least initially.
> >>>
> >>> I also wonder if having a b"..." literal would just add more confusion
> >>> -- bytes are not characters, but b"..." makes it appear as if they
> >>> are.
> >> Why not just have the constructor be:
> >>
> >> bytes(initializer [,encoding])
> >>
> >> Where initializer must be either an iterable of suitable integers, or a
> >> unicode/string object. If the latter (i.e., it's a basestring), the
> >> encoding argument would then be required. Then, there's no need for
> >> special codec support for the bytes type, since you call bytes on the thing
> >> to be encoded. And of course, no need for a 'b' literal.
> >
> > It'd be cruel and unusual punishment though to have to write
> >
> > bytes("abc", "Latin-1")
> >
> > I propose that the default encoding (for basestring instances) ought
> > to be "ascii" just like everywhere else. (Meaning, it should really be
> > the system default encoding, which defaults to "ascii" and is
> > intentionally hard to change.)
>
>We're talking about Py3k here: "abc" will be a Unicode string,
>so why restrict the conversion to 7 bits when you can have 8 bits
>without any conversion problems ?

Actually, I thought we were talking about adding bytes() in 2.5.

However, now that you've brought this up, it actually makes perfect sense
to just use latin-1 as the effective encoding for both strings and
unicode. In Python 2.x, strings are byte strings by definition, so it's
only in 3.0 that an encoding would be required. And again, latin1 is a
reasonable, roundtrippable default encoding.

So, it sounds like making the encoding default to latin-1 would be a
reasonably safe approach in both 2.x and 3.x.


>While we're at it: I'd suggest that we remove the auto-conversion
>from bytes to Unicode in Py3k and the default encoding along with
>it. In Py3k the standard lib will have to be Unicode compatible
>anyway and string parser markers like "s#" will have to go away
>as well, so there's not much need for this anymore.

I thought all this was already in the plan for 3.0, but maybe I assume too
much. :)
M.-A. Lemburg
2006-02-13 23:03:35 UTC
Permalink
Phillip J. Eby wrote:
>>>> Why not just have the constructor be:
>>>>
>>>> bytes(initializer [,encoding])
>>>>
>>>> Where initializer must be either an iterable of suitable integers, or a
>>>> unicode/string object. If the latter (i.e., it's a basestring), the
>>>> encoding argument would then be required. Then, there's no need for
>>>> special codec support for the bytes type, since you call bytes on the thing
>>>> to be encoded. And of course, no need for a 'b' literal.
>>> It'd be cruel and unusual punishment though to have to write
>>>
>>> bytes("abc", "Latin-1")
>>>
>>> I propose that the default encoding (for basestring instances) ought
>>> to be "ascii" just like everywhere else. (Meaning, it should really be
>>> the system default encoding, which defaults to "ascii" and is
>>> intentionally hard to change.)
>> We're talking about Py3k here: "abc" will be a Unicode string,
>> so why restrict the conversion to 7 bits when you can have 8 bits
>> without any conversion problems ?
>
> Actually, I thought we were talking about adding bytes() in 2.5.

Then we'd need to make the "ascii" encoding assumption
again, just like Guido proposed.

> However, now that you've brought this up, it actually makes perfect sense
> to just use latin-1 as the effective encoding for both strings and
> unicode. In Python 2.x, strings are byte strings by definition, so it's
> only in 3.0 that an encoding would be required. And again, latin1 is a
> reasonable, roundtrippable default encoding.

It is. However, it's not a reasonable assumption of the
default encoding since there are many encodings out there
that special case the characters 0x80-0xFF, hence the choice
of using ASCII as default encoding in Python.

The conversion from Unicode to bytes is different in this
respect, since you are converting from a "bigger" type to
a "smaller" one. Choosing latin-1 as default for this
conversion would give you all 8 bits, instead of just 7
bits that ASCII provides.

> So, it sounds like making the encoding default to latin-1 would be a
> reasonably safe approach in both 2.x and 3.x.

Reasonable for bytes(): yes. In general: no.

>> While we're at it: I'd suggest that we remove the auto-conversion
>> from bytes to Unicode in Py3k and the default encoding along with
>> it. In Py3k the standard lib will have to be Unicode compatible
>> anyway and string parser markers like "s#" will have to go away
>> as well, so there's not much need for this anymore.
>
> I thought all this was already in the plan for 3.0, but maybe I assume too
> much. :)

Wouldn't want to wait for Py4D :-)

--
Marc-Andre Lemburg
eGenix.com

Phillip J. Eby
2006-02-13 23:17:07 UTC
Permalink
At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:
>The conversion from Unicode to bytes is different in this
>respect, since you are converting from a "bigger" type to
>a "smaller" one. Choosing latin-1 as default for this
>conversion would give you all 8 bits, instead of just 7
>bits that ASCII provides.

I was just pointing out that since byte strings are bytes by definition,
then simply putting those bytes in a bytes() object doesn't alter the
existing encoding. So, using latin-1 when converting a string to bytes
actually seems like the One Obvious Way to do it.

I'm so accustomed to being wary of encoding issues that the idea doesn't
*feel* right at first - I keep going, "but you can't know what encoding
those bytes are". Then I go, Duh, that's the point. If you convert
str->bytes, there's no conversion and no interpretation - neither the str
nor the bytes object knows its encoding, and that's okay. So
str(bytes_object) (in 2.x) should also just turn it back to a normal
bytestring.

In fact, the 'encoding' argument seems useless in the case of str objects,
and it seems it should default to latin-1 for unicode objects. The only
use I see for having an encoding for a 'str' would be to allow confirming
that the input string in fact is valid for that encoding. So,
"bytes(some_str,'ascii')" would be an assertion that some_str must be valid
ASCII.
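
In 2.x terms, the check being described can already be spelled with a
decode; a small illustration, nothing more:

def is_valid(s, encoding):
    # does the byte string actually fit the claimed encoding?
    try:
        s.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

assert is_valid("abc", "ascii")
assert not is_valid("abc\xf0", "ascii")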


> > So, it sounds like making the encoding default to latin-1 would be a
> > reasonably safe approach in both 2.x and 3.x.
>
>Reasonable for bytes(): yes. In general: no.

Right, I was only talking about bytes().

For 3.0, the type formerly known as "str" won't exist, so only the Unicode
part will be relevant then.
Guido van Rossum
2006-02-13 23:23:45 UTC
Permalink
On 2/13/06, Phillip J. Eby <***@telecommunity.com> wrote:
> At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:
> >The conversion from Unicode to bytes is different in this
> >respect, since you are converting from a "bigger" type to
> >a "smaller" one. Choosing latin-1 as default for this
> >conversion would give you all 8 bits, instead of just 7
> >bits that ASCII provides.
>
> I was just pointing out that since byte strings are bytes by definition,
> then simply putting those bytes in a bytes() object doesn't alter the
> existing encoding. So, using latin-1 when converting a string to bytes
> actually seems like the One Obvious Way to do it.

This actually makes some sense -- bytes(s) where isinstance(s, str)
should just copy the data, since we can't know what encoding the user
believes it is in anyway. (With the exception of string literals,
where it makes sense to assume that the user believes it is in the
same encoding as the source code -- but I believe non-ASCII characters
in string literals are disallowed anyway, or at least known to cause
undefined results in rats.)

> I'm so accustomed to being wary of encoding issues that the idea doesn't
> *feel* right at first - I keep going, "but you can't know what encoding
> those bytes are". Then I go, Duh, that's the point. If you convert
> str->bytes, there's no conversion and no interpretation - neither the str
> nor the bytes object knows its encoding, and that's okay. So
> str(bytes_object) (in 2.x) should also just turn it back to a normal
> bytestring.

You've got me convinced. Scrap my previous responses in this thread.

> In fact, the 'encoding' argument seems useless in the case of str objects,

Right.

> and it seems it should default to latin-1 for unicode objects.

But here I disagree.

> The only
> use I see for having an encoding for a 'str' would be to allow confirming
> that the input string in fact is valid for that encoding. So,
> "bytes(some_str,'ascii')" would be an assertion that some_str must be valid
> ASCII.

We already have ways to assert that a string is ASCII.

> For 3.0, the type formerly known as "str" won't exist, so only the Unicode
> part will be relevant then.

And I think then the encoding should be required or default to ASCII.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Phillip J. Eby
2006-02-14 00:09:57 UTC
Permalink
At 03:23 PM 2/13/2006 -0800, Guido van Rossum wrote:
>On 2/13/06, Phillip J. Eby <***@telecommunity.com> wrote:
> > The only
> > use I see for having an encoding for a 'str' would be to allow confirming
> > that the input string in fact is valid for that encoding. So,
> > "bytes(some_str,'ascii')" would be an assertion that some_str must be valid
> > ASCII.
>
>We already have ways to assert that a string is ASCII.

I didn't mean that it was the only purpose. In Python 2.x, practical code
has to sometimes deal with "string-like" objects. That is, code that takes
either strings or unicode. If such code calls bytes(), it's going to want
to include an encoding so that unicode conversions won't fail. But
silently ignoring the encoding argument in that case isn't a good idea.

Ergo, I propose to permit the encoding to be specified when passing in a
(2.x) str object, to allow code that handles both str and unicode to be
"str-stable" in 2.x.

I'm fine with rejecting an encoding argument if the initializer is not a
str or unicode; I just don't want the call signature to vary based on a
runtime distinction between str and unicode. And, I don't want the
encoding argument to be silently ignored when you pass in a string. If I
assert that I'm encoding ASCII (or utf-8 or whatever), then the string
should be required to be valid. If I don't pass in an encoding, then I'm
good to go.

(This is orthogonal to the issue of what encoding is used as a default for
conversions from the unicode type, btw.)


> > For 3.0, the type formerly known as "str" won't exist, so only the Unicode
> > part will be relevant then.
>
>And I think then the encoding should be required or default to ASCII.

The reason I'm arguing for latin-1 is symmetry in 2.x versions only. (In
3.x, there's no str vs. unicode, and thus nothing to be symmetrical.) So,
if you invoke bytes() without an encoding on a 2.x basestring, you should
get the same result. Latin-1 produces "the same result" when viewed in
terms of the resulting byte string.

If we don't go with latin-1, I'd argue for requiring an encoding for
unicode objects in 2.x, because that seems like the only reasonable way to
break the symmetry between str and unicode, even though it forces
"str-stable" code to specify an encoding. The key is that at least *one*
of the signatures needs to be stable in meaning across both str and unicode
in 2.x in order to allow unicode-safe, str-stable code to be written.

(Again, for 3.x, this issue doesn't come into play because there's only one
string type to worry about; what the default is or whether there's a
default is therefore entirely up to you.)
Guido van Rossum
2006-02-14 00:29:27 UTC
Permalink
On 2/13/06, Phillip J. Eby <***@telecommunity.com> wrote:
> I didn't mean that it was the only purpose. In Python 2.x, practical code
> has to sometimes deal with "string-like" objects. That is, code that takes
> either strings or unicode. If such code calls bytes(), it's going to want
> to include an encoding so that unicode conversions won't fail.

That sounds like a rather hypothetical example. Have you thought it
through? Presumably code that accepts both str and unicode either
doesn't care about encodings, but simply returns objects of the same
type as the arguments -- and then it's unlikely to want to convert the
arguments to bytes; or it *does* care about encodings, and then it
probably already has to special-case str vs. unicode because it has to
control how str objects are interpreted.

> But
> silently ignoring the encoding argument in that case isn't a good idea.
>
> Ergo, I propose to permit the encoding to be specified when passing in a
> (2.x) str object, to allow code that handles both str and unicode to be
> "str-stable" in 2.x.

Again, have you thought this through?

What would bytes("abc\xf0", "latin-1") *mean*? Take the string
"abc\xf0", interpret it as being encoded in XXX, and then encode from
XXX to Latin-1. But what's XXX? As I showed in a previous post,
"abc\xf0".encode("latin-1") *fails* because the source for the
encoding is assumed to be ASCII.

I think we can make this work only when the string in fact only
contains ASCII and the encoding maps ASCII to itself (which most
encodings do -- but e.g. EBCDIC does not). But I'm not sure how useful
that is.

> I'm fine with rejecting an encoding argument if the initializer is not a
> str or unicode; I just don't want the call signature to vary based on a
> runtime distinction between str and unicode.

I'm still not sure that this will actually help anyone.

> And, I don't want the
> encoding argument to be silently ignored when you pass in a string.

Agreed.

> If I
> assert that I'm encoding ASCII (or utf-8 or whatever), then the string
> should be required to be valid.

Defined how? That the string is already in that encoding?

> If I don't pass in an encoding, then I'm
> good to go.
>
> (This is orthogonal to the issue of what encoding is used as a default for
> conversions from the unicode type, btw.)

Right. The issues are completely different!

> > > For 3.0, the type formerly known as "str" won't exist, so only the Unicode
> > > part will be relevant then.
> >
> >And I think then the encoding should be required or default to ASCII.
>
> The reason I'm arguing for latin-1 is symmetry in 2.x versions only. (In
> 3.x, there's no str vs. unicode, and thus nothing to be symmetrical.) So,
> if you invoke bytes() without an encoding on a 2.x basestring, you should
> get the same result. Latin-1 produces "the same result" when viewed in
> terms of the resulting byte string.

Only if you assume the str object is encoded in Latin-1.

Your argument for symmetry would be a lot stronger if we used Latin-1
for the conversion between str and Unicode. But we don't. I like the
other interpretation (which I thought was yours too?) much better: str
<--> bytes conversions don't use encodings by simply change the type
without changing the bytes; conversion between either and unicode
works exactly the same, and requires an encoding unless all the
characters involved are pure ASCII.

> If we don't go with latin-1, I'd argue for requiring an encoding for
> unicode objects in 2.x, because that seems like the only reasonable way to
> break the symmetry between str and unicode, even though it forces
> "str-stable" code to specify an encoding. The key is that at least *one*
> of the signatures needs to be stable in meaning across both str and unicode
> in 2.x in order to allow unicode-safe, str-stable code to be written.

Using ASCII as the default encoding has the same property -- it can
remain stable across the 2.x / 3.0 boundary.

> (Again, for 3.x, this issue doesn't come into play because there's only one
> string type to worry about; what the default is or whether there's a
> default is therefore entirely up to you.)

A nice-to-have property would be that it might be possible to write
code that today deals with Unicode and str, but in 3.0 will deal with
Unicode and bytes instead. But I'm not sure how likely that is since
bytes objects won't have most methods that str and Unicode objects
have (like lower(), find(), etc.).

There's one property that bytes, str and unicode all share: type(x[0])
== type(x), at least as long as len(x) >= 1. This is perhaps the
ultimate test for string-ness.

Or should b[0] be an int, if b is a bytes object? That would change
things dramatically.

There's also the consideration for APIs that, informally, accept
either a string or a sequence of objects. Many of these exist, and
they are probably all being converted to support unicode as well as
str (if it makes sense at all). Should a bytes object be considered as
a sequence of things, or as a single thing, from the POV of these
types of APIs? Should we try to standardize how code tests for the
difference? (Currently all sorts of shortcuts are being taken, from
isinstance(x, (list, tuple)) to isinstance(x, basestring).)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Barry Warsaw
2006-02-14 04:59:03 UTC
Permalink
On Feb 13, 2006, at 7:29 PM, Guido van Rossum wrote:

> There's one property that bytes, str and unicode all share: type(x[0])
> == type(x), at least as long as len(x) >= 1. This is perhaps the
> ultimate test for string-ness.

But not perfect, since of course other containers can contain objects
of their own type too. But it leads to an interesting issue...

> Or should b[0] be an int, if b is a bytes object? That would change
> things dramatically.

This makes me think I want an unsigned byte type, which b[0] would
return. In another thread I think someone mentioned something about
fixed width integral types, such that you could have an object that
was guaranteed to be 8-bits wide, 16-bits wide, etc. Maybe you also
want signed and unsigned versions of each. This may seem like YAGNI
to many people, but as I've been working on a tightly embedded/
extended application for the last few years, I've definitely had
occasions where I wish I could more closely and more directly model
my C values as Python objects (without using the standard workarounds
or writing my own C extension types).

But anyway, without hyper-generalizing, it's still worth asking
whether a bytes type is just a container of byte objects, where the
contained objects would be distinct, fixed 8-bit unsigned integral
types.

> There's also the consideration for APIs that, informally, accept
> either a string or a sequence of objects. Many of these exist, and
> they are probably all being converted to support unicode as well as
> str (if it makes sense at all). Should a bytes object be considered as
> a sequence of things, or as a single thing, from the POV of these
> types of APIs? Should we try to standardize how code tests for the
> difference? (Currently all sorts of shortcuts are being taken, from
> isinstance(x, (list, tuple)) to isinstance(x, basestring).)

I think bytes objects are very much like string objects today --
they're the photons of Python since they can act like either
sequences or scalars, depending on the context. For example, we have
code that needs to deal with situations where an API can return
either a scalar or a sequence of those scalars. So we have a utility
function like this:

def thingiter(obj):
    try:
        it = iter(obj)
    except TypeError:
        yield obj
    else:
        for item in it:
            yield item

Maybe there's a better way to do this, but the most obvious problem
is that (for our use cases), this fails for strings because in this
context we want strings to act like scalars. So we add a little test
just before the "try:" like "if isinstance(obj, basestring): yield
obj". But that's yucky.

I don't know what the solution is -- if there /is/ a solution short
of special case tests like above, but I think the key observation is
that sometimes you want your string to act like a sequence and
sometimes you want it to act like a scalar. I suspect bytes objects
will be the same way.
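
Spelled out, that special-cased workaround looks something like this
(just illustrating the pattern, not proposing it):

def thingiter(obj):
    # strings act as scalars here, even though they are iterable
    if isinstance(obj, basestring):
        yield obj
        return
    try:
        it = iter(obj)
    except TypeError:
        yield obj
    else:
        for item in it:
            yield item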

-Barry
Greg Ewing
2006-02-14 11:35:17 UTC
Permalink
Barry Warsaw wrote:

> This makes me think I want an unsigned byte type, which b[0] would
> return.

Come to think of it, this is something I don't
remember seeing discussed. I've been thinking
that bytes[i] would return an integer, but is
the intention that it would return another bytes
object?

Greg
Barry Warsaw
2006-02-14 13:32:42 UTC
Permalink
On Feb 14, 2006, at 6:35 AM, Greg Ewing wrote:

> Barry Warsaw wrote:
>
>> This makes me think I want an unsigned byte type, which b[0] would
>> return.
>
> Come to think of it, this is something I don't
> remember seeing discussed. I've been thinking
> that bytes[i] would return an integer, but is
> the intention that it would return another bytes
> object?

A related question: what would bytes([104, 101, 108, 108, 111, 8004])
return? An exception hopefully. I also think you'd want bytes([x
for x in some_bytes_object]) to return an object equal to the original.

-Barry
Greg Ewing
2006-02-14 11:25:03 UTC
Permalink
Guido van Rossum wrote:

> There's also the consideration for APIs that, informally, accept
> either a string or a sequence of objects.

My preference these days is not to design APIs that
way. It's never necessary and it avoids a lot of
problems.

Greg
Michael Hudson
2006-02-14 13:03:39 UTC
Permalink
Greg Ewing <***@canterbury.ac.nz> writes:

> Guido van Rossum wrote:
>
>> There's also the consideration for APIs that, informally, accept
>> either a string or a sequence of objects.
>
> My preference these days is not to design APIs that
> way. It's never necessary and it avoids a lot of
> problems.

Oh yes.

Cheers,
mwh

--
ZAPHOD: Listen three eyes, don't try to outweird me, I get stranger
things than you free with my breakfast cereal.
-- The Hitch-Hikers Guide to the Galaxy, Episode 7
Phillip J. Eby
2006-02-14 05:20:56 UTC
Permalink
At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>On 2/13/06, Phillip J. Eby <***@telecommunity.com> wrote:
> > I didn't mean that it was the only purpose. In Python 2.x, practical code
> > has to sometimes deal with "string-like" objects. That is, code that takes
> > either strings or unicode. If such code calls bytes(), it's going to want
> > to include an encoding so that unicode conversions won't fail.
>
>That sounds like a rather hypothetical example. Have you thought it
>through? Presumably code that accepts both str and unicode either
>doesn't care about encodings, but simply returns objects of the same
>type as the arguments -- and then it's unlikely to want to convert the
>arguments to bytes; or it *does* care about encodings, and then it
>probably already has to special-case str vs. unicode because it has to
>control how str objects are interpreted.

Actually, it's the other way around. Code that wants to output
uninterpreted bytes right now and accepts either strings or Unicode has to
special-case *unicode* -- not str, because str is the only "bytes type" we
currently have.

This creates an interesting issue in WSGI for Jython, which of course only
has one (unicode-based) string type now. Since there's no bytes type in
Python in general, the only solution we could come up with was to treat
such strings as latin-1:

http://www.python.org/peps/pep-0333.html#unicode-issues

This is why I'm biased towards latin-1 encoding of unicode to bytes; it's
"the same thing" as an uninterpreted string of bytes.

I think the difference in our viewpoints is that you're still thinking
"string" thoughts, whereas I'm thinking "byte" thoughts. Bytes are just
bytes; they don't *have* an encoding.

So, if you think of "converting a string to bytes" as meaning "create an
array of numerals corresponding to the characters in the string", then this
leads to a uniform result whether the characters are in a str or a unicode
object. In other words, to me, bytes(str_or_unicode) should be treated as:

bytes(map(ord, str_or_unicode))

In other words, without an encoding, bytes() should simply treat str and
unicode objects *as if they were a sequence of integers*, and produce an
error when an integer is out of range. This is a logical and consistent
interpretation in the absence of an encoding, because in that case you
don't care about the encoding - it's just raw data.
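
A tiny sketch of that no-encoding behaviour (hypothetical raw_bytes()
helper, result modelled as a list of ints):

def raw_bytes(str_or_unicode):
    out = []
    for ch in str_or_unicode:
        o = ord(ch)
        if o > 255:
            raise ValueError("character %r does not fit in a byte" % (ch,))
        out.append(o)
    return out

assert raw_bytes("abc\xf0") == [0x61, 0x62, 0x63, 0xf0]
assert raw_bytes(u"abc\xf0") == [0x61, 0x62, 0x63, 0xf0]
# raw_bytes(u"\u20ac") would raise: the euro sign needs an encoding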

If, however, you include an encoding, then you're stating that you want to
encode the *meaning* of the string, not merely its integer values.


>What would bytes("abc\xf0", "latin-1") *mean*? Take the string
>"abc\xf0", interpret it as being encoded in XXX, and then encode from
>XXX to Latin-1. But what's XXX? As I showed in a previous post,
>"abc\xf0".encode("latin-1") *fails* because the source for the
>encoding is assumed to be ASCII.

I'm saying that XXX would be the same encoding as you specified. i.e.,
including an encoding means you are encoding the *meaning* of the string.

However, I believe I mainly proposed this as an alternative to having
bytes(str_or_unicode) work like bytes(map(ord,str_or_unicode)), which I
think is probably a saner default.


>Your argument for symmetry would be a lot stronger if we used Latin-1
>for the conversion between str and Unicode. But we don't.

But that's because we're dealing with its meaning *as a string*, not merely
as ordinals in a sequence of bytes.


> I like the
>other interpretation (which I thought was yours too?) much better: str
><--> bytes conversions don't use encodings by simply change the type
>without changing the bytes;

I like it better too. The part you didn't like was where MAL and I believe
this should be extended to Unicode characters in the 0-255 range also. :)


>There's one property that bytes, str and unicode all share: type(x[0])
>== type(x), at least as long as len(x) >= 1. This is perhaps the
>ultimate test for string-ness.
>
>Or should b[0] be an int, if b is a bytes object? That would change
>things dramatically.

+1 for it being an int. Heck, I'd want to at least consider the
possibility of introducing a character type (chr?) in Python 3.0, and
getting rid of the "iterating a string yields strings"
characteristic. I've found it to be a bit of a pain when dealing with
heterogeneous nested sequences that contain strings.


>There's also the consideration for APIs that, informally, accept
>either a string or a sequence of objects. Many of these exist, and
>they are probably all being converted to support unicode as well as
>str (if it makes sense at all). Should a bytes object be considered as
>a sequence of things, or as a single thing, from the POV of these
>types of APIs? Should we try to standardize how code tests for the
>difference? (Currently all sorts of shortcuts are being taken, from
>isinstance(x, (list, tuple)) to isinstance(x, basestring).)

I'm inclined to think of certain features at least in terms of the buffer
interface, but that's not something that's really exposed at the Python level.
James Y Knight
2006-02-14 07:09:55 UTC
Permalink
On Feb 14, 2006, at 12:20 AM, Phillip J. Eby wrote:
> bytes(map(ord, str_or_unicode))
>
> In other words, without an encoding, bytes() should simply treat str and
> unicode objects *as if they were a sequence of integers*, and produce an
> error when an integer is out of range. This is a logical and consistent
> interpretation in the absence of an encoding, because in that case you
> don't care about the encoding - it's just raw data.


If you're talking about "raw data", then make bytes(unicodestring)
produce what buffer(unicodestring) currently does -- something
completely and utterly worthless. :) [it depends on how you compiled
python and what endianness your system has.]

There really is no case where you don't care about the
encoding...there is always a specific desired output encoding, and
you have to think about what encoding that is. The argument that
latin-1 is a sensible default just because you can convert to latin-1
by chopping off the upper 3 bytes of a unicode character's ordinal
position is not convincing; you're still doing an encoding operation,
it just happens to be computationally easy. That Jython programs have
to pretend that unicode strings are an appropriate way to store
bytes, and thus often have to do fake "latin-1" conversions which are
really no such thing, doesn't make a convincing argument either.
Using unicode strings to store bytes read from or written to a socket
is really just broken.

Actually having any default encoding at all is IMO a poor idea, but
as python has one at the moment (ascii), might as well keep using it
for consistency until it's eliminated (sys.setdefaultencoding
('undefined') is my friend.)

James
Michael Foord
2006-02-13 23:40:16 UTC
Permalink
Phillip J. Eby wrote:
[snip..]
>
> In fact, the 'encoding' argument seems useless in the case of str objects,
> and it seems it should default to latin-1 for unicode objects. The only
>
-1 for having an implicit encode that behaves differently to other
implicit encodes/decodes that happen in Python. Life is confusing enough
already.

Michael Foord
Guido van Rossum
2006-02-13 23:44:27 UTC
Permalink
On 2/13/06, Michael Foord <***@voidspace.org.uk> wrote:
> Phillip J. Eby wrote:
> [snip..]
> >
> > In fact, the 'encoding' argument seems useless in the case of str objects,
> > and it seems it should default to latin-1 for unicode objects. The only
> >
> -1 for having an implicit encode that behaves differently to other
> implicit encodes/decodes that happen in Python. Life is confusing enough
> already.

But adding an encoding doesn't help. The str.encode() method always
assumes that the string itself is ASCII-encoded, and that's not good
enough:

>>> "abc".encode("latin-1")
'abc'
>>> "abc".decode("latin-1")
u'abc'
>>> "abc\xf0".decode("latin-1")
u'abc\xf0'
>>> "abc\xf0".encode("latin-1")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
3: ordinal not in range(128)
>>>

The right way to look at this is, as Phillip says, to consider
conversion between str and bytes as not an encoding but a data type
change *only*.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Barry Warsaw
2006-02-13 23:50:40 UTC
Permalink
On Mon, 2006-02-13 at 15:44 -0800, Guido van Rossum wrote:

> The right way to look at this is, as Phillip says, to consider
> conversion between str and bytes as not an encoding but a data type
> change *only*.

That sounds right to me too.
-Barry
Michael Foord
2006-02-13 23:53:16 UTC
Permalink
Guido van Rossum wrote:
> On 2/13/06, Michael Foord <***@voidspace.org.uk> wrote:
>
>> Phillip J. Eby wrote:
>> [snip..]
>>
>>> In fact, the 'encoding' argument seems useless in the case of str objects,
>>> and it seems it should default to latin-1 for unicode objects. The only
>>>
>>>
>> -1 for having an implicit encode that behaves differently to other
>> implicit encodes/decodes that happen in Python. Life is confusing enough
>> already.
>>
>
> But adding an encoding doesn't help. The str.encode() method always
> assumes that the string itself is ASCII-encoded, and that's not good
> enough:
>
>
Sorry - I meant for the unicode to bytes case. A default encoding that
behaves differently to the current implicit encodes/decodes would be
confusing IMHO.

I agree that string to bytes shouldn't change the value of the bytes.
The least confusing description of a non-unicode string is 'byte-string'.

Michael Foord
>>>> "abc".encode("latin-1")
>>>>
> 'abc'
>
>>>> "abc".decode("latin-1")
>>>>
> u'abc'
>
>>>> "abc\xf0".decode("latin-1")
>>>>
> u'abc\xf0'
>
>>>> "abc\xf0".encode("latin-1")
>>>>
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
> 3: ordinal not in range(128)
>
>
> The right way to look at this is, as Phillip says, to consider
> conversion between str and bytes as not an encoding but a data type
> change *only*.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>
Guido van Rossum
2006-02-14 00:09:32 UTC
Permalink
On 2/13/06, Michael Foord <***@voidspace.org.uk> wrote:
> Sorry - I meant for the unicode to bytes case. A default encoding that
> behaves differently to the current implicit encodes/decodes would be
> confusing IMHO.

And I am in agreement with you there (I think only Phillip argued otherwise).

> I agree that string to bytes shouldn't change the value of the bytes.

It's a deal then.

Can the owner of PEP 332 update the PEP to record these decisions?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
James Y Knight
2006-02-14 00:49:55 UTC
Permalink
On Feb 13, 2006, at 7:09 PM, Guido van Rossum wrote:

> On 2/13/06, Michael Foord <***@voidspace.org.uk> wrote:
>> Sorry - I meant for the unicode to bytes case. A default encoding
>> that
>> behaves differently to the current to implicit encodes/decodes
>> would be
>> confusing IMHO.
>
> And I am in agreement with you there (I think only Phillip argued
> otherwise).
>
>> I agree that string to bytes shouldn't change the value of the bytes.
>
> It's a deal then.
>
> Can the owner of PEP 332 update the PEP to record these decisions?

So, in python2.X, you have:
- bytes("\x80"), you get a bytestring with a single byte of value
0x80 (when no encoding is specified, and the object is a str, it
doesn't try to encode it at all).
- bytes("\x80", encoding="latin-1"), you get an error, because
encoding "\x80" into latin-1 implicitly decodes it into a unicode
object first, via the system-wide default: ascii.
- bytes(u"\x80"), you get an error, because the default encoding for
a unicode string is ascii.
- bytes(u"\x80", encoding="latin-1"), you get a bytestring with a
single byte of value 0x80.

In py3k, when the str object is eliminated, then what do you have?
Perhaps
- bytes("\x80"), you get an error, encoding is required. There is no
such thing as "default encoding" anymore, as there's no str object.
- bytes("\x80", encoding="latin-1"), you get a bytestring with a
single byte of value 0x80.


James
Guido van Rossum
2006-02-14 01:11:42 UTC
Permalink
On 2/13/06, James Y Knight <***@fuhm.net> wrote:
> So, in python2.X, you have:
> - bytes("\x80"), you get a bytestring with a single byte of value
> 0x80 (when no encoding is specified, and the object is a str, it
> doesn't try to encode it at all).
> - bytes("\x80", encoding="latin-1"), you get an error, because
> encoding "\x80" into latin-1 implicitly decodes it into a unicode
> object first, via the system-wide default: ascii.
> - bytes(u"\x80"), you get an error, because the default encoding for
> a unicode string is ascii.
> - bytes(u"\x80", encoding="latin-1"), you get a bytestring with a
> single byte of value 0x80.

Yes to all.

> In py3k, when the str object is eliminated, then what do you have?
> Perhaps
> - bytes("\x80"), you get an error, encoding is required. There is no
> such thing as "default encoding" anymore, as there's no str object.
> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
> single byte of value 0x80.

Yes to both again.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Neil Schemenauer
2006-02-14 02:52:40 UTC
Permalink
Guido van Rossum <***@python.org> wrote:
>> In py3k, when the str object is eliminated, then what do you have?
>> Perhaps
>> - bytes("\x80"), you get an error, encoding is required. There is no
>> such thing as "default encoding" anymore, as there's no str object.
>> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
>> single byte of value 0x80.
>
> Yes to both again.

I haven't been following this discussion about bytes() real closely
but I don't think that bytes() should do the encoding. We already
have a way to spell that:

"\x80".encode('latin-1')

Also, I think it would useful to introduce byte array literals at
the same time as the bytes object. That would allow people to use
byte arrays without having to get involved with all the silly string
encoding confusion.

Neil
Fred L. Drake, Jr.
2006-02-14 03:29:21 UTC
Permalink
On Monday 13 February 2006 21:52, Neil Schemenauer wrote:
> Also, I think it would useful to introduce byte array literals at
> the same time as the bytes object. That would allow people to use
> byte arrays without having to get involved with all the silly string
> encoding confusion.

bytes([0, 1, 2, 3])


-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
Guido van Rossum
2006-02-14 04:07:49 UTC
Permalink
On 2/13/06, Neil Schemenauer <***@arctrix.com> wrote:
> Guido van Rossum <***@python.org> wrote:
> >> In py3k, when the str object is eliminated, then what do you have?
> >> Perhaps
> >> - bytes("\x80"), you get an error, encoding is required. There is no
> >> such thing as "default encoding" anymore, as there's no str object.
> >> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
> >> single byte of value 0x80.
> >
> > Yes to both again.
>
> I haven't been following this discussion about bytes() real closely
> but I don't think that bytes() should do the encoding. We already
> have a way to spell that:
>
> "\x80".encode('latin-1')

But in 2.5 we can't change that to return a bytes object without
creating HUGE incompatibilities.

In general I've come to appreciate that there are two ways of
converting an object of type A to an object of type B: ask an A
instance to convert itself to a B, or ask the type B to create a new
instance from an A. Depending on what A and B are, both APIs make
sense; sometimes reasons of decoupling require that A can't know about
B, in which case you have to use the latter approach; sometimes B
can't know about A, in which case you have to use the former. Even
when A == B we sometimes support both APIs: to create a new list from
a list a, you can write a[:] or list(a); to create a new dict from a
dict d, you can write d.copy() or dict(d).

An advantage of the latter API is that there's no confusion about the
resulting type -- dict(d) is definitely a dict, and list(a) is
definitely a list. Not so for d.copy() or a[:] -- if the input type is
another mapping or sequence, it'll probably return an object of that
same type.

Again, it depends on the application which is better.

I think that bytes(s, <encoding>) is fine, especially for expressing a
new type, since it is unambiguous about the result type, and has no
backwards compatibility issues.

> Also, I think it would useful to introduce byte array literals at
> the same time as the bytes object. That would allow people to use
> byte arrays without having to get involved with all the silly string
> encoding confusion.

You missed the part where I said that introducing the bytes type
*without* a literal seems to be a good first step. A new type, even
built-in, is much less drastic than a new literal (which requires
lexer and parser support in addition to everything else).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Nick Coghlan
2006-02-14 11:53:04 UTC
Permalink
Guido van Rossum wrote:
> In general I've come to appreciate that there are two ways of
> converting an object of type A to an object of type B: ask an A
> instance to convert itself to a B, or ask the type B to create a new
> instance from an A.

And the difference between the two isn't even always that clear cut. Sometimes
you'll ask type B to create a new instance from an A, and then while you're
not looking type B cheats and goes and asks the A instance to do it instead ;)

Cheers,
Nick.

--
Nick Coghlan | ***@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
Neil Schemenauer
2006-02-14 19:31:07 UTC
Permalink
On Mon, Feb 13, 2006 at 08:07:49PM -0800, Guido van Rossum wrote:
> On 2/13/06, Neil Schemenauer <***@arctrix.com> wrote:
> > "\x80".encode('latin-1')
>
> But in 2.5 we can't change that to return a bytes object without
> creating HUGE incompatibilities.

People could spell it bytes(s.encode('latin-1')) in order to make it
work in 2.X. That spelling would provide a way of ensuring the type
of the return value.

> You missed the part where I said that introducing the bytes type
> *without* a literal seems to be a good first step. A new type, even
> built-in, is much less drastic than a new literal (which requires
> lexer and parser support in addition to everything else).

Are you concerned about the implementation effort? If so, I don't
think that's justified since adding a new string prefix should be
pretty straightforward (relative to the rest of the effort involved).
Are you comfortable with the proposed syntax?

Neil
Guido van Rossum
2006-02-14 23:13:37 UTC
Permalink
On 2/14/06, Neil Schemenauer <***@arctrix.com> wrote:
> People could spell it bytes(s.encode('latin-1')) in order to make it
> work in 2.X. That spelling would provide a way of ensuring the type
> of the return value.

At the cost of an extra copying step.

[Guido]
> > You missed the part where I said that introducing the bytes type
> > *without* a literal seems to be a good first step. A new type, even
> > built-in, is much less drastic than a new literal (which requires
> > lexer and parser support in addition to everything else).
>
> Are you concerned about the implementation effort? If so, I don't
> think that's justified since adding a new string prefix should be
> pretty straightforward (relative to the rest of the effort involved).

Not so much the implementation but also the documentation, updating
3rd party Python preprocessors, etc.

> Are you comfortable with the proposed syntax?

Not entirely, since I don't know what b"abc<euro>def" would mean
(where <euro> is a Unicode Euro character typed in whatever source
encoding was used).

Instead of b"abc" (only ASCII) you could write bytes("abc"). Instead
of b"\xf0\xff\xee" you could write bytes([0xf0, 0xff, 0xee]).

The key disconnect for me is that if bytes are not characters, we
shouldn't use a literal notation that resembles the literal notation
for characters. And there's growing consensus that a bytes type should
be considered as an array of (8-bit unsigned) ints.

Also, bytes objects are (in my mind anyway) mutable. We have no other
literal notation for mutable objects. What would the following code
print?

for i in range(2):
    b = b"abc"
    print b
    b[0] = ord("A")

Would the second output line print abc or Abc?

I guess the only answer that makes sense is that it should print abc
both times; but that means that b"abc" must be internally implemented
by creating a new bytes object each time. Perhaps the implementation
effort isn't so minimal after all...
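
(The analogy with today's only mutable "literal" -- a list display --
shows the behaviour being described: a fresh object per evaluation, so
mutation doesn't leak across iterations:

for i in range(2):
    L = [ord("a"), ord("b"), ord("c")]
    print L
    L[0] = ord("A")
# prints [97, 98, 99] both times
)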

(PS why is there a reply-to in your email that excludes you from the
list of recipients but includes me?)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Martin v. Löwis
2006-02-14 06:58:01 UTC
Permalink
Guido van Rossum wrote:
>>In py3k, when the str object is eliminated, then what do you have?
>>Perhaps
>>- bytes("\x80"), you get an error, encoding is required. There is no
>>such thing as "default encoding" anymore, as there's no str object.
>>- bytes("\x80", encoding="latin-1"), you get a bytestring with a
>>single byte of value 0x80.
>
>
> Yes to both again.

Please reconsider, and don't give bytes() an encoding= argument.
It doesn't need one. In Python 3, people should write

"\x80".encode("latin-1")

if they absolutely want to, although they better write

bytes([0x80])

Now, the first form isn't valid in 2.5, but

bytes(u"\x80".encode("latin-1"))

could work in all versions.

Regards,
Martin
Guido van Rossum
2006-02-14 23:13:29 UTC
Permalink
On 2/13/06, "Martin v. Löwis" <***@v.loewis.de> wrote:
> Guido van Rossum wrote:
> >>In py3k, when the str object is eliminated, then what do you have?
> >>Perhaps
> >>- bytes("\x80"), you get an error, encoding is required. There is no
> >>such thing as "default encoding" anymore, as there's no str object.
> >>- bytes("\x80", encoding="latin-1"), you get a bytestring with a
> >>single byte of value 0x80.
> >
> > Yes to both again.
>
> Please reconsider, and don't give bytes() an encoding= argument.
> It doesn't need one. In Python 3, people should write
>
> "\x80".encode("latin-1")
>
> if they absolutely want to, although they better write
>
> bytes([0x80])
>
> Now, the first form isn't valid in 2.5, but
>
> bytes(u"\x80".encode("latin-1"))
>
> could work in all versions.

In 3.0, I agree that .encode() should return a bytes object.

I'd almost be convinced that in 2.x bytes() doesn't need an encoding
argument, except it will require excessive copying.
bytes(u.encode("utf8")) will certainly use 2*len(u) bytes space (plus
a constant); bytes(u, "utf8") only needs len(u) bytes. In 3.0,
bytes(s.encode(xxx)) would also create an extra copy, since the bytes
type is mutable (we all agree on that, don't we?).

I think that's a good enough argument for 2.x. We could keep the
extended API as an alternative form in 3.x, or automatically translate
calls to bytes(x, y) into x.encode(y).

BTW I think we'll need a new PEP instead of PEP 332. The latter has
almost no details relevant to this discussion, and it seems to treat
bytes as a near-synonym for str in 2.x. That's not the way this
discussion is going it seems.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Fuzzyman
2006-02-14 09:29:37 UTC
Permalink
Guido van Rossum wrote:

> [snip..]
>
>>In py3k, when the str object is eliminated, then what do you have?
>>Perhaps
>>- bytes("\x80"), you get an error, encoding is required. There is no
>>such thing as "default encoding" anymore, as there's no str object.
>>- bytes("\x80", encoding="latin-1"), you get a bytestring with a
>>single byte of value 0x80.
>>
>>
>
>Yes to both again.
>
>
>
*Slightly* related question. Sorry for the tangent.

In Python 3K, when the string data-type has gone, what will
``open(filename).read()`` return ? Will the object returned have a
``decode`` method, to coerce to a unicode string ?

Also, what datatype will ``u'some string'.encode('ascii')`` return ?

I assume that when the ``bytes`` datatype is implemented, we will be
able to do ``open(filename, 'wb').write(bytes(somedata))`` ? Hmmm... I
probably ought to read the bytes PEP and the Py3k one...

Just curious...

All the best,

Michael Foord

>--
>--Guido van Rossum (home page: http://www.python.org/~guido/)
>
>
>
Guido van Rossum
2006-02-14 19:07:09 UTC
Permalink
On 2/14/06, Fuzzyman <***@voidspace.org.uk> wrote:
> In Python 3K, when the string data-type has gone,

Technically it won't be gone; str will mean what it already means in
Jython and IronPython (for which CPython uses unicode in 2.x).

> what will
> ``open(filename).read()`` return ?

Since you didn't specify an open mode, it'll open it as a text file
using some default encoding (or perhaps it can guess the encoding from
file metadata -- this is all OS specific). So it'll return a string.

If you open the file in binary mode, however, read() will return a
bytes object. I'm currently considering whether we should have a
single open() function which returns different types of objects
depending on a string parameter's value, or whether it makes more
sense to have different functions, e.g. open() for text files and
openbinary() for binary files. I believe Fredrik Lundh wants open() to
use binary mode and opentext() for text files, but that seems
backwards -- surely text files are more commonly used, and surely the
most common operation should have the shorter name -- call it the
Huffman Principle.

> Will the object returned have a
> ``decode`` method, to coerce to a unicode string ?

No, the object returned will *be* a (unicode) string.

But a bytes object (returned by a binary open operation) will have a
decode() method.

> Also, what datatype will ``u'some string'.encode('ascii')`` return ?

It will be a syntax error (u"..." will be illegal).

The str.encode() method will return a bytes object (if the design goes
as planned -- none of this is set in stone yet).

> I assume that when the ``bytes`` datatype is implemented, we will be
> able to do ``open(filename, 'wb').write(bytes(somedata))`` ? Hmmm... I
> probably ought to read the bytes PEP and the Py3k one...

Sort of (except perhaps we'd be using openbinary(filename, 'w')).
Perhaps write(somedata) should automatically coerce the data to bytes?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Just van Rossum
2006-02-14 20:35:50 UTC
Permalink
Guido van Rossum wrote:

> > what will
> > ``open(filename).read()`` return ?
>
> Since you didn't specify an open mode, it'll open it as a text file
> using some default encoding (or perhaps it can guess the encoding from
> file metadata -- this is all OS specific). So it'll return a string.
>
> If you open the file in binary mode, however, read() will return a
> bytes object. I'm currently considering whether we should have a
> single open() function which returns different types of objects
> depending on a string parameter's value, or whether it makes more
> sense to have different functions, e.g. open() for text files and
> openbinary() for binary files. I believe Fredrik Lundh wants open() to
> use binary mode and opentext() for text files, but that seems
> backwards -- surely text files are more commonly used, and surely the
> most common operation should have the shorter name -- call it the
> Huffman Principle.

+1 for two functions.

My choice would be open() for binary and opentext() for text. I don't
find that backwards at all: the text function is going to be more
different from the current open() function than the binary function
would be since in many ways the str type is closer to bytes than to
unicode.

Maybe it's even better to use opentext() AND openbinary(), and deprecate
plain open(). We could even introduce them at the same time as bytes()
(and leave the open() deprecation for 3.0).

Just
Alex Martelli
2006-02-14 22:37:59 UTC
Permalink
On 2/14/06, Just van Rossum <***@letterror.com> wrote:
...
> Maybe it's even better to use opentext() AND openbinary(), and deprecate
> plain open(). We could even introduce them at the same time as bytes()
> (and leave the open() deprecation for 3.0).

What about shorter names, such as 'text' instead of 'opentext' and
'data' instead of 'openbinary'? By eschewing the 'open' prefix we
might make it easy to eventually migrate off it. Maybe text and data
could be two subclasses of file, with file remaining initially as it
is (and perhaps becoming an abstract-only baseclass at the time 'open'
is deprecated).

In real life, people do all the time use 'open' inappropriately (on
non-text files on Windows): one of the most frequent tasks on
python-help has to do with diagnosing that this is what happened and
suggest the addition of an explicit 'rb' or 'wb' argument. This
unending chore, in particular, makes me very wary of forever keeping
open to mean "open this _text_ file".
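
(The classic failure mode, sketched with a made-up filename; the mangling
below only happens on Windows, which is exactly why it slips through
testing so often:)

    data = '\x00\x1a\r\n\x80'
    open('blob.bin', 'w').write(data)      # text mode rewrites '\n' as '\r\n'
    open('blob.bin').read() == data        # False: text-mode read stops at the 0x1a (Ctrl-Z) byte
    open('blob.bin', 'wb').write(data)     # explicit binary mode...
    open('blob.bin', 'rb').read() == data  # ...round-trips correctly: True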


Alex
Barry Warsaw
2006-02-14 22:48:57 UTC
Permalink
On Tue, 2006-02-14 at 14:37 -0800, Alex Martelli wrote:

> What about shorter names, such as 'text' instead of 'opentext' and
> 'data' instead of 'openbinary'? By eschewing the 'open' prefix we
> might make it easy to eventually migrate off it. Maybe text and data
> could be two subclasses of file, with file remaining initially as it
> is (and perhaps becoming an abstract-only baseclass at the time 'open'
> is deprecated).

I was actually thinking about static methods file.text() and file.data()
which seem nicely self-descriptive, if a little bit longer.

-Barry
Guido van Rossum
2006-02-14 22:51:20 UTC
Permalink
On 2/14/06, Just van Rossum <***@letterror.com> wrote:
> Guido van Rossum wrote:
> > [...] surely text files are more commonly used, and surely the
> > most common operation should have the shorter name -- call it the
> > Huffman Principle.
>
> +1 for two functions.
>
> My choice would be open() for binary and opentext() for text. I don't
> find that backwards at all: the text function is going to be more
> different from the current open() function than the binary function
> would be since in many ways the str type is closer to bytes than to
> unicode.

It's still backwards because the current open function defaults to
text on Windows (the only platform where it matters any more).

> Maybe it's even better to use opentext() AND openbinary(), and deprecate
> plain open(). We could even introduce them at the same time as bytes()
> (and leave the open() deprecation for 3.0).

And then, on 2/14/06, Alex Martelli <***@gmail.com> wrote:
> What about shorter names, such as 'text' instead of 'opentext' and
> 'data' instead of 'openbinary'? By eschewing the 'open' prefix we
> might make it easy to eventually migrate off it. Maybe text and data
> could be two subclasses of file, with file remaining initially as it
> is (and perhaps becoming an abstract-only baseclass at the time 'open'
> is deprecated).

Plain 'text' and 'data' don't convey the fact that we're talking about
opening I/O objects here. If you want, we could say textfile() and
datafile(). (I'm fine with data instead of binary.)

But somehow I still like the 'open' verb. It has a long and rich
tradition. And it also nicely conveys that it is a factory function
which may return objects of different types (though similar in API)
based upon either additional arguments (e.g. buffering) or the
environment (e.g. encodings) or even inspection of the file being
opened.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Thomas Wouters
2006-02-14 08:09:22 UTC
Permalink
On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:

> But adding an encoding doesn't help. The str.encode() method always
> assumes that the string itself is ASCII-encoded, and that's not good
> enough:

> >>> "abc".encode("latin-1")
> 'abc'
> >>> "abc".decode("latin-1")
> u'abc'
> >>> "abc\xf0".decode("latin-1")
> u'abc\xf0'
> >>> "abc\xf0".encode("latin-1")
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
> 3: ordinal not in range(128)

These comments disturb me. I never really understood why (byte) strings grew
the 'encode' method, since 8-bit strings *are already encoded*, by their
very nature. I mean, I understand it's useful because Python does
non-unicode encodings like 'hex', but I don't really understand *why*. The
benefits don't seem to outweigh the cost (but that's hindsight.)
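
(What 2.x effectively does under the covers when you call a unicode codec
on a str, sketched out; the implicit decode step is what raises the
UnicodeDecodeError quoted above:)

    import sys
    s = "abc\xf0"
    # s.encode("latin-1") is roughly equivalent to:
    s.decode(sys.getdefaultencoding()).encode("latin-1")   # the decode blows up first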

Directly encoding a (byte) string into a unicode encoding is mostly useless,
as you've shown. The only use-case I can think of is translating ASCII in,
for instance, EBCDIC. Encoding anything into an ASCII superset is a no-op,
unless the system encoding isn't 'ascii' (and that's pretty rare, and not
something a Python programmer should depend on.) On the other hand, the fact
that (byte) strings have an 'encode' method creates a lot of confusion in
unicode-newbies, and causes programs to break only when input is non-ASCII.
And non-ASCII input just happens too often and too unpredictably in
'real-world' code, and not enough in European programmers' tests ;P

Unicode objects and strings are not the same thing. We shouldn't treat them
as the same thing. They share an interface (like lists and tuples do), and
if you only use that interface, treating them as the same kind of object is
mostly ok. They actually share *less* of an interface than lists and tuples,
though, as comparing strings to unicode objects can raise an exception,
whereas comparing lists to tuples is not expected to. For anything less
trivial than indexing, slicing and most of the string methods, and anything
what so ever involving non-ASCII (or, rather, non-system-encoding), unicode
objects and strings *must* be treated separately. For instance, there is no
correct way to do:

s.split("\x80")

unless you know the type of 's'. If it's unicode, you want u"\x80" instead
of "\x80". If it's not unicode, splitting "\x80" may not even be sensible,
but you wouldn't know from looking at the code -- maybe it expects a
specific encoding (or encoding family), maybe not. As soon as you deal with
unicode, you need to really understand the concept, and too many programmers
don't. And it's very hard to tell from someone's comments whether they fail
to understand or just get some of the terminology wrong; that's why Guido's
comments about 'encoding a byte string' and 'what if the file encoding is
Unicode' scare me. The unicode/string mixup almost makes me wish Python
was statically typed.
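
(A concrete 2.x session illustrating the point:)

    >>> "abc\x80def".split("\x80")
    ['abc', 'def']
    >>> u"abc\x80def".split(u"\x80")
    [u'abc', u'def']
    >>> u"abc\x80def".split("\x80")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)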

So please, please, please don't make the mistake of 'doing something' with
the 'encoding' argument to 'bytes(s, encoding)' when 's' is a (byte) string.
It wouldn't actually be usable except for the same things as 'str.encode':
to convert from ASCII to non-ASCII-supersets, or to convert to non-unicode
encodings (such as 'hex'.) You can achieve those two by doing, e.g.,
'bytes(s.encode('hex'))' if you really want to. Ignoring the encoding
(rather than raising an exception) would also allow code to be trivially
portable between Python 2.x and Py3K, when "" is actually a unicode object.

Not that I'm happy with ignoring anything, but not ignoring would be bigger
crime here.

Oh, and while on the subject, I'm not convinced going all-unicode in Py3K is
a good idea either, but maybe I should save that discussion for PyCon. I'm
not thinking "why do we need unicode" anymore (which I did two years ago ;)
but I *am* thinking it'll be a big step for 90% of the programmers if they
have to grasp unicode and encodings to be able to even do 'raw_input()'
sensibly. I know I spend an inordinate amount of time trying to explain the
basics on #python on irc.freenode.net already.

--
Thomas Wouters <***@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Guido van Rossum
2006-02-14 23:13:33 UTC
Permalink
On 2/14/06, Thomas Wouters <***@xs4all.net> wrote:
> On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:
>
> > But adding an encoding doesn't help. The str.encode() method always
> > assumes that the string itself is ASCII-encoded, and that's not good
> > enough:
>
> > >>> "abc".encode("latin-1")
> > 'abc'
> > >>> "abc".decode("latin-1")
> > u'abc'
> > >>> "abc\xf0".decode("latin-1")
> > u'abc\xf0'
> > >>> "abc\xf0".encode("latin-1")
> > Traceback (most recent call last):
> > File "<stdin>", line 1, in ?
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
> > 3: ordinal not in range(128)

(Note that I've since been convinced that bytes(s) where type(s) ==
str should just return a bytes object containing the same bytes as s,
regardless of encoding. So basically you're preaching to the choir
now. The only remaining question is what if anything to do with an
encoding argment when the first argument is of type str...)

> These comments disturb me. I never really understood why (byte) strings grew
> the 'encode' method, since 8-bit strings *are already encoded*, by their
> very nature. I mean, I understand it's useful because Python does
> non-unicode encodings like 'hex', but I don't really understand *why*. The
> benefits don't seem to outweigh the cost (but that's hindsight.)

It may also have something to do with Jython compatibility (which has
str and unicode being the same thing) or 3.0 future-proofing.

> Directly encoding a (byte) string into a unicode encoding is mostly useless,
> as you've shown. The only use-case I can think of is translating ASCII into,
> for instance, EBCDIC. Encoding anything into an ASCII superset is a no-op,
> unless the system encoding isn't 'ascii' (and that's pretty rare, and not
> something a Python programmer should depend on.) On the other hand, the fact
> that (byte) strings have an 'encode' method creates a lot of confusion in
> unicode-newbies, and causes programs to break only when input is non-ASCII.
> And non-ASCII input just happens too often and too unpredictably in
> 'real-world' code, and not enough in European programmers' tests ;P

Oh, there are lots of ways that non-ASCII input can break code, you
don't have to invoke encode() on str objects to get that effect. :/

> Unicode objects and strings are not the same thing. We shouldn't treat them
> as the same thing.

Well in 3.0 they *will* be the same thing, and in Jython they already are.

> They share an interface (like lists and tuples do), and
> if you only use that interface, treating them as the same kind object is
> mostly ok. They actually share *less* of an interface than lists and tuples,
> though, as comparing strings to unicode objects can raise an exception,
> whereas comparing lists to tuples is not expected to.

No, it causes silent surprises since [1,2,3] != (1,2,3).

> For anything less
> trivial than indexing, slicing and most of the string methods, and anything
> what so ever involving non-ASCII (or, rather, non-system-encoding), unicode
> objects and strings *must* be treated separately. For instance, there is no
> correct way to do:
>
> s.split("\x80")
>
> unless you know the type of 's'. If it's unicode, you want u"\x80" instead
> of "\x80". If it's not unicode, splitting "\x80" may not even be sensible,
> but you wouldn't know from looking at the code -- maybe it expects a
> specific encoding (or encoding family), maybe not. As soon as you deal with
> unicode, you need to really understand the concept, and too many programmers
> don't. And it's very hard to tell from someone's comments whether they fail
> to understand or just get some of the terminology wrong; that's why Guido's
> comments about 'encoding a byte string' and 'what if the file encoding is
> Unicode' scare me. The unicode/string mixup almost makes me wish Python
> was statically typed.

I'm mostly trying to reflect various broken mental models that users
may have. Believe me, my own confusion is nothing compared to the
confusion that occurs in less gifted users. :-)

The only use case for mixing ASCII and Unicode that I *wanted* to work
right was the mixing of pure ASCII strings (typically literals) with
Unicode data. And that works.

Where things unfortunately fall flat is when you start reading data
from files or interactive input and it gives you some encoded str
object instead of a Unicode object. Our mistake was that we didn't
foresee this clearly enough. Perhaps open(filename).read(), where the
file contains non-ASCII bytes, should have been changed to either
return a Unicode string (if an encoding can somehow be guessed), or
raise an exception, rather than returning an str object in some
unknown (and usually unknowable) encoding.

I hope to fix that in 3.0 too, BTW.

> So please, please, please don't make the mistake of 'doing something' with
> the 'encoding' argument to 'bytes(s, encoding)' when 's' is a (byte) string.
> It wouldn't actually be usable except for the same things as 'str.encode':
> to convert from ASCII to non-ASCII-supersets, or to convert to non-unicode
> encodings (such as 'hex'.) You can achieve those two by doing, e.g.,
> 'bytes(s.encode('hex'))' if you really want to. Ignoring the encoding
> (rather than raising an exception) would also allow code to be trivially
> portable between Python 2.x and Py3K, when "" is actually a unicode object.
>
> Not that I'm happy with ignoring anything, but not ignoring would be bigger
> crime here.

I'm beginning to see that this is a pretty reasonable interpretation.

> Oh, and while on the subject, I'm not convinced going all-unicode in Py3K is
> a good idea either, but maybe I should save that discussion for PyCon. I'm
> not thinking "why do we need unicode" anymore (which I did two years ago ;)
> but I *am* thinking it'll be a big step for 90% of the programmers if they
> have to grasp unicode and encodings to be able to even do 'raw_input()'
> sensibly. I know I spend an inordinate amount of time trying to explain the
> basics on #python on irc.freenode.net already.

I'm actually hoping that by having all strings be Unicode we'd
*reduce* the amount of confusion. The key (see above where I admitted
this as our biggest Unicode mistake) is to make sure that the
encoding/decoding is built into all I/O operations.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Martin v. Löwis
2006-02-14 06:52:13 UTC
Permalink
Phillip J. Eby wrote:
> I was just pointing out that since byte strings are bytes by definition,
> then simply putting those bytes in a bytes() object doesn't alter the
> existing encoding. So, using latin-1 when converting a string to bytes
> actually seems like the One Obvious Way to do it.

This is a misconception. In Python 2.x, the type str already *is* a
bytes type. So if S is an instance of 2.x str, bytes(S) does not need
to do any conversion. You don't need to assume it is latin-1: it's
already bytes.

> In fact, the 'encoding' argument seems useless in the case of str objects,
> and it seems it should default to latin-1 for unicode objects.

I agree with the former, but not with the latter. There shouldn't be a
conversion of Unicode objects to bytes at all. If you want bytes from
a Unicode string U, write

bytes(U.encode(encoding))

Regards,
Martin
James Y Knight
2006-02-14 16:08:30 UTC
Permalink
On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:

> Phillip J. Eby wrote:
>> I was just pointing out that since byte strings are bytes by
>> definition,
>> then simply putting those bytes in a bytes() object doesn't alter the
>> existing encoding. So, using latin-1 when converting a string to
>> bytes
>> actually seems like the One Obvious Way to do it.
>
> This is a misconception. In Python 2.x, the type str already *is* a
> bytes type. So if S is an instance of 2.x str, bytes(S) does not need
> to do any conversion. You don't need to assume it is latin-1: it's
> already bytes.
>
>> In fact, the 'encoding' argument seems useless in the case of str
>> objects,
>> and it seems it should default to latin-1 for unicode objects.
>
> I agree with the former, but not with the latter. There shouldn't be a
> conversion of Unicode objects to bytes at all. If you want bytes from
> a Unicode string U, write
>
> bytes(U.encode(encoding))

I like it, it makes sense. Unicode strings are simply not allowed as
arguments to the byte constructor. Thinking about it, why would it be
otherwise? And if you're mixing str-strings and unicode-strings, that
means the str-strings you're sometimes giving are actually not byte
strings, but character strings anyhow, so you should be encoding
those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

Kill the encoding argument, and you're left with:

Python2.X:
- bytes(bytes_object) -> copy constructor
- bytes(str_object) -> copy the bytes from the str to the bytes object
- bytes(sequence_of_ints) -> make bytes with the values of the ints,
error on overflow

Python3.X removes str, and most APIs that did return str return bytes
instead. Now all you have is:
- bytes(bytes_object) -> copy constructor
- bytes(sequence_of_ints) -> make bytes with the values of the ints,
error on overflow

Nice and simple.
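
(A minimal toy sketch of those constructor rules in today's Python, purely
to make the behaviour concrete; this is not an actual or proposed
implementation:)

    class bytes(object):
        def __init__(self, initializer):
            if isinstance(initializer, bytes):
                # copy constructor
                self._data = list(initializer._data)
            elif isinstance(initializer, str):
                # copy the raw bytes of the str, one int per byte
                self._data = [ord(c) for c in initializer]
            elif isinstance(initializer, unicode):
                # unicode is simply not allowed; encode it explicitly first
                raise TypeError("cannot construct bytes from unicode")
            else:
                # sequence of ints, error on overflow
                self._data = []
                for i in initializer:
                    if not 0 <= i <= 255:
                        raise ValueError("byte value out of range: %r" % (i,))
                    self._data.append(i)

        def __repr__(self):
            return "bytes(%r)" % (self._data,)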

James
Phillip J. Eby
2006-02-14 16:25:01 UTC
Permalink
At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:

>On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:
>
>>Phillip J. Eby wrote:
>>>I was just pointing out that since byte strings are bytes by
>>>definition,
>>>then simply putting those bytes in a bytes() object doesn't alter the
>>>existing encoding. So, using latin-1 when converting a string to
>>>bytes
>>>actually seems like the One Obvious Way to do it.
>>
>>This is a misconception. In Python 2.x, the type str already *is* a
>>bytes type. So if S is an instance of 2.x str, bytes(S) does not need
>>to do any conversion. You don't need to assume it is latin-1: it's
>>already bytes.
>>
>>>In fact, the 'encoding' argument seems useless in the case of str
>>>objects,
>>>and it seems it should default to latin-1 for unicode objects.
>>
>>I agree with the former, but not with the latter. There shouldn't be a
>>conversion of Unicode objects to bytes at all. If you want bytes from
>>a Unicode string U, write
>>
>> bytes(U.encode(encoding))
>
>I like it, it makes sense. Unicode strings are simply not allowed as
>arguments to the byte constructor. Thinking about it, why would it be
>otherwise? And if you're mixing str-strings and unicode-strings, that
>means the str-strings you're sometimes giving are actually not byte
>strings, but character strings anyhow, so you should be encoding
>those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

Actually, I think you mean:

    if isinstance(s_or_U, str):
        s_or_U = s_or_U.decode('utf-8')

    b = bytes(s_or_U.encode('utf-8'))

Or maybe:

    if isinstance(s_or_U, unicode):
        s_or_U = s_or_U.encode('utf-8')

    b = bytes(s_or_U)

Which is why I proposed that the boilerplate logic get moved *into* the
bytes constructor. I think this use case is going to be common in today's
Python, but in truth I'm not as sure what bytes() will get used *for* in
today's Python. I'm probably overprojecting based on the need to use str
objects now, but bytes aren't going to be a replacement for str for a good
while anyway.


>Kill the encoding argument, and you're left with:
>
>Python2.X:
>- bytes(bytes_object) -> copy constructor
>- bytes(str_object) -> copy the bytes from the str to the bytes object
>- bytes(sequence_of_ints) -> make bytes with the values of the ints,
>error on overflow
>
>Python3.X removes str, and most APIs that did return str return bytes
>instead. Now all you have is:
>- bytes(bytes_object) -> copy constructor
>- bytes(sequence_of_ints) -> make bytes with the values of the ints,
>error on overflow
>
>Nice and simple.

I could certainly live with that approach, and it certainly rules out all
the "when does the encoding argument apply and when should it be an error
to pass it" questions. :)
James Y Knight
2006-02-14 18:36:26 UTC
Permalink
On Feb 14, 2006, at 11:25 AM, Phillip J. Eby wrote:
> At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
>> I like it, it makes sense. Unicode strings are simply not allowed as
>> arguments to the byte constructor. Thinking about it, why would it be
>> otherwise? And if you're mixing str-strings and unicode-strings, that
>> means the str-strings you're sometimes giving are actually not byte
>> strings, but character strings anyhow, so you should be encoding
>> those too. bytes(s_or_U.encode('utf-8')) is a perfectly good
>> spelling.
> Actually, I think you mean:
>
>     if isinstance(s_or_U, str):
>         s_or_U = s_or_U.decode('utf-8')
>
>     b = bytes(s_or_U.encode('utf-8'))
>
> Or maybe:
>
>     if isinstance(s_or_U, unicode):
>         s_or_U = s_or_U.encode('utf-8')
>
>     b = bytes(s_or_U)
>
> Which is why I proposed that the boilerplate logic get moved *into*
> the bytes constructor. I think this use case is going to be common
> in today's Python, but in truth I'm not as sure what bytes() will
> get used *for* in today's Python. I'm probably overprojecting
> based on the need to use str objects now, but bytes aren't going to
> be a replacement for str for a good while anyway.


I most certainly *did not* mean that. If you are mixing together str
and unicode instances, the str instances _must be_ in the default
encoding (ascii). Otherwise, you are bound for failure anyhow, e.g.
''.join(['\x95', u'1']). Str is used for two things right now: 1) a
byte string. 2) a unicode string restricted to 7bit ASCII. These two
uses are separate and you cannot mix them without causing disaster.

You've created an interface which can take either a utf8 byte-string,
or unicode character string. But that's wrong and can only cause
problems. It should take either an encoded bytestring, or a unicode
character string. Not both. If it takes a unicode character string,
there are two ways of spelling that in current python: a "str" object
with only ASCII in it, or a "unicode" object with arbitrary
characters in it. bytes(s_or_U.encode('utf-8')) works correctly with
both.
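
(For instance, in a 2.x session; the ASCII-only str and the unicode string
take the same path, while a non-ASCII str fails loudly instead of silently
passing bogus bytes through:)

    >>> "abc".encode('utf-8')
    'abc'
    >>> u"abc".encode('utf-8')
    'abc'
    >>> u"caf\xe9".encode('utf-8')
    'caf\xc3\xa9'
    >>> "caf\xe9".encode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 3: ordinal not in range(128)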

James
M.-A. Lemburg
2006-02-14 16:47:39 UTC
Permalink
James Y Knight wrote:
> Kill the encoding argument, and you're left with:
>
> Python2.X:
> - bytes(bytes_object) -> copy constructor
> - bytes(str_object) -> copy the bytes from the str to the bytes object
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,
> error on overflow
>
> Python3.X removes str, and most APIs that did return str return bytes
> instead. Now all you have is:
> - bytes(bytes_object) -> copy constructor
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,
> error on overflow
>
> Nice and simple.

Albeit, too simple.

The above approach would basically remove the possibility to easily
create bytes() from literals in Py3k, since literals in Py3k create
Unicode objects, e.g. bytes("123") would not work in Py3k.

It's hard to imagine how you'd provide a decent upgrade path
for bytes() if you introduce the above semantics in Py2.x.

People would start writing bytes("123") in Py2.x and expect
it to also work in Py3k, which it wouldn't.

To prevent this, you'd have to rule out bytes() construction
from strings altogether, which doesn't look like a viable
option either.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Feb 14 2006)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
James Y Knight
2006-02-14 18:35:44 UTC
Permalink
On Feb 14, 2006, at 11:47 AM, M.-A. Lemburg wrote:
> The above approach would basically remove the possibility to easily
> create bytes() from literals in Py3k, since literals in Py3k create
> Unicode objects, e.g. bytes("123") would not work in Py3k.

That is true. And I think that is correct. There should be b"string"
syntax.

> It's hard to imagine how you'd provide a decent upgrade path
> for bytes() if you introduce the above semantics in Py2.x.
>
> People would start writing bytes("123") in Py2.x and expect
> it to also work in Py3k, which it wouldn't.

Agreed, it won't work.

> To prevent this, you'd have to rule out bytes() construction
> from strings altogether, which doesn't look like a viable
> option either.

I don't think you have to do that, you just have to provide b"string".

I'd like to point out that the previous proposal had the same issue:

On Feb 13, 2006, at 8:11 PM, Guido van Rossum wrote:
> On 2/13/06, James Y Knight <***@fuhm.net> wrote:
>> In py3k, when the str object is eliminated, then what do you have?
>> Perhaps
>> - bytes("\x80"), you get an error, encoding is required. There is no
>> such thing as "default encoding" anymore, as there's no str object.
>> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
>> single byte of value 0x80.
>>
>
> Yes to both again.

James
Josiah Carlson
2006-02-14 17:28:54 UTC
Permalink
James Y Knight <***@fuhm.net> wrote:
> I like it, it makes sense. Unicode strings are simply not allowed as
> arguments to the byte constructor. Thinking about it, why would it be
> otherwise? And if you're mixing str-strings and unicode-strings, that
> means the str-strings you're sometimes giving are actually not byte
> strings, but character strings anyhow, so you should be encoding
> those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

I also like the removal of the encoding...

> Kill the encoding argument, and you're left with:
>
> Python2.X:
> - bytes(bytes_object) -> copy constructor
> - bytes(str_object) -> copy the bytes from the str to the bytes object
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,
> error on overflow
>
> Python3.X removes str, and most APIs that did return str return bytes
> instead. Now all you have is:
> - bytes(bytes_object) -> copy constructor
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,
> error on overflow

What's great is that this already works:

>>> import array
>>> array.array('b', [1,2,3])
array('b', [1, 2, 3])
>>> array.array('b', "hello")
array('b', [104, 101, 108, 108, 111])
>>> array.array('b', u"hello")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: array initializer must be list or string
>>> array.array('b', [150])
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OverflowError: signed char is greater than maximum
>>> array.array('B', [150])
array('B', [150])
>>> array.array('B', [350])
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OverflowError: unsigned byte integer is greater than maximum


And out of the deal we can get both signed and unsigned ints.

Re: Adam Olsen
> I'm starting to wonder, do we really need anything fancy? Wouldn't it
> be sufficient to have a way to compactly store 8-bit integers?

It already exists. It could just use another interface. The buffer
interface offers any array the ability to return strings. That may have
to change to return bytes objects in Py3k.
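
(For example, today the buffer interface already hands the bytes back as a
str, in a 2.x session:)

    >>> import array
    >>> array.array('b', "hello").tostring()
    'hello'
    >>> buffer(array.array('b', "hello"))[:]
    'hello'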

- Josiah
Guido van Rossum
2006-02-13 23:15:23 UTC
Permalink
On 2/13/06, Phillip J. Eby <***@telecommunity.com> wrote:
> Actually, I thought we were talking about adding bytes() in 2.5.

I was.

> However, now that you've brought this up, it actually makes perfect sense
> to just use latin-1 as the effective encoding for both strings and
> unicode. In Python 2.x, strings are byte strings by definition, so it's
> only in 3.0 that an encoding would be required. And again, latin1 is a
> reasonable, roundtrippable default encoding.
>
> So, it sounds like making the encoding default to latin-1 would be a
> reasonably safe approach in both 2.x and 3.x.

I disagree. IMO the same reasons why we don't do this now for the
conversion between str and unicode stands for bytes.

> >While we're at it: I'd suggest that we remove the auto-conversion
> >from bytes to Unicode in Py3k and the default encoding along with
> >it. In Py3k the standard lib will have to be Unicode compatible
> >anyway and string parser markers like "s#" will have to go away
> >as well, so there's not much need for this anymore.

I don't know yet what the C API will look like in 3.0. But it may well
have to support auto-conversion from Unicode to char* using some
system default encoding (e.g. the Windows default code page?) in order
to be able to conveniently wrap OS APIs that use char* instead of some
sort of Unicode (and each OS has its own way of interpreting char* as
Unicode -- I believe Apple uses UTF-8?).

> I thought all this was already in the plan for 3.0, but maybe I assume too
> much. :)

In Py3k, I can see two reasonable approaches to conversion between
strings (Unicode) and bytes: always require an explicit encoding, or
assume ASCII. Anything else is asking for trouble IMO.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum
2006-02-13 23:10:50 UTC
Permalink
On 2/13/06, M.-A. Lemburg <***@egenix.com> wrote:
> Guido van Rossum wrote:
> > It'd be cruel and unusual punishment though to have to write
> >
> > bytes("abc", "Latin-1")
> >
> > I propose that the default encoding (for basestring instances) ought
> > to be "ascii" just like everywhere else. (Meaning, it should really be
> > the system default encoding, which defaults to "ascii" and is
> > intentionally hard to change.)
>
> We're talking about Py3k here: "abc" will be a Unicode string,
> so why restrict the conversion to 7 bits when you can have 8 bits
> without any conversion problems ?

As Phillip guessed, I was indeed thinking about introducing bytes()
sooner than that, perhaps even in 2.5 (though I don't want anything
rushed).

Even in Py3k though, the encoding issue stands -- what if the file
encoding is Unicode? Then using Latin-1 to encode bytes by default
might not by what the user expected. Or what if the file encoding is
something totally different? (Cyrillic, Greek, Japanese, Klingon.)
Anything default but ASCII isn't going to work as expected. ASCII
isn't going to work as expected either, but it will complain loudly
(by throwing a UnicodeError) whenever you try it, rather than causing
subtle bugs later.

> While we're at it: I'd suggest that we remove the auto-conversion
> from bytes to Unicode in Py3k and the default encoding along with
> it.

I'm not sure which auto-conversion you're talking about, since there
is no bytes type yet. If you're talking about the auto-conversion from
str to unicode: the bytes type should not be assumed to have *any*
properties that the current str type has, and that includes
auto-conversion.

> In Py3k the standard lib will have to be Unicode compatible
> anyway and string parser markers like "s#" will have to go away
> as well, so there's not much need for this anymore.
>
> (Maybe a bit radical, but I guess that's what Py3k is meant for.)

Right.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
M.-A. Lemburg
2006-02-14 17:58:11 UTC
Permalink
Guido van Rossum wrote:
> On 2/13/06, M.-A. Lemburg <***@egenix.com> wrote:
>> Guido van Rossum wrote:
>>> It'd be cruel and unusual punishment though to have to write
>>>
>>> bytes("abc", "Latin-1")
>>>
>>> I propose that the default encoding (for basestring instances) ought
>>> to be "ascii" just like everywhere else. (Meaning, it should really be
>>> the system default encoding, which defaults to "ascii" and is
>>> intentionally hard to change.)
>> We're talking about Py3k here: "abc" will be a Unicode string,
>> so why restrict the conversion to 7 bits when you can have 8 bits
>> without any conversion problems ?
>
> As Phillip guessed, I was indeed thinking about introducing bytes()
> sooner than that, perhaps even in 2.5 (though I don't want anything
> rushed).

Hmm, that is probably going to be too early. As the thread shows
there are lots of things to take into account, esp. since if you
plan to introduce bytes() in 2.x, the upgrade path to 3.x would
have to be carefully planned. Otherwise, we end up introducing
a feature which is meant to prepare for 3.x and then we end up
causing breakage when the move is finally implemented.

> Even in Py3k though, the encoding issue stands -- what if the file
> encoding is Unicode? Then using Latin-1 to encode bytes by default
> might not be what the user expected. Or what if the file encoding is
> something totally different? (Cyrillic, Greek, Japanese, Klingon.)
> Anything default but ASCII isn't going to work as expected. ASCII
> isn't going to work as expected either, but it will complain loudly
> (by throwing a UnicodeError) whenever you try it, rather than causing
> subtle bugs later.

I think there's a misunderstanding here: in Py3k, all "string"
literals will be converted from the source code encoding to
Unicode. There are no ambiguities - a Klingon character will still
map to the same ordinal used to create the byte content regardless
of whether the source file is encoded in UTF-8, UTF-16 or
some Klingon charset (are there any ?).

Furthermore, by restricting to ASCII you'd also rule out hex escapes
which seem to be the natural choice for presenting binary data in
literals - the Unicode representation would then only be an
implementation detail of the way Python treats "string" literals
and a user would certainly expect to find e.g. \x88 in the bytes object
if she writes bytes('\x88').

But maybe you have something different in mind... I'm talking
about ways to create bytes() in Py3k using "string" literals.

>> While we're at it: I'd suggest that we remove the auto-conversion
>> from bytes to Unicode in Py3k and the default encoding along with
>> it.
>
> I'm not sure which auto-conversion you're talking about, since there
> is no bytes type yet. If you're talking about the auto-conversion from
> str to unicode: the bytes type should not be assumed to have *any*
> properties that the current str type has, and that includes
> auto-conversion.

I was talking about the automatic conversion of 8-bit strings to
Unicode - which was a key feature to make the introduction of
Unicode less painful, but will no longer be necessary in Py3k.

>> In Py3k the standard lib will have to be Unicode compatible
>> anyway and string parser markers like "s#" will have to go away
>> as well, so there's not much need for this anymore.
>>
>> (Maybe a bit radical, but I guess that's what Py3k is meant for.)
>
> Right.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Feb 14 2006)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
Martin v. Löwis
2006-02-14 06:47:13 UTC
Permalink
M.-A. Lemburg wrote:
> We're talking about Py3k here: "abc" will be a Unicode string,
> so why restrict the conversion to 7 bits when you can have 8 bits
> without any conversion problems ?

YAGNI. If you have a need for byte string in source code, it will
typically be "random" bytes, which can be nicely used through

bytes([0x73, 0x9f, 0x44, 0xd2, 0xfb, 0x49, 0xa3, 0x14, 0x8b, 0xee])

For larger blocks, people should use base64.string_to_bytes (which
can become a synonym for base64.decodestring in Py3k).

If you have bytes that are meaningful text for some application
(say, a wire protocol), it is typically ASCII-Text. No protocol
I know of uses non-ASCII characters for protocol information.

Of course, you need a way to get .encode output as bytes somehow,
both in 2.5, and in Py3k. I suggest writing

bytes(s.encode(encoding))

In 2.5, bytes() can be constructed from strings, and will do a
conversion; in Py3k, .encode will already return a bytes object, so
this will be a no-op.

Regards,
Martin
Adam Olsen
2006-02-14 07:04:32 UTC
Permalink
On 2/13/06, "Martin v. Löwis" <***@v.loewis.de> wrote:
> M.-A. Lemburg wrote:
> > We're talking about Py3k here: "abc" will be a Unicode string,
> > so why restrict the conversion to 7 bits when you can have 8 bits
> > without any conversion problems ?
>
> YAGNI. If you have a need for byte string in source code, it will
> typically be "random" bytes, which can be nicely used through
>
> bytes([0x73, 0x9f, 0x44, 0xd2, 0xfb, 0x49, 0xa3, 0x14, 0x8b, 0xee])
>
> For larger blocks, people should use base64.string_to_bytes (which
> can become a synonym for base64.decodestring in Py3k).
>
> If you have bytes that are meaningful text for some application
> (say, a wire protocol), it is typically ASCII-Text. No protocol
> I know of uses non-ASCII characters for protocol information.

What would that imply for repr()? To support eval(repr(x)) it would
have to produce whatever format the source code includes to begin
with.

If I understand correctly there are three main candidates:
1. Direct copying to str in 2.x, pretending it's latin-1 in unicode in 3.x
2. Direct copying to str/unicode if it's only ascii values, switching
to a list of hex literals if there's any non-ascii values
3. b"foo" literal with ascii for all ascii characters (other than \
and "), \xFF for individual characters that aren't ascii

Given the choice I prefer the third option, with the second option as
my runner up. The first option just screams "silent errors" to me.


--
Adam Olsen, aka Rhamphoryncus
Martin v. Löwis
2006-02-14 07:14:57 UTC
Permalink
Adam Olsen wrote:
> What would that imply for repr()? To support eval(repr(x))

I don't think eval(repr(x)) needs to be supported for the bytes
type. However, if that is desirable, it should return something
like

bytes([1,2,3])

Regards,
Martin
Adam Olsen
2006-02-14 12:47:39 UTC
Permalink
On 2/14/06, "Martin v. Löwis" <***@v.loewis.de> wrote:
> Adam Olsen wrote:
> > What would that imply for repr()? To support eval(repr(x))
>
> I don't think eval(repr(x)) needs to be supported for the bytes
> type. However, if that is desirable, it should return something
> like
>
> bytes([1,2,3])

I'm starting to wonder, do we really need anything fancy? Wouldn't it
be sufficient to have a way to compactly store 8-bit integers?

In 2.x we could convert unicode like this:
bytes(ord(c) for c in u"It's...".encode('utf-8'))
u"It's...".byteencode('utf-8') # Shortcut for above

In 3.0 it changes to:
"It's...".encode('utf-8')
u"It's...".byteencode('utf-8') # Same as above, kept for compatibility

Passing a str or unicode directly to bytes() would be an error.
repr(bytes(...)) would produce bytes([1,2,3]).

Probably need a __bytes__() method that print can call, or even better
a __print__(file) method[0]. The write() methods would of course have
to support bytes objects.

I realize it would be odd for the interactive interpret to print them
as a list of ints by default:
>>> u"It's...".byteencode('utf-8')
[73, 116, 39, 115, 46, 46, 46]
But maybe it's time we stopped hiding the real nature of bytes from users?


[0] By this I mean calling objects recursively and telling them what
file to print to, rather than getting a temporary string from them and
printing that. I always wondered why you could do that from C
extensions but not from Python code.

--
Adam Olsen, aka Rhamphoryncus
Georg Brandl
2006-02-10 21:09:52 UTC
Permalink
Guido van Rossum wrote:

> Next, the schedule. Neal's draft of the schedule has us releasing 2.5
> in October. That feels late -- nearly two years after 2.4 (which was
> released on Nov 30, 2004). Do people think it's reasonable to strive
> for a more aggressive (by a month) schedule, like this:
>
> alpha 1: May 2006
> alpha 2: June 2006
> beta 1: July 2006
> beta 2: August 2006
> rc 1: September 2006
> final: September 2006
>
> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.

I am not experienced in releasing, but with the multitude of new things
introduced in Python 2.5, could it be a good idea to release an early alpha
not long after all (most of?) the desired features are in the trunk?
That way people would get to testing sooner and the number of non-obvious
bugs may be reduced (I'm thinking of the import PEP, the implementation of
which is bound to be hairy, or "with" in its full extent).

Georg
Neal Norwitz
2006-02-12 06:38:10 UTC
Permalink
On 2/10/06, Georg Brandl <***@gmx.net> wrote:
>
> I am not experienced in releasing, but with the multitude of new things
> introduced in Python 2.5, could it be a good idea to release an early alpha
> not long after all (most of?) the desired features are in the trunk?

In the past, all new features had to be in before beta 1 IIRC (it
could have been beta 2 though). The goal is to get things in sooner,
preferably prior to alpha.

For 2.5, we should strive really hard to get features implemented
prior to alpha 1. Some of the changes (AST, ssize_t) are pervasive.
AST, while localized, ripped the guts out of something every script
needs (more or less). ssize_t touches just about everything it seems.

n
Thomas Wouters
2006-02-12 10:51:41 UTC
Permalink
On Sat, Feb 11, 2006 at 10:38:10PM -0800, Neal Norwitz wrote:
> On 2/10/06, Georg Brandl <***@gmx.net> wrote:

> > I am not experienced in releasing, but with the multitude of new things
> > introduced in Python 2.5, could it be a good idea to release an early alpha
> > not long after all (most of?) the desired features are in the trunk?

> In the past, all new features had to be in before beta 1 IIRC (it
> could have been beta 2 though). The goal is to get things in sooner,
> preferably prior to alpha.

Well, in the past, features -- even syntax changes -- have gone in between
the last beta and the final release (but reminding Guido might bring him to
tears of regret. ;) Features have also gone into what would have been
'bugfix releases' if you looked at the numbering alone (1.5 -> 1.5.1 ->
1.5.2, for instance.) "The past" doesn't have a very impressive track
record... However, beta 1 is a very good ultimate deadline, and it's been
stuck by for the last few years, AFAIK. But I concur with:

> For 2.5, we should strive really hard to get features implemented
> prior to alpha 1. Some of the changes (AST, ssize_t) are pervasive.
> AST, while localized, ripped the guts out of something every script
> needs (more or less). ssize_t touches just about everything it seems.

that as many features as possible, in particular the broad-touching ones,
should be in alpha 1.

--
Thomas Wouters <***@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Nick Coghlan
2006-02-11 11:04:53 UTC
Permalink
Guido van Rossum wrote:
> PEP 338 - support -m for modules in packages. I believe Nick Coghlan
> is close to implementing this. I'm fine with accepting it.

I just checked in a new version of PEP 338 that cleans up the approach so that
it provides support for any PEP 302 compliant packaging mechanism as well as
normal filesystem packages.

I've started a new thread for the discussion:
PEP 338 - Executing Modules as Scripts

Cheers,
Nick.

--
Nick Coghlan | ***@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
Neal Norwitz
2006-02-12 06:32:58 UTC
Permalink
On 2/10/06, Guido van Rossum <***@python.org> wrote:
>
> Next, the schedule. Neal's draft of the schedule has us releasing 2.5
> in October. That feels late -- nearly two years after 2.4 (which was
> released on Nov 30, 2004). Do people think it's reasonable to strive
> for a more aggressive (by a month) schedule, like this:
>
> alpha 1: May 2006
> alpha 2: June 2006
> beta 1: July 2006
> beta 2: August 2006
> rc 1: September 2006
> final: September 2006

I think this is very reasonable. Based on Martin's message and if we
can get everyone fired up and implementing, it would possible to start
in April. I'll update the PEP for starting in May now. We can revise
further later.

> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.

I think PyCon is too early, but 3 alphas is a good idea. I'll add
this as well. Probably separated by 3-4 weeks so it doesn't change
the schedule much. The exact schedule will still change based on
release manager availability and other stuff that needs to be
implemented.

> > PEP 353: Using ssize_t as the index type
>
> Neal tells me that this is in progress in a branch, but that the code
> is not yet flawless (tons of warnings etc.). Martin, can you tell us
> more? When do you expect this to land? Maybe aggressively merging into
> the HEAD and then releasing it as alpha would be a good way to shake
> out the final issues???

I'm tempted to say we should merge now. I know the branch works on
64-bit boxes. I can test on a 32-bit box if Martin hasn't already.
There will be a lot of churn fixing problems, but maybe we can get
more people involved.

n
Martin v. Löwis
2006-02-12 11:13:53 UTC
Permalink
Neal Norwitz wrote:
> I'm tempted to say we should merge now. I know the branch works on
> 64-bit boxes. I can test on a 32-bit box if Martin hasn't already.
> There will be a lot of churn fixing problems, but maybe we can get
> more people involved.

The ssize_t branch has now all the API I want it to have. I just
posted the PEP to comp.lang.python, maybe people have additional
things they consider absolutely necessary.

There are two aspects left, and both can be done after the merge:
- a lot of modules still need adjustments, to really support
64-bit collections. This shouldn't cause any API changes, AFAICT.

- the printing of Py_ssize_t values should be supported. I think
Tim proposed to provide the 'z' formatter across platforms.
This is a new API, but it's a pure extension, so it can be
done in the trunk.

I would like to avoid changing APIs after the merge to the trunk
has happened; I remember Guido saying (a few years ago) that this
change must be a single large change, rather many small incremental
changes. I agree, and I hope I have covered everything that needs
to be covered.

Regards,
Martin
Guido van Rossum
2006-02-13 19:12:42 UTC
Permalink
On 2/12/06, "Martin v. Löwis" <***@v.loewis.de> wrote:
> Neal Norwitz wrote:
> > I'm tempted to say we should merge now. I know the branch works on
> > 64-bit boxes. I can test on a 32-bit box if Martin hasn't already.
> > There will be a lot of churn fixing problems, but maybe we can get
> > more people involved.
>
> The ssize_t branch has now all the API I want it to have. I just
> posted the PEP to comp.lang.python, maybe people have additional
> things they consider absolutely necessary.
>
> There are two aspects left, and both can be done after the merge:
> - a lot of modules still need adjustments, to really support
> 64-bit collections. This shouldn't cause any API changes, AFAICT.
>
> - the printing of Py_ssize_t values should be supported. I think
> Tim proposed to provide the 'z' formatter across platforms.
> This is a new API, but it's a pure extension, so it can be
> done in the trunk.

Great news. I'm looking forward to getting this over with!

> I would like to avoid changing APIs after the merge to the trunk
> has happened; I remember Guido saying (a few years ago) that this
> change must be a single large change, rather many small incremental
> changes. I agree, and I hope I have covered everything that needs
> to be covered.

Let me qualify that a bit -- I'd be okay with one honking big change
followed by some minor adjustments. I'd say that, since you've already
done so much in the branch, we're quickly approaching the point where
the extra testing we get from merging soon outweighs the problems
some folks may experience due to the branch not being perfect yet.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Phillip J. Eby
2006-02-10 21:07:50 UTC
Permalink
At 12:21 PM 2/10/2006 -0800, Guido van Rossum wrote:
> > PEP 343: The "with" Statement
>
>Didn't Michael Hudson have a patch?

PEP 343's "Accepted" status was reverted to "Draft" in October, and then
changed back to "Accepted". I believe the latter change is an error, since
you haven't pronounced on the changes. Have you reviewed the __context__
stuff that was added?

In any case Michael's patch was pre-AST branch merge, and no longer
reflects the current spec.


>PEP 332 - byte vectors. Looks incomplete. Put off until 2.6?

Wasn't the plan to just make this a builtin version of array.array for
bytes, plus a .decode method and maybe a few other tweaks? We presumably
won't be able to .encode() to bytes or get bytes from sockets and files
until 3.0, but having the type and being able to write it to files and
sockets would be nice. I'm not sure about the b"" syntax, ISTR it was
controversial but I don't remember if there was a resolution.


>PEP 314 (metadata v1.1): this is marked as completed, but there's a
>newer PEP available: PEP 345 (metadata v1.2). That PEP has 2.5 as its
>target date. Shouldn't we implement it? (This is a topic that I
>haven't followed closely.) There's also the question whether 314
>should be marked final. Andrew or Richard?

I'm concerned that both metadata PEPs push to define syntax for things that
have undefined semantics. And worse, to define incompatible syntax in some
cases. PEP 345 for example, dictates the use of StrictVersion syntax for
the required version of Python and the version of external requirements,
but Python's own version numbers don't conform to strict version
syntax. ISTM that the metadata standard needs more work, especially since
PyPI doesn't actually support using all of the metadata provided by the
implemented version of the standard. There's no way to search for
requires/provides, for example (which is one reason why I went with
distribution names for dependency resolution in setuptools). Also, the
specs don't allow for a Maintainer distinct from the package Author, even
though the distutils themselves allow this. IMO, 345 needs to go back to
the drawing board, and I'm not really thrilled with the currently-useless
"requires/provides" stuff in PEP 314.

If we do anything with the package metadata in Python 2.5, I'd like it to
be *installing* PKG-INFO files alongside the packages, using a filename of
the form "distributionname-version-py2.5.someext". Setuptools supports
such files currently under the ".egg-info" extension, but I'd be just as
happy with '.pkg-info' if it becomes a Python standard addition to the
installation. Having this gives most of the benefits of PEP 262 (database
of installed packages), although I wouldn't mind extending the PKG-INFO
file format to include some of the PEP 262 additional data.
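
(For reference, a PKG-INFO file is just a block of RFC 822-style headers;
a made-up example in the current 1.1 format might look like:)

    Metadata-Version: 1.1
    Name: ExampleDist
    Version: 0.9
    Summary: An example distribution
    Author: A. N. Other
    Author-email: someone@example.org
    License: PSF
    Requires: zlib
    Provides: exampledist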

These are probably distutils-sig and/or catalog-sig topics; I just mainly
wanted to point out that 314, 345, and 262 need at least some tweaking and
possibly rethinking before any push to implementation.
Guido van Rossum
2006-02-10 21:29:30 UTC
Permalink
On 2/10/06, Phillip J. Eby <***@telecommunity.com> wrote:

I'm not following up to anything that Phillip wrote (yet), but his
response reminded me of two more issues:

- wsgiref, an implementation of PEP 333 (Web Server Gateway
Interface). I think this might make a good addition to the standard
library. The web-sig has been discussing additional things that might
be proposed for addition but I believe there's no consensus -- in any
case we ought to be conservative.
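
(For context, the interface PEP 333 standardizes is deliberately tiny; an
entire "hello world" application is just a callable like this sketch:)

    def application(environ, start_response):
        # environ is a CGI-style dict; start_response takes a status and a header list
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['Hello, world!\n']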

- setuplib? Wouldn't it make sense to add this to the 2.5 stdlib?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Georg Brandl
2006-02-10 21:38:59 UTC
Permalink
Guido van Rossum wrote:

> - setuplib? Wouldn't it make sense to add this to the 2.5 stdlib?

If you mean setuptools, I'm a big +1 (if it's production-ready by that time).
Together with a whipped up cheese shop we should finally be able to put up
something equal to cpan/rubygems.

Georg
Michael Hudson
2006-02-12 23:30:27 UTC
Permalink
"Phillip J. Eby" <***@telecommunity.com> writes:

> At 12:21 PM 2/10/2006 -0800, Guido van Rossum wrote:
>> > PEP 343: The "with" Statement
>>
>>Didn't Michael Hudson have a patch?
>
> PEP 343's "Accepted" status was reverted to "Draft" in October, and then
> changed back to "Accepted". I believe the latter change is an error, since
> you haven't pronounced on the changes. Have you reviewed the __context__
> stuff that was added?
>
> In any case Michael's patch was pre-AST branch merge, and no longer
> reflects the current spec.

It also never quite reflected the spec at the time, although I forget
the detail it didn't support :/

Cheers,
mwh

--
81. In computing, turning the obvious into the useful is a living
definition of the word "frustration".
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Brett Cannon
2006-02-08 05:39:19 UTC
Permalink
On 2/7/06, Neal Norwitz <***@gmail.com> wrote:
> On 2/7/06, Fredrik Lundh <***@pythonware.com> wrote:
> > >
> > > what's the current release plan for Python 2.5, btw? I cannot find a
> > > relevant PEP, and the "what's new" says "late 2005":
> > >
> > but I don't think that anyone followed up on this. what's the current
> > status ?
>
> Guido and I had a brief discussion about this. IIRC, he was thinking
> alpha around March and release around summer. I think this is
> aggressive with all the things still to do. We really need to get the
> ssize_t branch integrated.
>
> There are a bunch of PEPs that have been accepted (or close), but not
> implemented. I think these include (please correct me, so we can get
> a good list):
>
> http://www.python.org/peps/
>
> SA 308 Conditional Expressions
> SA 328 Imports: Multi-Line and Absolute/Relative
> SA 342 Coroutines via Enhanced Generators
> S 343 The "with" Statement
> S 353 Using ssize_t as the index type
>
> This one should be marked as final I believe:
>
> SA 341 Unifying try-except and try-finally
>

Supposedly Guido is close on pronouncing on PEP 352 (Required
Superclass for Exceptions), or so he said last time that thread came
about.

-Brett