Discussion:
Why does _pyio.*.readinto have to work with 'b' arrays?
Nikolaus Rath
2014-06-14 22:39:19 UTC
Hello,

The _pyio.BufferedIOBase class contains the following hack to make sure
that you can read-into array objects with format 'b':

    try:
        b[:n] = data
    except TypeError as err:
        import array
        if not isinstance(b, array.array):
            raise err
        b[:n] = array.array('b', data)

I am now wondering if I should implement the same hack in BufferedReader
(cf. issue 20578). Is there anything special about 'b' arrays that
justifies treating them this way?

Note that readinto is supposed to work with any object implementing the
buffer protocol, but the Python implementation only works with
bytearrays and (with the above hack) 'b' arrays. Even using a 'B' array
fails:

>>> import _pyio
>>> from array import array
>>> buf = array('b', b'x' * 10)
>>> _pyio.open('/dev/zero', 'rb').readinto(buf)
10
>>> buf = array('B', b'x' * 10)
>>> _pyio.open('/dev/zero', 'rb').readinto(buf)
Traceback (most recent call last):
  File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 662, in readinto
    b[:n] = data
TypeError: can only assign array (not "bytes") to array slice

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 667, in readinto
    b[:n] = array.array('b', data)
TypeError: bad argument type for built-in operation
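
The same failure can be reproduced without _pyio or /dev/zero. The following is only a minimal sketch: the helper name try_slice_assign is made up for illustration, and it mimics the try/except fallback quoted earlier rather than calling the real readinto.

```python
from array import array


def try_slice_assign(typecode, data):
    """Hypothetical helper: apply the quoted try/except fallback to a fresh array.

    Returns "direct" or "fallback" when an assignment succeeds, otherwise
    the name of the exception raised by the fallback assignment.
    """
    n = len(data)
    target = array(typecode, [0] * n)
    try:
        target[:n] = data  # bytes cannot be assigned to an array slice
        return "direct"
    except TypeError:
        try:
            # Only succeeds when the target's typecode is also 'b'.
            target[:n] = array('b', data)
            return "fallback"
        except TypeError as err:
            return type(err).__name__


outcomes = {code: try_slice_assign(code, b"foo") for code in "bBi"}
print(outcomes)
```

Only the 'b' target takes the fallback path successfully; any other typecode ends in the second TypeError, which matches the traceback shown for the 'B' array.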


It seems to me that a much cleaner solution would be to simply declare
_pyio's readinto to only work with bytearrays, and to explicitly raise a
(more helpful) TypeError if anything else is passed in.
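
As a rough sketch of what that stricter behaviour could look like (the function name readinto_bytearray_only and the error message are assumptions for illustration, not existing _pyio code):

```python
import io
from array import array


def readinto_bytearray_only(stream, b):
    """Hypothetical sketch: read into b, rejecting anything but bytearray."""
    if not isinstance(b, bytearray):
        raise TypeError(
            "readinto() argument must be a bytearray, not %s"
            % type(b).__name__
        )
    data = stream.read(len(b))
    n = len(data)
    b[:n] = data  # bytes can be assigned directly to a bytearray slice
    return n


buf = bytearray(4)
count = readinto_bytearray_only(io.BytesIO(b"abcdef"), buf)

try:
    readinto_bytearray_only(io.BytesIO(b"abcdef"), array('B', [0] * 4))
except TypeError as err:
    message = str(err)
```

The caller gets one clear error up front instead of the chained traceback from the failed fallback.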


Best,
-Nikolaus
--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
Benjamin Peterson
2014-06-15 00:41:44 UTC
Post by Nikolaus Rath
It seems to me that a much cleaner solution would be to simply declare
_pyio's readinto to only work with bytearrays, and to explicitly raise a
(more helpful) TypeError if anything else is passed in.
That seems reasonable. I don't think _pyio's behavior is terribly
important compared to the C _io module.
Nick Coghlan
2014-06-15 04:31:36 UTC
Post by Benjamin Peterson
Post by Nikolaus Rath
It seems to me that a much cleaner solution would be to simply declare
_pyio's readinto to only work with bytearrays, and to explicitly raise a
(more helpful) TypeError if anything else is passed in.
That seems reasonable. I don't think _pyio's behavior is terribly
important compared to the C _io module.
_pyio was written before the various memoryview fixes that were
implemented in Python 3.3 - it seems to me it would make more sense to
use memoryview to correctly handle arbitrary buffer exporters (we
implemented similar fixes for the base64 module in 3.4).

Cheers,
Nick.
--
Nick Coghlan | ***@gmail.com | Brisbane, Australia
Nikolaus Rath
2014-06-15 04:57:12 UTC
Post by Nick Coghlan
Post by Benjamin Peterson
Post by Nikolaus Rath
It seems to me that a much cleaner solution would be to simply declare
_pyio's readinto to only work with bytearrays, and to explicitly raise a
(more helpful) TypeError if anything else is passed in.
That seems reasonable. I don't think _pyio's behavior is terribly
important compared to the C _io module.
_pyio was written before the various memoryview fixes that were
implemented in Python 3.3 - it seems to me it would make more sense to
use memoryview to correctly handle arbitrary buffer exporters (we
implemented similar fixes for the base64 module in 3.4).
Definitely. But is there a way to do that without writing C code?

>>> from array import array
>>> a = array('b', b'x'*10)
>>> am = memoryview(a)
>>> am[:3] = b'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview assignment: lvalue and rvalue have different structures
>>> am[:3] = memoryview(b'foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview assignment: lvalue and rvalue have different structures
>>> am.format = 'B'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute 'format' of 'memoryview' objects is not writable
>>> am[:3] = array('b', b'foo')

That last assignment works, but that's again specific to a being a 'b'-array.


Best,
-Nikolaus
Nick Coghlan
2014-06-15 06:37:48 UTC
Post by Nikolaus Rath
Post by Nick Coghlan
Post by Benjamin Peterson
Post by Nikolaus Rath
It seems to me that a much cleaner solution would be to simply declare
_pyio's readinto to only work with bytearrays, and to explicitly raise a
(more helpful) TypeError if anything else is passed in.
That seems reasonable. I don't think _pyio's behavior is terribly
important compared to the C _io module.
_pyio was written before the various memoryview fixes that were
implemented in Python 3.3 - it seems to me it would make more sense to
use memoryview to correctly handle arbitrary buffer exporters (we
implemented similar fixes for the base64 module in 3.4).
Definitely. But is there a way to do that without writing C code?
Yes, Python level reshaping and typecasting of memory views is one of
the key enhancements Stefan implemented for 3.3.

>>> from array import array
>>> a = array('b', b'x'*10)
>>> am = memoryview(a)
>>> a
array('b', [120, 120, 120, 120, 120, 120, 120, 120, 120, 120])
>>> am[:3] = memoryview(b'foo').cast('b')
>>> a
array('b', [102, 111, 111, 120, 120, 120, 120, 120, 120, 120])
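
The cast can also be applied to the destination instead of the source, which avoids having to know the target's format. The following is a sketch under that assumption; readinto_via_memoryview is a hypothetical name, and this is not presented as the code that _pyio uses.

```python
import io
from array import array


def readinto_via_memoryview(stream, b):
    """Hypothetical sketch: fill any writable buffer exporter from stream."""
    if not isinstance(b, memoryview):
        b = memoryview(b)
    b = b.cast('B')  # view the target as a flat run of unsigned bytes
    data = stream.read(len(b))
    n = len(data)
    b[:n] = data  # formats now match: both sides are 'B'
    return n


results = {}
for code in "bBi":
    target = array(code, [0] * 4)
    n = readinto_via_memoryview(io.BytesIO(b"\x01" * 64), target)
    results[code] = (n, target.tobytes())
```

Because the view is cast to 'B', the returned count is a number of bytes, not a number of array items, so an 'i' array of four items reports four times its itemsize.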

Cheers,
Nick.
Nikolaus Rath
2014-06-15 19:03:28 UTC
Post by Nick Coghlan
Post by Nikolaus Rath
Post by Nick Coghlan
Post by Benjamin Peterson
Post by Nikolaus Rath
It seems to me that a much cleaner solution would be to simply declare
_pyio's readinto to only work with bytearrays, and to explicitly raise a
(more helpful) TypeError if anything else is passed in.
That seems reasonable. I don't think _pyio's behavior is terribly
important compared to the C _io module.
_pyio was written before the various memoryview fixes that were
implemented in Python 3.3 - it seems to me it would make more sense to
use memoryview to correctly handle arbitrary buffer exporters (we
implemented similar fixes for the base64 module in 3.4).
Definitely. But is there a way to do that without writing C code?
Yes, Python level reshaping and typecasting of memory views is one of
the key enhancements Stefan implemented for 3.3.
[..]

Ah, nice. I'll use that. Thank you Stefan :-).


Best,
-Nikolaus
Victor Stinner
2014-06-15 09:31:43 UTC
Post by Benjamin Peterson
Post by Nikolaus Rath
It seems to me that a much cleaner solution would be to simply declare
_pyio's readinto to only work with bytearrays, and to explicitly raise a
(more helpful) TypeError if anything else is passed in.
That seems reasonable. I don't think _pyio's behavior is terribly
important compared to the C _io module.
Which types are accepted by the readinto() method of the C io module? If
the C module only accepts bytearray, the array hack must be removed from
_pyio.

The _pyio module is mostly used for testing purposes; it's much slower. I
hope that nobody uses it in production, since the module is private
(underscore prefix). So it's fine to break backward compatibility to have
the same behaviour as the C module.

Victor
Nikolaus Rath
2014-06-15 19:05:09 UTC
Post by Victor Stinner
Post by Benjamin Peterson
Post by Nikolaus Rath
It seems to me that a much cleaner solution would be to simply declare
_pyio's readinto to only work with bytearrays, and to explicitly raise a
(more helpful) TypeError if anything else is passed in.
That seems reasonable. I don't think _pyio's behavior is terribly
important compared to the C _io module.
Which types are accepted by the readinto() method of the C io module?
Everything implementing the buffer protocol.
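
A quick way to check this against the C implementation is to call readinto on an io.BytesIO with several writable buffer exporters. This is only an illustrative sketch; the variable names are made up, and io.BytesIO stands in for the buffered file objects discussed in the thread.

```python
import io
from array import array

source = b"\x01\x02\x03\x04\x05\x06\x07\x08"

# Four different writable objects that export the buffer protocol.
targets = [
    bytearray(8),
    array('b', bytes(8)),
    array('B', bytes(8)),
    memoryview(bytearray(8)),
]

results = []
for target in targets:
    n = io.BytesIO(source).readinto(target)
    results.append((type(target).__name__, n, bytes(target)))

for name, n, content in results:
    print(name, n, content == source)
```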
Post by Victor Stinner
If the C module only accepts bytearray, the array hack must be removed
from _pyio.
_pyio currently accepts only bytearray and 'b'-type arrays. But it seems
with memoryview.cast() we now have a way to make it behave like the C
module.


Best,
-Nikolaus