cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now

Discussion:

Serhiy Storchaka

2014-07-30 03:59:15 UTC

http://hg.python.org/cpython/rev/79a5fbe2c78f
changeset: 91935:79a5fbe2c78f
parent: 91933:fbd104359ef8
date: Tue Jul 29 19:41:11 2014 -0400
Issue #22003: When initialized from a bytes object, io.BytesIO() now
defers making a copy until it is mutated, improving performance and
memory use on some use cases.
Patch by David Wilson.

Did you compare this with issue #15381 [1]?

[1] http://bugs.python.org/issue15381

Serhiy Storchaka

2014-07-30 06:11:24 UTC

Permalink

Post by Serhiy Storchaka

Did you compare this with issue #15381 [1]?
[1] http://bugs.python.org/issue15381

Using microbenchmark from issue22003:

$ cat i.py
import io
word = b'word'
line = (word * int(79/len(word))) + b'\n'
ar = line * int((4 * 1048576) / len(line))
def readlines():
return len(list(io.BytesIO(ar)))
print('lines: %s' % (readlines(),))
$ ./python -m timeit -s 'import i' 'i.readlines()'

Before patch: 10 loops, best of 3: 46.9 msec per loop
After issue22003 patch: 10 loops, best of 3: 36.4 msec per loop
After issue15381 patch: 10 loops, best of 3: 27.6 msec per loop

Antoine Pitrou

2014-07-30 13:59:48 UTC

Permalink

Post by Serhiy Storchaka

Did you compare this with issue #15381 [1]?

Not really, but David's patch is simple enough and does a good job of
accelerating the read-only BytesIO case.

Post by Serhiy Storchaka
$ ./python -m timeit -s 'import i' 'i.readlines()'
Before patch: 10 loops, best of 3: 46.9 msec per loop
After issue22003 patch: 10 loops, best of 3: 36.4 msec per loop
After issue15381 patch: 10 loops, best of 3: 27.6 msec per loop

I'm surprised your patch does better here. Any idea why?

Regards

Antoine.

Serhiy Storchaka

2014-07-30 19:48:52 UTC

Permalink

Post by Antoine Pitrou

Post by Serhiy Storchaka

Did you compare this with issue #15381 [1]?

Not really, but David's patch is simple enough and does a good job of
accelerating the read-only BytesIO case.

Ignoring tests and comments my patch adds/removes/modifies about 200
lines, and David's patch -- about 150 lines of code. But it's __sizeof__
looks not correct, correcting it requires changing about 50 lines. In
sum the complexity of both patches is about equal.

Post by Antoine Pitrou

I'm surprised your patch does better here. Any idea why?

I didn't look at David's patch too close yet. But my patch includes
optimization for end-of-line scanning.

Antoine Pitrou

2014-07-30 21:23:25 UTC

Permalink

Post by Serhiy Storchaka
Ignoring tests and comments my patch adds/removes/modifies about 200
lines, and David's patch -- about 150 lines of code. But it's __sizeof__
looks not correct, correcting it requires changing about 50 lines. In
sum the complexity of both patches is about equal.

I meant that David's approach is conceptually simpler, which makes it
easier to review.
Regardless, there is no exclusive-OR here: if you can improve over the
current version, there's no reason not to consider it/

Post by Serhiy Storchaka
I didn't look at David's patch too close yet. But my patch includes
optimization for end-of-line scanning.

Ahah, unrelated stuff :-)

Serhiy Storchaka

2014-07-31 14:09:41 UTC

Permalink

Post by Antoine Pitrou
I meant that David's approach is conceptually simpler, which makes it
easier to review.
Regardless, there is no exclusive-OR here: if you can improve over the
current version, there's no reason not to consider it/

Unfortunately there is no anything common in implementations.
Conceptually David came in his last patch to same idea as in issue15381
but with different and less general implementation. To apply my patch
you need first rollback issue22003 changes (except tests).

dw+

2014-07-30 09:46:30 UTC

Permalink

Hi Serhiy,

At least conceptually, 15381 seems the better approach, but getting a
correct implementation may take more iterations than the (IMHO) simpler
change in 22003. For my tastes, the current 15381 implementation seems a
little too magical in relying on Py_REFCNT() as the sole indication that
a PyBytes can be mutated.

For the sake of haste, 22003 only addresses the specific regression
introduced in Python 3.x BytesIO, compared to 2.x StringI, where 3.x
lacked an equivalent no-copies specialization.

David

Zachary Ware

2014-07-30 20:11:51 UTC

Permalink

I'd like to point out a couple of compiler warnings on Windows:

On Tue, Jul 29, 2014 at 6:45 PM, antoine.pitrou

diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
--- a/Modules/_io/bytesio.c
+++ b/Modules/_io/bytesio.c
@@ -33,6 +37,45 @@
return NULL; \
}
+/* Ensure we have a buffer suitable for writing, in the case that an initvalue
+ * object was provided, and we're currently borrowing its buffer. `size'
+ * indicates the new buffer size allocated as part of unsharing, to avoid a
+ * redundant reallocation caused by any subsequent mutation. `truncate'
+ * indicates whether truncation should occur if `size` < self->string_size.
+ *
+ * Do nothing if the buffer wasn't shared. Returns 0 on success, or sets an
+ * exception and returns -1 on failure. Existing state is preserved on failure.
+ */
+static int
+unshare(bytesio *self, size_t preferred_size, int truncate)
+{
+ if (self->initvalue) {
+ Py_ssize_t copy_size;
+ char *new_buf;
+
+ if((! truncate) && preferred_size < self->string_size) {

..\Modules\_io\bytesio.c(56): warning C4018: '<' : signed/unsigned mismatch

+ preferred_size = self->string_size;
+ }
+
+ new_buf = (char *)PyMem_Malloc(preferred_size);
+ if (new_buf == NULL) {
+ PyErr_NoMemory();
+ return -1;
+ }
+
+ copy_size = self->string_size;
+ if (copy_size > preferred_size) {

..\Modules\_io\bytesio.c(67): warning C4018: '>' : signed/unsigned mismatch

+ copy_size = preferred_size;
+ }
+
+ memcpy(new_buf, self->buf, copy_size);
+ Py_CLEAR(self->initvalue);
+ self->buf = new_buf;
+ self->buf_size = preferred_size;
+ self->string_size = (Py_ssize_t) copy_size;
+ }
+ return 0;
+}
/* Internal routine to get a line from the buffer of a BytesIO
object. Returns the length between the current position to the

--
Zach