PEP 467 – Minor API improvements for binary sequences¶
- PEP
467
- Title
Minor API improvements for binary sequences
- Version
$Revision$
- Last-Modified
- Author
Nick Coghlan <ncoghlan at gmail.com>, Ethan Furman <ethan at stoneleaf.us>
- Status
Deferred
- Type
Standards Track
- Content-Type
- Created
2014-03-30
- Python-Version
3.9
- Post-History
2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01
Contents
Abstract¶
During the initial development of the Python 3 language specification, the
core bytes type for arbitrary binary data started as the mutable type
that is now referred to as bytearray. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.
This PEP proposes five small adjustments to the APIs of the bytes and
bytearray types to make it easier to operate entirely in the binary domain:
Discourage passing single integer values to
bytesandbytearrayAdd
bytes.fromsizeandbytearray.fromsizealternative constructorsAdd
bytes.fromordandbytearray.fromordalternative constructorsAdd
bytes.getbyteandbytearray.getbytebyte retrieval methodsAdd
bytes.iterbytesandbytearray.iterbytesalternative iterators
And one built-in:
* ``bchr``
PEP Deferral¶
This PEP has been deferred until Python 3.9 at the earliest, as the open questions aren’t currently expected to be resolved in time for the Python 3.8 feature addition deadline in May 2019 (if you’re keen to see these changes implemented and are willing to drive that resolution process, contact the PEP authors).
Proposals¶
Discourage use of current “zero-initialised sequence” behaviour¶
Currently, the bytes and bytearray constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size:
>>> bytes(3)
b'\x00\x00\x00'
>>> bytearray(3)
bytearray(b'\x00\x00\x00')
This PEP proposes to update the documentation to discourage making use of that
input type dependent behaviour in Python 3.9, suggesting to use a new, more
explicit, bytes.fromsize(n) or bytearray.fromsize(n) spelling instead
(see next section).
However, the current handling of numeric inputs in the default constructors would remain in place indefinitely to avoid introducing a compatibility break.
No other changes are proposed to the existing constructors.
Addition of explicit “count and byte initialised sequence” constructors¶
To replace the now discouraged behaviour, this PEP proposes the addition of an
explicit fromsize alternative constructor as a class method on both
bytes and bytearray whose first argument is the count, and whose
second argument is the fill byte to use (defaults to \x00):
>>> bytes.fromsize(3)
b'\x00\x00\x00'
>>> bytearray.fromsize(3)
bytearray(b'\x00\x00\x00')
>>> bytes.fromsize(5, b'\x0a')
b'\x0a\x0a\x0a\x0a\x0a'
>>> bytearray.fromsize(5, b'\x0a')
bytearray(b'\x0a\x0a\x0a\x0a\x0a')
fromsize will behave just as the current constructors behave when passed a
single integer, while allowing for non-zero fill values when needed.
Similar to str.center, str.ljust, and str.rjust, both parameters
would be positional-only with no externally visible name.
Addition of “bchr” function and explicit “single byte” constructors¶
As binary counterparts to the text chr function, this PEP proposes
the addition of a bchr function and an explicit fromord alternative
constructor as a class method on both bytes and bytearray:
>>> bchr(ord("A"))
b'A'
>>> bchr(ord(b"A"))
b'A'
>>> bytes.fromord(65)
b'A'
>>> bytearray.fromord(65)
bytearray(b'A')
These methods will only accept integers in the range 0 to 255 (inclusive):
>>> bytes.fromord(512)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: integer must be in range(0, 256)
>>> bytes.fromord(1.0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'float' object cannot be interpreted as an integer
While this does create some duplication, there are valid reasons for it:
the
bchrbuiltin is to recreate theord/chr/unichrtrio from Python 2 under a different naming scheme (however, see the Open Questions section below)the class method is mainly for the
bytearray.fromordcase, withbytes.fromordadded for consistency
The documentation of the ord builtin will be updated to explicitly note
that bchr is the primary inverse operation for binary data, while chr
is the inverse operation for text data, and that bytes.fromord and
bytearray.fromord also exist.
Behaviourally, bytes.fromord(x) will be equivalent to the current
bytes([x]) (and similarly for bytearray). The new spelling is
expected to be easier to discover and easier to read (especially when used
in conjunction with indexing operations on binary sequence types).
As a separate method, the new spelling will also work better with higher
order functions like map.
Addition of “getbyte” method to retrieve a single byte¶
This PEP proposes that bytes and bytearray gain the method getbyte
which will always return bytes:
>>> b'abc'.getbyte(0)
b'a'
If an index is asked for that doesn’t exist, IndexError is raised:
>>> b'abc'.getbyte(9)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index out of range
Addition of optimised iterator methods that produce bytes objects¶
This PEP proposes that bytes and bytearray``gain an optimised
``iterbytes method that produces length 1 bytes objects rather than
integers:
for x in data.iterbytes():
# x is a length 1 ``bytes`` object, rather than an integer
For example:
>>> tuple(b"ABC".iterbytes())
(b'A', b'B', b'C')
Design discussion¶
Why not rely on sequence repetition to create zero-initialised sequences?¶
Zero-initialised sequences can be created via sequence repetition:
>>> b'\x00' * 3
b'\x00\x00\x00'
>>> bytearray(b'\x00') * 3
bytearray(b'\x00\x00\x00')
However, this was also the case when the bytearray type was originally
designed, and the decision was made to add explicit support for it in the
type constructor. The immutable bytes type then inherited that feature
when it was introduced in PEP 3137.
This PEP isn’t revisiting that original design decision, just changing the
spelling as users sometimes find the current behaviour of the binary sequence
constructors surprising. In particular, there’s a reasonable case to be made
that bytes(x) (where x is an integer) should behave like the
bytes.fromord(x) proposal in this PEP. Providing both behaviours as separate
class methods avoids that ambiguity.
Why use positional-only parameters?¶
This is for consistency with the other methods on the affected types, and to avoid having to devise sensible names for them.
Open Questions¶
Given the “multiple ways to do it” outcome, is the proposed
bchrbuiltin actually worth adding, or isord/bytes.fromord/chra sufficiently straightforward replacement for the Python 2ord/chr/unichroperations?Do we add
iterbytestomemoryview, or modifymemoryview.cast()to accept's'as a single-byte interpretation? Or do we ignorememoryviewfor now and add it later?
References¶
- 1
Initial March 2014 discussion thread on python-ideas (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
- 2
Guido’s initial feedback in that thread (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
- 3
Issue proposing moving zero-initialised sequences to a dedicated API (http://bugs.python.org/issue20895)
- 4
Issue proposing to use calloc() for zero-initialised binary sequences (http://bugs.python.org/issue21644)
- 5
August 2014 discussion thread on python-dev (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
- 6
June 2016 discussion thread on python-dev (https://mail.python.org/pipermail/python-dev/2016-June/144875.html)