Fluent Python - Chapter 2. An Array of Sequences
Container sequences: can hold different types. They contain references to the objects.
list, tuple, collections.deque
Flat sequences: can hold one type. They physically sotre the value.
str, bytes, bytearray, memoryview, array.array
Another way of grouping:
Mutable seuqences:
list, byte, array.array, collections.deque and memoryview
Immutable sequences:
tuple, str, bytes
List Comprehensions and Generator Expressions
Readability
List comprehension is meant to do one thing only: to build a new list. It is possible to abuse it to write very incomprehensible code, you should avoid it.
1 | symbols = '$¢£¥€¤' |
[36, 162, 163, 165, 8364, 164]
No leak in Python 3
1 | x = 'ABC' |
'ABC'
1 | dummy |
[65, 66, 67]
Versus map and filter
1 | beyond_ascii = [ord(s) for s in symbols if ord(s) > 127] |
[162, 163, 165, 8364, 164]
1 | beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols))) |
[162, 163, 165, 8364, 164]
Cartesian Products
Multiple Loops
1 | colors = ['black', 'white'] |
[('black', 'S'),
('black', 'M'),
('black', 'L'),
('white', 'S'),
('white', 'M'),
('white', 'L')]
We can also arrange them by size
1 | colors = ['black', 'white'] |
[('black', 'S'),
('white', 'S'),
('black', 'M'),
('white', 'M'),
('black', 'L'),
('white', 'L')]
Generator Expressions To initialize tuples, arrays, and other tyoes of sequences, you can use Generator Expressions (genexps).
Genexps saves memory because it yields items one by one.
Generator Expressions use the same syntax as list comprehension, but are enclosed in parentheses rather than brackets.
1 | symobls = '$¢£¥€¤' |
(36, 162, 163, 165, 8364, 164)
If you have millions of items, only to feed the loop, genexp can save the expense of building a list for that.
1 | colors = ['black', 'white'] |
black S
black M
black L
white S
white M
white L
Tuples Ate Not Just Immutable Lists
Tuples do double duty:
- immutable lists
- as records with no field names
Tuple as Records
1 | lax_coordinates = (33.9425, -118.408056) |
BRA/CE342567
ESP/XDA205856
USA/31195855
1 | for country, _ in traveler_ids: |
USA
BRA
ESP
Tuple Unpacking
parallel assignment
1 | lax_coordinates = (33.9425, -118.408056) |
33.9425
1 | longitude |
-118.408056
swap
1 | a = 1 |
2 1
Prefixing an argument with a start
1 | divmod(20, 8) |
(2, 4)
1 | t = (20, 8) |
(2, 4)
1 | quotient, remainder = divmod(*t) |
(2, 4)
Focus on certain part
1 | import os |
'idrsa.pub'
Using * to grab excess items
Without a *, it ignore the rest elements while unpacking.
1 | a, b, rest = range(5) |
1 | a, b, *rest = range(5) |
(0, 1, [2, 3, 4])
1 | a, b, *rest = range(3) |
(0, 1, [2])
Any place
1 | a, *body, c, d = range(5) |
(0, [1, 2], 3, 4)
1 | *head, b, c, d = range(5) |
([0, 1], 2, 3, 4)
Nested Tuple Unpacking
Unpacking nested tuple, like (a, b, (c, d))
1 | metro_areas = [ |
| lat. | long.
Mexico City | 19.4333 | -99.1333
New York-Newark | 40.8086 | -74.0204
Sao Paulo | -23.5478 | -46.6358
Named Tuples
1 | from collections import namedtuple |
City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
1 | tokyo.population |
36.933
1 | tokyo.coordinates |
(35.689722, 139.691667)
Named tuple attributes
1 | City._fields |
('name', 'country', 'population', 'coordinates')
1 | LatLong = namedtuple('LatLong', 'lat long') |
{'coordinates': LatLong(lat=28.613889, long=77.208889),
'country': 'IN',
'name': 'Delhi NCR',
'population': 21.935}
You can also use *.
1 | delhi = City(*delhi_data) |
City(name='Delhi NCR', country='IN', population=21.935, coordinates=LatLong(lat=28.613889, long=77.208889))
Tuple as Immutable Lists
Tuple supports all list methods that do not involve adding or removing items, like
s.__iadd__, s.__append__. s.clear(), s.copy(), s.__delitem__, s.__insert__, s.__imul__, s.__reversed__()
reversed(tuple) works without s.__reversed__. (list has it only for optimization)
Slicing
s[a:b:c]: a to b with a c stride.
1 | s = 'bicycle' |
'bye'
1 | s[::-1] |
'elcycib'
1 | s[::-2] |
'eccb'
1 | l = list(range(10)) |
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 | l[2:5] = [20, 30] |
[0, 1, 20, 30, 5, 6, 7, 8, 9]
1 | del l[5:7] |
[0, 1, 20, 30, 5, 8, 9]
1 | l[3::2] = [11, 22] |
[0, 1, 20, 11, 5, 22, 9]
The value must be a iterable
1 | l[2:5] = 100 |
1 | l[2:5] = [100] |
[0, 1, 100, 22, 9]
Using + and * with Sequences
How to create list of lists?
Wrong way
1 | weird_board = [['_'] * 3] * 3 # The same reference got added 3 times |
[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
1 | weird_board[1][2] = 'O' |
[['_', '_', 'O'], ['_', '_', 'O'], ['_', '_', 'O']]
Right way
1 | board = [['_'] * 3 for i in range(3)] |
[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
Augmented Assignment with Sequences
+= : __iadd__
, in-place add
*= : __imul__
, in-place multiply
Puzzle
What happend next?
- Error
- Or success
1 | t = (1, 2, [30, 40]) |
Answer: Both happened
1 | Traceback (most recent call last): |
1 | t |
(1, 2, [30, 40, 50, 60])
Look at bytecode
1 | import dis |
1 | 1 0 LOAD_NAME 0 (s) |
- Put the value of s[a] on TOS (Top Of Stack).
- Perform TOS += b. This succeeds if TOS refers to a mutable object.
- Assign s[a] = TOS. This fails if s is immutable.
list.sort and sorted
list.sort: in-place, return None
sorted: create a copy
Two optional arguments:
reverse
: key
:
Managing Ordered Sequences with bisect
bisect and insort.
bisect is alias for bisect_right.
The element equal to the found value, will be inserted on the right side.
bisect_left insert the new to the left.
1 | import bisect |
haystack -> 1 4 5 6 8 12 15 20 21 23 23 26 29 30
31 @ 14 | | | | | | | | | | | | | |31
30 @ 14 | | | | | | | | | | | | | |30
29 @ 13 | | | | | | | | | | | | |29
23 @ 11 | | | | | | | | | | |23
22 @ 9 | | | | | | | | |22
10 @ 5 | | | | |10
8 @ 5 | | | | |8
5 @ 3 | | |5
2 @ 1 |2
1 @ 1 |1
0 @ 0 0
1 | bisect_fn = bisect.bisect_left |
haystack -> 1 4 5 6 8 12 15 20 21 23 23 26 29 30
31 @ 14 | | | | | | | | | | | | | |31
30 @ 13 | | | | | | | | | | | | |30
29 @ 12 | | | | | | | | | | | |29
23 @ 9 | | | | | | | | |23
22 @ 9 | | | | | | | | |22
10 @ 5 | | | | |10
8 @ 4 | | | |8
5 @ 2 | |5
2 @ 1 |2
1 @ 0 1
0 @ 0 0
insort keep the order and insert new element.
1 | import bisect |
10 -> [10]
0 -> [0, 10]
6 -> [0, 6, 10]
8 -> [0, 6, 8, 10]
7 -> [0, 6, 7, 8, 10]
2 -> [0, 2, 6, 7, 8, 10]
10 -> [0, 2, 6, 7, 8, 10, 10]
When a List is Not the Answer
10 million floating-point values: array
is more efficient
removing and adding items fro mthe ends of a list: deque
works faster
a lot of containment checks: set
are optimized for fast membership checking
Arrays
If the list will only contain numbers, an array.array
is more efficient than a list.
1 | from array import array |
0.1288579230853678
Deques
1 | from collections import deque |
deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
1 | dq.rotate(3) |
deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)
1 | dq.rotate(-4) |
deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)
1 | dq.appendleft(-1) |
deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
1 | dq.extend([11, 12, 33]) |
deque([3, 4, 5, 6, 7, 8, 9, 11, 12, 33], maxlen=10)
1 | dq.extendleft([10, 20, 30, 40]) |
deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)
Other Queues:
queue
: thread-safe
multiprocessing
: its own bounded Queue
asyncio
: Queue, LifoQUeue, PriorityQueue
heapq
: heappop, heappush