Use Collections Effectively
Problem
Python provides multiple collection types (list, tuple, dict, set) and specialized variants in the collections module. Choosing the wrong collection leads to poor performance or unnecessarily complex code.
This guide shows how to select and use Python collections effectively.
Core Collection Types
List - Ordered, Mutable Sequence
Lists maintain order and allow modifications.
users = ["alice", "bob", "charlie"]
users.append("david")
users[0] = "ALICE"
users.sort()
numbers = [1, 2, 3, 4, 5]
numbers.append(6)
numbers.insert(0, 0) # Insert at beginning
numbers.remove(3)
last = numbers.pop()
first = numbers.pop(0)
third = numbers[2]
subset = numbers[1:4]
numbers.sort()
numbers.sort(reverse=True)
squares = [x ** 2 for x in range(10)]
evens = [x for x in numbers if x % 2 == 0]Tuple - Ordered, Immutable Sequence
Tuples are like lists but immutable.
point = (10, 20)
rgb = (255, 128, 0)
x, y = point
r, g, b = rgb
from typing import NamedTuple
class Point(NamedTuple):
x: float
y: float
p = Point(10.0, 20.0)
print(p.x, p.y) # Named access
print(p[0], p[1]) # Also supports indexing
def get_user():
return "alice", 30, "alice@example.com" # Tuple
name, age, email = get_user() # UnpackingDict - Key-Value Mapping
Dictionaries map keys to values with O(1) lookup.
user = {
"id": 1,
"name": "Alice",
"email": "alice@example.com"
}
email = user.get("email")
phone = user.get("phone", "N/A") # Default if missing
user["age"] = 30
if "email" in user:
send_email(user["email"])
del user["age"]
for key in user:
print(key)
for key, value in user.items():
print(f"{key}: {value}")
word_lengths = {word: len(word) for word in ["hello", "world"]}
squared = {x: x ** 2 for x in range(5)}
defaults = {"color": "blue", "size": 10}
custom = {"size": 20}
merged = defaults | custom # {"color": "blue", "size": 20}
defaults.update(custom)Set - Unordered, Unique Elements
Sets ensure uniqueness with fast membership testing.
numbers = {1, 2, 3, 4, 5}
unique_words = set(["hello", "world", "hello"]) # {"hello", "world"}
numbers.add(6)
numbers.remove(3) # Raises KeyError if not present
numbers.discard(3) # No error if not present
if 5 in numbers:
print("Found")
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(a | b) # {1, 2, 3, 4, 5, 6}
print(a.union(b))
print(a & b) # {3, 4}
print(a.intersection(b))
print(a - b) # {1, 2}
print(a.difference(b))
print(a ^ b) # {1, 2, 5, 6}
print(a.symmetric_difference(b))
even_squares = {x ** 2 for x in range(10) if x % 2 == 0}Performance Comparison
import timeit
big_list = list(range(10000))
print(timeit.timeit(lambda: 9999 in big_list, number=1000)) # ~0.15s
big_set = set(range(10000))
print(timeit.timeit(lambda: 9999 in big_set, number=1000)) # ~0.00003sSpecialized Collections
defaultdict - Auto-Initialize Missing Keys
from collections import defaultdict
items = [
("fruit", "apple"),
("vegetable", "carrot"),
("fruit", "banana"),
("vegetable", "lettuce")
]
groups = {}
for category, item in items:
if category not in groups:
groups[category] = []
groups[category].append(item)
groups = defaultdict(list)
for category, item in items:
groups[category].append(item)
print(groups) # {"fruit": ["apple", "banana"], "vegetable": ["carrot", "lettuce"]}
word_counts = defaultdict(int)
for word in ["hello", "world", "hello"]:
word_counts[word] += 1 # Auto-initializes to 0
print(word_counts) # {"hello": 2, "world": 1}Counter - Count Hashable Objects
from collections import Counter
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
counts = Counter(words)
print(counts) # Counter({"apple": 3, "banana": 2, "cherry": 1})
print(counts.most_common(2)) # [("apple", 3), ("banana", 2)]
counter1 = Counter(["a", "b", "c"])
counter2 = Counter(["b", "c", "d"])
combined = counter1 + counter2 # Counter({"b": 2, "c": 2, "a": 1, "d": 1})
text = "hello world"
char_counts = Counter(text)
print(char_counts) # Counter({"l": 3, "o": 2, "h": 1, ...})deque - Double-Ended Queue
from collections import deque
queue = []
queue.append(1)
queue.append(2)
first = queue.pop(0) # O(n) - shifts all elements
queue = deque()
queue.append(1)
queue.append(2)
first = queue.popleft() # O(1)
recent = deque(maxlen=3)
recent.append(1)
recent.append(2)
recent.append(3)
recent.append(4) # Automatically removes 1
print(recent) # deque([2, 3, 4])
items = deque([1, 2, 3, 4, 5])
items.rotate(2) # Rotate right
print(items) # deque([4, 5, 1, 2, 3])namedtuple - Lightweight Data Classes
from collections import namedtuple
Point = namedtuple("Point", ["x", "y"])
point = Point(10, 20)
print(point.x, point.y) # Named access
print(point[0], point[1]) # Index access
x, y = point # Unpacking
Point = namedtuple("Point", ["x", "y"], defaults=[0, 0])
origin = Point() # Point(x=0, y=0)
data = {"name": "Alice", "age": 30}
User = namedtuple("User", data.keys())
user = User(**data)
print(user.name, user.age)OrderedDict - Preserve Insertion Order
from collections import OrderedDict
class LRUCache:
def __init__(self, capacity):
self.cache = OrderedDict()
self.capacity = capacity
def get(self, key):
if key not in self.cache:
return None
self.cache.move_to_end(key) # Mark as recently used
return self.cache[key]
def put(self, key, value):
if key in self.cache:
self.cache.move_to_end(key)
self.cache[key] = value
if len(self.cache) > self.capacity:
self.cache.popitem(last=False) # Remove oldestCollection Patterns
Remove While Iterating
numbers = [1, 2, 3, 4, 5]
for num in numbers:
if num % 2 == 0:
numbers.remove(num) # RuntimeError or wrong result
numbers = [1, 2, 3, 4, 5]
odd_numbers = [num for num in numbers if num % 2 != 0]
odd_numbers = list(filter(lambda x: x % 2 != 0, numbers))
numbers[:] = [num for num in numbers if num % 2 != 0]
data = {"a": 1, "b": 2, "c": 3}
to_remove = [k for k, v in data.items() if v % 2 == 0]
for key in to_remove:
del data[key]Flatten Nested Collections
nested = [[1, 2], [3, 4], [5, 6]]
flat = [item for sublist in nested for item in sublist]
print(flat) # [1, 2, 3, 4, 5, 6]
from itertools import chain
flat = list(chain.from_iterable(nested))
def flatten(items):
for item in items:
if isinstance(item, list):
yield from flatten(item)
else:
yield item
nested = [1, [2, 3], [[4, 5], 6]]
flat = list(flatten(nested)) # [1, 2, 3, 4, 5, 6]Group By Key
from itertools import groupby
from operator import itemgetter
from collections import defaultdict
data = [
{"category": "fruit", "name": "apple"},
{"category": "vegetable", "name": "carrot"},
{"category": "fruit", "name": "banana"}
]
groups = defaultdict(list)
for item in data:
groups[item["category"]].append(item["name"])
data_sorted = sorted(data, key=itemgetter("category"))
groups = {k: list(v) for k, v in groupby(data_sorted, key=itemgetter("category"))}Memory Efficiency
Generator Expressions vs List Comprehensions
squares = [x ** 2 for x in range(1_000_000)] # Uses ~8MB
total = sum(squares)
squares = (x ** 2 for x in range(1_000_000)) # Uses ~100 bytes
total = sum(squares)
def process_large_file(filename):
# Generator - processes line by line
with open(filename) as f:
for line in f: # File object is iterator
yield process_line(line)
for result in process_large_file("huge.txt"):
handle(result)Summary
Python’s core collections serve distinct purposes based on order requirements, mutability, and uniqueness constraints. Lists maintain order and allow modifications, making them the default choice for sequences. Tuples provide immutable sequences ideal for constants and multiple return values. Dictionaries enable O(1) lookups by key for fast data retrieval. Sets guarantee uniqueness and provide O(1) membership testing along with mathematical set operations.
Performance characteristics guide collection choice. Lists excel at indexed access and appending to the end but perform poorly for membership tests and removals from the beginning. Sets provide constant-time membership testing, making them far superior to lists when you frequently check if items exist. Dictionaries offer constant-time key lookups but consume more memory than lists due to hashing overhead.
The collections module provides specialized variants for common patterns. defaultdict eliminates boilerplate for initializing missing keys, automatically creating default values. Counter simplifies counting occurrences with built-in aggregation methods. deque provides fast operations at both ends, making it superior to lists for queue implementations. namedtuple creates lightweight classes with named fields without the overhead of full class definitions.
List comprehensions and generator expressions transform collections concisely. Comprehensions create new lists, dicts, or sets in single expressions more clearly than equivalent loops. Generator expressions provide the same syntax but yield items one at a time, dramatically reducing memory usage for large datasets. Choose comprehensions when you need the whole result, generators when processing items individually.
Modifying collections during iteration requires care to avoid errors and unexpected behavior. List comprehensions create new collections from filtered or transformed elements without mutation issues. For in-place modification, assign the comprehension result to a slice. For sets and dicts, collect keys to remove first, then delete them in a separate loop.
The right collection type makes code both clearer and faster. Use sets for membership testing, dicts for key-based lookup, deques for queues, and defaultdict for grouping. These specialized collections express intent better than generic lists while providing better performance for their specific use cases.