Collections Module
A container is an object that stores other objects and provides a way to access and iterate over them. Python's basic built-in container types are dict, list, set and tuple.
The collections module is a built-in module that implements specialized container data types providing alternatives to Python’s general purpose built-in containers.
Counter
- Counter is a dictionary subclass for counting the hashable elements of an iterable. Elements are stored as dictionary keys and their counts are stored as the values. Like a regular dict in Python 3.7+, a Counter keeps its keys in first-insertion order; most_common() returns them sorted by count.
# Syntax :
from collections import Counter
# Counter() with lists
my_list=[1,1,1,1,'a','a','a','a','c','c','a','c','r','r',3,3,7,5]
Counter(my_list)
# Output :
Counter({1: 4, 'a': 5, 'c': 3, 'r': 2, 3: 2, 7: 1, 5: 1})
# Counter with strings
Counter('SAmmersummer'.lower())
# Output :
Counter({'s': 2, 'a': 1, 'm': 4, 'e': 2, 'r': 2, 'u': 1})
# Counter with words in a sentence
str1 = 'Count the occurrences of each word in a sentence . The sentence is made of word[s]'
words = str1.split(" ")
Counter(words)
# Output :
Counter({'Count': 1,
'the': 1,
'occurrences': 1,
'of': 2,
'each': 1,
'word': 1,
'in': 1,
'a': 1,
'sentence': 2,
'.': 1,
'The': 1,
'is': 1,
'made': 1,
'word[s]': 1})
# Methods with Counter()
items= Counter(my_list)
items.most_common()
# Output :
[('a', 5), (1, 4), ('c', 3), ('r', 2), (3, 2), (7, 1), (5, 1)]
# Common patterns when using the Counter() object
my_list=[1,1,1,1,'a','a','a','a','c','c','a','c','r','r',3,3,7,5]
items= Counter(my_list)
# total counts of values / sum of all values
sum(items.values())                # output : 18
# list unique elements
list(items)                        # output : [1, 'a', 'c', 'r', 3, 7, 5]
# convert to a set
set(items)                         # output : {1, 3, 5, 7, 'a', 'c', 'r'}
# convert to a regular dictionary
dict(items)                        # output : {1: 4, 'a': 5, 'c': 3, 'r': 2, 3: 2, 7: 1, 5: 1}
# view the (elem, count) pairs; wrap in list() to get a list
items.items()                      # output : dict_items([(1, 4), ('a', 5), ('c', 3), ('r', 2), (3, 2), (7, 1), (5, 1)])
# convert a list of (elem, count) pairs to a Counter() object
pairs = [('a',1), ('b',2), ('c',3), ('d',4)]
Counter(dict(pairs))               # output : Counter({'a': 1, 'b': 2, 'c': 3, 'd': 4})
# n least common elements
n = 4
items.most_common()[:-n-1:-1]      # output : [(5, 1), (7, 1), (3, 2), ('r', 2)]
# reset all counts (run this last: the Counter is empty afterwards)
items.clear()                      # returns None; items is now Counter()
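Because a Counter stores elements as keys and counts as values, a few behaviours differ from a plain dictionary. The sketch below uses made-up sample data (the names morning and evening are illustrative only) to show that a missing key reports 0, and how update(), the + and - operators, and elements() work.

```python
from collections import Counter

# Hypothetical sample data, not taken from the examples above.
morning = Counter(['tea', 'coffee', 'tea'])
evening = Counter(['coffee', 'water'])

# A missing key reports a count of 0 instead of raising KeyError,
# and the lookup does not add the key.
missing_count = morning['juice']

# update() adds counts from another iterable (or mapping) in place.
morning.update(['tea', 'water'])

# Counters support arithmetic: + adds counts, - subtracts counts and
# drops any result that is zero or negative.
combined = morning + evening
difference = morning - evening

# elements() repeats each key as many times as its count.
expanded = sorted(evening.elements())

print(missing_count)
print(combined)
print(difference)
print(expanded)
```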
defaultdict
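- defaultdict is also a subclass of dict. It provides every dict method and takes a default_factory as its first argument: a callable (such as list, int or a lambda) that is called with no arguments to produce the value for a missing key. Using defaultdict is typically simpler and faster than doing the same with the dict.setdefault() method.
- If default_factory is omitted (None), looking up a missing key raises KeyError, just as with a regular dict.
- When default_factory is set, indexing a missing key with my_dict[key] does not raise KeyError. The factory's return value is stored under that key and returned. Methods such as get() do not call the factory.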
# Below code will generate a KeyError
my_dict = {}
print(my_dict['name'])
# Output :
Traceback (most recent call last):
  File "<string>", line 2, in <module>
KeyError: 'name'
# Syntax : defaultdict
from collections import defaultdict
my_dict = defaultdict(list)  # missing keys default to an empty list
my_dict1 = defaultdict(object)  # missing keys default to a new object()
my_dict2 = defaultdict(lambda: '-NA-')  # missing keys default to the string '-NA-'
print(my_dict1['name'])
print(my_dict2['name'])
# Output :
<object object at 0x7f584fa96bc0>
-NA-
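A common use of defaultdict(list) is grouping items by key. The following is a minimal sketch with made-up (category, item) pairs; it also shows the equivalent dict.setdefault() version, and what happens when no default_factory is supplied.

```python
from collections import defaultdict

# Hypothetical (category, item) pairs used only for illustration.
pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

# defaultdict(list): a missing key is created with list() on first access.
grouped = defaultdict(list)
for category, item in pairs:
    grouped[category].append(item)

# The same grouping with a plain dict and setdefault().
grouped_plain = {}
for category, item in pairs:
    grouped_plain.setdefault(category, []).append(item)

# Without a default_factory, a defaultdict behaves like a plain dict
# and raises KeyError for a missing key.
no_factory = defaultdict()
try:
    no_factory['name']
    raised = False
except KeyError:
    raised = True

# get() and `in` never call the factory, so they do not create keys.
lookup = grouped.get('dairy')

print(dict(grouped))
print(raised)
print(lookup)
```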
OrderedDict
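- OrderedDict is a subclass of dict that remembers the order in which the keys were inserted. Since Python 3.7 a regular dict also preserves insertion order, but OrderedDict still adds order-aware features such as move_to_end(), popitem(last=False) and order-sensitive equality.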
# Syntax : OrderedDict
from collections import OrderedDict
my_dict = OrderedDict()
my_dict['a'] = 1
my_dict['b'] = 2
my_dict['c'] = 3
my_dict['d'] = 4
print('Before Deleting')
for key, value in my_dict.items():
    print(key, value)
# deleting element
print('Deleting element\n')
my_dict.pop('a')
# Re-inserting the same
print('Re-inserting the same\n')
my_dict['a'] = 1
print('\nAfter re-inserting')
for key, value in my_dict.items():
    print(key, value)
# Output :
Before Deleting
a 1
b 2
c 3
d 4
Deleting element
Re-inserting the same
After re-inserting
b 2
c 3
d 4
a 1
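Beyond remembering insertion order, OrderedDict offers a few operations for rearranging and comparing by order. The sketch below, using a small illustrative dictionary, shows move_to_end(), popitem(last=False), and how equality between two OrderedDicts depends on key order.

```python
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

# move_to_end() repositions an existing key without deleting it.
od.move_to_end('a')               # order is now b, c, a
od.move_to_end('c', last=False)   # order is now c, b, a
order_after_moves = list(od)

# popitem(last=False) removes and returns the first entry.
first = od.popitem(last=False)

# Equality between two OrderedDicts is order-sensitive;
# plain dicts ignore order when compared.
same_order = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
plain_equal = dict(a=1, b=2) == dict(b=2, a=1)

print(order_after_moves)
print(first)
print(same_order, plain_equal)
```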
namedtuple
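- The standard tuple uses numerical indexes to access its members. namedtuple() builds a tuple subclass whose fields can also be accessed by name, which makes the code easier to read.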
# Example :
my_tuple = (3, 4, 5, 6)
print(my_tuple[2])
# Output :
5
# Syntax : namedtuple
from collections import namedtuple
Dog = namedtuple('Dog',['age','breed','name'])
sam = Dog(age=2,breed='Lab',name='Sammy')
frank = Dog(age=2,breed='Shepard',name="Frankie")
print("sam : {}".format(sam))
print("frank : {}".format(frank))
print("sam.breed : {}".format(sam.breed))
print("frank.name : {}".format(frank.name))
# Output :
sam : Dog(age=2, breed='Lab', name='Sammy')
frank : Dog(age=2, breed='Shepard', name='Frankie')
sam.breed : Lab
frank.name : Frankie
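A namedtuple is still a tuple, so it keeps positional access and immutability. The sketch below reuses the Dog definition from the example above to show index access, unpacking, and the _replace() and _asdict() helper methods.

```python
from collections import namedtuple

Dog = namedtuple('Dog', ['age', 'breed', 'name'])
sam = Dog(age=2, breed='Lab', name='Sammy')

# Fields are still reachable by position, and the tuple unpacks as usual.
breed_by_index = sam[1]
age, breed, name = sam

# Instances are immutable; assigning to a field raises AttributeError.
try:
    sam.age = 3
    mutated = True
except AttributeError:
    mutated = False

# _replace() returns a new instance with the selected fields changed.
older_sam = sam._replace(age=3)

# _asdict() converts the instance to a mapping keyed by field name.
as_dict = sam._asdict()

print(breed_by_index)
print(mutated)
print(older_sam)
print(as_dict)
```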
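deque (Double-Ended Queue)
- deque is a list-like container optimized for fast appends and pops at both ends. Both operations take approximately O(1) time, whereas inserting or removing at the front of a list takes O(n) time.
- The deque() constructor takes any iterable (for example a list) as an optional argument, plus an optional maxlen.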
# Syntax :
from collections import deque
# Declaring deque
my_deque = deque(['name','age','DOB'])
print(my_deque)
# Output :
deque(['name', 'age', 'DOB'])
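The deque constructor is not limited to lists. As a small sketch, the following builds deques from a string and a range, and uses the optional maxlen argument to keep only the most recent items; the variable names are illustrative.

```python
from collections import deque

# Any iterable works as the initial data, not only a list.
from_string = deque('abc')
from_range = deque(range(3))

# With maxlen set, appending to a full deque discards items
# from the opposite end.
recent = deque(maxlen=3)
for value in range(5):
    recent.append(value)

print(from_string)
print(from_range)
print(recent)
```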
- Inserting Elements in deque :
- Elements can be inserted at both ends of a deque: append() inserts on the right and appendleft() inserts on the left.
# append() and appendleft() :
from collections import deque
my_deque = deque([1,2,3])
print("Original deque : {}".format(my_deque))
my_deque.appendleft(0)
print("appendleft(0) : {}".format(my_deque))
my_deque.append(4)
print("append(4) : {}".format(my_deque))
# Output :
Original deque : deque([1, 2, 3])
appendleft(0) : deque([0, 1, 2, 3])
append(4) : deque([0, 1, 2, 3, 4])
- Removing Elements in deque :
- Elements can also be removed from both ends of a deque: pop() removes from the right and popleft() removes from the left. Both methods return the removed element.
# pop() and popleft() :
from collections import deque
my_deque = deque([1,2,3])
print("Original deque : {}".format(my_deque))
my_deque.popleft()
print("popleft() : {}".format(my_deque))
my_deque.pop()
print("pop() : {}".format(my_deque))
# Output :
Original deque : deque([1, 2, 3])
popleft() : deque([2, 3])
pop() : deque([2])
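Combining the append and pop methods gives queue and stack behaviour. The sketch below, with made-up task names, uses append() with popleft() as a first-in first-out queue and append() with pop() as a last-in first-out stack, and shows that popping from an empty deque raises IndexError.

```python
from collections import deque

# FIFO queue: add on the right, remove from the left.
queue = deque()
for task in ['first', 'second', 'third']:
    queue.append(task)
served = [queue.popleft() for _ in range(len(queue))]

# LIFO stack: add and remove on the same (right) end.
stack = deque()
for task in ['first', 'second', 'third']:
    stack.append(task)
popped = [stack.pop() for _ in range(len(stack))]

# Popping from an empty deque raises IndexError.
try:
    deque().pop()
    empty_pop_raised = False
except IndexError:
    empty_pop_raised = True

print(served)
print(popped)
print(empty_pop_raised)
```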