Generator comprehensions are particularly useful for working with large datasets, as they generate values on the fly rather than creating a large data structure in memory. This can help to improve performance and reduce memory usage.
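A minimal sketch of the idea: the generator expression below describes a million squares, but none of them exist until something consumes them.

```python
# A generator expression computes squares lazily, one value at a time.
squares = (n * n for n in range(1_000_000))

# Values are produced only as they are consumed, so the full
# million-element sequence never exists in memory at once.
first = next(squares)   # 0
second = next(squares)  # 1

# Consume three more values: 4 + 9 + 16
total_of_next_three = sum(next(squares) for _ in range(3))
print(first, second, total_of_next_three)
```

Note that a generator can only be consumed once; iterating it again yields nothing.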
ENUMERATE
enumerate is a built-in function that allows for iterating over a sequence (such as a list or tuple) while keeping track of the index of each element.
This can be useful when working with datasets, as it allows for easily accessing and manipulating individual elements while keeping track of their index position.
Here we use enumerate to iterate over a list of strings and print out the value if the index is an even number.
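A sketch of such a snippet, with an illustrative list of fruit names:

```python
fruits = ["apple", "banana", "cherry", "date", "elderberry"]

even_indexed = []
for i, fruit in enumerate(fruits):
    if i % 2 == 0:  # keep only elements at even index positions
        print(i, fruit)
        even_indexed.append(fruit)
```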
ZIP
zip is a built-in function allowing iterating over multiple sequences (such as lists or tuples) in parallel.
Below we use zip to iterate over two lists x and y simultaneously and perform operations on their corresponding elements. In this case, it prints out the values of each element in x and y, their sum, and their product.
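A sketch of that snippet with two small example lists:

```python
x = [1, 2, 3]
y = [4, 5, 6]

results = []
for a, b in zip(x, y):
    # each element of x and y, their sum, and their product
    print(a, b, a + b, a * b)
    results.append((a, b, a + b, a * b))
```

zip stops at the shorter of the two sequences; use itertools.zip_longest if you need to exhaust both.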
GENERATORS
Generators in Python are a type of iterable that allows for generating a sequence of values on-the-fly, rather than generating all the values at once and storing them in memory.
This makes them useful for working with large datasets that won’t fit in memory, as the data is processed in small chunks or batches rather than all at once.
Below we use a generator function to generate the first n numbers in the Fibonacci sequence. The yield keyword is used to generate each value in the sequence one at a time, rather than generating the entire sequence at once.
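A sketch of that generator function:

```python
def fibonacci(n):
    """Yield the first n Fibonacci numbers, one at a time."""
    a, b = 0, 1
    for _ in range(n):
        yield a        # produce one value, then pause until the next request
        a, b = b, a + b

first_ten = list(fibonacci(10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```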
LAMBDA FUNCTIONS
lambda is a keyword used to create anonymous functions, which are functions that do not have a name and can be defined in a single line of code.
They are useful for defining custom functions on-the-fly for feature engineering, data preprocessing, or model evaluation.
Below we use lambda to create a simple function for filtering even numbers from a list of numbers.
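A sketch of that filter:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# The lambda returns True for even numbers; filter keeps only those.
evens = list(filter(lambda n: n % 2 == 0, numbers))
print(evens)  # [2, 4, 6, 8, 10]
```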
Here’s another code snippet using lambda functions with Pandas.
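A sketch of what that might look like, assuming pandas is installed; the DataFrame and column names here are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Carol"],
                   "salary": [50_000, 60_000, 70_000]})

# Apply a lambda to each value in a single column.
df["salary_k"] = df["salary"].apply(lambda s: s / 1000)

# Apply a lambda row-wise (axis=1) to combine columns.
df["label"] = df.apply(lambda row: f"{row['name']}: {row['salary_k']}k", axis=1)
print(df)
```

For simple element-wise arithmetic, vectorized operations (df["salary"] / 1000) are faster than apply; lambdas shine when the logic doesn't vectorize neatly.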
MAP, FILTER, REDUCE
The functions map, filter, and reduce are three classic tools for manipulating and transforming data. map and filter are built-in; in Python 3, reduce lives in the functools module.
map applies a function to each element of an iterable, filter selects elements from an iterable based on a condition, and reduce applies a function cumulatively to pairs of elements in an iterable to produce a single result.
Below we use all of them in a single pipeline, calculating the sum of squares of even numbers.
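A sketch of that pipeline:

```python
from functools import reduce  # reduce moved to functools in Python 3

numbers = [1, 2, 3, 4, 5, 6]

evens = filter(lambda n: n % 2 == 0, numbers)    # keep 2, 4, 6
squares = map(lambda n: n * n, evens)            # 4, 16, 36
total = reduce(lambda acc, n: acc + n, squares)  # 4 + 16 + 36 = 56
print(total)
```

The same result can be written as sum(n * n for n in numbers if n % 2 == 0), which many consider more idiomatic in modern Python.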
ANY AND ALL
any and all are built-in functions that allow for checking if any or all elements in an iterable meet a certain condition.
any and all can be useful for checking if certain conditions are met across a dataset or a subset of a dataset. For example, they can be used to check if any values in a column are missing or if all values in a column are within a certain range.
Below is a simple example of checking for the presence of any even values and all odd values.
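A sketch of that check:

```python
values = [1, 3, 5, 7, 8]

has_even = any(v % 2 == 0 for v in values)  # True: 8 is even
all_odd = all(v % 2 == 1 for v in values)   # False: 8 is not odd
print(has_even, all_odd)
```

Both functions short-circuit: any stops at the first True, all at the first False, so they pair well with generator expressions over large data.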
NEXT
next is used to retrieve the next item from an iterator. An iterator is an object that produces its items one at a time; generators are iterators, and calling iter() on any iterable (such as a list, tuple, set, or dictionary) returns one.
next is commonly used in data science for stepping through an iterator or generator object one item at a time, which can be useful for handling large datasets or streaming data.
Below, we define a generator random_numbers() that yields random numbers between 0 and 1. We then use the next() function to find the first number in the generator greater than 0.9.
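A sketch of that snippet; the generator is infinite, and next pulls values until the condition is met:

```python
import random

def random_numbers():
    """Yield an endless stream of random floats between 0 and 1."""
    while True:
        yield random.random()

# next() with a filtering generator expression stops at the first match.
value = next(x for x in random_numbers() if x > 0.9)
print(value)
```

If the condition might never be met, pass a default as the second argument to next() to avoid a StopIteration error.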
DEFAULTDICT
defaultdict is a subclass of the built-in dict class that provides a default value for missing keys.
defaultdict can be useful for handling missing or incomplete data, such as when working with sparse matrices or feature vectors. It can also be used for counting the frequency of categorical variables.
An example is counting the frequency of items in a list. int is used as the default factory for the defaultdict, which initializes missing keys to 0.
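A sketch of that counting example:

```python
from collections import defaultdict

words = ["red", "blue", "red", "green", "blue", "red"]

# int() returns 0, so missing keys start at 0 and can be incremented directly.
counts = defaultdict(int)
for word in words:
    counts[word] += 1

print(dict(counts))  # {'red': 3, 'blue': 2, 'green': 1}
```

For this particular task, collections.Counter(words) is a ready-made alternative; defaultdict generalizes to any default factory, e.g. defaultdict(list) for grouping.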
PARTIAL
partial is a function in the functools module that allows for creating a new function from an existing function with some of its arguments pre-filled.
partial can be useful for creating custom functions or data transformations with specific parameters or arguments pre-filled. This can help to reduce the amount of boilerplate code needed when defining and calling functions.
Here we use partial to create a new function increment from the existing add function with one of its arguments fixed to the value 1.
Calling increment(1) is essentially calling add(1, 1).
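A sketch of that snippet:

```python
from functools import partial

def add(x, y):
    return x + y

# increment is add with its first argument fixed to 1.
increment = partial(add, 1)

print(increment(1))   # 2, the same as add(1, 1)
print(increment(41))  # 42
```

Keyword arguments can be pre-filled too, e.g. partial(round, ndigits=2).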
LRU_CACHE
lru_cache is a decorator function in the functools module that allows for caching the results of functions with a limited-size cache.
lru_cache can be useful for optimizing computationally expensive functions or model training procedures that may be called with the same arguments multiple times.
Caching can help to speed up the execution of the function and reduce the overall computational cost.
Here’s an example of efficiently computing Fibonacci numbers with a cache (known as memoization in computer science).
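A sketch of that example; without the cache, the naive recursion takes exponential time:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache: remember every result (memoization)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # fast, because each fib(k) is computed only once
```

The cached function is keyed on its arguments, so the arguments must be hashable.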
DATACLASSES
The @dataclass decorator automatically generates several special methods for a class, such as __init__, __repr__, and __eq__, based on the defined attributes.
This can help to reduce the amount of boilerplate code needed when defining classes. dataclass objects can represent data points, feature vectors, or model parameters, among other things.
In this example, dataclass is used to define a simple class Person with three attributes: name, age, and city.
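A sketch of that class:

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    city: str

# __init__, __repr__, and __eq__ are all generated automatically.
p = Person("Alice", 30, "London")
print(p)                                    # Person(name='Alice', age=30, city='London')
print(p == Person("Alice", 30, "London"))   # True: compared field by field
```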