Data code model part II: Python Typing
Using python type constructs to structure your python code.
This is part 2 of a series on data code modelling for data engineering. In part 1, we went over abstract base classes in python and how these can be used to design software contracts between python classes:
Data Code Modelling: Abstract Base Classes
In this article, we will go over Python typing. A well-designed type system is the foundation of a robust code base. Python has rich support for creating custom types and applying them at various locations throughout your code via the typing module.
💡 Python typing is an EXTENSIVE topic. Therefore, this article will only cover some essentials on this topic.
Why care about python types
There are a few main reasons why it pays dividends to lay out your python typing more carefully:
Types are an efficient way to express intent. It is a lot easier to read and write code with proper type annotations than to rely on piles of documentation.
Types can be used by static code analysers like mypy to provide insights on potential code issues BEFORE it is run.
Types can be used by python libraries to enforce specific class behavior. For example, dataclasses and pydantic can be used to create a data model class which uses types to enforce specific outputs (more on that in part 3).
Python type hints
A type hint is an annotation that can be added to Python code to declare the types of objects and attributes. Type hinting can be done at a few important locations in your code:
Class attributes
Function arguments
Function outputs
Variables
To illustrate this, view the example below:
class DataFrame:
    name: str
    rows: int

    def __init__(self, name: str, rows: int):
        self.name = name
        self.rows = rows

def process_dataframe(df: DataFrame) -> str:
    return f"Processing {df.name} with {df.rows} rows."

df_name: str = "user_data"
df_rows: int = 1000000
df: DataFrame = DataFrame(df_name, df_rows)
result: str = process_dataframe(df)
You will notice we are using the built-in types str and int, as well as the custom class DataFrame, at several places in our code. This shows a powerful feature of typing: all Python data types and user-defined classes can be used as type hints.
This means that it’s also possible to use constructs from third-party libraries such as pandas or numpy as type hints. Thus, a type hint like df: pd.DataFrame is completely acceptable!
Why does this work? Python type hints do NOT impact your code at runtime. They can be used by other libraries to create additional behavior, but in the “vanilla” implementation they don’t affect how your code runs, and this is intended by design.
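To make this concrete, here is a minimal sketch: the annotations below claim int arguments, yet passing strings still runs fine, because vanilla Python ignores hints at runtime. A static checker like mypy would flag the second call.

```python
def add(a: int, b: int) -> int:
    return a + b

print(add(2, 3))      # 5
# Hints are not enforced at runtime: this call still succeeds,
# since str also supports the + operator. mypy would report it as an error.
print(add("2", "3"))  # 23 (string concatenation)
```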
Generic types
A subset of Python types are called generic types. A generic type is characterised by syntax like type[…], where … can be any other type. This is generally reserved for types that can be constructed as containers of other types. For example, a list of integers can be type hinted as list[int]. This is pretty powerful, as it allows a more precise definition of these sorts of types in our code. It can also nest arbitrarily deep, so list[list[list[str]]] is an acceptable type hint too.
Usually it’s rather obvious that a type is generic. For collections like list, dict or set, it makes sense that they are generics. As of this writing, it is not possible to determine at runtime that a type is generic simply by checking the bare base type (like list), but it is possible to confirm a generic once it’s parameterised (e.g. list[int]). Generally this can be confirmed like so:
from typing import get_origin, get_args

# Define some generic and non-generic types
generic_type = list[int]
non_generic_type = int
generic_type_without_params = list

# Check if a type is a generic type
def is_generic_type(tp):
    return get_origin(tp) is not None or (hasattr(tp, '__origin__') and tp.__origin__ is not None)

# Check if a type is a parameterized generic type
def is_parameterized_generic_type(tp):
    return get_origin(tp) is not None and len(get_args(tp)) > 0

if __name__ == "__main__":
    print(f'Is {generic_type} a generic type? {is_generic_type(generic_type)}')  # True
    print(f'Is {non_generic_type} a generic type? {is_generic_type(non_generic_type)}')  # False
    print(f'Is {generic_type_without_params} a generic type? {is_generic_type(generic_type_without_params)}')  # False: a bare list cannot be identified as generic at runtime
    print(f'Is {generic_type} a parameterized generic type? {is_parameterized_generic_type(generic_type)}')  # True
    print(f'Is {generic_type_without_params} a parameterized generic type? {is_parameterized_generic_type(generic_type_without_params)}')  # False
This is all just to show that, although generics are widely used, they don’t share a single inheritance hierarchy; they are rather objects with related implementations.
There are a number of generic types defined across the Python standard library that help a user implement and understand these types correctly. A non-exhaustive list: the built-in collections list, tuple, dict, set and frozenset; the abstract collections in collections.abc such as Sequence, Mapping, Iterable, Iterator and Callable; and the aliases re-exported from typing such as List, Dict and Tuple.
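Beyond the built-in collections, you can also define your own generic classes via typing.Generic. A minimal sketch with a hypothetical Batch container:

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class Batch(Generic[T]):
    # A user-defined generic container: Batch[int], Batch[str], etc.
    def __init__(self, items: list[T]) -> None:
        self.items = items

    def first(self) -> T:
        return self.items[0]

int_batch: Batch[int] = Batch([1, 2, 3])
print(int_batch.first())  # 1
```

A checker now knows that int_batch.first() returns an int, just as it knows what a list[int] yields.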
Annotated types
An annotated type can be seen as a custom type definition in Python: you define a type and associate some metadata with it. Generally, the metadata from these annotated types can be used to add extra instructions, for example for data validation. We will get back to leveraging that when we get into pydantic, but for now, just know you can define an annotated type like so:
from typing import Annotated

# Define an annotated type with metadata
Age = Annotated[int, "Must be a non-negative integer"]

def check_age(age: Age):
    if age < 0:
        raise ValueError("Age must be a non-negative integer")
    print(f"Age is valid: {age}")

# Usage examples
try:
    check_age(25)  # Valid
    check_age(-5)  # Invalid, will raise ValueError
except ValueError as e:
    print(e)
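The metadata attached by Annotated can be read back at runtime with get_type_hints(..., include_extras=True) and get_args; this is essentially the mechanism validation libraries build on. A small sketch:

```python
from typing import Annotated, get_args, get_type_hints

Age = Annotated[int, "Must be a non-negative integer"]

def check_age(age: Age) -> None:
    pass

# include_extras=True preserves the Annotated wrapper instead of stripping it
hints = get_type_hints(check_age, include_extras=True)
base, *metadata = get_args(hints["age"])
print(base)      # <class 'int'>
print(metadata)  # ['Must be a non-negative integer']
```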
Functions or other callable objects can be annotated using collections.abc.Callable or typing.Callable. For example, Callable[[int], str] indicates a function that takes an int and returns a str. An example is:
from collections.abc import Callable, Awaitable

# Function with callable parameter
def feeder(get_next_item: Callable[[], str]) -> None:
    ...  # Body

# Function with multiple callable parameters
def async_query(on_success: Callable[[int], None],
                on_error: Callable[[int, Exception], None]) -> None:
    ...  # Body

# Async function example
async def on_update(value: str) -> None:
    ...  # Body

callback: Callable[[str], Awaitable[None]] = on_update
The subscription syntax for Callable requires two values: the argument list and the return type. Using an ellipsis (...) as the argument list indicates that any parameter list is acceptable:
from collections.abc import Callable

def concat(x: str, y: str) -> str:
    return x + y

x: Callable[..., str]
x = str     # OK
x = concat  # Also OK
Special types and forms
Self
Self can be used to refer to the instance type of the class currently being defined (available in typing since Python 3.11). We worked with classes like this a lot in part 1 of this series; an example annotated with Self:
from typing import Self

class SQLDataSource:
    def __init__(self) -> None:
        self.connection_params: dict = {}

    def configure(self, config: dict) -> Self:
        # Returning Self tells checkers the concrete (sub)class comes back
        self.connection_params = config
        return self

    def execute_query(self, query: str) -> str:
        return f"Executing SQL query: '{query}' with parameters {self.connection_params}"
This is useful for building out functionality between different components of your class: Self always resolves to the concrete class a method is called on, which makes it well suited for method chaining and subclassing.
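One common use is a fluent, builder-style interface, where each method returns the instance so calls can be chained. A sketch with a hypothetical QueryBuilder (typing.Self requires Python 3.11; typing_extensions provides a backport for older versions):

```python
try:
    from typing import Self            # Python 3.11+
except ImportError:
    from typing_extensions import Self  # backport for older interpreters

class QueryBuilder:
    def __init__(self) -> None:
        self.parts: list[str] = []

    def select(self, cols: str) -> Self:
        self.parts.append(f"SELECT {cols}")
        return self

    def from_table(self, table: str) -> Self:
        self.parts.append(f"FROM {table}")
        return self

    def build(self) -> str:
        return " ".join(self.parts)

print(QueryBuilder().select("*").from_table("users").build())
# SELECT * FROM users
```

Because the methods return Self rather than QueryBuilder, a subclass chaining these calls keeps its own type in the eyes of a checker.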
Any
The Any type simply indicates an object can be of any type. As such, it conveys no information beyond telling static checkers that this variable can hold a value of any type.
from typing import Any

def process_item(item: Any) -> None:
    print(f"Processing item: {item}")

# Usage examples
process_item(123)        # Processing an integer
process_item("Hello")    # Processing a string
process_item([1, 2, 3])  # Processing a list
Use this construct sparingly: heavy use of Any usually indicates you are not clear on how your code functions. If you need a generic function that works across various types, use a TypeVar instead:
from typing import TypeVar

T = TypeVar('T')

def duplicate(item: T) -> list[T]:
    return [item, item]

# Usage examples
print(duplicate(123))        # [123, 123]
print(duplicate("Hello"))    # ['Hello', 'Hello']
print(duplicate([1, 2, 3]))  # [[1, 2, 3], [1, 2, 3]]
Generally, TypeVar is the better choice because:
Type Safety: TypeVar helps maintain type consistency, which means the function or class can work with any type while ensuring that the same type is used throughout. This prevents type-related errors that could occur if different types are inadvertently mixed.
Code Readability and Intent: Using TypeVar clearly communicates the intent that a function or class is designed to be generic and work with any type, but consistently. This makes the code easier to understand and maintain.
Static Type Checking: Type checkers (like mypy) can provide better error checking and autocompletion support when TypeVar is used. This can catch potential bugs at development time instead of runtime.
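To illustrate the consistency point: with Any a checker loses track of the return type, while a TypeVar lets it flow through. A minimal sketch:

```python
from typing import Any, TypeVar

T = TypeVar("T")

def first_any(items: Any) -> Any:
    return items[0]

def first(items: list[T]) -> T:
    return items[0]

n = first([10, 20, 30])   # a checker infers n as int
s = first(["a", "b"])     # ...and s as str
x = first_any([10, 20])   # x is Any: the int-ness is invisible to the checker
print(n, s, x)            # 10 a 10
```

Both functions behave identically at runtime; the difference is entirely in what a static checker can prove about the results.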
Union
Union is a typing construct that indicates a value can be one of several types.
For example, Union[str, int] declares a value as either a string or an integer. Do note that when you use a Union to parameterize a generic type, as in list[Union[str, int]], the container will accept all strings, all ints, or a combination of the two.
from typing import Union

def process_record(record_id: Union[str, int]) -> str:
    if isinstance(record_id, str):
        return f"Processing record with string ID: {record_id}"
    elif isinstance(record_id, int):
        return f"Processing record with integer ID: {record_id}"
    else:
        return "Unsupported record ID type"

print(process_record("abc123"))  # Output: Processing record with string ID: abc123
print(process_record(456789))    # Output: Processing record with integer ID: 456789
Please be aware of this distinction: for a single value, Union is an either/or. A value of type Union[str, int] is a str or an int, never a mix of both. Only when the Union parameterizes a collection can elements of both types appear together.
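The distinction can be seen side by side in a small sketch:

```python
from typing import Union

# A single value: either a str or an int, never both at once
record_id: Union[str, int] = "abc123"

# A parameterized collection: elements of both types may mix freely
ids: list[Union[str, int]] = ["abc123", 456789, "def456"]

print(record_id)
print(ids)
```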
For dictionaries specifically, it is worth exploring TypedDict, which allows you to declare explicit types for specific keys and values.
from typing import TypedDict

class DataPipelineConfig(TypedDict):
    name: str
    batch_size: int
    is_active: bool

def display_pipeline_config(config: DataPipelineConfig) -> str:
    status = 'active' if config['is_active'] else 'inactive'
    return f"Pipeline {config['name']} with batch size {config['batch_size']} is currently {status}."

pipeline_config: DataPipelineConfig = {
    "name": "User ETL",
    "batch_size": 1000,
    "is_active": True
}

print(display_pipeline_config(pipeline_config))
# Output: Pipeline User ETL with batch size 1000 is currently active.
Generally however, if you need to be this elaborate with typing, it usually makes sense to define a dataclass or pydantic model instead (more on that in part 3).
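For comparison, the same configuration expressed as a dataclass — a sketch of the alternative mentioned above (pydantic models look similar and are covered in part 3):

```python
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    name: str
    batch_size: int
    is_active: bool = True  # defaults come for free with dataclasses

cfg = PipelineConfig(name="User ETL", batch_size=1000)
status = "active" if cfg.is_active else "inactive"
print(f"Pipeline {cfg.name} with batch size {cfg.batch_size} is currently {status}.")
# Pipeline User ETL with batch size 1000 is currently active.
```

Unlike a TypedDict, a dataclass gives you attribute access, defaults, and a generated __init__ and __repr__, at the cost of no longer being a plain dict.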
Optional
As the name implies, Optional indicates a value may be absent: Optional[X] is equivalent to Union[X, None]. This is most often seen with arguments or attributes that are only used within specific contexts.
from typing import Optional

def process_data(file_path: str, delimiter: Optional[str] = None) -> str:
    delimiter_info = f" with delimiter '{delimiter}'" if delimiter else ""
    return f"Processing file at {file_path}{delimiter_info}"

print(process_data("/path/to/file.csv"))       # Output: Processing file at /path/to/file.csv
print(process_data("/path/to/file.csv", ","))  # Output: Processing file at /path/to/file.csv with delimiter ','
Literal
The Literal type can be used to define a literal instance of a type. For example, type hinting with Literal[42] indicates this object should always have the integer value 42. This is useful for configuration-like objects that have a set of hard-coded values.
from typing import Literal

def set_environment(env: Literal["development", "staging", "production"]) -> str:
    return f"Environment set to {env}"

print(set_environment("development"))  # Output: Environment set to development
print(set_environment("production"))   # Output: Environment set to production

# The following would raise a type error with a type checker like mypy, as "test" is not a valid literal
# print(set_environment("test"))
If a larger number of objects depend on this configuration, it can be beneficial to define it as an enum instead.
from enum import Enum

class Environment(Enum):
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"

def set_environment(env: Environment) -> str:
    return f"Environment set to {env.value}"

print(set_environment(Environment.DEVELOPMENT))  # Output: Environment set to development
print(set_environment(Environment.PRODUCTION))   # Output: Environment set to production
Inspecting types in classes
Often, when you are not too familiar with the inner workings of objects in a code base or external library, it can be difficult to determine the right type to use when laying out your typing system. In the past, I would have resorted to reading a bunch of source code to figure this out. It is also possible to determine this at runtime by inspecting classes and variables.
Using vars
The vars function returns the __dict__ attribute of the given object, which contains all the attributes of the object.
class DataPipeline:
    def __init__(self, name: str, batch_size: int):
        self.name = name
        self.batch_size = batch_size
        self.status = "inactive"

pipeline = DataPipeline(name="ETL Pipeline", batch_size=500)
print(vars(pipeline))
# Output: {'name': 'ETL Pipeline', 'batch_size': 500, 'status': 'inactive'}
Using dir
The dir function returns a list of all the attributes and methods of the given object.
print(dir(pipeline))
# Output: ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'batch_size', 'name', 'status']
Using type
The type function returns the type of the given object.
print(type(pipeline))
# Output: <class '__main__.DataPipeline'>
print(type(pipeline.name))
# Output: <class 'str'>
print(type(pipeline.batch_size))
# Output: <class 'int'>
Using the inspect library
The inspect library provides several useful functions to get information about live objects, such as modules, classes, methods, functions, tracebacks, and code objects.
import inspect

# Get the members of the object
print(inspect.getmembers(pipeline))
# Output: [('__class__', <class '__main__.DataPipeline'>), ('__delattr__', <method-wrapper '__delattr__' of DataPipeline object at 0x7f8b6f3b1f10>), ('__dict__', {'name': 'ETL Pipeline', 'batch_size': 500, 'status': 'inactive'}), ...]

# Get the signature of the __init__ method
print(inspect.signature(DataPipeline.__init__))
# Output: (self, name: str, batch_size: int)
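Closely related, typing.get_type_hints resolves the declared annotations of a class or function directly, which is often the quickest way to recover the intended types. A small sketch using a class shaped like the DataPipeline above (redefined here with class-level annotations so the snippet is self-contained):

```python
from typing import get_type_hints

class DataPipeline:
    name: str
    batch_size: int

    def __init__(self, name: str, batch_size: int) -> None:
        self.name = name
        self.batch_size = batch_size

# Resolves annotations to actual type objects, including string forward refs
print(get_type_hints(DataPipeline))
# {'name': <class 'str'>, 'batch_size': <class 'int'>}
```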
These methods and functions can help you inspect and understand the types used in classes and objects, making it easier to implement type hinting and understand the structure of unfamiliar codebases.
With that, we’re going to conclude this section on python typing. Do note again that this is a massive topic in python and we’ve just scratched the surface here.
Moving on, you may wonder (or know) that typing and type hints in and of themselves don’t enforce anything in your code. This is the natural design of python as a dynamically typed language.
However, how can we take these typing constructs and actually do something in our code other than document things? This is where a framework like pydantic comes into play. In the following section, we will describe how to use pydantic to create data models in your code that leverage our previous knowledge on abstract base classes and typing to create a much more powerful data code model.