Python

TL;DR

  • Use python packages in editable mode with setup.py and pip install -e .
  • Reusable (general) code goes inside the main package
  • Use ./examples to show how to use your code
  • Use ./experiments for advanced experiment configuration/tracking etc
  • Use direnv to manage python virtual environments
    • automatically activates/deactivates envs depending on current directory

Python versions (pyenv)

I use pyenv for managing python versions.

I then set my global python version to be 3.9.13.

Virtual environments (direnv)

I use direnv to automatically activate/deactivate python virtual environments when I enter/leave a project’s directory.

  • On macOS, direnv can be installed using Homebrew
    brew install direnv
    
  • Create a .envrc file in the root of the project’s directory containing “layout python” with
    cd /path/to/python/project
    echo "layout python" >> .envrc
    
  • Give direnv access to the directory with
    direnv allow
    
    This enables direnv to activate/deactivate the project’s virtual environment when entering/leaving the directory.
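
To confirm that direnv really has activated the project’s virtual environment, one quick check is to run a few lines of Python from inside the project directory. This is a minimal sketch using only the standard library; the helper name in_virtualenv is my own and is not part of direnv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"interpreter: {sys.executable}")
    print(f"virtual environment active: {in_virtualenv()}")
```

Run inside the project directory this should report an interpreter that lives in the project’s environment; run outside it should report the global pyenv interpreter instead.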

I specify each project’s dependencies using setup.py or pyproject.toml. More details on this later.

Directory structure

I tend to put my machine learning projects into python packages with the following directory structure:

package-name/
├── LICENCE
├── README  # detail how to install package and refer to examples/experiments
├── requirements.txt  # pin dependencies for reproducibility
├── setup.py  # package metadata and dependencies
├── examples/
│   └── train_my_alg_on_example.ipynb  # detail how to use package via some example
├── experiments/
│   ├── README  # detail how to run experiments / reproduce results
│   ├── configs/  # directory containing yaml config files for hydra
│   └── train.py  # script for running experiments (with logging/monitoring/checkpointing etc)
├── package-name/  # reusable code in the main package
│   ├── model.py
│   └── algorithm.py
└── tests/  # because I always write tests 🫣

The general idea is to keep the reusable code (models, algorithms) inside the main package (package-name/package-name) and to put problem-specific training/plotting scripts in separate directories. I like to use an ./examples directory to show how the package can be used, often in the form of a notebook. I then like to have an ./experiments directory which contains training/plotting scripts for generating results for a paper. I use hydra to configure all of my experiments and Weights & Biases to track them. hydra and Weights & Biases are powerful tools for configuring/tracking experiments, but not everybody uses them, so I keep them confined to the ./experiments directory.
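
If you want to start a new project with this layout, the skeleton can be generated with a few lines of pathlib. This is only a sketch: the scaffold function and the LAYOUT list are hypothetical helpers of mine, the files are created empty as placeholders, and my_package is a stand-in name (note that the inner package directory needs an importable name, so underscores rather than hyphens).

```python
from pathlib import Path
from typing import List

# Relative paths mirroring the directory structure described above.
# Entries ending in "/" are directories; everything else is an empty file.
LAYOUT = [
    "LICENCE",
    "README",
    "requirements.txt",
    "setup.py",
    "examples/train_my_alg_on_example.ipynb",
    "experiments/README",
    "experiments/configs/",
    "experiments/train.py",
    "{package}/model.py",
    "{package}/algorithm.py",
    "tests/",
]


def scaffold(root: Path, package: str) -> List[Path]:
    """Create an empty project skeleton at root/<package> and return the created paths."""
    project = root / package
    created = []
    for entry in LAYOUT:
        path = project / entry.format(package=package)
        if entry.endswith("/"):
            path.mkdir(parents=True, exist_ok=True)
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch(exist_ok=True)  # placeholder only, contents are up to you
        created.append(path)
    return created


if __name__ == "__main__":
    import tempfile

    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp), "my_package"):
            print(path.relative_to(tmp))
```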

Dependencies

setup.py

import pathlib
import setuptools

_here = pathlib.Path(__file__).resolve().parent

name = "INSERT PACKAGENAME"
author = "Aidan Scannell"
author_email = "scannell.aidan@gmail.com"
description = "INSERT DESCRIPTION"

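# Read the long description; an explicit encoding avoids platform-dependent defaults
with open(_here / "README.md", "r", encoding="utf-8") as f:
    readme = f.read()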

url = "https://github.com/aidanscannell/" + name

license = "Apache-2.0"

classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Science/Research",
    "Intended Audience :: Developers",
    "Intended Audience :: Information Technology",
    "License :: OSI Approved :: Apache Software License",
    "Natural Language :: English",
    "Programming Language :: Python :: 3",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
    "Topic :: Scientific/Engineering :: Mathematics",
]
keywords = ["keyword-one", "keyword-two"]

python_requires = "~=3.7"

install_requires = ["jax==0.3.14", "jaxlib==0.3.10", "numpy"]
extras_require = {
    "dev": ["black", "pyright", "isort", "pyflakes", "pytest"],
    "examples": ["hydra-core", "wandb", "matplotlib", "seaborn", "bsuite"],
}

setuptools.setup(
    name=name,
    version="0.1.0",
    author=author,
    author_email=author_email,
    maintainer=author,
    maintainer_email=author_email,
    description=description,
    keywords=keywords,
    long_description=readme,
    long_description_content_type="text/markdown",
    url=url,
    license=license,
    classifiers=classifiers,
    zip_safe=False,
    python_requires=python_requires,
    install_requires=install_requires,
    extras_require=extras_require,
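    # Without exclude, the examples/experiments/tests directories are picked up as packages too
    packages=setuptools.find_namespace_packages(
        exclude=["examples*", "experiments*", "tests*"]
    ),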
)
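
A note on python_requires = "~=3.7" in the setup.py above: this is a compatible release specifier, which under PEP 440 is equivalent to ">=3.7, ==3.*". It therefore accepts Python 3.9.13 but would reject any 4.x release. The function below is a standard-library sketch of that one rule, purely for illustration; its name is my own, and in practice pip evaluates the specifier for you at install time.

```python
import sys
from typing import Tuple


def satisfies_compat_3_7(version: Tuple[int, int]) -> bool:
    """Mimic the specifier "~=3.7", i.e. ">=3.7, ==3.*", for a (major, minor) pair."""
    major, minor = version
    return major == 3 and minor >= 7


if __name__ == "__main__":
    current = sys.version_info[:2]
    print(f"Python {current[0]}.{current[1]} satisfies ~=3.7: {satisfies_compat_3_7(current)}")
```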

The package can be installed in editable mode using

pip install -e .

If we also wish to install the “dev” and “examples” dependencies we can use

pip install -e ".[examples,dev]"
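
A quick way to check that the editable install worked is to ask Python where it would import the package from. For an editable install the location should be inside your project checkout rather than site-packages. The sketch below uses only the standard library; package_location is a hypothetical helper of mine, and json merely stands in for your own package name.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the directory a top-level package or module is imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the current environment
    if spec.submodule_search_locations:
        # Packages: the first search location is the package directory.
        return Path(list(spec.submodule_search_locations)[0]).resolve()
    if spec.origin and spec.origin not in ("built-in", "frozen"):
        # Single-file modules: report the directory containing the file.
        return Path(spec.origin).resolve().parent
    return None


if __name__ == "__main__":
    # Replace "json" with the name of your own package.
    print(package_location("json"))
```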

Developer packages

These packages will make any python developer’s life easier:

pre-commit is another super handy tool that runs hooks before each commit, e.g. formatting code, sorting imports, formatting notebooks, etc. My .pre-commit-config.yaml usually looks something like this:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: check-yaml
      - id: requirements-txt-fixer
      - id: trailing-whitespace
  - repo: https://github.com/psf/black
    rev: 22.10.0
    hooks:
      - id: black
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 1.5.2
    hooks:
      - id: nbqa-black
      - id: nbqa-isort
      # - id: nbqa-flake8
  - repo: https://github.com/PyCQA/isort
    rev: 5.10.1
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 4.0.1
    hooks:
      - id: flake8

Pinning dependencies

For reproducibility I usually freeze and commit dependencies in a requirements.txt file before running experiments:

pip freeze > requirements.txt
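
Before committing, it can be worth checking that every line of requirements.txt really is pinned to an exact version. The function below is a deliberately crude sketch: it flags any requirement that lacks "==", which also catches editable or URL-based entries. The name unpinned_requirements is my own, not part of pip.

```python
import re
from typing import List

# Comments start at the beginning of a line or after whitespace.
_COMMENT = re.compile(r"(^|\s+)#.*$")


def unpinned_requirements(text: str) -> List[str]:
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = _COMMENT.sub("", raw).strip()
        if not line:
            continue  # blank line or comment only
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = "jax==0.3.14\njaxlib==0.3.10\nnumpy\n"
    print(unpinned_requirements(sample))
```

To check a real file, pass it the contents of requirements.txt, e.g. unpinned_requirements(open("requirements.txt").read()).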