# Notebooks
We present here a curated list of notebooks recommended for getting started with scikit-decide, all available in the notebooks/ folder of the repository.
- Maze tutorial
- Gymnasium environment with scikit-decide tutorial: Continuous Mountain Car
- Introduction to scheduling
- Benchmarking scikit-decide solvers
- Flight Planning Domain
- Using RDDL domains and solvers with scikit-decide
- ICAPS 2024 tutorials
  - Solving PDDL problems with classical planning, and reinforcement learning solvers
  - Implementing a scikit-decide domain for RDDL problems
  - Implementing a scikit-decide solver embedding the JaxPlan and GurobiPlan planners and solving RDDL-based scikit-decide domains
  - Solving problems (possibly imported from Gym) with Reinforcement Learning and Cartesian Genetic Programming: Cart Pole
  - Solving problems (possibly imported from Gym) with Reinforcement Learning and Cartesian Genetic Programming: Mountain Car
  - Solving scheduling problems with constraint programming, operations research, and reinforcement learning solvers
# Maze tutorial
In this tutorial, we tackle the maze problem. We use this classical game to demonstrate how to:
- easily create a new scikit-decide domain;
- find solvers in the scikit-decide hub matching the domain's characteristics;
- apply a scikit-decide solver to a domain;
- write your own rollout function to play a trained solver on a domain.
Notes:
- To keep the focus on scikit-decide usage, code not directly related to the library (such as maze generation and display) is kept in a separate module.
- A similar maze domain is already defined in the scikit-decide hub, but we do not use it here, for the sake of this tutorial.
- Special notice for binder + sb3: stable-baselines3 algorithms appear to be extremely slow on binder, and we could not find a proper explanation for this. We strongly advise you to either run the notebook locally or on colab, or to skip the cells using sb3 algorithms (here, the PPO solver).
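Before diving into the notebook, it can help to see the underlying problem in isolation: a maze is just a grid with walls, and a solution is a shortest path from an entrance to a goal. The sketch below (plain Python on a made-up toy grid, not the notebook's code nor the scikit-decide API) finds such a path with breadth-first search:

```python
from collections import deque

# 0 = free cell, 1 = wall (toy example grid, not the notebook's maze)
MAZE = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

def shortest_path(maze, start, goal):
    """Breadth-first search returning a list of (row, col) cells, or None."""
    rows, cols = len(maze), len(maze[0])
    queue = deque([start])
    parents = {start: None}  # also serves as the visited set
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the parent links to rebuild the path
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] == 0 and (nr, nc) not in parents):
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable

path = shortest_path(MAZE, (0, 0), (3, 3))
```

In the tutorial, the same logic is instead expressed as a scikit-decide domain (states, actions, transitions), and the search itself is delegated to a solver from the hub.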
# Gymnasium environment with scikit-decide tutorial: Continuous Mountain Car
In this notebook we tackle the continuous mountain car problem taken from Gymnasium (previously OpenAI Gym), a toolkit for developing environments, usually meant to be solved by Reinforcement Learning (RL) algorithms.
Continuous Mountain Car, a standard testing domain in RL, is a problem in which an under-powered car must drive up a steep hill.

Note that we use the continuous version of the mountain car here because it has a shaped, or dense, reward (i.e. not a sparse one), which can be exploited successfully when solving, as opposed to the other "Mountain Car" environments. As a reminder, a sparse reward is null almost everywhere, whereas a dense or shaped reward takes meaningful values for most transitions.
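To make the sparse-vs-dense distinction concrete, here is a deliberately generic illustration (plain Python; `GOAL_POSITION` and both reward functions are simplified stand-ins, not the exact Gymnasium rewards):

```python
GOAL_POSITION = 0.45  # indicative goal position for continuous mountain car

def sparse_reward(position, goal=GOAL_POSITION):
    """Sparse signal: zero everywhere except when the goal is reached."""
    return 100.0 if position >= goal else 0.0

def dense_reward(position, goal=GOAL_POSITION):
    """Shaped signal (illustrative): the closer to the goal, the higher the value."""
    return -abs(goal - position)

# Far from the goal, the sparse reward gives no gradient to learn from,
# while the shaped reward still ranks states: being at -0.2 beats being at -0.5.
```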
This problem has been chosen for two reasons:
- Show how scikit-decide can be used to solve gymnasium environments (the de-facto standard in the RL community);
- Highlight that by doing so, you can use not only solvers from the RL community (like the ones in stable_baselines3, for example), but also solvers coming from other communities, like genetic programming and planning/search (which exploit an underlying search graph), that can be very efficient.
Therefore in this notebook we will go through the following steps:
- Wrap a gymnasium environment in a scikit-decide domain;
- Use a classical RL algorithm like PPO to solve our problem;
- Give CGP (Cartesian Genetic Programming) a try on the same problem;
- Finally use IW (Iterated Width) coming from the planning community on the same problem.
Special notice for binder + sb3: stable-baselines3 algorithms appear to be extremely slow on binder, and we could not find a proper explanation for this. We strongly advise you to either run the notebook locally or on colab, or to skip the cells using sb3 algorithms (here, the PPO solver).
# Introduction to scheduling
In this notebook, we explore how to solve a resource constrained project scheduling problem (RCPSP).
The problem consists of a set of activities with given durations, linked by precedence constraints. On top of these constraints, each project is assigned a set of K renewable resources, where each resource is available in a limited quantity at any point in time. Each activity may require some units of these resources during its execution.
The overall goal of the problem is usually to minimize the makespan, i.e. the total duration of the schedule.
A classic variant of RCPSP is the multimode RCPSP, where each task can be executed in several ways (one way = one mode). A typical example is:
- Mode n°1, 'fast mode': high resource consumption, short duration
- Mode n°2, 'slow mode': low resource consumption, long duration
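As a rough illustration of these ingredients (activities, precedence constraints, a renewable resource with limited capacity), here is a toy single-mode instance scheduled with a greedy serial schedule-generation scheme in plain Python; the instance and the heuristic are made up for illustration and are unrelated to the solvers used in the notebook:

```python
# Toy RCPSP instance: one renewable resource with capacity 4.
CAPACITY = 4
TASKS = {                     # name: (duration, resource demand, predecessors)
    "A": (3, 2, []),
    "B": (2, 3, []),
    "C": (4, 2, ["A"]),
    "D": (1, 4, ["B", "C"]),
}

def serial_sgs(tasks, capacity):
    """Greedy serial schedule-generation scheme: place each task at the
    earliest time respecting precedence and the resource capacity."""
    usage = {}                # time unit -> resource units already consumed
    starts = {}
    # Dict insertion order is used as the task order; it is precedence-feasible here.
    for name, (dur, demand, preds) in tasks.items():
        # Earliest start allowed by precedence constraints
        t = max((starts[p] + tasks[p][0] for p in preds), default=0)
        # Shift right until the resource fits during the whole execution window
        while any(usage.get(t + k, 0) + demand > capacity for k in range(dur)):
            t += 1
        starts[name] = t
        for k in range(dur):
            usage[t + k] = usage.get(t + k, 0) + demand
    makespan = max(starts[n] + tasks[n][0] for n in tasks)
    return starts, makespan

starts, makespan = serial_sgs(TASKS, CAPACITY)
```

A real RCPSP solver would search over activity orderings (or use constraint programming) instead of committing to one greedy order, which is what the notebook delegates to scikit-decide solvers.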
# Benchmarking scikit-decide solvers
This notebook demonstrates how to run and compare scikit-decide solvers compatible with a given domain.
This benchmark is supported by Ray Tune, a scalable Python library for experiment execution and hyperparameter tuning (including running experiments in parallel and logging results to Tensorboard).
Benchmarking is important since the most efficient solvers can vary greatly depending on the domain.
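Stripped of Ray Tune's parallel execution and logging, the core of such a benchmark is a timed loop running each candidate solver on the same problem. A minimal sketch with made-up stand-in solvers (hypothetical interface, not the scikit-decide or Ray Tune API):

```python
import time

def benchmark(solvers, problem):
    """Run each candidate solver on the same problem, recording wall-clock
    time and the value of the returned solution (hypothetical interface)."""
    results = {}
    for name, solve in solvers.items():
        start = time.perf_counter()
        value = solve(problem)
        results[name] = {"value": value, "seconds": time.perf_counter() - start}
    return results

# Toy stand-ins for real solvers: maximize f(x) = -(x - 7)^2 over candidates.
problem = list(range(-50, 51))
solvers = {
    "exhaustive": lambda xs: max(-(x - 7) ** 2 for x in xs),
    "first_guess": lambda xs: -(xs[0] - 7) ** 2,
}
results = benchmark(solvers, problem)
```

Ray Tune adds what this loop lacks: running the trials in parallel, retrying failures, and logging every run for later comparison.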
# Flight Planning Domain
This notebook aims to provide a short and interactive example of the Flight Planning Domain. See the online documentation for more information.
# Using RDDL domains and solvers with scikit-decide
# ICAPS 2024 tutorials
This directory contains the notebooks shown during the ICAPS 2024 tutorial on scikit-decide, updated to work with the latest version of the library.
# Solving PDDL problems with classical planning, and reinforcement learning solvers
This notebook will show how to solve PDDL problems in scikit-decide via the great Unified Planning framework and its third-party engines from the AIPlan4EU project. We will also demonstrate how to call scikit-decide solvers from Unified Planning, allowing PDDL problems to be solved with simulation-based solvers embedded in scikit-decide.
# Implementing a scikit-decide domain for RDDL problems
In this notebook we demonstrate how to create a custom scikit-decide domain, which can then be solved by any scikit-decide solver compatible with its characteristics.
NB: Since this tutorial was written, the RDDL domain has been introduced into the scikit-decide hub; see for instance this notebook about the pre-implemented version.
# Implementing a scikit-decide solver embedding the JaxPlan and GurobiPlan planners and solving RDDL-based scikit-decide domains
This tutorial will demonstrate how to create a custom scikit-decide solver, which can then solve scikit-decide domains whose characteristics are compatible with it.
NB: Since this tutorial was written, the pyrddl solvers jax and gurobi have been introduced into the scikit-decide hub; see for instance this notebook about the pre-implemented versions.
# Solving problems (possibly imported from Gym) with Reinforcement Learning and Cartesian Genetic Programming: Cart Pole
This tutorial shows how to load a domain (Cart Pole) in scikit-decide and try to solve it with techniques from different communities:
- Reinforcement Learning (RL)
- Cartesian Genetic Programming (CGP)
# Solving problems (possibly imported from Gym) with Reinforcement Learning and Cartesian Genetic Programming: Mountain Car
This tutorial shows how to load a domain (Mountain Car Continuous) in scikit-decide and try to solve it with techniques from different communities:
- Reinforcement Learning (RL)
- Cartesian Genetic Programming (CGP)
# Solving scheduling problems with constraint programming, operations research, and reinforcement learning solvers
In this tutorial notebook, you will be introduced to scheduling domains in scikit-decide.