# Notebooks

We present here a curated list of notebooks recommended for getting started with scikit-decide, all available in the notebooks/ folder of the repository.

## Maze tutorial

GitHub · Colab · Binder

In this tutorial, we tackle the maze problem. We use this classical game to demonstrate how to:

  • easily create a new scikit-decide domain;
  • find solvers from the scikit-decide hub matching its characteristics;
  • apply a scikit-decide solver to a domain;
  • write your own rollout function to play a trained solver on a domain.

A minimal end-to-end sketch of these steps is given after the notes below.

Notes:

  • To keep the focus on scikit-decide usage, we put some code not directly related to the library (like maze generation and display) in a separate module.
  • A similar maze domain is already defined in the scikit-decide hub, but we do not use it here for the sake of this tutorial.
  • Special notice for Binder + sb3: stable-baselines3 algorithms appear to be extremely slow on Binder, and we could not find a proper explanation for this. We strongly advise you to either run the notebook locally or on Colab, or to skip the cells that use sb3 algorithms (here, the PPO solver).
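To give a feel for these steps, here is a minimal end-to-end sketch. It uses the similar maze domain from the hub mentioned above (the notebook builds its own domain instead), and solver constructor arguments have changed slightly across scikit-decide versions, so treat this as indicative rather than copy-paste ready:

```python
from skdecide.hub.domain.maze import Maze
from skdecide.hub.solver.lazy_astar import LazyAstar
from skdecide.utils import match_solvers, rollout

# List the hub solvers whose requirements match the maze domain's
# characteristics (determinism, full observability, discrete actions, ...).
print(match_solvers(domain=Maze()))

# Solve with one matching solver and replay the resulting policy.
with LazyAstar(domain_factory=Maze) as solver:
    solver.solve()
    rollout(Maze(), solver, max_steps=300, render=False)
```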

## Gymnasium environment with scikit-decide tutorial: Continuous Mountain Car

GitHub · Colab · Binder

In this notebook we tackle the continuous mountain car problem from Gymnasium (formerly OpenAI Gym), a toolkit for developing environments, typically meant to be solved by reinforcement learning (RL) algorithms.

Continuous Mountain Car, a standard testing domain in RL, is a problem in which an under-powered car must drive up a steep hill.

![Continuous mountain car](mountain_car_continuous.gif)

Note that we use here the continuous version of the mountain car because it has a shaped (i.e. dense, not sparse) reward that solvers can exploit, as opposed to the other "Mountain Car" environments. As a reminder, a sparse reward is null almost everywhere, whereas a dense or shaped reward takes meaningful values for most transitions.

This problem has been chosen for two reasons:

  • Show how scikit-decide can be used to solve Gymnasium environments (the de facto standard in the RL community);
  • Highlight that, by doing so, you can use not only solvers from the RL community (like the ones in stable_baselines3), but also solvers from other communities, such as genetic programming and planning/search (which exploit an underlying search graph), that can be very efficient.

Therefore, in this notebook we will go through the following steps:

  • Wrap a gymnasium environment in a scikit-decide domain (sketched below);
  • Use a classical RL algorithm like PPO to solve our problem;
  • Give CGP (Cartesian Genetic Programming) a try on the same problem;
  • Finally, use IW (Iterated Width), coming from the planning community, on the same problem.
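As an illustration of the first two steps, here is a minimal sketch; exact solver constructor arguments vary across scikit-decide versions, and the training budget below is an arbitrary placeholder:

```python
import gymnasium as gym
from stable_baselines3 import PPO

from skdecide.hub.domain.gym import GymDomain
from skdecide.hub.solver.stable_baselines import StableBaseline
from skdecide.utils import rollout

# Step 1: wrap the gymnasium environment in a scikit-decide domain.
domain_factory = lambda: GymDomain(gym.make("MountainCarContinuous-v0"))

# Step 2: solve it with a classical RL algorithm (PPO) via the
# stable-baselines3 bridge, then replay the trained policy.
with StableBaseline(
    domain_factory=domain_factory,
    algo_class=PPO,
    baselines_policy="MlpPolicy",
    learn_config={"total_timesteps": 10_000},
) as solver:
    solver.solve()
    rollout(domain_factory(), solver, num_episodes=1, max_steps=999, render=False)
```

The CGP and IW solvers from the hub follow the same solve/rollout pattern, which is what makes swapping communities of solvers so cheap here.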

Special notice for Binder + sb3: stable-baselines3 algorithms appear to be extremely slow on Binder, and we could not find a proper explanation for this. We strongly advise you to either run the notebook locally or on Colab, or to skip the cells that use sb3 algorithms (here, the PPO solver).

## Introduction to scheduling

GitHub · Colab · Binder

In this notebook, we explore how to solve a resource-constrained project scheduling problem (RCPSP).

The problem is made of activities with precedence constraints: if activity $j$ is a successor of activity $i$, then activity $i$ must be completed before activity $j$ can be started.

On top of these constraints, the project is assigned a set of $K$ renewable resources, where each resource $k$ is available in $R_k$ units for the entire duration of the project. Each activity may require one or more of these resources to be completed. While scheduling the activities, the daily usage of resource $k$ cannot exceed $R_k$ units.

Each activity $j$ takes $d_j$ time units to complete.

The overall goal of the problem is usually to minimize the makespan, i.e. the completion time of the whole project.
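To make these definitions concrete, here is a toy, hand-rolled illustration (independent of the scikit-decide API) that computes an earliest-start schedule and its makespan from precedence constraints alone, ignoring resource limits for brevity:

```python
# Toy instance: 4 activities with durations d_j and precedence constraints.
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

start, end = {}, {}
for act in ["A", "B", "C", "D"]:  # processed in topological order
    # An activity can only start once all its predecessors are finished.
    start[act] = max((end[p] for p in predecessors[act]), default=0)
    end[act] = start[act] + durations[act]

makespan = max(end.values())
print(start)     # {'A': 0, 'B': 3, 'C': 3, 'D': 7}
print(makespan)  # 8
```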

A classic variant of RCPSP is the multimode RCPSP, where each task can be executed in several ways (one way = one mode). A typical example is:

  • Mode 1, 'fast mode': high resource consumption but fast;
  • Mode 2, 'slow mode': low resource consumption but slow.
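A natural way to represent such modes in code is to attach a per-mode duration and resource demand to each task. The structure below is purely illustrative (the names are hypothetical), not the scikit-decide encoding:

```python
# Illustrative multimode encoding: each task maps mode ids to a
# duration and a resource demand.
tasks = {
    "dig_foundation": {
        1: {"duration": 2, "demand": {"excavator": 2}},  # fast mode
        2: {"duration": 5, "demand": {"excavator": 1}},  # slow mode
    },
}
# Choosing mode 2 halves the excavator demand but more than doubles
# the duration: a typical time/resource trade-off.
```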

## Benchmarking scikit-decide solvers

GitHub · Colab · Binder

This notebook demonstrates how to run and compare scikit-decide solvers compatible with a given domain.

This benchmark is supported by Ray Tune, a scalable Python library for experiment execution and hyperparameter tuning (including running experiments in parallel and logging results to TensorBoard).

Benchmarking is important because the most efficient solver can vary greatly from one domain to another.
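The pattern looks roughly like the sketch below, where `run_benchmark` is a placeholder standing in for domain construction, solving, and rollout (the notebook's real setup differs):

```python
import random

from ray import tune

def run_benchmark(solver_name: str) -> float:
    # Placeholder: build the domain, solve it with the named solver,
    # roll out the policy and return the episode reward.
    return random.random()

def evaluate_solver(config):
    # Function trainable: the returned dict is reported to Ray Tune.
    return {"score": run_benchmark(config["solver"])}

tuner = tune.Tuner(
    evaluate_solver,
    param_space={"solver": tune.grid_search(["solver_a", "solver_b"])},
)
results = tuner.fit()
print(results.get_dataframe())
```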

## Flight Planning Domain

GitHub · Colab · Binder

This notebook aims to give a short and interactive example of the Flight Planning Domain. See the online documentation for more information.