Experiments with Neural ODEs in Julia

Neural Ordinary Differential Equations (Neural ODEs) is a paper that introduces a new family of neural networks in which some hidden layers (or even the only layer, in the simplest cases) are implemented with an ordinary differential equation solver.
This post shows an example, written in Julia (more will follow in future posts), that uses some of the ideas described in the Neural ODEs paper to approximate the mapping between an input and an output in a given scenario.

All of the code snippets described in this post require Julia version 1.5.3 and the following packages: DifferentialEquations, Flux, DiffEqFlux, Plots.
To get the code, see the paragraph Download the complete code at the end of this post.

Conventions

In this post the conventions used are as follows:

  • $t$ is the independent variable
  • $x$ is the unknown function
  • $y$ is the second unknown function
  • $x$ and $y$ are intended to be functions of $t$, so $x=x(t)$ and $y=y(t)$; this compact notation, besides being more readable mathematically, also makes it easier to "translate" the equations into code
  • $x'$ is the first derivative of $x$ with respect to $t$, and likewise $y'$ is the first derivative of $y$ with respect to $t$

Experiment #1: to train a system of ODEs to meet an objective

A multilayer perceptron (abbreviated MLP) is an appropriate tool for learning a nonlinear relationship between inputs and outputs whose law is not known.
On the other hand, there are cases where one has a priori knowledge of the law that correlates inputs and outputs, for example in the form of a parametric system of differential equations: in this situation an MLP cannot exploit that knowledge, while a Neural ODE can.

The application scenario

The application scenario is as follows:

    A priori knowledge:
    • A dataset that contains the inputs and outputs
    • A law that associates input and output in the form of a parametric system of differential equations
    Objective:
    • Determine appropriate parameter values so that the system obtained by replacing the formal parameters with the determined values best approximates the mapping between input and output.
    The same scenario was applied in the post DiffEqFlux.jl – A Julia Library for Neural Differential Equations on Julia's official blog, where the law is the Lotka-Volterra system of equations describing the population dynamics of prey and predators.
    In this post the law is intentionally not a famous one, but the parametric system of two first-order equations with eight parameters shown in the following paragraph.

    The problem to be solved

    Let the following parametric system of two ordinary differential equations with initial values be given, representing the law that describes the behavior of a hypothetical dynamical system:
    $$ \begin{equation} \begin{cases} x' = a_1x + b_1y + c_1e^{-d_1t} \\ y'= a_2x + b_2y + c_2e^{-d_2t} \\ x(0)=0 \\ y(0)=0 \end{cases} \end{equation} $$
    Obviously this is a demo whose purpose is to test the goodness of the method, so to prepare the dataset we arbitrarily set the eight parameter values, for example:
    $$ \left[\begin{matrix}a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \end{matrix} \right] = \left[\begin{matrix}1.11 \\ 2.43 \\ -3.66 \\ 1.37 \\ 2.89 \\ -1.97 \\ 4.58 \\ 2.86 \end{matrix} \right] $$
    Knowing that with these parameter values the analytical solution is
    $$ \begin{equation} \begin{array}{l} x(t) = \\ \;\; -1.38778 \cdot 10^{-17} \; e^{-8.99002 t} - \\ \;\; 2.77556 \cdot 10^{-17} \; e^{-7.50002 t} + \\ \;\; 3.28757 \; e^{-3.49501 t} - \\ \;\; 3.18949 \; e^{-2.86 t} + \\ \;\; 0.258028 \; e^{-1.37 t} - \\ \;\; 0.356108 \; e^{2.63501 t} + \\ \;\; 4.44089 \cdot 10^{-16} \; e^{3.27002 t} + \\ \;\; 1.11022 \cdot 10^{-16} \; e^{4.76002 t} \\ \\ y(t) = \\ \;\; -6.23016 \; e^{-3.49501 t} + \\ \;\; 5.21081 \; e^{-2.86 t} + \\ \;\; 1.24284 \; e^{-1.37 t} - \\ \;\; 0.223485 \; e^{2.63501 t} + \\ \;\; 2.77556 \cdot 10^{-17} \; e^{4.76002 t} \\ \end{array} \end{equation} $$
    (verifiable online via Wolfram Alpha), we can prepare the dataset: the input is a discretized time interval from $0$ to $1.5$ with step $0.01$, and the output consists of the values of the analytical solutions $x=x(t)$ and $y=y(t)$ for each $t$ in the input.
    Once the dataset is prepared, we forget the values of the parameters and the analytical solution, and we pose the problem of training a neural network to determine an appropriate set of values for the eight parameters so that the resulting system best approximates the nonlinear mapping between the input and the output of the dataset.

    Solution implementation

    The parametric system of differential equations is already written in explicit form, and in Julia with DifferentialEquations it is implemented like this:

    function parametric_ode_system!(du,u,p,t)
      # unpack the current state and the eight parameters
      x, y = u
      a1, b1, c1, d1, a2, b2, c2, d2 = p
      # in-place computation of the derivatives x' and y'
      du[1] = dx = a1*x + b1*y + c1*exp(-d1*t)
      du[2] = dy = a2*x + b2*y + c2*exp(-d2*t)
    end
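    As a sanity check (this is not part of the original workflow, just a minimal sketch that assumes the function above is already in scope), the dataset could equivalently be generated numerically by solving this system with the "true" parameter values instead of evaluating the analytical solution:

    using DifferentialEquations

    # parameter values used to generate the dataset
    true_params = [1.11, 2.43, -3.66, 1.37, 2.89, -1.97, 4.58, 2.86]
    check_trange = 0.0:0.01:1.5
    check_prob = ODEProblem(parametric_ode_system!, [0.0, 0.0], (0.0, 1.5), true_params)
    check_sol = solve(check_prob, Tsit5(), saveat=check_trange)

    x_data = check_sol[1,:]   # numerical x(t) on the grid
    y_data = check_sol[2,:]   # numerical y(t) on the grid

    Up to solver tolerance, x_data and y_data match the analytical expressions given above.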
    The settings are:
    • Input: $t \in [0, 1.5]$, discretization step $0.01$
    • Initial conditions: $x(0)=0$, $y(0)=0$
    • Initial values of the parameters: for convenience we set them all equal to $1$
    Turned into code:

    tbegin=0.0
    tend=1.5
    tstep=0.01
    trange = tbegin:tstep:tend
    u0 = [0.0,0.0]
    tspan = (tbegin,tend)
    p = ones(8)
    The neural network consists of a single layer, implemented by an ODE solver:

    prob = ODEProblem(parametric_ode_system!, u0, tspan, p)
    
    function net()
        solve(prob, Tsit5(), p=p, saveat=trange)
    end
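    Calling net() returns the ODE solution computed with the current values of the parameters p; the solution can be indexed like a matrix whose rows are the state components and whose columns are the time points of trange. A minimal usage sketch, assuming the definitions above:

    pred = net()          # solution with the current parameters p
    x_pred = pred[1,:]    # predicted x(t) on the grid trange
    y_pred = pred[2,:]    # predicted y(t) on the grid trange

    This indexing is exactly how the loss function in the complete code below reads the predictions.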

    The training settings are:
    • Optimizer: ADAM
    • Learning rate: $0.05$
    • Number of epochs: $1000$
    • Loss function: sum of squared differences
    Turned into code:

    epochs = 1000
    learning_rate = 0.05
    data = Iterators.repeated((), epochs)
    opt = ADAM(learning_rate)
    callback_func = function ()
      #.......
    end
    fparams = Flux.params(p)
    Flux.train!(loss_func, fparams, data, opt, cb=callback_func)
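    The loss_func passed to Flux.train! is defined in the complete listing below as the sum of squared differences between the dataset and the prediction, and the callback runs at every training iteration. If printing the loss 1000 times is too verbose, one optional variation (not in the original code) is to throttle the callback with Flux.throttle:

    # print the loss at most once per second instead of at every iteration
    throttled_cb = Flux.throttle(callback_func, 1)
    Flux.train!(loss_func, fparams, data, opt, cb=throttled_cb)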

    Here is the complete code:

    using Flux, DiffEqFlux, DifferentialEquations, Plots
    
    function parametric_ode_system!(du,u,p,t)
      x, y = u
      a1, b1, c1, d1, a2, b2, c2, d2 = p
      du[1] = dx = a1*x + b1*y + c1*exp(-d1*t)
      du[2] = dy = a2*x + b2*y + c2*exp(-d2*t)
    end
    
    true_params = [1.11, 2.43, -3.66, 1.37, 2.89, -1.97, 4.58, 2.86]  # values used (only) to generate the dataset
    
    an_sol_x(t) =
      -1.38778e-17 * exp(-8.99002 * t) -
      2.77556e-17 * exp(-7.50002 * t) +
      3.28757 * exp(-3.49501 * t) -
      3.18949 * exp(-2.86 * t) +
      0.258028 * exp(-1.37 * t) -
      0.356108 * exp(2.63501 * t) +
      4.44089e-16 * exp(3.27002 * t) +
      1.11022e-16 * exp(4.76002 * t)
    an_sol_y(t) =
      -6.23016 * exp(-3.49501 * t) +
      5.21081 * exp(-2.86 * t) +
      1.24284 * exp(-1.37 * t) -
      0.223485 * exp(2.63501 * t) +
      2.77556e-17 * exp(4.76002 * t)
    
    tbegin=0.0
    tend=1.5
    tstep=0.01
    trange = tbegin:tstep:tend
    u0 = [0.0,0.0]
    tspan = (tbegin,tend)
    p = ones(8)
    
    prob = ODEProblem(parametric_ode_system!, u0, tspan, p)
    
    function net()
        solve(prob, Tsit5(), p=p, saveat=trange)
    end
    
    dataset_outs = [an_sol_x.(trange), an_sol_y.(trange)]
    function loss_func()
      pred = net()
      sum(abs2, dataset_outs[1] .- pred[1,:]) +
      sum(abs2, dataset_outs[2] .- pred[2,:])
    end
    
    epochs = 1000
    learning_rate = 0.05
    data = Iterators.repeated((), epochs)
    opt = ADAM(learning_rate)
    callback_func = function ()
      println("loss: ", loss_func())
    end
    fparams = Flux.params(p)
    Flux.train!(loss_func, fparams, data, opt, cb=callback_func)
    
    predict_prob = ODEProblem(parametric_ode_system!, u0, tspan, p)
    predict_sol = solve(predict_prob, Tsit5(), p=p, saveat=trange)
    x_predict_sol = [u[1] for u in predict_sol.u]
    y_predict_sol = [u[2] for u in predict_sol.u]
    
    println("Learned parameters:", p)
    
    plot(trange, dataset_outs[1],
        linewidth=2, ls=:dash,
        title="Neural ODEs to fit params",
        xaxis="t",
        label="dataset x(t)",
        legend=true)
    plot!(trange, dataset_outs[2],
        linewidth=2, ls=:dash,
        label="dataset y(t)")
    plot!(predict_sol.t, x_predict_sol,
        linewidth=1,
        label="predicted x(t)")
    plot!(predict_sol.t, y_predict_sol,
        linewidth=1,
        label="predicted y(t)")
    
    Here is the link to the code on GitHub.

    The parameter values obtained at the end of the training are as follows:
    $$ \left[\begin{matrix}a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \end{matrix} \right] = \left[\begin{matrix}1.7302980833638142 \\ 1.2823312512074032 \\ -1.6866178290795755 \\ 0.41974163099782325 \\ 1.223075467559363 \\ 0.9410722500584323 \\ 0.18890958911958686 \\ 1.7462909509457183 \end{matrix} \right] $$
    These are obviously different from the parameter values used to generate the dataset (via the analytical solution); however, by replacing the formal parameters with the values obtained from training and solving the resulting system numerically, we obtain a numerical solution that approximates the dataset quite well, as shown in the figure:

    Figure: comparison of the numerical solution obtained with the learned parameters against the dataset.
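    To quantify the fit beyond visual inspection, one can compute, for example, the root-mean-square error of the predicted curves against the dataset. A minimal sketch reusing the names from the listing above (this check is not part of the original code):

    # RMSE of the learned solution against the dataset, per component
    n = length(trange)
    rmse_x = sqrt(sum(abs2, dataset_outs[1] .- x_predict_sol) / n)
    rmse_y = sqrt(sum(abs2, dataset_outs[2] .- y_predict_sol) / n)
    println("RMSE x: ", rmse_x, "  RMSE y: ", rmse_y)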

    Different training strategies yield different parameter values and therefore a different numerical solution of the system, with different accuracy.
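    For example, one simple variation (shown here only as a sketch; the epoch count and learning rate are arbitrary illustrative choices) is to continue training with a smaller learning rate after the first pass, which may refine the parameters further:

    # optional second training stage with a smaller learning rate
    fine_tune_opt = ADAM(0.005)
    fine_tune_data = Iterators.repeated((), 500)
    Flux.train!(loss_func, fparams, fine_tune_data, fine_tune_opt, cb=callback_func)
    println("Refined parameters: ", p)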

    Citations

    @article{DBLP:journals/corr/abs-1806-07366,
      author    = {Tian Qi Chen and
                   Yulia Rubanova and
                   Jesse Bettencourt and
                   David Duvenaud},
      title     = {Neural Ordinary Differential Equations},
      journal   = {CoRR},
      volume    = {abs/1806.07366},
      year      = {2018},
      url       = {http://arxiv.org/abs/1806.07366},
      archivePrefix = {arXiv},
      eprint    = {1806.07366},
      timestamp = {Mon, 22 Jul 2019 14:09:23 +0200},
      biburl    = {https://dblp.org/rec/journals/corr/abs-1806-07366.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }

    @article{DBLP:journals/corr/abs-1902-02376,
      author    = {Christopher Rackauckas and
                   Mike Innes and
                   Yingbo Ma and
                   Jesse Bettencourt and
                   Lyndon White and
                   Vaibhav Dixit},
      title     = {DiffEqFlux.jl - {A} Julia Library for Neural Differential Equations},
      journal   = {CoRR},
      volume    = {abs/1902.02376},
      year      = {2019},
      url       = {https://arxiv.org/abs/1902.02376},
      archivePrefix = {arXiv},
      eprint    = {1902.02376},
      timestamp = {Tue, 21 May 2019 18:03:36 +0200},
      biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1902-02376},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    

    Download the complete code

    The complete code is available on GitHub.
    These materials are distributed under the MIT license; feel free to use, share, fork, and adapt them as you see fit.
    Also please feel free to submit pull requests and bug reports to the GitHub repository or contact me on my social media channels available in the top right corner of this page.