**Computational Mindset** by Ettore Messina

# Experiments with Neural ODEs in Julia

*Neural Ordinary Differential Equations* (*Neural ODEs* for short) is a paper that introduces a new family of neural networks
in which some hidden layers (or even the only layer, in the simplest cases) are implemented with an ordinary differential equation solver.

This post shows an example written in Julia (more will follow in future posts) that uses some of the ideas described in the *Neural ODEs* paper to approximate the mapping
between an input and an output in a given scenario.

All of the code snippets in this post require Julia version 1.5.3 and the following packages: DifferentialEquations, Flux, DiffEqFlux, and Plots.

To get the code, see the section *Download the complete code* at the end of this post.

## Conventions

In this post the conventions used are as follows:

- $t$ is the independent variable
- $x$ is the unknown function
- $y$ is the second unknown function
- $x$ and $y$ are intended to be functions of $t$, so $x=x(t)$ and $y=y(t)$; this compact notation, besides being more readable mathematically, makes it easier to "translate" the equations into code
- $x'$ is the first derivative of $x$ with respect to $t$, and of course $y'$ is the first derivative of $y$ with respect to $t$

## Experiment #1: training a system of ODEs to meet an objective

A multilayer perceptron (abbreviated MLP) is an appropriate tool for learning a nonlinear relationship between inputs and outputs whose law is not known.

On the other hand, there are cases where one has a priori knowledge of the law that relates inputs and outputs, for example in the form of a parametric system of differential equations:
in this situation an MLP cannot exploit such knowledge, while a Neural ODE can.

### The application scenario

The application scenario is as follows:

**A priori knowledge:**

- A dataset that contains the inputs and outputs
- A law that associates input and output in the form of a parametric system of differential equations

**Objective:**

- Determine appropriate parameter values such that the system obtained by substituting them for the formal parameters best approximates the mapping between input and output.

In this post, the law is deliberately not a well-known one: it is the parametric system of two first-order equations with eight parameters shown in the next section.

### The problem to be solved

Consider the following parametric system of two ordinary differential equations with initial values,
which represents the law describing the behavior of a hypothetical dynamical system:
$$ \begin{equation}
\begin{cases}
x' = a_1x + b_1y + c_1e^{-d_1t}
\\
y'= a_2x + b_2y + c_2e^{-d_2t}
\\
x(0)=0
\\
y(0)=0
\end{cases}
\end{equation} $$
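To make the role of the solver concrete, the system above can be integrated by hand with a few steps of forward Euler. This is only an illustrative sketch (the post itself uses the adaptive Tsit5 solver from DifferentialEquations); the function name `euler_integrate` and the fixed step `h` are ad-hoc choices, with the eight parameters packed into a vector `p` as in the rest of the post:

```julia
# Forward Euler sketch of x' = a1*x + b1*y + c1*exp(-d1*t),
#                         y' = a2*x + b2*y + c2*exp(-d2*t),
# with x(0) = y(0) = 0, integrated up to t = tend with step h.
function euler_integrate(p, tend, h)
    a1, b1, c1, d1, a2, b2, c2, d2 = p
    x, y, t = 0.0, 0.0, 0.0
    while t < tend - h/2
        dx = a1*x + b1*y + c1*exp(-d1*t)
        dy = a2*x + b2*y + c2*exp(-d2*t)
        x += h*dx     # Euler update for x
        y += h*dy     # Euler update for y
        t += h
    end
    return x, y
end

# With all parameters equal to 1 the two equations are identical,
# so x(t) == y(t) and both leave the origin along the diagonal.
x, y = euler_integrate(ones(8), 0.1, 1e-4)
```

With all-ones parameters the system collapses to $x' = 2x + e^{-t}$, whose exact solution is $(e^{2t}-e^{-t})/3$, so the sketch is easy to check by hand.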
Obviously this is a demo whose purpose is to test the soundness of the method, so to prepare the dataset
we arbitrarily fix the eight parameter values, for example these:
$$ \left[\begin{matrix}a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \end{matrix} \right] =
\left[\begin{matrix}1.11 \\ 2.43 \\ -3.66 \\ 1.37 \\ 2.89 \\ -1.97 \\ 4.58 \\ 2.86 \end{matrix} \right] $$
and knowing that with such parameter values the analytical solution is as follows:
$$ \begin{equation}
\begin{array}{l}
x(t) = \\
\;\; -1.38778 \cdot 10^{-17} \; e^{-8.99002 t} - \\
\;\; 2.77556 \cdot 10^{-17} \; e^{-7.50002 t} + \\
\;\; 3.28757 \; e^{-3.49501 t} - \\
\;\; 3.18949 \; e^{-2.86 t} + \\
\;\; 0.258028 \; e^{-1.37 t} - \\
\;\; 0.356108 \; e^{2.63501 t} + \\
\;\; 4.44089 \cdot 10^{-16} \; e^{3.27002 t} + \\
\;\; 1.11022 \cdot 10^{-16} \; e^{4.76002 t} \\
\\
y(t) = \\
\;\; -6.23016 \; e^{-3.49501 t} + \\
\;\; 5.21081 \; e^{-2.86 t} + \\
\;\; 1.24284 \; e^{-1.37 t} - \\
\;\; 0.223485 \; e^{2.63501 t} + \\
\;\; 2.77556 \cdot 10^{-17} \; e^{4.76002 t} \\
\end{array}
\end{equation} $$
(verifiable online via Wolfram Alpha), we can prepare the dataset: the input is the time interval from $0$ to $1.5$ discretized with step $0.01$,
while the output consists of the analytical solutions $x=x(t)$ and $y=y(t)$ for each $t$ in the input.
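Before trusting the dataset, the analytical solution can be sanity-checked numerically; this check is not part of the original code, and the sample point $t=0.5$ and step $h$ are arbitrary choices. The initial values should (almost) vanish, and a central finite difference of $x$ should match the right-hand side $a_1x + b_1y + c_1e^{-d_1t}$ of the first equation:

```julia
# Analytical solution as given above (coefficients rounded to ~6 digits)
an_sol_x(t) =
    -1.38778e-17 * exp(-8.99002t) - 2.77556e-17 * exp(-7.50002t) +
    3.28757 * exp(-3.49501t) - 3.18949 * exp(-2.86t) +
    0.258028 * exp(-1.37t) - 0.356108 * exp(2.63501t) +
    4.44089e-16 * exp(3.27002t) + 1.11022e-16 * exp(4.76002t)
an_sol_y(t) =
    -6.23016 * exp(-3.49501t) + 5.21081 * exp(-2.86t) +
    1.24284 * exp(-1.37t) - 0.223485 * exp(2.63501t) +
    2.77556e-17 * exp(4.76002t)

a1, b1, c1, d1 = 1.11, 2.43, -3.66, 1.37   # first equation's true parameters
t, h = 0.5, 1e-6
lhs = (an_sol_x(t + h) - an_sol_x(t - h)) / (2h)        # central difference ≈ x'(t)
rhs = a1*an_sol_x(t) + b1*an_sol_y(t) + c1*exp(-d1*t)   # law evaluated at t
# lhs and rhs agree up to the rounding of the printed coefficients
```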

Once the dataset is prepared, we forget about the values of the parameters and the analytical solution and we pose the problem
of how to train a neural network to determine an appropriate set of values for the eight parameters in order to best approximate
the non-linear mapping between input and output of the dataset.

### Solution implementation

The parametric system of differential equations is already written in explicit form, and in Julia with DifferentialEquations it is implemented like this:

```
function parametric_ode_system!(du,u,p,t)
  x, y = u
  a1, b1, c1, d1, a2, b2, c2, d2 = p
  du[1] = a1*x + b1*y + c1*exp(-d1*t)
  du[2] = a2*x + b2*y + c2*exp(-d2*t)
end
```
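Independently of the solver, the in-place derivative function can be exercised directly. For example, at $t=0$ with $u=(0,0)$ and all parameters equal to $1$, both right-hand sides reduce to $c\,e^{0}=1$:

```julia
# Same derivative function as in the snippet above
function parametric_ode_system!(du,u,p,t)
  x, y = u
  a1, b1, c1, d1, a2, b2, c2, d2 = p
  du[1] = a1*x + b1*y + c1*exp(-d1*t)
  du[2] = a2*x + b2*y + c2*exp(-d2*t)
end

du = zeros(2)
parametric_ode_system!(du, [0.0, 0.0], ones(8), 0.0)
# at the initial point only the forcing terms c1*exp(0) and c2*exp(0) survive
```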

The settings are:
- Input: $t \in [0, 1.5]$, discretized with step $0.01$
- Initial conditions: $x(0)=0$, $y(0)=0$
- Initial values of the parameters; for convenience we set them all equal to $1$

```
tbegin = 0.0
tend = 1.5
tstep = 0.01
trange = tbegin:tstep:tend
u0 = [0.0, 0.0]
tspan = (tbegin, tend)
p = ones(8)
```

The neural network consists of a single *ODE solver* layer:

```
prob = ODEProblem(parametric_ode_system!, u0, tspan, p)
function net()
  solve(prob, Tsit5(), p=p, saveat=trange)
end
```

The training settings are:

- Optimizer: ADAM
- Learning rate: $0.05$
- Number of epochs: $1000$
- Loss function: sum of squared differences

```
epochs = 1000
learning_rate = 0.05
data = Iterators.repeated((), epochs)
opt = ADAM(learning_rate)
callback_func = function ()
  # report progress during training, e.g. print the current loss
end
fparams = Flux.params(p)
Flux.train!(loss_func, fparams, data, opt, cb=callback_func)
```
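For intuition, the same fitting idea can be reproduced without Flux: the loss is an ordinary scalar function of the parameter vector, so even naive gradient descent with finite-difference gradients decreases it. The sketch below applies this to a deliberately simple two-parameter stand-in loss (`toy_loss`, `fd_gradient`, and `fit` are ad-hoc names, not part of the post's code; the real `loss_func` runs the ODE solver and Flux/ADAM computes the gradient):

```julia
# Stand-in scalar loss with its minimum at the "true" parameters
toy_loss(p) = sum(abs2, p .- [1.11, 2.43])

# Central finite-difference approximation of the gradient of f at p
function fd_gradient(f, p; h=1e-6)
    g = similar(p)
    for i in eachindex(p)
        pp = copy(p); pp[i] += h
        pm = copy(p); pm[i] -= h
        g[i] = (f(pp) - f(pm)) / (2h)
    end
    return g
end

# Plain gradient descent, mirroring the epochs/learning-rate settings above
function fit(p0; lr=0.05, epochs=500)
    p = copy(p0)
    for _ in 1:epochs
        p .-= lr .* fd_gradient(toy_loss, p)
    end
    return p
end

p_fit = fit(ones(2))   # start from all-ones, as in the post
```

On this quadratic toy loss each step shrinks the error by a constant factor, so after a few hundred epochs the fitted vector sits essentially at the minimum.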

Here is the complete code:

```
using Flux, DiffEqFlux, DifferentialEquations, Plots

function parametric_ode_system!(du,u,p,t)
  x, y = u
  a1, b1, c1, d1, a2, b2, c2, d2 = p
  du[1] = a1*x + b1*y + c1*exp(-d1*t)
  du[2] = a2*x + b2*y + c2*exp(-d2*t)
end

# Parameters used to generate the dataset (kept only for reference)
true_params = [1.11, 2.43, -3.66, 1.37, 2.89, -1.97, 4.58, 2.86]

# Analytical solution of the system for true_params
an_sol_x(t) =
  -1.38778e-17 * exp(-8.99002 * t) -
  2.77556e-17 * exp(-7.50002 * t) +
  3.28757 * exp(-3.49501 * t) -
  3.18949 * exp(-2.86 * t) +
  0.258028 * exp(-1.37 * t) -
  0.356108 * exp(2.63501 * t) +
  4.44089e-16 * exp(3.27002 * t) +
  1.11022e-16 * exp(4.76002 * t)
an_sol_y(t) =
  -6.23016 * exp(-3.49501 * t) +
  5.21081 * exp(-2.86 * t) +
  1.24284 * exp(-1.37 * t) -
  0.223485 * exp(2.63501 * t) +
  2.77556e-17 * exp(4.76002 * t)

tbegin = 0.0
tend = 1.5
tstep = 0.01
trange = tbegin:tstep:tend
u0 = [0.0, 0.0]
tspan = (tbegin, tend)
p = ones(8)

prob = ODEProblem(parametric_ode_system!, u0, tspan, p)
function net()
  solve(prob, Tsit5(), p=p, saveat=trange)
end

dataset_outs = [an_sol_x.(trange), an_sol_y.(trange)]
function loss_func()
  pred = net()
  sum(abs2, dataset_outs[1] .- pred[1,:]) +
  sum(abs2, dataset_outs[2] .- pred[2,:])
end

epochs = 1000
learning_rate = 0.05
data = Iterators.repeated((), epochs)
opt = ADAM(learning_rate)
callback_func = function ()
  println("loss: ", loss_func())
end
fparams = Flux.params(p)
Flux.train!(loss_func, fparams, data, opt, cb=callback_func)

# Solve the system again with the learned parameters
predict_prob = ODEProblem(parametric_ode_system!, u0, tspan, p)
predict_sol = solve(predict_prob, Tsit5(), saveat=trange)
x_predict_sol = [u[1] for u in predict_sol.u]
y_predict_sol = [u[2] for u in predict_sol.u]
println("Learned parameters: ", p)

plot(trange, dataset_outs[1],
  linewidth=2, ls=:dash,
  title="Neural ODEs to fit params",
  xaxis="t",
  label="dataset x(t)",
  legend=true)
plot!(trange, dataset_outs[2],
  linewidth=2, ls=:dash,
  label="dataset y(t)")
plot!(predict_sol.t, x_predict_sol,
  linewidth=1,
  label="predicted x(t)")
plot!(predict_sol.t, y_predict_sol,
  linewidth=1,
  label="predicted y(t)")
```

Here is the link to the code on GitHub.

The parameter values obtained at the end of the training are as follows: $$ \left[\begin{matrix}a_1 \\ b_1 \\ c_1 \\ d_1 \\ a_2 \\ b_2 \\ c_2 \\ d_2 \end{matrix} \right] = \left[\begin{matrix}1.7302980833638142 \\ 1.2823312512074032 \\ -1.6866178290795755 \\ 0.41974163099782325 \\ 1.223075467559363 \\ 0.9410722500584323 \\ 0.18890958911958686 \\ 1.7462909509457183 \end{matrix} \right] $$ These are obviously different from the parameter values used to generate the dataset (via the analytical solution); nevertheless, substituting the learned values for the formal parameters and solving the resulting system numerically yields a solution that approximates the dataset quite well, as shown in the figure:

Different training strategies allow to obtain different values of the parameters and therefore a different numerical solution of the system with a different accuracy.

## Citations

```
@article{DBLP:journals/corr/abs-1806-07366,
  author    = {Tian Qi Chen and Yulia Rubanova and Jesse Bettencourt and David Duvenaud},
  title     = {Neural Ordinary Differential Equations},
  journal   = {CoRR},
  volume    = {abs/1806.07366},
  year      = {2018},
  url       = {http://arxiv.org/abs/1806.07366},
  archivePrefix = {arXiv},
  eprint    = {1806.07366},
  timestamp = {Mon, 22 Jul 2019 14:09:23 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1806-07366.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

```
@article{DBLP:journals/corr/abs-1902-02376,
  author    = {Christopher Rackauckas and Mike Innes and Yingbo Ma and Jesse Bettencourt and Lyndon White and Vaibhav Dixit},
  title     = {DiffEqFlux.jl - {A} Julia Library for Neural Differential Equations},
  journal   = {CoRR},
  volume    = {abs/1902.02376},
  year      = {2019},
  url       = {https://arxiv.org/abs/1902.02376},
  archivePrefix = {arXiv},
  eprint    = {1902.02376},
  timestamp = {Tue, 21 May 2019 18:03:36 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1902-02376},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

## Download the complete code

The complete code is available on GitHub.

These materials are distributed under the MIT license; feel free to use, share, fork, and adapt them as you see fit.

Also, please feel free to submit pull requests and bug reports to the GitHub repository, or contact me via the social media channels available in the top right corner of this page.