Simultaneous Fitting of Multiple Neural Networks

In many cases users are interested in fitting multiple neural networks or parameters simultaneously. This tutorial addresses how to perform this kind of study.

The following is a fully working demo on the Fitzhugh-Nagumo ODE:

using DiffEqFlux, OrdinaryDiffEq, DiffEqSensitivity, Flux, Optim

function fitz(du,u,p,t)
  v,w = u
  a,b,τinv,l = p
  du[1] = v - v^3/3 -w + l
  du[2] = τinv*(v +  a - b*w)

p_ = Float32[0.7,0.8,1/12.5,0.5]
u0 = [1f0;1f0]
tspan = (0f0,10f0)
prob = ODEProblem(fitz,u0,tspan,p_)
sol = solve(prob, Tsit5(), saveat = 0.5 )

# Ideal data
X = Array(sol)
Xₙ = X + Float32(1e-3)*randn(eltype(X), size(X))  #noisy data

# For xz term
NN_1 = FastChain(FastDense(2, 16, tanh), FastDense(16, 1))
p1 = initial_params(NN_1)

# for xy term
NN_2 = FastChain(FastDense(3, 16, tanh), FastDense(16, 1))
p2 = initial_params(NN_2)
scaling_factor = 1f0

p = [p1;p2;scaling_factor]
function dudt_(u,p,t)
    v,w = u
    z1 = NN_1([v,w], p[1:length(p1)])
    z2 = NN_2([v,w,t], p[(length(p1)+1):end-1])
prob_nn = ODEProblem(dudt_,u0, tspan, p)
sol_nn = solve(prob_nn, Tsit5(),saveat = sol.t)

function predict(θ)
    Array(solve(prob_nn, Vern7(), p=θ, saveat = sol.t,
                         abstol=1e-6, reltol=1e-6,
                         sensealg = InterpolatingAdjoint(autojacvec=ReverseDiffVJP(true))))

# No regularisation right now
function loss(θ)
    pred = predict(θ)
    sum(abs2, Xₙ .- pred), pred
const losses = []
callback(θ,l,pred) = begin
    push!(losses, l)
    if length(losses)%50==0

res1_uode = DiffEqFlux.sciml_train(loss, p, ADAM(0.01), cb=callback, maxiters = 500)
res2_uode = DiffEqFlux.sciml_train(loss, res1_uode.minimizer, BFGS(initial_stepnorm=0.01), cb=callback, maxiters = 10000)

The key is that sciml_train acts on a single parameter vector p. Thus what we do here is concatenate all of the parameters into a single vector p = [p1;p2;scaling_factor] and then train on this parameter vector. Whenever we need to evaluate the neural networks, we cut the vector and grab the portion that corresponds to the neural network. For example, the p1 portion is p[1:length(p1)], which is why the first neural network's evolution is written like NN_1([v,w], p[1:length(p1)]).

This method is flexible to use with many optimizers and in fairly optimized ways. The allocations can be reduced by using @view p[1:length(p1)]. We can also see with the scaling_factor that we can grab parameters directly out of the vector and use them as needed.