# Simultaneous Fitting of Multiple Neural Networks

In many cases users are interested in fitting multiple neural networks or parameters simultaneously. This tutorial addresses how to perform this kind of study.

The following is a fully working demo on the Fitzhugh-Nagumo ODE:

```
using DiffEqFlux, OrdinaryDiffEq, DiffEqSensitivity, Flux, Optim
function fitz(du,u,p,t)
v,w = u
a,b,τinv,l = p
du[1] = v - v^3/3 -w + l
du[2] = τinv*(v + a - b*w)
end
p_ = Float32[0.7,0.8,1/12.5,0.5]
u0 = [1f0;1f0]
tspan = (0f0,10f0)
prob = ODEProblem(fitz,u0,tspan,p_)
sol = solve(prob, Tsit5(), saveat = 0.5 )
# Ideal data
X = Array(sol)
Xₙ = X + Float32(1e-3)*randn(eltype(X), size(X)) #noisy data
# For xz term
NN_1 = FastChain(FastDense(2, 16, tanh), FastDense(16, 1))
p1 = initial_params(NN_1)
# for xy term
NN_2 = FastChain(FastDense(3, 16, tanh), FastDense(16, 1))
p2 = initial_params(NN_2)
scaling_factor = 1f0
p = [p1;p2;scaling_factor]
function dudt_(u,p,t)
v,w = u
z1 = NN_1([v,w], p[1:length(p1)])
z2 = NN_2([v,w,t], p[(length(p1)+1):end-1])
[z1[1],p[end]*z2[1]]
end
prob_nn = ODEProblem(dudt_,u0, tspan, p)
sol_nn = solve(prob_nn, Tsit5(),saveat = sol.t)
function predict(θ)
Array(solve(prob_nn, Vern7(), p=θ, saveat = sol.t,
abstol=1e-6, reltol=1e-6,
sensealg = InterpolatingAdjoint(autojacvec=ReverseDiffVJP(true))))
end
# No regularisation right now
function loss(θ)
pred = predict(θ)
sum(abs2, Xₙ .- pred), pred
end
loss(p)
const losses = []
callback(θ,l,pred) = begin
push!(losses, l)
if length(losses)%50==0
println(losses[end])
end
false
end
res1_uode = DiffEqFlux.sciml_train(loss, p, ADAM(0.01), cb=callback, maxiters = 500)
res2_uode = DiffEqFlux.sciml_train(loss, res1_uode.minimizer, BFGS(initial_stepnorm=0.01), cb=callback, maxiters = 10000)
```

The key is that `sciml_train`

acts on a single parameter vector `p`

. Thus what we do here is concatenate all of the parameters into a single vector `p = [p1;p2;scaling_factor]`

and then train on this parameter vector. Whenever we need to evaluate the neural networks, we cut the vector and grab the portion that corresponds to the neural network. For example, the `p1`

portion is `p[1:length(p1)]`

, which is why the first neural network's evolution is written like `NN_1([v,w], p[1:length(p1)])`

.

This method is flexible to use with many optimizers and in fairly optimized ways. The allocations can be reduced by using `@view p[1:length(p1)]`

. We can also see with the `scaling_factor`

that we can grab parameters directly out of the vector and use them as needed.