# FastChain

The `FastChain` system is a Flux-like neural network architecture with explicit parameters, designed to reduce overhead in small neural networks. It avoids internal references and uses explicitly defined adjoints to fuse operations. For neural networks with layers wider than roughly 200, these optimizations are overshadowed by the cost of matrix multiplication. For smaller layers, however, this architecture removes much of the overhead traditionally seen in neural network architectures, and it is therefore recommended for many scientific machine learning use cases.

## Basics

The basic principle is that a `FastChain` is a collection of functions of two values, `(x,p)`, which it chains together, calling one after the next. Each layer in the chain receives a predefined number of parameters. For example,

```
f = FastChain((x,p) -> x.^3,
              FastDense(2,50,tanh),
              FastDense(50,2))
```

Here, the `FastChain` contains a function `FastDense(2,50,tanh)` with `2*50 + 50` parameters and a function `FastDense(50,2)` with `50*2 + 2` parameters. The first function gets the default number of parameters, which is 0. Thus, `f(x,p)` is equivalent to the following code:

```
function f(x,p)
    tmp1 = x.^3                                 # first layer takes no parameters
    len1 = paramlength(FastDense(2,50,tanh))    # 2*50 + 50
    tmp2 = FastDense(2,50,tanh)(tmp1,@view p[1:len1])
    tmp3 = FastDense(50,2)(tmp2,@view p[(len1+1):end])
    return tmp3
end
```

`FastChain` functions thus require that the vector of neural network parameters be passed on each call, making the setup explicit in the passed parameters.
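The slicing of one flat parameter vector across layers can be sketched in plain Julia without loading DiffEqFlux. In this sketch, `dense`, `f_sketch`, `len1`, and `len2` are hypothetical names, and the assumed layout (weights first in column-major order, then the bias) is an illustration rather than a statement about how `FastDense` stores its parameters.

```julia
# Hypothetical stand-in for a FastDense-style layer. Assumed layout of `p`:
# the out×in weight matrix W (column-major) followed by the bias b.
function dense(x, p, nin, nout, act)
    W = reshape(view(p, 1:nin*nout), nout, nin)
    b = view(p, nin*nout+1:nin*nout+nout)
    return act.(W * x .+ b)
end

const len1 = 2*50 + 50   # parameters of the 2 -> 50 layer
const len2 = 50*2 + 2    # parameters of the 50 -> 2 layer

function f_sketch(x, p)
    tmp1 = x .^ 3                                     # parameter-free layer
    tmp2 = dense(tmp1, view(p, 1:len1), 2, 50, tanh)  # first len1 entries of p
    tmp3 = dense(tmp2, view(p, len1+1:len1+len2), 50, 2, identity)
    return tmp3
end

p = 0.1f0 .* randn(Float32, len1 + len2)  # one flat vector holding all parameters
y = f_sketch(Float32[0.5, -1.0], p)       # vector of length 2
```

Because the functions hold no state, the same `p` can be handed directly to an optimizer or a differential equation solver, and each layer only ever sees its own view of it.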

To get initial parameters for the optimization of a function defined by a `FastChain`, one simply calls `initial_params(f)`, which returns the concatenation of the initial parameters for each layer. Notice that since all parameters are explicit, constructing and reconstructing chains/layers can be a memory-free operation, since the only memory is the parameter vector itself, which is handled by the user.

### FastChain Interface

The only requirement to be a layer in `FastChain` is to be a 2-argument function `l(x,p)` and define the following traits:

- `paramlength(::typeof(l))`: The number of parameters from the parameter vector to allocate to this layer. Defaults to zero.
- `initial_params(::typeof(l))`: The function for defining the initial parameters of the layer. Should output a vector of length matching `paramlength`. Defaults to `Float32[]`.
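As a rough illustration of this interface, the sketch below defines local stand-ins for the two traits and a small loop that mimics how a chain could hand each layer its slice of the parameter vector. The names `Scale` and `run_chain` are hypothetical, and the local `paramlength` and `initial_params` are placeholders: with DiffEqFlux loaded, one would presumably extend the package's own functions instead.

```julia
# Local stand-ins for the two traits, with the documented defaults.
paramlength(layer) = 0
initial_params(layer) = Float32[]

# Hypothetical custom layer: elementwise scaling with n learnable factors.
struct Scale
    n::Int
end
(s::Scale)(x, p) = p .* x                      # the 2-argument call l(x, p)
paramlength(s::Scale) = s.n
initial_params(s::Scale) = ones(Float32, s.n)

# Minimal chain evaluation: each layer receives a view of its own parameters.
function run_chain(layers, x, p)
    offset = 0
    for layer in layers
        len = paramlength(layer)
        x = layer(x, view(p, offset+1:offset+len))
        offset += len
    end
    return x
end

layers = ((x, p) -> x .^ 2, Scale(3), (x, p) -> x .+ 1)
p0 = reduce(vcat, map(initial_params, layers))  # concatenated initial parameters
y = run_chain(layers, Float32[1, 2, 3], p0)
```

The anonymous functions fall back to the defaults (zero parameters), so only `Scale` contributes to `p0`.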

## FastChain-Compatible Layers

The following pre-defined layers can be used with `FastChain`:

`DiffEqFlux.FastDense` (Type)

`FastDense(in,out,activation=identity; bias = true, precache = false, initW = Flux.glorot_uniform, initb = Flux.zeros32)`

A Dense layer `activation.(W*x + b)` with input size `in` and output size `out`. The `activation` function defaults to `identity`, meaning the layer is an affine function. Initial parameters are taken to match `Flux.Dense`. `bias` controls whether the bias `b` is included in the layer and defaults to `true`. `precache` preallocates memory for the intermediate variables calculated during each pass, which avoids heap allocations that would otherwise slow down the computation; it defaults to `false`.

Note that this function has specializations on `tanh` for a slightly faster adjoint with Zygote.
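The general idea behind preallocating intermediates can be sketched as follows. This is not the `FastDense` implementation: `CachedDense` is a hypothetical type that, unlike `FastDense`, stores its own weights, and is used here only to show how an in-place `mul!` plus an in-place broadcast avoids allocating a new output on every pass.

```julia
using LinearAlgebra  # standard library; provides the in-place mul!

# Hypothetical layer that owns a preallocated output buffer.
struct CachedDense{F}
    W::Matrix{Float32}
    b::Vector{Float32}
    act::F
    buf::Vector{Float32}
end

# Convenience constructor: allocate the buffer once, sized like the bias.
CachedDense(W, b, act) = CachedDense(W, b, act, similar(b))

function (d::CachedDense)(x)
    mul!(d.buf, d.W, x)             # buf = W*x, written into existing memory
    d.buf .= d.act.(d.buf .+ d.b)   # bias and activation applied in place
    return d.buf
end

W = Float32[1 2; 3 4]
b = Float32[0.5, -0.5]
layer = CachedDense(W, b, identity)
y = layer(Float32[1, 1])            # W*x + b
```

One trade-off of this pattern is that the returned array is the buffer itself, so a later call overwrites earlier results unless they are copied.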

`DiffEqFlux.StaticDense` (Type)

`StaticDense(in,out,activation=identity; initW = Flux.glorot_uniform, initb = Flux.zeros32)`

A Dense layer `activation.(W*x + b)` with input size `in` and output size `out`. The `activation` function defaults to `identity`, meaning the layer is an affine function. Initial parameters are taken to match `Flux.Dense`. The internal calculations are done with `StaticArrays` for extra speed in small linear algebra operations, so this layer should only be used for input/output sizes of approximately 16 or less. `bias` represents the bias `b` in the dense layer and defaults to `true`.

Note that this function has specializations on `tanh` for a slightly faster adjoint with Zygote.