Neural networks – what is the point of having a sigmoid activation function $\sigma(\cdot)$ AND a sigmoid $g(t)$?
Just so we’re all on the same page, this is the classic neural network setup as I understand it:
$$z_j=\sigma(\alpha_{0j}+\alpha_j^T x)$$
$$t=\beta_0+\sum_{j=1}^{k}\beta_j z_j=\beta_0+\beta^T z$$
$$ y=f(x)=g(t) $$
(I may have missed a few bits off like the hat on the y and some bold to show the vectors, but you get the idea)
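To make the setup concrete, here is roughly how I picture those three equations in code. This is only a minimal sketch in plain Python for a single output; the names `sigmoid`, `forward`, `alpha0`, `alpha`, `beta0` and `beta` are my own placeholders, and the optional `g` argument is an assumption I added so the output function can be swapped out.

```python
import math

def sigmoid(v):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, alpha0, alpha, beta0, beta, g=sigmoid):
    """One hidden layer, one output (hypothetical helper, not a library API).

    x      : list of input features
    alpha0 : list of hidden-unit biases, alpha0[j] = alpha_0j
    alpha  : list of hidden-unit weight vectors, alpha[j] = alpha_j
    beta0  : output bias
    beta   : list of output weights, one per hidden unit
    g      : output function applied to t (sigmoid by default)
    """
    # Hidden layer: z_j = sigma(alpha_0j + alpha_j^T x)
    z = [
        sigmoid(a0 + sum(a_i * x_i for a_i, x_i in zip(a, x)))
        for a0, a in zip(alpha0, alpha)
    ]
    # Linear combination of the hidden units: t = beta_0 + beta^T z
    t = beta0 + sum(b * z_j for b, z_j in zip(beta, z))
    # Output: y = g(t)
    return g(t)

# Two inputs, two hidden units, arbitrary illustrative weights.
y = forward([1.0, 2.0], [0.1, -0.2], [[0.5, -0.3], [0.2, 0.4]], 0.05, [1.5, -2.0])
```

With the default `g`, `y` always lands strictly between 0 and 1; passing `g=lambda t: t` instead returns the raw value `t`, which is the part I am asking about.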
$\sigma(\cdot)$ is the activation function, which can be a sigmoid (or tanh, ReLU, or a threshold function), but then in the final line there is another sigmoid function, $g(t)$.
What is the logic in having two sigmoid functions? I get that this is what happens, but haven’t yet seen any justification for the setup like this.
Please dumb it down as much as possible for me. I do not understand the meaning of the word “patronise”.
Sometimes it feels like neural networks are just dumping input into a massive bag of arbitrary linear algebra, stirring it around, and hoping the output is something that you expect it to be…