The magic of Artificial Neural Network — Part 2

8 min readApr 10, 2024

This is my second article on unlocking the magic of Artificial Neural Network. In part 1, the space of real-valued continuous function C(Im) is introduced. We have proved that C(Im) is both a Topological Space and a Normed Space. Let’s continue our journey…

Detail proof of Universal Approximation Theorem — Part 1

This is an article for those who are interested in Artificial Neural Network and want to know why it works. In 1989…

simonkwan-35335.medium.com

Proof of the Universal Approximation Theorem

Let S be the set of all artificial neural network functions. From the lens of Topology, we find that the Universal Approximation Theorem will be proven once we can show that the Closure of S is dense in C(Im). That is, Closure of S equals the space of real-valued continuous function, Cl(S) = C(Im).

Proof
Cl(S) is dense in C(Im)
Suppose Cl(S) is not dense in C(Im)
⇒ Cl(S) ⊂ C(Im) and Cl(S) is a closed linear subspace of C(Im)
⇒ For h∈C(Im)\Cl(S), there exists Continuous Linear functional L on C(Im) exists s.t.
L(w) = 0 ∀w∈Cl(S)
L(h) ≠ 0 [ By Hahn-Banach Theorem ]
⇒ L(h) = ∫ h(x)dμ(x) [ By RMK Representation Theorem ]
⇒ L(σ(wTx+b))=∫ σ(wTx+b)dμ(x)=0 ∀w∈Rn and b∈R [ Since σ(yTx+θ) ∈ S ]
⇒ μ=0 [ Since σ is discriminatory ]
⇒ L(h) = ∫ h(x)dμ(x) = 0
⇒ contradict L(h) ≠ 0

I will explain the proof steps one by one in the following sections. First, we prove that Cl(S) is not just a subset of C(Im), it is indeed a closed linear subspace too.

Cl(S) is a closed linear subspace in C(lm)

UA1. S is a linear subspace in C(lm).
Proof
∀f, g ∈S
⇒ f(x)=∑ υ[i]σ(w[i]Tx+b[i]) where 1≤i≤M
g(x)=∑ υ’[i]σ(w’[i]Tx+b’[i]) where 1≤i≤N
⇒ (f+g)(x)=∑ υ’’[i]σ(w’’[i]Tx+b’’[i]) where 1≤i≤M+N
⇒ f+g ∈ S
∀f ∈S, c∈ℝ and c≠0
⇒ (c·f)(x) = c·f(x) = ∑ c·υ[i]σ(w[i]Tx+b[i]) where 1≤i≤M
⇒ c·f ∈ S
⇒ S is a linear subspace of C(Im) [ Since f+g, c·f ∈ S ∀f, g ∈S ]

The next step is to prove that Cl(S) is a also a linear subspace in C(lm) and the following theorem will be used for the proof.

NS1. The closure of linear subspace S of a normed space (V, ∥·∥) is a linear subspace of (V, ∥·∥)
Proof
∀f, g ∈ Cl(S)
=> f, g ∈ S⋃S’ [ By CL4 where S’ is the collection of all limit points of S ]
Case 1: f, g ∈ S’
∀r∈ℝand r>0, {p∈V|∥f-p∥<0.5r} and {q∈V|∥g-q∥<0.5r} are open set
=> ∃p, q ∈S s.t. ∥f-p∥ < 0.5r and ∥g-q∥ < 0.5r and p≠f, q≠g [ By LP1 ]
=> ∃p+q ∈S s.t. ∥f+g-(p+q)∥ ≤ ∥f-p∥+∥g-q∥ < r and f+g≠p+q [ By N4 and S is a subspace ]
=> f+g ∈ S’
Case 2: f∈ S, g ∈ S’
∀r∈ℝand r>0, {q∈V|∥g-q∥<r} is open set
=> ∃q ∈S s.t. ∥g-q∥ < r and q≠g [ By LP1 ]
=> ∃f+q ∈S s.t. ∥f+g-(f+q)∥ = ∥g-q∥ < r and f+g≠f+q [ Since S is a subspace ]
=> f+g ∈ S’ [ Same for f∈ S’, g∈ S case ]
Case 3: f, g ∈ S
=> f+g ∈ S [ By S is a subspace ]
=> f+g ∈ Cl(S) in all 3 cases
∀f ∈ Cl(S), c∈ℝand c≠0
=> f ∈ S⋃S’ [ By CL4 where S’ is the collection of all limit points of S ]
Case 1: f ∈ S’
∀r∈ℝand r>0, {p∈V|∥f-p∥<r/|c|} is open set
=> ∃p ∈S s.t. ∥f-p∥ < r/|c| and p≠f [ By LP1 ]
=> ∃c·p ∈S s.t. ∥c·f-c·p∥ =|c|∥f-p∥ < r and c·f≠c·p [ Since S is a subspace ]
=> c·f ∈ S’
Case 2: f∈ S
=> c·f∈ S [ Since S is a subspace ]
=> c·f ∈ Cl(S) in both cases
=> Cl(S) is a closed linear subspace in X [ Since f+g, c·f ∈ Cl(S) ∀f, g ∈Cl(S) ]

UA2. Cl(S) is a closed linear subspace in C(lm)
Proof
Cl(S) is a closed set in C(lm) [ By CL2 ]

S is a linear subspace in C(lm) [ By UA1 ]
⇒ Cl(S) is a linear subspace in C(lm) [ Since C(lm) is a normed space and by NS1 ]

Quotient Space of a Normed Space

Quotient Space is a powerful tool in linear algebra. It provides a way to simplify the structure of a vector space by “collapsing” certain elements into equivalence classes. This simplification can make the proof more easy.

Quotient Space is a vector space that can be constructed from the linear subspace N of a vector space V by using the equivalence relation ~ on V as follows:

∀x, y∈V, x~y iff x-y ∈ N with linear subspace N
~ is an equivalence relation
Proof
x-x=0 ∈ N ⇒ x~x
x~y ⇒ x-y ∈ N ⇒ -x+y ∈ N ⇒ y~x
x~y and y~z ⇒ x-y ∈ N and y-z ∈ N ⇒ x-z = (x-y)+(y-z) ∈ N ⇒ x~z

Since ~ is an equivalence relation, the partition V/~ over V exists:
V/~ = { [x] | x ∈ V } where [x] = { y ∈ V | y ~ x }
[x] is called the equivalence class which is the element of the partition V/~. When we define the following 2 operations on [x], V/~ will become a vector space which is called Quotient Space and will be denoted by V/N.

Define operations +, * on V/N as follows:
∀[x], [y]∈V/N, ∀c∈R with linear subspace N
[x]+[y] = [x+y], c*[x] = [c·x]
Then, + and * are well defined and V/N with operation + and * is a vector space.
Proof
Suppose [x]=[x’], [y]=[y’] where x, x’, y, y’∈V
For operation +
⇒ x-x’, y-y’∈N ⇒ x-x’+y-y’∈N ⇒ x+y-(x’+y’) ∈N ⇒ x+y ~ x’+y’
⇒ [x+y]=[x’+y’] ⇒ [x]+[y]=[x’]+[y’]
For operation *
⇒ c·(x-x’)∈N ⇒ c·x-c·x’∈N ⇒c·x~c·x’⇒ [c·x]=[c·x’]⇒c*[x]=c*[x’]

∀[x],[y]∈V/N, ∀a,b∈R
associativity:
[x]+([y]+[z])=[x]+[y+z]=[x+(y+z)]=[(x+y)+z]=[x+y]+[z]=([x]+[y])+[z]
commutativity:
[x]+[y]=[x+y]=[y+x]=[y]+[x]
identity element:
[0]+[x]=[0+x]=[x]
inverse elements:
[-x]+[x]=[-x+x]=[0]
compatibility:
a*(b*[x])=a*[b·x]=[a·(b·x)]=[(a·b)·x)]=(a·b)*[x]
1*[x]=[1·x]=[x]
distributivity:
a*([x]+[y])=a*[x+y]=[a·(x+y)]=[a·x+a·y]=[a·x]+[a·y]=a*[x]+a*[y]
(a+b)*[x]=[(a+b)·x]=[a·x+b·x]=[a·x]+[b·x]=a*[x]+b*[x]

Quotient Norm of Quotient Space

When V is the normed space (V, ∥·∥) and N is its linear subspace, we can construct the quotient space V/N by operation + and *. Can we equip V/N with a norm? The answer is Yes. One more condition on N is that it must be a closed set. The norm defined on the quotient space V/N is called Quotient Norm:

Define ∥·∥’: V/N → ℝ, for any [x]∈V/N with closed linear subspace N
∥[x]∥’ = inf { ∥x-z∥ | z∈N }
∥·∥’ is a norm satisfying the following conditions:
N1: ∥[x]∥’ ≥ 0
N2: ∥[x]∥’ = 0 iff [x] = [0]
N3: ∥c*[x]∥’ = |c|∥[x]∥’ ∀c∈ℝ
N4: ∥[x]+[y]∥’ ≤ ∥[x]∥’ + ∥[y]∥’

Proof
Since ∥·∥ ≥ 0, inf ∥x-z∥ exists by Greatest Lower Bound Axiom.
Suppose [x]=[x’] where x, x’ ∈V
=> x-x’ = n∈N
∥[x]∥’ = inf { ∥x-z∥ | z∈N }
=> ∥[x]∥’ = inf { ∥x’+n-z∥ } = inf { ∥x’-(z-n)∥ } = ∥[x’]∥’ [ Since z-n∈N ]
=> ∥·∥’ is well defined

N1: ∀z∈N ∥x-z∥’ ≥ 0 => ∥[x]∥’ = inf { ∥x-z∥’ | z∈N } ≥ 0

N2:
∥[x]∥’ = 0 iff [x]=[0]
(if)
[x]=[0] => ∥[x]∥’ = ∥[0]∥’ = inf { ∥z∥ | z ∈ N }
=> ∥[x]∥’= 0 [ Since 0 ∈ N and N2 for norm ∥·∥]
(only if)
∥[x]∥’ = inf { ∥x-z∥ | z ∈ N } = 0 ⇒ 0 is the greatest lower bound
=> ∀k>0, ∃z∈N s.t. ∥x-z∥ < 1/k [ Since 1/k>0, 1/k is not a lower bound ]
=> {z∈N| ∥x-z∥<1/k} ≠ ø
Consider indexed family {B[k] = {z∈N| ∥x-z∥<1/k} } where k=1,2,..
Consider S[k]= B[1]∩B[2]∩..B[k]
=> S[k]≠ø [ Since B[k]≠ø and x∈B[k] ∀k>0 ]
=> (z[k]) exists [ By Axiom of Choice, pick z[k]∈ S[k] for k=1,2.. ]
=> ∀B[k], ∀j≥k z[j]∈B[k] [ Since z[j] ∈ S[j]=B[1]∩B[2]∩..B[j] ]
=> z[k] → x and (z[k]) ⊂ N [ By definition of limit and z[k]∈B[k]⊂N ]
=> x ∈ N [ Since N is closed, then by CS5 ]
=> x-0 ∈ N ⇒ x~0 ⇒ [x]=[0]

N3:
∥c*[x]∥’=∥[c·x]∥’=inf { ∥c·x-z∥ | z∈N }
= inf { |c| ∥x-(1/c)·z∥ | z∈N } =|c| inf { ∥x-(1/c)·z∥ | z∈N } [ Since (1/c)·z∈N ]
=|c| ∥[x]∥’

N4:
∥[x]∥’+∥[y]∥’
= inf { ∥x-z1∥ | z1∈N } + inf { ∥y-z2∥ | z2∈N }
= inf { ∥x-z1∥+∥y-z2∥ | z1, z2∈N }
≥ inf { ∥x-z1+y-z2∥ | z1,z2 ∈N } = inf { ∥x+y-(z1+z2)∥ | z1,z2 ∈N } = ∥[x+y]∥’ [Since z1+z2∈N ]
=> ∥[x]∥’+∥[y]∥’ ≥ ∥[x]+[y]∥’ [ Since [x]+[y] = [x+y] ]

Therefore, (V/N, ∥·∥’) is a normed space.

NS2. Let V be a normed space and N⊆V be the closed linear subspace in V.
Then (V/N, ∥·∥’) is a normed space.

Relating to the proof of Universal Approximation Theorem, (Cl(S)/C(lm),∥·∥’) is a normed space.

Linear functional in C(Im)

Cybenko uses the Hahn-Banach Theorem from Functional Analysis to prove the existence of the linear functional in C(Im) when Cl(S) is a closed subspace in C(Im). So, what is a linear functional ? A linear functional is a linear map from a vector space to its field of scalars. In our case, we will only consider the field ℝ, the real numbers.

A Linear Functional L on a real vector space V is a function L:V→ℝ, which satisfies the following properties:
∀v, w∈V, c∈ℝ
LF1. L(v+w)=L(v)+L(w)
LF2. c·L(v)=L(c·v)

Hahn-Banach theorem for Normed Space

This theorem, named after Hans Hahn and Stefan Banach, is a central tool in functional analysis. It allows the extension of linear functionals defined on a subspace of the normed space to the whole space:

NS3. Let (V, ∥·∥) be the normed space and N⊆V be the subspace in V. Let l:N→ℝ be a continuous linear functional defined on N. Then, there is a continuous linear functional L:V->ℝsuch that l(v)=L(v) ∀ v∈N and ∥ l ∥ = ∥L∥

At first glance it seems that this theorem can not be applied since it request a linear functional defined on subspace N. We do not have any linear functional yet! Interestingly, by considering the Quotient Space V/N, the existence of linear functional on V can be proved! I will show you the trick in part 3.

Conclusion

In this post, I have sketched the proof of the Universal Approximation Theorem. I first proved that the set of all artificial neural network functions, S , is a linear subspace of C(lm). Then, Cl(S) is also a linear subspace of C(lm). The Quotient Space and Quotient Norm is introduced and finally (Cl(S)/C(lm),∥·∥’) is proved to be a normed space.

In part 3, the Hahn-Banach theorem will be proved and its application will be discussed.

If you want to know more about topology, please visit my series of articles on this topic: