The magic of Artificial Neural Network — Part 2

Simon Kwan
8 min readApr 10, 2024

--

Photo by FlyD on Unsplash

This is my second article on unlocking the magic of Artificial Neural Network. In part 1, the space of real-valued continuous function C(Im) is introduced. We have proved that C(Im) is both a Topological Space and a Normed Space. Let’s continue our journey…

Proof of the Universal Approximation Theorem

Let S be the set of all artificial neural network functions:

S = {f ∈ C(Im)|f(x)=∑υ[i]σ(w[i]Tx+b[i]) where v[i],b[i]∈ℝ,w[i]∈ℝm,1≤i≤M }

From the lens of Topology, we find that the Universal Approximation Theorem will be proven once we can show that the Closure of S is dense in C(Im). That is, Closure of S equals the space of real-valued continuous function, Cl(S) = C(Im).

Proof
Suppose Cl(S) is not dense in C(Im)
⇒ Cl(S) ≠ C(Im)
⇒ Cl(S) ⊂ C(Im) ⇒ Cl(S) is a closed linear subspace of C(Im)
⇒ For h∈C(Im)\Cl(S), there exists Continuous Linear functional L on C(Im) exists s.t.
L(w) = 0 ∀w∈Cl(S)
L(h) 0 [ By Hahn-Banach Theorem ]
⇒ L(h) = ∫ h(x)dμ(x) [ By RMK Representation Theorem ]
⇒ L(σ(wTx+b))=∫ σ(wTx+b)dμ(x)=0 ∀w∈Rn and b∈R [ Since σ(wTx+b) ∈ S ]
⇒ μ=0 [ Since σ is discriminatory ]
⇒ L(h) = ∫ h(x)dμ(x) = 0
contradict L(h) 0

I will explain the proof steps one by one in the following sections. First, we will prove that

Cl(S) is a closed linear subspace of C(Im)

But before that, we prove the following theorem for a normed space.

NS1. The closure of linear subspace A of a normed space (V, ∥·∥) is a linear subspace of (V, ∥·∥)
Proof
∀f, g ∈ Cl(A)
=> f, g ∈ A⋃A’ [ By CL4 where A’ is the collection of all limit points of A]
Case 1: f, g ∈ A’
∀r∈ℝand r>0, {p∈V|∥f-p∥<0.5r} and {q∈V|∥g-q∥<0.5r} are open set
=> ∃p, q ∈A s.t. ∥f-p∥ < 0.5r and ∥g-q∥ < 0.5r and p≠f, q≠g [ By LP1 ]
=> ∃p+q ∈A s.t. ∥f+g-(p+q)∥ ≤ ∥f-p∥+∥g-q∥ < r and f+g≠p+q [ By N4 and A is a subspace ]
=> f+g ∈ A’
Case 2: f∈ A, g ∈ A’
∀r∈ℝand r>0, {q∈V|∥g-q∥<r} is open set
=> ∃q ∈S s.t. ∥g-q∥ < r and q≠g [ By LP1 ]
=> ∃f+q ∈A s.t. ∥f+g-(f+q)∥ = ∥g-q∥ < r and f+g≠f+q [ Since A is a subspace ]
=> f+g ∈ A’ [ Same for f∈ A’, g∈ A case ]
Case 3: f, g ∈ A
=> f+g ∈ A [ By A is a subspace ]
=> f+g ∈ Cl(A) in all 3 cases
∀f ∈ Cl(A), c∈ℝand c≠0
=> f ∈ A⋃A’ [ By CL4 where A’ is the collection of all limit points of A ]
Case 1: f ∈ A’
∀r∈ℝand r>0, {p∈V|∥f-p∥<r/|c|} is open set
=> ∃p ∈S s.t. ∥f-p∥ < r/|c| and p≠f [ By LP1 ]
=> ∃c·p ∈A s.t. ∥c·f-c·p∥ =|c|∥f-p∥ < r and c·f≠c·p [ Since A is a subspace ]
=> c·f ∈ A’
Case 2: f∈ A
=> c·f∈ A [ Since A is a subspace ]
=> c·f ∈ Cl(A) in both cases
=> Cl(A) is a closed linear subspace in V [ Since f+g, c·f ∈ Cl(A) ∀f, g ∈Cl(A) ]

Cl(S) is a closed linear subspace in C(lm)

UA1. S is a linear subspace in C(lm).
Proof
∀f, g ∈S
⇒ f(x)=∑ υ[i]σ(w[i]Tx+b[i]) where 1≤i≤M
g(x)=∑ υ’[i]σ(w’[i]Tx+b’[i]) where 1≤i≤N
⇒ (f+g)(x)=∑ υ’’[i]σ(w’’[i]Tx+b’’[i]) where 1≤i≤M+N
⇒ f+g ∈ S
∀f ∈S, c∈ℝ and c≠0
⇒ (c·f)(x) = c·f(x) = ∑ c·υ[i]σ(w[i]Tx+b[i]) where 1≤i≤M
⇒ c·f ∈ S
⇒ S is a linear subspace of C(Im) [ Since f+g, c·f ∈ S ∀f, g ∈S ]

The next step is to prove that Cl(S) is a also a linear subspace in C(lm).

UA2. Cl(S) is a closed linear subspace in C(lm)
Proof
Cl(S) is a closed set in C(lm) [ By CL2 ]

S is a linear subspace in C(lm) [ By UA1 ]
⇒ Cl(S) is a linear subspace in C(lm) [ Since C(lm) is a normed space and by NS1 ]

Quotient Space of a Normed Space

Quotient Space is a powerful tool in linear algebra. It provides a way to simplify the structure of a vector space by “collapsing” certain elements into equivalence classes. This simplification can make the proof more easy.

Quotient Space is a vector space that can be constructed from the linear subspace N of a vector space V by using the equivalence relation ~ on V as follows:

∀x, y∈V, x~y iff x-y ∈ N with linear subspace N
~ is an equivalence relation
Proof
x-x=0 ∈ N ⇒ x~x
x~y ⇒ x-y ∈ N ⇒ -x+y ∈ N ⇒ y~x
x~y and y~z ⇒ x-y ∈ N and y-z ∈ N ⇒ x-z = (x-y)+(y-z) ∈ N ⇒ x~z

Since ~ is an equivalence relation, the partition V/~ over V exists:
V/~ = { [x] | x ∈ V } where [x] = { y ∈ V | y ~ x }
[x] is called the equivalence class which is the element of the partition V/~. When we define the following 2 operations on [x], V/~ will become a vector space which is called Quotient Space and will be denoted by V/N.

Define operations +, * on V/N as follows:
∀[x], [y]∈V/N, ∀c∈R with linear subspace N
[x]+[y] = [x+y], c*[x] = [c·x]
Then, + and * are well defined and V/N with operation + and * is a vector space.
Proof
Suppose [x]=[x’], [y]=[y’] where x, x’, y, y’∈V
For operation +
⇒ x-x’, y-y’∈N ⇒ x-x’+y-y’∈N ⇒ x+y-(x’+y’) ∈N ⇒ x+y ~ x’+y’
⇒ [x+y]=[x’+y’] ⇒ [x]+[y]=[x’]+[y’]
For operation *
⇒ c·(x-x’)∈N ⇒ c·x-c·x’∈N ⇒c·x~c·x’⇒ [c·x]=[c·x’]⇒c*[x]=c*[x’]

∀[x],[y]∈V/N, ∀a,b∈R
associativity:
[x]+([y]+[z])=[x]+[y+z]=[x+(y+z)]=[(x+y)+z]=[x+y]+[z]=([x]+[y])+[z]
commutativity:
[x]+[y]=[x+y]=[y+x]=[y]+[x]
identity element:
[0]+[x]=[0+x]=[x]
inverse elements:
[-x]+[x]=[-x+x]=[0]
compatibility:
a*(b*[x])=a*[b·x]=[a·(b·x)]=[(a·b)·x)]=(a·b)*[x]
1*[x]=[1·x]=[x]
distributivity:
a*([x]+[y])=a*[x+y]=[a·(x+y)]=[a·x+a·y]=[a·x]+[a·y]=a*[x]+a*[y]
(a+b)*[x]=[(a+b)·x]=[a·x+b·x]=[a·x]+[b·x]=a*[x]+b*[x]

Quotient Norm of Quotient Space

When V is the normed space (V, ∥·∥) and N is its linear subspace, we can construct the quotient space V/N by operation + and *. Can we equip V/N with a norm? The answer is Yes. One more condition on N is that it must be a closed set. The norm defined on the quotient space V/N is called Quotient Norm:

Define ∥·∥’: V/N → ℝ, for any [x]∈V/N with closed linear subspace N
∥[x]∥’ = inf { ∥x-z∥ | z∈N }
∥·∥’ is a norm satisfying the following conditions:
N1: ∥[x]∥’ ≥ 0
N2: ∥[x]∥’ = 0 iff [x] = [0]
N3: ∥c*[x]∥’ = |c|∥[x]∥’ ∀c∈ℝ
N4: ∥[x]+[y]∥’ ≤ ∥[x]∥’ + ∥[y]∥’

NS2. Let V be a normed space and N⊆V be the closed linear subspace in V.
Then (V/N, ∥·∥’) is a normed space.

Proof
Since ∥·∥ ≥ 0, inf ∥x-z∥ exists by Greatest Lower Bound Axiom.
Suppose [x]=[x’] where x, x’ ∈V
=> x-x’ = n∈N
∥[x]∥’ = inf { ∥x-z∥ | z∈N }
=> ∥[x]∥’ = inf { ∥x’+n-z∥ } = inf { ∥x’-(z-n)∥ } = ∥[x’]∥’ [ Since z-n∈N ]
=> ∥·∥’ is well defined

N1: ∀z∈N ∥x-z∥’ ≥ 0 => ∥[x]∥’ = inf { ∥x-z∥’ | z∈N } ≥ 0

N2:
∥[x]∥’ = 0 iff [x]=[0]
(if)
[x]=[0] => ∥[x]∥’ = ∥[0]∥’ = inf { ∥z∥ | z ∈ N }
=> ∥[x]∥’= 0 [ Since 0 ∈ N and N2 for norm ∥·∥]
(only if)
∥[x]∥’ = inf { ∥x-z∥ | z ∈ N } = 0 ⇒ 0 is the greatest lower bound
=> ∀k>0, ∃z∈N s.t. ∥x-z∥ < 1/k [ Since 1/k>0, 1/k is not a lower bound ]
=> {z∈N| ∥x-z∥<1/k} ≠ ø
Consider indexed family {B[k] = {z∈N| ∥x-z∥<1/k} } where k=1,2,..
Consider S[k]= B[1]∩B[2]∩..B[k]
=> S[k]≠ø [ Since B[k]≠ø and x∈B[k] ∀k>0 ]
=> (z[k]) exists [ By Axiom of Choice, pick z[k]∈ S[k] for k=1,2.. ]
=> ∀B[k], ∀j≥k z[j]∈B[k] [ Since z[j] ∈ S[j]=B[1]∩B[2]∩..B[j] ]
=> z[k] → x and (z[k]) ⊂ N [ By definition of limit and z[k]∈B[k]⊂N ]
=> x ∈ N [ Since N is closed, then by CS5 ]
=> x-0 ∈ N ⇒ x~0 ⇒ [x]=[0]

N3:
∥c*[x]∥’=∥[c·x]∥’=inf { ∥c·x-z∥ | z∈N }
= inf { |c| ∥x-(1/c)·z∥ | z∈N } =|c| inf { ∥x-(1/c)·z∥ | z∈N } [ Since (1/c)·z∈N ]
=|c| ∥[x]∥’

N4:
∥[x]∥’+∥[y]∥’
= inf { ∥x-z1∥ | z1∈N } + inf { ∥y-z2∥ | z2∈N }
= inf { ∥x-z1∥+∥y-z2∥ | z1, z2∈N }
≥ inf { ∥x-z1+y-z2∥ | z1,z2 ∈N } = inf { ∥x+y-(z1+z2)∥ | z1,z2 ∈N } = ∥[x+y]∥’ [Since z1+z2∈N ]
=> ∥[x]∥’+∥[y]∥’ ≥ ∥[x]+[y]∥’ [ Since [x]+[y] = [x+y] ]

Therefore, (V/N, ∥·∥’) is a normed space.

Linear functional in C(Im)

Cybenko uses the Hahn-Banach Theorem from Functional Analysis to prove the existence of the linear functional in C(Im) when Cl(S) is a closed subspace in C(Im). So, what is a linear functional ? A linear functional is a linear map from a vector space to its field of scalars. In our case, we will only consider the field ℝ, the real numbers.

A Linear Functional L on a real vector space V is a function L:V→ℝ, which satisfies the following properties:
∀v, w∈V, c∈ℝ
LF1. L(v+w)=L(v)+L(w)
LF2. c·L(v)=L(c·v)

Hahn-Banach theorem for Normed Space

This theorem, named after Hans Hahn and Stefan Banach, is a central tool in functional analysis. It allows the extension of linear functionals defined on a subspace of the normed space to the whole space:

NS3. Let (V, ∥·∥) be the normed space and N⊆V be the subspace in V. Let l:N→ℝ be a continuous linear functional defined on N. Then, there is a continuous linear functional L:V->ℝsuch that l(v)=L(v) ∀ v∈N and ∥ l ∥ = ∥L∥

At first glance it seems that this theorem can not be applied since it request a linear functional defined on subspace N. We do not have any linear functional yet! Interestingly, by considering the Quotient Space V/N, the existence of linear functional on V can be proved! I will show you the trick in part 3.

Conclusion

In this post, I have sketched the proof of the Universal Approximation Theorem. I first proved that the set of all artificial neural network functions, S , is a linear subspace of C(lm). Then, Cl(S) is also a linear subspace of C(lm). The Quotient Space and Quotient Norm is introduced. In part 3, the Hahn-Banach theorem will be proved and its application will be discussed.

If you want to know more about topology, please visit my series of articles on this topic:

Your feedback is highly appreciated and will help me to continue my articles.

Please give this post a clap if you like this post. Thanks!!

--

--

Simon Kwan
Simon Kwan

Written by Simon Kwan

"The fear of the LORD is the beginning of wisdom" ✝️Seasoned Java Programmer. MSc Comp Sc, BSc Math (1st Hon), BSc Mech Eng

No responses yet