The magic of Artificial Neural Network — Part 2
This is my second article on unlocking the magic of Artificial Neural Network. In part 1, the space of real-valued continuous function C(Im) is introduced. We have proved that C(Im) is both a Topological Space and a Normed Space. Let’s continue our journey…
Proof of the Universal Approximation Theorem
Let S be the set of all artificial neural network functions:
S = {f ∈ C(Im)|f(x)=∑υ[i]σ(w[i]Tx+b[i]) where v[i],b[i]∈ℝ,w[i]∈ℝm,1≤i≤M }
From the lens of Topology, we find that the Universal Approximation Theorem will be proven once we can show that the Closure of S is dense in C(Im). That is, Closure of S equals the space of real-valued continuous function, Cl(S) = C(Im).
Proof
Suppose Cl(S) is not dense in C(Im)
⇒ Cl(S) ≠ C(Im)
⇒ Cl(S) ⊂ C(Im) ⇒ Cl(S) is a closed linear subspace of C(Im)
⇒ For h∈C(Im)\Cl(S), there exists Continuous Linear functional L on C(Im) exists s.t.
L(w) = 0 ∀w∈Cl(S)
L(h) ≠ 0 [ By Hahn-Banach Theorem ]
⇒ L(h) = ∫ h(x)dμ(x) [ By RMK Representation Theorem ]
⇒ L(σ(wTx+b))=∫ σ(wTx+b)dμ(x)=0 ∀w∈Rn and b∈R [ Since σ(wTx+b) ∈ S ]
⇒ μ=0 [ Since σ is discriminatory ]
⇒ L(h) = ∫ h(x)dμ(x) = 0
⇒ contradict L(h) ≠ 0
I will explain the proof steps one by one in the following sections. First, we will prove that
Cl(S) is a closed linear subspace of C(Im)
But before that, we prove the following theorem for a normed space.
NS1. The closure of linear subspace A of a normed space (V, ∥·∥) is a linear subspace of (V, ∥·∥)
Proof
∀f, g ∈ Cl(A)
=> f, g ∈ A⋃A’ [ By CL4 where A’ is the collection of all limit points of A]
Case 1: f, g ∈ A’
∀r∈ℝand r>0, {p∈V|∥f-p∥<0.5r} and {q∈V|∥g-q∥<0.5r} are open set
=> ∃p, q ∈A s.t. ∥f-p∥ < 0.5r and ∥g-q∥ < 0.5r and p≠f, q≠g [ By LP1 ]
=> ∃p+q ∈A s.t. ∥f+g-(p+q)∥ ≤ ∥f-p∥+∥g-q∥ < r and f+g≠p+q [ By N4 and A is a subspace ]
=> f+g ∈ A’
Case 2: f∈ A, g ∈ A’
∀r∈ℝand r>0, {q∈V|∥g-q∥<r} is open set
=> ∃q ∈S s.t. ∥g-q∥ < r and q≠g [ By LP1 ]
=> ∃f+q ∈A s.t. ∥f+g-(f+q)∥ = ∥g-q∥ < r and f+g≠f+q [ Since A is a subspace ]
=> f+g ∈ A’ [ Same for f∈ A’, g∈ A case ]
Case 3: f, g ∈ A
=> f+g ∈ A [ By A is a subspace ]
=> f+g ∈ Cl(A) in all 3 cases
∀f ∈ Cl(A), c∈ℝand c≠0
=> f ∈ A⋃A’ [ By CL4 where A’ is the collection of all limit points of A ]
Case 1: f ∈ A’
∀r∈ℝand r>0, {p∈V|∥f-p∥<r/|c|} is open set
=> ∃p ∈S s.t. ∥f-p∥ < r/|c| and p≠f [ By LP1 ]
=> ∃c·p ∈A s.t. ∥c·f-c·p∥ =|c|∥f-p∥ < r and c·f≠c·p [ Since A is a subspace ]
=> c·f ∈ A’
Case 2: f∈ A
=> c·f∈ A [ Since A is a subspace ]
=> c·f ∈ Cl(A) in both cases
=> Cl(A) is a closed linear subspace in V [ Since f+g, c·f ∈ Cl(A) ∀f, g ∈Cl(A) ]
Cl(S) is a closed linear subspace in C(lm)
UA1. S is a linear subspace in C(lm).
Proof
∀f, g ∈S
⇒ f(x)=∑ υ[i]σ(w[i]Tx+b[i]) where 1≤i≤M
g(x)=∑ υ’[i]σ(w’[i]Tx+b’[i]) where 1≤i≤N
⇒ (f+g)(x)=∑ υ’’[i]σ(w’’[i]Tx+b’’[i]) where 1≤i≤M+N
⇒ f+g ∈ S
∀f ∈S, c∈ℝ and c≠0
⇒ (c·f)(x) = c·f(x) = ∑ c·υ[i]σ(w[i]Tx+b[i]) where 1≤i≤M
⇒ c·f ∈ S
⇒ S is a linear subspace of C(Im) [ Since f+g, c·f ∈ S ∀f, g ∈S ]
The next step is to prove that Cl(S) is a also a linear subspace in C(lm).
UA2. Cl(S) is a closed linear subspace in C(lm)
Proof
Cl(S) is a closed set in C(lm) [ By CL2 ]
S is a linear subspace in C(lm) [ By UA1 ]
⇒ Cl(S) is a linear subspace in C(lm) [ Since C(lm) is a normed space and by NS1 ]
Quotient Space of a Normed Space
Quotient Space is a powerful tool in linear algebra. It provides a way to simplify the structure of a vector space by “collapsing” certain elements into equivalence classes. This simplification can make the proof more easy.
Quotient Space is a vector space that can be constructed from the linear subspace N of a vector space V by using the equivalence relation ~ on V as follows:
∀x, y∈V, x~y iff x-y ∈ N with linear subspace N
~ is an equivalence relation
Proof
x-x=0 ∈ N ⇒ x~x
x~y ⇒ x-y ∈ N ⇒ -x+y ∈ N ⇒ y~x
x~y and y~z ⇒ x-y ∈ N and y-z ∈ N ⇒ x-z = (x-y)+(y-z) ∈ N ⇒ x~z
Since ~ is an equivalence relation, the partition V/~ over V exists:
V/~ = { [x] | x ∈ V } where [x] = { y ∈ V | y ~ x }
[x] is called the equivalence class which is the element of the partition V/~. When we define the following 2 operations on [x], V/~ will become a vector space which is called Quotient Space and will be denoted by V/N.
Define operations +, * on V/N as follows:
∀[x], [y]∈V/N, ∀c∈R with linear subspace N
[x]+[y] = [x+y], c*[x] = [c·x]
Then, + and * are well defined and V/N with operation + and * is a vector space.
Proof
Suppose [x]=[x’], [y]=[y’] where x, x’, y, y’∈V
For operation +
⇒ x-x’, y-y’∈N ⇒ x-x’+y-y’∈N ⇒ x+y-(x’+y’) ∈N ⇒ x+y ~ x’+y’
⇒ [x+y]=[x’+y’] ⇒ [x]+[y]=[x’]+[y’]
For operation *
⇒ c·(x-x’)∈N ⇒ c·x-c·x’∈N ⇒c·x~c·x’⇒ [c·x]=[c·x’]⇒c*[x]=c*[x’]
∀[x],[y]∈V/N, ∀a,b∈R
associativity:
[x]+([y]+[z])=[x]+[y+z]=[x+(y+z)]=[(x+y)+z]=[x+y]+[z]=([x]+[y])+[z]
commutativity:
[x]+[y]=[x+y]=[y+x]=[y]+[x]
identity element:
[0]+[x]=[0+x]=[x]
inverse elements:
[-x]+[x]=[-x+x]=[0]
compatibility:
a*(b*[x])=a*[b·x]=[a·(b·x)]=[(a·b)·x)]=(a·b)*[x]
1*[x]=[1·x]=[x]
distributivity:
a*([x]+[y])=a*[x+y]=[a·(x+y)]=[a·x+a·y]=[a·x]+[a·y]=a*[x]+a*[y]
(a+b)*[x]=[(a+b)·x]=[a·x+b·x]=[a·x]+[b·x]=a*[x]+b*[x]
Quotient Norm of Quotient Space
When V is the normed space (V, ∥·∥) and N is its linear subspace, we can construct the quotient space V/N by operation + and *. Can we equip V/N with a norm? The answer is Yes. One more condition on N is that it must be a closed set. The norm defined on the quotient space V/N is called Quotient Norm:
Define ∥·∥’: V/N → ℝ, for any [x]∈V/N with closed linear subspace N
∥[x]∥’ = inf { ∥x-z∥ | z∈N }
∥·∥’ is a norm satisfying the following conditions:
N1: ∥[x]∥’ ≥ 0
N2: ∥[x]∥’ = 0 iff [x] = [0]
N3: ∥c*[x]∥’ = |c|∥[x]∥’ ∀c∈ℝ
N4: ∥[x]+[y]∥’ ≤ ∥[x]∥’ + ∥[y]∥’
NS2. Let V be a normed space and N⊆V be the closed linear subspace in V.
Then (V/N, ∥·∥’) is a normed space.
Proof
Since ∥·∥ ≥ 0, inf ∥x-z∥ exists by Greatest Lower Bound Axiom.
Suppose [x]=[x’] where x, x’ ∈V
=> x-x’ = n∈N
∥[x]∥’ = inf { ∥x-z∥ | z∈N }
=> ∥[x]∥’ = inf { ∥x’+n-z∥ } = inf { ∥x’-(z-n)∥ } = ∥[x’]∥’ [ Since z-n∈N ]
=> ∥·∥’ is well defined
N1: ∀z∈N ∥x-z∥’ ≥ 0 => ∥[x]∥’ = inf { ∥x-z∥’ | z∈N } ≥ 0
N2:
∥[x]∥’ = 0 iff [x]=[0]
(if)
[x]=[0] => ∥[x]∥’ = ∥[0]∥’ = inf { ∥z∥ | z ∈ N }
=> ∥[x]∥’= 0 [ Since 0 ∈ N and N2 for norm ∥·∥]
(only if)
∥[x]∥’ = inf { ∥x-z∥ | z ∈ N } = 0 ⇒ 0 is the greatest lower bound
=> ∀k>0, ∃z∈N s.t. ∥x-z∥ < 1/k [ Since 1/k>0, 1/k is not a lower bound ]
=> {z∈N| ∥x-z∥<1/k} ≠ ø
Consider indexed family {B[k] = {z∈N| ∥x-z∥<1/k} } where k=1,2,..
Consider S[k]= B[1]∩B[2]∩..B[k]
=> S[k]≠ø [ Since B[k]≠ø and x∈B[k] ∀k>0 ]
=> (z[k]) exists [ By Axiom of Choice, pick z[k]∈ S[k] for k=1,2.. ]
=> ∀B[k], ∀j≥k z[j]∈B[k] [ Since z[j] ∈ S[j]=B[1]∩B[2]∩..B[j] ]
=> z[k] → x and (z[k]) ⊂ N [ By definition of limit and z[k]∈B[k]⊂N ]
=> x ∈ N [ Since N is closed, then by CS5 ]
=> x-0 ∈ N ⇒ x~0 ⇒ [x]=[0]
N3:
∥c*[x]∥’=∥[c·x]∥’=inf { ∥c·x-z∥ | z∈N }
= inf { |c| ∥x-(1/c)·z∥ | z∈N } =|c| inf { ∥x-(1/c)·z∥ | z∈N } [ Since (1/c)·z∈N ]
=|c| ∥[x]∥’
N4:
∥[x]∥’+∥[y]∥’
= inf { ∥x-z1∥ | z1∈N } + inf { ∥y-z2∥ | z2∈N }
= inf { ∥x-z1∥+∥y-z2∥ | z1, z2∈N }
≥ inf { ∥x-z1+y-z2∥ | z1,z2 ∈N } = inf { ∥x+y-(z1+z2)∥ | z1,z2 ∈N } = ∥[x+y]∥’ [Since z1+z2∈N ]
=> ∥[x]∥’+∥[y]∥’ ≥ ∥[x]+[y]∥’ [ Since [x]+[y] = [x+y] ]
Therefore, (V/N, ∥·∥’) is a normed space.
Linear functional in C(Im)
Cybenko uses the Hahn-Banach Theorem from Functional Analysis to prove the existence of the linear functional in C(Im) when Cl(S) is a closed subspace in C(Im). So, what is a linear functional ? A linear functional is a linear map from a vector space to its field of scalars. In our case, we will only consider the field ℝ, the real numbers.
A Linear Functional L on a real vector space V is a function L:V→ℝ, which satisfies the following properties:
∀v, w∈V, c∈ℝ
LF1. L(v+w)=L(v)+L(w)
LF2. c·L(v)=L(c·v)
Hahn-Banach theorem for Normed Space
This theorem, named after Hans Hahn and Stefan Banach, is a central tool in functional analysis. It allows the extension of linear functionals defined on a subspace of the normed space to the whole space:
NS3. Let (V, ∥·∥) be the normed space and N⊆V be the subspace in V. Let l:N→ℝ be a continuous linear functional defined on N. Then, there is a continuous linear functional L:V->ℝsuch that l(v)=L(v) ∀ v∈N and ∥ l ∥ = ∥L∥
At first glance it seems that this theorem can not be applied since it request a linear functional defined on subspace N. We do not have any linear functional yet! Interestingly, by considering the Quotient Space V/N, the existence of linear functional on V can be proved! I will show you the trick in part 3.
Conclusion
In this post, I have sketched the proof of the Universal Approximation Theorem. I first proved that the set of all artificial neural network functions, S , is a linear subspace of C(lm). Then, Cl(S) is also a linear subspace of C(lm). The Quotient Space and Quotient Norm is introduced. In part 3, the Hahn-Banach theorem will be proved and its application will be discussed.
If you want to know more about topology, please visit my series of articles on this topic:
General Topology — Part 6 (Hausdorff Space)
Hausdorff Space and Compact Set
simonkwan-35335.medium.com
Your feedback is highly appreciated and will help me to continue my articles.
Please give this post a clap if you like this post. Thanks!!
Reference:
“https://en.wikipedia.org/wiki/Vector_space”
“https://en.wikipedia.org/wiki/Linear_subspace”
“https://en.wikipedia.org/wiki/Normed_vector_space”
“https://en.wikipedia.org/wiki/Dense_set”
“https://en.wikipedia.org/wiki/Functional_analysis”
“https://en.wikipedia.org/wiki/Quotient_space_(linear_algebra)”
“https://proofwiki.org/wiki/Definition:Quotient_Norm”
“https://en.wikipedia.org/wiki/Hahn_Banach_theorem”
“https://en.wikipedia.org/wiki/Riesz_Markov_Kakutani_representation_theorem”
“https://www.proofwiki.org/wiki/Definition:Linear_Functional”
“https://en.wikipedia.org/wiki/Axiom_of_choice”
“https://en.wikipedia.org/wiki/Infimum_and_supremum”