The magic of Artificial Neural Network — Part 3
This is my third article on the magic of the Artificial Neural Network. I have sketched the proof of the Universal Approximation Theorem in part 2:
Let S be the set of all artificial neural network functions:
S = {f ∈ C(Im)|f(x)=∑υ[i]σ(w[i]Tx+b[i]) where v[i],b[i]∈ℝ,w[i]∈ℝm,1≤i≤M }
Suppose Cl(S) is not dense in C(Im)
⇒ Cl(S) ≠ C(Im)
⇒ Cl(S) ⊂ C(Im) ⇒ Cl(S) is a closed linear subspace of C(Im)
⇒ For h∈C(Im)\Cl(S), there exists Continuous Linear functional L on C(Im) such that
L(w) = 0 ∀w∈Cl(S)
L(h) ≠ 0 [ By Hahn-Banach Theorem ]
⇒ L(h) = ∫ h(x)dμ(x) [ By RMK Representation Theorem ]
⇒ L(σ(wTx+b))=∫ σ(wTx+b)dμ(x)=0 ∀w∈Rn and b∈R [ Since σ(wTx+b) ∈ S ]
⇒ μ=0 [ Since σ is discriminatory ]
⇒ L(h) = ∫ h(x)dμ(x) = 0
⇒ contradict L(h) ≠ 0
In this article, we are going to prove that:
For h∈C(Im)\Cl(S), there exists Continuous Linear functional L on C(Im) such that
L(w) = 0 ∀w∈Cl(S)
L(h) ≠ 0
Hahn-Banach theorem for the Vector Space
By using the Hahn-Banach Theorem, the linear functional defined on a vector subspace of some vector space can be extended to the whole space when the linear functional is dominated by a Sublinear Function. So, what is a Sublinear Function ?
A Sublinear Function p on a real vector space V is a function p:V→ℝ, which satisfies the following properties:
∀v, w∈V, c∈ℝ
SF1. p(v+w) ≤ p(v)+p(w)
SF2. c·p(v) = p(c·v) where c > 0
Below is the description of Hahn-Banach Theorem for vector space:
HA. Let V be the vector space and S⊆V be the linear subspace in V. Let l:S→ℝ be a linear functional defined on S. Let p:V→ℝbe a sublinear function defined on V such that l(v) ≤ p(v) ∀v∈S. Then, 1) there is a linear functional L:V→ℝ s.t. 2) L(v)=l(v) ∀ v∈S and 3) L(v) ≤ p(v) ∀v∈V.
Proof
Please read this article if you are interested in the proof.
Continuous Linear functional
For the linear functional on the normed space V, we can define the condition for which it is continuous or bounded. For the detail definition of continuous function, you can read this article. Let’s review the definition of a linear functional again.
A Linear Functional L on a real vector space V is a function L:V→ℝ, which satisfies the following properties:
∀v, w∈V, c∈ℝ
LF1. L(v+w)=L(v)+L(w)
LF2. c·L(v)=L(c·v)
LF3. A linear functional L:V→ℝ is said to be continuous at v0 ∈ V if
∀ε>0, ∃δ>0 such that |L(v)−L(v0)| < ε whenever ∥v−v0∥ < δ where v ∈ V
LF4. A linear functional L:V→ℝ is said to be continuous if it is continuous at v0 ∀v0 ∈ V
LF5. A linear functional L:V→ℝ is said to be bounded if
∃K > 0 such that ∀v ∈ V, |L(v)| ≤ K ∥v∥
LF6. A linear functional L is bounded iff L is continuous
Proof
(only if)
∀v0, v ∈ V, ε>0
=> |L(v-v0)| ≤ K ∥v-v0∥ [ since v-v0 ∈ V ]
=> |L(v)-L(v0)| ≤ K ∥v-v0∥ < ε whenever ∥v-v0∥ < ε/K where ε/K > 0
=> L is continuous at v0 => L is continuous [ Since v0 can be any point ]
(if)
L is continuous => L is continuous at 0
=> ∀ε>0, v ∈ V, |L(v)-L(0)| < ε whenever ∥v-0∥ < δ for some δ > 0
=> |L(v)| < 1 whenever ∥v∥ < δ for some δ > 0 [ since L(0) = 0 and put ε = 1 ]
Let w = (δ/2) (v/∥v∥) with v ≠ 0
∥w∥ = ∥ (δ/2) (v/∥v∥) ∥ = (δ/2)(1/∥v∥)∥v∥ = δ/2 < δ
=> |L(w)| < 1
=> |L((δ/2) (v/∥v∥))| < 1
=> (δ/2)(1/∥v∥) |L(v)| < 1 => |L(v)| < 2/δ ∥v∥
=> |L(v)| ≤ 2/δ ∥v∥ where 2/δ > 0 [ for v = 0, |L(v)| = 2/δ ∥v∥ ]
=> L is bounded
Norm of the Continuous Linear Functional
Similar to a vector, we can measure the size of a continuous linear functional by using the “norm”. To define a “norm”, we first need to check that the set of all continuous linear functional can form a vector space with two operations + and · .
Let (V, ∥·∥) be the normed space and V* be the set of all continuous linear functional defined on V. For any continuous linear functionals L,G: V → R and any c in R, define 2 operations + and · as follows:
(L+G)(v) = L(v) + G(v), (c·L)(v) = c·L(v) ∀ v∈V
The above operation definition is valid, L+G and c·L are still continuous linear functional
Proof
∀L, G ∈ V*
LF1:
∀v, w∈V
(L+G)(v+w)=L(v+w)+G(v+w)
=L(v)+L(w)+G(v)+G(w)=L(v)+G(v)+L(w)+G(w)
=(L+G)(v)+(L+G)(w)
LF2:
∀v∈V, c∈ℝ
c·(L+G)(v)=c·(L(v)+G(v))
=c·L(v)+c·G(v)=L(c·v)+G(c·v)
=(L+G)(c·v)
=> L+G ∈ V* [ Since LF1, LF2 are satisified and L(v)+G(v) is also a continuous function ]
∀L∈V*, a∈ℝ
LF1:
∀v, w∈V
(a·L)(v+w)=a·L(v+w)
=a·(L(v)+L(w))=a·L(v)+a·L(w)
=(a·L)(v)+(a·L)(w)
LF2:
∀v∈V, c∈ℝ
c·(a·L)(v)=c·(a·L(v))=(c·a)·L(v)=a·L(c·v)
=(a·L)(c·v)
=> a·L ∈ V* [ Since LF1, LF2 are satisfied and c·L(v) is also a continuous function ]
Clearly, the conditions of vector space are all satisfied since continuous linear functional is just a continuous function. Therefore, the set V* is a Vector Space.
Now, we can define the norm for this vector space V* as follows:
Define ∥·∥: V* → ℝ, for any L ∈ V*
∥L∥ = sup { |L(v)|/∥v∥ | v ∈ V and v ≠ 0 }
∥·∥ is a norm.
Note that ∥·∥ is a norm for V* and ∥·∥ is a norm for V
Proof
∀L ∈ V*
=> L is continuous => L is bounded [ By LF6 ]
=> ∃ K>0 s.t. |L(v)| ≤ K∥v∥ ∀v∈V
=> ∀v∈V and v ≠ 0, ∃ K>0 s.t. |L(v)|/∥v∥ ≤ K
=> ∥L∥ = sup{ |L(v)|/∥v∥ | v∈V and v ≠ 0 } exists and thus well defined [ By Greatest Lower Bound Property ]
N1: ∥L∥ ≥ 0 [ Since |L(v)|/∥v∥ > 0 ∀v∈V and v ≠ 0 ]
N2: ∥L∥ = 0 iff L(v) = 0 ∀v∈V
N3: ∥c·L∥ = sup { |c·L(v)| | v ∈ V and v ≠ 0 }
= sup { |c|·|L(v)|/∥v∥ | v ∈ V and v ≠ 0 }
= |c| sup { |L(v)|/∥v∥ | v ∈ V and v ≠ 0 }
= |c| ∥L∥
N4: ∥L+G∥ = sup { |(L+G)(v)|/∥v∥ | v ∈ V and v ≠ 0 }
= sup { |(L+G)(v)|/∥v∥ | v ∈ V and v ≠ 0 }
= sup { |L(v)+G(v)|/∥v∥ | v ∈ V and v ≠ 0 }
≤ sup { |L(v)|/∥v∥ + |G(v)|/∥v∥ | v ∈ V and v ≠ 0 }
≤ sup { |L(v)|/∥v∥ | v ∈ V and v ≠ 0 } + sup { |G(v)|/∥v∥ | v ∈ V and v ≠ 0 }=∥L∥+∥G∥
=> ∥·∥ is a norm, thus (V*, ∥·∥) is a normed space
Hahn-Banach theorem for the Normed Space
For the linear functional defined on a linear subspace of the normed space, it can be extended to the whole space when it is continuous. No dominating Sublinear Function is required and the size of the linear functional can be maintained in this case. Below is the description of Hahn-Banach Theorem for the normed space:
NS3. Let (V, ∥·∥) be the normed space and N⊆V be the subspace in V. Let l:N→ℝ be a continuous linear functional defined on N. Then, there is a continuous linear functional L:V->ℝ such that l(v) = L(v) ∀ v∈N and ∥l∥ = ∥L∥
Proof
1)Let’s find the sublinear function p and then apply the Hahn-Banach theorem:
l:N→ℝ is a continuous linear functional on N
=> l is bounded [ By LF6 ]
=> l(v) ≤ |l(v)| ≤ ∥l∥∥v∥ ∀v∈N [ Since ∥l∥ = sup { |l(v)|/∥v∥ | v ∈ V and v ≠ 0 } ]
Let p(v) = ∥l∥∥v∥ ∀ v∈V
∀v, w∈V, c∈ℝ and c > 0
SF1:
p(v+w) = ∥l∥∥v+w∥ ≤ ∥l∥(∥v∥+∥w∥) = ∥l∥∥u∥ + ∥l∥∥v∥ = p(u)+p(v)
SF2:
p(cv) = ∥l∥∥cv∥ = |c|∥l∥∥v∥ = cp(v)
=> p is sublinear function defined on V and l(v) ≤ p(v) ∀v∈N
=> There is a linear functional L:V→ℝ such that L(v)=l(v) ∀ v∈N and L(v) ≤ p(v) ∀v∈V [ By HA ]
2)L is bounded:
L(-v)≤ p(-v) => -L(v) ≤ p(v) => L(v) ≥ -p(v)
=> |L(v)| ≤ p(v) = ∥l∥∥v∥ ∀ v∈V
3)∥L∥ = ∥l∥
{ |L(v)|/∥v∥ | v∈N } ⊆ { |L(v)|/∥v∥ | v∈V }
=> { |l(v)|/∥v∥ | v∈N } ⊆ { |L(v)|/∥v∥ | v∈V } [ By L(v)=l(v) ∀ v∈N ]
=> sup{ |l(v)|/∥v∥ | v∈N } ≤ sup{ |L(v)|/∥v∥ | v∈V }
=> ∥l∥ ≤ ∥L∥
∀v∈V, |L(v)|/∥v∥ ≤ ∥l∥
=> sup{ |L(v)|/∥v∥ | v∈V } ≤ ∥l∥ [ By Greatest Lower Bound Property ]
=> ∥L∥ ≤ ∥l∥
=> ∥l∥ = ∥L∥
A Vector in the Normed Space
For any vector in the normed space, we can always find a continuous linear functional which has nice property on this vector.
NS4. Let (V, ∥·∥) be the normed space. For v∈V and v≠0, there exists a continuous linear functional L:V→ℝ such that L(v) = ∥v∥ and ∥L∥ = 1 [Note that v is fixed]
Proof
Let’s find the linear subspace and the corresponding linear functional.
Let N = { λv | λ ∈ ℝ}
∀ λ, μ ∈ ℝ, λv, μv ∈ N
λv+μv = (λ+μ)v ∈ N, λ(μv) = (λμ)v ∈ N
=> N is a linear subspace spanned by v ∈V
Let l: N→ℝ with
l(λv) = λ∥v∥ where λv ∈ N for some λ ∈ ℝ
∀ λ, μ ∈ ℝ, λv, μv ∈ N
LF1:
l(λv+μv)=l((λ+μ)v) = (λ+μ)∥v∥ = λ∥v∥+μ∥v∥ = l(λv)+l(μv)
LF2:
l(λ(μv))=(λμ)∥v∥ = λ(μ∥v∥) = λl(μv)
=> l is a continuous linear functional [ Since λ∥v∥ is a continuous function of λ, please refer to this article for details ]
=> There is a continuous linear functional L:V→ℝ such that
l(w)=L(w) ∀ w∈N and ∥l∥ = ∥L∥ [ By NS3 ]
=> L(v)=l(v)= ∥v∥ [ Consider when λ=1 ]
∥L∥ = ∥l∥ = sup { |l(λv)|/∥λv∥: λv∈N } = sup { |λ∥v∥|/|λ|∥v∥: λv∈N } = sup {1} = 1
NS5. Let (V, ∥·∥) be the normed space. For v1,v2∈V, v1≠v2, there exists a continuous linear functional L:V→ℝ such that L(v1)≠L(v2) [ Note that v1 and v2 are fixed ]
Proof
v1≠v2 => v1-v2 ≠ 0
=> There exists a continuous linear functional L:V→ℝ s.t. L(v1-v2) = ∥v1-v2∥ [ By NS4 ]
=> L(v1)-L(v2)=L(v1-v2)=∥v1-v2∥ ≠ 0
=> L(v1)≠L(v2)
A Closed Linear Subspace in the Normed Space
Suppose there is a linear subspace in the normed space. What will happen when it is closed ? Remember that in part 2, we have proved the following theorem:
NS2. Let V be a normed space and N⊆V be the closed linear subspace in V.
Then (V/N, ∥·∥’) is a normed space.
We will use this feature to prove this theorem.
NS6. Let (V, ∥·∥) be the normed space, N⊆V be the closed linear subspace in V and v∈V\N. Then, 1) there exists a continuous linear functional L:V→ℝ such that 2) L(w) = 0 ∀ w∈N and 3) L(v) ≠ 0 [ Note that v is fixed ]
Proof
1)
N⊆V be the closed linear subspace in V
=> (V/N, ∥·∥’) is a normed space [ By NS2 ]
=> For [v] ∈ V/N, there exists a continuous linear functional L’:V/N→ℝ such that L’([v])=∥[v]∥’ [ By NS4, note that v is fixed ]
Let L:V→ℝ with
L(v) = L’([v]) where v∈V
L is clearly well defined. Next is to prove that it is continuous and linear.
L’ is continuous
=> L’ is bounded [ By LF6 ]
=> ∃K > 0 s.t. ∀[v] ∈ V/N, |L(v)| = |L’([v])| ≤ K ∥[v]∥’ = K inf { ∥v-z∥ | z∈N }
=> |L(v)|/K ≤ inf { ∥v-z∥ | z∈N } ≤ ∥v-0∥ [ By Greatest Lower Bound Property and z=0 ∈ N, noted that 0 is a vector ]
=> |L(v)| ≤ K ∥v∥
=> L is bounded => L is continuous [ By LF6 ]
∀u, w∈V, c∈ℝ
LF1. L(u+w)=L’([u+w])=L’([u]+[w])=L’([u])+L’([w])=L(u)+L(w)
LF2. c·L(u)=c·L’([u])=L’(c*[u])=L’([c·u])=L(c·u)
=> L is a continuous linear functional
2)
∀w ∈ N, L(w)=L’([w])=L’([0])=L(0)=0 [ L(0) = L(0+0) = L(0) + L(0), L(0)=0 ]
3)
v ∈ V\N, 0∈N => [v] ≠ [0] => L’([v]) ≠ L’([0]) [ By NS5, note that v is fixed ]
=> L(v) ≠ L(0)
=> L(v) ≠ 0
Go back to the proof of the Universal Approximation Theorem:
C(Im) is a Normed Space and Cl(S) ⊂ C(Im) is a closed linear subspace
[ By CL2 Cl(S) is closed ]
⇒ For h∈C(Im)\Cl(S), there exists Continuous Linear functional L on C(Im) such that
L(w) = 0 ∀w∈Cl(S)
L(h) ≠ 0 [ By NS6 ]
A Dense set in the Normed Space
Suppose there is a linear subspace in the normed space. What will happen when it is dense ?
NS7. Let (V, ∥·∥) be the normed space, N⊆V be the linear subspace in V.
N is dense in V iff for every continuous linear functional L:V→ℝ,
L(w) = 0 ∀w∈N => L=0
Proof
(only if)
N is dense in V
=> ∀ v∈V, e > 0, o(v, e)∩N ≠ ø [ o(v, e) is a neighbourhood of v and by DS3 ]
=> ∃v0 ∈ o(v,e)∩N s.t. v0 ∈ N and ∥v0-v∥ < e [ By the definition of o(v, e) ]
=> L(v0) = 0 [ Since v0 ∈ N ]
L is continuous
=> ∃ K > 0 s.t. |L(v)| ≤ K ∥v∥ ∀ v∈V [ By LF6 and then LF5 ]
=> |L(v0-v)| = |L(v0)-L(v)| ≤ K ∥v0-v∥ < Ke [ By LF1 and LF2 and v0-v ∈ N ]
=> |0-L(v)| < Ke
=> |L(v)| < Ke for any e > 0
Suppose |L(v)| > 0
=> ∃ n ∈ N s.t. n|L(v)| > Ke [ By Archimedean Property ]
=> |L(v)| > K(e/n) where e/n > 0 => contradict |L(v)| < Ke for any e > 0
=> |L(v)| = 0 => L=0
(if)
Suppose Cl(N) ⊂ V
Cl(N) is closed [ By CL2 ]
=> For any v ∈ V\N, there exists a continuous linear functional L:V→ℝ s.t. L(w) = 0 ∀ w∈N and L(v) ≠ 0 [ By NS6 ]
=> L(v) ≠ 0 contradict L(v)= 0
Conclusion
In this post, the Hahn-Banach theorem for normed space is proved and its application on the C(lm) having the closed linear subspace Cl(S) has been discussed. We find that under this condition, for h∈C(Im)\Cl(S), there is a continuous linear functional L on C(Im) such that
L(w) = 0 ∀w∈Cl(S)
L(h) ≠ 0
In part 4, we will continue our journey on the proof of the Universal Approximation Theorem and the application of the RMK Representation Theorem to the proof will be discussed.
Your feedback is highly appreciated and will help me to continue my articles.
Please give this post a clap if you like this post. Thanks!!
Reference:
“https://en.wikipedia.org/wiki/Vector_space”
“https://en.wikipedia.org/wiki/Linear_subspace”
“https://en.wikipedia.org/wiki/Functional_analysis”
“https://en.wikipedia.org/wiki/Hahn_Banach_theorem”
“https://www.proofwiki.org/wiki/Definition:Linear_Functional”
“https://en.wikipedia.org/wiki/Infimum_and_supremum”