The magic of Artificial Neural Network — Part 4
In part 2, I sketched the proof of the Universal Approximation Theorem:
Let S be the set of all artificial neural network functions.
S = {f ∈ C(Im) | f(x)=∑v[i]σ(w[i]^T x+b[i]) where v[i],b[i]∈ℝ, w[i]∈ℝ^m, 1≤i≤M }
Suppose S is not dense in C(Im)
⇒ Cl(S) ≠ C(Im)
⇒ Cl(S) ⊊ C(Im) ⇒ Cl(S) is a proper closed linear subspace of C(Im)
⇒ For h∈C(Im)\Cl(S), there exists a continuous linear functional L on C(Im) such that
L(w) = 0 ∀w∈Cl(S)
L(h) ≠ 0 [ By Hahn-Banach Theorem ]
⇒ L(h) = ∫ h(x)dμ(x) [ By RMK Representation Theorem ]
⇒ L(σ(w^T x+b)) = ∫ σ(w^T x+b)dμ(x) = 0 ∀w∈ℝ^m and b∈ℝ [ Since σ(w^T x+b) ∈ S ]
⇒ μ=0 [ Since σ is discriminatory ]
⇒ L(h) = ∫ h(x)dμ(x) = 0
⇒ contradiction, since L(h) ≠ 0
In this article, we are going to prove that:
L(h) = ∫ h(x)dμ(x)
L(σ(w^T x+b)) = ∫ σ(w^T x+b)dμ(x) = 0 ∀w∈ℝ^m and b∈ℝ
Lebesgue Integral and Riemann Integral
By using the Riesz Markov Kakutani (RMK) representation theorem, this Continuous Linear Functional L on C(Im) can be represented as the Lebesgue Integral:
L(f) = ∫_Im f(x)dμ(x) ∀f ∈ C(Im)
Note that I use ∫_Im to denote Lebesgue integration over Im, the domain of f(x). So, what is the Lebesgue Integral? The Lebesgue Integral is a generalization of the Riemann Integral:
R(f) = ∫ f(x)dx where x∈[0, 1]
For the Riemann integral, the domain is partitioned into intervals. The integral is approximated by the sum of the areas of bars whose heights meet the graph of the function.
For the Lebesgue integral, the range is partitioned into intervals by choosing a finite number of target values. The integral is approximated by the sum of the areas of bars whose heights equal these values but stay below the function. Note that several bars may share the same height while sitting over different parts of the domain.
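To make the two constructions concrete, here is a small numerical sketch (the integrand x² and all grid sizes are my own illustrative choices, not part of the proof). The Riemann sum partitions the domain; the Lebesgue sum, in its "layer-cake" form, partitions the range and weighs each level by the measure of the set where the function exceeds it:

```python
import numpy as np

def f(x):
    return x ** 2  # illustrative integrand on [0, 1]; its exact integral is 1/3

n = 1000  # number of bars / levels

# Riemann: partition the DOMAIN [0, 1] into n intervals and
# sum height * width, with heights read off the graph (midpoint rule).
xs = np.linspace(0.0, 1.0, n, endpoint=False) + 0.5 / n
riemann = f(xs).sum() / n

# Lebesgue (layer-cake form): partition the RANGE [0, 1] into n levels and
# sum level-spacing * (measure of the set where f exceeds that level).
levels = np.linspace(0.0, 1.0, n, endpoint=False)
grid = np.linspace(0.0, 1.0, 10_000)  # fine grid used to estimate measures
lebesgue = sum((f(grid) > t).mean() for t in levels) / n

print(riemann, lebesgue)  # both close to 1/3
```

For a well-behaved function like this, both sums converge to the same value; the difference only shows up for functions like the one in the next example.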
The Lebesgue Integral is more general than the Riemann Integral since the domain of the function can be a more general space, not only a Euclidean space. The Lebesgue Integral is also more powerful: consider the function q: [0,1] → {0, 1} below, which has no Riemann Integral:
q(x) = 1 if x is a rational number
q(x) = 0 otherwise
upper Riemann sum = inf(Σ sup(q(xi*)) ∆x) = 1*(1–0) = 1
lower Riemann sum = sup(Σ inf(q(xi*)) ∆x) = 0*(1–0) = 0
⇒ upper Riemann sum ≠ lower Riemann sum ⇒ q is not Riemann integrable
Note that since the rational numbers are dense in ℝ, there is always a rational number between any two real numbers, so sup(q(xi*)) = 1. The irrational numbers are also dense in ℝ, so inf(q(xi*)) = 0.
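The density argument above can be sketched in code (my own illustration: since rationality cannot be detected from a floating-point value, the code passes it explicitly, using the left endpoint i/n as the rational witness and an irrational offset as the irrational witness in each subinterval):

```python
from fractions import Fraction
import math

def q(x, x_is_rational):
    """Dirichlet function: 1 on the rationals, 0 on the irrationals.
    (Rationality is not detectable from a float, so it is passed explicitly.)"""
    return 1 if x_is_rational else 0

def upper_lower_sums(n):
    """Upper and lower Riemann sums of q on [0, 1] with n equal subintervals."""
    upper = lower = 0.0
    for i in range(n):
        a = Fraction(i, n)
        width = 1.0 / n
        # Every subinterval contains a rational point (e.g. its left endpoint a),
        # so sup q = 1 there ...
        upper += q(a, x_is_rational=True) * width
        # ... and an irrational point (a plus a small irrational offset),
        # so inf q = 0 there.
        lower += q(float(a) + math.sqrt(2) / (10 * n), x_is_rational=False) * width
    return upper, lower

for n in (10, 100, 1000):
    print(n, upper_lower_sums(n))
# Upper sums stay ~1 and lower sums stay 0 for every n: they never meet,
# so q has no Riemann integral (while its Lebesgue integral is simply 0,
# since the rationals have Lebesgue measure zero).
```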
In the Riemann Integral, “dx” is an infinitesimal change in x, whereas in the Lebesgue Integral, μ is a “measure” of a “measurable set” and f(x) is a “measurable function”. So what do “measure”, “measurable set” and “measurable function” mean? For the details, please refer to this article.
In part 3, I showed that Cl(S) is a closed linear subspace of C(Im) and that for every continuous function h∈C(Im)\Cl(S), there exists a corresponding continuous linear functional L on C(Im) with L(h) ≠ 0 and L(w) = 0 ∀w∈Cl(S). With the help of the RMK representation theorem, we will now show that the Lebesgue Integral of every neural network function in S is zero.
RMK representation theorem (Signed version)
Let X be a locally compact Hausdorff space
Let C(X) be the space of continuous functions on X
Let Cc(X) be the space of continuous functions on X with compact support
Let L be a continuous linear functional on Cc(X)
Then there exists a unique signed regular Borel measure μ on X such that
L(f) = ∫_X f(x) dμ(x) ∀f ∈ Cc(X)
It can be proved that
1) Im, the m-dimensional unit cube, is a compact (hence locally compact) Hausdorff space and
2) Cc(Im) = C(Im) [ Since Im is compact, every continuous function on Im automatically has compact support ]
So, we can now apply the RMK representation theorem and get the following result:
L(f) = ∫_Im f(x)dμ(x) ∀f ∈ C(Im)
∀w∈ℝ^m and b∈ℝ,
σ(w^T x+b) ∈ S
⇒ σ(w^T x+b) ∈ Cl(S) [ By CL4 ]
⇒ L(σ(w^T x+b)) = 0
⇒ L(σ(w^T x+b)) = ∫_Im σ(w^T x+b)dμ(x) = 0 ∀w∈ℝ^m and b∈ℝ [ By the RMK representation of L ]
Sigmoidal function σ is discriminatory
Our next step is to show that any continuous sigmoidal function σ is discriminatory.
Let M(Im) denote the space of finite, signed regular Borel measures on Im, and let μ be a member of M(Im). A regular Borel measure is one particular kind of measure; the details are not important here, so I will not describe them.
σ is sigmoidal if:
σ(t) → 1 as t → +∞
σ(t) → 0 as t → -∞
Example of a sigmoidal function: σ(t) = 1/(1+e^(-t))
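A quick sketch of this logistic example and its two limits (the sample points are arbitrary illustrative choices):

```python
import math

def sigmoid(t):
    # The logistic function sigma(t) = 1 / (1 + e^(-t)): bounded, measurable,
    # with sigma(t) -> 1 as t -> +inf and sigma(t) -> 0 as t -> -inf.
    return 1.0 / (1.0 + math.exp(-t))

for t in (-30, -5, 0, 5, 30):
    print(t, sigmoid(t))  # climbs from ~0 through 0.5 at t = 0 up to ~1
```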
σ is discriminatory if, for a measure μ ∈ M(Im),
∫_Im σ(y^T x + θ)dμ(x) = 0 ∀y∈ℝ^m and θ∈ℝ ⇒ μ = 0
Any bounded, measurable sigmoidal function σ is discriminatory.
Proof
Define σλ(x) = σ(λ(y^T x+θ)+φ) where y∈ℝ^m, φ,θ∈ℝ and x∈Im:
σλ(x) → 1 for y^T x + θ > 0 as λ → +∞
σλ(x) → 0 for y^T x + θ < 0 as λ → +∞
σλ(x) = σ(φ) for y^T x + θ = 0 ∀ λ
Define γ(x):
γ(x) = 1 for y^T x + θ > 0
γ(x) = 0 for y^T x + θ < 0
γ(x) = σ(φ) for y^T x + θ = 0
Note that (σλ) is a family of measurable functions indexed by λ, and γ is its pointwise limit: lim_λ→∞ σλ(x)=γ(x) ∀x∈Im
Also, |σλ(x)| ≤ sup_t |σ(t)| < ∞ ∀x∈Im and ∀λ [ Since σ is bounded ]
⇒ lim_λ→∞ ∫_Im σλ(x)dμ(x) = ∫_Im γ(x)dμ(x) [ By Lebesgue Bounded Convergence Theorem ]
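The pointwise convergence of σλ to γ can be sketched numerically (the parameters y, θ, φ and the three test points are arbitrary illustrative choices, with m = 2):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Arbitrary illustrative parameters: y, theta define the plane, phi shifts sigma.
y, theta, phi = np.array([1.0, -2.0]), 0.5, 0.3

def sigma_lambda(x, lam):
    # sigma_lambda(x) = sigma(lambda * (y^T x + theta) + phi)
    return sigmoid(lam * (y @ x + theta) + phi)

x_above = np.array([1.0, 0.2])  # y^T x + theta = 1.1 > 0
x_below = np.array([0.0, 1.0])  # y^T x + theta = -1.5 < 0
x_on    = np.array([0.5, 0.5])  # y^T x + theta = 0

for lam in (1.0, 10.0, 100.0):
    print(lam, sigma_lambda(x_above, lam),
          sigma_lambda(x_below, lam), sigma_lambda(x_on, lam))
# As lambda grows: above the plane -> 1, below the plane -> 0,
# and on the plane the value stays at sigma(phi) for every lambda.
```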
Let Hy,θ={x|y^T x + θ > 0}, Πy,θ={x|y^T x + θ = 0} and Gy,θ={x|y^T x + θ < 0}. That is, Hy,θ is the half space above the plane Πy,θ and Gy,θ is the half space below the plane Πy,θ. Thus, Im = Hy,θ ∪ Πy,θ ∪ Gy,θ and Hy,θ∩Πy,θ=∅, Gy,θ∩Πy,θ=∅ and Hy,θ∩Gy,θ=∅. Thus,
γ(x) = 1 ∀x ∈ Hy,θ
γ(x) = 0 ∀x ∈ Gy,θ
γ(x) = σ(φ) ∀x ∈ Πy,θ
Suppose that for a measure μ ∈ M(Im), ∫_Im σ(y’^T x + θ’)dμ(x) = 0 ∀y’∈ℝ^m and θ’∈ℝ. Then
∫_Im σλ(x)dμ(x)
= ∫_Im σ(λ(y^T x+θ)+φ) dμ(x)
= ∫_Im σ((λy)^T x+(λθ+φ)) dμ(x) [ Put y’=λy, θ’=λθ+φ ]
= 0
⇒ lim_λ→∞ ∫_Im σλ(x)dμ(x) = 0
⇒ ∫_Im γ(x)dμ(x) = 0 [ Since lim_λ→∞ ∫_Im σλ(x)dμ(x) = ∫_Im γ(x)dμ(x) ]
⇒ ∫_Im γ(x)dμ(x) = μ(Hy,θ) + σ(φ)μ(Πy,θ) = 0 [ By LB1, LB2 and definition of γ ]
⇒ μ(Hy,θ) + σ(φ)μ(Πy,θ) = 0 ∀y∈ℝ^n and φ,θ∈ℝ
⇒ μ(Hy,θ) = 0 [ Let φ → -∞, so that σ(φ) → 0 ]
⇒ μ(Πy,θ) = 0 [ Then σ(φ)μ(Πy,θ) = 0 ∀φ; let φ → +∞, so that σ(φ) → 1 ]
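As a sanity check on the limit ∫σλ dμ → μ(Hy,θ) + σ(φ)μ(Πy,θ), here is a numerical sketch with μ taken to be Lebesgue measure on [0, 1] in one dimension (all parameter values are my own illustrative choices; this is not part of the proof, since a general μ ∈ M(Im) need not be Lebesgue measure):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Illustrative 1-D setting (m = 1), with mu = Lebesgue measure on [0, 1]:
# y = 1, theta = -0.5, so H = {x : x > 0.5} has mu(H) = 0.5,
# and the "plane" Pi = {0.5} is a single point with mu(Pi) = 0.
y, theta, phi = 1.0, -0.5, 0.3

N = 100_000
xs = (np.arange(N) + 0.5) / N  # midpoints of N subintervals of [0, 1]

for lam in (1.0, 10.0, 100.0):
    # Midpoint-rule approximation of the integral of sigma_lambda over [0, 1]
    integral = sigmoid(lam * (y * xs + theta) + phi).mean()
    print(lam, integral)
# As lambda grows, the integrals approach mu(H) + sigma(phi) * mu(Pi) = 0.5
```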
Conclusion
At this point, we have proved that μ(Hy,θ) = μ(Πy,θ) = 0. That is, μ vanishes on every plane and its upper half space. However, “μ = 0 on every measurable set in Im” has not been proven yet; we continue the proof in the next article.
Your feedback is highly appreciated and will help me to continue my articles.