The magic of Artificial Neural Network— Part 1

8 min readAug 11, 2020

This is an article for those who are interested in Artificial Neural Network and want to know why it works. In 1989, George Cybenko has published a paper to explain why it works and proved the following theorem, Universal Approximation Theorem:

Universal approximation theorem from wikipedia

This theorem states that for any given continuous function over an interval of [0, 1], it is guaranteed that there exists a neural network that can approximate it within the given accuracy. This theorem does not tell you how to find the neural network, but it tells you that you can find it anyway. It is very cool, isn’t it?

I will describe the proof from George Cybenko using abstract mathematics (Topology and Functional Analysis) instead of a visual proof. After reading this article, you will get a feeling of how the abstract mathematics is used in solving real world problem.

C(Im) is a Vector Space

First, I want to talk about the properties of C(Im) that will be used in the proof. C(Im) denotes the space of real-valued continuous function on Im. Actually, it is a Vector Space (also called Function Space) over field R in which every function f is represented as a point. The operations are defined point-wise, that is, for any f,g : Im → R, and any c in R :

(f+g)(x) = f(x) + g(x), (c·f)(x) = c·f(x)

The above operation definition is valid, f+g and c·f are still continuous function.

Proof- When f and g are continuous at a in Im, we have:

∀ ε > 0, ∃ δ1 > 0 s.t. ∀ x [|x-a|<δ1 ⇒ |f(x)-f(a)| < ε/2] and
∃ δ2 > 0 s.t. ∀ x [|x-a|<δ2 ⇒ |g(x)-g(a)| < ε/2]
⇒
∃ δ = min(δ1, δ2) > 0 s.t. ∀ x[|x-a|<δ ⇒
|(f+g)(x)-(f+g)(a)|= |f(x)-f(a)+g(x)-g(a)|
< |f(x)-f(a)| + |g(x)-g(a)|
< ε/2 + ε/2 = ε]
⇒
f+g is continuous at a

∀ ε > 0, ∃ δ > 0 s.t. ∀ x [|x-a|<δ ⇒ |f(x)-f(a)| < ε/|c|]
⇒
∃ δ > 0 s.t. ∀ x[|x-a|<δ ⇒
|(c·f)(x)-(c·f)(a)|= |c·f(x)-c·f(a)|= |c|·|f(x)-f(a)|
< |c|·ε/|c| = ε]
⇒
c·f is continuous at a

Now, to prove that C(Im) is a Vector Space, we have to prove that the following properties are satisfied.

Proof- Let f, g and h be arbitrary functions in C(Im), a and b scalars in R associativity: f + (g + h) = (f + g) + h
commutativity: f + g = g + f
identity element: 0(x) = 0, (f + 0)(x) = f(x) + 0(x) = f(x) + 0 = f(x)
inverse elements: (−f)(x) = −f(x) such that f+ (−f) = f−f = 0
compatibility: a(b·f) = (ab)f , 1·f = f
distributivity: a·(f + g) = a·f + a·g, (a + b)f = a·f + b·f

C(Im) is a Normed Space

C(Im) is a Vector Space. Furthermore, it is a Normed Space. A Normed Space is an ordered pair (V, ∥·∥) where V is a vector space over the real or complex numbers, on which a norm ∥·∥ is defined. A norm is the intuitive notion of “length” in the real world. A norm is a real-valued function defined on the vector space ∥·∥: V → ℝ, for any x∈V and has the following properties:

N1: ∥x∥ ≥ 0
N2: ∥x∥ = 0 ⇔ x = 0
N3: ∥αx∥ = |α|∥x∥
N4: ∥x+y∥ ≤ ∥x∥ + ∥y∥

Define ∥·∥: C(Im) → ℝ, for any f ∈ C(Im)
∥f∥ = sup{ |f(x)| : x ∈ Im}
Let’s prove that ∥·∥ is a norm.

Proof- Let f, g ∈ C(Im)
N1: ∥f∥ = sup{ |f(x)| : x ∈ Im} ≥ 0
N2: ∥f∥ = 0 ⇔ sup{ |f(x)| : x ∈ Im}= 0 ⇔ f(x) = 0 ∀x ∈ Im ⇔ f = 0
N3: ∥αf∥ = sup{ |αf(x)| : x ∈ Im} = |α| sup{ |f(x)| : x ∈ Im} = |α|∥f∥
N4:∥f∥ + ∥g∥ = sup{|f(x)| : x ∈ Im}+ sup{|g(x)| : x ∈ Im}≥ sup{|f(x)|+|g(x)| : x ∈ Im} ≥ sup{|f(x)+g(x)| : x ∈ Im} = ∥f+g∥
⇒
∥·∥ is a norm

C(Im) is a Metric Space and Topological Space

When a vector space is a Normed Space, it will be a Metric Space and Topological Space.

Normed Vector Space, Metric Space and Topological Space (from Wikipedia)

A Metric Space is an ordered pair (M, d) where M is a set on which a metric d is defined. A metric is the intuitive notion of “distance” in the real world. A metric is a non-negative function defined on the set d: M x M → [0, ∞), for any f, g ∈ M and has the following properties:

M1: d(f, g) = 0 ⇔ f = g
M2: d(g, f) = d(f, g)
M3: d(f, g)+d(g, h)≥d(f, h)

Define d(f, g): M x M → [0, ∞), for any f, g ∈ M and M is a Normed Space
d(f, g) = ∥f - g∥
Let’s prove that d(·,·) is a metric.

Proof- Let f, g, h ∈ M
M1: d(f, g) = 0 ⇔ ||f-g|| = 0 ⇔ f-g = 0 ⇔ f = g
M2: d(g, f) = ∥g-f∥= ∥(-1)(f-g)∥ = |-1|∥f-g∥ = d(f, g)
M3: d(f, g)+d(g, h) = ∥f-g∥ + ∥g-h∥ ≥ ∥f-h∥ = d(f, h)
⇒
d(·,·) is a metric

Since C(Im) is a Normed Space, it is also a Metric Space.

A Topological Space is an ordered pair (X, τ), where X is a set and τ is a collection of subsets ui of X, called Open Set, satisfying the following axioms:
T1: ø, X ∈ τ
T2: any (finite or infinite) union of sets in τ is itself in τ, i.e. ∪ui ∈ τ ∀ ui ∈ τ
T3: any finite intersection of sets in τ is itself in τ, i.e. ∩ui ∈ τ ∀ ui ∈ τ

To understand why a Metric Space (M, d) is also a Topological Space, we need a tool called Open Ball:

Define Open Ball B(x, r), for any x ∈ M, r > 0
B(x, r) = {p|p ∈ M, d(x, p) < r}
Define Open Set o and collection τ
τ ={o|o ⊆ M, ∀x∈o ∃r>0 (B(x,r) ⊆ o)}

Let’s prove that (M, d) is a Topological Space

Proof-
T1:
ø contains no point ⇒ ø ∈ τ
∀x∈M ∃r>0 B(x,r)⊆M⇒ M ∈ τ

T2:
o[i] ∈ τ
⇒ ∪o[i] ⊆ M and ∪o[i] = {x|∃o∈τ ∃r>0 (x∈o and B(x,r) ⊆o ⊆ ∪o[i])}
⇒ ∪o[i] ⊆ M and ∪o[i] = {x|∃r>0 (B(x,r) ⊆ ∪o[i])}
⇒ ∪o[i] ∈ τ

T3:
o1, o2 ∈ τ
⇒ o1 ∩ o2 ⊆ M and o1 ∩ o2 = {x|∃r1>0 ∃r2>0 (x∈o1 and B(x,r1) ⊆o1 and x∈o2 and B(x,r2) ⊆o2)}
⇒ o1 ∩ o2 ⊆ M and o1 ∩ o2 = {x|∃min(r1,r2)>0 (x∈o1 ∩ 02 and B(x,min(r1,r2)) ⊆ B(x,r1) ⊆ o1 and B(x,min(r1,r2)) ⊆ B(x,r2) ⊆ o2)}
⇒ o1 ∩ o2 ⊆ M and o1 ∩ o2 = {x|∃min(ru)>0 (B(x,min(r1,r2)) ⊆ o1 ∩ o2)}
⇒ o1 ∩ o2 ∈ τ

∴(M, d) is a Topological Space

Functions of the form F(x) is dense in C(Im) ?

Return to the Universal Approximation Theorem, it says that “functions of the form F(x) is Dense in C(Im)”. Let S be the set of functions of the form F(x). Clearly, set S is a subset of C(Im). Dense is a topological feature. It means that for any point x in C(Im), then every Neighbourhood N of x contains a point from S. A Neighbourhood N of x is a subset of C(Im) that includes an Open Set u containing x.

x ∈ u ⊆ N

Dense set S, Neighbourhood N and Open Set u containing point x

In Metric Space, it can be proved that any Open Ball can be an Open Set and neighbourhood at the same time [See Appendix]:

x∈B(x, r)=u=N

We have proved that C(Im) is a Topological Space, so we can now use the Topology Theory to help us to prove the theorem. Now, we can express the Universal Approximation Theorem in the language of Topology Theory:

S is Dense in C(Im)
⇔ For any f ∈ C(Im), N ∩ S ≠ ∅ ∀ N where N is a neighbourhood of f
⇒ For any f ∈ C(Im), B(f, ε) ∩ S ≠ ∅ ∀ε > 0
⇒ For any f ∈ C(Im), ∀ε > 0, ∃ F ∈ S s.t. d(f, F) < ε
⇒ For any f ∈ C(Im), ∀ε > 0, ∃ F ∈ S s.t. ∥f-F∥ < ε
⇒ For any f ∈ C(Im), ∀ε > 0, ∃ F ∈ S s.t. sup{|F(x)-f(x)||∀ x ∈ Im} < ε
⇒ For any function f ∈ C(Im), ∀ε > 0, ∃ F ∈ S s.t. |F(x)-f(x)| < ε ∀ x ∈ Im

So, once we have proved that S is dense in C(Im), the Universal Approximation Theorem is finally proved. To prove this, we use the following Topology theorem and then prove by contradiction:

S is Dense in C(Im) ⇔ Closure(S) = C(Im)
Suppose Closure(S) ≠ C(Im) ⇒ Closure(S) ⊂ C(Im) ⇒ contradiction

I will explain this in details in Part 2.

Conclusion

In this article, the Universal Approximation Theorem is introduced. It is a theorem that can explain why the Artificial Neural Network works. The set of all real-valued continuous function on Im, C(Im), is a normed vector space which is also a topological space. We find that the Universal Approximation Theorem will be proven when we can show that the Closure of S is dense in C(Im). The detail of the proof will be explained in Part 2.

If you want to know more about topology, please visit my series of articles on this topic:

General Topology — Part 1 (Topological Space)

What is General Topology ?

simonkwan-35335.medium.com

General Topology — Part 2

ε-δ Definition of Continuous Function

simonkwan-35335.medium.com

General Topology — Part 3

In this article, I will discuss some essential topological concepts that are closely related: Neighbourhood, Limit…

simonkwan-35335.medium.com

General Topology — Part 4

What is Urysohn’s Lemma ?

simonkwan-35335.medium.com

General Topology — Part 5 (Compact Set)

Extreme Value Theorem

simonkwan-35335.medium.com

General Topology — Part 6 (Hausdorff Space)

Hausdorff Space and Compact Set

simonkwan-35335.medium.com

General Topology — Part 7 (Locally Compact Hausdroff Space)

In the Locally Compact Hausdroff Space, when a compact set is contained by an open set, a “bump” function over this…

simonkwan-35335.medium.com

General Topology — Part 8 (Cheat Sheet)

Definition

simonkwan-35335.medium.com

Your feedback is highly appreciated and will help me to continue my articles.

Please give this post a clap if you like this post. Thanks!!

Appendix

Open Ball B(x, r) is an Open Set and neighbourhood for any point x in Metric Space.
Proof-
∀ p ∈ B(x, r)
⇒ d(p, x)= l < r
Consider Open Ball B(p, t) with t = r-l > 0
∀ y ∈ B(p, t)
⇒ d(y, p) < t
⇒ d(y, x) < d(y, p) + d(p, x) < t + l = r
⇒ y ∈ B(x, r)
⇒ ∃t >0 s.t. B(p, t) ⊂ B(x, r)
⇒ B(x, r) ∈ τ