# Detail proof of Universal Approximation Theorem — Part 1

--

This is an article for those who are interested in Artificial Neural Network and want to know why it works. In 1989, George Cybenko has published a paper to explain why it works and proved the following theorem, Universal Approximation Theorem:

This theorem states that for any given continuous function over an interval of [0, 1], it is guaranteed that there exists a neural network that can approximate it within the given accuracy. This theorem does not tell you how to find the neural network, but it tells you that you can find it anyway. It is very cool, isn’t it?

I will describe the proof from George Cybenko using abstract mathematics (Topology and Functional Analysis) instead of a visual proof. After reading this article, you will get a feeling of how the abstract mathematics is used in solving real world problem.

*C*(*I**m*) is a Vector Space

*C*

*I*

First, I want to talk about the properties of ** C**(

*I**m*) that will be used in the proof.

**(**

*C*

*I**m*) denotes the space of real-valued continuous function on

*I**m*. Actually, it is a Vector Space (also called Function Space) over field R in which every function f is represented as a point. The operations are defined point-wise, that is, for any f,g :

*I**m*→ R, and any

*c*in R :

(f+g)(x) = f(x) + g(x), (c·f)(x) = c·f(x)

The above operation definition is **valid**, f+g and c·f are still continuous function.

**Proof- **When f and g are continuous at a in *I**m*, we have:

∀ ε > 0, ∃ δ1 > 0 s.t. ∀ x [|x-a|<δ1 ⇒ |f(x)-f(a)| < ε/2] and

∃ δ2 > 0 s.t. ∀ x [|x-a|<δ2 ⇒ |g(x)-g(a)| < ε/2]

⇒

∃ δ = min(δ1, δ2) > 0 s.t. ∀ x[|x-a|<δ ⇒

|(f+g)(x)-(f+g)(a)|= |f(x)-f(a)+g(x)-g(a)|

< |f(x)-f(a)| + |g(x)-g(a)|

< ε/2 + ε/2 = ε]

⇒

f+g is continuous at a

∀ ε > 0, ∃ δ > 0 s.t. ∀ x [|x-a|<δ ⇒ |f(x)-f(a)| < ε/|c|]

⇒

∃ δ > 0 s.t. ∀ x[|x-a|<δ ⇒

|(c·f)(x)-(c·f)(a)|= |c·f(x)-c·f(a)|= |c|·|f(x)-f(a)|

< |c|·ε/|c| = ε]

⇒

c·f is continuous at a

Now, to prove that ** C**(

*I**m*) is a Vector Space, we have to prove that the following properties are satisfied.

**Proof- **Let f, g and h be arbitrary functions in ** C**(

*I**m*), a and b scalars in R

*associativity:*f + (g + h) = (f + g) + h

*commutativity*: f + g = g + f

*identity element*: 0(x) = 0, (f + 0)(x) = f(x) + 0(x) = f(x) + 0 = f(x)

*inverse elements*: (−f)(x) = −f(x) such that f+ (−f) = f−f = 0

*compatibility*: a(b·f) = (ab)f , 1·f = f

*distributivity*: a·(f + g) = a·f + a·g, (

*a*+

*b*)f =

*a*·f +

*b*·f

# C(Im) is a Normed Space

** C**(

*I**m*) is a Vector Space. Furthermore, it is a Normed Space. A Normed Space is an ordered pair (

*V,*∥·∥) where

*V*is a vector space over the real or complex numbers, on which a norm ∥·∥ is defined. A norm is the intuitive notion of “length” in the real world. A norm is a real-valued function defined on the vector space ∥·∥:

*V*→ ℝ, for any x∈

*V*and has the following properties:

N1: ∥x∥ ≥ 0

N2: ∥x∥ = 0 ⇔ x = 0

N3: ∥αx∥ = |α|∥x∥

N4: ∥x+y∥ ≤ ∥x∥ + ∥y∥

Define ∥·∥: C(Im) → ℝ, for any f ∈ C(Im)

∥f∥ = sup{ |f(x)| : x ∈ Im}

Let’s prove that ∥·∥ is a norm.

**Proof**- Let f, g ∈ C(Im)

N1: ∥f∥ = sup{ |f(x)| : x ∈ Im} ≥ 0

N2: ∥f∥ = 0 ⇔ sup{ |f(x)| : x ∈ Im}= 0 ⇔ f(x) = 0 ∀x ∈ Im ⇔ f = 0

N3: ∥αf∥ = sup{ |αf(x)| : x ∈ Im} = |α| sup{ |f(x)| : x ∈ Im} = |α|∥f∥

N4:∥f∥ + ∥g∥ = sup{|f(x)| : x ∈ Im}+ sup{|g(x)| : x ∈ Im}≥ sup{|f(x)|+|g(x)| : x ∈ Im} ≥ sup{|f(x)+g(x)| : x ∈ Im} = ∥f+g∥

⇒

∥·∥ is a norm

# C(Im) is a Metric Space and Topological Space

When a vector space is a Normed Space, it will be a Metric Space and Topological Space.

A Metric Space is an ordered pair (*M, d*) where *M *is a set on which a metric *d* is defined. A metric is the intuitive notion of “distance” in the real world. A metric is a non-negative function defined on the set *d*: *M* x *M *→ [0, ∞), for any f, g ∈ *M *and has the following properties:

M1: *d*(f, g) = 0 ⇔ f = g

M2: *d*(g, f) = *d*(f, g)

M3: *d*(f, g)+*d*(g, h)≥*d*(f, h)

Define *d*(f, g): *M* x *M *→ [0, ∞), for any f, g ∈ *M *and *M* is a Normed Space*d*(f, g) = ∥f - g∥

Let’s prove that *d*(·,·) is a metric.

**Proof**- Let f, g, h ∈ *M*

M1: *d*(f, g) = 0 ⇔ ∥f-g∥ = 0 ⇔ f-g = 0 ⇔ f = g

M2: *d*(g, f) = ∥g-f∥= ∥(-1)(f-g)∥ = |-1|∥f-g∥ = *d*(f, g)

M3: *d*(f, g)+*d*(g, h) = ∥f-g∥ + ∥g-h∥ ≥ ∥f-h∥ = *d*(f, h)

⇒*d*(·,·) is a metric

Since ** C**(

*I**m*) is a Normed Space, it is also a Metric Space.

A Topological Space is an ordered pair (X, τ), where X is a set and τ is a collection of subsets ui of X, called **Open Set**, satisfying the following axioms:

T1: ø, X ∈ τ

T2: any (finite or infinite) union of sets in τ is itself in τ, i.e. ∪ui ∈ τ ∀ ui ∈ τ

T3: any *finite* intersection of sets in τ is itself in τ, i.e. ∩ui ∈ τ ∀ ui ∈ τ

To understand why a Metric Space (*M, d*)* *is also a Topological Space, we need a tool called **Open Ball**:

Define **Open Ball** B(x, r), for any x ∈ *M*, r > 0

B(x, r) = {p|p ∈ M, d(x, p) < r}

Define **Open Set** o and collection τ

τ ={o|o ⊆ M, ∀x∈o ∃r>0 (B(x,r) ⊆ o)}

Let’s prove that (*M, d*) is a Topological Space

**Proof**-

T1:

ø contains no point ⇒ ø ∈ τ

∀x∈M ∃r>0 B(x,r)⊆M⇒ M ∈ τ

T2:

o[i] ∈ τ

⇒ ∪o[i] ⊆ M and ∪o[i] = {x|∃o∈τ ∃r>0 (x∈o and B(x,r) ⊆o ⊆ ∪o[i])}

⇒ ∪o[i] ⊆ M and ∪o[i] = {x|∃r>0 (B(x,r) ⊆ ∪o[i])}

⇒ ∪o[i] ∈ τ

T3:

o1, o2 ∈ τ

⇒ o1 ∩ o2 ⊆ M and o1 ∩ o2 = {x|∃r1>0 ∃r2>0 (x∈o1 and B(x,r1) ⊆o1 and x∈o2 and B(x,r2) ⊆o2)}

⇒ o1 ∩ o2 ⊆ M and o1 ∩ o2 = {x|∃min(r1,r2)>0 (x∈o1 ∩ 02 and B(x,min(r1,r2)) ⊆ B(x,r1) ⊆ o1 and B(x,min(r1,r2)) ⊆ B(x,r2) ⊆ o2)}

⇒ o1 ∩ o2 ⊆ M and o1 ∩ o2 = {x|∃min(ru)>0 (B(x,min(r1,r2)) ⊆ o1 ∩ o2)}

⇒ o1 ∩ o2 ∈ τ

∴(M, d) is a Topological Space

# Functions of the form *F*(*x*) is dense in *C*(*I**m*) ?

*C*

*I*

Return to the Universal Approximation Theorem, it says that “functions of the form F(x) is **Dense** in ** C**(

*I**m*)”. Let S be the set of functions of the form

*F*(

*x*). Clearly, set S is a subset of

**(**

*C*

*I**m*).

**Dense**is a topological feature. It means that for any point x in

**(**

*C*

*I**m*)

*,*then every

**Neighbourhood**N of

*x*contains a point from S

*.*A

**Neighbourhood**N of x is a subset of

**(**

*C*

*I**m*) that includes an

**Open Set**u containing x.

x ∈ u ⊆ N

In Metric Space, it can be proved that any **Open Ball** can be an **Open Set** and **neighbourhood** at the same time [See **Appendix**]:

x∈B(x, r)=u=N

We have proved that ** C**(

*I**m*) is a Topological Space, so we can now use the Topology Theory to help us to prove the theorem. Now, we can express the Universal Approximation Theorem in the language of Topology Theory:

S is **Dense** in ** C**(

*I**m*)

⇔ For any f ∈

**(**

*C*

*I**m*), N ∩ S ≠ ∅ ∀ N where N is a

**neighbourhood**of f

⇒ For any f ∈

**(**

*C*

*I**m*), B(f, ε) ∩ S ≠ ∅ ∀ε > 0

⇒ For any f ∈

**(**

*C*

*I**m*), ∀ε > 0, ∃ F ∈ S s.t. d(f, F) < ε

⇒ For any f ∈

**(**

*C*

*I**m*), ∀ε > 0, ∃ F ∈ S s.t. ∥f-F∥ < ε

⇒ For any f ∈

**(**

*C*

*I**m*), ∀ε > 0, ∃ F ∈ S s.t. sup{|F(x)-f(x)||∀ x ∈

*I**m*} < ε

⇒ For any function f ∈

**(**

*C*

*I**m*), ∀ε > 0, ∃ F ∈ S s.t. |F(x)-f(x)| < ε ∀ x ∈

*I**m*

So, once we have proved that S is **dense** in ** C**(

*I**m*), the Universal Approximation Theorem is finally proved. To prove this, we use the following Topology theorem and then prove by contradiction:

S is **Dense** in C(Im) ⇔ **Closure**(S) = C(Im)

Suppose **Closure**(S) ≠ ** C**(

*I**m*) ⇒

**Closure**(S) ⊂

**(**

*C*

*I**m*) ⇒ contradiction

I will explain this in details in Part 2.

# Your feedback is highly appreciated and will help me to continue my articles.

# Please give this post a clap if you like this post. Thanks!!

**Appendix**

**Open Ball** B(x, r) is an **Open Set** and **neighbourhood** for any point x in Metric Space.**Proof-**

∀ p ∈ B(x, r)

⇒ *d*(p, x)= *l* < r

Consider **Open Ball** B(p, t) with t = r-*l* > 0

∀ y ∈ B(p, t)

⇒ *d*(y, p) < t

⇒ *d*(y, x) < *d*(y, p) + *d*(p, x) < t + *l* = r

⇒ y ∈ B(x, r)

⇒ ∃t >0 s.t. B(p, t) ⊂ B(x, r)

⇒ B(x, r) ∈ τ

Clearly,** **B(x, r)⊆B(x, r), so **Open Ball** B(x, r) contains an **Open Set **and** **is thus a **neighbourhood** of point x.

Reference:

Cybenko, G. (1989) “Approximations by superpositions of sigmoidal functions”, Mathematics of Control, Signals, and Systems, 2(4), 303–314

“https://en.wikipedia.org/wiki/Universal_approximation_theorem”

“https://en.wikipedia.org/wiki/Vector_space”

“https://en.wikipedia.org/wiki/Normed_vector_space”

“https://en.wikipedia.org/wiki/Metric_space”

“https://en.wikipedia.org/wiki/Topological_space”

“https://en.wikipedia.org/wiki/Neighbourhood”

“https://en.wikipedia.org/wiki/Ball_(mathematics)”

“https://en.wikipedia.org/wiki/Open_set”