Detail proof of Universal Approximation Theorem — Part 1

Simon Kwan
8 min readAug 11, 2020


Photo by Mila Buenavida on Unsplash

This is an article for those who are interested in Artificial Neural Network and want to know why it works. In 1989, George Cybenko has published a paper to explain why it works and proved the following theorem, Universal Approximation Theorem:

Universal approximation theorem from wikipedia

This theorem states that for any given continuous function over an interval of [0, 1], it is guaranteed that there exists a neural network that can approximate it within the given accuracy. This theorem does not tell you how to find the neural network, but it tells you that you can find it anyway. It is very cool, isn’t it?

I will describe the proof from George Cybenko using abstract mathematics (Topology and Functional Analysis) instead of a visual proof. After reading this article, you will get a feeling of how the abstract mathematics is used in solving real world problem.

C(Im) is a Vector Space

First, I want to talk about the properties of C(Im) that will be used in the proof. C(Im) denotes the space of real-valued continuous function on Im. Actually, it is a Vector Space (also called Function Space) over field R in which every function f is represented as a point. The operations are defined point-wise, that is, for any f,g : Im → R, and any c in R :

(f+g)(x) = f(x) + g(x), (c·f)(x) = c·f(x)

The above operation definition is valid, f+g and c·f are still continuous function.

Proof- When f and g are continuous at a in Im, we have:

∀ ε > 0, ∃ δ1 > 0 s.t. ∀ x [|x-a|<δ1 ⇒ |f(x)-f(a)| < ε/2] and
∃ δ2 > 0 s.t. ∀ x [|x-a|<δ2 ⇒ |g(x)-g(a)| < ε/2]

∃ δ = min(δ1, δ2) > 0 s.t. ∀ x[|x-a|<δ ⇒
|(f+g)(x)-(f+g)(a)|= |f(x)-f(a)+g(x)-g(a)|
< |f(x)-f(a)| + |g(x)-g(a)|
< ε/2 + ε/2 = ε]

f+g is continuous at a

∀ ε > 0, ∃ δ > 0 s.t. ∀ x [|x-a|<δ ⇒ |f(x)-f(a)| < ε/|c|]

∃ δ > 0 s.t. ∀ x[|x-a|<δ ⇒
|(c·f)(x)-(c·f)(a)|= |c·f(x)-c·f(a)|= |c|·|f(x)-f(a)|
< |c|·ε/|c| = ε]

c·f is continuous at a

Now, to prove that C(Im) is a Vector Space, we have to prove that the following properties are satisfied.

Proof- Let f, g and h be arbitrary functions in C(Im), a and b scalars in R associativity: f + (g + h) = (f + g) + h
commutativity: f + g = g + f
identity element: 0(x) = 0, (f + 0)(x) = f(x) + 0(x) = f(x) + 0 = f(x)
inverse elements: (−f)(x) = −f(x) such that f+ (−f) = f−f = 0
compatibility: a(b·f) = (ab)f , 1·f = f
distributivity: a·(f + g) = a·f + a·g, (a + b)f = a·f + b·f

C(Im) is a Normed Space

C(Im) is a Vector Space. Furthermore, it is a Normed Space. A Normed Space is an ordered pair (V, ∥·∥) where V is a vector space over the real or complex numbers, on which a norm ∥·∥ is defined. A norm is the intuitive notion of “length” in the real world. A norm is a real-valued function defined on the vector space ∥·∥: V → ℝ, for any x∈V and has the following properties:

N1: ∥x∥ ≥ 0
N2: ∥x∥ = 0 ⇔ x = 0
N3: ∥αx∥ = |α|∥x∥
N4: ∥x+y∥ ≤ ∥x∥ + ∥y∥

Define ∥·∥: C(Im) → ℝ, for any f ∈ C(Im)
∥f∥ = sup{ |f(x)| : x ∈ Im}
Let’s prove that ∥·∥ is a norm.

Proof- Let f, g ∈ C(Im)
N1: ∥f∥ = sup{ |f(x)| : x ∈ Im} ≥ 0
N2: ∥f∥ = 0 ⇔ sup{ |f(x)| : x ∈ Im}= 0 ⇔ f(x) = 0 ∀x ∈ Im ⇔ f = 0
N3: ∥αf∥ = sup{ |αf(x)| : x ∈ Im} = |α| sup{ |f(x)| : x ∈ Im} = |α|∥f∥
N4:∥f∥ + ∥g∥ = sup{|f(x)| : x ∈ Im}+ sup{|g(x)| : x ∈ Im}≥ sup{|f(x)|+|g(x)| : x ∈ Im} ≥ sup{|f(x)+g(x)| : x ∈ Im} = ∥f+g∥

∥·∥ is a norm

C(Im) is a Metric Space and Topological Space

When a vector space is a Normed Space, it will be a Metric Space and Topological Space.

Normed Vector Space, Metric Space and Topological Space (from Wikipedia)

A Metric Space is an ordered pair (M, d) where M is a set on which a metric d is defined. A metric is the intuitive notion of “distance” in the real world. A metric is a non-negative function defined on the set d: M x M → [0, ∞), for any f, g ∈ M and has the following properties:

M1: d(f, g) = 0 ⇔ f = g
M2: d(g, f) = d(f, g)
M3: d(f, g)+d(g, h)≥d(f, h)

Define d(f, g): M x M → [0, ∞), for any f, g ∈ M and M is a Normed Space
d(f, g) = ∥f - g∥
Let’s prove that d(·,·) is a metric.

Proof- Let f, g, h ∈ M
M1: d(f, g) = 0 ⇔ ∥f-g∥ = 0 ⇔ f-g = 0 ⇔ f = g
M2: d(g, f) = ∥g-f∥= ∥(-1)(f-g)∥ = |-1|∥f-g∥ = d(f, g)
M3: d(f, g)+d(g, h) = ∥f-g∥ + ∥g-h∥ ≥ ∥f-h∥ = d(f, h)

d(·,·) is a metric

Since C(Im) is a Normed Space, it is also a Metric Space.

A Topological Space is an ordered pair (X, τ), where X is a set and τ is a collection of subsets ui of X, called Open Set, satisfying the following axioms:
T1: ø, X ∈ τ
T2: any (finite or infinite) union of sets in τ is itself in τ, i.e. ∪ui ∈ τ ∀ ui ∈ τ
T3: any finite intersection of sets in τ is itself in τ, i.e. ∩ui ∈ τ ∀ ui ∈ τ

To understand why a Metric Space (M, d) is also a Topological Space, we need a tool called Open Ball:

Define Open Ball B(x, r), for any x ∈ M, r > 0
B(x, r) = {p|p ∈ M, d(x, p) < r}
Define Open Set o and collection τ
τ ={o|o ⊆ M, ∀x∈o ∃r>0 (B(x,r) ⊆ o)}

Let’s prove that (M, d) is a Topological Space

ø contains no point ⇒ ø ∈ τ
∀x∈M ∃r>0 B(x,r)⊆M⇒ M ∈ τ

o[i] ∈ τ
⇒ ∪o[i] ⊆ M and ∪o[i] = {x|∃o∈τ ∃r>0 (x∈o and B(x,r) ⊆o ⊆ ∪o[i])}
⇒ ∪o[i] ⊆ M and ∪o[i] = {x|∃r>0 (B(x,r) ⊆ ∪o[i])}
⇒ ∪o[i] ∈ τ

o1, o2 ∈ τ
⇒ o1 ∩ o2 ⊆ M and o1 ∩ o2 = {x|∃r1>0 ∃r2>0 (x∈o1 and B(x,r1) ⊆o1 and x∈o2 and B(x,r2) ⊆o2)}
⇒ o1 ∩ o2 ⊆ M and o1 ∩ o2 = {x|∃min(r1,r2)>0 (x∈o1 ∩ 02 and B(x,min(r1,r2)) ⊆ B(x,r1) ⊆ o1 and B(x,min(r1,r2)) ⊆ B(x,r2) ⊆ o2)}
⇒ o1 ∩ o2 ⊆ M and o1 ∩ o2 = {x|∃min(ru)>0 (B(x,min(r1,r2)) ⊆ o1 ∩ o2)}
⇒ o1 ∩ o2 ∈ τ

∴(M, d) is a Topological Space

Functions of the form F(x) is dense in C(Im) ?

Return to the Universal Approximation Theorem, it says that “functions of the form F(x) is Dense in C(Im)”. Let S be the set of functions of the form F(x). Clearly, set S is a subset of C(Im). Dense is a topological feature. It means that for any point x in C(Im), then every Neighbourhood N of x contains a point from S. A Neighbourhood N of x is a subset of C(Im) that includes an Open Set u containing x.

x ∈ u ⊆ N

Dense set S, Neighbourhood N and Open Set u containing point x

In Metric Space, it can be proved that any Open Ball can be an Open Set and neighbourhood at the same time [See Appendix]:

x∈B(x, r)=u=N

We have proved that C(Im) is a Topological Space, so we can now use the Topology Theory to help us to prove the theorem. Now, we can express the Universal Approximation Theorem in the language of Topology Theory:

S is Dense in C(Im)
⇔ For any f ∈ C(Im), N ∩ S ≠ ∅ ∀ N where N is a neighbourhood of f
⇒ For any f ∈ C(Im), B(f, ε) ∩ S ≠ ∅ ∀ε > 0
⇒ For any f ∈ C(Im), ∀ε > 0, ∃ F ∈ S s.t. d(f, F) < ε
⇒ For any f ∈ C(Im), ∀ε > 0, ∃ F ∈ S s.t. ∥f-F∥ < ε
⇒ For any f ∈ C(Im), ∀ε > 0, ∃ F ∈ S s.t. sup{|F(x)-f(x)||∀ x ∈ Im} < ε
⇒ For any function f ∈ C(Im), ∀ε > 0, ∃ F ∈ S s.t. |F(x)-f(x)| < ε ∀ x ∈ Im

So, once we have proved that S is dense in C(Im), the Universal Approximation Theorem is finally proved. To prove this, we use the following Topology theorem and then prove by contradiction:

S is Dense in C(Im) ⇔ Closure(S) = C(Im)
Suppose Closure(S) ≠ C(Im) ⇒ Closure(S) ⊂ C(Im) ⇒ contradiction

I will explain this in details in Part 2.

Your feedback is highly appreciated and will help me to continue my articles.

Please give this post a clap if you like this post. Thanks!!


Open Ball B(x, r) is an Open Set and neighbourhood for any point x in Metric Space.
∀ p ∈ B(x, r)
d(p, x)= l < r
Consider Open Ball B(p, t) with t = r-l > 0
∀ y ∈ B(p, t)
d(y, p) < t
d(y, x) < d(y, p) + d(p, x) < t + l = r
⇒ y ∈ B(x, r)
⇒ ∃t >0 s.t. B(p, t) ⊂ B(x, r)
⇒ B(x, r) ∈ τ

Clearly, B(x, r)⊆B(x, r), so Open Ball B(x, r) contains an Open Set and is thus a neighbourhood of point x.



Simon Kwan

MSc Computer Science, BSc Math, BSc Mechanical Engineering