Hahn-Banach theorem
This is article on the Hahn-Banach theorem. I write this article down to serve both as a note for me as well as a tutorial for the interested readers. This article is rather technical and if you know the basic set theory and linear algebra, I believe you can read this article without difficulty.
This theorem, named after Hans Hahn and Stefan Banach, is a central tool in functional analysis. By using this theorem, any linear functional defined on a vector subspace of some vector space can be extended to the whole space.
Linear Functional and Sublinear Function
So, what is a linear functional then?
A Linear Functional L on a real vector space V is a function L:V→ℝ, which satisfies the following properties:
∀v, w∈V, c∈ℝ
LF1. L(v+w)=L(v)+L(w)
LF2. c·L(v)=L(c·v)
A Sublinear Function p on a real vector space V is a function p:V→ℝ, which satisfies the following properties:
∀v, w∈V, c∈ℝ
SF1. p(v+w) ≤ p(v)+p(w)
SF2. c·p(v) = p(c·v) where c > 0
Zorn’s Lemma
Two axioms are used in the proof of the Hahn-Banach theorem: Greatest Lower Bound Property and Zorn’s Lemma.
Greatest-lower-bound property — Any non-empty set of real numbers that has an lower bound must have a greatest lower bound in real numbers.
Least-upper-bound property — Any non-empty set of real numbers that has an upper bound must have a least upper bound in real numbers.
Zorn’s lemma — Suppose a non-empty partially ordered set P has the property that every non-empty chain has an upper bound in P. Then the set P contains at least one maximal element.
First, I have to introduce the definition of Partial Order, Total Order and Chain.
A Partial Order ≤ on a set X is a binary relation that satisfies the following properties:
PO1. ∀x ∈ X, x ≤ x
PO2. ∀x, y ∈ X, x ≤ y and y ≤ x ⇒ x = y
PO3. ∀x, y, z ∈ X, x ≤ y and y ≤ z ⇒ x ≤ z
A Total Order on a set X is a Partial Order in which any two elements are comparable:
∀x, y ∈ X, x ≤ y or y ≤ x
A Chain on a set X is a Total Ordered subset of X.
Hahn-Banach Theorem
Below is the description of Hahn-Banach Theorem:
HA. Let V be the vector space and S⊆V be the linear subspace in V. Let l:S→ℝ be a linear functional defined on S. Let p:V→ℝbe a sublinear function defined on V such that l(v) ≤ p(v) ∀v∈S. Then, 1) there is a linear functional L:V→ℝsuch that 2) L(v)=l(v) ∀ v∈S and 3) L(v) ≤ p(v) ∀v∈V.
Proof
If S=V, the statement is true.
So, consider S⊂V.
⇒ ∃z∈V\S
1)We can use z to extend the linear subspace S and the linear functional l.
Consider the set S’ extended from S by:
S’ = { y+c·z | y∈S, c∈ℝ }
∀f, g ∈S’
⇒ f = y1+c1·z, g = y2+c2·z for some y1, y2 ∈ S and c1, c2 ∈ ℝ
⇒ f+g = (y1+y2) + (c1+c2)·z ⇒ f+g ∈ S’
∀f∈S’, d∈ℝ
⇒ f = y+c·z for some y∈S, c∈ℝ
⇒ d·f = d·(y+c·z) = d·y+(d·c)·z ⇒ d·f ∈ S’
⇒ S’ is a linear subspace of V
Assume L’:S’→ℝ be the extended linear functional defined on S’ s.t.
L’(v) = l(v) ∀ v∈S
L’(v) ≤ p(v) ∀ v∈S’
Then, ∀ v∈S’
⇒ v = y+c·z for some y ∈ S and c ∈ ℝ
⇒ L’(v) = L’(y+c·z) = L’(y)+c·L’(z) = l(y)+c·L’(z) [ By LF1, LF2 and definition ]
We need to prove that L’ is well defined. If we can prove that v is uniquely represented by y+c·z, then L’ is well defined as l is already well defined.
Suppose on the contrary, v = y1+c1·z = y2+c2·z where y1,y2∈S and c1,c2∈ℝ
⇒ y1-y2 = (c2-c1)·z ∈S
⇒ y1-y2 = 0, c2-c1 = 0 [ Since z ∉ S ]
⇒ y1=y2 and c1=c2 ⇒ v is uniquely defined in S’
⇒ L’ is well defined
Next, we need to check if our assumption is correct. Denote q = L’(z), then
∀v∈S’, L’(v)=l(y)+c·q for some y∈S and c∈ℝ
=> l(y)+c·q ≤ p(y+c·z) [ By our assumption ]
=> c·q ≤ p(y+c·z) — l(y) — — — (*)
If c = 0, L’(v) = l(v) ∀ v∈S. If c ≠ 0, let’s check if there exists q∈ℝsuch that (*) is satisfied.
Case 1: c>0
q ≤ p(y/c+z)-l(y/c)
⇒ q ≤ p(y+z)-l(y) [ Since y/c ∈ S and replace y/c by y ]
Case 2: c<0
q ≥ 1/c·p(y+c·z) +(-1/c)·l(y) = -(-1/c)·p(y+c·z) +(-1/c)·l(y) = -p(-y/c-z)+l(-y/c) [ By SF2 ]
⇒ q ≥ -p(y-z)+l(y) [ Since -y/c ∈ S and replace -y/c by y ]
Let M = { p(y+z)-l(y) | y∈S }, N = { -p(y-z)+l(y) | y∈S }
∀ m∈M, n∈N
⇒ m = p(y1+z)-l(y1), n = -p(y2-z)+l(y2) for some y1, y2 ∈ S
⇒ m-n = p(y1+z)-l(y1)+p(y2-z)-l(y2) = p(y1+z)+p(y2-z)-(l(y1)+l(y2))
≥ p(y1+y2)-l(y1+y2) ≥ 0 [ Since y1+y2 ∈ S and by SF1 ]
⇒ n ≤ m
⇒ sup N ≤ q ≤ inf M [ Since m ≥ q ≥ n, inf M and sup N exist by Greatest Lower Bound Property and Least Upper Bound Property ]
⇒ ∃q∈ℝ s.t. (*) holds ⇒ every S’ has an extended linear functional L’ s.t. L’(v)=l(v) ∀ v∈S’ and L’(v) ≤ p(v) ∀v∈V.
Now, we find that there is a Finally, we consider the family of all possible extended linear functional L’ and use the Zorn’s Lemma to “merge” all these function.
2)Let’s find out what the set P is.
Consider Ω = { (S’[i], L’[i]) | i=1,2,.. } be the set of ordered pair (S’[i], L’[i]) where S’[i] is the linear subspace containing S and L’[i]:S’[i]→ℝ is the corresponding extended linear function with L’[i](v)=l(v) ∀ v∈S and L’[i](v) ≤ p(v) ∀ v∈S’[i].
Consider P = (Ω, ≤) with the order ≤ on Ω s.t. :
(S’[i], L’[i]) ≤ (S’[j], L’[j]) iff S’[i] ⊆ S’[j] and L’[j](S’[i])=L’[i](S’[i])
We need to prove that ≤ is a Partial Order.
PO1:
∀ (S’[i],L’[i]) ∈ Ω, S’[i] ⊆ S’[i], L’[i](S’[i])=L’[i](S’[i]) => (S’[i], L’[i]) ≤ (S’[i], L’[i])
PO2:
∀ (S’[i],L’[i]),(S’[j], L’[j]) ∈ Ω,
(S’[i], L’[i]) ≤ (S’[j], L’[j]) and (S’[j], L’[j]) ≤ (S’[i], L’[i])
⇒ S’[i] ⊆ S’[j] and S’[j] ⊆ S’[i]
L’[j](S’[i])=L’[i](S’[i]) and L’[i](S’[j])=L’[j](S’[j])
⇒ S’[i]=S’[j] and L’[j](S’[j]) = L’[j](S’[i]) = L’[i](S’[i]) = L’[i](S’[j])
⇒ (S’[i], L’[i]) = (S’[j], L’[j])
PO3:
∀ (S’[i],L’[i]), (S’[j], L’[j]), (S’[k], L’[k]) ∈ Ω,
(S’[i], L’[i]) ≤ (S’[j], L’[j]) and (S’[j], L’[j]) ≤ (S’[k], L’[k])
⇒ S’[i] ⊆ S’[j] and S’[j] ⊆ S’[k] and L’[j](S’[i])=L’[i](S’[i]) and L’[k](S’[j]) = L’[j](S’[j])
⇒ S’[i] ⊆ S’[k] and L’[k](S’[i])=L’[j](S’[i])=L’[i](S’[i])
⇒ (S’[i], L’[i]) ≤ (S’[k], L’[k])
Ω ≠ Ø [ Since ∃z∈V\S, extended linear subspace exists. ]
Therefore, P=(Ω,≤) is a non-empty Partial Ordered Set.
3) For every non-empty chain C = {(S’[i],L’[i])} on Ω, we need to find its upper bound. Consider the ordered pair (S’c, L’c) with
S’c = ⋃ S’[i]
L’c: S’c→ℝ where L’c(v) = L’[i](v) for some (S’[i], L’[i]) ∈ C where v ∈ S’[i]
We need to prove that the (S’c, L’c) ∈ Ω and also is a upper bound of the chain C, that is, we need to prove the following conditions are satisfied:
3.1) S’c is a linear subspace of V
3.2) S ⊆ S’c
3.3) L’c is well defined
3.4) L’c(v) ≤ p(v) ∀ v∈S’c
3.5) L’c(v) = l(v) ∀ v∈S
3.6) L’c is a linear functional defined on S’c
3.7) (S’[i], L’[i]) ≤ (S’c, L’c) ∀(S’[i], L’[i]) ∈ P
3.1) S’c is a linear subspace of V:
∀ f, g ∈ S’c, d∈ℝ
⇒ f∈S’[i] for some (S’[i], L’[i]), g∈S’[j] for some (S’[j], L’[j])
⇒ f, g∈S’[j] [ Suppose (S’[i], L’[i]) ≤ (S’[j], L’[j]) as C is totally ordered ]
⇒ f+g∈S’[j] ⊆ ⋃ S’[i] [ Since S’[j] is a linear subspace ]
⇒ f+g∈S’c [ Same result when (S’[j], L’[j]) ≤ (S’[i], L’[i]) ]
∀ f∈S’c, d∈ℝ
⇒ f∈S’[i] for some (S’[i], L’[i])
⇒ d·f∈S’[i] [ Since S’[i] is a linear subspace ]
⇒ d·f∈S’c
3.2) ∀ S’[k], S ⊆ S’[k] ⊆⋃ S’[i]
⇒ S ⊆ S’c
3.3) L’c is well defined:
Suppose v∈S’[i] and v∈S’[j]
⇒ (S’[i], L’[i]) ≤ (S’[j], L’[j]) ⇒ L’[j](v)=L’[i](v) [ Since C is totally ordered ]
⇒ L’c(v)=L’[j](v)=L’[i](v) [ Same result when (S’[j], L’[j]) ≤ (S’[i], L’[i]) ]
3.4) ∀ v∈S’c ⇒ v∈S’[i] for some (S’[i], L’[i]) ⇒ L’c(v)=L’[i](v) ≤ p(v)
3.5) ∀ v∈S ⇒ v∈S’[i] for some (S’[i], L’[i]) ⇒ L’c(v) = L’[i](v) = l(v)
3.6) L’c is linear functional:
∀ f, g ∈ S’c
⇒ f∈S’[i] for some (S’[i], L’[i]), g∈S’[j] for some (S’[j], L’[j])
⇒ f, g ∈S’[j] [ Suppose (S’[i], L’[i]) ≤ (S’[j], L’[j]) as C is totally ordered ]
⇒ L’c(f) + L’c(g) = L’[i](f)+L’[j](g) = L’[j](f)+L’[j](g) = L’[j](f+g) = L’c(f+g) [ Since f+g ∈ S’[j] ]
∀f ∈ S’c, d∈ℝ
⇒ f ∈ S’[i] for some (S’[i], L’[i])
⇒ L’c(d·f) = L’[i](d·f) = d·L’[i](f) = d·L’c(f)
3.7) (S’[i], L’[i]) ≤ (S’c, L’c) ∀(S’[i], L’[i]) ∈ P:
∀S’[k] ⇒ S’[k] ⊆⋃ S’[i]
∀v∈S’[i] ⇒ L’c(v) = L’[i](v) [ By the definition of L’c ]
Therefore, every non-empty chain has an upper bound in P. By the Zorn’s lemma, set P = (Ω, ≤) contains at least one maximal element (Smax, Lmax). If Smax = V, then Hahn-Banach Theorem will be proven.
4) Now, suppose Smax ≠ V
⇒ Smax ⊂ V ⇒ ∃z∈V\Smax
⇒ By 1), there exists set S’ which is an extended from Smax by z and extended linear functional L’ defined on S’
⇒ ∃(S’, L’) ∈ Ω s.t. (Smax, Lmax) ≤ (S’, L’)
⇒ contradicting (Smax, Lmax) is the maximal element
Reference:
“https://en.wikipedia.org/wiki/Vector_space”
“https://en.wikipedia.org/wiki/Linear_subspace”
“https://en.wikipedia.org/wiki/Functional_analysis”
“https://en.wikipedia.org/wiki/Hahn_Banach_theorem”
“https://www.proofwiki.org/wiki/Definition:Linear_Functional”
“https://en.wikipedia.org/wiki/Infimum_and_supremum”