- Recap
- Perceptron: an affine combination of the inputs, passed through an activation function.
- Activation functions:
- Binary step: outputs 1 if the input is >= 0, else 0.
- Sigmoid: 1 / (1 + e^(-z)); squashes the input to (0, 1).
- Tanh: squashes the input to (-1, 1).
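A minimal sketch of the recap, assuming NumPy: a perceptron is an affine combination of its inputs passed through one of the activations above. The AND example weights are illustration values.

```python
import numpy as np

# Activation functions from the recap.
def binary_step(z):
    return (z >= 0).astype(float)      # 1 if z >= 0, else 0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                  # squashes to (-1, 1)

def perceptron(x, w, b, activation=binary_step):
    """Affine combination of the inputs followed by an activation."""
    return activation(np.dot(w, x) + b)

# Illustration: a perceptron computing AND of two binary inputs.
w, b = np.array([1.0, 1.0]), -1.5
print(perceptron(np.array([1.0, 1.0]), w, b))                     # 1.0
print(perceptron(np.array([1.0, 0.0]), w, b))                     # 0.0
print(perceptron(np.array([1.0, 0.0]), w, b, activation=sigmoid)) # ~0.38
```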
- The multi-layer perceptron
- Input layer, Hidden layer, Output layer
- Deep network: Has at least 3 layers
- Depth of the DAG: the length of the longest path from a source to a sink.
- MLPs approximate functions.
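A minimal sketch of an MLP as a layered network, assuming NumPy and threshold activations; the layer widths and random weights are illustration values only. The depth is simply the number of layers traversed from input to output.

```python
import numpy as np

def binary_step(z):
    return (z >= 0).astype(float)

def mlp_forward(x, layers, activation=binary_step):
    """layers is a list of (W, b) pairs: input -> hidden layers -> output."""
    h = x
    for W, b in layers:
        h = activation(W @ h + b)   # affine combination + activation, per layer
    return h

# A 3-layer network (2 hidden layers + output) on a 4-dimensional input.
rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((5, 4)), rng.standard_normal(5)),  # hidden layer 1
    (rng.standard_normal((3, 5)), rng.standard_normal(3)),  # hidden layer 2
    (rng.standard_normal((1, 3)), rng.standard_normal(1)),  # output layer
]
print(mlp_forward(rng.standard_normal(4), layers))
```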
- MLPs as Boolean Functions
- A Boolean function can be completely specified by a truth table.
- Disjunctive Normal Form (DNF): any truth table can be written as an OR of ANDs, with one AND term per row that outputs 1.
- An MLP with one hidden layer is a universal Boolean function: one hidden perceptron per AND term, with an OR perceptron at the output (see the sketch after this list).
- However, the hidden layer can be exponentially large in the number of inputs.
- Karnaugh Map
- Adjacent boxes can be grouped to reduce the complexity of the DNF formula.
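A sketch of the DNF construction, assuming NumPy and threshold units: one hidden AND-unit per row of the truth table that outputs 1, and an output unit that ORs them. The 2-input XOR truth table is just an illustration.

```python
import numpy as np

def step(z):
    return 1.0 if z >= 0 else 0.0

def dnf_mlp(truth_table):
    """Build one hidden AND-unit per true row, then OR them at the output."""
    true_rows = [np.array(x, dtype=float) for x, y in truth_table if y == 1]
    def net(x):
        x = np.asarray(x, dtype=float)
        # AND unit for row r: fires only when x matches r exactly.
        hidden = [step(np.dot(2 * r - 1, x) - (r.sum() - 0.5)) for r in true_rows]
        return step(sum(hidden) - 0.5)   # OR of the hidden units
    return net

# Illustration: the XOR truth table on two inputs.
xor_table = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = dnf_mlp(xor_table)
for x, y in xor_table:
    assert net(x) == y
print("DNF network reproduces the truth table")
```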
- The need for Depth
- Worst case: the largest irreducible DNF, a chessboard pattern on the Karnaugh map (the XOR/parity of the inputs).
- Single hidden layer: for N inputs, requires 2^(N-1) perceptrons in the hidden layer.
- Multi-layer: for N inputs, requires 3(N-1) perceptrons (each 2-input XOR is built from 3 perceptrons).
- If a tree structure is used, requires 2*log2(N) layers.
- If the depth is limited, the number of perceptrons becomes exponential again.
- If the Boolean function can be expressed as some preliminary computation followed by XORs, the same problem arises (see the parity sketch after this list).
- Any Boolean circuit of depth d using AND, OR and NOT gates with unbounded fan-in that computes parity must have size 2^Omega(N^(1/(d-1))). (Strictly, the MLP is a threshold circuit, not a Boolean circuit, so this bound is only indicative.)
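A sketch contrasting the two constructions for the parity (XOR) worst case, assuming NumPy and threshold units: the shallow network reuses the DNF construction and needs 2^(N-1) hidden units, while the deep network is a tree of 2-input XOR sub-networks with 3 perceptrons each, i.e. 3(N-1) perceptrons in about 2*log2(N) layers.

```python
import numpy as np
from itertools import product

def step(z):
    return 1.0 if z >= 0 else 0.0

# Shallow: DNF-style single hidden layer for N-input parity.
def shallow_parity(N):
    # One hidden AND-unit per input pattern with an odd number of 1s: 2^(N-1) units.
    true_rows = [np.array(bits, float) for bits in product([0, 1], repeat=N)
                 if sum(bits) % 2 == 1]
    def net(x):
        x = np.asarray(x, float)
        hidden = [step(np.dot(2 * r - 1, x) - (r.sum() - 0.5)) for r in true_rows]
        return step(sum(hidden) - 0.5)
    return net, len(true_rows)

# Deep: a tree of 2-input XOR sub-networks, 3 perceptrons each.
def xor2(a, b):
    h1 = step(a + b - 0.5)        # OR
    h2 = step(-a - b + 1.5)       # NAND
    return step(h1 + h2 - 1.5)    # AND(OR, NAND) = XOR: 3 perceptrons

def deep_parity(x):
    vals = list(x)
    while len(vals) > 1:          # pairwise XOR tree: ~2*log2(N) layers
        vals = [xor2(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)] + \
               ([vals[-1]] if len(vals) % 2 else [])
    return vals[0]

N = 8
shallow, hidden_units = shallow_parity(N)
print("shallow hidden units:", hidden_units)       # 2^(N-1) = 128
print("deep perceptrons:    ", 3 * (N - 1))        # 3(N-1) = 21
x = [1, 0, 1, 1, 0, 0, 1, 0]
print(shallow(x), deep_parity(x), sum(x) % 2)      # all agree
```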
- MLPs as Universal Classifiers
- Complex decision boundaries.
- Example: two hidden layers. Use a combination of convex polygons to form the decision boundary, combining them with OR logic.
- Universal: one hidden layer.
- Use polygons that have N edges.
- Inside the polygon, the N edge-perceptrons sum to N; over most of the area outside, the sum approaches N/2. Set the threshold at N/2.
- As N -> infinity, the polygons become circles. (If the sum is viewed as a third dimension, they become cylinders.)
- The circles then fill the target polygons, combined with OR logic (see the sketch below).
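A sketch of the polygon argument, assuming NumPy: N edge-perceptrons of a regular N-gon (each edge tangent to a circle of the chosen radius) are summed. The centre, radius, and probe points are illustration values.

```python
import numpy as np

def polygon_sum(point, center, radius, N):
    """Sum of the N edge-perceptrons of a regular N-gon around `center`."""
    p = np.asarray(point, float) - np.asarray(center, float)
    total = 0.0
    for k in range(N):
        normal = np.array([np.cos(2 * np.pi * k / N), np.sin(2 * np.pi * k / N)])
        # Edge perceptron k: fires on the inner side of its edge.
        total += float(p @ normal <= radius)
    return total

N, radius, center = 64, 1.0, (0.0, 0.0)
print(polygon_sum((0.0, 0.0), center, radius, N))    # inside: sum = N = 64
print(polygon_sum((0.5, 0.5), center, radius, N))    # still inside: 64
print(polygon_sum((100.0, 0.0), center, radius, N))  # far outside: about N/2

# Subtracting N/2 turns each polygon into a "cylinder": ~N/2 inside, ~0 far away.
# ORing many such cylinders (thresholding their sum) fills an arbitrary region.
```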
- Optimal depth
- View this problem as arithmetic circuits.
- Valiant: a polynomial of degree n requires a network (arithmetic circuit) of depth on the order of log(n); it cannot be shallower.
- Otherwise, the network becomes exponentially large (see the sketch below).
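A sketch of the depth claim in arithmetic-circuit terms, in plain Python: computing x^n, a degree-n monomial, takes n-1 sequential multiplications but only about log2(n) levels with a balanced tree; since each multiplication at most doubles the degree, no multiplication circuit for a degree-n polynomial can be shallower than log2(n).

```python
# Depth of an arithmetic circuit computing x^n (a degree-n monomial).

def sequential_depth(n):
    """x * x * ... * x, one multiplication per level."""
    return n - 1

def tree_depth(n):
    """Balanced tree of multiplications: each level roughly doubles the degree."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2   # pair up the factors, an odd one passes through
        depth += 1
    return depth

for n in (8, 64, 1024):
    print(n, "sequential depth:", sequential_depth(n), " tree depth:", tree_depth(n))
# The degree at most doubles per multiplication level, so depth >= log2(n).
```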
- Example: the chessboard.
- One hidden layer: an exponential number of neurons is needed to approximate it.
- Two layers: for an 8 x 8 chessboard, 16 neurons in hidden layer 1 (to represent the 16 lines) and 40 in hidden layer 2 (to represent the 40 yellow boxes).
- Multiple layers of an XOR network: depending on the dimension of the input, the depth can be greater, but the number of neurons can be smaller (see the sketch below).
- Overall: the number of neurons required in a shallow network is potentially exponential in the dimensionality of the input (worst case: statistically independent input dimensions).
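A sketch of the chessboard as "preliminary computation followed by XORs", in plain Python with threshold units: threshold units mark which grid lines each coordinate has crossed, and a chain of 2-input XOR sub-networks (3 threshold units each) combines their parities. This illustrates the structure only; it does not reproduce the exact layer sizes quoted above.

```python
def step(z):
    return 1.0 if z >= 0 else 0.0

def xor2(a, b):                      # 2-input XOR from 3 threshold units
    return step(step(a + b - 0.5) + step(-a - b + 1.5) - 1.5)

def parity(bits):                    # chain of 2-input XOR sub-networks
    out = 0.0
    for b in bits:
        out = xor2(out, b)
    return out

def chessboard(x, y, size=8):
    """Chessboard labeling of [0, size)^2: thresholds followed by XORs."""
    # Preliminary computation: which grid lines has each coordinate crossed?
    col_bits = [step(x - t) for t in range(1, size)]
    row_bits = [step(y - t) for t in range(1, size)]
    # XOR stage: parity of crossed lines per axis, then XOR of the two parities.
    return xor2(parity(col_bits), parity(row_bits))

for (x, y) in [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (6.2, 3.7)]:
    print((x, y), "->", chessboard(x, y))   # alternates like a chessboard
```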
- MLPs as Universal Approximators for Continuous Functions
- How to approximate
- Similar to the classifier construction in 3.1.2: use N-dimensional cylinders to approximate the function (see the sketch below).
- The network is actually a universal map from the entire domain of input values to the entire range of the output activation.
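A 1-D sketch of the cylinder construction, assuming NumPy: each "cylinder" is an interval indicator built from two threshold units, scaled by the target function's value at the cell centre; summing them approximates a continuous function, and the error shrinks as the cells get finer. The target function and cell count are illustration values.

```python
import numpy as np

def step(z):
    return (np.asarray(z, float) >= 0).astype(float)

def boxcar(x, lo, hi):
    """Indicator of [lo, hi): the difference of two threshold units."""
    return step(x - lo) - step(x - hi)

def approximate(f, lo, hi, n_cells):
    """Approximate f on [lo, hi) as a sum of scaled 1-D 'cylinders'."""
    edges = np.linspace(lo, hi, n_cells + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    def f_hat(x):
        return sum(f(c) * boxcar(x, a, b)
                   for c, a, b in zip(centers, edges[:-1], edges[1:]))
    return f_hat

f = np.sin
f_hat = approximate(f, 0.0, 2 * np.pi, n_cells=200)
xs = np.linspace(0.0, 2 * np.pi, 1000, endpoint=False)
print("max error:", np.max(np.abs(f(xs) - f_hat(xs))))  # shrinks as n_cells grows
```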
- Sufficiency of architecture
- Each hidden layer needs to be sufficiently wide to preserve the information in its input. For example, using only 8 (threshold) neurons in the first hidden layer loses information about a 16-dimensional input.
- However, with a continuous-valued activation function (rather than a threshold), the information may be preserved.
- In practice, something like a parametric rectified linear unit (PReLU) is more helpful, as it preserves information (see the sketch below).
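A small sketch of the width argument, assuming NumPy: 8 threshold units can emit at most 2^8 = 256 distinct codes, so they must collapse many distinct 16-dimensional inputs, whereas the same layer with a leaky/parametric ReLU generically keeps the sampled inputs distinguishable. The sample size and slope are illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 16))           # 1000 distinct 16-dimensional inputs
W = rng.standard_normal((16, 8))              # first hidden layer: only 8 units
Z = X @ W

threshold_codes = (Z >= 0).astype(int)        # binary step activation
prelu_codes = np.where(Z >= 0, Z, 0.25 * Z)   # leaky/parametric ReLU (slope 0.25)

print("distinct threshold codes:", len(np.unique(threshold_codes, axis=0)))  # <= 256
print("distinct PReLU codes:    ", len(np.unique(prelu_codes, axis=0)))      # expected: 1000
```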
- Summary
- MLPs are universal Boolean functions, universal classifiers, and universal approximators of continuous functions.
- But an MLP can represent a given function only if it is 1) sufficiently wide and 2) sufficiently deep.
- Deeper networks can require far fewer neurons than shallow networks to express the same function (a single hidden layer may require an exponentially large number of perceptrons).