Concise Implementation of Multilayer Perceptrons

:label:sec_mlp_concise

As you might expect, by relying on the high-level APIs, we can implement MLPs even more concisely.

```{.python .input} from d2l import mxnet as d2l from mxnet import gluon, init, npx from mxnet.gluon import nn npx.set_np()


```{.python .input}
#@tab pytorch
from d2l import torch as d2l
import torch
from torch import nn

```{.python .input}

@tab tensorflow

from d2l import tensorflow as d2l import tensorflow as tf


## Model
As compared with our concise implementation
of softmax regression implementation
(:numref:`sec_softmax_concise`),
the only difference is that we add
*two* fully-connected layers
(previously, we added *one*).
The first is our hidden layer,
which contains 256 hidden units
and applies the ReLU activation function.
The second is our output layer.
```{.python .input}
net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'),
        nn.Dense(10))
net.initialize(init.Normal(sigma=0.01))

```{.python .input}

@tab pytorch

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def initweights(m): if type(m) == nn.Linear: torch.nn.init.normal(m.weight, std=0.01)

net.apply(init_weights)


```{.python .input}
#@tab tensorflow
net = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10)])

The training loop is exactly the same as when we implemented softmax regression. This modularity enables us to separate matters concerning the model architecture from orthogonal considerations.

```{.python .input} batch_size, lr, num_epochs = 256, 0.1, 10 loss = gluon.loss.SoftmaxCrossEntropyLoss() trainer = gluon.Trainer(net.collect_params(), ‘sgd’, {‘learning_rate’: lr})


```{.python .input}
#@tab pytorch
batch_size, lr, num_epochs = 256, 0.1, 10
loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=lr)

```{.python .input}

@tab tensorflow

batch_size, lr, num_epochs = 256, 0.1, 10 loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True) trainer = tf.keras.optimizers.SGD(learning_rate=lr)


```{.python .input}
#@tab all
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

Summary

Using high-level APIs, we can implement MLPs much more concisely.
For the same classification problem, the implementation of an MLP is the same as that of softmax regression except for additional hidden layers with activation functions.

Exercises

Try adding different numbers of hidden layers (you may also modify the learning rate). What setting works best?
Try out different activation functions. Which one works best?
Try different schemes for initializing the weights. What method works best?

:begin_tab:mxnet Discussions :end_tab:

:begin_tab:pytorch Discussions :end_tab:

:begin_tab:tensorflow Discussions :end_tab: