Source: Reddit and others | Editors: 金磊, 鹏飞
[新智元 Editor's Note] You can train neural networks by ear! A Reddit user ran a very interesting experiment: converting each network layer's gradient norm into a tone, so that a human listener can pick out very small disturbances, such as changes in rhythm and pitch.
You can also train neural networks by "listening" to them!
A Reddit user ran a very interesting experiment: converting each network layer's gradient norm into a tone, so that a human listener can pick out very small disturbances, such as changes in rhythm and pitch.
In the past, when training neural networks we have typically measured many different metrics, such as accuracy, loss, and gradients, mostly by aggregating them in TensorBoard and plotting visualizations.
But beyond vision, a Reddit user suggested: you can also monitor neural network training by ear!
Blog post:
Sound is a relatively under-explored direction in monitoring neural network training. Human hearing is very good at picking out small disturbances, such as in rhythm and pitch, even when they are brief or subtle.
In this experiment, the author built a very simple example that synthesizes sound from each layer's gradient norm while training a convolutional neural network on MNIST under different settings, such as different learning rates, optimizers, and momentum values.
Seeing the result, Reddit users got excited and started brainstorming.
MLApprentice:
This is really impressive. I've been looking for ways to get an intuitive feel for gradients, and I find it hard to notice training patterns just by looking at histograms. Have you thought about using layer depth to control pitch and using volume to represent the norm? That way we could tell which layer we're hearing just from the pitch.
klaysDoodle:
After a 10-layer network, I'd go deaf.
MLApprentice:
You're hilarious. You could just normalize the depth so it stays within the range of human hearing.
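The depth-to-pitch idea from this thread can be sketched as follows. Note this is a hypothetical mapping, not part of the original experiment: the layer index is normalized into a fixed 200–2000 Hz band (so even a very deep network stays within human hearing), and the clipped gradient norm controls the volume.

```python
import numpy as np

def layer_tone(depth, num_layers, norm, fs=44100, duration=0.01,
               f_lo=200.0, f_hi=2000.0, max_norm=10.0):
    """Hypothetical sketch: layer depth picks the pitch, gradient norm the volume.

    depth is normalized into [f_lo, f_hi] so any number of layers maps
    into the audible range, and the (clipped) norm scales the amplitude.
    """
    freq = f_lo + (f_hi - f_lo) * depth / max(num_layers - 1, 1)
    volume = min(norm, max_norm) / max_norm          # clip, then scale to 0..1
    t = np.arange(int(fs * duration))
    return (volume * np.sin(2 * np.pi * freq * t / fs)).astype(np.float32)

# e.g. the first layer of a 4-layer net with gradient norm 2.5
samples = layer_tone(0, 4, 2.5)
```

With this scheme the pitch alone identifies the layer, while loudness tracks how large its gradients are.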
gohu_cd:
Very interesting! I wonder whether this could help with debugging neural network training, since some setups have multiple weighted losses, even adversarial ones (e.g. GANs). Since vision and hearing are both senses, looking at a chart or listening to a sound should carry the same amount of information. You could create a "symphony" out of all the sounds corresponding to the weighted gradients; maybe that would be useful for finding the right weight for each loss.
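The "symphony" idea could be sketched by mixing one tone per weighted loss; the number of losses, the weights, and the norm values below are made up purely for illustration.

```python
import numpy as np

def symphony(norms, weights, fs=44100, duration=0.05, base=220.0):
    """Mix one sine tone per weighted loss into a single chord.

    norms: per-loss gradient norms; weights: the corresponding loss weights.
    Each loss gets its own pitch (a harmonic of `base`), and its weighted
    norm sets how loudly it contributes to the mix.
    """
    t = np.arange(int(fs * duration))
    chord = np.zeros_like(t, dtype=np.float32)
    for i, (n, w) in enumerate(zip(norms, weights)):
        freq = base * (i + 1)                       # one harmonic per loss
        chord += w * n * np.sin(2 * np.pi * freq * t / fs)
    peak = np.abs(chord).max()
    return chord / peak if peak > 0 else chord      # normalize to [-1, 1]

# e.g. a GAN-style setup: generator and discriminator gradient norms
mix = symphony(norms=[0.8, 2.1], weights=[1.0, 0.5])
```

A loss that dominates the mix would be immediately audible as its harmonic drowning out the others.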
To run the code for the experiment below, you need to install PyAudio and PyTorch.
1. "Hearing" a Neural Network Train
The sounds of the training runs below can be heard by following the links:
Training sound: SGD with LR 0.01
This audio clip represents the gradient norms of the 4 layers during the first 200 steps of the first epoch, in a training session with batch size 10. The higher the pitch, the higher the norm of that layer's gradients, with a short silence separating the batches.
Training sound: SGD with LR 0.1
Same as above, but with a higher learning rate.
Training sound: SGD with LR 1.0
Same as above, but with a learning rate so high that the network diverges.
Training sound: SGD with LR 1.0, BS 256
Same settings, but with a learning rate of 1.0 and a batch size of 256.
Training sound: Adam with LR 0.01
Same settings as the SGD run, but using Adam.
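All of the clips above use the same mapping, which appears in the full source listing in the next section: each layer's gradient norm shifts the pitch above a 200 Hz base tone, and a silence twice the tone's length is appended after every batch. A minimal standalone version of just that mapping:

```python
import numpy as np

FS = 44100          # sample rate
BASE_FREQ = 200.0   # pitch of a layer whose gradient norm is zero
DURATION = 0.01     # seconds per layer tone

def tone_for_norm(norm):
    """Higher gradient norm -> higher pitch (the experiment's mapping)."""
    freq = BASE_FREQ + norm * 100.0
    t = np.arange(int(FS * DURATION))
    return 0.1 * np.sin(2 * np.pi * freq * t / FS).astype(np.float32)

# four layer norms from one batch, then a silence twice as long
frames = [tone_for_norm(n) for n in (0.5, 1.2, 0.3, 0.9)]
frames.append(np.zeros(int(FS * DURATION) * 2, dtype=np.float32))
```

Concatenating such frames over many batches produces exactly the kind of audio heard in the clips.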
2. Source Code
Below is the full source code of the experiment; interested readers can try it out themselves.
```python
import pyaudio
import numpy as np
import wave

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

        self.ordered_layers = [self.conv1,
                               self.conv2,
                               self.fc1,
                               self.fc2]

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def open_stream(fs):
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paFloat32,
                    channels=1,
                    rate=fs,
                    output=True)
    return p, stream


def generate_tone(fs, freq, duration):
    npsin = np.sin(2 * np.pi * np.arange(fs*duration) * freq / fs)
    samples = npsin.astype(np.float32)
    return 0.1 * samples


def train(model, device, train_loader, optimizer, epoch):
    model.train()

    fs = 44100
    duration = 0.01
    f = 200.0
    p, stream = open_stream(fs)

    frames = []

    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()

        norms = []
        for layer in model.ordered_layers:
            norm_grad = layer.weight.grad.norm()
            norms.append(norm_grad)

            tone = f + ((norm_grad.numpy()) * 100.0)
            tone = tone.astype(np.float32)
            samples = generate_tone(fs, tone, duration)

            frames.append(samples)

        silence = np.zeros(samples.shape[0] * 2,
                           dtype=np.float32)
        frames.append(silence)

        optimizer.step()

        # Just 200 steps per epoch
        if batch_idx == 200:
            break

    wf = wave.open("sgd_lr_1_0_bs256.wav", 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(p.get_sample_size(pyaudio.paFloat32))
    wf.setframerate(fs)
    wf.writeframes(b''.join(frames))
    wf.close()

    stream.stop_stream()
    stream.close()
    p.terminate()


def run_main():
    device = torch.device("cpu")

    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=256, shuffle=True)

    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

    for epoch in range(1, 2):
        train(model, device, train_loader, optimizer, epoch)


if __name__ == "__main__":
    run_main()
```
Reddit thread:
Blog:
From: 新智元 (ID: AI_era)