Problem

Today I wanted to try out the PyTorch environment I had previously set up on Windows 10. While preparing the dataset for model training, the following error appeared as soon as I tried to iterate over the DataLoader.

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "D:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "D:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    AttributeError: Can't get attribute 'MyDataset' on <module '__main__' (built-in)>

    Traceback (most recent call last):
      File "D:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-5-e37105fe54f7>", line 1, in <module>
        for i, j in train_iter:
      File "D:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
        return _MultiProcessingDataLoaderIter(self)
      File "D:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
        w.start()
      File "D:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
      File "D:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "D:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "D:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
        reduction.dump(process_obj, to_child)
      File "D:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    BrokenPipeError: [Errno 32] Broken pipe

This seems to involve multiprocessing. I remember from learning Python that Windows has no fork() system call, so multiprocessing needs special handling there. The relevant part of the code is shown below.

    train_iter = DataLoader(dataset=train_set, batch_size=batch_size, shuffle=True, num_workers=10)
    test_iter = DataLoader(dataset=test_set, batch_size=batch_size, shuffle=True, num_workers=10)
    # %%
    for i, j in train_iter:
        print(i)

The num_workers argument of DataLoader controls parallel data loading. Because of Python's GIL, threads cannot run on multiple cores at once, so PyTorch implements the workers as separate processes in order to actually use multiple cores. On Windows those worker processes are started with spawn rather than fork, and each child re-imports the main module, which is roughly where the problem comes from.
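You can check the start method yourself with a minimal standard-library snippet (nothing here is specific to this project):

    import multiprocessing as mp

    if __name__ == '__main__':
        # On Windows this prints "spawn": each child process starts a fresh
        # interpreter and re-imports __main__ instead of inheriting the
        # parent's memory the way fork() would, so top-level code runs
        # again in every DataLoader worker.
        print(mp.get_start_method())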

Solutions

Option 1

Set num_workers to 0.
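With the same variables as in the snippet above, this loads data in the main process and avoids spawning workers entirely, at the cost of slower data loading:

    train_iter = DataLoader(dataset=train_set, batch_size=batch_size, shuffle=True, num_workers=0)
    test_iter = DataLoader(dataset=test_set, batch_size=batch_size, shuffle=True, num_workers=0)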

Option 2

Wrap the rest of your code under the following guard (a fuller sketch follows below):

    if __name__ == '__main__':
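A minimal, self-contained sketch of what that looks like; MyDataset and its contents here are placeholders standing in for your own dataset class, which must be defined at module level so the worker processes can import it:

    from torch.utils.data import DataLoader, Dataset

    class MyDataset(Dataset):
        # Placeholder dataset defined at module level, so spawned worker
        # processes can find it when they re-import this module.
        def __init__(self):
            self.data = list(range(100))

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            return self.data[idx], self.data[idx]

    if __name__ == '__main__':
        batch_size = 16
        train_set = MyDataset()
        train_iter = DataLoader(dataset=train_set, batch_size=batch_size,
                                shuffle=True, num_workers=10)
        # The guard keeps the workers (which re-import this module on
        # Windows) from re-running the training code; iteration now works.
        for i, j in train_iter:
            print(i)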