1. HTTP link download

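Generally, if clicking something in the browser automatically brings up a download prompt, it is an HTTP link. Right-click the download location to copy the download URL.
Downloading this way has several limitations:

  1. To download many small files, you have to click the links one by one, and you may also have to rename them.
  2. The place where a file can be downloaded with one click is often hard to find and usually requires logging in or similar steps.
  3. Browsers may limit the size of a single download, for example to files of at most 4 GB.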

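Method 1: Thunder (Xunlei)
For multiple files, these HTTP URLs usually follow a pattern. For example, right-clicking each download location and copying the link gives: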

```
http://apdrc.soest.hawaii.edu/erddap/griddap/hawaii_soest_c4f8_7ed4_2d75.nc?iicethic[(2006-01-01):1:(2006-12-31)][(-89.5):1:(89.5)][(0.0):1:(359.0)]
http://apdrc.soest.hawaii.edu/erddap/griddap/hawaii_soest_c4f8_7ed4_2d75.nc?iicethic[(2007-01-01):1:(2008-12-31)][(-89.5):1:(89.5)][(0.0):1:(359.0)]
```

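Comparing them, only the time range and the latitude/longitude ranges change. Following this pattern, you can generate the links yourself for whatever time and spatial ranges you need, copy them all at once, create a batch download task in Thunder, and paste them in.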
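As a sketch of generating such links, the snippet below fills the time range of the URL pattern above for each year and writes one URL per line to a text file. The variable `iicethic` and the latitude/longitude ranges are copied from the example links; the one-file-per-year split and the file name `links.txt` are assumptions for illustration.

```python
# Base URL and query template copied from the example links above
BASE = "http://apdrc.soest.hawaii.edu/erddap/griddap/hawaii_soest_c4f8_7ed4_2d75.nc"
QUERY = "iicethic[({start}):1:({end})][(-89.5):1:(89.5)][(0.0):1:(359.0)]"


def make_links(years):
    """Return one ERDDAP URL per year (assumed split: Jan 1 to Dec 31)."""
    return [
        BASE + "?" + QUERY.format(start=f"{y}-01-01", end=f"{y}-12-31")
        for y in years
    ]


links = make_links(range(2006, 2011))

# One URL per line: paste the contents into Thunder, or pass the file to wget -i
with open("links.txt", "w") as f:
    f.write("\n".join(links) + "\n")
```

Changing the `years` argument or the ranges in `QUERY` is all it takes to build a different batch.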
Method 2: the Linux command wget -i
Copy the generated links into a text file, one per line, then run:

```shell
wget -i links.txt
```

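wget is a very powerful downloader with many options. For more advanced use, such as recursive downloads (-r), resuming interrupted downloads (-c), or going through a proxy, consult the wget documentation.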
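Where wget is not available, the same batch download can be sketched with Python's standard library. The naming rule below (variable plus time range taken from the query string) is a hypothetical choice so that links sharing the same `.nc` path get distinct local files; files that already exist are skipped, and each download goes to a temporary `.part` file first so an interrupted transfer is not mistaken for a finished one. Unlike `wget -c`, it does not resume partial files.

```python
import os
import re
import urllib.request

# Matches e.g. "?iicethic[(2006-01-01):1:(2006-12-31)]" in the ERDDAP query
TIME_RANGE = re.compile(r"\?(\w+)\[\((\d{4}-\d{2}-\d{2})\):\d+:\((\d{4}-\d{2}-\d{2})\)\]")


def local_name(url):
    """Hypothetical naming rule: <variable>_<start>_<end>.nc"""
    m = TIME_RANGE.search(url)
    if m is None:
        raise ValueError("no time range found in " + url)
    return "{}_{}_{}.nc".format(*m.groups())


def download_all(list_file, out_dir="."):
    """Download every URL in list_file (one per line), skipping finished files."""
    os.makedirs(out_dir, exist_ok=True)
    with open(list_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    for url in urls:
        target = os.path.join(out_dir, local_name(url))
        if os.path.exists(target):
            continue  # already downloaded on an earlier run
        tmp = target + ".part"
        urllib.request.urlretrieve(url, tmp)
        os.replace(tmp, target)  # only complete files get the final name
        print("saved", target)
```

Usage, with a hypothetical output directory: `download_all("links.txt", "ice_thickness")`.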

2. FTP download

[Figure 1: FTP address provided on a data website (BGC-Argo)]
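Some data websites provide an FTP address, such as BGC-Argo in the figure. With an FTP server available, downloading data is relatively easy: connect to the server address with an FTP client such as FileZilla. If the website does not give a username and password, an anonymous connection usually works.
[Figure: the FileZilla interface]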

3. OPeNDAP download

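The Open-source Project for a Network Data Access Protocol (OPeNDAP) develops the client/server software of the same name, which makes it easier for scientists to share data over the internet.
If the data website you are using offers an OPeNDAP address, I recommend using it first, because OPeNDAP works seamlessly with xarray and is very convenient!
[Figure: APDRC's OPeNDAP link for Argo data]
Copy that link and open it directly with xarray: you can inspect the dataset and even plot it without downloading it first. Here it took only 0.7 s!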
[Figures 4 and 5]
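Downloading HYCOM OPeNDAP data with xarray
HYCOM's time axis starts at 2000-01-01 00:00:00 and stores the number of hours elapsed since then, so the time values have to be decoded. The advantage of OPeNDAP is that you can open the dataset first, cut out only the sea area you want, and download just that subset.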

```python
import os
from datetime import timedelta

import pandas as pd
import xarray as xr

URL = "http://tds.hycom.org/thredds/dodsC/GLBy0.08/expt_93.0"
OUT_DIR = "/data/hycom_2018_latest"
MIN_SIZE = 96320000  # local files smaller than this (bytes) are treated as incomplete


def return_latest_time(ds):
    """Decode HYCOM time: the raw values are hours since 2000-01-01 00:00:00."""
    date_start = pd.to_datetime("2000-01-01 00:00:00")
    return [date_start + timedelta(hours=float(h)) for h in ds.time.data]


# decode_times=False keeps the raw hour offsets; chunks= opens the data lazily with dask
data_global = xr.open_dataset(URL, decode_times=False, chunks={"time": 100})
# Subset the region and depth range before anything is downloaded
data_latest = data_global.sel(lat=slice(2, 42), lon=slice(104, 132), depth=slice(0, 1001))
date_time = pd.to_datetime(return_latest_time(data_latest))
data_latest["time"] = date_time  # replace the raw hours with real timestamps

# Run daily: fetch the most recent 00:00 and 12:00 steps, rewriting missing or incomplete files
for date in date_time[-58:]:
    if date.hour in (0, 12):
        path = os.path.join(OUT_DIR, "{}.nc".format(date))
        if not os.path.exists(path) or os.path.getsize(path) < MIN_SIZE:
            data_latest.sel(time=date).to_netcdf(path)
            print(date, "downloaded")
```
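The time conversion above handles one value at a time. A vectorized alternative is sketched below with `pd.to_timedelta`, under the same assumption that the raw values are hours since 2000-01-01 00:00:00; the helper names and the sample hour values are made up for illustration.

```python
import numpy as np
import pandas as pd

EPOCH = pd.Timestamp("2000-01-01 00:00:00")


def decode_hycom_hours(hours):
    """Convert 'hours since 2000-01-01 00:00:00' offsets to a DatetimeIndex."""
    return EPOCH + pd.to_timedelta(np.asarray(hours, dtype="float64"), unit="h")


def twice_daily(times):
    """Keep only the 00:00 and 12:00 time steps."""
    return times[(times.hour == 0) | (times.hour == 12)]


# Made-up offsets: 3-hourly steps during 2021-01-01 (184104 h after the epoch)
hours = np.arange(184104, 184104 + 24, 3)
times = decode_hycom_hours(hours)
print(twice_daily(times))
```

In the script above, `decode_hycom_hours(data_latest.time.data)` would replace the call to `return_latest_time`.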

Parallel download with dask:

```python
from datetime import timedelta

import pandas as pd
import xarray as xr

# chunks= opens the dataset lazily with dask
data_global = xr.open_dataset("http://tds.hycom.org/thredds/dodsC/GLBy0.08/expt_93.0",
                              decode_times=False, chunks={"time": 100})
# Select the sea area and depth range needed
data_latest = data_global.sel(lat=slice(2, 42), lon=slice(104, 132), depth=slice(0, 1001))

# Decode the time axis (hours since 2000-01-01 00:00:00) and write it back
date_start = pd.to_datetime("2000-01-01 00:00:00")
date_time = pd.to_datetime([date_start + timedelta(hours=float(h)) for h in data_latest.time.data])
data_latest["time"] = date_time

# Build matching lists of datasets and output paths for save_mfdataset
data_latest_now = []
data_latest_path = []
for date in date_time:
    if date.hour in (0, 12) and date.year > 2019:
        data_latest_now.append(data_latest.sel(time=date))
        data_latest_path.append("/data/hycom_2020_latest/{}.nc".format(date))

# One call writes all files, so dask can schedule the reads and writes together
xr.save_mfdataset(data_latest_now, data_latest_path)
```
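`save_mfdataset` writes every file in its list, including ones already on disk. A sketch of filtering the work first, reusing the missing-or-too-small check and the 96320000-byte threshold from the previous script; the helper name `pending_paths` is hypothetical.

```python
import os
from datetime import datetime


def pending_paths(dates, out_dir, min_size=96320000):
    """Return (date, path) pairs whose file is missing or smaller than min_size bytes."""
    pending = []
    for date in dates:
        path = os.path.join(out_dir, "{}.nc".format(date))
        if not os.path.exists(path) or os.path.getsize(path) < min_size:
            pending.append((date, path))
    return pending


dates = [datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 12)]
todo = pending_paths(dates, "/data/hycom_2020_latest")
# With the dataset from the script above:
# xr.save_mfdataset([data_latest.sel(time=d) for d, _ in todo], [p for _, p in todo])
```

The file names follow the same `str(date)` pattern as the script, so pandas Timestamps and `datetime` objects produce identical paths.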

4. FTP on the Linux command line

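In section 2, I recommended downloading with an FTP client. That gives you a graphical interface, but it has a drawback: often we don't want the data on our local machine but directly on a Linux server. What then?
In that case you run the FTP download from a remote Linux terminal, which is also simple; see https://linux.cn/article-6746-1.html for details.
For example, downloading Argo data over FTP: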

```
(base) msdc@msdc-virtual-machine:~/hycom_predict_temp_3D$ ftp data.argo.org.cn
Connected to data.argo.org.cn.
220 (vsFTPd 3.0.2)
Name (data.argo.org.cn:msdc): anonymous # "anonymous" means anonymous login
331 Please specify the password.
Password:
230-$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
230- Welcome to the FTP site of the China Argo Real-time Data Centre (CARDC).
230- The site is maintained by the Second Institute of Oceanography, Ministry
230- of Natural Resources.
230- CARDC website: http://www.argo.org.cn/
230-$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> ls
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
drwxr-xr-x 3 0 0 26 Nov 10 2019 pub
226 Directory send OK.
ftp> cd pub
250 Directory successfully changed.
ftp> ls
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
drwxr-xr-x 12 1000 1000 246 Apr 23 2020 ARGO
226 Directory send OK.
ftp> cd ARGO
250-$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
250- Welcome to the FTP site of the China Argo Real-time Data Centre (CARDC).
250- The site is maintained by the Second Institute of Oceanography, Ministry
250- of Natural Resources. All data contained on this site is produced by CARDC.
250- Users are permitted to download and make use of all the data.
250-$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
250 Directory successfully changed.
ftp> ls
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
drwxr-xr-x 2 1000 1000 131 Nov 10 2019 ArgoQuerySystem
drwxr-xr-x 6 1000 1000 158 Nov 10 2019 Argo_derived
drwxr-xr-x 5 1000 1000 58 Nov 10 2019 BOA_Argo
drwxr-xr-x 4 1000 1000 53 Nov 10 2019 G-argo
drwxr-xr-x 2 1000 1000 12288 Nov 10 2019 GDCSM
drwxr-xr-x 2 1000 1000 8192 Nov 10 2019 ROSWPOA
drwxr-xr-x 2 1000 1000 4096 Sep 08 08:14 argo-index
drwxrwxr-x 2 1000 1000 144 Apr 23 2020 etopo
drwxr-xr-x 13 1000 1000 4096 Oct 11 02:22 raw_argo_data
drwxr-xr-x 2 1000 1000 142 Nov 10 2019 surface_current
226 Directory send OK.
ftp> cd BOA_Argo
250 Directory successfully changed.
ftp> ls
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
drwxr-xr-x 2 1000 1000 8192 May 22 04:33 MAT
drwxr-xr-x 2 1000 1000 8192 May 22 04:33 NetCDF
drwxr-xr-x 2 1000 1000 171 Apr 30 08:27 doc
226 Directory send OK.
ftp> cd NetCDF
250 Directory successfully changed.
ftp> ls
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
-rw-r--r-- 1 1000 1000 54151116 Apr 06 2021 BOA_Argo_2004_01.nc
-rw-r--r-- 1 1000 1000 54151116 Apr 01 2021 BOA_Argo_2004_02.nc
-rw-r--r-- 1 1000 1000 54151116 Apr 01 2021 BOA_Argo_2004_03.nc
*********************
*********************
226 Directory send OK.
ftp> lcd ARGO
Local directory now /home/msdc/Downloads/ARGO
ftp> prompt off
Interactive mode off.
ftp> mget BOA_Argo_2*.nc
local: BOA_Argo_2004_01.nc remote: BOA_Argo_2004_01.nc
200 PORT command successful. Consider using PASV.
150 Opening BINARY mode data connection for BOA_Argo_2004_01.nc (54151116 bytes).
226 Transfer complete.
.......................
```