Compare commits

...

59 Commits
v1.0 ... master

Author SHA1 Message Date
ea31ad93cc chore(raw_to_rgba.py): 更新文件元信息、调整导入顺序与示例注释
移除未使用的sys导入语句,重新调整导入包的排版顺序,同时更新示例代码中的本地测试路径注释
2026-05-18 19:19:51 +08:00
5e96d0e026 refactor: 重构项目目录,添加全套遥感数据处理模块
- 重构项目目录结构,将零散代码整理到src目录下,创建各子包初始化文件
- 包括HLS、GPM、SMAP、MODIS等多源遥感数据的搜索、下载与预处理脚本
- 包括通用工具函数库,包含坐标参数定义、格式转换、影像处理等工具
- 同步更新README.md示例命令路径,修正脚本调用的目录错误
- 新增AGENTS.md文档说明开发环境配置
- 更新.gitignore规则,添加环境变量和构建缓存的忽略配置
2026-05-18 19:09:07 +08:00
231d6f1d72 fix(hls processing): 完善栅格错误处理并跳过异常数据块
为 open_hls 函数添加异常捕获逻辑,打开栅格失败时返回 None;在处理数据块时检查质量图层可用性,无法访问时跳过并记录警告;移除 common_utils.py 中冗余的 netrc 相关 GDAL 配置项;更新 HLS_PER.py 的最后更新日期
2026-05-18 17:23:58 +08:00
339755a42c refactor(HLS_SuPER): 修复HLS数据下载与处理代码存在的问题,并优化代码结构
- 统一调整所有文件的导入顺序,清理冗余导入项
- 修复format_roi函数中误用convex_hull获取顶点的问题,改用exterior.coords
- 修正dask并行任务中的参数传递错误
- 新增非交互运行模式支持,优化confirm_action函数
- 添加无搜索结果时提前退出的检查逻辑
- 优化日志配置,新增netrc认证支持
- 修复裸露的except语句,改用明确的Exception捕获
- 更新所有文件的最后更新日期
2026-05-18 14:06:06 +08:00
cc0791f1ba fix(DataV_SuPER): 允许省略city参数以获取省级行政区划数据
修改resolve_adcode_by_name和fetch_and_save_geojson_by_name的city参数为可选参数,添加默认值None。更新函数文档字符串与示例代码,同时更新文件最后更新日期。
2026-05-15 09:19:23 +08:00
7a63d373a3 build(setup): 新增项目所需的依赖包
向environment.yml添加Planetary Computer API运行所需的核心依赖dask-gateway、adlfs、stackstac、xarray-spatial,并同步更新详细配置文件至openearth.yml
2026-05-15 09:16:59 +08:00
79778dc577 fix(raw_to_cog): 清理导入顺序并移除未使用的参数
- 将导入语句按标准顺序重新排列(标准库、第三方库、本地模块)
- 移除 tif_to_cog 函数中未使用的 scaleParams 参数
- 更新示例代码中的测试路径和输出数据类型
- 更新文件最后修改日期
2026-04-24 18:09:16 +08:00
f608e3afab feat(DataV_SuPER): 重构为支持省-市-县三级行政区划解析
- 将单一城市名称参数改为省、市、县三级参数,提升解析准确性
- 实现层级查找策略:全国->省->市->县,并添加参数校验
- 更新输出文件命名规则,使用"省_市"或"省_市_县"格式
- 完善函数文档字符串和错误提示信息
2026-04-13 13:30:34 +08:00
2988ca0a53 chore: 更新开发环境配置与依赖项
- 更新 .gitignore 以忽略 powershell/ 目录和 .env* 文件
- 在 environment.yml 中调整依赖项顺序,添加 planetary-computer 和 rich,并固定 python 版本为 3.12
- 新增 .vscode 配置文件,包含推荐的扩展和编辑器设置,以统一开发环境
2026-04-13 12:07:05 +08:00
fdae8be777 chore: 扩展.gitignore以忽略更多常见文件类型
添加对办公文档、压缩文件、地理空间数据及临时文件的忽略规则,减少仓库中的无关文件。
2026-04-13 10:10:09 +08:00
c3ad04cd71 docs: 更新 README 文档结构并添加数据平台链接
- 将列表项标题格式化为三级标题以提升结构清晰度
- 为 NASA EARTHDATA、Microsoft Planetary Computer 和 Google Earth Engine 添加官方链接
- 新增 OpenTopography 和阿里云 DataV 两个公开数据平台说明
- 修正使用说明中的编号连续性
2026-04-13 10:03:54 +08:00
834c9d3563 chore: 重命名环境配置并更新依赖和文档
- 将环境配置从 `lpdaac.yml` 重命名为 `openearth.yml` 以反映更通用的工具集定位
- 在 `environment.yml` 中添加 `pystac-client` 依赖以支持更多数据源
- 更新 README.md,补充 Microsoft Planetary Computer 说明并修正环境名称引用
2026-04-13 09:56:22 +08:00
cf886a05b4 fix(raw_to_cog): 移除日志记录中的f-string冗余写法. 2026-02-27 20:01:26 +08:00
0aef8e010c feat(raw_to_cog): 改进 NoData 值处理逻辑并添加 Warp 转换支持. 2026-02-03 10:11:07 +08:00
d7cbf1e5f5 refactor(raw_to_cog): 简化主函数参数并更新示例调用. 2026-01-30 09:46:03 +08:00
c82b58e91d fix(raw_to_cog): 完善GDT_Byte类型无效值设置逻辑. 2026-01-28 19:49:42 +08:00
4dd7077d56 fix(raw_to_cog.py): 添加输入文件备用路径的二次检查. 2026-01-23 17:37:50 +08:00
566a75f6fc refactor(LST_download): 优化LST数据处理流程, 封装复用函数. 2026-01-16 15:27:17 +08:00
39a290c3d6 fix(Weather_download): 修正气象数据下载脚本的投影和分辨率设置. 2026-01-16 12:15:57 +08:00
ccfe4ef7bc refactor(Weather_download): 重构天气数据下载脚本并添加导出函数. 2026-01-16 08:59:20 +08:00
7464a47eac feat(GEE_Scripts): 添加气象数据下载脚本, 并更新LST导出文件名. 2026-01-15 21:51:32 +08:00
f1a88995db feat(LST_download): 添加MODIS LST数据支持并调整可视化范围. 2026-01-15 17:52:10 +08:00
b8a617ed8d feat(GEE_Scripts): 新增土地覆盖分类数据下载脚本. 2026-01-15 16:47:08 +08:00
8f22938c50 feat(GEE_Scripts): 添加Landsat地表温度数据下载脚本. 2026-01-15 16:46:08 +08:00
a846232e04 fix(raw_to_cog): 处理输入文件不存在时尝试替换文件名并支持自定义输出数据类型. 2026-01-08 12:02:04 +08:00
ae1c441e9a fix(raw_to_cog): 修正output_type参数类型并更新注释. 2026-01-08 10:47:38 +08:00
539b6a387b feat(utils): 重构传统GeoTIFF转COG格式的工具函数. 2026-01-08 10:33:59 +08:00
32d629aede feat(utils): 重构原始影像数据转换RGBA图像的工具函数. 2026-01-08 10:13:03 +08:00
7347cd60bb feat(S1andS2_download): 完善Sentinel-2 L2A数据缺失情况下的平替处理逻辑. 2026-01-07 23:01:29 +08:00
a4c81d0813 fix(sr2rgb): 添加转换前后对NoData值的处理. 2026-01-07 11:51:31 +08:00
eeed789d5b feat(sr2rgb): 重构地表反射率转RGB图像功能并添加多波段支持. 2026-01-06 20:59:32 +08:00
c8cdbff7b9 fix(GEE_Scripts): 完善导出图像命名规则. 2026-01-06 16:00:24 +08:00
70a7d1433b fix(GEE_Scripts): 添加HLS数据作为S2备用数据源并优化导出配置. 2026-01-06 13:42:52 +08:00
e750c067e9 fix(GEE_Scripts): 完善Sentinel-1和Sentinel-2数据融合下载脚本中的去云参数与可视化参数配置. 2026-01-05 21:32:21 +08:00
13a930fec5 feat(GEE_Scripts): 添加Sentinel-1和Sentinel-2数据融合下载脚本. 2026-01-05 20:41:38 +08:00
d52df03c2d feat(utils): 添加传统GeoTIFF转COG格式的功能. 2025-12-22 20:18:47 +08:00
7dee3d71e0 feat(GEE_Scripts): 添加Sentinel-1影像下载脚本. 2025-12-19 15:15:54 +08:00
11118915b3 feat(GEE_Scripts): 添加Sentinel-2影像下载脚本. 2025-12-09 12:44:26 +08:00
ff1dea2324 feat(sr2rgb): 完善影像转换处理逻辑. 2025-12-09 12:18:18 +08:00
fc5a29696a feat(sr2rgb): 添加SR地表反射率影像批量镶嵌并转换8bit RGB影像代码. 2025-12-07 17:42:30 +08:00
7b3573e57d fix: 完善注释说明与部分路径变量类型声明. 2025-10-20 20:29:26 +08:00
63c85931e3 feat(DEM_SuPER): 优化ALOSDEM文件处理逻辑, 直接读取并转储压缩包内部的DEM文件. 2025-10-20 10:37:22 +08:00
62b2370fd7 feat: 调整数据检索方式, 统一使用外包Polygon检索以减少所需处理的数据量. 2025-10-20 08:48:38 +08:00
c90432a815 feat(DataV_SuPER): 增强行政区划代码解析功能,支持县级市搜索. 2025-10-14 11:04:58 +08:00
2b007df006 feat(steup): 更新环境配置文件和依赖项版本. 2025-10-13 11:12:23 +08:00
7a28957087 feat(steup): 更新earthaccess包至0.15.1版本. 2025-10-13 10:46:10 +08:00
a3e4c7a172 feat(HLS_SuPER): 使用最小外包简化检索时的ROI范围. 2025-10-13 10:00:56 +08:00
116b964904 feat(DataV_SuPER): 添加从阿里云DataV获取行政区边界数据的功能. 2025-10-12 16:38:28 +08:00
fcbfa1bf08 fix(DEM_SuPER): 优化ALOSDEM文件检索匹配模式并优化裁剪逻辑. 2025-10-12 16:28:28 +08:00
418923f7df feat(DEM_SuPER): 添加对ALOS 12.5m DEM的本地解压支持. 2025-10-11 13:20:23 +08:00
b872e9e3f1 feat(DEM_SuPER): 添加对ALOS 12.5m DEM的下载支持. 2025-10-10 23:37:38 +08:00
64f50ffc0a feat: 优化项目路径处理和代码结构以及MODIS下载处理逻辑. 2025-09-12 10:41:56 +08:00
bda6f0a1ef refactor(sr2rgb): 使用xarray重构地表反射率合成RGB图像功能. 2025-09-05 21:07:51 +08:00
3d5d159d7f fix(HLS_SuPER): 确保多边形外环为逆时针方向并简化网络查询时提交的坐标. 2025-09-05 12:15:34 +08:00
d01ba65d64 feat(utils): 添加地表反射率转RGB图像功能. 2025-09-05 11:24:13 +08:00
c19f334035 refactor: 使用动态路径替换硬编码路径. 2025-09-04 14:47:42 +08:00
8883fd8d58 fix(setup): 补全Python3.12新环境下的selenium依赖. 2025-08-11 19:53:00 +08:00
1f86d2da2b feat(setup): 升级环境至Python3.12, 并更新环境配置文件及README文档. 2025-08-10 18:09:25 +08:00
e1997c102e fix(DEM_SuPER): 修复DEM数据下载拼接逻辑bug. 2025-08-05 17:21:53 +08:00
35 changed files with 3116 additions and 964 deletions

36
.gitignore vendored
View File

@ -1,11 +1,35 @@
.dodsrc
__pycache__/
powershell/
data/
*.pdf
*.tif
*.doc
*.docx
*.xls
*.xlsx
*.ppt
*.pptx
*.csv
*.tif*
*.tiff
*.ipynb
*.log
*.geojson
*.shp
*.shx
*.dbf
*.prj
*.cpg
*.qmd
*.sld
*.zip
*.7z
*.tar*
*.rar
.sisyphus/
# 环境变量
.env
.env.local
.env.*.local
!.env.example

12
.vscode/extensions.json vendored Normal file
View File

@ -0,0 +1,12 @@
{
// VS Code ()
"recommendations": [
"vscode-icons-team.vscode-icons", //
"streetsidesoftware.code-spell-checker", //
"esbenp.prettier-vscode", // Prettier
"dbaeumer.vscode-eslint", // ESLint
"ms-python.python", // Python
"charliermarsh.ruff", // Ruff
"detachhead.BasedPyright", // Python
]
}

25
.vscode/settings.json vendored Normal file
View File

@ -0,0 +1,25 @@
{
// Python Ruff
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.fixAll.ruff": "explicit",
"source.organizeImports.ruff": "explicit"
}
},
// -Ruff
"editor.formatOnSave": false,
// Ruff
"ruff.fixAll": true,
"ruff.organizeImports": true,
//
"ruff.lineLength": 88,
//
"ruff.exclude": [
"**/__pycache__/**",
"**/migrations/**",
"**/node_modules/**",
"**/.git/**"
]
}

1
AGENTS.md Normal file
View File

@ -0,0 +1 @@
1、本项目算法部分在开发阶段可以使用 conda 管理 Python 环境,本机环境名为 `openearth`Python 版本为 `3.12`

View File

@ -1,270 +0,0 @@
# -*- coding: utf-8 -*-
"""
===============================================================================
This module contains functions related to preprocessing DEM data.
For example, elevation, slope, aspect
Step1: Use earthaccess search and download NASADEM Data
- NASADEM_HGT
- includes 30m DEM, based on SRTM data
- https://lpdaac.usgs.gov/products/nasadem_hgtv001/
- NASADEM_SC
- includes 30m slope, aspect, based on NASADEM_HGT
- https://lpdaac.usgs.gov/products/nasadem_scv001/
Step2: Process DEM data
- 下载的 NASADEM 均为 *.zip 文件, 需先进行解压
- NASADEM 文件名称结构为: NASADEM_类型_网格编号/网格编号.数据类型
- 高程示例: NASADEM_HGT_n30e113/n30e113.hgt
- 坡度示例: NASADEM_SC_n30e113/n30e113.slope
- 坡向示例: NASADEM_SC_n30e113/n30e113.aspect
- 读取文件按网格进行裁剪并镶嵌, 坡度和坡向数据需要进行缩放处理, 将网格范围的结果保存为 *.tif 文件
-------------------------------------------------------------------------------
Authors: Hong Xie
Last Updated: 2025-07-03
===============================================================================
"""
import os
import sys
import glob
import json
import zipfile
import time
import dask.distributed
import logging
import earthaccess
import geopandas as gpd
import numpy as np
from rioxarray import open_rasterio
sys.path.append("D:/NASA_EarthData_Script")
from utils.common_utils import setup_dask_environment, clip_image, mosaic_images
from HLS_SuPER.HLS_Su import earthdata_search
def reorganize_nasadem_urls(dem_results_urls: list):
"""
重组 NASADEM 下载链接
将同一格网内的高程, 坡度坡向数据链接进行组合
Parameters
----------
dem_results_urls: list
查询返回的 NASADEM 数据 URL 列表
Returns
-------
grouped_results_urls: list
重组后的 NASADEM 数据 URL 列表
"""
tile_ids = []
for granule in dem_results_urls:
tile_id = granule[0].split("/")[-2].split("_")[-1]
tile_ids.append(tile_id)
tile_ids = np.array(tile_ids)
# 根据瓦片ID找到对应的索引
tile_id_indices = np.where(tile_ids == tile_id)
# 根据索引过滤结果
return [dem_results_urls[i] for i in tile_id_indices[0]]
def download_granule(granule_urls: list[str], output_dir: str) -> bool:
"""
下载单批数据
Parameters
----------
granule_urls: list
查询返回的规范化待下载数据 URL 列表
output_dir: str
下载目录
Returns
-------
download_state: bool
下载状态 True or False
"""
# 检查是否已下载
if not all(
os.path.isfile(os.path.join(output_dir, os.path.basename(url)))
for url in granule_urls
):
try:
earthaccess.download(granule_urls, output_dir)
except Exception as e:
logging.error(f"Error downloading data: {e}. Skipping.")
return False
logging.info("All Data already downloaded.")
return True
def unzip_nasadem_files(zip_file_list: list[str], unzip_dir: str):
"""
解压下载的 NASADEM ZIP 文件, 并将解压后的文件统一为可读写的 .hgt 格式
"""
try:
for zip_path in zip_file_list:
if not zipfile.is_zipfile(zip_path):
continue
with zipfile.ZipFile(zip_path, "r") as zip_ref:
# 仅解压包含 .hgt, .slope, .aspect 的文件
for hgt_file in [f for f in zip_ref.namelist() if f.endswith((".hgt", ".slope", ".aspect"))]:
# 解压时重命名文件
new_name = (
hgt_file.replace(".hgt", ".elevation.hgt")
if hgt_file.endswith(".hgt")
else f"{hgt_file}.hgt"
)
unzip_file_path = os.path.join(unzip_dir, new_name)
if os.path.exists(unzip_file_path):
continue
with zip_ref.open(hgt_file) as source_file:
with open(unzip_file_path, 'wb') as unzip_file:
unzip_file.write(source_file.read())
except Exception as e:
logging.error(f"Error unzipping NASADEM to {unzip_dir}: {e}")
return
def process_granule(
unzip_dir: str,
output_dir: str,
name: str,
roi: list,
clip=True,
tile_id: str = None,
) -> bool:
"""
读取解压并重命名处理后的指定类型 NASADEM 数据并进行预处理, 包括读取, 裁剪, 镶嵌, 并对坡度坡向进行缩放
Parameters
----------
unzip_dir: str
解压后的 NASADEM 文件根目录
output_dir: str
输出根目录
name: str
数据类型, 包括 elevation, slope, aspect
roi: list
网格范围
clip: bool
是否裁剪
tile_id: str
网格编号
Returns
-------
process_state: bool
处理状态 True or False
"""
dem_file_list = glob.glob(os.path.join(unzip_dir, f"*{name}.hgt"))
out_tif_name = f"DEM.NASADEM.{tile_id}.2000.{name}.tif"
output_file = os.path.join(output_dir, out_tif_name)
if not os.path.isfile(output_file):
try:
dem_raster_list = []
for dem_path in dem_file_list:
dem = (
open_rasterio(dem_path)
.squeeze(dim="band", drop=True)
.rename(name)
)
if name == "slope" or name == "aspect":
org_attrs = dem.attrs
dem = dem * 0.01
# 恢复源数据属性信息
dem.attrs = org_attrs.copy()
dem.rio.write_crs("EPSG:4326", inplace=True)
dem.attrs["scale_factor"] = 1
dem_raster_list.append(dem)
if len(dem_raster_list) > 1:
if name == "slope" or name == "aspect":
dem_mosaiced = mosaic_images(dem_raster_list, nodata=-9999)
else:
dem_mosaiced = mosaic_images(dem_raster_list, nodata=-32768)
if roi is not None and clip:
dem_mosaiced = clip_image(dem_mosaiced, roi)
dem_mosaiced.rio.to_raster(output_file, driver="COG", compress="DEFLATE")
except Exception as e:
logging.error(f"Error processing files in {name}: {e}")
return False
logging.info(f"Processed {output_file} successfully.")
else:
logging.warning(f"{output_file} already exists. Skipping.")
return True
def main(region: list, asset_name: list, tile_id: str):
bbox = tuple(list(region.total_bounds))
# 示例文件名称: NASADEM_HGT_n30e113.zip
results_urls = []
output_root_dir = ".\\data\\DEM\\NASADEM"
# 放置下载的 ZIP 文件
download_dir = os.path.join(output_root_dir, "ZIP")
# 放置解压并预处理后的文件
unzip_dir = os.path.join(download_dir, "UNZIP")
output_dir = os.path.join(output_root_dir, "TIF", tile_id)
os.makedirs(unzip_dir, exist_ok=True)
results_urls_file = f"{output_root_dir}\\NASADEM_{tile_id}_results_urls.json"
if not os.path.isfile(results_urls_file):
results_urls = earthdata_search(asset_name, roi=bbox)
with open(results_urls_file, "w") as f:
json.dump(results_urls, f)
else:
results_urls = json.load(open(results_urls_file))
# 构造待解压的文件列表
zip_file_list = [os.path.join(download_dir, os.path.basename(result[0])) for result in results_urls]
# 配置日志
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s:%(asctime)s ||| %(message)s",
handlers=[
logging.StreamHandler(sys.stdout),
logging.FileHandler(f"{output_root_dir}\\NASADEM_SuPER.log"),
],
)
logging.info(f"Found {len(results_urls)} NASADEM granules.")
client = dask.distributed.Client(timeout=60, memory_limit="8GB")
client.run(setup_dask_environment)
all_start_time = time.time()
client.scatter(results_urls)
logging.info(f"Start processing NASADEM ...")
download_tasks = [
dask.delayed(download_granule)(granule_url, download_dir)
for granule_url in results_urls
]
unzip_tasks = dask.delayed(unzip_nasadem_files)(zip_file_list, unzip_dir)
process_tasks = [
dask.delayed(process_granule)(
unzip_dir, output_dir, name, region, True, tile_id
)
for name in ["elevation", "slope", "aspect"]
]
dask.compute(*download_tasks)
dask.compute(unzip_tasks)
dask.compute(*process_tasks)
client.close()
all_total_time = time.time() - all_start_time
logging.info(
f"All NASADEM Downloading complete and proccessed. Total time: {all_total_time} seconds"
)
if __name__ == "__main__":
earthaccess.login(persist=True)
# region = gpd.read_file("./data/vectors/wuling_guanqu_polygon.geojson")
tile_id = "49REL"
region = gpd.read_file(f"./data/vectors/{tile_id}.geojson")
asset_name = ["NASADEM_HGT", "NASADEM_SC"]
main(region, asset_name, tile_id)

251
GEE_Scripts/LST_download.js Normal file
View File

@ -0,0 +1,251 @@
/**
* 地表温度 (LST) 数据下载 以年平均温度处理下载为例
*
* @author CVEO Team
* @date 2026-01-16
*
* 1. 加载并合并 Landsat-8, 9 30m LST 数据
* 2. 加载 MODIS MOD11A2 1km LST 数据
* 3. 合成年度平均 LST 数据
* 4. 导出 COG 云优化并填补缺失值的 GeoTIFF 影像 (大区域 GEE 自动分块下载)
*/
// 加载研究区域和影像
// var region_name = "应城市";
// var region_name_en = "Yingcheng";
// var region = Yingcheng;
// var crs = "EPSG:4526";
var region_name = "保康县";
var region_name_en = "Baokang";
var region = Baokang;
var crs = "EPSG:4525";
var target_res = 30;
var orign_res = 1000;
var start_year = 2021;
var end_year = 2025;
var start_date = start_year + "-01-01";
var end_date = end_year + "-12-31";
var cloud_threshold = 90; // 最大云量阈值
var LansatBands = ["ST_B10"];
var MODISBands = ["LST_Day_1km"];
var commonBands = ["LST"];
// 设置研究区域的缓冲区, 确保粗分辨率数据能够完整覆盖研究区域
var region_geo = region.geometry();
var bounds = ee.Feature(region_geo.bounds()).buffer(orign_res * 3).geometry();
var common_filter = ee.Filter.and(
ee.Filter.bounds(bounds),
ee.Filter.date(start_date, end_date)
);
/**
* Applies scaling factors.
* @param {ee.Image} image Landsat SR image
* @returns {ee.Image} Landsat SR image with scaled bands
*/
function applyScaleFactors(image) {
var opticalBands = image.select("SR_B.").multiply(0.0000275).add(-0.2);
var thermalBands = image.select("ST_B.*").multiply(0.00341802).add(149.0);
return image
.addBands(opticalBands, null, true)
.addBands(thermalBands, null, true);
}
/**
* Applies scaling factors for MODIS.
* @param {ee.Image} image MODIS LST image
* @returns {ee.Image} MODIS LST image with scaled bands
*/
function applyModisScaleFactors(image) {
return image.multiply(0.02)
.copyProperties(image, ["system:time_start", "system:index", "system:id"]);
}
/**
* 开尔文转摄氏度
* @param {ee.Image} image Landsat LST image
* @returns {ee.Image} Landsat LST image in Celsius
*/
function kelvinToCelsius(image) {
return ee.Image(image.expression("B1 - 273.15", { B1: image.select(0) }))
.rename("LST")
.copyProperties(image)
.copyProperties(image, ["system:time_start", "system:index", "system:id"]);
}
/**
* Function to mask clouds using the Landsat QA_PIXEL and QA_RADSAT bands
* @param {ee.Image} image Landsat SR image
* @return {ee.Image} cloud masked and saturated Landsat SR image
*/
function maskLandsatclouds(image) {
var qa = image.select("QA_PIXEL");
var qa_radsat = image.select("QA_RADSAT");
// Bits 3 and 4 are cloud and cloud shadow, respectively.
var cloudBitMask = 1 << 3;
var shadowBitMask = 1 << 4;
var snowBitMask = 1 << 5;
// Both flags should be set to zero, indicating clear conditions.
var mask = qa.bitwiseAnd(cloudBitMask).eq(0)
.and(qa.bitwiseAnd(shadowBitMask).eq(0))
.and(qa.bitwiseAnd(snowBitMask).eq(0))
.and(qa_radsat.eq(0));
image = image.updateMask(mask);
return image
.copyProperties(image)
.copyProperties(image, ["system:time_start", "system:index", "system:id"]);
}
/**
* 合成年度平均数据
* @param {ee.ImageCollection} imageCollection 数据集合
* @param {number} start_year 开始年份
* @param {number} end_year 结束年份
* @returns {ee.ImageCollection} 合成年度平均数据集合
*/
function compositeYearly(imageCollection, start_year, end_year) {
var years = ee.List.sequence(start_year, end_year);
var months = ee.List.sequence(1, 12);
var yearlyImgCol = ee.ImageCollection.fromImages(
years.map(function (y) {
return ee.ImageCollection.fromImages(
months.map(function (m) {
return imageCollection
.filter(ee.Filter.calendarRange(y, y, "year"))
.filter(ee.Filter.calendarRange(m, m, "month"))
.mean()
.clip(bounds)
.set("month", m)
.set("year", y);
})
).mean().set("year", y);
})
);
return yearlyImgCol;
}
// 加载 Landsat-8, Landsat-9 SR 数据
var L8dataset = ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUD_COVER", cloud_threshold));
print(start_date + " - " + end_date + " Landsat-8 SR dataset", L8dataset);
var L9dataset = ee.ImageCollection("LANDSAT/LC09/C02/T1_L2")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUD_COVER", cloud_threshold));
print(start_date + " - " + end_date + " Landsat-9 SR dataset", L9dataset);
// 加载 Landsat-8, Landsat-9 T2_L2 数据 (Tier 2)
var L8T2dataset = ee.ImageCollection("LANDSAT/LC08/C02/T2_L2")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUD_COVER", cloud_threshold));
print(start_date + " - " + end_date + " Landsat-8 T2_L2 SR dataset", L8T2dataset);
var L9T2dataset = ee.ImageCollection("LANDSAT/LC09/C02/T2_L2")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUD_COVER", cloud_threshold));
print(start_date + " - " + end_date + " Landsat-9 T2_L2 SR dataset", L9T2dataset);
// 合并 Landsat-8, Landsat-9 LST 数据 (包括 T1_L2 和 T2_L2)
var LSTdataset = L8dataset.merge(L9dataset)
.merge(L8T2dataset)
.merge(L9T2dataset)
.filter(ee.Filter.eq("PROCESSING_LEVEL", "L2SP"))
.map(applyScaleFactors)
.map(maskLandsatclouds)
.select(LansatBands, commonBands)
.map(kelvinToCelsius);
print(start_date + " - " + end_date + " Landsat-8,9 T1_L2 & T2_L2 LST dataset", LSTdataset);
// 加载 MODIS LST 数据
var MODISLSTdataset = ee.ImageCollection('MODIS/061/MOD11A2')
.filter(common_filter)
.select(MODISBands, commonBands)
.map(applyModisScaleFactors)
.map(kelvinToCelsius);
print(start_date + " - " + end_date + " MODIS LST dataset", MODISLSTdataset);
// 合成年度平均 Landsat LST 数据
var yearlyLST = compositeYearly(LSTdataset, start_year, end_year);
if (start_year == end_year) {
var year_str = start_year;
} else {
var year_str = start_year + "-" + end_year;
}
print(year_str + " Annual Mean Landsat LST dataset", yearlyLST);
// 合成年度平均 MODIS LST 数据
var yearlyMODISLST = compositeYearly(MODISLSTdataset, start_year, end_year);
print(year_str + " Annual Mean MODIS LST dataset", yearlyMODISLST);
// 合成处理后会丢失投影信息, 需要重新设置投影
var lst_proj = LSTdataset.first().select(0).projection();
var modis_proj = MODISLSTdataset.first().select(0).projection();
var total_mean_LST = LSTdataset.select("LST").mean().setDefaultProjection(lst_proj);
print(year_str + " Total Year Mean Landsat LST Histogram", ui.Chart.image.histogram(total_mean_LST, region, 100, 258));
var annual_mean_LST = yearlyLST.select("LST").mean().setDefaultProjection(lst_proj);
print(year_str + " Annual Mean Landsat LST Histogram", ui.Chart.image.histogram(annual_mean_LST, region, 100, 258));
var total_mean_MODIS_LST = MODISLSTdataset.select("LST").mean().setDefaultProjection(modis_proj);
print(year_str + " Total Year Mean MODIS LST Histogram", ui.Chart.image.histogram(total_mean_MODIS_LST, region, 1000, 258));
var annual_mean_MODIS_LST = yearlyMODISLST.select("LST").mean().setDefaultProjection(modis_proj);
print(year_str + " Annual Mean MODIS LST Histogram", ui.Chart.image.histogram(annual_mean_MODIS_LST, region, 1000, 258));
var annual_mean_MODIS_LST_30m = annual_mean_MODIS_LST.resample("bicubic").reproject({
crs: "EPSG:4326",
scale: target_res,
});
var lst_vis = {
min: 5,
max: 35,
palette: [
'040274', '040281', '0502a3', '0502b8', '0502ce', '0502e6',
'0602ff', '235cb1', '307ef3', '269db1', '30c8e2', '32d3ef',
'3be285', '3ff38f', '86e26f', '3ae237', 'b5e22e', 'd6e21f',
'fff705', 'ffd611', 'ffb613', 'ff8b13', 'ff6e08', 'ff500d',
'ff0000', 'de0101', 'c21301', 'a71001', '911003'
],
};
var styling = {
color: "blue",
fillColor: "00000000",
};
Map.centerObject(region, 10);
Map.addLayer(total_mean_MODIS_LST, lst_vis, year_str + " MODIS Total Year Mean LST");
Map.addLayer(annual_mean_MODIS_LST, lst_vis, year_str + " MODIS Annual Mean LST");
Map.addLayer(annual_mean_MODIS_LST_30m, lst_vis, year_str + " MODIS Annual Mean LST 30m");
Map.addLayer(total_mean_LST, lst_vis, year_str + " Landsat Total Year Mean LST");
Map.addLayer(annual_mean_LST, lst_vis, year_str + " Landsat Annual Mean LST");
Map.addLayer(region.style(styling), {}, region_name);
// 导出合并后的影像
// 明确设置数据类型为Float32, 否则默认类型为Float64, 会占用更多内存
// 并对缺失值进行填充, 否则默认为nan不便于后续本地处理
function exportCOG(image, description, folder, region, scale, crs) {
image = image.toFloat().unmask(-9999.0);
// 默认为 EPSG:4326 投影
crs = crs || "EPSG:4326";
Export.image.toDrive({
image: image,
description: description,
folder: folder,
region: region, // 添加后会自动裁剪
scale: scale,
crs: crs,
maxPixels: 1e13, // GEE 最多支持 1e8 像素, 当超过时会自动分块
fileFormat: "GeoTIFF",
// 导出COG云优化的GeoTIFF影像, 并明确设置缺失值为-9999.0
formatOptions: {
cloudOptimized: true,
noData: -9999.0,
},
});
}
print("Start exporting " + year_str + "Yearly Mean LST image (" + crs + ")", annual_mean_LST);
exportCOG(annual_mean_LST, region_name_en + "_Landsat_LST_" + year_str + "_30m", "LST", region, 30, crs);
print("Start exporting " + year_str + " MODIS Yearly Mean LST image (" + crs + ")", annual_mean_MODIS_LST);
exportCOG(annual_mean_MODIS_LST, region_name_en + "_MODIS_LST_" + year_str + "_1km", "LST", region, 1000, crs);

View File

@ -0,0 +1,267 @@
/**
* 土地覆盖分类数据下载
*
* @author CVEO Team
* @date 2025-10-27
*
* 1. 加载 ESA WorldCover, Dynamic World 等公开土地覆盖数据
* - ESA WorldCover 2021 V200 版本: 11 类地物
* - Dynamic World V1 版本: 9 类地物
* 2. 加载 Sentinel-2 数据
* 3. 导出 COG 云优化并填补缺失值的 GeoTIFF 影像 (大区域 GEE 自动分块下载)
*/
// 加载研究区域和影像
// var region_name = "武汉市";
// var region_name_en = "Wuhan";
// var region = allCites.filter(ee.Filter.eq("市", region_name));
var region_name = "49REL";
var region_name_en = region_name;
var region = grid;
var year = 2025;
var start_date = year + "-01-01";
var end_date = year + "-12-31";
var target_month = 5; // 指定月份
var cloud_threshold = 60;
// Use 'cs' or 'cs_cdf', depending on your use case; see docs for guidance.
var QA_BAND = "cs_cdf";
// The threshold for masking; values between 0.50 and 0.65 generally work well.
// Higher values will remove thin clouds, haze & cirrus shadows.
var CLEAR_THRESHOLD = 0.7;
// Sentinel-2的B8A波段(20m)可能更适合代表近红外波段
// var s2Bands = ["B2", "B3", "B4", "B8", "B11", "B12"];
var s2Bands = ["B2", "B3", "B4", "B8A", "B11", "B12"];
var commonBands = ["Blue", "Green", "Red", "NIR", "SWIR1", "SWIR2"];
var bounds = region.geometry().bounds();
var common_filter = ee.Filter.and(
ee.Filter.bounds(bounds),
ee.Filter.date(start_date, end_date)
);
/**
* 基于 Cloud Score+ 云掩膜
* @param {ee.Image} image Sentinel-2 image
* @return {ee.Image} cloud masked Sentinel-2 image
*/
function maskS2cloudsByCS(image) {
var cloudMask = image.select(QA_BAND).gte(CLEAR_THRESHOLD);
image = image.updateMask(cloudMask);
return image
.copyProperties(image)
.copyProperties(image, ["system:time_start", "system:index", "system:id"]);
}
/**
* 按照属性值对影像集合进行排序
* @param {ee.ImageCollection} collection 影像集合
* @param {ee.Geometry} region 目标区域
* @return {ee.ImageCollection} 排序后的影像集合
*/
function sortedByPriority(collection, region) {
// 1. 计算每张影像的有效像素 (非空像素) 覆盖面积
var withArea = collection.map(function (image) {
var area = image
.mask()
.reduceRegion({
reducer: ee.Reducer.sum(),
geometry: region,
scale: 10,
maxPixels: 1e9,
})
.values()
.get(0);
return image.set("coverage_area", area);
});
// 2. 剔除覆盖面积为 0 的影像
withArea = withArea.filter(ee.Filter.gt("coverage_area", 0));
// 3. 根据云量 (降序) 和覆盖率 (升序) 对影像进行综合排序
// 考虑影像集合是以栈的形式 (先入后出) 组织的, 在进行 mosaic 合成时, 从最后一张影像开始合并, 逆转排序
// sort 默认升序 True
var sorted = withArea
.sort("CLOUDY_PIXEL_PERCENTAGE", false)
.sort("coverage_area");
return ee.ImageCollection(sorted);
}
// 加载Sentinel-2 L2A数据
// Cloud Score+ image collection.
// Note Cloud Score+ is produced from Sentinel-2 level 1C data and can be applied to either L1C or L2A collections.
var S2csPlus = ee.ImageCollection("GOOGLE/CLOUD_SCORE_PLUS/V1/S2_HARMONIZED");
var S2dataset = ee
.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", cloud_threshold))
.select(s2Bands, commonBands)
.linkCollection(S2csPlus, [QA_BAND])
.map(maskS2cloudsByCS);
var target_s2 = sortedByPriority(
S2dataset.filter(
ee.Filter.calendarRange(target_month, target_month, "month")
),
region
);
var s2_img = ee.Image(target_s2.median());
// 加载ESA WorldCover数据
var ESA_worldcover = ee.ImageCollection("ESA/WorldCover/v200").first();
var wc_img = ee.Image(ESA_worldcover.clip(region));
var wc_class_names = {
"Tree cover": 10,
Shrubland: 20,
Grassland: 30,
Cropland: 40,
"Built-up": 50,
"Bare / sparse vegetation": 60,
"Snow and ice": 70,
"Permanent water bodies": 80,
"Herbaceous wetland": 90,
Mangroves: 95,
"Moss and lichen": 100,
};
// 统计各类别覆盖面积
var wc_class_area = wc_img
.reduceRegion({
reducer: ee.Reducer.frequencyHistogram(),
geometry: region,
scale: 10,
maxPixels: 1e13,
})
.get("Map");
print("ESA WorldCover 2021 v200");
print("各类别名称:", wc_class_names);
print("各类别覆盖面积:", wc_class_area);
// 加载Dynamic World数据
var DWdataset = ee
.ImageCollection("GOOGLE/DYNAMICWORLD/V1")
.filter(common_filter)
.select("label");
// mode() 取出现次数最多的类别作为合成后的类别
var dw_img = ee.Image(DWdataset.mode().clip(region));
var dw_class_names = {
water: 0,
trees: 1,
grass: 2,
flooded_vegetation: 3,
crops: 4,
shrub_and_scrub: 5,
built: 6,
bare: 7,
snow_and_ice: 8,
};
var built_img = dw_img.updateMask(dw_img.eq(dw_class_names.built));
var crop_img = dw_img.updateMask(dw_img.eq(dw_class_names.crops));
var flood_img = dw_img.updateMask(dw_img.eq(dw_class_names.flooded_vegetation));
// 统计各类别覆盖面积
var dw_class_area = dw_img
.reduceRegion({
reducer: ee.Reducer.frequencyHistogram(),
geometry: region,
scale: 10,
maxPixels: 1e13,
})
.get("label");
print("Dynamic World v1");
print("各类别名称:", dw_class_names);
print("各类别覆盖面积:", dw_class_area);
var VIS_PALETTE = [
"419bdf",
"397d49",
"88b053",
"7a87c6",
"e49635",
"dfc35a",
"c4281b",
"a59b8f",
"b39fe1",
];
var DW_vis = {
min: 0,
max: 8,
palette: VIS_PALETTE,
};
var true_rgb_vis = {
bands: ["Red", "Green", "Blue"],
min: 0.0,
max: 3000,
};
var false_rgb_vis = {
min: 0.0,
max: 4000,
gamma: 1.4,
bands: ["NIR", "Red", "Green"],
};
var styling = {
color: "red",
fillColor: "00000000",
};
var WC_vis = {
bands: ["Map"],
};
Map.centerObject(region, 9);
Map.addLayer(region.style(styling), {}, region_name);
Map.addLayer(wc_img, WC_vis, "ESA 2021 Landcover");
Map.addLayer(dw_img, DW_vis, "Dynamic World LULC");
Map.addLayer(
s2_img,
true_rgb_vis,
year + "-" + target_month + " Sentinel-2 True RGB",
false
);
Map.addLayer(
s2_img,
false_rgb_vis,
year + "-" + target_month + " Sentinel-2 False RGB",
false
);
// Map.addLayer(built_img, DW_vis, 'Built DW LULC');
// Map.addLayer(crop_img, DW_vis, "Crop DW LULC");
// Map.addLayer(flood_img, DW_vis, "Flood DW LULC");
// 导出裁剪处理后的土地覆盖数据
var fileName = region_name_en + "_ESA_WorldCover_2021_v200";
// 区域过大, 需要先将 wc_img 划分为四块网格分开下载
var bounds = region.geometry().bounds();
var coords = ee.List(bounds.coordinates().get(0));
var minX = ee.Number(ee.List(coords.get(0)).get(0));
var minY = ee.Number(ee.List(coords.get(0)).get(1));
var maxX = ee.Number(ee.List(coords.get(2)).get(0));
var maxY = ee.Number(ee.List(coords.get(2)).get(1));
var midX = minX.add(maxX).divide(2);
var midY = minY.add(maxY).divide(2);
var grid1 = ee.Geometry.Rectangle([minX, minY, midX, midY]);
var grid2 = ee.Geometry.Rectangle([midX, minY, maxX, midY]);
var grid3 = ee.Geometry.Rectangle([minX, midY, midX, maxY]);
var grid4 = ee.Geometry.Rectangle([midX, midY, maxX, maxY]);
var grids = [grid1, grid2, grid3, grid4];
for (var i = 0; i < grids.length; i++) {
var grid_region = grids[i];
var clipped_img = wc_img.clip(grid_region);
var grid_fileName = fileName + "_grid_" + (i + 1);
Export.image.toDrive({
image: clipped_img,
description: grid_fileName,
folder: "LULC",
region: grid_region,
scale: 10,
maxPixels: 1e13, // GEE 最多支持 1e8 像素, 当超过时会自动分块
fileFormat: "GeoTIFF",
// 导出COG云优化的GeoTIFF影像
formatOptions: {
cloudOptimized: true,
},
});
}

View File

@ -0,0 +1,289 @@
/**
* Sentinel-1 & Sentinel-2 哨兵一号与二号长时序数据下载 适用于小范围区域年度数据获取
*
* @author CVEO Team
* @date 2026-01-07
*
* 1. 加载 Sentinel-1 GRD, Sentinel-2 L2A SR数据与 Cloud Score+ 云掩膜, 以及 HLS 数据
* 2. 合成年度Sentinel-1, Sentinel-2, HLS无云影像
* 3. 使用 Sentinel-2 L1C TOA 影像与 HLS 影像作为 Sentinel-2 L2A SR 影像的补充
* 4. 合并 Sentinel-1 Sentinel-2 影像
* 5. 导出 COG 云优化并填补缺失值的 GeoTIFF 影像 (小区域不分块)
*
* : 截止至 2026-01-07, GEE 仍未集成 2015-2017 年的 Sentinel-2 L2A SR数据, 2018 年的 L2A 数据仍不完整.
*/
// 加载研究区域和影像
var name_list = [
"台北中山区大直要塞区",
"花莲县新城乡佳山基地",
"高雄左营区左营军港",
"高雄旗山区陆军第八军团指挥部",
];
var target_index = 1; // 从 0 开始计数, 0-3
var region_name = name_list[target_index];
var region = ee.FeatureCollection(
demoPoints
.filter(ee.Filter.eq("Name", region_name))
.geometry()
.buffer(3000)
.bounds()
);
var start_year = 2015;
var end_year = 2025;
var start_date = start_year + "-" + "01-01";
var end_date = start_year + "-" + "12-31";
var cloud_threshold = 100; // 最大云量阈值
// Use 'cs' or 'cs_cdf', depending on your use case; see docs for guidance.
var QA_BAND = "cs_cdf";
// The threshold for masking; values between 0.50 and 0.65 generally work well.
// Higher values will remove thin clouds, haze & cirrus shadows.
// 内陆建议设置为 0.8, 沿海或大江大湖旁建议设置为 0.6
var CLEAR_THRESHOLD = 0.8;
var L30Bands = ["B2", "B3", "B4", "B5", "B6", "B7"];
// Sentinel-2的B8A波段(20m)可能更适合代表近红外波段
// var s2Bands = ["B2", "B3", "B4", "B8", "B11", "B12"];
var s2Bands = ["B2", "B3", "B4", "B8A", "B11", "B12"];
var commonBands = ["Blue", "Green", "Red", "NIR", "SWIR1", "SWIR2"];
var region_geo = region.geometry();
var bounds = region_geo.bounds();
var common_filter = ee.Filter.and(
ee.Filter.bounds(bounds),
ee.Filter.date(start_date, end_date)
);
/**
* 基于 Cloud Score+ 云掩膜
* @param {ee.Image} image Sentinel-2 image
* @return {ee.Image} cloud masked Sentinel-2 image
*/
function maskS2cloudsByCS(image) {
var cloudMask = image.select(QA_BAND).gte(CLEAR_THRESHOLD);
image = image.updateMask(cloudMask);
return image
.divide(10000)
.copyProperties(image)
.copyProperties(image, ["system:time_start", "system:index", "system:id"]);
}
/**
* Function to mask clouds using the HLS Fmask band
* @param {ee.Image} image HLS image
* @return {ee.Image} cloud masked HLS image
*/
function maskHLSclouds(image) {
var qa = image.select("Fmask");
// Bits 1 and 2 are cloud and shadow, respectively.
var cloudBitMask = 1 << 1;
var shadowBitMask = 1 << 2;
// Both flags should be set to zero, indicating clear conditions.
var mask = qa
.bitwiseAnd(cloudBitMask)
.eq(0)
.and(qa.bitwiseAnd(shadowBitMask).eq(0));
image = image.updateMask(mask);
return image
.copyProperties(image)
.copyProperties(image, ["system:time_start", "system:index", "system:id"]);
}
/**
* 按照属性值对影像集合进行排序
* @param {ee.ImageCollection} collection 影像集合
* @param {ee.Geometry} region 目标区域
* @param {string} cloud_field 云量属性字段名, 默认为 "CLOUDY_PIXEL_PERCENTAGE"
* @return {ee.ImageCollection} 排序后的影像集合
*/
function sortedByPriority(collection, region, cloud_field) {
cloud_field = cloud_field || "CLOUDY_PIXEL_PERCENTAGE";
// 1. 计算每张影像的有效像素 (非空像素) 覆盖面积
var withArea = collection.map(function (image) {
var area = image
.mask()
.reduceRegion({
reducer: ee.Reducer.sum(),
geometry: region,
scale: 10,
maxPixels: 1e9,
})
.values()
.get(0);
return image.set("coverage_area", area);
});
// 2. 剔除覆盖面积为 0 的影像
withArea = withArea.filter(ee.Filter.gt("coverage_area", 0));
// 3. 根据云量 (降序) 和覆盖率 (升序) 对影像进行综合排序
// 考虑影像集合是以栈的形式 (先入后出) 组织的, 在进行 mosaic 合成时, 从最后一张影像开始合并, 逆转排序
// sort 默认升序 True
var sorted = withArea.sort(cloud_field, false).sort("coverage_area");
return ee.ImageCollection(sorted);
}
// 加载HLS L30和L30数据
var HLSL30 = ee
.ImageCollection("NASA/HLS/HLSL30/v002")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUD_COVERAGE", cloud_threshold))
.map(maskHLSclouds)
.select(L30Bands, commonBands);
var HLSS30 = ee
.ImageCollection("NASA/HLS/HLSS30/v002")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUD_COVERAGE", cloud_threshold))
.map(maskHLSclouds)
.select(s2Bands, commonBands);
var HLSdataset = ee.ImageCollection(HLSL30.merge(HLSS30));
print(start_date + " - " + end_date + " HLS dataset", HLSdataset);
var targetHLSCol = sortedByPriority(HLSdataset, region, "CLOUD_COVERAGE");
// Mosaic 处理后会丢失投影信息, 需重新设置投影
var proj = targetHLSCol.first().select(1).projection();
var hls_img = ee.Image(targetHLSCol.mosaic()).setDefaultProjection(proj);
var hls_10m_img = hls_img.resample("bicubic").reproject({
crs: proj,
scale: 10,
});
// 加载Sentinel-1 GRD数据
var S1dataset = ee
.ImageCollection("COPERNICUS/S1_GRD")
.filter(common_filter)
.filter(ee.Filter.eq("instrumentMode", "IW"));
var S1VVdataset = S1dataset.filter(
ee.Filter.listContains("transmitterReceiverPolarisation", "VV")
).select("VV");
var S1VHdataset = S1dataset.filter(
ee.Filter.listContains("transmitterReceiverPolarisation", "VH")
).select("VH");
print(start_date + " - " + end_date + " Sentinel-1 dataset", S1dataset);
var s1_vv_img = ee.Image(S1VVdataset.median());
var s1_vh_img = ee.Image(S1VHdataset.median());
// 加载Sentinel-2 L2A数据与Cloud Score+云掩膜
// Cloud Score+ image collection.
// Note Cloud Score+ is produced from Sentinel-2 level 1C data and can be applied to either L1C or L2A collections.
// 加载Sentinel-2 L2A数据
// Cloud Score+ image collection.
// Note Cloud Score+ is produced from Sentinel-2 level 1C data and can be applied to either L1C or L2A collections.
var S2csPlus = ee.ImageCollection("GOOGLE/CLOUD_SCORE_PLUS/V1/S2_HARMONIZED");
var S2TOAdataset = ee
.ImageCollection("COPERNICUS/S2_HARMONIZED")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", cloud_threshold))
.linkCollection(S2csPlus, [QA_BAND])
.map(maskS2cloudsByCS)
.select(s2Bands, commonBands);
var targetS2TOACol = sortedByPriority(S2TOAdataset, region);
var s2_toa_img = ee.Image(targetS2TOACol.mosaic());
// 加载SIAC工具集, 容易爆内存
// var SIAC = require("users/marcyinfeng/utils:SIAC");
// var s2_boa_img = SIAC.get_sur(s2_toa_img.clip(region));
print(start_date + " - " + end_date + " Sentinel-2 TOA dataset", S2TOAdataset);
var S2SRdataset = ee
.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", cloud_threshold))
.linkCollection(S2csPlus, [QA_BAND])
.map(maskS2cloudsByCS)
.select(s2Bands, commonBands);
var targetS2SRCol = sortedByPriority(S2SRdataset, region);
var s2_sr_img = ee.Image(targetS2SRCol.mosaic());
print(start_date + " - " + end_date + " Sentinel-2 SR dataset", S2SRdataset);
var s2_img = ee.Image();
// 2015-2018 年间若 Sentinel-2 SR 数据为空, 则使用 Sentinel-2 TOA 数据
// 若仍然没有, 则使用重采样到 10m 的 HLS 数据
if (S2SRdataset.size().getInfo() === 0 && S2TOAdataset.size().getInfo() !== 0) {
s2_img = s2_toa_img;
} else if (S2TOAdataset.size().getInfo() === 0 || start_year === 2015) {
s2_img = hls_10m_img;
} else {
s2_img = s2_sr_img;
}
// 针对试点区域的特殊设置
if (start_year === 2018 && (target_index === 0 || target_index === 3)) {
s2_img = s2_toa_img;
}
if (start_year === 2016 && target_index === 1) {
s2_img = hls_10m_img;
}
var s2_img_rgb = s2_img.select(["Red", "Green", "Blue"]);
var s2_img_frgb = s2_img.select(["NIR", "Red", "Green"]);
var s1_vis = {
min: -25,
max: 5,
};
var s2_vis = {
bands: ["Red", "Green", "Blue"],
min: 0.0,
max: 0.3,
};
var true_rgb_vis = {
bands: ["Red", "Green", "Blue"],
min: 0.0,
max: 0.2,
};
var false_rgb_vis = {
min: 0.0,
max: 0.4,
gamma: 1.4,
bands: ["NIR", "Red", "Green"],
};
var styling = {
color: "blue",
fillColor: "00000000",
};
Map.centerObject(region, 14);
Map.addLayer(s1_vv_img, s1_vis, start_year + " Sentinel-1 GRD VV", false);
Map.addLayer(s1_vh_img, s1_vis, start_year + " Sentinel-1 GRD VH", false);
Map.addLayer(hls_img, true_rgb_vis, start_year + " HLS True RGB 30m");
Map.addLayer(hls_10m_img, true_rgb_vis, start_year + " HLS True RGB 10m");
Map.addLayer(
s2_img_frgb,
false_rgb_vis,
start_year + " Sentinel-2 False RGB",
false
);
Map.addLayer(s2_img_rgb, s2_vis, start_year + " Sentinel-2 True RGB");
Map.addLayer(region.style(styling), {}, region_name);
// 导出合并后的影像
// 明确设置数据类型为Float32, 否则默认类型为Float64, 会占用更多内存
// 并对缺失值进行填充, 否则默认为nan不便于后续本地处理
var processed_img = s2_img
.addBands(s1_vv_img)
.addBands(s1_vh_img)
.toFloat()
.unmask(-9999.0);
print(
"Start exporting " + start_year + " Sentinel-2 and S1 GRD VV/VH merged image",
processed_img
);
target_index = target_index + 1;
var target_name = "Target" + target_index;
Export.image.toDrive({
image: processed_img,
description: "S1_S2_" + target_name + "_" + start_year,
folder: "Sentinel",
region: region, // 添加后会自动裁剪
scale: 10,
crs: "EPSG:4326",
maxPixels: 1e13, // GEE 最多支持 1e8 像素, 当超过时会自动分块
fileFormat: "GeoTIFF",
// 导出COG云优化的GeoTIFF影像, 并明确设置缺失值为-9999.0
formatOptions: {
cloudOptimized: true,
noData: -9999.0,
},
});

View File

@ -0,0 +1,74 @@
/**
* Sentinel-1 哨兵一号数据下载 以下载 2023-2024 年云南省边境影像为例
*
* @author CVEO Team
* @date 2025-12-18
*
* 1. 加载 Sentinel-1 数据
* 2. 导出COG云优化并填补缺失值的GeoTIFF影像分块
*/
// 加载研究区域和影像
// var region_name = "德宏傣族景颇族自治州"; // 瑞丽县所在
var region_name = "保山市"; // 德宏州东侧
var region_name_en = "baoshan";
var region = allCites.filter(ee.Filter.eq("市", region_name));
var yunnan = allCites.filter(ee.Filter.eq("省", "云南省"));
var year = 2023;
var start_date = year + "-04-09";
var end_date = year + "-04-11";
var region_geo = region.geometry();
var bounds = region_geo.bounds();
var common_filter = ee.Filter.and(
ee.Filter.bounds(bounds),
ee.Filter.date(start_date, end_date)
);
// 加载Sentinel-1 GRD数据
var S1dataset = ee
.ImageCollection("COPERNICUS/S1_GRD")
.filter(common_filter)
.filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
.filter(ee.Filter.eq('instrumentMode', 'IW'))
.select('VV');
print(start_date + " - " + end_date + " S1dataset", S1dataset);
var s1_img = ee.Image(S1dataset.median());
var s1_vis = {
min: -25,
max: 5
};
var styling1 = {
color: "blue",
fillColor: "00000000",
};
var styling2 = {
color: "red",
fillColor: "00000000",
};
Map.centerObject(region, 7);
Map.addLayer(s1_img, s1_vis, start_date + " Sentinel-1 GRD");
Map.addLayer(yunnan.style(styling2), {}, "云南省");
Map.addLayer(region.style(styling1), {}, region_name);
// 导出 Sentinel-1 影像
// 明确设置数据类型为Float32, 否则默认类型为Float64, 会占用更多内存
// 并对缺失值进行填充, 否则默认为nan不便于后续本地处理
var processed_img = S1dataset.first().toFloat().unmask(-9999.0);
Export.image.toDrive({
image: processed_img,
description: "S1A_IW_GRDH_1SDV_" + start_date,
folder: "Sentinel-1",
// region: region,
scale: 10,
maxPixels: 1e13, // GEE 最多支持 1e8 像素, 当超过时会自动分块
fileFormat: "GeoTIFF",
// 导出COG云优化的GeoTIFF影像, 并明确设置缺失值为-9999.0
formatOptions: {
cloudOptimized: true,
noData: -9999.0,
},
});

View File

@ -0,0 +1,129 @@
/**
* Sentinel-2 哨兵二号数据下载 以下载 2025 年湖北省无云RGB三波段影像为例
*
* GEE 最新代码: https://code.earthengine.google.com/15b61c853c281deb3ef8fc416224c405
*
* @author CVEO Team
* @date 2025-12-08
*
* 1. 加载 Sentinel-2 数据与 Cloud Score+ 云掩膜
* 2. 合成年度无云影像
* 3. 导出COG云优化并填补缺失值的GeoTIFF影像分块
*/
// 加载研究区域和影像
var region_name = "湖北省";
var region_name_en = "Hubei";
var region = allCites.filter(ee.Filter.eq("省", region_name));
var year = 2025;
var start_date = year + "-01-01";
var end_date = year + "-12-31";
var cloud_threshold = 5; // 5% 云量
// Use 'cs' or 'cs_cdf', depending on your use case; see docs for guidance.
var QA_BAND = "cs_cdf";
// The threshold for masking; values between 0.50 and 0.65 generally work well.
// Higher values will remove thin clouds, haze & cirrus shadows.
var CLEAR_THRESHOLD = 0.8;
// Sentinel-2的B8A波段(20m)可能更适合代表近红外波段
// var s2Bands = ["B2", "B3", "B4", "B8", "B11", "B12"];
var s2Bands = ["B2", "B3", "B4", "B8A", "B11", "B12"];
var commonBands = ["Blue", "Green", "Red", "NIR", "SWIR1", "SWIR2"];
var region_geo = region.geometry();
var bounds = region_geo.bounds();
var common_filter = ee.Filter.and(
ee.Filter.bounds(bounds),
ee.Filter.date(start_date, end_date)
);
/**
* 基于 Cloud Score+ 云掩膜
* @param {ee.Image} image Sentinel-2 image
* @return {ee.Image} cloud masked Sentinel-2 image
*/
function maskS2cloudsByCS(image) {
var cloudMask = image.select(QA_BAND).gte(CLEAR_THRESHOLD);
image = image.updateMask(cloudMask);
return image
.copyProperties(image)
.copyProperties(image, ["system:time_start", "system:index", "system:id"]);
}
// 加载Sentinel-2 L2A数据与Cloud Score+云掩膜
// Cloud Score+ image collection.
// Note Cloud Score+ is produced from Sentinel-2 level 1C data and can be applied to either L1C or L2A collections.
var S2csPlus = ee.ImageCollection("GOOGLE/CLOUD_SCORE_PLUS/V1/S2_HARMONIZED");
var S2dataset = ee
.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
.filter(common_filter)
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", cloud_threshold))
.select(s2Bands, commonBands)
.linkCollection(S2csPlus, [QA_BAND])
.map(maskS2cloudsByCS);
var s2_img = ee.Image(S2dataset.median());
var s2_img_rgb = s2_img.select(["Red", "Green", "Blue"]);
var s2_img_frgb = s2_img.select(["NIR", "Red", "Green"]);
var true_rgb_vis = {
bands: ["Red", "Green", "Blue"],
min: 0.0,
max: 3000,
};
var false_rgb_vis = {
min: 0.0,
max: 4000,
gamma: 1.4,
bands: ["NIR", "Red", "Green"],
};
var styling = {
color: "red",
fillColor: "00000000",
};
Map.centerObject(region, 7);
Map.addLayer(s2_img_frgb, false_rgb_vis, year + " Sentinel-2 False RGB", false);
Map.addLayer(s2_img_rgb, true_rgb_vis, year + " Sentinel-2 True RGB");
Map.addLayer(region.style(styling), {}, region_name);
// 导出裁剪处理后的Sentinel-2 RGB影像
var fileName = region_name_en + "_Sentinel-2_" + year;
// 区域过大, 需要先将 s2_img 划分为四块网格分开下载
var coords = ee.List(bounds.coordinates().get(0));
var minX = ee.Number(ee.List(coords.get(0)).get(0));
var minY = ee.Number(ee.List(coords.get(0)).get(1));
var maxX = ee.Number(ee.List(coords.get(2)).get(0));
var maxY = ee.Number(ee.List(coords.get(2)).get(1));
var midX = minX.add(maxX).divide(2);
var midY = minY.add(maxY).divide(2);
var grid1 = ee.Geometry.Rectangle([minX, minY, midX, midY]);
var grid2 = ee.Geometry.Rectangle([midX, minY, maxX, midY]);
var grid3 = ee.Geometry.Rectangle([minX, midY, midX, maxY]);
var grid4 = ee.Geometry.Rectangle([midX, midY, maxX, maxY]);
var grids = [grid1, grid2, grid3, grid4];
for (var i = 0; i < grids.length; i++) {
var grid_region = grids[i];
// 明确设置数据类型为Float32, 否则默认类型为Float64, 会占用更多内存
// 并对缺失值进行填充, 否则默认为nan不便于后续本地处理
var clipped_img = s2_img_rgb.clip(grid_region).toFloat().unmask(-9999.0);
var grid_fileName = fileName + "_grid_" + (i + 1);
Export.image.toDrive({
image: clipped_img,
description: grid_fileName,
folder: "Sentinel-2",
region: grid_region,
scale: 10,
maxPixels: 1e13, // GEE 最多支持 1e8 像素, 当超过时会自动分块
fileFormat: "GeoTIFF",
// 导出COG云优化的GeoTIFF影像, 并明确设置缺失值为-9999.0
formatOptions: {
cloudOptimized: true,
noData: -9999.0,
},
});
}

View File

@ -0,0 +1,171 @@
/**
* 气象数据下载 以年平均气温与年总降雨量处理下载为例
*
* @author CVEO Team
* @date 2026-01-16
*
* 1. 加载 ERA5-Land 数据
* 2. 合成年度平均气温数据, 年度总降雨量数据
* 3. 导出 COG 云优化并填补缺失值的 GeoTIFF 影像 (大区域 GEE 自动分块下载)
*/
// 加载研究区域和影像
// var region_name = "应城市";
// var region_name_en = "Yingcheng";
// var region = Yingcheng;
// var target_crs = "EPSG:4526";
var region_name = "保康县";
var region_name_en = "Baokang";
var region = Baokang;
var target_crs = "EPSG:4525";
var target_res = 30;
var orign_res = 10000;
var start_year = 2021;
var end_year = 2025;
var start_date = start_year + "-01-01";
var end_date = end_year + "-12-31";
var ERA5Bands = ["temperature_2m", "total_precipitation_sum"];
var commonBands = ["temperature", "total_precipitation_sum"];
// 设置研究区域的缓冲区, 确保粗分辨率数据能够完整覆盖研究区域
var region_geo = region.geometry();
var bounds = ee.Feature(region_geo.bounds()).buffer(orign_res * 3).geometry();
var common_filter = ee.Filter.and(
ee.Filter.bounds(bounds),
ee.Filter.date(start_date, end_date)
);
/**
* 开尔文转摄氏度
* @param {ee.Image} image Landsat LST image
* @returns {ee.Image} Landsat LST image in Celsius
*/
function kelvinToCelsius(image) {
return ee.Image(image.expression("B1 - 273.15", { B1: image.select("temperature") }))
.copyProperties(image)
.copyProperties(image, ["system:time_start", "system:index", "system:id"]);
}
// 加载 ERA5-Land 数据
var ERA5dataset = ee.ImageCollection("ECMWF/ERA5_LAND/MONTHLY_AGGR")
.filter(common_filter)
.select(ERA5Bands, commonBands);
print(start_date + " - " + end_date + " ERA5-Land dataset", ERA5dataset);
var ERA5Tempdataset = ERA5dataset.select("temperature").map(kelvinToCelsius);
var ERA5Raindataset = ERA5dataset.select("total_precipitation_sum");
// 合成年度平均 ERA5-Land 气温数据
var years = ee.List.sequence(start_year, end_year);
var yearlyTemp = ee.ImageCollection.fromImages(
years.map(function (y) {
return ERA5Tempdataset.filter(ee.Filter.calendarRange(y, y, "year"))
.mean()
.set("year", y);
})
);
var yearlyRain = ee.ImageCollection.fromImages(
years.map(function (y) {
return ERA5Raindataset.filter(ee.Filter.calendarRange(y, y, "year"))
.sum()
.set("year", y);
})
);
if (start_year == end_year) {
var year_str = start_year;
} else {
var year_str = start_year + "-" + end_year;
}
print(year_str + " Annual Mean ERA5-Land Temperature dataset", yearlyTemp);
print(year_str + " Annual Mean ERA5-Land Precipitation dataset", yearlyRain);
// 合成处理后会丢失投影信息, 需要重新设置投影
var proj = ERA5dataset.first().select(1).projection();
var annual_mean_temperature = ee.Image(yearlyTemp.mean()).setDefaultProjection(proj);
print(year_str + " Annual Mean ERA5-Land Temperature Histogram", ui.Chart.image.histogram(annual_mean_temperature, region, 10000, 258));
var annual_mean_precipitation = ee.Image(yearlyRain.mean()).setDefaultProjection(proj);
print(year_str + " Annual Mean ERA5-Land Precipitation Histogram", ui.Chart.image.histogram(annual_mean_precipitation, region, 10000, 258));
var annual_mean_temp_30m = annual_mean_temperature.clip(bounds).resample("bicubic").reproject({
crs: "EPSG:4326",
scale: target_res,
});
var annual_mean_precip_30m = annual_mean_precipitation.clip(bounds).resample("bicubic").reproject({
crs: "EPSG:4326",
scale: target_res,
});
var temperature_vis = {
min: -20,
max: 30,
palette: [
'040274', '040281', '0502a3', '0502b8', '0502ce', '0502e6',
'0602ff', '235cb1', '307ef3', '269db1', '30c8e2', '32d3ef',
'3be285', '3ff38f', '86e26f', '3ae237', 'b5e22e', 'd6e21f',
'fff705', 'ffd611', 'ffb613', 'ff8b13', 'ff6e08', 'ff500d',
'ff0000', 'de0101', 'c21301', 'a71001', '911003'
],
};
var precipitation_vis = {
min: 0,
max: 2,
palette: [
'000096', '0064ff', '00b4ff', '33db80', '9beb4a',
'ffeb00', 'ffb300', 'ff6400', 'eb1e00', 'af0000'
],
};
var styling = {
color: "blue",
fillColor: "00000000",
};
Map.centerObject(region, 9);
Map.addLayer(annual_mean_temperature, temperature_vis, year_str + " Annual Mean Temperature 0.1°");
Map.addLayer(annual_mean_precipitation, precipitation_vis, year_str + " Annual Mean Precipitation 0.1°");
Map.addLayer(annual_mean_temp_30m, temperature_vis, year_str + " Annual Mean Temperature " + target_res + "m");
Map.addLayer(annual_mean_precip_30m, precipitation_vis, year_str + " Annual Mean Precipitation " + target_res + "m");
Map.addLayer(region.style(styling), {}, region_name);
// 导出合并后的影像
// 明确设置数据类型为Float32, 否则默认类型为Float64, 会占用更多内存
// 并对缺失值进行填充, 否则默认为nan不便于后续本地处理
function exportCOG(image, description, folder, region, scale, crs) {
image = image.toFloat().unmask(-9999.0);
// 默认为 EPSG:4326 投影
crs = crs || "EPSG:4326";
Export.image.toDrive({
image: image,
description: description,
folder: folder,
region: region, // 添加后会自动裁剪
scale: scale,
crs: crs,
maxPixels: 1e13, // GEE 最多支持 1e8 像素, 当超过时会自动分块
fileFormat: "GeoTIFF",
// 导出COG云优化的GeoTIFF影像, 并明确设置缺失值为-9999.0
formatOptions: {
cloudOptimized: true,
noData: -9999.0,
},
});
}
print("Start exporting " + year_str + " Yearly Mean ERA5-Land Temperature (" + target_crs + " " + target_res + "m)", annual_mean_temp_30m);
exportCOG(
annual_mean_temp_30m,
region_name_en + "_ERA5_Temperature_" + year_str + "_" + target_res + "m",
"Temperature",
region,
target_res,
target_crs
);
print("Start exporting " + year_str + " Yearly Mean ERA5-Land Precipitation (" + target_crs + " " + target_res + "m)", annual_mean_precip_30m);
exportCOG(
annual_mean_precip_30m,
region_name_en + "_ERA5_Precipitation_" + year_str + "_" + target_res + "m",
"Precipitation",
region,
target_res,
target_crs
);

107
README.md
View File

@ -1,33 +1,47 @@
# NASA EARTHDATA 数据自动化爬取与预处理
# 公开RS/GIS数据自动化下载与预处理工具
## 仓库说明
1) EARTHDATA 说明
- NASA 计划于 2026 年底将公开数据全部集成进 EARTHDATA 中
- HLS 数据集NASA 多部门将Landsat8/9与Sentinel-2A/B统一协调至30m划分为
- L30Landsat8/9包含其独有的2个热红外波段并已处理为表面亮温℃
- S30Sentinel-2A/B包含其独有的4个红边波段
- 对于核心六波段Blue/Green/Red/NIR/SWIR1/SWIR2可直接合并使用合并后湖北地区单格网观测频率约为3-4d
2) 使用说明
- 本仓库基于官方原始脚本修复了实际使用中存在的bug并添加了更多可控制参数
- 爬取时不要使用魔法工具
- 实测单格网全年影像爬取+预处理约1h
- 单用户每秒限制100次请求
### 1. [NASA EARTHDATA](https://search.earthdata.nasa.gov/search/) 说明
- NASA 计划于2026年底将公开数据全部集成进 EARTHDATA 中
- HLS 数据集NASA多部门将 Landsat8/9 与 Sentinel-2A/B 统一协调至 30m划分为
- L30Landsat8/9包含其独有的 2 个热红外波段,并已处理为表面亮温 ℃)
- S30Sentinel-2A/B包含其独有的 4 个红边波段)
- 对于核心六波段Blue/Green/Red/NIR/SWIR1/SWIR2可直接合并使用合并后湖北地区单格网观测频率约为3-4d
### 2. [Microsoft Planetary Computer](https://planetarycomputer.microsoft.com/) 说明
- 微软行星计算机是一个平台,它使用户能够利用云计算的力量来加速环境可持续性和地球科学的发展。
- 行星计算机将多 PB 的全球环境数据目录与直观的 API、灵活的科学环境相结合使用户能够回答有关这些数据的全球性问题并将这些答案提供给保护利益相关者的应用程序。
### 3. [Google Earth Engine](https://earthengine.google.com/) 说明
### 4. [OpenTopography](https://portal.opentopography.org/) 说明
### 5. 公开矢量数据平台
1. [阿里云DataV](https://datav.aliyun.com/portal/school/atlas/area_selector) 说明
### 6. 使用说明
- 针对HLS数据获取基于NASA EARTHDATA官方原始脚本修复了实际使用中的bug并添加了更多控制参数
- 使用时不要使用魔法工具
- 实测单格网全年HLS影像爬取+预处理约1h
## 1 安装基础环境
### 1.1 miniforge
### 1.1 miniforge 介绍
- miniforge是结合conda与mamba的最小化包比Anaconda和Miniconda更快更轻量并且配置命令与原conda基本一致支持直接使用mamba命令。
- 简而言之环境配置效率上miniforge > Mambaforge (202407已废弃) > Miniconda + Mamba > Miniconda > Anaconda
- miniforge 是结合 conda mamba 的最小化包,比 Anaconda Miniconda 更快更轻量,并且配置命令与原 conda 基本一致,支持直接使用 mamba 命令。
- 简而言之环境配置效率上miniforge > Mambaforge (202407 已废弃) > Miniconda + Mamba > Miniconda > Anaconda
- 官方仓库地址https://github.com/conda-forge/miniforge
- 官方下载地址https://conda-forge.org/download/
### 1.2 配置环境变量
- 为了在控制台中直接使用conda命令需要将安装的相关目录配置到Path环境变量中。
- 为了在控制台中直接使用 conda 命令,需要将安装的相关目录配置到 Path 环境变量中。
```
D:\program\miniforge3
@ -37,27 +51,27 @@ D:\program\miniforge3\Library\bin
### 1.3 配置权限
- 详细配置与miniconda相同图文教程地址https://gis-xh.github.io/my-note/python/01conda/Win11-Miniconda-install/
- Windows环境下需要设置虚拟环境文件夹的访问权限为所有用户可访问否则会出现无法读取虚拟环境文件的问题
- 详细配置与 miniconda 相同图文教程地址https://gis-xh.github.io/my-note/python/01conda/Win11-Miniconda-install/
- Windows 环境下,需要设置虚拟环境文件夹的访问权限为所有用户可访问,否则会出现无法读取虚拟环境文件的问题
- 具体地:
- 设置`D:\program\miniforge3\env`目录为所有用户可访问,具体操作为:右键点击文件夹 -> 属性 -> 安全 -> 编辑 -> 添加 -> 添加所有用户 -> 全选 -> 应用 -> 确定
### 1.4 配置镜像源
- 生成下载源文件的配置文件 (若已经安装过Anaconda/Miniconda则无需执行此步骤)
- 生成下载源文件的配置文件 (若已经安装过 Anaconda/Miniconda则无需执行此步骤)
```sh
conda config --set show_channel_urls yes
```
- 在`C:\Users\实际用户名\`目录找到`.condarc`文件,使用记事本打开,输入如下内容并保存
- 在 `C:\Users\实际用户名\` 目录找到 `.condarc` 文件,使用记事本打开,输入如下内容并保存
```
envs_dirs:
- D:\program\miniforge3\envs
- 其他路径地址(可选,创建虚拟环境时将会按照顺序查找)
channels:
- defaults
- conda-forge
show_channel_urls: true
channel_alias: https://mirrors.tuna.tsinghua.edu.cn/anaconda
default_channels:
@ -75,6 +89,12 @@ custom_channels:
simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
```
- 清理镜像缓存
```sh
conda clean -i
```
### 1.5 初始化 conda
- 打开控制台,初始化 PowerShell 与 CMD
@ -89,18 +109,30 @@ conda init cmd.exe
## 2 运行环境配置
### 2.1 使用mamba创建并激活虚拟环境
### 2.1 使用 mamba 创建并激活虚拟环境
- 克隆虚拟环境 (Windows环境下推荐)
- 克隆虚拟环境 (完整复刻运行环境所有依赖)
```sh
mamba env create -f setup/lpdaac_windows.yml
mamba env create -f setup/openearth.yml
```
- 克隆虚拟环境 (复刻主要依赖环境-部分依赖可能会更新)
```sh
mamba env create -f setup/environment.yml
```
- 激活虚拟环境
```sh
mamba activate lpdaac_windows
mamba activate openearth
```
- 导出当前虚拟环境
```sh
mamba env export > setup/openearth.yml
```
## 3 设计思路
@ -146,11 +178,10 @@ data/
### 3.2 NASA Earthdata 账户准备
- 参考自NASA官网示例Demohttps://github.com/nasa/LPDAAC-Data-Resources/blob/main/setup/setup_instructions_python.md
- 参考自 NASA 官网示例 Demohttps://github.com/nasa/LPDAAC-Data-Resources/blob/main/setup/setup_instructions_python.md
- 首次运行爬取命令时,需要输入用户名和密码,用户名和密码可以在 [Earthdata](https://urs.earthdata.nasa.gov/) 注册获取。
- 需要注意的是,密码中最好不要出 `@/#/$/%` 等符号,爬取时可能会出错。
- 单个用户每秒限制最多100次请求参考自https://forum.earthdata.nasa.gov/viewtopic.php?t=3734
- 单个用户每秒限制最多 100 次请求参考自https://forum.earthdata.nasa.gov/viewtopic.php?t=3734
## 4 使用示例
@ -160,7 +191,7 @@ data/
- `-roi`:感兴趣区,需要按照 **左下右上** 的逆时针顺序设置点坐标,同时还支持 `shp``geojson/json` 格式文件
- `-clip`:是否对影像进行裁剪,默认 `False`
- `-tile`HLS影像瓦片ID例如 `T49RGQ`
- `-tile`HLS 影像瓦片 ID例如 `T49RGQ`
- `-dir`:输出目录,必须是已存在的目录
- `-start`:开始时间,格式为 `YYYY-MM-DD`
- `-end`:结束时间,格式为 `YYYY-MM-DD`
@ -172,30 +203,30 @@ data/
#### 4.1.2 爬取云端数据并在内存中进行预处理示例
- 爬取 L30 与 S30 的核心光谱波段仅按照感兴趣区瓦片ID起止时间以及产品名称筛选影像不进行云量筛选影像对影像进行去云掩膜处理
- 爬取 L30 与 S30 的核心光谱波段:仅按照感兴趣区,瓦片 ID起止时间以及产品名称筛选影像不进行云量筛选影像对影像进行去云掩膜处理
```sh
python .\\HLS_SuPER\\HLS_SuPER.py -roi '112.9834,30.5286,114.32373,31.64448' -tile T49RGQ -dir .\\data\\HLS\\ALL -start 2024-01-01 -end 2024-01-31 -prod both -bands BLUE,GREEN,RED,NIR1,SWIR1,SWIR2,Fmask -scale True
python .\\src\\HLS_SuPER\\HLS_SuPER.py -roi '112.9834,30.5286,114.32373,31.64448' -tile T49RGQ -dir .\\data\\HLS\\ALL -start 2024-01-01 -end 2024-01-10 -prod both -bands BLUE,GREEN,RED,NIR1,SWIR1,SWIR2,Fmask -scale True
```
- 爬取 L30 的所有波段按照感兴趣区瓦片ID起止时间以及产品名称筛选影像过滤云量小于70% 的影像,对影像进行去云掩膜处理
- 爬取 L30 的所有波段:按照感兴趣区,瓦片 ID起止时间以及产品名称筛选影像过滤云量小于 70% 的影像,对影像进行去云掩膜处理
```sh
python .\\HLS_SuPER\\HLS_SuPER.py -roi '112.9834,30.5286,114.32373,31.64448' -tile T49RGQ -dir .\\data\\HLS\\L30\\subset -start 2024-01-01 -end 2024-01-31 -prod HLSL30 -bands COASTAL-AEROSOL,BLUE,GREEN,RED,NIR1,SWIR1,SWIR2,CIRRUS,TIR1,TIR2,Fmask -cc 70 -scale True
python .\\src\\HLS_SuPER\\HLS_SuPER.py -roi '112.9834,30.5286,114.32373,31.64448' -tile T49RGQ -dir .\\data\\HLS\\L30\\subset -start 2024-01-01 -end 2024-01-31 -prod HLSL30 -bands COASTAL-AEROSOL,BLUE,GREEN,RED,NIR1,SWIR1,SWIR2,CIRRUS,TIR1,TIR2,Fmask -cc 70 -scale True
```
- 仅爬取 L30 的热红外波段仅按照感兴趣区瓦片ID起止时间以及产品名称筛选影像不进行云量筛选影像对影像进行去云掩膜处理
- 仅爬取 L30 的热红外波段:仅按照感兴趣区,瓦片 ID起止时间以及产品名称筛选影像不进行云量筛选影像对影像进行去云掩膜处理
```sh
python .\\HLS_SuPER\\HLS_SuPER.py -roi .\\data\\vectors\\49REL.geojson -tile T49REL -dir .\\data\\HLS\\2024 -start 2024-06-01 -end 2024-09-01 -prod HLSL30 -bands TIR1,TIR2
python .\\src\\HLS_SuPER\\HLS_SuPER.py -roi .\\data\\vectors\\49REL.geojson -tile T49REL -dir .\\data\\HLS\\2024 -start 2024-06-01 -end 2024-09-01 -prod HLSL30 -bands TIR1,TIR2
```
- 【测试用】根据给定的范围文件 `*.geojson`,不进行云量筛选,直接爬取数据
```sh
python .\\HLS_SuPER\\HLS_SuPER.py -roi .\\data\\vectors\\49REL.geojson -tile T49REL -dir .\\data\\HLS\\2024 -start 2024-03-01 -end 2024-11-01 -prod both -bands BLUE,GREEN,RED,NIR1,SWIR1,SWIR2,Fmask
python .\\src\\HLS_SuPER\\HLS_SuPER.py -roi .\\data\\vectors\\49REL.geojson -tile T49REL -dir .\\data\\HLS\\2024 -start 2024-03-01 -end 2024-11-01 -prod both -bands BLUE,GREEN,RED,NIR1,SWIR1,SWIR2,Fmask
```
### 4.2 其他数据
v1.0: 直接运行`DATA_SuPER/`目录下所需数据对应的*.py文件即可.
v1.0: 直接运行 `src/DATA_SuPER/` 目录下所需数据对应的 `*.py` 文件即可.

37
setup/environment.yml Normal file
View File

@ -0,0 +1,37 @@
name: openearth
channels:
- conda-forge
dependencies:
- python=3.12
- dask
- dask-gateway
- earthaccess
- fiona
- gdal
- geopandas
- geoviews
- h5netcdf
- h5py
- harmony-py
- hvplot
- jupyter
- jupyter_bokeh
- jupyterlab
- libgdal-hdf4
- odc-stac
- pyresample
- pystac-client
- planetary-computer
- adlfs
- stackstac
- rasterio
- ray-default
- xarray-spatial
- rioxarray
- scikit-image
- seaborn
- spectral
- rich
- selenium
- firefox
- geckodriver

View File

@ -1,426 +0,0 @@
name: lpdaac_windows
channels:
- defaults
- conda-forge
dependencies:
- affine=2.4.0=pyhd8ed1ab_0
- aiobotocore=2.7.0=pyhd8ed1ab_1
- aiofiles=22.1.0=pyhd8ed1ab_0
- aiohttp=3.8.6=py310h8d17308_1
- aiohttp-cors=0.7.0=py_0
- aioitertools=0.11.0=pyhd8ed1ab_0
- aiosignal=1.3.1=pyhd8ed1ab_0
- aiosqlite=0.19.0=pyhd8ed1ab_0
- ansicon=1.89.0=py310h5588dad_7
- anyio=4.0.0=pyhd8ed1ab_0
- argon2-cffi=23.1.0=pyhd8ed1ab_0
- argon2-cffi-bindings=21.2.0=py310h8d17308_4
- arrow=1.3.0=pyhd8ed1ab_0
- asciitree=0.3.3=py_2
- asttokens=2.4.1=pyhd8ed1ab_0
- async-timeout=4.0.3=pyhd8ed1ab_0
- aws-c-auth=0.7.3=h0127223_1
- aws-c-cal=0.6.1=hfb91821_1
- aws-c-common=0.9.0=hcfcfb64_0
- aws-c-compression=0.2.17=h04c9df6_2
- aws-c-event-stream=0.3.1=h495bb32_4
- aws-c-http=0.7.11=hf013885_4
- aws-c-io=0.13.32=he824701_1
- aws-c-mqtt=0.9.3=h64f41f2_1
- aws-c-s3=0.3.14=hb8b96c7_1
- aws-c-sdkutils=0.1.12=h04c9df6_1
- aws-checksums=0.1.17=h04c9df6_1
- aws-crt-cpp=0.21.0=hf1ed06d_5
- aws-sdk-cpp=1.10.57=heb7cc7f_19
- babel=2.13.1=pyhd8ed1ab_0
- backports=1.0=pyhd8ed1ab_3
- backports.functools_lru_cache=1.6.5=pyhd8ed1ab_0
- beautifulsoup4=4.12.2=pyha770c72_0
- bleach=6.1.0=pyhd8ed1ab_0
- blessed=1.19.1=pyh95a074a_2
- blosc=1.21.5=hdccc3a2_0
- bokeh=3.3.0=pyhd8ed1ab_0
- boto3=1.28.64=pyhd8ed1ab_0
- botocore=1.31.64=pyhd8ed1ab_0
- bounded-pool-executor=0.0.3=pyhd8ed1ab_0
- branca=0.7.0=pyhd8ed1ab_1
- brotli=1.0.9=hcfcfb64_9
- brotli-bin=1.0.9=hcfcfb64_9
- brotli-python=1.0.9=py310h00ffb61_9
- bzip2=1.0.8=hcfcfb64_5
- c-ares=1.21.0=hcfcfb64_0
- ca-certificates=2025.1.31=h56e8100_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- cachetools=5.3.2=pyhd8ed1ab_0
- cairo=1.18.0=h1fef639_0
- cartopy=0.22.0=py310hecd3228_1
- certifi=2025.1.31=pyhd8ed1ab_0
- cffi=1.16.0=py310h8d17308_0
- cfitsio=4.3.0=h9b0cee5_0
- cftime=1.6.3=py310h3e78b6c_0
- charset-normalizer=3.3.2=pyhd8ed1ab_0
- click=8.1.7=win_pyh7428d3b_0
- click-plugins=1.1.1=py_0
- cligj=0.7.2=pyhd8ed1ab_1
- cloudpickle=3.0.0=pyhd8ed1ab_0
- colorama=0.4.6=pyhd8ed1ab_0
- colorcet=3.0.1=pyhd8ed1ab_0
- colorful=0.5.4=pyhd8ed1ab_0
- comm=0.1.4=pyhd8ed1ab_0
- configobj=5.0.8=pyhd8ed1ab_0
- contourpy=1.2.0=py310h232114e_0
- cryptography=41.0.5=py310h6e82f81_0
- cycler=0.12.1=pyhd8ed1ab_0
- cytoolz=0.12.2=py310h8d17308_1
- dask=2023.10.1=pyhd8ed1ab_0
- dask-core=2023.10.1=pyhd8ed1ab_0
- datashader=0.16.0=pyhd8ed1ab_0
- debugpy=1.8.0=py310h00ffb61_1
- decorator=5.1.1=pyhd8ed1ab_0
- defusedxml=0.7.1=pyhd8ed1ab_0
- distlib=0.3.7=pyhd8ed1ab_0
- distributed=2023.10.1=pyhd8ed1ab_0
- entrypoints=0.4=pyhd8ed1ab_0
- exceptiongroup=1.1.3=pyhd8ed1ab_0
- executing=2.0.1=pyhd8ed1ab_0
- expat=2.5.0=h63175ca_1
- fasteners=0.17.3=pyhd8ed1ab_0
- filelock=3.13.1=pyhd8ed1ab_0
- fiona=1.9.5=py310h65cc672_0
- folium=0.15.0=pyhd8ed1ab_0
- font-ttf-dejavu-sans-mono=2.37=hab24e00_0
- font-ttf-inconsolata=3.000=h77eed37_0
- font-ttf-source-code-pro=2.038=h77eed37_0
- font-ttf-ubuntu=0.83=hab24e00_0
- fontconfig=2.14.2=hbde0cde_0
- fonts-conda-ecosystem=1=0
- fonts-conda-forge=1=0
- fonttools=4.44.0=py310h8d17308_0
- fqdn=1.5.1=pyhd8ed1ab_0
- freetype=2.12.1=hdaf720e_2
- freexl=2.0.0=h8276f4a_0
- frozenlist=1.4.0=py310h8d17308_1
- fsspec=2023.10.0=pyhca7485f_0
- gdal=3.7.3=py310haa9213b_2
- geopandas=0.14.0=pyhd8ed1ab_1
- geopandas-base=0.14.0=pyha770c72_1
- geos=3.12.0=h1537add_0
- geotiff=1.7.1=hcf4a93f_14
- geoviews=1.11.0=pyhd8ed1ab_0
- geoviews-core=1.11.0=pyha770c72_0
- gettext=0.21.1=h5728263_0
- gitdb=4.0.11=pyhd8ed1ab_0
- gitpython=3.1.40=pyhd8ed1ab_0
- google-api-core=2.13.0=pyhd8ed1ab_0
- google-auth=2.23.4=pyhca7485f_0
- googleapis-common-protos=1.61.0=pyhd8ed1ab_0
- gpustat=1.1.1=pyhd8ed1ab_0
- grpcio=1.54.3=py310h8020be6_0
- h5netcdf=1.3.0=pyhd8ed1ab_0
- h5py=3.10.0=nompi_py310h20f5850_100
- hdf4=4.2.15=h5557f11_7
- hdf5=1.14.2=nompi_h73e8ff5_100
- holoviews=1.18.1=pyhd8ed1ab_0
- hvplot=0.9.0=pyhd8ed1ab_0
- icu=73.2=h63175ca_0
- idna=3.4=pyhd8ed1ab_0
- imagecodecs-lite=2019.12.3=py310h3e78b6c_7
- imageio=2.31.5=pyh8c1a49c_0
- importlib-metadata=6.8.0=pyha770c72_0
- importlib_metadata=6.8.0=hd8ed1ab_0
- intel-openmp=2023.2.0=h57928b3_50497
- ipykernel=6.26.0=pyha63f2e9_0
- ipython=8.17.2=pyh5737063_0
- ipython_genutils=0.2.0=py_1
- ipywidgets=8.1.1=pyhd8ed1ab_0
- isoduration=20.11.0=pyhd8ed1ab_0
- jedi=0.19.1=pyhd8ed1ab_0
- jinja2=3.1.2=pyhd8ed1ab_1
- jinxed=1.2.0=pyh95a074a_0
- jmespath=1.0.1=pyhd8ed1ab_0
- joblib=1.3.2=pyhd8ed1ab_0
- json5=0.9.14=pyhd8ed1ab_0
- jsonpointer=2.4=py310h5588dad_3
- jsonschema=4.19.2=pyhd8ed1ab_0
- jsonschema-specifications=2023.7.1=pyhd8ed1ab_0
- jsonschema-with-format-nongpl=4.19.2=pyhd8ed1ab_0
- jupyter=1.0.0=pyhd8ed1ab_10
- jupyter-resource-usage=1.0.1=pyhd8ed1ab_0
- jupyter-server-mathjax=0.2.6=pyh5bfe37b_1
- jupyter_bokeh=3.0.7=pyhd8ed1ab_0
- jupyter_client=7.4.9=pyhd8ed1ab_0
- jupyter_console=6.6.3=pyhd8ed1ab_0
- jupyter_core=5.5.0=py310h5588dad_0
- jupyter_events=0.9.0=pyhd8ed1ab_0
- jupyter_server=2.10.0=pyhd8ed1ab_0
- jupyter_server_fileid=0.9.0=pyhd8ed1ab_0
- jupyter_server_terminals=0.4.4=pyhd8ed1ab_1
- jupyter_server_ydoc=0.8.0=pyhd8ed1ab_0
- jupyter_ydoc=0.2.4=pyhd8ed1ab_0
- jupyterlab=3.6.6=pyhd8ed1ab_0
- jupyterlab-geojson=3.4.0=pyhd8ed1ab_0
- jupyterlab-git=0.44.0=pyhd8ed1ab_0
- jupyterlab_pygments=0.2.2=pyhd8ed1ab_0
- jupyterlab_server=2.25.1=pyhd8ed1ab_0
- jupyterlab_widgets=3.0.9=pyhd8ed1ab_0
- kealib=1.5.2=ha10e780_1
- kiwisolver=1.4.5=py310h232114e_1
- krb5=1.21.2=heb0366b_0
- lazy_loader=0.3=pyhd8ed1ab_0
- lcms2=2.15=h67d730c_3
- lerc=4.0.0=h63175ca_0
- libabseil=20230125.3=cxx17_h63175ca_0
- libaec=1.1.2=h63175ca_1
- libarchive=3.7.2=h6f8411a_0
- libarrow=12.0.1=he3e0f11_8_cpu
- libblas=3.9.0=19_win64_mkl
- libboost-headers=1.82.0=h57928b3_6
- libbrotlicommon=1.0.9=hcfcfb64_9
- libbrotlidec=1.0.9=hcfcfb64_9
- libbrotlienc=1.0.9=hcfcfb64_9
- libcblas=3.9.0=19_win64_mkl
- libcrc32c=1.1.2=h0e60522_0
- libcurl=8.4.0=hd5e4a3a_0
- libdeflate=1.19=hcfcfb64_0
- libevent=2.1.12=h3671451_1
- libexpat=2.5.0=h63175ca_1
- libffi=3.4.2=h8ffe710_5
- libgdal=3.7.3=h3217549_2
- libglib=2.78.1=he8f3873_0
- libgoogle-cloud=2.12.0=h00b2bdc_1
- libgrpc=1.54.3=ha177ca7_0
- libhwloc=2.9.3=default_haede6df_1009
- libiconv=1.17=h8ffe710_0
- libjpeg-turbo=3.0.0=hcfcfb64_1
- libkml=1.3.0=haf3e7a6_1018
- liblapack=3.9.0=19_win64_mkl
- libnetcdf=4.9.2=nompi_h8284064_112
- libpng=1.6.39=h19919ed_0
- libpq=16.3=hab9416b_0
- libprotobuf=3.21.12=h12be248_2
- librttopo=1.1.0=h92c5fdb_14
- libsodium=1.0.18=h8d14728_1
- libspatialindex=1.9.3=h39d44d4_4
- libspatialite=5.1.0=hbf340bc_1
- libsqlite=3.44.0=hcfcfb64_0
- libssh2=1.11.0=h7dfc565_0
- libthrift=0.18.1=h06f6336_2
- libtiff=4.6.0=h6e2ebb7_2
- libutf8proc=2.8.0=h82a8f57_0
- libwebp-base=1.3.2=hcfcfb64_0
- libxcb=1.15=hcd874cb_0
- libxml2=2.12.7=h283a6d9_1
- libzip=1.10.1=h1d365fa_3
- libzlib=1.2.13=hcfcfb64_5
- linkify-it-py=2.0.0=pyhd8ed1ab_0
- llvmlite=0.41.1=py310hb84602e_0
- locket=1.0.0=pyhd8ed1ab_0
- lz4=4.3.2=py310hbbb2075_1
- lz4-c=1.9.4=hcfcfb64_0
- lzo=2.10=he774522_1000
- m2w64-gcc-libgfortran=5.3.0=6
- m2w64-gcc-libs=5.3.0=7
- m2w64-gcc-libs-core=5.3.0=7
- m2w64-gmp=6.1.0=2
- m2w64-libwinpthread-git=5.0.0.4634.697f757=2
- mapclassify=2.6.1=pyhd8ed1ab_0
- markdown=3.5.1=pyhd8ed1ab_0
- markdown-it-py=3.0.0=pyhd8ed1ab_0
- markupsafe=2.1.3=py310h8d17308_1
- matplotlib-base=3.8.1=py310hc9baf74_0
- matplotlib-inline=0.1.6=pyhd8ed1ab_0
- mdit-py-plugins=0.4.0=pyhd8ed1ab_0
- mdurl=0.1.0=pyhd8ed1ab_0
- minizip=4.0.2=h5bed578_0
- mistune=3.0.2=pyhd8ed1ab_0
- mkl=2023.2.0=h6a75c08_50496
- msgpack-python=1.0.6=py310h232114e_0
- msys2-conda-epoch=20160418=1
- multidict=6.0.4=py310h8d17308_1
- multimethod=1.9.1=pyhd8ed1ab_0
- multipledispatch=0.6.0=py_0
- munch=4.0.0=pyhd8ed1ab_0
- munkres=1.1.4=pyh9f0ad1d_0
- nbclassic=1.0.0=pyhb4ecaf3_1
- nbclient=0.8.0=pyhd8ed1ab_0
- nbconvert=7.11.0=pyhd8ed1ab_0
- nbconvert-core=7.11.0=pyhd8ed1ab_0
- nbconvert-pandoc=7.11.0=pyhd8ed1ab_0
- nbdime=3.2.1=pyhd8ed1ab_0
- nbformat=5.9.2=pyhd8ed1ab_0
- nest-asyncio=1.5.8=pyhd8ed1ab_0
- netcdf4=1.6.5=nompi_py310h6477780_100
- networkx=3.2.1=pyhd8ed1ab_0
- notebook=6.5.6=pyha770c72_0
- notebook-shim=0.2.3=pyhd8ed1ab_0
- numba=0.58.1=py310h9ccaf4f_0
- numcodecs=0.12.1=py310h00ffb61_0
- numpy=1.26.0=py310hf667824_0
- nvidia-ml-py=12.535.133=pyhd8ed1ab_0
- opencensus=0.11.3=pyhd8ed1ab_0
- opencensus-context=0.1.3=py310h5588dad_2
- openjpeg=2.5.0=h3d672ee_3
- openssl=3.4.1=ha4e3fda_0
- orc=1.9.0=hada7b9e_1
- overrides=7.4.0=pyhd8ed1ab_0
- packaging=23.2=pyhd8ed1ab_0
- pandas=2.1.2=py310hecd3228_0
- pandoc=3.1.3=h57928b3_0
- pandocfilters=1.5.0=pyhd8ed1ab_0
- panel=1.3.1=pyhd8ed1ab_0
- param=2.0.1=pyhca7485f_0
- parso=0.8.3=pyhd8ed1ab_0
- partd=1.4.1=pyhd8ed1ab_0
- pcre2=10.40=h17e33f8_0
- pexpect=4.8.0=pyh1a96a4e_2
- pickleshare=0.7.5=py_1003
- pillow=10.1.0=py310h1e6a543_0
- pip=23.3.1=pyhd8ed1ab_0
- pixman=0.42.2=h63175ca_0
- pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1
- platformdirs=3.11.0=pyhd8ed1ab_0
- poppler=23.10.0=hc2f3c52_0
- poppler-data=0.4.12=hd8ed1ab_0
- postgresql=16.3=h7f155c9_0
- pqdm=0.2.0=pyhd8ed1ab_0
- proj=9.3.0=he13c7e8_2
- prometheus_client=0.18.0=pyhd8ed1ab_0
- prompt-toolkit=3.0.39=pyha770c72_0
- prompt_toolkit=3.0.39=hd8ed1ab_0
- protobuf=4.21.12=py310h00ffb61_0
- psutil=5.9.5=py310h8d17308_1
- pthread-stubs=0.4=hcd874cb_1001
- pthreads-win32=2.9.1=hfa6e2cd_3
- ptyprocess=0.7.0=pyhd3deb0d_0
- pure_eval=0.2.2=pyhd8ed1ab_0
- py-spy=0.3.14=h975169c_0
- pyarrow=12.0.1=py310hd1a9178_8_cpu
- pyasn1=0.5.0=pyhd8ed1ab_0
- pyasn1-modules=0.3.0=pyhd8ed1ab_0
- pycparser=2.21=pyhd8ed1ab_0
- pyct=0.4.6=py_0
- pyct-core=0.4.6=py_0
- pydantic=1.10.13=py310h8d17308_1
- pygments=2.16.1=pyhd8ed1ab_0
- pykdtree=1.3.9=py310h3e78b6c_1
- pyopenssl=23.3.0=pyhd8ed1ab_0
- pyparsing=3.1.1=pyhd8ed1ab_0
- pyproj=3.6.1=py310hebb2149_4
- pyresample=1.27.1=py310hecd3228_2
- pyshp=2.3.1=pyhd8ed1ab_0
- pysocks=1.7.1=pyh0701188_6
- pystac=1.9.0=pyhd8ed1ab_0
- pystac-client=0.7.5=pyhd8ed1ab_0
- python=3.10.13=h4de0772_0_cpython
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python-fastjsonschema=2.18.1=pyhd8ed1ab_0
- python-json-logger=2.0.7=pyhd8ed1ab_0
- python-tzdata=2023.3=pyhd8ed1ab_0
- python_abi=3.10=4_cp310
- pytz=2023.3.post1=pyhd8ed1ab_0
- pyu2f=0.1.5=pyhd8ed1ab_0
- pyviz_comms=2.3.2=pyhd8ed1ab_0
- pywavelets=1.4.1=py310h3e78b6c_1
- pywin32=306=py310h00ffb61_2
- pywinpty=2.0.12=py310h00ffb61_0
- pyyaml=6.0.1=py310h8d17308_1
- pyzmq=24.0.1=py310hcd737a0_1
- qtconsole-base=5.5.0=pyha770c72_0
- qtpy=2.4.1=pyhd8ed1ab_0
- rasterio=1.3.9=py310h4d3659c_0
- ray-core=2.7.1=py310h139b6d1_0
- ray-default=2.7.1=py310h5588dad_0
- re2=2023.03.02=hd4eee63_0
- referencing=0.30.2=pyhd8ed1ab_0
- requests=2.31.0=pyhd8ed1ab_0
- rfc3339-validator=0.1.4=pyhd8ed1ab_0
- rfc3986-validator=0.1.1=pyh9f0ad1d_0
- rioxarray=0.15.0=pyhd8ed1ab_0
- rpds-py=0.12.0=py310h87d50f1_0
- rsa=4.9=pyhd8ed1ab_0
- rtree=1.1.0=py310h1cbd46b_0
- s3fs=2023.10.0=pyhd8ed1ab_0
- s3transfer=0.7.0=pyhd8ed1ab_0
- scikit-image=0.20.0=py310h1c4a608_1
- scikit-learn=1.3.2=py310hfd2573f_1
- scipy=1.11.3=py310hf667824_1
- send2trash=1.8.2=pyh08f2357_0
- setproctitle=1.3.3=py310h8d17308_0
- setuptools=68.2.2=pyhd8ed1ab_0
- shapely=2.0.2=py310h839b4a8_0
- six=1.16.0=pyh6c4a22f_0
- smart_open=6.4.0=pyhd8ed1ab_0
- smmap=5.0.0=pyhd8ed1ab_0
- snappy=1.1.10=hfb803bf_0
- sniffio=1.3.0=pyhd8ed1ab_0
- snuggs=1.4.7=py_0
- sortedcontainers=2.4.0=pyhd8ed1ab_0
- soupsieve=2.5=pyhd8ed1ab_1
- spectral=0.23.1=pyh1a96a4e_0
- sqlite=3.44.0=hcfcfb64_0
- stack_data=0.6.2=pyhd8ed1ab_0
- tbb=2021.10.0=h91493d7_2
- tblib=2.0.0=pyhd8ed1ab_0
- terminado=0.17.0=pyh08f2357_0
- threadpoolctl=3.2.0=pyha21a80b_0
- tifffile=2020.6.3=py_0
- tiledb=2.16.3=h1ffc264_3
- tinycss2=1.2.1=pyhd8ed1ab_0
- tinynetrc=1.3.1=pyhd8ed1ab_0
- tk=8.6.13=h5226925_1
- tomli=2.0.1=pyhd8ed1ab_0
- toolz=0.12.0=pyhd8ed1ab_0
- tornado=6.3.3=py310h8d17308_1
- tqdm=4.66.1=pyhd8ed1ab_0
- traitlets=5.13.0=pyhd8ed1ab_0
- types-python-dateutil=2.8.19.14=pyhd8ed1ab_0
- typing_utils=0.1.0=pyhd8ed1ab_0
- tzdata=2023c=h71feb2d_0
- uc-micro-py=1.0.1=pyhd8ed1ab_0
- ucrt=10.0.22621.0=h57928b3_0
- unicodedata2=15.1.0=py310h8d17308_0
- uri-template=1.3.0=pyhd8ed1ab_0
- uriparser=0.9.7=h1537add_1
- urllib3=1.26.18=pyhd8ed1ab_0
- vc=14.3=h64f974e_17
- vc14_runtime=14.36.32532=hdcecf7f_17
- virtualenv=20.21.0=pyhd8ed1ab_0
- vs2015_runtime=14.36.32532=h05e6639_17
- wcwidth=0.2.9=pyhd8ed1ab_0
- webcolors=1.13=pyhd8ed1ab_0
- webencodings=0.5.1=pyhd8ed1ab_2
- wheel=0.41.3=pyhd8ed1ab_0
- widgetsnbextension=4.0.9=pyhd8ed1ab_0
- win_inet_pton=1.1.0=pyhd8ed1ab_6
- winpty=0.4.3=4
- wrapt=1.16.0=py310h8d17308_0
- xarray=2023.10.1=pyhd8ed1ab_0
- xerces-c=3.2.4=h63175ca_3
- xorg-libxau=1.0.11=hcd874cb_0
- xorg-libxdmcp=1.1.3=hcd874cb_0
- xyzservices=2023.10.1=pyhd8ed1ab_0
- xz=5.2.6=h8d14728_0
- y-py=0.5.5=py310h87d50f1_2
- yaml=0.2.5=h8ffe710_2
- yarl=1.9.2=py310h8d17308_1
- ypy-websocket=0.8.2=pyhd8ed1ab_0
- zarr=2.16.1=pyhd8ed1ab_0
- zeromq=4.3.4=h0e60522_1
- zict=3.0.0=pyhd8ed1ab_0
- zipp=3.17.0=pyhd8ed1ab_0
- zlib=1.2.13=hcfcfb64_5
- zstd=1.5.5=h12be248_0
- pip:
- attrs==25.3.0
- earthaccess==0.14.0
- h11==0.14.0
- importlib-resources==6.5.2
- outcome==1.3.0.post0
- python-cmr==0.13.0
- selenium==4.30.0
- trio==0.29.0
- trio-websocket==0.12.2
- typing-extensions==4.12.2
- websocket-client==1.8.0
- wsproto==1.2.0

BIN
setup/openearth.yml Normal file

Binary file not shown.

View File

@ -0,0 +1,298 @@
"""
访问阿里云 DataV 下载的行政区边界数据保存为原始 JSON, 有部分字段与GDAL不兼容, 导致无法
直接读取, 需要先清洗再保存为 GeoJSON.
- 官方地址: https://datav.aliyun.com/portal/school/atlas/area_selector
- Step1: "省-市-县名称"解析为行政区划代码 (adcode);
- Step2: DataV 原始数据保存为 `--县名称.json` 文件;
- Step3: 移除不兼容 GDAL/GeoPandas 的属性字段 (parent, center, centroid, acroutes);
- Step4: 将清洗后的结果写出为 `--县名称.geojson` 文件.
-------------------------------------------------------------------------------
Authors: CVEO Team
Last Updated: 2026-04-19
===============================================================================
"""
import json
import re
from pathlib import Path
from typing import Optional
import requests
def _validate_region_params(province: str, city: str, county: Optional[str]) -> None:
"""
Validate that province and city are non-empty.
Raises
------
ValueError
If province or city is None or empty string.
"""
if not province or not province.strip():
raise ValueError("province (省) cannot be empty")
if not city or not city.strip():
raise ValueError("city (市) cannot be empty")
def get_datav_json(accode: str) -> dict:
"""
DataV 接口获取行政区边界的原始 JSON 数据并返回字典.
"""
# 使用路径式接口, 支持如 "420100" 或 "420100_full"
url = f"https://geo.datav.aliyun.com/areas_v3/bound/{accode}.json"
response = requests.get(url, timeout=15)
response.raise_for_status()
return response.json()
def fetch_and_save_geojson(accode: str, region_name: str, out_dir: Path) -> Path:
"""
获取 DataV 原始数据, 先保存为 .json; 随后清洗属性并另存为 .geojson.
Parameters
----------
accode : str
行政区划代码, "420100" "420100_full".
region_name : str
区域名称, 用于输出文件名. 当使用省--县三级参数时, 文件名为
"{province}_{city}.geojson" "{province}_{city}_{county}.geojson".
out_dir : Path
输出目录.
"""
raw_data = get_datav_json(accode)
# 处理 features: 移除不兼容 GeoPandas 的属性
def sanitize_properties(props: dict) -> dict:
out = {}
for k, v in props.items():
# 移除嵌套对象
if k in ("parent", "center", "centroid", "acroutes"):
continue
# 仅保留标量类型; 丢弃其他嵌套结构
if isinstance(v, (str, int, float, bool)) or v is None:
out[k] = v
return out
# 输出路径(确保目录存在)
out_dir_path = Path(out_dir)
out_dir_path.mkdir(parents=True, exist_ok=True)
# 先保存原始 JSON(未清洗)
raw_json_path = out_dir_path / f"{region_name}.json"
with raw_json_path.open("w", encoding="utf-8") as f:
json.dump(raw_data, f, ensure_ascii=False)
# 再保存清洗后的 GeoJSON
out_path = out_dir_path / f"{region_name}.geojson"
# 深拷贝后进行清洗, 避免影响原始数据
data = json.loads(json.dumps(raw_data))
features = data.get("features", [])
for feature in features:
props = feature.get("properties", {})
feature["properties"] = sanitize_properties(props)
# 写出为 .geojson, 确保 UTF-8 且保留中文字符
with out_path.open("w", encoding="utf-8") as f:
json.dump(data, f, ensure_ascii=False)
return out_path
def _normalize_name(name: str) -> str:
name = name.strip()
# 简单去除常见后缀提高匹配鲁棒性
for suffix in ("", "", "地区", "", "自治州", "自治县", "特别行政区"):
if name.endswith(suffix):
name = name[: -len(suffix)]
return name
def _name_matches_exact(target: str, candidate: str) -> bool:
return _normalize_name(target) == _normalize_name(candidate)
def resolve_adcode_by_name(
province: str,
city: Optional[str] = None,
county: Optional[str] = None,
prefer_full: bool = False,
) -> Optional[str]:
"""
通过省--县名称解析 DataV 行政区划代码.
采用层级查找策略: 全国数据 -> 省级数据 -> 市级数据 -> 区县数据.
前两级 (/) 必须提供且不能为空; 第三级 () 可为空.
Parameters
----------
province : str
省份名称, "湖北省". 不能为空.
city : str, optional
城市名称, "武汉市". 可为空 (表示只解析到省).
county : str, optional
区县名称, "江岸区". 可为空 (表示只解析到市级).
prefer_full : bool, optional
是否返回下辖完整边界代码 ( "420100_full"), 默认 False.
Returns
-------
str or None
行政区划代码, "420100" "420100_full", 找不到则返回 None.
Raises
------
ValueError
如果 province city 为空.
"""
_validate_region_params(province, city, county)
# Step 1: 从全国数据中查找省份
try:
cn = requests.get(
"https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json",
timeout=20,
).json()
except Exception:
return None
pcode = None
for feat in cn.get("features", []):
props = feat.get("properties", {})
if props.get("level") == "province":
name = props.get("name", "")
code = str(props.get("adcode", ""))
if re.fullmatch(r"\d{6}", code):
if _name_matches_exact(province, name):
pcode = code
break
if not pcode:
return None
# Step 2: 从省份数据中查找城市
# 特殊处理直辖市: 当 province == city 时 (如 "北京市" == "北京市")
if _name_matches_exact(province, city):
ccode = pcode
else:
try:
prov_data = requests.get(
f"https://geo.datav.aliyun.com/areas_v3/bound/{pcode}_full.json",
timeout=20,
).json()
except Exception:
return None
ccode = None
for feat in prov_data.get("features", []):
props = feat.get("properties", {})
level = props.get("level")
name = props.get("name", "")
code = str(props.get("adcode", ""))
# 匹配城市或区县级别
if level in ("city", "district") and re.fullmatch(r"\d{6}", code):
if _name_matches_exact(city, name):
ccode = code
break
if not ccode:
return None
# Step 3: 如果提供了区县名称, 从城市数据中查找区县
if county:
try:
city_data = requests.get(
f"https://geo.datav.aliyun.com/areas_v3/bound/{ccode}_full.json",
timeout=20,
).json()
except Exception:
return None
dcode = None
for feat in city_data.get("features", []):
props = feat.get("properties", {})
level = props.get("level")
name = props.get("name", "")
code = str(props.get("adcode", ""))
# 匹配区县 (包括县级市, 其 level 可能为 "city")
if level in ("district", "city") and re.fullmatch(r"\d{6}", code):
if _name_matches_exact(county, name):
dcode = code
break
if not dcode:
return None
# 区县级别已经是最终边界,不需要 _full 后缀
return dcode
# 只解析到市级
return f"{ccode}_full" if prefer_full else ccode
def fetch_and_save_geojson_by_name(
province: str,
city: Optional[str],
county: Optional[str],
out_dir: Path,
prefer_full: bool = False,
) -> Path:
"""
通过省--县名称解析 adcode, 并直接拉取与保存 GeoJSON.
Parameters
----------
province : str
省份名称, "湖北省". 不能为空.
city : str, optional
城市名称, "武汉市". 可为空 (表示只解析到省).
county : str, optional
区县名称, "江岸区". 可为空 (表示只解析到市级).
out_dir : Path
输出目录, 用于保存 GeoJSON 文件.
prefer_full : bool, optional
是否下载下辖区划的 GeoJSON, 默认 False.
Returns
-------
Path
保存的 GeoJSON 文件路径.
Raises
------
ValueError
如果省或市为空, 或无法解析到行政区划代码.
"""
code = resolve_adcode_by_name(province, city, county, prefer_full=prefer_full)
if not code:
raise ValueError(
f"无法通过名称解析到行政区划代码: province={province}, city={city}, county={county}"
)
# 构建输出文件名
if county:
region_name = f"{province}_{city}_{county}"
else:
region_name = f"{province}_{city}"
return fetch_and_save_geojson(code, region_name, out_dir)
if __name__ == "__main__":
# 示例 1: 只获取市级边界 (province + city)
out = fetch_and_save_geojson_by_name(
"湖北省", None, None, out_dir=Path("./data/vectors/")
)
# 示例 2: 获取十堰市县区级边界
# out = fetch_and_save_geojson_by_name(
# "湖北省", "十堰市", None, out_dir=Path("./data/vectors/"), prefer_full=True
# )
# 示例 3: 获取区县级边界 (province + city + county)
# out = fetch_and_save_geojson_by_name(
# "湖北省", "十堰市", "郧西县", out_dir=Path("./data/vectors/"), prefer_full=True
# )
print(f"Saved raw JSON and GeoJSON: {out}.")

366
src/DATA_SuPER/DEM_SuPER.py Normal file
View File

@ -0,0 +1,366 @@
# -*- coding: utf-8 -*-
"""
===============================================================================
This module contains functions related to preprocessing DEM data.
For example, elevation, slope, aspect
Step1: Use earthaccess search and download DEM Data
- NASADEM_HGT
- includes 30m DEM, based on SRTM data
- https://lpdaac.usgs.gov/products/nasadem_hgtv001/
- NASADEM_SC
- includes 30m slope, aspect, based on NASADEM_HGT
- https://lpdaac.usgs.gov/products/nasadem_scv001/
- ALOS PALSAR RTC Project
- includes 12.5, 30m DEM, based on ALOS PALSAR data
- https://www.earthdata.nasa.gov/data/projects/alos-palsar-rtc-project
Step2a: Process NASADEM data
- 下载的 NASADEM 均为 *.zip 文件, 需先进行解压
- NASADEM 文件名称结构为: NASADEM_类型_网格编号/网格编号.数据类型
- 高程示例: NASADEM_HGT_n30e113/n30e113.hgt
- 坡度示例: NASADEM_SC_n30e113/n30e113.slope
- 坡向示例: NASADEM_SC_n30e113/n30e113.aspect
- 读取文件按指定范围进行裁剪并镶嵌, 坡度和坡向数据需要进行缩放处理, 将网格范围的结果保存为 *.tif 文件
Step2b: Process ALOS PALSAR RTC Project data
- 下载的 ALOS PALSAR RTC Project 均为 *.zip 文件, 需先进行解压
- ALOS PALSAR RTC Project 文件名称结构为: AP_轨道号_CCC_DDDD_RT1.数据类型.tif
- 高程示例: AP_16112_FBS_F0620_RT1.dem.tif
- 读取文件按指定范围进行裁剪并镶嵌, 坡度和坡向数据需要进行缩放处理, 将网格范围的结果保存为 *.tif 文件
-------------------------------------------------------------------------------
Authors: Hong Xie
Last Updated: 2025-10-20
===============================================================================
"""
import os
import sys
import glob
from pathlib import Path
import json
import requests
import zipfile
import time
import dask.distributed
import logging
import earthaccess
import numpy as np
from rioxarray import open_rasterio
# 添加项目根目录到 sys.path确保作为独立脚本运行时也能正确导入
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from src.utils.common_utils import (
setup_dask_environment,
setup_logging,
clip_image,
mosaic_images,
)
from src.HLS_SuPER.HLS_Su import earthdata_search
def reorganize_nasadem_urls(dem_results_urls: list):
"""
重组 NASADEM 下载链接
将同一格网内的高程, 坡度坡向数据链接进行组合
Parameters
----------
dem_results_urls: list
查询返回的 NASADEM 数据 URL 列表
Returns
-------
grouped_results_urls: list
重组后的 NASADEM 数据 URL 列表
"""
tile_ids = []
for granule in dem_results_urls:
tile_id = granule[0].split("/")[-2].split("_")[-1]
tile_ids.append(tile_id)
tile_ids = np.array(tile_ids)
# 根据瓦片ID找到对应的索引
tile_id_indices = np.where(tile_ids == tile_id)
# 根据索引过滤结果
return [dem_results_urls[i] for i in tile_id_indices[0]]
def download_granule(granule_urls: list[str], output_dir: str) -> bool:
"""
下载单批数据
Parameters
----------
granule_urls: list
查询返回的规范化待下载数据 URL 列表
output_dir: str
下载目录
Returns
-------
download_state: bool
下载状态 True or False
"""
# 检查是否已下载
if not all(
os.path.isfile(os.path.join(output_dir, os.path.basename(url)))
for url in granule_urls
):
try:
earthaccess.download(granule_urls, output_dir)
except Exception as e:
# 下载失败时, 先尝试使用 requests 库下载
for url in granule_urls:
response = requests.get(url, timeout=30)
if response.status_code == 200:
with open(
os.path.join(output_dir, os.path.basename(url)), "wb"
) as f:
f.write(response.content)
else:
logging.error(
f"Error downloading data: {response.status_code}. Skipping."
)
return False
logging.info("All Data already downloaded.")
return True
def process_dem_zip(
zip_file_list: list[str], output_dir: Path, mode_str: str = "NASADEM"
) -> None:
"""
处理下载的 DEM ZIP 文件,
若为 NASADEM, 则将其解压后的文件统一为可读写的 .hgt 格式;
若为 ALOSDEM, 则直接读取 dem 并转储为 DEFLATE 压缩后的 tif 文件.
Parameters
----------
zip_file_list: list
待解压的 ZIP 文件列表
output_dir: Path
输出目录
mode_str: str, optional
解压模式, 默认为 "NASADEM", 可选 "NASADEM", "ALOSDEM"
"""
try:
for zip_path in zip_file_list:
if not os.path.isfile(zip_path) or not zipfile.is_zipfile(zip_path):
continue
with zipfile.ZipFile(zip_path, "r") as zip_ref:
if mode_str == "NASADEM":
# 示例ZIP文件名称: NASADEM_HGT_n30e113.zip
# 仅解压包含 .hgt, .slope, .aspect 的文件
for tif_file in [
f
for f in zip_ref.namelist()
if f.endswith((".hgt", ".slope", ".aspect"))
]:
# 解压时重命名文件
new_name = (
tif_file.replace(".hgt", ".elevation.hgt")
if tif_file.endswith(".hgt")
else f"{tif_file}.hgt"
)
# 解压文件到指定目录
unzip_file_path = os.path.join(output_dir, new_name)
if os.path.exists(unzip_file_path):
continue
with zip_ref.open(tif_file) as source_file:
with open(unzip_file_path, "wb") as unzip_file:
unzip_file.write(source_file.read())
elif mode_str == "ALOSDEM":
# 示例ZIP文件名称: AP_14354_FBS_F3000_RT1.zip
# 仅解压包含 dem.tif 的文件, 其内部路径为 AP_14354_FBS_F3000_RT1/AP_14354_FBS_F3000_RT1.dem.tif
tif_file = [f for f in zip_ref.namelist() if f.endswith("dem.tif")][0]
# 读取压缩包内的文件并转储到指定目录
dem = open_rasterio(zip_ref.open(tif_file))
# 转储为 INT32 DEFLATE 压缩后的 .tif 文件
dem.rio.to_raster(
output_dir / os.path.basename(tif_file),
compress="DEFLATE",
)
except Exception as e:
logging.error(f"Error unzipping {mode_str} to {output_dir}: {e}")
return
def process_granule(
unzip_dir: Path,
output_dir: Path,
mode_str: str,
name: str,
roi: list,
clip=True,
tile_id: str = None,
) -> bool:
"""
读取解压并重命名处理后的指定类型 DEM 数据并进行预处理, 包括读取, 裁剪, 镶嵌, 并对坡度坡向进行缩放
Parameters
----------
unzip_dir: Path
解压后的 DEM 文件根目录
output_dir: Path
输出根目录
mode_str: str
数据模式, 可选 NASADEM, ALOSDEM
name: str
数据类型, 可选 elevation, slope, aspect
roi: list
网格范围
clip: bool
是否裁剪
tile_id: str
网格编号
Returns
-------
process_state: bool
处理状态 True or False
"""
if mode_str == "NASADEM":
dem_file_list = glob.glob(os.path.join(unzip_dir, f"*{name}.hgt"))
out_tif_name = f"DEM.{mode_str}.{tile_id}.2000.{name}.tif"
elif mode_str == "ALOSDEM":
dem_file_list = glob.glob(os.path.join(unzip_dir, "*.dem.tif"))
# 示例ZIP文件名称: AP_14354_FBS_F3000_RT1.zip
# 仅解压包含 dem.tif 的文件, 其内部路径为 AP_14354_FBS_F3000_RT1/AP_14354_FBS_F3000_RT1.dem.tif
tif_file = [f for f in zip_ref.namelist() if f.endswith("dem.tif")][0]
out_tif_name = f"DEM.{mode_str}.{tile_id}.2011.{name}.tif"
output_file = os.path.join(output_dir, out_tif_name)
if not os.path.isfile(output_file):
try:
dem_raster_list = []
for dem_path in dem_file_list:
raster = (
open_rasterio(dem_path).squeeze(dim="band", drop=True).rename(name)
)
if name == "slope" or name == "aspect":
org_attrs = raster.attrs
raster = raster * 0.01
# 恢复源数据属性信息
raster.attrs = org_attrs.copy()
raster.rio.write_crs("EPSG:4326", inplace=True)
raster.attrs["scale_factor"] = 1
dem_raster_list.append(raster)
if len(dem_raster_list) >= 1:
# 先逐一裁剪再镶嵌合成仅在提供ROI且要求裁剪时执行
if roi is not None and clip:
clipped_list = []
for raster in dem_raster_list:
clipped = clip_image(raster, roi, clip_by_box=True)
clipped_list.append(clipped)
# 镶嵌合成
if name == "slope" or name == "aspect":
dem_mosaiced = mosaic_images(dem_raster_list, nodata=-9999)
else:
dem_mosaiced = mosaic_images(dem_raster_list, nodata=-32768)
dem_mosaiced.rio.to_raster(
output_file, driver="COG", compress="DEFLATE"
)
except Exception as e:
logging.error(f"Error processing files in {name}: {e}")
return False
logging.info(f"Processed {output_file} successfully.")
else:
logging.warning(f"{output_file} already exists. Skipping.")
return True
def main(
output_root_dir: Path,
region_file: Path,
tile_id: str,
mode_str: str = "NASADEM",
):
results_urls = []
mode_str = mode_str.upper()
output_root_dir = output_root_dir / mode_str
# 放置下载的 ZIP 文件
download_dir = output_root_dir / "ZIP"
# 放置解压并预处理后的文件
unzip_dir = download_dir / "UNZIP"
output_dir = output_root_dir / "TIF" / tile_id
os.makedirs(download_dir, exist_ok=True)
os.makedirs(unzip_dir, exist_ok=True)
os.makedirs(output_dir, exist_ok=True)
results_urls_file = output_root_dir / f"{mode_str}_{tile_id}_results_urls.json"
# 配置日志
logs_file = output_root_dir / f"{mode_str}_{tile_id}_SuPER.log"
setup_logging(logs_file)
# 检索数据并存储, 默认覆盖上一次检索记录
if mode_str == "NASADEM":
asset_name = ["NASADEM_HGT", "NASADEM_SC"]
results_urls = earthdata_search(asset_name, region_file=region_file)
elif mode_str == "ALOSDEM":
asset_name = ["ALOS_PSR_RTC_HIGH"]
results_urls = earthdata_search(asset_name, region_file=region_file)
# ALOSDEM 数据仅保存含 "FBS" 字符串的 URL
results_urls = [url for url in results_urls if "FBS" in url[0]]
else:
results_urls = []
with open(results_urls_file, "w") as f:
json.dump(results_urls, f)
logging.info(f"Found {len(results_urls)} {mode_str} granules after filtering.")
if len(results_urls) == 0:
return
# 构造待解压的文件列表
zip_file_list = [
download_dir / os.path.basename(result[0])
for result in results_urls
]
client = dask.distributed.Client(timeout=60, memory_limit="8GB")
client.run(setup_dask_environment)
all_start_time = time.time()
client.scatter(results_urls)
logging.info(f"Start processing {mode_str} ...")
download_tasks = [
dask.delayed(download_granule)(granule_url, download_dir)
for granule_url in results_urls
]
# 根据模式传递正确的解压标识
unzip_tasks = dask.delayed(process_dem_zip)(zip_file_list, unzip_dir, mode_str)
if mode_str == "NASADEM":
process_tasks = [
dask.delayed(process_granule)(
unzip_dir, output_dir, mode_str, name, region, True, tile_id
)
for name in ["elevation", "slope", "aspect"]
]
dask.compute(*download_tasks)
dask.compute(unzip_tasks)
if mode_str == "NASADEM":
dask.compute(*process_tasks)
client.close()
all_total_time = time.time() - all_start_time
logging.info(
f"All {mode_str} Downloading complete and processed. Total time: {all_total_time} seconds"
)
if __name__ == "__main__":
earthaccess.login(strategy="netrc", persist=True)
# tile_id = "49REL"
# tile_id = "Wuhan"
tile_id = "Hubei"
region_file = Path(f"./data/vectors/{tile_id}.geojson")
# mode_str = "NASADEM"
mode_str = "ALOSDEM"
output_root_dir = Path(".\\data\\DEM")
main(output_root_dir, region_file, tile_id, mode_str)

View File

@ -28,10 +28,11 @@ import logging
import earthaccess
from xarray import open_dataset
sys.path.append("D:/NASA_EarthData_Script")
# 添加项目根目录到 sys.path确保作为独立脚本运行时也能正确导入
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from utils.common_utils import setup_dask_environment
from HLS_SuPER.HLS_Su import earthdata_search
from src.utils.common_utils import setup_dask_environment
from src.HLS_SuPER.HLS_Su import earthdata_search
def convert(source_nc_path: str, target_tif_path: str) -> None:

View File

@ -6,7 +6,7 @@ For example, MCD43A3, MCD43A4, MOD11A1.
-------------------------------------------------------------------------------
Authors: Hong Xie
Last Updated: 2025-07-15
Last Updated: 2025-10-16
===============================================================================
"""
@ -15,15 +15,22 @@ import sys
import json
import time
import logging
from pathlib import Path
import earthaccess
import rioxarray as rxr
import dask.distributed
import geopandas as gpd
sys.path.append("D:/NASA_EarthData_Script")
# 添加项目根目录到 sys.path确保作为独立脚本运行时也能正确导入
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from utils.common_utils import clip_image, reproject_image, setup_dask_environment
from HLS_SuPER.HLS_Su import earthdata_search
from src.utils.common_utils import (
clip_image,
reproject_image,
setup_dask_environment,
setup_logging,
)
from src.HLS_SuPER.HLS_Su import earthdata_search
def open_mcd43a4(file_path):
@ -117,7 +124,15 @@ def open_modis(file_path, prod_name):
raise ValueError(f"Unknown MODIS product: {prod_name}.")
def process_modis(download_file, prod_name, roi, clip=True, scale=True, target_crs=None, output_file=None):
def process_modis(
download_file,
prod_name,
roi,
clip=True,
scale=True,
target_crs=None,
output_file=None,
):
"""
MODIS 数据进行预处理, 包括裁剪, 重投影和缩放.
"""
@ -127,13 +142,14 @@ def process_modis(download_file, prod_name, roi, clip=True, scale=True, target_c
if roi is not None and clip:
modis = clip_image(modis, roi)
if target_crs is not None:
if target_crs is not None and modis is not None:
modis = reproject_image(modis, target_crs)
# 重投影后再裁剪一次
if roi is not None and clip:
modis = clip_image(modis, roi)
# 重投影后再裁剪一次
if roi is not None and clip:
modis = clip_image(modis, roi)
if modis.isnull().all():
logging.error(f"Processing {download_file}. Roi area all pixels are nodata.")
if scale:
# 缩放计算后会丢源属性和坐标系, 需要先备份源数据属性信息
org_attrs = modis.attrs
@ -165,14 +181,9 @@ def process_granule(
clip,
scale,
output_dir,
target_crs="EPSG:4326",
tile_id=None,
target_crs="EPSG:4326",
):
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s:%(asctime)s ||| %(message)s",
handlers=[logging.StreamHandler(sys.stdout)],
)
download_hdf_name = os.path.basename(granule_urls[0])
# 获取名称与日期
@ -191,7 +202,7 @@ def process_granule(
out_tif_name = f"MODIS.{prod_name}.{tile_id}.{date}.NBRDF.tif"
else:
out_tif_name = download_hdf_name.replace(".hdf", ".tif")
# 除 MCD43A4 需用于光谱指数计算外, MOD11A1 日间温度与 MCD43A4 反照率无需再按日期归档
# 除 MCD43A4 需用于光谱指数计算外, MOD11A1 日间温度与 MCD43A3 反照率无需再按日期归档
if prod_name == "MOD11A1" or prod_name == "MCD43A3":
output_path = os.path.join(output_dir, "TIF")
else:
@ -199,39 +210,44 @@ def process_granule(
os.makedirs(output_path, exist_ok=True)
output_file = os.path.join(output_path, out_tif_name)
if not os.path.isfile(output_file):
# Step1: 下载 HDF 文件
if not os.path.isfile(download_file):
try:
earthaccess.download(granule_urls, download_path)
except Exception as e:
logging.error(f"Error downloading {download_file}: {e}")
return
else:
logging.warning(f"{download_file} already exists. Skipping.")
# Step2: 处理 HDF 文件
# Step1: 下载 HDF 文件
if not os.path.isfile(download_file):
try:
process_modis(download_file, prod_name, roi, clip, scale, target_crs, output_file)
earthaccess.download(granule_urls, download_path)
except Exception as e:
logging.error(f"Error downloading {download_file}: {e}")
return
else:
logging.warning(f"{download_file} already exists. Skipping.")
# Step2: 处理 HDF 文件
if not os.path.isfile(output_file):
try:
process_modis(
download_file, prod_name, roi, clip, scale, target_crs, output_file
)
logging.info(f"Processed {output_file} successfully.")
except Exception as e:
os.remove(download_file)
logging.info(f"Removed corrupted file {download_file}. Retrying download.")
process_granule(granule_urls, roi, clip, scale, output_dir, target_crs, tile_id)
logging.info(f"Processed {output_file} successfully.")
process_granule(
granule_urls, roi, clip, scale, output_dir, target_crs, tile_id
)
else:
logging.warning(f"{output_file} already exists. Skipping.")
def main(
region: list,
region_file: Path,
asset_name: str,
modis_tile_id: str,
years: list,
dates: tuple[str, str],
tile_id: str,
target_crs: str,
output_root_dir: str,
):
bbox = tuple(list(region.total_bounds))
region = gpd.read_file(region_file)
results_urls = []
output_dir = os.path.join(output_root_dir, asset_name)
os.makedirs(output_dir, exist_ok=True)
@ -246,7 +262,7 @@ def main(
)
year_temporal = (f"{year}-{dates[0]}T00:00:00", f"{year}-{dates[1]}T23:59:59")
year_results = earthdata_search(
[asset_name], year_temporal, bbox, modis_tile_id
[asset_name], year_temporal, region_file, modis_tile_id
)
results_urls.extend(year_results)
with open(year_results_file, "w") as f:
@ -255,20 +271,9 @@ def main(
with open(results_urls_file, "w") as f:
json.dump(results_urls, f)
# 配置日志, 首次配置生效, 后续嵌套配置无效
logging.basicConfig(
level=logging.INFO, # 级别为INFO及以上的日志会被记录
format="%(levelname)s:%(asctime)s ||| %(message)s",
handlers=[
logging.StreamHandler(sys.stdout), # 输出到控制台
logging.FileHandler(
f"{output_dir}\\{asset_name}_{tile_id}_SuPER.log"
), # 输出到日志文件
],
)
client = dask.distributed.Client(timeout=60, memory_limit="8GB")
client.run(setup_dask_environment)
client.run(setup_logging)
all_start_time = time.time()
for year in years:
year_results_dir = os.path.join(output_dir, year)
@ -276,6 +281,11 @@ def main(
year_results_dir, f"{asset_name}_{modis_tile_id}_{year}_results_urls.json"
)
year_results = json.load(open(year_results_file))
# 配置主进程日志
logs_file = os.path.join(
year_results_dir, f"{asset_name}_{tile_id}_{year}_SuPER.log"
)
setup_logging(logs_file)
client.scatter(year_results)
start_time = time.time()
@ -287,14 +297,21 @@ def main(
clip=True,
scale=True,
output_dir=year_results_dir,
target_crs="EPSG:32649",
tile_id=tile_id,
target_crs=target_crs,
)
for granule_url in year_results
]
dask.compute(*tasks)
total_time = time.time() - start_time
# Dask任务结束后读取dask_worker.txt日志文件内容, 并注入到logs_file中
with open(logs_file, "a") as f:
with open("dask_worker.log", "r") as f2:
f.write(f2.read())
# 随后清空dask_worker.txt文件
with open("dask_worker.log", "w") as f:
f.write("")
logging.info(
f"{year} MODIS {asset_name} Downloading complete and proccessed. Total time: {total_time} seconds"
)
@ -303,19 +320,32 @@ def main(
logging.info(
f"All MODIS {asset_name} Downloading complete and proccessed. Total time: {all_total_time} seconds"
)
# 最后删除dask_worker.log文件
os.remove("dask_worker.log")
return
if __name__ == "__main__":
earthaccess.login(persist=True)
# region = gpd.read_file("./data/vectors/wuling_guanqu_polygon.geojson")
tile_id = "49REL"
region = gpd.read_file(f"./data/vectors/{tile_id}.geojson")
# asset_name = "MOD11A1"
region_file = f"./data/vectors/{tile_id}.geojson"
target_crs = "EPSG:32649"
asset_name = "MOD11A1"
# asset_name = "MCD43A3"
asset_name = "MCD43A4"
# asset_name = "MCD43A4"
modis_tile_id = "h27v06"
# 示例文件名称: MCD43A4.A2024001.h27v05.061.2024010140610.hdf
years = ["2024", "2023", "2022"]
years = ["2024"]
dates = ("03-01", "10-31")
output_root_dir = ".\\data\\MODIS\\"
main(region, asset_name, modis_tile_id, years, dates, tile_id, output_root_dir)
main(
region_file,
asset_name,
modis_tile_id,
years,
dates,
tile_id,
target_crs,
output_root_dir,
)

View File

@ -19,7 +19,7 @@ RTC-S1 数据产品由美国宇航局喷气推进实验室 (JPL) OPERA 项目组
5. 地形校正 (Terrain Flattening)
6. UTM投影重采样
- 产品特性:
- 时间覆盖: 2021-01-01 (持续更新)
- 时间覆盖: 2014-01-01 (历史数据已全覆盖, 持续更新最新数据)
- 空间分辨率: 30
- 数据格式: GeoTIFF/HDF5
- 进一步预处理:
@ -29,13 +29,14 @@ RTC-S1 数据产品由美国宇航局喷气推进实验室 (JPL) OPERA 项目组
-------------------------------------------------------------------------------
Authors: Hong Xie
Last Updated: 2025-03-10
Last Updated: 2025-10-20
===============================================================================
"""
import os
import sys
import glob
from pathlib import Path
import json
import time
from datetime import datetime
@ -48,10 +49,11 @@ import numpy as np
import xarray as xr
from rioxarray import open_rasterio
sys.path.append("D:/NASA_EarthData_Script")
# 添加项目根目录到 sys.path确保作为独立脚本运行时也能正确导入
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from utils.common_utils import setup_dask_environment, clip_image, mosaic_images
from HLS_SuPER.HLS_Su import earthdata_search
from src.utils.common_utils import setup_dask_environment, clip_image, mosaic_images
from src.HLS_SuPER.HLS_Su import earthdata_search
def download_granule(
@ -141,7 +143,7 @@ def convert_to_db(image: xr.DataArray) -> xr.DataArray:
def process_sar_granule(
url_basenames: list[str],
output_dir: str,
output_dir: Path,
date: str,
tile_id: str,
roi: list,
@ -156,7 +158,7 @@ def process_sar_granule(
----------
url_basenames : list[str]
用于文件匹配的 RTC-S1 数据下载链接的文件名列表
output_dir : str
output_dir : Path
输出根目录
date : str
日期, 格式为 YYYY%j
@ -243,7 +245,7 @@ def process_sar_granule(
def process_sar_static_granule(
url_basenames: list[str], output_dir: str, tile_id: str, roi: list, clip=True
url_basenames: list[str], output_dir: Path, tile_id: str, roi: list, clip=True
) -> bool:
"""
OPERA RTC-S1-STATIC 静态数据预处理
@ -254,7 +256,7 @@ def process_sar_static_granule(
----------
url_basenames : list[str]
用于文件匹配的 RTC-S1-STATIC 数据下载链接的文件名列表
output_dir: str
output_dir: Path
输出根目录
tile_id: str
tile id
@ -327,12 +329,12 @@ def process_sar_static_granule(
def main(
asset_name: list[str],
region: list,
region_file: Path,
years: list[str | int],
output_dir: str,
output_dir: Path,
tile_id: str,
):
bbox = tuple(list(region.total_bounds))
region = read_file(region_file)
# 示例文件名称: OPERA_L2_RTC-S1_T040-083952-IW1_20240605T102012Z_20240612T053743Z_S1A_30_v1.0_VH.tif
static_name = ["OPERA_L2_RTC-S1-STATIC_V1"]
sar_output_dir = os.path.join(output_dir, "RTC_S1")
@ -353,7 +355,7 @@ def main(
year_results_dir, f"RTC_S1_{tile_id}_{year}_results_urls.json"
)
year_temporal = (f"{year}-06-01T00:00:00", f"{year}-08-31T23:59:59")
year_results = earthdata_search(asset_name, year_temporal, roi=bbox)
year_results = earthdata_search(asset_name, year_temporal, region_file=region_file)
results_urls.extend(year_results)
with open(year_results_file, "w") as f:
json.dump(year_results, f)
@ -361,7 +363,7 @@ def main(
with open(results_urls_file, "w") as f:
json.dump(results_urls, f)
static_granule_urls = earthdata_search(static_name, roi=bbox)
static_granule_urls = earthdata_search(static_name, region_file=region_file)
static_url_basenames = [
os.path.basename(url) for sublist in static_granule_urls for url in sublist
]
@ -454,10 +456,11 @@ if __name__ == "__main__":
earthaccess.login(persist=True)
asset_name = ["OPERA_L2_RTC-S1_V1"]
tile_id = "49REL"
region = read_file(f".\\data\\vectors\\{tile_id}.geojson")
# region = read_file(f".\\data\\vectors\\{tile_id}.geojson")
region_file = Path(f".\\data\\vectors\\{tile_id}.geojson")
# region = read_file("./data/vectors/wuling_guanqu_polygon.geojson")
# years = ["2024", "2023", "2022"]
years = ["2024"]
output_dir = ".\\data\\SAR"
os.makedirs(output_dir, exist_ok=True)
main(asset_name, region, years, output_dir, tile_id)
output_dir = Path(".\\data\\SAR")
output_dir.mkdir(parents=True, exist_ok=True)
main(asset_name, region_file, years, output_dir, tile_id)

View File

@ -28,16 +28,17 @@ import h5py
from osgeo import gdal
import xarray as xr
sys.path.append("D:/NASA_EarthData_Script")
# 添加项目根目录到 sys.path确保作为独立脚本运行时也能正确导入
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from utils.common_params import EASE2_GRID_PARAMS, EPSG
from utils.common_utils import (
from src.utils.common_params import EASE2_GRID_PARAMS, EPSG
from src.utils.common_utils import (
array_to_raster,
load_band_as_arr,
reproject_image,
setup_dask_environment,
)
from HLS_SuPER.HLS_Su import earthdata_search
from src.HLS_SuPER.HLS_Su import earthdata_search
def convert(source_h5_path: str, target_tif_path: str, date: str) -> None:

View File

@ -55,7 +55,7 @@ import logging
import time
from datetime import datetime, timedelta
sys.path.append("D:\NASA_EarthData_Script")
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
class getInsituData:

View File

@ -3,24 +3,24 @@
===============================================================================
HLS Processing and Exporting Reformatted Data (HLS_PER)
This module contains functions to conduct subsetting and quality filtering of
This module contains functions to conduct subsetting and quality filtering of
search results.
-------------------------------------------------------------------------------
Authors: Cole Krehbiel, Mahsa Jami, and Erik Bolch
Editor: Hong Xie
Last Updated: 2025-03-30
Last Updated: 2026-05-18
===============================================================================
"""
import logging
import os
import sys
import logging
import numpy as np
from datetime import datetime as dt
import xarray as xr
import rioxarray as rxr
import dask.distributed
import numpy as np
import rioxarray as rxr
import xarray as xr
def create_output_name(url, band_dict):
@ -52,9 +52,13 @@ def open_hls(
Some HLS Landsat scenes have the metadata in the wrong location.
"""
# Open using rioxarray
da = rxr.open_rasterio(url, chunks=chunk_size, mask_and_scale=False).squeeze(
"band", drop=True
)
try:
da = rxr.open_rasterio(url, chunks=chunk_size, mask_and_scale=False).squeeze(
"band", drop=True
)
except Exception as e:
logging.warning(f"Failed to open {url}: {e}")
return None
# (Add) 若未获取到数据, 则返回 None
if da is None:
return None
@ -181,7 +185,6 @@ def process_granule(
os.path.isfile(f"{output_dir}/{create_output_name(url, band_dict)}")
for url in granule_urls
):
# First Handle Quality Layer
# (Add) 简化原有的冗余处理, 仅处理质量层, 并最后移除质量层下载url
if quality_filter:
@ -198,6 +201,12 @@ def process_granule(
if not os.path.isfile(quality_output_file):
# Open Quality Layer
qa_da = open_hls(quality_url, roi, clip, scale, chunk_size)
# (Add) 若返回的da为None, 则表示该url对应的文件不存在/无法访问, 跳过整个granule
if qa_da is None:
logging.warning(
f"Quality layer {quality_url} not accessible. Skipping granule."
)
return
# Write Output
# (Add) 添加压缩选项参数 compress
# compress 参数是源自 rioxarray 继承的 rasterio 的选项, 可以参考 https://rasterio.readthedocs.io/en/latest/api/rasterio.enums.html#rasterio.enums.Compression
@ -208,6 +217,12 @@ def process_granule(
)
else:
qa_da = open_hls(quality_output_file, roi, clip, scale, chunk_size)
# (Add) 若返回的da为None, 则表示本地文件无法访问, 跳过整个granule
if qa_da is None:
logging.warning(
f"Local quality file {quality_output_file} not accessible. Skipping granule."
)
return
logging.info(
f"Existing quality file {quality_output_name} found in {output_dir}."
)

View File

@ -3,23 +3,91 @@
===============================================================================
This module contains functions related to searching and preprocessing HLS data.
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Authors: Mahsa Jami, Cole Krehbiel, and Erik Bolch
Contact: lpdaac@usgs.gov
Contact: lpdaac@usgs.gov
Editor: Hong Xie
Last Updated: 2025-02-20
Last Updated: 2026-05-17
===============================================================================
"""
# Import necessary packages
import numpy as np
import logging
import os
import sys
from pathlib import Path
import earthaccess
import geopandas as gpd
import numpy as np
from shapely.geometry import box
from shapely.geometry.polygon import orient
def ensure_ccw(geom):
"""
Ensure the exterior ring of the polygon is counterclockwise.
"""
if geom.exterior.is_ccw:
return geom # Already counterclockwise
else:
return orient(geom, sign=1.0) # Make it counterclockwise
def format_roi(roi: Path):
"""
Determines if submitted ROI is a file or bbox coordinates.
If a file, opens a GeoJSON or shapefile and creates a list of polygon vertices in the correct order. If the file has multiple polygons it will use a unary union convex hull of the external bounds.
If bbox coordinates, creates a geodataframe with a single Polygon geometry.
Returns a geopandas dataframe for clipping and a list of vertices for searching.
"""
if os.path.isfile(roi): # and roi.endswith(("geojson", "shp")):
try:
# Open ROI if file
roi = gpd.read_file(roi)
# Check if ROI is in Geographic CRS, if not, convert to it
if not roi.crs.is_geographic:
roi = roi.to_crs("EPSG:4326")
logging.info(
"Note: ROI submitted is being converted to Geographic CRS (EPSG:4326)"
)
# (Add) 添加对多种几何图形类型的支持, 将MultiPolygon合并为Polygon
if len(roi) > 1 or roi.geometry[0].geom_type == "MultiPolygon":
# Merge all Polygon geometries and create external boundary
logging.info(
"Multiple polygons detected. Creating single geometry of external coordinates."
)
single_geometry = roi.union_all().convex_hull
roi = gpd.GeoDataFrame(geometry=[single_geometry], crs=roi.crs)
roi["geometry"] = roi["geometry"].apply(ensure_ccw)
# List Vertices in correct order for search
# (Add) 使用外包矩形坐标简化提交检索时使用的坐标, 仅支持逆时针
vertices_list = list(roi.geometry[0].exterior.coords)
except (FileNotFoundError, ValueError):
sys.exit(
f"The GeoJSON/shapefile is either not valid or could not be found.\nPlease double check the name and provide the absolute path to the file or make sure that it is located in {os.getcwd()}"
)
else:
# If bbox coordinates are submitted
bbox = tuple(map(float, roi.strip("'\"").split(",")))
print(bbox)
# Convert bbox to a geodataframe for clipping
roi = gpd.GeoDataFrame(geometry=[box(*bbox)], crs="EPSG:4326")
roi["geometry"] = roi["geometry"].apply(ensure_ccw)
vertices_list = list(roi.geometry[0].exterior.coords)
return (roi, vertices_list)
def earthdata_search(
asset_name: list,
dates: tuple = None,
roi: list = None,
region_file: Path = None,
tile_id: str = None,
hours: tuple = None,
log=False,
@ -30,28 +98,32 @@ def earthdata_search(
For example:
- MODIS: MCD43A3, MCD43A4, MOD11A1, MOD11A2, ...
- SMAP: SPL3SMP_E, SPL4SMGP, ...
- DEM: NASADEM_HGT, NASADEM_SC, ...
- GPM: GPM_3IMERGDL, ...
- DEM: NASADEM_HGT, NASADEM_SC, ALOS_PSR_RTC_HIGH, ALOS_PSR_RTC_LOW, ...
"""
# Search for data
if dates and not roi:
# 全球范围的数据集不需要roi参数
if not region_file:
# 全球范围的数据集不需要roi参数, 如 SMAP, GPM
results = earthaccess.search_data(
short_name=list(asset_name),
temporal=dates,
)
elif roi and not dates:
# 适用非时间序列数据, 如 DEM
results = earthaccess.search_data(
short_name=list(asset_name),
bounding_box=roi,
)
else:
results = earthaccess.search_data(
short_name=list(asset_name),
bounding_box=roi,
temporal=dates,
)
# 基于 roi 的外包多边形进行数据检索
roi, vertices_list = format_roi(region_file)
if not dates:
# 适用非时间序列数据, 如 DEM
results = earthaccess.search_data(
short_name=list(asset_name),
polygon=vertices_list,
)
else:
results = earthaccess.search_data(
short_name=list(asset_name),
polygon=vertices_list,
temporal=dates,
)
# 根据瓦片ID过滤影像
if tile_id:

View File

@ -1,11 +1,11 @@
# -*- coding: utf-8 -*-
"""
===============================================================================
HLS Subsetting, Processing, and Exporting Reformatted Data Prep Script
HLS Subsetting, Processing, and Exporting Reformatted Data Prep Script
Authors: Cole Krehbiel, Mahsa Jami, and Erik Bolch
Contact: lpdaac@usgs.gov
Editor: Hong Xie
Last Updated: 2025-01-15
Last Updated: 2026-05-17
===============================================================================
"""
@ -15,24 +15,24 @@ Last Updated: 2025-01-15
# TODO Add ZARR as output option
import argparse
import sys
import json
import logging
import os
import shutil
import logging
import sys
import time
import json
import earthaccess
from shapely.geometry import box
import geopandas as gpd
from datetime import datetime as dt
import dask.distributed
import earthaccess
sys.path.append("D:/NASA_EarthData_Script")
# 添加项目根目录到 sys.path确保作为独立脚本运行时也能正确导入
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from utils.common_utils import setup_dask_environment
from HLS_Su import hls_search
from HLS_PER import process_granule, create_timeseries_dataset
from src.HLS_SuPER.HLS_PER import create_timeseries_dataset, process_granule
from src.HLS_SuPER.HLS_Su import format_roi, hls_search
from src.utils.common_utils import setup_dask_environment
def parse_arguments():
@ -170,59 +170,6 @@ def parse_arguments():
return parser.parse_args()
def format_roi(roi):
"""
Determines if submitted ROI is a file or bbox coordinates.
If a file, opens a GeoJSON or shapefile and creates a list of polygon vertices in the correct order. If the file has multiple polygons it will use a unary union convex hull of the external bounds.
If bbox coordinates, creates a geodataframe with a single Polygon geometry.
Returns a geopandas dataframe for clipping and a list of vertices for searching.
"""
if os.path.isfile(roi): # and roi.endswith(("geojson", "shp")):
try:
# Open ROI if file
roi = gpd.read_file(roi)
# (Add) 添加对多种几何图形类型的支持, 将MultiPolygon合并为Polygon
if len(roi) > 1 or roi.geometry[0].geom_type == "MultiPolygon":
# Merge all Polygon geometries and create external boundary
logging.info(
"Multiple polygons detected. Creating single geometry of external coordinates."
)
single_geometry = roi.unary_union.convex_hull
roi = gpd.GeoDataFrame(geometry=[single_geometry], crs=roi.crs)
# Check if ROI is in Geographic CRS, if not, convert to it
if roi.crs.is_geographic:
# List Vertices in correct order for search
# (Add) 使用外包矩形坐标作为检索使用的坐标
minx, miny, maxx, maxy = roi.total_bounds
bounding_box = box(minx, miny, maxx, maxy)
vertices_list = list(bounding_box.exterior.coords)
else:
roi_geographic = roi.to_crs("EPSG:4326")
logging.info(
"Note: ROI submitted is being converted to Geographic CRS (EPSG:4326)"
)
vertices_list = list(roi_geographic.geometry[0].exterior.coords)
except (FileNotFoundError, ValueError):
sys.exit(
f"The GeoJSON/shapefile is either not valid or could not be found.\nPlease double check the name and provide the absolute path to the file or make sure that it is located in {os.getcwd()}"
)
else:
# If bbox coordinates are submitted
bbox = tuple(map(float, roi.strip("'\"").split(",")))
print(bbox)
# Convert bbox to a geodataframe for clipping
roi = gpd.GeoDataFrame(geometry=[box(*bbox)], crs="EPSG:4326")
vertices_list = list(roi.geometry[0].exterior.coords)
return (roi, vertices_list)
def format_dates(start, end):
# Strip Quotes
start = start.strip("'").strip('"')
@ -248,6 +195,8 @@ def format_tile_id(tile_id):
"""
(Add) 格式化tile_id参数
"""
if tile_id is None:
return None
tile_id = tile_id.strip("'").strip('"')
return str(tile_id)
@ -404,6 +353,21 @@ def confirm_action(prompt):
"""
Prompts the user to confirm an action.
"""
non_interactive = not sys.stdin.isatty() or os.environ.get(
"HLS_SUPER_NON_INTERACTIVE", ""
).lower() in {"1", "true", "yes", "y"}
if non_interactive:
prompt_l = prompt.lower()
if "use the existing results file" in prompt_l:
return False
if "overwrite the existing results file" in prompt_l:
return True
if "proceed with processing" in prompt_l:
return True
if "temporary directory" in prompt_l:
return True
return True
while True:
response = input(prompt).lower()
if response in ["y", "yes"]:
@ -481,6 +445,7 @@ def main():
# Defaults to the current directory
output_dir = os.getcwd() + os.sep
os.makedirs(output_dir, exist_ok=True)
logging.info(f"Output directory set to: {output_dir}")
# Format/Validate Dates
@ -569,6 +534,10 @@ def main():
)
logging.info(filter_log)
if results_count == 0:
logging.warning("No data found matching the search criteria. Exiting.")
sys.exit("No data found. Processing aborted.")
# Confirm Processing
if not confirm_action("Do you want to proceed with processing? (y/n)"):
sys.exit("Processing aborted.")
@ -607,7 +576,7 @@ def main():
logging.info("Processing...")
tasks = [
dask.delayed(process_granule)(
granule_url,
granule_urls,
roi=roi,
clip=clip,
quality_filter=qf,
@ -617,7 +586,7 @@ def main():
bit_nums=[1, 3],
chunk_size=chunk_size,
)
for granule_url in results_urls
for granule_urls in results_urls
]
dask.compute(*tasks)
@ -641,7 +610,7 @@ def main():
# End Timer
total_time = time.time() - start_time
logging.info(f"Processing complete. Total time: {round(total_time,2)}s, ")
logging.info(f"Processing complete. Total time: {round(total_time, 2)}s.")
if __name__ == "__main__":

9
src/__init__.py Normal file
View File

@ -0,0 +1,9 @@
# -*- coding: utf-8 -*-
"""算法源码包
包含所有遥感数据处理算法模块
- HLS_SuPER: HLS 遥感数据处理
- DATA_SuPER: 多源遥感数据处理MODISSMAPDEMSARGPM
- Basemap_SuPER: 底图数据处理
- utils: 通用工具函数
"""

View File

@ -5,29 +5,29 @@
-------------------------------------------------------------------------------
Authors: Hong Xie
Last Updated: 2025-07-07
Last Updated: 2025-09-11
===============================================================================
"""
import os
import sys
import glob
import json
import logging
import os
import sys
from datetime import datetime
import earthaccess
import geopandas as gpd
import numpy as np
import pandas as pd
import xarray as xr
from affine import Affine
from osgeo import gdal, gdal_array
from shapely import box
import xarray as xr
from rasterio.enums import Resampling
from rasterio.merge import merge
from rioxarray.merge import merge_arrays
from rioxarray import open_rasterio
import geopandas as gpd
import matplotlib.pyplot as plt
from rioxarray.merge import merge_arrays
from shapely import box
gdal.UseExceptions()
@ -229,10 +229,19 @@ def setup_dask_environment():
"""
Passes RIO environment variables to dask workers for authentication.
"""
import os
import rasterio
cookie_file_path = os.path.expanduser("~/cookies.txt")
candidate_cookie_paths = [
os.environ.get("EARTHDATA_COOKIE_FILE"),
os.path.expanduser("~/.urs_cookies"),
os.path.expanduser("~/.cookies"),
os.path.expanduser("~/cookies.txt"),
]
cookie_file_path = next(
(p for p in candidate_cookie_paths if p and os.path.exists(p)),
os.path.expanduser("~/cookies.txt"),
)
global env
gdal_config = {
@ -250,24 +259,35 @@ def setup_dask_environment():
env.__enter__()
def setup_logging(log_file: str = "dask_worker.log"):
def setup_logging(log_file: str = None):
"""
在Dask工作进程中设置logging
设置logging
Parameters
----------
log_file : str, optional
日志文件路径, by default "dask_worker.log"
日志文件路径, by default None
"""
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s:%(asctime)s ||| %(message)s",
handlers=[
logging.StreamHandler(sys.stdout),
logging.FileHandler(log_file),
],
)
if log_file is None:
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s:%(asctime)s ||| %(message)s",
handlers=[logging.StreamHandler(sys.stdout)],
encoding="utf-8", # Python 3.9+ 支持此参数
)
else:
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s:%(asctime)s ||| %(message)s",
handlers=[
logging.StreamHandler(sys.stdout),
logging.FileHandler(log_file, encoding="utf-8"),
],
encoding="utf-8", # Python 3.9+ 支持此参数
)
def load_band_as_arr(org_tif_path, band_num=1):
"""
@ -384,7 +404,7 @@ def array_to_raster(
except AttributeError:
# For backwards compatibility with older version of GDAL
rast = gdal.Open(gdal_array.GetArrayFilename(data))
except:
except Exception:
rast = gdal_array.OpenArray(data)
rast.SetGeoTransform(transform)
rast.SetProjection(wkt)
@ -410,15 +430,26 @@ def create_quality_mask(quality_data, bit_nums: list = [0, 1, 2, 3, 4, 5]):
def clip_image(
image: xr.DataArray | xr.Dataset, roi: gpd.GeoDataFrame, clip_by_box=True
):
image: xr.DataArray | xr.Dataset, roi: gpd.GeoDataFrame = None, clip_by_box=True
) -> xr.DataArray | xr.Dataset | None:
"""
Clip Image data to ROI.
args:
image (xarray.DataArray | xarray.Dataset): 通过 rioxarray.open_rasterio 加载的图像数据.
roi (gpd.GeoDataFrame): 感兴趣区数据.
clip_by_box (bool): 是否使用 bbox 进行裁剪, 默认为 True.
Parameters
----------
image : xarray.DataArray | xarray.Dataset
通过 rioxarray.open_rasterio 加载的图像数据.
roi : gpd.GeoDataFrame, optional
感兴趣区数据.
clip_by_box : bool, optional
是否使用 bbox 进行裁剪, 默认为 True.
Returns
-------
xarray.DataArray | xarray.Dataset | None
裁剪后的图像数据. 若裁剪后数据全为无效值, 则返回 None.
"""
if roi is None:
@ -443,20 +474,30 @@ def clip_image(
return image_cliped
def clip_roi_image(file_path: str, grid: gpd.GeoDataFrame) -> xr.DataArray | None:
def clip_roi_image(
file_path: str, grid: gpd.GeoDataFrame = None
) -> xr.DataArray | None:
"""
按研究区范围裁剪影像
args:
file_path (str): 待裁剪影像路径
grid (gpd.GeoDataFrame): 格网范围
return:
raster_cliped (xr.DataArray): 裁剪后的影像
Parameters
----------
file_path : str
待裁剪影像路径
grid : gpd.GeoDataFrame, optional
格网范围, 默认为 None.
Returns
-------
raster_cliped : xr.DataArray
裁剪后的影像
"""
raster = open_rasterio(file_path)
try:
doy = os.path.basename(file_path).split(".")[3]
except Exception as e:
except Exception:
doy = None
if doy:
raster.attrs["DOY"] = doy
@ -487,15 +528,27 @@ def reproject_image(
target_crs: str = None,
target_shape: tuple = None,
target_image: xr.DataArray = None,
):
) -> xr.DataArray:
"""
Reproject Image data to target CRS or target data.
args:
image (xarray.DataArray): 通过 rioxarray.open_rasterio 加载的图像数据.
target_crs (str): Target CRS, eg. EPSG:4326.
target_shape (tuple): Target shape, eg. (1000, 1000).
target_image (xarray.DataArray): Target image, eg. rioxarray.open_rasterio 加载的图像数据.
Parameters
----------
image : xarray.DataArray
通过 rioxarray.open_rasterio 加载的图像数据.
target_crs : str, optional
Target CRS, eg. EPSG:4326.
target_shape : tuple, optional
Target shape, eg. (1000, 1000).
target_image : xarray.DataArray, optional
Target image, eg. rioxarray.open_rasterio 加载的图像数据.
Returns
-------
xarray.DataArray
重投影后的图像数据.
"""
if target_image is not None:
# 使用 target_image 进行重投影匹配
@ -506,9 +559,9 @@ def reproject_image(
target_image.shape[1] == image.shape[1]
and target_image.shape[2] == image.shape[2]
):
# 若判断为降尺度/等尺度, 则直接使用 cubic_spline 重采样投影到目标影像
# 若判断为降尺度/等尺度, 则直接使用 cubic 双三次插值重采样投影到目标影像
image_reprojed = image.rio.reproject_match(
target_image, resampling=Resampling.cubic_spline
target_image, resampling=Resampling.cubic
)
else:
# print("target_image shape is not match with image shape", image.shape, "to", target_image.shape)
@ -520,7 +573,7 @@ def reproject_image(
# 使用 target_crs 进行重投影
reproject_kwargs = {
"dst_crs": target_crs,
"resampling": Resampling.cubic_spline,
"resampling": Resampling.cubic,
}
if target_shape is not None:
reproject_kwargs["shape"] = target_shape
@ -563,7 +616,7 @@ def mosaic_images(
tif_list,
nodata=nodata,
method=method,
resampling=Resampling.cubic_spline,
resampling=Resampling.cubic,
)
# 将结果重新构建为 xarray 数据集
# 单张SAR影像直接读取 transform: 233400.0 30.0 0.0 3463020.0 0.0 -30.0
@ -780,6 +833,8 @@ def plot(data, title=None, cmap="gray"):
title (str): 标题
cmap (str): 颜色映射
"""
import matplotlib.pyplot as plt
plt.imshow(data)
plt.title(title)
plt.axis("off") # 关闭坐标轴

214
src/utils/raw_to_cog.py Normal file
View File

@ -0,0 +1,214 @@
# -*- coding: utf-8 -*-
"""
===============================================================================
传统GeoTIFF转COG
COG (Cloud Optimized GeoTIFF) 是一种优化设计的 GeoTIFF 文件格式, 特别适用于云环境下
的存储和访问. 自带分块存储功能与影像金字塔, 能够在请求时按需加载, 实现了高效的压缩和传输.
-------------------------------------------------------------------------------
Authors: CVEO Team
Last Updated: 2026-04-22
===============================================================================
"""
import logging
import os
import sys
from pathlib import Path
import numpy as np
from osgeo import gdal
# 添加项目根目录到 sys.path确保作为独立脚本运行时也能正确导入
BASE_DIR = Path(__file__).parent.parent.parent
sys.path.append(str(BASE_DIR))
from src.utils.common_utils import setup_logging
gdal.UseExceptions()
# 设置 GDAL 选项以优化性能
gdal.SetConfigOption("GDAL_NUM_THREADS", "ALL_CPUS")
gdal.SetConfigOption("GDAL_CACHEMAX", "1024")
def check_and_get_image_info(ds):
"""
检查影像基本信息以及是否存在 NaN NoData
Returns
-------
tuple: (has_invalid, original_no_data)
has_invalid: bool
是否存在 NaN NoData
original_no_data: float or None
原影像定义的 NoData
"""
width = ds.RasterXSize
height = ds.RasterYSize
band_count = ds.RasterCount
logging.info(f"影像尺寸: {width}x{height}, 波段数: {band_count}")
has_invalid = False
original_no_data = None
# 读取第一个波段的 NoData 值作为参考
if band_count > 0:
original_no_data = ds.GetRasterBand(1).GetNoDataValue()
if original_no_data is not None:
logging.info(f"波段 1: 原影像定义的 NoData 值为 {original_no_data}")
has_invalid = True
else:
logging.info("波段 1: 原影像未定义 NoData 值")
# 仅检查是否存在 NaN 值
return has_invalid, original_no_data
def tif_to_cog(
input_path: str,
output_path: str,
output_type: int = gdal.GDT_Float32,
no_data: float = np.nan,
compress: str = "DEFLATE",
):
"""
将传统 GeoTIFF 转换为 COG 格式
Parameters
----------
input_path : str
输入 GeoTIFF 文件路径
output_path : str
输出 COG 文件路径
output_type : int, optional
输出数据类型, 本质是整数索引, 默认 float32, by default gdal.GDT_Float32
no_data : float, optional
无效值, 默认 np.nan, by default np.nan
compress : str, optional
压缩算法, 可选 "DEFLATE" "LZW", by default "DEFLATE"
"""
logging.info(f"输入文件: {input_path}")
if not os.path.exists(str(input_path)):
input_path = str(input_path).replace(".tif", "(2).tif")
if not os.path.exists(input_path):
logging.error(f"无法找到输入文件: {input_path}")
return
ds = gdal.Open(input_path)
if ds is None:
logging.error(f"无法打开输入文件: {input_path}")
return
# 1. 检查影像信息和无效值
has_invalid, original_no_data = check_and_get_image_info(ds)
# 2. 确定最终使用的 NoData 值
target_no_data = no_data
# 如果用户没有指定特定的 no_data (即传入的是 nan), 则尝试沿用原数据的 nodata
# 如果原数据也没有 nodata, 则根据 output_type 设置默认值
if np.isnan(target_no_data):
if original_no_data is None:
if output_type == gdal.GDT_Int16:
target_no_data = -32768
elif output_type == gdal.GDT_Float32:
target_no_data = -9999
elif output_type == gdal.GDT_Byte:
target_no_data = None
else:
target_no_data = np.nan
logging.warning(
f"原数据未设无效值且未指定新无效值, 现根据输出类型设为: {target_no_data}"
)
else:
target_no_data = original_no_data
logging.info(f"沿用原数据无效值: {target_no_data}")
else:
logging.info(f"使用指定的新无效值: {target_no_data}")
# 3. 转换逻辑
# 若原始影像存在无效值(NaN 或 NoData) 且 指定了新的 NoData 值, 且 新旧 NoData 值不一致 / 存在 NaN 值时使用重映射 (gdal.Warp)
use_warp = False
if target_no_data is not None and not np.isnan(target_no_data):
if has_invalid:
if original_no_data != target_no_data:
use_warp = True
logging.info(
f"原始影像存在无效值且已指定新无效值 ({original_no_data} -> {target_no_data}), 将使用 Warp 进行处理"
)
else:
pass
# 开始转换
logging.info("开始转换为 COG 格式...")
creation_options = [
f"COMPRESS={compress}",
"PREDICTOR=2", # 差值预测, 利于影像压缩
"NUM_THREADS=ALL_CPUS",
"BIGTIFF=IF_SAFER",
# "ZLEVEL=4", # 压缩级别, 支持 1-9, 默认为 6, COG 不支持设置
# "TILED=YES", # 分块参数, COG 格式自带分块, 不支持手动设置
# "PHOTOMETRIC=RGB", # 可视化参数, COG 格式不支持设置
]
if use_warp:
# 使用 Warp
warp_options = gdal.WarpOptions(
format="COG", # 输出为 COG 格式, 自动构建金字塔
outputType=output_type,
dstNodata=target_no_data,
# srcNodata 未指定时, 将自动读取源数据的 nodata
# 若源数据存在 NaN 却未定义 nodata, 将自动处理 float 的 NaN
# srcNodata=original_no_data,
creationOptions=creation_options,
callback=gdal.TermProgress_nocb, # 进度回调, 不显示进度条
multithread=True,
)
gdal.Warp(str(output_path), ds, options=warp_options)
else:
# 使用 Translate (无需修改原数据时使用, 更高效)
# 但 gdal.Translate 无法将原始 NoData 直接转为新的 nodata
translate_options = gdal.TranslateOptions(
format="COG",
outputType=output_type,
noData=target_no_data,
creationOptions=creation_options,
callback=gdal.TermProgress_nocb,
)
gdal.Translate(str(output_path), ds, options=translate_options)
logging.info(f"结果已保存到: {output_path}")
ds = None
return
def main(input_path, output_path, output_type=gdal.GDT_Float32):
input_path = Path(input_path)
output_dir = Path(output_path).parent
os.makedirs(output_dir, exist_ok=True)
# 配置日志记录
log_file = output_dir / "toCOG.log"
setup_logging(str(log_file))
# 关于无效值
# Float32 类型可以设为 -9999
if output_type == gdal.GDT_Float32:
no_data = -9999
else:
no_data = np.nan
tif_to_cog(str(input_path), str(output_path), output_type, no_data=no_data)
if __name__ == "__main__":
# 输入目录: 包含分块tif影像的根目录
input_root = Path(r"D:\CVEOdata\RS_Data\Terrain\test")
# 输出目录: 存放最终COG结果的目录
output_root = Path(r"D:\CVEOdata\RS_Data\Terrain")
input_path = input_root / "GLO30.DSM.2014.Hubei.30m.tif"
output_path = output_root / "GLO30.DSM.2014.Hubei.30m.tif"
output_type = gdal.GDT_Float32
main(input_path, output_path, output_type)

275
src/utils/raw_to_rgba.py Normal file
View File

@ -0,0 +1,275 @@
# -*- coding: utf-8 -*-
"""
===============================================================================
将原始数值影像转换为 8bit RGBA 图像
- 支持 Red, Green, Blue 等单波段影像
- 支持多波段影像
1. 将原始数据中 NoData 值替换为 NaN, 并将其设置为 Alpha 通道
2. 对比度线性拉伸至 0-255;
3. 合并包含 Alpha 通道在内的 4 个波段;
4. 保存为 uint8 格式 RGBA 图像
-------------------------------------------------------------------------------
Authors: CVEO Team
Last Updated: 2026-05-18
===============================================================================
"""
import os
from pathlib import Path
from typing import List, Optional
import numpy as np
import xarray as xr
from osgeo import gdal
from PIL import Image
from rioxarray import open_rasterio
gdal.UseExceptions()
def normalize_band(
band: np.ndarray,
method: str = "minmax",
percentile: float = 2.0,
nodata: Optional[float] = None,
) -> np.ndarray:
"""
将波段数据归一化到 0-255 uint8
Parameters
----------
band: np.ndarray
输入波段数据
method: str, optional
归一化方法 ('minmax', 'percentile', 'clip'), by default "minmax"
percentile: float, optional
百分比截断参数 (仅用于 'percentile' 方法), by default 2.0
nodata: float, optional
NoData , 如果不为 None, 则将该值替换为 NaN, by default None
"""
band = np.asarray(band, dtype=np.float32)
# 处理 NoData 值
if nodata is not None:
band[np.isclose(band, nodata)] = np.nan
if method == "clip":
# 直接截断并转换
band[~np.isfinite(band)] = 0.0
return np.clip(band, 0.0, 255.0).astype(np.uint8)
if method == "percentile":
# 百分比截断并线性拉伸
lower = np.nanpercentile(band, percentile)
upper = np.nanpercentile(band, 100 - percentile)
else: # minmax
# 使用0值处理NaN和负值
band = np.where(np.isnan(band) | (band < 0), 0, band)
lower = np.min(band)
upper = np.max(band)
if upper == lower:
return np.zeros_like(band, dtype=np.uint8)
stretched = (band - lower) / (upper - lower) * 255.0
# 处理拉伸后的 NaN 值 (原始数据中的 NaN 会传播)
stretched[~np.isfinite(stretched)] = 0.0
return np.clip(stretched, 0, 255).astype(np.uint8)
def render_image_rgb(
tiff_path: Path,
bands_lst: List[int],
normalization_method: str,
percentile: float = 0,
save_tiff: bool = True,
output_suffix: str = "_rgb.png",
) -> str:
"""
通用 GDAL 数据源处理函数, 读取多波段图像并转换为 RGBA 图像.
Parameters
----------
tiff_path : Path
输入 TIFF 图像路径.
bands_lst : List[int]
RGB 波段索引列表 (GDAL 索引, 1 开始).
normalization_method : str
归一化方法 ('minmax', 'percentile', 'clip').
percentile : float, optional
百分比截断参数 (仅用于 'percentile' 方法), by default 0.
save_tiff : bool, optional
是否保存为 RGBA TIFF 文件, by default True.
output_suffix : str, optional
输出文件后缀, by default "_rgb.png".
Returns
-------
str
生成的 PNG 文件路径.
"""
if not tiff_path.exists():
raise FileNotFoundError(f"input not found: {tiff_path}")
if len(bands_lst) < 3:
raise ValueError("rgb need at least 3 bands")
ds = gdal.Open(str(tiff_path))
xsize = ds.RasterXSize
ysize = ds.RasterYSize
bands = bands_lst[:3]
chans = []
# 初始化 Alpha 掩膜 (全 True 表示全不透明/有效)
alpha_mask = np.ones((ysize, xsize), dtype=bool)
for b in bands:
band = ds.GetRasterBand(int(b))
nodata = band.GetNoDataValue()
arr = band.ReadAsArray(0, 0, xsize, ysize)
# 更新 Alpha 掩膜: 任何波段为 NoData 或 NaN则该像素透明
if nodata is not None:
alpha_mask &= ~np.isclose(arr, nodata)
if np.issubdtype(arr.dtype, np.floating):
alpha_mask &= np.isfinite(arr)
chans.append(
normalize_band(
arr, method=normalization_method, percentile=percentile, nodata=nodata
)
)
# 生成包含 Alpha 通道的 RGBA
alpha_channel = np.where(alpha_mask, 255, 0).astype(np.uint8)
chans.append(alpha_channel)
rgba = np.dstack(chans)
# 构建输出路径
base_path = str(tiff_path)
if base_path.lower().endswith(".tif"):
base_path = base_path[:-4]
png_path = base_path + output_suffix
if save_tiff:
tiff_output_path = base_path + "_rgb.tif"
driver = gdal.GetDriverByName("GTiff")
# 创建 4 波段 (RGBA)
out_ds = driver.Create(tiff_output_path, xsize, ysize, 4, gdal.GDT_Byte)
out_ds.SetProjection(ds.GetProjection())
out_ds.SetGeoTransform(ds.GetGeoTransform())
for i, chan in enumerate(chans):
out_ds.GetRasterBand(i + 1).WriteArray(chan)
out_ds.FlushCache()
out_ds = None
ds = None
Image.fromarray(rgba, mode="RGBA").save(png_path, format="PNG")
print(f"已完成RGBA图像转换, 保存至: {png_path}")
return png_path
def combine_bands_to_rgb(
red_path: str, green_path: str, blue_path: str, output_path: str
) -> None:
"""
将红, 绿, 蓝三个单波段图像文件合并为 RGBA 图像 (GeoTIFF).
Parameters
----------
red_path : str
红色波段文件路径.
green_path : str
绿色波段文件路径.
blue_path : str
蓝色波段文件路径.
output_path : str
输出 RGBA 图像路径.
"""
paths = [red_path, green_path, blue_path]
for path in paths:
if not os.path.exists(path):
raise FileNotFoundError(f"文件不存在: {path}")
# 读取三个波段数据
bands_data = []
ref_band = None
mask = None
for path in paths:
da = open_rasterio(path, masked=True).squeeze(dim="band", drop=True)
if ref_band is None:
ref_band = da
# 更新掩膜: masked=True 时 nodata 会被转换为 NaN
current_mask = np.isfinite(da.values)
if mask is None:
mask = current_mask
else:
mask &= current_mask
# 使用 minmax 方法
bands_data.append(normalize_band(da.values, method="minmax"))
# 不再设置 NoData 值,使用 Alpha 通道表示透明度
alpha_channel = np.where(mask, 255, 0).astype(np.uint8)
bands_data.append(alpha_channel)
# 合并四个波段为 RGBA 图像
rgb_array = xr.DataArray(
np.dstack(bands_data),
dims=("y", "x", "band"),
coords={"band": [1, 2, 3, 4], "y": ref_band.y, "x": ref_band.x},
)
# 转置维度顺序以符合rioxarray要求
rgb_array = rgb_array.transpose("band", "y", "x")
# 写入元数据
rgb_array.rio.write_crs(ref_band.rio.crs, inplace=True)
rgb_array.rio.write_transform(ref_band.rio.transform(), inplace=True)
# 保存为TIFF文件
rgb_array.rio.to_raster(output_path, driver="COG", dtype="uint8")
print(f"RGBA图像已保存到: {output_path}")
return
if __name__ == "__main__":
# tif_dir = "D:\\open-earthdata-cli\\data\\HLS\\2024\\2024012"
# red_path = os.path.join(
# tif_dir, "HLS.S30.T49RGP.2024012T031101.v2.0.RED.subset.tif"
# )
# green_path = os.path.join(
# tif_dir, "HLS.S30.T49RGP.2024012T031101.v2.0.GREEN.subset.tif"
# )
# blue_path = os.path.join(
# tif_dir, "HLS.S30.T49RGP.2024012T031101.v2.0.BLUE.subset.tif"
# )
# output_path = os.path.join(tif_dir, "HLS.S30.T49RGP.2024012T031101.v2.0.RGB.tif")
# tif_dir = "D:\\open-earthdata-cli\\data\\HLS\\2025\\2025011"
# red_path = os.path.join(
# tif_dir, "HLS.S30.T49RGP.2025011T031009.v2.0.RED.subset.tif"
# )
# green_path = os.path.join(
# tif_dir, "HLS.S30.T49RGP.2025011T031009.v2.0.GREEN.subset.tif"
# )
# blue_path = os.path.join(
# tif_dir, "HLS.S30.T49RGP.2025011T031009.v2.0.BLUE.subset.tif"
# )
# output_path = os.path.join(tif_dir, "HLS.S30.T49RGP.2025011T031009.v2.0.RGB.tif")
# combine_bands_to_rgb(red_path, green_path, blue_path, output_path)
imgs_dir = Path(r"D:\CVEOProjects\prjaef\aef-backend-demo\media\imgs")
for region_name in ["Target1", "Target2", "Target3", "Target4"]:
for year in range(2015, 2025 + 1):
img_name = f"S1_S2_{region_name}_{year}.tif"
img_tif_path = imgs_dir / region_name / img_name
bands_lst = [3, 2, 1] # GDAL 波段索引从1开始
# render_image_rgb(img_tif_path, bands_lst, "clip")
render_image_rgb(img_tif_path, bands_lst, "percentile", percentile=1)

229
src/utils/sr2rgb_light.py Normal file
View File

@ -0,0 +1,229 @@
"""
使用 GDAL BuildVRT Translate 快速将影像批量镶嵌并转换为 8bit RGB (Light version)
1. 将输入目录下的所有 tif 文件合并为一个 VRT 文件;
2. 采用 GDAL ComputeStatistics 进行快速统计计算, 并根据指定的百分比截断值计算有效数据范围;
3. VRT 文件转换为 8bit RGB 格式的 COG 文件, 并保存到指定目录.
"""
import os
import sys
import logging
from pathlib import Path
from osgeo import gdal
import numpy as np
# 添加项目根目录到 sys.path确保作为独立脚本运行时也能正确导入
BASE_DIR = Path(__file__).parent.parent.parent
sys.path.append(str(BASE_DIR))
from src.utils.common_utils import setup_logging
gdal.UseExceptions()
# 设置 GDAL 选项以优化性能
gdal.SetConfigOption("GDAL_NUM_THREADS", "ALL_CPUS")
gdal.SetConfigOption("GDAL_CACHEMAX", "1024")
def compute_band_stats(band: gdal.Band, approx_ok: bool = True, no_data: float = np.nan, percentile: float = 2.0):
"""
使用 GDAL ComputeStatistics 计算波段的统计信息
实现类似 QGIS 图层符号化中的累计计数削减效果
Parameters
----------
band : gdal.Band
GDAL 波段对象
approx_ok : bool, optional
是否允许近似统计, 速度快, by default True
no_data : float, optional
无效值, 默认 np.nan, by default np.nan
percentile : float, optional
百分比截断值 (2 表示 2%-98% 范围), by default 2.0
Returns
-------
tuple
(min_val, max_val)
"""
try:
# 使用 GDAL 的 ComputeStatistics 计算统计信息
stats = band.ComputeStatistics(approx_ok)
if stats is not None and len(stats) >= 2:
min_val, max_val = stats[0], stats[1]
# 如果需要更精确的百分比统计, 使用直方图
if percentile > 0:
# 创建直方图
hist_min = min_val
hist_max = max_val
bucket_count = 256 # 直方图桶数
# 计算直方图
hist = band.GetHistogram(
hist_min, hist_max, bucket_count,
include_out_of_range=False, approx_ok=True
)
if hist and len(hist) > 0:
# 计算累积直方图
hist_array = np.array(hist, dtype=np.float32)
cum_hist = np.cumsum(hist_array)
total = cum_hist[-1]
# 找到百分比位置
# 考虑到无效值为 -9999, 所以直接从 0 开始, 尽量还原原始数据范围
if no_data <= 0:
lower_thresh = 0.0
else:
lower_thresh = total * (percentile / 100.0)
upper_thresh = total * (1 - percentile / 100.0)
# 找到对应的值
lower_idx = np.searchsorted(cum_hist, lower_thresh)
upper_idx = np.searchsorted(cum_hist, upper_thresh)
if lower_idx < len(hist_array) and upper_idx < len(hist_array):
# 将索引映射回实际值
bin_width = (hist_max - hist_min) / bucket_count
min_val = hist_min + lower_idx * bin_width
max_val = hist_min + upper_idx * bin_width
return min_val, max_val
except Exception as e:
logging.warning(f"计算统计信息失败: {str(e)}")
# 如果失败, 返回默认值
return 0.0, 1.0
def vrt_to_8bit_simple(input_dir: str | Path, output_path: str | Path,
no_data: float = np.nan, percentile: float = 1.0):
"""
使用 gdal.Translate 将指定目录下的 tif 文件合并并转换为 8bit
使用 GDAL 内置统计功能快速计算并转换, NaN 值将被赋值为 0
Parameters
----------
input_dir : str | Path
输入 TIF 文件所在目录
output_path : str | Path
输出文件路径
no_data : float, optional
输入影像中的无效值, 默认 np.nan
percentile : float, optional
百分比截断值, 默认 0% 99%
"""
input_dir = Path(input_dir)
output_path = Path(output_path)
# 1. 获取所有tif文件
tif_files = [
str(f)
for f in input_dir.iterdir()
if f.is_file() and f.suffix.lower() == ".tif"
]
if not tif_files:
logging.warning(f"{input_dir} 中没有找到 tif 文件.")
return
logging.info(f"1) 找到 {len(tif_files)} 个 tif 文件")
vrt_path = input_dir / "merged.vrt"
vrt_ds = None
try:
# 2. 创建VRT
logging.info("2) 构建 VRT 镶嵌...")
vrt_options = gdal.BuildVRTOptions(
srcNodata=np.nan, # 输入影像中的无效值, 设为 NaN 防止拼接处存在缝隙
VRTNodata=no_data, # 输出 VRT 中的无效值
)
vrt_ds = gdal.BuildVRT(str(vrt_path), tif_files, options=vrt_options)
if vrt_ds is None:
logging.error("构建 VRT 失败.")
return
# 获取波段数与影像尺寸
num_bands = vrt_ds.RasterCount
width = vrt_ds.RasterXSize
height = vrt_ds.RasterYSize
logging.info(f"波段数: {num_bands}, 影像尺寸: {width}x{height}")
# 3. 使用 GDAL 快速计算统计信息
logging.info("3) 使用 GDAL 计算影像统计信息...")
scale_params = []
for band_idx in range(1, num_bands + 1):
band = vrt_ds.GetRasterBand(band_idx)
# 使用 GDAL 内置函数计算统计
min_val, max_val = compute_band_stats(
band, no_data=no_data, percentile=percentile)
logging.info(
f"波段 {band_idx}: 有效值范围 [{min_val:.4f}, {max_val:.4f}]")
# 将有效值映射到 1-255, 保留 0 作为 NoData 专用值
# 避免暗部像素 (如水体、阴影) 会被映射为 0, 从而被误判为透明缺失
scale_params.append([min_val, max_val, 1, 255])
# 4. 使用gdal_translate转换为8bit, 无效值统一设为0
if output_path.exists():
logging.warning(f"结果文件已存在: {output_path}")
return
logging.info("4) 使用 gdal_translate 转换为 8bit COG (NaN->0)...")
translate_options = gdal.TranslateOptions(
format="COG", # 输出为 COG 格式, 自动构建金字塔
scaleParams=scale_params,
outputType=gdal.GDT_Byte,
noData=0, # 输出 NoData 设为 0
creationOptions=[
"COMPRESS=DEFLATE",
"ZLEVEL=4", # DEFLATE 压缩级别, 支持 1-9, 默认为 6
"PREDICTOR=2", # 差值预测, 利于影像压缩
"NUM_THREADS=ALL_CPUS",
"BIGTIFF=IF_SAFER",
# "TILED=YES", # COG 格式自带分块, 不支持手动设置
# "PHOTOMETRIC=RGB", # COG 格式不支持 photometric 参数
],
callback=gdal.TermProgress_nocb # 进度回调, 不显示进度条
)
gdal.Translate(str(output_path), vrt_ds, options=translate_options)
logging.info(f"已保存: {output_path}")
# 释放VRT数据集, 确保文件句柄释放
vrt_ds = None
except Exception as e:
logging.error(f"处理过程中出现异常: {str(e)}")
import traceback
logging.error(traceback.format_exc())
finally:
# 清理资源
vrt_ds = None
def main(input_dir, output_path):
input_dir = Path(input_dir)
output_path = Path(output_path)
output_root = output_path.parent
os.makedirs(output_root, exist_ok=True)
log_file = output_root / "sr2rgb_light.log"
setup_logging(str(log_file))
logging.info("开始批量处理 (Light Mode)...")
logging.info(f"输入目录: {input_dir}")
logging.info(f"输出文件: {output_path}")
vrt_to_8bit_simple(input_dir, output_path, no_data=-9999.0, percentile=1.0)
if __name__ == "__main__":
# 输入目录: 包含分块tif影像的根目录
input_root = Path(r"D:\CVEOdata\RS_Data\2025_S2_COG")
# 输出目录: 存放最终RGB镶嵌结果的目录
output_root = Path(r"D:\CVEOdata\RS_Data\2025_S2_RGB_COG")
rgb_file = output_root / "Hubei_Sentinel-2_2025_RGB_COG.tif"
main(input_root, rgb_file)