cosmicraysandearthquakes/README.md
root 965870e25a Add README with project overview and quickstart
Covers findings summary, repo structure, data sources, statistical
methods, pre-registration integrity check, and GPU benchmark results.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 00:57:29 +02:00

# Cosmic Rays and Earthquakes: A Rigorous Replication Study
A reproducible, GPU-accelerated statistical pipeline that tests the claimed
correlation between galactic cosmic-ray flux and global seismicity
([Homola et al. 2023](https://doi.org/10.3390/rs15010200)).

---

## Summary of findings
| Stage | Key result |
|---|---|
| In-sample replication (1976–2019) | r(+15 d) = +0.31 **raw**; drops to **+0.04** after solar-cycle detrending |
| Global surrogate test (IAAFT, 100 k surrogates) | p = 1.00 after detrending — **not significant** |
| Geographic localisation (34 stations × 207 cells) | No distance–lag dependence; β = 0.45 d/1000 km, p = 0.21 |
| Out-of-sample validation (2020–2025) | Results in `results/out_of_sample_report.md` |

The raw r = 0.31 is an artefact of the shared ~11-year solar cycle modulating
both cosmic-ray flux and seismicity. After removing this trend the signal
is indistinguishable from phase-randomised noise.
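
How a shared slow cycle can inflate a correlation between two otherwise unrelated series can be illustrated with synthetic data. The sketch below is not the pipeline's code: it uses made-up series, assumes a sinusoid with a known 11-year period, and removes it by least squares as a simpler stand-in for the Hodrick-Prescott filter. The name `remove_cycle` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily series over 44 years (about four solar cycles).
n_days = 44 * 365
t = np.arange(n_days)
period_days = 11 * 365.25
cycle = np.sin(2 * np.pi * t / period_days)  # shared ~11-year modulation

# Two otherwise independent series that both carry the cycle.
x = cycle + rng.standard_normal(n_days)
y = cycle + rng.standard_normal(n_days)


def remove_cycle(series, t, period_days):
    """Subtract the least-squares fit of a constant plus one sinusoid."""
    w = 2 * np.pi * t / period_days
    design = np.column_stack([np.ones_like(w), np.sin(w), np.cos(w)])
    coef, *_ = np.linalg.lstsq(design, series, rcond=None)
    return series - design @ coef


r_raw = np.corrcoef(x, y)[0, 1]
r_detrended = np.corrcoef(
    remove_cycle(x, t, period_days), remove_cycle(y, t, period_days)
)[0, 1]

print(f"raw r = {r_raw:.2f}, detrended r = {r_detrended:.2f}")
```

With unit-variance noise and a unit-amplitude cycle, the raw correlation is expected to sit near one third, while the detrended value should scatter around zero.
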

---

## Repository structure
```
scripts/                          Analysis pipeline (run in order)
  01_download_data.py             Download NMDB / USGS / SIDC data
  02_homola_replication.py        Replicate Homola et al. cross-correlation
  03_stress_test.py               Surrogate significance test (CPU + GPU)
  04_detrended_analysis.py        HP-filter / sunspot detrending
  05_geographic_localisation.py   Station × grid-cell BH-FDR scan
  06_check_data_availability.py   Determine reliable OOS data window
  07_out_of_sample.py             Pre-registered out-of-sample validation
  08_combined_timeseries.py       1976-to-present sinusoid fit + Bayes factor
  benchmark_gpu.py                GPU vs CPU surrogate benchmark
src/crq/                          Python package
  ingest/                         NMDB, USGS, SIDC, station-roster loaders
  preprocess/                     Hodrick-Prescott and linear detrending
  stats/                          Phase-randomisation / IAAFT surrogates (CPU + GPU)
results/                          Generated outputs (committed)
  prereg_predictions.md           Pre-registration (timestamped before OOS run)
  data_availability.json          Reliable data window determination
  homola_replication.json         In-sample cross-correlation results
  detrended_results.json          Post-detrending results
  geo_localisation.json           Geographic localisation scan
  out_of_sample_metrics.json      OOS validation metrics (post-run)
  figs/                           Plots
config/
  stations.yaml                   NMDB station list with coordinates
tests/                            pytest suite (29 tests)
```
---
## Quickstart
```bash
# 1. Clone and install
git clone https://github.com/pingud98/cosmicraysandearthquakes.git
cd cosmicraysandearthquakes
python -m venv .venv && source .venv/bin/activate
pip install -e .
# 2. Download data (NMDB, USGS M≥4.5, SIDC sunspots)
python scripts/01_download_data.py
# 3. Run in-sample analysis
python scripts/02_homola_replication.py
python scripts/03_stress_test.py --n-surrogates 10000
python scripts/04_detrended_analysis.py
# 4. Geographic scan (GPU recommended)
python scripts/05_geographic_localisation.py --n-surrogates 1000
# 5. Check data availability for out-of-sample window
python scripts/06_check_data_availability.py
# 6. Pre-registered out-of-sample validation (writes prereg BEFORE analysis)
python scripts/07_out_of_sample.py --study-start 2020-01-01 --study-end 2025-04-29
# 7. Combined timeseries with sinusoid fit
python scripts/08_combined_timeseries.py
```
GPU (CUDA) is used automatically when available; otherwise the scripts fall
back to CPU with a warning. The surrogate tests were benchmarked on a
Tesla M40 (12 GB):

| Method | CPU | GPU | Speedup |
|---|---|---|---|
| Phase randomisation | 61.7 s | 20.9 s | 2.9× |
| IAAFT | 227.8 s | 175.6 s | 1.3× |
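
One common way to get this kind of automatic CPU fallback is to pick the array module at import time. The sketch below is an assumption about how such a fallback could look, not a copy of what `src/crq` does; `get_array_module` is a hypothetical helper name.

```python
import warnings

import numpy as np


def get_array_module():
    """Return CuPy if it imports and a CUDA device responds, else NumPy."""
    try:
        import cupy as cp

        cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
        return cp
    except Exception as exc:  # CuPy missing or CUDA unavailable
        warnings.warn(f"GPU unavailable ({exc!r}); falling back to CPU")
        return np


xp = get_array_module()

# The same FFT call works on either backend.
spectrum = xp.fft.rfft(xp.arange(8, dtype=xp.float64))
print(xp.__name__, spectrum.shape)
```

Because CuPy mirrors much of the NumPy API, code written against `xp` can run unchanged on both backends, which is what makes a side-by-side benchmark like the one above straightforward.
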
---
## Data sources
| Source | Content | Access |
|---|---|---|
| [NMDB](https://www.nmdb.eu) | Hourly neutron monitor counts, pressure-corrected | Free, HTTP |
| [USGS FDSN](https://earthquake.usgs.gov/fdsnws/event/1/) | M ≥ 4.5 global catalogue | Free, HTTP |
| [SIDC SILSO](https://www.sidc.be/silso/datafiles) | Daily international sunspot number | Free, HTTP |

Data are downloaded by the scripts and cached locally in `data/`.
No data files are committed to this repository.

---

## Pre-registration
`results/prereg_predictions.md` was committed to git **before** any
out-of-sample data were loaded (UTC 2026-04-22T00:44:30, commit `1832f73`).
This prevents post-hoc hypothesis adjustment. Verify with:
```bash
git log --diff-filter=A results/prereg_predictions.md
```
---
## Statistical methods
- **Surrogate test**: Phase randomisation preserves the power spectrum of the
cosmic-ray series; 100,000 surrogates give a p-value resolution of 10⁻⁵.
- **IAAFT**: Iterated amplitude-adjusted FT surrogates (preserves amplitude
distribution as well as power spectrum).
- **Detrending**: Hodrick-Prescott filter (λ = 1.29 × 10⁵) for the in-sample
window; linear detrending for the OOS window, which is shorter than one solar cycle.
- **FDR control**: Benjamini-Hochberg at q = 0.05 for the geographic scan.
- **Bayes factor**: BIC approximation comparing sinusoidal vs constant model
on the full 1976-to-present correlation timeseries.
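
The phase-randomisation surrogate test can be sketched in a few lines of NumPy. This is a minimal CPU-only illustration on synthetic data, not the project's implementation: the function names are hypothetical, the test statistic is the zero-lag correlation rather than the lagged one, and the add-one correction in the p-value is an assumption of this sketch.

```python
import numpy as np


def phase_randomise(x, rng):
    """Return a surrogate with the same power spectrum as x but random phases."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    phases[0] = 0.0  # leave the DC term (the mean) untouched
    if n % 2 == 0:
        phases[-1] = 0.0  # the Nyquist term must stay real
    # Rotating each Fourier coefficient keeps its amplitude unchanged.
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


def surrogate_p_value(x, y, n_surrogates, rng):
    """P-value for |corr(x, y)| against phase-randomised copies of x."""
    observed = abs(np.corrcoef(x, y)[0, 1])
    exceed = 0
    for _ in range(n_surrogates):
        r = abs(np.corrcoef(phase_randomise(x, rng), y)[0, 1])
        exceed += int(r >= observed)
    # Add-one correction (an assumption here) so p is never exactly zero.
    return (exceed + 1) / (n_surrogates + 1)


rng = np.random.default_rng(1)
x = rng.standard_normal(512)
y = rng.standard_normal(512)
p = surrogate_p_value(x, y, n_surrogates=200, rng=rng)
print(f"p = {p:.3f}")
```

The smallest p-value this scheme can report is roughly one over the number of surrogates, which is why 100,000 surrogates correspond to a resolution of about 10⁻⁵.
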
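
For the FDR control step, the Benjamini-Hochberg step-up procedure is short enough to show in full. The sketch below is a generic textbook version in plain Python with made-up p-values; the function name is hypothetical and the geographic scan may organise its inputs differently.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Illustrative p-values only, not results from this study.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values))
```

Note the step-up behaviour: a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list passes its threshold.
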
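
The BIC approximation to the Bayes factor can be written as BF ≈ exp((BIC_const − BIC_sin) / 2). The sketch below assumes Gaussian residuals and a sinusoid of fixed, known period, so that the fit is linear with three parameters; if the period were also fitted, the parameter count would be higher. Names and data are illustrative, not taken from `08_combined_timeseries.py`.

```python
import numpy as np


def bic(residuals, n_params):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    n = len(residuals)
    rss = float(np.sum(residuals**2))
    return n * np.log(rss / n) + n_params * np.log(n)


def bayes_factor_sinusoid(t, y, period):
    """Approximate Bayes factor of a sinusoid-plus-constant vs constant model."""
    y = np.asarray(y, dtype=float)
    w = 2 * np.pi * np.asarray(t, dtype=float) / period
    design = np.column_stack([np.ones_like(w), np.sin(w), np.cos(w)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    bic_sin = bic(y - design @ coef, n_params=3)
    bic_const = bic(y - np.mean(y), n_params=1)
    return float(np.exp((bic_const - bic_sin) / 2))


# Synthetic check: one series with a real sinusoid, one with noise only.
rng = np.random.default_rng(3)
t = np.arange(200)
noise = 0.2 * rng.standard_normal(200)
bf_signal = bayes_factor_sinusoid(t, 0.5 * np.sin(2 * np.pi * t / 50) + noise, period=50)
bf_noise = bayes_factor_sinusoid(t, noise, period=50)
print(f"BF (signal) = {bf_signal:.3g}, BF (noise only) = {bf_noise:.3g}")
```

For long series or strong signals the exponent can become very large, so working with the log Bayes factor is usually the safer choice.
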
---
## Requirements
- Python 3.10
- numpy, pandas, scipy, matplotlib, pyyaml, requests
- CuPy 12 (optional, for GPU acceleration)
Install: `pip install -e .`
---
## Citation
If you use this pipeline please cite:
> Homola P. et al. (2023). *Indication of Correlation between Cosmic-Ray
> Flux and Lightning Activity*. Remote Sensing 15(1), 200.
> https://doi.org/10.3390/rs15010200
and link to this repository.
---
## Licence
MIT