4a Moderate principal conclusion: 'no statistically credible evidence'
replaced with 'no statistically robust evidence within tested frameworks'
plus clause acknowledging untested mechanisms (threshold effects,
nonlinear triggering, extreme-event coupling).
4b Fix independence-assumption framing: 'physically invalid seismic metric'
→ 'physically inappropriate'; separate claim about naive p-values now
reads 'statistically invalid under the violated serial-independence
assumption (autocorrelation inflates nominal sample size by 3–5×)'.
4c 3.9σ detrended peak no longer called 'marginal': figure caption and
nearby text now read 'nominally significant but sensitive to Neff
estimation, at a lag inconsistent with the claimed mechanism'.
4d CR terminology standardised: 'global CR index' defined precisely at
first use in Data section (dimensionless, station mean ≡ 1, ≥3 stations
per bin); 'CR flux' retained only for the physical quantity.
4e Geographic conclusion reframed: 'no local mechanism' replaced with
'inconsistent with simple wave-propagation or diffusion models, but does
not rule out instantaneous global coupling mechanisms (e.g. atmospheric
electric field modulation)'.
4f Bayes factor qualified: parenthetical after BF=0.75 notes the restricted
two-hypothesis model space and cites Kass & Raftery (1995).
4g OOS limitations expanded: explicit paragraph noting the 5-yr window
with no complete solar cycle, limited statistical power, and that
p_global=0.100 is consistent with—rather than strong evidence against
—the claim; OOS failure downweighted vs 44-yr in-sample analysis.
4h Confirmatory vs exploratory scope table added (Table tab:prereg_scope)
listing pre-specified parameters and which analyses were confirmatory
vs post-hoc exploratory.
4i Alternative solar-cycle confounds acknowledged in Discussion: geomagnetic
activity cycles and long-term seismic clustering added as alternative
explanations for the shared 10-year periodicity.
4j Fixed: 'Out-of-sample poos from script 08' removed from Limitations;
GitHub URL removed from abstract (kept in Data Availability only);
Discussion run-on sentences broken up.
4k Abstract rewritten to ≤250 words in five-part structure: prior claim,
data/methods (two sentences), key quantitative results, scoped
interpretation, one-sentence limitation. Causal language qualified.
Also adds KassRaftery1995 to refs.bib. PDF: 36 pages.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Cosmic Rays and Earthquakes: A Rigorous Replication Study
A reproducible, GPU-accelerated statistical pipeline that tests the claimed correlation between galactic cosmic-ray flux and global seismicity (Homola et al. 2023).
## Summary of findings
| Stage | Key result |
|---|---|
| In-sample replication (1976–2019) | r(+15 d) = +0.31 raw; drops to +0.04 after solar-cycle detrending |
| Global surrogate test (IAAFT, 100 k surrogates) | p = 1.00 after detrending — not significant |
| Geographic localisation (34 stations × 207 cells) | No distance–lag dependence; β = −0.45 d/1000 km, p = 0.21 |
| Out-of-sample validation (2020–2025) | Results in results/out_of_sample_report.md |
The raw r = 0.31 is an artefact of the shared ~11-year solar cycle, which modulates both cosmic-ray flux and seismicity. After this trend is removed, the signal is indistinguishable from phase-randomised noise.
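The confounding argument can be reproduced with synthetic data. In this toy sketch, two series share nothing but an ~11-year sinusoid, yet they still correlate at a +15-day lag. Regressing the sinusoid out of each series removes that correlation. The 5-day binning, the noise model and the sinusoid-regression detrending are all assumptions. The detrending step stands in for the Hodrick-Prescott filter used in 04_detrended_analysis.py, and the function names are hypothetical.

```python
import numpy as np


def lagged_pearson(x: np.ndarray, y: np.ndarray, lag: int) -> float:
    """Pearson r between x[t] and y[t + lag]; lag in bins, positive = y follows x."""
    if lag >= 0:
        a, b = x[: len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: len(y) + lag]
    return float(np.corrcoef(a, b)[0, 1])


def remove_cycle(z: np.ndarray, t: np.ndarray, period: float) -> np.ndarray:
    """Subtract a least-squares fit of (mean + sinusoid at a known period)."""
    w = 2 * np.pi / period
    design = np.column_stack([np.ones(t.size), np.sin(w * t), np.cos(w * t)])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return z - design @ coef


rng = np.random.default_rng(0)
bins_per_year = 73                   # 5-day bins (assumed binning)
t = np.arange(44 * bins_per_year)    # ~44 years, as in 1976-2019
period = 11 * bins_per_year          # ~11-year solar cycle, in bins
cycle = np.sin(2 * np.pi * t / period)

# Two synthetic series that share the cycle but are otherwise independent
cr_index = cycle + rng.normal(size=t.size)
quake_rate = cycle + rng.normal(size=t.size)

lag = 3                              # 3 bins x 5 days = +15 days
r_raw = lagged_pearson(cr_index, quake_rate, lag)
r_detrended = lagged_pearson(remove_cycle(cr_index, t, period),
                             remove_cycle(quake_rate, t, period), lag)
print(f"raw r = {r_raw:+.2f}, detrended r = {r_detrended:+.2f}")
```

With unit-variance noise, the raw r should land near the ratio of cycle variance to total variance (0.5 / 1.5 ≈ 0.33). The detrended r should scatter around zero.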
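A locally propagating signal would make the lag of the correlation peak grow with distance from the neutron-monitor station. For a signal travelling at speed v, lag ≈ d / v, so the fitted slope would be 1/v. The following is a minimal sketch of a distance-lag regression of the kind summarised by β above. It assumes that each station-cell pair contributes one peak lag and that distances are great-circle (haversine) distances. The data are synthetic and the function names are hypothetical. Pairs that share a station are not independent, so a naive OLS p-value would be optimistic. This sketch does not show how 05_geographic_localisation.py assesses significance.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (broadcastable)."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def lag_distance_slope(distance_km, peak_lag_days):
    """OLS slope of peak lag against distance, in days per 1000 km."""
    slope, _intercept = np.polyfit(np.ravel(distance_km) / 1000.0,
                                   np.ravel(peak_lag_days), deg=1)
    return float(slope)


# Synthetic example: 34 stations x 207 cells, lags generated with a known slope
rng = np.random.default_rng(1)
st_lat, st_lon = rng.uniform(-60, 60, 34), rng.uniform(-180, 180, 34)
cell_lat, cell_lon = rng.uniform(-60, 60, 207), rng.uniform(-180, 180, 207)
dist = haversine_km(st_lat[:, None], st_lon[:, None],
                    cell_lat[None, :], cell_lon[None, :])
true_slope = 2.0  # days per 1000 km, i.e. a signal moving at 500 km/day
lags = 5.0 + true_slope * dist / 1000.0 + rng.normal(scale=1.0, size=dist.shape)
print(f"estimated slope = {lag_distance_slope(dist, lags):.2f} d/1000 km")
```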
## Repository structure

```
scripts/                          Analysis pipeline (run in order)
  01_download_data.py             Download NMDB / USGS / SIDC data
  02_homola_replication.py        Replicate Homola et al. cross-correlation
  03_stress_test.py               Surrogate significance test (CPU + GPU)
  04_detrended_analysis.py        HP-filter / sunspot detrending
  05_geographic_localisation.py   Station × grid-cell BH-FDR scan
  06_check_data_availability.py   Determine reliable OOS data window
  07_out_of_sample.py             Pre-registered out-of-sample validation
  08_combined_timeseries.py       1976-to-present sinusoid fit + Bayes factor
  benchmark_gpu.py                GPU vs CPU surrogate benchmark
src/crq/                          Python package
  ingest/                         NMDB, USGS, SIDC, station-roster loaders
  preprocess/                     Hodrick-Prescott and linear detrending
  stats/                          Phase-randomisation / IAAFT surrogates (CPU + GPU)
results/                          Generated outputs (committed)
  prereg_predictions.md           Pre-registration (timestamped before OOS run)
  data_availability.json          Reliable data window determination
  homola_replication.json         In-sample cross-correlation results
  detrended_results.json          Post-detrending results
  geo_localisation.json           Geographic localisation scan
  out_of_sample_metrics.json      OOS validation metrics (post-run)
  figs/                           Plots
config/
  stations.yaml                   NMDB station list with coordinates
tests/                            pytest suite (29 tests)
```
## Quickstart

```bash
# 1. Clone and install
git clone https://github.com/pingud98/cosmicraysandearthquakes.git
cd cosmicraysandearthquakes
python -m venv .venv && source .venv/bin/activate
pip install -e .

# 2. Download data (NMDB, USGS M≥4.5, SIDC sunspots)
python scripts/01_download_data.py

# 3. Run in-sample analysis
python scripts/02_homola_replication.py
python scripts/03_stress_test.py --n-surrogates 10000
python scripts/04_detrended_analysis.py

# 4. Geographic scan (GPU recommended)
python scripts/05_geographic_localisation.py --n-surrogates 1000

# 5. Check data availability for out-of-sample window
python scripts/06_check_data_availability.py

# 6. Pre-registered out-of-sample validation (writes prereg BEFORE analysis)
python scripts/07_out_of_sample.py --study-start 2020-01-01 --study-end 2025-04-29

# 7. Combined timeseries with sinusoid fit
python scripts/08_combined_timeseries.py
```
GPU acceleration (CUDA, via CuPy) is used automatically when available; otherwise the scripts fall back to CPU and print a warning. The surrogate tests were benchmarked on a Tesla M40 (12 GB):
| Method | CPU | GPU | Speedup |
|---|---|---|---|
| Phase randomisation | 61.7 s | 20.9 s | 2.9× |
| IAAFT | 227.8 s | 175.6 s | 1.3× |
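The following is a minimal sketch of the GPU-or-CPU dispatch described above. It assumes CuPy as the GPU backend, since CuPy is the optional dependency listed under Requirements. The function names are hypothetical, and the repository's own dispatch may differ. The idea is that array code written against a module handle `xp` runs unchanged on either backend.

```python
import warnings

import numpy as np


def select_array_module(prefer_gpu: bool = True):
    """Return the cupy module if a CUDA device is usable, otherwise numpy."""
    if not prefer_gpu:
        return np
    try:
        import cupy as cp  # optional dependency

        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
        reason = "no CUDA device found"
    except Exception as exc:  # ImportError, or a CUDA runtime/driver error
        reason = repr(exc)
    warnings.warn(f"GPU unavailable ({reason}); falling back to NumPy on CPU.",
                  RuntimeWarning, stacklevel=2)
    return np


def power_spectrum(x, xp):
    """|FFT|^2 of a real series, computed on whichever backend xp is."""
    spec = xp.abs(xp.fft.rfft(xp.asarray(x))) ** 2
    # Bring the result back to host memory; cupy provides asnumpy, numpy does not
    to_host = getattr(xp, "asnumpy", np.asarray)
    return to_host(spec)


xp = select_array_module()
spec = power_spectrum(np.random.default_rng(0).normal(size=4096), xp)
```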
## Data sources
| Source | Content | Access |
|---|---|---|
| NMDB | Hourly neutron monitor counts, pressure-corrected | Free, HTTP |
| USGS FDSN | M ≥ 4.5 global catalogue | Free, HTTP |
| SIDC SILSO | Daily international sunspot number | Free, HTTP |
Data are downloaded by the scripts and cached locally in data/.
No data files are committed to this repository.
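The following is a minimal sketch of download-once caching for the USGS catalogue. The query parameters (`format`, `starttime`, `endtime`, `minmagnitude`, `orderby`) are those of the public USGS FDSN event web service. The helper names, the `data/usgs/` layout and the per-year chunking are assumptions, not necessarily what 01_download_data.py does. Chunking by year keeps each request below the service's per-request event limit.

```python
from datetime import date
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

USGS_EVENT_QUERY = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def usgs_query_url(start: date, end: date, min_magnitude: float = 4.5) -> str:
    """FDSN event-service URL for a global catalogue slice, returned as CSV."""
    params = {
        "format": "csv",
        "starttime": start.isoformat(),
        "endtime": end.isoformat(),
        "minmagnitude": min_magnitude,
        "orderby": "time-asc",
    }
    return f"{USGS_EVENT_QUERY}?{urlencode(params)}"


def fetch_cached(url: str, cache_path: Path, timeout: float = 60.0) -> Path:
    """Download url to cache_path unless that file already exists."""
    if cache_path.exists():
        return cache_path                       # cache hit: no network access
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    part = cache_path.with_name(cache_path.name + ".part")
    with urlopen(url, timeout=timeout) as response:
        part.write_bytes(response.read())
    part.replace(cache_path)                    # rename only after a complete download
    return cache_path


def cache_usgs_years(first_year: int, last_year: int,
                     data_dir: Path = Path("data/usgs")) -> list[Path]:
    """Fetch one CSV per calendar year (hypothetical file naming)."""
    return [
        fetch_cached(usgs_query_url(date(y, 1, 1), date(y + 1, 1, 1)),
                     data_dir / f"usgs_m4.5_{y}.csv")
        for y in range(first_year, last_year + 1)
    ]
```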
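The paper defines the global CR index as a dimensionless quantity with each station's mean normalised to 1 and at least three stations per bin. Below is a minimal sketch of one way to compute it. It assumes that pressure-corrected NMDB counts have already been aggregated to a common bin size and stored as a bins × stations array, with NaN marking gaps. The function name and this missing-data convention are assumptions.

```python
import numpy as np


def global_cr_index(counts, min_stations: int = 3) -> np.ndarray:
    """Dimensionless global cosmic-ray index from per-station count rates.

    counts: array of shape (n_bins, n_stations), NaN where a station has no data.
    Each station is divided by its own mean over the record (station mean = 1),
    then the normalised stations are averaged bin by bin. Bins with fewer than
    `min_stations` reporting stations are returned as NaN.
    """
    counts = np.asarray(counts, dtype=float)
    normalised = counts / np.nanmean(counts, axis=0, keepdims=True)
    n_reporting = np.isfinite(normalised).sum(axis=1)
    index = np.full(counts.shape[0], np.nan)
    enough = n_reporting >= min_stations
    index[enough] = np.nanmean(normalised[enough], axis=1)
    return index


# Three bins, four stations; the last bin has only two reporting stations
counts = np.array([[100.0, 200.0, 50.0, np.nan],
                   [110.0, 220.0, 55.0, 1000.0],
                   [90.0, np.nan, np.nan, 1000.0]])
print(global_cr_index(counts))
```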
## Pre-registration
results/prereg_predictions.md was committed to git before any out-of-sample data were loaded (UTC 2026-04-22T00:44:30, commit 1832f73). This guards against post-hoc adjustment of the hypotheses. To check when the file was first added, run:

```bash
git log --diff-filter=A results/prereg_predictions.md
```
## Statistical methods
- Surrogate test: Phase randomisation preserves the power spectrum of the cosmic-ray series; 100,000 surrogates give a p-value resolution of 10⁻⁵.
- IAAFT: Iterative amplitude-adjusted Fourier-transform surrogates, which preserve the amplitude distribution exactly and the power spectrum approximately.
- Detrending: Hodrick-Prescott filter (λ = 1.29 × 10⁵) for the in-sample window; linear detrending for the out-of-sample window, which spans less than one solar cycle.
- FDR control: Benjamini-Hochberg at q = 0.05 for the geographic scan.
- Bayes factor: BIC approximation comparing sinusoidal vs constant model on the full 1976-to-present correlation timeseries.
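Below is a minimal NumPy sketch of the phase-randomisation test. Each surrogate keeps the Fourier amplitudes of the cosmic-ray series but draws new phases. The p-value counts the surrogates whose statistic reaches the observed one. Several details are assumptions: the one-sided comparison, the `(k + 1) / (N + 1)` counting convention and the function names. Under that convention the smallest attainable p-value is 1/(N + 1), which is close to the 10⁻⁵ resolution quoted for 100,000 surrogates.

```python
import numpy as np


def phase_randomised_surrogate(x, rng: np.random.Generator) -> np.ndarray:
    """Surrogate with exactly the power spectrum of x but random Fourier phases."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    phases[0] = 0.0              # leave the mean (zero-frequency term) unchanged
    if n % 2 == 0:
        phases[-1] = 0.0         # the Nyquist term of an even-length series must stay real
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


def surrogate_p_value(x, y, statistic, n_surrogates: int,
                      rng: np.random.Generator) -> float:
    """One-sided Monte Carlo p-value: randomise x, keep y fixed."""
    observed = statistic(x, y)
    null = np.array([statistic(phase_randomised_surrogate(x, rng), y)
                     for _ in range(n_surrogates)])
    n_exceed = np.count_nonzero(null >= observed)
    # (k + 1) / (N + 1): never exactly zero; the smallest value is 1 / (N + 1)
    return (n_exceed + 1) / (n_surrogates + 1)


def pearson_r(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(42)
x = rng.normal(size=512)
y = 0.5 * x + rng.normal(size=512)     # genuinely correlated toy pair
p = surrogate_p_value(x, y, pearson_r, n_surrogates=999, rng=rng)
```

In the pipeline the statistic would be the lagged cross-correlation, which can be passed in as `statistic`.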
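The IAAFT bullet can be illustrated with a minimal sketch of the standard iteration. The procedure alternates two steps: imposing the target Fourier amplitudes, then rank-remapping the result onto the original values. It stops when the ranks stop changing. The function name, the random-shuffle start and the iteration cap are assumptions, not necessarily the settings used in src/crq/stats/.

```python
import numpy as np


def iaaft_surrogate(x, rng: np.random.Generator, max_iter: int = 1000) -> np.ndarray:
    """Iterative amplitude-adjusted Fourier-transform (IAAFT) surrogate of x.

    The result contains exactly the values of x (same amplitude distribution)
    and approximately the same Fourier amplitude spectrum.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sorted_values = np.sort(x)
    target_amplitudes = np.abs(np.fft.rfft(x))
    surrogate = rng.permutation(x)         # start from a random shuffle
    previous_ranks = None
    for _ in range(max_iter):
        # 1. Impose the target spectrum while keeping the current phases
        phases = np.angle(np.fft.rfft(surrogate))
        surrogate = np.fft.irfft(target_amplitudes * np.exp(1j * phases), n=n)
        # 2. Impose the original value distribution by rank-order remapping
        ranks = np.argsort(np.argsort(surrogate))
        surrogate = sorted_values[ranks]
        if previous_ranks is not None and np.array_equal(ranks, previous_ranks):
            break                          # ranks unchanged: fixed point reached
        previous_ranks = ranks
    return surrogate


rng = np.random.default_rng(3)
noise = rng.exponential(size=1024)         # skewed, non-Gaussian innovations
x = np.zeros(1024)
for i in range(1, x.size):
    x[i] = 0.7 * x[i - 1] + noise[i]       # AR(1) series with a red spectrum
s = iaaft_surrogate(x, rng)
```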
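The detrending bullet can be illustrated with a minimal sketch of the Hodrick-Prescott filter as a sparse linear solve. The trend τ minimises Σ(y − τ)² + λ Σ(Δ²τ)², which gives (I + λ DᵀD) τ = y, where D is the second-difference operator. The appropriate λ depends on the sampling interval. For reference, 1.29 × 10⁵ is close to the value commonly used for monthly data (1600 × 3⁴ = 129,600). The function name is hypothetical. statsmodels also offers `statsmodels.tsa.filters.hp_filter.hpfilter(x, lamb=...)`, which returns `(cycle, trend)`.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def hp_filter(y, lam: float = 1.29e5):
    """Hodrick-Prescott decomposition y = trend + cycle; returns (trend, cycle).

    The trend minimises sum((y - trend)**2) + lam * sum((second difference of trend)**2),
    whose solution satisfies (I + lam * D.T @ D) trend = y.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    ones = np.ones(n - 2)
    # (n-2) x n second-difference operator: each row is [1, -2, 1]
    d = sparse.diags([ones, -2.0 * ones, ones], offsets=[0, 1, 2], shape=(n - 2, n))
    a = (sparse.identity(n) + lam * (d.T @ d)).tocsc()
    trend = spsolve(a, y)
    return trend, y - trend


t = np.arange(600.0)
y = 0.01 * t + np.sin(2 * np.pi * t / 12.0)   # slow trend + fast oscillation
trend, cycle = hp_filter(y)
```

A straight line has zero second difference, so the filter leaves it entirely in the trend. This makes a convenient sanity check.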
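Below is a minimal sketch of the Benjamini-Hochberg step-up rule, as it might be applied to the 34 × 207 station-cell p-values of the geographic scan. The function name and the flattening of the grid are assumptions. Recent SciPy releases also provide `scipy.stats.false_discovery_control`, which returns BH-adjusted p-values rather than a rejection mask.

```python
import numpy as np


def benjamini_hochberg(p_values, q: float = 0.05) -> np.ndarray:
    """Boolean mask of hypotheses rejected by the BH step-up procedure at FDR level q."""
    p = np.asarray(p_values, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    critical = q * np.arange(1, m + 1) / m     # q * i / m for rank i = 1..m
    passes = p[order] <= critical
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()        # largest rank meeting its threshold
        reject[order[: k + 1]] = True          # step-up: reject every smaller p too
    return reject


rng = np.random.default_rng(11)
p_grid = rng.uniform(size=(34, 207))           # placeholder p-values, one per station-cell pair
significant = benjamini_hochberg(p_grid, q=0.05).reshape(p_grid.shape)
```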
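The following is a minimal sketch of the BIC approximation ln BF ≈ −ΔBIC / 2 for a fixed-period sinusoid against a constant. This approximation is discussed in Kass & Raftery (1995), which the paper cites. Several choices are assumptions: the fixed period, the Gaussian-error least-squares fit and the function names. If the period is also fitted, the sinusoid model gains a parameter. BIC also treats the points as independent, which overlapping-window correlation estimates are not.

```python
import numpy as np


def gaussian_bic(rss: float, n: int, k: int) -> float:
    """BIC of a least-squares fit with Gaussian errors, up to a model-independent constant."""
    return n * np.log(rss / n) + k * np.log(n)


def log_bayes_factor_sinusoid(t_years, r, period_years: float) -> float:
    """BIC approximation to ln BF, with BF = p(data | sinusoid) / p(data | constant).

    Constant model: r = c (k = 1).
    Sinusoid model: r = c + a sin(wt) + b cos(wt) at a fixed period (k = 3).
    The noise variance is estimated in both models, so it cancels in the difference.
    """
    t = np.asarray(t_years, dtype=float)
    r = np.asarray(r, dtype=float)
    n = r.size
    rss_const = np.sum((r - r.mean()) ** 2)
    w = 2.0 * np.pi / period_years
    design = np.column_stack([np.ones(n), np.sin(w * t), np.cos(w * t)])
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)
    rss_sin = np.sum((r - design @ coef) ** 2)
    delta_bic = gaussian_bic(rss_sin, n, 3) - gaussian_bic(rss_const, n, 1)
    return float(-delta_bic / 2.0)              # ln BF ≈ -ΔBIC / 2


rng = np.random.default_rng(7)
t = np.linspace(1976.0, 2025.0, 2000)
noise_only = rng.normal(scale=0.1, size=t.size)
with_cycle = 0.1 * np.sin(2 * np.pi * t / 11.0) + noise_only
print(log_bayes_factor_sinusoid(t, noise_only, 11.0),
      log_bayes_factor_sinusoid(t, with_cycle, 11.0))
```

Returning ln BF rather than BF avoids overflow when the evidence is strong. A value of BF below 1 (ln BF below 0) favours the constant model.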
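Serial autocorrelation makes a naive t-test on r overconfident, because the test treats every bin as independent. Below is a sketch of one widely used AR(1) correction for the effective sample size, N_eff = N (1 − a_x a_y) / (1 + a_x a_y). Under this approximation, an inflation factor N / N_eff of 3 to 5 corresponds to a product of lag-1 autocorrelations between 0.5 and 2/3. This illustrates the technique only; it is not necessarily the N_eff estimator used in the paper, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def lag1_autocorrelation(x) -> float:
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def effective_sample_size(x, y) -> float:
    """AR(1) approximation: N_eff = N (1 - a_x a_y) / (1 + a_x a_y)."""
    product = lag1_autocorrelation(x) * lag1_autocorrelation(y)
    return len(x) * (1.0 - product) / (1.0 + product)


def pearson_p_value(x, y, n=None) -> tuple[float, float]:
    """Pearson r and two-sided t-test p-value, using n (default len(x)) as the sample size."""
    r = float(np.corrcoef(x, y)[0, 1])
    n = len(x) if n is None else n
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return r, float(2.0 * stats.t.sf(abs(t), df=n - 2))


def ar1(phi: float, n: int, rng: np.random.Generator) -> np.ndarray:
    e = rng.normal(size=n)
    out = np.empty(n)
    out[0] = e[0]
    for i in range(1, n):
        out[i] = phi * out[i - 1] + e[i]
    return out


rng = np.random.default_rng(5)
x, y = ar1(0.9, 2000, rng), ar1(0.9, 2000, rng)   # independent but strongly autocorrelated
n_eff = effective_sample_size(x, y)
r, p_naive = pearson_p_value(x, y)
_, p_adjusted = pearson_p_value(x, y, n=n_eff)
```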
## Requirements
- Python ≥ 3.10
- numpy, pandas, scipy, matplotlib, pyyaml, requests
- CuPy ≥ 12 (optional, for GPU acceleration)
Install: pip install -e .
## Citation
If you use this pipeline, please cite:
Homola P. et al. (2023). Indication of Correlation between Cosmic-Ray Flux and Lightning Activity. Remote Sensing 15(1), 200. https://doi.org/10.3390/rs15010200
and link to this repository.
## Licence
MIT