Cosmic-ray / earthquake correlation analysis pipeline: data ingestion, surrogate significance tests, GPU acceleration, out-of-sample validation
root fc859d5dff Editorial revision: precision, framing, and internal consistency (4a–4k)
4a  Moderate principal conclusion: 'no statistically credible evidence'
    replaced with 'no statistically robust evidence within tested frameworks'
    plus clause acknowledging untested mechanisms (threshold effects,
    nonlinear triggering, extreme-event coupling).

4b  Fix independence-assumption framing: 'physically invalid seismic metric'
    → 'physically inappropriate'; separate claim about naive p-values now
    reads 'statistically invalid under the violated serial-independence
    assumption (autocorrelation inflates nominal sample size by 3–5×)'.

4c  3.9σ detrended peak no longer called 'marginal': figure caption and
    nearby text now read 'nominally significant but sensitive to Neff
    estimation, at a lag inconsistent with the claimed mechanism'.

4d  CR terminology standardised: 'global CR index' defined precisely at
    first use in Data section (dimensionless, station mean ≡ 1, ≥3 stations
    per bin); 'CR flux' retained only for the physical quantity.

4e  Geographic conclusion reframed: 'no local mechanism' replaced with
    'inconsistent with simple wave-propagation or diffusion models, but does
    not rule out instantaneous global coupling mechanisms (e.g. atmospheric
    electric field modulation)'.

4f  Bayes factor qualified: parenthetical after BF=0.75 notes the restricted
    two-hypothesis model space and cites Kass & Raftery (1995).

4g  OOS limitations expanded: explicit paragraph noting the 5-yr window
    with no complete solar cycle, limited statistical power, and that
    p_global=0.100 is consistent with—rather than strong evidence against
    —the claim; OOS failure downweighted vs 44-yr in-sample analysis.

4h  Confirmatory vs exploratory scope table added (Table tab:prereg_scope)
    listing pre-specified parameters and which analyses were confirmatory
    vs post-hoc exploratory.

4i  Alternative solar-cycle confounds acknowledged in Discussion: geomagnetic
    activity cycles and long-term seismic clustering added as alternative
    explanations for the shared 10-year periodicity.

4j  Fixed: 'Out-of-sample poos from script 08' removed from Limitations;
    GitHub URL removed from abstract (kept in Data Availability only);
    Discussion run-on sentences broken up.

4k  Abstract rewritten to ≤250 words in five-part structure: prior claim,
    data/methods (two sentences), key quantitative results, scoped
    interpretation, one-sentence limitation. Causal language qualified.

Also adds KassRaftery1995 to refs.bib. PDF: 36 pages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 21:23:57 +02:00
config Initial commit: full analysis pipeline source code 2026-04-22 02:45:10 +02:00
paper Editorial revision: precision, framing, and internal consistency (4a–4k) 2026-04-24 21:23:57 +02:00
results Add seven additional robustness checks (3a–3g) and update paper 2026-04-24 20:43:08 +02:00
scripts Add seven additional robustness checks (3a–3g) and update paper 2026-04-24 20:43:08 +02:00
src/crq Fix seismic metric + robustness checks (issues 2a–2d) 2026-04-24 14:49:54 +02:00
tests Initial commit: full analysis pipeline source code 2026-04-22 02:45:10 +02:00
.gitignore Initial commit: full analysis pipeline source code 2026-04-22 02:45:10 +02:00
claude.md Initial commit: full analysis pipeline source code 2026-04-22 02:45:10 +02:00
pyproject.toml Initial commit: full analysis pipeline source code 2026-04-22 02:45:10 +02:00
README.md Add README with project overview and quickstart 2026-04-24 00:57:29 +02:00

Cosmic Rays and Earthquakes: A Rigorous Replication Study

A reproducible, GPU-accelerated statistical pipeline that tests the claimed correlation between galactic cosmic-ray flux and global seismicity (Homola et al. 2023).


Summary of findings

Stage                                               Key result
In-sample replication (1976–2019)                   r(+15 d) = +0.31 raw; drops to +0.04 after solar-cycle detrending
Global surrogate test (IAAFT, 100 k surrogates)     p = 1.00 after detrending (not significant)
Geographic localisation (34 stations × 207 cells)   No distance–lag dependence; β = 0.45 d/1000 km, p = 0.21
Out-of-sample validation (2020–2025)                Results in results/out_of_sample_report.md

The raw r = 0.31 is an artefact of the shared ~11-year solar cycle modulating both cosmic-ray flux and seismicity. After removing this trend the signal is indistinguishable from phase-randomised noise.
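
The effect of a shared cycle can be reproduced on synthetic data. The following is a minimal sketch, not the pipeline's code: it assumes two independent white-noise series that each carry the same idealised 11-year sinusoid, and it removes the cycle with a least-squares sinusoid fit rather than the Hodrick-Prescott filter used in scripts/04_detrended_analysis.py. The function names, the 5-day binning and the amplitudes are illustrative assumptions.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def remove_sinusoid(t, x, period):
    """Subtract the least-squares fit of a constant plus a sinusoid of known period."""
    design = np.column_stack([
        np.ones_like(t),
        np.sin(2 * np.pi * t / period),
        np.cos(2 * np.pi * t / period),
    ])
    coeffs, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coeffs

rng = np.random.default_rng(0)
t = np.arange(0, 44 * 365.25, 5.0)      # 44 years in 5-day bins (assumed binning)
period = 11 * 365.25                    # idealised solar cycle, in days
cycle = np.sin(2 * np.pi * t / period)

# Two series that are independent apart from the shared cycle (arbitrary amplitude).
a = 0.7 * cycle + rng.standard_normal(t.size)
b = 0.7 * cycle + rng.standard_normal(t.size)

r_raw = pearson_r(a, b)
r_detrended = pearson_r(remove_sinusoid(t, a, period), remove_sinusoid(t, b, period))
print(f"raw r = {r_raw:+.2f}, detrended r = {r_detrended:+.2f}")
```

In this toy setup the raw correlation is clearly positive although the noise terms are unrelated, and it falls to within sampling error of zero once the common cycle is subtracted.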


Repository structure

scripts/          Analysis pipeline (run in order)
  01_download_data.py            Download NMDB / USGS / SIDC data
  02_homola_replication.py       Replicate Homola et al. cross-correlation
  03_stress_test.py              Surrogate significance test (CPU + GPU)
  04_detrended_analysis.py       HP-filter / sunspot detrending
  05_geographic_localisation.py  Station × grid-cell BH-FDR scan
  06_check_data_availability.py  Determine reliable OOS data window
  07_out_of_sample.py            Pre-registered out-of-sample validation
  08_combined_timeseries.py      1976-to-present sinusoid fit + Bayes factor
  benchmark_gpu.py               GPU vs CPU surrogate benchmark

src/crq/          Python package
  ingest/         NMDB, USGS, SIDC, station-roster loaders
  preprocess/     Hodrick-Prescott and linear detrending
  stats/          Phase-randomisation / IAAFT surrogates (CPU + GPU)

results/          Generated outputs (committed)
  prereg_predictions.md          Pre-registration (timestamped before OOS run)
  data_availability.json         Reliable data window determination
  homola_replication.json        In-sample cross-correlation results
  detrended_results.json         Post-detrending results
  geo_localisation.json          Geographic localisation scan
  out_of_sample_metrics.json     OOS validation metrics (post-run)
  figs/                          Plots

config/
  stations.yaml   NMDB station list with coordinates
tests/            pytest suite (29 tests)

Quickstart

# 1. Clone and install
git clone https://github.com/pingud98/cosmicraysandearthquakes.git
cd cosmicraysandearthquakes
python -m venv .venv && source .venv/bin/activate
pip install -e .

# 2. Download data (NMDB, USGS M≥4.5, SIDC sunspots)
python scripts/01_download_data.py

# 3. Run in-sample analysis
python scripts/02_homola_replication.py
python scripts/03_stress_test.py --n-surrogates 10000
python scripts/04_detrended_analysis.py

# 4. Geographic scan (GPU recommended)
python scripts/05_geographic_localisation.py --n-surrogates 1000

# 5. Check data availability for out-of-sample window
python scripts/06_check_data_availability.py

# 6. Pre-registered out-of-sample validation (writes prereg BEFORE analysis)
python scripts/07_out_of_sample.py --study-start 2020-01-01 --study-end 2025-04-29

# 7. Combined timeseries with sinusoid fit
python scripts/08_combined_timeseries.py

GPU (CUDA) is used automatically when available. Scripts fall back to CPU with a warning. The surrogate tests were benchmarked on a Tesla M40 (12 GB):

Method                CPU       GPU       Speedup
Phase randomisation   61.7 s    20.9 s    2.9×
IAAFT                 227.8 s   175.6 s   1.3×
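
One common way to get an automatic GPU-to-CPU fallback is to select the array module once at import time. The sketch below assumes that pattern; it is not taken from src/crq, and `get_array_module` and `power_spectrum` are hypothetical names. It relies only on CuPy mirroring the NumPy API for `asarray` and `fft.rfft`.

```python
import warnings
import numpy as np

def get_array_module():
    """Return (module, on_gpu): CuPy if it imports and sees a CUDA device, else NumPy."""
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp, True
    except Exception as exc:  # ImportError, or a CUDA runtime/driver error
        warnings.warn(f"GPU unavailable ({exc!r}); falling back to CPU")
        return np, False
    warnings.warn("No CUDA device found; falling back to CPU")
    return np, False

xp, on_gpu = get_array_module()

def power_spectrum(series):
    """Power spectrum of a real series, computed on whichever backend was selected."""
    x = xp.asarray(series, dtype=xp.float64)
    spec = xp.abs(xp.fft.rfft(x)) ** 2
    # CuPy arrays live in device memory; .get() copies them back to a NumPy array.
    return spec.get() if on_gpu else spec

print(power_spectrum([1.0, 0.0, -1.0, 0.0]))
```

Writing the numerical code against `xp` keeps a single code path for both backends; only the final host transfer differs.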

Data sources

Source       Content                                             Access
NMDB         Hourly neutron monitor counts, pressure-corrected   Free, HTTP
USGS FDSN    M ≥ 4.5 global catalogue                            Free, HTTP
SIDC SILSO   Daily international sunspot number                  Free, HTTP

Data are downloaded by the scripts and cached locally in data/. No data files are committed to this repository.
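
A download-once cache of this kind can be sketched as follows. This is an illustration under stated assumptions, not the code in scripts/01_download_data.py: `cached_download`, `fetch_bytes` and the example URL are hypothetical, and the standard library is used in place of requests so the snippet runs without extra dependencies. The demonstration uses a stub fetcher and a temporary directory, so it needs no network access.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

def fetch_bytes(url):
    """Fetch a URL and return the response body as bytes."""
    with urlopen(url, timeout=60) as resp:
        return resp.read()

def cached_download(url, cache_dir="data", fetch=fetch_bytes):
    """Return the local path for url, downloading only if it is not cached yet."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / Path(urlparse(url).path).name
    if not target.exists():
        target.write_bytes(fetch(url))
    return target

# Offline demonstration with a stub fetcher that records how often it is called.
calls = []

def stub_fetch(url):
    calls.append(url)
    return b"placeholder\n"

with tempfile.TemporaryDirectory() as tmp:
    url = "https://example.org/sunspots.csv"   # hypothetical URL
    first = cached_download(url, tmp, fetch=stub_fetch)
    second = cached_download(url, tmp, fetch=stub_fetch)
    cached_ok = first == second and first.read_bytes() == b"placeholder\n"

print(f"fetch calls: {len(calls)}")
```

The second call returns the cached file without fetching again, which is the behaviour that lets the analysis scripts be re-run without repeating downloads.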


Pre-registration

results/prereg_predictions.md was committed to git before any out-of-sample data were loaded (UTC 2026-04-22T00:44:30, commit 1832f73). This prevents post-hoc hypothesis adjustment. Verify with:

git log --diff-filter=A results/prereg_predictions.md

Statistical methods

  • Surrogate test: Phase randomisation preserves the power spectrum of the cosmic-ray series; 100,000 surrogates give p-value resolution of 10⁻⁵.
  • IAAFT: Iterated amplitude-adjusted FT surrogates (preserves amplitude distribution as well as power spectrum).
  • Detrending: Hodrick-Prescott filter (λ = 1.29 × 10⁵) for in-sample window; linear detrending for OOS (< 1 solar cycle).
  • FDR control: Benjamini-Hochberg at q = 0.05 for the geographic scan.
  • Bayes factor: BIC approximation comparing sinusoidal vs constant model on the full 1976-to-present correlation timeseries.
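
Three of the methods above can be sketched in a few lines of NumPy. This is a simplified, CPU-only illustration under stated assumptions, not the implementation in src/crq/stats: it tests only the zero-lag correlation, omits the IAAFT refinement, and takes the BIC values as given. All function names are hypothetical.

```python
import numpy as np

def phase_randomise(x, rng):
    """Return a surrogate with the same power spectrum as x but randomised phases."""
    n = x.size
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, spectrum.size)
    phases[0] = 0.0                  # leave the DC bin (the mean) untouched
    if n % 2 == 0:
        phases[-1] = 0.0             # the Nyquist bin must stay real for even n
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)

def surrogate_p_value(x, y, n_surrogates, rng):
    """Two-sided p-value of the zero-lag Pearson r against phase-randomised x."""
    observed = abs(np.corrcoef(x, y)[0, 1])
    exceed = sum(
        abs(np.corrcoef(phase_randomise(x, rng), y)[0, 1]) >= observed
        for _ in range(n_surrogates)
    )
    return float((exceed + 1) / (n_surrogates + 1))   # +1 avoids reporting p = 0

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()     # largest rank that meets its threshold
        rejected[order[: k + 1]] = True
    return rejected

def bic_bayes_factor(bic_alternative, bic_null):
    """BIC approximation to the Bayes factor in favour of the alternative model."""
    return float(np.exp((bic_null - bic_alternative) / 2.0))

rng = np.random.default_rng(1)
x = rng.standard_normal(512)
y = rng.standard_normal(512)
p = surrogate_p_value(x, y, n_surrogates=200, rng=rng)
print(f"surrogate p = {p:.3f}")
```

With N surrogates the smallest reportable p-value in this sketch is 1/(N + 1), which is why a resolution near 10⁻⁵ requires on the order of 100,000 surrogates. A Bayes factor below 1 from `bic_bayes_factor` favours the null (constant) model over the alternative (sinusoidal) one.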

Requirements

  • Python ≥ 3.10
  • numpy, pandas, scipy, matplotlib, pyyaml, requests
  • CuPy ≥ 12 (optional, for GPU acceleration)

Install: pip install -e .


Citation

If you use this pipeline please cite:

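Homola P. et al. (2023). Observation of large scale precursor correlations between cosmic rays and earthquakes with a periodicity similar to the solar cycle. Journal of Atmospheric and Solar-Terrestrial Physics 247, 106068. https://doi.org/10.1016/j.jastp.2023.106068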

and link to this repository.


Licence

MIT