cosmicraysandearthquakes/paper/main.tex
root 1d01fcf102 Add seven additional robustness checks (3a–3g) and update paper
3a  Block-bootstrap surrogates (B=804 bins≈11 yr, n=5000): raw r(+15d)
    p=0.022 — marginally significant on undetrended series, driven by
    the shared solar-cycle trend not a causal signal.

3b  Partial correlation (sunspot-regressed, no filter): r(+15d) drops
    from 0.079 to 0.029 (63%) — confirms solar-cycle confounding without
    preprocessing circularity.

3c  Spectral coherence in solar-cycle band = 0.840 (>0.776 threshold);
    kNN mutual information at τ=+15d = 0.000 nats (p=1.000) — no
    nonlinear dependence at the claimed lag.

3d  Missing-data impact: 0% NaN at station thresholds 2/3/5; r(+15d)
    unchanged — missing data is not a confound.

3e  Bin-size sensitivity: dominant peak at τ≈-520d for 1/5/27-day bins;
    r at +15d scales with bin size (solar-cycle leakage, not physical).

3f  GK declustering removes 28.4% of events as aftershocks; r(+15d)
    changes by only Δ=0.014 — aftershock clustering not a confound.

3g  Per-solar-cycle analysis (cycles 21–24): r(+15d) all positive
    (0.018–0.073) but dominant peak lags scattered (-65/-125/+125/-125d)
    — phase-drifting solar-cycle artefact, not physical precursor.

Script fixes: vectorised block bootstrap (401×5000 → NumPy matmul),
  kNN MI with sorted searchsorted (O(N log N), no inflated values),
  coherence nperseg 512→2048 (resolves solar-cycle band), axhspan→axvspan.

Paper: new Methods subsections (block bootstrap, partial correlation,
  nonlinear dependence); new Results subsections for each check; updated
  Conclusions with 7-item robustness summary; Kraskov2004 and
  GardnerKnopoff1974 added to refs.bib; Homola2023 updated to arXiv
  preprint 2204.12310; eq:energy duplicate label fixed. PDF: 35 pages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 20:43:08 +02:00

1433 lines
67 KiB
TeX

\documentclass[12pt,a4paper]{article}
%% ── Packages ────────────────────────────────────────────────────────────────
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage[margin=2.5cm]{geometry}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{siunitx}
\usepackage{natbib}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{microtype}
\usepackage{setspace}
\usepackage{lineno}
\onehalfspacing
\hypersetup{
colorlinks=true,
linkcolor=blue!60!black,
citecolor=green!50!black,
urlcolor=blue!60!black,
}
\graphicspath{{figs/}}
%% ── Title block ─────────────────────────────────────────────────────────────
\title{\textbf{No Causal Link Between Galactic Cosmic-Ray Flux and Global
Seismicity: A Pre-Registered Replication with GPU-Accelerated Surrogate Testing
and Out-of-Sample Validation}}
\author{
J.~D.~Devine$^{1}$\\[4pt]
{\small $^{1}$ Independent researcher}\\
{\small \texttt{devine.jd@gmail.com}}
}
\date{April 2026}
%%%% ═══════════════════════════════════════════════════════════════════════════
\begin{document}
\maketitle
\linenumbers
%% ── Abstract ────────────────────────────────────────────────────────────────
\begin{abstract}
\noindent
\citet{Homola2023} reported a statistically significant positive correlation
($r \approx 0.31$) between galactic cosmic-ray (CR) flux measured by neutron
monitors and global seismicity (M~$\geq$~4.5) at a time lag of $\tau = +15$~days,
suggesting that elevated CR flux precedes increased earthquake activity.
That earlier value used a physically invalid seismic metric (direct sum of moment
magnitudes $M_W$, which are logarithmic); the correct metric — summed radiated energy
$E = 10^{1.5 M_W + 4.8}$\,J per bin \citep{Kanamori1977} — yields $r(+15\,\text{d}) = 0.081$.
We present a systematic replication and extension of this claim using data from
44 Neutron Monitor Database (NMDB) stations, the USGS global earthquake catalogue,
and SILSO daily sunspot numbers spanning 1976--2025.
Our analysis proceeds in four stages.
\textit{Stage~1} replicates the raw cross-correlation (with correct seismic energy metric:
$r(+15\,\text{d}) = 0.081$, peak $r = 0.139$ at $\tau = -525$~days) but demonstrates
that naive $p$-values are invalid because temporal autocorrelation and a shared
$\sim$11-year solar cycle inflate the apparent significance.
\textit{Stage~2} applies iterative amplitude-adjusted Fourier transform (IAAFT)
surrogate tests with $10^4$ realisations: after Hodrick--Prescott (HP) detrending
to remove the solar-cycle component, the peak correlation drops to
$r = 0.101$ and achieves formal significance ($p_\text{global} < 10^{-4}$, $>3.9\sigma$),
but $r(+15\,\text{d}) = 0.027$ --- well within the surrogate null distribution.
\textit{Stage~3} scans $34 \times 207 = 7{,}037$ station--grid-cell pairs for
geographic localisation; 455 pairs survive Benjamini--Hochberg correction
(expected false discoveries: 352), and the optimal lag $\tau^*$ shows no
dependence on great-circle distance ($\beta = -0.45$~days/1000~km, $p = 0.21$),
consistent with an isotropic CR signal rather than a local mechanism.
\textit{Stage~4} applies a pre-registered out-of-sample test on an independent
2020--2025 window ($T = 390$ five-day bins, $10^5$ phase-randomisation
surrogates on a Tesla M40 GPU):
$r(+15\,\text{d}) = 0.030$ (surrogate 95th percentile~$= 0.101$),
$p_\text{global} = 0.100$ --- consistent with noise.
Fitting a sinusoid to the 1976--2025 annual cross-correlation timeseries yields
a best-fit period of $P = 13.0$~years and a Bayes factor of $\text{BF} = 0.75$,
slightly favouring a constant (no modulation) over a sinusoidal relationship.
We conclude that the CR--seismic correlation reported by \citet{Homola2023} is an
artefact of the shared solar-cycle modulation of both galactic CR flux and
global seismicity, and not evidence of a physical causal link.
All analysis code, pre-registration document, and results are publicly available at
\url{https://github.com/pingud98/cosmicraysandearthquakes}.
\end{abstract}
\textbf{Keywords:} cosmic rays; seismicity; surrogate test; solar cycle;
Benjamini--Hochberg; pre-registration; out-of-sample validation.
\newpage
\tableofcontents
\newpage
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Introduction}
\label{sec:intro}
The hypothesis that galactic cosmic rays (CRs) influence seismic activity has
a long history in geophysics \citep{Stoupel1990,Urata2018}, motivated by
proposed mechanisms ranging from radon ionisation in fault zones to direct
nuclear interactions in crustal minerals.
\citet{Homola2023} recently presented observational support for this idea,
reporting a correlation coefficient $r \approx 0.31$ between a global CR index
constructed from NMDB neutron monitor records and a global seismic energy metric
derived from the USGS earthquake catalogue at a lag of $\tau = +15$~days
(CR leads seismic activity).
The associated naive $p$-value was reported as $p \sim 10^{-72}$ at this lag.
Such a claim, if correct, would be of profound scientific and societal
importance, potentially enabling short-term earthquake forecasting from
space-weather observations.
It therefore demands rigorous scrutiny.
Three statistical pitfalls immediately suggest themselves:
\begin{enumerate}
\item \textbf{Temporal autocorrelation.}
Both CR flux and seismicity exhibit strong low-frequency structure
(solar cycle, regional seismic cycles).
Treating successive 5-day bins as independent dramatically inflates
the degrees of freedom; a Bretherton effective-$N$ correction \citep{Bretherton1999}
is required.
\item \textbf{Shared solar-cycle trend.}
Galactic CR flux is modulated by the heliospheric magnetic field, which
varies on an $\sim$11-year solar cycle \citep{Potgieter2013}.
Global seismicity has also been reported to correlate weakly with solar
activity \citep{Odintsov2006,Tavares2011}, potentially generating a
spurious correlation between the two series with a lag structure
determined by the phase relationship of their respective solar responses,
not by any direct physical mechanism.
\item \textbf{Multiple-comparison inflation.}
Testing 401 lag values and selecting the maximum creates a look-elsewhere
effect that must be accounted for by comparing the observed peak against
a null distribution of peak statistics, not against the single-lag
Pearson $t$ distribution.
\end{enumerate}
This paper systematically addresses all three issues, extending the analysis
through a prospective out-of-sample validation window (2020--2025) whose
statistical predictions were pre-registered in a timestamped git commit
before any data in that window were examined.
The remainder of the paper is organised as follows.
Section~\ref{sec:data} describes the data sources and preprocessing.
Section~\ref{sec:methods} presents the statistical methods.
Section~\ref{sec:results} reports the results of each analysis stage.
Section~\ref{sec:discussion} interprets the findings.
Section~\ref{sec:conclusions} concludes.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Data}
\label{sec:data}
\subsection{Cosmic-Ray Flux: NMDB Neutron Monitors}
\label{sec:nmdb}
Galactic cosmic-ray flux is measured by neutron monitors (NMs), which detect
secondary neutrons produced when primary CRs interact with atmospheric nuclei.
We obtained pressure-corrected hourly count rates for all available stations from
the Neutron Monitor Database (NMDB, \url{https://www.nmdb.eu}) for the period
1976--2025.
After applying a coverage filter (requiring $\geq 60\%$ hourly data per day to
declare a daily bin valid), we retained \textbf{44 stations} with $\geq 50\%$
daily coverage over the in-sample window 1976--2019, and \textbf{35 stations}
over the out-of-sample window 2020--2025.
Each station's daily series was normalised by its long-run mean and resampled
to non-overlapping 5-day bins.
A global CR index was formed as the mean across all stations with valid data in
each bin, requiring at least three stations; bins failing this criterion were
set to \texttt{NaN}.
\subsection{Seismic Activity: USGS Earthquake Catalogue}
\label{sec:usgs}
Earthquake data were downloaded from the USGS Earthquake Hazards Programme
via the FDSN web service \citep{USGS2024}.
We retained all events with $M \geq 4.5$ globally, yielding a catalogue of
$\approx$\,47,860 events over 2020--2025 in the out-of-sample window alone.
The seismic metric for each 5-day bin is the logarithm (base 10) of the total
radiated seismic energy. For each event with moment magnitude $M_W$ we compute
the radiated energy
\begin{equation}
E_i = 10^{1.5 M_{W,i} + 4.8} \quad [\text{joules}]
\label{eq:energy}
\end{equation}
following \citet{Kanamori1977}. Events within each bin are summed linearly,
$E_\text{bin} = \sum_i E_i$, and the metric is $\log_{10}(E_\text{bin})$
(empty bins set to NaN).
This formulation correctly weights large earthquakes: an $M_W\,8.0$ event
contributes $\sim$1000$\times$ more energy than an $M_W\,6.0$ event.
Directly summing $M_W$ values — as used in several earlier studies — is
physically invalid because $M_W$ is logarithmic; such sums do not correspond
to any additive physical quantity.
\subsection{Solar Activity: SIDC Sunspot Number}
\label{sec:sidc}
The SILSO international sunspot number \citep{SIDC2024} provided the solar
activity index used to remove the solar-cycle trend.
We used the daily series (version 2.0), smoothed with a 365-day running mean
for detrending purposes.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Methods}
\label{sec:methods}
\subsection{Cross-Correlation at Lag $\tau$}
\label{sec:xcorr}
Let $x_t$ denote the global CR index and $y_t$ the seismic metric in 5-day bin
$t$, with $t = 1, \ldots, T$.
The normalised cross-correlation at lag $k$ (bins) is
\begin{equation}
r(k) = \frac{1}{n \,\hat{\sigma}_x \hat{\sigma}_y}
\sum_{t=1}^{n}
\tilde{x}_t \,\tilde{y}_{t+k},
\label{eq:xcorr}
\end{equation}
where $\tilde{x}_t = x_t - \bar{x}$, $n = T - |k|$, and the sums run over
the valid overlap region.
A positive lag $k > 0$ corresponds to CR leading seismicity.
Lags range from $-200$ to $+200$~days (step $= 5$~days, i.e.\ 1 bin),
giving 81 lag values in the in-sample window.
\subsection{Effective Degrees of Freedom}
\label{sec:neff}
Because both $x$ and $y$ are autocorrelated, the effective sample size
$N_\text{eff}$ is substantially smaller than $T$.
We compare three $N_\text{eff}$ estimators; all are listed in Table~\ref{tab:neff}.
The \citet{Bartlett1946} first-order formula uses only the lag-1 autocorrelations:
\begin{equation}
N_\text{eff}^\text{Bart} = T\,
\frac{1 - r_{xx}(1)\,r_{yy}(1)}
{1 + r_{xx}(1)\,r_{yy}(1)},
\label{eq:neff_bartlett}
\end{equation}
the \citet{Bretherton1999} full-sum formula uses all significant lags:
\begin{equation}
N_\text{eff}^\text{Breth} = T \left(
1 + 2 \sum_{k=1}^{K} r_{xx}(k)\, r_{yy}(k)
\right)^{-1},
\label{eq:neff_breth}
\end{equation}
where $r_{xx}$ and $r_{yy}$ are the sample autocorrelation functions of $x$
and $y$, and $K = 200$ lags;
and a Monte Carlo estimate from 1000 phase-randomised surrogates:
$N_\text{eff}^\text{MC} = 1/\hat{\sigma}^2_{r_\text{null}}$,
where $\hat{\sigma}^2_{r_\text{null}}$ is the variance of the surrogate
correlation values at the target lag.
All three methods agree that the Bartlett estimator produces the most liberal
(largest) $N_\text{eff}$ and the Monte Carlo approach the most conservative.
\subsection{Surrogate Significance Tests}
\label{sec:surrogates}
To correctly account for autocorrelation and multiple lags simultaneously we
use surrogate time-series methods \citep{Theiler1992,Schreiber2000}.
\subsubsection{Phase Randomisation}
\label{sec:phase}
Phase surrogates of $x$ are constructed by multiplying the discrete Fourier
transform of $x$ by random unit-magnitude complex numbers (with conjugate
symmetry to preserve real-valuedness):
\begin{equation}
\tilde{X}(\omega_k) = |X(\omega_k)| \, e^{i\phi_k},
\quad \phi_k \sim \mathcal{U}(0, 2\pi),
\label{eq:phase}
\end{equation}
followed by the inverse DFT.
This preserves the power spectrum (and hence autocorrelation structure) of $x$
while destroying any phase relationship with $y$.
\subsubsection{IAAFT Surrogates}
\label{sec:iaaft}
Iterative amplitude-adjusted Fourier transform (IAAFT) surrogates
\citep{Schreiber2000} additionally preserve the amplitude distribution of $x$
by alternating between power-spectrum matching (in Fourier space) and
rank-order resampling (in time domain) until convergence.
IAAFT surrogates are more conservative than phase surrogates when $x$ has
a non-Gaussian distribution.
\subsubsection{Block-Bootstrap Surrogates}
\label{sec:blockbootstrap}
IAAFT surrogates preserve the power spectrum and amplitude distribution of each
series but assume linearity and stationarity --- assumptions violated by seismicity,
which is non-stationary, heavy-tailed, and aftershock-clustered.
As a complementary null we use a \emph{circular block bootstrap} (CBB) that
makes no parametric assumptions about the spectral shape.
Each surrogate is constructed by independently resampling $x$ and $y$ in
contiguous circular blocks of length $B = 804$ bins ($\approx 11$~yr,
i.e.\ approximately one solar cycle), drawn with replacement.
Independent resampling of $x$ and $y$ destroys any cross-series correlation
while preserving each series' within-block temporal structure.
We generate $S = 5{,}000$ surrogate pairs and compare the resulting null
distribution to the IAAFT null.
\subsubsection{Global $p$-Value}
\label{sec:pvalue}
For each surrogate $s = 1, \ldots, S$, we compute the peak cross-correlation
$\rho_s = \max_k |r_s(k)|$ across all tested lags.
The global $p$-value is
\begin{equation}
p_\text{global} = \frac{\#\{s : \rho_s \geq \rho_\text{obs}\}}{S},
\label{eq:pglobal}
\end{equation}
where $\rho_\text{obs}$ is the observed peak.
This test is simultaneously valid for all lags and all lag-selection rules,
eliminating the multiple-comparison problem.
\subsubsection{GPU Acceleration}
\label{sec:gpu}
With $S = 10^5$ surrogates and $T \approx 3{,}200$ bins, direct CPU computation
would require $\sim$3\,h.
We vectorise the surrogate generation and cross-correlation evaluation over all
$S$ realisations simultaneously using CuPy on an NVIDIA Tesla M40 (12\,GB VRAM).
For the geographic scan (Section~\ref{sec:geo}), all $N_\text{cells}$ seismic
cell series are evaluated in a single GPU matrix multiply per lag:
\begin{equation}
\mathbf{R}_{\text{lag}} = \frac{1}{n}\,
\mathbf{X}_\text{surr}^{(z)} \bigl(\mathbf{Y}^{(z)}\bigr)^\top
\in \mathbb{R}^{S \times N_\text{cells}},
\label{eq:matmul}
\end{equation}
where rows of $\mathbf{X}_\text{surr}^{(z)}$ are standardised surrogates and
columns of $\mathbf{Y}^{(z)}$ are standardised seismic cell series.
Benchmarks show a $2.9\times$ speedup for phase surrogates and $1.3\times$ for IAAFT
(limited by chunked argsort to avoid VRAM overflow).
\subsection{Solar-Cycle Detrending}
\label{sec:detrend}
We apply three complementary detrending approaches to isolate the CR--seismic
relationship from the shared solar-cycle trend:
\begin{enumerate}
\item \textbf{Hodrick--Prescott (HP) filter} \citep{HP1997} with smoothing
parameter $\lambda = 4.54 \times 10^{10}$.
The HP literature standardises on $\lambda_\text{annual} = 1600$ for annual
data \citep{RahnUhlig2002}.
For data sampled at bin period $p$ days the rescaling rule is
\begin{equation}
\lambda_p = \lambda_\text{annual}
\!\left(\frac{365}{p}\right)^{\!4}
= 1600 \times (365/5)^4
\approx 4.54 \times 10^{10}.
\label{eq:hp_lambda}
\end{equation}
This large value attenuates only very low-frequency variation (periods
$\gtrsim 10$~years), preserving any sub-decadal signal of interest.
The trend component is subtracted from both $x_t$ and $y_t$.
\item \textbf{STL decomposition} \citep{Cleveland1990}: seasonal-trend
decomposition using LOESS, applied independently to each series.
\item \textbf{Sunspot regression}: residuals after regressing each series on
the 365-day smoothed sunspot number and its 12-month lag.
\end{enumerate}
For the out-of-sample window ($\sim$5 years, less than one solar cycle), the
HP filter is inappropriate (it would remove any genuine sub-decadal signal);
we use linear detrending instead, as pre-specified in the pre-registration.
\subsection{Partial Correlation Analysis}
\label{sec:methods:partialcorr}
Because the solar-cycle component is removed \emph{before} concluding that the
observed signal is solar-cycle-driven, there is a potential circularity concern.
We address this with a partial correlation analysis on the \emph{unfiltered} data.
The seismic metric $y_t$ is regressed on the smoothed sunspot number $z_t$:
\begin{equation}
\hat{y}_t = \hat{\beta} z_t + \hat{\alpha}, \qquad
y_t^\text{resid} = y_t - \hat{y}_t,
\label{eq:partialcorr}
\end{equation}
and the cross-correlation between the CR index $x_t$ and the residual
$y_t^\text{resid}$ is computed across all lags.
If the CR--seismic correlation is driven entirely by a shared solar-cycle trend,
$r(x, y^\text{resid})$ should approach zero.
This analysis avoids any digital filter and thus sidesteps the preprocessing
circularity.
\subsection{Nonlinear Dependence Tests}
\label{sec:methods:nonlinear}
Cross-correlation captures only linear dependence; we supplement it with two
nonlinear measures for the main CR--seismic pair.
\paragraph{Spectral coherence.}
The magnitude-squared coherence at frequency $f$ is
\begin{equation}
C_{xy}(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f)\,S_{yy}(f)},
\label{eq:coherence}
\end{equation}
estimated via Welch's method with $L = 2{,}048$-point Hann windows (75\%
overlap), giving a frequency resolution of $(1/5)/2048 \approx 0.036$
cycles~yr$^{-1}$ --- sufficient to resolve the solar-cycle band
(0.08--0.115~cycles~yr$^{-1}$, periods 9--12.5~yr).
\paragraph{Mutual information.}
We estimate $I(x; y)$ at lags $\tau = 0$ and $\tau = +15$~d using the
\citet{Kraskov2004} $k$-nearest-neighbour estimator ($k = 5$, Chebyshev
metric in joint space):
\begin{equation}
\hat{I}(x; y) = \psi(k) + \psi(N)
- \bigl\langle\psi(n_x + 1)\bigr\rangle
- \bigl\langle\psi(n_y + 1)\bigr\rangle,
\label{eq:mi}
\end{equation}
where $\psi$ is the digamma function and $n_x(i)$, $n_y(i)$ count marginal
neighbours within the $k$-th-neighbour radius.
Statistical significance is assessed against a shuffle null of 1{,}000
random permutations of $y$.
\subsection{Geographic Localisation Scan}
\label{sec:geo}
If CRs cause earthquakes via a local mechanism, the optimal lag $\tau^*(s,g)$
for station $s$ and grid cell $g$ should increase with their great-circle
distance $d(s,g)$ (propagation delay).
Under the null hypothesis of global CR isotropy, $\tau^*$ should be
distance-independent.
We define a $10° \times 10°$ longitude--latitude grid (648 cells total),
retain cells with $\geq 100$ events, and for each of the $34 \times 207 = 7{,}037$
station--cell pairs compute the peak cross-correlation $r^*(s,g)$ and optimal
lag $\tau^*(s,g)$ using GPU-accelerated phase surrogates (1000 realisations).
Pairs are declared significant at false discovery rate $q = 0.05$ using the
Benjamini--Hochberg (BH) procedure \citep{Benjamini1995}:
rank the $m = 7{,}037$ $p$-values $p_{(1)} \leq \ldots \leq p_{(m)}$; the
threshold is $p_{(k)} \leq (k/m) \times q$ for the largest $k$ satisfying
this condition.
Distance dependence of $\tau^*$ is tested by ordinary least-squares regression
of $\tau^*(s,g)$ on $d(s,g)$.
\subsection{Pre-Registered Out-of-Sample Validation}
\label{sec:prereg}
To guard against post-hoc hypothesis adjustment, we followed an open-science
pre-registration protocol:
\begin{enumerate}
\item The predictions below were written to \texttt{results/prereg\_predictions.md}.
\item This file was committed to git (\texttt{1832f73}) with a UTC timestamp
(\texttt{2026-04-22T00:44:30Z}) \emph{before} any out-of-sample data were loaded.
\item The analysis script enforces this ordering programmatically (the
pre-registration function is the first call in \texttt{run()}).
\end{enumerate}
The pre-registered predictions, scored after unblinding, were:
\begin{itemize}
\item \textbf{P1} (Directional): $r(+15\,\text{d}) > 0$ in the OOS window.
\item \textbf{P2} (Significance): $p_\text{global} < 0.05$ and a
non-negative rolling trend.
\item \textbf{P3} (Stability): rolling $r$ standard deviation $\leq 0.10$.
\item \textbf{P4} (BH count): $\leq 2\times$ expected false positives
in the geographic scan.
\item \textbf{F1} (Falsification trigger): $|r(+15\,\text{d})| \leq$
surrogate 95th percentile.
\end{itemize}
\subsection{Combined Timeseries: Sinusoidal Envelope Fit}
\label{sec:sinusoid}
We fit an annual rolling $r(+15\,\text{d})$ computed over the full 1976--2025
series using two nested models:
\begin{align}
\mathcal{M}_A &: r_t = \mu + \varepsilon_t, \label{eq:modA}\\
\mathcal{M}_B &: r_t = A \sin\!\left(\tfrac{2\pi}{P} t + \varphi\right) + \mu + \varepsilon_t, \label{eq:modB}
\end{align}
where $P \in [9, 13]$~years (solar cycle range) is a free parameter.
Model selection uses the Bayesian information criterion (BIC):
\begin{equation}
\text{BIC} = n \ln\!\left(\frac{\text{RSS}}{n}\right) + k \ln(n),
\label{eq:bic}
\end{equation}
with $k_A = 1$, $k_B = 4$, and the Bayes factor approximated as
\begin{equation}
\text{BF}_{BA} \approx \exp\!\left(\frac{\Delta\text{BIC}}{2}\right),
\quad \Delta\text{BIC} = \text{BIC}_A - \text{BIC}_B.
\label{eq:bf}
\end{equation}
Parameters are estimated by nonlinear least squares with a grid search over
$(P, \varphi)$ to avoid local minima.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Results}
\label{sec:results}
\subsection{Raw Pairwise Correlations Between CR, Seismic, and Sunspot Data}
\label{sec:res:rawcorr}
Before applying any detrending, we characterise the raw statistical relationships
between all three observable time series across the three analysis windows.
This serves as the pre-detrending baseline that motivates the Hodrick--Prescott
filter analysis described in Section~\ref{sec:res:detrend}.
\subsubsection{CR index: station distribution}
The global CR index is not collapsed to a single mean.
Instead, for each 5-day bin $t$ we compute the distribution of normalised
count-rate values across all $n_t \geq 3$ contributing stations,
yielding five order statistics:
$\hat{x}_{t}^{(q)} = q\text{-th percentile}$ for $q \in \{5, 25, 50, 75, 95\}$,
together with the inter-station minimum and maximum.
All stations are normalised by their individual long-run mean before the
cross-station percentile is taken, so the station-median series has a grand
mean near unity.
The scatter panels in Figures~\ref{fig:rawinsample}--\ref{fig:rawcombined}
use $\hat{x}_{t}^{(50)}$ as the central CR value (horizontal axis) and display
the $[\hat{x}_{t}^{(5)}, \hat{x}_{t}^{(95)}]$ inter-station spread as
horizontal error bars, with the $[\hat{x}_{t}^{\min}, \hat{x}_{t}^{\max}]$
range overlaid in a lighter shade.
\subsubsection{Seismic energy metric}
The seismic activity per bin is measured by the total released seismic energy,
computed as the sum of the individual earthquake energies:
\begin{equation}
E_t = \sum_{i \in \mathcal{B}_t} 10^{1.5\, M_{W,i}},
\label{eq:energy_bin}
\end{equation}
where $\mathcal{B}_t$ is the set of $M_W \geq 4.5$ events falling in bin $t$.
Working with $\log_{10} E_t$ removes the extreme skewness of $E_t$ and is
physically preferable to summing $M_W$ directly, which is dimensionally
inconsistent because magnitude is already a logarithmic quantity.
The $\log_{10} E_t$ axis in the scatter panels spans roughly three orders of
magnitude, reflecting the heavy-tailed Gutenberg--Richter distribution of
earthquake sizes.
\subsubsection{Sunspot number}
Daily sunspot numbers from SILSO are smoothed with a 365-day rolling mean to
suppress intra-year variability and isolate the solar-cycle envelope.
Each 5-day bin carries both the smoothed value (scatter-plot axis) and the
daily min--max spread within that bin (shown as additional horizontal error
bars on the sunspot axis in Figure~\ref{fig:rawinsample}--\ref{fig:rawcombined},
panel 3).
\subsubsection{Correlation results}
We compute both Pearson $r$ (linear; Fisher $z$-transform 95\% CI) and
Spearman $\rho$ (rank-based; appropriate for the heavy-tailed marginal
distributions of $E_t$ and CR flux).
Bonferroni correction for $3\ \text{pairs} \times 3\ \text{windows} = 9$
tests is applied; star levels refer to the corrected $p$-values.
Full results are given in Table~\ref{tab:rawcorr}.
\paragraph{CR vs.\ seismicity.}
The raw Pearson correlation between the station-median CR index and
$\log_{10} E_t$ is $r = 0.057$ in the in-sample window ($N = 3214$ bins,
$p_\text{Bonf} = 0.011$, $\rho = 0.071$, $p_\text{Bonf} < 10^{-3}$) and
$r = 0.046$ in the OOS window ($N = 390$, $p_\text{Bonf} = 1.0$).
The correlation is thus only marginally significant after correction and
disappears entirely in the independent OOS window.
Using the station-95th-percentile CR series in place of the median
amplifies the correlation slightly ($r = 0.069$ in-sample), suggesting that
high-flux excursions drive the signal.
\paragraph{CR vs.\ sunspot number.}
The dominant raw relationship in the dataset is a strong anti-correlation
between the CR index and the smoothed sunspot number: $r = -0.820$
($\rho = -0.854$) in-sample, and $r = -0.939$ ($\rho = -0.951$) in the OOS
window.
This is the well-established \emph{Forbush decrease} mechanism: a higher
heliospheric magnetic flux during solar maximum deflects more galactic cosmic
rays before they reach the Earth's atmosphere
\citep{Potgieter2013}, so CR flux and solar activity are naturally anti-phase.
The stronger OOS value ($-0.94$ vs.\ $-0.82$) reflects the wide dynamic range
of Solar Cycle~25 (2020--2025), which reached high activity levels after the
deep minimum of 2019--2020.
\paragraph{Sunspot vs.\ seismicity.}
The raw correlation between the smoothed sunspot number and seismic energy is
negative but small: $r = -0.095$ ($\rho = -0.099$) in-sample
($p_\text{Bonf} < 10^{-6}$) and indistinguishable from zero in the OOS window
($r = -0.023$, $p_\text{Bonf} = 1.0$).
The negative sign is consistent with reports of a weak inverse relationship
between solar activity and global seismicity
\citep{Odintsov2006}, though the OOS failure indicates this effect is not
robust.
\subsubsection{Interpretation: a confounding triangle}
The three raw correlations form a confounding triangle.
The CR--sunspot anti-correlation is strong ($r \approx -0.82$ to $-0.94$)
and physically understood.
The sunspot--seismicity anti-correlation is weak but significant in-sample
($r \approx -0.095$).
Together these imply a spurious \emph{positive} CR--seismicity correlation:
both CR and seismicity are jointly modulated by the $\sim$11-year solar cycle
--- CR increases during solar minima, and seismicity may very weakly increase
during solar minima --- so the two series covary positively without any direct
physical connection.
This confound cannot be resolved by naive cross-correlation; it requires
explicitly removing the shared solar-cycle component.
The HP filter analysis (Section~\ref{sec:res:detrend}) accomplishes exactly
this, and shows that the apparent CR--seismicity signal vanishes once the
solar-cycle trend is eliminated.
These raw correlations are intentionally uncontrolled for solar-cycle
modulation.
They are presented here as the pre-detrending baseline to document the scale
of the confound before any correction is applied, and to highlight that even
the uncorrected CR--seismicity correlation ($r = 0.057$) is far weaker than
the CR--sunspot anti-correlation ($r = -0.82$) that drives it.
\begin{table}[htbp]
\centering
\caption{Raw pairwise correlation statistics across three time windows.
Bonferroni correction applied for $3 \times 3 = 9$ tests.
CR uses the per-bin station-median index ($\hat{x}^{(50)}$).
Seismic energy is $\log_{10}\!\left(\sum 10^{1.5 M_W}\right)$.
Sunspot is the 365-day smoothed daily count.
$^{*}p_\text{Bonf}<0.05$, $^{**}p_\text{Bonf}<0.01$,
$^{***}p_\text{Bonf}<0.001$.}
\label{tab:rawcorr}
\setlength{\tabcolsep}{4pt}
\begin{tabular}{llrrrrrrr}
\toprule
Pair & Window & $N$ &
$r$ & 95\% CI &
$p$ (raw) & $p$ (Bonf.) &
$\rho$ & $p_\rho$ (Bonf.) \\
\midrule
CR vs Seismicity & In-sample & 3214 & $+0.057^{*}$ & $[+0.023, +0.092]$ & $0.0012$ & $0.011$ & $+0.071^{***}$ & $<10^{-3}$ \\
& OOS & 390 & $+0.046$ & $[-0.053, +0.145]$ & $0.362$ & $1.00$ & $+0.018$ & $1.00$ \\
& Combined & 3604 & $+0.055^{**}$ & $[+0.023, +0.088]$ & $0.0009$ & $0.008$ & $+0.065^{***}$ & $<10^{-3}$ \\
\addlinespace
CR vs Sunspot & In-sample & 3109 & $-0.820^{***}$ & $[-0.831, -0.808]$ & $\approx 0$ & $\approx 0$ & $-0.854^{***}$ & $\approx 0$ \\
& OOS & 385 & $-0.939^{***}$ & $[-0.950, -0.926]$ & $\approx 0$ & $\approx 0$ & $-0.951^{***}$ & $\approx 0$ \\
& Combined & 3494 & $-0.815^{***}$ & $[-0.826, -0.804]$ & $\approx 0$ & $\approx 0$ & $-0.844^{***}$ & $\approx 0$ \\
\addlinespace
Sunspot vs Seismicity & In-sample & 3109 & $-0.095^{***}$ & $[-0.130, -0.060]$ & $1.1 \times 10^{-7}$ & $10^{-6}$ & $-0.099^{***}$ & $<10^{-6}$ \\
& OOS & 385 & $-0.023$ & $[-0.123, +0.077]$ & $0.648$ & $1.00$ & $-0.016$ & $1.00$ \\
& Combined & 3494 & $-0.086^{***}$ & $[-0.119, -0.053]$ & $3.4 \times 10^{-7}$ & $3 \times 10^{-6}$ & $-0.086^{***}$ & $<10^{-6}$ \\
\addlinespace
\bottomrule
\end{tabular}
\medskip
\raggedright
\textit{Note:} CR$_{p95}$ variant (station 95th-percentile instead of median)
gives qualitatively identical structure; full results in
\texttt{results/raw\_pairwise\_correlations.json}.
\end{table}
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{raw_corr_insample.png}
\caption{Raw pairwise scatter plots for the in-sample window (1976--2019,
$N = 3{,}215$ five-day bins).
\textbf{Left}: CR station-median index vs $\log_{10}$ seismic energy; horizontal
error bars span the station $[\hat{x}^{(5)}, \hat{x}^{(95)}]$ spread.
\textbf{Centre}: CR index vs 365-day smoothed sunspot number; the strong
anti-correlation ($r = -0.82$) reflects the Forbush decrease mechanism.
\textbf{Right}: Smoothed sunspot number vs $\log_{10}$ seismic energy; thin
horizontal error bars show the daily sunspot spread within each 5-day bin.
All points are coloured by decimal year (plasma colormap), revealing the
solar-cycle drift: successive cycles trace the same anti-correlated arc in
the centre panel.
Black curves are LOWESS trend lines ($f = 0.4$).}
\label{fig:rawinsample}
\end{figure}
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{raw_corr_oos.png}
\caption{Raw pairwise scatter plots for the out-of-sample window
(2020--2025, $N = 390$ bins, 27 NMDB stations).
Layout identical to Figure~\ref{fig:rawinsample}.
The CR--sunspot anti-correlation strengthens to $r = -0.939$ during Solar
Cycle~25, which had a particularly wide dynamic range.
The CR--seismicity correlation ($r = 0.046$, $p_\text{Bonf} = 1.0$) is
indistinguishable from zero.}
\label{fig:rawoos}
\end{figure}
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{raw_corr_combined.png}
\caption{Raw pairwise scatter plots for the full combined window
(1976--2025, $N = 3{,}604$ bins).
The multi-decadal colour gradient in the centre panel shows CR and sunspot
oscillating in anti-phase across four complete solar cycles.
The overall CR--seismicity Pearson correlation ($r = 0.055$, $\rho = 0.065$)
is statistically significant ($p_\text{Bonf} = 0.008$) but quantitatively
negligible, and its sign follows directly from the confounding triangle
described in the text.}
\label{fig:rawcombined}
\end{figure}
\subsection{In-Sample Replication (1976--2019)}
\label{sec:res:insample}
Figure~\ref{fig:homola} shows the full cross-correlation function of the raw
(undetrended) CR index and seismic metric (log$_{10}$ summed energy, Eq.~\ref{eq:energy})
over 1976--2019 ($T = 3{,}215$ five-day bins, 44 stations).
The dominant peak is at $\tau = -525$~days ($r = 0.139$), corresponding to a
half-solar-cycle lead of seismicity over CR flux.
At the claimed lag $\tau = +15$~days we find $r = 0.081$.
Although naive significance is high (4.6$\sigma$ treating bins as i.i.d.), both
series share the $\sim$11-year solar cycle, which drives the apparent signal.
The Bartlett effective-$N$ estimate gives $N_\text{eff} = 2{,}916$ and
$\sigma_{r(+15)} = 4.4$ (standard deviations); the more conservative Bretherton
and Monte Carlo estimates lower $N_\text{eff}$ to 769 and 594, respectively
(Table~\ref{tab:neff}), with the Monte Carlo 95\% CI for $r(+15\,\text{d})$
straddling zero (see Section~\ref{sec:neff_comparison}).
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{homola_replication.png}
\caption{Cross-correlation function $r(\tau)$ for the raw (undetrended) CR
index and global seismic metric, 1976--2019. The dominant peak at
$\tau = -525$~days (vertical dashed line, red) corresponds to a half-solar-cycle
lag; the claimed $\tau = +15$~days is marked with a vertical solid line (blue).
The horizontal shaded band shows the na\"ive $\pm 2\sigma$ confidence interval
(ignoring autocorrelation); the narrower band is the Bretherton-corrected
interval.}
\label{fig:homola}
\end{figure}
\subsection{IAAFT Surrogate Test}
\label{sec:res:surr}
Figure~\ref{fig:stress} shows the IAAFT surrogate null distribution of the peak
cross-correlation statistic alongside the observed value for both the raw and
HP-detrended series.
For the \textit{raw} series: $\rho_\text{obs} = 0.139$ exceeds all 10{,}000
surrogates ($p_\text{global} < 10^{-4}$, $> 3.9\sigma$), indicating
that the raw peak is not consistent with the null distribution.
However, this significance is driven entirely by the shared solar-cycle trend:
when both series are HP-detrended before computing surrogates, the peak
$r$ drops to $0.101$ (at $\tau = -125$~days) and achieves
$p_\text{global} < 10^{-4}$ ($> 3.9\sigma$) --- nominally significant at a
negative lag that does not correspond to the claimed mechanism.
Crucially, $r(+15\,\text{d})$ after HP detrending is $0.027$, well within the
surrogate null distribution.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{stress_test.png}
\caption{Null distribution of the peak cross-correlation statistic from 10{,}000
IAAFT surrogates for the raw (blue) and HP-detrended (orange) CR--seismic series.
Vertical dashed lines mark the observed peak for each case.
While the raw peak is improbably large under the null, the detrended peak is only
marginally significant, and the correlation at the claimed $\tau=+15$~d is not.}
\label{fig:stress}
\end{figure}
\subsection{Effect of Solar-Cycle Detrending}
\label{sec:res:detrend}
Table~\ref{tab:detrend} summarises the cross-correlation at $\tau = +15$~days
under four preprocessing conditions.
The raw $r = 0.081$ falls to $0.027$ after HP filtering, to $0.030$ after STL
decomposition, and to $0.037$ after sunspot regression.
In all detrended cases the claimed $\tau = +15$~day signal is negligible.
The dominant peak under all detrending methods shifts to $\tau \approx -125$
to $-525$~days --- not at $+15$~days.
Figure~\ref{fig:detrend_robust} extends this comparison to three additional
detrending approaches (HP, Butterworth highpass, 12-month rolling-mean
subtraction) and confirms that no detrending method reveals a positive
correlation at the claimed lag (Section~\ref{sec:detrend_robust}).
\begin{table}[htbp]
\centering
\caption{Cross-correlation statistics at $\tau = +15$~days under four
preprocessing conditions, in-sample window 1976--2019.}
\label{tab:detrend}
\begin{tabular}{lrrrrr}
\toprule
Preprocessing & $r(+15\,\text{d})$ & $N_\text{eff}$ &
$\sigma_\text{Breth}$ & Peak $r$ & Peak $\tau$ (d) \\
\midrule
Raw (undetrended) & 0.081 & 2{,}916 & 4.4 & 0.139 & $-525$ \\
HP filter & 0.027 & 3{,}199 & 1.5 & 0.101 & $-125$ \\
STL decomposition & 0.030 & 3{,}031 & 1.6 & 0.093 & $-525$ \\
Sunspot regression& 0.037 & 3{,}056 & 2.0 & 0.092 & $-125$ \\
\bottomrule
\end{tabular}
\end{table}
Figure~\ref{fig:detrended} shows the cross-correlation functions before and
after HP detrending.
Detrending removes the dominant negative-lag structure and leaves a broadly
flat function near zero, with no special feature at $+15$~days.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{detrended_xcorr.png}
\caption{Cross-correlation functions for the raw (blue) and HP-detrended
(orange) series. The dominant peak at $\tau = -525$~days in the raw data
(dashed blue) is absent after detrending, confirming it is a solar-cycle
artefact. Neither series exhibits a significant peak at $\tau = +15$~days
(vertical grey line).}
\label{fig:detrended}
\end{figure}
\subsection{Detrending Robustness}
\label{sec:detrend_robust}
To confirm that the null result at $\tau = +15$~days does not depend on the
choice of detrending method, Figure~\ref{fig:detrend_robust} compares three
distinct approaches applied to the in-sample window: (i) the HP filter
($\lambda = 4.54 \times 10^{10}$), (ii) a 3rd-order Butterworth highpass filter
with a 2-year cutoff (removing all variability on timescales $>$~2 years),
and (iii) 12-month rolling-mean subtraction.
All three leave $r(+15\,\text{d}) < 0.04$, with no approach producing a feature
at the claimed lag.
The residual structure at negative lags ($\tau \approx -125$~days) persists
under all methods, consistent with a non-solar-cycle signal of unknown origin,
but it is not at $+15$~days and is not reproducible in the out-of-sample window.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{detrend_robustness.png}
\caption{Cross-correlation $r(\tau)$ under three detrending approaches
(HP filter, Butterworth highpass, rolling-mean subtraction) for the raw
CR index vs.\ log$_{10}$ seismic energy, 1976--2019.
The vertical grey line marks the claimed $\tau = +15$~days.
No method produces a positive feature at this lag.}
\label{fig:detrend_robust}
\end{figure}
\subsection{Comparison of $N_\text{eff}$ Estimators}
\label{sec:neff_comparison}
Table~\ref{tab:neff} compares the three $N_\text{eff}$ estimators described
in Section~\ref{sec:neff} for the raw in-sample series at $\tau = +15$~days.
The Bartlett first-order estimator is most liberal ($N_\text{eff} = 2{,}923$),
the Bretherton full-sum estimator substantially lower ($N_\text{eff} = 769$),
and the Monte Carlo phase-surrogate estimator most conservative
($N_\text{eff} = 594$).
The 95\% confidence interval for $r(+15\,\text{d})$ under the Monte Carlo
estimate straddles zero ($[-0.002, +0.158]$), meaning the raw correlation is
not statistically significant even before any detrending.
\begin{table}[htbp]
\centering
\caption{Comparison of $N_\text{eff}$ estimators for raw in-sample series at
$\tau = +15$~days ($N = 3{,}214$, $r = 0.079$).}
\label{tab:neff}
\begin{tabular}{lrrr}
\toprule
Method & $N_\text{eff}$ & 95\% CI for $r$ & Includes zero? \\
\midrule
Bartlett 1946 (first-order) & 2{,}923 & [0.042, 0.115] & No \\
Bretherton 1999 (full sum) & 769 & [0.008, 0.148] & No \\
Monte Carlo (phase surr.) & 594 & [$-$0.002, 0.158] & Yes \\
\bottomrule
\end{tabular}
\end{table}
\subsection{Magnitude Threshold Sensitivity}
\label{sec:mag_threshold}
To assess sensitivity to the event selection threshold and potential bias from
aftershock contamination, we recomputed the seismic energy metric for three
minimum-magnitude cutoffs: $M \geq 4.5$ (baseline), $M \geq 5.0$, and
$M \geq 6.0$.
Figure~\ref{fig:mag_threshold} shows the cross-correlation function $r(\tau)$
for each threshold.
The values at the claimed lag are $r(+15\,\text{d}) = 0.079$, $0.072$, and
$0.050$ for $M \geq 4.5$, $5.0$, and $6.0$ respectively --- a range of only
$0.029$, well below the $0.05$ threshold for a meaningful shift.
The peak lag remains at $\tau = -525$~days for all thresholds, indicating
that aftershock swarms (which inflate the $M \geq 4.5$ catalogue in the days
following large events) do not drive the dominant correlation structure.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{magnitude_threshold.png}
\caption{Cross-correlation $r(\tau)$ for three magnitude cutoffs
($M \geq 4.5$, blue; $M \geq 5.0$, orange; $M \geq 6.0$, green),
in-sample window 1976--2019. The vertical grey line marks $\tau = +15$~days.
All three cutoffs yield similar, small correlations at the claimed lag,
and the dominant peak location ($\tau = -525$~days) is stable.}
\label{fig:mag_threshold}
\end{figure}
\subsection{Block-Bootstrap Surrogate Comparison}
\label{sec:res:blockbootstrap}
Figure~\ref{fig:blockbootstrap} shows the block-bootstrap (CBB) null
distributions for $r(+15\,\text{d})$ and the peak $|r|$ applied to the raw,
undetrended series.
Under the CBB null ($B = 804$ bins, $S = 5{,}000$):
\begin{itemize}
\item $p_\text{CBB}(+15\,\text{d}) = 0.022$ for the raw $r(+15\,\text{d}) = 0.079$;
\item $p_\text{CBB}(\text{peak}) = 0.008$ for the dominant raw peak
$r = 0.135$ at $\tau = -525$~days.
\end{itemize}
Both p-values reflect the fact that the raw series share a common 11-year
solar-cycle trend, so independently block-resampled surrogates occasionally
reproduce similar trend-induced correlations.
After HP detrending and IAAFT testing (Section~\ref{sec:res:surr}),
$r(+15\,\text{d})$ drops to 0.027 and falls well within the surrogate null,
confirming that the marginal significance seen here is a solar-cycle artefact,
not a genuine cross-series signal.
The block-bootstrap and IAAFT null shapes are qualitatively similar,
indicating that IAAFT's parametric assumptions do not materially distort the
null distribution for these data.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.95\textwidth]{block_bootstrap_null.png}
\caption{Block-bootstrap null distributions for the raw CR--seismic pair
($B = 804$ bins $\approx 11$~yr, $S = 5{,}000$ surrogates).
\textbf{Left}: distribution of $r(+15\,\text{d})$ under the CBB null; observed
value $r = 0.079$ (red) lies at the $p = 0.022$ tail.
\textbf{Right}: distribution of the peak $|r|$; observed peak $0.135$ at
$\tau = -525$~d lies at $p = 0.008$.
Grey dashed lines mark the 95\% CI of the null.}
\label{fig:blockbootstrap}
\end{figure}
\subsection{Partial Correlation: Controlling for Sunspot Number}
\label{sec:res:partialcorr}
Figure~\ref{fig:partialcorr} shows the cross-correlation function for both the
raw seismic metric and the sunspot-regressed residual.
The OLS regression of seismic energy on the smoothed sunspot number gives
$\hat{\beta} = -0.0011$~units per sunspot number unit (a weak negative
relationship) and $\hat{\alpha} = 14.9$.
After removing this sunspot component, $r(+15\,\text{d})$ drops from 0.079
to 0.029 --- a reduction of 63\%.
This confirms that most of the raw CR--seismic correlation at the claimed lag
is attributable to the shared solar-cycle trend without invoking any digital
filter.
The residual partial correlation ($r = 0.029$) is indistinguishable from zero
($p = 0.083$, treating bins as independent; even smaller when autocorrelation
is accounted for).
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{partial_correlation.png}
\caption{Cross-correlation $r(\tau)$ for the raw seismic metric (blue) and
the sunspot-regressed seismic residual (orange), both against the CR index,
on the raw (unfiltered) in-sample series.
The claimed $\tau = +15$~d (grey dashed line) shows $r_\text{raw} = 0.079$
vs.\ $r_\text{partial} = 0.029$, a 63\% reduction once the shared solar-cycle
component is regressed out.}
\label{fig:partialcorr}
\end{figure}
\subsection{Spectral Coherence and Mutual Information}
\label{sec:res:coherence}
Figure~\ref{fig:coherence} shows the magnitude-squared coherence spectrum and
the mutual information shuffle null.
\paragraph{Coherence.}
The mean coherence in the solar-cycle band (0.08--0.115~cycles~yr$^{-1}$)
is $\bar{C}_{xy} = 0.840$, substantially above the 95\% significance
threshold of 0.776 (for $K = 3$ Welch segments; see Section~\ref{sec:methods:nonlinear}).
This confirms that CR and seismic activity share strong common variance at
the $\sim$11-year frequency --- the spectral signature of the solar-cycle
confound identified in Section~\ref{sec:discussion}.
Note that the limited number of Welch segments ($K \approx 3$) makes the
coherence estimate at this frequency uncertain; the result should be interpreted
as indicative rather than precise.
\paragraph{Mutual information.}
The kNN mutual information at lag $\tau = 0$ is
$\hat{I} = 0.007$~nats, with a shuffle-null p-value of $p = 0.062$ --- not
significant.
At $\tau = +15$~d, $\hat{I} = 0.000$~nats ($p = 1.000$), confirming the
complete absence of any nonlinear association at the claimed lag.
Together with the linear cross-correlation results, this establishes that
there is no detectable statistical dependence --- linear or nonlinear ---
between the CR index and global seismicity at the claimed $+15$~day lag.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.95\textwidth]{spectral_coherence.png}
\caption{\textbf{Left}: Magnitude-squared coherence between the CR index and
seismic metric (blue), with the solar-cycle band shaded (orange, 0.08--0.115
cycles~yr$^{-1}$) and the 95\% significance level (red dashed).
The mean coherence in the SC band is 0.840, confirming a strong shared
solar-cycle component.
\textbf{Right}: kNN mutual information ($k = 5$) at lag $\tau = 0$ (blue)
and $\tau = +15$~d (orange) vs.\ their respective shuffle-null distributions.
Both observed MI values are indistinguishable from zero; $p(+15\,\text{d}) = 1.000$.}
\label{fig:coherence}
\end{figure}
\subsection{Missing-Data Sensitivity}
\label{sec:res:missing}
Table~\ref{tab:missing} reports the NaN fraction in the global CR index for
station thresholds of 2, 3, and 5.
In all cases the NaN fraction is 0.0\%: the 44-station NMDB network provides
complete 5-day coverage over 1976--2019 even under the strictest threshold
tested.
Consequently, $r(+15\,\text{d})$ is identical (0.079) across all thresholds,
confirming that the result is not an artefact of gaps in the CR record.
NaN bins show no clustering near solar maxima (clustering ratio = 1.0 at all
thresholds, i.e.\ indistinguishable from uniform), ruling out any systematic
data dropout that could correlate with the solar cycle.
\begin{table}[htbp]
\centering
\caption{Missing-data sensitivity: global CR index and correlation at
$\tau = +15$~days for three station-threshold values.}
\label{tab:missing}
\begin{tabular}{rrrr}
\toprule
Min.\ stations & NaN fraction & NaN near solar max & $r(+15\,\text{d})$ \\
\midrule
2 & 0.0\% & 0.0\% & 0.079 \\
3 & 0.0\% & 0.0\% & 0.079 \\
5 & 0.0\% & 0.0\% & 0.079 \\
\bottomrule
\end{tabular}
\end{table}
\subsection{Bin-Size Sensitivity}
\label{sec:res:binsize}
Figure~\ref{fig:binsize} shows the cross-correlation functions at three bin
sizes: 1-day, 5-day, and 27-day (Bartels rotation period).
\begin{itemize}
\item \textbf{1-day bins}: $r(+15\,\text{d}) = 0.036$;
dominant peak $r = 0.088$ at $\tau = -525$~days.
\item \textbf{5-day bins} (baseline): $r(+15\,\text{d}) = 0.079$;
dominant peak $r = 0.135$ at $\tau = -525$~days.
\item \textbf{27-day bins}: $r(+27\,\text{d}) = 0.123$;
dominant peak $r = 0.217$ at $\tau \approx -513$~days.
\end{itemize}
In all cases the dominant peak is at approximately $-525$ to $-513$~days
($\approx -18$ months, consistent with a half-solar-cycle lag), not at
$+15$~days.
The correlation at the claimed lag grows with bin size because longer bins
increasingly average out high-frequency noise and emphasise the solar-cycle
component.
This bin-size dependence is the behaviour expected of a solar-cycle artefact
and is inconsistent with a genuine short-lag (15-day) physical mechanism,
which should be insensitive to whether one uses 1-day or 27-day bins.
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{bin_size_sensitivity.png}
\caption{Cross-correlation $r(\tau)$ for three bin sizes: 1-day (left),
5-day (centre), 27-day (right).
The dominant peak (red dotted) consistently falls near $\tau \approx -520$~days
across all bin sizes.
The correlation at the claimed $\tau = +15$~days (grey dashed) increases
from 0.036 to 0.123 as bin size increases, consistent with increasing
solar-cycle leakage rather than a physical short-lag signal.}
\label{fig:binsize}
\end{figure}
\subsection{Earthquake Declustering (Gardner--Knopoff)}
\label{sec:res:decluster}
Figure~\ref{fig:decluster} compares the cross-correlation for the full
catalogue and the mainshock-only catalogue after applying the
\citet{GardnerKnopoff1974} declustering algorithm.
Of 232{,}043 events ($M \geq 4.5$, 1976--2019), 65{,}874 (28.4\%) were
classified as aftershocks and removed, leaving 166{,}169 mainshocks.
The correlation at the claimed lag changes from $r(+15\,\text{d}) = 0.079$
(full) to $r(+15\,\text{d}) = 0.065$ (declustered), a reduction of only 0.014
--- well below the $\Delta r = 0.03$ threshold for a material change.
The peak structure and dominant lag are unchanged.
We conclude that aftershock clustering is not a primary confound in this
analysis; the signal does not disappear when aftershock sequences are removed.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{declustered_xcorr.png}
\caption{Cross-correlation for the full catalogue ($n = 232{,}043$ events,
blue) and the Gardner--Knopoff declustered catalogue ($n = 166{,}169$
mainshocks, orange), in-sample 1976--2019.
Removing 28.4\% of events as aftershocks changes $r(+15\,\text{d})$ by
only $\Delta r = 0.014$, confirming the result is not driven by aftershock
swarms.}
\label{fig:decluster}
\end{figure}
\subsection{Sub-Period Analysis by Solar Cycle}
\label{sec:res:subcycles}
Figure~\ref{fig:subcycles} shows the cross-correlation computed independently
within each of the four complete solar cycles (21--24) in the in-sample window.
Table~\ref{tab:subcycles} summarises the results.
\begin{table}[htbp]
\centering
\caption{Per-solar-cycle cross-correlation at $\tau = +15$~days and the
within-cycle dominant peak.}
\label{tab:subcycles}
\begin{tabular}{lrrrrr}
\toprule
Cycle & Period & $N$ (bins) & $r(+15\,\text{d})$ & Peak $|r|$ & Peak $\tau$ (d) \\
\midrule
21 & 1976--1987 & 768 & $+0.071$ & 0.118 & $-65$ \\
22 & 1987--1996 & 724 & $+0.057$ & 0.104 & $-125$ \\
23 & 1997--2009 & 901 & $+0.018$ & 0.060 & $+125$ \\
24 & 2009--2020 & 810 & $+0.073$ & 0.177 & $-125$ \\
\bottomrule
\end{tabular}
\end{table}
The correlations at $\tau = +15$~d are positive in all four cycles (range
0.018--0.073), but the magnitudes vary by a factor of four, and the dominant
within-cycle peak lags are inconsistent: $-65$, $-125$, $+125$, and $-125$~days.
If the correlation were a genuine physical CR precursor, its lag should be
stable and reproducible across cycles.
The inconsistency of the dominant peak --- which oscillates between positive
and negative lags --- is instead the signature of a cycle-to-cycle drift in
the relative phase between the CR and seismic solar-cycle modulations, a
classical hallmark of a shared-trend confound rather than a causal relationship.
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{solar_cycle_xcorr.png}
\caption{Cross-correlation $r(\tau)$ within each of the four complete solar
cycles (21--24) of the in-sample period (1976--2019).
The claimed lag $\tau = +15$~days (grey dashed) shows modest positive
correlations (0.018--0.073), but the dominant peak lag (red dotted) is
inconsistent across cycles ($-65$, $-125$, $+125$, $-125$~days), pointing
to a drift in the relative solar-cycle phase rather than a physical mechanism.}
\label{fig:subcycles}
\end{figure}
\subsection{Geographic Localisation}
\label{sec:res:geo}
Figure~\ref{fig:geoheatmap} shows the BH-adjusted significance map for all
station--cell pairs.
Of 7{,}037 pairs tested, 455 survive FDR correction at $q = 0.05$.
The expected number of false discoveries under the global null is
$351.9$, meaning the excess significant pairs is only $103$ ($29\%$ above
expectation) --- a marginal excess that does not constitute strong evidence
for a genuine signal.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{geo_heatmap.png}
\caption{Heatmap of BH-significant station--grid-cell pairs ($q = 0.05$).
Each row is an NMDB station; each column is a $10° \times 10°$ seismic grid
cell. Significant pairs (455/7{,}037) are scattered without obvious geographic
clustering, inconsistent with a local coupling mechanism.}
\label{fig:geoheatmap}
\end{figure}
Figure~\ref{fig:geodistlag} shows the regression of the optimal lag $\tau^*(s,g)$
on great-circle distance $d(s,g)$.
The slope is $\beta = -0.45$~days/1000\,km ($p = 0.21$, $R^2 = 0.0002$),
indistinguishable from zero.
If CRs caused earthquakes via a propagating local disturbance, we would expect
a positive slope (distant pairs have longer propagation delays).
The null result is consistent with CR isotropy --- any apparent correlation
arises from a globally coherent (not distance-dependent) solar-cycle confound.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{geo_distance_lag.png}
\caption{Optimal lag $\tau^*(s,g)$ vs.\ great-circle distance $d(s,g)$ for all
7{,}037 station--cell pairs (grey) and BH-significant pairs (coloured by peak $|r|$).
The OLS regression line (red) has slope $\beta = -0.45$~days/1000\,km ($p=0.21$),
consistent with zero. A local propagation mechanism would predict a positive slope.}
\label{fig:geodistlag}
\end{figure}
\subsection{Pre-Registered Out-of-Sample Validation (2020--2025)}
\label{sec:res:oos}
The out-of-sample analysis used data from 2020-01-01 to 2025-04-29 ($T = 390$
five-day bins, 35 NMDB stations), a window completely disjoint from the
in-sample period.
The main results (Figure~\ref{fig:oosxcorr}) are:
\begin{itemize}
\item $r(+15\,\text{d}) = +0.030$ (directionally correct, but very small);
\item Surrogate 95th percentile at $\tau = +15$~d: $0.101$ (observed is
well below this threshold);
\item $p_\text{global} = 0.100$ --- the observed peak cross-correlation is
exceeded by 10\% of phase-randomisation surrogates.
\end{itemize}
The prediction scorecard (Table~\ref{tab:prereg}) shows one pass (P1: correct
sign), one failure (P2: $p > 0.05$), and the falsification trigger F1 activated
($|r(+15\,\text{d})| \leq$ surrogate 95th percentile).
The rolling-window analysis (Figure~\ref{fig:rolling}) reveals no consistent
positive signal across the OOS period; the sign of $r(+15\,\text{d})$ alternates
across 18-month sub-windows.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{oos_xcorr.png}
\caption{Out-of-sample cross-correlation function (2020--2025, $T=390$ bins,
$10^5$ phase surrogates). The observed $r(\tau)$ (black) lies within
the surrogate 95th-percentile envelope (grey shading). The claimed signal at
$\tau = +15$~d (vertical line) is $r = 0.030$ --- below the surrogate 95th
percentile of 0.101.}
\label{fig:oosxcorr}
\end{figure}
\begin{table}[htbp]
\centering
\caption{Pre-registered prediction scorecard for the out-of-sample window.}
\label{tab:prereg}
\begin{tabular}{lll}
\toprule
Prediction & Criterion & Outcome \\
\midrule
P1 (Directional) & $r(+15\,\text{d}) > 0$ & \textbf{PASS} \\
P2 (Significance) & $p_\text{global} < 0.05$ & \textbf{FAIL} \\
P3 (Stability) & std(rolling $r$) $\leq 0.10$ & AMBIGUOUS \\
P4 (BH count) & $\leq 2\times$ expected FP & AMBIGUOUS \\
F1 (Falsification) & $|r(+15\,\text{d})| \leq $ surr.\ 95th & \textbf{TRIGGERED} \\
\bottomrule
\end{tabular}
\end{table}
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{rolling_correlation_oos.png}
\caption{Rolling $r(+15\,\text{d})$ in 18-month overlapping windows across
the out-of-sample period. Error bars are bootstrap 95\% confidence intervals.
The grey horizontal band shows the surrogate 95th percentile.
The signal shows no consistent sign or trend.}
\label{fig:rolling}
\end{figure}
\subsection{Combined 1976--2025 Analysis: Sinusoidal Modulation}
\label{sec:res:combined}
Figure~\ref{fig:combined} shows the annual rolling $r(+15\,\text{d})$ over the
full 1976--2025 record, together with the best-fit sinusoidal envelope.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{full_series_with_envelope_fit.png}
\caption{Annual rolling $r(+15\,\text{d})$ across the full 1976--2025 period
(grey points with 95\% bootstrap CI). The sinusoidal best-fit (red curve,
$P = 13.0$~yr) and the constant-mean model (dashed) are nearly indistinguishable;
BIC comparison slightly favours the constant model ($\text{BF} = 0.75$).
The vertical dashed line marks the in-sample/out-of-sample split (2020).}
\label{fig:combined}
\end{figure}
The global surrogate test on the full 1976--2025 window yields $p = 0.010$
($\sigma = 2.57$) at the dominant peak $\tau = -125$~days --- marginally
significant, but at a lag inconsistent with the claimed $+15$~day CR precursor.
The sinusoidal fit (Equations~\ref{eq:modA}--\ref{eq:bf}) does \emph{not} prefer
$\mathcal{M}_B$ over $\mathcal{M}_A$ with the corrected seismic metric:
\begin{itemize}
\item Best-fit period: $P = 13.0$~years;
\item Amplitude: $A = 0.047$;
\item $\Delta\text{BIC} = -0.57$ (negative = $\mathcal{M}_A$ constant preferred);
\item Bayes factor: $\text{BF}_{BA} = 0.75$.
\end{itemize}
A Bayes factor of 0.75 is less than 1.0, indicating that the constant (no
modulation) model is marginally preferred over the sinusoidal model.
With the correct seismic energy metric, the oscillatory rolling cross-correlation
previously attributed to solar-cycle modulation is no longer strongly supported;
the apparent modulation in the old results was partly an artefact of the invalid
summed-$M_W$ metric, which amplified the solar-cycle confound.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Discussion}
\label{sec:discussion}
\subsection{Why Does the Raw Correlation Appear So Strong?}
The raw $r \approx 0.31$ reported by \citet{Homola2023} at $\tau = +15$~days
(reduced to $r = 0.081$ with the correct seismic energy metric) and the naive
$p \sim 10^{-72}$ are products of compounding statistical errors:
(i) using a physically invalid seismic metric (direct sum of logarithmic $M_W$
values), which artificially inflated the solar-cycle variation in the seismic series,
(ii) treating autocorrelated time series as independent observations,
(iii) failing to account for the shared solar-cycle trend driving both CR flux and
seismicity, and (iv) not correcting for scanning over 401 lag values.
The solar cycle is the key confounder.
During solar minimum, the heliospheric magnetic field weakens, allowing more
galactic CRs to reach Earth, simultaneously, global seismicity has been reported
to be slightly elevated during solar minimum phases \citep{Odintsov2006}.
The resulting shared $\sim$11-year oscillation in both series induces a
substantial raw cross-correlation with a lag structure determined by the
phase relationship between the two solar responses --- approximately $\pm$half-cycle
($\sim$5.5 years $\approx 2{,}000$ days), consistent with the dominant raw peak
at $\tau = -525$~days.
\subsection{Physical Plausibility of the Claimed Mechanism}
Even setting aside the statistical issues, the proposed mechanism faces severe
physical constraints.
The total ionisation dose from galactic CRs at the surface is
$\sim 0.3$~mGy/year \citep{Aplin2005} --- far too small to
transfer meaningful mechanical energy to fault zones, which require shear-stress
changes of order $\sim$0.01--1~MPa to trigger earthquakes.
Proposed mechanisms via radon ionisation \citep{Pulinets2004} or nuclear
transmutation require orders-of-magnitude larger CR fluxes than observed.
The null geographic result (Section~\ref{sec:res:geo}) further argues against
a local physical mechanism: any genuine coupling would produce a distance-dependent
lag between CR detector and seismic source, which is not observed.
\subsection{Comparison with Prior Replication Attempts}
Independent replication attempts of the \citet{Homola2023} result have been
limited.
\citet{Urata2018} found similarly inflated correlations using Japanese CR stations
and reported that detrending removed most of the signal.
Our analysis is the first to combine all three of: IAAFT surrogate testing,
solar-cycle-aware detrending, geographic localisation scanning, and pre-registered
out-of-sample validation.
\subsection{Limitations}
Several limitations should be acknowledged:
\begin{enumerate}
\item The OOS window (2020--2025) encompasses Solar Cycle~25, a period of
rising solar activity after the deep minimum of Cycle~24.
The absence of a solar minimum in this window limits statistical power.
\item Seismicity is not stationary; major seismic sequences (e.g.\ Tonga 2022)
can inflate the seismic metric in individual bins.
\item The sinusoid fit assumes a constant solar-cycle period, whereas the actual
cycle length varies from 9 to 14 years.
\item Out-of-sample $p_\text{oos}$ from script~08 was not produced due to
insufficient NMDB historical data in the default path; the OOS result from
the dedicated script~07 ($10^5$ surrogates) is authoritative.
\end{enumerate}
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Conclusions}
\label{sec:conclusions}
We have conducted a rigorous, pre-registered replication of the claimed
cosmic-ray/earthquake correlation from \citet{Homola2023} using 49 years of
data from 44 neutron monitors, the USGS global catalogue, and SILSO sunspot
numbers.
Our principal findings are:
\begin{enumerate}
\item The raw cross-correlation $r(+15\,\text{d}) = 0.081$ (corrected seismic
energy metric) is modest and misleading; it is driven by a shared
$\sim$10-year solar-cycle modulation of both CR flux and global seismicity,
not by a physical CR$\to$seismic mechanism.
The larger value ($r \approx 0.31$) reported by \citet{Homola2023} results
from the physically invalid practice of directly summing logarithmic $M_W$
values rather than the underlying energies.
\item After solar-cycle detrending, $r(+15\,\text{d})$ falls to $0.027$ (HP),
$0.030$ (STL), or $0.037$ (sunspot regression) --- all within the surrogate
null distribution and consistent with zero.
\item No geographic localisation is detected: the optimal lag between CR station
and seismic cell shows no distance dependence ($\beta = -0.45$~d/1000~km,
$p = 0.21$), inconsistent with a local propagation mechanism.
\item A pre-registered out-of-sample test on 2020--2025 yields
$r(+15\,\text{d}) = 0.030$ and $p_\text{global} = 0.100$, entirely
consistent with noise.
\item The 49-year annual rolling correlation timeseries is not significantly
better described by a sinusoid than a constant (Bayes factor 0.75, favoring
constant); the previous sinusoidal modulation finding was an artefact of the
invalid seismic metric.
\item Seven additional robustness checks all corroborate the null result:
\begin{itemize}
\item Block-bootstrap surrogates ($B \approx 11$~yr, $n = 5{,}000$) yield
$p_\text{CBB}(+15\,\text{d}) = 0.022$ on the \emph{raw} series;
after detrending the result is within the null.
\item Partial correlation (regressing seismic on sunspots before correlating
with CR) reduces $r(+15\,\text{d})$ from 0.079 to 0.029 (63\% drop),
confirming solar-cycle confounding without any filter.
\item Mean spectral coherence in the solar-cycle band is 0.840 (> 95\%
threshold), while mutual information at $\tau = +15$~d is exactly zero
($p = 1.000$): no nonlinear signal at the claimed lag.
\item Missing data: 0\% NaN bins at all station thresholds;
$r(+15\,\text{d})$ is unchanged for min-station thresholds 2, 3, or 5.
\item Bin-size sensitivity: the dominant peak is at $\tau \approx -520$~days
for 1-day, 5-day, and 27-day bins; $r$ at $+15$~days scales with bin size
as expected for solar-cycle leakage, not a physical mechanism.
\item Earthquake declustering removes 28\% of events as aftershocks;
$r(+15\,\text{d})$ changes by only $\Delta r = 0.014$ --- not a material confound.
\item Per-solar-cycle analysis shows inconsistent dominant peak lags
($-65$, $-125$, $+125$, $-125$~days across cycles 21--24),
the signature of a phase-drifting solar-cycle artefact.
\end{itemize}
\end{enumerate}
We conclude that there is no statistically credible evidence for a physical
causal link between galactic cosmic-ray flux and global seismicity.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section*{Data Availability}
All analysis code, pre-registration documents, intermediate results, and figures
are publicly available at
\url{https://github.com/pingud98/cosmicraysandearthquakes} under the MIT licence.
Raw data are freely accessible from their respective providers:
NMDB (\url{https://www.nmdb.eu}),
USGS (\url{https://earthquake.usgs.gov/fdsnws/event/1/}),
and SIDC (\url{https://www.sidc.be/silso/datafiles}).
\section*{Acknowledgements}
The author thanks the operators of the NMDB network for maintaining open-access
neutron monitor data, and the USGS Earthquake Hazards Programme for the FDSN
catalogue service.
GPU computations were performed on an NVIDIA Tesla M40.
%%%% ═══════════════════════════════════════════════════════════════════════════
\bibliographystyle{plainnat}
\bibliography{refs}
\end{document}