cosmicraysandearthquakes/paper/main.tex

1096 lines
52 KiB
TeX
Raw Normal View History

\documentclass[12pt,a4paper]{article}
%% ── Packages ────────────────────────────────────────────────────────────────
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage[margin=2.5cm]{geometry}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{siunitx}
\usepackage{natbib}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{microtype}
\usepackage{setspace}
\usepackage{lineno}
\onehalfspacing
\hypersetup{
colorlinks=true,
linkcolor=blue!60!black,
citecolor=green!50!black,
urlcolor=blue!60!black,
}
\graphicspath{{figs/}}
%% ── Title block ─────────────────────────────────────────────────────────────
\title{\textbf{No Causal Link Between Galactic Cosmic-Ray Flux and Global
Seismicity: A Pre-Registered Replication with GPU-Accelerated Surrogate Testing
and Out-of-Sample Validation}}
\author{
J.~D.~Devine$^{1}$\\[4pt]
{\small $^{1}$ Independent researcher}\\
{\small \texttt{devine.jd@gmail.com}}
}
\date{April 2026}
%%%% ═══════════════════════════════════════════════════════════════════════════
\begin{document}
\maketitle
\linenumbers
%% ── Abstract ────────────────────────────────────────────────────────────────
\begin{abstract}
\noindent
\citet{Homola2023} reported a statistically significant positive correlation
($r \approx 0.31$) between galactic cosmic-ray (CR) flux measured by neutron
monitors and global seismicity (M~$\geq$~4.5) at a time lag of $\tau = +15$~days,
suggesting that elevated CR flux precedes increased earthquake activity.
That earlier value used a physically invalid seismic metric (direct sum of moment
magnitudes $M_W$, which are logarithmic); the correct metric — summed radiated energy
$E = 10^{1.5 M_W + 4.8}$\,J per bin \citep{Kanamori1977} — yields $r(+15\,\text{d}) = 0.081$.
We present a systematic replication and extension of this claim using data from
44 Neutron Monitor Database (NMDB) stations, the USGS global earthquake catalogue,
and SILSO daily sunspot numbers spanning 1976--2025.
Our analysis proceeds in four stages.
\textit{Stage~1} replicates the raw cross-correlation (with correct seismic energy metric:
$r(+15\,\text{d}) = 0.081$, peak $r = 0.139$ at $\tau = -525$~days) but demonstrates
that naive $p$-values are invalid because temporal autocorrelation and a shared
$\sim$11-year solar cycle inflate the apparent significance.
\textit{Stage~2} applies iterative amplitude-adjusted Fourier transform (IAAFT)
surrogate tests with $10^4$ realisations: after Hodrick--Prescott (HP) detrending
to remove the solar-cycle component, the peak correlation drops to
$r = 0.101$ and achieves formal significance ($p_\text{global} < 10^{-4}$, $>3.9\sigma$),
but $r(+15\,\text{d}) = 0.027$ --- well within the surrogate null distribution.
\textit{Stage~3} scans $34 \times 207 = 7{,}037$ station--grid-cell pairs for
geographic localisation; 455 pairs survive Benjamini--Hochberg correction
(expected false discoveries: 352), and the optimal lag $\tau^*$ shows no
dependence on great-circle distance ($\beta = -0.45$~days/1000~km, $p = 0.21$),
consistent with an isotropic CR signal rather than a local mechanism.
\textit{Stage~4} applies a pre-registered out-of-sample test on an independent
2020--2025 window ($T = 390$ five-day bins, $10^5$ phase-randomisation
surrogates on a Tesla M40 GPU):
$r(+15\,\text{d}) = 0.030$ (surrogate 95th percentile~$= 0.101$),
$p_\text{global} = 0.100$ --- consistent with noise.
Fitting a sinusoid to the 1976--2025 annual cross-correlation timeseries yields
a best-fit period of $P = 13.0$~years and a Bayes factor of $\text{BF} = 0.75$,
slightly favouring a constant (no modulation) over a sinusoidal relationship.
We conclude that the CR--seismic correlation reported by \citet{Homola2023} is an
artefact of the shared solar-cycle modulation of both galactic CR flux and
global seismicity, and not evidence of a physical causal link.
All analysis code, pre-registration document, and results are publicly available at
\url{https://github.com/pingud98/cosmicraysandearthquakes}.
\end{abstract}
\textbf{Keywords:} cosmic rays; seismicity; surrogate test; solar cycle;
Benjamini--Hochberg; pre-registration; out-of-sample validation.
\newpage
\tableofcontents
\newpage
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Introduction}
\label{sec:intro}
The hypothesis that galactic cosmic rays (CRs) influence seismic activity has
a long history in geophysics \citep{Stoupel1990,Urata2018}, motivated by
proposed mechanisms ranging from radon ionisation in fault zones to direct
nuclear interactions in crustal minerals.
\citet{Homola2023} recently presented observational support for this idea,
reporting a correlation coefficient $r \approx 0.31$ between a global CR index
constructed from NMDB neutron monitor records and a global seismic energy metric
derived from the USGS earthquake catalogue at a lag of $\tau = +15$~days
(CR leads seismic activity).
The associated naive $p$-value was reported as $p \sim 10^{-72}$ at this lag.
Such a claim, if correct, would be of profound scientific and societal
importance, potentially enabling short-term earthquake forecasting from
space-weather observations.
It therefore demands rigorous scrutiny.
Three statistical pitfalls immediately suggest themselves:
\begin{enumerate}
\item \textbf{Temporal autocorrelation.}
Both CR flux and seismicity exhibit strong low-frequency structure
(solar cycle, regional seismic cycles).
Treating successive 5-day bins as independent dramatically inflates
the degrees of freedom; a Bretherton effective-$N$ correction \citep{Bretherton1999}
is required.
\item \textbf{Shared solar-cycle trend.}
Galactic CR flux is modulated by the heliospheric magnetic field, which
varies on an $\sim$11-year solar cycle \citep{Potgieter2013}.
Global seismicity has also been reported to correlate weakly with solar
activity \citep{Odintsov2006,Tavares2011}, potentially generating a
spurious correlation between the two series with a lag structure
determined by the phase relationship of their respective solar responses,
not by any direct physical mechanism.
\item \textbf{Multiple-comparison inflation.}
Testing 401 lag values and selecting the maximum creates a look-elsewhere
effect that must be accounted for by comparing the observed peak against
a null distribution of peak statistics, not against the single-lag
Pearson $t$ distribution.
\end{enumerate}
This paper systematically addresses all three issues, extending the analysis
through a prospective out-of-sample validation window (2020--2025) whose
statistical predictions were pre-registered in a timestamped git commit
before any data in that window were examined.
The remainder of the paper is organised as follows.
Section~\ref{sec:data} describes the data sources and preprocessing.
Section~\ref{sec:methods} presents the statistical methods.
Section~\ref{sec:results} reports the results of each analysis stage.
Section~\ref{sec:discussion} interprets the findings.
Section~\ref{sec:conclusions} concludes.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Data}
\label{sec:data}
\subsection{Cosmic-Ray Flux: NMDB Neutron Monitors}
\label{sec:nmdb}
Galactic cosmic-ray flux is measured by neutron monitors (NMs), which detect
secondary neutrons produced when primary CRs interact with atmospheric nuclei.
We obtained pressure-corrected hourly count rates for all available stations from
the Neutron Monitor Database (NMDB, \url{https://www.nmdb.eu}) for the period
1976--2025.
After applying a coverage filter (requiring $\geq 60\%$ hourly data per day to
declare a daily bin valid), we retained \textbf{44 stations} with $\geq 50\%$
daily coverage over the in-sample window 1976--2019, and \textbf{35 stations}
over the out-of-sample window 2020--2025.
Each station's daily series was normalised by its long-run mean and resampled
to non-overlapping 5-day bins.
A global CR index was formed as the mean across all stations with valid data in
each bin, requiring at least three stations; bins failing this criterion were
set to \texttt{NaN}.
\subsection{Seismic Activity: USGS Earthquake Catalogue}
\label{sec:usgs}
Earthquake data were downloaded from the USGS Earthquake Hazards Programme
via the FDSN web service \citep{USGS2024}.
We retained all events with $M \geq 4.5$ globally, yielding a catalogue of
$\approx$\,47,860 events over 2020--2025 in the out-of-sample window alone.
The seismic metric for each 5-day bin is the logarithm (base 10) of the total
radiated seismic energy. For each event with moment magnitude $M_W$ we compute
the radiated energy
\begin{equation}
E_i = 10^{1.5 M_{W,i} + 4.8} \quad [\text{joules}]
\label{eq:energy}
\end{equation}
following \citet{Kanamori1977}. Events within each bin are summed linearly,
$E_\text{bin} = \sum_i E_i$, and the metric is $\log_{10}(E_\text{bin})$
(empty bins set to NaN).
This formulation correctly weights large earthquakes: an $M_W\,8.0$ event
contributes $\sim$1000$\times$ more energy than an $M_W\,6.0$ event.
Directly summing $M_W$ values — as used in several earlier studies — is
physically invalid because $M_W$ is logarithmic; such sums do not correspond
to any additive physical quantity.
\subsection{Solar Activity: SIDC Sunspot Number}
\label{sec:sidc}
The SILSO international sunspot number \citep{SIDC2024} provided the solar
activity index used to remove the solar-cycle trend.
We used the daily series (version 2.0), smoothed with a 365-day running mean
for detrending purposes.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Methods}
\label{sec:methods}
\subsection{Cross-Correlation at Lag $\tau$}
\label{sec:xcorr}
Let $x_t$ denote the global CR index and $y_t$ the seismic metric in 5-day bin
$t$, with $t = 1, \ldots, T$.
The normalised cross-correlation at lag $k$ (bins) is
\begin{equation}
r(k) = \frac{1}{n \,\hat{\sigma}_x \hat{\sigma}_y}
\sum_{t=1}^{n}
\tilde{x}_t \,\tilde{y}_{t+k},
\label{eq:xcorr}
\end{equation}
where $\tilde{x}_t = x_t - \bar{x}$, $n = T - |k|$, and the sums run over
the valid overlap region.
A positive lag $k > 0$ corresponds to CR leading seismicity.
Lags range from $-200$ to $+200$~days (step $= 5$~days, i.e.\ 1 bin),
giving 81 lag values in the in-sample window.
\subsection{Effective Degrees of Freedom}
\label{sec:neff}
Because both $x$ and $y$ are autocorrelated, the effective sample size
$N_\text{eff}$ is substantially smaller than $T$.
We compare three $N_\text{eff}$ estimators; all are listed in Table~\ref{tab:neff}.
The \citet{Bartlett1946} first-order formula uses only the lag-1 autocorrelations:
\begin{equation}
N_\text{eff}^\text{Bart} = T\,
\frac{1 - r_{xx}(1)\,r_{yy}(1)}
{1 + r_{xx}(1)\,r_{yy}(1)},
\label{eq:neff_bartlett}
\end{equation}
the \citet{Bretherton1999} full-sum formula uses all significant lags:
\begin{equation}
N_\text{eff}^\text{Breth} = T \left(
1 + 2 \sum_{k=1}^{K} r_{xx}(k)\, r_{yy}(k)
\right)^{-1},
\label{eq:neff_breth}
\end{equation}
where $r_{xx}$ and $r_{yy}$ are the sample autocorrelation functions of $x$
and $y$, and $K = 200$ lags;
and a Monte Carlo estimate from 1000 phase-randomised surrogates:
$N_\text{eff}^\text{MC} = 1/\hat{\sigma}^2_{r_\text{null}}$,
where $\hat{\sigma}^2_{r_\text{null}}$ is the variance of the surrogate
correlation values at the target lag.
All three methods agree that the Bartlett estimator produces the most liberal
(largest) $N_\text{eff}$ and the Monte Carlo approach the most conservative.
\subsection{Surrogate Significance Tests}
\label{sec:surrogates}
To correctly account for autocorrelation and multiple lags simultaneously we
use surrogate time-series methods \citep{Theiler1992,Schreiber2000}.
\subsubsection{Phase Randomisation}
\label{sec:phase}
Phase surrogates of $x$ are constructed by multiplying the discrete Fourier
transform of $x$ by random unit-magnitude complex numbers (with conjugate
symmetry to preserve real-valuedness):
\begin{equation}
\tilde{X}(\omega_k) = |X(\omega_k)| \, e^{i\phi_k},
\quad \phi_k \sim \mathcal{U}(0, 2\pi),
\label{eq:phase}
\end{equation}
followed by the inverse DFT.
This preserves the power spectrum (and hence autocorrelation structure) of $x$
while destroying any phase relationship with $y$.
\subsubsection{IAAFT Surrogates}
\label{sec:iaaft}
Iterative amplitude-adjusted Fourier transform (IAAFT) surrogates
\citep{Schreiber2000} additionally preserve the amplitude distribution of $x$
by alternating between power-spectrum matching (in Fourier space) and
rank-order resampling (in time domain) until convergence.
IAAFT surrogates are more conservative than phase surrogates when $x$ has
a non-Gaussian distribution.
\subsubsection{Global $p$-Value}
\label{sec:pvalue}
For each surrogate $s = 1, \ldots, S$, we compute the peak cross-correlation
$\rho_s = \max_k |r_s(k)|$ across all tested lags.
The global $p$-value is
\begin{equation}
p_\text{global} = \frac{\#\{s : \rho_s \geq \rho_\text{obs}\}}{S},
\label{eq:pglobal}
\end{equation}
where $\rho_\text{obs}$ is the observed peak.
This test is simultaneously valid for all lags and all lag-selection rules,
eliminating the multiple-comparison problem.
\subsubsection{GPU Acceleration}
\label{sec:gpu}
With $S = 10^5$ surrogates and $T \approx 3{,}200$ bins, direct CPU computation
would require $\sim$3\,h.
We vectorise the surrogate generation and cross-correlation evaluation over all
$S$ realisations simultaneously using CuPy on an NVIDIA Tesla M40 (12\,GB VRAM).
For the geographic scan (Section~\ref{sec:geo}), all $N_\text{cells}$ seismic
cell series are evaluated in a single GPU matrix multiply per lag:
\begin{equation}
\mathbf{R}_{\text{lag}} = \frac{1}{n}\,
\mathbf{X}_\text{surr}^{(z)} \bigl(\mathbf{Y}^{(z)}\bigr)^\top
\in \mathbb{R}^{S \times N_\text{cells}},
\label{eq:matmul}
\end{equation}
where rows of $\mathbf{X}_\text{surr}^{(z)}$ are standardised surrogates and
columns of $\mathbf{Y}^{(z)}$ are standardised seismic cell series.
Benchmarks show a $2.9\times$ speedup for phase surrogates and $1.3\times$ for IAAFT
(limited by chunked argsort to avoid VRAM overflow).
\subsection{Solar-Cycle Detrending}
\label{sec:detrend}
We apply three complementary detrending approaches to isolate the CR--seismic
relationship from the shared solar-cycle trend:
\begin{enumerate}
\item \textbf{Hodrick--Prescott (HP) filter} \citep{HP1997} with smoothing
parameter $\lambda = 4.54 \times 10^{10}$.
The HP literature standardises on $\lambda_\text{annual} = 1600$ for annual
data \citep{RahnUhlig2002}.
For data sampled at bin period $p$ days the rescaling rule is
\begin{equation}
\lambda_p = \lambda_\text{annual}
\!\left(\frac{365}{p}\right)^{\!4}
= 1600 \times (365/5)^4
\approx 4.54 \times 10^{10}.
\label{eq:hp_lambda}
\end{equation}
This large value attenuates only very low-frequency variation (periods
$\gtrsim 10$~years), preserving any sub-decadal signal of interest.
The trend component is subtracted from both $x_t$ and $y_t$.
\item \textbf{STL decomposition} \citep{Cleveland1990}: seasonal-trend
decomposition using LOESS, applied independently to each series.
\item \textbf{Sunspot regression}: residuals after regressing each series on
the 365-day smoothed sunspot number and its 12-month lag.
\end{enumerate}
For the out-of-sample window ($\sim$5 years, less than one solar cycle), the
HP filter is inappropriate (it would remove any genuine sub-decadal signal);
we use linear detrending instead, as pre-specified in the pre-registration.
\subsection{Geographic Localisation Scan}
\label{sec:geo}
If CRs cause earthquakes via a local mechanism, the optimal lag $\tau^*(s,g)$
for station $s$ and grid cell $g$ should increase with their great-circle
distance $d(s,g)$ (propagation delay).
Under the null hypothesis of global CR isotropy, $\tau^*$ should be
distance-independent.
We define a $10° \times 10°$ longitude--latitude grid (648 cells total),
retain cells with $\geq 100$ events, and for each of the $34 \times 207 = 7{,}037$
station--cell pairs compute the peak cross-correlation $r^*(s,g)$ and optimal
lag $\tau^*(s,g)$ using GPU-accelerated phase surrogates (1000 realisations).
Pairs are declared significant at false discovery rate $q = 0.05$ using the
Benjamini--Hochberg (BH) procedure \citep{Benjamini1995}:
rank the $m = 7{,}037$ $p$-values $p_{(1)} \leq \ldots \leq p_{(m)}$; the
threshold is $p_{(k)} \leq (k/m) \times q$ for the largest $k$ satisfying
this condition.
Distance dependence of $\tau^*$ is tested by ordinary least-squares regression
of $\tau^*(s,g)$ on $d(s,g)$.
\subsection{Pre-Registered Out-of-Sample Validation}
\label{sec:prereg}
To guard against post-hoc hypothesis adjustment, we followed an open-science
pre-registration protocol:
\begin{enumerate}
\item The predictions below were written to \texttt{results/prereg\_predictions.md}.
\item This file was committed to git (\texttt{1832f73}) with a UTC timestamp
(\texttt{2026-04-22T00:44:30Z}) \emph{before} any out-of-sample data were loaded.
\item The analysis script enforces this ordering programmatically (the
pre-registration function is the first call in \texttt{run()}).
\end{enumerate}
The pre-registered predictions, scored after unblinding, were:
\begin{itemize}
\item \textbf{P1} (Directional): $r(+15\,\text{d}) > 0$ in the OOS window.
\item \textbf{P2} (Significance): $p_\text{global} < 0.05$ and a
non-negative rolling trend.
\item \textbf{P3} (Stability): rolling $r$ standard deviation $\leq 0.10$.
\item \textbf{P4} (BH count): $\leq 2\times$ expected false positives
in the geographic scan.
\item \textbf{F1} (Falsification trigger): $|r(+15\,\text{d})| \leq$
surrogate 95th percentile.
\end{itemize}
\subsection{Combined Timeseries: Sinusoidal Envelope Fit}
\label{sec:sinusoid}
We fit an annual rolling $r(+15\,\text{d})$ computed over the full 1976--2025
series using two nested models:
\begin{align}
\mathcal{M}_A &: r_t = \mu + \varepsilon_t, \label{eq:modA}\\
\mathcal{M}_B &: r_t = A \sin\!\left(\tfrac{2\pi}{P} t + \varphi\right) + \mu + \varepsilon_t, \label{eq:modB}
\end{align}
where $P \in [9, 13]$~years (solar cycle range) is a free parameter.
Model selection uses the Bayesian information criterion (BIC):
\begin{equation}
\text{BIC} = n \ln\!\left(\frac{\text{RSS}}{n}\right) + k \ln(n),
\label{eq:bic}
\end{equation}
with $k_A = 1$, $k_B = 4$, and the Bayes factor approximated as
\begin{equation}
\text{BF}_{BA} \approx \exp\!\left(\frac{\Delta\text{BIC}}{2}\right),
\quad \Delta\text{BIC} = \text{BIC}_A - \text{BIC}_B.
\label{eq:bf}
\end{equation}
Parameters are estimated by nonlinear least squares with a grid search over
$(P, \varphi)$ to avoid local minima.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Results}
\label{sec:results}
\subsection{Raw Pairwise Correlations Between CR, Seismic, and Sunspot Data}
\label{sec:res:rawcorr}
Before applying any detrending, we characterise the raw statistical relationships
between all three observable time series across the three analysis windows.
This serves as the pre-detrending baseline that motivates the Hodrick--Prescott
filter analysis described in Section~\ref{sec:res:detrend}.
\subsubsection{CR index: station distribution}
The global CR index is not collapsed to a single mean.
Instead, for each 5-day bin $t$ we compute the distribution of normalised
count-rate values across all $n_t \geq 3$ contributing stations,
yielding five order statistics:
$\hat{x}_{t}^{(q)} = q\text{-th percentile}$ for $q \in \{5, 25, 50, 75, 95\}$,
together with the inter-station minimum and maximum.
All stations are normalised by their individual long-run mean before the
cross-station percentile is taken, so the station-median series has a grand
mean near unity.
The scatter panels in Figures~\ref{fig:rawinsample}--\ref{fig:rawcombined}
use $\hat{x}_{t}^{(50)}$ as the central CR value (horizontal axis) and display
the $[\hat{x}_{t}^{(5)}, \hat{x}_{t}^{(95)}]$ inter-station spread as
horizontal error bars, with the $[\hat{x}_{t}^{\min}, \hat{x}_{t}^{\max}]$
range overlaid in a lighter shade.
\subsubsection{Seismic energy metric}
The seismic activity per bin is measured by the total released seismic energy,
computed as the sum of the individual earthquake energies:
\begin{equation}
E_t = \sum_{i \in \mathcal{B}_t} 10^{1.5\, M_{W,i}},
\label{eq:energy}
\end{equation}
where $\mathcal{B}_t$ is the set of $M_W \geq 4.5$ events falling in bin $t$.
Working with $\log_{10} E_t$ removes the extreme skewness of $E_t$ and is
physically preferable to summing $M_W$ directly, which is dimensionally
inconsistent because magnitude is already a logarithmic quantity.
The $\log_{10} E_t$ axis in the scatter panels spans roughly three orders of
magnitude, reflecting the heavy-tailed Gutenberg--Richter distribution of
earthquake sizes.
\subsubsection{Sunspot number}
Daily sunspot numbers from SILSO are smoothed with a 365-day rolling mean to
suppress intra-year variability and isolate the solar-cycle envelope.
Each 5-day bin carries both the smoothed value (scatter-plot axis) and the
daily min--max spread within that bin (shown as additional horizontal error
bars on the sunspot axis in Figure~\ref{fig:rawinsample}--\ref{fig:rawcombined},
panel 3).
\subsubsection{Correlation results}
We compute both Pearson $r$ (linear; Fisher $z$-transform 95\% CI) and
Spearman $\rho$ (rank-based; appropriate for the heavy-tailed marginal
distributions of $E_t$ and CR flux).
Bonferroni correction for $3\ \text{pairs} \times 3\ \text{windows} = 9$
tests is applied; star levels refer to the corrected $p$-values.
Full results are given in Table~\ref{tab:rawcorr}.
\paragraph{CR vs.\ seismicity.}
The raw Pearson correlation between the station-median CR index and
$\log_{10} E_t$ is $r = 0.057$ in the in-sample window ($N = 3214$ bins,
$p_\text{Bonf} = 0.011$, $\rho = 0.071$, $p_\text{Bonf} < 10^{-3}$) and
$r = 0.046$ in the OOS window ($N = 390$, $p_\text{Bonf} = 1.0$).
The correlation is thus only marginally significant after correction and
disappears entirely in the independent OOS window.
Using the station-95th-percentile CR series in place of the median
amplifies the correlation slightly ($r = 0.069$ in-sample), suggesting that
high-flux excursions drive the signal.
\paragraph{CR vs.\ sunspot number.}
The dominant raw relationship in the dataset is a strong anti-correlation
between the CR index and the smoothed sunspot number: $r = -0.820$
($\rho = -0.854$) in-sample, and $r = -0.939$ ($\rho = -0.951$) in the OOS
window.
This is the well-established \emph{Forbush decrease} mechanism: a higher
heliospheric magnetic flux during solar maximum deflects more galactic cosmic
rays before they reach the Earth's atmosphere
\citep{Potgieter2013}, so CR flux and solar activity are naturally anti-phase.
The stronger OOS value ($-0.94$ vs.\ $-0.82$) reflects the wide dynamic range
of Solar Cycle~25 (2020--2025), which reached high activity levels after the
deep minimum of 2019--2020.
\paragraph{Sunspot vs.\ seismicity.}
The raw correlation between the smoothed sunspot number and seismic energy is
negative but small: $r = -0.095$ ($\rho = -0.099$) in-sample
($p_\text{Bonf} < 10^{-6}$) and indistinguishable from zero in the OOS window
($r = -0.023$, $p_\text{Bonf} = 1.0$).
The negative sign is consistent with reports of a weak inverse relationship
between solar activity and global seismicity
\citep{Odintsov2006}, though the OOS failure indicates this effect is not
robust.
\subsubsection{Interpretation: a confounding triangle}
The three raw correlations form a confounding triangle.
The CR--sunspot anti-correlation is strong ($r \approx -0.82$ to $-0.94$)
and physically understood.
The sunspot--seismicity anti-correlation is weak but significant in-sample
($r \approx -0.095$).
Together these imply a spurious \emph{positive} CR--seismicity correlation:
both CR and seismicity are jointly modulated by the $\sim$11-year solar cycle
--- CR increases during solar minima, and seismicity may very weakly increase
during solar minima --- so the two series covary positively without any direct
physical connection.
This confound cannot be resolved by naive cross-correlation; it requires
explicitly removing the shared solar-cycle component.
The HP filter analysis (Section~\ref{sec:res:detrend}) accomplishes exactly
this, and shows that the apparent CR--seismicity signal vanishes once the
solar-cycle trend is eliminated.
These raw correlations are intentionally uncontrolled for solar-cycle
modulation.
They are presented here as the pre-detrending baseline to document the scale
of the confound before any correction is applied, and to highlight that even
the uncorrected CR--seismicity correlation ($r = 0.057$) is far weaker than
the CR--sunspot anti-correlation ($r = -0.82$) that drives it.
\begin{table}[htbp]
\centering
\caption{Raw pairwise correlation statistics across three time windows.
Bonferroni correction applied for $3 \times 3 = 9$ tests.
CR uses the per-bin station-median index ($\hat{x}^{(50)}$).
Seismic energy is $\log_{10}\!\left(\sum 10^{1.5 M_W}\right)$.
Sunspot is the 365-day smoothed daily count.
$^{*}p_\text{Bonf}<0.05$, $^{**}p_\text{Bonf}<0.01$,
$^{***}p_\text{Bonf}<0.001$.}
\label{tab:rawcorr}
\setlength{\tabcolsep}{4pt}
\begin{tabular}{llrrrrrrr}
\toprule
Pair & Window & $N$ &
$r$ & 95\% CI &
$p$ (raw) & $p$ (Bonf.) &
$\rho$ & $p_\rho$ (Bonf.) \\
\midrule
CR vs Seismicity & In-sample & 3214 & $+0.057^{*}$ & $[+0.023, +0.092]$ & $0.0012$ & $0.011$ & $+0.071^{***}$ & $<10^{-3}$ \\
& OOS & 390 & $+0.046$ & $[-0.053, +0.145]$ & $0.362$ & $1.00$ & $+0.018$ & $1.00$ \\
& Combined & 3604 & $+0.055^{**}$ & $[+0.023, +0.088]$ & $0.0009$ & $0.008$ & $+0.065^{***}$ & $<10^{-3}$ \\
\addlinespace
CR vs Sunspot & In-sample & 3109 & $-0.820^{***}$ & $[-0.831, -0.808]$ & $\approx 0$ & $\approx 0$ & $-0.854^{***}$ & $\approx 0$ \\
& OOS & 385 & $-0.939^{***}$ & $[-0.950, -0.926]$ & $\approx 0$ & $\approx 0$ & $-0.951^{***}$ & $\approx 0$ \\
& Combined & 3494 & $-0.815^{***}$ & $[-0.826, -0.804]$ & $\approx 0$ & $\approx 0$ & $-0.844^{***}$ & $\approx 0$ \\
\addlinespace
Sunspot vs Seismicity & In-sample & 3109 & $-0.095^{***}$ & $[-0.130, -0.060]$ & $1.1 \times 10^{-7}$ & $10^{-6}$ & $-0.099^{***}$ & $<10^{-6}$ \\
& OOS & 385 & $-0.023$ & $[-0.123, +0.077]$ & $0.648$ & $1.00$ & $-0.016$ & $1.00$ \\
& Combined & 3494 & $-0.086^{***}$ & $[-0.119, -0.053]$ & $3.4 \times 10^{-7}$ & $3 \times 10^{-6}$ & $-0.086^{***}$ & $<10^{-6}$ \\
\addlinespace
\bottomrule
\end{tabular}
\medskip
\raggedright
\textit{Note:} CR$_{p95}$ variant (station 95th-percentile instead of median)
gives qualitatively identical structure; full results in
\texttt{results/raw\_pairwise\_correlations.json}.
\end{table}
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{raw_corr_insample.png}
\caption{Raw pairwise scatter plots for the in-sample window (1976--2019,
$N = 3{,}215$ five-day bins).
\textbf{Left}: CR station-median index vs $\log_{10}$ seismic energy; horizontal
error bars span the station $[\hat{x}^{(5)}, \hat{x}^{(95)}]$ spread.
\textbf{Centre}: CR index vs 365-day smoothed sunspot number; the strong
anti-correlation ($r = -0.82$) reflects the Forbush decrease mechanism.
\textbf{Right}: Smoothed sunspot number vs $\log_{10}$ seismic energy; thin
horizontal error bars show the daily sunspot spread within each 5-day bin.
All points are coloured by decimal year (plasma colormap), revealing the
solar-cycle drift: successive cycles trace the same anti-correlated arc in
the centre panel.
Black curves are LOWESS trend lines ($f = 0.4$).}
\label{fig:rawinsample}
\end{figure}
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{raw_corr_oos.png}
\caption{Raw pairwise scatter plots for the out-of-sample window
(2020--2025, $N = 390$ bins, 27 NMDB stations).
Layout identical to Figure~\ref{fig:rawinsample}.
The CR--sunspot anti-correlation strengthens to $r = -0.939$ during Solar
Cycle~25, which had a particularly wide dynamic range.
The CR--seismicity correlation ($r = 0.046$, $p_\text{Bonf} = 1.0$) is
indistinguishable from zero.}
\label{fig:rawoos}
\end{figure}
\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{raw_corr_combined.png}
\caption{Raw pairwise scatter plots for the full combined window
(1976--2025, $N = 3{,}604$ bins).
The multi-decadal colour gradient in the centre panel shows CR and sunspot
oscillating in anti-phase across four complete solar cycles.
The overall CR--seismicity Pearson correlation ($r = 0.055$, $\rho = 0.065$)
is statistically significant ($p_\text{Bonf} = 0.008$) but quantitatively
negligible, and its sign follows directly from the confounding triangle
described in the text.}
\label{fig:rawcombined}
\end{figure}
\subsection{In-Sample Replication (1976--2019)}
\label{sec:res:insample}
Figure~\ref{fig:homola} shows the full cross-correlation function of the raw
(undetrended) CR index and seismic metric (log$_{10}$ summed energy, Eq.~\ref{eq:energy})
over 1976--2019 ($T = 3{,}215$ five-day bins, 44 stations).
The dominant peak is at $\tau = -525$~days ($r = 0.139$), corresponding to a
half-solar-cycle lead of seismicity over CR flux.
At the claimed lag $\tau = +15$~days we find $r = 0.081$.
Although naive significance is high (4.6$\sigma$ treating bins as i.i.d.), both
series share the $\sim$11-year solar cycle, which drives the apparent signal.
The Bartlett effective-$N$ estimate gives $N_\text{eff} = 2{,}916$ and
$\sigma_{r(+15)} = 4.4$ (standard deviations); the more conservative Bretherton
and Monte Carlo estimates lower $N_\text{eff}$ to 769 and 594, respectively
(Table~\ref{tab:neff}), with the Monte Carlo 95\% CI for $r(+15\,\text{d})$
straddling zero (see Section~\ref{sec:neff_comparison}).
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{homola_replication.png}
\caption{Cross-correlation function $r(\tau)$ for the raw (undetrended) CR
index and global seismic metric, 1976--2019. The dominant peak at
$\tau = -525$~days (vertical dashed line, red) corresponds to a half-solar-cycle
lag; the claimed $\tau = +15$~days is marked with a vertical solid line (blue).
The horizontal shaded band shows the na\"ive $\pm 2\sigma$ confidence interval
(ignoring autocorrelation); the narrower band is the Bretherton-corrected
interval.}
\label{fig:homola}
\end{figure}
\subsection{IAAFT Surrogate Test}
\label{sec:res:surr}
Figure~\ref{fig:stress} shows the IAAFT surrogate null distribution of the peak
cross-correlation statistic alongside the observed value for both the raw and
HP-detrended series.
For the \textit{raw} series: $\rho_\text{obs} = 0.139$ exceeds all 10{,}000
surrogates ($p_\text{global} < 10^{-4}$, $> 3.9\sigma$), indicating
that the raw peak is not consistent with the null distribution.
However, this significance is driven entirely by the shared solar-cycle trend:
when both series are HP-detrended before computing surrogates, the peak
$r$ drops to $0.101$ (at $\tau = -125$~days) and achieves
$p_\text{global} < 10^{-4}$ ($> 3.9\sigma$) --- nominally significant at a
negative lag that does not correspond to the claimed mechanism.
Crucially, $r(+15\,\text{d})$ after HP detrending is $0.027$, well within the
surrogate null distribution.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{stress_test.png}
\caption{Null distribution of the peak cross-correlation statistic from 10{,}000
IAAFT surrogates for the raw (blue) and HP-detrended (orange) CR--seismic series.
Vertical dashed lines mark the observed peak for each case.
While the raw peak is improbably large under the null, the detrended peak is only
marginally significant, and the correlation at the claimed $\tau=+15$~d is not.}
\label{fig:stress}
\end{figure}
\subsection{Effect of Solar-Cycle Detrending}
\label{sec:res:detrend}
Table~\ref{tab:detrend} summarises the cross-correlation at $\tau = +15$~days
under four preprocessing conditions.
The raw $r = 0.081$ falls to $0.027$ after HP filtering, to $0.030$ after STL
decomposition, and to $0.037$ after sunspot regression.
In all detrended cases the claimed $\tau = +15$~day signal is negligible.
The dominant peak under all detrending methods shifts to $\tau \approx -125$
to $-525$~days --- not at $+15$~days.
Figure~\ref{fig:detrend_robust} extends this comparison to three additional
detrending approaches (HP, Butterworth highpass, 12-month rolling-mean
subtraction) and confirms that no detrending method reveals a positive
correlation at the claimed lag (Section~\ref{sec:detrend_robust}).
\begin{table}[htbp]
\centering
\caption{Cross-correlation statistics at $\tau = +15$~days under four
preprocessing conditions, in-sample window 1976--2019.}
\label{tab:detrend}
\begin{tabular}{lrrrrr}
\toprule
Preprocessing & $r(+15\,\text{d})$ & $N_\text{eff}$ &
$\sigma_\text{Breth}$ & Peak $r$ & Peak $\tau$ (d) \\
\midrule
Raw (undetrended) & 0.081 & 2{,}916 & 4.4 & 0.139 & $-525$ \\
HP filter & 0.027 & 3{,}199 & 1.5 & 0.101 & $-125$ \\
STL decomposition & 0.030 & 3{,}031 & 1.6 & 0.093 & $-525$ \\
Sunspot regression& 0.037 & 3{,}056 & 2.0 & 0.092 & $-125$ \\
\bottomrule
\end{tabular}
\end{table}
Figure~\ref{fig:detrended} shows the cross-correlation functions before and
after HP detrending.
Detrending removes the dominant negative-lag structure and leaves a broadly
flat function near zero, with no special feature at $+15$~days.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{detrended_xcorr.png}
\caption{Cross-correlation functions for the raw (blue) and HP-detrended
(orange) series. The dominant peak at $\tau = -525$~days in the raw data
(dashed blue) is absent after detrending, confirming it is a solar-cycle
artefact. Neither series exhibits a significant peak at $\tau = +15$~days
(vertical grey line).}
\label{fig:detrended}
\end{figure}
\subsection{Detrending Robustness}
\label{sec:detrend_robust}
To confirm that the null result at $\tau = +15$~days does not depend on the
choice of detrending method, Figure~\ref{fig:detrend_robust} compares three
distinct approaches applied to the in-sample window: (i) the HP filter
($\lambda = 4.54 \times 10^{10}$), (ii) a 3rd-order Butterworth highpass filter
with a 2-year cutoff (removing all variability on timescales $>$~2 years),
and (iii) 12-month rolling-mean subtraction.
All three leave $r(+15\,\text{d}) < 0.04$, with no approach producing a feature
at the claimed lag.
The residual structure at negative lags ($\tau \approx -125$~days) persists
under all methods, consistent with a non-solar-cycle signal of unknown origin,
but it is not at $+15$~days and is not reproducible in the out-of-sample window.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{detrend_robustness.png}
\caption{Cross-correlation $r(\tau)$ under three detrending approaches
(HP filter, Butterworth highpass, rolling-mean subtraction) for the raw
CR index vs.\ log$_{10}$ seismic energy, 1976--2019.
The vertical grey line marks the claimed $\tau = +15$~days.
No method produces a positive feature at this lag.}
\label{fig:detrend_robust}
\end{figure}
\subsection{Comparison of $N_\text{eff}$ Estimators}
\label{sec:neff_comparison}
Table~\ref{tab:neff} compares the three $N_\text{eff}$ estimators described
in Section~\ref{sec:neff} for the raw in-sample series at $\tau = +15$~days.
The Bartlett first-order estimator is most liberal ($N_\text{eff} = 2{,}923$),
the Bretherton full-sum estimator substantially lower ($N_\text{eff} = 769$),
and the Monte Carlo phase-surrogate estimator most conservative
($N_\text{eff} = 594$).
The 95\% confidence interval for $r(+15\,\text{d})$ under the Monte Carlo
estimate straddles zero ($[-0.002, +0.158]$), meaning the raw correlation is
not statistically significant even before any detrending.
\begin{table}[htbp]
\centering
\caption{Comparison of $N_\text{eff}$ estimators for raw in-sample series at
$\tau = +15$~days ($N = 3{,}214$, $r = 0.079$).}
\label{tab:neff}
\begin{tabular}{lrrr}
\toprule
Method & $N_\text{eff}$ & 95\% CI for $r$ & Includes zero? \\
\midrule
Bartlett 1946 (first-order) & 2{,}923 & [0.042, 0.115] & No \\
Bretherton 1999 (full sum) & 769 & [0.008, 0.148] & No \\
Monte Carlo (phase surr.) & 594 & [$-$0.002, 0.158] & Yes \\
\bottomrule
\end{tabular}
\end{table}
\subsection{Magnitude Threshold Sensitivity}
\label{sec:mag_threshold}
To assess sensitivity to the event selection threshold and potential bias from
aftershock contamination, we recomputed the seismic energy metric for three
minimum-magnitude cutoffs: $M \geq 4.5$ (baseline), $M \geq 5.0$, and
$M \geq 6.0$.
Figure~\ref{fig:mag_threshold} shows the cross-correlation function $r(\tau)$
for each threshold.
The values at the claimed lag are $r(+15\,\text{d}) = 0.079$, $0.072$, and
$0.050$ for $M \geq 4.5$, $5.0$, and $6.0$ respectively --- a range of only
$0.029$, well below the $0.05$ threshold for a meaningful shift.
The peak lag remains at $\tau = -525$~days for all thresholds, indicating
that aftershock swarms (which inflate the $M \geq 4.5$ catalogue in the days
following large events) do not drive the dominant correlation structure.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{magnitude_threshold.png}
\caption{Cross-correlation $r(\tau)$ for three magnitude cutoffs
($M \geq 4.5$, blue; $M \geq 5.0$, orange; $M \geq 6.0$, green),
in-sample window 1976--2019. The vertical grey line marks $\tau = +15$~days.
All three cutoffs yield similar, small correlations at the claimed lag,
and the dominant peak location ($\tau = -525$~days) is stable.}
\label{fig:mag_threshold}
\end{figure}
\subsection{Geographic Localisation}
\label{sec:res:geo}
Figure~\ref{fig:geoheatmap} shows the BH-adjusted significance map for all
station--cell pairs.
Of 7{,}037 pairs tested, 455 survive FDR correction at $q = 0.05$.
The expected number of false discoveries under the global null is
$351.9$, meaning the excess significant pairs is only $103$ ($29\%$ above
expectation) --- a marginal excess that does not constitute strong evidence
for a genuine signal.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{geo_heatmap.png}
\caption{Heatmap of BH-significant station--grid-cell pairs ($q = 0.05$).
Each row is an NMDB station; each column is a $10° \times 10°$ seismic grid
cell. Significant pairs (455/7{,}037) are scattered without obvious geographic
clustering, inconsistent with a local coupling mechanism.}
\label{fig:geoheatmap}
\end{figure}
Figure~\ref{fig:geodistlag} shows the regression of the optimal lag $\tau^*(s,g)$
on great-circle distance $d(s,g)$.
The slope is $\beta = -0.45$~days/1000\,km ($p = 0.21$, $R^2 = 0.0002$),
indistinguishable from zero.
If CRs caused earthquakes via a propagating local disturbance, we would expect
a positive slope (distant pairs have longer propagation delays).
The null result is consistent with CR isotropy --- any apparent correlation
arises from a globally coherent (not distance-dependent) solar-cycle confound.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{geo_distance_lag.png}
\caption{Optimal lag $\tau^*(s,g)$ vs.\ great-circle distance $d(s,g)$ for all
7{,}037 station--cell pairs (grey) and BH-significant pairs (coloured by peak $|r|$).
The OLS regression line (red) has slope $\beta = -0.45$~days/1000\,km ($p=0.21$),
consistent with zero. A local propagation mechanism would predict a positive slope.}
\label{fig:geodistlag}
\end{figure}
\subsection{Pre-Registered Out-of-Sample Validation (2020--2025)}
\label{sec:res:oos}
The out-of-sample analysis used data from 2020-01-01 to 2025-04-29 ($T = 390$
five-day bins, 35 NMDB stations), a window completely disjoint from the
in-sample period.
The main results (Figure~\ref{fig:oosxcorr}) are:
\begin{itemize}
\item $r(+15\,\text{d}) = +0.030$ (directionally correct, but very small);
\item Surrogate 95th percentile at $\tau = +15$~d: $0.101$ (observed is
well below this threshold);
\item $p_\text{global} = 0.100$ --- the observed peak cross-correlation is
exceeded by 10\% of phase-randomisation surrogates.
\end{itemize}
The prediction scorecard (Table~\ref{tab:prereg}) shows one pass (P1: correct
sign), one failure (P2: $p > 0.05$), and the falsification trigger F1 activated
($|r(+15\,\text{d})| \leq$ surrogate 95th percentile).
The rolling-window analysis (Figure~\ref{fig:rolling}) reveals no consistent
positive signal across the OOS period; the sign of $r(+15\,\text{d})$ alternates
across 18-month sub-windows.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{oos_xcorr.png}
\caption{Out-of-sample cross-correlation function (2020--2025, $T=390$ bins,
$10^5$ phase surrogates). The observed $r(\tau)$ (black) lies within
the surrogate 95th-percentile envelope (grey shading). The claimed signal at
$\tau = +15$~d (vertical line) is $r = 0.030$ --- below the surrogate 95th
percentile of 0.101.}
\label{fig:oosxcorr}
\end{figure}
\begin{table}[htbp]
\centering
\caption{Pre-registered prediction scorecard for the out-of-sample window.}
\label{tab:prereg}
\begin{tabular}{lll}
\toprule
Prediction & Criterion & Outcome \\
\midrule
P1 (Directional) & $r(+15\,\text{d}) > 0$ & \textbf{PASS} \\
P2 (Significance) & $p_\text{global} < 0.05$ & \textbf{FAIL} \\
P3 (Stability) & std(rolling $r$) $\leq 0.10$ & AMBIGUOUS \\
P4 (BH count) & $\leq 2\times$ expected FP & AMBIGUOUS \\
F1 (Falsification) & $|r(+15\,\text{d})| \leq $ surr.\ 95th & \textbf{TRIGGERED} \\
\bottomrule
\end{tabular}
\end{table}
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{rolling_correlation_oos.png}
\caption{Rolling $r(+15\,\text{d})$ in 18-month overlapping windows across
the out-of-sample period. Error bars are bootstrap 95\% confidence intervals.
The grey horizontal band shows the surrogate 95th percentile.
The signal shows no consistent sign or trend.}
\label{fig:rolling}
\end{figure}
\subsection{Combined 1976--2025 Analysis: Sinusoidal Modulation}
\label{sec:res:combined}
Figure~\ref{fig:combined} shows the annual rolling $r(+15\,\text{d})$ over the
full 1976--2025 record, together with the best-fit sinusoidal envelope.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.90\textwidth]{full_series_with_envelope_fit.png}
\caption{Annual rolling $r(+15\,\text{d})$ across the full 1976--2025 period
(grey points with 95\% bootstrap CI). The sinusoidal best-fit (red curve,
$P = 13.0$~yr) and the constant-mean model (dashed) are nearly indistinguishable;
BIC comparison slightly favours the constant model ($\text{BF} = 0.75$).
The vertical dashed line marks the in-sample/out-of-sample split (2020).}
\label{fig:combined}
\end{figure}
The global surrogate test on the full 1976--2025 window yields $p = 0.010$
($\sigma = 2.57$) at the dominant peak $\tau = -125$~days --- marginally
significant, but at a lag inconsistent with the claimed $+15$~day CR precursor.
The sinusoidal fit (Equations~\ref{eq:modA}--\ref{eq:bf}) does \emph{not} prefer
$\mathcal{M}_B$ over $\mathcal{M}_A$ with the corrected seismic metric:
\begin{itemize}
\item Best-fit period: $P = 13.0$~years;
\item Amplitude: $A = 0.047$;
\item $\Delta\text{BIC} = -0.57$ (negative = $\mathcal{M}_A$ constant preferred);
\item Bayes factor: $\text{BF}_{BA} = 0.75$.
\end{itemize}
A Bayes factor of 0.75 is less than 1.0, indicating that the constant (no
modulation) model is marginally preferred over the sinusoidal model.
With the correct seismic energy metric, the oscillatory rolling cross-correlation
previously attributed to solar-cycle modulation is no longer strongly supported;
the apparent modulation in the old results was partly an artefact of the invalid
summed-$M_W$ metric, which amplified the solar-cycle confound.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Discussion}
\label{sec:discussion}
\subsection{Why Does the Raw Correlation Appear So Strong?}
The raw $r \approx 0.31$ reported by \citet{Homola2023} at $\tau = +15$~days
(reduced to $r = 0.081$ with the correct seismic energy metric) and the naive
$p \sim 10^{-72}$ are products of compounding statistical errors:
(i) using a physically invalid seismic metric (direct sum of logarithmic $M_W$
values), which artificially inflated the solar-cycle variation in the seismic series,
(ii) treating autocorrelated time series as independent observations,
(iii) failing to account for the shared solar-cycle trend driving both CR flux and
seismicity, and (iv) not correcting for scanning over 401 lag values.
The solar cycle is the key confounder.
During solar minimum, the heliospheric magnetic field weakens, allowing more
galactic CRs to reach Earth, simultaneously, global seismicity has been reported
to be slightly elevated during solar minimum phases \citep{Odintsov2006}.
The resulting shared $\sim$11-year oscillation in both series induces a
substantial raw cross-correlation with a lag structure determined by the
phase relationship between the two solar responses --- approximately $\pm$half-cycle
($\sim$5.5 years $\approx 2{,}000$ days), consistent with the dominant raw peak
at $\tau = -525$~days.
\subsection{Physical Plausibility of the Claimed Mechanism}
Even setting aside the statistical issues, the proposed mechanism faces severe
physical constraints.
The total ionisation dose from galactic CRs at the surface is
$\sim 0.3$~mGy/year \citep{Aplin2005} --- far too small to
transfer meaningful mechanical energy to fault zones, which require shear-stress
changes of order $\sim$0.01--1~MPa to trigger earthquakes.
Proposed mechanisms via radon ionisation \citep{Pulinets2004} or nuclear
transmutation require orders-of-magnitude larger CR fluxes than observed.
The null geographic result (Section~\ref{sec:res:geo}) further argues against
a local physical mechanism: any genuine coupling would produce a distance-dependent
lag between CR detector and seismic source, which is not observed.
\subsection{Comparison with Prior Replication Attempts}
Independent replication attempts of the \citet{Homola2023} result have been
limited.
\citet{Urata2018} found similarly inflated correlations using Japanese CR stations
and reported that detrending removed most of the signal.
Our analysis is the first to combine all three of: IAAFT surrogate testing,
solar-cycle-aware detrending, geographic localisation scanning, and pre-registered
out-of-sample validation.
\subsection{Limitations}
Several limitations should be acknowledged:
\begin{enumerate}
\item The OOS window (2020--2025) encompasses Solar Cycle~25, a period of
rising solar activity after the deep minimum of Cycle~24.
The absence of a solar minimum in this window limits statistical power.
\item Seismicity is not stationary; major seismic sequences (e.g.\ Tonga 2022)
can inflate the seismic metric in individual bins.
\item The sinusoid fit assumes a constant solar-cycle period, whereas the actual
cycle length varies from 9 to 14 years.
\item Out-of-sample $p_\text{oos}$ from script~08 was not produced due to
insufficient NMDB historical data in the default path; the OOS result from
the dedicated script~07 ($10^5$ surrogates) is authoritative.
\end{enumerate}
%%%% ═══════════════════════════════════════════════════════════════════════════
\section{Conclusions}
\label{sec:conclusions}
We have conducted a rigorous, pre-registered replication of the claimed
cosmic-ray/earthquake correlation from \citet{Homola2023} using 49 years of
data from 44 neutron monitors, the USGS global catalogue, and SILSO sunspot
numbers.
Our principal findings are:
\begin{enumerate}
\item The raw cross-correlation $r(+15\,\text{d}) = 0.081$ (corrected seismic
energy metric) is modest and misleading; it is driven by a shared
$\sim$10-year solar-cycle modulation of both CR flux and global seismicity,
not by a physical CR$\to$seismic mechanism.
The larger value ($r \approx 0.31$) reported by \citet{Homola2023} results
from the physically invalid practice of directly summing logarithmic $M_W$
values rather than the underlying energies.
\item After solar-cycle detrending, $r(+15\,\text{d})$ falls to $0.027$ (HP),
$0.030$ (STL), or $0.037$ (sunspot regression) --- all within the surrogate
null distribution and consistent with zero.
\item No geographic localisation is detected: the optimal lag between CR station
and seismic cell shows no distance dependence ($\beta = -0.45$~d/1000~km,
$p = 0.21$), inconsistent with a local propagation mechanism.
\item A pre-registered out-of-sample test on 2020--2025 yields
$r(+15\,\text{d}) = 0.030$ and $p_\text{global} = 0.100$, entirely
consistent with noise.
\item The 49-year annual rolling correlation timeseries is not significantly
better described by a sinusoid than a constant (Bayes factor 0.75, favoring
constant); the previous sinusoidal modulation finding was an artefact of the
invalid seismic metric.
\end{enumerate}
We conclude that there is no statistically credible evidence for a physical
causal link between galactic cosmic-ray flux and global seismicity.
%%%% ═══════════════════════════════════════════════════════════════════════════
\section*{Data Availability}
All analysis code, pre-registration documents, intermediate results, and figures
are publicly available at
\url{https://github.com/pingud98/cosmicraysandearthquakes} under the MIT licence.
Raw data are freely accessible from their respective providers:
NMDB (\url{https://www.nmdb.eu}),
USGS (\url{https://earthquake.usgs.gov/fdsnws/event/1/}),
and SIDC (\url{https://www.sidc.be/silso/datafiles}).
\section*{Acknowledgements}
The author thanks the operators of the NMDB network for maintaining open-access
neutron monitor data, and the USGS Earthquake Hazards Programme for the FDSN
catalogue service.
GPU computations were performed on an NVIDIA Tesla M40.
%%%% ═══════════════════════════════════════════════════════════════════════════
\bibliographystyle{plainnat}
\bibliography{refs}
\end{document}