citests.tests.ml_based_tests
Machine-learning-based CI tests (survey family): GCM, WGCM, PCM.
All three tests dispatch through the optional pycomets package
using random-forest regression by default. They require the [ml]
extra; pycomets is lazy-loaded so module import does not fail
when the extra is missing.
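The lazy-loading behaviour described above can be sketched as follows. This is a minimal illustration of the general pattern, not the module's actual code: the helper lazy_import and the class LazyBackendTest are hypothetical names, and only the standard-library call importlib.import_module is relied on.

```python
import importlib
from types import ModuleType


def lazy_import(name: str, extra: str) -> ModuleType:
    """Import `name` on first use; raise a descriptive error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"{name!r} is required for this test; install the [{extra}] extra."
        ) from exc


class LazyBackendTest:
    """Hypothetical test class: defining or constructing it never imports the backend."""

    backend_name = "pycomets"

    def run(self) -> ModuleType:
        # The optional dependency is resolved here, at call time,
        # so importing the enclosing module succeeds without it.
        return lazy_import(self.backend_name, extra="ml")
```

With this shape, a missing extra surfaces as an ImportError only when a test is actually run, which matches the behaviour the module description promises.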
Classes
Module Contents
- class citests.tests.ml_based_tests.GCM(data: numpy.ndarray, **kwargs: Any)
Bases: citests.tests.base.CITKTest
Generalised Covariance Measure (GCM) test via pycomets (Shah & Peters, 2020).
Uses RF regression by default (pycomets default). Residuals are computed in-sample, with no cross-fitting.
Initialise the test and (optionally) load a JSON p-value cache.
- Parameters:
data – Sample matrix of shape (n, p).
cache_path – Optional path to a JSON cache file used to memoise p-values across calls. The cache is keyed by (data_hash, method_name, parameters_hash) and stamped with format_version so that v0.1.0 caches can be detected and invalidated by future releases.
- Raises:
TypeError – If kwargs contains keys outside cls.accepted_kwargs and cls._protocol_kwargs.
- supported_dtypes
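The statistic behind the GCM test can be sketched with in-sample residuals as follows. This is a stdlib-only illustration, not the pycomets implementation: a one-dimensional least-squares fit stands in for the random-forest regressions, the conditioning set is a single variable, and the names ols_residuals and gcm_pvalue are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def ols_residuals(target, z):
    """Residuals of a simple least-squares fit of `target` on `z` (RF stand-in)."""
    z_bar, t_bar = fmean(z), fmean(target)
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (ti - t_bar) for zi, ti in zip(z, target))
    slope = sxy / sxx
    intercept = t_bar - slope * z_bar
    return [ti - (intercept + slope * zi) for zi, ti in zip(z, target)]


def gcm_pvalue(x, y, z):
    """Two-sided p-value from the normalised mean of residual products."""
    n = len(x)
    rx = ols_residuals(x, z)  # in-sample residuals of X given Z
    ry = ols_residuals(y, z)  # in-sample residuals of Y given Z
    prod = [a * b for a, b in zip(rx, ry)]
    mean_r = fmean(prod)
    var_r = fmean([p * p for p in prod]) - mean_r ** 2
    stat = math.sqrt(n) * mean_r / math.sqrt(var_r)
    # Under conditional independence the statistic is asymptotically N(0, 1).
    return 2.0 * (1.0 - NormalDist().cdf(abs(stat)))


# Synthetic data: Z drives both X and Y; y_dep additionally depends on X.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [zi + rng.gauss(0, 1) for zi in z]
y_indep = [zi + rng.gauss(0, 1) for zi in z]
y_dep = [xi + zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

p_indep = gcm_pvalue(x, y_indep, z)  # X independent of Y given Z
p_dep = gcm_pvalue(x, y_dep, z)      # X and Y dependent given Z
```

Because the regressions are fit and evaluated on the same sample, this mirrors the "in-sample residuals, no cross-fitting" behaviour described for the class; only the regression method differs.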
- class citests.tests.ml_based_tests.PCM(data: numpy.ndarray, **kwargs: Any)
Bases: citests.tests.base.CITKTest
Projected Covariance Measure test via pycomets (Lundborg et al., 2022).
Uses RF regression with sample splitting (pycomets default).
Initialise the test and (optionally) load a JSON p-value cache.
- Parameters:
data – Sample matrix of shape (n, p).
cache_path – Optional path to a JSON cache file used to memoise p-values across calls. The cache is keyed by (data_hash, method_name, parameters_hash) and stamped with format_version so that v0.1.0 caches can be detected and invalidated by future releases.
- Raises:
TypeError – If kwargs contains keys outside cls.accepted_kwargs and cls._protocol_kwargs.
- supported_dtypes
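The cache keying described under cache_path can be sketched as follows. This is an assumption-laden illustration rather than the package's on-disk format: the helpers digest, cache_key, load_cache and save_cache, the "pvalues" field, the SHA-256 hashing and the FORMAT_VERSION value are all hypothetical; only the key triple and the format_version stamp come from the description above.

```python
import hashlib
import json
from pathlib import Path

FORMAT_VERSION = "0.1.0"  # assumed stamp value, following the v0.1.0 wording


def digest(payload: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def cache_key(data_bytes: bytes, method_name: str, params: dict) -> str:
    """Join (data_hash, method_name, parameters_hash) into one string key."""
    # sort_keys makes the parameter hash independent of dict ordering.
    params_hash = digest(json.dumps(params, sort_keys=True).encode())
    return "|".join((digest(data_bytes), method_name, params_hash))


def load_cache(path: Path) -> dict:
    """Return cached p-values, or an empty dict if the file is absent or stale."""
    if not path.exists():
        return {}
    stored = json.loads(path.read_text())
    if stored.get("format_version") != FORMAT_VERSION:
        return {}  # a cache written under another format version is ignored
    return stored.get("pvalues", {})


def save_cache(path: Path, pvalues: dict) -> None:
    """Write p-values together with the format_version stamp."""
    payload = {"format_version": FORMAT_VERSION, "pvalues": pvalues}
    path.write_text(json.dumps(payload))
```

Stamping the file rather than each entry keeps invalidation to a single comparison at load time, which is one plausible reading of how a future release could detect v0.1.0 caches.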
- class citests.tests.ml_based_tests.WGCM(data: numpy.ndarray, **kwargs: Any)
Bases: citests.tests.base.CITKTest
Weighted GCM test via pycomets (Scheidegger et al., 2022).
Implements WGCM.est: sample-split (50/50) weight estimation. The nuisance regressions E[Y|Z] and E[X|Z] use RF; the weight regression E[(rX·rY)|Z] uses pycomets’s KRR default. The cache fingerprint "pycomets_WGCM_RF" reflects the two RF nuisance choices but not the KRR weight choice; the v0.1.0 contract freezes both.
An empty conditioning set falls back to GCM (the unweighted reduction when there is no Z to localise weights on), so the test is safe to use inside PC at depth 0.
Limitations (v0.1.0)
Results are not bit-reproducible across processes. pycomets’s WGCM.test uses rng=np.random.default_rng() as a mutable default argument, and its RF wrapper does not expose random_state. Type-I error control and power remain statistically valid; only the per-call p-value varies between fresh Python processes.
Initialise the test and (optionally) load a JSON p-value cache.
- Parameters:
data – Sample matrix of shape (n, p).
cache_path – Optional path to a JSON cache file used to memoise p-values across calls. The cache is keyed by (data_hash, method_name, parameters_hash) and stamped with format_version so that v0.1.0 caches can be detected and invalidated by future releases.
- Raises:
TypeError – If kwargs contains keys outside cls.accepted_kwargs and cls._protocol_kwargs.
- supported_dtypes
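The reproducibility limitation above follows from how Python evaluates default arguments. The sketch below illustrates the mechanism with the standard-library random.Random as a stand-in for NumPy's default_rng(); the functions draw_default and draw_seeded are hypothetical and are not part of pycomets or citests.

```python
import random


def draw_default(rng=random.Random()):
    # The default generator is built once, when `def` executes, and is
    # seeded from OS entropy, so every fresh process starts from a
    # different state and every call advances the same shared object.
    return rng.random()


def draw_seeded(seed):
    # An explicitly seeded generator makes the draw reproducible
    # across calls and across processes.
    return random.Random(seed).random()


shared = draw_default.__defaults__[0]  # the single generator bound at definition
state_before = shared.getstate()
draw_default()
state_after = shared.getstate()        # differs: the call mutated the default
```

When a callee offers no way to pass a seed or generator, as the limitation describes for the RF wrapper, the caller cannot apply the draw_seeded pattern, which is why only statistical validity, not bit-level equality, can be expected.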