citests.tests.ml_based_tests

Machine-learning-based CI tests (survey family): GCM, WGCM, PCM.

All three tests dispatch through the optional pycomets package using random-forest regression by default. They require the [ml] extra; pycomets is lazy-loaded so module import does not fail when the extra is missing.
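The lazy-loading behaviour described above can be sketched as follows. This is a minimal illustration rather than the module's actual code: lazy_import is a hypothetical helper name, and the error wording is an assumption.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use (hypothetical helper)."""
    try:
        # The import happens here, at call time, not when this module is imported.
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a hint naming the extra that provides the dependency.
        raise ImportError(
            f"{module_name!r} is required for this test; "
            f"install the [{extra}] extra to enable it."
        ) from exc
```

Because the import is deferred to the first call, a missing dependency surfaces only when a test is actually run, not when the module is imported.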

Classes

GCM

GCM test via pycomets (Shah & Peters, 2020).

PCM

Projected Covariance Measure test via pycomets (Lundborg et al., 2022).

WGCM

Weighted GCM test via pycomets (Scheidegger et al., 2022).

Module Contents

class citests.tests.ml_based_tests.GCM(data: numpy.ndarray, **kwargs: Any)[source]

Bases: citests.tests.base.CITKTest

GCM test via pycomets (Shah & Peters, 2020).

Uses random-forest (RF) regression by default (the pycomets default). Residuals are computed in-sample, without cross-fitting.
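For orientation, the GCM statistic of Shah & Peters (2020) for univariate X and Y can be sketched from two residual vectors. The sketch below is not the pycomets implementation: gcm_statistic is a hypothetical name, and the residuals are assumed to come from regressions of X on Z and Y on Z fitted elsewhere.

```python
import math
from typing import Sequence, Tuple


def gcm_statistic(res_x: Sequence[float], res_y: Sequence[float]) -> Tuple[float, float]:
    """Return (T, two-sided p-value) from residuals of X on Z and Y on Z."""
    n = len(res_x)
    if n != len(res_y) or n < 2:
        raise ValueError("need two residual vectors of equal length >= 2")
    # Per-sample products of the two residual vectors.
    prods = [rx * ry for rx, ry in zip(res_x, res_y)]
    mean = sum(prods) / n
    # Variance of the products, dividing by n.
    var = sum((p - mean) ** 2 for p in prods) / n
    if var == 0.0:
        raise ValueError("residual products have zero variance")
    t_stat = math.sqrt(n) * mean / math.sqrt(var)
    # Two-sided p-value under the standard normal limit: 2 * (1 - Phi(|T|)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value
```

A statistic near zero indicates that the residual products average out, which is what conditional independence predicts.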

Initialise the test and (optionally) load a JSON p-value cache.

Parameters:
  • data – Sample matrix in shape (n, p).

  • cache_path – Optional path to a JSON cache file used to memoise p-values across calls. The cache is keyed by (data_hash, method_name, parameters_hash) and stamped with format_version so v0.1.0 caches can be detected and invalidated by future releases.
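The keying scheme described for cache_path can be sketched as follows. The hash function, the JSON layout, and the names cache_key, is_current, and FORMAT_VERSION are assumptions made for illustration; the actual cache format is not specified here.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

FORMAT_VERSION = "0.1.0"  # assumed stamp value, used only in this sketch


def _digest(payload: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def cache_key(data_bytes: bytes, method_name: str, params: Dict[str, Any]) -> Tuple[str, str, str]:
    """Build a (data_hash, method_name, parameters_hash) triple."""
    # sort_keys makes the parameter hash independent of dict insertion order.
    params_blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return _digest(data_bytes), method_name, _digest(params_blob)


def is_current(cache: Dict[str, Any]) -> bool:
    """Treat a cache as stale unless its format stamp matches this release."""
    return cache.get("format_version") == FORMAT_VERSION
```

For a NumPy sample matrix, data_bytes could be obtained with data.tobytes(); hashing the shape and dtype alongside it would be a sensible addition in practice.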

Raises:

TypeError – If kwargs contains keys outside cls.accepted_kwargs and cls._protocol_kwargs.

supported_dtypes

class citests.tests.ml_based_tests.PCM(data: numpy.ndarray, **kwargs: Any)[source]

Bases: citests.tests.base.CITKTest

Projected Covariance Measure test via pycomets (Lundborg et al., 2022).

Uses RF regression with sample splitting (pycomets default).

Initialise the test and (optionally) load a JSON p-value cache.

Parameters:
  • data – Sample matrix in shape (n, p).

  • cache_path – Optional path to a JSON cache file used to memoise p-values across calls. The cache is keyed by (data_hash, method_name, parameters_hash) and stamped with format_version so v0.1.0 caches can be detected and invalidated by future releases.

Raises:

TypeError – If kwargs contains keys outside cls.accepted_kwargs and cls._protocol_kwargs.
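The keyword check behind this TypeError can be sketched with a stand-in class. SketchTest is hypothetical, the contents of both sets are illustrative, and placing cache_path in _protocol_kwargs is an assumption rather than something stated here.

```python
from typing import Any, FrozenSet


class SketchTest:
    """Hypothetical stand-in for a CITKTest subclass."""

    accepted_kwargs: FrozenSet[str] = frozenset({"n_estimators"})  # illustrative
    _protocol_kwargs: FrozenSet[str] = frozenset({"cache_path"})  # assumed

    def __init__(self, data: Any, **kwargs: Any) -> None:
        # A key is allowed if it appears in either class-level set.
        allowed = self.accepted_kwargs | self._protocol_kwargs
        unknown = sorted(set(kwargs) - allowed)
        if unknown:
            raise TypeError(f"unexpected keyword argument(s): {unknown}")
        self.data = data
        self.kwargs = kwargs
```

Rejecting unknown keys at construction time means a misspelled option fails loudly instead of being silently ignored.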

supported_dtypes

class citests.tests.ml_based_tests.WGCM(data: numpy.ndarray, **kwargs: Any)[source]

Bases: citests.tests.base.CITKTest

Weighted GCM test via pycomets (Scheidegger et al., 2022).

Implements WGCM.est, which estimates the weights on a 50/50 sample split. The nuisance regressions E[Y|Z] and E[X|Z] use RF; the weight regression E[(rX·rY)|Z] uses the pycomets default, kernel ridge regression (KRR). The cache fingerprint "pycomets_WGCM_RF" reflects the two RF nuisance choices but not the KRR weight choice; the v0.1.0 contract freezes both.
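The sample-split weighting can be sketched as below. This is not the pycomets code. It assumes, following the WGCM.est variant in Scheidegger et al. (2022), that the weight is the sign of the predicted residual product and that the test is one-sided; split_half and weighted_statistic are hypothetical names, and the split is deterministic here only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def split_half(n: int) -> Tuple[List[int], List[int]]:
    """Split indices 0..n-1 into a weight-fitting half and a testing half."""
    mid = n // 2
    return list(range(mid)), list(range(mid, n))


def weighted_statistic(
    res_x: Sequence[float],
    res_y: Sequence[float],
    weight_pred: Sequence[float],
) -> Tuple[float, float]:
    """Return (T, one-sided p-value) computed on the testing half.

    weight_pred holds predictions of E[rX * rY | Z] at the testing points,
    produced by a regression fitted on the other half.
    """
    # Assumed weight: the sign of the predicted residual product.
    weights = [1.0 if w >= 0 else -1.0 for w in weight_pred]
    prods = [w * rx * ry for w, rx, ry in zip(weights, res_x, res_y)]
    n = len(prods)
    mean = sum(prods) / n
    var = sum((p - mean) ** 2 for p in prods) / n
    if var == 0.0:
        raise ValueError("weighted residual products have zero variance")
    t_stat = math.sqrt(n) * mean / math.sqrt(var)
    # One-sided p-value under the standard normal limit: 1 - Phi(T).
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))
    return t_stat, p_value
```

Fitting the weights on one half and testing on the other keeps the weights independent of the residuals they multiply.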

Empty conditioning set falls back to GCM (the unweighted reduction when there is no Z to localise weights on), so the test is safe to use inside PC at depth 0.
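The fallback rule amounts to a one-line dispatch on the size of the conditioning set. The helper below is hypothetical and only names the test that would run; it is not part of the documented API.

```python
from typing import Sequence


def select_test(cond_set: Sequence[int]) -> str:
    """Name the test to run for a given conditioning set (hypothetical helper)."""
    # With no Z there is nothing to localise weights on, so use plain GCM.
    return "WGCM" if len(cond_set) > 0 else "GCM"
```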

Limitations (v0.1.0)

  • Results are not bit-reproducible across processes. pycomets’s WGCM.test declares rng=np.random.default_rng() as a mutable default argument, and its RF wrapper does not expose random_state. Type-I error control and power remain statistically valid; only the per-call p-value varies between fresh Python processes.
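The effect of an unseeded generator in a default argument can be reproduced with the standard library alone. In this sketch random.Random() stands in for np.random.default_rng(), and draw and draw_seeded are hypothetical functions, not pycomets code.

```python
import random


def draw(rng: random.Random = random.Random()) -> float:
    """Draw one number; the default generator is created once, at definition time."""
    # The default is unseeded, so its stream differs between Python processes,
    # and every call that omits rng advances the same shared generator.
    return rng.random()


def draw_seeded(seed: int) -> float:
    """Pass an explicitly seeded generator to get a repeatable value."""
    return draw(rng=random.Random(seed))
```

draw_seeded returns the same value for the same seed in any process, whereas draw() does not. Whether citests offers a way to pass a seeded generator through to pycomets is not stated here.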

Initialise the test and (optionally) load a JSON p-value cache.

Parameters:
  • data – Sample matrix in shape (n, p).

  • cache_path – Optional path to a JSON cache file used to memoise p-values across calls. The cache is keyed by (data_hash, method_name, parameters_hash) and stamped with format_version so v0.1.0 caches can be detected and invalidated by future releases.

Raises:

TypeError – If kwargs contains keys outside cls.accepted_kwargs and cls._protocol_kwargs.

supported_dtypes