simdesign.utils.stats

This module provides statistics-related utility functions.

simdesign.utils.stats.random_truncated_lognormal(size, mu, sigma, lower, upper)

Sample from a truncated log-normal distribution.

Parameters:
  • size (int) – Sample size (number of random samples to generate).

  • mu (np.ndarray) – Mean of the log-normal distribution in log-space.

  • sigma (np.ndarray) – Standard deviation of the log-normal distribution in log-space.

  • lower (np.ndarray) – Lower truncation limit of the distribution.

  • upper (np.ndarray) – Upper truncation limit of the distribution.

Returns:

An array of random samples from the truncated log-normal distribution.

Return type:

np.ndarray

Raises:
  • Exception – If sigma is less than or equal to 0.

  • Exception – If upper is less than or equal to lower.

Notes

The truncated log-normal distribution is sampled by generating random probabilities within the cumulative distribution function (CDF) range defined by the lower and upper truncation limits.

Examples

>>> random_truncated_lognormal(
...     100, mu=1.0, sigma=0.5, lower=2.0, upper=10.0)
array([2.35, 4.17, 8.22, ...])  # Example output

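The Notes describe inverse transform sampling restricted to the truncated CDF range. The following is a minimal sketch of that idea, not the library's code. It assumes scalar mu and sigma given in log-space and 0 < lower < upper; the function name sample_truncated_lognormal and its rng argument are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def sample_truncated_lognormal(size, mu, sigma, lower, upper, rng=None):
    """Hypothetical sketch: inverse transform sampling of a truncated log-normal."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    rng = np.random.default_rng() if rng is None else rng
    std_normal = NormalDist()  # standard normal N(0, 1)
    # CDF of the untruncated log-normal evaluated at the truncation limits
    p_lo = std_normal.cdf((math.log(lower) - mu) / sigma)
    p_hi = std_normal.cdf((math.log(upper) - mu) / sigma)
    # Uniform probabilities restricted to the CDF range [p_lo, p_hi)
    u = rng.uniform(p_lo, p_hi, size)
    # Map each probability back through the log-normal inverse CDF
    z = np.array([std_normal.inv_cdf(p) for p in u])
    return np.exp(mu + sigma * z)
```

Because every drawn probability lies between the CDF values at the two limits, each sample falls in [lower, upper] without a rejection step, e.g. `sample_truncated_lognormal(100, mu=1.0, sigma=0.5, lower=2.0, upper=10.0, rng=np.random.default_rng(0))`.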
simdesign.utils.stats.truncated_lognormal_cdf_inv(cdf, mu, sigma, lower, upper)

Inverts the log-normal CDF truncated to the interval [lower, upper].

Parameters:
  • cdf (np.ndarray) – Value(s) of the truncated cumulative distribution function, in [0, 1].

  • mu (np.ndarray) – Mean of the log-normal distribution in log-space.

  • sigma (np.ndarray) – Standard deviation of the log-normal distribution in log-space.

  • lower (np.ndarray) – Lower truncation limit.

  • upper (np.ndarray) – Upper truncation limit.

Returns:

Values x at which the truncated CDF equals cdf, i.e. the corresponding quantiles of the truncated distribution.

Return type:

np.ndarray
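For reference, the inversion can be written in closed form. This assumes, as in random_truncated_lognormal, that mu and sigma are log-space parameters and that cdf holds values p of the truncated CDF; Φ is the standard normal CDF, a = lower and b = upper.

```latex
\begin{aligned}
F(x) &= \Phi\!\left(\frac{\ln x - \mu}{\sigma}\right), \qquad
F_T(x) = \frac{F(x) - F(a)}{F(b) - F(a)}, \quad a \le x \le b, \\
x &= F_T^{-1}(p) = \exp\!\Big(\mu + \sigma\,\Phi^{-1}\big(F(a) + p\,[F(b) - F(a)]\big)\Big).
\end{aligned}
```

Setting p = 0 returns a and p = 1 returns b, so every output lies in the truncation interval.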

simdesign.utils.stats.random_multivariate_normal(m, k0, size, epsilon=0.0)

Sample from a multivariate normal distribution.

It follows Section A.2 (Gaussian Identities) of the book Gaussian Processes for Machine Learning by Rasmussen and Williams.

Parameters:
  • m (np.ndarray) – Mean vector.

  • k0 (np.ndarray) – Covariance matrix.

  • size (int) – Sample size.

  • epsilon (float, optional) – Small value added to the diagonal of k0 (epsilon * I) for numerical stability, by default 0.0.

Returns:

Samples from a multivariate normal distribution with mean m and covariance k0.

Return type:

np.ndarray

Raises:

ValueError – If the covariance matrix is not positive definite.

References

  • https://juanitorduz.github.io/multivariate_normal/

  • https://gaussianprocess.org/gpml/chapters/

Notes

In practice, it may be necessary to add a small multiple of the identity matrix, epsilon * I, to the covariance matrix k0 for numerical reasons. The eigenvalues of k0 can decay very rapidly, and without this stabilization the Cholesky decomposition fails. The effect on the generated samples is to add independent noise of variance epsilon. From context, epsilon can usually be chosen small enough to have an inconsequential effect on the samples while still ensuring numerical stability (Section A.2, Gaussian Identities). The epsilon parameter sets this perturbation.
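A minimal sketch of the Cholesky-based sampling that GPML Section A.2 describes: factor k0 + epsilon * I = L L^T, draw standard normal vectors u, and return m + L u. The function name sample_mvn_cholesky, the rng argument and the (size, n) output shape are assumptions for illustration, not the library's API.

```python
import numpy as np


def sample_mvn_cholesky(m, k0, size, epsilon=0.0, rng=None):
    """Sketch: draw `size` samples from N(m, k0 + epsilon * I) via a Cholesky factor."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.asarray(m, dtype=float)
    k0 = np.asarray(k0, dtype=float)
    n = m.shape[0]
    # Jitter: add epsilon * I to stabilise the factorisation
    k = k0 + epsilon * np.eye(n)
    try:
        chol = np.linalg.cholesky(k)  # lower-triangular L with L @ L.T == k
    except np.linalg.LinAlgError as err:
        raise ValueError("covariance matrix is not positive definite") from err
    # u ~ N(0, I); x = m + L u then has covariance L L^T = k
    u = rng.standard_normal((n, size))
    return (m[:, None] + chol @ u).T  # one sample per row: shape (size, n)
```

For a nearly singular k0, passing a small epsilon (for example 1e-6) is the stabilization described in the Notes; the samples then have covariance k0 + epsilon * I rather than exactly k0.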

simdesign.utils.stats.random_choice(q, size, p=None)

Randomly selects elements from the input list ‘q’ with replacement, optionally using specified probabilities.

Parameters:
  • q (List[str]) – Input list of strings from which elements will be randomly chosen.

  • size (int) – Number of elements to be selected.

  • p (List[float], optional) – The probabilities associated with each element in ‘q’. If None, the probabilities are assumed to be uniform. By default None.

Returns:

An array containing the randomly selected elements from ‘q’.

Return type:

np.ndarray

Examples

>>> q = ['B01', 'B02', 'B03', 'B04', 'B05']
>>> size = 3
>>> random_choice(q, size)
array(['B03', 'B02', 'B05'], dtype='<U3')  # Example output

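Sampling with replacement, uniform or weighted, maps directly onto NumPy's Generator.choice. The sketch below shows that call; the wrapper name choose_with_replacement and the rng argument are hypothetical, and whether simdesign uses this call internally is not stated here.

```python
import numpy as np


def choose_with_replacement(q, size, p=None, rng=None):
    """Sketch: pick `size` items from `q` with replacement, weighted by `p` if given."""
    rng = np.random.default_rng() if rng is None else rng
    # p=None means uniform probabilities; otherwise p must match len(q) and sum to 1
    return rng.choice(q, size=size, replace=True, p=p)
```

Generator.choice raises ValueError if p has a different length than q or does not sum to 1, so invalid probabilities fail loudly rather than silently.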
simdesign.utils.stats.random_uniform(size, lower, upper)

Generates an array of random samples uniformly distributed within a specified range.

Parameters:
  • size (int) – The number of random samples to generate.

  • lower (float) – The lower bound of the uniform distribution.

  • upper (float) – The upper bound of the uniform distribution.

Returns:

An array of random samples uniformly distributed between lower and upper.

Return type:

np.ndarray

Notes

The function draws random probabilities u and maps them through the inverse cumulative distribution function (CDF) of the uniform distribution on [lower, upper], which is lower + u * (upper - lower).

Examples

>>> random_uniform(5, 0, 1)
array([0.123, 0.456, 0.789, 0.234, 0.567])  # Example output
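A minimal sketch of the inverse-CDF approach from the Notes; the function name sample_uniform_inverse_cdf and the rng argument are hypothetical.

```python
import numpy as np


def sample_uniform_inverse_cdf(size, lower, upper, rng=None):
    """Sketch: inverse-CDF sampling of the uniform distribution on [lower, upper)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(size)  # probabilities in [0, 1)
    # Inverse CDF of U(lower, upper): F^{-1}(u) = lower + u * (upper - lower)
    return lower + u * (upper - lower)
```

In plain NumPy, rng.uniform(lower, upper, size) gives the same distribution directly; the explicit form only makes the inverse-CDF step visible.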

simdesign.utils.stats.random_multivariate_truncated_lognormal(size, rho, lower_bound, upper_bound, theta, std_ln)

Sample from a multivariate truncated log-normal distribution.

Parameters:
  • size (int) – Number of samples to generate.

  • rho (np.ndarray) – Correlation matrix (n x n) describing the relationships between variables.

  • lower_bound (np.ndarray) – Lower bounds for the truncated log-normal distributions.

  • upper_bound (np.ndarray) – Upper bounds for the truncated log-normal distributions.

  • theta (np.ndarray) – Medians of the log-normal distributions.

  • std_ln (np.ndarray) – Logarithmic standard deviations of the log-normal distributions.

Returns:

Samples from the multivariate truncated log-normal distribution (size x n).

Return type:

np.ndarray

Notes

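Each variable i follows X_i ~ LN(ln(theta_i), std_ln_i), truncated to [lower_bound_i, upper_bound_i]; the log-space mean is the natural logarithm of the median theta_i.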
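How rho enters the sampling is not documented here. One common approach, assumed in the sketch below, is a Gaussian copula: draw correlated standard normals with correlation rho, map them to uniforms, rescale each column into that variable's truncated CDF range, then invert the log-normal CDF with mu = ln(theta). The function name sample_mv_truncated_lognormal and the rng argument are hypothetical; the bounds are assumed to satisfy 0 < F(lower) < F(upper) < 1.

```python
from statistics import NormalDist

import numpy as np


def sample_mv_truncated_lognormal(size, rho, lower_bound, upper_bound, theta, std_ln, rng=None):
    """Sketch: correlated truncated log-normal samples via a Gaussian copula (assumed approach)."""
    rng = np.random.default_rng() if rng is None else rng
    nd = NormalDist()
    phi = np.vectorize(nd.cdf)          # standard normal CDF, element-wise
    phi_inv = np.vectorize(nd.inv_cdf)  # standard normal inverse CDF, element-wise
    theta = np.asarray(theta, dtype=float)
    std_ln = np.asarray(std_ln, dtype=float)
    n = theta.shape[0]
    # 1. Correlated standard normals: z = L u, where L L^T = rho
    chol = np.linalg.cholesky(np.asarray(rho, dtype=float))
    z = (chol @ rng.standard_normal((n, size))).T  # shape (size, n)
    # 2. Map to uniforms with the normal CDF
    u = phi(z)
    # 3. Rescale each column into its CDF range [F(lower), F(upper)]
    mu = np.log(theta)  # log-space mean = ln(median)
    p_lo = phi((np.log(lower_bound) - mu) / std_ln)
    p_hi = phi((np.log(upper_bound) - mu) / std_ln)
    p = p_lo + u * (p_hi - p_lo)
    # 4. Invert each marginal log-normal CDF
    return np.exp(mu + std_ln * phi_inv(p))
```

Because steps 2 to 4 are monotone per column, the rank ordering of the correlated normals is preserved, so rho shows up as positive or negative dependence between the truncated marginals, though not as an exact Pearson correlation.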

simdesign.utils.stats.get_time_based_seed()

Return a random seed derived from the current date and time.

Returns:

The sum of the numeric components of the current date and time.

Return type:

int
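The return description suggests the seed is the sum of the numeric date-time fields. Which fields are summed is not stated; the sketch below assumes year through microsecond, and time_based_seed and its now parameter are hypothetical.

```python
from datetime import datetime


def time_based_seed(now=None):
    """Hypothetical sketch: sum the numeric fields of a datetime to get an integer seed."""
    now = datetime.now() if now is None else now
    # Assumed field set; the library may sum a different selection
    fields = (now.year, now.month, now.day, now.hour, now.minute, now.second, now.microsecond)
    return sum(fields)
```

Seeds built this way have low entropy and distinct timestamps can collide: 2024-01-02 00:00:00 and 2024-01-01 01:00:00 both sum to 2027. When reproducibility is not required, np.random.default_rng() called without a seed draws fresh entropy from the operating system instead.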