models package

Submodules

models.Copula module

This file contains functions to generate synthetic data using copula-based methods.

Functions:

generate_data_copula: Function to fit a Gaussian copula to original data and generate new samples.
generate_synthetic_data_for_copula: Function to run comparison of original and generated data distributions using copula.
generate_future_data_copula: Function to generate future data using copula for imputation.
impute_missing_data_copula: Function to impute missing values using copula-based generated data.

Dependencies:

numpy
pandas
GaussianMultivariate (copulas)
data_preprocess
data_desc

Usage:

To generate synthetic data using copula-based methods, use the functions in this file.

models.Copula.generate_data_copula(original_data, n_samples=10000, hyperparameters=None)

Function to fit a Gaussian copula to original data and generate new samples.

Parameters:

original_data (numpy array): Original data to fit the copula.
n_samples (int, optional, default=10000): Number of samples to generate.
hyperparameters (dict, optional): Hyperparameters for copula fitting.

Returns:

numpy array: Generated synthetic data.
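To make the idea of fitting a Gaussian copula and sampling from it more concrete, the sketch below uses only NumPy and the standard library instead of the copulas package this module depends on. The name `gaussian_copula_sample`, the `seed` argument, and the rank-based marginals are assumptions made for illustration. This is not the module's implementation.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_to_normal = np.vectorize(_STD_NORMAL.inv_cdf)  # uniform -> standard normal score
_to_uniform = np.vectorize(_STD_NORMAL.cdf)     # standard normal score -> uniform


def gaussian_copula_sample(original_data, n_samples=10000, seed=0):
    """Hypothetical sketch: Gaussian copula with empirical marginals."""
    data = np.asarray(original_data, dtype=float)
    n_rows, n_cols = data.shape
    rng = np.random.default_rng(seed)

    # 1. Map each column to (0, 1) through its ranks, then to normal scores.
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    scores = _to_normal(ranks / (n_rows + 1))

    # 2. The correlation matrix of the normal scores defines the copula.
    corr = np.atleast_2d(np.corrcoef(scores, rowvar=False))

    # 3. Draw correlated normals and map them back to uniforms.
    z = rng.multivariate_normal(np.zeros(n_cols), corr, size=n_samples)
    u = _to_uniform(z)

    # 4. Push the uniforms through each column's empirical quantile function.
    return np.column_stack(
        [np.quantile(data[:, j], u[:, j]) for j in range(n_cols)]
    )
```

Because the marginals are empirical quantiles, every generated value stays within the observed range of its column. A parametric marginal would be needed to extrapolate beyond it.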

models.Copula.generate_future_data_copula(df, generated_df, future_timestamps, hyperparameters=None)

Function to generate future data using the copula method.

Parameters:

df (pandas DataFrame): DataFrame containing original data.
generated_df (pandas DataFrame): DataFrame containing generated data.
future_timestamps (list): List of future timestamps.
hyperparameters (dict, optional): Hyperparameters for copula fitting.

Returns:

pandas DataFrame: DataFrame containing the generated future data.

models.Copula.generate_synthetic_data_for_copula(df, n_samples=10000, hyperparameters=None)

Function to run comparison of original and generated data distributions using copula.

Parameters:

df (pandas DataFrame): DataFrame containing original data.
n_samples (int, optional, default=10000): Number of samples to generate.
hyperparameters (dict, optional): Hyperparameters for copula fitting.

Returns:

pandas DataFrame: DataFrame containing the generated synthetic data.

models.Copula.impute_missing_data_copula(df, hyperparameters=None)

Function to impute missing values using copula-based generated data.

Parameters:

df (pandas DataFrame): DataFrame containing original data.
hyperparameters (dict, optional): Hyperparameters for copula fitting.

Returns:

pandas DataFrame: DataFrame containing the imputed data.

models.ITS module

This module contains functions to generate synthetic data using Inverse Transform Sampling (ITS).

Functions:

generate_data_inverse_transform: Function to generate synthetic data using Inverse Transform Sampling.
generate_synthetic_data_for_ITS: Function to generate synthetic data using ITS and check for acceptable error.
generate_future_data_ITS: Function to generate synthetic data for the future period using ITS.
impute_missing_data_ITS: Function to impute missing data using ITS.

Dependencies:

numpy
pandas
scipy
data_preprocess
streamlit

models.ITS.generate_data_inverse_transform(data, n_samples=10000)

Function to generate synthetic data using Inverse Transform Sampling.

Parameters:

data (numpy array): Original data to generate synthetic data from.
n_samples (int, optional): Number of samples to generate. Default is 10000.

Returns:

numpy array: Generated synthetic data.
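As an illustration of the technique, inverse transform sampling draws uniform random numbers and passes them through the inverse of the data's cumulative distribution function. The sketch below assumes an empirical CDF inverted by linear interpolation. The function name `inverse_transform_sample` and the `seed` argument are hypothetical, and the module's own function may build its CDF differently.

```python
import numpy as np


def inverse_transform_sample(data, n_samples=10000, seed=0):
    """Hypothetical sketch of inverse transform sampling from an empirical CDF."""
    values = np.sort(np.asarray(data, dtype=float))
    # Empirical CDF evaluated at the sorted observations.
    cdf = np.arange(1, values.size + 1) / values.size
    # Draw uniforms and invert the CDF by linear interpolation.
    u = np.random.default_rng(seed).uniform(size=n_samples)
    return np.interp(u, cdf, values)
```

Samples produced this way never leave the range of the observed data, which is worth keeping in mind when the original series is short.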

models.ITS.generate_future_data_ITS(df, generated_df, future_timestamps)

Function to generate synthetic data for the future period using ITS.

Parameters:

df (pandas DataFrame): Input DataFrame with original data.
generated_df (pandas DataFrame): DataFrame containing generated synthetic data.
future_timestamps (numpy array): Timestamps for the future period.

Returns:

pandas DataFrame: DataFrame containing the generated future data.

models.ITS.generate_synthetic_data_for_ITS(df, n_samples=10000)

Function to generate synthetic data using ITS and check for acceptable error.

Parameters:

df (pandas DataFrame): Input DataFrame with original data.
n_samples (int, optional): Number of samples to generate. Default is 10000.

Returns:

pandas DataFrame: DataFrame containing the generated synthetic data.

models.ITS.impute_missing_data_ITS(df)

Function to impute missing data using Inverse Transform Sampling.

Parameters:

df (pandas DataFrame): Input DataFrame with missing data.

Returns:

pandas DataFrame: DataFrame with imputed missing data.

models.Imputation module

This module is used to find the best imputation method for a given dataset. It contains functions to train the imputation models and to generate synthetic data using the trained models.

Imputation methods:

Forward Fill
Backward Fill
Linear Interpolation
KNN Imputer
MICE Imputer
Random Forest Imputer
Iterative Imputer

Functions:

generate_synthetic_data_for_imputation: Function to generate synthetic data using the specified imputation method.
impute_missing_data_imputation: Function to impute missing data using the specified imputation method.

Dependencies:

pandas
numpy
scikit-learn

models.Imputation.generate_synthetic_data_for_imputation(original_data, method='ffill')

Function to generate synthetic data using the specified imputation method.

Parameters:

original_data (pandas DataFrame): Input DataFrame with original data.
method (str, optional): Imputation method to use. Default is 'ffill'.

Returns:

pandas DataFrame: DataFrame containing the synthetic data generated using the specified imputation method.

models.Imputation.impute_missing_data_imputation(original_data, method='ffill')

Function to impute missing data using the specified imputation method.

Parameters:

original_data (pandas DataFrame): Input DataFrame with original data.
method (str, optional): Imputation method to use. Default is 'ffill'.

Returns:

pandas DataFrame: DataFrame containing the data imputed using the specified imputation method.
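To show how the simplest of the listed methods differ, the example below applies forward fill, backward fill, and linear interpolation to a small column using plain pandas calls. It does not call the module's functions. Only 'ffill' is confirmed above as a value of `method`; the dictionary keys 'bfill' and 'linear' are labels chosen for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]})

filled = {
    # Carry the last observed value forward.
    "ffill": df.ffill(),
    # Carry the next observed value backward.
    "bfill": df.bfill(),
    # Draw a straight line between the neighbouring observations.
    "linear": df.interpolate(method="linear"),
}

for name, result in filled.items():
    print(name, result["value"].tolist())
# ffill [1.0, 1.0, 3.0, 3.0, 3.0, 6.0]
# bfill [1.0, 3.0, 3.0, 6.0, 6.0, 6.0]
# linear [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```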

models.KDE module

This module contains functions to generate synthetic data using Kernel Density Estimation (KDE) method.

Functions:

train_kde_model_with_hyperparameter_tuning: Function to train the KDE model on the data with hyperparameter tuning.
generate_synthetic_data_for_KDE: Function to generate synthetic data using the KDE model.
generate_future_data_KDE: Generate synthetic future data using KDE.
impute_missing_data_KDE: Impute missing data in the DataFrame using KDE.

Dependencies:

numpy
pandas
sklearn
streamlit
data_preprocess

models.KDE.generate_future_data_KDE(df, generated_df, future_timestamps, bandwidths=None)

Generate synthetic future data using KDE.

Parameters:

df (pandas DataFrame): Input DataFrame with original data.
generated_df (pandas DataFrame): DataFrame to store the generated synthetic data.
future_timestamps (pandas DatetimeIndex): Index containing future timestamps for data generation.
bandwidths (dict, optional): Dictionary of bandwidths for each column. If None, bandwidths will be determined using hyperparameter tuning.

Returns:

pandas DataFrame: DataFrame containing the generated future data.

models.KDE.generate_synthetic_data_for_KDE(df, n_samples=10000, bandwidths=None)

Function to generate synthetic data using the KDE model.

Parameters:

df (pandas DataFrame): Input DataFrame with original data.
n_samples (int, optional): Number of synthetic samples to generate for each column. Default is 10000.
bandwidths (dict, optional): Dictionary of bandwidths for each column. If None, bandwidths will be determined using hyperparameter tuning.

Returns:

pandas DataFrame: DataFrame containing the generated synthetic data.
dict: Dictionary containing the best bandwidths for each column.

models.KDE.impute_missing_data_KDE(df, bandwidths=None)

Impute missing data in the DataFrame using KDE.

Parameters:

df (pandas DataFrame): Input DataFrame with missing values.
bandwidths (dict, optional): Dictionary of bandwidths for each column. If None, bandwidths will be determined using hyperparameter tuning.

Returns:

pandas DataFrame: DataFrame with missing values imputed using KDE.

models.KDE.train_kde_model_with_hyperparameter_tuning(df, bandwidths=None)

Train the KDE model on the data with hyperparameter tuning.

Parameters:

df (pandas DataFrame): Input DataFrame with original data.
bandwidths (dict, optional): Dictionary of bandwidths for each column. If None, bandwidths will be determined using hyperparameter tuning.

Returns:

dict: Dictionary containing the trained KDE models for each column.
dict: Dictionary containing the best bandwidths for each column.
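The following sketch illustrates what bandwidth tuning and sampling for a one-dimensional Gaussian KDE can look like. It is written with NumPy alone rather than the sklearn estimators this module lists as a dependency. The function names, the held-out log-likelihood criterion, and the candidate grid are all assumptions for illustration, and the module may tune bandwidths in a different way.

```python
import numpy as np


def kde_log_likelihood(train, test, bandwidth):
    """Mean log-density of `test` under a Gaussian KDE fitted to `train`."""
    diffs = (test[:, None] - train[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2 * np.pi))
    density = kernel.mean(axis=1)
    # The small constant avoids log(0) when the bandwidth is far too narrow.
    return float(np.mean(np.log(density + 1e-300)))


def tune_bandwidth(values, candidates, seed=0):
    """Pick the candidate bandwidth with the best held-out log-likelihood."""
    shuffled = np.random.default_rng(seed).permutation(values)
    split = len(shuffled) // 2
    train, test = shuffled[:split], shuffled[split:]
    scores = [kde_log_likelihood(train, test, h) for h in candidates]
    return candidates[int(np.argmax(scores))]


def sample_kde(values, bandwidth, n_samples=10000, seed=0):
    """Sample from a Gaussian KDE: pick an observation, then add kernel noise."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(values, size=n_samples, replace=True)
    return centers + rng.normal(scale=bandwidth, size=n_samples)
```

Applied per column, `tune_bandwidth` would yield one bandwidth for each column, which matches the shape of the `bandwidths` dictionary described above.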

models.MonteCarlo module

Monte Carlo Simulation

Monte Carlo Simulation is a method used to generate synthetic data. It is based on the principle of random sampling and is used to estimate the distribution of a variable by generating a large number of random samples.

The Monte Carlo Simulation model generates synthetic data by sampling from a normal distribution with a given mean and standard deviation.

Functions:

tune_parameters: Function to tune the parameters of the Monte Carlo Simulation model.
generate_data_monte_carlo: Function to generate synthetic data using the Monte Carlo Simulation model.
generate_synthetic_data_for_MCS: Function to generate synthetic data for all columns in the DataFrame using the Monte Carlo Simulation model.
generate_future_data_MCS: Function to generate future synthetic data using the Monte Carlo Simulation model.
impute_missing_data_MC: Function to impute missing data using the Monte Carlo Simulation model.

Dependencies:

numpy
pandas
scipy
skopt
streamlit
data_preprocess

Markov Chain Monte Carlo

Markov Chain Monte Carlo (MCMC) is a method used to generate synthetic data by sampling from a probability distribution. It is based on the Markov chain principle, where the next state of the chain depends only on the current state.

The Markov Chain Monte Carlo model generates synthetic data by sampling from a normal distribution with a given mean and standard deviation.

Functions:

generate_data_mcmc: Function to generate synthetic data using the Markov Chain Monte Carlo (MCMC) method.
generate_synthetic_data_for_MCMC: Function to generate synthetic data for all columns in the DataFrame using the Markov Chain Monte Carlo (MCMC) method.
generate_future_data_MCMC: Function to generate future synthetic data using the Markov Chain Monte Carlo (MCMC) method.
impute_missing_data_MCMC: Function to impute missing data using the Markov Chain Monte Carlo (MCMC) method.

Dependencies:

numpy
pandas
streamlit

models.MonteCarlo.generate_data_mcmc(initial_state, proposal_std, n_samples=10000, burn_in=1000)

Function to generate synthetic data using the Markov Chain Monte Carlo (MCMC) method.

Parameters:

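initial_state (float): Initial state of the Markov chain.
proposal_std (float): Standard deviation of the proposal distribution.
n_samples (int, optional): Number of samples to generate. Default is 10000.
burn_in (int, optional): Number of initial samples to discard as burn-in. Default is 1000.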

Returns:

numpy array: Array containing the generated synthetic data.
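For readers unfamiliar with how these arguments interact, here is a minimal random-walk Metropolis sketch that targets a normal distribution. The function name `metropolis_normal` and the arguments `target_mean`, `target_std`, and `seed` are hypothetical additions for illustration. The documented signature does not show how the module specifies its target distribution.

```python
import numpy as np


def metropolis_normal(initial_state, proposal_std, n_samples=10000,
                      burn_in=1000, target_mean=0.0, target_std=1.0, seed=0):
    """Hypothetical random-walk Metropolis sampler for a normal target."""
    rng = np.random.default_rng(seed)

    def log_target(x):
        # Log-density of N(target_mean, target_std^2), up to a constant.
        return -0.5 * ((x - target_mean) / target_std) ** 2

    state = float(initial_state)
    samples = np.empty(n_samples)
    for i in range(burn_in + n_samples):
        # Propose a move that depends only on the current state.
        proposal = state + rng.normal(scale=proposal_std)
        # Accept with probability min(1, target(proposal) / target(state)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(state):
            state = proposal
        # Keep the state only once the burn-in period is over.
        if i >= burn_in:
            samples[i - burn_in] = state
    return samples
```

A small `proposal_std` gives high acceptance but slow exploration, while a large one gives frequent rejections. The burn-in period discards the early samples, which still reflect the choice of `initial_state`.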

models.MonteCarlo.generate_data_monte_carlo(mean, std, n_samples=10000)

Function to generate synthetic data using the Monte Carlo Simulation model.

Parameters:

mean (float): Mean of the data distribution.
std (float): Standard deviation of the data distribution.
n_samples (int): Number of samples to generate.

Returns:

numpy array: Array containing the generated synthetic data.

models.MonteCarlo.generate_future_data_MCMC(df, generated_df, future_timestamps)

Function to generate future synthetic data using the Markov Chain Monte Carlo (MCMC) method.

Parameters:

df (pandas DataFrame): DataFrame containing the original data.
generated_df (pandas DataFrame): DataFrame containing the generated synthetic data.
future_timestamps (numpy array): Array containing the future timestamps.

Returns:

pandas DataFrame: DataFrame containing the generated future synthetic data.

models.MonteCarlo.generate_future_data_MCS(df, generated_df, future_timestamps)

Function to generate future synthetic data using the Monte Carlo Simulation model.

Parameters:

df (pandas DataFrame): DataFrame containing the original data.
generated_df (pandas DataFrame): DataFrame containing the generated synthetic data.
future_timestamps (numpy array): Array containing the future timestamps.

Returns:

pandas DataFrame: DataFrame containing the generated future synthetic data.

models.MonteCarlo.generate_synthetic_data_for_MCMC(df, n_samples=10000)

Function to generate synthetic data for all columns in the DataFrame using the Markov Chain Monte Carlo (MCMC) method.

Parameters:

df (pandas DataFrame): DataFrame containing the original data.
n_samples (int): Number of samples to generate.

Returns:

pandas DataFrame: DataFrame containing the generated synthetic data.

models.MonteCarlo.generate_synthetic_data_for_MCS(df, n_samples=10000)

Function to generate synthetic data for all columns in the DataFrame using the Monte Carlo Simulation model.

Parameters:

df (pandas DataFrame): DataFrame containing the original data.
n_samples (int): Number of samples to generate.

Returns:

pandas DataFrame: DataFrame containing the generated synthetic data.

models.MonteCarlo.impute_missing_data_MC(df)

Function to impute missing data using the Monte Carlo Simulation model.

Parameters:

df (pandas DataFrame): DataFrame containing the original data.

Returns:

pandas DataFrame: DataFrame containing the imputed synthetic data.

models.MonteCarlo.impute_missing_data_MCMC(df)

Function to impute missing data using the Markov Chain Monte Carlo (MCMC) method.

Parameters:

df (pandas DataFrame): DataFrame containing the original data.

Returns:

pandas DataFrame: DataFrame containing the imputed synthetic data.

models.MonteCarlo.objective(params, original_data)

Objective function for hyperparameter tuning of the Monte Carlo Simulation model.

Parameters:

params (tuple): Tuple containing the mean and standard deviation of the data distribution.
original_data (numpy array): Array containing the original data.

Returns:

float: Kolmogorov-Smirnov statistic between the original and generated data distributions.
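The Kolmogorov-Smirnov statistic returned here is the largest vertical gap between the empirical CDFs of two samples, so smaller values mean a closer match. The sketch below computes it with NumPy and wraps it in an objective of the kind described above. The names `ks_statistic` and `ks_objective`, and the `n_samples` and `seed` arguments, are assumptions for illustration. The module itself lists scipy as a dependency and may rely on it instead.

```python
import numpy as np


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    # The gap can only change at observed points, so evaluate both CDFs there.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_objective(params, original_data, n_samples=10000, seed=0):
    """Hypothetical objective: KS distance between data and N(mean, std) draws."""
    mean, std = params
    generated = np.random.default_rng(seed).normal(mean, std, size=n_samples)
    return ks_statistic(original_data, generated)
```

An optimizer that minimizes such an objective over `(mean, std)` would return a pair like the one described for `tune_parameters`.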

models.MonteCarlo.tune_parameters(original_data)

Function to tune the parameters of the Monte Carlo Simulation model.

Parameters:

original_data (numpy array): Array containing the original data.

Returns:

tuple: Tuple containing the optimal mean and standard deviation for the Monte Carlo Simulation model.

Module contents