Title: | Dummy Data for Dummies |
---|---|
Description: | Allows you to specify and sample from a Bayesian Network (a.k.a. a parametric Directed Acyclic Graph, or pDAG). |
Authors: | William Hulme [aut, cre] |
Maintainer: | William Hulme <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.0.9000 |
Built: | 2024-12-18 16:30:36 UTC |
Source: | https://github.com/wjchulme/dd4d |
Complement of %in%. Returns the elements of x that are not in y.
x %ni% y
x | a vector |
y | a vector |
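A minimal base-R sketch of such an operator, assuming it is simply the logical negation of %in% (the package's own definition may differ). Under that assumption the operator itself yields a logical vector, and subsetting x with it gives the elements of x that are not in y.

```r
# Hypothetical stand-in for illustration; not the package source.
`%ni%` <- function(x, y) !(x %in% y)

x <- c("a", "b", "c", "d")
y <- c("b", "d")

# Logical vector: TRUE where an element of x is absent from y.
flags <- x %ni% y

# The elements of x that are not in y.
kept <- x[flags]
kept
```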
Get all functions that are used in a formula expr.
all_funs(expr)
expr | a formula object |
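One way to approximate this in base R is to compare all.names(), which lists every symbol in an expression, with all.vars(), which lists only the variables. The helper name funs_in_formula below is hypothetical and is not the package's implementation; a symbol used both as a function and as a variable would be missed by this simple set difference.

```r
# Hypothetical helper for illustration; not the package source.
funs_in_formula <- function(expr) {
  # Take the right-hand side of the formula (the last element of the call).
  rhs <- expr[[length(expr)]]
  # Symbols that appear in the expression but are not variables are function names.
  setdiff(all.names(rhs, unique = TRUE), all.vars(rhs))
}

used <- funs_in_formula(~ floor(rnorm(n = ..n, mean = age, sd = 15)))
used
```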
Converts a list of nodes to a data frame, which is easier to work with, and adds some useful columns. The function performs a few checks on the list, for instance that the graph is acyclic and that variables used in the expressions are defined elsewhere or already known. The known_variables argument takes a character vector of names of variables that are already defined externally in a dataset that can be passed to bn_simulate. For these variables, variable_formula is the variable name itself; this is to help the bn_simulate function and does not actually lead to self-dependence (e.g. var depending on var).
bn_create(list, known_variables = NULL)
list | of node objects, created by |
known_variables | character vector of variables that will be provided by an external dataset |
data.frame
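The two checks described above (all variables defined, graph acyclic) can be illustrated with a small base-R sketch. It works on a plain named list of RHS-only formulas rather than on node objects, and it is not how bn_create is implemented. The node names are hypothetical, and treating ..n as a sample-size placeholder is an assumption taken from the bn_node example.

```r
# Hypothetical specification: variable name -> RHS-only formula.
nodes <- list(
  age = ~ floor(rnorm(n = ..n, mean = 60, sd = 15)),
  sex = ~ sample(c("F", "M"), size = ..n, replace = TRUE),
  bmi = ~ rnorm(n = ..n, mean = 20 + age / 10, sd = 3)
)
known_variables <- character()

# Parents of each node: variables used in its formula, ignoring the ..n placeholder.
parents <- lapply(nodes, function(f) setdiff(all.vars(f), "..n"))

# Check 1: every parent is either another node or a known variable.
undefined <- setdiff(unlist(parents), c(names(nodes), known_variables))
stopifnot(length(undefined) == 0)

# Check 2: the graph is acyclic. Repeatedly resolve nodes whose parents are all resolved;
# if no node can be resolved while some remain, there must be a cycle.
resolved <- known_variables
remaining <- names(nodes)
while (length(remaining) > 0) {
  is_ready <- vapply(parents[remaining], function(p) all(p %in% resolved), logical(1))
  ready <- remaining[is_ready]
  if (length(ready) == 0) {
    stop("cycle detected among: ", paste(remaining, collapse = ", "))
  }
  resolved <- c(resolved, ready)
  remaining <- setdiff(remaining, ready)
}

# A valid order in which the variables could be simulated.
resolved
```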
Specify a variable node in the network
bn_node(variable_formula, missing_rate = ~0, keep = TRUE, needs = character())
variable_formula | A RHS-only formula specifying how to simulate that variable. Use |
missing_rate | A RHS-only formula. This specifies how missing values should be distributed. Can use a simple proportion such as |
keep | logical. Should this variable be kept in the final simulated output or not |
needs | A character vector of variables. If any variables given in |
Object of class node and list.
bn_node(variable_formula = ~floor(rnorm(n=..n, mean=60, sd=15)))
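To show what a RHS-only formula and a proportion-style missing_rate could mean in practice, here is a base-R sketch that evaluates the formula from the example with ..n bound to a sample size, then sets values to NA completely at random. This is an assumed interpretation for illustration only; the package may evaluate formulas and apply missingness differently.

```r
set.seed(1)
n <- 1000

# RHS-only formula from the example; ..n stands for the sample size.
variable_formula <- ~ floor(rnorm(n = ..n, mean = 60, sd = 15))
# Assumed meaning: each value is missing with probability 0.1.
missing_rate <- ~ 0.1

# Evaluate the right-hand side of each formula with ..n bound to n.
env <- list(..n = n)
value <- eval(variable_formula[[2]], env)
p_missing <- eval(missing_rate[[2]], env)

# Replace values with NA completely at random at the given rate.
value[runif(n) < p_missing] <- NA

# Observed proportion of missing values, close to the requested rate.
mean(is.na(value))
```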
Plot bn_df object
bn_plot(bn_df, connected_only = FALSE)
bn_df | initialised bn_df object, with simulation instructions. Created with |
connected_only | logical. Only plot nodes that are connected to other nodes |
plot
Simulate data from bn_df object
bn_simulate(bn_df, known_df = NULL, pop_size, keep_all = FALSE, .id = NULL)
bn_df | initialised bn_df object, with simulation instructions. Created with |
known_df | data.frame. Optional data.frame containing upstream variables used for simulation. |
pop_size | integer. The size of the dataset to be created. |
keep_all | logical. Keep all simulated variables or only keep those specified by |
.id | character. Name of id column placed at the start of the dataset. If NULL (default) then no id column is created. |
tbl
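The general idea of sampling from a network, simulating each variable in turn so that children can use their parents' values, can be sketched in base R as follows. The node names and formulas are hypothetical, and the loop is an illustration of sequential simulation, not the internals of bn_simulate.

```r
set.seed(42)
pop_size <- 500

# Hypothetical nodes, listed so that parents come before children.
nodes <- list(
  age      = ~ floor(rnorm(n = ..n, mean = 60, sd = 15)),
  diabetes = ~ rbinom(n = ..n, size = 1, prob = plogis(-4 + age * 0.05))
)

# Simulate each node in turn; previously simulated columns are visible to later formulas.
sim <- list(..n = pop_size)
for (v in names(nodes)) {
  sim[[v]] <- eval(nodes[[v]][[2]], sim)
}

# Drop the size placeholder and put an id column at the start.
dat <- data.frame(id = seq_len(pop_size), sim[names(nodes)])
head(dat)
```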
Converts a bn_df object to a dagitty object
bn2dagitty(bn_df)
bn_df | initialised bn_df object, with simulation instructions. Created with |
dagitty object
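For orientation, a dagitty graph can be described by a "dag { ... }" string listing nodes and "parent -> child" edges. The base-R sketch below builds such a string from a hypothetical list of parents; it does not use a bn_df object and is not the conversion performed by bn2dagitty.

```r
# Hypothetical parent lists: each element names the parents of that variable.
parents <- list(age = character(), sex = character(), bmi = c("age", "sex"))

# One "parent -> child" edge per dependency.
edges <- unlist(lapply(names(parents), function(child) {
  if (length(parents[[child]]) == 0) return(character())
  paste(parents[[child]], "->", child)
}))

# Assemble the graph description; nodes are listed by name so isolated ones are kept.
dag_string <- paste0("dag {\n", paste(c(names(parents), edges), collapse = "\n"), "\n}")
cat(dag_string, "\n")

# With the dagitty package installed, the string could then be parsed with:
# dagitty::dagitty(dag_string)
```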
Random categorical variables
rcat(n, levels, p)
n | number of samples |
levels | vector of categories to sample from |
p | vector of probabilities |
a character vector
rcat(n=10, levels=c("a","b"), p=c(0.2,0.8))
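A base-R sketch of equivalent behaviour, assuming the function samples from levels with replacement using p as the sampling probabilities. The name rcat_sketch is hypothetical and this is not the package source.

```r
# Hypothetical stand-in for illustration; not the package source.
rcat_sketch <- function(n, levels, p) {
  # Draw n values from levels with replacement, weighted by p.
  sample(x = levels, size = n, replace = TRUE, prob = p)
}

set.seed(123)
x <- rcat_sketch(n = 10, levels = c("a", "b"), p = c(0.2, 0.8))
x
table(x)
```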
Random factor variables
rfactor(n, levels, p)
n | number of samples |
levels | vector of categories to sample from |
p | vector of probabilities |
a factor vector
rfactor(n=10, levels=c("a","b"), p=c(0.2,0.8))
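A base-R sketch of the factor variant, assuming the result keeps every entry of levels as a factor level even when a category is never drawn. The name rfactor_sketch is hypothetical and this is not the package source.

```r
# Hypothetical stand-in for illustration; not the package source.
rfactor_sketch <- function(n, levels, p) {
  # Sample as for a character vector, then fix the level set and its order.
  drawn <- sample(x = levels, size = n, replace = TRUE, prob = p)
  factor(drawn, levels = levels)
}

set.seed(123)
f <- rfactor_sketch(n = 10, levels = c("a", "b", "c"), p = c(0.5, 0.5, 0))

# Level "c" has probability zero, so it never appears but remains a level.
levels(f)
table(f)
```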