Release notes

0.6.4 (Dec 6, 2024)

Fixes

Fix with_regex_replacement_table mutator not behaving correctly when a pattern matches only partially

0.6.3 (Dec 4, 2024)

Features

Improve randomized selection of replacements in mutators using replacement tables

Fixes

Fix phonetic replacement rules not being matched correctly against original data when the desired pattern occurs in multiple places

0.6.2 (Nov 27, 2024)

Features

Add placeholder option to with_generator for inserting generated values
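
The idea behind a placeholder can be sketched without Gecko: each original value carries a marker string that is swapped for a freshly generated value. The helper name insert_at_placeholder, the "<ID>" marker and the use of plain lists instead of pandas series are assumptions made for illustration. This is not Gecko's implementation or call signature.

```python
from itertools import count
from typing import Callable


def insert_at_placeholder(
    values: list[str],
    generate: Callable[[int], list[str]],
    placeholder: str,
) -> list[str]:
    """Replace the placeholder in every value with a newly generated value.

    Hypothetical helper: generate(n) is assumed to return n strings.
    """
    generated = generate(len(values))
    # str.replace leaves values without the placeholder unchanged.
    return [v.replace(placeholder, g) for v, g in zip(values, generated)]


# Deterministic stand-in generator that yields "1", "2", "3", ...
_counter = count(1)


def sequential_ids(n: int) -> list[str]:
    return [str(next(_counter)) for _ in range(n)]


result = insert_at_placeholder(
    ["patient-<ID>", "visit-<ID>", "no marker"],
    sequential_ids,
    placeholder="<ID>",
)
print(result)
```

Values that do not contain the marker pass through untouched in this sketch; whether Gecko behaves the same way should be checked against its documentation.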

0.6.1 (Nov 15, 2024)

Fixes

Fix indexing behavior in dfbitlookup when using NumPy data types

0.6.0 (Nov 15, 2024)

Breaking changes

Change mutator type definition from Callable[[list[pd.Series]], list[pd.Series]] to Callable[[list[pd.Series], Optional[float]], list[pd.Series]] to delegate the selection of rows to mutate to the mutators themselves
Generator and Mutator type definitions are now exported at the top level of the module
Replace D option with d for unit parameter in with_datetime_offset
Remove strategy parameter from with_missing_value
Remove rng parameter from mutate_data_frame
Remove with_edit in favor of with_group

Features

with_replacement_table, with_regex_replacement_table and with_phonetic_replacement_table now favor rare replacements over common ones
Add rng parameter to with_function, with_lowercase, with_missing_value, with_noop, with_repeat, with_uppercase
with_permute now permutes series contents so that no value remains in its original series
Add days, hours, minutes and seconds to list of permitted unit values for with_datetime_offset
Add list[str] as option to charset parameter of with_cldr_keymap_file, with_insert and with_substitute
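
The with_permute guarantee listed above can be illustrated with a rotation: shifting every series by a random non-zero offset means that no series keeps its own contents. This is a minimal sketch using plain lists and the standard library; the function name rotate_series is hypothetical and the approach is an assumption, not necessarily how Gecko implements it.

```python
import random


def rotate_series(
    series_list: list[list[str]], rng: random.Random
) -> list[list[str]]:
    """Move the contents of every series to a different position."""
    n = len(series_list)
    if n < 2:
        # Nothing to swap with, so return copies unchanged.
        return [list(s) for s in series_list]
    # An offset in 1..n-1 ensures that no series ends up at its own index.
    offset = rng.randrange(1, n)
    return [list(series_list[(i - offset) % n]) for i in range(n)]


rng = random.Random(42)
given = ["Ann", "Bob"]
last = ["Lee", "Kim"]
city = ["Oslo", "Rome"]

original = [given, last, city]
permuted = rotate_series(original, rng)
print(permuted)
```

A rotation is only one way to obtain such a permutation. It is simple to reason about, but it produces fewer distinct arrangements than sampling from all permutations that move every series.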

Fixes

When providing a list of mutators to a column in mutate_data_frame, every mutator is now applied to all rows instead of each being applied with a probability of 1 / mutator_count
Fix with_regex_replacement_table interpreting numbers in pattern and substitution columns as belonging to a named capture group
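
The corrected behavior for lists of mutators can be pictured as a simple chain in which each mutator receives the full output of the previous one. The sketch below uses plain lists of strings and a simplified ColumnMutator type that is an assumption for illustration; Gecko's actual Mutator type operates on lists of pandas series.

```python
from typing import Callable

# Simplified stand-in type, not Gecko's Mutator definition.
ColumnMutator = Callable[[list[str]], list[str]]


def apply_all(values: list[str], mutators: list[ColumnMutator]) -> list[str]:
    """Apply every mutator, in order, to every row."""
    for mutate in mutators:
        values = mutate(values)
    return values


def drop_last_char(values: list[str]) -> list[str]:
    return [v[:-1] for v in values]


def uppercase(values: list[str]) -> list[str]:
    return [v.upper() for v in values]


result = apply_all(["abc", "de"], [drop_last_char, uppercase])
print(result)
```

Both rows pass through both mutators here, in contrast to the earlier behavior where each mutator would only have reached a share of the rows.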

Documentation

Use section-style navigation instead of tabs in Gecko docs

0.5.2 (Nov 5, 2024)

Features

Add generator.with_group for grouping multiple (weighted) generators together
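
Grouping weighted generators can be sketched as follows: for every output row, one generator is chosen at random in proportion to its weight, and each generator is then asked once for as many values as it was chosen for. The names weighted_group and SimpleGenerator are hypothetical, plain lists stand in for pandas series, and the row assignment strategy is an assumption rather than Gecko's documented behavior.

```python
import random
from typing import Callable

# Simplified stand-in type: takes a row count, returns that many values.
SimpleGenerator = Callable[[int], list[str]]


def weighted_group(
    weighted: list[tuple[float, SimpleGenerator]],
    rng: random.Random,
) -> SimpleGenerator:
    """Combine several generators into one that mixes their output by weight."""
    weights = [w for w, _ in weighted]
    generators = [g for _, g in weighted]

    def generate(count: int) -> list[str]:
        # Pick, for every row, the index of the generator that fills it.
        picks = rng.choices(range(len(generators)), weights=weights, k=count)
        # Ask each generator once for exactly as many values as it was picked for.
        pools = [iter(g(picks.count(i))) for i, g in enumerate(generators)]
        return [next(pools[i]) for i in picks]

    return generate


fruit = weighted_group(
    [
        (0.8, lambda n: ["apple"] * n),
        (0.2, lambda n: ["pear"] * n),
    ],
    random.Random(0),
)
values = fruit(1000)
print(values.count("apple"), values.count("pear"))
```

Calling each generator once per batch, rather than once per row, keeps the sketch close to how count-based generators are typically used.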

Internal

Remove automated benchmarks

0.5.1 (Oct 30, 2024)

Features

Add the option to use data frames for all generators and mutators that accept paths to CSV files

0.5.0 (Oct 23, 2024)

Breaking changes

to_data_frame has a new call signature that is consistent with mutate_data_frame

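import numpy as np
from gecko import generator

# random number generator shared by all generators below
rng = np.random.default_rng()

df_generated = generator.to_data_frame(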
    [
        (("fruit", "type"), generator.from_multicolumn_frequency_table(
            "fruit-types.csv",
            value_columns=["fruit", "type"],
            freq_column="count",
            rng=rng,
        )),
        ("weight", generator.from_uniform_distribution(
            low=20,
            high=100,
            rng=rng,
        )),
    ],
    10_000
)

Features

Add mutator.with_group for grouping multiple mutators together
Add support for Python 3.13

Documentation

Fix creation and modification timestamps in documentation

0.4.2 (Sep 20, 2024)

Fixes

Fix NaNs produced by generators and mutators that take in CSV files with empty cells

0.4.1 (Sep 12, 2024)

Features

Add inline and reverse flags to with_replacement_table mutator

0.4.0 (Sep 10, 2024)

Breaking changes

mutate_data_frame has a new call signature that guarantees the order in which mutations are applied

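from gecko import mutator

# df_original is an existing pandas DataFrame, rng a NumPy random number generator
df_mutated = mutator.mutate_data_frame(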
    df_original,
    [
        ("gender", (0.1, mutator.with_categorical_values(
            "./gender.csv",
            value_column="gender",
            rng=rng
        ))),
        (("given_name", "last_name"), (0.05, mutator.with_permute())),
        ("postcode", [
            mutator.with_delete(rng=rng),
            mutator.with_substitute(charset="0123456789", rng=rng)
        ])
    ],
    rng=rng
)

Features

Add generator.from_datetime_range for generating dates and times
Add mutator.with_lowercase and mutator.with_uppercase for case conversions
Add mutator.with_datetime_offset for applying arbitrary offsets to dates and times
Add mutator.with_generator for appending, prepending or replacing data in a series with values from a generator
Add mutator.with_regex_replacement_table for regex-based substitutions
Add mutator.with_repeat for repeated values

Fixes

Fix mutate_data_frame raising an error if probability is provided as an integer, not a float

0.3.2 (Jul 19, 2024)

Fixes

Fix multiple mutators not being applied correctly when defined on the same column

0.3.1 (Mar 28, 2024)

Fixes

Fix IndexError when calling with_permute on empty series
Fix Python version range in pyproject.toml

Refactors

Fix type hint on **kwargs in benchmarks

Documentation

Add navigation tabs to documentation
Fix image link in README so that it can be displayed on PyPI

Internal

Cache dependencies in CI pipelines
Reorganize dependencies into groups for tests, development and documentation

0.3.0 (Mar 18, 2024)

Features

Allow corruptor.with_permute to work with more than two series at once
Infer header parameter for functions reading CSV files
Remove list length constraints from mutator module

Refactors

Fix type hints on *args and **kwargs
Rename corruptor module to mutator
Rename Corruptor type alias to Mutator
Rename corruptor.corrupt_dataframe to mutator.mutate_data_frame
Rename generator.to_dataframe to generator.to_data_frame

Documentation

Add API reference to documentation
Update docs to use new "mutator" terminology wherever possible
Use Google format docstrings instead of reST

Internal

Merge documentation repository into main repository
Move repositories from GitLab to GitHub
Refine benchmark suite, add example based on German population dataset

0.2.0 (Feb 16, 2024)

Features

Add corruptor.with_permute for swapping values between series
Set wider version ranges for dependencies
Fix corruptor.corrupt_dataframe to not modify original data frame
Add tests to all corruptor functions to ensure no modifications to original data

Refactors

Change corruptors to take in and return a list of series instead of single series
Change generator.to_dataframe signature to align with corruptor.corrupt_dataframe

Internal

Extend CI pipeline with a benchmarking step that runs on release and when manually triggered
Add benchmark based on the "fruits" example in the docs

0.1.0 (Feb 8, 2024)

Initial release