@shaypal5
Last active August 1, 2022 17:36

Revisions

  1. shaypal5 revised this gist Aug 1, 2022. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion pdp_post_adv2.py
    @@ -13,7 +13,7 @@
    >>> mp.pipeline
    A pdpipe pipeline:
    [ 0] Drop columns Columns with at least 0.2 missing value rate
   -[ 1] Drop labels by values
   +[ 1] Drop rows by label values
    [ 2] Encode label values
    [ 3] Drop columns 'Name'
    [ 4] Apply dataframe method set_index with kwargs {'keys': 'id'}
  2. shaypal5 revised this gist Aug 1, 2022. 1 changed file with 16 additions and 15 deletions.
    31 changes: 16 additions & 15 deletions pdp_post_adv2.py
    @@ -16,20 +16,21 @@
    [ 1] Drop labels by values
    [ 2] Encode label values
    [ 3] Drop columns 'Name'
   -[ 4] Drop rows by qualifier <RowQualifier: Qualify rows with X[Savings] >
   +[ 4] Apply dataframe method set_index with kwargs {'keys': 'id'}
   +[ 5] Drop rows by qualifier <RowQualifier: Qualify rows with X[Savings] >
    101>
   -[ 5] Assign column Viking with df[Country].isin(['Denmark', 'Finland']) &
   +[ 6] Assign column Viking with df[Country].isin(['Denmark', 'Finland']) &
    ~df[Bearded]
   -[ 6] Assign column YearlyGrands with df[Savings] * 1000 / df[Age]
   -[ 7] Bin Savings by [1].
   -[ 8] One-hot encode 'Country'
   -[ 9] Tokenize Quote
   -[10] Stemming tokens in Quote...
   -[11] Remove stopwords from Quote
   -[12] Count-vectorizing column Quote.
   -[13] Decompose columns Columns that start with Quote with PCA
   -[14] Encode 'Savings_bin', 'Gender'
   -[15] Scale columns Columns of dtypes <class 'numpy.number'>
   -[16] Drop columns 'Bearded'
   -[17] Transform input dataframes to the following schema: <Learnable Schema>
   -[18] Validates conditions
   +[ 7] Assign column YearlyGrands with df[Savings] * 1000 / df[Age]
   +[ 8] Bin Savings by [1].
   +[ 9] One-hot encode 'Country'
   +[10] Tokenize Quote
   +[11] Stemming tokens in Quote...
   +[12] Remove stopwords from Quote
   +[13] Count-vectorizing column Quote.
   +[14] Decompose columns Columns that start with Quote with PCA
   +[15] Encode 'Savings_bin', 'Gender'
   +[16] Scale columns Columns of dtypes <class 'numpy.number'>
   +[17] Drop columns 'Bearded'
   +[18] Transform input dataframes to the following schema: <Learnable Schema>
   +[19] Validates conditions
  3. shaypal5 created this gist Aug 1, 2022.
    35 changes: 35 additions & 0 deletions pdp_post_adv2.py
    @@ -0,0 +1,35 @@
    >>> mp = MyPipelineAndModel(
        savings_max_val=101,
        drop_gender=False,
        standardize=True,
        ohencode_country=True,
        savings_bin_val=1,
        pca_threshold=25,
        fit_intercept=True)
    >>> mp
    <PdPipeline -> LogisticRegression>
    >>> mp.estimator
    LogisticRegression()
    >>> mp.pipeline
    A pdpipe pipeline:
    [ 0] Drop columns Columns with at least 0.2 missing value rate
    [ 1] Drop labels by values
    [ 2] Encode label values
    [ 3] Drop columns 'Name'
    [ 4] Drop rows by qualifier <RowQualifier: Qualify rows with X[Savings] >
    101>
    [ 5] Assign column Viking with df[Country].isin(['Denmark', 'Finland']) &
    ~df[Bearded]
    [ 6] Assign column YearlyGrands with df[Savings] * 1000 / df[Age]
    [ 7] Bin Savings by [1].
    [ 8] One-hot encode 'Country'
    [ 9] Tokenize Quote
    [10] Stemming tokens in Quote...
    [11] Remove stopwords from Quote
    [12] Count-vectorizing column Quote.
    [13] Decompose columns Columns that start with Quote with PCA
    [14] Encode 'Savings_bin', 'Gender'
    [15] Scale columns Columns of dtypes <class 'numpy.number'>
    [16] Drop columns 'Bearded'
    [17] Transform input dataframes to the following schema: <Learnable Schema>
    [18] Validates conditions
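
To make a few of the stages in the printout above concrete, here is a minimal sketch in plain pandas, not in pdpipe and not the gist author's code. It approximates only four of the stages: dropping `Name`, setting `id` as the index, dropping rows whose `Savings` exceed 101, and assigning the `Viking` and `YearlyGrands` columns. The function name `apply_sketch_stages` and the toy dataframe are hypothetical, and the order follows the latest revision, where `set_index` runs before the row filter.

```python
import pandas as pd


def apply_sketch_stages(df: pd.DataFrame, savings_max_val: float = 101) -> pd.DataFrame:
    """Approximate a handful of the printed pipeline stages with plain pandas.

    Hypothetical helper for illustration only; the real pipeline is built
    from pdpipe stages and contains many more steps.
    """
    # "Drop columns 'Name'"
    out = df.drop(columns=["Name"])
    # "Apply dataframe method set_index with kwargs {'keys': 'id'}"
    out = out.set_index("id")
    # "Drop rows by qualifier ... X[Savings] > 101": keep rows that do NOT qualify
    out = out[~(out["Savings"] > savings_max_val)]
    # "Assign column Viking ..." and "Assign column YearlyGrands ..."
    out = out.assign(
        Viking=out["Country"].isin(["Denmark", "Finland"]) & ~out["Bearded"],
        YearlyGrands=out["Savings"] * 1000 / out["Age"],
    )
    return out


# Toy data invented for this sketch; column names are taken from the printout.
toy = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "Name": ["Astrid", "Bjorn", "Carla"],
        "Country": ["Denmark", "Finland", "Spain"],
        "Bearded": [False, True, False],
        "Savings": [50.0, 150.0, 20.0],
        "Age": [25, 40, 20],
    }
)
result = apply_sketch_stages(toy)
print(result)
```

With this toy input, the row with `Savings` of 150 is filtered out, and the remaining rows gain the two derived columns.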