Last active
November 5, 2021 15:20
reduce size of CSV with pandas
import pandas as pd

# Maintain the original data format by reading everything as strings.
original = pd.read_csv("original.csv", dtype="object")

# Set random_state so the sample is reproducible. Alternatively, pass `frac`
# instead of `n` to sample a fraction of the rows.
sampled = original.sample(n=5000, random_state=1).sort_index()

# Exclude the index so the columns match the original.
sampled.to_csv("sampled.csv", index=False)
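
The comment above mentions `frac` as an alternative to `n`. A minimal sketch of that variant follows, assuming a small in-memory CSV in place of `original.csv`. The `id` and `value` columns and the 10% fraction are made up for illustration and are not part of the original gist.

```python
import io

import pandas as pd

# Hypothetical stand-in for "original.csv": 100 rows with zero-padded IDs.
csv_text = "id,value\n" + "\n".join(f"{i:03d},{i * 2}" for i in range(100))

# Read everything as strings so values such as "007" keep their leading zeros.
original = pd.read_csv(io.StringIO(csv_text), dtype="object")

# frac=0.1 keeps 10% of the rows; random_state makes the sample reproducible.
# sort_index() restores the original row order.
sampled = original.sample(frac=0.1, random_state=1).sort_index()

# Write without the index so the columns match the original.
out = io.StringIO()
sampled.to_csv(out, index=False)
sampled_csv = out.getvalue()
```

Using `frac` keeps the sample proportional to the input, which may be more convenient than a fixed `n` when the size of the source file varies.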