@yanofsky
Last active October 17, 2024 22:49

A script to download all of a user's tweets into a csv
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <https://unlicense.org>
#!/usr/bin/env python
# encoding: utf-8
import tweepy #https://github.com/tweepy/tweepy
import csv
#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""
def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3240 tweets with this method

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # initialize a list to hold all the tweepy Tweets
    alltweets = []

    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    # keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print(f"getting tweets before {oldest}")

        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        # save most recent tweets
        alltweets.extend(new_tweets)

        # update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print(f"...{len(alltweets)} tweets downloaded so far")

    # transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]

    # write the csv
    with open(f'new_{screen_name}_tweets.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)


if __name__ == '__main__':
    # pass in the username of the account you want to download
    get_all_tweets("J_tsar")
@heenashree

@brianhalperin

I received the same error. Try changing line 53.

Change line 53 from this:
with open('%s_tweets.csv' % screen_name, 'wb') as f:

to this:
with open('%s_tweets.csv' % screen_name, 'w') as f:

Pretty much just drop the 'b'. Let me know if it works for you.

This worked...thank you so much

@heenashree

This code worked wonders :)

@prakashjha17

@yanofsky
When I tried running the code I got the errors below:

1. LOC: new_tweets = api.user_timeline(screen_name = screen_name,count=20)
ERROR:
NameError Traceback (most recent call last)
----> 1 new_tweets = api.user_timeline(screen_name = screen_name,count=20)
NameError: name 'screen_name' is not defined

2. LOC: alltweets.extend(new_tweets)
ERROR:
NameError Traceback (most recent call last)
----> 1 alltweets.extend(new_tweets)
NameError: name 'new_tweets' is not defined

3. LOC: oldest = alltweets[-1].id - 1
ERROR:
IndexError Traceback (most recent call last)
----> 1 oldest = alltweets[-1].id - 1
IndexError: list index out of range

4. LOC: while len(new_tweets) > 0:
ERROR:
File "", line 2
^
SyntaxError: unexpected EOF while parsing

5. LOC: print "getting tweets before %s" % (oldest)
ERROR:
File "", line 1
print "getting tweets before %s" % (oldest)
^
SyntaxError: invalid syntax

6. LOC: oldest = alltweets[-1].id - 1
ERROR:
IndexError Traceback (most recent call last)
----> 1 oldest = alltweets[-1].id - 1
IndexError: list index out of range

I am new to Python and I have written exactly the same as you had mentioned.

Could you please help me solve the issue?

Thanks in advance.

Thanks,
Prakash Jha

@sonamgupta1105

@yanofsky Thanks for writing this code. It helped me start learning the API and building a dataset with it. Do you know of any way I can filter the tweets by particular hashtags?

@ParthS28

Hello, I am getting this error. Can anyone help?

TypeError                                 Traceback (most recent call last)
<ipython-input-25-6f34111d251b> in <module>
     49 if __name__ == '__main__':
     50         #pass in the username of the account you want to download
---> 51         get_all_tweets("realDonaldTrump")

<ipython-input-25-6f34111d251b> in get_all_tweets(screen_name)
     41         with open('%s_tweets.csv' % screen_name, 'wb') as f:
     42                 writer = csv.writer(f)
---> 43                 writer.writerow(['id','created_at','text'])
     44                 writer.writerows(outtweets)
     45 

TypeError: a bytes-like object is required, not 'str'

@kevinSJ27

@ParthS28 on line 41 change this:
with open('%s_tweets.csv' % screen_name, 'wb') as f:
to this:
with open('%s_tweets.csv' % screen_name, 'w') as f:
Remove the b at the end and it should work.
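
For reference, a minimal sketch of that change under Python 3 (the csv docs also recommend newline='' when writing, and an explicit encoding helps avoid Unicode errors on Windows):

# Python 3: open the csv in text mode; newline='' prevents blank rows on Windows
with open('%s_tweets.csv' % screen_name, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text"])
    writer.writerows(outtweets)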

@pratikone

I have created a modified version which fetches the tweets and creates tweet threads out of them:
https://gist.github.com/pratikone/4cdd5b1149aef0418611eb8748d90ee9

@musahibrahimali

I am getting this error, can anyone help?

Traceback (most recent call last):
File "C:/Users/MUSAH IBRAHIM ALI/PycharmProjects/Election Prediction/test.py", line 61, in
get_all_tweets("NAkufoAddo")
File "C:/Users/MUSAH IBRAHIM ALI/PycharmProjects/Election Prediction/test.py", line 53, in get_all_tweets
writer.writerow(["id", "created_at", "text"])
TypeError: a bytes-like object is required, not 'str'

@jefische

jefische commented Aug 16, 2020

The following terminal output and errors are repeated several times when I execute the code (though I've only pasted one iteration below). Not sure what the issue is, but it mentions certificate failures and something about max retries - any ideas?

Terminal Output:

PS C:\Users\jefischer\Documents\My_Projects\Thinkorswim> & C:/Users/jefischer/AppData/Local/Programs/Python/Python38/python.exe c:/Users/jefischer/Documents/My_Projects/Thinkorswim/tweet_dumper.py
Traceback (most recent call last):
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 978, in validate_conn
conn.connect()
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 362, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\ssl
.py", line 384, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1040, in _create
self.do_handshake()
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\adapters.py", line 439, in send resp = conn.urlopen(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "C:\Users\jefischer\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.twitter.com', port=443): Max retries exceeded with url: /1.1/statuses/user_timeline.json?screen_name=unusual_whales&count=200 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)')))

@Stryker412

Stryker412 commented Nov 3, 2020

Is it possible to put in multiple screennames to get all of the tweets sorted by username in one csv?

@musahibrahimali

musahibrahimali commented Nov 4, 2020

@kcalderw79: Yes, it's possible to do this in one script and extract data from multiple screen names into one csv file. Check it out in my project here: https://github.com/MIA-GH/Elections/blob/master/scripts/main.py
You can insert the screen names in the users array on line 47 of the main.py file above.
The same script also lets you extract data using certain keywords; you can insert these keywords (hashtags) in the terms array on line 25.

cheers mate.

Just a heads up: don't forget to insert your twitter credentials here https://github.com/MIA-GH/Elections/blob/master/scripts/twitter_credentials.py before running the script.
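
For anyone who doesn't want to pull in the linked project, here is a rough sketch of the same idea using the Tweepy v1.1 user_timeline call from this gist (the users list and output filename are placeholders, and api is the authenticated tweepy.API object from the script above):

# hypothetical multi-user variant: tag each row with its screen name
# so a single csv holds tweets from several accounts
users = ["screen_name_one", "screen_name_two"]  # placeholder screen names

with open("combined_tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["screen_name", "id", "created_at", "text"])
    for name in users:
        # paginate with max_id exactly as get_all_tweets does above
        tweets = api.user_timeline(screen_name=name, count=200)
        writer.writerows([[name, t.id_str, t.created_at, t.text] for t in tweets])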

@JunaidAWahid

Works great, but I have a question: how do I get only the statuses and not replies or retweets from a user? Is there any way?

Add include_rts='false', exclude_replies='true' to the user_timeline call on line 39.
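
A sketch of what that modified call might look like, assuming the v1.1 user_timeline endpoint used in this gist (booleans also work; note Twitter applies count before filtering, so pages can come back with fewer than 200 tweets):

# modified request: the extra parameters pass through to the v1.1 API and drop retweets/replies
new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest,
                               include_rts=False, exclude_replies=True)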

@mindyng

mindyng commented Jan 5, 2021

@kcalderw79: Yes, it's possible to do this in one script and extract data from multiple screen names into one csv file. Check it out in my project here: https://github.com/MIA-GH/Elections/blob/master/scripts/main.py
You can insert the screen names in the users array on line 47 of the main.py file above.
The same script also lets you extract data using certain keywords; you can insert these keywords (hashtags) in the terms array on line 25.

cheers mate.

Just a heads up: don't forget to insert your twitter credentials here https://github.com/MIA-GH/Elections/blob/master/scripts/twitter_credentials.py before running the script.

^ This worked for me. I love it because it creates such a rich dataset: multiple users and multiple keywords/hashtags pulled! The only edit I made was pasting my Twitter API credentials straight into the script, so there was no need for: from scripts import twitter_credentials as api. Though the way it is originally set up helps with quick script transfer across the web. Thanks, @Mia-gh!

@likeablegeek

Hi @yanofsky ...

What license are you distributing this code with? Do you have any objections to this code being used/extended in a project which is being shared under the Apache 2.0 license?

Thanks.

@yanofsky (Author)

yanofsky commented Apr 1, 2021

@likeablegeek, I added a License file.

@likeablegeek

@likeablegeek, I added a License file.

Thanks.

@JayJay-101

Works like a charm, I just had to pass the encoding parameter with value utf-8 in the last block.
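
For reference, a minimal sketch of that change (assuming it refers to the open() call in the final write block):

# add encoding='utf-8' so non-ASCII tweet text doesn't raise UnicodeEncodeError on write
with open(f'new_{screen_name}_tweets.csv', 'w', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text"])
    writer.writerows(outtweets)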

@alessandromonolo

Works like a charm, I just had to pass the encoding parameter with value utf-8 in the last block.

Which line of code? Can you please copy-paste your last block of code?
I had the same error.

@shubhamcodez

How to get more than 3240 tweets?

@yanofsky (Author)

yanofsky commented Nov 27, 2022

@shubhamcodez You can't using the API, unless they're your own. It's a limit imposed by Twitter.

@shubhamcodez

I found a script that lets you extract all tweets at once.

@SalmanKhaja

Can you share the link?

@Wamy-Dev

Does this still work in late 2023, given the API changes?

@shubhamcodez

Does this still work in late 2023, given the API changes?

Nope
