@nandoquintana
Created December 5, 2019 09:51
Find instagram hashtags through a regular expression in Python
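import re

# Sample text mixing mentions, hashtags, accents, emoji and CJK characters.
txt = """
hola @gorka!!!!!! #ñam???? #normal #subtêrráneo @nandoquintana #puravida!!! #espagne🇪🇸
#凤凰卫视
"""

# Characters that end a hashtag: punctuation, newline, '#', '@' and space.
not_in_hashtags = "\"$%&'()*+,-./:;<=>?[\\]^`{|}~\n#@ "

# '#' followed by one or more characters outside the excluded set.
hashtags = re.findall(f'#[^{re.escape(not_in_hashtags)}]+', txt)
print(hashtags)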
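
One thing to watch in the snippet above: `!` is not among the excluded characters, so `#puravida!!!` is captured together with its exclamation marks. The sketch below assumes that is unintended and adds `!` to the set. It also wraps the pattern in a small function. The names `EXCLUDED` and `find_hashtags` are illustrative and not part of the gist.

```python
import re

# Assumption: same exclusion set as the gist, with "!" added so that trailing
# exclamation marks are not captured. EXCLUDED and find_hashtags are
# illustrative names, not part of the original gist.
EXCLUDED = "!\"$%&'()*+,-./:;<=>?[\\]^`{|}~\n#@ "


def find_hashtags(text, excluded=EXCLUDED):
    """Return every '#...' run made of characters outside `excluded`."""
    pattern = f"#[^{re.escape(excluded)}]+"
    return re.findall(pattern, text)


sample = "hola @gorka!!!!!! #ñam???? #normal #puravida!!! #espagne🇪🇸\n#凤凰卫视\n"
print(find_hashtags(sample))
```

Passing the exclusion set as a parameter keeps the approach of the gist (a negated character class built with `re.escape`) while making it easy to try other sets of terminating characters.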

fl0aten commented Jul 15, 2022

Thanks! Nice Regex!

In my tests, some hashtags still had a trailing space, which turned out to be a non-breaking space (U+00A0).

Adding "\u00A0" to the excluded characters should fix it:

not_in_hashtags = "\"$%&'()*+,-./:;<=>?[\]^`{|}~\n#@ \u00A0"

(I'm working in a different programming language, so I can't promise it works the same way in Python.)
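
For what it's worth, the suggestion appears to carry over to Python: `\u00A0` is a valid escape in a Python string literal, and `re.escape` leaves the character usable inside a character class. A minimal check, using a made-up caption in which a non-breaking space follows the hashtag:

```python
import re

# The gist's exclusion set, without and with the non-breaking space.
base = "\"$%&'()*+,-./:;<=>?[\\]^`{|}~\n#@ "
with_nbsp = base + "\u00A0"

# Hypothetical caption: a non-breaking space directly follows the hashtag.
caption = "#puravida\u00A0and more"

before = re.findall(f"#[^{re.escape(base)}]+", caption)
after = re.findall(f"#[^{re.escape(with_nbsp)}]+", caption)

print(before)  # the match runs through the non-breaking space
print(after)   # the match stops at the non-breaking space
```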
