Save troy/184547 to your computer and use it in GitHub Desktop.
# usage: redfin-images "http://www.redfin.com/WA/Seattle/123-Home-Row-12345/home/1234567"
function redfin-images() {
  # Fetch the listing page, keep lines containing "full:", take the 4th double-quote-delimited field (the image URL), and download each one
  wget -O - "$1" | grep "full:" | awk -F \" '{print $4}' | xargs wget
}
wget -O - http://www.redfin.com/WA/Seattle/123-Home-Row-12345/home/1234567 | grep "full:" | awk -F \" '{print $4}' | xargs wget
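To see what the grep/awk stage does without touching the network, here is a minimal sketch run against a made-up line. The sample markup is hypothetical, since the Redfin page format this gist originally targeted isn't shown here. awk -F '"' (the same as -F \" above) splits each line on double quotes, so $4 is the second quoted string on the line.

```shell
#!/usr/bin/env bash
# Hypothetical sample of a page line containing a "full:" image entry.
# The real Redfin markup may differ; this only illustrates the pipeline.
sample='{thumb: "http://example.com/thumb.jpg", full: "http://example.com/full.jpg"}'

# Keep lines containing "full:", split on double quotes, print the 4th field.
# Fields: 1='{thumb: ', 2=thumb URL, 3=', full: ', 4=full URL, 5='}'
printf '%s\n' "$sample" | grep "full:" | awk -F '"' '{print $4}'
```

This prints http://example.com/full.jpg; in the real pipeline, each such URL is then handed to wget by xargs.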
Got a version working with bash older than 4.2 (e.g., macOS Mojave ships with bash 3.2.57(1)-release):
brew install uni2ascii
wget --user-agent="Mozilla" -O - https://www.redfin.com/IL/Chicago/123-S-Someplace-St-60605/unit-123/home/12345678 | egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F\d*\\\\u002Fbigphoto\\\\u002F\d*\\\\u002F[0-9_]*.jpg" | ascii2uni -Z '\u%04X' | xargs wget --user-agent="Mozilla"
(Unicode code points are tricky to work with in older versions of bash.)
But really, just upgrade your Mac's bash.
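If you're not sure which bash you're running, a small check helps. bash 4.2 is the version that added \uXXXX escapes to echo -e and printf %b, which is why the later one-liners that use echo -e need 4.2+. A minimal sketch; the function name is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if this bash understands \uXXXX escapes
# in echo -e / printf %b (added in bash 4.2).
bash_has_unicode_escapes() {
  local major=${BASH_VERSINFO[0]} minor=${BASH_VERSINFO[1]}
  (( major > 4 || (major == 4 && minor >= 2) ))
}

if bash_has_unicode_escapes; then
  echo "bash $BASH_VERSION: echo -e can decode \\u002F"
else
  echo "bash $BASH_VERSION: too old for \\u escapes; use ascii2uni (or upgrade bash)"
fi
```

On the stock Mojave bash (3.2.57) this takes the second branch.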
I just used the most recent command and it did not work. For the listing I'm using, all the images start with PW, so I had to change the regex to [0-9_PW]*.jpg, and that worked. Thank you!!!
I found that the latest JPEGs were prefixed with "VALO". My regex is rusty: when I tried wildcarding A-Z, it wouldn't pull them. See where I hardcoded VALO below, and be aware that you may need to tweak it if the JPEG prefix changes.
bash-5.0$ wget --user-agent="Mozilla" -O - https://www.redfin.com/IL/Chicago/123-S-Someplace-St-60605/unit-123/home/12345678 | echo -e $(egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F\d*\\\\u002Fbigphoto\\\\u002F\d*\\\\u002F[VALO]*[0-9_]*.jpg") | xargs wget --user-agent="Mozilla"
The following worked for me
wget --user-agent="Mozilla" -O - https://www.redfin.com/IL/Chicago/123-S-Someplace-St-60605/unit-123/home/12345678 | echo -e $(egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F\d*\\\\u002Fbigphoto\\\\u002F\d*\\\\u002F[A-Z0-9_]*.jpg") | xargs wget --user-agent="Chrome"
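The PW and VALO problems above come from the last character class allowing only digits and underscores. Below is a sketch of a more permissive and more portable pattern, tested against made-up sample strings (the file names are hypothetical). [A-Za-z0-9_]* accepts any letter prefix. [0-9]* replaces \d, which is not defined in POSIX extended regular expressions and may not act as a digit class in GNU grep -E.

```shell
#!/usr/bin/env bash
# Hypothetical escaped URLs as they might appear in the page source
# (a literal backslash followed by u002F stands in for each "/").
sample='https:\u002F\u002Fssl.cdn-redfin.com\u002Fphoto\u002F123\u002Fbigphoto\u002F456\u002F789_0.jpg
https:\u002F\u002Fssl.cdn-redfin.com\u002Fphoto\u002F123\u002Fbigphoto\u002F456\u002FPW789_1.jpg
https:\u002F\u002Fssl.cdn-redfin.com\u002Fphoto\u002F123\u002Fbigphoto\u002F456\u002FVALO789_2.jpg'

# In single quotes, \\ reaches grep unchanged and matches one literal backslash.
# [0-9]* replaces \d (not part of POSIX ERE); [A-Za-z0-9_]* allows letter prefixes.
pattern='https:\\u002F\\u002Fssl\.cdn-redfin\.com\\u002Fphoto\\u002F[0-9]*\\u002Fbigphoto\\u002F[0-9]*\\u002F[A-Za-z0-9_]*\.jpg'

printf '%s\n' "$sample" | grep -oE "$pattern"
```

All three sample URLs match, whatever their prefix. In the real command, the same pattern would replace the quoted egrep argument.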
This solution does not work for me, though at first it looked as if it would: "Connecting to www.redfin.com..." -> "HTTP request sent, awaiting response... 200 OK" -> "Length: unspecified [text/html]" -> "Saving to: 'STDOUT'" -> "written to STDOUT"... but then I get a "wget: missing URL" message.
Any thoughts on how to solve?
Not a dev, just a curiosity-driven individual trying to learn. In retrospect, it would have been faster (for me) to right click each photo and save, but there would not have been any fun!
"wget: missing URL" means you didn't provide a valid URL. Navigate to the listing on Redfin, copy the entire URL, and use it in place of the dummy URL in the example. Enjoy!
Hi Andrew, yes, I originally used a valid Redfin URL with all components, e.g. https://www.redfin.com/State/City/StreetAddress-Zip/home/ID. Yet it still gave me a "wget: missing URL" message.
I will try a different listing and see if it replicates.
I even changed the user agent to Mozilla or Chrome, and tried changing the "-" in "-O - https:...", to no avail.
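Even with a valid listing URL, "wget: missing URL" also appears when the egrep step finds nothing: GNU xargs (the default on most Linux systems) still runs wget once, with no arguments. Here is a minimal sketch that saves the page first and stops with a message if no photo URLs match. page.html and find_photo_urls are hypothetical names, and the pattern is the one from the commands in this thread, with [0-9] in place of \d and letters allowed in the file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print escaped photo URLs found in a saved listing page,
# or fail with a message if there are none (instead of letting xargs run
# wget with no arguments, which produces "wget: missing URL").
find_photo_urls() {
  local page=$1
  local pattern='https:\\u002F\\u002Fssl\.cdn-redfin\.com\\u002Fphoto\\u002F[0-9]*\\u002Fbigphoto\\u002F[0-9]*\\u002F[A-Za-z0-9_]*\.jpg'
  local urls
  urls=$(grep -oE "$pattern" "$page")
  if [ -z "$urls" ]; then
    echo "No photo URLs found in $page: the request may have been blocked or the page format changed." >&2
    return 1
  fi
  printf '%s\n' "$urls"
}

# Usage (network steps shown for reference only):
#   wget --user-agent="Mozilla" -O page.html "<RedFinURL>"
#   urls=$(find_photo_urls page.html) &&
#     printf '%s\n' "$urls" | ascii2uni -Z '\u%04X' | xargs wget --user-agent="Mozilla"
```

Saving the page to a file also lets you open it and check whether Redfin returned the listing or an error page.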
Does this still work?
@mals14: I haven't used it in over 10 years, so I have no idea. Since people have commented in the last few months that it works, I'm guessing that it does. Try it and see :)
@troy, thank you for replying. It does not work anymore, I guess because the website can tell it's a wget request and does not respond well.
I am not sure if all the others get the notification or not.
For now, I copied the displayed page (on realtor.com, I believe), pasted it as Markdown, and then used a Python script from a GitHub contributor to download the image files. Quite a roundabout solution, but it worked.
The following worked for me on my Mac:
1) brew install uni2ascii
2) wget --user-agent="Mozilla" -O - <RedFinURL> | egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F\d*\\\\u002Fbigphoto\\\\u002F\d*\\\\u002F[A-Z0-9_]*.jpg" | ascii2uni -Z '\u%04X' | xargs wget --user-agent="Mozilla"
@punjabdhaputar Thank you for sharing. It works!
I was able to understand how it works by first saving the wget output and then finding, in that document, the format egrep is looking for; ascii2uni then converts that escaped format back into URLs wget can use. Good stuff, and thanks again for sharing!
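The page stores each "/" in the photo URLs as the escape \u002F, and ascii2uni turns those back into real slashes. If installing uni2ascii isn't an option, a sed substitution can handle this one escape. This is a swapped-in technique, not what the commands in this thread use, and the sample URL is made up.

```shell
#!/usr/bin/env bash
# Made-up escaped URL in the form the egrep step extracts.
escaped='https:\u002F\u002Fssl.cdn-redfin.com\u002Fphoto\u002F123\u002Fbigphoto\u002F456\u002F789_0.jpg'

# Replace every literal \u002F with "/". Unlike ascii2uni, this handles
# only this one escape, not arbitrary \uXXXX sequences.
decode_slashes() {
  sed 's#\\u002F#/#g'
}

printf '%s\n' "$escaped" | decode_slashes
# prints https://ssl.cdn-redfin.com/photo/123/bigphoto/456/789_0.jpg
```

In the pipeline, decode_slashes would take the place of ascii2uni -Z '\u%04X'.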
@punjabdhaputar worked for me too, thanks!!
Any idea how to download the photos that are visible after signing in?
@gauravchak: After using a browser to log in, you might be able to make wget present the browser's session cookies. For that, look into the --load-cookies option. You'd need to create the cookie file manually.
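wget's --load-cookies reads a cookie file in the Netscape/Mozilla cookies.txt format: one cookie per line, with seven tab-separated fields. Below is a minimal sketch of writing one by hand. The cookie name and value are placeholders, since this thread doesn't identify which Redfin cookies carry the login session. Copy the real ones from your browser's developer tools.

```shell
#!/usr/bin/env bash
# Sketch: write a Netscape-format cookie file that wget --load-cookies can read.
# COOKIE_NAME / COOKIE_VALUE are hypothetical placeholders.
cookie_file="${COOKIE_FILE:-cookies.txt}"
COOKIE_NAME="${COOKIE_NAME:-example_session}"
COOKIE_VALUE="${COOKIE_VALUE:-abc123}"

{
  printf '# Netscape HTTP Cookie File\n'
  # Fields: domain, include subdomains, path, secure-only, expiry (Unix time; 0 = session), name, value
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    ".redfin.com" "TRUE" "/" "TRUE" "0" "$COOKIE_NAME" "$COOKIE_VALUE"
} > "$cookie_file"

# Then, for example:
#   wget --load-cookies "$cookie_file" --user-agent="Mozilla" -O - "<RedFinURL>" | ...
```

Add one printf line per cookie if the session depends on several.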
Assuming you're just saving images from a handful of listings for personal use (which is what this script was intended for), one of these methods might be easier than adding cookie support:
- In Firefox, choose Tools -> Page Info, select the Media tab, highlight multiple image URLs in the listing, and click "Save as." This probably won't show high-res images that are only shown in an interactive gallery (lightbox), but it will at least show the average-size images. If you need high-res images, you can probably find a different real estate site that does show all of the high-res images in one page and use the same technique.
- Use a "Save all images" browser extension (example: https://github.com/belaviyo/save-images - I haven't personally used it). Browser extensions are risky, so look for a trusted one with lots of users and comments (and ideally, public source code), and uninstall it as soon as you're done.
It was working well earlier this year, but now it just downloads a single image. Does anyone know how to adjust it to download all images? Thanks
@punjabdhaputar ...this still works! thanks!
@punjabdhaputar just tried this and was able to save myself several right click-save trouble...thanks!
It still works.
@gauravchak I managed to make it work for listings that require signing in, using the method outlined here: "How do I use wget/curl to download from a site I am logged into?"
- Logged into Redfin in Firefox.
- Opened the Network tab of the Web Developer tools (Ctrl-Shift-E).
- I took the very first request that was sent when I refreshed the screen on Firefox.
- Pasted it into Sublime and saw a large number of cookie values. To figure out where the cookies started and stopped, I searched for "-H" in the file and kept only the cookie header: everything between the single quotes of
'Cookie: key1=value1; key2=value2; [....]; keyn=valuen'
- Recreated the wget command as such:
wget --no-cookies --header "Cookie: key1=value1; key2=value2; [....]; keyn=valuen" --user-agent="Mozilla" -O - <RedFinURL> | egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F\d*\\\\u002Fbigphoto\\\\u002F\d*\\\\u002F[A-Z0-9_]*.jpg" | ascii2uni -Z '\u%04X' | xargs wget --user-agent="Mozilla"
And that did the trick. Hope that helps.
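The Cookie header string can get very long. One way to keep it off the command line (and out of shell history) is to paste it into a file and read it back. cookie-header.txt and cookie_header are hypothetical names. The file is assumed to contain only the key1=value1; key2=value2; ... part.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the header value for wget --header from a file
# holding only the cookie string copied from the browser.
cookie_header() {
  local file=$1
  local cookies
  # $(<file) reads the whole file; command substitution strips trailing newlines.
  cookies=$(<"$file")
  printf 'Cookie: %s\n' "$cookies"
}

# Usage:
#   wget --no-cookies --header "$(cookie_header cookie-header.txt)" --user-agent="Mozilla" -O - "<RedFinURL>" | ...
```

The file contains your login session, so delete it when you're done.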
@punjabdhaputar Thanks, it works for me.
Redfin seems to be blocking this now; I'm getting 403 Forbidden.
I created a lil Go program to do this https://github.com/timendez/go-redfin-archiver
Clone the repo and run, e.g., go run archive.go https://www.redfin.com/CA/San-Jose/206-Grayson-Ter-95126/home/2122534
@timendez I just tried your Go program and it worked great. Nice work!
For anyone else who encounters this gist: Strongly consider using @timendez's program instead: https://github.com/timendez/go-redfin-archiver
I've updated the script to match current Redfin, as of 7/15/2019 (requires bash 4.2+ for echo -e, a la https://stackoverflow.com/a/8795949):
wget --user-agent="Mozilla" -O - https://www.redfin.com/IL/Chicago/123-S-Someplace-St-60605/unit-123/home/12345678 | echo -e $(egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F\d*\\\\u002Fbigphoto\\\\u002F\d*\\\\u002F[0-9_]*.jpg") | xargs wget --user-agent="Mozilla"