@edelpero
Last active May 10, 2021 08:31

# Heroku, Ruby on Rails and PhantomJS

In this post, I’m going to show you how to modify an existing Ruby on Rails app running on Heroku’s Cedar stack to use PhantomJS for screen scraping. If you’ve never heard of PhantomJS, it’s a command-line WebKit-based browser (that supports JavaScript, cookies, etc.).

Let’s get started. This is a high-level overview of the required steps:

  • Modify your app to use multiple Heroku buildpacks.
  • Extend your app to use both the Ruby as well as the PhantomJS buildpacks.
  • Confirm that everything worked.

Modify your app to use multiple Heroku buildpacks

David Dollar has created a Heroku buildpack that lets you use multiple Heroku buildpacks for your app. ;-) Install it by setting an environment variable:

$ heroku config:set BUILDPACK_URL=https://github.com/ddollar/heroku-buildpack-multi.git

Extend your app to use both the Ruby as well as the PhantomJS buildpacks

Create a .buildpacks file in the root directory of your app:

$ touch .buildpacks

Within your .buildpacks file, specify that your app uses both the Ruby and the PhantomJS buildpacks. This prevents you from having to (cross-)compile PhantomJS yourself. Make the contents of your .buildpacks file:

https://github.com/heroku/heroku-buildpack-ruby
https://github.com/stomita/heroku-buildpack-phantomjs

PhantomJS has a dependency on libQtWebKit.so.4, which the PhantomJS buildpack installs on Heroku in /app/vendor/phantomjs/lib. Modify your Heroku app’s PATH to include the PhantomJS bin directory and its LD_LIBRARY_PATH to include this lib directory:

$ heroku config:set PATH="/usr/local/bin:/usr/bin:/bin:/app/vendor/phantomjs/bin"
$ heroku config:set LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib:/app/vendor/phantomjs/lib
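
To double-check those two settings from inside the app, something like the following Ruby sketch could be run in a Heroku console. The helper name `missing_phantomjs_paths` is hypothetical, and the directories are simply the ones quoted above; adjust them if your buildpack installs PhantomJS elsewhere.

```ruby
# Directories the PhantomJS buildpack is described as using in this post.
PHANTOMJS_BIN_DIR = "/app/vendor/phantomjs/bin"
PHANTOMJS_LIB_DIR = "/app/vendor/phantomjs/lib"

# Hypothetical helper: returns the names of the variables whose
# colon-separated value does not contain the expected directory.
# `env` is any Hash-like object; it defaults to the process environment.
def missing_phantomjs_paths(env = ENV)
  expected = {
    "PATH" => PHANTOMJS_BIN_DIR,
    "LD_LIBRARY_PATH" => PHANTOMJS_LIB_DIR
  }
  expected.reject do |name, dir|
    env.fetch(name, "").split(":").include?(dir)
  end.keys
end

if __FILE__ == $PROGRAM_NAME
  missing = missing_phantomjs_paths
  if missing.empty?
    puts "PATH and LD_LIBRARY_PATH both include the PhantomJS directories."
  else
    puts "Missing PhantomJS directory in: #{missing.join(', ')}"
  end
end
```

An empty result means both variables contain the expected directory; it does not prove that the binary itself is present or runnable.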

Confirm that everything worked

First things first - load your app in your web browser and confirm that it still comes up. :-) Next, launch a Heroku bash shell:

$ heroku run bash

Within the bash shell, invoke the PhantomJS executable to ensure that it runs:

$ vendor/phantomjs/bin/phantomjs --version

Good to go! Now you’re ready to call PhantomJS from your Ruby code.
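
As a minimal sketch of what calling PhantomJS from Ruby can look like, the snippet below shells out to the binary with the standard library's Open3 and repeats the version check from above. The method name `phantomjs_version` and the `DEFAULT_PHANTOMJS` constant are hypothetical; the path is the one used in this post and is relative to the app root.

```ruby
require "open3"

# Path used in this post; relative to the app root on Heroku.
DEFAULT_PHANTOMJS = "vendor/phantomjs/bin/phantomjs"

# Hypothetical helper: runs `phantomjs --version` and returns the version
# string, or nil when the binary cannot be started or exits non-zero.
def phantomjs_version(binary = DEFAULT_PHANTOMJS)
  stdout, _stderr, status = Open3.capture3(binary, "--version")
  status.success? ? stdout.strip : nil
rescue SystemCallError
  # Raised, for example, as Errno::ENOENT when the file does not exist.
  nil
end

if __FILE__ == $PROGRAM_NAME
  version = phantomjs_version
  puts(version ? "PhantomJS #{version}" : "PhantomJS could not be run")
end
```

The same `Open3.capture3` call can be pointed at a PhantomJS script by passing the script path and its arguments instead of `--version`.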

@benbowden

I almost died today trying to solve this, so hopefully this helps the poor soul who has to go through what I did today. I basically did everything that was said above, and I'll just recap what I did.

Overview of what needs to be done:


heroku config:set BUILDPACK_URL=https://github.com/ddollar/heroku-buildpack-multi.git

Create a .buildpacks file in your Rails app ("touch .buildpacks" should do it, but you can also right-click on your main project directory and create it there).

[Screenshot, 2015-11-22 10:04:02 PM]

In the .buildpacks file, add:
https://github.com/heroku/heroku-buildpack-ruby
https://github.com/stomita/heroku-buildpack-phantomjs

Then in your terminal set the configs:

$ heroku config:set PATH="/usr/local/bin:/usr/bin:/bin:/app/vendor/phantomjs/bin"
$ heroku config:set LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib:/app/vendor/phantomjs/lib

Running "heroku run bash" to check that everything is set up doesn't work until you deploy to Heroku. Keep that in mind if you get an error when you run "vendor/phantomjs/bin/phantomjs --version": deploy, and then it will most likely work.

New stuff


Great, now your heroku app is being stupid when you go to your domain name and it's probably giving you an error such as:

[Screenshot, 2015-11-22 10:03:37 PM]

The reason, I found, was that the dynos no longer knew what to run. So you need to create a Procfile. Make sure you spell it exactly "Procfile" and DON'T lowercase the P:

[Screenshot, 2015-11-22 10:06:41 PM]

Then put these commands in the Procfile (these commands below for some reason get erased in heroku thanks to buildpacks so you are basically setting your application back to what it was doing before you used buildpacks):

web: bin/rails server -p $PORT -e $RAILS_ENV
worker: bundle exec rake jobs:work
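
Since the exact spelling matters, here is a small Ruby sketch for checking the file name before you push. Both helper names are hypothetical. The check lists the directory instead of calling `File.exist?`, because on a case-insensitive file system `File.exist?("Procfile")` would also succeed for a file named "procfile".

```ruby
# Hypothetical helper: true only when `dir` contains an entry named
# exactly "Procfile" (capital P, no extension).
def procfile_named_correctly?(dir = Dir.pwd)
  Dir.children(dir).include?("Procfile")
end

# Hypothetical helper: returns entries that match "Procfile" apart from
# letter case, e.g. "procfile" or "PROCFILE".
def miscased_procfiles(dir = Dir.pwd)
  Dir.children(dir).select do |name|
    name.casecmp?("Procfile") && name != "Procfile"
  end
end

if __FILE__ == $PROGRAM_NAME
  if procfile_named_correctly?
    puts "Procfile found."
  else
    puts "No Procfile found. Mis-cased candidates: #{miscased_procfiles.inspect}"
  end
end
```

Run it from the root of the app; it only inspects the working directory, not what is committed to Git.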

Sweet, commit that and push to heroku. One more step..

Make sure you have a dyno actually running for web:
[Screenshot, 2015-11-22 10:08:33 PM]

You can also just run this in your terminal:

heroku ps:scale web=1

That sets your web dyno count to 1. Anyway, after that everything worked for me. Hopefully that helps!

@montekaka

Thanks for the post, but when I run browser = Watir::Browser.new(:phantomjs, :args => args), it doesn't return any of the dynamic HTML content we expected to see.

@jaypinho

Apologies in advance if this is way out of scope, but after following these directions successfully, I'm still unclear on how I'd actually execute Phantomjs commands within a Rails controller on Heroku. I've spent hours googling and on Stack Overflow to no avail but feel I'm missing something obvious.

How do I reference phantomjs from within my Rails app and then use Rails to execute Phantomjs commands? Thank you!!

@tomgallagher

First you need to add the 3rd party phantomjs build pack.

https://elements.heroku.com/buildpacks/stomita/heroku-buildpack-phantomjs

Follow the instructions, deploy and check that it installs and that your app still works.

Then you need to get something that can run Phantom for you. Capybara and the Poltergeist gems are what you need. Install the gems and read the instructions.

After you've done that, just use the Capybara instructions in Ruby..

I've got it working. This article will show you how to do it.

http://www.chrisle.me/2012/12/scraping-html5-sites-using-capybara-phantomjs/

My advice is to run it in a Rake task first to work out what's going on and then migrate.

Hope that helps!
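
For anyone who wants a starting point, here is a rough sketch of the Capybara plus Poltergeist approach. It assumes the capybara and poltergeist gems are in your Gemfile; the method names `poltergeist_options` and `page_title` are made up for this example, the binary path is the one from the post above, and the 30-second timeout is an arbitrary choice.

```ruby
# Options passed to Capybara::Poltergeist::Driver. The default path is the
# one used in this post; the timeout value is an arbitrary assumption.
def poltergeist_options(phantomjs_path = "/app/vendor/phantomjs/bin/phantomjs")
  { phantomjs: phantomjs_path, js_errors: false, timeout: 30 }
end

# Visits `url` with PhantomJS and returns the page title once the page has
# loaded. The gems are required lazily so this file loads without them.
def page_title(url, options = poltergeist_options)
  require "capybara/poltergeist"

  Capybara.register_driver(:poltergeist) do |app|
    Capybara::Poltergeist::Driver.new(app, options)
  end

  session = Capybara::Session.new(:poltergeist)
  session.visit(url)
  session.title
ensure
  # Shut the PhantomJS process down, if a session was ever created.
  session&.driver&.quit
end
```

Calling something like `page_title("https://example.com")` from a Rake task first, as suggested above, makes it easier to see what PhantomJS is doing before moving the call into a controller or background job.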

@JHFirestarter

JHFirestarter commented Apr 23, 2016

@edelpero @benbowden @tomgallagher watir+phantomjs still does not work for me. I followed all of the above: web dyno is on & running, app is up, Multipack is my framework, and https://github.com/ddollar/heroku-buildpack-multi.git my buildpack in Heroku. I config'd the paths above.

On my local terminal, seems good: vendor/phantomjs/bin/phantomjs -v => 2.1.1
In heroku bash, not so much: vendor/phantomjs/bin/phantomjs -v => bash: vendor/phantomjs/bin/phantomjs: cannot execute binary file: Exec format error

I tried running a rake task heroku run rake example_task and received Selenium::WebDriver::Error::WebDriverError: unable to connect to phantomjs @ http://127.0.0.1:#### after 20 seconds
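
An "Exec format error" generally means the operating system does not recognize the file as an executable for that platform. One possible cause here, and this is only an assumption, is that the binary under vendor/ was built for a different system than Heroku's Linux dynos, which would fit it working locally but not in heroku bash. The Ruby sketch below reads the first four bytes of a file to tell a Linux ELF binary from a macOS Mach-O one; the method name `binary_format` is hypothetical.

```ruby
# Magic numbers found at the start of common executable formats.
ELF_MAGIC = "\x7FELF".b
MACH_O_MAGICS = [
  "\xCF\xFA\xED\xFE".b, # 64-bit Mach-O, little-endian
  "\xCE\xFA\xED\xFE".b, # 32-bit Mach-O, little-endian
  "\xCA\xFE\xBA\xBE".b  # universal (fat) binary
].freeze

# Hypothetical helper: returns :elf, :mach_o, or :unknown based on the
# first four bytes of the file at `path`.
def binary_format(path)
  magic = File.binread(path, 4).to_s
  return :elf if magic == ELF_MAGIC
  return :mach_o if MACH_O_MAGICS.include?(magic)
  :unknown
end

if __FILE__ == $PROGRAM_NAME
  path = ARGV.fetch(0, "vendor/phantomjs/bin/phantomjs")
  puts "#{path}: #{File.exist?(path) ? binary_format(path) : 'not found'}"
end
```

A result of `:mach_o` inside heroku bash would point to a macOS build having been deployed; `:elf` only rules out that one cause, since an ELF binary built for a different CPU architecture fails in the same way.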

@Jpatcourtney

Hey great tutorial. I found it because I "lowercased the p" in Procfile!

Now even when I rename it or create a new file I can not get heroku to see it. When I push it to github it is still lowercase.

Any idea how I can fix this?

Thanks!

@noafroboy

Don't need to do any of this anymore.

See: https://devcenter.heroku.com/articles/using-multiple-buildpacks-for-an-app

Simply do
heroku buildpacks:add --index 1 https://github.com/stomita/heroku-buildpack-phantomjs

And deploy. Worked for me on Phantom 2.1.1

@mdkarp

mdkarp commented Sep 20, 2016

thanks @noafroboy worked for me!

@sebastian-palma

You can also download the PhantomJS package, extract the executable from its bin folder, and put it in the bin folder of your Rails app.
