@lwe
Created July 14, 2010 08:53
Try to scrape formula information from @formula.homepage.
# Small utility which uses the homepage and nokogiri to get a description from the formula's homepage.
#
# As written in the homebrew wiki:
# > Homebrew doesn’t have a description field because the homepage is always up to date,
# > and Homebrew is not. Thus it’s less maintenance for us. To satisfy the description
# > we’re going to invent a new packaging microformat and persuade everyone to publish
# > it on their homepage.
#
# Too bad no packaging microformat has yet been invented, but brew-more just first looks for a
# `<meta name="description">` tag, then for an `a#project_summary_link` tag (which is used in
# While this does not lead to a good description for all formulas, it works for quite a few,
# try e.g. `brew more rubinius`.
#
# Note: this command depends on `nokogiri`, `json` and `rubygems`
#
# Edit: non-sudo gem install works fine (by adamv)
# Edit: use google search instead of title as fallback - returns pretty good results :)
# Edit: ensure error contains json & nokogiri gem
#
require 'formula'
require 'uri'
require 'open-uri'
require 'rubygems'

begin
  require 'json'
  require 'nokogiri'
rescue LoadError
  onoe "command requires 'json' and 'nokogiri' gems..."
  exit 2
end

# wrap description lines at MAX_CHARS characters
MAX_CHARS = 120

class Object
  # Define try() method to simplify Nokogiri scraping
  def try(method, *args); self.nil? ? nil : self.send(method, *args) end
end

# Print usage
def usage(code = 0)
  puts "Usage: brew more [formula] ... (formula description scraper)"
  exit(code)
end

def scrape_info(formula)
  more = "<No description>"
  google = false
  if doc = Nokogiri::HTML(open(formula.homepage))
    # prefer the meta description, then the project summary link
    part = doc.xpath('/html/head/meta[@name="description"]').first.try(:[], 'content') || doc.css('a#project_summary_link').first.try(:text)
    unless part
      # try a google search :)
      if hash = JSON.load(open("http://www.google.com/uds/GwebSearch?q=#{URI.escape(formula.homepage)}&v=1.0")).try(:[], 'responseData').try(:[], 'results').try(:first)
        part = Nokogiri::HTML(hash['title']).text + ' ' + Nokogiri::HTML(hash['content']).text
        google = true
      end
    end
    # word-wrap the description, tracking the current line length in res[1]
    more,c = part.split(/ +/).inject([' ',1]) do |res, i|
      if (i.length + 1 + res[1]) > MAX_CHARS
        res[0] << "\n "
        res[1] = 1
      end
      [res[0] << " " << i, res[1] + 1 + i.length]
    end if part
    more = "(Description via Google) \n" << more if google
  end
  morebody = formula.homepage.to_s, more, "\n"
  ohai "#{formula.name} #{formula.version}" + (formula.prefix.parent.directory? ? " (installed)" : ""), morebody
end

if ARGV.include?('-h') || ARGV.include?('--help')
  usage
elsif ARGV.named.empty?
  onoe "please specify a formula"
  usage(1)
end

ARGV.formulae.each { |formula| scrape_info(formula) }
@Zearin

Zearin commented May 28, 2011

Looks like your introductory comment has an unfinished sentence! I found the full sentence in this previous version:
https://gist.github.com/475200/757b6c4b48e99bf075f3b24ceb41ba71abdd8bcc

@lwe
Author

lwe commented May 28, 2011

completed, thanks :)

@Zearin

Zearin commented May 28, 2011

w00t! Thank you. ☺

@Zearin

Zearin commented May 28, 2011

Request
Hey, have you seen the CommonJS package format? It’s basically the spec for the package.json files used by npm.

I know, I know…that’s not so helpful for Homebrew, but there are some packages that are registered in both Homebrew and npm.

Under the section Required Field, it lists fields for both “description” and “keywords”.
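As a rough illustration of what reading those two fields could look like, here is a minimal sketch using Ruby's standard `json` library. The helper name `package_summary` and the sample manifest are made up for this example; nothing here is part of brew-more.

```ruby
require 'json'

# Hypothetical helper: extract "description" and "keywords" from package.json text.
# A missing description comes back as nil, missing keywords as an empty array.
def package_summary(json_text)
  data = JSON.parse(json_text)
  {
    description: data['description'],
    keywords: Array(data['keywords'])
  }
end

# Sample manifest, invented purely for illustration
sample = '{"name": "example", "description": "An example package", "keywords": ["demo", "sample"]}'
summary = package_summary(sample)
puts summary[:description]
puts summary[:keywords].join(', ')
```

The open question is still how a scraper would find such a file for a given formula, since the homepage does not necessarily link to it.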

There’s also DOAP for project metadata. It’s surprisingly rare on GitHub, but it is nevertheless in wide use elsewhere. Maybe you don’t think it’s worth it, but I’m trying to at least raise awareness of DOAP in order to encourage the use of existing standards. A handy page for quickly learning what you can scrape from a DOAP file is the DOAP a Matic. (The DOAP Homepage is available as well, but not really structured for being quickly usable.)

@lwe
Author

lwe commented May 29, 2011

Mhh, sounds interesting, though the main issue is probably discovery of these resources. DOAP is sometimes linked with a <link rel="meta" title="DOAP" .../>, but no idea if that's standard :)

@Zearin

Zearin commented Sep 16, 2011

(…I didn’t know you can’t make pull requests for Gists!)

Hey, I made a couple of changes to the output for brew more. Basically it makes each formula stand out better (using ohai()), and puts the “via Google” disclaimer up front.

I made the changes for readability, and also because when using brew more on multiple formulae, the “via Google” disclaimer made it harder to read because it was always in a different place. Now, when the disclaimer appears, it is in a predictable place, and also lets you know before reading the description that it’s the second-choice source for grabbing a formula’s description.

Would you consider copying it? It’s here:

https://gist.github.com/1222231
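The idea of putting the disclaimer in a predictable place can be sketched in plain Ruby as follows. This is a simplified illustration, not the code from either gist: the helper names `wrap_words` and `format_description` are hypothetical, and the default width of 120 simply mirrors `MAX_CHARS` in the script.

```ruby
# Greedy word wrap: start a new line whenever adding the next word
# (plus a separating space) would make the line longer than `width`.
def wrap_words(text, width = 120)
  lines = []
  text.split.each do |word|
    if lines.empty? || lines.last.length + 1 + word.length > width
      lines << word
    else
      lines[-1] = "#{lines.last} #{word}"
    end
  end
  lines.join("\n")
end

# Put the disclaimer on its own first line so it always appears in the same spot.
def format_description(text, via_google = false)
  body = wrap_words(text)
  via_google ? "(Description via Google)\n#{body}" : body
end

puts format_description("A description scraped from a search result", true)
```

Keeping the disclaimer on a line of its own means the wrapped description always starts at the same column, whichever source it came from.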

@lwe
Author

lwe commented Sep 19, 2011

Hey, jep, too bad it's not possible to bring in changes from other gists... well anyway, copied your changes, thx!

@Hnasar

Hnasar commented Mar 30, 2012

I just spent 20 minutes trying to figure out why brew-more.rb couldn't find nokogiri, though I installed it....It turns out that I didn't have json installed (line 26), and THIS was falsely causing the nokogiri error. Annoying.
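One way to avoid this kind of confusion is to try each library on its own and report exactly which ones failed to load. The sketch below assumes nothing about Homebrew; `missing_libraries` is a hypothetical helper, and `warn` stands in for Homebrew's `onoe`.

```ruby
# Hypothetical helper: require each library separately and
# return the names of those that could not be loaded.
def missing_libraries(names)
  names.reject do |name|
    begin
      require name
      true
    rescue LoadError
      false
    end
  end
end

missing = missing_libraries(%w[json nokogiri])
unless missing.empty?
  # Name only the libraries that are actually missing
  warn "command requires the following missing gem(s): #{missing.join(', ')}"
end
```

With a check like this, a missing `json` gem would no longer be reported as a problem with `nokogiri`.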

@lwe
Author

lwe commented Mar 31, 2012

good catch, thanks :)

@simonweil

How does one install this command?

@drew1kun

Sorry for the possibly stupid question, but could someone explain how to install this formula? Thank you!

@tjnycum

tjnycum commented Jun 4, 2015

May I suggest storing this in a regular repo ("homebrew-more") so it can be tapped and installed easily?

@bfontaine

Note that brew-desc is now part of the core and all formulae have a desc field (at least in the core).

@bfontaine

@Drewshg312 Assuming ~/bin exists and is in your PATH this should work:

cd ~/bin
wget https://gist.githubusercontent.com/lwe/475200/raw/a28c407438a6e0a88c63055a768c7eeff3670b88/brew-more.rb
chmod u+x brew-more.rb
