
@lwe
Created July 14, 2010 08:53
Try to scrape formula information from @formula.homepage.
# Small utility which uses the homepage and nokogiri to get a description from the formula's homepage.
#
# As written in the homebrew wiki:
# > Homebrew doesn’t have a description field because the homepage is always up to date,
# > and Homebrew is not. Thus it’s less maintenance for us. To satisfy the description
# > we’re going to invent a new packaging microformat and persuade everyone to publish
# > it on their homepage.
#
# Too bad no packaging microformat has yet been invented, but brew-more just first looks for a
# `<meta name="description">` tag, then for an `a#project_summary_link` tag (which is used in
# While this does not lead to a good description for all formulas, it works for quite a few,
# try e.g. `brew more rubinius`.
#
# Note: this command depends on `nokogiri`, `json` and `rubygems`
#
# Edit: non-sudo gem install works fine (by adamv)
# Edit: use google search instead of title as fallback - returns pretty good results :)
# Edit: ensure error contains json & nokogiri gem
#
require 'formula'
require 'uri'
require 'open-uri'
require 'rubygems'
begin
  require 'json'
  require 'nokogiri'
rescue LoadError
  onoe "command requires 'json' and 'nokogiri' gem..."
  exit 2
end
# wrap description lines at MAX_CHARS characters
MAX_CHARS = 120
class Object
  # Define try() method to simplify Nokogiri scraping
  def try(method, *args); self.nil? ? nil : self.send(method, *args) end
end
# Print usage
def usage(code = 0)
  puts "Usage: brew more [formula] ... (formula description scraper)"
  exit(code)
end
def scrape_info(formula)
  more = "<No description>"
  google = false
  if doc = Nokogiri::HTML(open(formula.homepage))
    part = doc.xpath('/html/head/meta[@name="description"]').first.try(:[], 'content') || doc.css('a#project_summary_link').first.try(:text)
    unless part
      # try a google search :)
      if hash = JSON.load(open("http://www.google.com/uds/GwebSearch?q=#{URI.escape(formula.homepage)}&v=1.0")).try(:[], 'responseData').try(:[], 'results').try(:first)
        part = Nokogiri::HTML(hash['title']).text + ' ' + Nokogiri::HTML(hash['content']).text
        google = true
      end
    end
    more,c = part.split(/ +/).inject([' ',1]) do |res, i|
      if (i.length + 1 + res[1]) > MAX_CHARS
        res[0] << "\n "
        res[1] = 1
      end
      [res[0] << " " << i, res[1] + 1 + i.length]
    end if part
    more = "(Description via Google) \n" << more if google
  end
  morebody = formula.homepage.to_s, more, "\n"
  ohai "#{formula.name} #{formula.version}" + (formula.prefix.parent.directory? ? " (installed)" : ""), morebody
end
if ARGV.include?('-h') || ARGV.include?('--help')
  usage
elsif ARGV.named.empty?
  onoe "please specify a formula"
  usage(1)
end
ARGV.formulae.each { |formula| scrape_info(formula) }
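
The line-wrapping step inside `scrape_info` may be easier to follow in isolation. Below is a minimal sketch of the same greedy word-wrap idea, assuming a hypothetical helper `wrap_words` (not part of the gist) with an illustrative `indent` parameter. It is not a drop-in replacement for the `inject` block above.

```ruby
# Hypothetical helper, not part of the original gist: greedy word wrap.
# Words are appended to the current line until adding the next word
# would push the line (indent included) past max_chars.
def wrap_words(text, max_chars = 120, indent = "  ")
  lines = [[]]
  text.split(/\s+/).reject(&:empty?).each do |word|
    current = lines.last
    # Width of the current line if this word were appended to it.
    width = indent.length + current.join(" ").length +
            (current.empty? ? 0 : 1) + word.length
    if !current.empty? && width > max_chars
      lines << [word]   # start a new line with this word
    else
      current << word   # word fits (or is alone on an empty line)
    end
  end
  lines.reject(&:empty?).map { |words| indent + words.join(" ") }.join("\n")
end

puts wrap_words("a small utility that scrapes a description from the formula homepage", 30)
```

A word longer than `max_chars` is kept whole on its own line rather than being split, which seems a reasonable choice for URLs and long identifiers in scraped descriptions.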
@tjnycum
tjnycum commented Jun 4, 2015

May I suggest storing this in a regular repo ("homebrew-more") so it can be tapped and installed easily?

@bfontaine
Note that brew-desc is now part of the core and all formulae have a desc field (at least in the core).

@bfontaine
@Drewshg312 Assuming ~/bin exists and is in your PATH this should work:

cd ~/bin
wget https://gist.githubusercontent.com/lwe/475200/raw/a28c407438a6e0a88c63055a768c7eeff3670b88/brew-more.rb
chmod u+x brew-more.rb
