Created April 13, 2023 04:45
An Eleventy filter that extracts the meta description from within the <head> element of a web page
// getDescription - given a URL, this Eleventy filter extracts the meta
// description from within the <head> element of a web page using the cheerio
// library.
//
// The full HTML content of the page is fetched using the eleventy-fetch plugin.
// If you have many links from which you want to extract descriptions, the
// initial build will be slow. However, the plugin caches the fetched content
// for a duration of your choosing (in this example, 1 day), so later builds
// reuse it instead of fetching again.
//
// The description is taken from the content attribute of the <meta> element
// whose name attribute is "description".
//
// If no description is found, the filter returns an empty string. If an error
// occurs, the filter logs it to the console and returns the string
// "(no description available)".
//
// Be sure to create a .cache folder in your project root and add .cache to your
// .gitignore file. See https://www.11ty.dev/docs/plugins/fetch/#installation
//
const EleventyFetch = require("@11ty/eleventy-fetch");
const cheerio = require("cheerio");

// In your Eleventy config file (e.g. .eleventy.js), register the filter inside
// the function that receives eleventyConfig:
module.exports = function (eleventyConfig) {
  eleventyConfig.addFilter("getDescription", async function getDescription(link) {
    try {
      // Fetch the raw page as a Buffer; eleventy-fetch caches it for one day.
      const htmlcontent = await EleventyFetch(link, {
        duration: "1d",
        type: "buffer",
      });
      const $ = cheerio.load(htmlcontent);
      // .attr() returns undefined when no matching <meta> tag exists, so fall
      // back to an empty string as described above.
      return $('meta[name="description"]').attr("content") || "";
    } catch (e) {
      console.log("Error fetching description for " + link + ": " + e.message);
      return "(no description available)";
    }
  });
};
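Because the filter hits the network on every uncached call, its fallback rules (empty string when there is no description, "(no description available)" on error) are awkward to check by hand. One way to make them testable is to pass the fetching and parsing steps in as functions. This is a minimal sketch under that assumption: `createGetDescription`, `fetchHtml`, `extractDescription`, and `log` are hypothetical names, not part of Eleventy or cheerio, and the commented wiring assumes both packages are installed.

```javascript
// Hypothetical factory: builds the getDescription filter with its I/O injected,
// so the empty-string and error fallbacks can be checked without a network.
//
// fetchHtml(link)          -> Promise of the page HTML (string or Buffer)
// extractDescription(html) -> the description, or undefined/null if absent
// log(message)             -> where errors are reported (console.error by default)
function createGetDescription({ fetchHtml, extractDescription, log = console.error }) {
  return async function getDescription(link) {
    try {
      const html = await fetchHtml(link);
      // A missing tag yields undefined (cheerio) or null (DOM); normalise to "".
      return extractDescription(html) ?? "";
    } catch (e) {
      log(`Error fetching description for ${link}: ${e.message}`);
      return "(no description available)";
    }
  };
}

// Possible wiring in .eleventy.js (assumes @11ty/eleventy-fetch and cheerio
// are installed):
//
// const EleventyFetch = require("@11ty/eleventy-fetch");
// const cheerio = require("cheerio");
// module.exports = function (eleventyConfig) {
//   eleventyConfig.addFilter("getDescription", createGetDescription({
//     fetchHtml: (link) => EleventyFetch(link, { duration: "1d", type: "buffer" }),
//     extractDescription: (html) =>
//       cheerio.load(html)('meta[name="description"]').attr("content"),
//   }));
// };
```

With stubbed `fetchHtml` and `extractDescription` functions, each of the three outcomes (description found, description missing, fetch failed) can be exercised in a plain Node script.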
I also want to put in a plug for the excellent https://www.npmjs.com/package/linkedom library for this!
Thanks, Zach. I can't quite see how to make that work yet, but I'm still early in my JavaScript and npm package knowledge journey. Once I understand "cascading asset bucketing," I think I'll be ready to conquer linkedom ;-)
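For anyone curious what the linkedom route might look like: linkedom's `parseHTML()` returns a standard DOM `document`, so the lookup becomes an ordinary `querySelector` call. This is a sketch, not a drop-in replacement; `descriptionFromDocument` is a hypothetical helper that accepts any DOM-like `document`, which lets it be tried without linkedom installed.

```javascript
// Hypothetical helper: reads <meta name="description" content="..."> from any
// object implementing the DOM querySelector/getAttribute methods, such as the
// document returned by linkedom's parseHTML().
function descriptionFromDocument(document) {
  const meta = document.querySelector('meta[name="description"]');
  // querySelector returns null when nothing matches, and getAttribute returns
  // null when the attribute is absent; mirror the cheerio filter and return "".
  return meta?.getAttribute("content") ?? "";
}

// With linkedom installed, usage might look like:
//
// const { parseHTML } = require("linkedom");
// const { document } = parseHTML(html);
// const description = descriptionFromDocument(document);
```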
For my use case, specifically the 11tybundle.dev site, I have changed the cache duration to "*", which means eleventy-fetch will never fetch new data after the first successful request. There's no need for me to re-fetch complete blog posts just to extract a description; once is quite enough.
While it's probably obvious, I wanted to note that this can be adapted to extract just about anything from an HTML document. And the item to be extracted could easily be an additional argument to the filter.
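Following that idea, here is a sketch of a more general filter that takes the CSS selector and attribute as extra filter arguments, defaulting to the meta description. The name `createGetAttrFromPage` and its parameters are hypothetical. The page loader is injected (in practice it would wrap `cheerio.load`) so the argument handling can be checked without installing anything, and unlike the original filter it returns an empty string on errors too.

```javascript
// Hypothetical generalisation of getDescription: the selector and attribute
// become filter arguments. `load` is expected to behave like cheerio.load
// (returning a $ function), and `fetchHtml` like a cached EleventyFetch call.
function createGetAttrFromPage({ fetchHtml, load, log = console.error }) {
  return async function getAttrFromPage(
    link,
    selector = 'meta[name="description"]',
    attribute = "content"
  ) {
    try {
      const $ = load(await fetchHtml(link));
      // cheerio's .attr() reads the first matched element and returns
      // undefined when nothing matches; normalise that to "".
      return $(selector).attr(attribute) ?? "";
    } catch (e) {
      log(`Error extracting ${attribute} of ${selector} from ${link}: ${e.message}`);
      return "";
    }
  };
}

// Possible registration in .eleventy.js (assumes cheerio and
// @11ty/eleventy-fetch are installed):
//
// eleventyConfig.addFilter("getAttrFromPage", createGetAttrFromPage({
//   fetchHtml: (link) => EleventyFetch(link, { duration: "*", type: "buffer" }),
//   load: (html) => cheerio.load(html),
// }));
//
// Template usage (Nunjucks syntax):
//   {{ link | getAttrFromPage('meta[property="og:image"]', 'content') }}
```

Keeping the defaults equal to the original selector and attribute means `{{ link | getAttrFromPage }}` behaves like the description filter, while other metadata such as Open Graph tags only needs different arguments.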