#!/usr/bin/env ruby
#
# Convert blogger (blogspot) posts to jekyll posts
#
# Basic Usage
# -----------
#
#   ./blogger_to_jekyll.rb feed_url
#
# where `feed_url` can have the following format:
#
#   http://{your_blog_name}.blogspot.com/feeds/posts/default
#
# Requirements
# ------------
#
#   * feedzirra: https://github.com/pauldix/feedzirra
#
# Notes
# -----
#
#   * Make sure Blogger shows full output of article in feeds.
#   * Commenting on migrated articles will be set to false by default.
include Config

require 'rubygems' if CONFIG['host_os'].start_with? "darwin"
require 'feedzirra'
require 'date'
require 'optparse'

def parse_post_entries(feed, verbose)
  posts = []
  feed.entries.each do |post|
    obj = Hash.new
    created_datetime = post.updated
    creation_date = Date.parse(created_datetime.to_s)
    title = post.title
    # Split on runs of spaces so whole words (not single characters) are joined with "-".
    file_name = creation_date.to_s + "-" + title.split(/ +/).join("-").delete('\/') + ".html"
    content = post.content

    obj["file_name"] = file_name
    obj["title"] = title
    obj["creation_datetime"] = created_datetime
    obj["updated_datetime"] = post.updated
    obj["content"] = content
    obj["categories"] = post.categories.join(" ")
    posts.push(obj)
  end
  return posts
end

def write_posts(posts, verbose)
  Dir.mkdir("_posts") unless File.directory?("_posts")

  total = posts.length
  i = 1
  posts.each do |post|
    file_name = "_posts/".concat(post["file_name"])
    # The front matter lines must start at column 0, so they are not indented.
    header = %{---
layout: post
title: #{post["title"]}
date: #{post["creation_datetime"]}
updated: #{post["updated_datetime"]}
comments: false
categories: #{post["categories"]}
---
}
    # The block form of File.open closes the file automatically.
    File.open(file_name, "w+") {|f|
      f.write(header)
      f.write(post["content"])
    }

    if verbose
      puts " [#{i}/#{total}] Written post #{file_name}"
      i += 1
    end
  end
end

def main
  options = {}
  opt_parser = OptionParser.new do |opt|
    opt.banner = "Usage: ./blogger_to_jekyll.rb FEED_URL [OPTIONS]"
    opt.separator ""
    opt.separator "Options"

    opt.on("-v", "--verbose", "Print out all.") do
      options[:verbose] = true
    end
  end

  opt_parser.parse!

  if ARGV[0]
    feed_url = ARGV.first
  else
    puts opt_parser
    exit()
  end

  puts "Fetching feed #{feed_url}..."
  feed = Feedzirra::Feed.fetch_and_parse(feed_url)

  puts "Parsing feed..."
  posts = parse_post_entries(feed, options[:verbose])

  puts "Writing posts to _posts/..."
  write_posts(posts, options[:verbose])

  puts "Done!"
end

main()
You might want to store the updated time in YAML front matter btw. Probably doesn't hurt.
You lost me. YAML?
See this. It's just that little snippet in front of a page.
I updated my Gist to contain the change mentioned above. Feel free to merge.
Merged. :)
Thanks for this gist. Works
After changing Config to RbConfig it worked. Thanks!
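For anyone making the same change, here is a minimal sketch of the `Config` to `RbConfig` swap, assuming the only use of `Config` in the script is the `host_os` check near the top. `RbConfig` ships with Ruby's standard library; the `host_os` and `on_mac` variable names are illustrative and not from the gist.

```ruby
require 'rbconfig'

# RbConfig::CONFIG is a Hash of build-time settings for the running Ruby.
host_os = RbConfig::CONFIG['host_os']

# Same check as the original script, without `include Config`.
on_mac = host_os.start_with?('darwin')
require 'rubygems' if on_mac

puts "host_os=#{host_os} darwin=#{on_mac}"
```

Referencing `RbConfig::CONFIG` explicitly also avoids pulling the module's constants into the top-level namespace.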
FYI, if you'd also like to migrate the comments: http://blog.coolaj86.com/articles/migrate-from-blogger-to-ruhoh-with-proper-redirects.html
This requires changing your template (explained in the walkthrough) and exporting a backup of your blog (also explained in the walkthrough)
Thanks for this script! Everything worked well, except the handling of colon characters in the title. They make Jekyll fall over and die, for some reason. Relevant: jekyll/jekyll#549
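One possible way around the colon problem is to let a YAML library write the front matter, so that titles are quoted when needed. The sketch below is an assumption about how that could look, not the gist's code: `front_matter` is a hypothetical helper, and it covers only some of the fields the script writes.

```ruby
require 'yaml'

# Hypothetical helper (not part of the gist): build the front matter with
# YAML.dump so that titles containing ":" are quoted automatically.
def front_matter(post)
  meta = {
    'layout' => 'post',
    'title' => post['title'],
    'comments' => false
  }
  # YAML.dump emits the opening "---" line; append the closing delimiter.
  YAML.dump(meta) + "---\n"
end

header = front_matter('title' => 'Ruby: a love story')
puts header
```

The same hash could carry `date`, `updated`, and `categories` if they are first converted to strings.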
thanks:)
Hi Kennym,
Worked for me. Works like a charm. But I had some trouble because the feedzirra module has been renamed to feedjira.
You need to update the script to reflect that. Once I did that, the import worked.
I have one question though: I lost the comments I had on blogger in the process. How do I migrate the comments?
+1 for feedjira
Thanks for the script!
After I renamed feedzirra to feedjira, it worked.
However, the feeds/posts/default URL returned only some of my posts.
So I changed feeds/posts/default to feeds/posts/default?max-results=100, and it parsed all of my posts.
Here is a link about fetching all posts:
http://too-clever-by-half.blogspot.kr/2011/12/blog-feed-500-post-limit-for-more-than.html
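As a rough sketch of the workaround above, the `max-results` parameter can be appended to whatever feed URL is passed in. `feed_url_with_limit` is a hypothetical helper, and `example.blogspot.com` is a placeholder blog name; the right limit depends on how many posts the blog has.

```ruby
require 'uri'

# Hypothetical helper: set the max-results query parameter on a Blogger feed URL,
# replacing any existing value and keeping other query parameters.
def feed_url_with_limit(feed_url, max_results)
  uri = URI.parse(feed_url)
  params = URI.decode_www_form(uri.query || '')
  params.reject! { |key, _| key == 'max-results' }
  params << ['max-results', max_results.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts feed_url_with_limit('http://example.blogspot.com/feeds/posts/default', 100)
```

The result can be passed to the script in place of the plain `feeds/posts/default` URL.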
I'm getting
blogspot_to_jekyll.rb:25:in
blogspot_to_jekyll.rb:27:in `<main>': uninitialized constant CONFIG (NameError)
after renaming it to feedjira. Why?
@danielgomezrico I got the same problem and solved it on my fork @kennym if you want, you can update yours from my code 😄 👍
Hi kennym,
Great code. Worked for me on Yosemite with some minor changes.
I removed the deprecated CONFIG call. I think rubygems is now required for El Capitan anyway.
Feedzirra is now called feedjira, so I made the appropriate changes in the code.
After these two minor changes, the code worked perfectly on 10.11.3.
Feel free to do a pull and merge. In my commit message, I inadvertently stated that I had updated it for Yosemite. This is my first fork, edit, and push of code on Git, and my first time working in Ruby.
I also need help migrating my blog 'https://shindesavita87.blogspot.co.uk' from Blogspot to a GitHub blog. Is it doable, and if so, how can we do that?
`main': undefined method `fetch_and_parse' for Feedjira::Feed:Class (NoMethodError)
@yuceltoluyag - I'm pretty sure this script might need some updates after 9 years :-D
https://stackoverflow.com/questions/37371947/importing-my-blogger-blog-into-jekyll solved my problem =) ty for answer ;)
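For context on the `NoMethodError` above: newer Feedjira releases no longer provide `Feedjira::Feed.fetch_and_parse`. Below is a minimal sketch of one replacement, assuming Feedjira 3.x (where `Feedjira.parse` takes an XML string) and using `Net::HTTP` from the standard library for the download. `fetch_feed` is a hypothetical helper name, and this is not necessarily what the linked answer does.

```ruby
require 'net/http'
require 'uri'

# Hypothetical replacement for Feedzirra::Feed.fetch_and_parse(feed_url).
def fetch_feed(feed_url)
  # Loaded here so a missing gem only surfaces when a feed is actually fetched.
  require 'feedjira'

  # Net::HTTP.get does not follow redirects, so pass the final (https) URL.
  xml = Net::HTTP.get(URI.parse(feed_url))
  Feedjira.parse(xml)
end

# Only runs when this file is executed directly with a feed URL argument.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  feed = fetch_feed(ARGV[0])
  puts "Fetched #{feed.entries.length} entries"
end
```

In the script, the call in `main` would then become `feed = fetch_feed(feed_url)`.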
I am getting this error:
<internal:C:/Ruby30-x64/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require': cannot load such file -- feedzirra
If anyone is looking for newer changes, this is my fork: https://gist.github.com/RobbiNespu/372571e4d271122ece3ee3a1830b4d26
Thanks for the update. I have updated the code to fix a few more errors that I encountered.
My Fork: https://gist.github.com/prabathbr/0bb416b2dee7ed18d2a6fd3d8dd4b021
Updates:
- added <require 'httparty'>
- created "setup.sh", which sets up a Ruby environment to run the script on Ubuntu 20.04 LTS
- fixed script error[Invalid argument @ rb_sysopen --- post name ---- (Errno::EINVAL)] when running on posts with invalid post names with ":" & "*"
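To illustrate the last item, here is a minimal sketch of building a file name without characters such as ":" and "*". It is an assumption about one way to do this, not necessarily the fix used in the fork; `safe_file_name` is a hypothetical helper.

```ruby
require 'date'

# Hypothetical helper: build a post file name without characters that
# some file systems reject.
def safe_file_name(date, title)
  slug = title.strip
              .gsub(%r{[\\/:*?"<>|]}, '') # drop characters Windows rejects in file names
              .split(/\s+/)               # split on runs of whitespace
              .join('-')
  "#{date}-#{slug}.html"
end

puts safe_file_name(Date.new(2011, 7, 30), 'Ruby: tips * tricks')
```

The post title itself stays unchanged in the front matter; only the file name is cleaned.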
Hi, one more thing... I noticed it is better to use `created_datetime = post.published` instead of `created_datetime = post.updated`. Might want to change that in your gist.
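A small sketch of that suggestion, using a `Struct` as a stand-in for a feed entry so it runs without the feed library. `Entry` and `creation_date` are hypothetical names, and falling back to `updated` when `published` is missing is an added assumption.

```ruby
require 'date'

# Stand-in for a feed entry; real entries come from the feed parser.
Entry = Struct.new(:title, :published, :updated)

# Hypothetical helper: prefer the original publication time for the file date,
# falling back to the update time when no publication time is present.
def creation_date(entry)
  Date.parse((entry.published || entry.updated).to_s)
end

entry = Entry.new('Hello', Time.utc(2011, 7, 30, 12, 0, 0), Time.utc(2012, 1, 5, 9, 30, 0))
puts creation_date(entry)
```

With this, an edited post keeps its original date in the file name while `updated` can still be recorded separately.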