Quick reference for rewritetoolset

Commands for Hypernode

Quick note on command options

Because this toolset grew out of an earlier analysis phase, some commands carry embedded options that are redundant. These options will show as available but won't do anything on some of the newer commands. They are mostly relevant to the analysis and benchmark commands (an example of how they are passed follows the list below).

Sometimes redundant options

  • --save (sometimes generates an HTML report)
  • --share-statistics (disabled by default)
  • --log-statistics (mostly generates a JSON file in var/rewrite_tools/stats)
  • --store (mostly available)
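These options are simply appended to a command that accepts them. For example (illustrative only; whether they actually do anything depends on the command, as noted above):

magerun rewrites:analysis:totals --store all --log-statistics --save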

Getting started real quick

1. Analysis

magerun rewrites:analysis:totals --store all

magerun rewrites:analysis:top --store all

2. Measuring (optional)

magerun rewrites:benchmark:resolve-urls --store all

magerun rewrites:benchmark:indexer --limit 1

3. Heavy cleaning

Safe cleaning

magerun rewrites:clean:disabled --store all

magerun rewrites:clean:older-than 90

Risky cleaning

magerun rewrites:clean:yolo

4. Permanent fix

magerun rewrites:fix:products


More in-depth information

1. Analyse the problem

First we need some indication of how big the problem is. Two commands are well suited for that:

Get duplicate totals

It's easiest to check all stores at once:

magerun rewrites:analysis:totals --store all

If you're hitting a million duplicates, it's critical; percentages of 90%+ are easily reached. @todo explain when to continue

Get top duplicated products

We can see which products or categories cause the problem by running the following command.

magerun rewrites:analysis:top --store all

If a couple of products or categories cause most of the duplicates, the problem can be fixed easily; manually fixing the URL keys can even be an option instead of using the experimental and complex fix commands.

1.1 What is the impact on loading times?

We can measure the impact of the problem by benchmarking. There are benchmark commands for indexing times, URL resolve times and site performance. The first two are the most useful, but they may take a while if the problem is serious.

URL resolve times

This basically generates a sitemap. Magento has to process all duplicates to find the actual URLs; if your indexes are not up to date (probably not), this will take a while. If you're hitting millions of duplicates, skip it, it takes too long.

magerun rewrites:benchmark:resolve-urls --store all

Note: this command also fixes some outdated indexes, so a second run will finish almost instantly because the indexes are then up to date.

Indexer times

The following command runs a full reindex of the catalog_url index and outputs the runtime. It is set to 10 runs by default, but if you just want to know the time, set the limit to 1. That gives enough information on how long it runs and how many duplicates are created on each run.

magerun rewrites:benchmark:indexer --limit 1

2. Decide the next course of action

Based on the earlier results, we know the impact and scale of the problem. Another thing we need to know is the store's state: is it a new shop, a high-volume / high-traffic shop, how good are its SEO scores, etc. The hard part about fixing this problem is maintaining SEO scores while guaranteeing uptime.

Ask yourself these questions:

Is it a problem to lose all SEO scores?

  • Yes: Start with building the whitelist
  • No: Lucky you

Is it a problem to go offline or respond really slowly for some amount of time (tens of minutes to maybe hours)?

  • Yes: Continue with a test / development setup
  • No: Again, lucky you

Was there enough time to brew a fresh pot of coffee while running the analysis commands?

  • Yes: Start with some heavy cleaning
  • No: Go right up to fixing

Do some heavy cleaning

In some cases, the indexes are so clogged that the store is barely workable. You probably have tons of rewrites which aren't used anyway. We have commands to clean those out.

Important note

These commands have a dry-run option (--dry-run). Use it to check the result first.
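For example, appending it to the first clean command below (a sketch; check the output before running the real thing):

magerun rewrites:clean:disabled --store all --dry-run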

Disabled products and store views

Most stores have store views and/or products which are disabled. We don't need those rewrites, although they will be recreated if we don't fix the underlying problem. Wiping them out makes the database somewhat more workable.

magerun rewrites:clean:disabled --store all

Old rewrites

The clean:older-than command removes all rewrites older than x days. Most stores with big rewrite problems have been building them up for several years, and after a couple of months the old ones shouldn't be indexed by Google anymore. There's no mechanism to check for indexed URLs yet, but this is possible if this toolset proves worthy. The whitelist commands have options to whitelist URLs from a CSV (Google Analytics), from access logs or from the visitor log, again with an x-days option; these options are not yet available to the clean:older-than command.

Run the following command:

magerun rewrites:clean:older-than days

where days is the number of days in the past for which you want to preserve rewrites. The --store option is available, but I prefer to set it to --store all. The --dry-run option works too.
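For example, a dry run that would remove rewrites older than 90 days across all stores (a sketch combining the options mentioned above):

magerun rewrites:clean:older-than 90 --store all --dry-run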

Everything

There are some scenarios where a shop has lost all its scores and income, and everything is pretty messed up and clogged. When there's nothing you need to preserve, it's time to start fresh. The clean:yolo command removes every duplicate rewrite without checking anything else like age, whitelists or disabled statuses. You still have to fix the duplicate URL keys, but the store won't be blocked as much. You only live once, right? Clichés are there for the ones who have nothing to care about.

magerun rewrites:clean:yolo

This command also has the --dry-run option available.
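As with the other clean commands, previewing first is the safer route:

magerun rewrites:clean:yolo --dry-run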

Permanent fixing

The whitelist commands are not yet integrated with permanent fixing, therefore only one permanent fix command is available. But it's a generally good fix.

magerun rewrites:fix:products

  • Start with the --dry-run option (see the sketch below)
  • Optionally specify a different suffix with --new-suffix if URLs seem out of shape
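A minimal sketch of that sequence, leaving --new-suffix out since its exact value depends on your URL setup:

magerun rewrites:fix:products --dry-run

magerun rewrites:fix:products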

Building the whitelist (unfinished)

This stage is required if you want to keep your SEO scores. We can make the cleanup safer by creating a whitelist of rewrite URLs, adding URLs from different sources. The goal is to whitelist URLs that were recently visited and are probably indexed by Google. Sources are Google Analytics or any other website statistics tool or service (CSV), Magento's visitor log table and server access logs.

Note: building whitelists is mostly finished, using them is not yet. Testing the whitelist build-up would be appreciated. The implementation does not require a lot of time and will be done a.s.a.p.

Start by adding URLs from the sources available to you.

Adding sources to the whitelist databases

We use JSON-based databases for this. Each command builds a separate JSON database, which later on will be processed into a master whitelist. This master whitelist is then used to back up rewrites or to exclude them from removal.

From access logs

This command currently only works on Hypernode. Parsing all those access logs requires quite a lot of memory. The command is aware of its memory usage and will automatically trigger garbage collection to lower it. Problems could still come up in high-traffic stores.

magerun rewrites:log:parse --file="/var/log/nginx/access.log*"

From visitor logs

This command processes Magento's visitor log table into a whitelist database. Rewrites older than x days will be filtered out; by default this is set to 60 days.

magerun rewrites:url:visitor --max-age 90

From CSV

This is the best and safest option for preserving Google SEO scores, as a CSV export from Google Analytics can be added to the whitelist. Specify a path to the CSV and the column to take URLs from.

magerun rewrites:url:csv --csv path/to/csv.csv --column urlcolumn

Building the whitelist

When all whitelist sources have been converted into whitelist JSON databases, it is time to process them into one master whitelist. All sources are combined and cut up into segments. For example:

url:     some_product_url_duplicatevalue.html
segment: some_product_url

Each segment is then queried against the rewrites table, and each result of this segment query is filtered on a max-age and matched back against the combined whitelist sources. Each match is then added to the master whitelist. This dramatically reduces the size of the master whitelist and the load on the database, and the extra filter makes sure no redundant rewrites are added to the master list. If all sources are added correctly, this is a strong safeguard for maintaining valuable URLs. In the future, the master whitelist will be used to back up everything, so rewrites could be wiped and recreated after fixing the keys. Unfortunately, this is a work in progress, but testing the build-up is gladly appreciated.
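As a rough illustration of the segment query idea only (this is an assumption about the internals, shown here against Magento 1's core_url_rewrite table via magerun's db:query command):

magerun db:query "SELECT request_path FROM core_url_rewrite WHERE request_path LIKE 'some_product_url%'"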

This concept is a complex one; I'm happy to explain it in depth.

Usage

magerun rewrites:url:whitelist
