Quick note on command options
Because this toolset started out as a set of analysis commands, some commands expose options that are redundant for them. These options are listed as available but have no effect on some of the newer commands. They are mostly useful for the analysis and benchmark commands.
Sometimes redundant options
- --save (sometimes generates an HTML report)
- --share-statistics (disabled by default)
- --log-statistics (mostly generates a JSON file in var/rewrite_tools/stats)
- --store (mostly available)
magerun rewrites:analysis:totals --store all
magerun rewrites:analysis:top --store all
magerun rewrites:benchmark:resolve-urls --store all
magerun rewrites:benchmark:indexer --limit 1
Safe cleaning
magerun rewrites:clean:disabled --store all
magerun rewrites:clean:older-than 90
Risky cleaning
magerun rewrites:clean:yolo
magerun rewrites:fix:products
More in-depth information
First, we need an indication of how big the problem is. Two commands are well suited for that:
Get duplicate totals
It's easiest to check all stores at once:
magerun rewrites:analysis:totals --store all
If you're hitting 1 million duplicates, it's critical. Duplicate percentages of 90% and more are easily reached. @todo explain when to continue
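The numbers from the totals command are easiest to judge as a share of all rewrites. A minimal Python sketch, assuming you copy the total and duplicate counts from the command output by hand; the function names are hypothetical and not part of the toolset.

```python
# Hypothetical helpers; the totals command itself is not called here.
CRITICAL_DUPLICATES = 1_000_000  # the "critical" mark mentioned above


def duplicate_share(total_rewrites: int, duplicate_rewrites: int) -> float:
    """Return duplicate rewrites as a percentage of all rewrites."""
    if total_rewrites <= 0:
        return 0.0
    return 100.0 * duplicate_rewrites / total_rewrites


def is_critical(duplicate_rewrites: int) -> bool:
    """True once the number of duplicates reaches one million."""
    return duplicate_rewrites >= CRITICAL_DUPLICATES


if __name__ == "__main__":
    # Example figures, not real measurements.
    share = duplicate_share(1_200_000, 1_100_000)
    print(f"{share:.1f}% duplicates, critical: {is_critical(1_100_000)}")
```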
Get top duplicated products
We can see which products or categories cause the problem by running the following command.
magerun rewrites:analysis:top --store all
If a few products or categories cause most of the duplicates, the problem can be fixed easily; manually fixing their URL keys may even be preferable to using the experimental and complex fix commands.
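Whether "a few entities cause most of the duplicates" can be expressed as a number: count the extra rewrites per entity and check which share the top few account for. A sketch, assuming hypothetical in-memory records of (store_id, entity_type, entity_id, request_path); it does not read the toolset's output or the database, and it treats every rewrite beyond the first per entity and store view as a duplicate, which is a simplification.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical rewrite record: (store_id, entity_type, entity_id, request_path).
Rewrite = Tuple[int, str, int, str]
EntityKey = Tuple[int, str, int]


def extra_rewrites(rewrites: Iterable[Rewrite]) -> Counter:
    """Count rewrites beyond the first one per entity and store view.

    Simplification: legitimate extra paths (e.g. a product listed in several
    categories) are counted as duplicates too.
    """
    per_entity = Counter((store, etype, eid) for store, etype, eid, _ in rewrites)
    return Counter({key: n - 1 for key, n in per_entity.items() if n > 1})


def top_offenders(rewrites: Iterable[Rewrite], limit: int = 10) -> List[Tuple[EntityKey, int]]:
    """Entities with the most extra rewrites, largest first."""
    return extra_rewrites(rewrites).most_common(limit)


def top_share(rewrites: Iterable[Rewrite], limit: int = 3) -> float:
    """Fraction of all extra rewrites caused by the top `limit` entities."""
    extra = extra_rewrites(rewrites)
    total = sum(extra.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in extra.most_common(limit)) / total
```

A top_share close to 1.0 for a small limit suggests that fixing those few URL keys by hand is a realistic option.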
We can measure the impact of the problem by benchmarking. There are benchmark commands for indexing times, URL resolution times and site performance. The first two are the most useful, but they may take a while if the problem is severe.
URL resolve times
This basically generates a sitemap. Magento has to process all duplicates to find the current ones, so if your indexes are not up to date (they probably aren't), this will take a while. If you're hitting millions of duplicates, skip it; it takes too long.
magerun rewrites:benchmark:resolve-urls --store all
Note: this command also refreshes some outdated indexes, so a second run will finish almost instantly.
Indexer times
The following command runs a full reindex of the catalog_url index and outputs the runtime. It is set to 10 runs by default, but if you just want to know the time, set the limit to 1. That gives enough information on how long it runs and how many duplicates are created on each run.
magerun rewrites:benchmark:indexer --limit 1
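To see how fast duplicates pile up, you could run the benchmark with --limit 1 a few times and note the total rewrite count between runs (for example with the totals command). A sketch that turns those hand-noted numbers into a mean runtime and per-run growth; the input format is an assumption, not output of the indexer command.

```python
from statistics import mean
from typing import List, Tuple


def summarize_runs(initial_count: int, runs: List[Tuple[float, int]]) -> Tuple[float, List[int]]:
    """Summarise indexer benchmark runs.

    initial_count: number of rewrites before the first run.
    runs: (runtime_seconds, rewrite_count_after_run) for each run, in order.
    Returns the mean runtime and the number of rewrites added by each run.
    """
    if not runs:
        raise ValueError("at least one run is required")
    added: List[int] = []
    previous = initial_count
    for _, count in runs:
        added.append(count - previous)
        previous = count
    return mean(runtime for runtime, _ in runs), added
```

A roughly constant, non-zero growth per run is the typical sign that every reindex creates new duplicates.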
From the earlier results we know the impact and scale of the problem. Another thing we need to know is the state of the store: is it a new shop or a high-volume / high-traffic shop, how important are its SEO scores, etc. The hard part of fixing this problem is maintaining SEO scores while guaranteeing uptime.
Ask yourself these questions:
Is it a problem to lose all SEO scores?
- Yes: Start with building the whitelist
- No: Lucky you
Is it a problem to go offline or respond very slowly for a considerable amount of time (tens of minutes, maybe hours)?
- Yes: Continue with a test / development setup
- No: Again, lucky you
Was there enough time to brew a fresh pot of coffee while running the analysis commands?
- Yes: Start with some heavy cleaning
- No: Go straight to fixing
In some cases, the indexes are so clogged that it's barely workable. You probably have tons of rewrites which aren't used anyway. We have commands to clean those out.
Important note
These commands have a dry-run option (--dry-run). Use it to check the output first.
Disabled products and store views
Most stores have disabled store views and/or products. We don't need their rewrites, although they will be recreated unless we fix the underlying problem. Wiping them out makes the database somewhat more workable.
magerun rewrites:clean:disabled --store all
Old rewrites
The clean:older-than command removes all rewrites that are older than x days. Most stores with big rewrite problems have been building them up for several years. After a couple of months, the old ones shouldn't be indexed by Google anymore. There's no mechanism to check for indexed URLs yet, but one could be added if this toolset proves worthwhile. The whitelist commands have options to whitelist URLs from a CSV (Google Analytics), access logs or the visitor log, with their own "older than x days" option. These are not yet available to the clean:older-than command.
Run the following command:
magerun rewrites:clean:older-than days
where days is the number of days in the past for which you want to preserve rewrites. The --store option is available, but I prefer to set it to --store all. The --dry-run option works too.
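The age rule itself is simple date arithmetic: everything with a timestamp before now minus x days goes, the rest stays. A sketch on hypothetical (request_path, timestamp) records; which timestamp the real command compares against is not assumed here, and nothing is deleted, which mirrors the spirit of --dry-run.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Hypothetical record: (request_path, timestamp). Which timestamp the real
# command compares against is not assumed here.
Rewrite = Tuple[str, datetime]


def split_by_age(rewrites: Iterable[Rewrite], days: int, now: datetime) -> Tuple[List[str], List[str]]:
    """Return (kept, removed) request paths for an older-than cleanup."""
    cutoff = now - timedelta(days=days)
    kept: List[str] = []
    removed: List[str] = []
    for path, timestamp in rewrites:
        (kept if timestamp >= cutoff else removed).append(path)
    return kept, removed


if __name__ == "__main__":
    # Dry-run style: only report what would be removed.
    sample = [("new.html", datetime(2023, 12, 1)), ("old.html", datetime(2023, 6, 1))]
    kept, removed = split_by_age(sample, days=90, now=datetime(2024, 1, 1))
    print(f"would remove {len(removed)} of {len(sample)} rewrites")
```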
Everything
There are some scenarios where a shop has lost all its scores and income, and everything is pretty messed up and clogged. When there's nothing you need to preserve, it's time to start fresh. The clean:yolo command removes every duplicate rewrite without checking anything else, such as times, whitelists or disabled statuses. You still have to fix the duplicate keys, but the store won't be blocked as much. You only live once, right? Clichés are there for the ones who have nothing to care about.
magerun rewrites:clean:yolo
This command has the --dry-run option available.
The whitelist commands are not yet integrated with permanent fixing, so only one permanent fix command is available. But it's a good general fix.
magerun rewrites:fix:products
- Start with the --dry-run option
- Optionally specify a different suffix with --new-suffix if URLs seem out of shape
This stage is for stores that need to keep their SEO scores. We can make cleaning safer by creating a whitelist of rewrite URLs collected from different sources. The goal is to whitelist URLs that were recently visited and are probably indexed by Google. Sources are Google Analytics or any other website statistics tool or service (CSV), Magento's visitor log table and server access logs.
Note: building whitelists is mostly finished; using them is not yet implemented. Testing the whitelist build-up would be appreciated. The implementation does not require a lot of time and will be done a.s.a.p.
Start by adding URLs from the sources available to you.
We use JSON-based databases for this. Each command builds a separate JSON database, which is later processed into a master whitelist. This master whitelist is then used to back up rewrites or to skip them during removal.
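Before any segmenting happens, the separate source databases have to be merged into one set of URLs. A sketch, assuming each database is a JSON array of URL strings; the real file format and the directory used in the example are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_database(path: Path) -> List[str]:
    """Read one source database; assumed to be a JSON array of URL strings."""
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    return [item for item in data if isinstance(item, str)]


def combine_sources(databases: Iterable[List[str]]) -> Set[str]:
    """Merge source databases into one de-duplicated set of request paths."""
    combined: Set[str] = set()
    for urls in databases:
        for url in urls:
            # Strip whitespace and leading slashes so "/a.html" and "a.html" match.
            path = url.strip().lstrip("/")
            if path:
                combined.add(path)
    return combined


if __name__ == "__main__":
    # Hypothetical directory; the real databases may live elsewhere.
    files = sorted(Path("var/rewrite_tools/whitelist").glob("*.json"))
    print(len(combine_sources(load_database(f) for f in files)))
```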
This command currently only works on Hypernode. Parsing all those access logs requires quite a lot of memory. The command monitors its memory usage and automatically triggers garbage collection to lower it. Problems could still come up in high-traffic stores.
magerun rewrites:log:parse --file="/var/log/nginx/access.log*"
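One way to keep memory down while parsing many access logs is to stream each file line by line and keep only the set of unique paths. A sketch assuming nginx's default "combined" log format, which may not match Hypernode's configured format; it is not the toolset's parser, and the pattern would need adjusting to your log_format.

```python
import glob
import gzip
import re
from typing import IO, Iterable, Iterator, Set

# Assumption: nginx "combined" log format, e.g.
# 1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /a.html?x=1 HTTP/1.1" 200 512 "-" "UA"
REQUEST_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>[^ ?"]+)(?:\?[^ "]*)? HTTP/[0-9.]+" (?P<status>\d{3})'
)


def open_log(path: str) -> IO[str]:
    """Open a plain or gzip-rotated log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def successful_paths(lines: Iterable[str]) -> Iterator[str]:
    """Yield request paths (without query string) of 2xx responses."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status").startswith("2"):
            yield match.group("path").lstrip("/")


def parse_logs(pattern: str) -> Set[str]:
    """Collect unique paths from all files matching a glob pattern.

    Files are streamed line by line, so memory grows with the number of
    unique URLs rather than with the size of the logs.
    """
    seen: Set[str] = set()
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as fh:
            seen.update(successful_paths(fh))
    return seen
```

Calling parse_logs("/var/log/nginx/access.log*") mirrors the --file pattern of the command above.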
This command processes Magento's visitor log table into a whitelist database. Rewrites older than x days will be filtered out. By default this is set to 60 days.
magerun rewrites:url:visitor --max-age 90
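The --max-age filter boils down to keeping only visits after a cutoff date. A sketch on hypothetical (url, visited_at) records; the real visitor log table layout is not assumed. The 60-day default mirrors the text above.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Set, Tuple
from urllib.parse import urlsplit

# Hypothetical visit record: (url, visited_at). The real visitor log table
# layout is not assumed here.
Visit = Tuple[str, datetime]


def recent_urls(visits: Iterable[Visit], max_age_days: int = 60, now: Optional[datetime] = None) -> Set[str]:
    """Return unique paths visited within the last max_age_days days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    # urlsplit handles both full URLs and bare paths; only the path is kept.
    return {
        urlsplit(url).path.lstrip("/")
        for url, visited_at in visits
        if visited_at >= cutoff
    }
```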
This is the best and safest option for preserving Google SEO scores, since a CSV export from Google Analytics can be added to the whitelist. Specify a path to the CSV and the column to take URLs from.
magerun rewrites:url:csv --csv path/to/csv.csv --column urlcolumn
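Reading a single column from a statistics export and normalising it to request paths could look like the sketch below. It assumes the export may contain "#" comment lines and either full URLs or bare paths in the chosen column; it is an illustration, not the command's implementation.

```python
import csv
from typing import Iterable, Set
from urllib.parse import urlsplit


def urls_from_csv(lines: Iterable[str], column: str) -> Set[str]:
    """Read one column of a statistics export and return normalised paths."""
    # Some analytics exports start with "#" comment lines; skip them.
    reader = csv.DictReader(line for line in lines if not line.startswith("#"))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    urls: Set[str] = set()
    for row in reader:
        value = (row.get(column) or "").strip()
        # Keep only the path, whether the cell holds "/a.html?x=1" or a full URL.
        path = urlsplit(value).path.lstrip("/")
        if path:
            urls.add(path)
    return urls
```

Usage: open the file with open("path/to/csv.csv", newline="", encoding="utf-8") and pass the file object together with "urlcolumn".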
When all whitelist sources are converted to whitelist JSON databases, it is time to process them into one master whitelist. All sources are combined and cut up into segments. For example:
url:     some_product_url_duplicatevalue.html
segment: some_product_url
Each segment is then queried against the rewrites database, and each result of a segment query is filtered on a max age and matched back against the combined whitelist sources. Each match is then added to the master whitelist. This dramatically reduces the size of the master whitelist and the load on the database. The extra filter makes sure no redundant rewrites are added to the master list. If all sources are added correctly, this is a strong safeguard for maintaining valuable URLs. In the future, the master whitelist will be used to back up everything, so rewrites can be wiped and recreated after fixing the keys. Unfortunately, this is a work in progress, but testing the build-up is gladly appreciated.
This is a complex concept; I'm happy to explain it in depth.
Usage
magerun rewrites:url:whitelist
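The segment and match steps described above can be sketched in memory. This assumes the "duplicatevalue" in the example is a trailing number separated by "-" or "_", and uses hypothetical (request_path, last_updated) records in place of the rewrites table; the inner loop stands in for a per-segment database query.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

# Assumption: a duplicate differs from the original key only by a trailing
# "-<number>" or "_<number>", e.g. some_product_url_123.html.
DUPLICATE_TAIL = re.compile(r"[-_]\d+$")

# Hypothetical rewrite record: (request_path, last_updated).
Rewrite = Tuple[str, datetime]


def segment(url: str, suffix: str = ".html") -> str:
    """Cut a URL down to the part shared by all its duplicates."""
    path = url.strip().lstrip("/")
    if suffix and path.endswith(suffix):
        path = path[: -len(suffix)]
    return DUPLICATE_TAIL.sub("", path)


def build_master_whitelist(
    sources: Set[str], rewrites: Iterable[Rewrite], max_age_days: int, now: datetime
) -> Set[str]:
    """Keep recent rewrites that share a segment with, and exactly match, a source URL."""
    cutoff = now - timedelta(days=max_age_days)
    segments = {segment(url) for url in sources}
    candidates = list(rewrites)
    master: Set[str] = set()
    for seg in segments:
        # Stand-in for a per-segment query such as request_path LIKE 'seg%'.
        for path, updated in candidates:
            if path.startswith(seg) and updated >= cutoff and path in sources:
                master.add(path)
    return master
```

If the tail pattern trims too much (a legitimate key like iphone-15 becomes iphone), the segment query only gets wider; the exact match against the sources still decides what ends up in the master whitelist.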