Removing Unused Media from WordPress


A few days ago I was on Reddit when I stumbled on a question asking how to remove unused media from WordPress without using any plugins. Below is a slightly cleaned-up repost of my CLI solution.

WordPress takes a somewhat "fire-and-forget" approach to media: images removed from the Media Library aren't removed from wp-content/uploads. If you manage a site where users modify a lot of content and media, your uploads folder will get cluttered within a few months. How severe the clutter is, and how much it matters, depends on the number of users and the size of the files they upload.

Why use the CLI when there are already plugins that take care of this? I have to admit, partly I just like the challenge. However, it can be useful if you manage several WordPress installations and want to clean them up quickly. Installing something like Media Cleaner on each site just to run it once can be tedious. This solution runs on the server directly, so you don't have to log into each site's Dashboard. Also, all of these commands are lightweight and easy to script.

Basically, you generate a list of the files in the wp-content/uploads directory and compare it to the media recorded in the database's posts table. Then you delete the files that show up in uploads but not in the database. Don't forget to back up your database and files before you try any of this.
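To make that comparison concrete, here is a minimal sketch on two tiny made-up lists. It assumes bash (for process substitution), and orphans is a hypothetical helper name, not something provided by WordPress or coreutils.

```shell
#!/usr/bin/env bash
# orphans ON_DISK IN_DB (hypothetical helper): print the lines of ON_DISK
# that do not appear in IN_DB, i.e. files with no database record.
orphans() {
  # comm only works on sorted input, so sort both lists first.
  # -23 suppresses column 2 (lines only in IN_DB) and column 3 (lines in
  # both), leaving the lines that appear only in ON_DISK.
  comm -23 <(sort "$1") <(sort "$2")
}

# Usage on two small, made-up lists.
tmp="$(mktemp -d)"
printf '%s\n' /2020/02/c.png /2020/01/a.jpg /2020/01/b.jpg > "$tmp/on_disk.txt"
printf '%s\n' /2020/01/a.jpg /2020/02/c.png > "$tmp/in_db.txt"
orphans "$tmp/on_disk.txt" "$tmp/in_db.txt"   # prints /2020/01/b.jpg
rm -r "$tmp"
```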

Get a List of Images Stored on the Server

For this example, WordPress is installed in /var/www/wordpress. Your path will likely be different, so adjust the commands accordingly. First, generate a list of the image files present using find and grep.

$ cd /var/www/wordpress/wp-content/uploads
$ find /var/www/wordpress/wp-content/uploads -type f | grep -E "\.(jpg|png|gif)$" | grep -Ev "[-][0-9]{1,4}[x][0-9]{1,4}" > media_available.txt

The first regular expression matches common image file extensions. You can add more extensions if necessary, or drop that first filter entirely to look at all file types. The second regular expression filters out the auto-generated thumbnail files, which usually look something like filename-150x150.jpg.
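If you want to check what the two filters keep before running them on a real uploads folder, you can feed them a few made-up paths. filter_originals is a hypothetical wrapper around the same two grep expressions used in the find pipeline above.

```shell
#!/usr/bin/env bash
# filter_originals (hypothetical helper): keep common image types,
# then drop names containing a -WIDTHxHEIGHT thumbnail suffix.
filter_originals() {
  grep -E "\.(jpg|png|gif)$" | grep -Ev "[-][0-9]{1,4}[x][0-9]{1,4}"
}

# Dry run on made-up paths: only photo.jpg and logo.png survive.
printf '%s\n' \
  /2020/01/photo.jpg \
  /2020/01/photo-150x150.jpg \
  /2020/01/photo-1024x768.jpg \
  /2020/01/logo.png \
  /2020/01/report.pdf |
  filter_originals
```

Note that an original upload whose own name contains something like -800x600 is also filtered out. That errs on the safe side, since it then never reaches the deletion list.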

Get a List of Images Used on the Website

Next, use mysql to query the database for the images registered as attachments in the Media Library; these are the files this method treats as used.

$ mysql -u wp_db_username -p
mysql> use wp_db_name;
mysql> SELECT guid FROM `wp_posts` WHERE post_type = 'attachment' AND post_mime_type LIKE 'image%' INTO OUTFILE '/tmp/media_used.txt';
mysql> exit;    
$ sudo mv /tmp/media_used.txt .

Caveats:

  1. If you manually linked images from wp-content/uploads with HTML inside your posts or pages instead of using the Media Library, those files have no attachment record. They will not be counted as used and will end up on the orphaned list.
  2. If your tables don't use the default wp_ prefix (it is set through $table_prefix in wp-config.php, and some security plugins change it), substitute your prefix for wp_. Run show tables; to see which prefix your tables use.
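SELECT ... INTO OUTFILE needs the FILE privilege, writes the file as the MySQL server user, and is often restricted by the secure_file_priv setting. If it fails for you, a possible workaround is to let the mysql client print the results and redirect them yourself. The sketch below assumes a local MySQL/MariaDB client; export_media_used is a hypothetical helper name, and the table prefix argument covers the second caveat.

```shell
#!/usr/bin/env bash
# export_media_used DB_USER DB_NAME [TABLE_PREFIX]  (hypothetical helper)
# Prints the guid of every image attachment, one per line, to stdout.
export_media_used() {
  local user="$1" db="$2" prefix="${3:-wp_}"
  # -N: no column-name header line; -B: plain batch (tab-separated) output;
  # -p: prompt for the password; -e: run this one statement and exit.
  mysql -N -B -u "$user" -p -e \
    "SELECT guid FROM \`${prefix}posts\` WHERE post_type = 'attachment' AND post_mime_type LIKE 'image%';" \
    "$db"
}

# Usage (placeholder credentials), run from wp-content/uploads:
#   export_media_used wp_db_username wp_db_name > media_used.txt
# With a non-default table prefix (example prefix only):
#   export_media_used wp_db_username wp_db_name wpx7_ > media_used.txt
```

Because the client writes the file, it ends up owned by your user in the current directory, so the sudo mv step is no longer needed.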

Clean Up Generated Files

Clean up the generated files a bit with sed to remove the unnecessary path prefix, so both lists use the same form, such as /2020/01/photo.jpg.

$ sed -i -e "s|/var/www/wordpress/wp-content/uploads||" media_available.txt
$ sed -i -e "s|http://your-domain.com/wordpress/wp-content/uploads||" media_used.txt

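The "|" character is used as the delimiter in the substitute expression because the default "/" would clash with the slashes in the paths. For reference, the default form is s/old/new/. Replace http://your-domain.com/wordpress with your site's address exactly as it appears in the guid column, including whether it uses http or https. Test your sed commands without the -i flag first, because -i overwrites the file in place.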
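Since WordPress does not rewrite the guid column when a site changes domain or switches to https, the URLs in media_used.txt may not match the prefix you expect. A more forgiving variant, sketched below, strips everything up to and including /wp-content/uploads, so file paths and URLs reduce to the same form. normalize_paths is a hypothetical helper name, and this assumes your uploads live under the default wp-content/uploads path.

```shell
#!/usr/bin/env bash
# normalize_paths (hypothetical helper): read lines on stdin and remove
# everything up to and including "/wp-content/uploads".
normalize_paths() {
  sed -e 's|^.*/wp-content/uploads||'
}

# Dry run on sample lines; nothing is written back to disk.
# Both lines print as /2020/01/a.jpg
printf '%s\n' \
  '/var/www/wordpress/wp-content/uploads/2020/01/a.jpg' \
  'https://your-domain.com/wordpress/wp-content/uploads/2020/01/a.jpg' |
  normalize_paths
```

Once the preview looks right, the same expression can be applied in place with sed -i -e 's|^.*/wp-content/uploads||' on each list.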

Generate List of Orphaned Media

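Now you should have two files, which you can compare with comm to generate a list of orphaned media. comm expects both inputs to be sorted, and --nocheck-order only hides the warning while producing wrong results on unsorted input, so sort both lists on the fly with bash process substitution. The -23 option suppresses lines found only in media_used.txt and lines found in both files, leaving the files that exist on disk but not in the database:

$ comm -23 <(sort media_available.txt) <(sort media_used.txt) > media_orphaned.txt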
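As an alternative to comm that does not depend on sort order at all, grep can do the same set difference by treating every line of media_used.txt as an exact pattern. This is a swapped-in technique, not part of the original solution, and orphans_unsorted is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# orphans_unsorted ON_DISK IN_DB (hypothetical helper)
orphans_unsorted() {
  # -F: treat each line of IN_DB as a fixed string, not a regex
  # -x: a line must match a pattern completely, not just contain it
  # -f: read the patterns from the file IN_DB
  # -v: print the ON_DISK lines that match none of those patterns
  grep -vxFf "$2" "$1"
}

# Usage, from wp-content/uploads:
#   orphans_unsorted media_available.txt media_used.txt > media_orphaned.txt
```

The -x flag matters: without it, /2020/01/a.jpg in the database list would also hide an unrelated /2021/2020/01/a.jpg on disk.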

Remove Orphaned Media

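At this point you don't need media_available.txt and media_used.txt, so you can remove them. Next, modify media_orphaned.txt so it also covers the auto-generated thumbnails. The easiest way to do this is to insert an asterisk right before the file extension, so /2020/01/photo.jpg becomes /2020/01/photo*.jpg. The sed expression below targets the last dot on each line, and -i saves the change back to the file.

$ rm media_available.txt media_used.txt
$ sed -i -e 's|\.\([^./]*\)$|*.\1|' media_orphaned.txt

Be aware that the asterisk matches more than thumbnails: photo*.jpg also matches an unrelated photograph.jpg, so answer each rm prompt carefully.

Finally, go through the orphaned file list and delete the matching files. Run this in bash from wp-content/uploads. The pattern is deliberately left unquoted so the shell expands the asterisk (xargs would pass it to rm literally), and the list is read on file descriptor 3 so rm -i can still prompt you on the terminal.

$ while IFS= read -r pattern <&3; do rm -i -- .$pattern; done 3< media_orphaned.txt
$ rm media_orphaned.txt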
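Before answering dozens of rm prompts, you may prefer to review everything the asterisk patterns match in one list. The sketch below only prints matches and deletes nothing. It assumes bash (compgen is a bash builtin), that it runs from wp-content/uploads, and that each line of the list looks like /2020/01/photo*.jpg; preview_orphans is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# preview_orphans LIST (hypothetical helper): print every file that the
# patterns in LIST would match, without deleting anything.
preview_orphans() {
  local pattern
  while IFS= read -r pattern; do
    # compgen -G expands the glob and prints one match per line;
    # it returns non-zero when nothing matches, which we ignore.
    compgen -G ".$pattern" || true
  done < "$1"
}

# Usage, from wp-content/uploads (run before the deletion loop,
# while media_orphaned.txt still exists):
#   preview_orphans media_orphaned.txt | less
```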