A few days ago I was on Reddit when I stumbled on a question asking how to remove unused media from WordPress without using any plugins. Below is a slightly cleaned up repost of my CLI solution.
WordPress has a somewhat "fire-and-forget" approach to media, where images removed from the Media Library aren't removed from wp-content/uploads. If you're managing a site where users modify a lot of content and media, then in a few months your uploads folder will get cluttered. The severity of this clutter and its impact will depend on the number of users and the size of the files they upload.
Why would you use the CLI if there are already plugins that take care of this? I have to admit, partly I like the challenge. However, it can be useful if you manage several WordPress installations and want to clean them up quickly. Installing something like Media Cleaner on each site just so you can run it once can be tedious. This solution runs on the server directly, meaning you don't have to log into each site's Dashboard. Also, all these commands are lightweight and easily scripted.
Basically, you have to generate a list of the files currently in the wp-content/uploads directory and then compare it to the media referenced in the database posts table. Then you delete the files that show up in uploads but not in the database. Don't neglect to back up your database and files before you try any of this.
For this example WordPress is installed in /var/www/wordpress; your path will likely be different, so keep that in mind. First, generate a list of files present using find and egrep.
$ cd /var/www/wordpress/wp-content/uploads
# "$PWD" makes find print absolute paths, which the sed cleanup further down expects
$ find "$PWD" -type f | egrep "\.(jpg|png|gif)$" | egrep -v "[-][0-9]{1,4}[x][0-9]{1,4}" > media_available.txt
The first regular expression matches common image file extensions. You can add more extensions if necessary, or remove this entire egrep command to look for all file types. The second regular expression filters out the auto-generated thumbnail files, which usually end up looking something like filename-150x150.jpg.
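To see what the two filters keep and drop, here is a minimal sketch that runs them against a throwaway directory. The file names are hypothetical, and grep -E is used as the equivalent, current spelling of egrep.

```shell
# Build a scratch "uploads" tree with hypothetical file names.
demo=$(mktemp -d)
mkdir -p "$demo/2015/01"
touch "$demo/2015/01/photo.jpg" \
      "$demo/2015/01/photo-150x150.jpg" \
      "$demo/2015/01/photo-1024x768.jpg" \
      "$demo/2015/01/logo.png" \
      "$demo/2015/01/notes.txt"

# First filter keeps image extensions, second filter drops WIDTHxHEIGHT thumbnails.
# Relative paths (find .) are used here only to keep the demo output short.
originals=$(cd "$demo" && find . -type f \
    | grep -E '\.(jpg|png|gif)$' \
    | grep -Ev '[-][0-9]{1,4}[x][0-9]{1,4}' \
    | sort)

printf '%s\n' "$originals"
rm -r "$demo"
```

Only photo.jpg and logo.png should survive: notes.txt fails the extension filter and the two resized copies fail the thumbnail filter.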
Next, use mysql to query the database for the images that are actually in use.
$ mysql -u wp_db_username -p
mysql> use wp_db_name;
mysql> SELECT guid FROM `wp_posts` WHERE post_type = 'attachment' AND post_mime_type LIKE 'image%' INTO OUTFILE '/tmp/media_used.txt';
mysql> exit;
$ sudo mv /tmp/media_used.txt .
Caveats:
Your tables may use a prefix other than wp_. Run show tables; to see what prefix is used for the tables.
Clean up the generated files a bit with sed to remove the unnecessary full path.
$ sed -i -e "s|/var/www/wordpress/wp-content/uploads||" media_available.txt
$ sed -i -e "s|http://your-domain.com/wordpress/wp-content/uploads||" media_used.txt
The "|" character is used as the delimiter in the substitute expression because the standard "/" would clash with the slashes in the paths. For reference, the default form is s/old/new/. Test your sed commands without the -i flag first, because -i overwrites the file in place.
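As a small illustration of the delimiter choice, the sketch below runs the same substitution both ways on a single sample line (a hypothetical file under the example install path) and prints the result instead of editing a file.

```shell
# Hypothetical sample line using the example install path from this post.
line='/var/www/wordpress/wp-content/uploads/2015/01/photo.jpg'

# With "|" as the delimiter the path can be written as-is.
with_pipe=$(printf '%s\n' "$line" | sed -e 's|/var/www/wordpress/wp-content/uploads||')

# With the default "/" delimiter every slash in the path has to be escaped.
with_slash=$(printf '%s\n' "$line" | sed -e 's/\/var\/www\/wordpress\/wp-content\/uploads//')

printf '%s\n%s\n' "$with_pipe" "$with_slash"
```

Both commands leave the same remainder, /2015/01/photo.jpg; the first is just easier to read and harder to get wrong.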
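Now you should have two files which you can compare with comm to generate a list of orphaned media. comm only gives correct results on sorted input, so sort both files as part of the comparison (the <(...) syntax requires bash):
$ comm -23 <(sort media_available.txt) <(sort media_used.txt) > media_orphaned.txt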
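The comparison can be tried out on a tiny data set first. This is a sketch with made-up paths; it sorts both lists before calling comm, since comm compares its inputs line by line in sorted order, and -23 keeps only the lines unique to the first file.

```shell
# Scratch copies of the two lists, with hypothetical entries in arbitrary order.
work=$(mktemp -d)
printf '%s\n' /2015/01/photo.jpg /2015/01/logo.png /2015/02/old-banner.jpg \
    > "$work/media_available.txt"
printf '%s\n' /2015/01/photo.jpg /2015/01/logo.png \
    > "$work/media_used.txt"

# Sort each file in place so comm sees its input in the order it expects.
sort -o "$work/media_available.txt" "$work/media_available.txt"
sort -o "$work/media_used.txt" "$work/media_used.txt"

# -2 hides lines only in the second file, -3 hides lines common to both.
orphaned=$(comm -23 "$work/media_available.txt" "$work/media_used.txt")

printf '%s\n' "$orphaned"
rm -r "$work"
```

Here the only file present on disk but absent from the database list is /2015/02/old-banner.jpg.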
At this point you don't need media_available.txt and media_used.txt, so you can remove them. Next, modify media_orphaned.txt so that each entry also matches the auto-generated thumbnails. The easiest way to do this is to insert an asterisk right before the file extension. Note the -i flag, without which sed only prints the result and leaves the file unchanged.
$ rm media_available.txt media_used.txt
$ sed -i -e 's|\.\([^./]*\)$|*.\1|' media_orphaned.txt
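A substitution that targets the first dot in a line behaves differently from one anchored on the last dot as soon as a file name contains more than one dot. The sketch below compares the two on a hypothetical file name, printing the results rather than editing anything.

```shell
# Hypothetical orphaned entry whose name contains extra dots.
entry='/2015/02/my.old.banner.jpg'

# Replacing the first dot puts the asterisk in the middle of the name.
first_dot=$(printf '%s\n' "$entry" | sed -e 's|\.|*.|')

# Anchoring on the final ".ext" puts it right before the extension.
last_dot=$(printf '%s\n' "$entry" | sed -e 's|\.\([^./]*\)$|*.\1|')

printf '%s\n%s\n' "$first_dot" "$last_dot"
```

For names with a single dot, such as old-banner.jpg, both expressions give the same pattern.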
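Finally, you can use xargs and rm to go through the orphaned file list and delete those files. xargs does not expand the asterisk by itself, so each entry is handed to sh, which does. The entries start with a slash relative to the uploads directory, so the leading dot keeps rm inside the directory you are in. rm -i is of no use here because its prompts cannot read your answers when run through xargs; -v prints each file as it is removed instead.
$ xargs -I{} sh -c 'rm -v -- .{}' < media_orphaned.txt
$ rm media_orphaned.txt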
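Before deleting anything for real, it may be worth listing what each pattern would match. The following is a minimal dry-run sketch on a scratch directory with hypothetical file names; it assumes entries of the form /YYYY/MM/name*.ext with no spaces in them, since the pattern is deliberately left unquoted so the shell can expand it.

```shell
# Scratch uploads directory with an orphan, its thumbnail, and two unrelated files.
uploads=$(mktemp -d)
mkdir -p "$uploads/2015/02"
touch "$uploads/2015/02/old-banner.jpg" \
      "$uploads/2015/02/old-banner-150x150.jpg" \
      "$uploads/2015/02/old-banner-2.jpg" \
      "$uploads/2015/02/keep.jpg"
printf '%s\n' '/2015/02/old-banner*.jpg' > "$uploads/media_orphaned.txt"

# Dry run: expand each pattern relative to the uploads directory and list the
# matches with ls instead of removing them.
matches=$(cd "$uploads" && while IFS= read -r pattern; do
    ls -1 -- .$pattern
done < media_orphaned.txt)

printf '%s\n' "$matches"
rm -r "$uploads"
```

The listing includes old-banner-2.jpg as well as the thumbnail, which shows that the asterisk matches any file sharing the orphan's name as a prefix, not only resized copies. Reviewing the dry-run output is a cheap way to catch such cases before they are deleted.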