WordPress.org

Make WordPress Core

Opened 3 weeks ago

Closed 12 days ago

#48453 closed defect (bug) (fixed)

Regression: Implied contract between image sub-size filenames and their base filename now broken

Reported by: ianmjones Owned by: azaozz
Milestone: 5.3 Priority: normal
Severity: major Version: 5.3
Component: Media Keywords: 2nd-opinion has-patch dev-reviewed
Focuses: Cc:
PR Number:

Description

Since the introduction of big image scaling and automatic rotation of images in #47873 and further refined in #48304, the base file name for an image stored in the _wp_attached_file postmeta value may no longer directly relate to the filenames of image sub-sizes held in the _wp_attachment_metadata postmeta value's sizes array.

This now means the local URL for a thumbnail might be:

https://example.com/wp-content/uploads/2019/10/image-150x150.jpg

But it's base URL might be any of the following:

https://example.com/wp-content/uploads/2019/10/image-scaled-2560.jpg
https://example.com/wp-content/uploads/2019/10/image-rotated.jpg
https://example.com/wp-content/uploads/2019/10/image.jpg

That image-scaled-2560.jpg filename is dependent on a filterable threshold, and could change depending on themes or plugins that are installed and any criteria that might apply.

Previously there was an implied contract between the filenames for elements of the sizes array, they were the base file name with -WWWWxHHHH as the suffix depending on their width (WWWW, usually 3 to 4 numerics), and height (HHHH, usually 3 to 4 numerics).

This meant when a plugin such as the many many page builders and other custom content builders added an image to their content and did not use the standard functions to do so that generate an img tag with wp-image-NNN class name that could be used to find the attachment ID, you could at least parse the URL's path element, strip off the consistently formatted size information, and then find the attachment ID via its base path/filename.

This is no longer possible as the base file name for scaled or rotated images that corresponds to the sizes is now within the original_image meta field within the _wp_attachment_metadata value.

To find the attachment ID for a local image URL you could previously do a relatively fast SQL select based on the meta_key = '_wp_attached_file' and meta_value = '2019/10/image.jpg' for the above example.

Now you're going to have to concoct some UNION on the possible alternate file names, with some fudging for alternate thresholds, or a very expensive LIKE select on the meta_value of the _wp_attachment_metadata, or somehow capture the relationship between the _wp_attached_file value and the original_image in custom data (custom tables no doubt) so that there's some sort of fast lookup.

Proposal:

1) Instead of suffixing the base file name for images with either -scaled-NNNN or -rotated, leave it as per the "normal" sanitized version, but move the original file to a suffixed name such as image-original-version.jpg and store that in original_image. For clarity, it could be something like image-original-pre-scaling-2560.jpg or image-original-pre-rotation.jpg.

or

2) Ensure that the sub-sizes are generated with a matching file name. So if the "full" size is now "image-scaled-2560.jpg", there would be "image-scaled-2560-150x150.jpg" and "image-scaled-2560-300x200.jpg" etc.

As an aside, while testing I have noticed a couple of further problems related to the renaming of the base file...

1) When a large file is edited with the standard editor, the base file and sizes now have a unified image-scaled-2560-e123456789012 base, which is good. Except the twentytwenty-fullscreen size retained the original image-1980x990.jpg filename.

2) The change in filename during upload causes problems with the wp_unique_filename being handled consistently by plugins.

Attachments (1)

48453.diff (877 bytes) - added by azaozz 3 weeks ago.

Download all attachments as: .zip

Change History (19)

#1 follow-up: @azaozz
3 weeks ago

  • Keywords needs-patch 2nd-opinion added
  • Milestone changed from Awaiting Review to 5.3

Moving to 5.3 for consideration.

Yes, it's always been "problematic" to find the attachment ID from an image URL, and the new functionality in 5.3 makes it even harder. It is, as mentioned, not the proper way to do things, but it was a more or less viable fallback.

Been wondering if there should be a better image_url_to_attachment_id() function in core to help with these cases. Not having the expected wp-image-#### class name also means these images don't have srcset and sizes attributes, or at least not the values generated in core.

At first look at the proposed solutions thinking that moving/renaming of the originally uploaded file is the better one. This was also considered in the big image ticket. The argument against was that renaming the original image may introduce another (very small) possibility of errors while post-processing images, while saving another image is proven to work properly at that time. Also thinking that appending only -original or perhaps -wp-original to the file name should be sufficient. There are cases where it can reach length limits.

Any other thoughts/ideas?

This ticket was mentioned in Slack in #core by azaozz. View the logs.


3 weeks ago

#3 in reply to: ↑ 2 ; follow-ups: @Iulia Cazan
3 weeks ago

Replying to slackbot:

This ticket was mentioned in Slack in #core by azaozz. View the logs.

From my perspective, moving the "original" would only complicate things, and that is because currently (in 5.3-RC2 also) there is no information recorded in the database of it, no file path, no width/height, just nothing, and we rely on the directory mentioned in the metadata for the full size to identify it (or at least I could not find another way to do this). It will just become a nightmare to delete it or just use it to generate files as a fallback when something went wrong with the full-size.

From all my tests, the issue with the full-size naming happens only when the threshold kicks in, and the sub-sizes are not generated from the new full size generated, but from the original itself.

For example, this is quite odd for big files, let's say the uploaded file is image.jpeg of 6MB, this is then generating with the new threshold filter a full-size called image-2560.jpeg of 1MB (if the script does not fail). The sub-sizes are then generated from the 6MB file, processing which is again prone to script fails not to mention it will take a lot of time.

Why aren't the sub-sizes generated from the new full-size which is smaller and also the quality high enough as far as I could test?

Also, what is the point to keep the "original" when the threshold was applied? As far as I understand the process image-2560.jpeg can be easily renamed back after the real original file and the real original removed (this is never used as far as I can see in the code). If the code would do this switch, then I do not see a problem anymore for the naming of the sub-sizes, and also, the sub-sizes will generate faster, from a smaller file.

Did I get this right?

Last edited 3 weeks ago by Iulia Cazan (previous) (diff)

#4 in reply to: ↑ 1 @ianmjones
3 weeks ago

Replying to azaozz:

Been wondering if there should be a better image_url_to_attachment_id() function in core to help with these cases.

That would be awesome! 👍

At first look at the proposed solutions thinking that moving/renaming of the originally uploaded file is the better one.

That's my instinct too.

I expect end users would be less surprised by the filename they see everywhere being the sanitized version of what they uploaded, just as it is in WP5.2.

We also see a lot of feedback in our support related to the importance of image file names being as uploaded, people get a bit twitchy if their carefully crafted SEO friendly file names get changed. So tacking on -scaled-2560 or -rotated might be unwelcome.

Seeing as the original_image isn't expected to be used much, if at all, then I don't see much downside to that file being renamed rather than the base file used across WP.

If you know the attachment ID you can therefore easily get the _wp_attachment_metadata and check whether there is an original_image to optionally process.

And because _wp_attached_file has the relative path to uploads that gives the location of the plain file names in sizes and original_image, it's easy to locate them with dirname.

The argument against was that renaming the original image may introduce another (very small) possibility of errors while post-processing images, while saving another image is proven to work properly at that time.

I guess this is the only real concern with renaming the original rather than full/scaled image.

Also thinking that appending only -original or perhaps -wp-original to the file name should be sufficient. There are cases where it can reach length limits.

This concern with file name length possibly hitting a limit raises another good reason for renaming the original and not the scaled image.

If a large image.jpg is uploaded with the current RC2 method, and that image is not only renamed for uniqueness but then edited, we could have something like the following...

full: image-1-scaled-2560-e1572283267932.jpg
large: image-1-scaled-2560-e1572283267932-1024x768.jpg
original_image: image-1.jpg

For that large version, we're looking at an extra 38 characters at least.

However, if instead the original gets renamed we have...

full: image-1-e1572283267932.jpg
large: image-1-e1572283267932-1024x768.jpg
original_image: image-1-wp-original.jpg

That means large is 12 characters less in length, and original_image has still gained far less characters than large gained for the edit and size suffixes and so must be within bounds.

#5 in reply to: ↑ 3 @ianmjones
3 weeks ago

Replying to Iulia Cazan:

Did I get this right?

I was a bit confused by your comment, it didn't quite fit with what I'm seeing in RC2. But I hope my previous comment in reply to @azaozz helps, particularly with how easy it is to find the original_image if you already know the attachment ID.

I personally can't speak to your comments regarding the quality difference between using the original_image verses a scaled version for resizing, but I suspect someone in the know can help, and will likely start with "It depends ..."! 😉

#6 follow-up: @chrisvanpatten
3 weeks ago

I'm finding it difficult to articulate my exact thoughts, but I am increasingly concerned about these changes drifting further and further from the original construction of the feature as we get so late into the RC cycle.

Further, I think it's important to clarify that the "implied contract" is not only implied, it is assumed, and could be argued that this has always been a case of "doing it wrong". While I'm sympathetic to the problems with page builders and third party plugins, I'd actually argue that as image management and subsize generation gets more complex, there is a clear case to firmly break from this implied contract, and reinforce that WordPress should be fully trusted to handle filenames, and that third party plugins should stop assuming that image URLs are a constant.

#7 in reply to: ↑ 6 @Iulia Cazan
3 weeks ago

Replying to chrisvanpatten:

and that third party plugins should stop assuming that image URLs are a constant.

I am not sure that the third-party plugins are a problem in this discussion. I think that the discussed changes in the naming of files are affecting a lot of areas of WordPress, SEO, etc., that's why it is concerning.

#8 @chrisvanpatten
3 weeks ago

I am not sure that the third-party plugins are a problem in this discussion. I think that the discussed changes in the naming of files are affecting a lot of areas of WordPress, SEO, etc., that's why it is concerning.

The entire point of the original ticket is about the compatibility problems experienced by third party plugins.

I don't think SEO should at all be a concern here. WordPress has long appended things to non-original image URLs, and that practice is not changing. If you insert a "medium" or "large" sized image (as most likely do because loading a "full" sized image directly in a block post is going to dramatically slow down your page), it will now — as ever — append dimensions. I don't see the addition of -scaled-2560 being materially different from how it works today with the 1234x1234 style suffixes.

#9 in reply to: ↑ 3 @azaozz
3 weeks ago

Replying to Iulia Cazan:

From my perspective, moving the "original" would only complicate things,

Yes, think so too. Another good/important reason is that WordPress always tries its best to keep content added by users unchanged. This includes uploaded files.

Also, what is the point to keep the "original" when the threshold was applied?

Same reason as above. Also it lets WP create better quality image sub-sizes in the future and also lets it be used as "media storage".

#10 @azaozz
3 weeks ago

After quite a bit of testing thinking that the problem reported here would be fixed best by updating/extending the DB query when looking at _wp_attached_file meta values.

Trying to guess the attachment ID from an image URL has always been a hit-and-miss process. Ideally there should be no need for that, but... some themes and plugins are not using the WP functions and "doing-it-wrong". Adding a (somewhat standardized) function to handle this in core seems to make most sense (yeah, it means that yet again core will be fixing the "doing-it-wrong" practices in themes and plugins...).

For an image named landscape.jpg currently the value in _wp_attached_file may be landscape.jpg, landscape-scaled-####.jpg or landscape-rotated.jpg. Thinking that removing the variable -#### bit from the file name for scaled images, and updating the DB query to do

WHERE ( meta_value='landscape.jpg' OR meta_value='landscape-scaled.jpg' OR meta_value='landscape-rotated.jpg' )

would be the most straightforward fix here.

This is fully backwards compatible and the DB query will be almost as fast as before (assuming it has LIMIT 1).

Last edited 3 weeks ago by azaozz (previous) (diff)

#11 follow-up: @chrisvanpatten
3 weeks ago

@azaozz What’s the context where this would be used in core? I’m concerned because we’ve had a ton of problems with performance on plugins that attempt to do these sorts of “URL to ID” lookups, and had to either disable the behaviour or find different plugins.

Admittedly as a large media brand we are an edge case (~15 million postmeta rows and up on our largest sites) but these types of queries always seemed to be super problematic and introduced all kinds of performance hits on uncached requests (primarily in REST API calls).

Last edited 3 weeks ago by chrisvanpatten (previous) (diff)

#12 in reply to: ↑ 11 @azaozz
3 weeks ago

Replying to chrisvanpatten:

What’s the context where this would be used in core?

This is not used in core, and hopefully never will be. But there are plugins that need it, so they implement similar code/solutions. Was thinking that having something like that in core will "standardize" the way it's done and let core fix it when needed, like in this case.

Of course this is still just an idea :) Ideally all themes and plugins would fix their code and make these cases non-existent.

At the same time if core can change something to help plugins work better, especially when adding new functionality, I'm thinking it would be good to do the change. As far as I see the only change needed in this case is to remove the variable number form scaled image file names. That would still make these file names easily recognizable, and as this is new feature, it has no back-compat issues.

@azaozz
3 weeks ago

#13 @azaozz
3 weeks ago

In 48453.diff: remove the -#### part from scaled image file names.

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.


12 days ago

#15 @SergeyBiryukov
12 days ago

  • Keywords has-patch dev-reviewed added; needs-patch removed

48453.diff looks good to me.

#16 @azaozz
12 days ago

  • Owner set to azaozz
  • Resolution set to fixed
  • Status changed from new to closed

In 46658:

Media: Remove the variable number (from the big image threshold value) when generating file names for scaled images. This makes it easier to "calculate" the full size file name from the name of an intermediate size image.

Props ianmjones, azaozz.
Fixes #48453 for trunk.

#17 @azaozz
12 days ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopen for merging to the 5.3 branch.

#18 @azaozz
12 days ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 46659:

Media: Remove the variable number (from the big image threshold value) when generating file names for scaled images. This makes it easier to "calculate" the full size file name from the name of an intermediate size image.

Props ianmjones, azaozz.
Merges [46658] to the 5.3 branch.
Fixes #48453.

Note: See TracTickets for help on using tickets.