Opened 3 years ago
Last modified 17 months ago
#40228 reopened enhancement
Use get_sites in get_blog_details
Reported by: | spacedmonkey | Owned by: | flixos90 |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | |
Component: | Networks and Sites | Keywords: | has-patch has-unit-tests ms-roadmap |
Focuses: | multisite | Cc: |
Description
Currently get_blog_details performs raw sql queries to get data out of the blogs tables. This data is then cached. However how the caching is done in the function is horrible. It generates three different caches that store the whole wp_site object in cache.
wp_cache_delete( $blog_id , 'blog-details' );
wp_cache_delete( $blog_id . 'short' , 'blog-details' );
wp_cache_delete( $domain_path_key, 'blog-lookup' );
This function should be refactor to use get_sites and relay on the caching built into wp_site_query
Attachments (6)
Change History (52)
#2
@
3 years ago
- Owner set to flixos90
- Status changed from new to assigned
Thanks for the code snippet @spacedmonkey!
I think we should rather direct our efforts to get_site_by()
(#40180) and then wrap that function with get_blog_details()
. See https://core.trac.wordpress.org/ticket/40064#comment:19
I'll probably work on a patch later that takes your above proposal into account.
#3
@
3 years ago
I like the idea of get_site_by however it is not a full replacement for all the params that get_blog_details. get_term_by only supports getting a term by one field, such as slug. However get_blog_details allows an array to passed and lookup is done by multiple fields. This array maps pretty well to the args of get_sites.
Looking at this code again, I might skip get_sitrs if a number is passed and just use get_site.
#4
@
3 years ago
I think we should also support an array in get_site_by()
. We can definitely wrap it in get_blog_details()
. I have written a patch for that, which I'll upload in a bit.
#5
@
3 years ago
Here is a much less tidy version of the function but does things a much more quicker way. If you pass nothing or a number, just use get_site. This means that we don't need to spin up a while wp_site_query, which will use a lot lesss memory.
function get_blog_details( $fields = null, $get_all = true ) {
$args = array();
$details = null;
if ( is_numeric( $fields ) ){
$details = get_site( $fields );
} elseif ( is_array( $fields ) ) {
if ( isset($fields['blog_id']) ) {
$args['site__in'][] = $fields['blog_id'];
}
if ( isset($fields['domain']) ){
$domains = array( $fields['domain'] );
if ( substr( $fields['domain'], 0, 4 ) == 'www.' ) {
$nowww = substr( $fields['domain'], 4 );
$domains[] = $nowww;
}
$args['domain__in'] = $domains;
}
if( isset($fields['path']) ) {
$args['path'] = $fields['path'];
}
} elseif ( $fields && is_string( $fields ) ) {
$current_network = get_network();
$slug = trim( $fields, '/' );
if ( is_subdomain_install() ) {
$domain = $slug . '.' . preg_replace( '|^www\.|', '', $current_network->domain );
$path = $current_network->path;
} else {
$domain = $current_network->domain;
$path = $current_network->path . $slug . '/';
}
$args['path'] = $path;
$args['domain'] = $domain;
} else {
$details = get_site();
}
if( null === $details ){
if( empty( $args ) ){
return false;
}
$args['number'] = 1;
$sites = get_sites( $args );
if( empty( $sites ) ){
return false;
}
$details = array_shift( $sites );
}
$blog_id = $details->blog_id;
if ( $get_all ) {
switch_to_blog( $blog_id );
$details->blogname = get_option( 'blogname' );
$details->siteurl = get_option( 'siteurl' );
$details->post_count = get_option( 'post_count' );
$details->home = get_option( 'home' );
restore_current_blog();
/**
* Filters a blog's details.
*
* @since MU
* @deprecated 4.7.0 Use site_details
*
* @param object $details The blog details.
*/
$details = apply_filters_deprecated( 'blog_details', array( $details ), '4.7.0', 'site_details' );
}
return $details;
}
#6
@
3 years ago
40228-tests.diff are two additional unit tests that verify the behavior described by @stephdau in https://core.trac.wordpress.org/ticket/40180#comment:3 works as expected.
#8
@
3 years ago
- Keywords has-patch has-unit-tests dev-feedback added
- Milestone changed from Awaiting Review to 4.8
- Version 3.0 deleted
40228.diff shows a possible implementation of get_blog_details()
using get_site_by()
(see #40180). Since the patch relies on that function, it is required that you apply the patch on #40180 as well in order to test.
All regular get_blog_details()
tests pass after the change, however I had to adjust / partly remove some tests related to the blog-details
and blog-lookup
caches, since these now don't exist any longer. There are two tests failing related to the site-details
cache, however that is due to a bug in setting that cache, for which I'll open a ticket shortly.
The code in get_blog_details()
has obviously been drastically reduced, and by using get_site_by()
the patch fulfills the original requirement of using the more efficient get_sites()
/ get_site()
functions. Admittedly there is one really weird line in the new code ($site->detail = $site->detail;
), but it actually does what it is supposed to do (PHP magic method weirdness) - however this code is one of the reasons that I think we should recommend using get_site_by()
over get_blog_details()
going forward (see more below).
I think this approach makes more sense than refactoring get_blog_details()
itself. get_site_by()
is more explicit in its usage and does not explicitly set the "site details" on the object (which is a bad practice since they're now lazy-loaded, but we have to keep it in get_blog_details()
for BC. We could use get_site_by()
in other core functions, like get_id_from_blogname()
for example. These are some of the reasons why I think get_site_by()
is viable to have in core although it serves a rather similar use-case like get_blog_details()
. Let's not deprecate get_blog_details()
, but let's provide a better alternative for new development going forward.
This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
#17
@
3 years ago
- Milestone changed from 4.8 to Future Release
Let's get back to this soon, once we figured out what to do about #40180 (introduce a new function and use it vs adjusting get_blog_details()
directly).
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.
3 years ago
#20
@
3 years ago
- Keywords ms-roadmap added
- Milestone changed from Future Release to 4.9
Per today's office hours, it was decided to introduce the new function and use it to improve get_blog_details()
.
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
#23
@
3 years ago
40228.2.diff is a reworked version of get_blog_details()
to work with the latest patch in #40180.
The removal of all the custom logic in get_blog_details()
causes the cache groups blog-details
and blog-lookup
to no longer be used, so we can probably remove them completely within the next releases. Therefore these groups have been removed from related unit tests and replaced with the newer cache groups.
This ticket was mentioned in Slack in #core by jeffpaul. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
#26
@
3 years ago
40228.2.diff is looking good. It applies cleanly and tests pass after [41698]. I'll spend some more time with it tomorrow morning.
I do think we need to keep the deprecated blog_details
filter as part of get_blog_details()
. Its deprecation is a recommendation to use another filter, but we'd be breaking back-compat by removing it.
This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.
3 years ago
#28
@
3 years ago
40228.3.diff adds in the missing documentation for the old blog_details
filter and adjusts documentation for the back-compat code to be more verbose.
#30
follow-up:
↓ 32
@
3 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
These changes look like they introduce sever regressions in the actual caching of data for calls to get_blog_details
as far as I can tell from a quick review.
The issue is that by switching to WP_Site_Query
and using get_sites() as our source of caching to avoid queries we effectively remove most of the caching behind get_blog_details
.
Previously the cached data for a blog would only been invalidated if there were changes to that blogs details or they were explicitly cleared.
Now we have caching into WP_Site_Query
which is invalidated everytime we change the last_changed
value for sites
.
We change the last_changed
value for sites everytime:
- A new site is created
clean_blog_cache
is called (akarefresh_blog_details
is called)
So invalidating the data for one site invalidates them all, which on any multisite install with a reasonable number of sites makes the caching pointless - every time update_blog_status
is called because a site has new content all the caches are invalidated.
We also have added a significant amount of extra code which has to be run to fetch the info about a single site by calling get_site_by
and then get_sites
and then WP_Site_Query
instead of a simple $wpdb
query.
This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.
3 years ago
#32
in reply to:
↑ 30
@
3 years ago
Replying to westi:
The issue is that by switching to
WP_Site_Query
and using get_sites() as our source of caching to avoid queries we effectively remove most of the caching behindget_blog_details
.
Previously the cached data for a blog would only been invalidated if there were changes to that blogs details or they were explicitly cleared.
Now we have caching into
WP_Site_Query
which is invalidated everytime we change thelast_changed
value forsites
.
Do I understand you correctly that you are talking particularly about the removal of the blog-lookup
cache (used for domain/path combination lookups before)? We faced a similar issue before with #40362, and your comment confirms our approach to hold off of that one.
Given the outlined problems, my suggestion would be to add usage of that cache group back in cases where it applies (domain+path), and do something similar to a lookup by ID (the individual sites
cache is also more granularly invalidated than last_changed
used by WP_Site_Query
). So we'd keep using get_sites()
in case of a cache miss for the original caches.
Does that sound good to you?
#33
@
3 years ago
- Severity changed from normal to blocker
I think it is more nuanced than that but lets compare in detail to see.
The current release version of get_blog_details
leverages caching as follows:
1) blog-lookup
cache for converting domain + path => blog_id
2) blog-details
cache for converting a blog_id => The response of get_blog_details
3) sites
cache (inside WP_Site) to populate the WP_Site object on construction via get_instance
4) site-details
cache - This is potentially used for the extended properties on the WP_Site
object but not directly by get_blog_details
as it sets the extended properties itself by switching and calling get_option
This gives us up to four different cache groups being used, which are all global groups keyed by blog_id and invalidation only happens to cache items when the object they reference has changed - i.e. invalidation is tied to mutation of the state of a blog_id.
The code in trunk leverages caching as follows:
get_blog_details
calls get_site_by
calls get_sites
calls WP_Site_Query::query
calls WP_Site_Query::get_sites
this path leverages caching as follows:
1) sites
cache inside WP_Site_Query::get_sites
which is keyed off a cache key based on the query and last_changed timestamp for sites.
2) _prime_site_caches
is called with the list of found IDs to prime the sites
cache group with the wp_blogs
data for each found ID
3) get_site
is called for each id to get the WP_Site
object which uses the existing or primed data in the sites
cache group.
So yes, maybe bringing back the blog-lookup
cache would make this change safer, but also I think we need to not call get_site_by
if we have a blog_id
and we should review and document the change in query count and complexity when the caches are all empty.
Side note: is WP_Site_Query safe for large multi-site installs aka does it self neuter bad queries?
Also, I think this ticket is now a blocker for 4.9?
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
#35
@
3 years ago
Addes some improves to the get_site_by
function.
- If the lookup is by id, don't use
get_sites
, just useget_site
and do the network_id compare using theWP_Site
object for better performance. - Implement the
blog-lookup
cache inget_site_by
to save lookup on domain + path, as this cache key is invalidated inclean_blog_cache
andget_sites
is invalidated a lot in high traffic sites.
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
#37
@
3 years ago
We discussed this in detail in today's meeting and came to the conclusion that the commit should be reverted for now:
- The original idea was to bring back the
blog-lookup
cache, since it's less aggressively invalidated (per site instead of per any site). We wanted to leverage it inget_site_by()
first, however it caches full site objects including their site details which is not what's needed inget_site_by()
. So we then considered bringing it back toget_blog_details()
itself. - However, there's also the
blog-details
cache, which is in a way more stable than either of thesites
orsite-details
cache since it also caches sites not found (as -1).
The above with all the other complexity of get_blog_details()
involved brought us to the conclusion to revert this change for now.
In addition, get_site_by()
itself should be reverted as well for now and rescheduled for 5.0: We need to generally work on improving caching for WP_Site_Query
(and WP_Network_Query
) as well as consider storing not found values for the sites
and networks
caches too. While get_site_by()
makes sense as an explicit and less complex replacement for get_blog_details()
, it is not ready yet in terms of caching, where it currently falls short of get_blog_details()
under specific circumstances. When we have the new code's caches work as well as the old ones, we can introduce get_site_by()
as a proper replacement for get_blog_details()
and reconsider using it in the old function too.
#38
@
3 years ago
Thanks for all your hard work on this, reading the logs it does sound like moving this change to 5.0 is a better idea than trying to resolve all the issues this close to the release of 4.9.
#41
@
3 years ago
- Keywords dev-feedback removed
- Milestone changed from 4.9 to 5.0
- Severity changed from blocker to normal
This can be revisited when caching in get_site_by()
works as well as (or better than) that in get_blog_details()
.
Here is what a get_blog_details function would look like using get_sites. It basically creates different args and passes it to get_sites and returns 1 site.
function get_blog_details( $fields = null, $get_all = true ) { $args = array(); if ( is_array( $fields ) ) { if ( isset($fields['blog_id']) ) { $args['site__in'][] = $fields['blog_id']; } if ( isset($fields['domain']) ){ $domains = array( $fields['domain'] ); if ( substr( $fields['domain'], 0, 4 ) == 'www.' ) { $nowww = substr( $fields['domain'], 4 ); $domains[] = $nowww; } $args['domain__in'] = $domains; } if( isset($fields['path']) ) { $args['path'] = $fields['path']; } } elseif ( is_numeric( $fields ) ){ $args['site__in'][] = $fields; } elseif ( $fields && is_string( $fields ) ) { $current_network = get_network(); $slug = trim( $fields, '/' ); if ( is_subdomain_install() ) { $domain = $slug . '.' . preg_replace( '|^www\.|', '', $current_network->domain ); $path = $current_network->path; } else { $domain = $current_network->domain; $path = $current_network->path . $slug . '/'; } $args['path'] = $path; $args['domain'] = $domain; } else { $args['site__in'][] = get_current_blog_id(); } if( empty( $args ) ){ return false; } $args['number'] = 1; $sites = get_sites( $args ); if( empty( $sites ) ){ return false; } $details = array_shift( $sites ); $blog_id = $details->blog_id; if ( $get_all ) { switch_to_blog( $blog_id ); $details->blogname = get_option( 'blogname' ); $details->siteurl = get_option( 'siteurl' ); $details->post_count = get_option( 'post_count' ); $details->home = get_option( 'home' ); restore_current_blog(); /** * Filters a blog's details. * * @since MU * @deprecated 4.7.0 Use site_details * * @param object $details The blog details. */ $details = apply_filters_deprecated( 'blog_details', array( $details ), '4.7.0', 'site_details' ); } return $details; }