Opened 3 years ago
Last modified 9 months ago
#40180 assigned enhancement
Introduce `get_site_by()` function for multisite
Reported by: | flixos90 | Owned by: | flixos90 |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | |
Component: | Networks and Sites | Keywords: | has-patch has-unit-tests ms-roadmap needs-refresh |
Focuses: | multisite | Cc: |
Description
As @stephdau mentioned on #40064, the new get_site()
function from 4.6 is not a direct replacement for get_blog_details()
which we are planning to deprecate.
Therefore I think it makes sense to introduce a function get_site_by()
that is basically a wrapper for a WP_Site_Query
(with LIMIT 1
), and then runs array_shift()
to only return the first result. This function would then become the direct replacement that people who previously relied on get_blog_details()
with something other than an ID could use.
Parameter-wise, we should probably adjust it a little bit to match things like get_term_by()
, which also makes the way that function works more explicit.
Attachments (6)
Change History (50)
#2
@
3 years ago
There is one thing we'll need to contend with, for consistency, with other get_*_by()
instances: get_user_by()
, get_term_by()
, etc all assume that the 1st parameter is a single $field
, such as login
(user) or slug
(term).
In the case of get_blog_details()
, that same equivalent parameter can accept an array of $fields
, such as array( 'domain' => 'blah.com', 'path' => '/' )
;
#3
@
3 years ago
This will also have to be considered:
foreach ( get_site( 52723464 ) as $key => $value ) { echo "$key, "; } blog_id, domain, path, site_id, registered, last_updated, public, archived, mature, spam, deleted, lang_id, ds_blog, ds_stats, foreach ( get_blog_details( 52723464 ) as $key => $value ) { echo "$key, "; } blog_id, domain, path, site_id, registered, last_updated, public, archived, mature, spam, deleted, lang_id, ds_blog, ds_stats, blogname, siteurl, post_count, home,
#4
@
3 years ago
@stephdau We can likely handle the parameters in the same way as a function like add_query_arg()
works: You can either pass a key and a value as two separate parameters, or an array of key => value pairs (in that case the second parameter would be ignored). This makes the function flexible enough for these use-cases.
Thanks for bringing up the object iteration as well.
#7
@
3 years ago
- Keywords has-patch dev-feedback has-unit-tests added; 2nd-opinion removed
40180.diff is a possible implementation for get_site_by()
, including unit tests. It accepts either a $field
and its $value
as parameters or a single array of fields and their values.
I decided to not make the function a one-to-one replacement for get_blog_details()
as we're probably not going to deprecate that function. However, get_site_by()
supports everything that get_blog_details()
needs and does that in a much more efficient way, so I suggest we should use the new get_site_by()
in get_blog_details()
- all the latter needs to do then is tweak how the parameters are passed to it and modify the return value appropriately.
#8
@
3 years ago
I just added a patch on #40228 to demonstrate how this function could be used to improve get_blog_details()
as well. My comment https://core.trac.wordpress.org/ticket/40228#comment:8 also goes a bit more into detail why I think that modifying get_blog_details()
is not sufficient here and why it's worth introducing get_site_by()
.
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
#14
@
3 years ago
- Milestone changed from 4.8 to Future Release
This needs more discussion and review. If we introduce such a function, it makes sense to bring it in alongside enhancements like #40364.
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.
3 years ago
#17
@
3 years ago
- Keywords ms-roadmap added
- Milestone changed from Future Release to 4.9
Per today's office hours, it was decided to introduce the new function and use it to improve get_blog_details()
.
Let's think about possibly finding a more suitable name. The function is not entirely equivalent to something like get_term_by()
, since it also supports an array, so a different kind of name might make sense.
#18
@
3 years ago
40180.2.diff makes a few improvements:
- It adds support for getting a site by only its path, if the current setup is a subdirectory install. Example:
get_site_by( 'path', '/foo/' )
- It introduces a new idea of simply passing a URL (domain-path combination), which is a lot quicker than passing an array of domain and path. Example:
get_site_by( 'url', 'mysite.com/foo/' )
- An additional tweak in the beginning of the functions ensures that
get_site()
is called as an easy shortcut when only an ID is passed. - Further unit tests have been added/adjusted.
I'm starting to like the url
argument. Actually, that argument would allow us to completely get rid of the requirement of supporting an array, which would make the function much simpler. The only thing that people might wanna pass in addition is the network ID, but since that only applies to multi-network installs, it could as well be an extra third parameter, as it is in several other core functions. I will create another patch with that proposal.
#19
@
3 years ago
40180.3.diff implements a more straightforward version of get_site_by()
, where it only accepts a single parameter (as mentioned in https://core.trac.wordpress.org/ticket/40180#comment:18). With this kind of functionality, it actually works similar like get_term_by()
, thus the name now makes sense.
The idea is that the new url
field is a combination of domain and path and thus covers that use-case in a simple way. Unit tests for the function are included.
I like this approach far better, and it clears up the naming issues we discussed on Tuesday. Furthermore get_site_by( 'url', 'mysite.com/foo/' )
is simpler than get_site_by( array( 'domain' => 'mysite.com', 'path' => '/foo/' ) )
.
#20
@
3 years ago
I really like 40180.3.diff. My problem with this function as been passing an array to it. It makes this function stick out compared to get_term_by. By the param url is a better work around for this.
Do we have a use for it core other than get_blog_details
? We replaced nearly all calls to the blogs table directly a while ago..
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
#22
@
3 years ago
- Keywords needs-dev-note added
40180.4.diff fixes two issues with the previous patch:
- If a domain starting with
www.
is given, the query is modified to be sorted bydomain_length
inDESC
order: If you provide awww.
URL and there actually is awww.
and its variant withoutwww.
in the database, you don't want the non-prefixed site to be returned. Seeget_blog_details()
where this currently happens to. - Paths given are now correctly parsed. In the previous patch it was possible for the path to end up as
//
which is invalid.
More tests have been added to verify the function works as expected.
This function is good to go in IMO, waiting for another review now. :)
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
#24
@
3 years ago
In addition to improving get_blog_details()
(see #40228), get_site_by()
can also be leveraged in the following functions:
get_id_from_blogname()
get_blog_id_from_url()
(this will also get rid of theblog-id-cache
cache group)domain_exists()
These can be dealt with in separate tickets post-merge.
#25
follow-up:
↓ 26
@
3 years ago
Thanks for all the work on this, @flixos90, it's looking good!
- I remember being a little wary of
url
as an argument, but it does seem to fit pretty well. - I'm a little unsure of the
domain
andpath
arguments as implemented. We don't have the sameis_subdomain_install()
restrictions inget_sites()
. I would argue that if somebody is looking for protection like that, they should adjust based on their awareness of the environment rather than maybe go away thinking that a site doesn't exist. I may be biased because in the current form, I would not be able to useget_site_by( 'domain' ... )
in my configuration. - For the
slug
argument, I'm not sure we can assume thatpath
will always be$network->path
whenis_subdomain_install()
. I would go the other way and assume that we could leavepath
off entirely.
Everything else seems solid. I'll stare at the unit tests some more later as well.
#26
in reply to:
↑ 25
;
follow-up:
↓ 27
@
3 years ago
Replying to jeremyfelt:
I'm a little unsure of the
domain
andpath
arguments as implemented. We don't have the sameis_subdomain_install()
restrictions inget_sites()
. I would argue that if somebody is looking for protection like that, they should adjust based on their awareness of the environment rather than maybe go away thinking that a site doesn't exist. I may be biased because in the current form, I would not be able to useget_site_by( 'domain' ... )
in my configuration.
I think these restrictions make sense, as I see get_site_by()
as a function that should generally only run a query which will produce a single result (even if it didn't have the limit
argument). I don't think it should be a "here is some input, give me a random site for which this data applies" function. Using a full url
will always fulfill this requirement (unless someone has similar www and non-www URL sites, which is edge-case though), using a slug
will too on regular multisite setups. domain
and path
are basically more specific variants of slug
, to be used when it's already clear that the setup is subdomain/subdirectory (get_blog_details()
supports this only for subdomain installs for some reason, but it does support the general idea).
I may be a bit conservative here, but what I'm essentially saying that slug
, domain
and path
only exist to account for the typical multisite (where no sub-site has an actually distinct non-subdomain domain name etc.). With anything custom, get_site_by()
should mostly be used with url
(and id
which always works of course).
For the
slug
argument, I'm not sure we can assume thatpath
will always be$network->path
whenis_subdomain_install()
. I would go the other way and assume that we could leavepath
off entirely.
I took this bit of code from get_id_from_blogname()
which works that way. I think it should stay that way since that is how a regular multisite network works. You either only use subdomains or subdirectories. Everything else is, I'd say, custom and the slug
way alone cannot account for this.
#27
in reply to:
↑ 26
;
follow-up:
↓ 28
@
3 years ago
Replying to flixos90:
I may be a bit conservative here, but what I'm essentially saying that
slug
,domain
andpath
only exist to account for the typical multisite (where no sub-site has an actually distinct non-subdomain domain name etc.). With anything custom,get_site_by()
should mostly be used withurl
(andid
which always works of course).
Ok, that make sense. After poking around a bit more, I'm on the same page.
For the
slug
argument, I'm not sure we can assume thatpath
will always be$network->path
whenis_subdomain_install()
. I would go the other way and assume that we could leavepath
off entirely.
I took this bit of code from
get_id_from_blogname()
which works that way. I think it should stay that way since that is how a regular multisite network works. You either only use subdomains or subdirectories. Everything else is, I'd say, custom and theslug
way alone cannot account for this.
Agreed. I did try setting a path before installing multisite and was reminded that you can't install in subdomain mode when that's the case, so if get_id_from_blogname()
is doing it that way, then it may be useful for somebody and we should probably just let it be. :)
I tweaked the tests in 40180.5.diff:
- Removed the assertions that the returns were
WP_Site
objects. We should be asserting one thing per test, and if it's not returning the object, then other things are going to fail. - Added 2 explicit tests for retrieving a site by slug on specific networks. I see that the other tests use different networks, but I think having the explicit behavior tested is good.
The tests for get_site_by( 'id' ...)
gave me pause. I definitely don't think we need to account for somebody passing null
as the ID value when they can just use get_site()
. I think that test can be removed.
As for id
itself, are we doing this just for parity with other functions? I can't imagine this being more useful than get_site( $id )
.
#28
in reply to:
↑ 27
;
follow-up:
↓ 29
@
3 years ago
Replying to jeremyfelt:
Removed the assertions that the returns were
WP_Site
objects. We should be asserting one thing per test, and if it's not returning the object, then other things are going to fail.
The reason I put these extra assertions in was that I wanted to have failures rather than errors (if get_site_by()
returns null which I'd consider a test failure, it throws an error without the WP_Site
assertion). But I see the one assertion per test argument, so I agree with that as well.
- Added 2 explicit tests for retrieving a site by slug on specific networks. I see that the other tests use different networks, but I think having the explicit behavior tested is good.
Good catch, that makes sense.
The tests for
get_site_by( 'id' ...)
gave me pause. I definitely don't think we need to account for somebody passingnull
as the ID value when they can just useget_site()
. I think that test can be removed.
As for
id
itself, are we doing this just for parity with other functions? I can't imagine this being more useful thanget_site( $id )
.
Right, I agree there's probably nobody who will use get_site_by( 'id', ... )
over get_site( ... )
, but I did it for parity which I still think makes sense.
As far as putting null
goes, I agree that this test can be removed. While it would still work because it just calls get_site()
, trying to get a site by a specific ID, which the function name implies, and then passing null
doesn't make sense. Should we actually catch that behavior and return null
in such a case?
#29
in reply to:
↑ 28
@
3 years ago
Replying to flixos90:
As far as putting
null
goes, I agree that this test can be removed. While it would still work because it just callsget_site()
, trying to get a site by a specific ID, which the function name implies, and then passingnull
doesn't make sense. Should we actually catch that behavior and returnnull
in such a case?
That's tough. We probably aren't consistent in core with how much protection we offer. The function signature implies through documentation that string|int
is expected, which I'd expect to be enough. We could do something similar to get_term_by()
and check for a string length greater than 0 if slug
, domain
, path
, or url
and for a positive int
if id
.
#30
@
3 years ago
In 40180.5.diff get by id, calls get_site
. But what does the network_id
mean in this context. It is completely ignored. The network id means, get a site in this network. I think we shouldn't exist early here, but just pass site__in
arg to get_sites
.
This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.
3 years ago
#32
@
3 years ago
- Owner changed from flixos90 to jeremyfelt
- Status changed from assigned to reviewing
In 40180.6.diff:
- Ensure a non-empty string is passed for string fields. Previously, if an empty string was passed for something like
domain
,get_sites()
would return the results of a default query (get_sites( array( 'domain' => '' ) );
). - Add tests for empty string values.
- Ensure a numeric value is passed when
id
is used to avoid an unnecessary query attempt. - Use the
site__in
parameter whenid
is used as a parameter so that thenetwork_id
parameter continues to work as expected.
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
3 years ago
#36
@
3 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Due to issues that came up in https://core.trac.wordpress.org/ticket/40228#comment:32, we revisited the way caching in get_site_by()
currently works.
We discussed the topic in detail today and came to the conclusion that get_site_by()
should be reverted for now and rescheduled for 5.0: We need to generally work on improving caching for WP_Site_Query (and WP_Network_Query) to use special caches for common use-cases, which can be invalidated less aggressively (currently any change to any site/network invalidates every site/network query cache). We should furthermore consider storing not found values for the sites
and networks
caches too.
To be clear: get_site_by()
itself still is probably ready, but WP_Site_Query
that it uses needs to improve how it caches results for the function to be really beneficial.
#37
@
3 years ago
- Owner changed from jeremyfelt to flixos90
- Status changed from reopened to assigned
I'm liking the
get_site_by()
idea, from a processing standpoint.I'm still not super fond of the idea of deprecating
get_blog_details()
, primarily because of the amount of person-hours that 3rd-parties will end up spending on moving to the new function, and users upgrading plugins/etc because of it (EG: I have 559 instances of the use ofget_blog_details()
outside of Core in the codebase I work on). But at least having a direct replacement means devs have more options (regex replacements, etc).Keeping the full functionality around is ultimately what matters, so it's still a more viable option than the previous plan.
Thanks, @flixos90.