From YouTube: English Google SEO office-hours from October 8, 2021
Description
This is a recording of the Google SEO office-hours hangout from October 8, 2021. These sessions are open to anything search & website-owner related like crawling, indexing, mobile sites, internationalization, duplicate content, Sitemaps, Search Console, pagination, multi-lingual/multi-regional sites, etc.
Find out more at https://goo.gle/seo-oh-en
Feel free to join us - we welcome folks of all levels!
B
All right, welcome everyone to today's Google Search Central SEO office-hours hangout. My name is John Mueller. I'm a Search Advocate at Google here in Switzerland, and part of what we do are these office-hours sessions where people can jump in and raise their hands and ask questions. We also have a bunch of questions submitted on YouTube, but maybe we'll get started with the first ones.
B
Here, let's see. I'm not quite sure how the order was originally, because it's swapped around a little bit, but I'll just go from the top. I'm sure we'll make it. Come on.
C
We have exam websites. People add some exams and give labels for the exams, and Google indexes all the pages, all the label pages, like tag pages. Their role is filtering the exams based on the tag, based on the topic, and Google indexes all the topics, you know. But most of them include the same exams, like duplicate pages, and we canonicalized these URLs at first, but it caused a lot of crawling time for Google. For example, we used a third-party tool to crawl all the pages.
C
It was about 100,000 pages. Then we added noindex and robots.txt rules, and the crawling time decreased, and that is fine. But we want to remove some of these pages which were indexed by Google. For example, we find them in the search results, and we don't want that. How can we do that? We gave them noindex, and Google cannot access them.
B
I would just leave them as noindex, and what will happen is, over time, when we recrawl those pages, they will drop out. That can take, maybe, I don't know, a couple of months, but it's not that we would crawl those pages more just because of the noindex. Essentially, when we crawl them, we will see the noindex and then we will drop them, and usually, if we see a noindex, then we would not crawl them.
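A minimal sketch of how such a noindex is typically set, with made-up content; note that the page has to stay crawlable, or Googlebot never gets to see the tag:

    <!-- Hypothetical tag page that should drop out of the index once recrawled. -->
    <!DOCTYPE html>
    <html>
    <head>
      <title>Tag: algebra exams</title>
      <!-- Crawling must stay allowed in robots.txt, or this tag is never seen. -->
      <meta name="robots" content="noindex">
    </head>
    <body>...</body>
    </html>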
C
Does it affect our SEO performance negatively, in the sense that you will see that these are duplicate pages, but canonicalized, at the same time?
B
No, no, it's perfectly fine! It's purely a matter of the crawling side of things. If we see that they're duplicates, then we will try to treat them as duplicates, and we will just focus on one canonical URL for them. So, from that point of view, I would just let it be reprocessed and drop out on its own.
B
Thank you very much. Sure, La Gray?
D
We have a lot of pages, like a thousand pages, that don't get any traffic, that are old, so I've been recommending removing those. But there's a question that our dev team has: they were under the impression that the more pages Google has indexed of your site, the higher authority it ascribes to the site, and they are reticent to remove any pages.
B
Yeah, so it's definitely not the case that if you have more pages indexed, we think your website is better. That, at least, is absolutely not the case. Sometimes it makes sense to have a lot of pages indexed; sometimes there are useful pages to have indexed like that. But it's not a sign of quality.
B
With regards to how many pages are indexed: especially if you're talking about something on the order of, I don't know, a thousand, two thousand, five thousand pages, that's a pretty low number for our systems in general. And it's not that we would say, oh, 5,000 pages is better than 1,000 pages. For us it's all kind of like, well, it's a small website, and we make do with what we can pull out there. And, of course, "small website" is relative.
B
It's not like saying it's an irrelevant website. It might be small, but it might still be very useful. But it's certainly not the case that just having more pages indexed is a sign of quality. Okay?
B
Cool. La Gray, maybe we'll try you again.
E
There, can you hear me? Yes, good. Connecting from work and connecting from home is different. Anyway, back in mid-July we started noticing a lot of errors in Search Console: submitted URL marked noindex. The URLs themselves do not have a noindex on them, but on the subsequent crawl they get indexed.
E
The problem is that, you know, we get 300 noindex errors, and then on subsequent crawls only five get crawled before they recrawl, you know, so many more. So, given that they are noindex, and granted, if things can't render, or they can't find the page, they're directed to our on-page not-found, which does have a noindex, so I know somehow they're getting directed there. Is this just a memory issue? Or, since they're able to get subsequently crawled fine, is it just a...
B
It's hard to say without looking at the pages, so I would really try to double-check whether this was a problem then and is not a problem anymore, or whether it's still something that kind of intermittently happens. Because if it doesn't take place anymore, then, like, no...
B
Yeah, my hunch, without knowing your site, is that something with the rendering is sometimes going wrong, and it's reaching that error page that you mentioned. If that's something that still takes place, I would try to figure out what might be causing it. It might be that when you test the page in Search Console, nine times out of ten it works well, but that one time out of ten it doesn't work well and redirects to the error page, or we think it redirects to the error page.
B
That's the case I would try to drill down into, and try to figure out: is it that there are too many requests to render this page, or is there something complicated with the JavaScript that sometimes takes too long and sometimes works well? And then try to narrow things down from that point of view.
B
Okay, so that's kind of the direction I would go. And if you can't isolate exactly what is going wrong, then I would look at the number of requests there and just see if there are ways that you can minimize that. Maybe the developer team can combine the different JavaScript files, or combine the CSS files, minimize the images, or things like that. Okay.
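As an illustration of that suggestion, the consolidation could look like this in the page's HTML; the file names are made up:

    <!-- Before: many separate requests for the renderer to fetch. -->
    <script src="/js/menu.js"></script>
    <script src="/js/cart.js"></script>
    <script src="/js/tracking.js"></script>
    <link rel="stylesheet" href="/css/base.css">
    <link rel="stylesheet" href="/css/theme.css">

    <!-- After: combined bundles, so fewer requests can fail or time out. -->
    <script src="/js/bundle.js"></script>
    <link rel="stylesheet" href="/css/bundle.css">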
G
Hi John, it's me again. Yeah, last time we talked about some problems with a website, an e-commerce website where we have informational stuff and transactional stuff, and your advice was to separate this content a little bit into transaction-oriented and information-oriented pages.
G
But I have another question regarding this. If you have, let's say, an e-commerce website, and you have a huge blog or a magazine or something like that, where you have loads of information, but it's its own section, and on the other hand you have all these product pages and categories and so on: would this huge blog with purely informational stuff give the whole website kind of an informational touch or character, so that Google says, oh...
G
Okay, so we don't have the risk that, by adding more and more text content, we kind of dilute the product pages or something?

B
I don't think so. They kind of have an isolated archive section, and those are very different intents: whether you want something that is happening right now, or some kind of informational, evergreen, research-type content. And there, too, we kind of have to look at it on a per-page basis, and not say, oh, this is a research website because there's some research content here.

G
All right, cool. Thanks a lot, John.

B
Let's see, Kaishi? I don't know how to pronounce your name. I'm sorry.
H
Sure. Okay, so, I have this question for you. We are seeing that people are linking to us, through backlinks, to our, let's say, subcategory pages. And the problem is that our content comes and goes, which means that sometimes there is more content appearing in some categories, and sometimes the content gets deleted, so some categories can be created and can disappear as well. And we are seeing a bunch of 404s coming from backlinks, because they are linking to subcategories that no longer exist.
H
My question here is: is it okay to redirect these links to the parent category? And if we do so, how do we do that, with a 301 or a 302? A 302, for example, like a temporary redirect, because in the future this subcategory might be populated with content again, so it's not a permanent redirect.
B
Yeah. So, if we see this happening at a larger scale, where you redirect to the parent level, we would probably see that as a soft 404, and we would say, well, the old page is gone, and instead of a 404 code you're redirecting. Maybe that's better for users, but we see it as a 404.
B
So, from a practical point of view, I suspect there's little SEO difference whether you redirect or not. If it makes sense from a user point of view to redirect, then I would just go for it. It's not that you have a penalty either way. So that's kind of the first thing. With regards to 301 or 302...
B
I don't think it matters there, because we would either see this as a soft 404, or we would see it as a canonicalization question. If it's a soft 404, then the code doesn't matter. If it's a canonicalization question, then it comes down to which URL we show in the search results, and usually the higher-level one will have stronger signals anyway, and we will focus on the higher level. So it doesn't matter whether that's a 301 or a 302.
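For reference, a redirect like the one discussed here is usually a one-line server rule. A minimal Apache sketch with hypothetical paths; per the answer above, the choice of 301 versus 302 is unlikely to matter much in this case:

    # Hypothetical rule: send a removed subcategory to its parent category.
    Redirect 301 /category/discontinued-subcategory /category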
H
My
follow-up
question
to
this
is
basically,
if,
if
we
do
because
we
do
think
it's
better
for
the
user
to
to
see
the
the
category
page
instead
of
a
blank
page,
let's
say
say,
because
the
the
the
parent
category
is
usually
very
strongly
related
to
the
category.
H
But my concern here is: if we do this kind of redirection, I don't know if it could impact future crawling of the subcategory when the subcategory appears again. Because, as I said, content can come back into the subcategory, but if we did redirect it in the past, then maybe, I don't know, maybe Google doesn't crawl that subcategory anymore. That's my follow-up concern.
B
Yeah, I suspect there's a minimal difference, but I don't know which one would be better. So that's kind of the first thing, because if we see it as a soft 404, it would be treated like a 404, and we would slow down crawling of that particular URL, because, like, there's nothing here, why do we have to crawl it every day?
B
If
we
see
it
as
a
redirect,
then
we
would
also
say:
well.
We
don't
need
to
crawl
this
every
day,
because
we
focus
on
the
primary
url.
So
I
I
think
in
both
of
those
cases
it
we
would
slow
down
crawling
of
that
url
until
we
get
new
signals
that
tell
us.
Actually,
this
is
maybe
something
new
again
and
the
new
signals.
B
I
I
think
that
would
be
the
stronger
sign
that
would
be
like
internal
linking
or
sitemap
file
things
like
that,
and
that
would
be
the
stronger
sign
for
us
to
crawl
again,
but
I
think
the
slowing
down
of
crawling
would
be
similar
in
in
all
of
these
cases.
It
might
be
like
maybe
there's
a
minimal
difference
between
some
of
them,
but
I
don't
know
which
one
would
be
faster.
For
example,
okay,.
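As a hedged illustration of those "new signals": re-listing a revived URL in a sitemap file with a fresh lastmod, alongside restoring internal links to it, is the kind of hint being described. The URL and date are made up:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- Hypothetical revived subcategory, announced with a fresh lastmod. -->
      <url>
        <loc>https://example.com/category/subcategory/</loc>
        <lastmod>2021-10-08</lastmod>
      </url>
    </urlset>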
I
Hey John. So, about a year ago, we saw a significant decrease in traffic. After the audit, all the signals pointed to the site having site quality issues. We were able to address those issues by February this year, and by the June core update we saw some increases, but it's still not at the level where we used to be before the decrease about a year ago. So my question is, like, about the site quality issues...
B
I think the tricky part here is: it's not so much that we would consider it a situation where you have to fix something, but rather, when it comes to relevance, if you work on improving the relevance of your website, then you have a different website. You have a better website. So it's not that we would switch back and say, oh, the issue is fixed and we will change it back to the previous state, but rather you're saying, well...
I
Understood. And when we talk about site quality issues, I think, from what we've been able to see, those were mostly, let's say, technical and user experience issues, and not content quality issues, meaning that, content-wise, I think we are very solid. But we had more ads than you should have on a page, and that's been addressed, and overall the user experience has been improved. Everything that we did pointed to those being the reason.
B
Now
I
I
think
it's
it's
kind
of
tricky
because,
with
the
core
updates,
we
we
don't
focus
so
much
on
just
individual
issues,
but
rather
like
the
relevance
of
the
website
overall,
and
that
can
include
things
like
the
the
usability
and
the
the
ads
on
the
page.
But
it's
essentially
the
the
website
overall
and
usually
that
also
means
kind
of
the
the
focus
of
the
content,
the
way
you're
presenting
things
the
way,
you're
you're,
making
it
clear
to
users.
B
What's behind the content, like what the sources are: all of these things kind of play in. So, just going in and changing everything around the content, I think you can probably get some improvements there. But essentially, if you really want Google to see your website as something significantly better, you probably also need to work on the content side, at least from the focus point of view, and think about where there might be low-quality content.
J
Morning. I hope this is the right camera. Yes, thank you, John. So, my video is off because I'm having some bandwidth issues. My first question, and all my questions are small, and they're around anchor text: when we are writing a page on our own website and we curate content, for example from WebMD, we are taking a snippet and using it as a point of reference, and if we were to give a citation to WebMD, in other words a source URL, I prefer to use it in the footer.
J
Should we give a link back to WebMD, or should we not? Because I have not found a reference anywhere that Google wants you to link back, you know. So why should I give any more links than WebMD already has, you know, etc.? So that was my first question: what's the best practice from a Google point of view?
B
I think, if you're quoting something, then linking to the source always makes sense, purely from a usability point of view. I think that would make sense. With regards to SEO for your website...
B
I don't know if you would see any particular change by specifically linking to other people's websites, because it's also one of those spammy techniques that used to be used quite a bit, where you would create a low-quality page and, at the bottom, you would link to CNN and Google and Wikipedia.
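A small illustration of the quoting pattern under discussion, with a hypothetical article URL; this is one common way to mark up a quoted snippet with its source, not a Google requirement:

    <blockquote cite="https://www.webmd.com/example-article">
      "...quoted snippet used as a point of reference..."
    </blockquote>
    <p>Source: <a href="https://www.webmd.com/example-article">WebMD</a></p>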
J
Yeah, and that's the reason I was talking about linking. I gave a reference to WebMD, the authority. But when we are linking to somebody, there are some legal issues, from an FTC guideline: when you link, you're kind of giving a recommendation, and the user might think, okay, I'm going to click here and do something. And then I receive letters from lawyers: hey, stop linking to us, because we don't want to be linked. So that's why I have not linked to people, okay? So I wanted to know, from an SEO perspective now...
J
If my page talks about SEO services, I can use anchor text like "to learn more, check my SEO services page". Now, that's an anchor text that worries me, because while it's a great anchor text that would benefit me, and from a web accessibility point of view it's the kind of link I should use, you know, from an SEO point of view it's like over-optimization coming back at me. So how do I handle that, you know, coming back, because there are risk levels, you know?
B
I
I
mean
you
usually
what
what
happens
is
we
look
at
the
web
and
we
find
all
kinds
of
links
if,
if
you're
creating
content
on
multiple
platforms,
I
would
try
to
use
useful,
useful
anchor
text
that
gives
us
more
information
about
the
page
that
you're
linking
to.
B
Use a good anchor text instead. Like, I would try to use good anchor text and link to your content so that it's clear what that content is. It's the same thing with internal linking: don't say, well, I need to vary my anchor text, or I need to make it look like it's not optimized, because with internal linking, users want to know what this link is about, and you want to give that context. So I would just include that context.
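For illustration, the difference might look like this in the markup; the URL is hypothetical:

    <!-- Vague anchor: tells users and crawlers little about the target. -->
    <a href="/seo-services">learn more</a>

    <!-- Descriptive anchor: gives the context described above. -->
    <a href="/seo-services">SEO services for small businesses</a>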
J
So when it's on LinkedIn, it's more trusted, you know, it's coming back. But now there's a third problem: people are doing guest posts all over, you know. They're going to these low-quality sites, they're buying guest posts. So how does Google determine... if, like you said, it's a good anchor text, use it, right? Now, if it's a guest post, and Google does not know whether it's paid or not, how will Google then determine whether to take this link or burn this link?
B
So
if,
if
you're
writing
these
guest
posts
to
drive
awareness
to
your
business,
I
think
that's
perfectly
fine.
I
will
just
really
watch
out
to
make
sure
that
the
links
are
no
follow
so
that
you're
you're
driving
awareness
you're
talking
about
what
you're
doing
you're
making
it
so
that
users
can
go
to
your
page,
but
essentially
it's
an
ad
for
your
business.
B
So, from that point of view, I would just make them nofollow. With regards to guest posts in general, like how Google recognizes guest posts: I think that's tricky, because we use lots of different signals to try to figure out what might be a guest post and how we might need to handle that. But it's definitely not the case that it's just the link anchor text that makes it problematic.
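For illustration, qualifying a guest-post link so it passes no signals is a one-attribute change; the URL is hypothetical. Google's documentation also describes rel="sponsored" for paid placements, though the session itself only mentions nofollow:

    <!-- Byline link in a guest post, marked so it passes no ranking signals. -->
    <a href="https://example.com/" rel="nofollow">Example Corp</a>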
B
Cool. Okay, let me just take a quick break with the live questions and go through some of the submitted ones, so that we don't lose track. Stay tuned, hang around, we'll get to you. Don't worry.
B
Let's see. I think the first one I got is an interesting one. It's basically: historically speaking, SEOs have owned title tags, and recently Google prefers to show the H1 instead of the title tag, and you have to consider that the H1 is the product of a multi-department discussion, which might not be exactly what the SEO team wants.
B
Why
are
you
rewriting
title
seo
titles
when
you
do
not
do
the
same
for
ad
titles?
Why
do
you
do
this
to
seos
it's
like
so
on?
On
the
one
hand,
I
don't
know
what
what
what
happens
on
the
ppc
or
on
the
ad
sides.
B
I
can't
really
speak
for
that,
but
in
general
I
I
think
it's
a
kind
of
a
a
tricky
mindset
to
say
that
seos
own
one
particular
part
of
the
page-
and
that
is
always
mapped
one-to-one
in
this
part
of
the
search
results,
because
these
things
change
over
time
and
it's
it's
something
like
the
the
structured
data
that
that
is
processed
can
change
over
time
in
the
past.
It
was
that
you
would
use
micro
data
and
things
that
are
embedded
within
the
html
for
structured
data.
B
Now
a
lot
of
people
use
json
ld,
which
is
kind
of
separate,
but
all
of
these
things
they
they
evolve
over
time.
It's
not
the
case
that
you
can
always
say
for
any
given
html
page.
This
is
exactly
what
the
seo
will
do,
and
this
is
exactly
what
the
developers
will
do
and
what
the
content
team
will
do.
These
things
just
evolve
over
time.
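For readers unfamiliar with the two formats mentioned, here is the same product name expressed both ways; the values are made up:

    <!-- Microdata: attributes embedded directly in the markup. -->
    <div itemscope itemtype="https://schema.org/Product">
      <span itemprop="name">Example Widget</span>
    </div>

    <!-- JSON-LD: the same information, kept separate from the markup. -->
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "Example Widget"
    }
    </script>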
B
So
from
from
my
point
of
view,
it's
not
so
much
that
we're
doing
this
to
to
annoy
the
seos,
but
rather
we're
trying
to
improve
the
quality
of
the
search
results
so
that
ultimately,
people
search
more
and
when
people
search
more.
They
go
to
your
websites
more,
that's
kind
of
essentially
our
goals
here.
So
it's
is
not
the
case
that
people
at
google
sit
around
and
go
like.
Oh,
like
how
can
I
annoy
seos
this
week?
That's
definitely
not
not
what
we
do.
What
we
spend
our
time
on.
B
We
have
so
many
other
normal
business
problems
and
work
problems
and
technical
problems
to
work
on,
and
we
try
to
improve
the
quality
of
our
services
and
sometimes
that
affects
what
seos
do
sometimes
that
doesn't
and
when
it
does
affect
what
seos
do
we
do
try
to.
Let
you
know
about
these
kind
of
changes.
B
I think the submitted noindex one we talked about briefly. Let's see: if there are two competing e-commerce sites that sell exactly the same product, one website offers the product at five hundred dollars, the other at one hundred dollars, and all SEO signals are equal, would the less expensive website have a better chance of ranking, because there's such a price difference for the exact same product? So, purely from a web search point of view...
B
It's not the case that we would say we'll take the cheaper one and rank that higher. I don't think that would really make sense. However, a lot of these products also end up in the product search results, which could be because you submit a feed, or it could be because we recognize the product information on these pages. And the product search results, I don't know how they're ordered; it might be that they take the price into account, or things like availability.
B
All
of
the
the
other
factors
that
kind
of
come
in
as
attributes
into
product
search.
So
from
a
web
search
point
of
view.
We
don't
take
the
price
into
account
from
a
product
search
point
of
view,
it's
possible
and
the
the
tricky
part
I
think
as
an
seo
is
these
different
aspects
of
search
are
often
combined
in
one
search
results
page
where
you'll
see
normal
web
results
and
maybe
you'll
see
some
product
results
on
the
side
or
maybe
you'll
see
some
some
mix
of
that.
B
So, how bad can it be: that's hard to say, because we don't really have a measure for badness when it comes to sitemap files. But we would generally try to, or our recommendation is usually to, keep the same URL in the same sitemap file.
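A hedged sketch of what keeping the same URL in the same sitemap file means in practice: if URLs are split across several files, keep each URL's file assignment stable across regenerations, rather than letting URLs hop between files. The file names are hypothetical:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- /product/123 should always appear in sitemap-products-1.xml,
           not in a different file each time the sitemaps are rebuilt. -->
      <sitemap><loc>https://example.com/sitemap-products-1.xml</loc></sitemap>
      <sitemap><loc>https://example.com/sitemap-products-2.xml</loc></sitemap>
    </sitemapindex>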
B
I'm learning SEO from multiple sources, and it feels like a behemoth of information. Do you have a preferred SEO checklist that will help make the workflow more efficient? This is in regards to launching a website for small businesses, or helping existing businesses boost their SEO. So, wow, yeah. I don't think we have any SEO checklists, so that makes it a little bit harder to get started. What I would recommend doing is looking at the various SEO starter guides that are out there. We have an SEO starter guide, and there are ones from various SEO tools.
B
They're
also
starter
guides
available
that
are
usually
pretty
good
and
for
the
most
part,
the
starter
guides
that
I've
seen
they
have.
They
have
correct
information,
so
it's,
I
think,
a
lot
less
the
case
that
people
publish
something
incorrect
when
it
comes
to
especially
the
the
beginning
side
of
seo.
So
I
I
will
try
to
go
through
those
and
think
about
which
aspects
actually
play
a
role
or
matter
for
your
website.
B
Sometimes, when you go through these starter guides, it can feel very technical and not really mapped to what you're actually doing when you're creating these web pages. Because when we talk about title elements, for example, you don't look at the HTML anymore and try to tweak that; rather, you try to find the right field in whatever hosting system you have and think about what you need to put there.
B
So
that's
something
where
I
I
think
over
time,
things
will
probably
shift
a
little
bit
to
to
kind
of
cover
that
area
a
little
bit
better.
But
it's
something
to
kind
of
keep
in
mind
that
the
seo
starter
guides.
When
you
look
at
them,
they
might
feel
like
super
technical.
But
actually,
the
work
that
you
do
is
a
lot
more
like
filling
in
fields
and
making
sure
that
the
links
are
there
and
things
like
that.
B
Let's
see
I
work
in
the
news
vertical.
My
team
is
looking
to
expand
our
international
presence
and
have
done
work
to
set
up
multi-regional
subdirectories
for
the
most
part
pages
across
the
different
multi-regional
editions
will
look
the
same
home,
page
and
section
pages
like
politics
or
lifestyle,
we'll
have
similar
content
minus
a
few
pieces
unique
to
the
region.
B
The
articles
are
tricky.
There
is
not
much.
We
can
differentiate
across
multi-regional
subdirectories
outside
of
modules,
with,
let's
see,
related
links,
which
leaves
us
worried
that
duplicate
content
issues.
How
does
google
handle
duplicate
content
in
the
news
space?
Is
it
acceptable?
The
content
stays
the
same,
but
elements
of
the
template
are
different.
Should
there
only
be
one
canonical
across
all
multi-regional
websites?
B
Wow,
okay,
lots
lots
of
different
aspects
there,
so
I
I
think,
taking
a
step
back
first,
it
sounds
like
these
are
different
regions
within
the
same
country
and
it's
same
language
content.
So,
for,
for
example,
I
don't
know
different
u.s
states
or
different
regions
within
the
uk,
for
example,
something
like
that.
If
these
are
different
countries,
then
you
have
the
aspect
of
geo-targeting
which
plays
a
role
if
these
are
different
languages.
B
Then
all
of
these
different
regional
websites
try
to
rank
for
exactly
the
same
article
and
that
could
result
in
that
article
just
not
ranking
as
well
as
it
otherwise
could
so
because
of
that,
I
would
recommend
trying
to
find
canonical
urls
for
these
individual
articles
so
that
you
can
really
say.
Well.
I
have
this
one
article
on
my
five
regional
websites,
but
this
is
my
preferred
version
that
I
want
to
have
seen
in
search
and
then
we
can
concentrate
all
of
our
energy.
B
It
doesn't
have
to
be
the
same
version
all
the
time,
so
it
can
definitely
be
the
case
that
you
have
one
news,
article
that
is
within
one
region,
kind
of
the
canonical
and
different
news.
Article
is
more
canonical
for
another
region.
How
you
pick
which
region
you
choose
as
canonical
is
totally
up
to
you.
It
can
be
completely
random
if
you
want,
but
usually
you
would
do
try
to
figure
out
like
where
is
it
most
relevant
and
pick
that
one
as
the
canonical
version?
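Mechanically, that preference is usually expressed with a rel="canonical" link element on each regional copy of the article; a minimal sketch with hypothetical URLs:

    <!-- On every regional copy of this article, point at the preferred version. -->
    <link rel="canonical" href="https://example.com/us/some-article">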
B
So that's, I think, for the individual articles themselves. For the categories and the sections and the home pages, it seems like that would be something where the content is more unique and more specific to the individual regions, and because of that, I would try to just keep those indexable separately.
B
Let's see: HTML semantics versus SEO optimization. The question kind of goes into, like, on e-commerce product pages, should the title of the product be marked up as a heading? And from our point of view, that's totally up to you. Purely from a technical point of view, it can be the case that the product is a heading on a page.
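Concretely, that might look like this; the product name is made up:

    <!-- Marking the product name as the page's main heading is fine. -->
    <h1>Example Widget 3000</h1>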
B
I think, whatever the feature is that enables the search history in your account, so that that works, that kind of suggests to me that it's based on your search history.
B
I
don't
know
how,
how
like
the
other
google
products
map
into
that.
My
assumption
is
just
purely
from
a
data
protection
point
of
view.
It
would
be
tricky
to
map
kind
of
other
products
into
something
like
that.
So
I
kind
of
doubt
that
it's
happening.
B
What is the best course of action to take when you have to 301 redirect all of the URLs to a new set of URLs? The number of pages will be over 1 million, and you want to minimize the sandbox effect. If there is a sandbox effect, how long could it be? Would we lose ranking that we might never recover? We plan on doing a one-to-one redirect and had requested batch redirects, but that's not a possibility, so pages, images, URLs, etc. would have to flip
at the same time. To me, this sounds like a traditional site move situation: you move from one domain to another, and you redirect all of the URLs from your old site to the new one, and we have to deal with that. And, at least from my point of view, there's nothing like a sandbox effect. There's definitely nothing defined as a sandbox effect on our side when it comes to site moves.
B
So
if
you
have
to
do
a
site
move
then
do
a
site
move
then
redirect
all
of
your
pages
it's
it's
often
like
the
the
easiest
approach
is
just
to
redirect
all
pages
at
once.
Our
systems
are
also
tuned
to
that,
a
little
bit
to
try
to
recognize
that.
B
So
when
we
see
that
a
website
starts
redirecting
all
pages
to
a
different
website,
then
we'll
try
to
reprocess
that
a
little
bit
faster
so
that
we
can
process
that
site
move
as
quickly
as
possible,
and
it's
definitely
not
the
case
that
we
would
say:
oh
they're,
doing
a
site
move.
Therefore
we
will
slow
things
down,
but
rather
we
try
to
process
things
actually
a
little
bit
faster.
When
we
recognize
there
is
a
site
mode.
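For reference, redirecting everything at once in a site move is typically a single server rule; a minimal nginx sketch with hypothetical domains:

    # Hypothetical one-to-one site move: every path keeps its URL on the new host.
    server {
        server_name old-example.com;
        return 301 https://new-example.com$request_uri;
    }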
B
I have a website that connects to APIs on the client side to get data. Are those URLs included in the crawling budget? If you disallow those URLs, would that create any issues? So, I think there are two things here.
B
On the one hand, if these APIs are included when a page is rendered, then yes, they would be included in the crawling, and they would count towards your crawl budget, essentially, because we have to crawl those URLs to render the page. You can block them with robots.txt if you prefer that they're not crawled, or not used during rendering; that's totally up to you.
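Blocking such endpoints would be a short robots.txt rule; the path is hypothetical:

    # Keep crawlers away from the client-side API endpoints.
    User-agent: *
    Disallow: /api/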
B
If
you,
if
you
prefer
doing
that,
especially
if
you
have
an
api
that
is
kind
of
costly
to
maintain
or
takes
a
lot
of
resources,
then
sometimes
that
makes
sense
the
tricky
part,
I
guess,
is:
if
you
disallow
crawling
of
your
api
endpoint,
we
won't
be
able
to
use
any
data
that
the
api
returns
for
indexing.
So
if
your
page's
content
comes
purely
from
the
api
and
you
disallow
crawling
of
the
api,
we
won't
have
that
content.
B
That's
kind
of
the
the
one
aspect
there
if
the
api
just
does
something
supplementary
to
the
page.
Like
maybe
draws
a
map,
or
I
don't
know
like
a
graphic
of
a
numeric
table
that
you
have
on
a
page
or
something
like
that,
then
maybe
it
doesn't
matter
if
that's
content
isn't
included
in
indexing.
B
The
other
thing
is:
is
that
sometimes
it's
non-trivial
how
a
page
functions
when
the
api
is
blocked,
in
particular,
if
you
use
javascript
and
the
api
calls
are
blocked
because
of
robot's
text,
then,
like
you
have
to
handle
that
exception.
Somehow,
and
depending
on
how
you
embed
the
javascript
on
the
page,
what
you
do
with
the
api,
you
need
to
make
sure
that
it
still
works.
So
if
that
api
call
doesn't
work
and
then
the
rest
of
the
page's
rendering
breaks
completely,
then
like
we
can't
index
much
because
there's
nothing
left
to
render.
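A minimal JavaScript sketch of that exception handling, assuming the API only feeds supplementary content; the endpoint and the renderRelatedItems helper are made up:

    // If the API call fails (for example, because crawlers are disallowed
    // from /api/), catch the error so the rest of the page still renders.
    async function loadRelatedItems() {
      try {
        const res = await fetch('/api/related-items');
        if (!res.ok) throw new Error('HTTP ' + res.status);
        renderRelatedItems(await res.json()); // hypothetical helper
      } catch (err) {
        // Supplementary content only: swallow the failure so the core,
        // indexable content is unaffected.
        console.warn('related items unavailable', err);
      }
    }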
B
However,
if
the
api
call
breaks-
and
we
can
still
index
the
rest
of
your
page,
then
that
might
be
perfectly
fine,
so
those
are
kind
of
the
the
options
there.
I
think
it's
trickier
if
you
run
an
api
for
other
people,
because
if
you
disallow
crawling,
then
then
you
kind
of
have
this
second
order
effect
that
someone
else's
website
might
be
dependent
on
your
api
and
depending
on
what
your
api
does.
Then.
B
Suddenly their website doesn't have indexable content, and they might not even notice, because they weren't aware that you suddenly added a disallow there. That might cause kind of indirect effects, but that's ultimately all up to you. Cool, okay. There are still a bunch of questions submitted, but also lots of people's hands up. Maybe I'll go back to some hands until we kind of finish the recording here, and I also have some more time afterwards, so we can try to answer all of the questions.
A
Hi John. So, John, I have also added my question, the main question that I had, to the question list on the YouTube channel, so I'll repeat the question. There are two pages that originate from the same domain. The URL is a bit different; basically, they are part of the same directory structure, and they are generated by Next.js.
A
So, Next.js is a server-side rendered React framework, and they are being indexed. But I see one page in the Google cache, and the second page is not in the Google cache, and I see the same pattern regardless of how I generate the page. Okay? And, I mean, there is no set pattern to which would be in the Google cache and which would not be in the Google cache.
A
Most of my pages are in the Google cache, but now I'm worried, because I'm currently moving from my Java-based tech stack, which generates all these pages, over to Next.js, and this problem, well, I found it while I was debugging. It is also a problem with the older Java stack that we were using.
B
Okay, so, first of all, the cached pages are completely separate from what we index.
B
Whether there's a cached page or not, it doesn't matter at all for ranking; it doesn't matter at all for indexing. Sometimes there are technical reasons why we don't have a cached page; sometimes we just don't have a cached page for individual URLs.
B
Oh
it's
an
empty
page,
and
maybe
google
is
indexing
an
empty
page,
but
that
doesn't
mean
that
we
would
index
an
empty
page.
It's
just
well.
The
javascript
can't
run
on
this
page,
because
the
the
cache
page
is
not
the
rendered
page.
It
is
essentially
just
the
html
file
that
we
requested
and
a
copy
of
that,
and
if
the
html
file
shows
something
that's
fine,
if
it
uses
javascript
and
the
javascript
doesn't
run
because
it's
a
cache
page,
that's
equally
fine
and
you
just
don't
see
it
in
the
cache
page.
A
So, John, the challenge is: when I search for this in the Google web cache, right, I get a 404. Google tells me that this page is not in the cache, so the typical 404 page that you see for a web page, the domain, right? And the same page, okay, the structure of the JavaScript file, the structure, everything is the same in the URLs that I've given in the question. Okay, the structure and everything is the same; it's just the content that is varying.
A
Are
you
saying
that?
Because,
while
google
was
indexing
and
was
requesting
that
html,
there
was
a
problem
with
javascript,
that's
why
it
did
not
catch
it.
What
could
be
the
reason
I
I
just
want
to
get
to
the
bottom
of
this,
because
there
are,
I
have
around
100
000
categories,
which
I
need
to
enable
then
corresponding
to
100
000
categories
around
200
000
products
that
I
need
to
enable,
and
although
what
you
said
is
correct,
my
seo
team
has
said
that
this
is
what
better
okay,
but
it
is
still
a
problem
for
me.
B
A
Okay, sounds good to me. The second one is around dynamic rendering. So, before we moved to Next.js, we had a regular React application. I am more of a tech guy, I'm not an SEO expert, but, going through certain articles on developers.google.com, I found that there is a headless browser tool called Rendertron.
A
Maybe you've heard about it, right? So, looking at that tool, does it mean that I don't need to look at server-side rendering frameworks like Next.js, where Google is also investing a lot of money, right? Google sponsors Next.js. So should I stop looking at this, and should I simply have React generate my single-page application, and then, when Googlebot hits me, I redirect that traffic to Rendertron, which will serve them the rendered page? Is that okay?
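For context, the setup being asked about is commonly sketched like this: detect crawler user agents and serve them prerendered HTML from a Rendertron instance, while normal visitors get the client-side app. This is a rough illustration of the question, not advice from the session; the Express-style app, the host names, and Node's global fetch are assumptions:

    app.get('*', async (req, res, next) => {
      const ua = req.headers['user-agent'] || '';
      if (/googlebot/i.test(ua)) {
        // Fetch the prerendered page from the Rendertron instance and serve it.
        const pageUrl = 'https://example.com' + req.originalUrl;
        const rendered = await fetch('https://rendertron.example.com/render/' + pageUrl);
        res.send(await rendered.text());
      } else {
        next(); // normal visitors get the client-side React app
      }
    });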
B
I would strongly recommend you join one of Martin's office hours. Martin on my team does JavaScript-based office hours, I think every other week, or once a month, something like that. I would watch out for those, join one of those, and ask him directly, and see if he can take a look at your site.
B
Maybe, with some of the specifics from your pages, he can give you some more specific advice there.
A
Okay, mine is an e-commerce website, but, okay, what you're saying is I'll join Martin's office hours. That's all I had, John. Thanks a lot for your time.
B
Thanks, cool. Good luck. Anna?
F
Hi John, nice to see you for the first time. As you might have seen, I have a question on the YouTube site. The thing is that we have two domains in Google Search Console, and, like, in September everything was working fine. Google Search Console only keeps the last 16 months, right? And for some reason, in the other domain, it just saved, like, in that console, only 12 months.
F
I
thought
it
was
just
like
maybe
the
the
website
or
reporting
problem,
but
we
have
also
a
connection
to
export
those
data
to
capula,
and
we
also
see
those
changes
there
that
it's
saving
just
on
just
12
months
and
I'm
not
sure
who
else
should
I
contact,
because
we
don't
have
any
local
support
in
czech
republic.
B
Yeah,
I
I
don't
think
you
can
change
that,
so
that's
kind
of
the
the
first
problem.
Usually
this
comes
from
a
situation
where
the
website
was
verified
before
and
lost
verification
and
then
was
verified
again
and
probably
like
12
months
ago
was
maybe
when,
when
the
verification
happened
again
and
usually
when
the
website
loses
all
verification,
then
we
stop
processing
the
data
and
we
start
processing
again
when
it's
verified
again,
whereas
if
a
website
was
never
verified
at
all,
then
we
try
to
recreate
all
of
the
old
data.
B
So
it's
it's
kind
of
a.
I
don't
know
one
of
those
situations
where
you're
stuck
essentially
with
the
data
that
you
have
available
there.
It's
not
that
you
can
recreate
that
old
data.
What
you
could
try,
I
I
don't
know
if
it
works-
is
to
try
to
verify
a
subsection
of
your
website.
So
if
you
have
a
subdirectory
or
a
subdomain,
or
instead
of
doing
the
domain,
verification
doing
the
specific
host
name,
verification
see
if
that
will
trigger
kind
of
like
regenerating
the
the
rest
of
the
data.
F
Okay. Well, as I forgot to mention, I'm a web analyst, and my colleague is the guy from SEO. So I think, if I tell him to try to verify those subdomains, then he should know what to do, yeah, usually.
F
Yeah, this change, we've noticed it happened maybe, like, two weeks ago, and in September everything was working fine. We've seen, like, the full 16 months there, but, like, in this month, those four months just disappeared. So that's why I have this question.
F
All right, thanks a lot.

B
Cool. I will take a break here with the recording, and I'll still be here
for more of the questions; it looks like a bunch of hands are still up. If you're watching this on YouTube, thanks for sticking around to the end. If you'd like to join one of these in the future, watch out in the community section on our channel for the next versions of the office hours. And with that, let me take a break here, and I hope to see you all again in one of the future ones.