Optimizer – Project Settings

The SISTRIX Optimizer is a powerful tool for improving your website. To adapt it to your website as well as possible, you have access to numerous settings. On this page we’ll explain them all.


General Settings

Here you can find the general project settings. These usually apply to the whole project, not to a specific section of it.


Project Name

The project name is used to identify the Optimizer project inside the Toolbox. You’ll find it on the Optimizer start page and in the project dropdown when you switch between projects. You can change the project name as often as you like; all changes are saved immediately.

Project Scope

The project scope fundamentally affects the entire project. Here you decide whether the project should focus on an entire domain, a single hostname or a specific path. This setting is used by the Onpage-Crawler for onpage analysis, by the Keyword-Crawler for the rankings, and by many other features.

  • If you enter a domain (example: sistrix.com), we will evaluate all URLs which belong to this domain – for example https://www.sistrix.com/test and http://old.sistrix.com/old.html, but not https://www.sistrix.de/test.
  • If you specify a hostname/subdomain (example: www.sistrix.com), we will evaluate all URLs on this hostname – for example https://www.sistrix.com/test, but not http://old.sistrix.com/old.html.
  • If you select a path (example: https://www.sistrix.com/blog/ – remember to include the protocol, http:// or https://), we will evaluate only the URLs which belong to that specific path. So https://www.sistrix.com/blog/test.html would be crawled, but https://www.sistrix.com/test.html wouldn’t.
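As a sketch, the three scope types above could be checked like this. The helper `in_scope`, its parameters and the example URLs are illustrative only, not part of the Optimizer:

```python
from urllib.parse import urlsplit

def in_scope(url: str, kind: str, value: str) -> bool:
    """Check whether a URL falls into a project scope.
    kind is one of 'domain', 'host' or 'path' (hypothetical helper)."""
    host = urlsplit(url).hostname or ""
    if kind == "domain":
        # Domain scope: the host itself or any subdomain of it.
        return host == value or host.endswith("." + value)
    if kind == "host":
        # Hostname scope: only this exact hostname.
        return host == value
    if kind == "path":
        # Path scope: protocol, host and path prefix must all match.
        return url.startswith(value)
    raise ValueError(kind)

# The examples from the bullet points:
in_scope("https://www.sistrix.com/test", "domain", "sistrix.com")      # True
in_scope("http://old.sistrix.com/old.html", "host", "www.sistrix.com") # False
in_scope("https://www.sistrix.com/blog/test.html", "path",
         "https://www.sistrix.com/blog/")                              # True
```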

Changes to the project scope take effect from the next crawl onwards. Note that such changes can affect the project’s historical data: if the number of pages or rankings increases or decreases as a result, the historical project data may no longer be cleanly comparable. It is therefore advisable to set the project scope carefully according to the website’s structure and to avoid changing it afterwards.


Onpage-Crawler: Settings

The Optimizer Onpage-Crawler regularly scans your website. As every website is different, there are many individual settings here. For most websites our standard options are sufficient, so we suggest only making changes if there is a reason to do so.


Crawler-Engine

Here you can select the crawler engine for your project from one of the following options:

HTML-Crawler: With this option, the unprocessed HTML is evaluated exactly as delivered by the web server, without JavaScript parsing. This setting allows significantly faster crawling and puts the least load on the web server.

JavaScript-Crawler: Some websites use JavaScript to make pages more interactive. Since Google now executes JavaScript, this option is also available in the Optimizer. Exactly like Google, the data is based on a current version of Google’s Chrome browser. Activate this option to crawl your project with JavaScript support from now on. Crawling will become slower, as more resources are used both by the crawler and by your web server.

Mobile-Crawler: This crawler engine is based on the JavaScript engine. With this option, JavaScript is rendered for all pages and the crawler’s viewport is set to the screen size of an iPhone. Some websites show different content elements and internal linking depending on the device’s dimensions. This option is the closest simulation of Googlebot’s mobile-first crawling.

Crawling Frequency 

With this option you decide how often the Onpage-Crawler is started automatically. With the standard settings your website is crawled weekly, but you can also change the frequency to biweekly or monthly.

It’s not possible to crawl your website automatically more than once per week, but you can start the crawler manually whenever you need to. You can set the exact crawl time in the expert settings.

Maximum Amount of Requests 

Define the maximum number of requests that should be used for this project. For every Optimizer project you have a crawl quota of 100,000 URLs, which you can distribute between your projects as you like.

As a request we count every call of an HTML page, but also of resources such as images, CSS files and other embedded files, as well as external links.

If you would like to allocate more than 250,000 requests of your crawl quota to your projects, please contact our support so that we can define the best settings for you together.

Concurrent Requests

To completely crawl big and extensive websites, several Onpage-Crawlers often need to work at the same time. Here you can decide how many crawlers work on your project in parallel.

If you use more parallel crawlers, the crawl completes more quickly, but with the disadvantage that your web server could be overloaded. You have to weigh speed against web server load here. If the Onpage-Crawler detects an overload of the web server, it automatically reduces the number of parallel requests.
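The back-off behaviour described here can be modelled roughly as follows. This is a toy sketch: the function name, the halving strategy and the limits are assumptions, not the Optimizer’s actual algorithm:

```python
def adjust_concurrency(current: int, server_overloaded: bool,
                       maximum: int = 10) -> int:
    """Toy model: back off when the server shows signs of overload
    (e.g. timeouts or 5xx responses), otherwise slowly recover."""
    if server_overloaded:
        return max(1, current // 2)   # halve the number of parallel requests
    return min(maximum, current + 1)  # cautiously ramp back up to the limit

adjust_concurrency(8, server_overloaded=True)    # → 4
adjust_concurrency(10, server_overloaded=False)  # → 10 (already at maximum)
```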

HTTP Login Data 

With this feature you can crawl webpages that are hidden behind a password. This is especially advisable before relaunches and similar major changes: you can monitor the staging environment with the Onpage-Crawler before it goes live for Google.

Only websites protected by standardised HTTP authentication can be crawled with this feature. Individual password fields on an HTML page cannot be filled in.
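Standardised HTTP authentication (HTTP Basic Auth, RFC 7617) works via an `Authorization` request header, which is why a crawler can supply the credentials automatically – unlike an HTML login form. A minimal sketch of how that header is built (the credentials are illustrative):

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header used by standard HTTP Basic
    authentication (RFC 7617): base64 of "user:password"."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

basic_auth_header("staging", "secret")  # → 'Basic c3RhZ2luZzpzZWNyZXQ='
```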

Onpage-Crawler: Expert Settings

Many Onpage-Crawler settings are only necessary in special cases, for example if the web server is configured differently than usual, or if some other exception applies. You will find these special settings here in the Expert Settings. You have to activate this section in the Optimizer before you can use it.

User-Agent (Crawler)

With the user agent, the crawler identifies itself to the web server. By default we use the following user agent: Mozilla/5.0 (compatible; Optimizer; http://crawler.sistrix.net/). Here you can personalise the user agent used for your project. This setting has no effect on the parsing of the robots.txt file.

User-Agent (robots.txt)

This user agent is used to process the crawling instructions in the robots.txt. By default, the Optimizer crawler looks for the token “sistrix” here. By changing it to “google” or another term, you can influence the crawling behaviour of the Optimizer.
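Python’s standard-library robots.txt parser illustrates this token matching. In the example below, a group addressed to “sistrix” is allowed while all other crawlers are blocked (the domain is illustrative):

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks everyone except a group addressed to "sistrix".
robots_txt = """\
User-agent: *
Disallow: /

User-agent: sistrix
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

parser.can_fetch("sistrix", "https://www.example.com/page.html")  # True
parser.can_fetch("somebot", "https://www.example.com/page.html")  # False
```

A crawler configured with the token “google” would instead obey any `User-agent: google` group – exactly the effect this setting has on the Optimizer.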

Crawl Time

Choose when the crawler should regularly crawl your site. With this option you can schedule the crawl for night hours or the weekend in order not to overload your web server. This is especially advisable for extensive or slower websites.

Fixed IP Address

The Optimizer’s Onpage-Crawler is usually selected dynamically from a large pool of available crawl servers. This has the advantage that free crawl slots are always available. However, the crawler’s IP address changes regularly.

To prevent this, you can activate a fixed IP address. However, such projects may experience delays in the crawling process.


Crawl Delay

Use this option to set a pause between accesses to your web server. Please note that this can greatly increase your project’s crawl time. The crawl is stopped once a time limit of 24 hours is exceeded.


Alternative Startpages

With some configurations, the Onpage-Crawler may not be able to determine the right startpage for the project crawl. This happens, for example, when users are redirected according to their browser language.

With this option you can add further startpages, which the Onpage-Crawler visits in the first step of the crawl in order to cover the project fully. You can specify HTML sitemaps or general pages with many internal links.


XML Sitemap

With XML sitemaps, the URLs of a project are passed to web crawlers in a standardised, machine-readable format. Most search engines, such as Google and Bing, support this standard.

The Optimizer Onpage-Crawler can access your existing XML sitemap. If the XML sitemap is not referenced in the robots.txt, you can add it explicitly here.
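A minimal XML sitemap in the sitemaps.org format looks like this; the short Python snippet extracts the URLs the same way a crawler would (the URL and date are illustrative):

```python
import xml.etree.ElementTree as ET

# Minimal XML sitemap in the standard sitemaps.org format.
sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
</urlset>"""

# The sitemap namespace must be given explicitly when querying.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap)
urls = [loc.text for loc in root.findall("sm:url/sm:loc", ns)]
# urls == ["https://www.example.com/"]
```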

Other URL Sources

Alongside the links found on the website, the Onpage-Crawler can draw on further URL sources from the Toolbox. This has the advantage that pages which are no longer linked internally, but still exist, can be found and crawled too.

You can add URLs from the SEO module, Link module and Social module, or use the Google Search Console data integration. When integrating the data from the SEO module, you can also define which country index to use.

Virtual robots.txt File

The Onpage-Crawler accesses the project’s robots.txt file available online and follows its rules. For this, we use the same robots.txt parsing as Googlebot.

If you want to test changes to your robots.txt, or define different rules for our Onpage-Crawler which shouldn’t be publicly visible, you can use a virtual robots.txt.

To do this, paste the text of the robots.txt into the text field. The structure of the virtual robots.txt must correspond to that of the complete, “real” file, with all rules and instructions. From the next crawl onwards, the Onpage-Crawler follows these new rules instead of those in the publicly available robots.txt file.
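For illustration, a virtual robots.txt pasted into the field might look like this – a complete file with all groups, not just the changed lines (all paths and the domain are hypothetical):

```
User-agent: *
Disallow: /checkout/

User-agent: sistrix
Disallow: /checkout/
Disallow: /staging-test/

Sitemap: https://www.example.com/sitemap.xml
```

Here the extra `/staging-test/` rule applies only to the Optimizer crawler, without ever appearing in the public robots.txt.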

Gather resources 

In the standard settings, the Onpage-Crawler gathers all page resources besides HTML: images, CSS files and other embedded files. This lets you check whether these files are still available and how big they are, along with many other checks. Use this option to deactivate resource crawling.

Gather external links

In the standard settings, the Onpage-Crawler checks whether external links are reachable. Here you can deactivate this option so that no external links are gathered.

Sort URL-Parameters

In the standard settings, the Onpage-Crawler treats URL parameters as part of the URL, without changing or adapting them. With this option you can sort the URL parameters alphabetically and so keep duplicate content caused by their inconsistent use out of your analysis.
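In Python, such alphabetical normalisation could be sketched like this (`sort_params` is an illustrative helper, not the Optimizer’s code):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def sort_params(url: str) -> str:
    """Sort query parameters alphabetically so that the same page is
    always represented by one canonical URL (illustrative helper)."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit(parts._replace(query=query))

sort_params("https://www.example.com/?b=2&a=1")
# → "https://www.example.com/?a=1&b=2"
```

With this, `?a=1&b=2` and `?b=2&a=1` no longer count as two different URLs.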

Delete URL-Parameters 

Here you have the option to delete specific URL parameters during the project crawl. As with the similar Google Search Console feature, you can remove session parameters and the like. To do this, enter the parameter name in the text field.
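Removing a session parameter can be sketched in the same spirit (`drop_params`, the parameter name and the URLs are illustrative):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def drop_params(url: str, names: set) -> str:
    """Remove the named query parameters, e.g. session IDs, so that
    otherwise identical URLs collapse into one (illustrative helper)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in names]
    return urlunsplit(parts._replace(query=urlencode(kept)))

drop_params("https://www.example.com/?page=2&sessionid=abc", {"sessionid"})
# → "https://www.example.com/?page=2"
```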

Performance Monitoring

Website performance is a ranking factor: since 2021, Google has officially included loading speed in the sorting of search results. With the Optimizer’s “Performance Monitoring” you get an overview of your website’s performance.


Performance Check

The Optimizer performance check measures the loading time of your website, including images, JavaScript files and CSS files. To do this, we access the website with a browser and measure the time needed for it to load completely. These checks are carried out in Germany and in many other countries, and can also be seen in a web-analysis tool.

Uptime monitoring

With uptime monitoring you’ll always know whether your website is online. To do this, we check once every minute whether the project startpage is available.

Alert by E-Mail

Whenever uptime monitoring finds an error – the project is offline or an error message is displayed – we can alert you by e-mail. To use this feature, uptime monitoring must be activated.

E-mail Alert Delay

With this setting you can choose whether to receive an e-mail immediately when your website is unreachable, or only after a certain, defined number of consecutive failures. This way you can avoid false alarms – and perhaps even sleep better!
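The delay logic amounts to alerting only after N consecutive failed checks, roughly like this (a toy model; the function name and threshold handling are assumptions):

```python
def should_alert(checks, threshold: int) -> bool:
    """Alert only once `threshold` consecutive checks have failed.
    checks is a sequence of results, True = reachable, False = failed."""
    streak = 0
    for ok in checks:
        streak = 0 if ok else streak + 1  # a success resets the counter
        if streak >= threshold:
            return True
    return False

should_alert([True, False, True, False], threshold=2)  # False: no 2 in a row
should_alert([True, False, False], threshold=2)        # True
```

With `threshold=1` you get an immediate alert on the first failed check; higher values filter out short network blips.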


Competitors

For each Optimizer project you can define up to six competitors. These are used, for example, to compare your project Visibility Index with theirs, and in other parts of the Optimizer. Unfortunately it is not possible to define more than six competitors.

Depending on the input, the competitors will be analysed as entire domains (domain.com), hostnames/subdomains (www.domain.com) or directories (http://www.domain.com/path).

Rankings & Keywords

Keywords – the search terms entered into search engines – are still the basis of search engine optimisation. In the Optimizer you can monitor the rankings of the keywords you have defined across many different countries, cities, devices and search engines.

Project Visibility Index

Here you can define how often the project Visibility Index should be calculated. Even if only some of the keywords are crawled daily, you can still have a daily project Visibility Index created on the basis of the updated data.

Ranking Changes

Defines whether ranking changes are compared with the previous day or the previous week (minus 7 days). With this setting you can quickly notice important developments, especially if many of your keywords are crawled daily.

Keyword Management

In keyword management you can add, edit and delete the project’s search queries. Rankings are regularly evaluated on the basis of these keywords; the project Visibility Index and other KPIs are also based on this data. You have several setting options here:

  • Country: You can choose between more than 360 country/language combinations.
  • City: More than 10,000 cities are available for local rankings. While we monitor search results nationwide, you can use this setting to evaluate local SERPs.
  • Device: With the Optimizer you can check desktop, tablet or smartphone results.
  • Frequency: Here you can choose whether keywords are monitored weekly or daily. Daily monitoring costs 5 keyword credits, weekly monitoring costs 1 keyword credit.
  • Search Engine: Besides Google, you can also monitor your rankings on Bing, Yahoo and Yandex.
  • Tags: Tags help you organise your keywords. You can also analyse your competitors by specific tags and see a project-tag Visibility Index.


Team Management

In the Optimizer’s team management settings you can work on improving your website together with your team.

Unlike in the other SISTRIX modules, here you can also invite external users to the project. You can assign rights for editing all features, for viewing, and for receiving e-mails.

Delete Project

You can delete your Optimizer projects at any time. Please note that they really will be deleted 😉 All data contained in the project (keywords, rankings, performance data, onpage crawls) will be permanently removed from our servers and databases. This content will also no longer be available in previously created reports.