This is the principal use case for Sitecopy: copying content from your current site by scraping its HTML and extracting data from it. For this to work, the data you need should ideally be tagged with enough IDs and/or classes that you can target it. A useful test: if you could use CSS to change the background colour of only the content you want to extract into a specific field, that content is well tagged.
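
To illustrate what "well tagged" means in practice, here is a minimal sketch in Python that pulls the text out of a single element identified by its ID. It is not Sitecopy's own parser, only an illustration of the idea; the class name `IdTextExtractor`, the function `extract_by_id`, and the `main-content` ID are all hypothetical.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Collect the text inside the first element that has a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_by_id(html, target_id):
    """Return the whitespace-normalised text of the element with target_id."""
    parser = IdTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Hypothetical page: only the "main-content" block is wanted.
SAMPLE_HTML = """
<div id="nav">Menu</div>
<div id="main-content">
  <h1>About us</h1>
  <p>We make <b>things</b>.</p>
</div>
<div id="footer">Contact</div>
"""

content = extract_by_id(SAMPLE_HTML, "main-content")
```

If the wanted content had no ID or class of its own, there would be nothing reliable to target, which is why the quality of the tagging matters so much.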

Sitecopy stores the pages you want to import in a custom entity type. To see and manage your pages, click on the “Data” tab in the main administrative menu, and then on the “Sitecopy Pages” link. Each page has the following properties:

  • Label: This lets you identify the pages later.
  • Page URL: Enter the URL of the English version of the page. Once you’ve saved the page, edit it, change “Current Language” to “French”, and enter the French URL as well.
  • Page Category: You can create page categories in the Taxonomy section under Structure. Sitecopy processes all the pages in the same category with the same set of parsers, mappers and filters.
  • Page Title: You can use this to override the node title.
  • Custom Path Alias: You can use this to override the node’s path alias.

Sitecopy pages can also be the targets of an import, so you can create a CSV file with the necessary data and import it into the database.

Once you have a set of pages, setting up your source is as simple as picking the Sitecopy page category under which you created the pages.
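
As a rough sketch of how the CSV file mentioned above could be prepared, the following Python snippet writes one row per page. The column names are assumptions chosen to mirror the page properties listed earlier; the names your import actually expects depend on how it is configured. The example URLs and the "Basic pages" category are also hypothetical.

```python
import csv
import io

# Hypothetical column names mirroring the Sitecopy page properties.
FIELDS = ["label", "page_url", "page_category", "page_title", "path_alias"]


def build_pages_csv(pages):
    """Return CSV text with a header row and one row per page dict.

    Properties missing from a page (for example the optional title and
    path alias overrides) are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="",
                            lineterminator="\n")
    writer.writeheader()
    for page in pages:
        writer.writerow(page)
    return buffer.getvalue()


pages = [
    {"label": "About", "page_url": "https://example.com/about",
     "page_category": "Basic pages"},
    {"label": "Contact", "page_url": "https://example.com/contact",
     "page_category": "Basic pages", "page_title": "Contact us"},
]

csv_text = build_pages_csv(pages)

# To save the result for importing:
# with open("sitecopy_pages.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

Giving every row the same category keeps the pages together, so they are processed with the same set of parsers, mappers and filters.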

