Integration & Importing data with Kentico Scheduled Tasks

Senior Developer Lee takes you through his method for integrating & importing using Kentico's scheduled tasks.

Be it products or content, nearly every client requires some form of import process, and many want to run it automatically but are unable or reluctant to do the work on their end to use the inbound integration bus provided by Kentico.

For these clients, I use Kentico’s scheduled tasks. Below, I share several issues I have come across along the way, followed by my general approach to this kind of development.

Timeouts

When running locally in Visual Studio you won’t notice this issue, since requests won’t time out in debug mode. Once a site goes live, however, you can find that long-running import tasks often time out and leave your imported data in limbo, especially if the client uploads a file larger than you expected.

To get around this, rather than increasing the timeout period for the entire site, I design my scheduled tasks to run in batches, rescheduling the next batch each time until the whole import is complete. This means that each instance of the task can run to completion without timing out.

It also means that, if the data permits, multiple batches can be executed in parallel allowing for much faster imports.

Here’s a snippet of code that can be used inside a scheduled task to easily reschedule itself:

var taskStartTime = DateTime.Now.AddSeconds(10);

// Persist the progress data so the next run knows where to pick up,
// then reschedule this task as a one-off run a few seconds from now.
taskInfo.TaskData = JsonConvert.SerializeObject(taskData);
taskInfo.TaskInterval = SchedulingHelper.EncodeInterval(new TaskInterval
{
    Period = "once",
    UseSpecificTime = true,
    StartTime = taskStartTime
});
taskInfo.Update();

In this example, taskData is a strongly typed object that I use to hold data about how many items, and which items, have been processed already, allowing the next instance to pick up where the current instance finished. (A full example of such a class is shown later in this article.)

File Upload Sizes

In many cases, the client will want the ability to manually upload their import files through the CMS interface. For this, I tend to provide a custom module UI by way of an ASPX page that allows the user to upload a zip file.

The zip file is then extracted to a temporary location from which I can read the CSV, XML or JSON (depending on what the client provides) files as needed.
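As a rough sketch (not Kentico-specific, and with illustrative control and folder names), the code-behind for such an upload page might look something like this, using System.IO.Compression to extract the archive:

using System;
using System.IO;
using System.IO.Compression;
using System.Web.UI;

// Illustrative code-behind for a custom module upload page.
// The control names (uplImportZip, lblStatus) and the working folder are
// assumptions; the controls would be defined in the page's ASPX markup.
public partial class ImportUploadPage : Page
{
    protected void btnUpload_Click(object sender, EventArgs e)
    {
        if (!uplImportZip.HasFile ||
            !uplImportZip.FileName.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
        {
            lblStatus.Text = "Please select a .zip file to upload.";
            return;
        }

        // Save the uploaded archive into a unique temporary working folder...
        var workingFolder = Server.MapPath("~/App_Data/Import/" + Guid.NewGuid());
        Directory.CreateDirectory(workingFolder);

        var zipPath = Path.Combine(workingFolder, Path.GetFileName(uplImportZip.FileName));
        uplImportZip.SaveAs(zipPath);

        // ...then extract it so the import task can read the CSV, XML or JSON files inside.
        ZipFile.ExtractToDirectory(zipPath, Path.Combine(workingFolder, "extracted"));

        lblStatus.Text = "File uploaded and extracted. The import task will pick it up shortly.";
    }
}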

There can be a couple of different issues here, but the main one is that ASP.NET has a default maximum file upload size limit.

To increase the maximum file upload size there are two locations in web.config you will need to amend:

<system.web>
   <!-- maxRequestLength for asp.net, in KB -->
   <httpRuntime maxRequestLength="15360" />
</system.web>

… and …

<system.webServer>             
   <security>
      <requestFiltering>
         <!-- maxAllowedContentLength, for IIS, in bytes -->
         <requestLimits maxAllowedContentLength="15728640" />
      </requestFiltering>
   </security>
</system.webServer>

It is important to keep some limit in place for other reasons, but you should increase it enough to avoid errors during legitimate uploads. You should also handle the resulting exception so that your application can display a friendly error message if the user tries to upload a file that is too large.
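The tricky part is that the "Maximum request length exceeded" exception is thrown before your page code runs. One common approach, sketched here as an assumption rather than the only option, is to check the request size early in the pipeline; in a Kentico site you would typically hook this into the existing Global.asax.cs or a custom HTTP module rather than a new application class:

using System;
using System.Web;

// Hedged sketch: intercept over-sized uploads before ASP.NET rejects the request.
// The 15 MB figure matches the web.config values above; keep them in sync.
public class UploadSizeCheck : HttpApplication
{
    private const int MaxUploadBytes = 15728640;

    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        if (Request.HttpMethod == "POST" && Request.ContentLength > MaxUploadBytes)
        {
            // Redirect to a friendly message instead of letting the request fail.
            // The target page and query string are illustrative.
            Response.Redirect("~/Admin/ImportUpload.aspx?error=file-too-large", true);
        }
    }
}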

Approaching this kind of integration

First, I identify how the data to be imported will be delivered to Kentico. In general this will be either a file transfer to a drop folder on the server (or to a cloud storage account) or, more often, a custom module UI page that allows the user to upload a file. In some cases, it might be a scheduled task that retrieves the data from a third-party API.

In all cases, the piece of code that receives or retrieves the data must not also be the processor for that data. Instead, it should translate the data into smaller chunks and store each chunk independently. How you chunk the data will depend on its structure and context, but in all cases a chunk should have no external dependencies on other chunks (for the purposes of the import, for example, one product should not rely on the existence of another within Kentico).
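As a rough illustration of what I mean by chunking (the file layout and chunk size here are assumptions, not a prescribed structure), the receiving code might split a large CSV into small, self-contained chunk files that the scheduled task can later process in any order:

using System.IO;
using System.Linq;

// Illustrative sketch: split a large CSV into small, self-contained chunk files.
// Each chunk repeats the header row, so it is a valid CSV in its own right and
// has no dependency on any other chunk.
public static class ImportChunker
{
    public static void SplitCsvIntoChunks(string csvPath, string chunkFolder, int rowsPerChunk = 100)
    {
        Directory.CreateDirectory(chunkFolder);

        var lines = File.ReadAllLines(csvPath);
        var header = lines.First();
        var rows = lines.Skip(1).ToArray();

        var chunkNumber = 0;
        for (var i = 0; i < rows.Length; i += rowsPerChunk)
        {
            var chunkRows = rows.Skip(i).Take(rowsPerChunk);
            var chunkPath = Path.Combine(chunkFolder, $"chunk-{chunkNumber++:D4}.csv");

            File.WriteAllLines(chunkPath, new[] { header }.Concat(chunkRows));
        }
    }
}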

Next, I create a class that implements ITask. This class does the work of importing either a single chunk or, more commonly, a fixed number of chunks before rescheduling itself. Once it has processed the defined number of chunks, the task should reschedule itself for a short time in the future (normally 5-10 seconds). When the task runs again, it takes the next set of chunks and processes those.

Here’s an example with some pseudocode comments:

using System;
using System.Linq;
using CMS.Scheduler;
using Newtonsoft.Json;

public class CustomImportTask : ITask
{
    public CustomImportTaskData TaskData { get; set; }
      
    public string Execute(TaskInfo task)
    {
        // Get the task data
        TaskData = JsonConvert.DeserializeObject<CustomImportTaskData>(task.TaskData);

        if (!TaskData.FilesToImport.Any())
        {
            // If we have no import files then just complete the task.
            return CompleteTask(task);
        }

        if (string.IsNullOrWhiteSpace(TaskData.CurrentImportFile))
        {
            // Select the first file if none already selected
            TaskData.CurrentImportFile = TaskData.FilesToImport.First();
        }

        if (ProcessStep(task) == ProcessStepEnum.Completed)
        {
            // Current file has been completed - start the next file or complete task as required
            var currentIndex = TaskData.FilesToImport.IndexOf(TaskData.CurrentImportFile);
            if (currentIndex == TaskData.FilesToImport.Count - 1)
            {
                return CompleteTask(task);
            }
            TaskData.CurrentImportFile = TaskData.FilesToImport[currentIndex + 1];
            // Reset the per-file progress counter for the newly selected file.
            TaskData.OperationsCompleteInCurrentFile = 0;
        }

        var taskStartTime = DateTime.Now.AddSeconds(TaskData.RescheduleSeconds);
        task.TaskData = JsonConvert.SerializeObject(TaskData);
        task.TaskInterval = SchedulingHelper.EncodeInterval(new TaskInterval
        {
            Period = "once",
            UseSpecificTime = true,
            StartTime = taskStartTime
        });
        task.Update();

        return $"Process in progress. Task rescheduled to run next batch of {TaskData.OperationsPerPass} in +{TaskData.RescheduleSeconds} seconds.";
    }

    private ProcessStepEnum ProcessStep(TaskInfo task)
    {
        // Open the import file here
         
        var operationsCurrentPass = 0;
        while (operationsCurrentPass < TaskData.OperationsPerPass /* && current file still has chunks to complete */)
        {
            // Do the work of processing one section/chunk from the file here
            // starting from `TaskData.OperationsCompleteInCurrentFile` + 1
            // ensuring that you have a way of moving on to the next item.
          

            operationsCurrentPass++;
        }
        TaskData.OperationsCompleteInCurrentFile += operationsCurrentPass;

        return ProcessStepEnum.InProgress; // return completed if the whole file has been processed.
    }

    private string CompleteTask(TaskInfo task)
    {
        // Do any event logging or reporting here that you need to do
        // after all import files have been processed.
        //
        // Include a case for if no import files were found or
        // they were in the wrong format.

        // Clean up the import files from the extracted directory

        task.Delete();
        return "Import Completed.";
    }
}

enum ProcessStepEnum
{
    InProgress,
    Completed
}

I store any data errors, the overall progress of the import, and the current or next chunk in the task’s TaskData. I find it helpful to store TaskData as JSON and use a strongly typed model object for serialization and deserialization as needed.

For example:

using System.Collections.Generic;

public class CustomImportTaskData
{
    public List<string> FilesToImport { get; set; }
    public string ImportFileExtension { get; set; }
    public string CurrentImportFile { get; set; }
    public int OperationsCompleteInCurrentFile { get; set; }

    public int OperationsPerPass { get; set; }
    public int RescheduleSeconds { get; set; }

    public Dictionary<string, Dictionary<string, string>> Messages { get; set; }

    public void LogMessage(string importFilename, string section, string message)
    {
        if (Messages == null)
        {
            Messages = new Dictionary<string, Dictionary<string, string>>();
        }

        if (!Messages.ContainsKey(importFilename))
        {
            Messages.Add(importFilename, new Dictionary<string, string>());
        }

        if (Messages[importFilename] == null)
        {
            Messages[importFilename] = new Dictionary<string, string>();
        }

        if (!Messages[importFilename].ContainsKey(section))
        {
            Messages[importFilename].Add(section, message);
        }
        else
        {
            Messages[importFilename][section] += $"\r\n{message}";
        }
    }
}

Finally...

Once the task has completed all the chunks for this import, it should write its progress and any errors out to a custom logging table or an email report, and then delete itself. This keeps the scheduled task list clean while still allowing you to act on any errors in the process or data.
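As a hedged sketch of what the CompleteTask method from the example above might do (the event source and event code values are illustrative, and an email report could be sent here instead of, or as well as, the event log entry):

// Sketch of a fuller CompleteTask for the CustomImportTask class shown earlier.
// Requires the System.Linq, CMS.EventLog, CMS.Scheduler and Newtonsoft.Json namespaces.
private string CompleteTask(TaskInfo task)
{
    // Summarise the messages collected in TaskData, covering the case where
    // no import files were found or nothing was logged.
    var summary = TaskData?.Messages == null || !TaskData.Messages.Any()
        ? "Import completed. No import files were found or no messages were logged."
        : JsonConvert.SerializeObject(TaskData.Messages, Formatting.Indented);

    // "CustomImport" and "IMPORTCOMPLETE" are illustrative source/code values.
    EventLogProvider.LogInformation("CustomImport", "IMPORTCOMPLETE", summary);

    // Clean-up of the extracted import files would also happen here.

    task.Delete();
    return "Import Completed.";
}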

With this method, if an unhandled exception occurs the task will not be deleted, and you will be able to troubleshoot the problem by re-running the task. Any unhandled errors that do arise should result in changes to the process so that, in future, they are handled and surfaced to you through the reporting instead.

Author: Lee Conlin
