In order to insert/update large volumes of data you have to take care of some issues:
- XML/SOAP introduces some overhead;
- You should let the system optimize transactions;
- You should think about the purpose of your import: are you creating a target for a communication delivery? Or are you keeping sync your database with a contact's database ?
Dealing with XML/SOAP overhead
In order to deal with XML/SOAP overhead you should:
- Use a client technology which supports MTOM/XOP (SOAP Message Transmission Optimization Mechanism/XML-binary Optimized Packaging);
- Find a trade-off between the number of http requests and the size of them;
- Use asynchronous operations to update your data (that is startBatchCustomDataUpdate).
Use system built-in functions to optimize transactions
When dealing with large volumes of data you should let the system take control of transaction, that is "load data" and trigger the system to work on it.
Usually you will follow these steps:
- Prepare your data (for example in CSV files, create files from 5 MB to 10 MB in size);
- Create a rowset using createCustomDataRowSet;
- Add your data to the rowset using addDataToCustomDataRowsSetFromFile using a CSV file;
- Start the operation, using startBatchCustomDataUpdate and save the result, which is the ID of a batch operation execution, you will use this ID to get the status and report of your operation;
- Pool the status of your batch update using getBatchStatusInfo and waiting for the operation to be finished;
- Download the detailed report of your operation using fetchBatchCustomDataUpdateReport.
Importing data on a column of the custom table which contains IDs of contacts
ISometimes in order to create relations between a custom table and the CONTACTS table you create a column of type 'int' and fill it with contact identifiers (the idcontact field).
This practice is more performant than using a nvarchar column which contains the email (or other string-based primary key value) of the contact, because 'joins' are more performant on int columns.
When you want to import data on such a column and you do not have the id of your contacts (which is a very common case, you usually deal with logical primary keys/emails and not with 'idcontacts') you can use the lookupContactColumn advacedOption and the system will performn the join while writing the data on the underlying database.
Example scenario:
- Your table has a column named 'idcontact', which contains ids of contacts (and not emails)
- The table has another column called 'myid', of type int
- You want to import data in the table, but you do not have contact ids, but only emails (which is in turn the primary key of your contact's database)
- Create a CSV file which contains for each record to be imported the myid column and email of the contact
- Map the column of the file which contains the emails to the 'idcontact' column
- Add the lookupContactColumn advanced option, setting its value to 'idcontact' and add the lookupContactIddatabase option to the id of the database which contains your contacts
- Start the operation and see the results: the system will automatically lookup for contact's and replace each email with the id of the respective contact
- For records whose email is not found the result field of the MNCustomDataOperation will be 'contactnotfound'