Writing a custom migration from Drupal to Contentstack using the Management API

  • 12 September 2022

Migrating from Drupal to Contentstack with non-standard entities

Migrating from one CMS to another is not always easy. We recently migrated from Drupal to Contentstack, and in this post I'll outline my strategy in the hope that it helps others in the future. If you have any feedback or want to discuss, I would love to hear your thoughts!

 

Leverage JSONAPI for Drupal

Contentstack has a migration tool you can use. I tried it, but in most cases a Drupal instance will have more complicated content than the standard “out-of-the-box” setup, and what works for some people did not work for me.

Instead, I opted to write a custom migration script that interacted with the Drupal JSONAPI. This provided a pretty easy framework for me to grab the data I needed and map it to the new data on Contentstack.

There is a Node.js module for normalizing JSONAPI results, which saves you from having to loop over the included resources yourself. You can find this package at @lysyi3m/json-api-normalizer. I would highly recommend it.
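To show what that normalization saves you, here is a minimal hand-rolled sketch (not the API of the package above) that resolves each resource's relationships against the included array of a JSON:API document. It assumes the standard JSON:API shape (data, included, and relationships whose data holds type/id pairs); the function name resolveIncludes is hypothetical.

```javascript
// Hand-rolled sketch: attach included resources to each primary resource.
// Assumes a standard JSON:API document: { data: [...], included: [...] }.
function resolveIncludes(doc) {
  // Index every included resource by "type:id" for constant-time lookup.
  const index = new Map();
  for (const res of doc.included || []) {
    index.set(`${res.type}:${res.id}`, res);
  }
  const lookup = (ref) => index.get(`${ref.type}:${ref.id}`) || null;

  const primary = Array.isArray(doc.data) ? doc.data : [doc.data];
  return primary.map((res) => {
    // Flatten attributes onto the object, keeping id/type authoritative.
    const resolved = { ...res.attributes, id: res.id, type: res.type };
    for (const [name, rel] of Object.entries(res.relationships || {})) {
      if (rel.data == null) {
        resolved[name] = null; // empty to-one relationship
      } else if (Array.isArray(rel.data)) {
        resolved[name] = rel.data.map(lookup); // to-many, e.g. several PDFs
      } else {
        resolved[name] = lookup(rel.data); // to-one, e.g. a category term
      }
    }
    return resolved;
  });
}
```

With a library or a helper like this, each node arrives with its field_pdf files and field_category term already attached, instead of as bare type/id references.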

[Screenshot: my JSONAPI endpoint querying Drupal for information]

 

Use the Contentstack postman collection

Contentstack provides a pretty awesome Postman collection. It is useful for things like modeling your new, ideal content type in Contentstack and then fetching an entry of that type to see which fields you need to fill in for the import.

[Screenshot: the Contentstack Postman Collection showing an entry query]

 

Use the Contentstack Management API

The Contentstack Management API has all the tools you need for a migration. The documentation is okay: it could use more examples, but it is generally good enough to get the job done.

The documentation for this is located here.

 

Migration strategy

I needed to migrate nodes (a content type) from Drupal into Contentstack. These nodes had a file attachment which meant I needed to handle asset uploads to Contentstack. Aside from that, the data from Drupal into Contentstack was pretty straightforward.

First I had to authenticate with Drupal. If you skip this, you can only access what anonymous users can access. Drupal's cookie-based authentication uses a session cookie, together with an X-CSRF-Token that is tied to that session.

I wrote a helper function to retrieve the CSRF token. It makes a POST to mydrupal.com/user/login?_format=json with my username and password, saving the session cookie to cookie.txt. I then grabbed the cookie from the response headers and sent it with all my future requests.
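A minimal sketch of such a login helper, assuming Node 18+ (global fetch) and Drupal core's JSON login route, which accepts a JSON body with name and pass and, on success, returns a csrf_token and sets the session cookie. The function name drupalLogin and the injectable fetchImpl parameter are hypothetical, the latter added so the helper can be exercised without a live server. The cookie parsing is deliberately simple: it keeps only the first name=value pair.

```javascript
// Sketch of a Drupal login helper (hypothetical name: drupalLogin).
// fetchImpl defaults to Node 18+'s global fetch but can be swapped for tests.
async function drupalLogin(baseUrl, name, pass, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(`${baseUrl}/user/login?_format=json`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ name, pass })
  });
  if (!res.ok) {
    throw new Error(`Drupal login failed with HTTP ${res.status}`);
  }
  const body = await res.json();
  // Keep only the "name=value" part of the session cookie.
  const setCookie = res.headers.get('set-cookie') || '';
  const cookie = setCookie.split(';')[0];
  // Send `cookie` as the Cookie header on every later request, and
  // `csrfToken` as X-CSRF-Token on any non-GET request.
  return { cookie, csrfToken: body.csrf_token };
}
```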

I then wrote a helper function to get my nodes. This was a GET request to mydrupal.com/jsonapi/node/mynode with an include and a filter:

'mydrupal.com/jsonapi/node/mynode?include=field_pdf,field_category&filter[status][value]=1&filter[type-filter][condition][path]=field_update_type&filter[type-filter][condition][operator]=IS NULL';

I wanted to grab all nodes that didn’t have the field_update_type checkbox checked (hence the IS NULL condition). This query worked great, but there were far too many nodes for a single response, so I had to implement pagination. Pagination in JSONAPI is pretty easy: you just check whether the response has a ‘next’ link.
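The query and the while(hasNext) pagination can be sketched as follows. This assumes Drupal's JSON:API puts the next page at links.next.href and omits it on the last page; buildNodeUrl, forEachPage and the injectable fetchImpl are hypothetical names. URLSearchParams percent-encodes the brackets, which Drupal decodes as usual.

```javascript
// Build the filtered node query from the post with proper URL encoding.
function buildNodeUrl(baseUrl) {
  const params = new URLSearchParams({
    include: 'field_pdf,field_category',
    'filter[status][value]': '1',
    'filter[type-filter][condition][path]': 'field_update_type',
    'filter[type-filter][condition][operator]': 'IS NULL'
  });
  return `${baseUrl}/jsonapi/node/mynode?${params}`;
}

// Walk every page, handing each JSON:API document to `onPage`.
// Returns the number of pages processed.
async function forEachPage(firstUrl, headers, onPage, fetchImpl = globalThis.fetch) {
  let url = firstUrl;
  let hasNext = true;
  let pages = 0;
  while (hasNext) {
    const res = await fetchImpl(url, { headers });
    if (!res.ok) throw new Error(`JSON:API request failed: HTTP ${res.status}`);
    const doc = await res.json();
    await onPage(doc); // the per-page import logic goes here
    pages++;
    // Drupal only includes links.next when another page exists.
    const next = doc.links && doc.links.next;
    hasNext = Boolean(next);
    if (hasNext) url = next.href;
  }
  return pages;
}
```

In a real run, headers would carry the session cookie from the login step.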

Most of the logic for the import script ran inside a while(hasNext) loop, where hasNext controlled the pagination through the Drupal JSONAPI. I also created an array to store any failed entries, since the Contentstack Management API is rate limited. I added rate limiting to my Axios client, but even then I still hit the rate limit from time to time.
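One way to soften rate-limit failures is to retry with exponential backoff before giving up and pushing the entry onto the failed list. This sketch assumes Contentstack signals rate limiting with HTTP 429 and that errors are Axios-style (status on err.response.status); withRetry, importNode and createEntry are hypothetical names.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `fn` with exponential backoff, but only on HTTP 429.
// Any other error, or running out of retries, rethrows.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = err.response && err.response.status;
      if (status !== 429 || attempt >= retries) throw err;
      // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Inside the page loop: anything that still fails is kept for a later pass.
async function importNode(node, createEntry, failedEntries) {
  try {
    return await withRetry(() => createEntry(node));
  } catch (err) {
    failedEntries.push({ node, error: err.message });
    return null;
  }
}
```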

Once I had the data from the Drupal JSONAPI, the next step was creating the file assets in Contentstack, because an entry links to an asset through the UID Contentstack assigns when the asset is created. Here is the code I used to build the upload body, since it took me a while to get it right:

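// At the top of the script (assumes the form-data npm package):
const fs = require('fs');
const FormData = require('form-data');

// Inside the async per-file loop. Assumes `http` is my Axios instance and
// `filename` / `foldersParentUid` are set by the surrounding loop.
const fileObj = {
  title: filename,
  url: '/path/to/file/file.pdf', // local path of the downloaded file
  parent_uid: foldersParentUid // UID of the destination asset folder
};

const fileFormData = new FormData();
fileFormData.append('asset[upload]', fs.createReadStream(fileObj.url));
fileFormData.append('asset[parent_uid]', `${fileObj.parent_uid}`);
fileFormData.append('asset[title]', `${fileObj.title}`);

try {
  const result = await http.post(
    'https://api.contentstack.io/v3/assets?relative_urls=true',
    fileFormData,
    {
      headers: {
        ...fileFormData.getHeaders(), // multipart Content-Type with boundary
        api_key: String(process.env.api_key),
        authorization: String(process.env.management_token),
        branch: String(process.env.branch)
      }
    }
  );
  // result.data.asset.uid holds the new asset's UID.
} catch (err) {
  // Usually the rate limit: log it so the file can be retried later.
  console.error(`Asset upload failed for ${fileObj.title}: ${err.message}`);
}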

The response contained the new asset’s UID, which I then stored in an array of objects that mapped the Drupal node ID to the files (since there could be more than one file on the node).
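A minimal sketch of that node-to-assets mapping, using a Map keyed by Drupal node ID instead of the array of objects described above. It assumes the UID is read from the upload response as result.data.asset.uid; assetsByNode and rememberAsset are hypothetical names.

```javascript
// Drupal node ID -> list of Contentstack asset UIDs (a node can have several files).
const assetsByNode = new Map();

function rememberAsset(nodeId, assetUid) {
  if (!assetsByNode.has(nodeId)) assetsByNode.set(nodeId, []);
  assetsByNode.get(nodeId).push(assetUid);
}

// e.g. after each upload: rememberAsset(node.id, result.data.asset.uid);
```

A Map keeps the lookup for a node's files constant-time when the entry body is built, instead of scanning an array per node.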

Once I had the file(s) sorted out, all that was left was building the entry body. To make this easy, I created my desired content model in Contentstack, exported it, and used that as the template for the API body. To aid debugging, I also added the Drupal node ID and the Drupal node URL as fields on the content model, which made it easy to check whether the data came over correctly.
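Here is a sketch of mapping a normalized Drupal node onto an entry body and creating the entry. Contentstack's create-entry endpoint takes a POST to /v3/content_types/{content_type_uid}/entries with the fields wrapped in an entry object, and a file field is set by asset UID. The field UIDs (pdf_files, drupal_node_id, drupal_url), the function names, and the Axios-like http client are assumptions; use the UIDs from your own exported content type.

```javascript
// Map a normalized Drupal node to a Contentstack entry body.
// Field UIDs here are hypothetical placeholders.
function buildEntryBody(node, assetUids, baseUrl) {
  const nid = node.drupal_internal__nid;
  return {
    entry: {
      title: node.title,
      // Prefer the Drupal path alias; fall back to /node/{nid}.
      url: node.path && node.path.alias ? node.path.alias : `/node/${nid}`,
      pdf_files: assetUids, // a "multiple" file field takes an array of asset UIDs
      // Debugging aids: where this entry came from in Drupal.
      drupal_node_id: String(nid),
      drupal_url: `${baseUrl}/node/${nid}`
    }
  };
}

// Create the entry and return its UID (needed for publishing).
async function createEntry(http, contentTypeUid, body, headers) {
  const res = await http.post(
    `https://api.contentstack.io/v3/content_types/${contentTypeUid}/entries`,
    body,
    { headers }
  );
  return res.data.entry.uid;
}
```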

Once an entry was created, we wanted to publish it immediately rather than publishing everything by hand, so I made a call to the /publish endpoint using the UID returned by the create call.
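A sketch of that publish call, assuming an Axios-like http client and Contentstack's publish-entry endpoint (POST /v3/content_types/{content_type_uid}/entries/{entry_uid}/publish), whose body lists the target environments and locales inside an entry object. The function name and the default 'production' / 'en-us' values are placeholders for your own stack.

```javascript
// Publish a freshly created entry to the given environments and locales.
async function publishEntry(http, contentTypeUid, entryUid, headers,
                            environments = ['production'], locales = ['en-us']) {
  const url = `https://api.contentstack.io/v3/content_types/${contentTypeUid}` +
    `/entries/${entryUid}/publish`;
  const res = await http.post(url, { entry: { environments, locales } }, { headers });
  return res.data; // Contentstack replies with a notice about the publish request
}
```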

Lastly, I had to handle failures and check for possible errors. In my case, failures happened when I hit Contentstack's rate limit or when Drupal had duplicate titles or URLs. For rate limiting, I added the failed entry to my failedEntries array. For duplicates, I simply appended “Duplicate - randomString” to the title and URL.
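The duplicate workaround can be sketched as a small helper that the caller applies when a create call fails because of a duplicate title or URL (how that error is detected depends on the response and is left to the caller). The helper name markDuplicate and the 8-character hex "random string" are assumptions; the URL gets a lowercase, hyphenated form of the same suffix so it stays a valid path.

```javascript
const crypto = require('crypto');

// Return a copy of the entry with a random "Duplicate" suffix on title and URL.
function markDuplicate(entry) {
  const rand = crypto.randomBytes(4).toString('hex'); // 8 hex characters
  return {
    ...entry,
    title: `${entry.title} Duplicate - ${rand}`,
    url: `${entry.url}-duplicate-${rand}`
  };
}
```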

At the end of the script, I just looped over any failedEntries and tried them again. I didn’t bother re-running further failures and opted to just log the issue to investigate manually.
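That final pass might look like this sketch: each failed entry is retried once, sequentially to stay under the rate limit, and anything that still fails is logged for manual review. retryFailedOnce and importFn are hypothetical names; it assumes each failed item stores its Drupal node with an id, and log can be console.error or a logger of your choice.

```javascript
// Retry each failed entry once; log and return whatever still fails.
async function retryFailedOnce(failedEntries, importFn, log = console.error) {
  const stillFailing = [];
  for (const item of failedEntries) {
    try {
      await importFn(item.node); // one attempt at a time, to respect rate limits
    } catch (err) {
      stillFailing.push(item);
      log(`Giving up on Drupal node ${item.node.id}: ${err.message}`);
    }
  }
  return stillFailing;
}
```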

 

Conclusion

Writing a migration from Drupal to Contentstack was fairly straightforward. There were definitely some hiccups around the documentation, but once I found the Postman Collection for the Management API, things went a lot more smoothly.

I created some helper functions to remove assets and entries while testing the migration. In hindsight, I should have written some unit tests, as they would have saved a lot of headaches.
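Cleanup helpers like these might look as follows, assuming an Axios-like http client and the Management API's delete endpoints for entries (/v3/content_types/{content_type_uid}/entries/{entry_uid}) and assets (/v3/assets/{asset_uid}). The function name and arguments are hypothetical; run something like this only against a sandbox branch.

```javascript
const CMA = 'https://api.contentstack.io/v3';

// Delete the entries and assets created by a trial run (entries first,
// so no entry is left pointing at a deleted asset).
async function cleanUp(http, headers, contentTypeUid, entryUids, assetUids) {
  for (const uid of entryUids) {
    await http.delete(`${CMA}/content_types/${contentTypeUid}/entries/${uid}`, { headers });
  }
  for (const uid of assetUids) {
    await http.delete(`${CMA}/assets/${uid}`, { headers });
  }
}
```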

I would strongly recommend the JSONAPI normalizer library I mentioned above, as it really makes digesting the JSON response from Drupal a lot easier.

I wrote some logging functionality using log4js so that any errors were captured for manual review.

Finally, I leveraged the branch functionality within Contentstack so I could ensure that any of my migration work was segregated to its own sandbox. This allowed me to import/delete entities with ease and without fear of messing up the production stack.

I hope this helps someone in the future, and if nothing else, it's a chance to share my great experience using Contentstack.

