Cloud migration for the supply chain industry


Forrester predicts that more than 50% of global enterprises will run at least part of their business from the cloud, if not the entire company. The supply chain industry is transaction-based, so cloud migration is bound to bring major gains: lower cost, the ability to scale up and down with season and demand, high availability without downtime for customers, and recovery from even the worst kinds of disasters within 15 minutes.

The likes of Amazon have set a high bar for delivering to customers within a very short span of time. A solid IT infrastructure is one of the basics a supply chain business must get right to meet customer expectations.

The following is the story of Sunland Logistics, a 3PL that serves its e-commerce customers at scale. Sunland decided to move to the cloud because it was clear to them that a cloud infrastructure would give them a competitive edge and, most importantly, help them satisfy their customers.

About Sunland Logistics

Sunland Logistics Solutions is a third-party logistics provider (3PL) specializing in warehousing and value-added services, with expertise in automotive, retail, e-commerce, chemical, and paper.

Headquartered in Upstate South Carolina since 1982, Sunland operates over 2.5 million square feet across the Southeast and Midwest and is growing into a national provider. Sunland decided that IT would be their key differentiator and set out to become an industry leader in IT innovation. The industry itself was beginning to lean on IT on a scale never seen before, from running entire infrastructures in the cloud to using bots for inventory counts.

To realize their vision of being an industry leader in IT, Sunland hired ByteAlly as their IT partner. We are responsible for their IT infrastructure, EDI development services, and custom dashboards, and we develop any other software solutions they might need.

Decision to migrate from the on-premise setup to the cloud

As one of several steps needed to transform Sunland’s IT, Hari Sivaprakasam, Sunland’s COO, to whom the IT department reports, decided to first take Sunland to the cloud. All of Sunland’s IT infrastructure was running on physical machines on premises. This limited both the company’s daily operations and its vision of scaling up as a national player.

Disadvantages of an on-premise setup

Plenty of articles have explained the advantages of moving to the cloud over the years, so we won’t discuss them in detail here, but the following were some of the biggest pain points.

  • High cost of maintaining on-premise infrastructure
  • Staff needed on site to operate and maintain the servers
  • Cumbersome, time-consuming process to scale vertically and horizontally
  • No way to have a multi-regional presence, since all servers are located on your premises
  • Very high effort needed to plan and execute disaster recovery
  • Very high effort needed to maintain good availability

Of all the disadvantages listed above, the one that hurt Sunland most was the difficulty of scaling up processing capacity during peak hours. As the number of customers and transactions grew, the systems slowed down and affected operations.

Choosing a cloud provider

We considered AWS, Azure, and Google Cloud as the front-runners. All of them provided the cloud services we needed to achieve our goals, but we went with Azure because Sunland’s ecosystem is dominated by the Windows platform, from Active Directory to the web servers to their WMS systems. Azure is well integrated with the Windows platform in terms of support, and there have already been multiple instances where Azure’s support helped us troubleshoot Windows OS issues.

Goals for the cloud migration

Cloud infrastructure features

We wanted to set up a world-class cloud infrastructure for Sunland. Nothing less. That was our goal from day one. We listed the attributes of a world-class cloud infrastructure and made sure our architecture achieved them:

  • Availability
  • Vertical and horizontal scalability
  • Security following best practices
  • Recovery without loss

Without going into the technical details, here is what we did to achieve each of these attributes.

Availability

Server availability
Very high availability is an inherent feature of cloud infrastructure. The servers are maintained by Azure (or your preferred cloud provider), which covers everything from applying important security patches to maintaining the hardware and keeping the servers available. A lot goes on behind the scenes, and not having to do it yourself saves you thousands of dollars.

Azure guarantees that your virtual machines will be available ~99.9% of the time; the exact figure depends on how the VMs are deployed. See Azure’s SLA for Virtual Machines for details.
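To put an SLA percentage in concrete terms, the arithmetic below converts it into the maximum downtime it allows. This is a minimal Python sketch; the 30-day month is an assumption (Azure’s SLA defines its own measurement period), and 99.95% and 99.99% are included only as comparison points.

```python
from datetime import timedelta


def allowed_downtime(sla_percent: float, period: timedelta) -> timedelta:
    """Maximum downtime an availability SLA permits over the given period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("SLA must be between 0 and 100 percent")
    unavailable_fraction = (100 - sla_percent) / 100
    return period * unavailable_fraction


# Assumption: a 30-day month; real SLAs define their own measurement window.
MONTH = timedelta(days=30)

for sla in (99.9, 99.95, 99.99):
    minutes = allowed_downtime(sla, MONTH).total_seconds() / 60
    print(f"{sla}% -> {minutes:.1f} minutes of downtime per 30-day month")
```

At 99.9%, that works out to roughly 43 minutes a month, which is why the application and database layers below get their own redundancy on top of the VM-level guarantee.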

Application availability
Sunland has three major WMS systems plus the many other software systems and services you would expect to see in a 3PL. Most of these systems have an application layer backed by a database layer. We put the application layer and the database layer on separate servers, and the way to keep each layer available differs.

For the application servers, we set up secondary servers by capturing a VM image of each primary server and restoring it to a new VM. These secondary servers stay in a stopped state until we need them.

You can use Azure’s “availability set” feature to ensure that Azure never takes down both your primary and secondary servers for maintenance at the same time. When two servers are in the same availability set, Azure ensures that at least one of them stays available during maintenance. See Azure’s documentation on availability sets for details.
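Azure documents that VMs in an availability set are spread across update domains (five by default) and that only one update domain is rebooted at a time during planned maintenance. The sketch below is a simplified Python model of that behavior, assuming round-robin placement; Azure performs the actual placement itself, and the VM names are hypothetical.

```python
from collections import defaultdict


def assign_update_domains(vm_names: list[str], update_domains: int = 5) -> dict[int, list[str]]:
    """Place VMs into update domains round-robin (a simplified model of an availability set)."""
    placement: dict[int, list[str]] = defaultdict(list)
    for index, name in enumerate(vm_names):
        placement[index % update_domains].append(name)
    return dict(placement)


def available_during_maintenance(placement: dict[int, list[str]], domain_under_maintenance: int) -> list[str]:
    """VMs that keep running while a single update domain is rebooted."""
    return [
        vm
        for domain, vms in placement.items()
        if domain != domain_under_maintenance
        for vm in vms
    ]


# Hypothetical primary/secondary pair for one WMS application tier.
placement = assign_update_domains(["wms-app-primary", "wms-app-secondary"])
for domain in placement:
    print(f"maintenance on domain {domain}: still up -> {available_during_maintenance(placement, domain)}")
```

Because the two servers land in different update domains, rebooting any one domain always leaves the other server untouched.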

Database availability

Just because your server is healthy doesn’t mean your database is healthy and available; there are many ways the two can diverge. We set up “Always On Availability Groups (SQL Server)” to keep the database layer available.

The Always On availability groups feature is a high-availability and disaster-recovery solution introduced in SQL Server 2012 (11.x). In short, when your primary database server goes down for some reason, the cluster fails over to a secondary server. An availability group supports a set of read-write primary databases and one to eight sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.
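The failover decision can be pictured with a toy model. This Python sketch is not how SQL Server implements it (the Windows Server Failover Cluster and SQL Server handle health detection and promotion); it only assumes that a secondary must be healthy and synchronized, as a caught-up synchronous-commit replica is, before it can take over without data loss. Replica names are hypothetical.

```python
from dataclasses import dataclass, field

MAX_SECONDARIES = 8  # one to eight sets of secondary databases, per the description above


@dataclass
class Replica:
    name: str
    healthy: bool = True
    synchronized: bool = True  # a caught-up synchronous-commit replica in this model


@dataclass
class AvailabilityGroup:
    primary: Replica
    secondaries: list[Replica] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= len(self.secondaries) <= MAX_SECONDARIES:
            raise ValueError("an availability group needs one to eight secondaries")

    def fail_over_if_needed(self) -> Replica:
        """Keep a healthy primary; otherwise promote the first healthy, synchronized secondary."""
        if self.primary.healthy:
            return self.primary
        for candidate in self.secondaries:
            if candidate.healthy and candidate.synchronized:
                self.secondaries.remove(candidate)
                self.secondaries.append(self.primary)  # old primary rejoins as a secondary
                self.primary = candidate
                return candidate  # return immediately, so modifying the list is safe
        raise RuntimeError("no synchronized secondary available; manual intervention needed")


ag = AvailabilityGroup(Replica("sql-01"), [Replica("sql-02", synchronized=False), Replica("sql-03")])
ag.primary.healthy = False
print("new primary:", ag.fail_over_if_needed().name)
```

Here `sql-02` is skipped because it is not synchronized, so `sql-03` is promoted and the failed `sql-01` is kept as a secondary until it recovers.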

See Microsoft’s documentation on “Always On Availability Groups (SQL Server)” for more details.

Scalability

Horizontal and Vertical Scaling

Sunland was growing constantly, and the on-premise setup could not keep up. Increasing processing capacity during peak operating hours became a huge problem. We wanted the ability to add servers to balance the load (horizontal scaling), to increase a server’s processing capacity when it is loaded with traffic (vertical scaling), and to downsize a server to cut costs when it is underutilized.

With Azure, as with most cloud providers, you can scale in a few clicks. For example, one of Sunland’s warehouse management systems was facing peak traffic and its server was running at 90% CPU. This system was used by 60% of their customers, so a slow server meant a huge impact on operating costs. Thanks to the cloud, we were able to double the server’s capacity within 15 minutes during a lunch break, doubling both CPU and RAM with just a few clicks in the Azure portal.
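The decision to resize can be reduced to a simple rule. The Python sketch below assumes hypothetical CPU thresholds and uses Dsv3-style size names purely as an illustrative ladder in which each step doubles vCPUs and memory; it only picks a target size and does not call any Azure API.

```python
# Illustrative size ladder: (name, vCPUs, memory in GiB); each step doubles CPU and RAM.
SIZE_LADDER = [
    ("Standard_D2s_v3", 2, 8),
    ("Standard_D4s_v3", 4, 16),
    ("Standard_D8s_v3", 8, 32),
    ("Standard_D16s_v3", 16, 64),
]

SCALE_UP_CPU = 80.0    # hypothetical threshold, percent of sustained CPU
SCALE_DOWN_CPU = 20.0  # hypothetical threshold, percent of sustained CPU


def recommend_size(current: str, avg_cpu_percent: float) -> str:
    """Move one step up or down the ladder based on sustained CPU use, or stay put."""
    names = [name for name, _, _ in SIZE_LADDER]
    index = names.index(current)
    if avg_cpu_percent >= SCALE_UP_CPU and index < len(names) - 1:
        return names[index + 1]
    if avg_cpu_percent <= SCALE_DOWN_CPU and index > 0:
        return names[index - 1]
    return current


print(recommend_size("Standard_D4s_v3", 90.0))
```

A server at 90% CPU on the 4-vCPU size moves one step up, which mirrors the doubling of CPU and RAM described above. Resizing an Azure VM generally restarts it, so scheduling the change for a quiet window matters.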

Had it been an on-premise setup, this might have taken hours.

Multi-regional presence
Sunland is growing as a national player, and moving to Azure gave us the ability to spawn servers and networks in different regions, close to customers and to warehouses distributed across the U.S. Sunland has 9 warehouses across the country, and placing resources in nearby regions helps ensure that latency does not become a concern.
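Choosing a region for each warehouse can start from geography. This Python sketch picks the nearest region by great-circle (haversine) distance; the region coordinates are approximate and the warehouse coordinates are hypothetical, not Sunland’s actual sites. Distance is only a rough proxy for latency, so measured round-trip times are a better final check.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Approximate coordinates for a few US Azure regions (illustrative; verify against Azure's published locations).
REGIONS: dict[str, tuple[float, float]] = {
    "eastus": (37.37, -79.82),
    "centralus": (41.59, -93.62),
    "southcentralus": (29.42, -98.50),
    "westus": (37.78, -122.42),
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometers between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_region(site: tuple[float, float]) -> str:
    """Region whose coordinates are geographically closest to the site."""
    return min(REGIONS, key=lambda region: haversine_km(site, REGIONS[region]))


# Hypothetical warehouse locations, not Sunland's actual sites.
WAREHOUSES = {
    "upstate-sc": (34.85, -82.39),
    "kansas-city": (39.10, -94.58),
    "dallas": (32.78, -96.80),
}

for name, location in WAREHOUSES.items():
    print(f"{name} -> {nearest_region(location)}")
```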

Security

Security encompasses a wide range of topics. It is not something applied in certain places alone; it has to be put in place at each layer of your architecture. Though much of what we did had little to do with the cloud itself, here are some of the measures we took.

With the old on-premise setup, Sunland employees used Remote Desktop to log into the servers to access their WMS systems, since two of these systems were Windows desktop applications rather than web applications. This was far too risky and open. We published these systems as RemoteApps, so users no longer have to remote into the server to access them.

Anti-virus – Not only did we want to cover our servers on the cloud, we wanted to protect the laptops and desktops of all the employees at Sunland. Clint Spicer, Sunland’s IT Manager, spent a lot of time narrowing down the options for a good anti-virus solution and chose Cylance as the anti-virus software.

It is easy to set up on new servers or machines joining our network, it proactively monitors files across the entire network, and it uses AI to learn the different shapes of malicious files out there.

The CylancePROTECT product predicts how malware, zero-day attacks, and other cyberthreats can attack networks, and then heads them off at the pass, “eliminating the need for individual security teams to analyze and develop expertise in defending against each new cyberattack.” See Cylance’s product documentation for more.

The measures listed above are only some of the steps we took. We recommend reading Azure’s security best practices.


Recovery

Servers should be disposable and we should be in a position to spawn a new copy of any given setup in minutes. This was one of the design principles we had adopted.

For disaster recovery, having a strong backup plan in place is key. Here are the kinds of backups we take:

Virtual machine backups

Whenever a server’s setup changes (installations, patches, etc.), we take a full backup of the VM. This way, if we lose the server during a catastrophe, we can restore the setup on a new server. See the Azure Backup documentation for details.

Database backups (transaction log, differential, full)

We take transaction log backups every 15 minutes, a differential backup at the end of each day, and a full backup of the databases every weekend. More importantly, these backups are kept on geo-redundant storage, which means that if the primary storage itself is lost, you can recover the files from a copy kept in a secondary region. For example, if you store your data in US East and a natural disaster occurs, you can recover a copy of your files from US West. See Azure’s documentation on geo-redundant storage for details.
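A schedule like this determines what a restore looks like: the latest full backup, then the latest differential taken after it, then every log backup taken after that differential. The Python sketch below models that chain using timestamps only; real SQL Server restores follow log sequence numbers, and the 23:50 differential time and start date are assumptions. It also assumes the tail of the log is lost, so the data at risk is the time since the last log backup, which this schedule bounds at 15 minutes.

```python
from datetime import datetime, timedelta

Backup = tuple[datetime, str]  # (taken_at, kind), where kind is "full", "diff", or "log"


def weekly_schedule(start: datetime, days: int = 7) -> list[Backup]:
    """Hypothetical schedule: full at start, log every 15 minutes, differential daily at 23:50."""
    backups: list[Backup] = [(start, "full")]
    end = start + timedelta(days=days)
    taken_at = start + timedelta(minutes=15)
    while taken_at < end:
        backups.append((taken_at, "log"))
        taken_at += timedelta(minutes=15)
    for day in range(days):
        backups.append((start + timedelta(days=day, hours=23, minutes=50), "diff"))
    return backups


def restore_chain(backups: list[Backup], failure_time: datetime) -> list[Backup]:
    """Backups to restore, in order: last full, latest diff after it, then later log backups."""
    usable = sorted(b for b in backups if b[0] <= failure_time)
    fulls = [b for b in usable if b[1] == "full"]
    if not fulls:
        raise ValueError("no full backup exists before the failure")
    chain = [fulls[-1]]
    diffs = [b for b in usable if b[1] == "diff" and b[0] > chain[0][0]]
    if diffs:
        chain.append(diffs[-1])
    base_time = chain[-1][0]
    chain.extend(b for b in usable if b[1] == "log" and b[0] > base_time)
    return chain


start = datetime(2024, 1, 7)  # a Sunday, chosen arbitrarily
failure = start + timedelta(days=2, hours=10, minutes=7)
chain = restore_chain(weekly_schedule(start), failure)
print([kind for _, kind in chain[:3]], len(chain), "backups to restore")
print("data at risk:", failure - chain[-1][0])
```

The differential is what keeps the chain short: without it, a mid-week failure would require replaying every log backup since the weekend.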

Backup of files and disks

We take snapshots of our files regularly. Azure keeps each snapshot as a point-in-time, read-only copy of your files, and the process can be automated using the Azure client libraries. See Azure’s documentation on share snapshots for details.
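Automated snapshots need a retention rule so they don’t pile up. Azure Files allows up to 200 snapshots per share; the Python sketch below decides which snapshots to prune under a hypothetical keep-the-last-N-days policy. The actual create and delete calls would go through the Azure Storage client library and are left out here.

```python
from datetime import datetime, timedelta

MAX_SHARE_SNAPSHOTS = 200  # Azure Files limit on snapshots per share


def snapshots_to_delete(snapshot_times: list[datetime], keep_days: int, now: datetime) -> list[datetime]:
    """Return snapshots older than the retention window, plus any beyond the per-share cap."""
    newest_first = sorted(snapshot_times, reverse=True)
    cutoff = now - timedelta(days=keep_days)
    recent = [t for t in newest_first if t >= cutoff]
    keep = set(recent[: MAX_SHARE_SNAPSHOTS - 1])  # leave room for the next scheduled snapshot
    return [t for t in newest_first if t not in keep]


# Hypothetical policy: a snapshot every 6 hours, kept for 14 days.
now = datetime(2024, 2, 1)
snapshots = [now - timedelta(hours=6 * i) for i in range(1, 121)]
expired = snapshots_to_delete(snapshots, keep_days=14, now=now)
print(len(snapshots) - len(expired), "kept,", len(expired), "to delete")
```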

Given this setup, Sunland can recover within a few minutes, even from an attack of the worst kind.

Conclusion

Because our customer is a logistics company responsible for thousands of shipments a day, and its operations rely heavily on IT systems, we had to be absolutely careful not to disrupt operations during the migration. The project was a success, and here are some of the rules of thumb we followed.

  • Clearly list the problems the company is facing because of IT infrastructure
  • Research existing tools and service providers that will help you solve those problems as efficiently as possible
  • Understand the existing setup and architecture deeply. There will always be dependencies, and they will dictate the order in which the migration should be performed. This is no easy task and should be done with the utmost care
  • Set up test environments and do trial runs
  • Start small by migrating a small portion of the infrastructure
  • Migrate in phases and avoid migrating more than you can handle
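The dependency rule in the list above is essentially a topological sort. This Python sketch uses the standard library’s `graphlib` to group systems into migration phases, where each phase depends only on earlier ones; the system names and dependencies are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each system lists what must already be migrated before it moves.
DEPENDENCIES: dict[str, set[str]] = {
    "active-directory": set(),
    "file-server": {"active-directory"},
    "wms-db": {"active-directory"},
    "wms-app": {"wms-db", "file-server"},
    "edi": {"wms-db"},
    "dashboards": {"wms-db", "edi"},
}


def migration_phases(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group systems into phases; everything in a phase depends only on earlier phases."""
    sorter = TopologicalSorter(dependencies)
    try:
        sorter.prepare()
    except CycleError as err:
        raise ValueError(f"circular dependency: {err.args[1]}") from err
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable, readable order
        phases.append(ready)
        sorter.done(*ready)
    return phases


for number, phase in enumerate(migration_phases(DEPENDENCIES), start=1):
    print(f"Phase {number}: {', '.join(phase)}")
```

Grouping into phases rather than a single flat order also supports the last two rules: each phase is a natural unit to trial in a test environment before moving on.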


Further reading

https://www.forrester.com/report/Predictions+2018+Cloud+Computing+Accelerates+Enterprise+Transformation+Everywhere/-/E-RES139611

https://azure.microsoft.com/en-us/migration/