Cloud Strategy to Minimize Processing Time

Implemented an automated solution for resource configuration, deployment, and scheduling

Challenges

  • Needed consultation on evaluating tools and approaches for cloud adoption. The objective was to offload computing from the existing, outdated on-premises MapR cluster to the cloud.
  • Needed a solution custom-built for their live data (the largest module) to support evaluation and decision-making.
  • Needed an automated solution for resource configuration, deployment, scheduling, scalability, etc.
  • Needed to process incoming incremental data (10 TB or more) more efficiently.

Solutions

  • Provided a cloud-optimized solution that spins up compute on demand for the offloaded processing, along with Snowflake-based reporting.
  • Each week, 5 TB or more of data is extracted from the on-premises MapR cluster and placed in S3 by shell scripts and the AWS CLI, run as Airflow jobs.
  • Based on the size of the data copied over, an AWS EMR cluster is spun up using CloudFormation templates and the AWS CLI to execute Spark and Pig scripts.
  • Processed output from EMR is written to S3 buckets for persistence.
  • The EMR cluster has auto-scaling enabled and is terminated once processing completes.
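
The flow above (stage the data in S3, size the cluster from the data volume, create it from a template, tear it down afterwards) can be sketched roughly as follows. This is an illustrative sketch only, not the implementation used in the project: the sizing rule, the `CoreInstanceCount` template parameter, the bucket layout, and every function name are assumptions made for the example. The code only builds the AWS CLI command lines; a scheduler such as Airflow would run them in order and wait for processing to finish before the final teardown.

```python
import math

# Hypothetical sizing rule (an assumption, not from the case study):
# one EMR core node per 500 GB of staged input, clamped to a fixed range.
GB_PER_CORE_NODE = 500
MIN_CORE_NODES = 2
MAX_CORE_NODES = 40


def core_nodes_for(data_size_gb: float) -> int:
    """Pick an EMR core node count from the size of the staged data."""
    wanted = math.ceil(data_size_gb / GB_PER_CORE_NODE)
    return max(MIN_CORE_NODES, min(MAX_CORE_NODES, wanted))


def build_pipeline_commands(local_dir, bucket, stack_name, template_file, data_size_gb):
    """Return the AWS CLI invocations for one weekly run, in execution order.

    Nothing is executed here; each entry is an argument list that a
    scheduler task could hand to subprocess.run().
    """
    staging_uri = f"s3://{bucket}/staging/"  # hypothetical bucket layout
    nodes = core_nodes_for(data_size_gb)
    return [
        # 1. Copy the weekly extract into S3.
        ["aws", "s3", "sync", local_dir, staging_uri],
        # 2. Create the EMR cluster from a CloudFormation template.
        #    "CoreInstanceCount" is an assumed parameter name; the Spark and
        #    Pig steps are assumed to be declared inside the template itself.
        ["aws", "cloudformation", "deploy",
         "--template-file", template_file,
         "--stack-name", stack_name,
         "--parameter-overrides", f"CoreInstanceCount={nodes}"],
        # 3. Remove the stack (and the cluster with it) once processing is done.
        ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline_commands(
        "/data/weekly_extract", "example-bucket", "weekly-emr", "emr.yaml", 5000
    ):
        print(" ".join(cmd))
```

Keeping the sizing rule in one small function makes it easy to tune against observed run times without touching the scheduling code.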

Tools & Technologies

Amazon S3, Apache Pig, Apache Spark, AWS CloudFormation, Amazon EMR, MapR, Apache Airflow, Python, R, PowerShell, Snowflake, Bash

Key benefits

  • Provided a cost-efficient, on-demand solution for computation on the AWS platform.
  • Recommended the resource types and configurations best suited to a cost-efficient, well-performing solution.
  • Offloaded jobs that needed 48 hours on the on-premises server to the cloud, where they are processed within 24 hours.