Cloud Strategy to Minimise Processing Time

Implemented an automated solution for resource configuration, deployment, and scheduling

Challenges

  • Needed consulting on evaluating the right tools and approach for cloud adoption, to offload computing from the on-premise MapR cluster, which was slowing down
  • Build the solution initially for live data (the largest module) on an evaluation basis
  • Needed an automated solution, rather than manual intervention, for resource configuration, deployment, scheduling, scalability, etc.
  • Needed the ability to efficiently process 10 TB or more of incoming incremental data

Solutions

  • Provided a cloud-optimized, on-demand spin-up solution for computation offloading, along with a Snowflake-based reporting solution
  • 5 TB or more of data is extracted weekly from the on-premise MapR cluster and placed in S3 using shell scripts and the AWS CLI, executed by Airflow jobs
  • Based on the size of the data copied over, an AWS EMR cluster is spun up using CloudFormation templates and the AWS CLI to execute Spark and Pig scripts
  • The processed output from EMR is pushed into S3 buckets for persistence
  • The AWS EMR cluster has auto-scaling enabled and is terminated once processing completes
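
The sizing and spin-up step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sizing rule (one core node per 500 GB, clamped to a range), the template parameter name CoreInstanceCount, and the stack and template file names are all assumptions made for the example. Only the AWS CLI subcommands and flags (aws cloudformation deploy, aws cloudformation delete-stack) are real.

```python
import math
import shlex

# Hypothetical sizing rule: one core node per 500 GB of input, within fixed bounds.
GB_PER_CORE_NODE = 500
MIN_CORE_NODES = 2
MAX_CORE_NODES = 40


def core_nodes_for(data_size_gb: float) -> int:
    """Return a core-node count proportional to the data volume copied to S3."""
    wanted = math.ceil(data_size_gb / GB_PER_CORE_NODE)
    return max(MIN_CORE_NODES, min(MAX_CORE_NODES, wanted))


def deploy_command(stack_name: str, template_file: str, data_size_gb: float) -> list[str]:
    """Build the AWS CLI call that creates the EMR stack from a CloudFormation template."""
    return [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack_name,
        "--template-file", template_file,
        # CoreInstanceCount is a hypothetical parameter the template would declare.
        "--parameter-overrides", f"CoreInstanceCount={core_nodes_for(data_size_gb)}",
    ]


def teardown_command(stack_name: str) -> list[str]:
    """Build the AWS CLI call that deletes the stack once processing has finished."""
    return ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name]


if __name__ == "__main__":
    # Example: a weekly batch of 5 TB (5120 GB); names are placeholders.
    print(shlex.join(deploy_command("weekly-offload", "emr-cluster.yaml", 5 * 1024)))
    print(shlex.join(teardown_command("weekly-offload")))
```

In a setup like the one described, an Airflow job could run the first command after the S3 copy completes and the second after the Spark and Pig steps finish, so the cluster only exists while there is work to do.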

Tools & Technologies

Amazon S3, Apache Pig, Apache Spark, AWS CloudFormation, Amazon EMR, MapR, Apache Airflow, Python, R, PowerShell, Snowflake, Bash

Key benefits

  • Provided a cost-efficient, on-demand solution for computation on the AWS platform
  • Added value by recommending the right balance of compute-optimized versus memory-optimized resources, identifying the resource type and configuration for a cost-efficient, optimal solution
  • Offloaded jobs that would need 48 hours on the on-prem server to the cloud and processed them within 24 hours