- Needed consulting to evaluate the right tools and approach for cloud adoption, to offload computing from their slowing on-premise MapR cluster.
- Build the solution initially for Live data (the largest module) on an evaluation basis.
- Needed an automated solution instead of manual intervention for resource configuration, deployment, scheduling, scaling, etc.
- Ability to efficiently process 10 TB or more of incoming incremental data.
- Provided a cloud-optimized, on-demand spin-up solution for computation offloading and Snowflake-based reporting.
- 5 TB or more of data is extracted weekly from the on-premise MapR cluster and placed in S3 using shell scripts and the AWS CLI, executed by Airflow jobs.
- Based on the size of the data copied over, an AWS EMR cluster is spun up using CloudFormation templates and the AWS CLI to execute Spark and Pig scripts.
- The resulting data from EMR processing is pushed into S3 buckets for persistence.
- The AWS EMR cluster has auto-scaling enabled and is terminated once processing completes.
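
The flow above (copy to S3, size the cluster from the data volume, deploy, tear down) can be sketched in Python as a set of AWS CLI command builders. This is a minimal illustration, not the project's actual scripts: the sizing rule (`tb_per_node`, `min_nodes`, `max_nodes`), the `CoreInstanceCount` template parameter, and the stack, path, and bucket names are all hypothetical. Only the `aws s3 sync`, `aws cloudformation deploy`, and `aws cloudformation delete-stack` commands are standard AWS CLI.

```python
import math


def core_node_count(data_tb, tb_per_node=0.5, min_nodes=2, max_nodes=40):
    """Hypothetical sizing rule: one core node per `tb_per_node` TB of input,
    clamped to the range [min_nodes, max_nodes]."""
    needed = math.ceil(data_tb / tb_per_node)
    return max(min_nodes, min(max_nodes, needed))


def s3_sync_command(local_dir, s3_uri):
    """AWS CLI command that copies the weekly extract into S3."""
    return ["aws", "s3", "sync", local_dir, s3_uri]


def deploy_command(stack_name, template_file, data_tb):
    """AWS CLI command that creates the EMR stack from a CloudFormation
    template. `CoreInstanceCount` is an assumed parameter name that the
    template would have to declare."""
    return [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack_name,
        "--template-file", template_file,
        "--parameter-overrides",
        f"CoreInstanceCount={core_node_count(data_tb)}",
    ]


def teardown_command(stack_name):
    """AWS CLI command that deletes the stack once processing has finished."""
    return ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name]


if __name__ == "__main__":
    # Placeholder names and paths; print the commands instead of running them.
    steps = [
        s3_sync_command("/data/weekly_extract", "s3://example-bucket/raw/"),
        deploy_command("emr-weekly", "emr_cluster.yaml", data_tb=5),
        teardown_command("emr-weekly"),
    ]
    for cmd in steps:
        print(" ".join(cmd))
```

In an Airflow DAG, each command could be run as its own task, so that the teardown step only runs after the Spark and Pig steps have completed.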
Tools & Technologies
- Provided a cost-efficient, on-demand solution for computation on the AWS platform.
- Added value by recommending the right resource class (memory-optimized versus alternatives) and by identifying the resource type and configuration for a cost-efficient, optimal solution.
- Offloaded jobs that would need 48 hours on the on-premise server to the cloud and processed them within 24 hours.