You work in the genomics industry and process large amounts of genomic data using a nightly Elastic
MapReduce (EMR) job. This job processes a single 3 TB file stored on S3. The EMR job runs on three on-demand core nodes and four on-demand task nodes. The EMR job is now taking longer than anticipated and
you have been asked to advise how to reduce the completion time.
Use four Spot Instances for the task nodes rather than four On-Demand instances.
You should reduce the input split size in the MapReduce job configuration and then adjust the number of
simultaneous mapper tasks so that more tasks can be processed at once.
Store the file on Elastic File System (EFS) instead of S3 and then mount EFS as an independent volume for your
Configure an independent VPC in which to run the EMR jobs and then mount EFS as an independent
volume for your core nodes.
Enable termination protection for the job flow.
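
To make the arithmetic behind the input split size option concrete, here is a minimal sketch of how split size relates to the number of map tasks. It assumes the file is splittable, that one map task is created per split, and that "3 TB" means 3 TiB; the per-node mapper counts are hypothetical values chosen only for illustration. In Hadoop the upper bound on split size is typically set through the `mapreduce.input.fileinputformat.split.maxsize` property.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def num_splits(file_bytes: int, split_bytes: int) -> int:
    """Number of input splits (and so map tasks) for a splittable file."""
    return ceil_div(file_bytes, split_bytes)


def num_waves(splits: int, concurrent_mappers: int) -> int:
    """Number of rounds needed if only `concurrent_mappers` tasks run at once."""
    return ceil_div(splits, concurrent_mappers)


if __name__ == "__main__":
    file_bytes = 3 * TIB   # assumption: the 3 TB file treated as 3 TiB
    nodes = 3 + 4          # core nodes plus task nodes, from the scenario

    # Hypothetical settings: (split size, simultaneous mappers per node).
    for split_mib, mappers_per_node in [(128, 8), (64, 16)]:
        splits = num_splits(file_bytes, split_mib * MIB)
        waves = num_waves(splits, nodes * mappers_per_node)
        print(f"split={split_mib} MiB, mappers/node={mappers_per_node}: "
              f"{splits} map tasks in {waves} waves")
```

Halving the split size doubles the number of map tasks, so it only shortens the job if the cluster can actually run more tasks at once. Very small splits also add per-task scheduling overhead, so the two settings are usually tuned together.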