Your company is in the process of developing a next-generation pet collar that collects biometric information to
assist families with promoting healthy lifestyles for their pets. Each collar will push 30kb of biometric data in
JSON format every 2 seconds to a collection platform that will process and analyze the data, providing health
trending information back to the pet owners and veterinarians via a web portal. Management has tasked you to
architect the collection platform, ensuring the following requirements are met:
Provide the ability for real-time analytics of the inbound biometric data
Ensure processing of the biometric data is highly durable, elastic and parallel
The results of the analytic processing should be persisted for data mining
Which architecture outlined below will meet the initial requirements for the collection platform?
Utilize S3 to collect the inbound sensor data, analyze the data from S3 with a daily scheduled Data Pipeline
and save the results to a Redshift Cluster.
Utilize Amazon Kinesis to collect the inbound sensor data, analyze the data with Kinesis clients and save
the results to a Redshift cluster using EMR.
Utilize SQS to collect the inbound sensor data, analyze the data from SQS with Amazon Kinesis and save
the results to a Microsoft SQL Server RDS instance.
Utilize EMR to collect the inbound sensor data, analyze the data from EMR with Amazon Kinesis and save
the results to DynamoDB.
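
When comparing the options, it can help to put rough numbers on the ingest load the question describes. The sketch below is only a back-of-the-envelope estimate: it reads "30kb" as 30 kilobytes, the fleet size is a hypothetical input (the question does not give one), and the per-shard write limits for Kinesis (1 MB/s treated as 1000 KB/s, and 1,000 records/s) are stated here as assumptions to check against the current AWS quotas.

```python
import math

PAYLOAD_KB = 30   # from the question: 30kb per push (assumed to mean kilobytes)
INTERVAL_S = 2    # from the question: one push every 2 seconds

# Assumed per-shard write limits for a Kinesis stream; verify against AWS quotas.
SHARD_KB_PER_S = 1000        # 1 MB/s, treated as 1000 KB/s
SHARD_RECORDS_PER_S = 1000


def per_collar_rate():
    """Return (KB per second, records per second) sent by one collar."""
    return PAYLOAD_KB / INTERVAL_S, 1 / INTERVAL_S


def shards_needed(collars):
    """Estimate shards for a hypothetical fleet, using whichever limit binds first."""
    kb_s, rec_s = per_collar_rate()
    by_bandwidth = math.ceil(collars * kb_s / SHARD_KB_PER_S)
    by_records = math.ceil(collars * rec_s / SHARD_RECORDS_PER_S)
    return max(by_bandwidth, by_records, 1)


if __name__ == "__main__":
    print(per_collar_rate())     # (15.0, 0.5)
    print(shards_needed(1000))   # 15
```

Under these assumptions each collar sends 15 KB/s, so bandwidth rather than record count is the limit that binds, and capacity grows by adding shards as the fleet grows.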
The POC solution is being scaled up by 1000, which means it will require 72 TB of storage to retain 24 months’
worth of data. This rules out RDS as a possible DB solution, which leaves you with Redshift. I believe
DynamoDB is more cost-effective and scales better for ingest than using EC2 in an Auto Scaling group.
Also, this example solution from AWS is somewhat similar for reference.
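
The storage figures in the explanation can be sanity-checked with simple arithmetic. This is a rough sketch only: it assumes storage scales linearly with the 1000x factor, that ingest is spread evenly over the 24 months, and it uses decimal units (1 TB = 1000 GB). The function names are made up for illustration.

```python
TOTAL_TB = 72        # from the explanation: storage needed after scaling up
MONTHS = 24          # from the explanation: retention period
SCALE_FACTOR = 1000  # from the explanation: POC scaled up by 1000


def monthly_growth_tb():
    """Average storage added per month at full scale, assuming even ingest."""
    return TOTAL_TB / MONTHS


def poc_storage_gb():
    """Implied storage of the original POC, assuming linear scaling (decimal units)."""
    return TOTAL_TB * 1000 / SCALE_FACTOR


if __name__ == "__main__":
    print(monthly_growth_tb())  # 3.0
    print(poc_storage_gb())     # 72.0
```

Under those assumptions, the scaled platform grows by about 3 TB per month, and the POC itself would have held roughly 72 GB over the same period.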