And we can accomplish that by using the SQS service,
specifically by using two queues.
So we have a request queue,
which we use to queue the requests.
The requests come from the users
and are received by the web server,
and the web server deposits the requests into the request queue.
And then the app tier,
the processing servers, will retrieve the requests from that request queue.
And based on the number of requests in the queue,
the application can decide how many instances
need to be running in order to handle all the requests now pending in the queue.
So we can measure the length of the queue,
and use the length of the queue to infer the load,
the current load on the application.
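
As a rough illustration of that idea, here is a minimal sketch of how a queue length could be turned into an instance count. The function name, the per-instance capacity, and the minimum and maximum bounds are all assumptions made up for this example, not values from the lecture. With a real SQS queue, the length could be read from the queue's ApproximateNumberOfMessages attribute; here it is simply passed in as a number.

```python
import math

def desired_instances(queue_length, per_instance_capacity=10,
                      min_instances=1, max_instances=20):
    """Estimate how many processing instances the pending requests call for.

    All parameters other than queue_length are hypothetical tuning knobs:
    per_instance_capacity is how many pending requests one instance is
    assumed to absorb, and the min/max bounds cap the fleet size.
    """
    # Round up so that a partial batch of requests still gets an instance.
    needed = math.ceil(queue_length / per_instance_capacity)
    # Clamp the result to the allowed fleet size.
    return max(min_instances, min(max_instances, needed))
```

So with these assumed defaults, an empty queue keeps the minimum of one instance, and a queue of 25 pending requests calls for three.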
And then after these requests are processed by the processing servers,
the responses are also queued in the response queue.
The web server will then retrieve those responses from the response queue,
and then send them back to the users.
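
The round trip just described can be sketched with two in-memory queues. This is only a stand-in for illustration: it uses Python's standard queue module rather than SQS, and the function names, the message fields, and the "processing" step (upper-casing a string) are all hypothetical.

```python
import queue

# Two in-memory queues standing in for the SQS request and response queues.
request_queue = queue.Queue()
response_queue = queue.Queue()

def web_server_submit(user_id, payload):
    """Web tier: deposit a user's request into the request queue."""
    request_queue.put({"user": user_id, "payload": payload})

def processing_server_step():
    """App tier: take one request, process it, and queue the response.

    Returns False when there was nothing to process.
    """
    try:
        request = request_queue.get_nowait()
    except queue.Empty:
        return False
    result = request["payload"].upper()  # placeholder for the real work
    response_queue.put({"user": request["user"], "result": result})
    return True

def web_server_collect():
    """Web tier: drain the response queue so results can go back to users."""
    responses = []
    while True:
        try:
            responses.append(response_queue.get_nowait())
        except queue.Empty:
            return responses

# Two users send requests, the app tier works through them,
# and the web tier picks up the responses.
web_server_submit("alice", "hello")
web_server_submit("bob", "world")
while processing_server_step():
    pass
delivered = web_server_collect()
```

Notice that the web tier and the app tier never call each other directly here; each one only talks to a queue, which is what lets them run at their own pace.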
So, in this way, we use SQS first to decouple,
to separate the different tiers of our application.
And then we use SQS,
these two queues, to allow the two tiers to operate at different speeds,
and to allow the app tier to automatically scale up
and down based on the load that is received by the application.
This is indeed one typical way to implement auto-scaling for a cloud application,
and it does not have to be a two-tier application like this.
This paradigm can be used to support auto-scaling for
any application as long as you can decouple the components in the application,
and then figure out which components need to be automatically scaled up and down.
Then you can use the queues to support it, to implement auto-scaling.