If there are a lot of key-value pairs to merge, a single reducer might take too much time. To prevent the reducer machine from becoming a bottleneck, we use multiple reducers.
When there are multiple reducers, each node running a mapper sorts its key-value pairs and then places them into multiple buckets, one per reducer. Each bucket is sent to its designated reducer, and on every reducer, the buckets coming from all the mapper nodes are merged.
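The merge step on the reducer can be sketched as a k-way merge of already-sorted buckets. This is a minimal illustration, not the actual framework code; the bucket names and data are made up, and Python's `heapq.merge` stands in for whatever merge routine a real system uses:

```python
import heapq

# Hypothetical sorted buckets destined for the same reducer,
# each produced by a different mapper node (data is illustrative).
bucket_from_mapper1 = [("apple", 2), ("cat", 1)]
bucket_from_mapper2 = [("apple", 5), ("dog", 3)]

# The reducer merges the already-sorted buckets into one sorted stream,
# so all values for the same key arrive next to each other.
merged = list(heapq.merge(bucket_from_mapper1, bucket_from_mapper2))
print(merged)
# [('apple', 2), ('apple', 5), ('cat', 1), ('dog', 3)]
```

Because every incoming bucket is already sorted, the reducer never needs to re-sort the full data; it only interleaves the streams, which is cheap even for large inputs.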
On the mapper node, the partitioner decides which key goes to which reducer node. By default, the partitioner computes the hashcode of the key, which means generating a number corresponding to the string, and then divides that hashcode by the total number of reducers. The remainder determines which reducer's bucket the key is placed in. For example, if the remainder is 0, the key is kept in the bucket for the 0th reducer on the same mapper node. Once the mapper node finishes processing its input data, each bucket is transferred to the corresponding reducer node. This model guarantees two things: every key goes to some reducer, so no key is dropped, and the same key from different mapper nodes always goes to the same reducer node. A case like this can never happen: key x on mapper node 1 goes to the first reducer while the same key x from mapper node 2 goes to the second reducer.
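The default partitioning scheme described above can be sketched as follows. This is an illustrative stand-in, not the framework's actual code: the `stable_hash` helper is hypothetical, used because Python's built-in `hash()` is salted per process, while a real partitioner needs a hash that is deterministic across machines:

```python
import hashlib

NUM_REDUCERS = 3

def stable_hash(key: str) -> int:
    # Deterministic hashcode for a string key: the same string always
    # produces the same number, on any mapper node.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    # The remainder of hashcode / number of reducers picks the bucket;
    # the result is always in the range 0 .. num_reducers - 1, so every
    # key lands in some reducer's bucket.
    return stable_hash(key) % num_reducers

# The same key maps to the same reducer no matter which mapper node
# computes it, which is exactly the guarantee described above.
assert partition("x") == partition("x")
assert 0 <= partition("x") < NUM_REDUCERS
```

Because the hash depends only on the key and the reducer count, two mapper nodes independently computing `partition("x")` always agree, which is what rules out the split-key scenario.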