In multilabel classification, which of the following averaging methods for the F1 score calculates the metric for each label and then takes the average of those per-label scores, weighted by support (the number of true instances for each label)?
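
The computation described in the question can be sketched in plain Python. This is a minimal illustration, not any library's implementation: the function names `per_label_f1` and `support_weighted_f1` and the toy data are hypothetical, and labels are assumed to be given as binary indicator rows (one column per label).

```python
def per_label_f1(y_true, y_pred):
    """Return a list of (f1, support) pairs, one per label column.

    Assumes y_true and y_pred are equal-length lists of binary
    indicator rows, e.g. [1, 0, 1] means labels 0 and 2 are present.
    """
    n_labels = len(y_true[0])
    results = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Convention assumed here: F1 is 0.0 when the label never occurs
        # in either the true or the predicted labels.
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true instances of this label
        results.append((f1, support))
    return results


def support_weighted_f1(y_true, y_pred):
    """Average the per-label F1 scores, weighting each by its support."""
    scores = per_label_f1(y_true, y_pred)
    total_support = sum(support for _, support in scores)
    if total_support == 0:
        return 0.0
    return sum(f1 * support for f1, support in scores) / total_support


# Toy data (hypothetical): 4 samples, 3 labels.
y_true = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 1]]
y_pred = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]

for label, (f1, support) in enumerate(per_label_f1(y_true, y_pred)):
    print(f"label {label}: F1={f1:.3f}, support={support}")
print(f"support-weighted F1: {support_weighted_f1(y_true, y_pred):.4f}")
```

Because each label's score is scaled by its support, frequent labels influence the result more than rare ones, so the value generally differs from a plain unweighted mean of the per-label scores.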

