Project - YOLOv4 with OpenCV for Object Detection


Post Processing

  • Generally, a CNN has a single output layer at the end, but YOLOv4 has 3 output layers (an output layer is simply one that is not connected to any subsequent layer). So when we feed an image to the network, it produces 3 outputs:

    • 1083 (19 x 19 x 3) predictions for large objects
    • 4332 (38 x 38 x 3) predictions for medium objects
    • 17328 (76 x 76 x 3) predictions for small objects
  • There are 80 classes in all, and the network makes a prediction for every grid cell. Each prediction is a vector of 85 values: 4 bounding-box coordinates, 1 objectness score, and 80 class confidences.

  • We will first collect all valid predictions, i.e. those whose confidence score is higher than a threshold, storing their box coordinates, confidences, and classIDs. (For context, the code that produced layerOutputs is sketched below.)
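
  • For context, layerOutputs was obtained earlier in the project roughly as follows. This is a sketch: the config/weights paths are placeholders, and the 608 x 608 input size is inferred from the grid sizes above (608 / 32 = 19):

    import cv2
    import numpy as np

    # load the Darknet model (paths are placeholders)
    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")

    # names of the 3 unconnected (output) layers
    ln = net.getUnconnectedOutLayersNames()

    # image dimensions, used later to rescale the boxes
    (H, W) = img.shape[:2]

    # a 608 x 608 input yields grids of 19, 38, and 76
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (608, 608),
        swapRB=True, crop=False)
    net.setInput(blob)

    # one array per output layer, each of shape (grid * grid * 3, 85)
    layerOutputs = net.forward(ln)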

INSTRUCTIONS
  • Create empty lists for boxes, confidences and classIDs.

    boxes = []
    confidences = []
    classIDs = []
    
  • We will iterate through the 3 outputs in layerOutputs and loop over each detection, filtering out weak predictions and updating the boxes, confidences, and classIDs lists with the stronger ones:

    for output in layerOutputs:
        print ("Shape of each output", output.shape)
        # loop over each of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability)
            # of the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
    
            # filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability
            if confidence > 0.3:
                # scale the bounding box coordinates back relative to
                # the size of the image, keeping in mind that YOLO
                # actually returns the center (x, y)-coordinates of
                # the bounding box followed by the boxes' width and
                # height
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")
    
                # use the center (x, y)-coordinates to derive the
                # top-left corner of the bounding box
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
    
                # update our list of bounding box coordinates,
                # confidences, and class IDs
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)
                print (LABELS[classID], detection[4], confidence)
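
  • Note that detection[4], printed in the loop above, is the objectness score, while confidence is the score of the best class. Some pipelines multiply the two before thresholding; a one-line variant of that (not what this lesson does) would be:

    # hypothetical variant: weight the class score by the objectness
    confidence = float(detection[4]) * float(scores[classID])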
    
  • Print the length of boxes:

    print (len(boxes))
    
  • There are 24 valid predictions in total, and some of them overlap. They are filtered using non-maximum suppression (NMS). cv2.dnn.NMSBoxes takes the boxes, their confidences, a confidence threshold (0.5 here), and an NMS/IoU threshold (0.3 here):

    idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.3)
    print (len(idxs))
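
  • To see what cv2.dnn.NMSBoxes does under the hood, here is a minimal pure-NumPy sketch of greedy non-maximum suppression (class-agnostic, like the call above). The function name and signature are our own, not OpenCV's:

    import numpy as np

    def greedy_nms(boxes, confidences, score_thresh=0.5, iou_thresh=0.3):
        # boxes: list of [x, y, w, h]; confidences: list of floats
        boxes = np.asarray(boxes, dtype=float)
        scores = np.asarray(confidences, dtype=float)

        # corner coordinates and area of every box
        x1, y1 = boxes[:, 0], boxes[:, 1]
        x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
        areas = boxes[:, 2] * boxes[:, 3]

        # consider only boxes above the score threshold, best first
        order = np.argsort(scores)[::-1]
        order = order[scores[order] >= score_thresh]

        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(int(i))

            # IoU of the best remaining box with all the others
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
            iou = inter / (areas[i] + areas[order[1:]] - inter)

            # drop everything that overlaps the kept box too much
            order = order[1:][iou <= iou_thresh]
        return keep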
    
  • We will then iterate through the surviving predictions, extract each box and confidence, and draw them on the image:

    # ensure at least one detection exists
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in idxs.flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
    
            # draw a bounding box rectangle and label on the frame
            color = [int(c) for c in COLORS[classIDs[i]]]
            cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
            text = "{}: {:.4f}".format(LABELS[classIDs[i]],
                confidences[i])
            cv2.putText(img, text, (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
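
  • If LABELS and COLORS were not already defined earlier in the project, a typical setup looks like this (the file name coco.names is an assumption; use whatever class-name file matches your weights):

    import numpy as np

    # one class name per line, e.g. the 80 COCO class names
    LABELS = open("coco.names").read().strip().split("\n")

    # one random 8-bit color per class, reproducible via the seed
    np.random.seed(42)
    COLORS = np.random.randint(0, 255, size=(len(LABELS), 3), dtype="uint8")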
    
  • Now display the image along with the detections:

    plt.imshow(fixColor(img))
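
  • fixColor is a small helper from earlier in the project. OpenCV stores images in BGR channel order while matplotlib expects RGB, so a typical definition (an assumption, if yours differs) is:

    import cv2

    def fixColor(image):
        # swap OpenCV's BGR channel order to matplotlib's RGB
        return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)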
    