This blog is part of a series where we examine practical applications of YOLO. In this blog, we will see how to set up object detection with YOLO and Python on images and video. We will also use Pydarknet, a Python wrapper for Darknet. Finally, we will analyse the impact of different model configurations and GPU hardware on speed and accuracy.
Initial setup for YOLO with python
I presume you have already seen the first blog on YOLO, where we ran YOLO with Darknet. We will need the config, weights and names files from that setup. The files needed are:
- yolov3.cfg – the standard config file, found in the cfg/ directory.
- yolov3-tiny.cfg – the speed-optimised config file, also in the cfg/ directory.
- yolov3.weights – the pre-trained weights file for YOLOv3, kept in the darknet/ directory.
- yolov3-tiny.weights – the pre-trained, speed-optimised weights file, also in the darknet/ directory.
- coco.names – the list of items the model can recognise, found in the data/ directory.
- coco.data – a config data file kept in the cfg/ directory.
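The weight files are not part of the repository. If you do not already have them from the first blog, they can be fetched from the usual pjreddie.com links (adjust the URLs if the files have moved):

wget https://pjreddie.com/media/files/yolov3.weights
wget https://pjreddie.com/media/files/yolov3-tiny.weights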
Before we start writing the Python code, we will create a virtual environment:
mkvirtualenv yolo-py
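mkvirtualenv comes from the virtualenvwrapper package; if you do not have it, the standard library venv module does the same job (a hypothetical equivalent, assuming Python 3):

python3 -m venv yolo-py
source yolo-py/bin/activate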
Install the required libraries using pip.
pip install numpy opencv-python
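As a quick, optional sanity check that the install worked, run this snippet inside the new environment:

import cv2
print(cv2.__version__)                         # installed OpenCV version
print(hasattr(cv2.dnn, "readNetFromDarknet"))  # True if the dnn Darknet loader is present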
Object detection with YOLO, Python and OpenCV
The Python code to run is below:
import numpy as np
import time
import cv2

INPUT_FILE='data/dog.jpg'
OUTPUT_FILE='predicted.jpg'
LABELS_FILE='data/coco.names'
CONFIG_FILE='cfg/yolov3.cfg'
WEIGHTS_FILE='yolov3.weights'
CONFIDENCE_THRESHOLD=0.3

LABELS = open(LABELS_FILE).read().strip().split("\n")

# seed the generator so every run assigns the same colour to each class
np.random.seed(4)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
    dtype="uint8")

net = cv2.dnn.readNetFromDarknet(CONFIG_FILE, WEIGHTS_FILE)

image = cv2.imread(INPUT_FILE)
(H, W) = image.shape[:2]

# determine only the *output* layer names that we need from YOLO
# (flatten() keeps this working across OpenCV versions, which return
# the unconnected-layer indices in different shapes)
ln = net.getLayerNames()
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
    swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
layerOutputs = net.forward(ln)
end = time.time()
print("[INFO] YOLO took {:.6f} seconds".format(end - start))

# initialize our lists of detected bounding boxes, confidences, and
# class IDs, respectively
boxes = []
confidences = []
classIDs = []

# loop over each of the layer outputs
for output in layerOutputs:
    # loop over each of the detections
    for detection in output:
        # extract the class ID and confidence (i.e., probability) of
        # the current object detection
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]

        # filter out weak predictions by ensuring the detected
        # probability is greater than the minimum probability
        if confidence > CONFIDENCE_THRESHOLD:
            # scale the bounding box coordinates back relative to the
            # size of the image, keeping in mind that YOLO actually
            # returns the center (x, y)-coordinates of the bounding
            # box followed by the boxes' width and height
            box = detection[0:4] * np.array([W, H, W, H])
            (centerX, centerY, width, height) = box.astype("int")

            # use the center (x, y)-coordinates to derive the top
            # and left corner of the bounding box
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))

            # update our list of bounding box coordinates, confidences,
            # and class IDs
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(classID)

# apply non-maxima suppression to suppress weak, overlapping bounding
# boxes (the confidence threshold is reused as the NMS threshold here)
idxs = cv2.dnn.NMSBoxes(boxes, confidences, CONFIDENCE_THRESHOLD,
    CONFIDENCE_THRESHOLD)

# ensure at least one detection exists
if len(idxs) > 0:
    # loop over the indexes we are keeping
    for i in idxs.flatten():
        # extract the bounding box coordinates
        (x, y) = (boxes[i][0], boxes[i][1])
        (w, h) = (boxes[i][2], boxes[i][3])

        # draw a bounding box rectangle and label on the image
        color = [int(c) for c in COLORS[classIDs[i]]]
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
        cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
            0.5, color, 2)

# save the output image
cv2.imwrite(OUTPUT_FILE, image)
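To try it out, save the script (the name yolo_image.py below is arbitrary) and run it from the darknet directory so the relative paths resolve:

python yolo_image.py

It prints the inference time and writes the annotated image to predicted.jpg.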
YOLO with Video
Now that we know how to work with images, we can easily extend this to work with video. The code is mostly the same: we will read the video in a loop and treat each frame as an image. We will also measure the frames per second (FPS) to check the speed of the model. First, install the imutils package, which is used in this segment.
pip install imutils
We need input videos to analyse. I used a YouTube downloader to download this video. Feel free to use anything you like; I am a bit biased towards traffic videos. Video processing can be very time consuming, so if you want to stop the processing midway, press the 'q' key while the output window is in focus. The code will stop processing and you can review the partial results.
import numpy as np
import time
import cv2
import imutils
from imutils.video import FPS
from imutils.video import VideoStream

INPUT_FILE='traffic_1.mp4'
OUTPUT_FILE='output.avi'
LABELS_FILE='data/coco.names'
CONFIG_FILE='cfg/yolov3.cfg'
WEIGHTS_FILE='yolov3.weights'
CONFIDENCE_THRESHOLD=0.3

H=None
W=None

# start the FPS counter
fps = FPS().start()

# writer for the annotated output video
fourcc = cv2.VideoWriter_fourcc(*"MJPG")
writer = cv2.VideoWriter(OUTPUT_FILE, fourcc, 30,
    (800, 600), True)

LABELS = open(LABELS_FILE).read().strip().split("\n")

np.random.seed(4)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
    dtype="uint8")

net = cv2.dnn.readNetFromDarknet(CONFIG_FILE, WEIGHTS_FILE)
vs = cv2.VideoCapture(INPUT_FILE)

# determine only the *output* layer names that we need from YOLO
ln = net.getLayerNames()
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

cnt = 0
while True:
    cnt += 1
    print("Frame number", cnt)
    # vs.read() returns grabbed=False at the end of the video rather
    # than raising, so check the flag instead of using try/except
    (grabbed, image) = vs.read()
    if not grabbed:
        break

    if W is None or H is None:
        (H, W) = image.shape[:2]

    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
        swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(ln)

    # initialize our lists of detected bounding boxes, confidences, and
    # class IDs, respectively
    boxes = []
    confidences = []
    classIDs = []

    # loop over each of the layer outputs
    for output in layerOutputs:
        # loop over each of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability) of
            # the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            # filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability
            if confidence > CONFIDENCE_THRESHOLD:
                # scale the bounding box coordinates back relative to the
                # size of the image, keeping in mind that YOLO actually
                # returns the center (x, y)-coordinates of the bounding
                # box followed by the boxes' width and height
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")

                # use the center (x, y)-coordinates to derive the top
                # and left corner of the bounding box
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))

                # update our list of bounding box coordinates, confidences,
                # and class IDs
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)

    # apply non-maxima suppression to suppress weak, overlapping
    # bounding boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, CONFIDENCE_THRESHOLD,
        CONFIDENCE_THRESHOLD)

    # ensure at least one detection exists
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in idxs.flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])

            color = [int(c) for c in COLORS[classIDs[i]]]
            cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
            text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
            cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
                0.5, color, 2)

    # show the annotated frame and write it to the output file
    cv2.imshow("output", cv2.resize(image, (800, 600)))
    writer.write(cv2.resize(image, (800, 600)))
    fps.update()

    # press 'q' to stop processing midway
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        break

fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# do a bit of cleanup
cv2.destroyAllWindows()

# release the file pointers
print("[INFO] cleaning up...")
writer.release()
vs.release()
The output of this video can be found here.
Video Analytics with Pydarknet
Pydarknet is a Python wrapper on top of the Darknet model. I would strongly recommend it, as it is easier to use and can also use a GPU for hardware acceleration. Install the required packages with pip:
pip3 install numpy
pip3 install yolo34py
#GPU version
pip3 install yolo34py-gpu
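Before wiring it into the video pipeline, here is a quick smoke test that the wrapper built correctly — a minimal sketch assuming the config, weights and data files from the earlier sections are in place:

from pydarknet import Detector, Image
import cv2

net = Detector(b"cfg/yolov3.cfg", b"yolov3.weights", 0, b"cfg/coco.data")
frame = cv2.imread("data/dog.jpg")
# detect() returns a list of (label, confidence, (x, y, w, h)) tuples
print(net.detect(Image(frame)))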
The code that uses the package is below. I have also included comments in each section explaining what each component does.
from pydarknet import Detector, Image
import cv2
import numpy as np
import imutils
from imutils.video import FPS
from imutils.video import VideoStream

# Files used in the program. Change the input, config, weights etc. as needed
INPUT_FILE='traffic_1.mp4'
OUTPUT_FILE='output.avi'
LABELS_FILE='data/coco.names'
CONFIG_FILE='cfg/yolov3.cfg'
WEIGHTS_FILE='yolov3.weights'
DATA_FILE="cfg/coco.data"
CONFIDENCE_THRESHOLD=0.3

# Start the FPS counter
fps = FPS().start()

# Declaration of output file to save the analysed video
fourcc = cv2.VideoWriter_fourcc(*"MJPG")
writer = cv2.VideoWriter(OUTPUT_FILE, fourcc, 30,
    (800, 600), True)

# Read all labels
LABELS = open(LABELS_FILE).read().strip().split("\n")
np.random.seed(4)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
    dtype="uint8")

# Create a dictionary with a different colour for each class of labels
COLOR_LABEL = {}
for i in range(0, len(LABELS)):
    COLOR_LABEL[LABELS[i]] = COLORS[i]

# Read the YOLO files
net = Detector(bytes(CONFIG_FILE, encoding="utf-8"),
    bytes(WEIGHTS_FILE, encoding="utf-8"), 0,
    bytes(DATA_FILE, encoding="utf-8"))

# Set up the video reader
vs = cv2.VideoCapture(INPUT_FILE)

cnt = 0
# We have set a limit of 500 frames. This can be changed
while cnt < 500:
    cnt += 1
    print("Frame number", cnt)
    # stop when the video runs out of frames
    (grabbed, image) = vs.read()
    if not grabbed:
        break
    img_darknet = Image(image)

    # Run detection on each frame
    results = net.detect(img_darknet)

    # Draw a bounding box and label for each detection; pydarknet
    # returns centre-based (x, y, w, h) coordinates
    for cat, score, bounds in results:
        x, y, w, h = bounds
        color = [int(c) for c in COLOR_LABEL[str(cat.decode("utf-8"))]]
        text = "{}: {:.4f}".format(str(cat.decode("utf-8")), score)
        cv2.rectangle(image, (int(x - w / 2), int(y - h / 2)),
            (int(x + w / 2), int(y + h / 2)), color, thickness=2)
        cv2.putText(image, text, (int(x - w / 2), int(y - h / 2 - 5)),
            cv2.FONT_HERSHEY_COMPLEX, 1, color)

    # Write the frame to the output file
    writer.write(cv2.resize(image, (800, 600)))
    fps.update()

fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# do a bit of cleanup
cv2.destroyAllWindows()

# release the file pointers
print("[INFO] cleaning up...")
writer.release()
vs.release()
Speed of video processing
I then analysed the same video with different model configurations and hardware. To use yolov3-tiny, change the config and weights file paths. To change the input size of the YOLOv3 model, open the config file and edit the height and width parameters, as shown below; I have tested it with 608 (the default), 416 and 320. For the GPU runs, I used a GCP compute instance with one NVIDIA K10 GPU.
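The relevant lines are in the [net] section at the top of cfg/yolov3.cfg; 416 below is just an example, and both dimensions should be multiples of 32:

[net]
...
width=416
height=416

The FPS from the different runs can be found in the table below.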
| No | Type | FPS (GPU) | FPS (CPU) |
|----|-------------|-----------|-----------|
| 1 | YOLOv3-608 | 4.15 | 0.03 |
| 2 | YOLOv3-416 | 4.83 | 0.05 |
| 3 | YOLOv3-320 | 5.75 | 0.14 |
| 4 | YOLOv3-tiny | 8.67 | 0.59 |
A smaller model runs faster at the expense of accuracy. The speed gains from a smaller model matter far more when you are running on non-GPU hardware; if you have access to a capable GPU, the bigger model is the better choice.
In the other segments of this series, we will explore the following:
- Running YOLO with Darknet
- Running YOLO on the CCTV feed
- Label custom images for training a YOLO model
- Custom training with YOLO