In this blog we explore how to run YOLO, a very popular computer vision algorithm, on a live CCTV feed. YOLO (You Only Look Once) is an object detection algorithm that is remarkably fast and efficient. There is plenty of documentation on running YOLO on video from files, USB cameras or Raspberry Pi cameras; this series of blogs describes in detail how to set up a generic CCTV camera and run YOLO object detection on its live feed. In case you are interested in finding out more about YOLO, I have listed a few articles for your perusal at the end of this blog.
Set up a CCTV camera with RTSP
This blog describes in detail how to set up a generic CCTV camera with a live RTSP feed. Note down the RTSP URL, as we will need it in the later stages. RTSP (Real Time Streaming Protocol) is a network protocol commonly used by IP cameras to stream live video.
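The exact form of the URL varies from manufacturer to manufacturer. The address below is only a hypothetical example of what such a URL typically looks like; the username, password, IP address, port and stream path will be different for your camera:
rtsp://admin:password123@192.168.1.64:554/stream1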
Install Python and OpenCV
We will use Python 3.6 and OpenCV 4 in this walkthrough. Ensure you have a computer with both installed in the appropriate versions. In case you have never installed OpenCV, please refer to this guide, which documents the installation of OpenCV on several different operating systems.
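If you are unsure which versions you have, a quick check from the terminal (assuming python3 and the OpenCV Python bindings are already installed) is:
python3 --version
python3 -c "import cv2; print(cv2.__version__)"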
Install virtualenv for managing Python libraries
I strongly recommend using virtualenv to manage your Python development workflows, especially if you work on multiple Python projects simultaneously. For more details on this package, refer to the documentation.
pip3 install virtualenvwrapper
mkvirtualenv env1
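If the mkvirtualenv command is not found after installing the package, the virtualenvwrapper shell script needs to be sourced first. The path below is a common location but may differ on your system; you can re-activate the environment later with workon:
source /usr/local/bin/virtualenvwrapper.sh
workon env1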
Install the necessary Python libraries with pip
We will need the following libraries to run YOLO on a live CCTV feed. Note that time and os are part of the Python standard library, and cv2 comes with the OpenCV installation from the previous step, so only the remaining packages need to be installed with pip:
pip3 install numpy imutils
Download the YOLOv3 weights and config files
The weights, config and names files needed to run YOLOv3 can be downloaded from the Darknet website. Make a directory called yolo-coco and keep the files there.
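As a sketch of this step, the commands below fetch all three files from the official Darknet site and its GitHub repository; these URLs were correct at the time of writing but may change:
mkdir -p yolo-coco
wget -P yolo-coco https://pjreddie.com/media/files/yolov3.weights
wget -P yolo-coco https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
wget -P yolo-coco https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names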
Python code
Open a file called python-yolo-cctv.py and copy the following code into it. Replace the placeholder <RTSP_URL> with the RTSP URL you noted for your camera.
# import the necessary packages
import numpy as np
import argparse
import imutils
import time
import cv2
import os
from imutils.video import FPS
from imutils.video import VideoStream
RTSP_URL = "<RTSP_URL>"
YOLO_PATH="yolo-coco"
OUTPUT_FILE="output/outfile.avi"
# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join([YOLO_PATH, "coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")
CONFIDENCE=0.5
THRESHOLD=0.3
# initialize a list of colors to represent each possible class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
    dtype="uint8")
# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join([YOLO_PATH, "yolov3.weights"])
configPath = os.path.sep.join([YOLO_PATH, "yolov3.cfg"])
# load our YOLO object detector trained on COCO dataset (80 classes)
# and determine only the *output* layer names that we need from YOLO
print("[INFO] loading YOLO from disk...")
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
ln = net.getLayerNames()
# getUnconnectedOutLayers() returns a 1-D or 2-D array depending on the
# OpenCV version, so flatten it before indexing into the layer names
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]
# initialize the video stream, pointer to output video file, and
# frame dimensions
vs = cv2.VideoCapture(RTSP_URL)
time.sleep(2.0)
fps = FPS().start()
writer = None
(W, H) = (None, None)
cnt=0
# loop over frames from the video file stream
while True:
    cnt += 1
    # read the next frame from the stream
    (grabbed, frame) = vs.read()
    # if the frame was not grabbed, then we have reached the end
    # of the stream
    if not grabbed:
        break
    # if the frame dimensions are empty, grab them
    if W is None or H is None:
        (H, W) = frame.shape[:2]
    # construct a blob from the input frame and then perform a forward
    # pass of the YOLO object detector, giving us our bounding boxes
    # and associated probabilities
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
        swapRB=True, crop=False)
    net.setInput(blob)
    start = time.time()
    layerOutputs = net.forward(ln)
    end = time.time()
    # initialize our lists of detected bounding boxes, confidences,
    # and class IDs, respectively
    boxes = []
    confidences = []
    classIDs = []
    # loop over each of the layer outputs
    for output in layerOutputs:
        # loop over each of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability)
            # of the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            # filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability
            if confidence > CONFIDENCE:
                # scale the bounding box coordinates back relative to
                # the size of the image, keeping in mind that YOLO
                # actually returns the center (x, y)-coordinates of
                # the bounding box followed by the boxes' width and
                # height
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")
                # use the center (x, y)-coordinates to derive the top
                # and left corner of the bounding box
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
                # update our list of bounding box coordinates,
                # confidences, and class IDs
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)
    # apply non-maxima suppression to suppress weak, overlapping
    # bounding boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, CONFIDENCE,
        THRESHOLD)
    # ensure at least one detection exists
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in idxs.flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
            # draw a bounding box rectangle and label on the frame
            color = [int(c) for c in COLORS[classIDs[i]]]
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            text = "{}: {:.4f}".format(LABELS[classIDs[i]],
                confidences[i])
            cv2.putText(frame, text, (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    # check if the video writer is None
    if writer is None:
        # initialize our video writer
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(OUTPUT_FILE, fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)
    # write the output frame to disk
    writer.write(frame)
    # show the output frame
    cv2.imshow("Frame", cv2.resize(frame, (800, 600)))
    key = cv2.waitKey(1) & 0xFF
    #print ("key", key)
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
    # update the FPS counter
    fps.update()
# stop the timer and display FPS information
fps.stop()
print("[INFO] elasped time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
# do a bit of cleanup
cv2.destroyAllWindows()
# release the file pointers
print("[INFO] cleaning up...")
writer.release()
vs.release()
The program is now ready to run. The live feed from the camera is pulled in over RTSP, each frame is run through the YOLO object detector, and detected objects are highlighted with bounding boxes and labels. The program can be stopped by pressing the 'q' key at any time.
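One practical note before launching: the script writes the annotated video to output/outfile.avi, and OpenCV's VideoWriter will not create that directory for you, so create it first and then start the script from inside your virtualenv:
mkdir -p output
python3 python-yolo-cctv.py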
Final Notes
I ran this program on my MacBook Air laptop, which has no GPU, and got roughly 1 FPS. Using a GPU or an accelerator, the frame rate can be increased significantly to achieve real-time object detection. Alternatively, if you don't have GPU acceleration, you can choose to run detection only on every 10th or 20th frame, as shown in the sketch below.
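A minimal sketch of that frame-skipping idea, reusing the cnt counter that is already incremented at the top of the loop (10 is just an illustrative interval), is to add a check right after the grabbed-frame test inside the while loop:
    # process only every 10th frame and skip the rest to save CPU
    if cnt % 10 != 0:
        continue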
REFERENCES